Data Analytics Students Design System Indexing Hate Speech

Data analytics students at the Katz School of Science and Health have designed an intelligent system that identifies and indexes hate speech on the web.

Team members Jacob Goodman ’20K, Qihua Zhu ’20K and Jeeho Bae ’20K analyzed Hatebase, an online database that provides a taxonomy, or encyclopedia, of hate terminology, and used it to look up hate terms in Twitter feeds and in the FBI’s public hate-crime databases.

“We focused on when hate terminology was used, when the meaning of a term was ambiguous and languages in which it was considered hateful,” said Goodman, who with Zhu and Bae graduated in August from the M.S. in Data Analytics and Visualization program.

The project was built with scrapers (computer programs that extract data from other programs), web-service integration methods and batch-file integrations that loaded the data into a cloud-based data platform on Amazon Web Services.
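To make the scraping step concrete, here is a minimal, hypothetical sketch using only Python's standard-library `html.parser`. It pulls the text of every element carrying an assumed `class="term"` attribute out of an HTML string. The markup, the class name and the names `TermScraper` and `scrape_terms` are illustrative assumptions. This is not the team's code and not Hatebase's real page structure, and it skips the network request and the AWS loading step entirely.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TermScraper(HTMLParser):
    """Collect the text of every element whose class list includes "term".

    Hypothetical page structure: assumes each term sits in its own element
    (e.g. <li class="term">...</li>) that contains plain text only.
    """

    def __init__(self) -> None:
        super().__init__()
        self.terms: list[str] = []
        self._in_term = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = (dict(attrs).get("class") or "").split()
        if "term" in classes:
            self._in_term = True
            self._buffer = []

    def handle_endtag(self, tag):
        # The first end tag after a matching start tag closes the term.
        if self._in_term:
            self.terms.append("".join(self._buffer).strip())
            self._in_term = False

    def handle_data(self, data):
        if self._in_term:
            self._buffer.append(data)


def scrape_terms(html: str) -> list[str]:
    """Parse an HTML string and return the extracted term texts in page order."""
    parser = TermScraper()
    parser.feed(html)
    parser.close()
    return parser.terms


sample = (
    '<ul><li class="term">alpha</li>'
    '<li class="note">skip me</li>'
    '<li class="term big">beta</li></ul>'
)
print(scrape_terms(sample))  # ['alpha', 'beta']
```

A production scraper would also need to fetch pages, respect the source's terms of use and handle nested markup. The sketch only shows the extraction idea.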

The visualization, built in the interactive software Tableau, showed the most prominent uses of hate terms on Twitter as well as the geographic locations associated with those terms. The team was also able to identify correlations, or predictive instances, linking the use of a term to spikes in similar locations.
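The article does not say how the correlations were measured. One common choice is Pearson's correlation coefficient between two count series for the same location, and the sketch below assumes that. It computes the coefficient in plain Python. The series names and numbers are hypothetical placeholders, not project data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the 1/n factor, so it cancels and is omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts for one location (illustrative, not project data).
term_mentions = [3, 5, 4, 12, 6]
reported_incidents = [1, 2, 1, 5, 2]
print(round(pearson(term_mentions, reported_incidents), 2))  # 0.99
```

A coefficient near 1 only says the two series rise and fall together. Treating it as predictive would need further checks, such as lagged comparisons or holdout periods.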

More than 1,500 English-only terms found in Hatebase appeared in 194,203 FBI records over a 20-year period. Nearly 65%, or 125,559 references, were to race or ethnicity; 54,582, or 28%, were to sexual orientation and gender; and 30,039, or about 15%, were to religion. Most of these records originated in the Los Angeles and San Diego areas of California, in East Texas and in Miami, Florida.
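The three reference counts add up to 210,180, which is more than the 194,203 records. That suggests some records reference more than one category, so the shares can sum to more than 100%. The sketch below assumes the percentages use the record total as the denominator, which reproduces the 65% and 28% figures. It recomputes each share from the counts reported above.

```python
TOTAL_RECORDS = 194_203

# Reference counts by category, as reported in the article.
references = {
    "race or ethnicity": 125_559,
    "sexual orientation and gender": 54_582,
    "religion": 30_039,
}


def shares(counts, total):
    """Return each count as a percentage of `total`, rounded to one decimal place."""
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


for name, pct in shares(references, TOTAL_RECORDS).items():
    print(f"{name}: {pct}%")
# race or ethnicity: 64.7%
# sexual orientation and gender: 28.1%
# religion: 15.5%
```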

“Many of these kinds of hate terminology in language start as a form of ethnicity difference,” said Goodman, a business analytics manager at B&H Photo Video in New York City. “We created a small program that connected to Twitter, searched terms and produced tweets matching the terms. We used the data we collected to build an infrastructure.”
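Goodman describes a small program that searched tweets for lexicon terms. The sketch below covers only the matching step and assumes the tweet texts have already been collected, so it makes no Twitter API calls. It uses case-insensitive, whole-word regular-expression matching, which is an assumption about how "matching the terms" might work. The placeholder terms `termx` and `term y` stand in for a lexicon such as Hatebase, and both function names are hypothetical.

```python
import re


def build_matcher(terms):
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # Try longer terms first so a multi-word term is preferred over a shorter one.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def matching_tweets(tweets, terms):
    """Return (tweet_text, sorted_matched_terms) for each tweet that contains a term."""
    if not terms:
        return []
    matcher = build_matcher(terms)
    results = []
    for text in tweets:
        found = sorted({m.group(0).lower() for m in matcher.finditer(text)})
        if found:
            results.append((text, found))
    return results


lexicon = ["termx", "term y"]  # placeholder terms, not real lexicon entries
tweets = [
    "This has TermX in it",
    "nothing to see here",
    "term y and termx together",
    "termxx is not a whole-word match",
]
for text, found in matching_tweets(tweets, lexicon):
    print(found, "<-", text)
# ['termx'] <- This has TermX in it
# ['term y', 'termx'] <- term y and termx together
```

The `\b` word boundaries assume terms start and end with letters or digits. Terms that begin or end with punctuation would need different anchoring.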

The team traced the usage of most of the hateful terms back to a single Twitter screen name, @whitetrshrepair.
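Tracing most matches back to one account amounts to grouping matches by screen name and ranking the groups. The sketch below assumes the input is a list of hypothetical `(screen_name, matched_terms)` pairs, such as the output of a term-matching step. It reports each account's share of all matches. The accounts and terms are placeholders.

```python
from collections import Counter


def account_shares(matched):
    """Return (screen_name, share_of_all_matches) pairs, largest share first.

    `matched` is an iterable of (screen_name, matched_terms) pairs, a
    hypothetical output of an upstream term-matching step.
    """
    counts = Counter()
    for screen_name, terms in matched:
        counts[screen_name] += len(terms)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(name, count / total) for name, count in counts.most_common()]


# Placeholder accounts and terms, not real data.
matched = [
    ("@account_a", ["termx"]),
    ("@account_b", ["term y", "termx"]),
    ("@account_a", ["termx"]),
    ("@account_a", ["term y"]),
]
print(account_shares(matched))  # [('@account_a', 0.6), ('@account_b', 0.4)]
```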

The team took Brandon Chiazza’s course Information Architecture, which builds on many of the core classes in the Data Analytics and Visualization program. In the course, students apply the skills they have acquired to develop an end-to-end data-analytics solution in the cloud, covering everything from project design and data profiling and assessment to data acquisition and the development of a platform and visualization.
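Profiling and assessing data usually begins with simple checks on each field, such as how many values are missing and how many distinct values appear. The sketch below is a generic, minimal profiler over a list of dictionaries and is not the course's method. The field names `state` and `bias` are hypothetical, and the function assumes scalar, hashable values.

```python
def profile(rows):
    """Summarize dict records: row count plus missing and distinct counts per field.

    Treats None, an empty string and an absent key as missing; assumes the
    remaining values are hashable scalars.
    """
    fields = sorted({key for row in rows for key in row})
    report = {"rows": len(rows), "fields": {}}
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report["fields"][field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


# Hypothetical records; field names are illustrative only.
records = [
    {"state": "CA", "bias": "race"},
    {"state": "TX", "bias": ""},
    {"state": "CA"},
]
print(profile(records))
```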

“When they’ve finished, they give a presentation, as if it were to an executive team, demonstrating their proof of concept,” said Chiazza, who is chief technical officer in the New York City Mayor’s Office of Contract Services.

The focus of the project, he said, was more on data engineering than on complex analytical techniques. Chiazza said the team relied primarily on descriptive statistics, such as averages; much of the work, however, required acquiring the data with complex Python scripts and MySQL databases within a cloud infrastructure.
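As an illustration of descriptive statistics computed in SQL and driven from Python, the sketch below substitutes Python's built-in `sqlite3` module for MySQL so that it runs without a database server. The `AVG`, `COUNT` and `GROUP BY` query is standard SQL and would look essentially the same in MySQL. The table name, column names and numbers are hypothetical placeholders.

```python
import sqlite3


def average_incidents_by_state(rows):
    """Load (state, year, incidents) rows into an in-memory table and average per state."""
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
    try:
        conn.execute("CREATE TABLE incidents (state TEXT, year INTEGER, n INTEGER)")
        conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", rows)
        query = (
            "SELECT state, AVG(n) AS avg_n, COUNT(*) AS years "
            "FROM incidents GROUP BY state ORDER BY state"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical yearly counts, not FBI data.
rows = [("CA", 2018, 10), ("CA", 2019, 14), ("TX", 2018, 7), ("TX", 2019, 9)]
print(average_incidents_by_state(rows))  # [('CA', 12.0, 2), ('TX', 8.0, 2)]
```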

“Jacob and his team properly implemented the pipeline and emphasized a design-first approach, which helped them map out the project and anticipate potential pitfalls before they developed it,” said Chiazza. “This also allowed the project team to spend less time troubleshooting issues with the technology and to stay focused in order to finish a significantly large project in a short period of time—about six to eight weeks.”

Goodman said he and his former classmates created a data source that they hoped would be of significant value to research institutions, universities and local governments seeking to understand how hate terminology is used on the web.

“More universities and governments could benefit from large-scale combined datasets like this to help inform censorship algorithms or understand their constituents effectively,” he said.

The Katz School of Science and Health is an academic powerhouse in the heart of New York City. It offers master’s programs in five sectors that are redefining the economy: Artificial Intelligence, Cybersecurity, Biotech and Health, Digital Media, and Fintech. In the lab, classroom and clinic, we lead with kindness, integrity, generosity and a commitment to making the world safer, smarter and healthier.
