Data Analytics Students Design System Indexing Hate Speech
Data analytics students at the Katz School of Science and Health have designed an intelligent system that identifies and indexes hate speech on the web.
The team of Jacob Goodman, Qihua Zhu and Jeeho Bae used Hatebase, an online database that provides a taxonomy, or encyclopedia, of hate terminology, to look up hate terms in Twitter feeds and FBI hate-crime public databases.
“We focused on when hate terminology was used, when the meaning of a term was ambiguous and languages in which it was considered hateful,” said Goodman, who with Zhu and Bae graduated in August from the M.S. in Data Analytics and Visualization program.
The project was built using scrapers (computer programs that extract data from other programs), web-service integration methods and batch-file integrations, all feeding into a cloud-based data platform on Amazon Web Services.
The visualization, built in Tableau, an interactive data-visualization tool, showed the most prominent uses of hate terms on Twitter, as well as the geographical locations of those terms. The team was able to identify correlations, or predictive instances, of term use connecting to spikes in similar locations.
More than 1,500 English-only terms found in Hatebase appeared in 194,203 FBI records over a 20-year period. Nearly 65 percent, or 125,559 references, were to race or ethnicity; 54,582, or 28 percent, were to sexual orientation and gender; and 30,039, or 18 percent, were to religion. Most of them originated from the Los Angeles and San Diego areas, East Texas, and Miami.
“A lot of these kinds of hate terminology in language starts as a form of ethnicity difference,” said Goodman, a business analytics manager at B&H Photo Video in New York. “We created a small program that connected to Twitter, searched terms and produced tweets matching the terms. We used the data we collected to build an infrastructure.”
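The kind of small term-matching program Goodman describes can be sketched roughly as follows. This is a minimal illustration, not the team's code: the function names, the tweet dictionary shape (`id`, `text`) and the placeholder terms are all assumptions, and the step that actually connects to Twitter is left out. Matching here is case-insensitive on whole words, which may not suit terms that begin or end with punctuation.

```python
import re


def build_term_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms (hypothetical helper)."""
    # Longest terms first, so multi-word terms are tried before shorter ones.
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    escaped = [re.escape(t) for t in ordered]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_matching_tweets(terms, tweets):
    """Return the id and matched terms for each tweet that contains at least one term.

    `tweets` is assumed to be an iterable of dicts with "id" and "text" keys.
    """
    if not terms:
        return []
    pattern = build_term_pattern(terms)
    matches = []
    for tweet in tweets:
        found = sorted({m.group(0).lower() for m in pattern.finditer(tweet["text"])})
        if found:
            matches.append({"id": tweet["id"], "terms": found})
    return matches


# Placeholder data standing in for a term list and collected tweets.
sample_terms = ["placeholderone", "placeholder two"]
sample_tweets = [
    {"id": 1, "text": "An example with PlaceholderOne in it"},
    {"id": 2, "text": "Nothing relevant here"},
    {"id": 3, "text": "placeholderone and placeholder two together"},
]
sample_matches = find_matching_tweets(sample_terms, sample_tweets)
```

In a pipeline like the one described, the matched records would then be loaded into a database for later aggregation.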
The team traced the usage of most of the hateful terms back to a single Twitter screen name, @whitetrshrepair.
The team took Brandon Chiazza’s course, Information Architecture, which builds on many core classes in the Data Analytics and Visualization program. In the course, students use the skills they have acquired to develop an end-to-end data-analytics solution in the cloud, covering everything from designing a project and profiling, assessing and acquiring data to developing a platform and visualization.
“When they’ve finished, they give a presentation, as if it were to an executive team, demonstrating their proof of concept,” said Chiazza, chief technical officer in the New York City Mayor’s Office of Contract Services.
The focus of the project, he said, was more on data engineering than on complex analytical techniques. Chiazza said the methods the team used were primarily descriptive statistics (e.g., averages); however, much of the work involved acquiring the data using complex Python scripts and MySQL databases within a cloud infrastructure.
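Descriptive statistics of the sort Chiazza mentions might look something like the sketch below, which counts matched records per location and averages those counts. The record shape (a dict with a `location` key) and the function name are assumptions made for illustration; the article does not describe the team's actual schema or queries.

```python
from collections import Counter
from statistics import mean


def summarize_by_location(records):
    """Count records per location and report simple descriptive figures.

    `records` is assumed to be an iterable of dicts with a "location" key.
    """
    counts = Counter(r["location"] for r in records)
    return {
        "counts": dict(counts),
        # Average number of records per distinct location.
        "mean_per_location": mean(counts.values()) if counts else 0.0,
        # Location with the most records, or None when there is no data.
        "top": counts.most_common(1)[0][0] if counts else None,
    }


# Placeholder records standing in for rows pulled from the data platform.
sample_records = [
    {"location": "Area A"},
    {"location": "Area A"},
    {"location": "Area B"},
]
summary = summarize_by_location(sample_records)
```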
“Jacob and his team properly implemented the pipeline and emphasized a design-first approach, which helped them map out the project and anticipate potential pitfalls before they developed it,” said Chiazza. “This also allowed the project team to spend less time troubleshooting issues with the technology, stay focused, and finish a significantly large project in a short period of time—about six to eight weeks.”
Goodman said he and his former classmates created a source of data they hoped would provide significant value for research institutions, universities and local governments that are looking to understand how hate terminology is used on the web.
“More universities and governments could benefit from large-scale, combined datasets like this to help inform censorship algorithms or understand their constituents effectively,” he said.
The Katz School of Science and Health is an academic powerhouse in the heart of New York City. It offers master’s programs in five sectors that are redefining the economy: Artificial Intelligence, Biotech and Health, Cybersecurity, Data Analytics and Digital Media. In the lab, classroom and clinic, we lead with kindness, integrity, generosity and a commitment to making the world safer, smarter and healthier.