Antisemitic messages? A guide to high-quality annotation and a labeled dataset of tweets

The Institute for the Study of Contemporary Antisemitism (ISCA) has published a labeled dataset intended to support the automated detection of antisemitic content on social media. AI models can be trained on it.

You can find the full dataset here and the accompanying publication here.

This dataset contains 6,941 tweets that cover a wide range of topics common in conversations about Jews, Israel, and antisemitism between January 2019 and December 2021. The tweets were drawn from representative samples of tweets from this period that contain relevant keywords. Of these, 1,250 tweets (18%) meet the IHRA definition of antisemitic messages.

The dataset was compiled within the ISCA project, using an annotation portal to label each tweet as either antisemitic or non-antisemitic. The original data was sourced from annotationportal.com.

By year, the tweets are distributed as follows: 1,499 (22%) from 2019, 3,716 (54%) from 2020, and 1,726 (25%) from 2021. By keyword, 4,605 (66%) contain “Jews,” 1,524 (22%) include “Israel,” 529 (8%) feature the derogatory term “ZioNazi*,” and 283 (4%) use the slur “K—s.” Some tweets may contain multiple keywords.

Of the 4,605 tweets with the keyword “Jews,” 483 (10%) were classified as antisemitic, as were 203 of the 1,524 tweets with the keyword “Israel” (13%). Of the 283 tweets using the antisemitic slur “K—s,” 97 (34%) are antisemitic; many of the tweets featuring this slur in fact call out its usage. In contrast, most tweets with the derogatory term “ZioNazi*” are antisemitic: 467 out of 529 (88%) were classified as such.
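The per-keyword rates can be recomputed directly from the counts quoted in the text. The snippet below is a minimal sketch of that arithmetic only. The `counts` dictionary and the `share` helper are illustrative names, not part of the published dataset or its tooling, and the numbers are copied by hand from the paragraphs above rather than read from the dataset file.

```python
# Counts copied from the text: keyword -> (antisemitic tweets, all tweets with that keyword).
# The dictionary layout and names are assumptions made for this illustration.
counts = {
    "Jews": (483, 4605),
    "Israel": (203, 1524),
    "K—s": (97, 283),
    "ZioNazi*": (467, 529),
}


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of tweets labeled antisemitic within each keyword group.
rates = {keyword: share(a, n) for keyword, (a, n) in counts.items()}

# Totals across all keyword groups.
total_antisemitic = sum(a for a, _ in counts.values())
total_tweets = sum(n for _, n in counts.values())

for keyword, rate in rates.items():
    print(f"{keyword}: {rate:.1f}% labeled antisemitic")
print(
    f"overall: {total_antisemitic} of {total_tweets} "
    f"({share(total_antisemitic, total_tweets):.1f}%)"
)
```

Summing the per-keyword counts reproduces the dataset totals given earlier (6,941 tweets, of which 1,250 are antisemitic), which is a quick consistency check on the breakdown.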
