NLP versus manual classification: spotting toxic comments on Twitter


We can find all sorts of things online. Behaviours that are usually kept private can find a place for public display on platforms such as Twitter. That is the case with eating disorders like anorexia and bulimia: mental health problems that are often concealed offline are not only discussed in detail but also encouraged in certain online communities known as pro-ana (pro-anorexia) and pro-mia (pro-bulimia). Many girls and boys end up in these communities, where they are encouraged to stop eating and “motivated” with negative comments about their bodies (“meanspiration”) and pictures of skeleton-thin bodies. This content has proven to be very toxic, in the sense that it has a negative impact on self-image and successfully promotes eating disorders. Experts agree that measures need to be taken, but how can these toxic comments be found easily among all the rest? Can we train an NLP model to separate those comments from non-problematic interactions?

Goal, procedure and challenges

In the next paragraphs I will go very briefly through some methodological points. If you are interested in the specifics, please read Dirk’s post.

Goal: In this case study, we wanted to train the machine to spot toxic eating disorder tweets in a Twitter corpus in English.

Procedure: Deciding whether a tweet encourages an eating disorder (ED) or not involves reading the tweet, understanding it and assessing its intention. This process needs to be repeated for every tweet that we want to classify. When the database consists of several hundred thousand tweets, the task becomes difficult to tackle by hand.

When we use NLP to help out in such a task, we try to train the computer to mimic that human assessment on a representative portion of the data. Once the machine has learned the classification task, it can process very large amounts of text autonomously. For ever-growing amounts of data, such as social media interactions, these language technology methods save hundreds of hours of manual work.
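To make the idea of learning from an annotated portion concrete, here is a minimal sketch in plain Python. It is only an illustration: the post does not say which model we used, so the choice of a word-count naive Bayes classifier is an assumption, and the four training tweets are invented toy examples, not items from our corpus.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(texts, labels):
    """Count words per label from a small, manually annotated sample."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy annotations, for illustration only.
train_texts = [
    "you should skip every meal",
    "stop eating to stay thin",
    "great dinner with my family",
    "recovery is going well today",
]
train_labels = ["toxic", "toxic", "ok", "ok"]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "stop eating"))         # toxic
print(predict(model, "dinner with family"))  # ok
```

A real model would need far more annotated data and a held-out test set before its predictions could be trusted.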

Challenges: However, life is not all peaches and cream, and these benefits come with important methodological challenges.

The first challenge was getting good data, that is, a large collection of clean, relevant, balanced tweets, without retweets, duplicates or spam (bot-generated nonsense or ads). In a complex case like this, it took us two iterations. Our first corpus (a query based on ED-related hashtags) did not retrieve enough valid instances: there was a lot of garbage for very few good examples. Our second corpus (based on the network of accounts) performed considerably better: specifically targeting accounts of users with eating disorders yielded a large, rich and relevant corpus of ca. 500,000 tweets.
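As a rough illustration of the cleaning step, the sketch below drops retweets and duplicate texts from a list of tweets. It is not our actual pipeline: the `clean_corpus` helper is hypothetical, it assumes retweets carry a leading "RT @user" prefix, and it does not attempt spam detection, which needs more than a simple rule.

```python
import re

def clean_corpus(tweets):
    """Drop retweets and duplicate texts, keeping first occurrences in order."""
    seen = set()
    kept = []
    for text in tweets:
        # Assumption: retweets are marked with a leading "RT @user" prefix.
        if re.match(r"^RT @\w+", text):
            continue
        # Normalise case and whitespace so trivial variants count as duplicates.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept

# Invented sample tweets, for illustration only.
sample = [
    "Morning run done",
    "RT @someone: Morning run done",
    "morning   run done",
    "Lunch with friends today",
]
cleaned = clean_corpus(sample)
print(cleaned)  # ['Morning run done', 'Lunch with friends today']
```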

The second challenge was delimiting the categories for our classification task: toxic versus non-toxic comments. This is admittedly a difficult case because of its subjective nature. Defining the intention of the writer is often a complex matter. However, when defining what falls into a category, the goal is to be maximally specific, explicit and objective. In a case like ours, defining the category “toxic, eating disorder tweet” is an exercise in reflection and precision that required some reading and discussion.

The third challenge was limiting the biases in annotation, that is, being as objective and consistent as possible when considering a particular tweet as toxic or not. This manual classification is what we use to train the machine, so if it is not done right, the machine will not learn what we want it to learn.

Deciding whether “I am not eating today” is a toxic comment involves assessing whether it is just an exaggeration or real anorexic behaviour.

When working with such a subjective category as “toxic, eating disorder tweet”, we have to take into account that our annotation will certainly be influenced by a contextual bias. An instance might be linguistically very vague, which is typical of Twitter and of natural language more generally, but we assume that it is about eating disorders because we know that, in the context of eating disorder accounts, this is probably the case. In our experience, this bias is cumulative: the more one annotates, the more one interprets everything as toxic. Since two heads are better than one, having more than one annotator work on a portion of the dataset helps keep that bias under control. Discussing specific disagreements or doubts with another person forces us to make our reasoning explicit and to find patterns in apparently isolated cases.


Eventually, we collected a Twitter corpus of ca. 500,000 tweets. Dirk and I each annotated 500 tweets, reaching an inter-annotator agreement slightly over 80%. The machine-learning model achieved the same result: 80%. That means that we can rely on the machine to classify tweets as we would in 4 out of 5 cases. We consider this pretty satisfactory for a subjective, binary classification such as the one in this study.
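For readers who want to compute agreement on their own annotations, here is a small sketch. The post does not say which agreement measure was used, so treating the figure as raw percent agreement is an assumption; Cohen's kappa is included as a common chance-corrected alternative. The two label lists are invented toy data.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    p_expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single, identical label
    return (p_observed - p_expected) / (1 - p_expected)

# Invented toy labels for ten tweets, for illustration only.
annotator_1 = ["toxic", "toxic", "ok", "ok", "toxic", "ok", "ok", "toxic", "ok", "ok"]
annotator_2 = ["toxic", "ok", "ok", "ok", "toxic", "ok", "toxic", "toxic", "ok", "ok"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is usually lower than raw agreement because two annotators will agree on some items purely by chance, especially when one label is much more frequent than the other.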

And finally we get to the biggest question: how much time would one save by using this NLP model? Well, if we consider that manual classification of 500 instances takes about an hour, then for a 500,000-tweet corpus you would save 1000 hours, that is, 125 eight-hour working days or about 25 full working weeks.
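The arithmetic behind this estimate can be checked with a few lines. The corpus size and the annotation rate come from the text; the eight-hour working day and the 40-hour working week are assumptions made only for the conversion.

```python
corpus_size = 500_000        # tweets in the corpus (from the text)
tweets_per_hour = 500        # manual annotation rate (from the text)

hours = corpus_size / tweets_per_hour
working_days = hours / 8     # assumption: 8-hour working day
working_weeks = hours / 40   # assumption: 40-hour working week
calendar_days = hours / 24   # round-the-clock days, for comparison

print(f"{hours:.0f} hours")
print(f"{working_days:.0f} working days, {working_weeks:.0f} working weeks")
print(f"{calendar_days:.1f} round-the-clock days")
```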


These kinds of online interactions have a real impact on the lives of boys and girls, sometimes very quickly, so taking quick action might be crucial.

In the case of tweets encouraging eating disorders, an NLP model such as this one could have real applications in research, on the one hand, and in online content moderation, on the other.

For researchers, it could reduce the time invested in searching for problematic cases online and allow them to focus on the analysis and deeper understanding of these behaviours. For online content managers, it could help spot problematic interactions in order to moderate their websites or social media.

NLP helps us focus on the goal. Saving time in processing data is important because it frees human effort for other tasks, such as analysing results and implementing particular measures.
