Twitter data collection and preparation for NLP

In this post, we describe how we built a usable tweet collection for training a machine learning model.

NLP versus manual classification: spotting toxic comments on Twitter

In this case study, we trained a machine learning model to automatically spot toxic tweets about eating disorders in an English-language Twitter corpus, saving hours of manual annotation.