During the pandemic, the rapid spread of information has been a powerful force for good: Doctors and researchers have shared their findings on the best ways to prevent and treat COVID-19, and governments have quickly issued critical public health recommendations.
But that same speed has also allowed misinformation and conspiracy theories to spread more virulently than ever before. The media environment is polluted by disinformation and misinformation, and the sheer scale of the problem means that scalable solutions such as machine learning may be needed to rein in the bots, trolls, and conspiracy theories that bad-faith actors deploy for their own purposes.
In a newly released RAND study, we set out to identify these kinds of malign operations by analyzing a collection of 240,000 English-language news articles about COVID-19 published in 2020 from the United States, United Kingdom, Russia, and China.
Analyzing a dataset this large to uncover subtle trends is difficult. Reading articles one at a time to trace narrative threads gives a highly precise view of what is going on (as a companion piece to this one successfully showed), but it is extremely time-consuming and costly. Reading through nearly a quarter million news articles might take an individual analyst many years, and the results could still be skewed by human bias. That’s why we decided to turn to machine learning, which allowed us to analyze the entire dataset in mere hours and generate insights within days.