Visualizing the Twitter discourse on COVID-19

View the Project on GitHub weiaiwayne/COVID19Twitter

Visualizing the Social Media discourse on COVID-19

Reddit Discourse

Twitter Hashtags

#COVID19US, etc.

How the data are collected

My research team started collecting COVID-19-related tweets in late January 2020, shortly after the lockdown of Wuhan and weeks before the novel coronavirus was given the official name COVID-19. Understandably, the hashtags included in the project are only a small part of all relevant hashtags; @jasonbaumgartne from Pushshift found over one million COVID-19-related hashtags.

We started off with a low-budget, low-carbon approach to data collection: we set up five Python scripts, each using a different Twitter API token, on a Raspberry Pi 4 to pull tweets from the Twitter REST API. The scripts run at intervals of 10 to 15 hours. Our final dataset is huge, recording millions of tweets, but it is unlikely to be all-encompassing because of the REST API's rate limits and the arbitrary time intervals. The dataset is, at best, a convenience sample of the global Twitterverse.
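For readers curious what such a collection loop might look like, here is a minimal sketch and not our actual scripts. It assumes a hypothetical `fetch(hashtag)` callable that wraps one REST API search call and returns tweets as dictionaries with an `id` key. The hashtag list, the de-duplication by tweet ID, and the uniformly random wait between runs are all illustrative assumptions.

```python
import random
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical hashtag list, for illustration only.
HASHTAGS = ["#COVID19", "#COVID19US"]


def next_delay_hours(rng: random.Random, low: float = 10.0, high: float = 15.0) -> float:
    """Pick the wait before the next run, in hours (assumed uniform in [low, high])."""
    return rng.uniform(low, high)


def collect_once(
    fetch: Callable[[str], Iterable[Dict]],
    hashtags: Iterable[str],
    seen_ids: Set[int],
) -> List[Dict]:
    """Run one pass over all hashtags and keep only tweets not seen before."""
    new_tweets = []
    for tag in hashtags:
        for tweet in fetch(tag):  # fetch is a stand-in for one REST API search call
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                new_tweets.append(tweet)
    return new_tweets


def run(
    fetch: Callable[[str], Iterable[Dict]],
    hashtags: Iterable[str],
    passes: int,
    sleep: Callable[[float], None],
    rng: random.Random,
) -> List[Dict]:
    """Repeat collection `passes` times, sleeping 10 to 15 hours between runs."""
    hashtags = list(hashtags)
    seen: Set[int] = set()
    collected: List[Dict] = []
    for i in range(passes):
        collected.extend(collect_once(fetch, hashtags, seen))
        if i < passes - 1:
            sleep(next_delay_hours(rng) * 3600)  # convert hours to seconds
    return collected
```

In a real deployment, `sleep` would be `time.sleep` and `fetch` would call the Twitter API with one of the tokens. Because each run only sees what the API returns at that moment, tweets posted and buried between runs are missed, which is why the result is a convenience sample.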

The visualization you are seeing here is a spin-off of a larger project on the politicization of COVID-19. Even based on the incomplete data, the visualization should give you a glimpse into how the conversation has evolved and where it is heading.


In the coming weeks and months, we will roll out the visualization one hashtag at a time. Pushshift will soon publish 250 million tweets related to the coronavirus, a much bigger dataset than the one we currently use, so we plan to build a separate visualization based on the Pushshift dataset in the near future.

Support or Contact

Have a question about the visualization? Interested in collaborating on research or grants? Please contact me at @WeiaiWayne.

Public COVID19 Tweet Datasets

Data from Georgia State University’s Panacea Lab

Data from University of Southern California