Visualizing the Twitter discourse on COVID-19
My research team started collecting COVID-19-related tweets in late January 2020, shortly after the lockdown of Wuhan and weeks before the novel coronavirus disease was given the official name COVID-19. Understandably, the hashtags included in the project are only a small part of all relevant hashtags; @jasonbaumgartne from Pushshift.io found over one million COVID-19-related hashtags.
We started off with a low-budget and low-carbon approach to data collection: we set up five Python scripts, each using a different Twitter API token, on a Raspberry Pi 4 to pull tweets from the Twitter REST API. The scripts run at intervals of 10 to 15 hours. Our final dataset is huge, recording millions of tweets, but it is unlikely to be all-encompassing because of the rate limits of the REST API and the arbitrary time intervals. The dataset is, at best, a convenience sample of the global Twitterverse.
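For readers curious about what one collection pass might look like, here is a minimal sketch. It is not our production code: the function names (collect_once, fake_fetch), the record layout, and the canned data are all assumptions made for illustration, and the sketch rotates tokens inside a single process instead of running five separate scripts. The actual API request is left as a pluggable function so that no particular Twitter client library is implied.

```python
from typing import Callable, Dict, Iterable, List, Set

Tweet = Dict[str, str]  # assumed minimal record: must contain an "id" key


def collect_once(
    hashtags: Iterable[str],
    tokens: List[str],
    fetch: Callable[[str, str], List[Tweet]],
    seen_ids: Set[str],
) -> List[Tweet]:
    """Run one collection pass and return only tweets not seen before.

    `fetch(hashtag, token)` is a hypothetical callable that performs the
    real search request; it is injected so the sketch stays library-agnostic.
    """
    new_tweets: List[Tweet] = []
    for i, tag in enumerate(hashtags):
        token = tokens[i % len(tokens)]  # spread requests across the tokens
        for tweet in fetch(tag, token):
            # The same tweet can match several hashtags or reappear in a
            # later pass, so keep only the first copy of each tweet ID.
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                new_tweets.append(tweet)
    return new_tweets


def fake_fetch(tag: str, token: str) -> List[Tweet]:
    """Stand-in for a real API call; returns canned, made-up data."""
    canned = {
        "#covid19": [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}],
        "#coronavirus": [{"id": "2", "text": "b"}, {"id": "3", "text": "c"}],
    }
    return canned.get(tag, [])
```

In a real deployment, fake_fetch would be replaced by a function that queries the search endpoint and respects its rate limits, and collect_once would be called on a schedule (for example, every 10 to 15 hours) with seen_ids persisted between runs.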
The visualization you are seeing here is a spin-off of a larger project on the politicization of COVID-19. Even though it is based on incomplete data, the visualization should give you a glimpse into how the conversation has evolved and where it is heading.
In the coming weeks and months, we will roll out the visualization one hashtag at a time. Pushshift.io will soon publish 250 million tweets related to the coronavirus, a much bigger dataset than the one we currently use, so we plan to build a separate visualization based on the Pushshift dataset in the near future.
Have questions about the visualization? Want to collaborate on research or grants? Please contact me at @WeiaiWayne.