By: Bayley Weinstein, Statistics Major, Elon University

Throughout history, data has been an effective tool for tracking diseases and their progression, but no data scientist was prepared for COVID-19 and its severe, ongoing consequences. Because data collection has been delayed and often inaccurate, the models that data scientists have built have not been fully accurate or helpful to policy makers, which has slowed the response and, ultimately, the decline of the virus. As a result, the reliability of data science during the COVID-19 pandemic has at times been called into question, and it is uncertain when we will be able to fully conquer this virus.

According to a recent review by Latif et al. (2020), data science is an umbrella term that encompasses all techniques that use specific methods, algorithms, and systems to learn from structured and unstructured data. It covers topics such as machine learning (ML), statistical learning, time-series modeling, data visualization, expert systems, and probabilistic reasoning. Currie et al. (2020) provide a detailed review of how simulation modeling, including epidemiological models, can help reduce the effects of COVID-19. Epidemiological models are used to predict the behavior of an infectious disease over time.

Compartmental models divide a population into compartments and describe the flow of people between them. The SEIR model tracks movement among four states: susceptible (S), exposed (E), infected (I), and recovered (R). More recently, the SEIR model has been used to model the spread of COVID-19 (2020). In principle this model should be informative for the virus, but we do not always have correct data on who has been exposed or infected, especially since so many younger people are asymptomatic. This makes it difficult to track the progression of the disease accurately, and the true number of cases is likely higher than the number recorded.
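To make the flow between compartments concrete, here is a minimal sketch of a basic SEIR model in Python, stepped forward with simple Euler updates. The parameter values (beta, sigma, gamma), the population size, and the reported_fraction used to mimic under-recorded cases are illustrative assumptions, not estimates fitted to COVID-19 data, and the function name simulate_seir is hypothetical.

```python
def simulate_seir(population, initial_exposed, beta, sigma, gamma,
                  days, steps_per_day=10):
    """Integrate a basic SEIR model with forward-Euler steps.

    dS/dt = -beta * S * I / N
    dE/dt =  beta * S * I / N - sigma * E
    dI/dt =  sigma * E - gamma * I
    dR/dt =  gamma * I

    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_exposed)
    e = float(initial_exposed)
    i = 0.0
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt  # S -> E
            new_infected = sigma * e * dt                 # E -> I
            new_recovered = gamma * i * dt                # I -> R
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative run; every number below is an assumption for demonstration.
population = 1_000_000
history = simulate_seir(population, initial_exposed=10,
                        beta=0.5, sigma=1 / 5, gamma=1 / 7, days=200)

peak_day = max(range(len(history)), key=lambda d: history[d][2])
ever_infected = population - history[-1][0]  # everyone who has left S
reported_fraction = 0.3                      # hypothetical share of cases recorded
print("Day of peak infections:", peak_day)
print("Modeled people ever infected:", round(ever_infected))
print("Cases recorded at that fraction:", round(reported_fraction * ever_infected))
```

The last three lines illustrate the data problem described above: if only a fraction of infections is ever recorded, the recorded count understates what the model says has happened, and a model fitted to the recorded count inherits that error.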

Data Science and COVID-19

Shah and Steinhardt (2020) propose that for data science efforts to work during the pandemic, the process must be fixed to address the lag in data collection and in access to existing reporting processes. Currently, the data being collected is often inaccurate, unhelpful, and extremely delayed. With reliable, standardized information, we can calculate the infection rate and the daily growth and transmission rates, which are essential for understanding whether the policies being put into place are effective. Adopting technology and data science methods to keep healthcare workers informed will help anticipate and manage the next COVID-19 outbreak and allow regional health care systems to acquire what they need based on local measurements of virus activity.
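As a rough illustration of the kind of calculation that reliable daily counts make possible, the sketch below computes day-over-day growth rates and an approximate doubling time from a series of cumulative case counts. The counts are made up for the example, and the function names are hypothetical; this is one simple way to define a daily growth rate, not the method used by Shah and Steinhardt.

```python
import math


def daily_growth_rates(cumulative_cases):
    """Return the day-over-day relative growth in cumulative cases.

    A rate of 0.10 means the cumulative count rose 10% since the previous day.
    Days that follow a zero count have no defined rate and are reported as NaN.
    """
    rates = []
    for previous, current in zip(cumulative_cases, cumulative_cases[1:]):
        if previous > 0:
            rates.append((current - previous) / previous)
        else:
            rates.append(float("nan"))
    return rates


def doubling_time(growth_rate):
    """Days for cases to double at a constant daily growth rate."""
    if growth_rate <= 0:
        return float("inf")  # counts are flat or falling, so they never double
    return math.log(2) / math.log(1 + growth_rate)


# Hypothetical cumulative counts for one region over five days.
cases = [1000, 1100, 1200, 1290, 1370]
rates = daily_growth_rates(cases)
for day, rate in enumerate(rates, start=1):
    print(f"Day {day}: growth {rate:.1%}, doubling time {doubling_time(rate):.1f} days")
```

A falling growth rate, and so a lengthening doubling time, after a policy takes effect is the sort of signal the text refers to. If the counts arrive late or are incomplete, the same arithmetic gives a misleading picture.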

Although data science cannot eliminate the virus, it can play a crucial role in informing strategies that mitigate its impact. This is especially true in forecasting the spread of the virus and predicting metrics such as positive tests, ICU bed occupancy, and, sadly, mortality rates. How data science is leveraged for public health decisions and policies rests on the shoulders of our local, state, and federal officials. Ultimately, data science is a tool to help extract value and insight from data, an asset that we have more of today than ever before in human history. It behooves us to work together in developing data-driven solutions to address one of the most serious pandemics the world has faced in the past five decades.


Siddique Latif, Muhammad Usman, Sanaullah Manzoor, Waleed Iqbal, Junaid Qadir, Gareth Tyson, Ignacio Castro, Adeel Razi, Maged N. Kamel Boulos, Adrian Weller and Jon Crowcroft. Leveraging Data Science To Combat COVID-19: A Comprehensive Review (2020).

Christine S.M. Currie, John W. Fowler, Kathy Kotiadis, Thomas Monks, Bhakti Stephan Onggo, Duncan A. Robertson & Antuela A. Tako (2020) How simulation modelling can help reduce the impact of COVID-19, Journal of Simulation, 14:2, 83-97, DOI: 10.1080/17477778.2020.1751570

Shah, N., & Steinhardt, J. (2020). How data science can ease the COVID-19 pandemic. Retrieved July 13, 2020, from

Lepan, Nicholas. (2020). Visualizing the History of Pandemics. Retrieved July 27, 2020, from