Author: Ken Blake, Ph.D.
Description: This script will read the latest COVID-19 data from the Tennessee Department of Health for whatever county you specify below. Next, it will calculate seven-day moving average of new daily cases and new daily tests, give you the option of producing a number of informative plots, save the data as an Excel file on your computer in the same directory as this script, and finally print a subset of key data columns for the most recent 42 days.
Note: The Tennessee Department of Health updates its data every day by 5 p.m., so the data this script accesses and analyzes will change every day at that time.
Note: After you have run the script one time in an environment, you may reduce the script's execution time by adding a #
in front of each pip
in the next three lines of code. For example, #pip install pandas
instead of pip install pandas
. The #
will tell Python to skip the code without running it.
pip install pandas
pip install requests
pip install matplotlib
pip install openpyxl
import pandas as pd
import requests
import matplotlib.pyplot as plt
from pandas import ExcelWriter
print ("All modules imported")
url = 'https://www.tn.gov/content/dam/tn/health/documents/cedep/novel-coronavirus/datasets/Public-Dataset-County-New.XLSX'
r = requests.get(url)
with open("TNCountyData.xlsx", 'wb') as f:
f.write(r.content)
Before continuing: Change Rutherford
to the name of any Tennessee county you want to access data for. To get data for a different county during the same session, return to the code below, type the new county name in place of the previous county name, and click Cell / Run All Below
. Optionally, you can rerun the entire script, but re-retrieving the full data file from the Tennessee Department of Health could take extra time.
CountyChoice = ['Rutherford']
CountyData = pd.read_excel('TNCountyData.xlsx', engine='openpyxl')
CountyData.set_index("COUNTY", inplace=True)
CountyData = CountyData.loc[CountyChoice]
Note: This code calculates seven-day moving averages for NEW_CASES
and NEW_TESTS
. It also calculates a Pct. Positive
figure by dividing the moving seven-day sum of NEW_CASES
by the seven-day moving sum of NEW_TESTS
.
CountyData['New Cases']=CountyData['NEW_CASES']
CountyData['New Tests']=CountyData['NEW_TESTS']
CountyData['MA of New Cases']=CountyData['New Cases'].rolling(7).mean()
CountyData['MA of New Tests']=CountyData['New Tests'].rolling(7).mean()
CountyData['Pct. Positive']=CountyData['New Cases'].rolling(7).sum()/CountyData['New Tests'].rolling(7).sum()
CountyData['Active cases']=CountyData['TOTAL_ACTIVE']
plt.close('all')
Note: The plots that the code below produces will show data for all dates available in the dataset.
The first plot shows the raw count of new cases, plotted by date. It is useful for spotting "record" spikes in daily case counts. To create the plot, delete the #
at the front of the code before running the code.
#CountyData.plot(x='DATE', y='New Cases')
Note: This plot shows the moving average of new cases, plotted by date. It is best for seeing the overall pattern in daily case counts. To create the plot, delete the #
at the front of the code before running the code.
#CountyData.plot(x='DATE', y='MA of New Cases')
Note: This plot shows the daily number of active cases, plotted by date. To create the plot, delete the #
at the front of the code before running the code.
#CountyData.plot(x='DATE', y='Active cases')
Note: This plot shows the moving average of new tests / moving average of new cases, plotted by date. To create the plot, delete the #
at the front of the code before running the code.
#CountyData.plot(x='DATE', y='Pct. Positive')
Note: After running the code below, you can find the county data in an Excel file stored on your computer, in the same directory in which you have stored your script. Each time you run the script, the file will be overwritten with data from the new run. The script will produce an error if the Excel file is open at the time you run the script. To clear the error, close the Excel file, and rerun the code below.
writer = ExcelWriter('CountyData.xlsx')
CountyData.to_excel(writer, "Data")
writer.save()
Note: This code drops all except seven key variables from the dataframe, then displays the last 42 days (six weeks) of data for those variables. The displayed data can be pasted into a Google Sheet for graphing.
CountyData=CountyData.drop([
"TOTAL_CASES",
"NEW_CASES",
"TOTAL_CONFIRMED",
"NEW_CONFIRMED",
"POS_TESTS",
"NEG_TESTS",
"TOTAL_TESTS",
"NEW_TESTS",
"NEW_DEATHS",
"TOTAL_DEATHS",
"NEW_RECOVERED",
"TOTAL_RECOVERED",
"NEW_ACTIVE",
"TOTAL_ACTIVE",
"NEW_HOSPITALIZED",
"TOTAL_HOSPITALIZED",
"TOTAL_PROBABLE",
"NEW_PROBABLE",
"NEW_NEG_TESTS",
"NEW_POS_TESTS",
"NEW_INACTIVE_RECOVERED",
"TOTAL_INACTIVE_RECOVERED",
"TOTAL_DEATHS_BY_DOD"], axis=1)
CountyData.tail(42)
writer = ExcelWriter('CountyData8Columns.xlsx')
CountyData.to_excel(writer, "Data")
writer.save()