Data Journalism

A free YouTube-based minicourse by Ken Blake, Ph.D., Associate Professor of Journalism, Middle Tennessee State University

“But journalists hate math.”

So they often tell me. I tell them I hate it, too. If I loved math, I’d do it myself instead of getting a computer to do it for me. I don’t love math; I love what math does. It uncovers facts, detects patterns and spots unusualness. If you do those things for a living – as many media professionals, especially journalists, do – then a computer can help you do math that can help you do all of those things faster, more accurately and more thoroughly than you ever could on your own. If you’re interested in learning how, the YouTube-hosted videos on this page will introduce you to some of the possibilities. If you like, download the datasets and follow along.

I use the PC version of Excel 2010, Microsoft’s cheap, ubiquitous spreadsheet program. It can do some things that the Mac version can’t. But if you’re a Mac user, you’ll still find plenty worth watching. Everything is free to watch and download. If you’d like to use these materials elsewhere, please just e-mail me first and ask permission. Questions, suggestions and requests are welcome, too. See my contact information.

And check back now and then. I’ll be adding more stuff soon.

— Ken Blake (Last update: 12/8/2014)

Introducing Excel

Brand new to Excel? Start here. The “Introducing Excel” video covers what you need to know to get going, including rows, columns, cell addresses, formula writing, copying and pasting, sorting, and basic functions. Download the raises.xlsx dataset to follow along with what you see in the video.

Manually keying data into Excel is boring. You’re also quite likely to make all kinds of mistakes. Knowing how to import data into Excel will save you loads of time and frustration. The “Importing data into Excel” video shows you some basic techniques that will get you through most situations. The datasets in this compressed TN Pop Estimates 2010 file will let you follow along.

Maximizing Excel

Ready for some more advanced Excel stuff? The video “Rates, ranks and filters in Excel” introduces you to some powerful tools for comparing things – in this case, murder rates for different U.S. cities. Here’s the data file you’ll need, homicide.xlsx, to try it for yourself.

The “Computing ratios in Excel” video demonstrates a technique for evaluating how balanced – or unbalanced – two things are, like arrests of drug dealers and drug users. Want practice? Here’s the data file, drugs.xlsx, which I’ve snagged from an exercise developed by Rich Gordon of the Miami Herald.

For a specialized, but truly handy, tool, watch the video, “Subtotaling in Excel.” You’ll see how to make short work of aggregating political campaign donor information to learn who a politician’s true best friends are. Here’s the data file, alexanderdonors.xlsx.

And now for something that will truly set you apart. Excel’s “pivot table” feature is one of the program’s handiest capabilities. Learn it, and you’ll use it all the time.

First, check out “Using pivot tables to subtotal, sort and filter in Excel 2010” to see how a pivot table can help you do the same thing as the “subtotaling” feature described above – plus a whole lot more. To follow along, use the alexanderdonors.xlsx file.

The “Excel and pivot tables” video will show you how to quickly count types of things and look for patterns that suggest they are connected. For example, might a certain traffic court judge be giving preferential treatment to male traffic offenders? Here is the Traffic citations.xlsx dataset.

Sometimes, the best way to spot a pattern in data is to represent the data visually. The video “Producing graphics in Excel” will show you how to create and edit column charts, pie charts and scatterplots. Follow along with the raises_analyzed.xlsx dataset.

Extending Excel

Excel and Microsoft Access

Excel can do even more if you pair it with other applications. The “Excel and Microsoft Access” video describes how to bring two Excel files into the Microsoft Access relational databse program and crosscheck the files for matching records. Such checks have underpinned a lot of investigative journalism projects. Here is Excel and MS Access data, a compressed file containing the practice datasets.

Excel and Making Maps with Google Fusion Tables

Sometimes, newsworthy patterns have more to do with where things are than with what they add up to. Pairing Excel with Google Fusion Tables can help you produce Web-based, interactive maps. Here’s an example of one made from precinct-level results of an August race for sheriff in Rutherford County, Tennessee:

Interested in learning how to do one of these? Excel and mapping points with Google Fusion Tables shows you how to make a map showing whether any registered sex offenders live too close to schools or daycare centers. You might need to add the Google Fusion Tables app to your Google Drive account, as shown in this video. Also, here’s the data file.

Excel and mapping polygons with Google Fusion Tables describes how you can map things that have borders, like counties, then shade them according to some criteria, like their unemployment rates. You’ll need these practice files.

Excel and mapping points in polygons with Google Fusion Tables shows you how to combine the above two skills to make maps that not only show the borders of things but also show where specific things are within those borders. The example creates a map showing state legislative districts and where each district’s candidates live. Here are the practice files.

Excel and Inferential Statistics

Some events are purely coincidental. The rest aren’t. Instead, they happen because they are related aspects of some kind of systematic process. Accurately reporting the news often requires being able to tell one kind of event from the other. Excel can help you do it. Some examples:

Excel and univariate inferential statistics illustrates how to use Excel to check out a claim that a particular average or a particular percentage showed up purely by coincidence. To follow along, you’ll need to download and unzip these practice files. The video explores two scenarios:

Suppose it’s the middle of winter, and a number of students living in two campus dorms have complained that their rooms are freezing cold. University housing officials have dismissed these complaints as unfounded. They’ve acknowledged that any given room’s temperature might occasionally get a little chilly. But they say the dorms’ heating systems are in good working order and keep all of the rooms’ temperatures at a comfortable, energy-efficient average of 68 degrees. Notice what’s going on here. The housing officials are making a claim about an average. They are claiming the average temperature of all rooms is 68 degrees. Furthermore, they are dismissing uncomfortably cold rooms as mere coincidences – too few and too isolated to cast doubt on the claim that all room temperatures average about 68 degrees or suggest that some systematic process might be responsible for any colder-than-average room temperatures. Excel can help you pick 25 rooms at random to go measure the temperature in. Then it can help you average these 25 temperatures, determine whether they are different from 68, and, if they are, whether they are too different from 68 to be explained as merely coincidental.

Now suppose the university administration has proposed a new dorm visitation policy that would allow dorm residents to have overnight guests in their rooms. Dorm residents are going to vote on the policy, and the administration has promised to enact it if 50 percent or more of the dorm’s residents vote in favor of it. You’ve polled 170 randomly selected dorm residents regarding their feelings about the policy change. Of those polled, 93, or 55 percent, favor the policy change. Sixty-one, or 36 percent, oppose it, and 16, or 9 percent, said they were undecided. Does the 55 percent support in the poll indicate that the new policy will get at least 50 percent support when all residents vote on the policy? Or could the 55 percent support in the poll stem from the coincidental inclusion in the poll’s sample of a greater percentage of policy supporters than the percentage of supporters among all dorm residents?

Excel and bivariate inferential statistics illustrates how to use Excel to evaluate whether a difference between two percentages or between two averages showed up purely by coincidence. Downloading and unzipping these practice files will allow you to try for yourself what you see in the video. The video explores two scenarios:

The first scenario involves checking out whether buses tend to arrive later at stops in the city’s minority neighborhoods than they do at stops in the city’s mostly white neighborhoods.

The second demonstrates a way to investigate whether a nonrandomly larger percentage of burglaries in white neighborhoods get solved than in minority neighborhoods.

Wondering why you should stake your journalistic reputation on the results of an inferential statistical test? For a complete answer, better enroll in the nearest stats course. But Principles of inferential statistics offers a 10-minute video overview that should be understandable by anyone who knows how to flip a coin.

Excel and Content Analysis

Strange as it may sound, it is possible to statistically analyze … words. That’s right. Not just numbers, but words, too. The concept is pretty simple. It is possible to count how many times a word appears in a document. If the word appears frequently in the document, it’s safe to assume that the document put a lot of emphasis on whatever concept that word represents. And emphasis in a document usually reflects the interests and agendas of the person or people who produced the document. Often, such emphasis – or the lack of it – can be newsworthy.

Basic content analysis with Yoshikoder shows you how to use the free Yoshikoder content analysis program to identify themes in transcripts of the weekly radio addresses that President Barack Obama delivered between November 2012 and April 2013.

You can download Yoshikoder for free from the program’s home page. Meanwhile, download this .zip file of Obama’s weekly radio addresses to follow along.