Data sources

I frequently come across data sources that are potentially useful to data journalists and/or media scholars. This is an annotated list of the ones I’ve decided to keep track of, both for myself and for others.

GDELT 2.0 API. Search and download TV news content in real time, using a run-in-a-browser API. As an example, see this graphic showing U.S. news coverage volume, over time, of “Sharpiegate.” See also: News volume among top Democratic presidential primary candidates.

Third Eye. Downloadable, real-time database of chyron text from BBC News, CNN, Fox News, and MSNBC newscasts. Chyrons are the on-screen text snippets that summarize or frame the news being reported.

State networks database. A compilation of many state-to-state relational variables, including measures of shared borders, travel and trade between states, and demographic characteristics of state populations. The 2,550 units in the dataset are dyadic state-pairs (e.g., Alabama–Alaska, Alabama–Arizona, Alabama–Arkansas, and so on, for each state plus the District of Columbia). The dataset’s codebook lists a number of variables that are potentially interesting to media types, including measures of ideological differences among states.
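A quick check on the 2,550 figure: it matches what you get by pairing each of the 51 units (50 states plus D.C.) with every other unit in both directions, since 51 × 50 = 2,550. The sketch below assumes that reading, i.e., that Alabama–Alaska and Alaska–Alabama are separate rows and that no unit is paired with itself. That is my inference from the count, not something taken from the codebook, and the variable names are purely illustrative.

```python
from itertools import combinations, permutations

# 50 states plus the District of Columbia (postal abbreviations).
UNITS = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA",
    "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
    "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX",
    "UT", "VT", "VA", "WA", "WV", "WI", "WY",
]

# Assumption: dyads are directed (ordered) pairs with no self-pairs.
directed_pairs = list(permutations(UNITS, 2))

# For comparison: unordered pairs, where AL-AK and AK-AL count once.
undirected_pairs = list(combinations(UNITS, 2))

print(len(UNITS), len(directed_pairs), len(undirected_pairs))
```

If the dyads were unordered, the dataset would have half as many rows, so the directed reading is the one consistent with the stated count.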

Congressional whip count database, by C. Lawrence Evans. “This data archive features extensive information about the ‘whip counts’ conducted by party leaders in Congress. My hope is that it will be useful to scholars, students, and ordinary citizens interested in how coalitions are built within Congress on some of the most significant issues and bills in modern American history.”

Movie technical data compiled by James E. Cutting, including shot durations for 220 popular movies from 1915 to 2015; shot motion, shot luminance, and shot clutter in 220 movies from 1915 to 2015; and shot scale information and segmentation information for 24 movies from 1940 to 2010.

Television News Archive. Exactly what it sounds like.

Television News “Ngrams.” Offers one-word (1gram/unigram) and two-word (2gram/bigram) histograms at half-hour resolution for television news coverage on ABC, Al Jazeera, BBC News, CBS, CNN, Deutsche Welle, FOX, Fox News, NBC, PBS, Russia Today, Telemundo, and Univision, using data from the Internet Archive’s Television News Archive.
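For anyone unfamiliar with the terms, here is a minimal sketch of what unigram and bigram counts at half-hour resolution mean. It is not the dataset's actual pipeline: the whitespace tokenization is a simplifying assumption, and `ngram_counts` and `half_hour_bucket` are hypothetical helper names I made up for illustration.

```python
from collections import Counter
from datetime import datetime


def ngram_counts(text, n):
    """Count runs of n consecutive words in a piece of text.

    Assumption: words are lowercased and split on whitespace, which is
    cruder than whatever tokenization the real dataset uses.
    """
    tokens = text.lower().split()
    # Zipping n staggered copies of the token list yields every n-word run.
    return Counter(zip(*(tokens[i:] for i in range(n))))


def half_hour_bucket(moment):
    """Round a timestamp down to the start of its half-hour block."""
    minute = 0 if moment.minute < 30 else 30
    return moment.replace(minute=minute, second=0, microsecond=0)


caption = "the storm hit the coast and the storm moved inland"
unigrams = ngram_counts(caption, 1)
bigrams = ngram_counts(caption, 2)

print(unigrams[("the",)])          # occurrences of "the"
print(bigrams[("the", "storm")])   # occurrences of "the storm"
print(half_hour_bucket(datetime(2020, 1, 1, 12, 47, 5)))
```

Grouping caption text by `half_hour_bucket` and then counting within each group would give one histogram per half hour, which is roughly the shape of data the description above suggests.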