A short intro to GDELT - Ken Blake, Ph.D.

The Global Database of Events, Language and Tone (GDELT) is a searchable database of news articles published in multiple languages around the world. You can use it to retrieve links to, or images from, articles about particular topics published during particular time periods within the last three months. You also can produce – and export or embed – interactive visualizations like this one, which shows the “volume” of English-language articles published in the U.S. that mention “Trump.”

You can use GDELT by typing URLs into the address bar of any web browser. What you include in the URL determines the results you will get.

A basic query

Perhaps the most basic query consists of the GDELT path, a “?” connector, and a query command, with the term or terms you want to search for placed within quote marks:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump"

Change “Trump” to whatever you want (e.g., “Joe Biden”), and GDELT will search for that term or phrase instead.
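A browser typically encodes the quote marks and spaces in these URLs for you, but a script has to do that step itself. Here is a minimal sketch of the basic query in Python, using only the standard urllib.parse module. The helper name basic_query_url is my own invention for illustration, not part of GDELT.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example URLs in this guide.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def basic_query_url(phrase: str) -> str:
    """Return a GDELT URL that searches for an exact phrase.

    basic_query_url is a hypothetical helper name, not a GDELT function.
    """
    # Wrap the phrase in straight double quotes, as in the basic query.
    query = f'"{phrase}"'
    # quote_via=quote encodes spaces as %20 and quote marks as %22.
    return GDELT_DOC_API + "?" + urlencode({"query": query}, quote_via=quote)


print(basic_query_url("Trump"))
print(basic_query_url("Joe Biden"))
```

The printed URLs look different from the hand-typed ones because the quote marks show up as %22 and the space as %20, but they should describe the same search.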

Additional query commands

You can add more instructions to this part of the URL. For example, adding sourcecountry:US will limit the query to news that originated in the United States:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US

… and adding sourcelang:English will limit the results to U.S.-originated news that is also in English:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English

And adding near5:”wall border” will limit the search to articles that not only mention “Trump” but also mention the words “wall” and “border” within five words of each other. You can use numbers smaller than five if you want a narrower query and larger than five if you want a broader query:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English near5:"wall border"

Adding domainis:nytimes.com will further limit the query to articles appearing at nytimes.com:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English near5:"wall border" domainis:nytimes.com

Note that these additional instructions are preceded by a space, and a colon separates the command (like “domainis”) and argument (like “nytimes.com”).
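The space-and-colon pattern is easy to assemble in code. The sketch below joins a search phrase with any number of command:argument pairs. The function name build_query is hypothetical, and the rule of quoting any argument that contains a space is my assumption, based only on the near5 example above.

```python
def build_query(phrase, operators):
    """Join a quoted phrase with command:argument pairs, separated by spaces.

    build_query is a hypothetical helper; `operators` is a list of
    (command, argument) tuples such as ("sourcecountry", "US").
    """
    parts = [f'"{phrase}"']
    for command, argument in operators:
        # Assumption: multi-word arguments need quotes, as in near5:"wall border".
        if " " in argument:
            argument = f'"{argument}"'
        parts.append(f"{command}:{argument}")
    return " ".join(parts)


query = build_query(
    "Trump",
    [
        ("sourcecountry", "US"),
        ("sourcelang", "English"),
        ("near5", "wall border"),
        ("domainis", "nytimes.com"),
    ],
)
print(query)
```

This only builds the text that follows query= in the URL. It still has to be percent-encoded before a script can send it.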

The GDELT blog lists a number of other instructions that can go into this part of the URL. Look for them between the Full Documentation / Query heading and the Full Documentation / Mode heading. You can add as many of these instructions as you like.

“Mode” commands

The “Mode” section of the URL comes after the query section. Commands included here control what kind of output GDELT will give you in response to your query. Each mode command is preceded by &, with no space between the last command in the query section and the & before the first command in the mode section. The default in this section of the URL is &mode=artlist, which tells GDELT to show the results as a list of article links, like the lists you’ve been seeing so far:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=artlist

Because &mode=artlist is the default, GDELT will give you an article list if you type no mode commands at all, as we were doing up until now. By default, GDELT limits the number of articles that the &mode=artlist command will show you. You can raise the limit to as many as 250 by using the command &maxrecords=250 (or any other whole number from 1 to 250):

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=artlist&maxrecords=250

Other display modes include &mode=timelinevolinfo, which produces a chart showing coverage volume over time, plus the query’s 10 most relevant articles at any point in time on the chart:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=timelinevolinfo

The &mode=tonechart option is pretty cool, too. It groups the retrieved articles according to how negative or positive they are:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=tonechart

And for you image buffs, the &mode=imagecollageinfo option returns images associated with the articles your query finds. You might want to add the aforementioned &maxrecords option to get more than the default 75 records. In the example, I’ve asked for 100. You can ask for up to 250:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=imagecollageinfo&maxrecords=100
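If you end up building many of these URLs, a small function can attach the mode and maxrecords commands and catch out-of-range values before GDELT does. This is only a sketch. The name build_url is hypothetical, and KNOWN_MODES lists just the four modes covered in this guide, not everything GDELT supports.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the example URLs in this guide.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

# Only the modes described in this guide; GDELT documents others.
KNOWN_MODES = {"artlist", "timelinevolinfo", "tonechart", "imagecollageinfo"}


def build_url(query, mode="artlist", maxrecords=None):
    """Return a full GDELT URL (build_url is a hypothetical helper)."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"mode not covered in this guide: {mode}")
    params = {"query": query, "mode": mode}
    if maxrecords is not None:
        # The guide gives 1 to 250 as the allowed range.
        if not 1 <= maxrecords <= 250:
            raise ValueError("maxrecords must be between 1 and 250")
        params["maxrecords"] = str(maxrecords)
    # urlencode joins the commands with & and percent-encodes the values.
    return GDELT_DOC_API + "?" + urlencode(params, quote_via=quote)


print(build_url('"Trump" sourcecountry:US', mode="imagecollageinfo", maxrecords=100))
```

Note that the encoder also turns the colon into %3A. A web server normally decodes that back to a colon, so the query should mean the same thing.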

“Format” commands

These commands define the format in which GDELT will give you the results of your search. HTML is the default, which is why the &mode=artlist results, above, appeared as a nicely formatted page in your browser, even though the URL didn’t specify HTML as the format. There are several format commands, but the most useful one for most people probably will be the &format=csv option. When used in conjunction with &mode=artlist, this command lets you download the search results as a comma-separated values (.csv) file, which can be imported into Excel and other applications:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=artlist&maxrecords=250&format=csv
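Excel is not the only way to read the .csv output. Here is a rough sketch of reading it in Python with the standard csv module. The function names are hypothetical, and the column names in the sample text (URL, Title) are placeholders I made up for the demonstration. Check the header row of a real download to see what GDELT actually sends.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_text(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    # Strip a byte-order mark if one survived decoding.
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))
    return list(reader)


def fetch_articles(url):
    """Download a &format=csv URL and parse it.

    The URL must already be percent-encoded (no raw spaces or quotes).
    Not called below, so this example runs without a network connection.
    """
    with urlopen(url) as response:
        text = response.read().decode("utf-8-sig")
    return parse_csv_text(text)


# Made-up sample with placeholder column names, for illustration only.
sample = (
    "URL,Title\n"
    "https://example.com/a,First headline\n"
    'https://example.com/b,"Second headline, with a comma"\n'
)
rows = parse_csv_text(sample)
print(len(rows), rows[1]["Title"])
```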

“Sort” commands

By default, GDELT shows you lists of articles arranged in descending order of relevance to your search. The &sort= commands let you specify some other sort order. For example, &sort=datedesc will sort the articles in descending order by date, with the most recent article at the top of the list:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English&mode=artlist&maxrecords=250&sort=datedesc

“Date” commands

By default, GDELT searches through all stories published online during the last three months. The date commands let you specify a more precise search window within those three months. Broadly, there are two types of date commands. The &timespan= command lets you specify the number of minutes, hours, days, weeks, or months prior to the present moment in which to search for matches. For example, &timespan=1d, in the URL below, will limit the query to cnn.com articles published within the last day:

https://api.gdeltproject.org/api/v2/doc/doc?query="Trump" sourcecountry:US sourcelang:English domainis:cnn.com&mode=artlist&maxrecords=250&sort=datedesc&timespan=1d

Examples of other possible &timespan specifications include:

&timespan=30 (the most recent 30 minutes). Note that there is no “m” or “minutes” component, just the number. The documentation says this will work, but I tried it and got an error message.
&timespan=24h (the most recent 24 hours)
&timespan=3w (the most recent three weeks)
&timespan=2m (the most recent two months)

It is also possible to use dates to specify a time range you want GDELT to search. You do it by using combinations of the &startdatetime= and &enddatetime= commands, with each command followed by the date and time in YYYYMMDDHHMMSS format, and no spaces anywhere within or between the commands. For example:

&startdatetime=20190920133005 will show all matching articles published between 1:30:05 p.m. on Sept. 20, 2019, and now. Reading the value from left to right: the year (2019), the month (09, September), the day (20), the hour (13, or 1 p.m.), the minute (30) and the second (05).
&startdatetime=20190920130000 will show the same as above, but starting at exactly 1 p.m.
&startdatetime=20190920090000 will show the same as above, but starting at exactly 9 a.m.
&enddatetime=20190920090000 will show all matching articles (up to the 250 limit allowed with the &maxrecords=250 command) between three months ago and 9 a.m. on Sept. 20, 2019.
&startdatetime=20190901090000&enddatetime=20190920090000 will show all matching articles (again, up to the 250-article limit) between 9 a.m. on Sept. 1, 2019 and 9 a.m. on Sept. 20, 2019.
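Typing 14-digit timestamps by hand invites mistakes. In Python, the standard datetime module can produce and read the YYYYMMDDHHMMSS layout with the format string "%Y%m%d%H%M%S". The helper names below are hypothetical, and the sketch leaves time zones alone because this guide doesn’t say which zone GDELT assumes.

```python
from datetime import datetime

# strftime/strptime pattern matching YYYYMMDDHHMMSS.
GDELT_FORMAT = "%Y%m%d%H%M%S"


def to_gdelt_datetime(moment):
    """Format a datetime as a 14-digit string (hypothetical helper)."""
    return moment.strftime(GDELT_FORMAT)


def from_gdelt_datetime(stamp):
    """Parse a 14-digit string back into a datetime (hypothetical helper)."""
    return datetime.strptime(stamp, GDELT_FORMAT)


def date_range_params(start, end):
    """Return the &startdatetime=...&enddatetime=... portion of a URL."""
    if end <= start:
        raise ValueError("end must be later than start")
    return (
        f"&startdatetime={to_gdelt_datetime(start)}"
        f"&enddatetime={to_gdelt_datetime(end)}"
    )


print(to_gdelt_datetime(datetime(2019, 9, 20, 13, 30, 5)))
print(date_range_params(datetime(2019, 9, 1, 9), datetime(2019, 9, 20, 9)))
```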

Combining &startdatetime= and &enddatetime= to specify a particular time range is a handy way of working around GDELT’s 250-article limit. For example, you’re probably not going to get all “Trump” articles for September of 2019 in a single search, because there probably were more than 250 articles. But if you search each day of September 2019, or each half-day of September 2019, you stand a better chance of capturing all of the “Trump” articles published.
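The day-by-day approach can be sketched in a few lines of Python. The function below (daily_windows, a hypothetical name) produces one start/end pair per day of a month, each running from midnight to 11:59:59 p.m. Whether GDELT treats those boundaries as inclusive is something I have not verified, so it would be wise to remove duplicate article URLs after combining the daily results.

```python
from datetime import date, datetime, timedelta

# strftime pattern matching YYYYMMDDHHMMSS.
GDELT_FORMAT = "%Y%m%d%H%M%S"


def daily_windows(year, month):
    """Return (startdatetime, enddatetime) strings for each day of a month.

    daily_windows is a hypothetical helper, not a GDELT feature.
    """
    windows = []
    day = date(year, month, 1)
    while day.month == month:
        start = datetime(day.year, day.month, day.day)
        # End one second before the next day begins.
        end = start + timedelta(days=1) - timedelta(seconds=1)
        windows.append((start.strftime(GDELT_FORMAT), end.strftime(GDELT_FORMAT)))
        day += timedelta(days=1)
    return windows


for start, end in daily_windows(2019, 9)[:3]:
    print(f"&startdatetime={start}&enddatetime={end}")
```

Each pair can be appended to the same query URL, giving 30 separate searches for September 2019, each with its own 250-article allowance. A very busy day could still exceed 250 articles, in which case half-day windows would be the next thing to try.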

GDELT provides a few form-based tools for querying the GDELT database. They’re less tedious to use than the URL-based commands introduced above. They also cover a considerably longer time period, going back (at present) as far as January 2017 for all news and June 2009 for TV news. But they’re also less flexible. See:

https://api.gdeltproject.org/api/v2/summary/summary. This one lets you specify many of the searches noted above, but by choosing options on a form. Note, too, that you can search web news generally or TV news in particular. TV news content is searchable back to the middle of 2009.
https://api.gdeltproject.org/api/v2/summary/summary?d=web&t=compare. This one lets you simultaneously compare the volume of up to four keywords, or four sets of keywords. For example, try graphing the use of “climate change” and “global warming.” You’ll see that “climate change” has been the preferred news frame. Here, too, you can search web news generally or TV news in particular. In TV news mode, you can compare different networks.