Tweepy Tweet Scraper 3.0 With Iteration

This script will:

To run a block of code, click the block, then press Shift+Enter. Generally, code blocks must be run in order, from the top of the notebook to the bottom, because later code blocks often depend on things that took place in earlier code blocks. You may also run the entire notebook by clicking Cell / Run All.

Script by Ken Blake, https://drkblake.com/

Required add-ons: The script requires a once-per-environment installation of Tweepy, Pandas, and XlsxWriter. After the first time you run this script in a given Jupyter Notebook environment, you can speed up execution by changing tweepy, pandas, and XlsxWriter to #tweepy, #pandas, and #XlsxWriter. Doing so tells the script to bypass these installations in future executions.
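If you are unsure whether the add-ons are already installed in your environment, a quick check like the one below may help. It is only a sketch and is not part of the original script: the helper name missing_packages is hypothetical, and tweepy, pandas, and xlsxwriter are assumed to be the import names the three packages install under.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Assumed import names for Tweepy, Pandas, and XlsxWriter.
required = ["tweepy", "pandas", "xlsxwriter"]
print("Still need to install:", missing_packages(required))
```

An empty list suggests the installation lines can safely be commented out.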

Twitter developer credentials: In the code below, replace PASTEYOURACCESSTOKENHERE, PASTEYOURACCESSTOKENSECRETHERE, PASTEYOURCONSUMERKEYHERE, and PASTEYOURCONSUMERSECRETHERE with your unique access token, access token secret, consumer key, and consumer secret, respectively. Be sure to keep the ' marks around each credential. You may obtain these credentials for free by applying for a Twitter developer account. To apply, see: https://developer.twitter.com/en/apply-for-access.
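A common mistake is to leave one of the four placeholders in place. The sketch below shows one way to catch that before the script contacts Twitter. The variable names in the dictionary and the helper unreplaced_credentials are hypothetical; the original script may name its credential variables differently.

```python
PLACEHOLDERS = {
    "PASTEYOURACCESSTOKENHERE",
    "PASTEYOURACCESSTOKENSECRETHERE",
    "PASTEYOURCONSUMERKEYHERE",
    "PASTEYOURCONSUMERSECRETHERE",
}


def unreplaced_credentials(credentials):
    """Return the names of credentials that are blank or still hold a placeholder."""
    return sorted(
        name for name, value in credentials.items()
        if not value.strip() or value.strip() in PLACEHOLDERS
    )


# Hypothetical variable names; note the ' marks around each value.
credentials = {
    'access_token': 'PASTEYOURACCESSTOKENHERE',
    'access_token_secret': 'PASTEYOURACCESSTOKENSECRETHERE',
    'consumer_key': 'PASTEYOURCONSUMERKEYHERE',
    'consumer_secret': 'PASTEYOURCONSUMERSECRETHERE',
}
print("Still to fill in:", unreplaced_credentials(credentials))
```

Once every placeholder has been replaced, the helper returns an empty list.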

Search and iteration settings:

Edit query to specify the search query you want to use. For help constructing your query, see: https://twitter.com/search-advanced. The default query includes -filter:retweets, which excludes retweets, and min_retweets:100, which limits the capture to tweets that have been retweeted at least 100 times. You may edit or omit these criteria, if you like.
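To illustrate how the default criteria fit together, here is a minimal sketch that assembles a query string from a search term plus the two optional filters. The helper build_query is hypothetical and is not part of the original script, and the search term Congress is only an example.

```python
def build_query(terms, exclude_retweets=True, min_retweets=100):
    """Join search terms with the optional retweet filters described above."""
    parts = [terms.strip()]
    if exclude_retweets:
        parts.append("-filter:retweets")  # leave retweets out of the results
    if min_retweets:
        parts.append("min_retweets:{}".format(min_retweets))  # popularity floor
    return " ".join(parts)


query = build_query("Congress")
print(query)  # Congress -filter:retweets min_retweets:100
```

Passing exclude_retweets=False or min_retweets=None drops the corresponding criterion, which mirrors omitting it by hand.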

Edit tweets_wanted to indicate how many tweets you would like to retrieve.

Edit since to specify a starting date for your search. By default, the Twitter API will sample only about the last seven days.
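Because the API reaches back only about a week, a starting date roughly seven days before today is a reasonable choice. The sketch below computes one. The helper since_days_ago is hypothetical, and the YYYY-MM-DD format is an assumption about what the script expects for since.

```python
from datetime import date, timedelta


def since_days_ago(days_back=7, today=None):
    """Return the date days_back days before today as a YYYY-MM-DD string."""
    today = today or date.today()
    return (today - timedelta(days=days_back)).isoformat()


since = since_days_ago()  # assumed format: YYYY-MM-DD
print(since)
```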

Edit Iterations to specify how many times you want the program to pull tweets from Twitter.

Edit SecondsBetweenIteration to specify how many seconds you want the program to wait between iterations. For example, specifying 3600 will cause the program to query Twitter every hour, because one hour = 60 minutes x 60 seconds = 3600 seconds. The default settings in the code below will run the script for 24 hours from the time you launch the code, with one hour between iterations, and gather up to 900 tweets, each retweeted at least 100 times.
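The relationship between Iterations and SecondsBetweenIteration can be sketched as a simple loop that runs a task and then waits. This is an illustration under stated assumptions, not the original script's code: the helper run_iterations is hypothetical, the value 24 for Iterations is inferred from the 24-hour default described above, and waiting after every pass (including the last) is assumed so that the total matches that 24-hour figure.

```python
import time


def run_iterations(task, iterations, seconds_between, sleep=time.sleep):
    """Call task(i) once per iteration, pausing seconds_between after each call."""
    results = []
    for i in range(iterations):
        results.append(task(i))
        sleep(seconds_between)  # assumed: the wait also follows the final pass
    return results


Iterations = 24                 # assumed default, inferred from the 24-hour run
SecondsBetweenIteration = 3600  # one hour
total_hours = Iterations * SecondsBetweenIteration / 3600
print(total_hours)  # 24.0

# Dry run: record the waits instead of actually sleeping.
waits = []
counts = run_iterations(lambda i: i + 1, iterations=3,
                        seconds_between=SecondsBetweenIteration,
                        sleep=waits.append)
print(counts, waits)  # [1, 2, 3] [3600, 3600, 3600]
```

Passing a stand-in for sleep, as in the dry run, makes it possible to check the schedule without waiting for real time to pass.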

Search, structure, save, repeat: This code runs your search, prints a running count of the number of tweets retrieved per iteration, and timestamps and saves the data from each iteration. There is nothing to edit or configure here unless you want to find the line of code that contains ('TweetFile_{}.xlsx'.format(datetime.today() and change the generic file name prefix TweetFile to a name that is more descriptive of your project. For example, if you were grabbing tweets about Congress, you could change ('TweetFile_{}.xlsx'.format(datetime.today() to ('Congress_{}.xlsx'.format(datetime.today(). Be sure to change nothing other than TweetFile. After the script runs, the data will be available on your computer, in the same directory as the script, in a file labeled with the file name prefix you selected plus the date and time of the file's creation.
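For readers curious how the timestamped file name comes together, here is a minimal sketch. The helper timestamped_filename is hypothetical, and the date and time pattern is an assumption; the original script may format the timestamp differently.

```python
from datetime import datetime


def timestamped_filename(prefix, when=None, pattern="%Y-%m-%d_%H-%M-%S"):
    """Build a name such as Congress_2021-03-10_14-30-00.xlsx from a prefix and a time."""
    when = when or datetime.today()
    # The strftime pattern is an assumed format, not taken from the original script.
    return "{}_{}.xlsx".format(prefix, when.strftime(pattern))


print(timestamped_filename("Congress", datetime(2021, 3, 10, 14, 30, 0)))
# Congress_2021-03-10_14-30-00.xlsx
```

Because each iteration is saved at a different moment, every file gets a distinct name and earlier results are not overwritten.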