QUICK DISCLAIMER: this is a quick and dirty solution to a problem, so it may not represent best coding practice, and it has absolutely no error checking or handling. Use with caution…
A recent project required me to scrape some data from Twitter. I considered using Tweepy, but as it was a project for the MSc in Computational Journalism, I thought it would be more interesting to write our own simple Twitter API wrapper in Python.
The code presented here will allow you to make any Twitter API request that uses the GET method, so it is really only useful for getting data from Twitter, not for sending data to it. It also works only with the REST API, not the streaming API, so if you’re looking for real-time monitoring, this is not the API wrapper you’re looking for. Finally, it authenticates as a single user (you), so it is not set up to let other users access Twitter through your application.
The first step is to get some access credentials from Twitter. Head over to https://apps.twitter.com/ and register a new application. Once the application is created, you’ll be able to access its details. Under ‘Keys and Access Tokens’ are four values we’re going to need for the API: the Consumer Key and Consumer Secret, and the Access Token and Access Token Secret. Copy all four values into a new Python file and save it as ‘_credentials.py’. Once we have the credentials, we can write some code to make some API requests!
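The wrapper code reads the credentials from four module-level names: client_id, client_secret, access_token and access_secret. As a minimal sketch, assuming those names map onto the four values in the order listed above, ‘_credentials.py’ could look like this (the strings are placeholders, not real keys):

```python
# _credentials.py: placeholder values; paste in the four values from your app's
# 'Keys and Access Tokens' page (the name-to-value mapping is an assumption)
client_id = "YOUR-CONSUMER-KEY"              # Consumer Key
client_secret = "YOUR-CONSUMER-SECRET"       # Consumer Secret
access_token = "YOUR-ACCESS-TOKEN"           # Access Token
access_secret = "YOUR-ACCESS-TOKEN-SECRET"   # Access Token Secret
```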
First, we define a Twitter API object that will carry out our API requests. It needs to store the base API URL, along with some details that let us throttle our requests so they stay within Twitter’s rate limits.
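```python
import base64, hmac, json, threading, time, uuid
import urllib.error, urllib.parse, urllib.request
from hashlib import sha1
# assumes _credentials.py defines these four names
from _credentials import client_id, client_secret, access_token, access_secret

class Twitter_API:
    def __init__(self):
        scheme = "https://"
        api_url = "api.twitter.com"
        version = "1.1"
        self.api_base = scheme + api_url + "/" + version
        # spread 175 requests evenly over each 15-minute window
        query_interval = float(15 * 60) / 175
        self.__monitor = {'wait': query_interval,
                          'earliest': None,
                          'timer': None}
```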
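The query interval is simply the 15-minute window (900 seconds) divided by a budget of 175 requests, which spaces calls a little over five seconds apart. A hypothetical min_interval() helper makes the arithmetic explicit; the right budget depends on the endpoint, so treat 175 as the value this wrapper happens to use rather than a limit Twitter guarantees.

```python
def min_interval(window_seconds, max_requests):
    """Smallest even spacing (in seconds) that keeps max_requests inside the window."""
    return float(window_seconds) / max_requests


# the wrapper's setting: 175 requests per 15-minute window
interval = min_interval(15 * 60, 175)
print(round(interval, 3))  # 5.143
```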
Next, we add a rate-limiting method that makes the wrapper sleep if we are sending requests to Twitter too quickly:
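```python
    def __rate_controller(self, monitor_dict):
        # wait for the previous request's timer to expire (none on the first call)
        if monitor_dict['timer'] is not None:
            monitor_dict['timer'].join()
            # sleep off any time still remaining before the next request is allowed
            while time.time() < monitor_dict['earliest']:
                time.sleep(max(0.0, monitor_dict['earliest'] - time.time()))
        # record the earliest time the following request may be sent
        earliest = time.time() + monitor_dict['wait']
        timer = threading.Timer(earliest - time.time(), lambda: None)
        monitor_dict['earliest'] = earliest
        monitor_dict['timer'] = timer
        monitor_dict['timer'].start()
```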
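The method above combines a threading.Timer with a sleep loop. The same guarantee, that consecutive requests start at least the configured interval apart, can be sketched with a single clock. The Throttle class below is a hypothetical simplification using time.monotonic() (which, unlike time.time(), cannot jump backwards if the system clock is adjusted), not the code the wrapper uses:

```python
import time


class Throttle:
    """Spaces successive wait() calls at least `interval` seconds apart."""

    def __init__(self, interval):
        self.interval = interval
        self.next_allowed = None  # no call has been made yet

    def wait(self):
        """Sleep until the next call is allowed; return the time the call was released."""
        now = time.monotonic()
        if self.next_allowed is not None:
            # loop in case sleep() returns marginally early
            while now < self.next_allowed:
                time.sleep(self.next_allowed - now)
                now = time.monotonic()
        # the following call may not start until `interval` seconds from now
        self.next_allowed = now + self.interval
        return now


throttle = Throttle(0.1)
released = [throttle.wait() for _ in range(3)]
gaps = [b - a for a, b in zip(released, released[1:])]
```

Calling throttle.wait() at the top of each request plays the same role as self.__rate_controller(self.__monitor) in query_get().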
The Twitter API requires each request to carry OAuth authentication details in an Authorization header. One of those values is a signature, created by hashing details of the request with HMAC-SHA1. We can write a method that takes in all the details of the request (method, URL and parameters) and creates the signature:
```python
    def get_signature(self, method, url, params):
        # OAuth needs RFC 3986 percent-encoding: a space becomes %20, not '+'
        def encode(s):
            return urllib.parse.quote(str(s), safe='')

        # percent-encode every key and value, then join them sorted by key
        encoded_params = {encode(k): encode(v) for k, v in params.items()}
        signing_string = "&".join(
            key + "=" + encoded_params[key] for key in sorted(encoded_params))

        # base string: METHOD&encoded-url&encoded-parameter-string
        base_string = "&".join([method.upper(), encode(url), encode(signing_string)])

        # the signing key joins the consumer secret and the access token secret
        signing_key = encode(client_secret) + "&" + encode(access_secret)
        hashed = hmac.new(signing_key.encode(), base_string.encode(), sha1)
        signature = base64.b64encode(hashed.digest())
        return signature.decode("utf-8")
```
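To see exactly what get_signature() signs, the standalone sketch below builds the same kind of base string and HMAC-SHA1 signature from explicit arguments. The function names, the search/tweets example URL and the dummy secrets are made up for illustration. It also shows the difference between urllib.parse.quote() with safe='' and quote_plus(): OAuth percent-encoding writes a space as %20, while quote_plus() writes it as '+', so a parameter containing a space would be signed differently.

```python
import base64
import hmac
import urllib.parse
from hashlib import sha1


def oauth_encode(value):
    # RFC 3986 percent-encoding: on Python 3.7+ only A-Z a-z 0-9 - . _ ~ stay as-is
    return urllib.parse.quote(str(value), safe='')


def signature_base_string(method, url, params):
    # encode keys and values, sort the pairs, and join them as k=v with '&'
    encoded = sorted((oauth_encode(k), oauth_encode(v)) for k, v in params.items())
    param_string = "&".join(k + "=" + v for k, v in encoded)
    return "&".join([method.upper(), oauth_encode(url), oauth_encode(param_string)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret):
    # signing key is "<consumer secret>&<token secret>", each percent-encoded
    key = oauth_encode(consumer_secret) + "&" + oauth_encode(token_secret)
    digest = hmac.new(key.encode(), base_string.encode(), sha1).digest()
    return base64.b64encode(digest).decode("utf-8")


params = {"q": "data journalism", "oauth_nonce": "abc123"}
base = signature_base_string("get", "https://api.twitter.com/1.1/search/tweets.json", params)
print(base)
# GET&https%3A%2F%2Fapi.twitter.com%2F1.1%2Fsearch%2Ftweets.json&oauth_nonce%3Dabc123%26q%3Ddata%2520journalism
print(urllib.parse.quote_plus("data journalism"), oauth_encode("data journalism"))
# data+journalism data%20journalism
sig = hmac_sha1_signature(base, "dummy-consumer-secret", "dummy-token-secret")
```

Note the double encoding: the space is first encoded to %20 inside the parameter string, and the % is then encoded again to %25 when the whole string goes into the base string.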
Finally, we can write a method to actually make the API request:
```python
    def query_get(self, endpoint, aspect, get_params=None):
        # wait until the rate limiter allows another request
        self.__rate_controller(self.__monitor)

        # convert keys and values to strings for signing and encoding
        str_param_data = {str(k): str(v) for k, v in (get_params or {}).items()}
        url = self.api_base + "/" + endpoint + "/" + aspect + ".json"

        # OAuth values: a fresh random nonce per request, and a timestamp
        # in whole seconds since the epoch
        header_parameters = {
            "oauth_consumer_key": client_id,
            "oauth_nonce": uuid.uuid4().hex,
            "oauth_signature_method": "HMAC-SHA1",
            "oauth_timestamp": str(int(time.time())),
            "oauth_token": access_token,
            "oauth_version": "1.0"
        }

        # sign the OAuth values together with the query parameters
        signing_parameters = dict(header_parameters)
        signing_parameters.update(str_param_data)
        header_parameters["oauth_signature"] = self.get_signature("GET", url, signing_parameters)

        # Authorization: OAuth key1="value1", key2="value2", ...
        header_string = "OAuth " + ", ".join(
            urllib.parse.quote(str(k), safe='') + '="' + urllib.parse.quote(str(v), safe='') + '"'
            for k, v in header_parameters.items())
        headers = {"Authorization": header_string}

        # encode the query string the same way the parameters were signed
        if str_param_data:
            url += "?" + urllib.parse.urlencode(str_param_data, quote_via=urllib.parse.quote)
        request = urllib.request.Request(url, headers=headers)
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.URLError as e:
            # HTTPError is a subclass of URLError, so this covers both
            print(e)
            raise
        raw_data = response.read().decode("utf-8")
        return json.loads(raw_data)
```
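The Authorization header assembled in query_get() can be inspected without sending anything. The sketch below uses a hypothetical build_request() helper and dummy OAuth values: it joins the parameters into the key="value" list that follows the word OAuth and attaches it to a urllib.request.Request, which is only sent once urlopen() is called. Note that the base64 characters +, / and = in a signature must themselves be percent-encoded.

```python
import urllib.parse
import urllib.request


def build_request(url, oauth_params):
    """Build, but do not send, a GET request with an OAuth Authorization header."""
    def enc(s):
        return urllib.parse.quote(str(s), safe='')

    # OAuth key1="value1", key2="value2", ... in the order given
    header = "OAuth " + ", ".join(enc(k) + '="' + enc(v) + '"' for k, v in oauth_params.items())
    return urllib.request.Request(url, headers={"Authorization": header})


req = build_request(
    "https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=martinjc",
    {"oauth_consumer_key": "dummy-key", "oauth_signature": "ab+c/d="},
)
print(req.get_header("Authorization"))
# OAuth oauth_consumer_key="dummy-key", oauth_signature="ab%2Bc%2Fd%3D"
```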
Putting this all together, we have a simple Python class that acts as an API wrapper for GET requests to the Twitter REST API, including the signing and authentication of those requests. Using it is as simple as:
```python
ta = Twitter_API()
params = {
    "screen_name": "martinjc",
}
user_tweets = ta.query_get("statuses", "user_timeline", params)
```
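To see how the endpoint and aspect arguments map onto Twitter’s paths, the hypothetical endpoint_url() helper below repeats the URL concatenation from query_get() on its own. The search/tweets path is another GET endpoint from the v1.1 API, included only for illustration; check Twitter’s current documentation for what is still available.

```python
import urllib.parse

API_BASE = "https://api.twitter.com/1.1"


def endpoint_url(endpoint, aspect, params=None):
    # same pattern as query_get(): base/endpoint/aspect.json, plus an optional query
    url = API_BASE + "/" + endpoint + "/" + aspect + ".json"
    if params:
        url += "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return url


print(endpoint_url("statuses", "user_timeline", {"screen_name": "martinjc"}))
# https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=martinjc
print(endpoint_url("search", "tweets", {"q": "data journalism"}))
# https://api.twitter.com/1.1/search/tweets.json?q=data%20journalism
```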
As always, the full code is online on GitHub, in both my personal account and the account for the MSc Computational Journalism.