As previously explained, I needed a Python script to log in to a website so I could access data. There are loads of examples on the web of how to do this; my solution (mashed together from many of them) is described below. For the whole script, jump to the end.
Firstly, we need to set some simple variables about the website we’re trying to log in to. Obviously, I’m trying to log in to MyFitnessPal, but this process should work with most websites that use a simple form + cookie based login process. We need to set the URL we are trying to access, where to post the login information to, and a file to store cookies in:
base_url = 'http://www.myfitnesspal.com'
login_action = '/account/login'
cookie_file = 'mfp.cookies'
Then we need to set up our cookie storage and URL opener. We want the opener to be able to handle cookies and redirects:
import urllib, urllib2
import cookielib
# Cookie jar backed by the file named in cookie_file
cj = cookielib.MozillaCookieJar(cookie_file)
# Opener that follows redirects and keeps cookies in cj
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cj)
)
# Present a browser-like User-agent on every request
opener.addheaders = [('User-agent',
    ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) '
     'AppleWebKit/535.1 (KHTML, like Gecko) '
     'Chrome/13.0.782.13 Safari/535.1'))
]
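The snippets in this post are Python 2. For anyone on Python 3, where urllib2 and cookielib were folded into urllib.request and http.cookiejar, the same setup might look roughly like the sketch below. This is a port written for illustration, not part of the attached script, and it makes no network requests.

```python
import http.cookiejar
import urllib.request

cookie_file = 'mfp.cookies'

# MozillaCookieJar reads and writes the Netscape cookies.txt format.
cj = http.cookiejar.MozillaCookieJar(cookie_file)

# build_opener installs its default handlers (HTTP, HTTPS where available,
# redirects) on its own, so only the cookie processor is passed explicitly.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cj)
)

# Same browser-like User-agent as the Python 2 version.
opener.addheaders = [('User-agent',
    ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) '
     'AppleWebKit/535.1 (KHTML, like Gecko) '
     'Chrome/13.0.782.13 Safari/535.1'))
]
```

One further difference to watch for: in Python 3 the POST body handed to opener.open must be bytes, so the result of urllib.parse.urlencode(...) needs an .encode('ascii') before it is sent.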
Next we need to open the front page of the website once to set any initial tracking cookies:
response = opener.open(base_url)
cj.save()
Then finally we can call the login action with our username and password and log in to the website:
login_data = urllib.urlencode({
    'username': 'my_username',
    'password': 'my_password',
    'remember_me': True
})
login_url = base_url + login_action
response = opener.open(login_url, login_data)
cj.save()
The parameters for the POST request (and the action to POST to) can usually be found by examining the source of the login page.
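If you would rather not read the page source by eye, the form's action and field names can be pulled out with the standard library's HTML parser. The sketch below is only an illustration: LoginFormParser is a made-up helper, and sample_page (including its csrf_token field) is invented markup rather than the real MyFitnessPal login page. The try/except import lets it run on both Python 2 and Python 3.

```python
try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3


class LoginFormParser(HTMLParser):
    """Hypothetical helper: record the action and named inputs of the first form."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            # Remember where the form posts to.
            self._in_form = True
            self.action = attrs.get('action')
        elif tag == 'input' and self._in_form and 'name' in attrs:
            # Keep any preset value (e.g. hidden fields), else an empty string.
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


# Invented example markup, standing in for a downloaded login page.
sample_page = '''
<form action="/account/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123" />
  <input type="text" name="username" />
  <input type="password" name="password" />
  <input type="checkbox" name="remember_me" value="1" />
  <input type="submit" value="Log in" />
</form>
'''

parser = LoginFormParser()
parser.feed(sample_page)
print(parser.action)
print(sorted(parser.fields))
```

Hidden inputs are worth looking out for: some sites expect them to be sent back along with the username and password, in which case they would need adding to the dictionary passed to urlencode.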
There you have it: you should now be logged in to the website and can access any pages that the logged-in user can normally access through a web browser. Any calls using the ‘opener’ created above will present the right cookies for the logged-in user. The cookies are saved to file, so next time you run the script you can check for cookies, try to use them, and only log in again if that doesn’t work.
My full version is attached to this post. It’s under a CC-BY-SA license, so feel free to use it for whatever.
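The "check for cookies, try them, only log in again if needed" idea could be sketched along these lines. The function names load_saved_cookies and ensure_logged_in are hypothetical, as are the is_logged_in and do_login callbacks, which stand in for whatever site-specific check and login request you use (for example, fetching a members-only page with the opener and looking for your username). It may well differ from how the attached script does it.

```python
import os

try:
    import cookielib  # Python 2, as used in this post
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name for the same module


def load_saved_cookies(cj, cookie_file):
    """Load cookies from cookie_file into cj; return True if any were loaded."""
    if not os.path.isfile(cookie_file):
        return False
    try:
        cj.load(cookie_file, ignore_discard=True)
    except (cookielib.LoadError, IOError):
        # Missing, unreadable or malformed file: treat as "no cookies".
        return False
    return len(cj) > 0


def ensure_logged_in(cj, cookie_file, is_logged_in, do_login):
    """Reuse saved cookies if they still work, otherwise log in again.

    is_logged_in and do_login are caller-supplied callables (assumptions
    of this sketch); returns 'reused' or 'fresh'.
    """
    if load_saved_cookies(cj, cookie_file) and is_logged_in():
        return 'reused'
    do_login()
    cj.save(cookie_file)
    return 'fresh'
```

Note that save() skips session cookies (those the site marks to be discarded when the browser closes) unless it is called with ignore_discard=True, so whether the saved file is enough to stay logged in depends on how the site sets its cookies.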
Quite how this will cope when websites catch up to the new EU cookie legislation is anyone’s guess. My guess is it won’t.