CUROP summer project update

July 12, 2011

I figure it’s about time I posted an update on the summer project that Ian and I are supervising. Our summer student Nick is now two weeks into the project and seems to be getting to grips nicely with the problem. He also seems to be coping pretty well with two novice supervisors babbling away at him whenever we meet up! If he leaves the office without being completely overwhelmed with information, I think it’s been a successful meeting.

So far Nick has managed to code some software that takes textual input, passes it to several web services for keyword detection and then performs searches for relevant content on Google, Yahoo! and Bing. The next stage is to handle the temporal nature of conversations, so that keyword detection and searching are carried out continually, with some control over time scales and over how much text is used as input. After that we’ll start to look at algorithms for combining keywords to get the most successful search queries possible. Meanwhile, we need to come up with a human evaluation that isn’t going to make everyone at MobiSoc hate us when we ask them to do it, and an algorithmic evaluation that makes sense and actually evaluates the right parts of the system.

Overall though we’re ahead of schedule, which is a very good place to be. I’ll continue to post about progress as the project goes on.

WowMom 2011

June 29, 2011

Last week I attended WowMom 2011 in Lucca, Italy. The conference was pretty good, but I was mainly there for the Autonomic and Opportunistic Computing (AOC) workshop, where I was presenting some work, as I mentioned in an earlier post. The workshop was really interesting; a lot of the work was relevant to what we’d done in the past on the SocialNets project and to what we’ll be doing on the Recognition project. There were some very interesting discussions on areas such as mobility models, mobility traces and the capture of user data, particularly in the keynote from Tristan Henderson and in the panel session at the end of the day. I also met some very interesting people from around the place, and hopefully I’ll run into them again at some conference or other down the line.

I’ve just got to sort through all my notes now so I can talk about it all at MobiSoc tomorrow lunchtime!

Bad Foursquare Day...

June 17, 2011

I can understand losing mayorships, but when it’s somebody close to you and she steals two from you in one day, it’s ridiculous:

I will have my revenge…

CUROP Project

June 14, 2011

Some excellent news yesterday: we’ve heard that we’ve got the CUROP funding for the summer project that I previously mentioned.

All being well, it should start within the next couple of weeks, so I’ll update with the progress once there is some!

Logging in to websites with python

June 9, 2011

As previously explained, I needed a python script to log in to a website so I could access some data. There are loads of examples on the web of how to do this; my solution (mashed together from several of them) is described below. For the whole script, jump to the end.

Firstly, we need to set some simple variables about the website we’re trying to log in to. Obviously, I’m trying to log in to myfitnesspal, but this process should work with most websites that use a simple form + cookie based login. We need to set the url we are trying to access, where to post the login information to, and a file to store cookies in:

# url for website        
base_url = 'http://www.myfitnesspal.com'
# login action we want to post data to
login_action = '/account/login'
# file for storing cookies
cookie_file = 'mfp.cookies'

Then we need to set up our cookie storage and url opener. We want the opener to be able to handle cookies and redirects:

import urllib, urllib2
import cookielib

# set up a cookie jar to store cookies
cj = cookielib.MozillaCookieJar(cookie_file)

# set up an opener to handle cookies, redirects etc
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cj)
)
# pretend we're a web browser and not a python script
opener.addheaders = [('User-agent',
    ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) '
     'AppleWebKit/535.1 (KHTML, like Gecko) '
     'Chrome/13.0.782.13 Safari/535.1'))
]

Next we need to open the front page of the website once to set any initial tracking cookies:

# open the front page of the website to set
# and save initial cookies
response = opener.open(base_url)
cj.save()
Finally, we can call the login action with our username and password to log in to the website:
# parameters for the login action
login_data = urllib.urlencode({
    'username' : 'my_username',
    'password' : 'my_password',
    'remember_me' : True
})
# construct the full login url
login_url = base_url + login_action
# then open it, posting the login data
response = opener.open(login_url, login_data)
# save the cookies to file
cj.save()

The parameters for the POST request (and the action to POST to) can usually be found by examining the source of the login page.

There you have it - you should now be logged in to the website and can access any pages that the logged-in user can normally access through a web browser. Any calls using the ‘opener’ created above will present the right cookies for the logged-in user. The cookies are saved to file, so next time you run the script you can check for cookies, try to use them, and only log in again if that doesn’t work.
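As a rough sketch of that check (reusing the variable names from above; the “did we get bounced to the login page” test is just one possible heuristic and depends on how the site behaves, and the diary URL is a guess rather than a documented endpoint):

import os

# try to reuse saved cookies before logging in again
if os.path.exists(cookie_file):
    # ignore_discard/ignore_expires so session cookies get loaded too
    cj.load(cookie_file, ignore_discard=True, ignore_expires=True)

# request a page that needs a login (URL is an assumption)
response = opener.open(base_url + '/food/diary')
if 'login' in response.geturl():
    # we were redirected to the login page, so the cookies are stale -
    # log in again as above and re-save the cookies
    response = opener.open(login_url, login_data)
    cj.save()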

My full version is attached to this post; it’s under a CC-BY-SA license, so feel free to use it for whatever you like.

Quite how this will cope when websites catch up to the new EU cookie legislation is anyone’s guess. My guess is it won’t.

Scraping data from MyFitnessPal with python

June 9, 2011

Following my success with extracting my Google Search History in a simple manner, I’ve decided that I should do something similar to extract all the data I’ve been feeding into myfitnesspal for the last 5 months. As I briefly mentioned in the review of the app + website, the progress graphs leave a lot to be desired and there’s very little in the way of analysis of the data. I have a lot of questions about my progress and there is no easy way to answer all of them using just the website. For instance, what is my average sugar intake? Is this more or less than my target intake? How does my weekend nutrition compare to my weekday nutrition? How much beer have I drunk since starting to log all my food?

Unfortunately there isn’t an API for the website yet, so I’m going to need to resort to screen scraping to extract it all. This should be pretty easy using the BeautifulSoup python library, but first I need to get access to the data. My food diary isn’t public, so I need to be logged in to get access to it. This means my scraping script needs to pretend to be a web browser and log me in to the website in order to access the pages I need.

I initially toyed with the idea of reading cookies from the web browser’s sqlite cookie database, but this is overly complex. It’s actually much easier just to use python to do the login as a POST request and to store any cookies received back from that. Fortunately I’m not the first person to try to do this, so there are plenty of examples on StackOverflow of how to do it. I’ll post my own solution once it’s done.
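Once the login side is sorted, the parsing itself should be straightforward. As a rough sketch (reusing the logged-in ‘opener’ from the login script, and a guessed URL format for a single day’s diary page - both assumptions, not tested code), something like this should pull the text out of the diary tables:

from BeautifulSoup import BeautifulSoup

# guessed URL format for one day's food diary - an assumption, check against your own account
diary_url = base_url + '/food/diary?date=2011-06-01'

# fetch the page using the logged-in opener from the login script
html = opener.open(diary_url).read()
soup = BeautifulSoup(html)

# dump the text from every table row on the page, as a starting point
for row in soup.findAll('tr'):
    cells = [''.join(td.findAll(text=True)).strip() for td in row.findAll('td')]
    print cells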

Losing Weight in 2011 continued... Libra

June 8, 2011

Another of the very useful apps I’ve been using since the start of the “New Regime” is Libra. It totally makes up for the crappy progress graphs on the MyFitnessPal website or in the mobile app.

It’s an app with a singular purpose: tracking weight. But it does it very well. You enter a weight every day; it works out statistics based on those weights, calculates a trend value for your weight (smoothing out daily fluctuations caused by water intake and so on), and predicts when you’ll hit your target. It performs a weekly backup of data to the SD card, and will export data in csv format too.
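Libra doesn’t spell out exactly how the trend is calculated, but the usual approach for this kind of tracking is an exponentially weighted moving average in the style of The Hacker’s Diet, which is easy enough to reproduce yourself from the exported csv - a rough sketch, not necessarily Libra’s exact maths:

# Hacker's Diet style trend line: an exponentially weighted moving
# average over daily weights (illustrative, not Libra's actual code)
def trend(weights, smoothing=0.1):
    trend_values = []
    current = weights[0]
    for w in weights:
        # move 10% of the way from yesterday's trend towards today's weight
        current += smoothing * (w - current)
        trend_values.append(current)
    return trend_values

print trend([85.2, 85.6, 84.9, 85.1, 84.7, 84.8, 84.3])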

The Libra app is free on the android market, and there’s also a paid ad-free version.

Google Web History - Wordle

June 7, 2011

So, you’ve downloaded your Google search history; what’s the first thing you do? Split all the queries into individual words and make a wordle, of course:

There we have it - my use of Google for five years. Turns out I do programming and live in Cardiff. Who’d have thought it?
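If you want to do the same, here’s a rough sketch of getting from the downloaded RSS files to a word list you can paste into wordle.net. It assumes the downloaded files have been renamed to history*.xml and that each item’s title in the feed is the query text - both assumptions worth checking against your own download:

import glob
import xml.etree.ElementTree as ET

# loop over the downloaded feed files (renamed to history*.xml here - adjust to taste)
for filename in glob.glob('history*.xml'):
    tree = ET.parse(filename)
    # each item's title should be a single search query
    for title in tree.findall('.//item/title'):
        if title.text:
            for word in title.text.lower().split():
                print word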

**edit:** why have I googled for google so much?

Losing Weight in 2011 continued... My Fitness Pal

June 7, 2011

(The first in a series of posts on apps I’ve found useful under ‘the new regime’)

One of the best apps/services I’ve found for general fitness, nutrition and weight loss is MyFitnessPal. I’m fairly sure I wouldn’t have made quite as good progress without it.

The main selling point of the service is that it allows you to track what you eat and what exercise you do, in order to monitor and help regulate your calorific intake. When you create an account you put in the usual details as well as your weight and height, tell it how much activity you do on a daily basis and how much weight you’d like to lose, and it works out how many calories you should eat each day to hit that target. All you have to do is enter the food you eat (by searching the food database) and the exercise you do (cardiovascular and strength training can be entered separately) and it calculates your net deficit or overspend each day.
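MyFitnessPal doesn’t publish exactly how it does this calculation, but the usual approach is something like the sketch below: estimate basal metabolic rate (here with the Mifflin-St Jeor formula), scale it by an activity factor, then subtract roughly 500 kcal per day for each pound per week you want to lose (a pound of fat being roughly 3500 kcal). Treat it as an illustration, not MyFitnessPal’s actual formula:

# illustrative daily calorie target calculation - not MyFitnessPal's actual formula
def daily_calorie_target(weight_kg, height_cm, age, sex,
                         activity_factor, loss_lbs_per_week):
    # Mifflin-St Jeor estimate of basal metabolic rate
    bmr = 10 * weight_kg + 6.25 * height_cm - 5 * age
    bmr += 5 if sex == 'male' else -161

    # scale by activity level (roughly 1.2 sedentary up to 1.9 very active)
    maintenance = bmr * activity_factor

    # ~3500 kcal per pound of fat, so ~500 kcal/day deficit per lb/week of loss
    deficit = loss_lbs_per_week * 3500 / 7.0

    return maintenance - deficit

print daily_calorie_target(90, 180, 28, 'male', 1.375, 1.0)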

I’m a big sucker for life-logging, and logging each part of each meal takes this to the extreme. I now have almost 5 months of data on what I’ve been eating. Why I’d want this, I’m not sure, but it’s there now! The service also handles logging stats such as weight, waist, neck and chest measurements, although the progress graphs for these leave a lot to be desired. Like most good services, this is a website with an associated mobile app, with android, iPhone and blackberry versions available.

**Features:**
- Large database of foods with calorific and nutritional information
- Easy logging of food and exercise, weight and measurements
- Adjusts calorie allowance as weight changes
- Social features - friends, forums etc.

**Pros:**
- Free!
- Mobile app makes it easy to log food or exercise while out and about
- Easy to stay on top of calorific intake - actually helps with weight loss
- Lots of support, encouragement and advice on the forums
- Can contribute to the database if a food is missing
- Can report inaccurate data and up-vote correct data

**Cons:**
- Users can contribute to the database, and some users are stupid
- Progress graphs are pretty useless
- Mobile app and website sometimes disagree on calories burnt from exercise

The MyFitnessPal website is at http://www.myfitnesspal.com and the mobile apps are available here.

(thanks to my good friend Christopher for the initial heads up on this site!)

How to access and download your Google Web History with wget

June 5, 2011

Google Web History has been recording all of the searches I’ve made on Google since about 2005. Obviously six years of search queries and results is a phenomenal amount of data, and it would be nice to get hold of it all to see what I can make of it. Fortunately Google make the data available as an RSS feed, although it’s not particularly well documented.

(caution - many ‘ifs’ coming up next)

If you’re logged in to your Google account, the RSS feed can be accessed at:

https://www.google.com/history/?q=&output=rss&num=NUM&start=START

If you’re using a *nix based operating system (Linux, Mac OS X etc.) you can then use wget on the command line to grab the data. The example below retrieves the 1000 most recent searches in your history:

wget --user=GOOGLE_USERNAME  \
--password=PASSWORD --no-check-certificate \
"https://www.google.com/history/?q=&output=rss&num=1000&start=0"

If you’ve enabled 2-factor authentication on your Google account you’ll need to add an app-specific password for wget so it can access your account - the password in the example above should be this app-specific password, not your main account password. If you haven’t enabled 2-factor authentication then you might be able to use your normal account password, but I haven’t tested this.

A simple bash script will then allow you to download the entire search history:

for START in $(seq 0 1000 50000)
do
    wget --user=GOOGLE_USERNAME \
        --password=WGET_APP_SPECIFIC_PASSWORD --no-check-certificate \
        "https://www.google.com/history/?output=rss&num=1000&start=$START"
done

You may need to adjust the numbers in the first line - I had to go up to 50000 to get my entire search history back to 2005; you may need fewer calls if your history is shorter, or more if it’s longer.