CUROP summer project update

July 12, 2011

I figure it’s about time I posted an update on the summer project that Ian and I are supervising. Our summer student Nick is now two weeks into the project and seems to be getting to grips nicely with the problem. He also seems to be coping pretty well with two novice supervisors babbling away at him whenever we meet up! If he leaves the office without being completely overwhelmed with information, I think it’s been a successful meeting.

So far Nick has managed to code some software that takes textual input, passes it to several web services for keyword detection and then performs searches for relevant content on Google, Yahoo! and Bing. The next stage is to handle the temporal nature of conversations, so that keyword detection and searching are carried out continually, with some control over time scales and over how much text is used as input. After that we’ll start to look at algorithms for combining keywords to get the most successful search queries possible. Meanwhile, we need to come up with a human evaluation that isn’t going to make everyone at MobiSoc hate us when we ask them to do it, and an algorithmic evaluation that makes sense and actually evaluates the right parts of the system.

Overall though we’re ahead of schedule, which is a very good place to be. I’ll continue to post about progress as the project goes on.

WowMom 2011

June 29, 2011

Last week I attended WowMom 2011 in Lucca, Italy. The conference was pretty good, but I was mainly there for the Autonomic and Opportunistic Computing (AOC) workshop, where I was presenting some work, as I mentioned in an earlier post. The workshop was really interesting; a lot of the work was relevant to what we’d done in the past on the SocialNets project and to what we’ll be doing on the Recognition project. There were some very interesting discussions on areas such as mobility models, mobility traces and the capture of user data, particularly in the keynote from Tristan Henderson and in the panel session at the end of the day. I also met some very interesting people from around the place, and hopefully I’ll run into them again at some conference or other down the line.

I’ve just got to sort through all my notes now so I can talk about it all at MobiSoc tomorrow lunchtime!

Bad Foursquare Day...

June 17, 2011

I can understand losing mayorships, but when it’s somebody close to you and she steals two from you in one day, it’s ridiculous:

I will have my revenge…

CUROP Project

June 14, 2011

Some excellent news yesterday: we’ve heard that we’ve got the CUROP funding for the summer project that I previously mentioned.

All being well, it should start within the next couple of weeks, so I’ll update with the progress once there is some!

Logging in to websites with python

June 9, 2011

As previously explained, I needed a python script to log in to a website so I could access some data. There are loads of examples on the web of how to do this; my solution (mashed together from several of them) is described below. For the whole script, jump to the end.

Firstly, we need to set some simple variables about the website we’re trying to log in to. Obviously, I’m trying to log in to myfitnesspal, but this process should work with most websites that use a simple form + cookie based login. We need to set the url we are trying to access, where to post the login information to, and a file to store cookies in:

# url for website        
base_url = 'http://www.myfitnesspal.com'
# login action we want to post data to
login_action = '/account/login'
# file for storing cookies
cookie_file = 'mfp.cookies'

Then we need to set up our cookie storage and url opener. We want the opener to be able to handle cookies and redirects:

import urllib, urllib2
import cookielib

# set up a cookie jar to store cookies
cj = cookielib.MozillaCookieJar(cookie_file)

# set up an opener to handle cookies, redirects etc
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cj)
)
# pretend we're a web browser and not a python script
opener.addheaders = [('User-agent',
    ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) '
     'AppleWebKit/535.1 (KHTML, like Gecko) '
     'Chrome/13.0.782.13 Safari/535.1'))
]

Next we need to open the front page of the website once to set any initial tracking cookies:

# open the front page of the website to set
# and save initial cookies
response = opener.open(base_url)
cj.save()
Finally, we can call the login action with our username and password to log in to the website:
# parameters for the login action
login_data = urllib.urlencode({
    'username' : 'my_username',
    'password' : 'my_password',
    'remember_me' : True
})
# construct the full login url
login_url = base_url + login_action
# then open it, posting the login data
response = opener.open(login_url, login_data)
# save the cookies to file
cj.save()

The parameters for the POST request (and the action to POST to) can usually be found by examining the source of the login page.

There you have it - you should now be logged in to the website and can access any pages that the logged-in user can normally access through a web browser. Any calls using the ‘opener’ created above will present the right cookies for the logged-in user. The cookies are saved to file, so next time you run the script you can check for cookies, try to use them, and only log in again if that doesn’t work.
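As a rough sketch of that check (reusing the variable names from above; the “did we get bounced to the login page” test is just one possible heuristic and depends on how the site behaves, and the diary URL is a guess rather than a documented endpoint):

import os

# try to reuse saved cookies before logging in again
if os.path.exists(cookie_file):
    # ignore_discard/ignore_expires so session cookies get loaded too
    cj.load(cookie_file, ignore_discard=True, ignore_expires=True)

# request a page that needs a login (URL is an assumption)
response = opener.open(base_url + '/food/diary')
if 'login' in response.geturl():
    # we were redirected to the login page, so the cookies are stale -
    # log in again as above and re-save the cookies
    response = opener.open(login_url, login_data)
    cj.save()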

My full version is attached to this post; it’s under a CC-BY-SA license, so feel free to use it for whatever you like.

Quite how this will cope when websites catch up to the new EU cookie legislation is anyone’s guess. My guess is it won’t.

Scraping data from MyFitnessPal with python

June 9, 2011

Following my success with extracting my Google Search History in a simple manner, I’ve decided that I should do something similar to extract all the data I’ve been feeding into myfitnesspal for the last 5 months. As I briefly mentioned in the review of the app + website, the progress graphs leave a lot to be desired and there’s very little in the way of analysis of the data. I have a lot of questions about my progress and there is no easy way to answer all of them using just the website. For instance, what is my average sugar intake? Is this more or less than my target intake? How does my weekend nutrition compare to my weekday nutrition? How much beer have I drunk since starting to log all my food?

Unfortunately there isn’t an API for the website yet, so I’m going to need to resort to screen scraping to extract it all. This should be pretty easy using the BeautifulSoup python library, but first I need to get access to the data. My food diary isn’t public, so I need to be logged in to get access to it. This means my scraping script needs to pretend to be a web browser and log me in to the website in order to access the pages I need.

I initially toyed with the idea of reading cookies from the web browser’s sqlite cookie database, but this is overly complex. It’s actually much easier just to use python to do the login as a POST request and to store any cookies received back from that. Fortunately I’m not the first person to try to do this, so there are plenty of examples on StackOverflow of how to do it. I’ll post my own solution once it’s done.
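Once the login side is sorted, the parsing itself should be straightforward. As a rough sketch (reusing the logged-in ‘opener’ from the login script, and a guessed URL format for a single day’s diary page - both assumptions, not tested code), something like this should pull the text out of the diary tables:

from BeautifulSoup import BeautifulSoup

# guessed URL format for one day's food diary - an assumption, check against your own account
diary_url = base_url + '/food/diary?date=2011-06-01'

# fetch the page using the logged-in opener from the login script
html = opener.open(diary_url).read()
soup = BeautifulSoup(html)

# dump the text from every table row on the page, as a starting point
for row in soup.findAll('tr'):
    cells = [''.join(td.findAll(text=True)).strip() for td in row.findAll('td')]
    print cells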

Losing Weight in 2011 continued... Libra

June 8, 2011

Another of the very useful apps I’ve been using since the start of the “New Regime” is Libra. It totally makes up for the crappy progress graphs on the MyFitnessPal website or in the mobile app.

It’s an app with a singular purpose: tracking weight. But it does it very well. You enter a weight every day; it works out statistics based on those weights, calculates a trend value for your weight (smoothing out daily fluctuations caused by water intake and so on), and predicts when you’ll hit your target. It performs a weekly backup of data to the SD card, and will export data in csv format too.
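Libra doesn’t spell out exactly how the trend is calculated, but the usual approach for this kind of tracking is an exponentially weighted moving average in the style of The Hacker’s Diet, which is easy enough to reproduce yourself from the exported csv - a rough sketch, not necessarily Libra’s exact maths:

# Hacker's Diet style trend line: an exponentially weighted moving
# average over daily weights (illustrative, not Libra's actual code)
def trend(weights, smoothing=0.1):
    trend_values = []
    current = weights[0]
    for w in weights:
        # move 10% of the way from yesterday's trend towards today's weight
        current += smoothing * (w - current)
        trend_values.append(current)
    return trend_values

print trend([85.2, 85.6, 84.9, 85.1, 84.7, 84.8, 84.3])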

The Libra app is free on the android market, and there’s also a paid ad-free version.

Google Web History - Wordle

June 7, 2011

So, you’ve downloaded your Google search history; what’s the first thing you do? Split all the queries into individual words and make a wordle, of course:

There we have it - my use of Google for five years. Turns out I do programming and live in Cardiff. Who’d have thought it?
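If you want to do the same, here’s a rough sketch of getting from the downloaded RSS files to a word list you can paste into wordle.net. It assumes the downloaded files have been renamed to history*.xml and that each item’s title in the feed is the query text - both assumptions worth checking against your own download:

import glob
import xml.etree.ElementTree as ET

# loop over the downloaded feed files (renamed to history*.xml here - adjust to taste)
for filename in glob.glob('history*.xml'):
    tree = ET.parse(filename)
    # each item's title should be a single search query
    for title in tree.findall('.//item/title'):
        if title.text:
            for word in title.text.lower().split():
                print word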

**edit:** why have I googled for google so much?

Losing Weight in 2011 continued... My Fitness Pal

June 7, 2011

(The first in a series of posts on apps I’ve found useful under ‘the new regime’)

One of the best apps/services I’ve found for general fitness, nutrition and weight loss is MyFitnessPal. I’m fairly sure I wouldn’t have made quite as good progress without it.

The main selling point of the service is that it allows you to track what you eat and what exercise you do, in order to monitor and help regulate your calorific intake. When you create an account you put in the usual details as well as your weight and height, tell it how much activity you do on a daily basis and how much weight you’d like to lose, and it works out how many calories you should eat each day to hit that target. All you have to do is enter the food you eat (by searching the food database) and the exercise you do (cardiovascular and strength training can be entered separately) and it calculates your net deficit or overspend each day.
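MyFitnessPal doesn’t publish exactly how it does this calculation, but the usual approach is something like the sketch below: estimate basal metabolic rate (here with the Mifflin-St Jeor formula), scale it by an activity factor, then subtract roughly 500 kcal per day for each pound per week you want to lose (a pound of fat being roughly 3500 kcal). Treat it as an illustration, not MyFitnessPal’s actual formula:

# illustrative daily calorie target calculation - not MyFitnessPal's actual formula
def daily_calorie_target(weight_kg, height_cm, age, sex,
                         activity_factor, loss_lbs_per_week):
    # Mifflin-St Jeor estimate of basal metabolic rate
    bmr = 10 * weight_kg + 6.25 * height_cm - 5 * age
    bmr += 5 if sex == 'male' else -161

    # scale by activity level (roughly 1.2 sedentary up to 1.9 very active)
    maintenance = bmr * activity_factor

    # ~3500 kcal per pound of fat, so ~500 kcal/day deficit per lb/week of loss
    deficit = loss_lbs_per_week * 3500 / 7.0

    return maintenance - deficit

print daily_calorie_target(90, 180, 28, 'male', 1.375, 1.0)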

I’m a big sucker for life-logging, and logging each part of each meal takes this to the extreme. I now have almost 5 months of data on what I’ve been eating. Why I’d want this, I’m not sure, but it’s there now! The service also handles logging stats such as weight, waist, neck and chest measurements, although the progress graphs for these leave a lot to be desired. Like most good services, this is a website with an associated mobile app, with android, iPhone and blackberry versions available.

**Features:**
- Large database of foods with calorific and nutritional information
- Easy logging of food and exercise, weight and measurements
- Adjusts calorie allowance as weight changes
- Social features - friends, forums etc.

**Pros:**
- Free!
- Mobile app makes it easy to log food or exercise while out and about
- Easy to stay on top of calorific intake - actually helps with weight loss
- Lots of support, encouragement and advice on the forums
- Can contribute to the database if a food is missing
- Can report inaccurate data and up-vote correct data

**Cons:**
- Users can contribute to the database, and some users are stupid
- Progress graphs are pretty useless
- Mobile app and website sometimes disagree on calories burnt from exercise

The MyFitnessPal website is at http://www.myfitnesspal.com and the mobile apps are available here.

(thanks to my good friend Christopher for the initial heads up on this site!)

How to access and download your Google Web History with wget

June 5, 2011

Google Web History has been recording all of the searches I’ve made on Google since about 2005. Obviously six years of search queries and results is a phenomenal amount of data, and it would be nice to get hold of it all to see what I can make of it. Fortunately Google make the data available as an RSS feed, although it’s not particularly well documented.

(caution - many ‘ifs’ coming up next)

If you’re logged in to your Google account, the RSS feed can be accessed at:

https://www.google.com/history/?q=&output=rss&num=NUM&start=START

If you’re using a *nix based operating system (Linux, Mac OS X etc.) you can then use wget on the command line to grab the data. The example below retrieves the 1000 most recent searches in your history:

wget --user=GOOGLE_USERNAME  \
--password=PASSWORD --no-check-certificate \
"https://www.google.com/history/?q=&output=rss&num=1000&start=0"

If you’ve enabled 2-factor authentication on your Google account you’ll need to add an app-specific password for wget so it can access your account - the password in the example above should be this app-specific password, not your main account password. If you haven’t enabled 2-factor authentication then you might be able to use your normal account password, but I haven’t tested this.

A simple bash script will then allow you to download the entire search history:

for START in $(seq 0 1000 50000)
do
    wget --user=GOOGLE_USERNAME \
        --password=WGET_APP_SPECIFIC_PASSWORD --no-check-certificate \
        "https://www.google.com/history/?output=rss&num=1000&start=$START"
done

You may need to adjust the numbers in the first line - I had to go up to 50000 to get my entire search history back to 2005; you may need fewer calls if your history is shorter, or more if it’s longer.