Scraping the Assembly…

M’colleague is currently teaching a first-semester module on Data Journalism to the students on our MSc in Computational and Data Journalism. As part of this, they need to do some sort of data project. One of the students is looking at the expenses of Welsh Assembly Members. These are all freely available online, but not in an easy-to-manipulate form. According to the Assembly, they’d be happy to give the data out as a spreadsheet if we submitted an FOI.

To me, this seems quite stupid. The information is all online and freely accessible. You’ve admitted you’re willing to give it out to anyone who submits an FOI. So why not just make the raw data available to download? This does not sound like a helpful Open Government to me. Anyway, for whatever reason, they’ve chosen not to, and we can’t be bothered to wait around for an FOI to come back. It’s much quicker and easier to build a scraper! We’ll just use Selenium to drive a web browser, submit a search, page through all the results collecting the details, then dump it all out to CSV. Simple.
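For anyone wondering what 'page through all the results' looks like in practice, here's a minimal sketch of the loop. It isn't the code from the repository: `fetch_page` is a hypothetical stand-in for whatever drives the browser (in the real scraper, Selenium moving to the next page and reading the results), and the field names and values are made up for illustration.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def collect_all(fetch_page: Callable[[int], List[Row]]) -> Iterator[Row]:
    """Yield every row, asking for page 1, 2, 3... until a page comes back empty."""
    page = 1
    while True:
        rows = fetch_page(page)
        if not rows:  # an empty page means we've run off the end of the results
            return
        yield from rows
        page += 1


# Hypothetical stand-in for the browser: two pages of made-up results.
FAKE_PAGES = {
    1: [{"member": "A. Member", "amount": "12.50"}],
    2: [{"member": "B. Member", "amount": "7.00"}],
}


def fake_fetch(page: int) -> List[Row]:
    return FAKE_PAGES.get(page, [])


all_rows = list(collect_all(fake_fetch))
print(len(all_rows), "rows collected")
```

Keeping the paging loop separate from the browser code means the loop can be tried out without a browser at all, which is handy when the site is slow.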

Scraping AM expenses

I built this as a quick hack this morning. It took about an hour or so, and it shows. The code is not robust in any way, but it works. You can ask it for data from any year (or a number of years) and it’ll happily sit there churning its way through the results and spitting them out as both .csv and .json.
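The 'any year, out as both .csv and .json' part might look something like the sketch below. Again, this is an illustration rather than the repository code: `scrape_year`, the `expenses_<year>` file names and the one-pair-of-files-per-year layout are all assumptions of mine, and the data is made up.

```python
import csv
import json
import os
import tempfile
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def dump(rows: List[Row], basename: str) -> None:
    """Write the same rows to <basename>.csv and <basename>.json."""
    fieldnames = sorted({key for row in rows for key in row})
    with open(f"{basename}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    with open(f"{basename}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)


def dump_years(
    years: Iterable[int],
    scrape_year: Callable[[int], List[Row]],
    out_dir: str,
) -> List[str]:
    """Scrape each requested year and write one .csv/.json pair per year."""
    written = []
    for year in years:
        basename = os.path.join(out_dir, f"expenses_{year}")
        dump(scrape_year(year), basename)
        written.append(basename)
    return written


# Made-up data standing in for a real scrape.
def fake_scrape_year(year: int) -> List[Row]:
    return [{"year": str(year), "member": "A. Member", "amount": "12.50"}]


out_dir = tempfile.mkdtemp()
paths = dump_years([2013, 2014], fake_scrape_year, out_dir)
print(sorted(os.listdir(out_dir)))
```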

All the code is available on GitHub and it’s under an MIT Licence. Have fun 😉

Atom Plugins for Web Development

I’ve had a number of students in my web-dev module asking me what plugins I’m using in my text editor, so I thought I’d dash off a quick blog post on the plugins I find useful day-to-day. (Actually, most people are normally asking me ‘how did you do that thing where you typed one word and suddenly you had a whole HTML page?’ The answer is I used a plugin, so ‘what plugins do you use?’ is really the question they should be asking…)

I’m using Atom as my text editor. It’s free, open source, and generally reliable. If you’re a student on my web-dev course you’re stuck using Sublime Text in the lab for now. I’m pretty sure most of the Atom plugins I use have either direct Sublime equivalents, or similarly functioning alternatives.

There’s a guide to Atom packages here and one for Sublime Text here.

A quick google for ‘best atom packages web developer’ will probably get you to a far more comprehensive list than this, but here’s my current pick of useful plugins anyway:

emmet

This is essential for anyone writing any amount of HTML. This is the magic package that allows me to write ‘html:5’ in a blank document, hit the shortcut keys (CTRL + E in my setup), and suddenly have a simple boilerplate HTML page.

emmet auto-completion

It’s ace. Not only that, but it can write loads of HTML for you, and all you have to do is write a CSS selector for that HTML:

HTML CSS selector expansion
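To give a feel for the idea, here's a toy expander for a tiny subset of the abbreviation syntax: a tag, an optional `#id`, any number of `.class` parts, an optional `*count`, and `>` for nesting. It is emphatically not how emmet itself is implemented, just a sketch of 'selector in, HTML out'; the function name `expand` is my own.

```python
import re

# One element of the abbreviation: tag, optional #id, .classes, optional *count.
ELEMENT = re.compile(
    r"^(?P<tag>[a-z][a-z0-9]*)"
    r"(?P<id>#[\w-]+)?"
    r"(?P<classes>(?:\.[\w-]+)*)"
    r"(?:\*(?P<count>\d+))?$"
)


def expand(abbreviation: str, indent: str = "  ", depth: int = 0) -> str:
    """Expand a small emmet-style abbreviation into nested HTML."""
    head, _, rest = abbreviation.partition(">")
    match = ELEMENT.match(head)
    if match is None:
        raise ValueError(f"unsupported abbreviation: {head!r}")

    attrs = ""
    element_id = match["id"]
    if element_id:
        attrs += f' id="{element_id[1:]}"'
    classes = match["classes"]
    if classes:
        class_list = classes[1:].replace(".", " ")
        attrs += f' class="{class_list}"'

    pad = indent * depth
    tag = match["tag"]
    if rest:
        # Everything after the first '>' becomes the child of this element.
        inner = expand(rest, indent, depth + 1)
        one = f"{pad}<{tag}{attrs}>\n{inner}\n{pad}</{tag}>"
    else:
        one = f"{pad}<{tag}{attrs}></{tag}>"
    return "\n".join([one] * int(match["count"] or 1))


print(expand("ul#nav>li.item*3"))
# <ul id="nav">
#   <li class="item"></li>
#   <li class="item"></li>
#   <li class="item"></li>
# </ul>
```

The real thing understands far more than this (siblings, attributes, text, grouping), so treat the documentation as the reference for the actual syntax.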

Great stuff. The documentation is here.

atom-beautify

This will tidy up your code automatically, fixing the indentation, spacing and so on. It can even be set to automatically tidy your code every time you save a file. Awesome, huh? Imagine being set a coursework where some of the marks were dependent on not writing code that looks like it was written by a five-year-old child who’s addicted to hitting the tab key, then finding out that there’s software to strap that five-year-old’s thumbs to his hands so he can’t hit that tab key. Awesome.

Atom Beautify tidies your code

color-picker

This one adds a colour picker right into Atom. Just hit CMD-SHIFT-C and choose your colours!

Colour Picker in atom

Another useful colour-related plugin you may want to look at is Pigments, which can highlight colours in your projects, and gather them all together so you can see your palette.

linter

My last recommendation is linter. This plugin will automatically check your code for errors. You’ll need to install linters for whatever language you want to check, like linter-tidy, linter-csslint, linter-pylint and linter-jshint.

Linter finds errors in your code

 

So there we go – a few recommendations to get you started. Found anything else interesting? Let me know!

NHS Hackday 2015

This weekend I took part in an incredibly successful NHS hackday, hosted at Cardiff University and organised by Anne Marie Cunningham and James Morgan. We went as a team from the MSc in Computational Journalism, with myself and Glyn attending along with Pooja, Nikita, Annalisa and Charles. At the last minute I recruited a couple of ringers as well, dragging along Rhys Priestland Dr William Wilberforce Webberley from Comsc and Dr Matthew Williams, previously of this parish. Annalisa also brought along Dan Hewitt, so in total we had a large and diverse team.

The hackday

This was the first NHS hackday I’d attended, but I believe it’s the second event held in Cardiff, so Anne Marie and the team have it down to a fine art. The whole weekend seemed to go pretty smoothly (barring a couple of misunderstandings on our part regarding the pitch sessions!). It was certainly one of the best-organised events that I’ve attended, with all the necessary ingredients for successful coding: much power, many wifi and plenty of food, snacks and coffee. Anne Marie and the team deserve much recognition and thanks for their hard work. I’m definitely in for next year.

The quality of the projects created at the hackday was incredibly high across the board, which was great to see. One of my favourites used an Oculus Rift virtual reality headset to create a zombie ‘game’ that could be used to test people’s peripheral vision. Another standout was a system for logging and visualising the ANGEL factors describing a patient’s health situation. It was really pleasing to see these rank highly with the judges too, coming in third and second in the overall rankings. Other great projects brought an old Open Source project back to life, created a system for managing groups walking the Wales Coast Path, and created automatic notification systems for healthcare processes. Overall it was a really interesting mix of projects, many of which have clear potential to become useful products within or alongside the NHS. As Matt commented in the pub afterwards, it’s probably the first hackday we’ve been to where several of the projects have clear original IP with commercial potential.

Our project

We had decided before the event that we wanted to build some visualisations of health data across Wales, something like nhsmaps.co.uk, but working with local health boards and local authorities in Wales. We split into two teams for the implementation: ‘the data team’ who were responsible for sourcing, processing and inputting data, and the ‘interface team’ who built the front-end and the visualisations.

Progress was good, with Matthew and William quickly defining a schema for describing data so that the data team could add multiple data sets and have the front-end automatically pick them up and be able to visualise them. The CompJ students worked to find and extract data, adding them to the GitHub repository with the correct metadata. Meanwhile, I pulled a bunch of D3 code together for some simple visualisations.
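To illustrate the 'describe the data, and the front-end picks it up' idea, here's a rough sketch of what checking and listing metadata descriptors could look like. The field names (`title`, `csv_file`, `resolution`, `value_column`), the resolution labels and the example data set are all hypothetical and invented for this post; the actual schema is the one documented in the repository, and the real front-end is JavaScript rather than Python.

```python
import json
from typing import Any, Dict, List

# Hypothetical field names; the real schema lives in the project repository.
REQUIRED_KEYS = {"title", "csv_file", "resolution", "value_column"}
RESOLUTIONS = {"health_board", "local_authority"}


def load_descriptor(text: str) -> Dict[str, Any]:
    """Parse one JSON descriptor and check it has what a front-end would need."""
    descriptor = json.loads(text)
    missing = REQUIRED_KEYS - set(descriptor)
    if missing:
        raise ValueError(f"descriptor is missing: {sorted(missing)}")
    if descriptor["resolution"] not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {descriptor['resolution']!r}")
    return descriptor


def menu(descriptors: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group data set titles by resolution, as a data set picker might."""
    grouped: Dict[str, List[str]] = {}
    for descriptor in descriptors:
        grouped.setdefault(descriptor["resolution"], []).append(descriptor["title"])
    return grouped


# A made-up descriptor, not one of the project's data sets.
example = """
{"title": "Made-up example data set",
 "csv_file": "data/example.csv",
 "resolution": "health_board",
 "value_column": "value"}
"""

print(menu([load_descriptor(example)]))
```

The point of a schema like this is that adding a data set means adding a file and a descriptor, with no changes to the visualisation code.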

By the end of the weekend we had established a fairly decent system. It’s able to visualise a few different types of data at different resolutions, is mostly mobile friendly, and most importantly is easily extensible and adaptable. It’s online now on our GitHub pages, and all the code and documentation are also in the GitHub repository.

We’ll continue development for a while to improve the usability and code quality, and hopefully we’ll find a community willing to take the code base on and keep improving what could be a fairly useful resource for understanding the health of Wales.

Debrief

We didn’t win any of the prizes, which is understandable. Our project was really focused on the public understanding of the NHS and health, not on solving a particular need within (or for users of) the NHS. We knew this going into the weekend, and we’d taken the decision that it was more important to work on a project related to the course, so that the students could experience some of the tools and technologies they’ll be using as the course progresses, than to do something more closely aligned with the brief that would perhaps have been less relevant to the students’ work.

I need to thank Will and Matt for coming and helping the team. Without Matt wrangling the data team and showing them how to create json metadata descriptors we probably wouldn’t have anywhere near as many example datasets as we do. Similarly, without Will’s hard work on the front end interface, the project wouldn’t look nearly as good as it does, or have anywhere near the functionality. His last-minute addition of localstorage for personal datasets was a triumph. (Sadly though he does lose some coder points for user agent sniffing to decide whether to show a mobile interface :-D.) They were both a massive help, and we couldn’t have done it without them.

Also, of course, I need to congratulate the CompJ students, who gave up their weekend to trawl through datasets, pull figures off websites and out of PDFs, and create the lovely, easy-to-process .csv files we needed. It was a great effort from them, and I’m looking forward to our next Team CompJ hackday outing.

One thing that sadly did stand out was a lack of participation from Comsc undergraduate students, with only one or two attending. Rob Davies stopped by on Saturday, and both Will and I discussed with him what we can do to increase participation in these events. Hopefully we’ll make some progress on that front in time for the next hackday.

Media

There are some great photos from the event on Flickr, courtesy of Paul Clarke (Saturday and Sunday). I’ve pulled out some of the best of Team CompJ and added them here. All photos are released under a Creative Commons BY-NC 2.0 licence.

 

Elsewhere…

We got a lovely write-up about our project from Dyfrig Williams of the Good Practice Exchange at the Wales Audit Office. Dyfrig also curated a great storify of the weekend.

Hemavault labs have done a round-up of the projects here.

Computational Journalism – ‘a Manifesto’

While Glyn and I have been discussing the new MSc course between ourselves and with others, we have come up against the same issues and themes again and again. As a planning exercise earlier in the summer, we gathered some of these together into a ‘manifesto’.

The manifesto is online on our main ‘Computational Journalism’ website with a bit of extra commentary, but I thought I’d upload it here as well. Any comments should probably be directed to the article on the CompJ site, so I’ve turned them off just for this article.

 

MSc Computational Journalism about to launch

For the last two years I’ve been working on a project with some colleagues in the School of Journalism, Media and Cultural Studies (JOMEC) here at Cardiff University and it’s finally all coming together. This week we’ve been able to announce that (subject to some final internal paperwork wrangling) we’ll be launching an MSc in Computational Journalism this September. The story of how the course came about is fairly long, but starts simply with a tweet (unfortunately missing the context, but you get the drift):

An offer via social media from someone I’d never met, asking to pick my brains about an unknown topic. Of course, I jumped at the invite:

That ‘brain picking’ became an interesting chat over coffee in one of the excellent coffee shops in Cardiff, where Glyn and I discussed many things of interest, and many potential areas for collaboration – including the increased use of data and coding within modern journalism. At one point during this chat, m’colleague Glyn said something like “do you know, I think we should run a masters course on this.” I replied with something along the lines of “yes, I think that’s a very good idea.”

That short conversation became us taking the idea of an MSc in Computational Journalism to our respective heads of school, which became us sat around the table discussing what should be in such a course, which then became us (I say us, it was mainly all Richard) writing pages of documentation explaining what the course would be and arguing the case for it to the University. Last week we held the final approval panel for the course, where internal and external panel members all agreed that we pretty much knew what we were doing, that the course was a good idea and had the right content, and that we should go ahead and launch it.

From 25th July 2012 to 1st April 2014 is a long time to get an MSc up and running, but we’ve finally done it. Over that time I’ve discovered many things about the University and its processes, drunk many pints of fine ale as we tried to hammer out a course structure in various pubs around the city, and come close on at least one occasion to screaming at a table full of people, but now it’s done. As I write, draft press releases are being written, budgets are being sorted, and details are being uploaded to coursefinder. With any luck, September will see us with a batch of students ready and willing to step onto the course for the first time. It’s exciting, and I can’t wait.

KSRI Services Summer School – Social Computing Theory and Hackathon

I was invited by Simon Caton to come to the KSRI Services Summer School, held at KIT in Germany, to help him run a workshop session on Social Computing. We decided to use the session as a crash course in retrieving and manipulating data from Social Media APIs – showing the students the basics, then running a mini ‘hackathon’ for the students to gain some practical experience.

I think the session went really well: the students seemed to enjoy it and the feedback was very positive. We spent about 90 minutes talking about APIs, JSON, Twitter, Facebook and Foursquare, then set the students off on forming teams and brainstorming ideas. Very quickly they were set up grabbing Twitter data from the streaming API and coming up with ways of analysing it for interesting facts and statistics. A number of the students were not coders, and had never done anything like this before, so it was great to see them diving in, setting up servers and running PHP scripts to grab the data. It was also good to see the level of teamwork on display; everyone was communicating, dividing the work, and getting on well. Fuelled by a combination of pizza, beer, Red Bull and Haribo they coded into the night, until we drew things to a close at about 10pm and retired to the nearest bar for a pint of debrief.
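For a flavour of the kind of analysis involved, here's a small sketch that counts hashtags in a stream of tweets. The students were working in PHP, so this Python version is my own illustration rather than anything they wrote. It assumes each line is one JSON message with hashtags under `entities.hashtags`, which is how the streaming API delivered tweets at the time; the sample messages are made up.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_hashtags(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count hashtags (case-insensitively) across newline-delimited JSON messages."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # the stream sends blank keep-alive lines
            continue
        message = json.loads(line)
        # Messages that aren't tweets (e.g. delete notices) have no entities.
        for tag in message.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts.most_common(n)


# Made-up messages standing in for a captured stream.
sample_stream = [
    '{"text": "made-up tweet one", "entities": {"hashtags": [{"text": "KSRI"}, {"text": "hackathon"}]}}',
    "",
    '{"text": "made-up tweet two", "entities": {"hashtags": [{"text": "Hackathon"}]}}',
    '{"delete": {"status": {"id": 1}}}',
]

print(top_hashtags(sample_stream))
```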

Teams hard at work hacking with Twitter data

It was a really good experience, and I think everyone got something useful out of it. I’m looking forward to the presentations later on today to see what everyone came up with.

Our slides from the talk are available on slideshare. As usual they’re information light and picture heavy, so their usefulness is probably limited!

SWN Festival 2013 – plans

Last year I had a go at creating a couple of web apps based around the bands playing the SWN Festival here in Cardiff. I love SWN with all my heart; it’s a permanent fixture in my calendar and even if (when) I leave Cardiff it’ll be the one thing I come back for every year. It’s a great way to see and discover new bands, but sometimes the sheer volume of music on offer can be overwhelming. So I wanted to see if I could create some web apps that would help to navigate your way through all the bands, and find the ones that you should go and see.

The first was a simple app that gathered artist tags from Last.fm, allowing you to see which artists playing the festival had similar tags – so if you knew you liked one artist you could find other artists tagged with the same terms. The second (which technically wasn’t ever really finished) would allow you to log in with a Last.fm account and find the artists whose tags best matched the tags for your top artists in your Last.fm profile.
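The matching idea can be sketched in a few lines. This isn't the code from either app: I've used Jaccard similarity (shared tags divided by all tags) as one simple way of scoring overlap, and the artists and tags below are invented.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tags as a fraction of all tags across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_artists(
    target_tags: Set[str], lineup: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank artists in the line-up by tag overlap, dropping those with none."""
    scored = [(name, jaccard(target_tags, tags)) for name, tags in lineup.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Invented line-up; real tags would come from the Last.fm API.
lineup = {
    "Band A": {"indie", "welsh", "folk"},
    "Band B": {"indie", "rock"},
    "Band C": {"techno"},
}

for name, score in similar_artists({"indie", "folk"}, lineup):
    print(f"{name}: {score:.2f}")
```

For the second app, `target_tags` would be the combined tags of your top artists instead of the tags of a single artist you already like.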

I liked both these apps and found them both useful – but I don’t think they went far enough. I only started development late in the year, about a month before the festival, so didn’t have a lot of time to really get into it. This year I’m starting a lot earlier, so I’ve got time to do a lot more.

Firstly I’d like to repeat the apps from last year, but perhaps combining them in some way. I’d like to include more links to the actual music, making it easy to get from an artist to their songs by including embeds from SoundCloud, Spotify, YouTube etc. I’d also like to try making a mobile app guide to the festival (probably as an Android app, as the official app is iOS only). I’m hopeful that given enough free time I should be able to get some genuinely useful stuff done, and I’ll be blogging about it here as I work on it.