Software Sustainability Institute – Research Data Visualisation Workshop

Last week I gave a talk and delivered a hands-on session at the Software Sustainability Institute’s ‘Research Data Visualisation Workshop’, which was held at Manchester University. It was a really engaging event, with a lot of good discussion on the issues surrounding data visualisation.

Professor Jessie Kennedy from Edinburgh Napier University gave a great keynote looking at some key design principles in visualisation, including a number of studies I hadn’t seen before but will definitely be including in my teaching in future.

I gave a talk on ‘Human Science Visualisation’, which focused on a few key issues. Firstly, I tried to illustrate the importance of interactivity in complex visualisations. I then talked about how we as academic researchers need to publish our interactive visualisations for posterity, and how we should press academic publishers to help us communicate our data to readers. Finally, I wanted to point people towards the excellent visualisation work being done by data journalists, and to suggest that newsrooms are an excellent source of ideas and tips for data visualisation. The slides for my talk are here. It’s the first time I’ve spoken about visualisation outside of the classroom, and it was a really fun talk to give.

We also had two great talks from Dr Christina Bergmann and Dr Andy South, focusing on issues of biological visualisation and mapping respectively. All the talks generated some good discussion both in the room and online, which was fantastic to see.

In the afternoon I led a hands-on session looking at visualising data using d3. This was the first time I’d taught a session using d3 v4, which made things slightly interesting. I’m not fully up to speed with all the areas of the API that have changed, so getting the live coding right first time was a bit tricky, but I think I managed. Interestingly, I feel that the changes made to the .data(), .exit(), .enter(), update cycle as discussed in Mike’s “What Makes Software Good” make a lot more sense from a teaching perspective. The addition of .merge() in particular helps a great deal. As you might expect from a d3 workshop that lasted a mere three hours, I’m not entirely convinced that everybody ‘got’ it, but I think most went away satisfied.
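
For anyone who finds the join easier to read outside of d3, here is a minimal sketch of the idea in plain Python rather than JavaScript. It is not d3's API: `data_join` and `merge` are hypothetical helper names, and the lists of dictionaries stand in for DOM elements and their bound data.

```python
def data_join(existing, data, key):
    """Split a data join into enter, update and exit groups, keyed by key(datum)."""
    old = {key(d): d for d in existing}
    new = {key(d): d for d in data}
    enter = [d for k, d in new.items() if k not in old]   # data with no element yet
    update = [d for k, d in new.items() if k in old]      # data whose element already exists
    exit_ = [d for k, d in old.items() if k not in new]   # elements whose data has gone
    return enter, update, exit_


def merge(enter, update):
    """Rough analogue of .merge(): one group covering new and surviving items."""
    return enter + update


# Made-up example data: "bars" is what is currently drawn, "fresh" is the new data.
bars = [{"id": "a", "value": 1}, {"id": "b", "value": 2}]
fresh = [{"id": "b", "value": 5}, {"id": "c", "value": 3}]

enter, update, exit_ = data_join(bars, fresh, key=lambda d: d["id"])
to_draw = merge(enter, update)
```

In d3 itself the merged group is what you would then position and style, so the drawing code is written once rather than once for enter and once for update.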

Overall it was a very successful workshop. Raniere Silva did an excellent job putting it together and running the day, and I really enjoyed it. I’m looking forward to seeing what other people thought about it too.

NHS Hackday 2015

This weekend I took part in an incredibly successful NHS hackday, hosted at Cardiff University and organised by Anne Marie Cunningham and James Morgan. We went as a team from the MSc in Computational Journalism, with myself and Glyn attending along with Pooja, Nikita, Annalisa and Charles. At the last minute I recruited a couple of ringers as well, dragging along Rhys Priestland Dr William Wilberforce Webberley from Comsc and Dr Matthew Williams, previously of this parish. Annalisa also brought along Dan Hewitt, so in total we had a large and diverse team.

The hackday

This was the first NHS hackday I’d attended, but I believe it’s the second event held in Cardiff, so Anne Marie and the team have it down to a fine art. The whole weekend seemed to go pretty smoothly (barring a couple of misunderstandings on our part regarding the pitch sessions!). It was certainly one of the most well organised events that I’ve attended, with all the necessary ingredients for successful coding: much power, many wifi and plenty of food, snacks and coffee. Anne Marie and the team deserve much recognition and thanks for their hard work. I’m definitely in for next year.

The quality of the projects created at the hackday was incredibly high across the board, which was great to see. One of my favourites used an Oculus Rift virtual reality headset to create a zombie ‘game’ that could be used to test people’s peripheral vision. Another standout was a system for logging and visualising the ANGEL factors describing a patient’s health situation. It was really pleasing to see these rank highly with the judges too, coming in third and second in the overall rankings. Other great projects brought an old Open Source project back to life, created a system for managing groups walking the Wales Coast path, and created automatic notification systems for healthcare processes. Overall it was a really interesting mix of projects, many of which have clear potential to become useful products within or alongside the NHS. As Matt commented in the pub afterwards, it’s probably the first hackday we’ve been to where several of the projects have clear original IP with commercial potential.

Our project

We had decided before the event that we wanted to build some visualisations of health data across Wales, something like nhsmaps.co.uk, but working with local health boards and local authorities in Wales. We split into two teams for the implementation: ‘the data team’ who were responsible for sourcing, processing and inputting data, and the ‘interface team’ who built the front-end and the visualisations.

Progress was good, with Matthew and William quickly defining a schema for describing data so that the data team could add multiple data sets and have the front-end automatically pick them up and be able to visualise them. The CompJ students worked to find and extract data, adding them to the github repository with the correct metadata. Meanwhile, I pulled a bunch of D3 code together for some simple visualisations.

By the end of the weekend we had established a fairly decent system. It’s able to visualise a few different types of data, at different resolutions, is mostly mobile friendly, and most importantly is easily extensible and adaptable. It’s online now on our github pages, and all the code and documentation are also in the github repository.

We’ll continue development for a while to improve the usability and code quality, and hopefully we’ll find a community willing to take the code base on and keep improving what could be a fairly useful resource for understanding the health of Wales.

Debrief

We didn’t win any of the prizes, which is understandable. Our project was really focused on the public understanding of the NHS and health, rather than on solving a particular need within (or for users of) the NHS. We knew this going into the weekend, and we’d taken the decision that it was more important to work on a project related to the course, so that the students could experience some of the tools and technologies they’ll be using as the course progresses, than to do something more closely aligned with the brief that would perhaps have been less relevant to the students’ work.

I need to thank Will and Matt for coming and helping the team. Without Matt wrangling the data team and showing them how to create json metadata descriptors we probably wouldn’t have anywhere near as many example datasets as we do. Similarly, without Will’s hard work on the front end interface, the project wouldn’t look nearly as good as it does, or have anywhere near the functionality. His last-minute addition of localstorage for personal datasets was a triumph. (Sadly though he does lose some coder points for user agent sniffing to decide whether to show a mobile interface :-D.) They were both a massive help, and we couldn’t have done it without them.

Also, of course, I need to congratulate the CompJ students, who gave up their weekend to trawl through datasets, pull figures off websites and out of PDFs, and create the lovely easy-to-process .csv files we needed. It was a great effort from them, and I’m looking forward to our next Team CompJ hackday outing.

One thing that sadly did stand out was a lack of participation from Comsc undergraduate students, with only one or two attending. Rob Davies stopped by on Saturday, and both Will and I discussed with him what we can do to increase participation in these events. Hopefully we’ll make some progress on that front in time for the next hackday.

Media

There are some great photos from the event on Flickr, courtesy of Paul Clarke (Saturday and Sunday). I’ve pulled out some of the best of Team CompJ and added them here. All photos are released under a Creative Commons BY-NC 2.0 licence.

 

Elsewhere…

We got a lovely write-up about our project from Dyfrig Williams of the Good Practice Exchange at the Wales Audit Office. Dyfrig also curated a great storify of the weekend.

Hemavault labs have done a round-up of the projects here.

GeoJSON and topoJSON for UK boundaries

I’ve just put an archive online containing GeoJSON and topoJSON for UK boundary data. It’s all stored on Github, with a viewer and download site hosted on Github pages.

Browser for the UK topoJSON stored in the Github repository

The data is all created from shapefiles released by the Office for National Statistics, Ordnance Survey and National Records of Scotland, all under the Open Government and OS OpenData licences.

In later posts I’ll detail how I created the files, and how to use them to create interactive choropleth maps.
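
As a rough illustration of the first step towards a choropleth, the sketch below joins a set of values to GeoJSON boundary features by a shared area code, using only Python's standard library. Everything in it is an assumption made for the example: the tiny hand-written FeatureCollection, the property names `code` and `name`, the values, and the helper `attach_values` are all hypothetical and are not taken from the archive itself.

```python
import json

# A tiny hand-written FeatureCollection standing in for a real boundary file.
# The property names ("code", "name") and the coordinates are made up.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"code": "X1", "name": "Area One"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"code": "X2", "name": "Area Two"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}}
  ]
}
"""

# Hypothetical values to map, keyed by the same area code.
values = {"X1": 12.5, "X2": 40.0}


def attach_values(collection, values, key="code"):
    """Copy each area's value into its feature properties, ready for shading."""
    for feature in collection["features"]:
        props = feature["properties"]
        props["value"] = values.get(props[key])  # None where no value is known
    return collection


boundaries = attach_values(json.loads(geojson_text), values)
for feature in boundaries["features"]:
    print(feature["properties"]["name"], feature["properties"]["value"])
```

With the value stored alongside each boundary, a mapping library only has to turn `value` into a fill colour for each shape.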

The Graphical Web 2014

photo of the author outside Winchester cathedral
(Grumpy) Winchester Cathedral Selfie

Last week I had a lovely time down in Winchester with m’colleague, attending The Graphical Web 2014. This year the theme was ‘Visual Storytelling’, so I’d gone along to see what new things we could learn about visualisation to include in the MSc in Computational Journalism. We’d also already had a few conversations about the course with people who were going to be at the conference, so we were planning to take the opportunity to chat in person about their involvement.

There were many excellent, informative and entertaining talks, ranging from the process behind the redesign of Google Maps, through how Twitter does data visualisation, and on to what happens when your data visualisation becomes immensely popular. I’d highly recommend anyone with an interest in any of this to take some time to look through the schedule and watch the videos of some of the talks – I’ll certainly be forcing the MScCompJ students to watch a few.

Scott Murray educates us on the best design process

There were some interesting messages from people at the conference that I’ll be taking forward with my own work and trying to impart to the students. One that is key, I think, is to strike the right balance between detail and simplicity when presenting data. This was mentioned several times throughout the conference, but it really is important. Too much information in your visualisation and you can alienate the reader and confuse or hide your message. Not enough information and the context is lost, and the usefulness of the design to the more advanced reader is reduced. It’s one of those balancing acts that we find so often when trying to mix both people and computers. Attempting to solve this problem and find this balance is challenging and interesting, and I look forward to seeing how the students next year cope with it.

Overall, it was a really good conference. I met a number of interesting people, found a whole set of new people to follow on Twitter, and returned to Cardiff excited about the year ahead.

 

not another bloody wordle?!?!

(UPDATE: an earlier version of this was totally wrong. It’s better now.)

Inspired by a Facebook post from a colleague, I decided to waste ten minutes this week knocking together a word cloud from the text of my thesis. The process was pretty straightforward.

First up – extracting the text from the thesis. Like all good scienticians, I wrote my thesis in LaTeX. I thought I could use one of a couple of different tools to extract the plain text from the raw .tex input files, but none of the tools I found from a quick googling seemed to work properly, so I went with extracting the text from the PDF file instead. Fortunately on Mac OS X this is pretty simple, as you can create a straightforward Automator application to extract the text from any PDF file, as documented in step 2 here.

Once I had the plain text contents of my thesis in a text file it was just a simple few lines of python (using the excellent NLTK) to get a frequency distribution of the words in my thesis:

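from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize, sent_tokenize

# The tokenizers need NLTK's Punkt data, installed beforehand via nltk.download().
fdist = FreqDist()
with open("2012chorleymjphd.txt", "r") as inputfile:
    for sentence in sent_tokenize(inputfile.read()):
        for word in word_tokenize(sentence):
            # FreqDist.inc() was removed in NLTK 3, so increment the count directly.
            fdist[word.lower()] += 1

# Print every word that appears more than ten times as "word: count".
for word, count in fdist.items():
    if count > 10:
        print("%s: %d" % (word, count))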
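
If NLTK isn't to hand, much the same counts can be had from the standard library alone. This is only a sketch and a cruder one: it swaps NLTK's tokenizers for a simple regular expression (so punctuation tokens disappear and anything beyond plain a to z letters and apostrophes is ignored), and `word_counts`, `frequent` and the sample sentence are made up for the example.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words, using a simple regex instead of NLTK's tokenizers."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def frequent(counts, minimum):
    """Return (word, count) pairs seen more than `minimum` times, most common first."""
    return [(w, c) for w, c in counts.most_common() if c > minimum]


# A made-up sentence standing in for the thesis text.
sample = "the error is the error of the model"
for word, count in frequent(word_counts(sample), 1):
    print("%s: %d" % (word, count))
```

For the thesis itself, the sample string would be replaced with the contents of the text file and the threshold set back to 10.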

Then it was just a matter of copying and pasting the word frequency distribution into wordle:

Thesis Wordle

And there we have it. A not particularly informative but quite nice looking representation of my thesis. As you can guess from the cloud, it’s not the most exciting thesis in the world. Interestingly, the word error doesn’t seem to be there ;-).

Summer Project update

We are storming along with summer projects now, and starting to see some really good results.

Liam Turner (who is starting a PhD in the school in October) has been working hard to create a mobile version of the 4SQPersonality app. His work is coming along really well, with a great mobile HTML version now up and running, a native Android wrapper working, and an iOS wrapper on its way. With any luck we’ll have mobile apps for both major platforms ready to be released before the summer is over.

Max Chandler, who is now a second year undergraduate, has done some great work looking at the Foursquare venues within various cities around the UK, analysing them for similarity and spatial distribution. He’s just over halfway through the project now and is beginning to work on visualising the data he’s collected and analysed. He’s creating some interesting interactive visualisations using D3, so as soon as he’s done I’ll link to the website here.

It’s been a really good summer for student projects so far, with some really pleasing results. I’ll post more description of the projects and share some of the results as they come to a close in the coming weeks.