Data in Parliament? It’s complicated…

By Michael Smethurst and Ben Worthy

We’re the data and search team in the Parliamentary Digital Service. We’re currently working on:

  1. Building a data platform to power the website. [1]
  2. Designing and developing a data model that properly ties together parliamentary people, processes and outputs.
  3. Improving search internally and externally.

This is easy to write but difficult to do. Mainly because it’s complicated.

It’s complicated…

As with any website that serves a wide variety of users with a wide variety of needs, Parliament’s site has grown organically over the last few years. The website holds a huge amount of information, ranging from details of House of Commons bars, Early Day Motions (EDMs) relating to sports drinks and Jaffa cakes, and lists of church measures subject to scrutiny to a virtual tour of the Victoria Tower. The problem is that the current site is partly powered and fed by data coming from lots of internal business systems that have been cobbled together. Nothing is quite connected.

For example, there are roughly 100 different sub-domains under parliament.uk, with bits of parliamentary business sticking out at strange angles from the main body of the site. The site combines all the complications of parliamentary business with a hugely complicated interface that happens to be complicated in a completely different way.

For now we’re concentrating on the overall information architecture. Here are three ways in which it’s complicated:

  1. Different strokes for different folks: business applications are often commissioned by individual departments and offices in Parliament. Consequently, they don’t fit together.
  2. Lost in translation: these business applications aim to help with the day-to-day functions of Parliament. Once the data makes its way to data.parliament, the human context is lost and the value of the information degrades (i.e. no one else is sure what it means by the time it gets to the public).
  3. Poor fit: individual datasets are not well linked or labelled. Because it hasn’t been anyone’s job to create and maintain consistent labels, different labels for the same thing abound. For example, a single free-text entry box results in data like ‘Department of Health’, ‘Health’, ‘health’, ‘doh’, ‘DoH’ etc.
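
To make the labelling problem concrete, here is a minimal sketch of how free-text variants could be mapped onto one agreed label. It is an illustration only, not a description of any system in Parliament: the vocabulary, the names CANONICAL_LABELS, normalise_key, build_lookup and canonical_label, and the list of variants are all assumptions made for the example.

```python
# Hypothetical controlled vocabulary: one agreed label mapped to the
# free-text variants that people have typed for it.
CANONICAL_LABELS = {
    "Department of Health": {"Department of Health", "Health", "doh"},
}


def normalise_key(text):
    """Lower-case the text and collapse whitespace so variants compare equal."""
    return " ".join(text.casefold().split())


def build_lookup(vocabulary):
    """Build a dict from each normalised variant to its agreed label."""
    lookup = {}
    for canonical, variants in vocabulary.items():
        for variant in variants | {canonical}:
            lookup[normalise_key(variant)] = canonical
    return lookup


def canonical_label(raw, lookup):
    """Return the agreed label for a free-text value, or None if unknown."""
    return lookup.get(normalise_key(raw))


LOOKUP = build_lookup(CANONICAL_LABELS)

for raw in ["Department of Health", "Health", "health", "doh", "DoH"]:
    print(repr(raw), "->", canonical_label(raw, LOOKUP))
```

Values that are not in the vocabulary come back as None rather than being guessed at, so a person can decide what they mean. A lookup like this only tidies up after the fact; it still needs someone whose job it is to maintain the vocabulary.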

It gets more complicated…

The parliamentary website and the open data portal are the tip of the iceberg – the business data under the waterline isn’t properly joined up because parts of the business aren’t properly joined up. Nothing is joined up. Stating the obvious, if the data doesn’t join, the website won’t link together, which makes fixing the information architecture of the website decidedly non-trivial.

So, redesigning the website isn’t a surface polishing job. There’s a chain of dependencies from the website to the data platform to the business apps to the business.

Luckily, elsewhere in Parliament there’s also the Indexing and Data Management Section (IDMS): a team of librarians responsible for cataloguing parliamentary material for search. Part of our work is to give IDMS the right tools to do their job: namely to index and link data. So far we’ve been referring to the last bit as ‘stapling’. It probably needs a better label but, alas, first labels usually stick. Here’s how it happens – the blue bits link data together creating a common index:

[Figure: data-in-parliament-01]

So, here’s how the final version should look…

[Figure: data-in-parliament-02]

Data (and content) flow from business apps to the data platform (where they’re stapled) and from there to the website.
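
As a rough illustration of what ‘stapling’ could mean in practice, the sketch below links records from separate business applications to a shared identifier. The post does not say how stapling is implemented, so everything here is an assumption: the application names, the local identifiers, the shared identifier "person/1", and the names RECORDS, STAPLES and staple are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from separate business apps,
# each with an identifier that only makes sense inside its own app.
RECORDS = [
    {"source": "questions_app", "local_id": "Q-17", "title": "Question on sports drinks"},
    {"source": "members_app", "local_id": "4021", "title": "Member profile"},
    {"source": "edm_app", "local_id": "EDM-9", "title": "Motion on Jaffa cakes"},
]

# Hypothetical 'staples': an index, maintained by people, that says which
# local records refer to the same shared thing.
STAPLES = {
    ("questions_app", "Q-17"): "person/1",
    ("members_app", "4021"): "person/1",
}


def staple(records, staples):
    """Group records by shared identifier; return (linked, unlinked)."""
    linked = defaultdict(list)
    unlinked = []
    for record in records:
        shared_id = staples.get((record["source"], record["local_id"]))
        if shared_id is None:
            unlinked.append(record)  # nobody has stapled this record yet
        else:
            linked[shared_id].append(record)
    return dict(linked), unlinked


linked, unlinked = staple(RECORDS, STAPLES)
print(linked)
print(unlinked)
```

Keeping the staples apart from the records means the business applications do not have to change. The common index sits in the data platform, and anything not yet stapled stays visible as a to-do list for the indexers.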

And so how do we get there from here?

For the foreseeable future, we need to move from the complicated present to the slightly less complicated future. Something shaped roughly like this. At the same time as doing this work we need to make sure:

  • We ask the questions we need to ask while not accidentally concreting naivety into data models.
  • We design the model to be stable enough to build on top of but flexible enough to cope with changes in Parliament in rapidly changing times.
  • We balance the user needs of external website users with the reporting needs of the business.
  • We design the information architecture of the website and agree on common models with other parliaments.
  • We tread a tightrope between all the things that are complicated without accidentally trying to model the things that are complex.
  • We see what processes can be changed and are capable of changing while identifying all the things we can’t do because they’re impossible given democracy.

Call for assistance

In order to do all of this we need help in three areas:

  • Understanding how academics use parliamentary data and what improvements they’d like to see.
  • Access to expertise around how Parliament works.
  • Access to expertise around how UK parliamentary processes might map to other legislatures.

If you’d like to help, please get in touch by emailing us at data@parliament.uk or via Twitter @ukparlidata.

There’s a longer and slightly more technical version of this post on the Parliamentary Digital Service blog, available here.

Michael Smethurst is a data architect at the Parliamentary Digital Service. He blogs at http://smethur.st, and tweets @fantasticlife.

Ben Worthy is Lecturer in Politics at Birkbeck, University of London. He tweets @BenWorthy1.

[1] A data platform is a centralized computing system for collecting, integrating and managing large sets of structured and unstructured data from disparate sources.