MPs citing data… drilling down

Indeed it isn’t a dry fact, dry facts are things you can reference and cite, and you haven’t.  More to the point, the very wonderful and independent have already taken this one apart…


In November, and then again last week, I presented automatically generated rankings of how likely MPs are to cite their sources.  Today we’re going to look in much more detail at the ten MPs who *do* cite most of their sources.  We want to know two things:

  • which of the MPs are really doing it properly
  • how accurate the automatically generated rankings are.

Along the way we’ll discover a couple of other interesting things.

So as I told you in advance, I used my normal script to download all the tweets from all MPs and rank them. The top ten MPs happened to be:


  • David Cameron
  • Ed Balls
  • Ed Vaizey
  • Jon Cruddas
  • Liz Kendall
  • Mark Lazarowicz
  • Sadiq Khan
  • Steve Baker
  • Tessa Jowell
  • Vince Cable

(presented alphabetically)


I then took all of the tweets from those ten and anonymised them, which gives you examples like this:


“1m vanished from electoral register in just 1 year – why are Party & Partys so complacent with our democracy? (£)”

I replaced all Twitter tags with “@someone”, all party names with “party” and removed links. I also replaced names of major leaders with “Leader” and so on.  We were left with an anonymised corpus that you can see here.

This means you occasionally find you are now looking at tweets like:

“@someone @someone @someone @someone @someone @someone @someone @someone so lovely to meet you all!”
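For the curious, the substitutions described above can be sketched in a few lines of Python. This is a rough illustration rather than the script that was actually used: the name lists, the regular expressions and the `anonymise` function are all assumptions made up for the example, and the replacement token is written as “Party” to match the sample tweets.

```python
import re

# Hypothetical lookup lists: the real lists are not shown in this post.
PARTY_NAMES = ["Labour", "Conservatives", "Conservative", "Tories", "Tory", "Lib Dems", "UKIP"]
LEADER_NAMES = ["David Cameron", "Ed Miliband", "Nick Clegg"]


def _alternation(names):
    """Build a whole-word pattern matching any of the given names."""
    # Longest names first so the most specific alternative is tried first.
    ordered = sorted(names, key=len, reverse=True)
    return r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b"


HANDLE_RE = re.compile(r"@\w+")          # Twitter handles such as @example
LINK_RE = re.compile(r"https?://\S+")    # http and https links
PARTY_RE = re.compile(_alternation(PARTY_NAMES))
LEADER_RE = re.compile(_alternation(LEADER_NAMES))


def anonymise(tweet):
    """Strip links and replace handles, leader names and party names."""
    text = LINK_RE.sub("", tweet)
    text = HANDLE_RE.sub("@someone", text)
    text = LEADER_RE.sub("Leader", text)
    text = PARTY_RE.sub("Party", text)
    # Collapse the whitespace left behind by removed links.
    return re.sub(r"\s+", " ", text).strip()


print(anonymise("@alice @bob so lovely to meet you all!"))
```

Plain string substitution like this is blunt: it will miss nicknames and misspellings, which is one reason a human read-through of the corpus is still worthwhile.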


So those anonymised tweets (all 1,500 of them) were reviewed by three people (myself and two others) to see if they would be counted as a ‘fact’. To be clear, we mean “a fact that should be referenced because it’s something we might argue about”.  For example, this is a fact:



But this is a diary entry:


Some things are on the edge:


“If the Party win the election, Britain will face the biggest spending cuts of any major advanced economy”

While one could argue that this is presented as a fact, it’s understood by tone that this is a guess (which politician said this is, of course, left as an exercise for the reader).

So the reviews came back (you can see them all in the Google spreadsheet) and we accepted the majority vote (so if two of us thought it was a fact that needed a reference, then it was a fact that needed a reference). There will doubtless be errors and omissions, and indeed you can probably find many of both, but we did the best we could, unpaid, and with hours of effort.
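The majority rule is simple enough to write down. Here is a minimal sketch, assuming each reviewer's verdict is recorded as True (“this is a fact that needs a reference”) or False; the function name and the data layout are invented for illustration and are not taken from the spreadsheet.

```python
def needs_reference(votes):
    """Return True when a strict majority of reviewers voted 'fact'.

    votes: an iterable of booleans, one per reviewer (three in this post).
    """
    votes = list(votes)
    return sum(votes) * 2 > len(votes)


# Two of three reviewers say 'fact', so the tweet counts as a fact.
print(needs_reference([True, True, False]))
# Only one of three says 'fact', so it does not.
print(needs_reference([True, False, False]))
```

With three reviewers there can never be a tie, which is the main attraction of using an odd-sized panel.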

How many ‘facts’ are there?

We, as humans, found 232 facts in 1,530 tweets.   This means that:

Among 10 of the MPs that were most likely to cite sources, the panel found that about 15% of their tweets needed references.

I suspect this number reflects the general noise of communication.


How effective is the automatic ranking?

One issue with the automatic ranking that you see on the previous page is that it works by a fairly simple heuristic: “If the tweet has a number in it, it’s probably a fact”.  One of the things I wanted to do with today’s post was look into how accurate that is.  So this table looks at the relationship between how ‘facty’ our judges found a tweet and whether it matched the heuristic:

Votes for fact Tweets containing a number
3 votes 72.08%
2 votes 59.49%
0 votes (sample of 100 tweets) 21%

This is actually biased against the heuristic, because a lot of “With 89 days to go until the election” tweets made it into the 0 vote sample.

It’s clear that containing a number correlates fairly well with being regarded as a fact by our judges. This is a stronger result than I expected and makes me more comfortable with the ranking as a whole.
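To make the comparison concrete, here is a small sketch of how a check like this could be run. It assumes the “has a number in it” test means “contains at least one digit” and that the table reports, for each vote count, the share of tweets containing a number. Both are readings of the description above rather than the actual ranking code, and the function names and the two non-numeric sample tweets are made up.

```python
import re
from collections import defaultdict

DIGIT_RE = re.compile(r"\d")


def looks_like_fact(tweet):
    """The heuristic: a tweet containing any digit is 'probably a fact'."""
    return bool(DIGIT_RE.search(tweet))


def number_share_by_votes(reviewed):
    """Percentage of tweets containing a digit, grouped by votes for 'fact'.

    reviewed: iterable of (tweet_text, votes_for_fact) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for text, votes in reviewed:
        totals[votes] += 1
        if looks_like_fact(text):
            hits[votes] += 1
    return {votes: 100.0 * hits[votes] / totals[votes] for votes in totals}


sample = [
    ("1m vanished from electoral register in just 1 year", 3),
    ("Cuts are coming",                                    3),  # invented example
    ("With 89 days to go until the election",              0),
    ("so lovely to meet you all!",                         0),
]
print(number_share_by_votes(sample))
```

The “89 days to go” tweet shows the weakness mentioned above: it contains a number, so the heuristic fires, but no reviewer would call it a fact that needs a reference.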


So how does the top ten shake out? (Or: Just how sick are you of looking at Ed Vaizey’s Tweets)


Ed Vaizey tweets a lot, a very lot. And he’s pretty good at referencing them.


Here’s one I like…

…although I’m unconvinced that having lots of advertising is a good thing… In any case I quite like Ed’s tweets, although they do have the slight feel that he may have privatised his Twitter and sold it to Sky Broadband.

In any case this is how the world shakes out for the top ten, ranked by percentage:

Rank MP Facts Cited Facts Percentage
1= Jon Cruddas 3 3 100.00%
1= Liz Kendall 1 1 100.00%
1= Mark Lazarowicz 7 7 100.00%
4 Ed Vaizey 73 64 87.67%
5 Ed Balls 9 7 77.78%
6 Vince Cable 13 10 76.92%
7 Tessa Jowell 40 30 75.00%
8 Sadiq Khan 41 25 60.98%
9 Steve Baker 5 3 60.00%
10 David Cameron 26 12 46.15%
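If you want to check the arithmetic, the percentages and ranks can be recomputed from the two count columns. The sketch below copies the counts from the table; the function name and the tie-handling (equal percentages share a rank, and the next rank skips ahead) are assumptions chosen to reproduce the “1=” rows.

```python
# (MP, facts, cited facts), copied from the table above.
ROWS = [
    ("Jon Cruddas", 3, 3),
    ("Liz Kendall", 1, 1),
    ("Mark Lazarowicz", 7, 7),
    ("Ed Vaizey", 73, 64),
    ("Ed Balls", 9, 7),
    ("Vince Cable", 13, 10),
    ("Tessa Jowell", 40, 30),
    ("Sadiq Khan", 41, 25),
    ("Steve Baker", 5, 3),
    ("David Cameron", 26, 12),
]


def rank_by_citation_rate(rows):
    """Return (rank, name, facts, cited, percentage) tuples, best first."""
    scored = sorted(
        ((cited / facts, name, facts, cited) for name, facts, cited in rows),
        key=lambda r: (-r[0], r[1]),  # highest rate first, then alphabetical
    )
    ranked = []
    for i, (rate, name, facts, cited) in enumerate(scored):
        # Tied rates share the rank of the previous row.
        if i > 0 and rate == scored[i - 1][0]:
            rank = ranked[-1][0]
        else:
            rank = i + 1
        ranked.append((rank, name, facts, cited, round(100 * rate, 2)))
    return ranked


for rank, name, facts, cited, pct in rank_by_citation_rate(ROWS):
    print(rank, name, facts, cited, f"{pct:.2f}%")
```

Note how little data sits behind some of the top spots: a rank of 1= can rest on a single cited tweet, so the percentage on its own says nothing about volume.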

(The eagle-eyed amongst you will notice that there are slightly fewer tweets here than you would expect – that’s got a lot to do with the person who stole my laptop with the full corpus on it last week)

The thing to notice is: this isn’t that many facts for some of these guys. It’s interesting to consider what the purpose of an MP’s Twitter is (and, of course, they might use them differently). Is it to respond to constituents? Promote debate? Defend their actions? Make arguments to sway people? Or is it to throw mud, distort issues and generally heckle? Almost everything I want an MP to do requires referencing their sources….


Some extra bits…

For those paying close attention, you’ll notice that the top ten MPs in this article aren’t the same as the top ten MPs in the last article. That’s because I downloaded the MPs’ data in March and then passed the (anonymised) data on to my judges.   Then my laptop got stolen with the corpus on it.  I redownloaded the main data and published it last week, but it was different data to the data my judges had worked on.  So today’s post is based on the March data and last week’s is based on May data. Hope that clears things up.
