Before Christmas I did a proof-of-concept ranking of MPs based on how often they cited their sources on Twitter. It was fairly popular and has resulted in an interesting range of correspondence, so I thought it was time to revisit the post and make it a little more scientifically solid. I believe that the world is a better place when politicians actually cite sources rather than plucking facts out of the air, and if I have to embarrass some of them into it then that’s fine by me.
I’m writing (and publishing) this post before we work out the ranking because:
- good science means setting out your method in advance;
- this is the sort of thing where it’s easy to accuse people of political bias;
- I’m going to ask for people to help!
As you will recall, the last post did everything automatically because it’s obviously very difficult to go through all 227,392 of the tweets in the study by hand.
Instead I used some simple code to count up how often the MPs included figures, and how many of those posts included links. I was very clear at the time that this was a very rough metric and, in fact, this is why I only listed the top 20 MPs and the bottom 16 or so: it’s all too easy for slight differences in style to make quite a lot of difference to the ranking. The main thing, however, is this: if you find your MP in the bottom 20 you should probably ask yourself why that’s happened…
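For anyone wondering what “simple code” means here, the counting can be sketched roughly as below. This is an illustration rather than the code I actually ran: the two patterns (any digit counts as a figure, anything starting with http:// or https:// counts as a link) and the function names are assumptions made for the example.

```python
import re

# Assumed patterns, for illustration only: any digit counts as a "figure",
# and anything starting with http:// or https:// counts as a link.
NUMBER_RE = re.compile(r"\d")
LINK_RE = re.compile(r"https?://\S+")


def has_figure(text):
    """True if the tweet contains a digit outside of any link."""
    # Strip links first so digits inside a shortened URL are not counted.
    return bool(NUMBER_RE.search(LINK_RE.sub("", text)))


def citation_counts(tweets):
    """Return (tweets with figures, how many of those also carry a link)."""
    with_figures = [t for t in tweets if has_figure(t)]
    with_links = [t for t in with_figures if LINK_RE.search(t)]
    return len(with_figures), len(with_links)


if __name__ == "__main__":
    sample = [
        "Unemployment fell by 3% last year http://example.com/stats",
        "Crime is up 10% under this government",
        "Lovely day in the constituency",
    ]
    print(citation_counts(sample))
```

A rule this crude is exactly why slight differences in tweeting style can move an MP a long way up or down the list.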
However, for the next version I’d like to do something a little more impressive and scientific. Here’s the process I’m going to follow, paying particular attention to the changes from last time.
First: cull the MPs with few tweets. Once I’ve downloaded the fresh data set, I’m going to cull all of those MPs who had fewer than five tweets that included numbers (a worrying thing on its own, I think, but irrelevant to the main issue). This stops strange things like Glenda Jackson coming very high up because she happened to cite a source in the one tweet she made with a number.
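As a rough sketch of the cull and the basic ranking that follows from it (the data layout, the names and the use of a simple linked-to-figure ratio as the score are assumptions for the example, not the final code):

```python
# Threshold from the text: MPs with fewer than five figure tweets are dropped.
MIN_FIGURE_TWEETS = 5


def cull_and_rank(stats):
    """Rank MPs by the share of their figure tweets that include a link.

    `stats` is a hypothetical layout: MP name -> (figure tweets, linked figure tweets).
    """
    kept = {mp: counts for mp, counts in stats.items()
            if counts[0] >= MIN_FIGURE_TWEETS}
    # Best citers first.
    return sorted(kept, key=lambda mp: kept[mp][1] / kept[mp][0], reverse=True)


if __name__ == "__main__":
    stats = {
        "MP A": (1, 1),    # one figure tweet, cited: culled despite a perfect score
        "MP B": (10, 4),
        "MP C": (20, 15),
        "MP D": (5, 0),
    }
    print(cull_and_rank(stats))
```

In the example, “MP A” would top the list on ratio alone, which is the Glenda Jackson problem in miniature.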
Second: extract the top and bottom 10 tweeters. Culling the MPs will give us a basic ranking. I’m then going to take the top 10 and bottom 10 MPs and subject them to a more rigorous examination. I’m going to anonymise their (full set of) tweets by removing their name and also keywords like ‘Tories’ or ‘Labour’, and scramble the order (I’m also going to remove any links that exist). This anonymised set will go over to a range of volunteer reviewers, who will go through and pick out those tweets that represent facts.
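The anonymising step might look something like the sketch below. The keyword list, the “[removed]” placeholder and the idea of keeping a tweet id alongside each tweet (so the reviewers’ picks can be matched back up afterwards) are all assumptions for illustration.

```python
import random
import re

LINK_RE = re.compile(r"https?://\S+")


def anonymise(tweets, mp_name, keywords, seed=None):
    """Strip links, the MP's name and party keywords, then shuffle.

    `tweets` is a list of (tweet_id, text) pairs; the id is kept so that
    reviewers' choices can be mapped back to the original tweets later.
    """
    # Longest terms first, so a longer phrase wins over a word it contains.
    terms = sorted([mp_name, *keywords], key=len, reverse=True)
    # Word boundaries stop 'Tories' from matching inside 'stories'.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    cleaned = []
    for tweet_id, text in tweets:
        text = LINK_RE.sub("", text)
        text = pattern.sub("[removed]", text)
        cleaned.append((tweet_id, " ".join(text.split())))  # tidy whitespace
    random.Random(seed).shuffle(cleaned)
    return cleaned


if __name__ == "__main__":
    tweets = [
        (1, "The Tories cut spending by 5% http://example.com/a"),
        (2, "Jane Doe says Labour added 200 jobs"),
    ]
    for item in anonymise(tweets, "Jane Doe", ["Tories", "Labour"], seed=1):
        print(item)
```

A fixed keyword list will never catch everything (nicknames, constituency names and so on), so this reduces the scope for bias rather than removing it.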
If all the reviewers recognise something as an intended fact being quoted then it goes in the list of proper facts, and I then re-calculate the rankings for the top and bottom 10 MPs using this set. That way we can be relatively sure that the top and bottom of the list have been looked at in a much more scientific way.
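The “all the reviewers agree” rule is just a set intersection, and the re-calculation then only looks at the tweets that survive it. A minimal sketch, assuming each reviewer hands back a set of tweet ids; what to do with an MP who ends up with no agreed facts isn’t settled above, so dropping them here is purely an assumption of the example:

```python
def unanimous_facts(reviews):
    """Tweet ids that every reviewer marked as a fact.

    `reviews` is a list with one set of tweet ids per reviewer.
    """
    if not reviews:
        return set()
    return set.intersection(*(set(r) for r in reviews))


def rerank(mp_tweets, facts, linked):
    """Re-rank MPs using only the unanimously agreed facts.

    `mp_tweets` maps MP name -> set of tweet ids; `linked` holds the ids of
    tweets that carried a link before anonymising. Both layouts are hypothetical.
    """
    scores = {}
    for mp, ids in mp_tweets.items():
        mp_facts = ids & facts
        if mp_facts:  # assumption: MPs with no agreed facts are left out
            scores[mp] = len(mp_facts & linked) / len(mp_facts)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    reviews = [{1, 2, 3, 5}, {2, 3, 5}, {2, 3, 4, 5}]
    facts = unanimous_facts(reviews)
    print(facts)
    print(rerank({"MP A": {1, 2, 3}, "MP B": {4, 5}}, facts, linked={2, 5}))
```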
Finally: publish. I like to think ‘publish’ goes without saying, but in this case I mean: “publish, along with the code (which will have to be modified), the original set of tweets, the list of MPs after the cull, the set of tweets judged ‘facts’ and all of the numbers involved, under a Creative Commons licence”.
How you can help
I’m looking for people willing to give me a hand with the ‘identifying facts from within the tweets of the top and bottom ten’ bit. I’d like to avoid it being ‘Joe’s mates’ because there is an inherent selection bias there. Feel free to email email@example.com if you are interested in getting involved.