Here’s an addition to my last post on using Wikipedia data to analyse attention for the US presidential elections 2012. Here’s another look at the interest not for the candidates’ Wikipedia pages but the general pages for the elections 2008 and 2012. Compared to the candidates’ pages, the attention for the general election page is much lower than for the candidates. Here’s the average values for October 2012:
Mitt Romney (2012): 98,138 Views / day
Barack Obama (2012): 63,104 Views / day
United States presidential election, 2012 (2012): 38,770
United States presidential election, 2008 (2008): 27,907
This monthly average hints at the 2012 elections being very exciting as the general election pages on Wikipedia have seen a 39% traffic increase compared to last elections. This also hold for the following time-series:
While the attention for the election pages in 2012 did not reach the level it had during the 2008 primaries, from mid October the 2012 campaigns were much more interesting according to the Wikipedia numbers. In 2008 we have seen a drop in attention before election day, in 2012 the suspense seems to build up.
One of the most interesting challenges of data science are predictions for important events such as national elections. With all those data streams of billions of posts, comments, likes, clicks etc. there should be a way to identify the most important correlations to make predictions about real-world behavior such as: going to the voting booth and chosing a candidate.
A very interesting data source in this respect is the Wikipedia. Why? Because Wikipedia is
a) open (data on page-views, edits, discussions are freely available on daily or even hourly basis),
b) huge (WP currently ranks as #6 of all web sites worldwide and reaches about a quarter of all online users),
c) specific (people visit the Wikipedia because they want to know something about some topic)
The first step was comparing the candidates Barack Obama and Mitt Romney over time. The resulting graph clearly shows the pivoting points of Obama’s presidential career (click to zoom):
But it also shows how strong Mitt Romney has been since the Republican primaries in January 2012. His Wikipedia page had attracted a lot more visitors in August and September 2012 than his presidential rival’s. Of course, this measure only shows attention, not sentiment. So it cannot be inferred from this data whether the peaks were positive or negative peaks. In terms of Wikipedia attention, Romney’s infamous 47% comments in September 2012 were more than 1/3 as important as Obama’s inauguration in January 2009.
Now, let’s add some further curves to this graph: Obama’s and McCain’s Wikipedia attention during the last elections:
Here’s another version with weekly data:
It’s almost instantly clear how much more attention Obama’s 2008 campaign (in red) gathered in comparison with his 2012 campaign (in green). On the other hand, Mitt Romney is at least when it comes to Wikipedia attention more interesting than McCain had been.
Here’s a comparison of Obama’s 2008 campaign vs. his 2012 campaign:
The last question: Is Mitt Romney 2012 as strong as Obama had been in 2008? Here’s a direct comparison:
A side-remark: I also did a correlation of this data set with Google Correlate. And guess what: The strongest correlation of the data for Obama’s 2012 campaign is the Google search query for “barack obama wikipedia”. There still seem to be a huge number of people using Google as their Wikipedia search-engine.
But this result could also be interpreted the other way round: If there is a strong correlation between Wikipedia usage and Google search queries, this makes Wikipedia an even more important data source for analyses.