The second round of the 2013 Czech presidential election looks set to be extremely close, with highly polarized candidates and equally polarized support bases.
Over the last few days, we’ve created an application that lets anyone play armchair political scientist in their browser, modelling possible scenarios of the election outcome. You can see the final version of the application pictured below (click to load it).
In this post, I’d like to walk you through the design and implementation process, as well as elaborate on the technical solution. For the political context, visit the Wikipedia page about the elections or the accompanying article on iDnes.cz (in Czech).
Given the tight predictions of the election outcome, my friends Eliška, Josef and Vojta and I debated how to create a web application for playing with all the possibilities in an entertaining and informative way, reusing the knowledge gained by handling heaps of data in our day jobs.
From the start, we knew we would use the magnificent D3.js library, which makes creating highly sophisticated, interactive and good-looking visualizations relatively easy. Our model and template was, quite clearly, the famous, unmatched 512 Paths to the White House interactive graphic created by Mike Bostock and Shan Carter for The New York Times.
The first step was to create a minimal visual representation of the results from the election’s first round. Luckily, it was my day off, so I spent Friday afternoon fiddling with D3.js and created the initial mockup:
Looks funny, right? Absolutely. But it also demonstrates the single most important feature of D3.js: it’s not a charting library. It does not come with a set of predefined visualization types. It opts for a different approach: it makes it relatively easy to create the visualization from a set of flexible primitives and finely-tuned utility functions.
D3.js goes to great lengths to promote this principle: even the most primitive of all possible chart types, the bar chart, isn’t offered as a pre-packaged solution, but as a simple-to-follow tutorial. The underlying concept is called a data join — in essence, D3.js is just a very pleasant way of setting up a binding between your data and the graphical elements on the screen.
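The mechanics of a data join can be sketched without any DOM at all. The helper below is our own illustration, not D3’s API: it partitions incoming data against already-bound keys into the same enter, update and exit sets that D3’s selections expose.

```javascript
// A data join matches new data against elements already on screen,
// splitting the result into three sets: enter (data with no bound
// element yet), update (data with an existing element), and exit
// (elements whose datum is gone). D3 does this against DOM
// selections; this sketch does it against plain arrays of keys.
function dataJoin(boundKeys, data, keyFn) {
  var bound = {};
  boundKeys.forEach(function (k) { bound[k] = true; });

  var incoming = {};
  data.forEach(function (d) { incoming[keyFn(d)] = true; });

  return {
    enter:  data.filter(function (d) { return !bound[keyFn(d)]; }),
    update: data.filter(function (d) { return bound[keyFn(d)]; }),
    exit:   boundKeys.filter(function (k) { return !incoming[k]; })
  };
}
```

Given elements bound to keys `['a', 'b']` and new data `[{id: 'b'}, {id: 'c'}]`, the join yields `{id: 'c'}` in enter, `{id: 'b'}` in update, and `'a'` in exit — exactly the three code paths you handle in a D3 bar chart.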
I ended the first day with a rough vision for the application, and set out to prepare a conference talk for Saturday :)
The next day, Vojta and I spent some time playing with different layouts and interactive features. You can see how the “turntable faders” idea resurfaced in the form of range sliders for setting voter turnout and the candidate split.
At that time, the code was a sprawling mess: a big ball of D3 declarations, magic numbers in positioning offsets, and duplicated code. Clearly, time for a rewrite. So all of us spent a chilly Sunday afternoon at the office whiteboard, deconstructing the data set (7 first-round candidates, 2 finalists, undecided voters) and experimenting with layout options and the visual encoding of the data.
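The scenario model behind those sliders boils down to simple arithmetic. Here is a minimal sketch in plain JavaScript; the function name and the numbers in the example are illustrative, not the application’s actual code or the real first-round results.

```javascript
// For each first-round candidate, two slider values decide where
// that candidate's voters go in the second round:
//   turnout (0..1) — the share of them who vote at all
//   split   (0..1) — the share of those voters going to finalist A
// Everyone who stays home ends up in the "undecided" bucket.
function finalRoundVotes(firstRound, sliders) {
  var totals = { a: 0, b: 0, undecided: 0 };
  firstRound.forEach(function (candidate) {
    var s = sliders[candidate.id];          // { turnout: .., split: .. }
    var voting = candidate.votes * s.turnout;
    totals.a += voting * s.split;
    totals.b += voting * (1 - s.split);
    totals.undecided += candidate.votes - voting;
  });
  return totals;
}
```

For example, a candidate with 100 first-round votes, 50% turnout and an even split contributes 25 votes to each finalist and leaves 50 voters at home.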
We agreed upon a different grid, which would make enough room for all the control elements, photos, captions, etc.
On Monday, we split the duties: I started polishing the visual aspects of the application and Vojta started working on a set of Chef recipes for building the supporting infrastructure.
I ended the day with a rough version of the minimal feature set, with dubious behaviour and buggy semantics. Big things have small beginnings… If you’re particularly nosy, you can clone the Git repository and devour all the silly mistakes and dirty implementation details just by checking out different commits in the history.
On the infrastructure front, we needed a reliable webserver and a flexible storage solution.
It sounds quite silly to create a Chef-based, fully automated provisioning toolchain for an application which will be online for a couple of days, right? Not really. It follows the single most important principle of the #devops movement: the ability to rebuild your infrastructure from scratch out of a set of provisioning scripts, application code, data backups and bare computing power.
For the storage layer, we chose Elasticsearch, a very powerful and flexible search engine, which allows us to store the JSON data generated by our application’s users directly, without any serialization or translation. Elasticsearch is blazingly fast, resilient, and we have plenty of experience with it. This choice was not particularly hard for us.
Additionally, with all the anonymous numerical data stored in Elasticsearch, it will be quite easy to analyze later using Elasticsearch’s faceting features (computing statistics over the outcome estimates, creating date histograms, etc.).
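As a sketch of what such an analysis could look like: a saved scenario goes in as a plain JSON document, and a single facet request summarizes all of them. The field names and values below are our guesses at the shape, not the application’s actual schema; the statistical and date_histogram facets are the pre-aggregations search API current in Elasticsearch at the time.

```javascript
// Hypothetical shape of one saved scenario document — indexed as-is,
// no serialization or translation needed.
var scenario = {
  created_at: "2013-01-24T10:00:00Z",
  turnout: 0.61,
  estimate: { zeman: 0.52, schwarzenberg: 0.48 }
};

// One search request summarizing all saved scenarios:
// a statistical facet over one finalist's estimated share
// (min/max/mean/variance) and an hourly histogram of when
// scenarios were created.
var facetRequest = {
  query: { match_all: {} },
  facets: {
    zeman_estimate: {
      statistical: { field: "estimate.zeman" }
    },
    scenarios_over_time: {
      date_histogram: { field: "created_at", interval: "hour" }
    }
  }
};
```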
Thanks to the Chef ecosystem, the whole stack is installed and configured in an automated manner in Amazon EC2, all the services are guarded by Monit, data are backed up as EBS snapshots. If needed, we’re able to recreate the whole stack in five minutes. (If you’re interested, we’re using a process based on the Deploying Elasticsearch with Chef Solo tutorial.)
On Tuesday, we divided our time between two major tasks: first, fixing all the bugs in the application’s business and drawing logic, and second, creating a proper visual design. Since we had been talking with an online newspaper about publishing the application to a wide audience, I settled on giving it a decisive newspaper look, which could be described as “a magazine spread come alive”. (This task was quite enjoyable compared to all the calculator-driven coding of the application, which brought back dreadful memories of my career as a Flash designer and ActionScript developer, and cheerful memories of teenage Atari programming for Josef.)
After a short nap came Wednesday, our go-live day. We had excellent support from our publisher, and began fighting security restrictions, ironing out design quirks, fine-tuning the styling of <input type="range"> sliders in Internet Explorer 10, and stroking our chins about possible support for Firefox, which, amusingly, does not support the range input yet. You know, the final phase of any web-based software project.
(We had a brief affair with the html5slider library, but after extensive checking decided to pull it from the already published application. The inconsistencies, the lack of a clear way to re-trigger its initialization, and many subtle problems just weren’t worth it. So far, we have received only limited complaints; the browser feature matrix is clearly a very Darwinian field…)
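Detecting range support, rather than sniffing browser names, is the usual way to decide on a fallback: a browser that doesn’t implement <input type="range"> silently treats it as a text input. A minimal sketch of such a check — the helper name is ours, and the document is passed in as a parameter only to keep the sketch testable:

```javascript
// Feature-detect <input type="range">. Browsers without support
// (Firefox at the time) fall back to type "text", so setting the
// type attribute and reading the resolved type back reveals it.
function supportsRangeInput(doc) {
  var input = doc.createElement('input');
  input.setAttribute('type', 'range');
  return input.type === 'range';
}
```

In a browser you would call it as `supportsRangeInput(document)` and, when it returns false, either load a polyfill or hide the sliders in favour of plain numeric inputs.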
All the frenzy made us miss the optimal prime time for publishing the application and the accompanying article, so together with the publisher we scheduled publication for early morning the next day, giving us enough time to miss another night’s sleep and add many nice features in the process.
On Thursday, we awoke to find the accompanying article featured on the homepage of iDnes.cz, the biggest online newspaper in the Czech Republic, where it remained the second most visited story for the better part of the day. Traffic peaked at around 3,000 visits per hour throughout the morning, and more than 1,000 scenarios were saved. The audience response was very positive, comparing us (embarrassingly enough :) to the famed New York Times example mentioned earlier.
If anything, the whole story is a reminder of how powerful the data-driven journalism approach can be, and that it’s quite within reach of most newspapers; they just have to care enough. While an application like this is not something you can whip up in an afternoon and go home (after all, it took two highly skilled developers more than three 12-hour days to create it), with tools such as D3.js, Elasticsearch, Chef, Ruby and Amazon EC2, it’s an enjoyable and rewarding process.
We spent most of the day in a purple haze, spelunking through the access and error logs with Splunk, watching office colleagues create and share their own scenarios, and debating the details and silliness of the election campaign.
Nevertheless, above all, we hope that the application will persuade people to go out and vote tomorrow. Because the future is predictable, but uncertain.