Data Visualisation: the Good, the Bad and the Ugly (1)

Do you know how best to represent the number of subscribers to a website over time?

Here is my first puzzle in the GBUDV series. Today, we will start off slowly by talking about a website’s subscribers. What’s the best way to show quantitative data over time?

To learn more about the GBUDV project, read the introductory article.

The story

Let’s imagine a fictitious website called Happy Site. It’s a wonderful platform where you just need to create an account to post about life’s little happy moments and share them with others. There are two types of accounts: the Free account, which lets you talk with the community and read any article, and the Premium account, which requires a monthly fee but lets you share your thoughts in tons of articles. Okay, great.

Our website’s great logo

Now, we’re just about to show our boss how the number of subscribers has evolved. The thing is, the results aren’t great. According to the traffic stats, we have lost many subscribers over the last few years, particularly Premium ones.

How should we present our data?

The Good’s message:

Damn. We’re kinda losing followers here. Especially those with a Premium account.

The Bad’s message:

Hmm. 2014 seems quite important. Right?

The Ugly’s message:

It’s going up! Yeah! Wait, what am I looking at exactly?

As a side note, I’d like to point out that the three graphs have the same title in this example. It is plain and descriptive: ‘HappySite’s subscribers’. It gives the data no particular meaning and conveys no message.

We’ll see later in the GBUDV series that what surrounds your plot is as important as what’s in it: a confusing title above a good diagram can be really bad for your presentation.

The puzzle

Why is the Good good?

Even though we’d rather hide it, our data clearly shows that Happy Site is losing more and more subscribers. With this visualisation, we can see at a glance that the total number of followers decreases every year. Moreover, stacked bars are a good way of showing parts of a whole: here, you immediately see what proportion of followers have a Free or a Premium account compared to the total.
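To make the stacked-bar idea concrete, here is a minimal Python sketch. The real Happy Site figures aren’t given, so `years`, `free` and `premium` are made-up numbers, and `stacked_bars` is a hypothetical helper. Each bar places the Premium segment on top of the Free one, so the top of the bar is the yearly total while each segment keeps its raw size.

```python
# Hypothetical yearly subscriber counts, invented for illustration only.
years = [2012, 2013, 2014, 2015, 2016]
free = [6000, 5800, 5500, 5300, 5000]
premium = [4000, 3200, 2400, 1700, 1000]


def stacked_bars(free, premium):
    """Return the (bottom, top) extent of each segment, year by year.

    Free accounts sit on the axis; Premium accounts are stacked on top,
    so the top of each bar equals that year's total number of subscribers.
    """
    bars = []
    for f, p in zip(free, premium):
        bars.append({"free": (0, f), "premium": (f, f + p)})
    return bars


bars = stacked_bars(free, premium)
for year, bar in zip(years, bars):
    bottom, top = bar["premium"]
    premium_share = (top - bottom) * 100 / top  # top of the bar is the total
    print(f"{year}: total={top}, Premium share={premium_share:.1f}%")
```

With a plotting library such as matplotlib, the same idea is usually expressed by drawing the Premium bars with the `bottom` argument of `bar` set to the Free values, so the shrinking total stays visible.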

Why is the Bad bad?

Pie charts look nice, but they tend to be misused. A pie chart represents parts of a whole, so it makes little sense to use one to show how a quantitative variable evolves over time. Sure, your data is in there somewhere, but connecting the dots is too difficult: you can only guess that the figures get lower as the years go by.

Plus, putting the two types of accounts side by side makes it harder to see how they relate to each other in a given year, and it is nearly impossible to tell at first glance how much the proportion of Premium subscribers has fallen.
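One way to see why a pie loses the trend: a pie only encodes each slice’s share of its own total. The sketch below uses a hypothetical `pie_angles` helper and made-up numbers to compute slice angles in degrees. Scaling every value by ten, as if the site had ten times more subscribers, produces exactly the same pie.

```python
def pie_angles(values):
    """Angle in degrees of each pie slice: value / total * 360.

    Multiplying before dividing keeps whole-number results exact here.
    """
    total = sum(values)
    return [v * 360 / total for v in values]


# Two made-up situations: a small site and one ten times bigger.
small = [600, 400]
large = [6000, 4000]

print(pie_angles(small))  # [216.0, 144.0]
print(pie_angles(large))  # [216.0, 144.0]
```

Because the absolute size is divided away, any drop in the total has to be inferred by comparing slices, which is exactly the guesswork described above.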

Why is the Ugly ugly?

This could work and deceive our boss into thinking we’re actually making progress! How? By using a soft set of colors that blend together instead of contrasting, you can convince an inattentive reader that only the darker part matters. And this darker part obviously grows steadily.

The trick is to transform your data from raw figures into percentages. By showing the proportions of Premiums and Frees stacked on top of each other, you do get a rising curve… precisely because there are fewer and fewer Premiums! Since the two shares always add up to 100%, the light-blue area that corresponds to Premiums shrinks and the dark-blue area that corresponds to Frees grows.

This visualisation tells you absolutely nothing about the total number of subscribers each year and therefore hides the terrible truth: everyone’s leaving Happy Site!
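The arithmetic behind this trick fits in a few lines. Using made-up counts in which both account types shrink, but Premium faster, the sketch below (with a hypothetical `percent_shares` helper) converts each year to percentages: the Free share climbs every year even though the total falls.

```python
# Hypothetical yearly counts, invented for illustration only.
years = [2012, 2013, 2014, 2015, 2016]
free = [6000, 5800, 5500, 5300, 5000]
premium = [4000, 3200, 2400, 1700, 1000]


def percent_shares(free, premium):
    """Convert raw counts into (Free %, Premium %) pairs that sum to 100."""
    shares = []
    for f, p in zip(free, premium):
        total = f + p
        shares.append((f * 100 / total, p * 100 / total))
    return shares


for year, f, p, (free_pct, premium_pct) in zip(
    years, free, premium, percent_shares(free, premium)
):
    print(f"{year}: total={f + p:>5}  Free={free_pct:5.1f}%  Premium={premium_pct:5.1f}%")
```

Once the counts are normalised, the total is no longer part of what gets plotted, which is why a 100% stacked chart can hide a falling audience so easily.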

A peek at the next puzzle…

Next time, we’ll talk about starships and intergalactic fleets. We’ll have a dataset that links three different features for each item: can you guess which type of representation works best for this sort of data?

Don’t hesitate to post your ideas in the comments! 🙂
