The fourth and final Puzzle of the GBUDV series continues on with chocolate, because we love chocolate! As we’ve seen before, colours and shapes must be used with caution, or you risk misrepresenting the information in your data. We are going to see how best to show the evolution of our variables over time.
To learn more about the GBUDV project, read the introductory article.
The ChocoChoc’s Factory is known for producing tasty candies and chocolates. The CEO, Martin Luke Sanders, has but one creed: ‘Make people happy, give them candy! Make people passionate, give them chocolate!’ Despite some scandals and crises, the Factory has been running since 2003.
We’ve been tasked with observing the chocolate stocks and prices between 2003 and 2011 to determine the important moments in the life of the company and, perhaps, prepare a brighter future for it.
The Good’s message:
|Although stocks have been going up and down, prices have remained quite steady. The Factory is selling as high in 2011 as it did in 2003. Plus, there seems to be a clear correlation between stocks and prices that we can even assume to be a causal link: the fewer products ChocoChoc has, the more costly they are.|
The Bad’s message:
|Did I ever tell you how much I love tables? It’s, like, text and lines, and no colours, and plain, and all the data is stuffed with no distinctive mark of any kind. It makes it so much harder to spot the interesting information.|
The Ugly’s message:
|Not much change, I think. No, wait: ha! I knew it, that title’s right! Prices have gone up these last few years, we’re finally making profit! And the best thing is we even managed to keep our stocks roughly constant.|
Why is the Good good?
When you want to show two values that are probably linked and vary together, a two-line plot is usually a clever choice. Thanks to this representation, the audience quickly gets that stocks and prices changed over time and that one rising corresponds to the other falling. So, it’s great for a comparative study of time series.
Because the two axes are separate, you can put all the useful info (labels and units) without any chance of collision. But an issue with these graphs is that each axis has its own range… therefore you can be tempted to associate values that are actually very different (for example, noticing where the two curves intersect doesn’t tell you much since their values truly don’t live in the same world). When comparing the two lines, you should only look at their general behaviour rather than their exact values.
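To see why the crossing points mean nothing, remember that a chart with two axes roughly stretches each series over its own range. Here is a minimal Python sketch of that idea, using a min-max rescaling. The numbers are invented for the example (they are not the ChocoChoc’s data) and `min_max_rescale` is just a helper name I picked:

```python
# Hypothetical yearly figures, invented for illustration only
# (they are NOT the ChocoChoc's data shown in the charts).
years = [2003, 2004, 2005, 2006, 2007]
stocks = [1200, 1500, 1100, 900, 1400]   # e.g. tons
prices = [4.0, 3.2, 4.4, 5.0, 3.5]       # e.g. euros per kilo


def min_max_rescale(values):
    """Map a series onto [0, 1] so that only its shape remains."""
    low, high = min(values), max(values)
    if high == low:
        # A flat series has no shape to compare: put it mid-scale.
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


stocks_shape = min_max_rescale(stocks)
prices_shape = min_max_rescale(prices)

# Print both rescaled series side by side, one line per year.
for year, s, p in zip(years, stocks_shape, prices_shape):
    print(f"{year}: stocks {s:.2f}  prices {p:.2f}")
```

Once rescaled, both series live between 0 and 1, whatever their units were. You can compare when each one goes up or down, but a spot where the two rescaled values are equal depends only on the ranges you picked, not on the data itself.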
While I didn’t highlight any particular point in this chart, you see it would be very easy to add an arrow or a note in the remaining space to give additional information – look at the picture below that provides some clues for these evolutions:
A word of caution, though: a famous saying in data science is ‘correlation does not imply causation’. You must always make sure you are not explaining one measurement with something completely unrelated… Bobby Henderson, the inventor of the Flying Spaghetti Monster parody religion (or Pastafarianism), played on this natural bias to ‘show’ how the decline in the number of pirates caused the terrible climate change we are facing now 😉
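If you want to put a number on ‘vary together’, one common measure is the Pearson correlation coefficient. Below is a minimal sketch in plain Python. Both series are made up by me for the example (they are not real pirate counts or temperatures), and `pearson` is only a helper name I chose:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical series, invented for illustration only.
pirates = [5, 4, 3, 2, 1]                      # arbitrary units
temperature = [14.1, 14.3, 14.4, 14.6, 14.8]   # arbitrary units

r = pearson(pirates, temperature)
print(f"correlation: {r:.2f}")
```

With these invented numbers the coefficient comes out strongly negative. That only tells you the two series move in opposite directions; it says nothing about one causing the other.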
Why is the Bad bad?
In my experience, tables are rarely a good way to go. Displaying data should be about, well, displaying it. Not just printing it in cells. The issue with this representation is that it doesn’t convey any hierarchy of importance and it does not tell any story about your data. Sure, readers can take it upon themselves to extract valuable information from it, since you did lay it all out there for them. But that should be the job of the one who creates the data visualisation, not of the one it is presented to.
Today, most of the software used to make this kind of spreadsheet (Microsoft Excel, LibreOffice, Google Sheets…) allows you to do conditional formatting that helps you put things forward with different background or font colours, various font weights and families, etc. So if you really wish to stick with a table output, use these tools to at least get some piece of information across more easily! Here, it would be interesting to shine the spotlight on the lowest and the highest values in each column so you can start to associate the two series.
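The same idea works outside of a spreadsheet. Here is a small Python sketch that flags the lowest and highest value of each column in a plain-text table. The rows are invented for the example (they are not the ChocoChoc’s figures), and `mark_extremes` is a hypothetical helper, not a feature of any of the tools above:

```python
# Hypothetical rows (year, stocks, price), invented for illustration only.
rows = [
    (2003, 1200, 4.0),
    (2004, 1500, 3.2),
    (2005, 1100, 4.4),
    (2006, 900, 5.0),
    (2007, 1400, 3.5),
]


def mark_extremes(values):
    """Return one marker per value: 'max', 'min' or an empty string."""
    low, high = min(values), max(values)
    markers = []
    for v in values:
        if v == high:
            markers.append("max")
        elif v == low:
            markers.append("min")
        else:
            markers.append("")
    return markers


stock_marks = mark_extremes([row[1] for row in rows])
price_marks = mark_extremes([row[2] for row in rows])

# Print the table with the markers next to the flagged cells.
print(f"{'Year':<6}{'Stocks':<14}{'Price':<12}")
for (year, stock, price), s_mark, p_mark in zip(rows, stock_marks, price_marks):
    stock_cell = f"{stock} {s_mark}".strip()
    price_cell = f"{price:.2f} {p_mark}".strip()
    print(f"{year:<6}{stock_cell:<14}{price_cell:<12}")
```

In this made-up table, the year with the highest stock is also the year with the lowest price, and the other way around. With the markers in place you spot it in a second; without them you would have to scan every cell.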
Why is the Ugly ugly?
Everything starts with the title: this one is completely biased and imposes a conclusion before you’ve even taken a look at the charts. You’re already willing to believe prices are higher in 2011 than in 2003, while they are actually a bit lower.
As usual, the Ugly tries to confuse us by switching from absolute values to percentages. Since each year’s share of the total varies very little, this visualisation mostly conveys the idea that everything has remained constant: when you see the chart on the left, it isn’t obvious that stocks varied so much between 2007 and 2009.
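A bit of arithmetic shows how much the percentages flatten things. In the sketch below, I turn nine yearly values into shares of the total, the way a pie chart does. The stock values are invented for the example (only the 2003 to 2011 range comes from our story), and `shares` and `relative_change` are helper names I picked:

```python
# Hypothetical yearly stocks, invented for illustration only.
stocks = {
    2003: 1200, 2004: 1500, 2005: 1100,
    2006: 900, 2007: 1400, 2008: 1000,
    2009: 1300, 2010: 1250, 2011: 1200,
}


def shares(series):
    """Turn each value into its percentage of the series total."""
    total = sum(series.values())
    return {key: 100 * value / total for key, value in series.items()}


def relative_change(old, new):
    """Percentage change when going from `old` to `new`."""
    return 100 * (new - old) / old


stock_shares = shares(stocks)

# Year-on-year change versus the difference between the two pie slices.
drop = relative_change(stocks[2007], stocks[2008])
share_gap = stock_shares[2007] - stock_shares[2008]

print(f"2007 -> 2008 change: {drop:.1f} %")
print(f"2007 slice: {stock_shares[2007]:.1f} % of the pie")
print(f"2008 slice: {stock_shares[2008]:.1f} % of the pie")
print(f"gap between slices: {share_gap:.1f} points")
```

With these made-up values, stocks lose more than a quarter of their level in one year, yet the two slices differ by less than four points of the pie. Our eyes have no chance of catching that.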
The second trick has to do, once more, with colours. See how I grouped the years thanks to colour? It doesn’t make any sense to do so, of course, but your mind takes the easy road and decides to study the data with these groups.
We know our eyes have trouble judging areas accurately: it’s worse when you compare two zones of different colours! And, to complicate things even more, you have two graphs to look at at the same time. A first glance will probably convince you all the pieces are about the same size. At this point, if you’re like me, you’ll start a four-step process:
- you start by thinking there hasn’t been any evolution… but this seems silly, why would anyone show this type of data to an audience?
- on the left pie chart, you have three groups all equivalent; so each one is a third of a circle – alright, let’s take this as reference
- you force yourself to find differences
- you eventually identify that on the ‘prices’ pie chart, red seems bigger than blue: prices have gone up since 2009! And you compare to your reference on the left: yes, it’s more than a third, so it truly is the biggest and ChocoChoc is on the rise!
During this train of thought, you have looked globally at information that had been deliberately separated; you have chosen your own reference; you have forced things to appear; you have glanced left and right about 1000 times.
Chances are: the Ugly got you.
The end of the journey
That’s it, folks, we’ve reached the end of the GBUDV series! Throughout this project, I’ve tried to examine with you how deceitful visualisations can be. The world we live in puts so much weight on information and data that we have to be careful where it comes from and how it is communicated. Data representations are only one way to dupe the audience but hopefully, these past few days, these examples helped you wrap your head around a few key concepts.
I myself learnt a lot about data visualisation while making this project and I am really happy I got to share it with you. I hope you liked it and I am looking forward to reading your comments, remarks and suggestions!
Still, this was just a quick peek at a very vast domain. If you want to learn more about data visualisation, you’ll probably find many resources on the Internet – many of them for free. I’ll leave you with a few links I’ve found quite interesting:
– 25 Tips to Instantly Improve Your Data Visualization Design (by Katy French): a nice article to make your data representations even better!
– Best Resources (by Infogr.am): a list of data visualisation resources (books, articles, websites…)
– 10 Useful Python Data Visualization Libraries for Any Discipline (by Melissa Bierly): a well-detailed list of Python libraries for data visualisation that states the good and the bad for each