The GBUDV series continues with Puzzle #2: “The Intergalactic Starfleet”. The big question we’re going to examine this time is how to show data with a large number of variables.
To learn more about the GBUDV project, read the introductory article.
The year is 6730. Mankind has evolved and it has developed incredible technologies to embark on the conquest of space. A few months ago, the Intergalactic Fleet launched from Earth to journey across the Milky Way. The first reports are coming back and we need to check whether everything is working according to plan.
Each ship has sent a log file containing 4 indicators: its distance to the Mothership orbiting near our planet, its speed, its cost and its type (Speeder, Fighter, Cargo or Planet Base). The Admiral has asked us to sum up this information for the upcoming roundtable discussion.
These logs have arrived in random order from the depths of the galaxy, and our systems are sadly unable to reorder the data automatically! So we are initially working with unsorted data.
The Good’s message:
|There are four types of ships, each quite distinct from the others. Cost varies more or less inversely with speed, and the Planet Base is a unique item. The speed of Cargos is steady, while the Speeders’ may double from one ship to another.|
The Bad’s message:
|There might be something. Give me an hour or two and I will definitely come up with something about this peak for PB01-A5.|
The Ugly’s message:
|Those very nice colours probably prove a point. For example, PB01-A5 is roughly equivalent to all the Fighters combined. Except for the speed parameter. And also the cost. Meh.|
Why is the Good good?
Our goal is to analyse the fleet and how each ship has done so far, so grouping ships by category is a good first step: the data is easier to process once we separate light vehicles from large, slow cargos. The contrasting colour scheme helps your eye quickly isolate the 4 types of ships. Thanks to the bubble plot, we can even display the third measurement – the ship’s cost – without making the first two too difficult to read. Each axis has its own unit and its own suitable range, so points aren’t squashed together.
Finally, the title is simple and highlights the fact that our ships fall into categories. We use the remaining space in the right corner to give additional information on the variable with no axis.
We also avoid the ‘unordered data’ difficulty, because this type of chart automatically re-sorts the points by category, no matter which data item is read first.
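To see why arrival order stops mattering once points are grouped by category, here is a minimal Python sketch. The log records are invented for illustration (only the ID PB01-A5 appears in the charts), and `group_by_type` is a hypothetical helper, not part of any plotting library.

```python
from collections import defaultdict

# Hypothetical log records in arrival order:
# (ship_id, ship_type, distance, speed, cost). All numbers are made up.
logs = [
    ("FT03", "Fighter", 42000.0, 60.0, 35.0),
    ("PB01-A5", "Planet Base", 1500.0, 2.0, 900.0),
    ("SP02", "Speeder", 91000.0, 150.0, 12.0),
    ("CG01", "Cargo", 23000.0, 20.0, 110.0),
    ("SP01", "Speeder", 64000.0, 90.0, 10.0),
    ("FT01", "Fighter", 38000.0, 55.0, 30.0),
]


def group_by_type(records):
    """Collect (distance, speed, cost) points per ship type, whatever the input order."""
    groups = defaultdict(list)
    for _ship_id, ship_type, distance, speed, cost in records:
        groups[ship_type].append((distance, speed, cost))
    return dict(groups)


groups = group_by_type(logs)
for ship_type in sorted(groups):
    # Each group would become one colour on a bubble plot:
    # x = distance, y = speed, bubble size = cost.
    print(ship_type, len(groups[ship_type]))
```

Shuffling `logs` changes nothing about which points end up in which group, which is all a category-coloured bubble plot needs.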
Why is the Bad bad?
First of all, the title is so vague it doesn’t tell you anything about the data – we already knew we would be talking about starships, so it’s useless to repeat it. Secondly, using a line chart isn’t the right choice for our example, for four reasons:
- all the variables share the same y-axis while they all measure different things: we can’t even label the axis here!
- variables are measured in different units and span very different ranges: the huge interval the ‘Distance to Mothership’ feature lives in forces us to have a y-axis with an insanely wide range, completely inadequate for the other two measurements – so they are compressed at the bottom of the graph
- since our data isn’t ordered, we cannot see the ships’ types and everything is sort of mixed up
- because they are on the same graph, we might think some variables are correlated even though they are not; for example, there is no reason speed and cost should be directly linked – but because of the aforementioned problem with the y-axis, they seem to evolve together…
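The compression caused by a shared y-axis can be put into numbers. The sketch below uses invented values (the real logs are not reproduced here) and a hypothetical helper, `shared_axis_usage`, that reports what fraction of a common axis each variable actually occupies.

```python
# Hypothetical measurements for five ships; the three variables use
# different units and very different ranges, as in the puzzle.
measurements = {
    "distance": [1500.0, 23000.0, 38000.0, 64000.0, 91000.0],
    "speed": [2.0, 20.0, 55.0, 90.0, 150.0],
    "cost": [900.0, 110.0, 30.0, 10.0, 12.0],
}


def shared_axis_usage(series):
    """Fraction of a single shared axis that each variable's own range covers."""
    all_values = [v for values in series.values() for v in values]
    axis_span = max(all_values) - min(all_values)
    return {
        name: (max(values) - min(values)) / axis_span
        for name, values in series.items()
    }


usage = shared_axis_usage(measurements)
for name, fraction in usage.items():
    print(f"{name}: {fraction:.1%} of the shared axis")
```

With these made-up numbers, distance takes up almost the whole axis while speed and cost each cover about one percent of it or less, which is why they look flat and seem to move together.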
The only positive thing about this representation is that the legend clearly states the 3 main features we measured for each ship.
To conclude, this visualisation doesn’t make any point other than: ‘man, managing this fleet isn’t a piece of cake’ 😉
Why is the Ugly ugly?
For once, this Ugly relies more on general layout than on data manipulation to deceive the reader. By stacking three graphs one after the other, you force the audience to scroll down and, therefore, look at them one by one. So you prevent them from getting the global picture. To me, it takes a tour de force to get any comprehensive intuition of the ships’ specs and of their nature when all this data is split across distinct plots.
Both the title and the legend at the bottom of the graph are here to dupe the reader. Both insist on individuality and isolate every ship from the others; despite the colors, you tend to think we have 15 ships to examine separately.
Here, a semi-circle chart is not a bad choice per se: it does give a feel of how each ship contributes to the entire fleet’s features. Thanks to the color theme, you even understand there are several groups – although you don’t count them as easily as with the Good.
However, this visualization only shows proportions, so we don’t know the order of magnitude of the 3 indicators. Plus, it can be misleading because, by definition, it compresses the representation into a smaller area than a radial or a pie chart would. Thus the human eye doesn’t see the size difference between the areas as quickly. Even worse, since the ships’ logs are not sorted by category, you can convince yourself some areas have matching sizes and interpret that wrongly: for example, wouldn’t you say that in the ‘speed’ plot, the sum of all the yellows is about the same as the sum of the purples? Of course, by grouping the areas together (like in the chart below) we would notice Speeders are all faster than Fighters and that it’s only because there are more Fighters that they balance out, but here…
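The ‘balancing out’ effect is easy to reproduce with a little arithmetic. The speeds below are invented for illustration: a few fast Speeders and a larger number of slower Fighters, chosen so that the two totals match.

```python
from statistics import mean

# Hypothetical speeds per category (not the values from the real logs).
speeds = {
    "Speeder": [90.0, 120.0, 150.0],
    "Fighter": [50.0, 55.0, 60.0, 60.0, 65.0, 70.0],
}

# A proportion chart only conveys each category's total share;
# the per-ship average tells a very different story.
summary = {
    ship_type: {"total": sum(values), "mean": mean(values), "count": len(values)}
    for ship_type, values in speeds.items()
}

for ship_type, stats in summary.items():
    print(ship_type, stats)
```

Both categories take up the same share of a proportion chart, yet every Speeder in this toy data set is faster than every Fighter. Only the grouped view, or the per-ship average, makes that visible.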
A peek at the next puzzle…
Puzzle #3 will be about cookies, because everyone loves cookies! 🙂