Image courtesy of Andy Kirk
Data visualisation has become ever more important as the volume of data increases. You see data everywhere: in simple infographics, in sports reporting and in daily news. We are constantly bombarded with it, and are easily confused by it all. Is this because, during our studies as young adults, we are only presented with a very small variety of charts? Better visuals could help to overcome this limitation in our understanding of datasets.
Andy Kirk, the founder of Visualising Data, is hoping to give scientists a set of tools that will help them to communicate their data in a better way, not only to external audiences, but also to themselves. Being able to visually manipulate data could give scientists a better insight into the stories hidden in their research.
http://www.nature.com/multimedia/podcast/naturejobs/naturejobs-2014-05-01.mp3
I caught up with Andy at the British Library during one of his workshops. For Andy, data visualisation is to do with “how we give a physical form to the subject data variables. Beyond that, it is also all the other presentational factors.” Here, Andy is referring to aesthetics and interactivity, which include simple things like colour and font type. But some of the more complex issues like arrangement, annotations and architecture are also important to consider. These combined factors “make-or-break the success of a visualisation, particularly when it is a communication device.”
Before you get around to communicating your data, however, you will use it in your analysis. “You want to make sense of the data,” says Andy. “To find patterns that you’ve not seen before. You’re moving beyond looking at data, and seeing it.”
The main difference between using data as an analysis tool and a communication tool is that when it is a communication tool you have a different audience. Like a journalist, you now have readers and therefore need to understand your audience. “Characterise the audience” that you’re trying to engage with your data and your science, says Andy. Impact is a big part of the science and academic agenda. So like any other form of science communication, you need to ask yourself: “What do I think they’ll be interested in? What slices of analysis, what slices of a story can I engage with them?” After all, you are the expert. You are the one that completed the research, did the experiments and collected the data.
When analysing the data, you know what you know, and you know what you don’t know. You have the power and capability to “explore data and to tease out new insights, new patterns, new discoveries that,” says Andy, “either confirm what you knew or provide a new enlightenment of a subject.”
To do this, it’s as simple as playing around with different layouts and visualisations. “If we look at visualisation as communication, then things like chart types, this is our visual language. This is the syntax, the verbs that we’ve got to use now to tell stories.” And for many of us, this is made up of a core set of maybe four or five different types, such as the bar graph, scatter plot, pie chart and line graph. What Andy does in his workshops is give scientists a broader vocabulary with which to tell the stories in the data. “There are endless ways we can portray that data.”
Andy thinks that the problem really lies in the way we are taught to analyse data. The visual literacy to read and interpret these graphics doesn’t go beyond those core types mentioned earlier. “We get by, we make sense of a bar chart.” But if scientists decide to go down the more complex visuals route, how can they make sure that they don’t burn any bridges between themselves and their audience? “For a designer, for a creator of these graphics, you need to achieve that through the expository features of these graphics,” says Andy. “The labelling, the introductions, the ‘how to read a graphic’ elements.”
“A lot of it is just common sense, caring about the audience: What do I need to give them to learn and read this story that I am portraying?”
A key part of this communication is telling a good story. Storytelling in data visualisation comes from threading different elements together into a sequence with a narrative. This is particularly relevant for time-based data.
Andy believes that scientists can approach visualising data for analysis from two perspectives. The first is the Sherlock Holmes perspective: you have a certain hypothesis, and you’ll test it out in the data. You’ll then combine the variables that might lead to a discovery or a confirmation of a hypothesis, or reveal something entirely new. “On the other side, you’ve got this idea of prospecting. You’re going to play with the data, try different combinations of variables. You’re going to almost follow a scent of enquiry and see what clues you will find along the way.” It’s trying to find those unknown unknowns that is the biggest challenge, and playing around with the data can help you find them. “Looking at the raw data, you would never find those things. That’s what we’re trying to find with visualisation: seeing the data for the first time.”
There have been many faux pas when it comes to visualising scientific data, and Andy mentioned several of them in his workshop that morning. But he says one of the biggest ones is when “you’re visualising something that is just inaccessible for a general audience, when it is intended for a general audience.” So even if the subject matter is complex, you still need to find a suitable way to communicate it in the right context.
The second faux pas is a misunderstanding of how we perceive fundamental chart types. He uses the simple bar chart as an example:
“The way that we read back a bar chart is by judging the absolute length of the bar that is portrayed. Now if you chop off that bar, and start the baseline at something that isn’t zero, you’re distorting our interpretation of what length that bar actually means.” A third is when visualisation is used as a means of showing off, rather than as a tool for understanding.
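To put a number on the truncated-baseline effect Andy describes for bar charts, here is a minimal Python sketch. The two values (52 and 50), the baseline of 48 and the helper name `drawn_ratio` are made up for illustration and do not come from Andy’s workshop. The sketch simply compares the lengths of the bars as they would be drawn.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn lengths of two bars when the axis starts at `baseline`.

    A bar for value v is drawn with length (v - baseline), so any baseline
    other than zero changes the ratio the reader perceives.
    """
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical measurements: the real difference is only 4%.
values = (52.0, 50.0)

true_ratio = drawn_ratio(*values)                       # axis starts at zero
truncated_ratio = drawn_ratio(*values, baseline=48.0)   # axis starts at 48

print(f"Zero baseline: first bar is {true_ratio:.2f}x the second")
print(f"Baseline at 48: first bar is {truncated_ratio:.2f}x the second")
```

With these assumed numbers, a 4% difference is drawn as one bar twice the length of the other once the axis starts at 48, which is the distortion of “what length that bar actually means”.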
But in the end, humans aren’t the best at interpreting statistics, so the helping hand that scientists can give their audiences through visualisation design can be extremely beneficial. “It’s what I describe as the annotation layer: the level of user assistance you need to give your readers in how to read and digest and consume this graphic,” says Andy. On the simplest level, this could be a clear, simple title. Conclusions and take-aways can also be given to help readers. It’s almost as if you could recreate that sense of “being stood in front of a big chart on a big display, and you [the scientist] being stood there, pointing out the key things with your hand… So we have to make sure that if we’re not there to physically point out things, and physically there to coach people through how to read and interpret something, the properties on the chart do that.”
Here are Andy’s top 5 things to think about when putting together your data visuals:
1) What is the intent of the project? Is it to inform, persuade, change behaviour, enlighten or entertain?
2) When exploring data, get a sense of physicality. Get a sense of range and variable types, as this is linked to the architecture of the data.
3) What is the story? What is the narrative, and what questions do we want our audience to ask and answer when consuming the graphic? This calls for a journalistic sensibility: science has big data sets and a variety of variables, so there are endless ways in which the data can be sliced and diced, and you need to be able to find the focus.
4) Chart types: “The chart types are the way of delivering the story. The correct deployment of a chart type will deliver the stories, the questions we’ve already identified.”
5) Presentation layers: key design choices.