Making People Understand Your Data: A Data Visualization Tutorial
Data visualization is the practice of creating a visual representation of data so that readers can understand it more easily. Its aim is to help users make better data-driven decisions. Whether deliberately or not, you are probably already doing Data Visualization as part of a job or a school project, or you soon will be. Because it is so common, many people underestimate the importance of creating good visualizations.
Creating a graph or a chart is simple enough, but creating a compelling one that is easy to understand and looks good is an art in and of itself. This article shows you how to create graphs the right way so that your data is clearer.
The key points that you need to understand about Data Visualization are:
- Figure out what kind of information you want to present.
- Tailor your visualization to the audience you will be presenting to.
- Use effects such as color, lines, and emphasis as needed.
- Lastly, Less is More!
What do you want your data to say?
The first step in creating beautiful visualizations is determining what you are trying to present. If you choose the chart first and then the data, you will end up confusing the audience or, even worse, misleading them.
So fit the chart to the data, not the other way around. Here are some types of data commonly used in industry and projects, and the chart that best represents each:
For trends, use line charts
Line charts are one of the most commonly used charts because they show an overall trend with little chance of misinterpretation. Specifically, they are good for depicting changes in values over a period of time. Quantitative data such as demand forecasts, products sold, and population growth are typical examples of data suited to line charts.
For showing proportions or comparing values, use bar or pie charts
Both bar and pie charts are great for showing the differences between categorical values. They can also be used for side-by-side comparisons across different categories. Both show how an individual category fares against other categories (for bar charts) or against the total (for pie charts).
For bar charts, it’s important to start the value axis at 0; otherwise, your chart can mislead readers into thinking that a certain category is significantly better than the rest. You can see it in this example:
For cases like this, where the differences between several categories are small, it’s better to use a pie chart to show them, as in the example below. Don’t show more than 6 categories in a pie chart: as the number of categories increases, the differences between slices become harder to see.
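The distortion comes from simple arithmetic: a bar’s drawn height is its value minus the point where the axis starts, so moving that start away from zero inflates the ratio between bars. Below is a minimal sketch in plain Python; the function names `value_axis_range` and `apparent_ratio` and the example values are hypothetical, chosen only to illustrate the effect.

```python
def value_axis_range(values, include_zero=True):
    """Return (low, high) limits for a bar chart's value axis.

    With include_zero=True the range always contains 0, so every bar
    is drawn from a zero baseline.
    """
    low, high = min(values), max(values)
    if include_zero:
        low, high = min(low, 0), max(high, 0)
    return low, high


def apparent_ratio(a, b, axis_start):
    """How many times taller bar `a` looks than bar `b`.

    A bar's drawn height is its value minus the axis start, so a
    truncated axis exaggerates the difference between bars.
    """
    return (a - axis_start) / (b - axis_start)


# Hypothetical scores for two categories that differ by only 4%.
print(value_axis_range([52, 50]))    # (0, 52)
print(apparent_ratio(52, 50, 0))     # 1.04
print(apparent_ratio(52, 50, 49))    # 3.0
```

With a zero baseline the first bar is drawn only 1.04 times as tall as the second; starting the axis at 49 makes the same 4% gap look like a threefold difference.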
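One way to respect the six-category limit is to keep the largest categories and merge the remainder into a single “Other” slice. The sketch below assumes that approach (it is one common convention, not the only one); `limit_slices`, `to_percentages`, and the sales figures are hypothetical.

```python
def limit_slices(counts, max_slices=6):
    """Keep at most `max_slices` slices for a pie chart.

    The largest (max_slices - 1) categories are kept as they are and
    all remaining categories are merged into one "Other" slice.
    """
    if max_slices < 2:
        raise ValueError("need room for at least one category plus 'Other'")
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked)
    kept = ranked[: max_slices - 1]
    other = sum(value for _, value in ranked[max_slices - 1 :])
    return dict(kept + [("Other", other)])


def to_percentages(counts):
    """Convert raw counts into percentage shares of the whole pie."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


# Hypothetical sales per product line: eight categories in total.
sales = {"A": 30, "B": 25, "C": 15, "D": 10, "E": 8, "F": 5, "G": 4, "H": 3}
slices = limit_slices(sales)
print(slices)                           # {'A': 30, 'B': 25, 'C': 15, 'D': 10, 'E': 8, 'Other': 12}
print(to_percentages(slices)["Other"])  # 12.0
```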
For comparing proportions, use area charts
This chart is less common in everyday use because it can be quite difficult to understand. Area charts show the overall volume as well as the proportion taken up by each category. They are usually used to show proportions over a period of time, such as the revenue vs. cost chart in the example below.
The chart above shows how much revenue overlaps cost. It shows that at certain times of the year, cash flow is much tighter than at others.
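Assuming a stacked layout, where each category is drawn on top of the ones before it, the geometry of an area chart is just a running sum: each layer’s bottom edge is the top edge of the previous layer, and the outer edge is the overall volume. The sketch below is illustrative only; `stack_series`, `shares_over_time`, and the revenue figures are hypothetical, and an overlapping chart like the revenue vs. cost example would skip the stacking step.

```python
def stack_series(series):
    """Compute (bottom, top) bounds for each layer of a stacked area chart.

    `series` maps a category name to its values at each time step; every
    layer is drawn on top of the layers that come before it.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("every series needs the same number of time steps")
    running = [0] * lengths.pop()  # current top edge of the stack
    layers = {}
    for name, values in series.items():
        tops = [bottom + value for bottom, value in zip(running, values)]
        layers[name] = list(zip(running, tops))
        running = tops
    return layers


def shares_over_time(series):
    """Return each category's proportion of the total at every time step."""
    totals = [sum(step) for step in zip(*series.values())]
    return {
        name: [value / total if total else 0.0 for value, total in zip(values, totals)]
        for name, values in series.items()
    }


# Hypothetical quarterly revenue (in thousands) for two product lines.
revenue = {"Product A": [3, 4, 5], "Product B": [1, 2, 5]}
print(stack_series(revenue)["Product B"])         # [(3, 4), (4, 6), (5, 10)]
print(shares_over_time(revenue)["Product A"][0])  # 0.75
```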
For showing distributions, use histograms
Another chart commonly used in industry is the histogram, which shows how often each value occurs in a dataset. Examples include population distribution or income distribution, where each ‘bar’ shows the number of data points that fall inside a range. The difference between histograms and bar charts lies in the categories used: for histograms, the categories must be ordered (such as age groups), whereas for bar charts they don’t need to be (such as brand names).
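The ordering requirement is what separates histogram bins from bar chart categories. A minimal sketch of equal-width binning, assuming bins that include their lower edge and exclude their upper edge, might look like this; the `histogram` helper and the ages are hypothetical. Empty bins are kept so the ranges stay in order with no gaps.

```python
def histogram(values, bin_width, start=0):
    """Count values into ordered, equal-width bins.

    Bin k covers [start + k * bin_width, start + (k + 1) * bin_width).
    Empty bins between the smallest and largest value are kept, so the
    bins stay in order with no gaps.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = {}
    for value in values:
        k = int((value - start) // bin_width)  # index of the bin this value falls in
        counts[k] = counts.get(k, 0) + 1
    if not counts:
        return []
    return [
        (start + k * bin_width, start + (k + 1) * bin_width, counts.get(k, 0))
        for k in range(min(counts), max(counts) + 1)
    ]


# Hypothetical ages of ten survey respondents, grouped into 10-year bins.
ages = [23, 25, 31, 34, 36, 38, 41, 45, 52, 67]
for low, high, count in histogram(ages, bin_width=10):
    print(f"{low}-{high - 1}: {'#' * count}")
```

This prints one text ‘bar’ per age group, from 20-29 (two respondents) up to 60-69 (one respondent), with the 30-39 group the tallest at four.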
For showing relationships, use scatter plots
Scatter plots show the relationship between two variables, revealing whether there is any correlation between them. The important thing here is to ensure that both variables are numerical.
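A scatter plot lets you see a correlation; a number can back it up. The sketch below computes the Pearson correlation coefficient, one common measure of linear relationship (a choice made here for illustration; the article does not prescribe a measure). Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient. The `pearson_r` helper and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric variables.

    Returns a value in [-1, 1]: close to 1 means the points rise together,
    close to -1 means one falls as the other rises, and near 0 means no
    linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: advertising spend vs. units sold over five months.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [2, 4, 5, 8, 10]
print(round(pearson_r(ad_spend, units_sold), 3))
```

A value this close to 1 suggests a strong positive relationship, but it says nothing about causation, so the scatter plot itself is still worth showing.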
Who will be seeing your visualizations?
How you present your visualizations depends greatly on who your target audience is. What do they intend to do with this data? What cultural, domain, or industry needs bring them to it? How you show your data should differ depending on whether you’re presenting to executives, lead data engineers, the sales team, or a group of high school students.
How should you style your data to make it more effective?
Visualizations often use styling to make them easier to understand. Styling can even be influenced by branding, as certain publications use specific color palettes to reinforce their brand. For more on this, look at each organization’s brand manual. Two examples that people commonly use as guidelines are the BBC Global Experience Language and the Cato Institute Data Visualization Guidelines.
The elements that you can tweak to make your data visualization complete are usually these:
- Shapes
- Color
- Typography
- Iconography
- Legends
Don’t focus too much on how attractive your visualizations look; focus instead on whether each element helps you achieve the goal of the chart.
Styling shapes
The shapes in your figures can be adjusted according to the level of precision required. Data used for comparisons, or anything else that requires a certain level of precision, should use sharp, well-defined edges, whereas data used to convey a general idea can use shapes with less detail.
Using colors
Color can differentiate your data in several ways. As mentioned above, certain brands use specific colors in specific ways to express their branding. In general, colors are used to differentiate categories, represent a specific quantity, highlight an issue, and express a certain meaning.
The first use is matching color to the type of data. Categorical palettes distinguish groups of data that have no specific order, while a gradient is commonly used for values within a certain range.
Color can also highlight specific data. Use one primary color for the data the audience should focus on, and gray for everything else.
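The two palette types can be sketched as two small mappings: a categorical palette assigns each unordered category its own color, and a gradient maps a number in a range to a position between two colors. In the sketch below, the hex colors, the light-to-dark blue endpoints, and the function names are all arbitrary choices for illustration, not taken from any particular style guide.

```python
# Arbitrary, clearly distinct colors for unordered categories (illustrative only).
CATEGORICAL = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def categorical_colors(categories, palette=CATEGORICAL):
    """Give each unordered category its own distinct color."""
    if len(categories) > len(palette):
        # Reusing colors would make different categories look the same.
        raise ValueError("more categories than distinct colors")
    return dict(zip(categories, palette))


def sequential_color(value, vmin, vmax, low=(247, 251, 255), high=(8, 48, 107)):
    """Map a value in [vmin, vmax] onto a light-to-dark gradient.

    Each RGB channel is interpolated linearly between `low` and `high`;
    values outside the range are clamped to the nearest end.
    """
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"


print(categorical_colors(["Apples", "Pears", "Plums"]))
print(sequential_color(0, 0, 100))    # #f7fbff (lightest)
print(sequential_color(100, 0, 100))  # #08306b (darkest)
```

Interpolating in RGB is the simplest option, not necessarily the most perceptually even one; in practice a ready-made palette designed for sequential data is usually a safer choice.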
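The highlighting rule boils down to a one-line mapping: the focus category gets the primary color and every other category gets gray. In this sketch, `highlight_colors`, the two hex values, and the region names are hypothetical.

```python
FOCUS_COLOR = "#d62728"  # hypothetical primary color for the data to focus on
MUTED_GRAY = "#bdbdbd"   # hypothetical gray for everything else


def highlight_colors(categories, focus, primary=FOCUS_COLOR, muted=MUTED_GRAY):
    """Color the focus category with the primary color and the rest gray."""
    if focus not in categories:
        raise ValueError(f"{focus!r} is not one of the categories")
    return {name: primary if name == focus else muted for name in categories}


regions = ["North", "South", "East", "West"]
print(highlight_colors(regions, focus="South"))
# {'North': '#bdbdbd', 'South': '#d62728', 'East': '#bdbdbd', 'West': '#bdbdbd'}
```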
Lastly, Less is More!
Avoid using too many elements or flashy designs in your visualizations. Keep them as simple as possible and use the bare minimum needed to get your point across to the audience. If you want to read more, James Cheshire wrote a great article on what not to do in Data Visualization.
For a deeper dive, Claus Wilke’s book, ‘Fundamentals of Data Visualization’, is a great and comprehensive guide to making visualizations.
I hope this short and simple tutorial is enough to get you started on the right foot in Data Visualization and avoid the mistakes most people make in visualizing data. Stay curious!