Anything that’s backed up by numbers and charts is obviously telling the truth, right? Why would scientists, politicians and companies want to purposefully deceive the audience by messing with statistical data in order to make themselves look good?
We just answered our own question.
In an age where information is demanded and consumed at an unprecedented rate, statistics have skyrocketed in popularity, and so too have the data visualizations that accompany these studies to make the data easily digestible for the audience. That being said, some visualizers take advantage of how readily their audience accepts graphs without question and go a step further to mislead them with unethical graphical practices. This can cause the audience to misinterpret the visualization or lose any understanding of the data, which can hurt both your message and your reputation (Hubspot).
There is, however, a lot more to statistics than misleading viewers. Statistics (and their accompanying visualizations) have the power to evoke certain emotions from their audience through colors, context, and appearance. There are also ongoing discussions about the subjectivity behind statistical experiments and their visualizations, and whether true objectivity is ever possible.
The following charts and images will visualize the many ways data can be misleading, how data can provoke emotions, and why it may not be possible to ever achieve true objectivity in statistical experiments.
Misleading Graphs and How to Spot Them
It is really sad to see that, over time, statisticians and data visualizers have discovered numerous ways to skew their data and mislead their audience into believing something that is not entirely accurate. Because visualizations and statistics have become so prominent in recent years, it has become just as important to learn the various types of visualizations and the many ways they can be edited to suggest inaccurate trends and conclusions.
This is an image of 15 days in the early stages of the coronavirus lockdown, graphed to show the rise in new cases. It is presented to look as though there is a constant and possibly controllable rise in the number of new cases every day. Someone taking this at face value might conclude that the virus is not much of a threat and that the US is doing a decent job of preventing exponential growth in positive cases.
This, however, is a really good example of how messing with the scale on the y-axis can throw off your viewer. If you look at the left side of the graph above, you can see that while all of the y-values are evenly spaced, the difference between intervals varies from 50 (such as 300-350 above) to 10 (such as 90-100 above). This tricks our brains because we assume that the difference in height is proportional to the values (Lea Gaslowitz). This type of manipulation is most common on bar graphs and line charts, where the data is organized categorically or over time, respectively. Luckily, it is the easiest manipulation to spot and the most common misleading tactic, so learning to check the y-axis can help you read what the data is really trying to tell you.
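To see why uneven intervals are so deceptive, here is a minimal sketch in Python. The tick values are hypothetical (echoing the 10-unit and 50-unit gaps mentioned above), and the pixel spacing is an assumption for illustration: when every interval is drawn at the same height, a rise of 10 looks identical on screen to a rise of 50.

```python
# Sketch: how uneven y-axis intervals distort perceived change.
# Tick values are hypothetical, modeled on the gaps described above.
ticks = [90, 100, 150, 200, 250, 300, 350]  # uneven: first gap is 10, last is 50

def drawn_height(value, ticks, px_per_interval=40):
    """Map a data value to a pixel height when every tick interval
    is drawn with the same pixel spacing, regardless of its width."""
    for i in range(len(ticks) - 1):
        lo, hi = ticks[i], ticks[i + 1]
        if lo <= value <= hi:
            frac = (value - lo) / (hi - lo)
            return (i + frac) * px_per_interval
    raise ValueError("value outside axis range")

# A rise of 10 (90 -> 100) occupies one full interval on screen...
rise_small = drawn_height(100, ticks) - drawn_height(90, ticks)
# ...and so does a rise of 50 (300 -> 350): five times the change, same pixels.
rise_large = drawn_height(350, ticks) - drawn_height(300, ticks)
print(rise_small, rise_large)  # both 40.0
```

On an honest linear axis, the second rise would be drawn five times taller than the first; here they render identically, which is exactly the illusion the COVID-19 chart exploits.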
One other common way that data visualizers can manipulate their audience is by presenting data sets that look like they cause each other but are in reality only correlated. The difference between the two is that correlation involves the similarity between two trends over a period of time, whereas causation is a direct connection between those two events, clearly implying that one causes the other to happen, something that correlated data do not guarantee.
There are so many different types of data sets that, if you look hard enough, you can find two of the most distant pieces of data with zero connection to each other whose graphs are nonetheless correlated in shape, leading unsuspecting viewers to falsely conclude that one causes the other. This type of manipulation can be achieved rather easily, as roughly one out of every twenty variables tested will appear statistically significant purely by chance (Lebied). The graph above shows the amount of merchandise eBay grosses over time and the total money people spend online shopping on Black Friday. While the lines may be very close in shape, one cannot simply assume that one variable causes the other or vice versa. Instead, it is best to think of a possible hidden third variable that could truly cause both of these trends, such as, in this case, the increase in internet users.
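The hidden-third-variable idea can be sketched in a few lines of Python. The numbers below are synthetic, not real eBay or Black Friday figures: both series are driven by the same underlying upward trend (standing in for the growth of internet users) plus independent noise, so they correlate strongly even though neither causes the other.

```python
import random
from statistics import mean

# Sketch: two series that never touch each other, both driven by a
# hidden third variable (a shared upward "internet users" trend).
# All numbers are made up for illustration.
random.seed(42)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

trend = list(range(100))                              # hidden driver
ebay = [t + random.gauss(0, 5) for t in trend]        # "merchandise grossed"
spend = [2 * t + random.gauss(0, 10) for t in trend]  # "Black Friday spending"

r = pearson(ebay, spend)
print(round(r, 2))  # strongly correlated, yet neither series causes the other
```

Removing the shared trend (for example, by correlating the year-over-year changes instead of the raw levels) would make the apparent relationship largely vanish, which is one practical way to check for a lurking third variable.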
Speaking of the rise in total internet users, nearly everybody has been forced to stay at home and work over the internet since the start of the coronavirus lockdown. This means that people are more anxious for information and want to know how impactful this virus will be. Thankfully, data visualizers have been hard at work trying to get us the most accurate and up-to-date information possible about the virus, and this visualization comes in the form of…a pie chart. While pie charts are one of the most common types of charts, they are controversial because viewers have a harder time interpreting relative angles and sizes consistently with pie charts than they would with a standard bar or line chart (Hubspot). This example, however, is arguably a very effective use of pie charts, as there are only two variables (fatal and non-fatal cases), which makes it very easy to tell which variable covers more data.
The problem with this visualization is the contextual information behind it. The data above is supposed to reflect the relative severity of COVID-19 compared to other virus outbreaks in the past. The designers seriously downplay COVID-19's impact by placing its pie chart on the far left, next to the seasonal flu chart, which does not have a hint of red on it. Despite all four of these charts being seemingly accurate, the visual proximity of the COVID-19 chart to the flu chart throws viewers off by making them think that COVID-19 is not as severe as SARS or MERS and is instead closer to the standard flu in terms of impact (Doan), which could not be further from the truth in hindsight. This visualization actually uses the Gestalt principle of proximity to mislead viewers: by placing COVID-19 closest to the flu chart, the designers imply that the two smallest charts form a group (Bonner) with minimal impact compared to the other two outbreaks.
While grouped data can be misleading based on how the data is organized, other visualizations can be much worse if they choose to show only a small portion of the data set while leaving out the rest of the picture. Imagine looking at this graph and only seeing what is inside the red circle. You would probably believe that the Earth is getting cooler and that there is more ice on the planet in 2009 than in 1989. Well, as you can see, that is merely a pipe dream, as the only time there is more ice in 2009 than in 1989 is around March and April. This practice is called "cherry picking": choosing only a small selection of data to represent an entire data set. It makes it far too easy to change the data's narrative, so a graph should not be trusted if the timeline on the x-axis looks too short (Koehrsen).
Due to the extreme disparity between the cherry-picked data above and the overall big picture, it is also fair to say that this graph is an example of Simpson's paradox, where looking at the data as a whole tells a completely different story than looking at a small subset of it (Star). The story here is whether or not Earth is losing ice, and the cherry-picked data tells the complete opposite (and therefore wholly untrue) story from the full data set. This type of manipulation is a little trickier to spot, because one would need to know the context behind the data in order to call the designers out; but finding a different interpretation of the same data or experiment should be the key to determining whether the original graph was manipulated to tell a different story.
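Simpson's paradox is easiest to appreciate with numbers in hand. The sketch below uses a classic textbook example (the kidney-stone treatment data, unrelated to the ice graph above): treatment A has the better success rate in every subgroup, yet treatment B looks better in the aggregate, because the subgroups are unevenly sized.

```python
# Sketch of Simpson's paradox with a classic textbook dataset
# (kidney-stone treatments): each subgroup favors treatment A,
# yet the pooled totals favor treatment B.
groups = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# A wins in every subgroup...
for name, g in groups.items():
    assert rate(*g["A"]) > rate(*g["B"]), name

# ...but pooling the subgroups flips the conclusion.
a_succ = sum(g["A"][0] for g in groups.values())  # 273
a_tot  = sum(g["A"][1] for g in groups.values())  # 350
b_succ = sum(g["B"][0] for g in groups.values())  # 289
b_tot  = sum(g["B"][1] for g in groups.values())  # 350
assert rate(b_succ, b_tot) > rate(a_succ, a_tot)
print(round(rate(a_succ, a_tot), 2), round(rate(b_succ, b_tot), 2))
```

The lesson mirrors the ice example: whether the subset or the aggregate is the "true" story depends on context, which is exactly why knowing the background of a data set matters before trusting its graph.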
Before you can focus on attempting to tell a different story with your data, you must first at least be able to tell a story with your data. This is a visualization showing Popchips as the superior snack to standard fried or kettle chips. The image features a very clever graph, where the chips themselves are stacked to form a bar graph. The problem is, that's it. There is nothing else to this chart. There is not a single label, and it is impossible to tell what the chips are scaled by. Visualizations like these normally have a narrative impact that "stem[s] from visual comparisons using simple, abstract representations of data" (Ma, et al). In this graph, however, there is no data to represent. There is no way to know whether the large stack of Popchips means they have half the fat of the other chips, half as much sugar, half as many calories, or more chips per serving.
Considering that this is an advertisement, Popchips is employing a method Curtis Newbold calls "Statistics Appeal," or the ability to persuade people to buy a product based on numbers and statistics. By stacking their chips up higher, they are signaling to their audience that they can eat more Popchips without having to feel guilty about it. Without any labels or statistical data to accompany the chips, it is tough to take Popchips' claim seriously.
The last tactic used to mislead viewers that will be discussed here is the misuse of sampling to create biased results. Anytime you see a data visualization, it is in your best interest to find out what methods were used when sampling and who in the population was sampled for the study. In these situations, it is worth asking who is being included in the sample and who is being excluded (Koehrsen). The image above seems like a factually correct chart, displaying properly spaced and shaped bars that show which countries could not live without the internet. The chart also uses effective graphical properties such as size, orientation and color in order to make the graphical elements more valuable to the user of the representation (IDF).
Despite how effective this visualization may appear, one just needs to read the very bottom of the graph, which says "online poll of adults." Normally, this is an okay way to sample people, but if you look at the top of the chart and realize what this data is really supposed to represent, things start to look suspicious. Why sample people online if you are trying to figure out whether people can imagine life without the internet? This eliminates any possibility of reaching demographics that are either old enough to remember life without the internet or currently living without any internet at all.
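The selection bias in that poll can be made concrete with a small, fully made-up population. In the sketch below, the offline group (who by definition live without the internet) can never appear in an online sample, so the poll overstates how many people "couldn't live without" it. All counts are hypothetical.

```python
# Sketch: why an "online poll of adults" misestimates the question
# "could you live without the internet?" Counts are invented.
# Each person is (is_online, says_cannot_live_without_internet).
population = (
    [(True, True)] * 60     # online, can't imagine life without it
    + [(True, False)] * 20  # online, but could manage
    + [(False, False)] * 20 # offline: lives without it right now,
)                           # yet unreachable by an online poll

def share_cannot(people):
    """Fraction who say they cannot live without the internet."""
    return sum(answer for _, answer in people) / len(people)

true_share = share_cannot(population)            # whole population: 0.6
polled = [p for p in population if p[0]]         # online-only sample
polled_share = share_cannot(polled)              # inflated to 0.75
print(true_share, polled_share)
```

The gap between the two numbers is the sampling bias: the poll's frame (being online) is itself correlated with the answer being measured, which is precisely the flaw in the original chart.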
There are plenty of statistical studies and visualizations out there, and alongside them are numerous ways in which statistical information or its visualizations can mislead an audience. If you are eager to learn all about the data and implications within a statistical experiment, it is best to look at multiple studies of all kinds, as "no single study or approach, no matter how rigorously applied, can complete the picture" (Kurlansky). Understanding the common ways statistical data can mislead its viewers is one of the best skills to develop, as it can prevent you from falling for these types of visualizations and can remind you that there is a lot more to the data than what is shown on the graph.
Visualizations and Evoking Emotion
Oftentimes visualizations, whether or not they are accurately portrayed, can be so strong and effective that they actually evoke certain emotions from their viewers. Through the use of colors, powerful messages and effective visualizations, some images can do a lot more to an audience besides sharing some numbers.
Not only does this graph provide a really effective background image that cleverly intertwines with the graph and immediately makes us understand what the data is about without having to read a single word, but it can also evoke a huge emotional response, especially if interpreted incorrectly. This is a graph that showcases the most common injuries children have when they need to be hospitalized. That last part is the most important, because it changes the narrative completely. By eliminating children without injuries and children who get injured but do not need to be hospitalized, the graph represents a much smaller population than one may realize. The bold title at the top of the visualization, "common injuries children suffer," does not help, and it can evoke the very understandable emotion of fear among parents who may see this chart and believe that five percent of all children suffer spinal injuries (Stephanie). Fear may have been the emotion the designers intended anyway, but they may not have expected it to be evoked to the point that their audience misinterpreted the chart.
This is another slightly vague graph that can evoke a lot of emotion from those who view it. It is an attempt to infuriate those who care deeply about education by showing them that putting a week's worth of global military spending toward global education could grant every child 12 years of great schooling. Sometimes, when participants observe data on a visualization, emotions are evoked, which "play a significant role in shaping whether participants felt like looking, how they engaged and the information that they took from the visualizations" (Kennedy). On the right-hand side there are 365 tanks, with eight of them colored white, signifying how small eight days are compared to an entire year. A black and white color scheme dominates the visualization: anything white refers to the education benefits and how small a diversion we would need to make, whereas anything black refers to the evil or powerful military (Cao), while also highlighting the magnitude of the possible education benefits.
This type of image may cause people to think about themselves, what they eat, and whether or not they are living their healthiest life. The top ten fattest countries in the world are charted, and the U.S. is ranked 9th with 74.1% of its population being considered overweight. Not only does the visualization target the top ten countries, but the designers chose to target the entire world by showing a graphic of an obese man on the right side with some text that shows how much of the world’s population is obese. The designers may have tried to evoke guilt with this graphic and included an image of a vegetable bowl on the top left of the visualization in order to convince the viewers to think about their health and their diet a bit more seriously.
If designers know their audience when making visualizations, they can employ "user-centered design," where their work will "focus on the users from a cognitive, affective, and behavioral point of view" (Estrada, 143). The viewers can then experience the emotional responses the designers meant to elicit, because the visualizations were targeted at that particular audience. This allows statistical data to be analyzed and interpreted in a much stronger and more personal manner than simply listing a bunch of numbers for the viewers.
Is true objectivity impossible to attain?
A lot of the critiques data visualizations receive include the designers providing inaccurate data or the experimenter including bias in the study. All of those notions lead the viewers to believe that those misleading visualizations are subjective and should not be trusted.
What if I told you, however, that it is completely impossible for a statistical test to be truly objective, and that even the most impartial of studies have the slightest degree of subjectivity? It’s certainly worth thinking about as you view some visualizations.
This is a perfectly accurate bar chart that lists the enjoyment values people get out of certain candies. It is a survey result that takes data from individuals and maps it onto a chart for people to interpret. The candies are not even ranked by average enjoyment level. The subjectivity comes from the surveyor's choice to include and exclude certain types of candy: for example, no type of chocolate appears anywhere on the chart. Colin Blyth calls subjective methods in statistics the "construction (C) process," as its aim is to build "models that have not been thought of before, by searching through possible models that might be tried, and questions that may be asked" (20). Blyth argues that it is not possible for scientists and experimenters to come up with original and unique data sets through objective means alone, as it takes a little bit of subjectivity to decide what study you want to conduct and what narrative you want to tell.
Here is yet another chart that is perfectly reasonable, acceptable, and even effective. The bars for each of the top 15 most stressful life events are evenly spaced and sized correctly, and the gradient from orange to red represents an increasing level of danger (Cao) as the life events become more stressful. If you step back to think about what decisions the experimenter made, however, it becomes clear that there is far more freedom in the creation of this experiment than you may think. The experimenter got to choose which categories to include, whom to sample, how to sample them, and how to present the data. The goal of an experiment is to appear objective, since true objectivity may never be attainable; this is why researchers "hide the researcher's degrees of freedom from the public, unless they can be made to appear objective" (Gelman, 3). If researchers do not disclose the freedoms they exercised while designing the experiment, no one has any reason to doubt that they were completely impartial, so long as the visualization that accompanies the data remains accurate and impartial as well.
Statistics is a wonderful world, and having immediate access to unprecedented volumes of experiments, studies, and data visualizations is one of the greatest things to come out of the internet age. While it is unfortunate to know that every set of data we have ever looked at has had some sort of subjectivity behind the scenes, that does not take away from the importance of statistics in bringing knowledge to our lives. As you view more and more visualizations, remember that not every chart or table is going to tell the complete truth, and if you are feeling a certain emotion from a visualization, its designers may well have intended that to happen. If there is one thing worth learning and not taking for granted, it is statistical models and their accompanying data visualizations. The world would be a very sad and less info-driven place without them.
Blyth, Colin R. “Subjective vs. Objective Methods in Statistics.” The American Statistician, vol. 26, no. 3, 1972, pp. 20–22. JSTOR, www.jstor.org/stable/2682860.
Bonner, Carolann. “Using Gestalt Principles for Natural Interactions.” Thoughtbot, 23 Mar. 2019. https://thoughtbot.com/blog/gestalt-principles
Cao, Jerry. “Web design color theory: how to create the right emotions with color in web design.” TheNextWeb, 7 Apr. 2015. https://thenextweb.com/dd/2015/04/07/how-to-create-the-right-emotions-with-color-in-web-design/
“Data Visualization 101: How to Design Charts and Graphs.” Hubspot, Visage. https://cdn2.hubspot.net/hub/53/file-863940581-pdf/Data_Visualization_101_How_to_Design_Charts_and_Graphs.pdf
Doan, Sara. “Misrepresenting COVID-19: Lying With Charts During the Second Golden Age of Data Design.” Journal of Business and Technical Communication, Sept. 2020.
Estrada, Fabiola Cristina Rodriguez and Lloyd Spencer Davis. Improving Visual Communication of Science Through the Incorporation of Graphic Design Theories and Practices into Science Communication. SAGE Publications, 2014.
Gaslowitz, Lea. “How to spot a misleading graph.” YouTube, uploaded by TED-Ed, 6 Jul. 2017. https://www.youtube.com/watch?v=E91bGT9BjYk
Gelman, Andrew and Christian Hennig. “Beyond subjective and objective in statistics”. Royal Statistical Society, 2017, http://www.stat.columbia.edu/~gelman/research/published/gelman_hennig_full_discussion.pdf
Kennedy, Helen, and Rosemary Lucy Hill. “The Feeling of Numbers: Emotions in Everyday Engagements with Data and Their Visualisation.” Sociology, vol. 52, no. 4, Aug. 2018, pp. 830–848.
Koehrsen, Will. “Lessons on How to Lie With Statistics.” Medium, 28 Jul, 2019. https://towardsdatascience.com/lessons-from-how-to-lie-with-statistics-57060c0d2f19
Kurlansky, Paul. “Lies, damned lies, and statistics.” The Journal of Thoracic and Cardiovascular Surgery, vol. 150, no. 1, Jul. 2015, pp. 20-21.
Lebied, Mona. “Misleading Statistics Examples – Discover the Potential for Misuse of Statistics and Data in the Digital Age.” DataPine, 8 Aug. 2018. https://www.datapine.com/blog/misleading-statistics-and-data/
Ma, Kwan-Liu, et al. Scientific Storytelling Using Visualization. IEEE Computer Graphics and Applications, 2 Jan. 2012. http://vis.cs.ucdavis.edu/papers/Scientific_Storytelling_CGA.pdf
Newbold, Curtis. “Statistics Appeal (Advertising).” The Visual Communication Guy. 6 Oct, 2017. https://thevisualcommunicationguy.com/2017/10/06/statistics-appeal-advertising/
Stephanie. “Misleading Graphs: Real Life Examples.” Statistics How-To. 24 Jan, 2014. https://www.statisticshowto.com/misleading-graphs/
“This is how easy it is to lie with statistics.” YouTube, uploaded by Zach Star, 4 Feb, 2019. https://www.youtube.com/watch?v=bVG2OQp6jEQ
“Visual Mapping – The Elements of Information Visualization.” Interaction Design Foundation. Jul. 2020. https://www.interaction-design.org/literature/article/visual-mapping-the-elements-of-information-visualization