how we position and what we compare

When visualizing data, one piece of advice I often give is to consider what you want your audience to be able to compare, then align those things to a common baseline and put them as close together as possible. This makes the comparison easy. If we step back and consider this more generally, the way we organize our data has implications for what our audience can more (or less) easily do with the data and what they can easily (or not so easily) compare.

I was working with a client recently when this came into play. The task was to visualize funnel data for a number of cohorts. For each cohort, there were a number of funnel stages, or “gates,” where accounts could fall out: targeted, engaged, pitched, and adopted. Each of these stages represents some portion of the accounts that made it through the previous stage. In this case, the client wanted to compare all of this across a handful of cohorts and regions. Here is an anonymized version of the original graph:

 
Cohort Analysis 1.png
 

There are some things I like about this visual. Everything is titled and labeled. So, while it takes a bit of time to orient and figure out what I’m looking at, the words are all there so that I can eventually figure this out, helping to make the data accessible. But when I step back and think about what I can easily do with the current arrangement of the data, there are a number of limitations. Let’s consider the relative levels of work it takes to make various comparisons within this set of graphs.

The easiest comparison for me to make is looking at a given region within a given cohort and focusing on the relative stages of the funnel. For example, if we start at the top left, I can easily compare for the Q1 Cohort in North America the purple vs. blue vs. orange vs. green bar. This is because they are both (1) aligned to a common baseline and (2) close in proximity (directly next to each other).

The next most straightforward comparison is across the various regions for a given funnel stage within a given cohort. So again, starting at the top left, I can compare within the Q1 Cohort the first purple bar (Targeted in North America), scanning right to the next purple bar (Targeted in EMEA), and so on. They are still aligned to a common baseline, but in this case they aren’t right next to each other (I’m inclined to take my index finger and trace along to help with this comparison). This is a little harder than the first comparison described above, but still possible.

The next comparison I can make—and this one is quite a bit more difficult—is a step in the funnel for a given region across cohorts. Again, starting at the top left, I can take that initial purple bar (Targeted in North America) and now scan downwards to compare to that same point for the Q2 cohort and the Q3 cohort. This is harder, because these bars are not aligned to a common baseline and they are also not next to each other. I can see that the bottom leftmost purple bar is bigger than the ones above it. But if I need to have a sense of how much bigger, that’s hard for me to wrap my head around. The numbers are there via the y-axis to make it possible, but it means I'm having to remember numbers and perhaps do a bit of math as I scan across the bars, which is simply more work.

And if we step back and think about it… comparisons across cohorts… this is actually potentially one of the most important comparisons that we’d like to be able to make! Visualizing and arranging our data differently could make this easier.

Perhaps it’s just me (and this really could be the case), but when I think of cohort analysis, it actually reminds me of my days in banking (a former life) and decay curves, and when I think of “curves,” it makes me think of lines, which makes me want to draw some lines over these bars… Actually, let’s try that. Here’s what it looks like if I draw lines over the bars in the first graph (Q1 cohort):

 
Cohort Analysis 2_short.png
 

While I’m at it, I might as well draw lines across the other graphs, too:

 
Cohort Analysis 3.png
 

And now that we have the lines, we don’t need the bars…

 
Cohort Analysis 4.png
 

The bars would have likely been too much to put into a single graph. But now that I’ve replaced what was previously four bars with a single line—thus remaking my original 16 bars in each graph into 4 lines, or if we multiply that across the three graphs, I’ve turned 48 bars into 12 lines—those, I can potentially all put into a single graph. It would look like this:

 
Cohort Analysis 5.png
 

While it’s nice to have everything in a single graph, those lines on their own don’t make much sense. Next, I’ll add the requisite details: axis labels and titles so we know what we’re looking at.

 
Cohort Analysis 6.png
 

Note that I didn’t have space to write out “Targeted,” “Engaged,” “Pitched,” and “Adopted” for every single data point. Instead, I chose to use just the first letter of each of these along the x-axis, and then I have a legend of sorts below the region that lists out what each of these letters means. This may not be a perfect solution, but every decision when we visualize data involves tradeoffs, and I’ve decided I’m ok with the tradeoffs here.

You’ll perhaps notice here that I haven’t labeled the various cohorts yet. With this view, I could focus on one at a time (calling out either via text or my spoken narrative if talking through this live to make it clear what we are focusing on). For example, maybe first I want to set the stage and focus on the Q1 cohort and how it looked across the various funnel stages and regions:

 
Cohort Analysis 7.png
 

I could then do the same for the Q2 cohort (lower across everywhere: Is this expected? What drove this? My voiceover could lend commentary to raise or answer these questions):

 
Cohort Analysis 8.png
 

Then finally, I could do the same for the Q3 cohort (ah, now our metrics have recovered from their lows in the Q2 cohort and are even higher than Q1. Did we do something specific to achieve this? Looks like we targeted a higher proportion of the overall cohort, and it’s interesting to see how that impacted the downstream funnel stages):

 
Cohort Analysis 9.png
 

Note with this view, I could also focus on a given region at a time. For example, it might be interesting to note that these metrics are lower across all cohorts in North America compared to the other regions:

 
Cohort Analysis 10.png
 

Or the spread in APAC across cohorts might be noteworthy, as it’s the largest variance across cohorts compared to the other regions:

 
Cohort Analysis 11.png
 

This piece-by-piece emphasis could work well in a live presentation. But in the case where this is for a report or presentation that will be sent out where we’d likely have a single version of the graph (vs. the multiple iterations that can work well in a live setting so you can focus your audience on what you’re talking about as you discuss the various details), I’d venture to guess that the most recent cohort (Q3) is perhaps the most relevant, so let’s bring our focus back to that:

 
Cohort Analysis 12.png
 

Within the Q3 cohort, we may consider emphasizing one or a couple of data points. Data markers and labels are one way to draw attention and signal importance. If I put them everywhere, we’ll quickly end up with a cluttered mess. But if I’m strategic about which I show, I can help guide my audience towards specific comparisons within the data. For example, if the ultimate success metric is what proportion of accounts have adopted whatever it is we’re tracking (I’ve anonymized that detail away here), I might emphasize just those data points for the most recent cohort:

 
Cohort Analysis 13.png
 

Given the spatial separation between regions, I don’t necessarily have to introduce color here. But if I want to include some text to lend additional context about what’s going on in each region and what’s driving it, I could introduce color into the graph and then use that same color scheme for my annotations, tying those together visually:

 
Cohort Analysis 14.png
 

Let’s take a quick look at the before-and-after:

Cohort Analysis 15.png

Any time you create a visual, take a step back and think about what you want to allow your audience to do with the data. What should they be able to most easily compare? The design choices you make—how you visualize and arrange the data—can make those comparisons easy or difficult. Aim to make it easy.

The Excel file with the above visuals can be downloaded here. I should perhaps mention a hack I used to achieve this overall layout: each cohort is a single line graph in Excel, where I’ve formatted it so there is no connecting line between the Adopted point for one region and the Targeted point in the following region. (It may be brute force, but it works!)
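
If you are building this layout outside of Excel, a similar effect can come from a different mechanism: rather than hiding a connector, put an empty point between regions. Here is a minimal Python sketch under that assumption. The function name `gapped_series`, the one-letter tick labels, and all of the numbers are made up for illustration; none of them come from the client data. Many plotting libraries (matplotlib among them) break a line wherever the value is NaN, so the resulting list can be drawn as a single series per cohort.

```python
import math

# Funnel stages in order; only the first letter is used as the tick label.
STAGES = ["Targeted", "Engaged", "Pitched", "Adopted"]


def gapped_series(values_by_region, stages=STAGES):
    """Flatten {region: {stage: value}} into one series per cohort.

    A NaN point (with an empty label) is placed between regions so that
    a line drawn through the series breaks between one region's Adopted
    point and the next region's Targeted point.
    """
    labels, values = [], []
    for i, by_stage in enumerate(values_by_region.values()):
        if i > 0:
            labels.append("")        # spacer tick between regions
            values.append(math.nan)  # the gap itself
        for stage in stages:
            labels.append(stage[0])  # "T", "E", "P", "A"
            values.append(by_stage[stage])
    return labels, values


# Hypothetical Q1 cohort values (% of accounts), invented for illustration.
q1_cohort = {
    "North America": {"Targeted": 60, "Engaged": 40, "Pitched": 25, "Adopted": 10},
    "EMEA": {"Targeted": 70, "Engaged": 50, "Pitched": 35, "Adopted": 20},
}

labels, values = gapped_series(q1_cohort)
```

Each additional cohort would be another call to `gapped_series`, giving one gapped line per cohort on the same axes.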

introducing the SWD podcast

 
ColePodcast.jpg
 

I'm very excited to officially launch the storytelling with data podcast! This first episode focuses on feedback in data visualization. I discuss the value of both giving and receiving data visualization feedback and potential problem areas to avoid. Hear The Economist's response to the recent hurricane data visualization challenge as well as answers to reader questions on the topics of when to use graphs, considerations with dashboards, and data viz 101 book recommendations.

Big thanks to Timo Elliston, friend and awesome NYC composer/musician, for the amazing original music, and to hubby Randy for encouraging all of this in the first place, equipping our office with recording gear, and for always being my biggest supporter.

I hope you enjoy the session as much as we enjoyed making it. If you like what you hear, please be sure to rate the SWD podcast on your favorite podcast platform!

Links mentioned during the podcast:

 

Feedback? email feedback@storytellingwithdata.com
Blog post: SWD makeover challenge on The Economist’s hurricane graph
Article: “Design & Redesign in Data Visualization” by Fernanda Viegas & Martin Wattenberg
Blog post: my guiding principles
Article: The subtle art that differentiates good designers from great designers by UX Planet
Blog post: a tale about opportunity
Book: The Big Book of Dashboards by Steve Wexler, Jeff Shaffer & Andy Cotgreave
Book: The WSJ Guide to Information Graphics by Dona Wong
Book: Show Me the Numbers by Stephen Few
Book: The Visual Display of Quantitative Information by Edward Tufte
Questions? email askcole@storytellingwithdata.com

/
CLICK HERE TO JOIN OUR MAILING LIST

SEARCH STORYTELLING WITH DATA: © 2010-2017 Cole Nussbaumer Knaflic. All rights reserved. STORYTELLING WITH DATA and the STORYTELLING WITH DATA logo are trademarks of Cole Nussbaumer Knaflic.

making a case for stacked bars

When I review common types of visuals used in a business setting during my workshops, one of the graphs we discuss is the stacked bar chart. I typically say something like the following:

“While we’re on the topic of bars, let’s talk about another common bar chart: the stacked bar graph. Stacked bars work well when you want to compare totals across different categories, and then within a given category, you want some understanding of the subcomponent pieces. Notice though, that they work less well if you want to compare those subcomponent pieces across categories. This is because as soon as you get past the first series, you no longer have a consistent baseline to use to compare. This is a harder comparison for our eyes to make, so something to keep in mind when reaching for stacked bars.”

I’ve noticed lately, though, that when a client shares a stacked bar as a makeover candidate, it usually gets remade into something else. Which has me pondering… Is there really a good use case for the (non-100%) stacked bar?

The most common scenario in which I see stacked bars used is to show total over time and also the change in composition. For example, I was just looking at a workshop example like this that depicted revenue over time: the overall height of the bars represented total revenue and then each bar was subdivided into source of revenue, in an attempt to show how the source of revenue was shifting over time as overall revenue changed. Here’s a quick sketch of what it looked like:

Stacked bar sketch - before.png
 

To me, this was a clear case where, by attempting to answer too many questions with a single visual, we don’t answer any one of them as well as we could if we broke it up into multiple visuals. In this case, my recommended view had a line graph showing total revenue over time, and a 100% stacked bar chart to show the relative shift in composition (revenue source). This is roughly what the visuals looked like (the real version was paired with additional explanatory text and emphasis that I've omitted for simplicity):

Stacked bar sketch - after.png
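
For anyone preparing these two views from raw numbers, the reshaping is small. Here is a rough Python sketch, assuming revenue is stored by period and source. The years, source names, and figures are all invented for illustration, and `totals` and `shares` are hypothetical helper names.

```python
# Hypothetical revenue by source for each year (invented numbers).
revenue = {
    "2014": {"Source A": 50, "Source B": 30, "Source C": 20},
    "2015": {"Source A": 55, "Source B": 45, "Source C": 50},
}


def totals(revenue_by_period):
    """Total revenue per period: the data behind the line graph."""
    return {period: sum(parts.values()) for period, parts in revenue_by_period.items()}


def shares(revenue_by_period):
    """Percent of each period's total by source: the data behind the 100% stacked bar."""
    result = {}
    for period, parts in revenue_by_period.items():
        total = sum(parts.values())
        result[period] = {source: 100 * value / total for source, value in parts.items()}
    return result


total_by_year = totals(revenue)
share_by_year = shares(revenue)
```

Keeping the two outputs separate mirrors the two graphs: one answers "how much?" and the other answers "from where?"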

There is a challenge that frequently arises with the stacked bar: if anything interesting is happening further up the stack, it becomes challenging to see it because it’s stacked on top of other things that are also changing. This means potentially important components of the data or what we can learn from it can get lost or missed. In the above scenario, for example, there was an interesting shift happening in source of revenue over time that was hard to see in the original graph.
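
To make the baseline issue concrete, here is a small Python sketch that computes where each segment of a stacked bar starts. The data and the helper name `segment_baselines` are hypothetical, used only to illustrate the point.

```python
from itertools import accumulate


def segment_baselines(stack):
    """Return the starting height of each segment in one stacked bar.

    `stack` maps series name -> value, in bottom-to-top stacking order.
    """
    heights = list(stack.values())
    # Each segment starts where the running total of the ones below it ends.
    starts = [0] + list(accumulate(heights))[:-1]
    return dict(zip(stack.keys(), starts))


# Two hypothetical bars with the same three series (invented numbers).
bars = {
    "Year 1": {"Source A": 50, "Source B": 30, "Source C": 20},
    "Year 2": {"Source A": 55, "Source B": 45, "Source C": 50},
}

baselines = {name: segment_baselines(stack) for name, stack in bars.items()}
```

In this made-up data, Source A starts at 0 in both bars, while Source C starts at 80 in one and 100 in the other, so comparing Source C across bars means comparing lengths that begin at different heights.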

I’ve been racking my brain for good examples of stacked bars to figure out whether I should change how I discuss their use. I’ll ask for your help on this front momentarily.

I do have a couple of examples that are top of mind. There is a horizontal stacked bar that I highlight in Chapter 6 of my book as a model visualization:

In the above example, the most important thing is the length of the overall bars. It’s interesting to know what the subcomponent pieces are as a proportion of the given bar, but there isn’t a strong need to be able to compare those subcomponent pieces across bars. I think this works. Though I should mention that I’ve also received feedback (a one-off comment, so no idea how representative it is) that this particular visual is confusing.

As I write this, it occurs to me that I did also use a stacked bar in my prior blog post. This was a case where I wanted the audience to focus on the stacked piece (there were only two data series stacked on each other in this example), but the big picture opportunity the stacked portion illustrated across the various bars was more important than specific comparisons between the bars.

With that prelude, I'll turn the conversation over to you—have you seen examples of stacked bars that are effective? These can be vertical or horizontal (I think it's coincidence that the two I highlight above that work are horizontal and the one that didn't as well is vertical, but perhaps that's not the case?). Also, I limit my question here to the non-100% versions, as I do think there are more common use cases for 100% stacked bars, since you get additional flexibility with multiple baselines to align by and compare across (top and bottom-most data series in vertical 100% stacked, or left and right-most in horizontal 100% stacked). But I’m struggling to come up with many great use cases for simple stacked bars.

Please share your thoughts and examples by emailing them to stackedbars@storytellingwithdata.com by Wednesday, 11/22. Is there an example from your work where you’ve used a stacked bar effectively that you can share? (Please don’t share anything confidential—anonymize as needed.) Have you seen good examples in the media or elsewhere? Are there use cases you can imagine where a stacked bar would work well? What considerations should we keep in mind when using stacked bars? It will work best if you can share a visual, even if it's a simple sketch like I’ve included above.

I will pull together what I receive into a follow-up post. Stay tuned on that front and in the meantime I look forward to hearing from you RE: stacked bars!

a tale about opportunity

One statement that I make often and emphasize repeatedly in my work is that when it comes to explanatory analysis, we should never simply show data; rather, we should make data a pivotal point in an overarching narrative or story. Today, I’ll take you through an example that illustrates this transition from showing data to using data to answer a question in a way that leads to new insight.

Let’s assume you work for the pharmaceutical company, Gleam. At Gleam, you focus on Product X (common abbreviation: PX), a medication for Aglebazoba (this is a real example, but I’ve anonymized it and had some fun with the names to preserve confidentiality—these names sound like a foreign language because that’s how pharmaceutical naming sounds to me!). You’ve been tasked with providing an update on Product X’s penetration in the marketplace.

After considering this for a bit and discussing with some colleagues, you decide there are two important things to consider. First, the disease doesn’t affect everyone equally. Rather, diagnoses tend to be classified by severity into Mild, Moderate, and Severe. So you decide that categorizing the data in this way will make sense. Second, when thinking about how to measure penetration, you decide that the population of those diagnosed with the disease is the most straightforward way to quantify the potential market currently. Given these considerations and the data you have on hand, you create the following visual.

Opportunity1.png
 

This graph looks pretty good. The design is clean, everything is titled and labeled. Severity increases as we move up the graph, which makes sense. N counts were included to tell me how many people each bar represents. Color has been used sparingly to focus the audience's attention, with words at the top to tell them why they should focus there. Let's consider the takeaway highlighted here: a greater proportion of Moderate patients are taking PX compared to the total diagnosed with Moderate severity Aglebazoba. That's interesting. But does it answer the question we set out to answer?

In the above, we're graphing the % of total across two categories: (1) total patients diagnosed and (2) total patients taking Product X. But what if rather than severity as a % of total, we make severity the primary category and within that look at those taking the drug out of total diagnosed? I'll do this in the following step, and will also switch from graphing percents to graphing the absolute numbers (we'll incorporate the percents back in momentarily). 

Opportunity2.png
 

In the above view, the overall length of the bars represents the total number of patients diagnosed with Aglebazoba. The blue portion represents those taking Product X. If percents are important, we could add labels on the blue bars. I'll do that in the next view. Note now that this isn't % of total taking Product X, but rather the % taking Product X out of the total diagnosed with the given level of severity.

Opportunity3.png
 

So 35% of those diagnosed as Severe are taking PX, 61% of those with Moderate severity are taking PX, and 23% of those with Mild severity take the drug. Note that we can see the same thing here that was highlighted in the original graph: a higher proportion of those with Moderate severity are taking Product X compared to the other severity levels. But with this view, I can also see something new: opportunity. The blue portions of the bar represent those currently taking PX. Which means the grey portions of the bar represent those who aren't currently taking Product X... but potentially could be. Let's show this as empty space to be filled in:

Opportunity4.png
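
The difference between the two ways of slicing the data is easy to lose, so here is a short Python sketch of the arithmetic. The patient counts are invented (the real N counts are not given here); I picked them only so that the within-severity percents match the 35%, 61%, and 23% quoted above. The function names are hypothetical.

```python
# Invented counts, chosen only to reproduce the percents quoted in the text.
diagnosed = {"Severe": 200, "Moderate": 100, "Mild": 300}
taking_px = {"Severe": 70, "Moderate": 61, "Mild": 69}


def share_of_total(counts):
    """Each severity level as a percent of the column total (the original view)."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


def penetration(taking, diagnosed):
    """Percent taking the drug out of those diagnosed at each severity level."""
    return {level: 100 * taking[level] / diagnosed[level] for level in diagnosed}


def opportunity(taking, diagnosed):
    """Diagnosed patients not currently taking the drug, per severity level."""
    return {level: diagnosed[level] - taking[level] for level in diagnosed}


pct_within_severity = penetration(taking_px, diagnosed)
not_yet_taking = opportunity(taking_px, diagnosed)
```

`share_of_total` corresponds to the first graph (severity as a percent of each group), while `penetration` and `opportunity` correspond to the blue and empty portions of the bars in the later views.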
 

Now I can see the opportunity. But let's emphasize that even more, via darker, thicker lines:

Opportunity5.png
 

When I look at the above, the labels in the blue portion of the bars seem to be competing for attention with the opportunity in white. That's an easy fix: let's label the white portion instead.

Opportunity6.png
 

I recognize I may be bothering some people when I graph absolute numbers and label with percents. If you fall within that camp, we could address that by taking the percents out of the graph...

Opportunity7.png
 

...but then tie the percents back in when we put all of the words around the visual to help make sure it makes sense to my audience and that they focus on the takeaway that I want them to. I see this as a tale about opportunity. Let's use words to make that point clear to my audience:

Opportunity8.png
 

After you've created a graph in response to a question, consider that question again. Too often, I find that we stick with the first way we aggregate the data and first view of it that we land on. It's easy to provide data that is relevant to a question without actually answering the question. If we step back and think about what sort of tale we can use the data to tell—is it a success, a failure, a call to action, or, as we've seen here, a tale about opportunity—it may reveal new ways to aggregate or visualize the data that will help you help your audience understand something new.

If interested, you can download the Excel file with the above visuals.

10/31 update: A couple people have commented that the tendency is to want to tie the blue percents in the text to the blue portions of the bars in the final iteration above, which is confusing. This is a great point (that's the Gestalt Principle of similarity, by the way, that makes us want to connect similar elements, like things that are colored the same). I've made an update to outline the opportunity in black and use black for those percents instead, as a way to visually make a distinction between the blue (people taking PX) and black (opportunity: those who aren't but could be taking PX) and tying the black portion visually to the percents in text through similar use of color. See below for the updated version. I think this resolves that prior confusion—let me know what you think!

Related thought: this is a great example of why it can be useful to seek input from others on our visual designs. When we get familiar with our data, we know intuitively how we want others to look at it, but this isn't necessarily how they will. Soliciting a fresh perspective is a great way to see our data through someone else's eyes and learn from this how to potentially further improve or refine our approach. Thanks for the feedback!

Opportunity9.png
 

learning through questions

 
Ask Cole FINAL.png
 

If you are a parent or spend time with young children I’m sure you can relate when I say, "Wow, kids ask a ton of questions, like, a TON of questions!" The remarkable thing is that they do so, all the time, everywhere, throughout the day and for some reason especially at bedtime (though I’m starting to become wise to their crafty delay tactics). Between my two boys, they often take a tag team approach—one asks the initial question, then the other chimes in with a follow-on query. Take for example, a recent dialogue during lunch:

AVERY: Why did the squirrels eat all the apples on our tree?
ME: Well, squirrels have to eat, just like you. They find their food outside, in places like our apple tree.
AVERY: But why doesn't the squirrel's mommy make them peanut butter and jelly sandwiches so we can have our apples?
DORIAN: I like peanut butter and jelly sandwiches. Do you like them too, Mommy?
ME: Yes, Dorian. Avery—squirrels can't really make sandwiches, that's why they look for nuts and fruit. Since our apple tree is there, they found the apples. Maybe a mommy squirrel was finding food for her baby squirrel.
DORIAN: Where do baby squirrels come from?
ME: Where's Daddy, boys?

If I step back, I can see that the seemingly never ending series of “Why? But why? How come?” is actually a very important part of kids' learning, development and retention. One can practically hear the gears turning in their heads as they process things from multiple angles.

Shifting to my work with storytelling with data, I notice that you have a lot of questions as well. Your queries come to me through many different channels: during workshops, after speaking engagements, via email, Twitter, LinkedIn, Facebook & Instagram, and in comments on my blog and YouTube channel. I enjoy engaging on these questions because I know this helps with the learning process and ultimately helps you be more effective and confident telling your stories with data. I also know that if someone takes the time to ask a question, there’s a good chance someone else was pondering the same thing.

My limited bandwidth makes it challenging to answer every single inquiry (and I'm sure I've missed some over time), so I’m excited to launch a new forum for answering a number of your questions each month. I'll be doing so through a novel medium for me—a podcast. I love podcasts because you can listen (and learn!) almost anywhere—on your morning run, during your daily commute, or while lounging at home. The SWD team will scour the various channels I mentioned for posted inquiries, but you can jump ahead of those lines by simply emailing your question to us at: askcole@storytellingwithdata.com. We’re recording our first ask cole podcast now to be aired soon, so submit the questions that are top of mind and will help you learn and make progress with your work.

...and for now, if we could just hold off on the squirrel chatter, that’d be great!

Looking forward to hearing from you!
