Separating the l33t from the graph

Posted on October 18, 2011 by Cath@VWXYNot?

Were you ever explicitly taught which graphs to use to represent different types of scientific data?

I remember some very basic lessons on this subject in high school maths (and possibly biology), but once I reached university it was never again included in my formal scientific education. However, just as I received very little formal English grammar instruction but have always had a (generally) good feel for what’s right and what’s wrong just from reading anything and everything I could get my hands on*, I’ve managed to absorb some scientific graph conventions – seemingly by osmosis – from the literature, lab meetings, seminars, and poster sessions.

Unfortunately, I’ve seen enough recent examples of poor graph format choices to make me wonder whether universal formal training in this particular Dark Art is warranted…

…or whether certain people who are really far enough along in their careers to know better just don’t pay enough attention to seminars and papers.

Take the example below. The left panel recreates a graph I saw presented recently, but with fictional data; the right panel is the way I would have done it. The y axis could represent any phenotype of interest, so I left it blank:

Doesn’t the second version give you a much better sense of the relative effectiveness of the two test compounds? The original took me too long to decipher, and required what I thought was far too much on-screen text to label the various data points; this meant that I missed much of what the presenter was saying as I tried to figure out what the data were saying.

On a similar note I’ve also seen people present multiple Western blot panels (for four different conditions, sampled at the same time points) side-by-side, with separate (but identical) time point labels across the top of each one, instead of stacking them one under the other. Again, I would have found it much easier to compare the effects of the different variables if the latter approach had been used.

The second example (from a different person) is less clear-cut, I think, as the original version (on the left, again with fictional data) is just as informative as my version (on the right):

However, I think a bar chart is preferable in this case; the line chart is reminiscent of a dose-response curve, which this most certainly is not.

Were you taught which graphs to use, or did you just figure it out for yourself?

Does anyone know of any good resources to which I could direct any future offenders?

[Figure: “Cath’s blog satisfaction index” plotted against number of comments.] Extrapolate to derive the coefficient of patheticness!

~~~~~~~~~~~~~~~~~~~~~~~~~~~

*any grammatical errors in this post are, of course, intentional.

About Cath@VWXYNot?

"one of the sillier science bloggers [...] I thought I should give a warning to the more staid members of the community." - Bob O'Hara, December 2010


This entry was posted in career, communication, education, English language, science.

25 Responses to Separating the l33t from the graph

Alyssa says:

October 19, 2011 at 12:38 am

You know, now that you’ve brought it to my attention, I have never been told what kind of graph to use for different data sets. I guess I have just figured it out by seeing what others do. Perhaps that’s why I’m always incredibly impressed when I see a new type of graphical representation.
Liz says:

October 19, 2011 at 3:02 am

ooh, poor graph format choices are a definite pet peeve of mine. Your ‘uncorrected’ example one would drive me nuts!

I do remember a course in 1st year engineering where we learned a variety of non-technical skills and graph format was one topic. I don’t remember many details but some were just strange (does anyone use the “box-and-whisker” graph in real life??)

Like Alyssa, I do appreciate a unique and informative graph format but I find, when people try to be novel, it turns out badly more often than not. In a recent journal club, one of the main figures in a high impact journal was a bar graph arranged such that the bars extended radially. I don’t know if this graph has a name or a purpose, but the circular shape didn’t seem to add any relevant info in this case and just led to us discussing the weirdness of the graph as opposed to the content of the figure – possibly this was the sneaky intent of the authors, who knows!
- chall says:
  
  October 19, 2011 at 4:06 pm
  
  I’ve used the “box-and-whisker” (boxplot) graphs for a lot of my mouse studies since it’s a good/easy way to see outliers and medians/averages etc. between various groups of mice/treatments/expression and more….
  
  That said, it hasn’t always ended up in the actual paper, but for Ig-titer (dose/response/protection) comparisons between different groups and doses it made it in, since it can show things much more clearly than other graphs.
  - Cath@VWXYNot? says:
    
    October 20, 2011 at 5:12 pm
    
    I’d never heard them called that (I immediately knew what you meant, but Googled to make sure!), but I still see a lot of people use them. Clinicians seem especially fond of them.
    - chall says:
      
      October 20, 2011 at 5:15 pm
      
      yes, they do show when a patient/(mouse ^^) is outside the “mean/median” so you can sort of tell the outliers are not “what you would expect”. Although, you still show them, and don’t disregard them – which is important when you talk about “side effects” for example. At least that’s what they told me when I started using them a lot…. 😉
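For anyone who, like Liz, wonders what a box-and-whisker plot actually encodes, the thread above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the titre values are made up, `box_summary` is a hypothetical helper name, and the 1.5 × IQR fence is the common Tukey convention rather than anything the commenters specified.

```python
import statistics


def box_summary(values):
    """Return the numbers a box-and-whisker plot draws (Tukey convention)."""
    # Quartiles: the box spans q1..q3 and the line inside it is the median.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach to the most extreme points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    # Points beyond the fences are drawn individually, not discarded.
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
        "mean": statistics.mean(values),
    }


# Hypothetical titres for one treatment group (made-up numbers).
titres = [10, 12, 13, 14, 15, 16, 18, 60]
summary = box_summary(titres)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these made-up numbers the single value of 60 lands outside the upper fence and is reported as an outlier, and it drags the mean (19.75) well above the median (14.5). That gap is the kind of difference a bar chart of means alone would hide.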
ricardipus says:

October 19, 2011 at 3:31 am

The left panel of your second example is just plain dumb. Don’t people know you NEVER connect points that are from unrelated experiments? There’s no slope to draw, just two discrete points. Your bar chart is the correct approach.

Oh, I guess this stuff bugs me too. Almost as much as generic Excel colour schemes. 😉

You realize you’ve just raised the bar for the hockey pool results, right?
ricardipus says:

October 19, 2011 at 3:32 am

Also, where are the error bars? Just sayin’. 😉
rpg says:

October 19, 2011 at 9:18 am

Wow.

I guess I must have been taught at some point–but like you, for me it’s something that just feels right. Grammar too, innit?
KJHaxton says:

October 19, 2011 at 9:31 am

This is currently the curse of my waking hours – trying to get students to understand what an effective graph looks like versus the default format that Excel picked. The best advice I’ve heard is to look at how data are represented in good journals related to your work and use those styles. Deviate from that only when absolutely necessary and justifiable. I’m not in favour of field-specific conventions of presentation (restrictive and anal) but if people can identify spectral data or dose-response data at a glance because of effective presentation then that’s the best way. In a few months, when the growling and gnashing of teeth have passed, I may post some images of graphs that upset me…
Klaas Wynne says:

October 19, 2011 at 10:24 am

By the way, it is really bad practice to connect data points with lines as you do in your first graph on the right. I was always told (god knows who by) that you should only use a line if that line represents a fitting function.
- ricardipus says:
  
  October 19, 2011 at 1:36 pm
  
  +1. As drawn, it implies that something is known about the behaviour of the system at intermediate time points. Which might be a reasonable assumption for the untreated sample (looks like no effect to me), but is nevertheless unproven by the data.
  
  *goes off on a tangential rant about something or other*
  - Cath@VWXYNot? says:
    
    October 19, 2011 at 7:23 pm
    
    see, I’ve never been taught that (nor Chall’s point about not connecting points unless there are at least four of them), although it is obviously very sensible. Can I plead ignorance due to the examples I learned from being full of instances of connected data points? 🙂
    
    I maintain that the lines in my first example make it easier to do a quick-n-dirty comparison between the two test compounds 😉
chall says:

October 19, 2011 at 4:01 pm

ARGH! I wrote the longest answer but the page just wiped it all clean (darned IE problems)

I’ll be brief and state my views in bullet points since I’m angry now
*it’s a pet peeve of mine
*I learned it in Statistics class (and Math) and later taught my undergraduate students some basics: “n-value, triplicates, standard deviation/error depending on population genetics or different samples, graph options” and, most importantly, NOT to draw a line between points that a) have no function made for them, and above all b) number only three, since 3 dots are not a line/curve. Need 4 to be able to make a ‘true’ function…..
*I searched and found three good sites that summarize the “choose a graph” question
First – Canadian governmental : http://www.statcan.gc.ca/edu/power-pouvoir/ch9/using-utilisation/5214829-eng.htm
Second: free summarizing on one page (partly similar to the handout help I gave my students, although I did mention st dev bars on samples to show at least triplicates)
http://go.hrw.com/resources/go_mt/hm3/so/c3ch9bso.pdf
Third an overview of which obvious choices to make: http://math.youngzones.org/stat_graph.html
And fourth and last, a quick comparison showing why it is sometimes (always!) important to ask WHAT is it you want to show?! http://cstl.syr.edu/fipse/tabbar/compare/compare.htm

(As a side note, I would like to be able to say “No, you can’t use Excel for the statistics, but rather a real statistical program where you can actually add standard deviation on each category. Not to mention actually calculate things….” Not that I hate Excel, but it’s not a statistical program….)
- Laurence Cox says:
  
  October 22, 2011 at 9:35 pm
  
  I pretty much agree with everything you wrote, but have a couple of points to add:
  
  1) All the examples use linear axes, but non-linear axes can be useful too. A log scale will convert an exponential curve into a straight line, whilst a log-log plot will do the same for a power law curve. You can even get probability axes that transform a Gaussian into a straight line, useful for showing small deviations from a Gaussian curve.
  
  2) The circular plot Liz referred to is usually called a radar plot (or sometimes a polar plot). It is really only useful if you want to plot something with an angular dependence, for example the visibility of an object as a function of its orientation. It came into common use for representing the response of a radar antenna to off-axis sources (hence the name). In this case the radial axes are often logarithmic.
  
  Also, here is a useful little program you can download free:
  http://www.ucs.louisiana.edu/~kxk4695/StatCalc.htm
  - Laurence Cox says:
    
    October 23, 2011 at 3:26 pm
    
    It’s normally bad form to reply to one’s own comments, but I thought of a good illustration of the value of non-linear axes.
    
    In the final graph of the blog where Cath plots “Cath’s blog satisfaction index” against number of comments and invites the reader to extrapolate to derive the coefficient of patheticness, the points look quite close to y=2^(x-1). Suppose this was the case, then plotting log(y) rather than y against x would give a straight line and deriving the coefficient of patheticness would be far easier for the reader.
    
    Similarly, the points might be derived from experiment – the number of times Cath checks on the comments to her blog post – and we might be interested in whether the best fit to this is y=2^(x-1) or perhaps something slightly different, say y=1.9^(x-0.9). The log scale on the y axis makes this sort of distinction much easier to see.
    
    So a good general rule for graphs is to choose axes that make life easiest for the reader.
    
    In some work that I do, I use log-log plots because I am looking at something that has a power-law decay (y = x^-n), where the gradients (n) of the power law and the transitions from one gradient to another seem to be good discriminators between samples of different types.
    - Cath@VWXYNot? says:
      
      October 24, 2011 at 10:14 pm
      
      Excellent points, Laurence – again, I’ve never explicitly been taught about when to use a log axis to present scientific data, although I do remember it from A Level maths.
      
      I don’t think I’ve ever generated any real-life data that should have been plotted on a log axis, but I may be completely wrong, being ignorant of such things 🙂
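Laurence’s point about non-linear axes can be checked numerically. The sketch below is illustrative only: the data are generated from the formulas mentioned in the thread (y = 2^(x-1), and a power law with an assumed exponent of n = 1.5), and `fit_line` is a hypothetical helper doing an ordinary least-squares straight-line fit. On the right axes, the slope and intercept of the straight line hand back the parameters of the curve.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Exponential case: y = base^(x - offset), here the thread's y = 2^(x-1).
comments = [1, 2, 3, 4, 5, 6]
index = [2 ** (x - 1) for x in comments]

# Semi-log view: log2(y) = log2(base) * x - offset * log2(base),
# so plotting log(y) against x gives a straight line.
slope, intercept = fit_line(comments, [math.log2(y) for y in index])
base = 2 ** slope
offset = -intercept / slope
print(f"exponential fit: base={base:.3f}, offset={offset:.3f}")

# Power-law case: y = x^(-n), with n = 1.5 assumed for illustration.
xs = [1, 2, 4, 8, 16]
ys = [x ** -1.5 for x in xs]

# Log-log view: log(y) = -n * log(x), so the gradient is -n.
gradient, _ = fit_line([math.log10(x) for x in xs],
                       [math.log10(y) for y in ys])
print(f"power-law fit: n={-gradient:.3f}")
```

The same fit would distinguish y = 2^(x-1) from something like y = 1.9^(x-0.9): on a log axis the two differ in slope and intercept, which is much easier to judge by eye than two nearly overlapping curves on a linear axis.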
Nina says:

October 19, 2011 at 6:25 pm

Argh. These are all awful examples and I feel bad for the people who use them.
I was taught throughout my courses what graphs to use. We always had to present stuff and then the profs would criticise our graphs in “public”. That surely helped.
I also TOTALLY agree with Chall that excel is NOT a statistics program. At best it is a tool to convert your data into txt files to import into statistics programs …

Finally, I know English grammar is hard, but I learned it in high school and already then had the feeling my grammar was better than my American cousin’s…
Cath@VWXYNot? says:

October 19, 2011 at 7:21 pm

Alyssa, do you read the Flowing Data and Information is Beautiful blogs? There are some amazing examples there, but they’re not exactly applicable to your standard scientific presentation!

Liz, that radial graph sounds infuriating! It sounds almost worse than a pie chart for comparing two different values!

Ricardipus, sorry about the colour scheme – I always change the colours for presentations, publications and other formal documents, but didn’t bother for this blog post! I also couldn’t be bothered to create fictional replicates for each point that would generate the right means with reasonable error bars. Deal with it ;-p

(actually, I don’t think either of the examples I used had error bars in the original version, either. The second one definitely didn’t, and I don’t remember them being on the first one either).

RPG, I grammar good without no learning.

KJ, being able to immediately identify what kind of data you’re looking at is an aspect I hadn’t considered.

I hope you can bring yourself to post some of those graphs at some point!

Chall, I am always quick to admit that my grasp of stats beyond the absolute basics is extremely shaky! Luckily, I don’t have to actively do any more – just edit the text used to describe the stats (hopefully without altering the meaning!)

Thanks for the links – I had a quick glance at them but will read them in more depth later!

Nina, that experience of public criticism must have been very valuable! I’ve never seen anyone pick on the choice of graphs when sitting through one of these presentations, but maybe I should start a trend 🙂

I actually learned quite a bit about English grammar when I first started being taught the French and German versions – certain things suddenly made more sense!

My parents were both languages teachers before their retirement, and my Dad especially despaired of his students’ grasp of English grammar. A student once actually told him that it was unfair that they had to learn this stuff in French; apparently “English is so much easier because we don’t have grammar!”
bean-mom says:

October 21, 2011 at 3:22 am

Cath,
During my brief stint as a full-time science editor, I saw some terrible data presentations. But not as terrible as what you’ve shown here.

Yup, I was never formally taught data presentation… just as I wasn’t formally taught how to give a talk or write a paper. You observe, see what works and what doesn’t, do it yourself for the first time (and take the heat) and learn. Yesterday I was talking with a grad student and realized how very little *formal* training we get in what is most important. But the truth is I think that applies in most fields; you learn on the job, as you go along. Formal seminars/classes/workshops often don’t seem that helpful (although maybe that’s just because I haven’t been to good ones).

By the way, chall, I just did my first real box-and-whisker plot this week! I love it! Nice way to show distribution of data and see differences you would miss if you were to just take the means (esp. if your data is not normally distributed). I like graphically showing the outliers, too.

Oh, by the way, I can’t articulate any rules of grammar at all. Has never stopped me from editing grammar.
Cath@VWXYNot? says:

October 22, 2011 at 1:01 am

Oh, you are so right… the most important stuff is the least likely to be explicitly taught! I’m lucky in that I’m very good at learning from example and mimicking the conventions of different formats (as, I suspect, are most scientists), but I do think that many people would benefit from formalising this aspect of our training.

Your last statement rings very true as well; I’ve often been asked by non-native English speakers why I changed their grammar, and have never had a better answer than “because it didn’t look right”. This used to drive my former French room-mate crazy, as she wanted me to correct every mistake she made, and then wanted to know the actual rules; I eventually gave her my Dad’s phone number and let her bother him instead! She called him two or three times in total, I think – he’s a teacher so I figured it’s his job 😀 (don’t worry, he loved it)
bean-mom says:

October 22, 2011 at 1:57 am

The professional scientific editor at my institute (who works with any lab or person that needs him) can tell you precisely what the grammatical rules are and why what you’ve written is wrong and why his edits are correct. Frankly, I don’t think that in itself makes him a better editor. . . but it does mean that he can justify himself with technical rules and he can explain the rules. And I understand why that would be reassuring =) (I think I would go nuts, too, if I were trying to write in a foreign language and no one would explain to me what the rules were!)
- Cath@VWXYNot? says:
  
  October 24, 2011 at 10:15 pm
  
  Agreed! Sometimes I think it would be nice to be able to provide better explanations for my edits, but like you said, it (probably) wouldn’t actually change the nature of my edits! (much!)
Massimo says:

October 23, 2011 at 1:57 pm

Interesting subject. In physics you are never really taught how to plot stuff either, not formally and in any detail anyway, and I think it is because you run into plots right away, almost starting from your first physics class. It must be sort of understood that people figure out how to plot.
I have never run into egregious cases of plots botched as badly as in your examples, but I have my personal “pet peeves”.
One thing that truly annoys me is when I see plots in which points are connected by straight lines, like in your above examples. I am thinking that it normally is done by the plotting software by default, and people do not bother to remove the connecting lines, but in my opinion it is just plain wrong. It is misleading, in that it implies (at least implicitly) a specific behaviour of the underlying quantity. In my opinion, points have to be plotted by themselves, without any connecting line. The only line that one should draw is a fitting line.
- Cath@VWXYNot? says:
  
  October 24, 2011 at 10:20 pm
  
  Yep, no more linked dots for me after what I’ve learned from this comments thread! (Except for in hockey pool updates, because I maintain that in that context the crossing lines make it easier to immediately determine who’s had the most dramatic change in fortunes and should therefore be either mocked or envied).
  
  I don’t remember seeing many graphs in my earliest biology classes – so much of it is qualitative rather than quantitative. We did some experiments that generated quantitative data in high school biology, though not as many as in chemistry (I’ve managed to blank out most of my physics instruction). In high school I seem to remember that we were always told what kind of graph to draw (I didn’t have a computer at home that would do anything other than play 8-bit games loaded from a cassette player – almost nobody did – so I do literally mean draw). So my first genuine experiences must have been in undergrad labs.