{"id":230,"date":"2009-05-27T12:37:16","date_gmt":"2009-05-27T10:37:16","guid":{"rendered":"http:\/\/occamstypewriter.org\/boboh\/2009\/05\/27\/help_how_do_i_deal_with_microarrays\/"},"modified":"2009-05-27T12:37:16","modified_gmt":"2009-05-27T10:37:16","slug":"help_how_do_i_deal_with_microarrays","status":"publish","type":"post","link":"https:\/\/occamstypewriter.org\/boboh\/2009\/05\/27\/help_how_do_i_deal_with_microarrays\/","title":{"rendered":"Help!  How Do I Deal With Microarrays?"},"content":{"rendered":"<p>In the past I have ranted about <a href=\"http:\/\/network.nature.com\/people\/boboh\/blog\/2008\/08\/19\/why-p-values-are-evil\">the evils of p-values<\/a> and also about how <a href=\"http:\/\/network.nature.com\/people\/boboh\/blog\/2008\/06\/29\/its-the-wrong-data-grommit\">we&#8217;re not collecting the right sort of data<\/a>.  Both of these have just collided in my work, and I&#8217;m not sure what to do.<\/p>\n<p><!--more--><br \/>\nThe problem (simplified, and with details removed to protect the guilty) is this.  We have a large microarray study.  The data are expression levels of thousands of genes under several treatments, with a few random effects thrown in as well.  What we want to do is pick out the interesting genes, i.e. those which show a difference between treatments, and those which have a big random effect.  And, of course, those with both.  Once we have a list of these, we can ask whether particular types of gene are behaving in interesting ways, or whether the pattern looks random.<br \/>\n<a href=\"http:\/\/bioinformatics.biology.ualberta.ca\/\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/bioinformatics.biology.ualberta.ca\/images\/cmm_microarray.jpg\" alt=\"\" width=\"500\" height=\"494\" \/><\/a><br \/>\n<em>A randomly selected image of a lot of spots<\/em><br \/>\nNow, the traditional way of dealing with this is to calculate p-values and declare the genes with p&lt;0.05 (possibly after a correction for multiple testing) to be interesting.  
For reasons why that&#039;s a bad idea <a href=\"http:\/\/network.nature.com\/people\/boboh\/blog\/2008\/08\/19\/why-p-values-are-evil\">go here<\/a>, and don&#8217;t come back until you&#8217;ve fully digested the lesson and have taken to heart that the author is the greatest thinker since the inventor of the number <a href=\"http:\/\/wiki.answers.com\/Q\/What_do_you_get_when_you_multiply_6_by_9\">54<\/a>.  The random effects make this worse: the test would be of whether the variance is greater than zero.  But in reality the variance is always &gt;0, so the test is really, really silly: we know the null hypothesis is false before we start, and the test only tells us whether we have enough data to notice.<br \/>\nInstead we could simply set a threshold, say 2, and call any gene with an estimated treatment effect larger than 2 (or smaller than -2) &#8220;significant&#8221;.  But if we do that, we will tend to pick out genes with less reliable estimates, because a noisier estimate is more likely to cross the threshold by chance.<br \/>\nSo, having eliminated the two obvious approaches, what to do?  At the moment I don&#8217;t know, which is why I&#8217;m blogging this: I&#8217;m hoping for some suggestions, or at least hints.  I&#8217;m particularly unclear about what is done with the genes afterwards.  It seems that people produce summaries like pie charts, and say things like &#8220;this group of genes is over-represented&#8221;.  What is not clear to me is what else is done: what questions are being asked?<br \/>\nIf the questions are clear and precise, then I think the statistics can take over: for example, we can weight each gene by the reliability of its estimate (e.g. using its standard error).  It might even be unnecessary to pick out important genes: we could use all of them, or be liberal in choosing which to include.<br \/>\nSo, I&#8217;m interested to hear any thoughts on this.  In particular, if we have a treatment and some genes that it (might) affect, what sort of questions are being asked about those genes?  
How do we go from this list of genes to something that&#8217;s useful?  Or, better, how do we <em>want<\/em> to go from this list of genes to something that&#8217;s useful?  If we can refine the biological questions, we can wheel out the statistical machinery more effectively.<br \/>\nAs you can see, this is a research problem, so any useful ideas might turn into a paper.  Contributions will be fully acknowledged, of course.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the past I have ranted about the evils of p-values and also how we&#8217;re not collecting the right sort of data. Both of these have just collided in my work, and I&#8217;m not sure what to do.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-230","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/posts\/230","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/comments?post=230"}],"version-history":[{"count":0,"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/posts\/230\/revisions"}],"wp:attachment":[{"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/media?parent=230"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/categories?post=230"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/occamstypewriter.org\/boboh\/wp-json\/wp\/v2\/tags?post=230"}],
"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}