Statisticians have met the need to test hundreds or thousands of genomics hypotheses simultaneously with novel empirical Bayes methods that combine advantages of traditional Bayesian and frequentist statistics. Techniques for estimating the local false discovery rate assign probabilities of differential gene expression, genetic association, etc. without requiring subjective prior distributions. This book brings these methods to scientists while keeping the mathematics at an elementary level. Readers will learn the fundamental concepts behind local false discovery rates, preparing them to analyze their own genomics data and to critically evaluate published genomics research.
Key Features:
* dice games and exercises, including one using interactive software, for teaching the concepts in the classroom
* examples focusing on gene expression and on genetic association data and briefly covering metabolomics data and proteomics data
* gradual introduction to the mathematical equations needed
* how to choose between different methods of multiple hypothesis testing
* how to convert the output of genomics hypothesis testing software to estimates of local false discovery rates
* guidance through the minefield of current criticisms of p values
* previously unpublished material on non-Bayesian prior p values and posterior p values
About the Author
David R. Bickel is an Associate Professor in the Department of Biochemistry, Microbiology and Immunology of the University of Ottawa and a Core Member of the Ottawa Institute of Systems Biology. Since 2011, he has been teaching classes focused on the statistical analysis of genomics data. While working as a biostatistician in academia and industry, he has published new statistical methods for analyzing genomics data in leading statistics and bioinformatics journals. He is also investigating the foundations of statistical inference. For recent activity, see davidbickel.com or follow him at @DavidRBickel (Twitter).
Contents
1. Basic probability and statistics
Biological background
Probability distributions
Probability functions
Contingency tables
Hypothesis tests and p values
Bibliographical notes
Exercises (PS1-PS3)
2. Introduction to likelihood
Likelihood function defined
Odds and probability: What's the difference?
Bayesian uses of likelihood
Bibliographical notes
Exercises (L1-L3)
3. False discovery rates
Introduction
Local false discovery rate
Global and local false discovery rates
Computing the LFDR estimate
Bibliographical notes
Exercises (L4; A-B)
4. Simulating and analyzing gene expression data
Simulating gene expression with dice
DE games
Effects and Estimates (E&E)
Under the hood: normal distributions
Bibliographical notes
Exercises (C-E; G1-G4)
5. Variations in dimension and data
Introduction
High-dimensional genetics
Subclasses and superclasses
Medium number of features
Bibliographical notes
Exercise (G5)
6. Correcting bias in estimates of the false discovery rate
Why correct the bias in estimates of the false discovery rate?
A misleading estimator of the false discovery rate
Corrected and re-ranked estimators of the local false discovery rate
Application to gene expression data analysis
Bibliographical notes
Exercises (CFDR0-CFDR3)
7. The L value: An estimated local false discovery rate to replace a p value
What if I only have one p value? Am I doomed?
The L value to the rescue!
The multiple-test L value
Bibliographical notes
Exercises (LV1-LV9)
8. Maximum likelihood and applications
Non-Bayesian uses of likelihood
Empirical Bayes uses of likelihood
Bibliographical notes
Exercises (M1-M2)
Appendix A. Generalized Bonferroni correction derived from conditional compatibility
A non-Bayesian approach to testing single and multiple hypotheses
Bibliographical notes
Appendix B. How to choose a method of hypothesis testing
Guidelines for scientists performing statistical hypothesis tests
Bibliographical notes
Appendix. Bibliography