Computational analysis of natural science experiments often confronts noisy data due to natural variability in the environment or in measurement. Drawing conclusions in the face of such noise requires statistical analysis. Parametric statistical methods assume that the data is a sample from a population that can be characterized by a specific distribution (e.g., a normal distribution). When that assumption holds, parametric approaches can yield high-confidence predictions. In many cases, however, the assumed distribution does not hold, and assuming it anyway may lead to false conclusions. The companion book, Statistics is Easy, gave a (nearly) equation-free introduction to nonparametric (i.e., making no distribution assumption) statistical methods. The present book applies data preparation, machine learning, and nonparametric statistics to three quite different life science datasets. We provide the code as applied to each dataset in both R and Python 3. We also include exercises for self-study or classroom use.
About the Author
Manpreet Singh Katari is a Clinical Associate Professor and the Coordinator of Computational Studies in the Biology Department of New York University. In addition to teaching courses ranging from Statistics and Programming to Machine Learning and Analysis of Next-Generation Sequencing Data, he collaborates with researchers in the area of Plant Systems Biology. His main passion is developing software that empowers researchers to analyze, integrate, and visualize large-scale genomic datasets. Although his work has been primarily in the model plant species Arabidopsis thaliana, he has applied his knowledge to many crops, such as rice, corn, banana, and cassava, and also to human disease datasets such as cancer.