Big Data in Omics and Imaging: Integrated Analysis and Causal Inference addresses recent developments in integrated genomic, epigenomic and imaging data analysis and causal inference in the big data era. Despite significant progress in dissecting the genetic architecture of complex diseases by genome-wide association studies (GWAS), genome-wide expression studies (GWES), and epigenome-wide association studies (EWAS), the overall contribution of the newly identified genetic variants is small, and a large fraction of genetic variants remains hidden. The etiology and causal chain of mechanisms underlying complex diseases remain elusive. It is time to bring big data, machine learning and the causal revolution to bear on a new generation of genetic analysis, shifting the paradigm from shallow association analysis to deep causal inference, and from genetic analysis alone to integrated omics and imaging data analysis, in order to unravel the mechanisms of complex diseases.
FEATURES
- Provides a natural extension and companion volume to Big Data in Omics and Imaging: Association Analysis, but can be read independently
- Introduces causal inference theory to genomic, epigenomic and imaging data analysis
- Develops novel statistics for genome-wide causation studies and epigenome-wide causation studies
- Bridges the gap between traditional association analysis and modern causation analysis
- Uses combinatorial optimization methods and various causal models as a general framework for inferring multilevel omic and image causal networks
- Presents statistical methods and computational algorithms for searching causal paths from genetic variant to disease
- Develops causal machine learning methods that integrate causal inference and machine learning
- Develops statistics for testing significant differences in directed edges, paths, and graphs, and for assessing causal relationships between two networks
The book is designed for graduate students and researchers in genomics, epigenomics, medical imaging, bioinformatics, and data science. Topics covered include: mathematical formulation of causal inference, information geometry for causal inference, topological groups and Haar measure, additive noise models, distance correlation, multivariate causal inference and causal networks, dynamic causal networks, multivariate and functional structural equation models, mixed structural equation models, causal inference with confounders, integer programming, deep learning and differential equations for wearable computing, genetic analysis of function-valued traits, RNA-seq data analysis, causal networks for genetic methylation analysis, gene expression and methylation deconvolution, cell-specific causal networks, deep learning for image segmentation and image analysis, imaging and genomic data analysis, and integrated multilevel causal genomic, epigenomic and imaging data analysis.
About the Author
Momiao Xiong is a professor of Biostatistics at the University of Texas Health Science Center in Houston where he has worked since 1997. He received his PhD in 1993 from the University of Georgia.
Contents
1. Genotype-Phenotype Network Analysis
Undirected Graphs for Genotype Network
Gaussian Graphical Model
Alternating Direction Method of Multipliers for Estimation of Gaussian Graphical Model
Coordinate Descent Algorithm and Graphical Lasso
Multiple Graphical Models
Directed Graphs and Structural Equation Models for Networks
Directed Acyclic Graphs
Linear Structural Equation Models
Estimation Methods
Sparse Linear Structural Equations
Penalized Maximum Likelihood Estimation
Penalized Two Stage Least Square Estimation
Penalized Three Stage Least Square Estimation
Functional Structural Equation Models for Genotype-Phenotype Networks
Functional Structural Equation Models
Group Lasso and ADMM for Parameter Estimation in the Functional Structural Equation Models
Causal Calculus
Effect Decomposition and Estimation
Graphical Tools for Causal Inference in Linear SEMs
Identification and Single-door Criterion
Instrumental Variables
Total Effects and Backdoor Criterion
Counterfactuals and Linear SEMs
Simulations and Real Data Analysis
Simulations for Model Evaluation
Application to Real Data Examples
Appendix 1A
Appendix 1B
Exercises
Figure Legend
2. Causal Analysis and Network Biology
Bayesian Networks as a General Framework for Causal Inference
Parameter Estimation and Bayesian Dirichlet Equivalent Uniform Score for Discrete Bayesian Networks
Structural Equations and Score Metrics for Continuous Causal Networks
Multivariate SEMs for Generating Node Score Metrics
Mixed SEMs for Pedigree-based Causal Inference
Bayesian Networks with Discrete and Continuous Variables
Two-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks
Multiple Network Penalized Functional Logistic Regression Models for NGS Data
Multi-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks
Other Statistical Models for Quantifying Node Score Function
Integer Programming for Causal Structure Learning
Introduction
Integer Linear Programming Formulation of DAG Learning
Cutting Plane for Integer Linear Programming
Branch and Cut Algorithm for Integer Linear Programming
Sink Finding Primal Heuristic Algorithm
Simulations and Real Data Analysis
Simulations
Real Data Analysis
Figure Legend
Software Package
Appendix 2A Introduction to Smoothing Splines
Smoothing Spline Regression for a Single Variable
Smoothing Spline Regression for Multiple Variables
Appendix 2B Penalized Likelihood Function for Jointly Observational and Interventional Data
Exercises
Figure Legend
3. Wearable Computing and Genetic Analysis of Function-valued Traits
Classification of Wearable Biosensor Data
Introduction
Functional Data Analysis for Classification of Time Course Wearable Biosensor Data
Differential Equations for Extracting Features of the Dynamic Process and for Classification of Time Course Data
Deep Learning for Physiological Time Series Data Analysis
Association Studies of Function-Valued Traits
Introduction
Functional Linear Models with both Functional Response and Predictors for Association Analysis of Function-valued Traits
Test Statistics
Null Distribution of Test Statistics
Power
Real Data Analysis
Association Analysis of Multiple Function-valued Traits
Gene-gene Interaction Analysis of Function-Valued Traits
Introduction
Functional Regression Models
Estimation of Interaction Effect Function
Test Statistics
Simulations
Real Data Analysis
Figure Legend
Appendix 3.A Gradient Methods for Parameter Estimation in the Convolutional Neural Networks
Multilayer Feedforward Pass
Backpropagation Pass
Convolutional Layer
Exercises
4. RNA-seq Data Analysis
Normalization Methods on RNA-seq Data Analysis
Gene Expression
RNA Sequencing Expression Profiling
Methods for Normalization
Differential Expression Analysis for RNA-Seq Data
Distribution-based Approach to Differential Expression Anal…