Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and to contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theory. It aims to serve both as a graduate-level textbook and as a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical study and empirical application.

The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and extends the techniques of model building via nonparametric regression and kernel tricks. It gives a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is thoroughly addressed, as is feature screening. The book also gives a comprehensive account of high-dimensional covariance estimation and of learning latent factors and hidden structures, together with their applications to statistical estimation, inference, prediction, and machine learning problems. Finally, it thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.
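
As a taste of the sparsity theme, here is a minimal sketch contrasting ridge (L2) and lasso (L1) penalized fits on a design with more features than observations. It uses scikit-learn in Python purely for illustration; this library and all names below are our own choices, not taken from the book.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    # Simulated sparse high-dimensional setting: p > n, only 5 true signals.
    rng = np.random.default_rng(0)
    n, p = 100, 200
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # Ridge (L2) shrinks coefficients toward zero but rarely zeroes any out;
    # lasso (L1) performs model selection by setting most coefficients to
    # exactly zero, recovering (approximately) the 5-variable true model.
    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)
    print("nonzero ridge coefficients:", int(np.sum(ridge.coef_ != 0)))
    print("nonzero lasso coefficients:", int(np.sum(lasso.coef_ != 0)))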



About the Authors

The authors are international authorities on, and leaders in, the topics presented. All are Fellows of the Institute of Mathematical Statistics and the American Statistical Association.

Jianqing Fan is Frederick L. Moore Professor at Princeton University. He is co-editor of the Journal of Business & Economic Statistics and was co-editor of The Annals of Statistics, Probability Theory and Related Fields, and the Journal of Econometrics. His honors include the 2000 COPSS Presidents' Award, fellowship in the AAAS, a Guggenheim Fellowship, the Guy Medal in Silver, the Noether Senior Scholar Award, and election as Academician of Academia Sinica.

Runze Li is Eberly Family Chair Professor at Pennsylvania State University and an AAAS Fellow, and was co-editor of The Annals of Statistics.

Cun-Hui Zhang is Distinguished Professor at Rutgers University and was co-editor of Statistical Science.

Hui Zou is Professor at the University of Minnesota and was an action editor of the Journal of Machine Learning Research.




Contents

1. Introduction

Rise of Big Data and Dimensionality

Biological Sciences

Health Sciences

Computer and Information Sciences

Economics and Finance

Business and Program Evaluation

Earth Sciences and Astronomy

Impact of Big Data

Impact of Dimensionality

Computation

Noise Accumulation

Spurious Correlation

Statistical theory

Aim of High-dimensional Statistical Learning

What big data can do

Scope of the book

2. Multiple and Nonparametric Regression

Introduction

Multiple Linear Regression

The Gauss-Markov Theorem

Statistical Tests

Weighted Least-Squares

Box-Cox Transformation

Model Building and Basis Expansions

Polynomial Regression

Spline Regression

Multiple Covariates

Ridge Regression

Bias-Variance Tradeoff

Penalized Least Squares

Bayesian Interpretation

Ridge Regression Solution Path

Kernel Ridge Regression

Regression in Reproducing Kernel Hilbert Space

Leave-one-out and Generalized Cross-validation

Exercises

3. Introduction to Penalized Least-Squares

Classical Variable Selection Criteria

Subset selection

Relation with penalized regression

Selection of regularization parameters

Folded-concave Penalized Least Squares

Orthonormal designs

Penalty functions

Thresholding by SCAD and MCP

Risk properties

Characterization of folded-concave PLS

Lasso and L1 Regularization

Nonnegative garrote

Lasso

Adaptive Lasso

Elastic Net

Dantzig selector …
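
The Chapter 3 entries above all revolve around one optimization problem. As a reference point, the standard folded-concave penalized least-squares formulation from the penalization literature is given below; the notation is ours and is not quoted from the book.

    % Penalized least squares: fit plus a penalty on each coefficient.
    \min_{\beta \in \mathbb{R}^p} \;
      \frac{1}{2n}\,\lVert \mathbf{y} - \mathbf{X}\beta \rVert_2^2
      + \sum_{j=1}^{p} p_\lambda\bigl(\lvert \beta_j \rvert\bigr)
    % The choice p_\lambda(t) = \lambda t recovers the lasso, while the
    % folded-concave SCAD penalty is defined through its derivative:
    p_\lambda'(t) = \lambda \Bigl\{ I(t \le \lambda)
      + \frac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t > \lambda) \Bigr\},
      \qquad a > 2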

Title
Statistical Foundations of Data Science
EAN
9781466510852
Format
E-book (PDF)
Publication date
September 20, 2020
Digital copy protection
Adobe DRM
Number of pages
774