This book gives a comprehensive and systematic account of high-dimensional data analysis, including variable selection via regularization methods and sure independence feature screening methods. It is a valuable reference for researchers involved in model selection, variable selection, machine learning, and risk management.
About the Authors
The authors are international authorities and leaders in the topics presented. All are fellows of the Institute of Mathematical Statistics and the American Statistical Association.
Jianqing Fan is Frederick L. Moore Professor at Princeton University. He is a co-editor of the Journal of Business and Economic Statistics and was co-editor of The Annals of Statistics, Probability Theory and Related Fields, and the Journal of Econometrics. His honors include the 2000 COPSS Presidents' Award, fellowship in the AAAS, a Guggenheim Fellowship, the Guy Medal in Silver, the Noether Senior Scholar Award, and election as an Academician of Academia Sinica.
Runze Li is Eberly Family Chair Professor at Pennsylvania State University and an AAAS Fellow, and was co-editor of The Annals of Statistics.
Cun-Hui Zhang is Distinguished Professor at Rutgers University and was co-editor of Statistical Science.
Hui Zou is Professor at the University of Minnesota and was an action editor of the Journal of Machine Learning Research.
Back Cover
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications.
The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and extends the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book further gives a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, and their applications to statistical estimation, inference, prediction, and machine learning problems. It also thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.
Contents
1. Introduction
Rise of Big Data and Dimensionality
Biological Sciences
Health Sciences
Computer and Information Sciences
Economics and Finance
Business and Program Evaluation
Earth Sciences and Astronomy
Impact of Big Data
Impact of Dimensionality
Computation
Noise Accumulation
Spurious Correlation
Statistical Theory
Aim of High-dimensional Statistical Learning
What big data can do
Scope of the book
2. Multiple and Nonparametric Regression
Introduction
Multiple Linear Regression
The Gauss-Markov Theorem
Statistical Tests
Weighted Least-Squares
Box-Cox Transformation
Model Building and Basis Expansions
Polynomial Regression
Spline Regression
Multiple Covariates
Ridge Regression
Bias-Variance Tradeoff
Penalized Least Squares
Bayesian Interpretation
Ridge Regression Solution Path
Kernel Ridge Regression
Regression in Reproducing Kernel Hilbert Space
Leave-one-out and Generalized Cross-validation
Exercises
3. Introduction to Penalized Least Squares
Classical Variable Selection Criteria
Subset selection
Relation with penalized regression
Selection of regularization parameters
Folded-concave Penalized Least Squares
Orthonormal designs
Penalty functions
Thresholding by SCAD and MCP
Risk properties
Characterization of folded-concave PLS
Lasso and L1 Regularization
Nonnegative garrote
Lasso
Adaptive Lasso
Elastic Net
Dantzig selector
SLOPE and Sorted Penalties
Concentration inequalities and uniform convergence
A brief history of model selection
Bayesian Variable Selection
Bayesian view of the PLS
A Bayesian framework for selection
Numerical Algorithms
Quadratic programs
Least angle regression
Local quadratic approximations
Local linear algorithm
Penalized linear unbiased selection
Cyclic coordinate descent algorithms
Iterative shrinkage-thresholding algorithms
Projected proximal gradient method
ADMM
Iterative Local Adaptive Majorization and Minimization
Other Methods and Timeline
Regularization parameters for PLS
Degrees of freedom
Extension of information criteria
Application to PLS estimators
Residual variance and refitted cross-validation
Residual variance of Lasso
Refitted cross-validation
Extensions to Nonparametric Modeling
Structured nonparametric models
Group penalty
Applications
Bibliographical notes
Exercises
4. Penalized Least Squares: Properties
Performance Benchmarks
Performance measures
Impact of model uncertainty
Bayes lower bounds for orthogonal design
Minimax lower bounds for general design
Performance goals, sparsity and sub-Gaussian noise
Penalized L0 Selection
Lasso and Dantzig Selector
Selection consistency
Prediction and coefficient estimation errors
Model size and least squares after selection
Properties of the Dantzig selector
Regularity conditions on the design matrix
Properties of Concave PLS
Properties of penalty functions
Local and oracle solutions
Properties of local solutions
Global and approximate global solutions
Smaller and Sorted Penalties
Sorted concave penalties and their local approximation
Approximate PLS with smaller and sorted penalties
Properties of LLA and LCA
Bibliographical notes
Exercises
5. Generalized Linear Models and Penalized Likelihood
Generalized Linear Models
Exponential family
Elements of generalized linear models
Maximum likelihood
Computing MLE: Iteratively reweighted least squares
Deviance and Analysis of Deviance
Residuals
Examples
Bernoulli and binomial models
Models for count responses
Models for nonnegative continuous responses
Normal error models
Sparsest solution in high confidence set
A general setup
Examples
Properties
Variable Selection via Penalized Likelihood
Algorithms