This book gives a comprehensive and systematic account of high-dimensional data analysis, including variable selection via regularization methods and sure independence screening methods. It is a valuable reference for researchers working on model selection, variable selection, machine learning, and risk management.



About the Authors

The authors are international authorities and leaders in the topics presented. All are fellows of the Institute of Mathematical Statistics and the American Statistical Association.

Jianqing Fan is the Frederick L. Moore Professor at Princeton University. He is a co-editor of the Journal of Business & Economic Statistics and was co-editor of The Annals of Statistics, Probability Theory and Related Fields, and the Journal of Econometrics. His honors include the 2000 COPSS Presidents' Award, AAAS fellowship, a Guggenheim Fellowship, the Guy Medal in Silver, the Noether Senior Scholar Award, and election as an Academician of Academia Sinica.

Runze Li is the Eberly Family Chair Professor at Pennsylvania State University and an AAAS Fellow, and was co-editor of The Annals of Statistics.

Cun-Hui Zhang is a distinguished professor at Rutgers University and was co-editor of Statistical Science.

Hui Zou is a professor at the University of Minnesota and was an action editor of the Journal of Machine Learning Research.



Blurb

Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications.

The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and extends the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference and feature screening are also thoroughly addressed. The book further gives a comprehensive account of high-dimensional covariance estimation and the learning of latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction, and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.



Contents

1. Introduction

Rise of Big Data and Dimensionality

Biological Sciences

Health Sciences

Computer and Information Sciences

Economics and Finance

Business and Program Evaluation

Earth Sciences and Astronomy

Impact of Big Data

Impact of Dimensionality

Computation

Noise Accumulation

Spurious Correlation

Statistical Theory

Aim of High-dimensional Statistical Learning

What big data can do

Scope of the book

2. Multiple and Nonparametric Regression

Introduction

Multiple Linear Regression

The Gauss-Markov Theorem

Statistical Tests

Weighted Least-Squares

Box-Cox Transformation

Model Building and Basis Expansions

Polynomial Regression

Spline Regression

Multiple Covariates

Ridge Regression

Bias-Variance Tradeoff

Penalized Least Squares

Bayesian Interpretation

Ridge Regression Solution Path

Kernel Ridge Regression

Regression in Reproducing Kernel Hilbert Space

Leave-one-out and Generalized Cross-validation

Exercises

3. Introduction to Penalized Least-Squares

Classical Variable Selection Criteria

Subset selection

Relation with penalized regression

Selection of regularization parameters

Folded-concave Penalized Least Squares

Orthonormal designs

Penalty functions

Thresholding by SCAD and MCP

Risk properties

Characterization of folded-concave PLS

Lasso and L1 Regularization

Nonnegative garrote

Lasso

Adaptive Lasso

Elastic Net

Dantzig selector

SLOPE and Sorted Penalties

Concentration inequalities and uniform convergence

A brief history of model selection

Bayesian Variable Selection

Bayesian view of the PLS

A Bayesian framework for selection

Numerical Algorithms

Quadratic programs

Least angle regression

Local quadratic approximations

Local linear algorithm

Penalized linear unbiased selection

Cyclic coordinate descent algorithms

Iterative shrinkage-thresholding algorithms

Projected proximal gradient method

ADMM

Iterative Local Adaptive Majorization and Minimization

Other Methods and Timeline

Regularization parameters for PLS

Degrees of freedom

Extension of information criteria

Application to PLS estimators

Residual variance and refitted cross-validation

Residual variance of Lasso

Refitted cross-validation

Extensions to Nonparametric Modeling

Structured nonparametric models

Group penalty

Applications

Bibliographical notes

Exercises

4. Penalized Least Squares: Properties

Performance Benchmarks

Performance measures

Impact of model uncertainty

Bayes lower bounds for orthogonal design

Minimax lower bounds for general design

Performance goals, sparsity and sub-Gaussian noise

Penalized L0 Selection

Lasso and Dantzig Selector

Selection consistency

Prediction and coefficient estimation errors

Model size and least squares after selection

Properties of the Dantzig selector

Regularity conditions on the design matrix

Properties of Concave PLS

Properties of penalty functions

Local and oracle solutions

Properties of local solutions

Global and approximate global solutions

Smaller and Sorted Penalties

Sorted concave penalties and their local approximation

Approximate PLS with smaller and sorted penalties

Properties of LLA and LCA

Bibliographical notes

Exercises

5. Generalized Linear Models and Penalized Likelihood

Generalized Linear Models

Exponential family

Elements of generalized linear models

Maximum likelihood

Computing MLE: Iteratively reweighted least squares

Deviance and Analysis of Deviance

Residuals

Examples

Bernoulli and binomial models

Models for count responses

Models for nonnegative continuous responses

Normal error models

Sparsest solution in high confidence set

A general setup

Examples

Properties

Variable Selection via Penalized Likelihood

Algorithms

Title
Statistical Foundations of Data Science
EAN
9780429527616
Format
E-book (EPUB)
Publication date
20.09.2020
Digital copy protection
Adobe DRM
Number of pages
774