Master advanced topics in the analysis of large, dynamically dependent datasets with this insightful resource

Statistical Learning for Big Dependent Data delivers a comprehensive presentation of the statistical and machine learning methods useful for analyzing and forecasting large and dynamically dependent data sets. The book presents automatic procedures for modeling and forecasting large sets of time series data. Beginning with some visualization tools, the book discusses procedures and methods for finding outliers, clusters, and other types of heterogeneity in big dependent data. It then introduces various dimension reduction methods, including regularization (such as the Lasso in the presence of dynamical dependence) and dynamic factor models. The book also covers other forecasting procedures, including index models, partial least squares, boosting, and nowcasting. It further presents machine learning methods, including neural networks, deep learning, classification and regression trees, and random forests. Finally, it presents procedures for modeling and forecasting spatio-temporal dependent data.

Throughout, the book discusses the advantages and disadvantages of each method. It uses real-world examples to demonstrate applications, including the use of many R packages. An R package associated with the book is also available to help readers reproduce the analyses in the examples and to facilitate real applications.

Statistical Learning for Big Dependent Data covers a wide variety of topics for modeling and understanding big dependent data, including:

* New ways to plot large sets of time series

* An automatic procedure to build univariate ARMA models for individual components of a large data set

* Powerful outlier detection procedures for large sets of related time series

* New methods for finding the number of clusters of time series and discrimination methods, including support vector machines, for time series

* Broad coverage of dynamic factor models including new representations and estimation methods for generalized dynamic factor models

* Discussion of the usefulness of the Lasso with time series and an evaluation of several machine learning procedures for forecasting large sets of time series

* Forecasting large sets of time series with exogenous variables, including discussions of index models, partial least squares, and boosting

* Introduction of modern procedures for modeling and forecasting spatio-temporal data

Perfect for PhD students and researchers in business, economics, engineering, and science, Statistical Learning for Big Dependent Data also belongs on the bookshelves of practitioners in these fields who hope to improve their understanding of statistical and machine learning methods for analyzing and forecasting big dependent data.



About the Authors

Daniel Peña, PhD, is Professor of Statistics at Universidad Carlos III de Madrid, Spain. He received his PhD from Universidad Politécnica de Madrid in 1976 and has taught at the Universities of Wisconsin-Madison, Chicago, and Carlos III de Madrid, where he was Rector from 2007 to 2015.

Ruey S. Tsay, PhD, is the H.G.B. Alexander Professor of Econometrics & Statistics at the Booth School of Business, University of Chicago, United States. He received his PhD in 1982 from the University of Wisconsin-Madison. His research focuses on business and economic forecasting, financial econometrics, risk management, and the analysis of big dependent data.



Contents

Preface

1. Introduction to Big Dependent Data

1.1 Examples of Dependent Data

1.2 Stochastic Processes

1.2.1 Scalar Processes

1.2.1.1 Stationarity

1.2.1.2 White Noise Process

1.2.1.3 Conditional Distribution

1.2.2 Vector Processes

1.2.2.1 Vector White Noises

1.2.2.2 Invertibility

1.3 Sample Moments of Stationary Vector Process

1.3.1 Sample Mean

1.3.2 Sample Covariance and Correlation Matrices

1.4 Nonstationary Processes

1.5 Principal Component Analysis

1.5.1 Discussion

1.5.2 Properties of the PCs

1.6 Effects of Serial Dependence

Appendix 1.A: Some Matrix Theory

Exercises

References

2. Linear Univariate Time Series

2.1 Visualizing a Large Set of Time Series

2.1.1 Dynamic Plots

2.1.2 Static Plots

2.2 Stationary ARMA Models

2.2.1 The Autoregressive Process

2.2.1.1 Autocorrelation Functions

2.2.2 The Moving Average Process

2.2.3 The ARMA Process

2.2.4 Linear Combinations of ARMA Processes

2.3 Spectral Analysis of Stationary Processes

2.3.1 Fitting Harmonic Functions to a Time Series

2.3.2 The Periodogram

2.3.3 The Spectral Density Function and Its Estimation

2.4 Integrated Processes

2.4.1 The Random Walk Process

2.4.2 ARIMA Models

2.4.3 Seasonal ARIMA Models

2.4.3.1 The Airline Model

2.5 Structural and State Space Models

2.5.1 Structural Time Series Models

2.5.2 State-Space Models

2.5.3 The Kalman Filter

2.6 Forecasting with Linear Models

2.6.1 Computing Optimal Predictors

2.6.2 Variances of the Predictions

2.6.3 Measuring Predictability

2.7 Modeling a Set of Time Series

2.7.1 Data Transformation

2.7.2 Testing for White Noise

2.7.3 Determination of the Difference Order

2.7.4 Model Identification

2.8 Estimation and Information Criteria

2.8.1 Conditional Likelihood

2.8.2 On-line Estimation

2.8.3 Maximum Likelihood (ML) Estimation

2.8.4 Model Selection

2.8.4.1 The Akaike Information Criterion (AIC)

2.8.4.2 The Bayesian Information Criterion (BIC)

2.8.4.3 Other Criteria

2.8.4.4 Cross-Validation

2.9 Diagnostic Checking

2.9.1 Residual Plot

2.9.2 Portmanteau Test for Residual Serial Correlations

2.9.3 Homoscedastic Tests

2.9.4 Normality Tests

2.9.5 Checking for Deterministic Components

2.10 Forecasting

2.10.1 Out-of-Sample Forecasts

2.10.2 Forecasting with Model Averaging

2.10.3 Forecasting with Shrinkage Estimators

Appendix 2.A: Difference Equations

Exercises

References

3. Analysis of Multivariate Time Series

3.1 Transfer Function Models

3.1.1 Single Input and Single Output

3.1.2 Multiple Inputs and Multiple Outputs

3.2 Vector AR Models

3.2.1 Impulse Response Function

3.2.2 Some Special Cases

3.2.3 Estimation

3.2.4 Model Building

3.2.5 Prediction

3.2.6 Forecast Error Variance Decomposition

3.3 Vector Moving-Average Models

3.3.1 Properties of VMA Models

3.3.2 VMA Modeling

3.4 Stationary VARMA Models

3.4.1 Are VAR Models Sufficient?

3.4.2 Properties of VARMA Models

3.4.3 Modeling VARMA Process

3.4.4 Use...

Title
Statistical Learning for Big Dependent Data
EAN
9781119417415
Format
E-Book (epub)
Publisher
Publication date
16 March 2021
Digital copy protection
Adobe DRM
File size
34.22 MB
Number of pages
560