"This textbook is a well-rounded, rigorous, and informative work presenting the mathematics behind modern machine learning techniques. It hits all the right notes: the choice of topics is up-to-date and perfect for a course on data science for mathematics students at the advanced undergraduate or early graduate level. This book fills a sorely-needed gap in the existing literature by not sacrificing depth for breadth, presenting proofs of major theorems and subsequent derivations, as well as providing a copious amount of Python code. I only wish a book like this had been around when I first began my journey!" -Nicholas Hoell, University of Toronto
"This is a well-written book that provides a deeper dive into data-scientific methods than many introductory texts. The writing is clear, and the text logically builds up regularization, classification, and decision trees. Compared to its probable competitors, it carves out a unique niche. -Adam Loy, Carleton College
The purpose of Data Science and Machine Learning: Mathematical and Statistical Methods is to provide an accessible yet comprehensive textbook for students who want a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.
Key Features:
- Focuses on mathematical understanding.
- Presentation is self-contained, accessible, and comprehensive.
- Extensive list of exercises and worked-out examples.
- Many concrete algorithms with Python code.
- Full color throughout.
Further resources can be found on the authors' website: https://github.com/DSML-book/Lectures
About the Authors
Dirk P. Kroese, PhD, is a Professor of Mathematics and Statistics at The University of Queensland. He has published over 120 articles and five books in a wide range of areas in mathematics, statistics, data science, machine learning, and Monte Carlo methods. He is a pioneer of the well-known Cross-Entropy method, an adaptive Monte Carlo technique that is used around the world to help solve difficult estimation and optimization problems in science, engineering, and finance.
Zdravko Botev, PhD, is an Australian Mathematical Sciences Institute Lecturer in Data Science and Machine Learning with an appointment at the University of New South Wales in Sydney, Australia. He is the recipient of the 2018 Christopher Heyde Medal of the Australian Academy of Science for distinguished research in the mathematical sciences.
Thomas Taimre, PhD, is a Senior Lecturer in Mathematics and Statistics at The University of Queensland. His research interests range from applied probability and Monte Carlo methods to applied physics and the remarkably universal self-mixing effect in lasers. He has published over 100 articles, holds a patent, and is the coauthor of Handbook of Monte Carlo Methods (Wiley).
Radislav Vaisman, PhD, is a Lecturer in Mathematics and Statistics at The University of Queensland. His research interests lie at the intersection of applied probability, machine learning, and computer science. He has published over 20 articles and two books.
Contents
Preface
Notation
Importing, Summarizing, and Visualizing Data
Introduction
Structuring Features According to Type
Summary Tables
Summary Statistics
Visualizing Data
Plotting Qualitative Variables
Plotting Quantitative Variables
Data Visualization in a Bivariate Setting
Exercises
Statistical Learning
Introduction
Supervised and Unsupervised Learning
Training and Test Loss
Tradeoffs in Statistical Learning
Estimating Risk
In-Sample Risk
Cross-Validation
Modeling Data
Multivariate Normal Models
Normal Linear Models
Bayesian Learning
Exercises
Monte Carlo Methods
Introduction
Monte Carlo Sampling
Generating Random Numbers
Simulating Random Variables
Simulating Random Vectors and Processes
Resampling
Markov Chain Monte Carlo
Monte Carlo Estimation
Crude Monte Carlo
Bootstrap Method
Variance Reduction
Monte Carlo for Optimization
Simulated Annealing
Cross-Entropy Method
Splitting for Optimization
Noisy Optimization
Exercises
Unsupervised Learning
Introduction
Risk and Loss in Unsupervised Learning
Expectation-Maximization (EM) Algorithm
Empirical Distribution and Density Estimation
Clustering via Mixture Models
Mixture Models
EM Algorithm for Mixture Models
Clustering via Vector Quantization
K-Means
Clustering via Continuous Multiextremal Optimization
Hierarchical Clustering
Principal Component Analysis (PCA)
Motivation: Principal Axes of an Ellipsoid
PCA and Singular Value Decomposition (SVD)
Exercises
Regression
Introduction
Linear Regression
Analysis via Linear Models
Parameter Estimation
Model Selection and Prediction
Cross-Validation and Predictive Residual Sum of Squares
In-Sample Risk and Akaike Information Criterion
Categorical Features
Nested Models
Coefficient of Determination
Inference for Normal Linear Models
Comparing Two Normal Linear Models
Confidence and Prediction Intervals
Nonlinear Regression Models
Linear Models in Python
Modeling
Analysis
Analysis of Variance (ANOVA)
Confidence and Prediction Intervals
Model Validation
Variable Selection
Generalized Linear Models
Exercises
Regularization and Kernel Methods
Introduction
Regularization
Reproducing Kernel Hilbert Spaces
Construction of Reproducing Kernels
Reproducing Kernels via Feature Mapping
Kernels from Characteristic Functions
Reproducing Kernels Using Orthonormal Features
Kernels from Kernels
Representer Theorem
Smoothing Cubic Splines
Gaussian Process Regression
Kernel PCA
Exercises
Classification
Introduction
Classification Metrics
Classification via Bayes' Rule
Linear and Quadratic Discriminant Analysis
Logistic Regression and Softmax Classification
K-Nearest Neighbors Classification
Support Vector Machine
Classification with Scikit-Learn
Exercises
Decision Trees and Ensemble Methods
Introduction
Top-Down Construction of Decision Trees
Regional Prediction Functions
Splitting Rules
Termination Criterion
Basic Implementation
Additional Considerations
Binary Versus Non-Binary Trees
Data Preprocessing
Alternative Splitting Rules
Categorical Variables
Missing Values
Controlling the Tree Shape
Cost-Complexity Pruning
Advantages and Limitations of Decision Trees
Bootstrap Aggregation
Random Forests
Boosting
Exercises
Deep Learning
Introduction
Feed-Forward Neural Networks
Back-Propagation
Methods for Training
Steepest Descent
Levenberg-Marquardt Method
Limited-Memory BFGS Method
Adaptive Gradient Methods
Examples in Python
Simple Polynomial Regression
Image Classif…