From a review of the first edition: "Modern Data Science with R... is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician).

Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.

The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.



About the authors

Benjamin S. Baumer is an associate professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Analyzing Baseball Data with R. He received the 2019 Waller Education Award and the 2016 Significant Contributor Award from the Society for American Baseball Research.

Daniel T. Kaplan is the DeWitt Wallace emeritus professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing. Danny received the 2006 Macalester Excellence in Teaching award and the 2017 CAUSE Lifetime Achievement Award.

Nicholas J. Horton is Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College. He is a Fellow of the ASA and the AAAS, co-chair of the National Academies Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in data science curriculum efforts to help students "think with data".



Contents

Preface

Background and motivation

Intended audience

Key features of this book

Changes in the second edition

Key role of technology

How to use this book

Acknowledgments

Part I: Introduction to Data Science

1. Prologue: Why data science?

What is data science?

Case study: The evolution of sabermetrics

Datasets

Further resources

2. Data visualization

The federal election cycle

Composing data graphics

Importance of data graphics: Challenger

Creating effective presentations

The wider world of data visualization

Further resources

Exercises

Supplementary exercises

3. A grammar for graphics

A grammar for data graphics

Canonical data graphics in R

Extended example: Historical baby names

Further resources

Exercises

Supplementary exercises

4. Data wrangling on one table

A grammar for data wrangling

Extended example: Ben's time with the Mets

Further resources

Exercises

Supplementary exercises

5. Data wrangling on multiple tables

inner_join()

left_join()

Extended example: Manny Ramirez

Further resources

Exercises

Supplementary exercises

6. Tidy data

Tidy data

Reshaping data

Naming conventions

Data intake

Further resources

Exercises

Supplementary exercises

7. Iteration

Vectorized operations

Using across() with dplyr functions

The map() family of functions

Iterating over a one-dimensional vector

Iteration over subgroups

Simulation

Extended example: Factors associated with BMI

Further resources

Exercises

Supplementary exercises

8. Data science ethics

Introduction

Truthful falsehoods

Role of data science in society

Some settings for professional ethics

Some principles to guide ethical action

Algorithmic bias

Data and disclosure

Reproducibility

Ethics, collectively

Professional guidelines for ethical conduct

Further resources

Exercises

Supplementary exercises

Part II: Statistics and Modeling

9. Statistical foundations

Samples and populations

Sample statistics

The bootstrap

Outliers

Statistical models: Explaining variation

Confounding and accounting for other factors

The perils of p-values

Further resources

Exercises

Supplementary exercises

10. Predictive modeling

Predictive modeling

Simple classification models

Evaluating models

Extended example: Who has diabetes?

Further resources

Exercises

Supplementary exercises

11. Supervised learning

Non-regression classifiers

Parameter tuning

Example: Evaluation of income models redux

Extended example: Who has diabetes this time?

Regularization

Further resources

Exercises

Supplementary exercises

12. Unsupervised learning

Clustering

Dimension reduction

Further resources

Exercises

Supplementary exercises

13. Simulation

Reasoning in reverse

Extended example: Grouping cancers

Randomizing functions

Simulating variability

Random networks

Key principles of simulation

Further resources

Exercises

Supplementary exercises

Part III: Topics in Data Science

14. Dynamic and customized data graphics

Rich Web content using D3.js and htmlwidget…

Title
Modern Data Science with R
EAN
9780429577505
Format
E-book (PDF)
Publication date
13 April 2021
Digital copy protection
Adobe DRM
Number of pages
650