From a review of the first edition: "Modern Data Science with R... is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician).
Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.
The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.
About the authors
Benjamin S. Baumer is an associate professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Analyzing Baseball Data with R. He received the 2019 Waller Education Award and the 2016 Significant Contributor Award from the Society for American Baseball Research.
Daniel T. Kaplan is the DeWitt Wallace emeritus professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing. Danny received the 2006 Macalester Excellence in Teaching award and the 2017 CAUSE Lifetime Achievement Award.
Nicholas J. Horton is Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College. He is a Fellow of the ASA and the AAAS, co-chair of the National Academies Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in data science curriculum efforts to help students "think with data".
Contents
Preface
Background and motivation
Intended audience
Key features of this book
Changes in the second edition
Key role of technology
How to use this book
Acknowledgments
Part I: Introduction to Data Science
1. Prologue: Why data science?
What is data science?
Case study: The evolution of sabermetrics
Datasets
Further resources
2. Data visualization
The federal election cycle
Composing data graphics
Importance of data graphics: Challenger
Creating effective presentations
The wider world of data visualization
Further resources
Exercises
Supplementary exercises
3. A grammar for graphics
A grammar for data graphics
Canonical data graphics in R
Extended example: Historical baby names
Further resources
Exercises
Supplementary exercises
4. Data wrangling on one table
A grammar for data wrangling
Extended example: Ben's time with the Mets
Further resources
Exercises
Supplementary exercises
5. Data wrangling on multiple tables
inner_join()
left_join()
Extended example: Manny Ramirez
Further resources
Exercises
Supplementary exercises
6. Tidy data
Tidy data
Reshaping data
Naming conventions
Data intake
Further resources
Exercises
Supplementary exercises
7. Iteration
Vectorized operations
Using across() with dplyr functions
The map() family of functions
Iterating over a one-dimensional vector
Iteration over subgroups
Simulation
Extended example: Factors associated with BMI
Further resources
Exercises
Supplementary exercises
8. Data science ethics
Introduction
Truthful falsehoods
Role of data science in society
Some settings for professional ethics
Some principles to guide ethical action
Algorithmic bias
Data and disclosure
Reproducibility
Ethics, collectively
Professional guidelines for ethical conduct
Further resources
Exercises
Supplementary exercises
Part II: Statistics and Modeling
9. Statistical foundations
Samples and populations
Sample statistics
The bootstrap
Outliers
Statistical models: Explaining variation
Confounding and accounting for other factors
The perils of p-values
Further resources
Exercises
Supplementary exercises
10. Predictive modeling
Predictive modeling
Simple classification models
Evaluating models
Extended example: Who has diabetes?
Further resources
Exercises
Supplementary exercises
11. Supervised learning
Non-regression classifiers
Parameter tuning
Example: Evaluation of income models redux
Extended example: Who has diabetes this time?
Regularization
Further resources
Exercises
Supplementary exercises
12. Unsupervised learning
Clustering
Dimension reduction
Further resources
Exercises
Supplementary exercises
13. Simulation
Reasoning in reverse
Extended example: Grouping cancers
Randomizing functions
Simulating variability
Random networks
Key principles of simulation
Further resources
Exercises
Supplementary exercises
Part III: Topics in Data Science
14. Dynamic and customized data graphics
Rich Web content using D3.js and htmlwidget…