Data mining is well on its way to becoming a recognized discipline in the overlapping areas of IT, statistics, machine learning, and AI. A Practical Guide to Data Mining for Business and Industry presents a user-friendly approach to data mining methods and covers their typical applications. The methodology is complemented by case studies to create a versatile reference book, allowing readers to look up specific methods as well as specific applications. The book is formatted so that statisticians, computer scientists, and economists can cross-reference from a particular application or method to sectors of interest.
About the Authors
Andrea Ahlemeyer-Stubbe, Director Strategic Analytics, DRAFTFCB München GmbH, Germany
Shirley Coleman, Principal Statistician, Industrial Statistics Research Unit, School of Maths and Statistics, Newcastle University, UK
Jacket Copy
A Practical Guide to Data Mining for Business and Industry
A Practical Guide to Data Mining for Business and Industry presents a user-friendly approach to data mining methods and provides a solid foundation for their application. The methodology presented is complemented by case studies to create a versatile reference book, allowing readers to look up specific methods as well as specific applications. The book is designed so that the reader can cross-reference a particular application or method to sectors of interest. The necessary basic knowledge of data mining methods is also presented, along with sector issues relating to data mining and its various applications.
A Practical Guide to Data Mining for Business and Industry:
- Equips readers with a solid foundation in both data mining and its applications
- Provides tried and tested guidance in finding workable solutions to typical business problems
- Offers solution patterns for common business problems that can be adapted by the reader to their particular areas of interest
- Focuses on practical solutions whilst providing grounding in statistical practice
- Explores data mining in a sales and marketing context, as well as in quality management and medicine
- Is supported by a supplementary website (www.wiley.com/go/data_mining) featuring datasets and solutions
The book is aimed at statisticians, computer scientists and economists involved in data mining, as well as students of economics, business administration and international marketing.
Contents
Glossary of terms
Part I Data Mining Concept
1 Introduction
1.1 Aims of the Book
1.2 Data Mining Context
1.2.1 Domain Knowledge
1.2.2 Words to Remember
1.2.3 Associated Concepts
1.3 Global Appeal
1.4 Example Datasets Used in This Book
1.5 Recipe Structure
1.6 Further Reading and Resources
2 Data Mining Definition
2.1 Types of Data Mining Questions
2.1.1 Population and Sample
2.1.2 Data Preparation
2.1.3 Supervised and Unsupervised Methods
2.1.4 Knowledge-Discovery Techniques
2.2 Data Mining Process
2.3 Business Task: Clarification of the Business Question behind the Problem
2.4 Data: Provision and Processing of the Required Data
2.4.1 Fixing the Analysis Period
2.4.2 Basic Unit of Interest
2.4.3 Target Variables
2.4.4 Input Variables/Explanatory Variables
2.5 Modelling: Analysis of the Data
2.6 Evaluation and Validation during the Analysis Stage
2.7 Application of Data Mining Results and Learning from the Experience
Part II Data Mining Practicalities
3 All About Data
3.1 Some Basics
3.1.1 Data, Information, Knowledge and Wisdom
3.1.2 Sources and Quality of Data
3.1.3 Measurement Level and Types of Data
3.1.4 Measures of Magnitude and Dispersion
3.1.5 Data Distributions
3.2 Data Partition: Random Samples for Training, Testing and Validation
3.3 Types of Business Information Systems
3.3.1 Operational Systems Supporting Business Processes
3.3.2 Analysis-Based Information Systems
3.3.3 Importance of Information
3.4 Data Warehouses
3.4.1 Topic Orientation
3.4.2 Logical Integration and Homogenisation
3.4.3 Reference Period
3.4.4 Low Volatility
3.4.5 Using the Data Warehouse
3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS
3.5.1 Database Management System (DBMS)
3.5.2 Database (DB)
3.5.3 Database Communication Systems (DBCS)
3.6 Data Marts
3.6.1 Regularly Filled Data Marts
3.6.2 Comparison between Data Marts and Data Warehouses
3.7 A Typical Example from the Online Marketing Area
3.8 Unique Data Marts
3.8.1 Permanent Data Marts
3.8.2 Data Marts Resulting from Complex Analysis
3.9 Data Mart: Do's and Don'ts
3.9.1 Do's and Don'ts for Processes
3.9.2 Do's and Don'ts for Handling
3.9.3 Do's and Don'ts for Coding/Programming
4 Data Preparation
4.1 Necessity of Data Preparation
4.2 From Small and Long to Short and Wide
4.3 Transformation of Variables
4.4 Missing Data and Imputation Strategies
4.5 Outliers
4.6 Dealing with the Vagaries of Data
4.6.1 Distributions
4.6.2 Tests for Normality
4.6.3 Data with Totally Different Scales
4.7 Adjusting the Data Distributions
4.7.1 Standardisation and Normalisation
4.7.2 Ranking
4.7.3 Box-Cox Transformation
4.8 Binning
4.8.1 Bucket Method
4.8.2 Analytical Binning for Nominal Variables
4.8.3 Quantiles
4.8.4 Binning in Practice
4.9 Timing Considerations
4.10 Operational Issues
5 Analytics
5.1 Introduction
5.2 Basis of Statistical Tests
5.2.1 Hypothesis Tests and P Values
5.2.2 Tolerance Intervals
5.2.3 Standard Errors and...