Data analytics requires powerful and flexible computing software. However, the software available for data analytics is often proprietary and can be expensive. This book reviews Apache tools, which are open source and easy to use. After an overview of the background of data analytics, covering the different types of analysis and the basics of using Hadoop as a tool, it focuses on Hadoop ecosystem tools such as Apache Flume, Apache Spark, Apache Storm, and Apache Hive, together with R and Python, which can be used for different types of analysis. It then examines machine learning techniques that are useful for data analytics and shows how to visualize data with different graphs and charts.
Contents
Part I: Data Analytics and Hadoop
Introduction to Data Analytics
Introduction to Hadoop
Data Analytics with Map Reduce
Part II: Tools for Data Analytics
Apache Pig
Apache Hive
Apache Spark
Apache Flume
Apache Storm
Python
R
Part III: Machine Learning for Data Analytics
Basics of Machine Learning
Linear Regression
Logistic Regression
Machine Learning on Spark
Part IV: Exploring and Visualizing Data
Introduction to Visualization
Principles of Data Visualization
Visualization Charts
Popular Visualization Tools
Data Visualization with Hadoop
Part V: Case Studies
Product Recommendation
Market Basket Analysis