Tap the power of Big Data with Microsoft technologies
Big Data is here, and Microsoft's new Big Data platform is a
valuable tool to help your company get the very most out of it.
This timely book shows you how to use HDInsight along with
Hortonworks Data Platform for Windows to store, manage, analyze,
and share Big Data throughout the enterprise. Focusing primarily on
Microsoft and Hortonworks technologies but also covering open
source tools, Microsoft Big Data Solutions explains best
practices, covers on-premises and cloud-based solutions, and
features valuable case studies.
Best of all, it helps you integrate these new solutions with
technologies you already know, such as SQL Server and Hadoop.
* Walks you through how to integrate Big Data solutions in your
company using Microsoft's HDInsight Server, Hortonworks Data
Platform for Windows, and open source tools
* Explores both on-premises and cloud-based solutions
* Shows how to store, manage, analyze, and share Big Data
throughout the enterprise
* Covers topics such as Microsoft's approach to Big Data,
installing and configuring Hortonworks Data Platform for Windows,
integrating Big Data with SQL Server, visualizing data with
Microsoft and Hortonworks BI tools, and more
* Helps you build and execute a Big Data plan
* Includes contributions from the Microsoft and Hortonworks Big
Data product teams
If you need a detailed roadmap for designing and implementing a
fully deployed Big Data solution, you'll want Microsoft Big Data
Solutions.
About the Authors
Adam Jorgensen is the President of Pragmatic Works and
the Executive Vice President of PASS. He has extensive experience
with data warehousing, analytics, and NoSQL architectures.
James Rowland-Jones is a principal consultant for The Big
Bang Data Company. He specializes in big data warehouse solutions
that leverage SQL Server Parallel Data Warehouse and Hadoop
ecosystems.
John Welch is Vice President of Software Development at
Pragmatic Works, where he leads the development of a suite of BI
and data products for SQL Server and related technologies.
Dan Clark is a senior BI consultant for Pragmatic Works.
Dan has published several books and numerous articles on .NET
programming and BI development.
Christopher Price is a senior consultant with Microsoft.
His focus is on ETL, data integration, data quality, MDM, SSAS,
SharePoint, and all things big data.
Brian Mitchell is the lead architect of the Microsoft Big
Data Center of Expertise. He focuses exclusively on DW/BI
solutions.
Contents
Introduction
Part I What Is Big Data?
Chapter 1 Industry Needs and Solutions
What's So Big About Big Data?
A Brief History of Hadoop
Google
Nutch
What Is Hadoop?
Derivative Works and Distributions
Hadoop Distributions
Core Hadoop Ecosystem
Important Apache Projects for Hadoop
The Future for Hadoop
Summary
Chapter 2 Microsoft's Approach to Big Data
A Story of Better Together
Competition in the Ecosystem
SQL on Hadoop Today
Hortonworks and Stinger
Cloudera and Impala
Microsoft's Contribution to SQL in Hadoop
Deploying Hadoop
Deployment Factors
Deployment Topologies
Deployment Scorecard
Summary
Part II Setting Up for Big Data with Microsoft
Chapter 3 Configuring Your First Big Data Environment
Getting Started
Getting the Install
Running the Installation
On-Premise Installation: Single-Node Installation
HDInsight Service: Installing in the Cloud
Windows Azure Storage Explorer Options
Validating Your New Cluster
Logging into HDInsight Service
Verify HDP Functionality in the Logs
Common Post-Setup Tasks
Loading Your First Files
Verifying Hive and Pig
Summary
Part III Storing and Managing Big Data
Chapter 4 HDFS, Hive, HBase, and HCatalog
Exploring the Hadoop Distributed File System
Explaining the HDFS Architecture
Interacting with HDFS
Exploring Hive: The Hadoop Data Warehouse Platform
Designing, Building, and Loading Tables
Querying Data
Configuring the Hive ODBC Driver
Exploring HCatalog: HDFS Table and Metadata Management
Exploring HBase: An HDFS Column-Oriented Database
Columnar Databases
Defining and Populating an HBase Table
Using Query Operations
Summary
Chapter 5 Storing and Managing Data in HDFS
Understanding the Fundamentals of HDFS
HDFS Architecture
NameNodes and DataNodes
Data Replication
Using Common Commands to Interact with HDFS
Interfaces for Working with HDFS
File Manipulation Commands
Administrative Functions in HDFS
Moving and Organizing Data in HDFS
Moving Data in HDFS
Implementing Data Structures for Easier Management
Rebalancing Data
Summary
Chapter 6 Adding Structure with Hive
Understanding Hive's Purpose and Role
Providing Structure for Unstructured Data
Enabling Data Access and Transformation
Differentiating Hive from Traditional RDBMS Systems
Working with Hive
Creating and Querying Basic Tables
Creating Databases
Creating Tables
Adding and Deleting Data
Querying a Table
Using Advanced Data Structures with Hive
Setting Up Partitioned Tables
Loading Partitioned Tables
Using Views
Creating Indexes for Tables
Summary
Chapter 7 Expanding Your Capability with HBase and HCatalog
Using HBase
Creating HBase Tables
Loading Data into an HBase Table…