Tap the power of Big Data with Microsoft technologies

Big Data is here, and Microsoft's new Big Data platform is a
valuable tool to help your company get the very most out of it.
This timely book shows you how to use HDInsight along with
Hortonworks Data Platform for Windows to store, manage, analyze,
and share Big Data throughout the enterprise. Focusing primarily on
Microsoft and Hortonworks technologies but also covering open
source tools, Microsoft Big Data Solutions explains best
practices, covers on-premises and cloud-based solutions, and
features valuable case studies.

Best of all, it helps you integrate these new solutions with
technologies you already know, such as SQL Server and Hadoop.

* Walks you through how to integrate Big Data solutions in your
company using Microsoft's HDInsight Server, Hortonworks Data
Platform for Windows, and open source tools

* Explores both on-premises and cloud-based solutions

* Shows how to store, manage, analyze, and share Big Data throughout
the enterprise

* Covers topics such as Microsoft's approach to Big Data,
installing and configuring Hortonworks Data Platform for Windows,
integrating Big Data with SQL Server, visualizing data with
Microsoft and Hortonworks BI tools, and more

* Helps you build and execute a Big Data plan

* Includes contributions from the Microsoft and Hortonworks Big
Data product teams

If you need a detailed roadmap for designing and implementing a
fully deployed Big Data solution, you'll want Microsoft Big Data
Solutions.



About the Authors

Adam Jorgensen is the President of Pragmatic Works and
the Executive Vice President of PASS. He has extensive experience
with data warehousing, analytics, and NoSQL architectures.

James Rowland-Jones is a principal consultant for The Big
Bang Data Company. He specializes in big data warehouse solutions
that leverage SQL Server Parallel Data Warehouse and Hadoop
ecosystems.

John Welch is Vice President of Software Development at
Pragmatic Works, where he leads the development of a suite of BI
and data products for SQL Server and related technologies.

Dan Clark is a senior BI consultant for Pragmatic Works.
Dan has published several books and numerous articles on .NET
programming and BI development.

Christopher Price is a senior consultant with Microsoft.
His focus is on ETL, data integration, data quality, MDM, SSAS,
SharePoint, and all things big data.

Brian Mitchell is the lead architect of the Microsoft Big
Data Center of Expertise. He focuses exclusively on DW/BI
solutions.






Contents
Introduction

Part I What Is Big Data?

Chapter 1 Industry Needs and Solutions

What's So Big About Big Data?

A Brief History of Hadoop

Google

Nutch

What Is Hadoop?

Derivative Works and Distributions

Hadoop Distributions

Core Hadoop Ecosystem

Important Apache Projects for Hadoop

The Future for Hadoop

Summary

Chapter 2 Microsoft's Approach to Big Data

A Story of Better Together

Competition in the Ecosystem

SQL on Hadoop Today

Hortonworks and Stinger

Cloudera and Impala

Microsoft's Contribution to SQL in Hadoop

Deploying Hadoop

Deployment Factors

Deployment Topologies

Deployment Scorecard

Summary

Part II Setting Up for Big Data with Microsoft

Chapter 3 Configuring Your First Big Data Environment

Getting Started

Getting the Install

Running the Installation

On-Premise Installation: Single-Node Installation

HDInsight Service: Installing in the Cloud

Windows Azure Storage Explorer Options

Validating Your New Cluster

Logging into HDInsight Service

Verify HDP Functionality in the Logs

Common Post-Setup Tasks

Loading Your First Files

Verifying Hive and Pig

Summary

Part III Storing and Managing Big Data

Chapter 4 HDFS, Hive, HBase, and HCatalog

Exploring the Hadoop Distributed File System

Explaining the HDFS Architecture

Interacting with HDFS

Exploring Hive: The Hadoop Data Warehouse Platform

Designing, Building, and Loading Tables

Querying Data

Configuring the Hive ODBC Driver

Exploring HCatalog: HDFS Table and Metadata Management

Exploring HBase: An HDFS Column-Oriented Database

Columnar Databases

Defining and Populating an HBase Table

Using Query Operations

Summary

Chapter 5 Storing and Managing Data in HDFS

Understanding the Fundamentals of HDFS

HDFS Architecture

NameNodes and DataNodes

Data Replication

Using Common Commands to Interact with HDFS

Interfaces for Working with HDFS

File Manipulation Commands

Administrative Functions in HDFS

Moving and Organizing Data in HDFS

Moving Data in HDFS

Implementing Data Structures for Easier Management

Rebalancing Data

Summary

Chapter 6 Adding Structure with Hive

Understanding Hive's Purpose and Role

Providing Structure for Unstructured Data

Enabling Data Access and Transformation

Differentiating Hive from Traditional RDBMS Systems

Working with Hive

Creating and Querying Basic Tables

Creating Databases

Creating Tables

Adding and Deleting Data

Querying a Table

Using Advanced Data Structures with Hive

Setting Up Partitioned Tables

Loading Partitioned Tables

Using Views

Creating Indexes for Tables

Summary

Chapter 7 Expanding Your Capability with HBase and HCatalog

Using HBase

Creating HBase Tables

Loading Data into an HBase Table …

Title
Microsoft Big Data Solutions
EAN
9781118742099
ISBN
978-1-118-74209-9
Format
E-book (PDF)
Manufacturer
Publisher
Publication date
19 February 2014
Digital copy protection
Adobe DRM
File size
15.06 MB
Number of pages
408
Year
2014
Subtitle
English