Hybrid Modern Data Architecture
with Microsoft and Apache Hadoop

© Hortonworks Inc. 2014
Your Presenters
• Oliver Chiu (twitter name )
– Title
– Years of experience
– Fun Fact

• John Kreisa (@marked_man)
– VP Strategic Marketing, Hortonworks
– Over 20 years in data management as a
developer and a marketer
– Avid camper
Poll 1: What stage are you at in looking at Hadoop?
• Research
• Evaluation
• Trial
• Haven’t started research
Today’s Topics
• Introduction
• What is a Hybrid Modern Data Architecture (MDA)?
• Apache Hadoop in the Hybrid MDA
• The Hybrid MDA and Microsoft
• Q&A
Existing Data Architecture
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
• Data growth (Source: IDC): 2.8 ZB in 2012; 40 ZB by 2020; 85% from New Data Types; 15x Machine Data by 2020
Modern Data Architecture Enabled
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: BUILD & TEST
• DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
• OPERATIONAL TOOLS: MANAGE & MONITOR
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Hadoop Powers Modern Data Architecture
Hadoop Cluster: a set of nodes, each providing compute & storage


Hadoop clusters provide
scale-out storage and
distributed data processing
on commodity hardware

Apache Hadoop is an open source project
governed by the Apache Software Foundation
(ASF) that allows you to gain insight from massive
amounts of structured and unstructured data
quickly and without significant investment.
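
The "distributed data processing" mentioned on this slide can be pictured with a small, single-process sketch of the map, shuffle, and reduce steps. This is only an illustration in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of input lines."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks):
    """Run the three steps over all blocks (sequentially here, for illustration)."""
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each inner list stands in for one block stored on one node.
blocks = [
    ["hadoop stores data", "hadoop processes data"],
    ["data grows"],
]
counts = word_count(blocks)
print(counts)
```

Because each block is mapped independently, adding nodes (and blocks) scales the map step out without changing the logic, which is the property the slide attributes to Hadoop clusters.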
3 Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture
• Integrated
– Interoperable with existing data center investments
• Key Services
– Platform, operational and data services essential for the enterprise
• Skills
– Leverage your existing skills: development, operations, analytics
Use Cases for the MDA
Industry / Use Case: Type of Data
• Financial Services
– New Account Risk Screens: Text, Server Logs
– Trading Risk: Server Logs
– Insurance Underwriting: Geographic, Sensor, Text
• Telecom
– Call Detail Records (CDRs): Machine, Geographic
– Infrastructure Investment: Machine, Server Logs
– Real-time Bandwidth Allocation: Server Logs, Text, Social
• Retail
– 360° View of the Customer: Clickstream, Text
– Localized, Personalized Promotions: Geographic
– Website Optimization: Clickstream
• Manufacturing
– Supply Chain and Logistics: Sensor
– Assembly Line Quality Assurance: Sensor
– Crowdsourced Quality Assurance: Social
• Healthcare
– Use Genomic Data in Medical Trials: Structured
– Monitor Patient Vitals in Real-Time: Sensor
• Pharmaceuticals
– Recruit and Retain Patients for Drug Trials: Social, Clickstream
– Improve Prescription Adherence: Social, Unstructured, Geographic
• Oil & Gas
– Unify Exploration & Production Data: Sensor, Geographic & Unstructured
– Monitor Rig Safety in Real-Time: Sensor, Unstructured
• Government
– ETL Offload in Response to Federal Budgetary Pressures: Structured
– Sentiment Analysis for Government Programs: Social
Microsoft in the Modern Data Architecture
New! Power BI Public Preview
• APPLICATIONS: Microsoft Applications
• DEV & DATA TOOLS
• DATA SYSTEM
• OPERATIONAL TOOLS
• INFRASTRUCTURE
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Today’s Topics
• Introduction
• What is a Hybrid Modern Data Architecture (MDA)?
• Apache Hadoop in the Hybrid MDA
• The Hybrid MDA and Microsoft
• Q&A
Hortonworks and Microsoft

• Engineering alignment
• Corporate alignment
• Field alignment
End-to-End Data Platform
• SQL Server
• PDW
• SQL Server for DW in Azure
• Hortonworks Data Platform
• PDW vNext (PDW + HDInsight)
• Windows Azure HDInsight
Hadoop Solutions From Microsoft
• Hortonworks Data Platform
• PDW vNext (PDW + HDInsight)
• Windows Azure HDInsight
Hortonworks Data Platform for Windows

Hortonworks Data Platform
Parallel Data Warehouse Next w/ HDInsight
PDW vNext (PDW + HDInsight)
• Select … → PolyBase → Result Set
• PolyBase draws on both Hadoop Data and Relational Data
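
The idea on the PolyBase slide, one SELECT returning one result set over both relational and Hadoop-resident data, can be sketched conceptually as follows. This is not PolyBase syntax and does not use any Microsoft API: it uses Python's built-in `sqlite3` module as a stand-in relational store, a list of delimited text lines as a stand-in for files in Hadoop, and hypothetical table names (`customers`, `clicks`) and data.

```python
import sqlite3

# Relational side: a hypothetical customers table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])

# "Hadoop" side: hypothetical clickstream records stored as delimited text lines.
hadoop_lines = ["1,home", "1,cart", "2,home"]


def expose_as_table(connection, lines):
    """Make the external text records queryable next to the relational table."""
    connection.execute("CREATE TEMP TABLE clicks (customer_id INTEGER, page TEXT)")
    rows = [(int(cid), page) for cid, page in (line.split(",") for line in lines)]
    connection.executemany("INSERT INTO clicks VALUES (?, ?)", rows)


expose_as_table(conn, hadoop_lines)

# One SELECT, one result set, spanning both kinds of data.
result = conn.execute(
    "SELECT c.name, COUNT(*) FROM customers c "
    "JOIN clicks k ON k.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(result)
```

The sketch copies the external records into a temporary table before joining; how PDW and PolyBase actually access Hadoop data is not described in this deck.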
Scale-out technologies in SQL Server Parallel Data Warehouse
Windows Azure HDInsight

Master Chief meets Big Data
• In-game analysis detects cheaters and improves experience for everyone
• Enables targeted campaigns that improve customer retention
Hadoop Solutions From Microsoft
• Hortonworks Data Platform
• PDW vNext (PDW + HDInsight)
• Windows Azure HDInsight
Hortonworks & Microsoft Reference Architecture
• Management and Monitoring: AMBARI
• Development and Data Tools
• Query/Visualization/Reporting/Analytics
• SOURCE DATA: Databases, Files, Servers & Mainframe, Sensor data, Social, Exchange, JMS Queues
• LOAD: SQOOP, FLUME, Web HDFS, Java RPC
• HADOOP: HDFS, YARN, MAPREDUCE, TEZ
• DATA SERVICES: HIVE, PIG, HCATALOG, HBASE
• INTERFACE: JDBC, ODBC, SQOOP, JAVA RPC
• Governance
• Replication
• Enterprise Repositories
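
Of the load paths named in the reference architecture, Web HDFS is a REST interface, so a request is just an HTTP URL of the form `/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and makes no network calls. The host name, file paths, and the helper `webhdfs_url` are hypothetical, and port 50070 is an assumption (it was the usual NameNode HTTP port for Hadoop releases of this deck's era; your cluster may differ).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for an HDFS path and operation.

    `port` defaults to 50070 as an assumption; check your cluster's setting.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode host and paths, for illustration only.
list_url = webhdfs_url("namenode.example.com", "/data/logs", "LISTSTATUS")
open_url = webhdfs_url("namenode.example.com", "/data/logs/app.log", "OPEN", offset=0)

print(list_url)
print(open_url)
```

`LISTSTATUS` lists a directory and `OPEN` reads a file; both are issued as HTTP GET requests against the built URL.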
More about Microsoft and Hortonworks
http://hortonworks.com/labs/Microsoft

Get started with Hortonworks Sandbox
http://hortonworks.com/hadoop-tutorial/partner-tutorial-microsoft/

Follow us:
@hortonworks @MicrosoftBI

Question & Answer session will be conducted electronically,
using the panel to the right of your screen

Microsoft and Hortonworks Deliver the Modern Data Architecture for Big Data
