2. Big Data is Changing Traditional Data Warehousing
“… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.”
– Gartner, “The State of Data Warehousing”*
* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT: Gartner, 2012)
Data Sources → ETL → Data Warehouse → BI and Analytics
3. Big Data is Driving Transformative Changes
Data characteristics
• Traditional: Relational data with highly modeled schema
• Big Data: All data with schema agility
Costs
• Traditional: Specialized hardware
• Big Data: Commodity hardware
Culture
• Traditional: Operational reporting; focus on rear-view analysis
• Big Data: Experimentation leading to intelligent action, with machine learning, graph, A/B testing
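One common reading of "schema agility" is schema-on-read: raw records are stored as they arrive, and each query applies only the fields it needs. The slide does not spell this out, so the sketch below is an interpretation. The sample events and the `read_with_schema` helper are hypothetical.

```python
import json

# Hypothetical raw events with differing fields; nothing is rejected at write time.
raw_events = [
    '{"user": "a1", "channel": "web", "rating": 4}',
    '{"user": "b2", "channel": "phone", "comment": "long hold time"}',
    '{"user": "c3", "channel": "web", "rating": 5, "device": "mobile"}',
]

def read_with_schema(lines, fields):
    """Project each raw JSON record onto the requested fields (missing -> None)."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}

# The "schema" is chosen at query time: here only user and rating matter.
ratings = [
    row for row in read_with_schema(raw_events, ["user", "rating"])
    if row["rating"] is not None
]
print(ratings)  # [{'user': 'a1', 'rating': 4}, {'user': 'c3', 'rating': 5}]
```

A highly modeled relational schema would instead require every record to fit the table definition before it could be loaded.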
4. Big Data Introduces a Culture of Experimentation
Tangerine instantly adapts to customer feedback to offer customers what they want, when they want it
“I can see us…creating predictive, context-aware financial services applications that give information based on the time and where the customer is.”
Billy Lo, Head of Enterprise Architecture
Scenario
• Lack of insight for targeted campaigns
• Inability to support data growth
Solution
Azure HDInsight (Hadoop-as-a-service) with the Analytics Platform System (APS) enables instant analysis of social sentiment and customer feedback across digital, face-to-face, and phone channels.
Result
• Reduced time to customer insight
• Ability to make changes to campaigns or adjust product rollouts based on real-time customer reactions
• Ability to offer incentives and new services to retain and grow its customer base
5. However, there are challenges to Big Data…
• Obtaining skills and capabilities
• Determining how to get value
• Integrating with existing IT investments
*Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT: Gartner, 2015)
6. But Microsoft has done it before
We needed to better leverage data and analytics to do more experimentation
So we:
• Designed a data lake for everyone to put their data into
• Built tools approachable by any developer
• Created machine learning tools for collaborating across large experiment models
Result:
• Across Microsoft, ten thousand developers run experiments that lead to better insights
• Growth in our Microsoft businesses:
  • Office productivity revenue (45% YoY)*
  • Intelligent Cloud (100% YoY)*
  • Bing search share doubled
[Chart: Growth of data @ Microsoft, 2010 to 2015, on a scale from petabytes to exabytes. Sources shown: Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer]
* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast
7. Microsoft is now taking everything we’ve learned on this journey and bringing it to our customers
Technology. Cost. Culture.
8. Big Data as a Cornerstone of Cortana Intelligence
Flow: Data → Intelligence → Action
• Data Sources: Apps; Sensors and devices
• Information Management: Event Hubs; Data Catalog; Data Factory
• Big Data Stores: SQL Data Warehouse; Data Lake Store
• Machine Learning and Analytics: HDInsight (Hadoop / Spark); Stream Analytics; Data Lake Analytics; Machine Learning
• Intelligence: Cortana; Bot Framework; Cognitive Services
• Dashboards & Visualizations: Power BI
• Action: People; Automated Systems; Apps (Web, Mobile, Bots)
9. Azure HDInsight
Hadoop and Spark as a Service on Azure
• Fully managed Hadoop and Spark for the cloud
• 100% open source Hortonworks Data Platform
• Clusters up and running in minutes
• Managed, monitored, and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
10. Comprehensive Set of Managed Apache Big Data Projects
• Scale to petabytes on demand
• Process unstructured and semi-structured data
• Develop in Java, .NET, and more
• Skip buying and maintaining hardware
• Deploy in Windows or Linux
• Spin up an Apache Hadoop cluster in minutes
• Visualize your Hadoop data in Excel
• Easily integrate on-premises Hadoop clusters
Core Engine
• Batch: MapReduce
• Script: Pig
• SQL: Hive
• NoSQL: HBase
• Streaming: Storm
• In-Memory: Spark
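To make the batch engine in the list above concrete, here is a minimal single-process sketch of the MapReduce programming model (map, shuffle, reduce) applied to word counting. It only illustrates the model and does not use the Hadoop API; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_count(["big data big insight", "data lake"]))
# {'big': 2, 'data': 2, 'insight': 1, 'lake': 1}
```

On a real cluster the map and reduce calls run in parallel across nodes and the shuffle moves data over the network. The sketch keeps everything in memory to show only the data flow.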
11. Azure Data Lake Store
A Hyper-Scale Repository for Big Data Analytics Workloads
• Hadoop File System (HDFS) for the cloud
• No limits to scale
• Store any data in its native format
• Enterprise-grade access control, encryption at rest
• Optimized for analytic workload performance
12. Azure Data Lake Store
• Distributed, parallel file system in the cloud
• Performance-tuned and optimized for analytics
• No fixed size limits
• Stores all data types
• Highly available with local and geo-redundant storage
• WebHDFS REST API
• Supported by leading Hadoop distros
• Role-based security
• Low-latency and high-throughput workloads
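As a rough illustration of the WebHDFS REST API mentioned above, the sketch below builds request URLs in the standard WebHDFS form `/webhdfs/v1/<path>?op=<OPERATION>`. The account name and file paths are hypothetical, and the `azuredatalakestore.net` host suffix is an assumption that the slides do not state. Authentication, which a real request would need, is left out.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL: https://<host>/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"

# Hypothetical account and folder; LISTSTATUS lists a directory in WebHDFS.
host = "example-account.azuredatalakestore.net"
print(webhdfs_url(host, "/clickstream/2016/09", "LISTSTATUS"))
# https://example-account.azuredatalakestore.net/webhdfs/v1/clickstream/2016/09?op=LISTSTATUS

# OPEN reads a file; offset and length are optional WebHDFS parameters.
print(webhdfs_url(host, "/clickstream/2016/09/events.json", "OPEN", offset="0", length="1024"))
```

Because the interface follows the WebHDFS convention, tools that already speak to HDFS over REST can in principle point at the store by changing only the host.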
[Diagram labels: HDInsight; Analytics Service (U-SQL); YARN; HDFS; Store. Data sources: Clickstream, Sensors, Video, Social, Web, Devices, Relational, Applications]
13. Azure Data Lake Analytics
A new distributed analytics service
• Distributed analytics service built on Apache YARN
• Elastic scale per query lets users focus on business goals, not on configuring hardware
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Integrates with Visual Studio to develop, debug, and tune code faster
• Federated query across Azure data sources
• Enterprise-grade role-based access control
14. Typical Azure Big Data Architecture
[Architecture diagram. Components: Data sources (Apps; Sensors and devices); Azure API Management; Backend Services; Event Hubs; Stream Analytics; Machine Learning; HDInsight (Apache Spark); Storage; SQL Data Warehouse; Power BI; Azure Data Factory & Azure Data Catalog]
15. Highest availability guarantee in the industry for peace of mind
• Managed, monitored and supported by Microsoft
• Enterprise-leading SLA: 99.9% uptime
• No IT resources needed for upgrades and patching
• Microsoft monitors your deployment so you don’t have to
*Applies to HDInsight only
99.9% SLA
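To put the 99.9% figure in concrete terms, the short calculation below converts an uptime percentage into the downtime it permits. It assumes a 30-day month and a 365-day year. The slides do not say how the SLA is actually measured, so treat the measurement window as an assumption.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

print(round(allowed_downtime_minutes(99.9), 1))            # 43.2 minutes per 30-day month
print(round(allowed_downtime_minutes(99.9, days=365), 1))  # 525.6 minutes per year
```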
16. Runs in the Most Datacenters Worldwide
Azure is doubling compute and storage every 6 months
• Central US: Iowa
• West US: California
• East US: Virginia
• East US 2: Virginia
• North Central US: Illinois
• South Central US: Texas
• Brazil South: Sao Paulo State
• North Europe: Ireland
• West Europe: Netherlands
• China North*: Beijing
• China South*: Shanghai
• Japan East: Tokyo, Saitama
• Japan West: Osaka
• East Asia: Hong Kong
• SE Asia: Singapore
• Australia South East: Victoria
• Australia East: New South Wales
• India Central: Pune
*Applies to HDInsight only
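The claim that Azure doubles compute and storage every 6 months implies exponential growth. A small sketch shows what that rate compounds to over longer periods, assuming the rate stays constant, which the slide does not claim.

```python
def growth_factor(months: float, doubling_period_months: float = 6.0) -> float:
    """Multiplicative growth after `months` when size doubles every period."""
    return 2.0 ** (months / doubling_period_months)

print(growth_factor(12))  # 4.0  (one year)
print(growth_factor(36))  # 64.0 (three years)
```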
17. Lower Total Cost of Ownership
• No hardware
• Hadoop support included with Azure support
• Pay only for what you use
• Independently scale storage and compute
• No need to hire a specialized operations team
• 63% lower total cost of ownership than on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
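As a quick reading aid for the 63% figure, the sketch below applies the reduction to a baseline cost. The baseline amount is purely hypothetical and does not come from the IDC study; only the percentage is taken from the slide.

```python
def cloud_tco(on_prem_tco: float, reduction_percent: float = 63.0) -> float:
    """Cost remaining after applying a percentage reduction to a baseline."""
    return on_prem_tco * (1 - reduction_percent / 100)

baseline = 1_000_000.0  # hypothetical on-premises TCO, not from the source
print(f"{cloud_tco(baseline):,.0f}")  # 370,000
```

In other words, a 63% reduction leaves 37% of the on-premises cost.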
18. Recognized by Top Analysts
Forrester Wave for Big Data Hadoop Cloud
• Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms*
• Recognized for its cloud-first strategy that is paying off*
*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.
19. Microsoft Data Science Summit
Get hands-on with the latest cutting-edge technologies in Big Data, Machine Learning, and Open Source at the Microsoft Data Science Summit.
Hear from thought leaders, data scientists, engineers, and customers solving real-world problems, and make expert connections to help you put these technologies to work for your business.
September 26-27, 2016
Atlanta, GA
Register Now!
aka.ms/microsoftdatasciencesummit
Target audience
• Data Scientists
• Big Data Engineers
• Machine Learning Practitioners/Engineers
• Data Science/Engineering Managers
Why attend
• Readiness: architectural guidance and hands-on training to operationalize solutions at scale
• Real-world examples of how to apply machine learning and data science techniques to your business
• Networking with the experts and the community to bring your data to life