SlideShare a Scribd company logo
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
About Me
 Microsoft, Big Data Evangelist
 In IT for 30 years, worked on many BI and DW projects
 Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
 Been perm employee, contractor, consultant, business owner
 Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
 Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure
Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data
Platform Solutions
 Blog at JamesSerra.com
 Former SQL Server MVP
 Author of book “Reporting with Microsoft SQL Server 2012”
Why Deploy To the Cloud?
Microsoft’s Solution
How Do I Get Started?
What if you could handle big data?
Introducing Apache Hadoop
Data volume
Data variety
Data velocity
Hadoop is a platform with portfolio of projects
A Hadoop distribution is a package of projects
Business applications of Hadoop
New analytic applications from new data
Main differences vs RDBMS/NoSQL
Pros
• Not a type of database, but rather a open-source software ecosystem that allows for massively
parallel computing
• No inherent structure (no conversion to relational or JSON needed)
• Good for batch processing, large files, volume writes, parallel scans, sequential access
• Great for large, distributed data processing tasks where time isn’t a constraint (i.e. end-of-day
reports, scanning months of historical data)
• Tradeoff: In order to make deep connections between many data points, the technology sacrifices
speed
• Some NoSQL databases such as HBase are built on top of HDFS
Main differences vs RDBMS/NoSQL
Cons
• File system, not a database
• Not good for millions of users, random access, fast individual record lookups or updates (OLTP)
• Not so great for real-time analytics
• Lacks: indexing, metadata layer, query optimizer, memory management
• Same cons at non-relational: no ACID support, data integrity, limited indexing, weak SQL, etc
• Security limitations
What Is Hadoop?
Microsoft’s Solution
How Do I Get Started?
Challenges with implementing Hadoop
Why Cloud + Big Data?
Speed Scale Economics
Always Up,
Always On
Open and flexibleTime to value
Data of all Volume,
Variety, Velocity
Massive Compute
and Storage
Deployment
expertise
Why Hadoop in the Cloud?
Scenarios For Deploying Hadoop As Hybrid
What Is Hadoop?
Why Deploy To the Cloud?
Microsoft’s Solution
How Do I Get Started?
Introducing Azure HDInsight
Microsoft contributions to Hadoop
Microsoft + Hortonworks
HDInsight Built for Windows or Linux
HDInsight Supports Hive
HDInsight Supports HBase
Data Node Data Node Data Node Data Node
Task Tracker Task Tracker Task Tracker Task Tracker
Name Node
Job Tracker
HMaster
Coordination
Region Server Region Server Region Server Region Server
HDInsight Supports Mahout
HDInsight Supports Storm
Stream
processin
g
Search and query
Data analytics (Excel)
Web/thick client
dashboards
Devices to take action
RabbitMQ /
ActiveMQ
Spark for Azure HDInsight
In Memory Processing on Multiple Workloads
Azure
HDInsight
In Memory
Spark
• Single execution model for multiple
tasks
• Processing up to 100x faster
performance
• Developer friendly (Java, Python, Scala)
• BI tool of choice (Power BI, Tabelau,
Qlik, SAP)
• Notebook experience (Jupyter/iPython,
Zeppelin)
HDInsight Allows You To Add Hadoop Projects
Easy for Developers
Easy for Data Scientists
Easy for Business Analysts
R Server for HDInsight
• Familiarity of R (most popular language for data
scientists)
• Scalability of Hadoop and Spark
• Up to 7x faster using Spark engine
• Train and run ML models on datasets of any size
• Cloud managed solution (easy setup, elastic,
SLA)
Introducing Azure HDInsight
Hyper scale Infrastructure is the enabler
32 Regions Worldwide, 24 Generally Available…
 100+ datacenters
 Top 3 networks in the world
 2.5x AWS, 7x Google DC Regions
 G Series – Largest VM in World, 32 cores, 448GB Ram, SSD…
Operational
Announced/Not Operational
Central US
Iowa
West US
California
East US
Virginia
US Gov
Virginia
North Central US
Illinois
US Gov
Iowa
South Central US
Texas
Brazil South
Sao Paulo State
West Europe
Netherlands
China North *
Beijing
China South *
Shanghai
Japan East
Tokyo, Saitama
Japan West
Osaka
India South
Chennai
East Asia
Hong Kong
SE Asia
Singapore
Australia South East
Victoria
Australia East
New South Wales
India Central
Pune
Canada East
Quebec City
Canada Central
Toronto
India West
Mumbai
Germany North East **
Magdeburg
Germany Central **
Frankfurt
North Europe
Ireland
East US 2
Virginia
United Kingdom
RegionsUnited Kingdom
Regions
US DoD East
TBD
US DoD West
TBD
* Operated by 21Vianet ** Data Stewardship by Deutsche Telekom
Seoul
Korea (2)
Why Microsoft Azure?
Azure Storage
Azure Blob Storage
Azure Data Lake Store
No hardware challenges
Deployed in minutes
Mission Critical, Enterprise Ready
Maintenance done for you
Low Cost
$£€¥
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Introducing Azure HDInsight
Bringing Hadoop to a billion people
Making advanced analytics accessible to Hadoop
Cloud
HDInsight vs HDP on Azure VM
HDInsight HDP on Azure VM
PaaS (setup, scale, manage, patch, etc) IaaS
Managed by Microsoft Managed by customer
Storage separate (Blob or ADLS) Storage in VM (local disk), but can also
have storage in Azure blob or ADLS
Delete VM keeps data Delete VM deletes data (unless external)
Up to 30-days behind latest HDP version Latest HDP Version
Limited Hadoop projects Unlimited Hadoop projects
Microsoft supports VM and Hadoop Microsoft: VM, HDP: Hadoop
No on-prem version On-prem version
Distributed, parallel analytics
framework
U-SQL (based on C# and SQL)
Dial for scale
Hides infrastructure complexity
Visual Studio integration
Instant scale on demand
Reduced learning curve
Azure Data Lake Analytics
Azure Services for big data analytics
YARN
HDFS
HDInsightAnalytics
Service
Store
Partners
U-SQL
Clickstream
Sensors
Video
Social
Web
Devices
Relational
Applications
56
What Is Hadoop?
Why Deploy To the Cloud?
Microsoft’s Solution
Get Started
http://azure.microsoft.com/en-us/documentation/services/hdinsight/
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map/
http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
http://channel9.msdn.com/Shows/Data-Exposed
http://azure.microsoft.com/en-us/pricing/free-trial/
Azure getting started
• Free Azure account, $200 in credit, https://azure.microsoft.com/en-us/free/
• Startups: BizSpark, $750/month free Azure, BizSpark Plus - $120k/year free Azure,
https://www.microsoft.com/bizspark/
• MSDN subscription, Data Platform MVP, $150/month free Azure, https://azure.microsoft.com/en-
us/pricing/member-offers/msdn-benefits/
• Microsoft Educator Grant Program, faculty - $250/month free Azure for a year, students -
$100/month free Azure for 6 months, https://azure.microsoft.com/en-us/pricing/member-
offers/msdn-benefits/
• Microsoft Azure for Research Grant, http://research.microsoft.com/en-us/projects/azure/default.aspx
• DreamSpark for students, https://www.dreamspark.com/Student/Default.aspx
• DreamSpark for academic institutions: https://www.dreamspark.com/Institution/Subscription.aspx
• Various Microsoft funds so you can learn the technologies or build a client solution
Pricing for HDInsight
CAPABILITIES STANDARD PREMIUM PREVIEW
Big Data Workloads
Standard Hadoop and Open Source Projects
(Core Hadoop & YARN, Hive & HCatalog, Tez,
Pig, Sqoop, Oozie, Zookeeper, Phoenix)
Columnar NoSQL (HBase)
Stream processing (Storm)
Interactive processing, real-time stream
processing & ML (Spark)
Big Data statistics predictive modeling, and
machine learning with R Server
Enterprise Readiness
Administration – manage, monitor &
troubleshoot clusters
Hadoop version upgrades and patching –
Automatic patching and upgrades
Encryption of data at rest
Price Standard price per Node HDInsight Standard Price + $0.02/Core-hour for each core
used in the cluster during preview (75% discount)
Resources
 What is HDInsight? http://bit.ly/1WpS0at
 Hadoop and Microsoft http://bit.ly/20Cg2hA
 Introduction to Hadoop http://bit.ly/1WpTstq
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the
“Presentations” link on the top menu)

More Related Content

What's hot

Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
Carole Gunst
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Skillspeed
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Cathrine Wilhelmsen
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
James Serra
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
Amazon Web Services
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
Julian Hyde
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
Wasm1953
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
Databricks
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
chennakesava44
 

What's hot (20)

Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 

Viewers also liked

Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
James Serra
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
James Serra
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
James Serra
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
James Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
James Serra
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
James Serra
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
James Serra
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
James Serra
 
Implement SQL Server on an Azure VM
Implement SQL Server on an Azure VMImplement SQL Server on an Azure VM
Implement SQL Server on an Azure VM
James Serra
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
James Serra
 
What is it like to work at Microsoft?
What is it like to work at Microsoft?What is it like to work at Microsoft?
What is it like to work at Microsoft?
James Serra
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
James Serra
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
James Serra
 
What exactly is Business Intelligence?
What exactly is Business Intelligence?What exactly is Business Intelligence?
What exactly is Business Intelligence?
James Serra
 

Viewers also liked (20)

Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
 
Implement SQL Server on an Azure VM
Implement SQL Server on an Azure VMImplement SQL Server on an Azure VM
Implement SQL Server on an Azure VM
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
What is it like to work at Microsoft?
What is it like to work at Microsoft?What is it like to work at Microsoft?
What is it like to work at Microsoft?
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
What exactly is Business Intelligence?
What exactly is Business Intelligence?What exactly is Business Intelligence?
What exactly is Business Intelligence?
 

Similar to Introduction to Microsoft’s Hadoop solution (HDInsight)

[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
Naoki (Neo) SATO
 
Adam azure presentation
Adam   azure presentationAdam   azure presentation
Uotm workshop
Uotm workshopUotm workshop
Uotm workshop
Ravi Patel
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
DataWorks Summit
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
Capture the Cloud with Azure
Capture the Cloud with AzureCapture the Cloud with Azure
Capture the Cloud with Azure
Shahed Chowdhuri
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystems
taimur hafeez
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
Marco Parenzan
 
Intro to Product Development
Intro to Product DevelopmentIntro to Product Development
Intro to Product Development
Puja Pramudya
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
Abhimanyu Singhal
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
Databricks
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Dataconomy Media
 
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life EasierWebinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier
DataStax
 
Keep Calm and CF Push on Azure
Keep Calm and CF Push on AzureKeep Calm and CF Push on Azure
Keep Calm and CF Push on Azure
VMware Tanzu
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
Trivadis
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Azure is for Everyone
Azure is for EveryoneAzure is for Everyone
Azure is for Everyone
responsiveX
 

Similar to Introduction to Microsoft’s Hadoop solution (HDInsight) (20)

[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
 
Adam azure presentation
Adam   azure presentationAdam   azure presentation
Adam azure presentation
 
Uotm workshop
Uotm workshopUotm workshop
Uotm workshop
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Capture the Cloud with Azure
Capture the Cloud with AzureCapture the Cloud with Azure
Capture the Cloud with Azure
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystems
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
 
Intro to Product Development
Intro to Product DevelopmentIntro to Product Development
Intro to Product Development
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life EasierWebinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier
 
Keep Calm and CF Push on Azure
Keep Calm and CF Push on AzureKeep Calm and CF Push on Azure
Keep Calm and CF Push on Azure
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Azure is for Everyone
Azure is for EveryoneAzure is for Everyone
Azure is for Everyone
 

More from James Serra

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
James Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
James Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
James Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
James Serra
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
James Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra
 
How to build your career
How to build your careerHow to build your career
How to build your career
James Serra
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
James Serra
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
James Serra
 

More from James Serra (16)

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 

Recently uploaded

Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 

Recently uploaded (20)

Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 

Introduction to Microsoft’s Hadoop solution (HDInsight)

  • 1. James Serra Big Data Evangelist Microsoft JamesSerra3@gmail.com
  • 2. About Me  Microsoft, Big Data Evangelist  In IT for 30 years, worked on many BI and DW projects  Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer  Been perm employee, contractor, consultant, business owner  Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference  Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions  Blog at JamesSerra.com  Former SQL Server MVP  Author of book “Reporting with Microsoft SQL Server 2012”
  • 3. Why Deploy To the Cloud? Microsoft’s Solution How Do I Get Started?
  • 4. What if you could handle big data?
  • 9. Hadoop is a platform with portfolio of projects
  • 10. A Hadoop distribution is a package of projects
  • 12. New analytic applications from new data
  • 13. Main differences vs RDBMS/NoSQL Pros • Not a type of database, but rather a open-source software ecosystem that allows for massively parallel computing • No inherent structure (no conversion to relational or JSON needed) • Good for batch processing, large files, volume writes, parallel scans, sequential access • Great for large, distributed data processing tasks where time isn’t a constraint (i.e. end-of-day reports, scanning months of historical data) • Tradeoff: In order to make deep connections between many data points, the technology sacrifices speed • Some NoSQL databases such as HBase are built on top of HDFS
  • 14. Main differences vs RDBMS/NoSQL Cons • File system, not a database • Not good for millions of users, random access, fast individual record lookups or updates (OLTP) • Not so great for real-time analytics • Lacks: indexing, metadata layer, query optimizer, memory management • Same cons at non-relational: no ACID support, data integrity, limited indexing, weak SQL, etc • Security limitations
  • 15. What Is Hadoop? Microsoft’s Solution How Do I Get Started?
  • 17. Why Cloud + Big Data? Speed Scale Economics Always Up, Always On Open and flexibleTime to value Data of all Volume, Variety, Velocity Massive Compute and Storage Deployment expertise
  • 18. Why Hadoop in the Cloud?
  • 19. Scenarios For Deploying Hadoop As Hybrid
  • 20. What Is Hadoop? Why Deploy To the Cloud? Microsoft’s Solution How Do I Get Started?
  • 24. HDInsight Built for Windows or Linux
  • 26. HDInsight Supports HBase Data Node Data Node Data Node Data Node Task Tracker Task Tracker Task Tracker Task Tracker Name Node Job Tracker HMaster Coordination Region Server Region Server Region Server Region Server
  • 28. HDInsight Supports Storm Stream processin g Search and query Data analytics (Excel) Web/thick client dashboards Devices to take action RabbitMQ / ActiveMQ
  • 29. Spark for Azure HDInsight In Memory Processing on Multiple Workloads Azure HDInsight In Memory Spark • Single execution model for multiple tasks • Processing up to 100x faster performance • Developer friendly (Java, Python, Scala) • BI tool of choice (Power BI, Tabelau, Qlik, SAP) • Notebook experience (Jupyter/iPython, Zeppelin)
  • 30. HDInsight Allows You To Add Hadoop Projects
  • 32. Easy for Data Scientists
  • 33. Easy for Business Analysts
  • 34. R Server for HDInsight • Familiarity of R (most popular language for data scientists) • Scalability of Hadoop and Spark • Up to 7x faster using Spark engine • Train and run ML models on datasets of any size • Cloud managed solution (easy setup, elastic, SLA)
  • 36. Hyper scale Infrastructure is the enabler 32 Regions Worldwide, 24 Generally Available…  100+ datacenters  Top 3 networks in the world  2.5x AWS, 7x Google DC Regions  G Series – Largest VM in World, 32 cores, 448GB Ram, SSD… Operational Announced/Not Operational Central US Iowa West US California East US Virginia US Gov Virginia North Central US Illinois US Gov Iowa South Central US Texas Brazil South Sao Paulo State West Europe Netherlands China North * Beijing China South * Shanghai Japan East Tokyo, Saitama Japan West Osaka India South Chennai East Asia Hong Kong SE Asia Singapore Australia South East Victoria Australia East New South Wales India Central Pune Canada East Quebec City Canada Central Toronto India West Mumbai Germany North East ** Magdeburg Germany Central ** Frankfurt North Europe Ireland East US 2 Virginia United Kingdom RegionsUnited Kingdom Regions US DoD East TBD US DoD West TBD * Operated by 21Vianet ** Data Stewardship by Deutsche Telekom Seoul Korea (2)
  • 44. Low Cost $£€¥ *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  • 46. Bringing Hadoop to a billion people
  • 47. Making advanced analytics accessible to Hadoop Cloud
  • 48. HDInsight vs HDP on Azure VM HDInsight HDP on Azure VM PaaS (setup, scale, manage, patch, etc) IaaS Managed by Microsoft Managed by customer Storage separate (Blob or ADLS) Storage in VM (local disk), but can also have storage in Azure blob or ADLS Delete VM keeps data Delete VM deletes data (unless external) Up to 30-days behind latest HDP version Latest HDP Version Limited Hadoop projects Unlimited Hadoop projects Microsoft supports VM and Hadoop Microsoft: VM, HDP: Hadoop No on-prem version On-prem version
  • 49. Distributed, parallel analytics framework U-SQL (based on C# and SQL) Dial for scale Hides infrastructure complexity Visual Studio integration Instant scale on demand Reduced learning curve Azure Data Lake Analytics Azure Services for big data analytics YARN HDFS HDInsightAnalytics Service Store Partners U-SQL Clickstream Sensors Video Social Web Devices Relational Applications 56
  • 50. What Is Hadoop? Why Deploy To the Cloud? Microsoft’s Solution
  • 52. Azure getting started • Free Azure account, $200 in credit, https://azure.microsoft.com/en-us/free/ • Startups: BizSpark, $750/month free Azure, BizSpark Plus - $120k/year free Azure, https://www.microsoft.com/bizspark/ • MSDN subscription, Data Platform MVP, $150/month free Azure, https://azure.microsoft.com/en- us/pricing/member-offers/msdn-benefits/ • Microsoft Educator Grant Program, faculty - $250/month free Azure for a year, students - $100/month free Azure for 6 months, https://azure.microsoft.com/en-us/pricing/member- offers/msdn-benefits/ • Microsoft Azure for Research Grant, http://research.microsoft.com/en-us/projects/azure/default.aspx • DreamSpark for students, https://www.dreamspark.com/Student/Default.aspx • DreamSpark for academic institutions: https://www.dreamspark.com/Institution/Subscription.aspx • Various Microsoft funds so you can learn the technologies or build a client solution
  • 53. Pricing for HDInsight CAPABILITIES STANDARD PREMIUM PREVIEW Big Data Workloads Standard Hadoop and Open Source Projects (Core Hadoop & YARN, Hive & HCatalog, Tez, Pig, Sqoop, Oozie, Zookeeper, Phoenix) Columnar NoSQL (HBase) Stream processing (Storm) Interactive processing, real-time stream processing & ML (Spark) Big Data statistics predictive modeling, and machine learning with R Server Enterprise Readiness Administration – manage, monitor & troubleshoot clusters Hadoop version upgrades and patching – Automatic patching and upgrades Encryption of data at rest Price Standard price per Node HDInsight Standard Price + $0.02/Core-hour for each core used in the cluster during preview (75% discount)
  • 54. Resources  What is HDInsight? http://bit.ly/1WpS0at  Hadoop and Microsoft http://bit.ly/20Cg2hA  Introduction to Hadoop http://bit.ly/1WpTstq
  • 55. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)

Editor's Notes

  1. Introduction to Microsoft’s Hadoop solution (HDInsight) Did you know Microsoft provides a Hadoop Platform-as-a-Service (PaaS)? It’s called Azure HDInsight and it deploys and provisions managed Apache Hadoop clusters in the cloud, providing a software framework designed to process, analyze, and report on big data with high reliability and availability. HDInsight uses the Hortonworks Data Platform (HDP) Hadoop distribution that includes many Hadoop components such as HBase, Spark, Storm, Pig, Hive, and Mahout. Join me in this presentation as I talk about what Hadoop is, why deploy to the cloud, and Microsoft’s solution.
  2. Fluff, but point is I bring real work experience to the session
  3. http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/ Key goal of slide: To convey what every IT person knows: The data warehouse and what’s it for. Then we set-up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?   Slide talk track: What is the “traditional” data warehouse? IT professionals know this well. A data warehouse or an enterprise data warehouse is a database that was designed specifically for data analysis. It is the single source of truth or the central repository for all data in the company. This means disparate data in the company coming from your transactional systems, your ERP, CRM or Line of Business applications would all be extracted, transformed, and cleansed and put into the warehouse. It was built so that the people who is accessing the warehouse using BI tools will be accessing data that has been provisioned by IT and represent accurate data sanctioned by the company. However, this traditional data warehouse is reaching an inflection point. Gartner in their analysis of the state of data warehousing noted that it is reaching the most significant tipping point since it’s inception. The question is why? What is going on?
  4. http://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/ Key goal of slide: To convey what every IT person knows: The data warehouse and what’s it for. Then we set-up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?   Slide talk track: What is the “traditional” data warehouse? IT professionals know this well. A data warehouse or an enterprise data warehouse is a database that was designed specifically for data analysis. It is the single source of truth or the central repository for all data in the company. This means disparate data in the company coming from your transactional systems, your ERP, CRM or Line of Business applications would all be extracted, transformed, and cleansed and put into the warehouse. It was built so that the people who is accessing the warehouse using BI tools will be accessing data that has been provisioned by IT and represent accurate data sanctioned by the company. However, this traditional data warehouse is reaching an inflection point. Gartner in their analysis of the state of data warehousing noted that it is reaching the most significant tipping point since it’s inception. The question is why? What is going on?
  5. Key goal of slide: To convey what every IT person knows: The data warehouse and what’s it for. Then we set-up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?   Slide talk track: What is the “traditional” data warehouse? IT professionals know this well. A data warehouse or an enterprise data warehouse is a database that was designed specifically for data analysis. It is the single source of truth or the central repository for all data in the company. This means disparate data in the company coming from your transactional systems, your ERP, CRM or Line of Business applications would all be extracted, transformed, and cleansed and put into the warehouse. It was built so that the people who is accessing the warehouse using BI tools will be accessing data that has been provisioned by IT and represent accurate data sanctioned by the company. However, this traditional data warehouse is reaching an inflection point. Gartner in their analysis of the state of data warehousing noted that it is reaching the most significant tipping point since it’s inception. The question is why? What is going on?
  6. Key goal of slide: To convey what every IT person knows: The data warehouse and what’s it for. Then we set-up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?   Slide talk track: What is the “traditional” data warehouse? IT professionals know this well. A data warehouse or an enterprise data warehouse is a database that was designed specifically for data analysis. It is the single source of truth or the central repository for all data in the company. This means disparate data in the company coming from your transactional systems, your ERP, CRM or Line of Business applications would all be extracted, transformed, and cleansed and put into the warehouse. It was built so that the people who is accessing the warehouse using BI tools will be accessing data that has been provisioned by IT and represent accurate data sanctioned by the company. However, this traditional data warehouse is reaching an inflection point. Gartner in their analysis of the state of data warehousing noted that it is reaching the most significant tipping point since it’s inception. The question is why? What is going on?
  7. http://hadoop.apache.org/who.html
  8. http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/ Hadoop Common – Contains libraries and utilities needed by other Hadoop modules Hadoop Distributed File System (HDFS) – A distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster Hadoop MapReduce – A programming model for large scale data processing.  It is designed for batch processing.  Although the Hadoop framework is implemented in Java, MapReduce applications can be written in other programming languages (R, Python, C# etc).  But Java is the most popular Hadoop YARN – YARN is a resource manager introduced in Hadoop 2 that was created by separating the processing engine and resource management capabilities of MapReduce as it was implemented in Hadoop 1 (see Hadoop 1.0 vs Hadoop 2.0).  YARN is often called the operating system of Hadoop because it is responsible for managing and monitoring workloads, maintaining a multi-tenant environment, implementing security controls, and managing high availability features of Hadoop
  9. http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/ Hadoop Common – Contains libraries and utilities needed by other Hadoop modules Hadoop Distributed File System (HDFS) – A distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster Hadoop MapReduce – A programming model for large scale data processing.  It is designed for batch processing.  Although the Hadoop framework is implemented in Java, MapReduce applications can be written in other programming languages (R, Python, C# etc).  But Java is the most popular Hadoop YARN – YARN is a resource manager introduced in Hadoop 2 that was created by separating the processing engine and resource management capabilities of MapReduce as it was implemented in Hadoop 1 (see Hadoop 1.0 vs Hadoop 2.0).  YARN is often called the operating system of Hadoop because it is responsible for managing and monitoring workloads, maintaining a multi-tenant environment, implementing security controls, and managing high availability features of Hadoop
  10. Hardware acquisition (Capex up front) Scale constrained to on-premise procurement (resource and capacity planning) Skilled Hadoop Expertise Tuning + Maintenance
  11. Gartner has increased its sizing and forecast for cloud compute services, reflecting greater interest among our client base than had been expected. We expect the 2013 market to be worth $8 billion (up from $6.8 billion forecast last year) and the 2014 market to be worth $10 billion.
  12. Why Hadoop in the cloud? You can deploy Hadoop in a traditional on-site datacenter. Some companies–including Microsoft–also offer Hadoop as a cloud-based service. One obvious question is: why use Hadoop in the cloud? Here's why a growing number of organizations are choosing this option. The cloud saves time and money Open source doesn't mean free. Deploying Hadoop on-premises still requires servers and skilled Hadoop experts to set up, tune, and maintain them. A cloud service lets you spin up a Hadoop cluster in minutes without up-front costs. See how Virginia Tech is using Microsoft's cloud instead of spending millions of dollars to establish their own supercomputing center. The cloud is flexible and scales fast In the Microsoft Azure cloud, you pay only for the compute and storage you use, when you use it. Spin up a Hadoop cluster, analyze your data, then shut it down to stop the meter. We quickly spun up the Azure HDInsight cluster and processed six years worth of data in just a few hours, and then we shut it down&ellipsis; processing the data in the cloud made it very affordable. –Paul Henderson, National Health Service (U.K.) The cloud makes you nimble Create a Hadoop cluster in minutes–and add nodes on-demand. The cloud offers organizations immediate time to value. It was simply so much faster to do this in the cloud with Windows Azure. We were able to implement the solution and start working with data in less than a week. –Morten Meldgaard, Chr. Hansen
  13. Joint engineering teams – joint people on each other’s teams. They report to Hortonworks and Hortonworks report to us. We share resources and have a joint engineering team. This is not just a re-licensing of a distro. We are one engineering team. Milestone cadence – we do a 12-month calendar year roadmap with each other. That roadmap is how each company lock to. They revisit it every 3 months. When Cloudera released Impala, Microsoft and Hortonworks changed their resources to start working on Stinger project. In CY13, this became a joint commitment to build Tez (vectorized query executor) and Stinger Joint planning and execution – Joint signoffs (Linux, Windows Azure) – Microsoft and Hortonworks signedoff on Linux, on Windows, and on Azure. Aligned offerings – HDP on Windows HDI Appliance, HDI Service Joint roadmap CY planning/ roadmap development Quarterly adjustments Hortonworks is making Hadoop an enterprise viable platform, by the end of 2015 more than half the worlds data will be processed by Hadoop <look at partner page on Hortonworks site> Engineering alignment Microsoft worked with Hortonworks to develop Hortonworks Data Platform for Windows HDInsight on Windows Azure and HDInsight on PDW is based on HDP Microsoft and Hortonworks are working on various Hadoop sub-projects together (Stinger, Tez) contributed over 6k engineering hours, 25k+ lines of code Corporate alignment Microsoft is one of Hortonworks strategic partners (Q for Audrey) Joint Marketing alignment (webinars, events strategy, Press, etc.) Joint Support alignment (Microsoft provides Lvl1 support and pays Hortownworks for Lvl2 and Lvl3) Field Alignment Hortonworks field reps will get quota compensation or relief if HDP for Windows OR if PDW, Azure is sold Microsoft field reps work with Hortonworks reps to get credibility from Hadoop story
  14. https://mahout.apache.org/users/basics/algorithms.html
  15. Slide 1 – it’s supposed to represent a zoom in of the right most box with certain new details. Slide 2 and 3 are the same slide. I received feedback from my original slide 2 below and I made slide 3.  They still didn’t like it.  Here is the feedback: Original feedback:   Interactive experience conceptual picture Make the top and bottom be proportional Drop the boxes and headings (OSS notebooks, 3rd party BI tools) Power BI should stand out but the current image is not adding new value, can we just use Power BI logo and also some additional image to have it stand out   Secondary feedback about slide 3   In the image that shows Power BI, we do still want to make Power BI pop above the others and have it featured, but Ranga was a little unsure about having the images of the devices included. Let’s get a version that shows it both ways, (with the devices, and making the Power BI section stand out in another way, ie size, but not including the devices,) and then let him decide which one he prefers.   Sldie 4 General cleanup
  16. http://www.zdnet.com/article/microsoft-claims-azure-now-used-by-half-of-the-fortune-500/
  17. Azure Blob (WASB, SAS keys, manually manage storage expansion) vs Azure data lake store (webHDFS, AD, auto shared) http://www.jamesserra.com/archive/2014/02/what-is-hdinsight/
  18. There does not have to be multiple ADLS Accounts. You can use one ADLS account since there is the promise of infinite scale. You could have multiple accounts if they are used for wider access control, different owners, billing/chargeback, management/backup strategy differences, come from different subscriptions, etc. You could also have an HDI cluster with one or more WASBs and one or more ADLS accounts. And each of those WASBs and each ADLS account can be accessed by multiple HDIs.
  19. Case Study: http://www.microsoft.com/casestudies/Windows-Azure/Virginia-Polytechnic-Institute-and-State-University/University-Transforms-Life-Sciences-Research-with-Big-Data-Solution-in-the-Cloud/710000003381 Video: http://www.youtube.com/watch?v=4StwPG0qWT0 Virginia Tech is one of the country’s leading research institutions. The university manages a research portfolio of US$454 million. Virginia Tech began previously used a network of supercomputers to locate undetected genes in a massive genome database. This and related work by other institutions has the potential to lead to exciting medical breakthroughs, including new cancer therapies and antibiotics used to combat the emergence of drug-resistant bugs. However, as the size of genome databases grows, this no longer became attenable. Of the estimated 2,000 DNA sequencers worldwide, they are generating 15 petabytes of genome data every year. Their existing computational and storage resources required to work with data sets of this size wasn’t keeping up. Rather than getting a grant for millions of dollars to establish their own supercomputing center, Virginia Tech went for Azure HDInsight and only paying for the compute they use. Benefits include: Significant cost savings going from a multi-million supercomputer center to paying for only the compute you need in the cloud. Ability to access the cloud from anywhere and on any device (even outside the laboratory) Azure is able to elastically scale and keep up with the amount of data being generated Ultimate benefit: Some day be able to find a treatment for cancer
  20. BlackBall is a tea and dessert company based in Taipei, Taiwan. Established in 2006, BlackBall has expanded from 20 stores in Taiwan to 40 more locations throughout Malaysia. To support regional growth, BlackBall wanted better insight into business operations. Each store sent point-of-sale (POS) data to the corporate headquarters, where managers manually entered the information into spreadsheets. They also monitored other sources such as social media, but it was difficult to make connections between the disparate sources of information. “Our main challenge involved reporting,” says Andrew Cheong, Senior Manager at BlackBall. “There were a lot of questions that we were just unable to ask about customer behavior. Without insight into regional demand, it’s very difficult to grow our business.” In addition to wanting to more targeted promotions, the company wanted to improve product distribution. BlackBall had built its reputation on the quality of its highly perishable ingredients, and getting them to the right place at the right time was critical to the company’s success. “We use a lot of fruits in our desserts, and if they don’t taste right, then we lose our competitive advantage,” says Cheong. BlackBall offers more than 70 products, and constantly works on developing new ingredients and combinations. By monitoring customer feedback on social media as well as sales data, the company can plan more strategically. For example, BlackBall is using the information to develop new products and execute more effective promotions. “By capitalizing on the strengths of HDInsight Service, we have a much better understanding of what people want,” says Cheong. “We also found that our marketing can be vastly improved because we can quickly identify top-selling products and push that information out to each store. We used to only be able to do that on a weekly basis, but now we can do that in near-real time.” The company is already gaining surprising insights that it can use to improve product distribution and marketing. For example, BlackBall expected to see that the weather affected sales; however, the results were not what it anticipated. “Before, we thought that people would choose cold drinks and desserts in hot weather,” says Cheong. “But contrary to our assumptions, in certain outlets we saw an opposite trend.” As a result, the company can ensure that those outlets are equipped with the staff and supplies needed to meet customer demand.