Azure HDInsight: Fully Managed, Full Spectrum Open Source Analytics
Ashish Thapliyal
Principal Product Manager
Azure HDInsight
Microsoft Corporation
https://blogs.msdn.microsoft.com/ashish/
https://twitter.com/ashishth
Agenda
HDInsight Intro & Recent Updates
Investment Areas:
• Reducing Complexity
• Fast Performance: Interactive Query
• Monitoring
• Security
Open source
analytics service
for the Enterprise
Fully-managed Hadoop and Spark for the cloud. 99.9% SLA
100% Open Source Hortonworks data platform
Clusters up and running in minutes
Familiar BI tools, interactive open source notebooks
Scale clusters on demand
Secure Hadoop workloads via Active Directory and Ranger
Compliance for Open Source bits
Best in class monitoring with Azure Log Analytics
Native Integration with leading ISVs
Recent Updates
More value to our customers
Up to 52% price reduction
Additional 80% price reduction for R Server for Azure HDInsight

GA
Released
Sept ‘17
Interactive Query
blazing fast SQL queries on
hyper-scale data
https://docs.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-interactive-query-get-started
• Fast interactive SQL queries on petabyte-scale data
• Intelligent Caching / leverage local SSDs
• Modern scalable query concurrency
architecture
• Rich connectivity with the most popular
authoring tools
• No data format conversion in order to get
faster results
• Enterprise Grade Security and Monitoring

GA
Released
Sept ‘17
HDInsight Integration
with Azure Log Analytics
Enterprise grade production monitoring for
Hadoop and Spark workloads
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
• Monitor all of the HDInsight clusters and
other Azure resources with a single pane of
glass.
• Extendable workload specific dashboards
along with sophisticated analytical query
language for deep analytics.
• Collect and correlate data from multiple Open
Source services.
• Alerts on critical issues with built-in Log
Analytics alerting infrastructure.
• Troubleshoot issues faster by having Hadoop,
Yarn, Spark, Kafka, HBase, Hive, Storm logs,
and Metrics in one place.
• Perform rich log exploration with interactive
queries

Public Preview
Sept ‘17
VSCode Integration
with HDInsight
First class cross-platform
integration with Spark & Hive
workloads
User Manual:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-for-vscode?branch=pr-en-us-26060
• Interactive responses bring together the best
properties of Python and Spark, with the
flexibility to execute one or multiple
statements.
• Built-in Python language services such as
IntelliSense auto-suggest, auto-complete,
and error markers, among others.
• Preview and export your PySpark interactive
query results to CSV, JSON, and Excel formats.
• Integration with Azure for HDInsight cluster
management and query submissions.
• Link with Spark UI and Yarn UI for further
troubleshooting.

Public Preview
Sept ‘17
Advanced development
tools for Spark
Distributed debugging of
Spark code running across
multiple Spark executors
User Manual:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-intellij-tool-debug-remotely-through-ssh
• Use IntelliJ to run and debug Spark applications
remotely on an HDInsight cluster anytime.
Developers can inspect variables, watch
intermediate data, step through code, and
finally edit the app and resume execution – all
against Azure HDInsight clusters with cluster
data.
• Set a breakpoint for both driver and executor
code. Debugging executor code lets
developers detect data-related errors by
viewing RDD intermediate values, tracking
distributed task operations, and stepping
through execution units.
• Set a breakpoint in Spark external libraries
allowing developers to step into Spark code
and debug in the Spark framework.
• View both driver and executor code
execution logs in the console panel.

GA
Released
Dec ‘17
Apache Kafka
for HDInsight
Enterprise proven Kafka service for the
cloud
99.9% SLAs
Highest level of availability with rack
awareness
Native integration with Azure Managed Disks
means faster data ingestion and lower costs
Build real-time solutions faster
Get a cluster up and running in 4 clicks
Easy data mirroring and setup
Out-of-the box alerting and monitoring
Integration with Apache Spark and Apache
Storm
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-kafka-get-started

Public Preview
Dec ‘17
Enterprise Security
Package for HDInsight
Enterprise grade security for Hadoop and
Spark workloads
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-domain-joined-introduction
• Multi-user authentication using Active
Directory or Azure Active Directory.
• Multi-user Zeppelin notebook with
collaborative data science experience.
• Role based access control for Ambari
operations.
• Fine grained role based access control for
Hive SQL and Spark SQL using Apache
Ranger.
• Data masking of sensitive data using Apache
Ranger.
• Seamless integration with file and folder level
ACLs in Azure Data Lake Store.
• Audit all access to sensitive data as well as
changes to access policies.
• Transparent server side encryption at rest as
well as encryption in transit.
Reducing Complexity
Open Source Big Data is Complex
DataLakeProbe
HBaseHealthProbe
HBaseMetricsProbe
HBaseProbe
HdfsProbe
HdinsightZookeeperProbe
……..
EdgenodeSSHWatchdog
GatewayTCPPingWatchdog
SSHTCPPingWatchdog
RStudioWatchdog
CertRolloverWatchdog
JobSubmissionPingWatchdog
OozieWatchdog
DataNodesUpWatchdog
NodeManagersUpWatchdog
ResourceHealthWatchdog
AzureNodeStatusWatchdog
ClusterMALoggingHashWatchdog
ClusterAvailabilityWatchdog
ClusterHealthWatchdog
……..
namenode_ha_health
ams_metrics_collector_process
ams_metrics_collector_autostart
ams_metrics_collector_hbase_master_process
namenode_last_checkpoint
namenode_webui
increase_nn_heap_usage_daily
hive_metastore_process
ambari_server_stale_alerts
ambari_server_agent_heartbeat
metrics_monitor_process_percent
……….
Microsoft Confidential
Fast Performance with Interactive Query
[Diagram: the traditional pipeline (Ingest, Transform, Convert to ORC/Parquet, Load to Relational Store, Serve) compared over time with Interactive Query (Ingest, Serve)]
o Hive Low Latency Analytical Processing (LLAP)
o Serves queries directly from Azure Blob Storage/ADLS
o Works with TEXT, JSON, CSV, TSV, ORC, Parquet
o Super fast performance with TEXT data
o Modern scalable query concurrency architecture
o Security with Apache Ranger and Active Directory
HDInsight Interactive Query architecture
Memory + SSD cache
Intelligent cache
DRAM
SSD
ADLS/BLOBStore
Automatically reacting to changes in underlying data
o Shared cache between queries
o Cache eviction is based on source file last modified date
o Every query will check modified date, and reload if a new file has
arrived
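The invalidation rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not LLAP's implementation: the `ModifiedDateCache` class, its `read` method, the `demo` helper, and the use of local file modification times in place of Blob/ADLS metadata are all assumptions made for the example.

```python
import os
import tempfile


class ModifiedDateCache:
    """Hypothetical shared cache keyed by path, invalidated by last-modified time."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, data)
        self.loads = 0      # number of reads that went to the underlying store

    def read(self, path):
        # Every read checks the source file's modified date first.
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]          # hit: source file unchanged
        with open(path, "rb") as f:   # miss or stale entry: reload from the store
            data = f.read()
        self._entries[path] = (mtime, data)
        self.loads += 1
        return data


def demo():
    """Read a file twice, replace it, and read it again."""
    cache = ModifiedDateCache()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "part-0000.csv")
        with open(path, "wb") as f:
            f.write(b"a,b\n1,2\n")
        first = cache.read(path)
        second = cache.read(path)  # served from the cache
        # Simulate a new file arriving: rewrite it and bump the modified time.
        with open(path, "wb") as f:
            f.write(b"a,b\n3,4\n")
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        third = cache.read(path)   # reloaded because the modified date changed
        return first, second, third, cache.loads
```

In this sketch two queries that read the same unchanged file share one cached copy, and only the read after the rewrite goes back to storage.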
Updates
• LLAP, Spark, and Presto against 1 TB derived from the TPC-DS benchmark
• Out of the box HDInsight Configuration
• 45 queries derived from the TPC-DS benchmark that ran on all engines successfully
How about large scale?
• We used a number of different concurrency levels to test
concurrency performance
• 99 queries on 1 TB of data on a 32-worker-node cluster with max
concurrency set to 32.
Test 1: Run all 99 queries, 1 at a time - Concurrency = 1
Test 2: Run all 99 queries, 2 at a time - Concurrency = 2
Test 3: Run all 99 queries, 4 at a time - Concurrency = 4
Test 4: Run all 99 queries, 8 at a time - Concurrency = 8
Test 5: Run all 99 queries, 16 at a time - Concurrency = 16
Test 6: Run all 99 queries, 32 at a time - Concurrency = 32
Test 7: Run all 99 queries, 64 at a time - Concurrency = 64
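A test matrix like the one above could be driven by a small harness along these lines. This is a hedged sketch, not the harness used for the published numbers: `run_query` is a placeholder that sleeps instead of submitting SQL, and the thread pool keeps up to N queries in flight as a sliding window, which is one possible reading of "N at a time".

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query_id):
    """Placeholder for submitting one query; a real harness would call the engine."""
    time.sleep(0.001)  # stand-in for query execution time (assumption)
    return query_id


def run_batch(query_ids, concurrency):
    """Run all queries with at most `concurrency` in flight; return (elapsed_s, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_query, query_ids))  # results keep input order
    return time.perf_counter() - start, results


def run_all_tests(num_queries=99, levels=(1, 2, 4, 8, 16, 32, 64)):
    """Repeat the full query set once per concurrency level and record wall time."""
    query_ids = list(range(1, num_queries + 1))
    timings = {}
    for level in levels:
        elapsed, results = run_batch(query_ids, level)
        assert len(results) == num_queries
        timings[level] = elapsed
    return timings
```

Calling `run_all_tests()` returns a mapping from concurrency level to elapsed seconds, which is the shape of data a concurrency chart would be plotted from.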
HDInsight: Log Analytics Integration
OMS Agent for
Linux
HDInsight nodes (Head, Worker ,
Zookeeper )
FluentD
HDInsight
plugin
1. Plugin for ‘in_tail’ for all Logs, allows
regexp to create JSON object
2. Filter for WARN and above for each
Log Type. `grep` filter plugin
3. Output to out_oms_api Type
4. Exec plugin for Metrics
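Steps 1 and 2 above (regexp parsing into a JSON object, then keeping WARN and above) can be imitated in plain Python. This is a sketch of the idea only, not the FluentD plugin configuration: the log4j-style line layout, the field names, and the function names are assumptions for illustration.

```python
import json
import re

# Assumed log layout: "<date> <time>,<ms> <LEVEL> <logger>: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+): (?P<message>.*)$"
)
SEVERITY = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}


def parse_line(line, log_type):
    """Step 1: turn one raw log line into a dict, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["log_type"] = log_type
    return record


def warn_and_above(records):
    """Step 2: keep only records at WARN severity or higher."""
    for r in records:
        if r is not None and SEVERITY.get(r["level"], -1) >= SEVERITY["WARN"]:
            yield r


def to_json_lines(lines, log_type):
    """Parse, filter, and serialize; the output step would ship these JSON objects."""
    records = (parse_line(line, log_type) for line in lines)
    return [json.dumps(r, sort_keys=True) for r in warn_and_above(records)]
```

Filtering before shipping keeps the volume sent to the service down to the entries that usually matter for alerting.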
Per-workload configs: HBase, Spark, Hive/LLAP, Storm, Kafka
osmconfig
Log Analytics(OMS) Service
HDInsight: Enterprise Security Package
o Available for Hadoop, Spark, LLAP in preview
o Enabled with AAD + AAD DS, AD on IaaS set up
o AAD DS is available in ARM VNET and new Azure
portal
o Ranger database can be external to the cluster
Product demand analysis
Delivery and Operations
Customer ID  Name         Cell phone  Email                    Address                 City     State  Zip    Credit card
413707       LUNA PARK    3122049789  luna.park@gmail.com      3250 W FOSTER AVE       CHICAGO  IL     60625  4147202109819679
391234       MARIE        3121069067  marie@outlook.com        4729 N LINCOLN AVE      CHICAGO  IL     60625  5166550002516678
413751       MANU WORKY   8471909522  manu.work@gmail.com      11601 W TOUHY AVE       CHICAGO  IL     60666  5159550002367622
413708       STEVE BENCH  3122049411  steve.bench@outlook.com  325 N LA SALLE ST BLDG  CHICAGO  IL     60654  4149098188760969
…

Customer ID  Reviews                                Rating
413707       SPICY, YET HEALTHY. WOULD ORDER AGAIN  9.3
391234       HATS OFF TO MAINTAIN PROPER            4.6
413751       AMAZING FOOD PREPARED RIGHT            9.4
413708       Decent Food                            7.1
…

Id      Customer ID  Orders placed  Discount  Date      Revenue
102456  68252        277            $526.30   8/1/2016  $2,243.70
102457  413488       282            $84.60    8/1/2016  $2,735.40
102458  250405       134            $281.40   8/1/2016  $1,058.60
102459  114533       141            $253.80   8/1/2016  $1,156.20
102460  315209       289            $346.80   8/1/2016  $2,543.20
…

Id      Customer ID  Time taken  Cost     Date
102456  68252        63          $224.00  8/1/2016
102457  413488       65          $235.00  8/1/2016
102458  250405       67          $245.00  8/1/2016
102459  114533       71          $227.00  8/1/2016
102460  315209       72          $213.00  8/1/2016
…
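Data masking of sensitive columns, listed under the Enterprise Security Package, can be pictured against the sample customer rows above. The sketch below imitates a partial mask that reveals only the last four characters. It is plain Python for illustration, not Apache Ranger's API or policy format, and the function names and the choice of masked columns are hypothetical.

```python
def mask_show_last_4(value, mask_char="x"):
    """Replace every character except the last four with `mask_char`."""
    text = str(value)
    if len(text) <= 4:
        return text
    return mask_char * (len(text) - 4) + text[-4:]


def apply_policy(row, masked_columns):
    """Return a copy of `row` with the listed columns masked (hypothetical policy)."""
    return {
        col: mask_show_last_4(val) if col in masked_columns else val
        for col, val in row.items()
    }


# One row from the sample customer table, as a dict.
row = {
    "Customer ID": "413707",
    "Cell phone": "3122049789",
    "Credit card": "4147202109819679",
}
masked = apply_policy(row, masked_columns={"Cell phone", "Credit card"})
```

Here `masked["Credit card"]` becomes `xxxxxxxxxxxx9679` while `Customer ID` is left untouched, so a team can still join on the key without seeing the full card number.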
Thank You
Ashish Thapliyal
Azure HDInsight
Microsoft Corporation
© 2018 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or
other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information
provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related Content

What's hot

Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using DatadogMukta Aphale
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization Sematext Group, Inc.
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware
 
NGSIv2 Overview for Developers that Already Know NGSIv1
NGSIv2 Overview for Developers that Already Know NGSIv1NGSIv2 Overview for Developers that Already Know NGSIv1
NGSIv2 Overview for Developers that Already Know NGSIv1FIWARE
 
利用 SDACK 架構分析資安事件大數據
利用 SDACK 架構分析資安事件大數據利用 SDACK 架構分析資安事件大數據
利用 SDACK 架構分析資安事件大數據Yu-Lun Chen
 
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...Evention
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logsMathew Beane
 
7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...
7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...
7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...Jürgen Ambrosi
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Couchbase presentation
Couchbase presentationCouchbase presentation
Couchbase presentationsharonyb
 
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...DataStax
 
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li Databricks
 
In-Memory Computing Essentials for Architects and Engineers
In-Memory Computing Essentials for Architects and EngineersIn-Memory Computing Essentials for Architects and Engineers
In-Memory Computing Essentials for Architects and EngineersDenis Magda
 
Open Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesOpen Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesAll Things Open
 
Digital Forensics and Incident Response in The Cloud Part 3
Digital Forensics and Incident Response in The Cloud Part 3Digital Forensics and Incident Response in The Cloud Part 3
Digital Forensics and Incident Response in The Cloud Part 3Velocidex Enterprises
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 

What's hot (20)

Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using Datadog
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
 
Securing Hadoop @eBay
Securing Hadoop @eBaySecuring Hadoop @eBay
Securing Hadoop @eBay
 
NGSIv2 Overview for Developers that Already Know NGSIv1
NGSIv2 Overview for Developers that Already Know NGSIv1NGSIv2 Overview for Developers that Already Know NGSIv1
NGSIv2 Overview for Developers that Already Know NGSIv1
 
利用 SDACK 架構分析資安事件大數據
利用 SDACK 架構分析資安事件大數據利用 SDACK 架構分析資安事件大數據
利用 SDACK 架構分析資安事件大數據
 
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logs
 
7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...
7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...
7° Sessione - L’intelligenza artificiale a supporto della ricerca, servizi di...
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Couchbase presentation
Couchbase presentationCouchbase presentation
Couchbase presentation
 
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
 
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
From Data to Actions and Insights at Conviva with Rui Zhang and Yan Li
 
In-Memory Computing Essentials for Architects and Engineers
In-Memory Computing Essentials for Architects and EngineersIn-Memory Computing Essentials for Architects and Engineers
In-Memory Computing Essentials for Architects and Engineers
 
Open Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesOpen Source Applied - Real World Use Cases
Open Source Applied - Real World Use Cases
 
Digital Forensics and Incident Response in The Cloud Part 3
Digital Forensics and Incident Response in The Cloud Part 3Digital Forensics and Incident Response in The Cloud Part 3
Digital Forensics and Incident Response in The Cloud Part 3
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 

Similar to HDInsight Interactive Query

The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Machine Data 101
Machine Data 101Machine Data 101
Machine Data 101Splunk
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
Machine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into InsightMachine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into InsightSplunk
 
Machine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into InsightMachine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into InsightSplunk
 
Machine Data 101 Workshop
Machine Data 101 Workshop Machine Data 101 Workshop
Machine Data 101 Workshop Splunk
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionDaniel Zivkovic
 
Splunk workshop-Machine Data 101
Splunk workshop-Machine Data 101Splunk workshop-Machine Data 101
Splunk workshop-Machine Data 101Splunk
 
ISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on AzureISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on AzureMicrosoft Tech Community
 
PCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
 
Microsoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOpsMicrosoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOpsTomasz Wisniewski
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Implyconfluent
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 

Similar to HDInsight Interactive Query (20)

The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
Machine Data 101
Machine Data 101Machine Data 101
Machine Data 101
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Machine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into InsightMachine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into Insight
 
Machine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into InsightMachine Data 101: Turning Data Into Insight
Machine Data 101: Turning Data Into Insight
 
Machine Data 101 Workshop
Machine Data 101 Workshop Machine Data 101 Workshop
Machine Data 101 Workshop
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data edition
 
Splunk workshop-Machine Data 101
Splunk workshop-Machine Data 101Splunk workshop-Machine Data 101
Splunk workshop-Machine Data 101
 
ISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on AzureISV Showcase: End-to-end Machine Learning using H2O on Azure
ISV Showcase: End-to-end Machine Learning using H2O on Azure
 
PCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest Software
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 
Microsoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOpsMicrosoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOps
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 

More from Ashish Thapliyal

Introduction and HDInsight best practices
Introduction and HDInsight best practicesIntroduction and HDInsight best practices
Introduction and HDInsight best practicesAshish Thapliyal
 
Building Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaBuilding Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaAshish Thapliyal
 
Five essential new enhancements in azure HDnsight
Five essential new enhancements in azure HDnsightFive essential new enhancements in azure HDnsight
Five essential new enhancements in azure HDnsightAshish Thapliyal
 
HDInsight Security & Compliance
HDInsight Security & ComplianceHDInsight Security & Compliance
HDInsight Security & ComplianceAshish Thapliyal
 
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive QueryInteractive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive QueryAshish Thapliyal
 
HDInsight HBase replication
HDInsight HBase replicationHDInsight HBase replication
HDInsight HBase replicationAshish Thapliyal
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightAshish Thapliyal
 
Tips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsTips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsAshish Thapliyal
 
Monitor Azure HDInsight with Azure Log Analytics
Monitor Azure HDInsight with Azure Log AnalyticsMonitor Azure HDInsight with Azure Log Analytics
Monitor Azure HDInsight with Azure Log AnalyticsAshish Thapliyal
 
HDInsight HBase Performance best practices
HDInsight HBase Performance best practicesHDInsight HBase Performance best practices
HDInsight HBase Performance best practicesAshish Thapliyal
 
Architecting Big Data Applications with HDInsight
Architecting Big Data Applications with HDInsightArchitecting Big Data Applications with HDInsight
Architecting Big Data Applications with HDInsightAshish Thapliyal
 
DIY: TPCDS HDInsight Benchmark
DIY: TPCDS HDInsight BenchmarkDIY: TPCDS HDInsight Benchmark
DIY: TPCDS HDInsight BenchmarkAshish Thapliyal
 

More from Ashish Thapliyal (13)

Introduction and HDInsight best practices
Introduction and HDInsight best practicesIntroduction and HDInsight best practices
Introduction and HDInsight best practices
 
Building Big Data Applications using Spark, Hive, HBase and Kafka

HDInsight Interactive Query

  • 1. Azure HDInsight: Fully Managed, Full Spectrum Open Source Analytics Ashish Thapliyal Principal Product Manager Azure HDInsight Microsoft Corporation https://blogs.msdn.microsoft.com/ashish/ https://twitter.com/ashishth
  • 2. Agenda
      • HDInsight Intro & Recent Updates
      • Investment Areas: Reducing Complexity; Fast Performance: Interactive Query; Monitoring; Security
  • 3. Open source analytics service for the Enterprise
      • Fully-managed Hadoop and Spark for the cloud. 99.9% SLA
      • 100% Open Source Hortonworks data platform
      • Clusters up and running in minutes
      • Familiar BI tools, interactive open source notebooks
      • Scale clusters on demand
      • Secure Hadoop workloads via Active Directory and Ranger
      • Compliance for Open Source bits
      • Best in class monitoring with Azure Log Analytics
      • Native Integration with leading ISVs
  • 5. More value to our customers
      • Up to 52% price reduction
      • Additional 80% price reduction for R Server for Azure HDInsight
  • 6. Interactive Query (GA, released Sept ‘17): blazing fast SQL queries on hyper-scale data
      https://docs.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-interactive-query-get-started
      • Fast interactive SQL queries on petabyte-scale data
      • Intelligent caching / leverage local SSDs
      • Modern scalable query concurrency architecture
      • Rich connectivity with the most popular authoring tools
      • No data format conversion in order to get faster results
      • Enterprise grade security and monitoring
  • 7. HDInsight Integration with Azure Log Analytics (GA, released Sept ‘17): enterprise grade production monitoring for Hadoop and Spark workloads
      https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial
      • Monitor all of the HDInsight clusters and other Azure resources with a single pane of glass.
      • Extendable workload-specific dashboards along with a sophisticated analytical query language for deep analytics.
      • Collect and correlate data from multiple open source services.
      • Alerts on critical issues with the built-in Log Analytics alerting infrastructure.
      • Troubleshoot issues faster by having Hadoop, Yarn, Spark, Kafka, HBase, Hive, and Storm logs and metrics in one place.
      • Perform rich log exploration with interactive queries.
  • 8. VSCode Integration with HDInsight (Public Preview, Sept ‘17): first class cross-platform integration with Spark & Hive workloads
      User Manual: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-for-vscode?branch=pr-en-us-26060
      • Interactive responses bring the best properties of Python and Spark, with the flexibility to execute one or multiple statements.
      • Built-in Python language services such as IntelliSense auto-suggest, auto-complete, and error markers, among others.
      • Preview and export your PySpark interactive query results to CSV, JSON, and Excel formats.
      • Integration with Azure for HDInsight cluster management and query submissions.
      • Link with the Spark UI and Yarn UI for further troubleshooting.
  • 9. Advanced development tools for Spark (Public Preview, Sept ‘17): distributed debugging of Spark code running across multiple Spark executors
      User Manual: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-intellij-tool-debug-remotely-through-ssh
      • Use IntelliJ to run and debug Spark applications remotely on an HDInsight cluster at any time. Developers can inspect variables, watch intermediate data, step through code, and finally edit the app and resume execution, all against Azure HDInsight clusters with cluster data.
      • Set a breakpoint for both driver and executor code. Debugging executor code lets developers detect data-related errors by viewing RDD intermediate values, tracking distributed task operations, and stepping through execution units.
      • Set a breakpoint in Spark external libraries, allowing developers to step into Spark code and debug in the Spark framework.
      • View both driver and executor code execution logs in the console panel.
  • 10. Apache Kafka for HDInsight (GA, released Dec ‘17)
      https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-kafka-get-started
      • Enterprise proven Kafka service for the cloud
      • 99.9% SLAs
      • Highest level of availability with rack awareness
      • Native integration with Azure Managed Disks means faster data ingestion and lower costs
      • Build real-time solutions faster
      • Get a cluster up and running in 4 clicks
      • Easy data mirroring and setup
      • Out-of-the-box alerting and monitoring
      • Integration with Apache Spark and Apache Storm
  • 11. Enterprise Security Package for HDInsight (Public Preview, Dec ‘17): enterprise grade security for Hadoop and Spark workloads
      https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-domain-joined-introduction
      • Multi-user authentication using Active Directory or Azure Active Directory.
      • Multi-user Zeppelin notebook with a collaborative data science experience.
      • Role-based access control for Ambari operations.
      • Fine-grained role-based access control for Hive SQL and Spark SQL using Apache Ranger.
      • Data masking of sensitive data using Apache Ranger.
      • Seamless integration with file and folder level ACLs in Azure Data Lake Store.
      • Audit all access to sensitive data as well as changes to access policies.
      • Transparent server-side encryption at rest as well as encryption in transit.
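As a rough illustration of what masking sensitive data means in practice, the sketch below hides all but the last four characters of a value. It is a plain Python stand-in, not Apache Ranger's implementation or policy syntax; the function names `mask_show_last_4` and `mask_row` and the choice of `x` as the mask character are assumptions made for this example.

```python
def mask_show_last_4(value, mask_char="x"):
    """Replace every character except the last four with mask_char (illustrative only)."""
    text = str(value)
    if len(text) <= 4:
        return text
    return mask_char * (len(text) - 4) + text[-4:]


def mask_row(row, sensitive_columns):
    """Return a copy of a row (dict) with the named columns masked."""
    return {
        col: mask_show_last_4(val) if col in sensitive_columns else val
        for col, val in row.items()
    }


if __name__ == "__main__":
    row = {"customer_id": 413707, "credit_card": "4147202109819679"}
    print(mask_row(row, {"credit_card"}))
```

In a real cluster the masking rule would be defined as a policy in the security layer rather than in application code, so that every query engine sees the same masked view.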
  • 13. Open Source Big Data is Complex
  • 22. Fast Performance with Interactive Query
  • 23. Ingest → Transform → Convert to ORC/Parquet → Load to Relational Store → Serve
  • 24. Ingest → Transform → Convert to ORC/Parquet → Load to Relational Store → Serve (diagram axis: Time)
  • 26.
      o Hive Low Latency Analytical Processing (LLAP)
      o Serves queries directly from Azure BLOB/ADLS
      o Works with TEXT, JSON, CSV, TSV, ORC, Parquet
      o Super fast performance with TEXT data
      o Modern scalable query concurrency architecture
      o Security with Apache Ranger and Active Directory
  • 27. HDInsight Interactive Query architecture Memory + SSD cache
  • 28. Intelligent cache: automatically reacting to changes in underlying data
      o Shared cache between queries
      o Cache eviction is based on source file last modified date
      o Every query will check modified date, and reload if a new file has arrived
      Diagram labels: DRAM, SSD, ADLS/BLOB Store, Updates
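The cache behaviour described on this slide can be sketched as follows. This is a toy, single-process illustration using local files and their modification times, not the LLAP cache itself; the class name `ModifiedDateCache` and the `loads` counter are hypothetical names introduced for the example.

```python
import os
import tempfile


class ModifiedDateCache:
    """Toy shared cache: an entry is reloaded when the source file's mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, contents)
        self.loads = 0      # number of times the source was actually read

    def read(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]          # hit: source unchanged since it was cached
        with open(path, "rb") as f:   # miss or stale entry: reload from source
            data = f.read()
        self.loads += 1
        self._entries[path] = (mtime, data)
        return data


def demo():
    cache = ModifiedDateCache()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-0000.csv")
        with open(path, "wb") as f:
            f.write(b"a,1\n")
        first = cache.read(path)
        second = cache.read(path)     # served from the cache
        with open(path, "wb") as f:
            f.write(b"a,1\nb,2\n")
        st = os.stat(path)
        # Force a visibly newer modified date, independent of filesystem resolution.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        third = cache.read(path)      # modified date changed, so the file is reloaded
    return first, second, third, cache.loads
```

Checking the modified date on every read keeps the cache shared between queries without requiring any explicit invalidation call when new data arrives.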
  • 29.
      • LLAP, Spark, and Presto against 1 TB of data derived from the TPC-DS benchmark
      • Out-of-the-box HDInsight configuration
      • 45 queries derived from the TPC-DS benchmark that ran on all engines successfully
  • 31. How about large scale?
  • 33.
      • We used a number of different concurrency levels to test concurrency performance
      • 99 queries on 1 TB of data on a 32-worker-node cluster, with max concurrency set to 32
      Test 1: Run all 99 queries, 1 at a time - Concurrency = 1
      Test 2: Run all 99 queries, 2 at a time - Concurrency = 2
      Test 3: Run all 99 queries, 4 at a time - Concurrency = 4
      Test 4: Run all 99 queries, 8 at a time - Concurrency = 8
      Test 5: Run all 99 queries, 16 at a time - Concurrency = 16
      Test 6: Run all 99 queries, 32 at a time - Concurrency = 32
      Test 7: Run all 99 queries, 64 at a time - Concurrency = 64
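A test plan of this shape could be driven by a small harness like the one below. It is only a sketch: `run_query` is a hypothetical stand-in that sleeps briefly instead of submitting SQL, and the thread pool merely caps how many queries are in flight at once; it is not the harness used for the results in this deck.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Concurrency levels taken from the seven tests listed on the slide.
CONCURRENCY_LEVELS = [1, 2, 4, 8, 16, 32, 64]


def run_query(query_id):
    """Hypothetical stand-in for submitting one query; sleeps instead of running SQL."""
    time.sleep(0.001)
    return query_id


def run_test(query_ids, concurrency, submit=run_query):
    """Run every query with at most `concurrency` in flight; return (elapsed_s, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(submit, query_ids))
    return time.perf_counter() - start, results


def run_all_tests(num_queries=99):
    """Repeat the full query set once per concurrency level."""
    query_ids = list(range(1, num_queries + 1))
    report = {}
    for level in CONCURRENCY_LEVELS:
        elapsed, results = run_test(query_ids, level)
        report[level] = (elapsed, len(results))
    return report


if __name__ == "__main__":
    for level, (elapsed, count) in run_all_tests().items():
        print(f"concurrency={level:>2} queries={count} elapsed={elapsed:.3f}s")
```

To use it against a real cluster, `submit` would be replaced with a function that sends the query text over whichever client connection is available and waits for the result.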
  • 34.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41. OMS Agent for Linux; HDInsight nodes (Head, Worker, Zookeeper)
      • FluentD HDInsight plugin:
          1. Plugin for ‘in_tail’ for all logs; allows a regexp to create a JSON object
          2. Filter for WARN and above for each log type, using the `grep` filter plugin
          3. Output to the out_oms_api type
          4. Exec plugin for metrics
      • Diagram labels: osmconfig; Config for HBase, Spark, Hive/LLAP, Storm, and Kafka
      • Log Analytics (OMS) Service
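The first two plugin steps (turn each log line into a JSON object with a regular expression, then keep only WARN and above) can be illustrated in plain Python. This is a sketch of the idea, not FluentD configuration: the log4j-style line layout in `LINE_RE`, the severity ordering, and the function names are assumptions, and real service logs vary in format.

```python
import json
import re

# Hypothetical log4j-style layout; real formats differ per service.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<message>.*)$"
)
SEVERITY = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}


def parse_line(line, log_type):
    """Turn one raw log line into a dict, or None if it does not match the layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["log_type"] = log_type
    return record


def warn_and_above(records):
    """Keep only records whose level is WARN or more severe."""
    threshold = SEVERITY["WARN"]
    return [r for r in records if SEVERITY.get(r["level"], -1) >= threshold]


def to_json_lines(lines, log_type):
    """Parse, filter, and serialise log lines as JSON strings ready to ship."""
    parsed = (parse_line(line, log_type) for line in lines)
    kept = warn_and_above(r for r in parsed if r is not None)
    return [json.dumps(r, sort_keys=True) for r in kept]
```

Filtering at the node keeps INFO-level chatter from being shipped, which reduces the volume sent to the central service.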
  • 43.
      o Available for Hadoop, Spark, LLAP in preview
      o Enabled with AAD + AAD DS, or AD on IaaS setup
      o AAD DS is available in ARM VNET and the new Azure portal
      o Ranger database can be external to the cluster
  • 45. Product demand analysis; Delivery and Operations

      Customer ID | Name | Cell phone | Email | Address | City | State | Zip | Credit card
      413707 | LUNA PARK | 3122049789 | luna.park@gmail.com | 3250 W FOSTER AVE | CHICAGO | IL | 60625 | 4147202109819679
      391234 | MARIE | 3121069067 | marie@outlook.com | 4729 N LINCOLN AVE | CHICAGO | IL | 60625 | 5166550002516678
      413751 | MANU WORKY | 8471909522 | manu.work@gmail.com | 11601 W TOUHY AVE | CHICAGO | IL | 60666 | 5159550002367622
      413708 | STEVE BENCH | 3122049411 | steve.bench@outlook.com | 325 N LA SALLE ST BLDG | CHICAGO | IL | 60654 | 4149098188760969
      … | … | … | … | … | … | …

      Customer ID | Reviews | Rating
      413707 | SPICY, YET HEALTHY. WOULD ORDER AGAIN | 9.3
      391234 | HATS OFF TO MAINTAIN PROPER | 4.6
      413751 | AMAZING FOOD PREPARED RIGHT | 9.4
      413708 | Decent Food | 7.1
      … | … | …

      Id | Customer ID | Orders placed | Discount | Date | Revenue
      102456 | 68252 | 277 | $526.30 | 8/1/2016 | $2,243.70
      102457 | 413488 | 282 | $84.60 | 8/1/2016 | $2,735.40
      102458 | 250405 | 134 | $281.40 | 8/1/2016 | $1,058.60
      102459 | 114533 | 141 | $253.80 | 8/1/2016 | $1,156.20
      102460 | 315209 | 289 | $346.80 | 8/1/2016 | $2,543.20
      … | … | … | … | … | …

      Id | Customer ID | Time taken | Cost | Date
      102456 | 68252 | 63 | $224.00 | 8/1/2016
      102457 | 413488 | 65 | $235.00 | 8/1/2016
      102458 | 250405 | 67 | $245.00 | 8/1/2016
      102459 | 114533 | 71 | $227.00 | 8/1/2016
      102460 | 315209 | 72 | $213.00 | 8/1/2016
      … | … | … | … | …
  • 46. Thank You Ashish Thapliyal Azure HDInsight Microsoft Corporation © 2018 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Editor's Notes

  1. For PySpark developers who value the productivity of the Python language, the new VSCode plugin for HDInsight offers first class integration with this popular code editor. Developers can edit their scripts on their laptops and submit PySpark statements to an HDInsight cluster with interactive responses. This interactivity brings the best properties of Python and Spark to developers and makes their work more enjoyable and productive.