SlideShare a Scribd company logo
Spark on HDInsight
Seattle Spark Meetup on March 9, 2016
Presenters: Judy Nash & Lin Chan
About Us
 Azure HDInsight Service
 Azure’s answer to big data with open source tech
 deploy and manage clusters hosting Hadoop, HBase, Storm, and now Spark
 Our Goal – Make Spark easy to use on Azure
 How Do We Make It Happen
 Deploy new spark clusters via SDK and Portal
 Pre-configure and tune cluster for optimal experience
 Adopt open source technologies to enhance spark workload
 Contribute back to open source
About the Talk
 How to Build an Enterprise-ready Spark System
 Deep Dive of HDInsight’s Spark Cluster
 Cluster Architecture
 Resource Manager
 End-to-end Workflows
 Business Intelligence
 Remote Job Submission
Spark Cluster Architecture
Why Yarn?
 Standalone
 Better UI
 Less memory overhead
 Faster application launch time
 YARN
 Better community support
 More powerful resource management
 Share resources with other job workflows
 More user friendly to users who knew Hadoop on yarn already
Business Intelligence Workflow
Addressing Multi-tenancy
 Fair Scheduler
 Allow sharing resources between queries within thrift server
 Important for BI customers who share a cluster. Avoid bad query taking over a
cluster.
 To Use, set default queue type as “fair” scheduling
 Dynamic Allocation
 Allow sharing resources between thrift and other applications
 Leave minimum footprints for customers who do not use thrift, but able to expand
to maximum resource allowed when customers execute expensive queries
What is Livy?
 REST Server allowing remote job submission
 2 modes currently: batch & interactive
 Open source project
 Co-development with Cloudera
Batch Job Submission
Sample Call
 Submit a batch job
curl -k --user "admin:mypassword1!" -v -H 'Content-Type: application/json' -X
POST -d '{
"file":"wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/Spar
kSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }'
"https://mysparkcluster.azurehdinsight.net/livy/batches"
 Check the job status
curl -k --user "admin:mypassword1!" -v -X GET
"https://mysparkcluster.azurehdinsight.net/livy/batches/{batchId}"
Interactive session
Sample Call
 Start a Scala interactive session
curl -k --user "admin:mypassword1!" -v -H 'Content-Type: application/json' -X POST -d '{
"kind":"spark" }' "https://mysparkcluster.azurehdinsight.net/livy/sessions"
 Post a statement
curl -k --user "admin:mypassword1!" -v -H 'Content-Type: application/json' -X POST -d
'{"code":"1+1" }'
"https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"
 Check the statement result
curl -k --user "admin:mypassword1!" -v -X GET
"https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"
 Terminate a session
curl -k --user "admin:mypassword1!" -v -X DELETE
"https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}”
Integration with
Jupyter
Livy vs Job Server
 Had Job Server initially
 Job server is not easy to use for simple jar submission or notebook case
 Job server is good for embedding Spark work within a bigger app
 Client mode is coming to Livy soon
 Partner with Cloudera is important
More on Livy
 HDI online documentation: https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-livy-rest-interface
 Livy Repo: https://github.com/cloudera/livy
More on HDInsight
 HDInsight Blog
 https://blogs.msdn.microsoft.com/azuredatalake/
 Contact Us
 Lin Chan https://www.linkedin.com/in/linchanms
 Judy Nash https://www.linkedin.com/in/judynash

More Related Content

What's hot

Self-Service Provisioning and Hadoop Management with Apache Ambari
Self-Service Provisioning and  Hadoop Management with Apache AmbariSelf-Service Provisioning and  Hadoop Management with Apache Ambari
Self-Service Provisioning and Hadoop Management with Apache Ambari
DataWorks Summit
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
Bigstep
 
Red Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureRed Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft Azure
John Archer
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
DataWorks Summit
 
azure synapse analytics end-to-end solution-hands-on at 20200728
azure synapse analytics end-to-end solution-hands-on at 20200728azure synapse analytics end-to-end solution-hands-on at 20200728
azure synapse analytics end-to-end solution-hands-on at 20200728
Daichi Isami
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
DataWorks Summit
 
McGraw-Hill Optimizes Analytics Workloads with Databricks
 McGraw-Hill Optimizes Analytics Workloads with Databricks McGraw-Hill Optimizes Analytics Workloads with Databricks
McGraw-Hill Optimizes Analytics Workloads with Databricks
Amazon Web Services
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
DataWorks Summit/Hadoop Summit
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
DataWorks Summit
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataStylight
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
John Archer
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
Mark Tabladillo
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
DataStax Academy
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Mesosphere Inc.
 
Ignite Your Big Data With a Spark!
Ignite Your Big Data With a Spark!Ignite Your Big Data With a Spark!
Ignite Your Big Data With a Spark!
Progress
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
Matt Ingenthron
 
The next-phase-of-distributed-systems-with-apache-ignite
The next-phase-of-distributed-systems-with-apache-igniteThe next-phase-of-distributed-systems-with-apache-ignite
The next-phase-of-distributed-systems-with-apache-ignite
Dani Traphagen
 
When the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStackWhen the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStack
John Burwell
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec Sheet
BlueData, Inc.
 
Building Enterprise Clouds - Key Considerations and Strategies - RED HAT
Building Enterprise Clouds - Key Considerations and Strategies - RED HATBuilding Enterprise Clouds - Key Considerations and Strategies - RED HAT
Building Enterprise Clouds - Key Considerations and Strategies - RED HAT
Fadi Semaan
 

What's hot (20)

Self-Service Provisioning and Hadoop Management with Apache Ambari
Self-Service Provisioning and  Hadoop Management with Apache AmbariSelf-Service Provisioning and  Hadoop Management with Apache Ambari
Self-Service Provisioning and Hadoop Management with Apache Ambari
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Red Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureRed Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft Azure
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
azure synapse analytics end-to-end solution-hands-on at 20200728
azure synapse analytics end-to-end solution-hands-on at 20200728azure synapse analytics end-to-end solution-hands-on at 20200728
azure synapse analytics end-to-end solution-hands-on at 20200728
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
McGraw-Hill Optimizes Analytics Workloads with Databricks
 McGraw-Hill Optimizes Analytics Workloads with Databricks McGraw-Hill Optimizes Analytics Workloads with Databricks
McGraw-Hill Optimizes Analytics Workloads with Databricks
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
 
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OSManage Microservices & Fast Data Systems on One Platform w/ DC/OS
Manage Microservices & Fast Data Systems on One Platform w/ DC/OS
 
Ignite Your Big Data With a Spark!
Ignite Your Big Data With a Spark!Ignite Your Big Data With a Spark!
Ignite Your Big Data With a Spark!
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
 
The next-phase-of-distributed-systems-with-apache-ignite
The next-phase-of-distributed-systems-with-apache-igniteThe next-phase-of-distributed-systems-with-apache-ignite
The next-phase-of-distributed-systems-with-apache-ignite
 
When the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStackWhen the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStack
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec Sheet
 
Building Enterprise Clouds - Key Considerations and Strategies - RED HAT
Building Enterprise Clouds - Key Considerations and Strategies - RED HATBuilding Enterprise Clouds - Key Considerations and Strategies - RED HAT
Building Enterprise Clouds - Key Considerations and Strategies - RED HAT
 

Viewers also liked

Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Аліна Шепшелей
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
Jim O'Neil
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insights
Nicolas Takashi
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
hadooparchbook
 
Azure IOT
Azure IOTAzure IOT
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Mike Martin
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Paco Nathan
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
Sascha Dittmann
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14
SwitchPitch
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on Azure
Willem Meints
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
Koray Kocabas
 
Going serverless
Going serverlessGoing serverless
Going serverless
TechExeter
 
2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure
Steve Lee
 
Software scope
Software scopeSoftware scope
Software scope
Shubham Dubey
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
Open up to a better learning ecosystem
Open up to a better learning ecosystemOpen up to a better learning ecosystem
Open up to a better learning ecosystem
Katie Bradford
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudAzure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Toradex
 
Azure functions
Azure functionsAzure functions
Azure functions
vivek p s
 
Going serverless
Going serverlessGoing serverless
Going serverless
Jeremy Green
 

Viewers also liked (20)

Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insights
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 
Azure IOT
Azure IOTAzure IOT
Azure IOT
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on Azure
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
Going serverless
Going serverlessGoing serverless
Going serverless
 
2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure
 
Software scope
Software scopeSoftware scope
Software scope
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Open up to a better learning ecosystem
Open up to a better learning ecosystemOpen up to a better learning ecosystem
Open up to a better learning ecosystem
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudAzure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
 
Azure functions
Azure functionsAzure functions
Azure functions
 
Going serverless
Going serverlessGoing serverless
Going serverless
 

Similar to Spark on Azure HDInsight - spark meetup seattle

Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
Jenn Rawlins
 
Drupal In The Cloud
Drupal In The CloudDrupal In The Cloud
Drupal In The Cloud
Bret Piatt
 
Openstack workshop @ Kalasalingam
Openstack workshop @ KalasalingamOpenstack workshop @ Kalasalingam
Openstack workshop @ Kalasalingam
Beny Raja
 
Workshop - Openstack, Cloud Computing, Virtualization
Workshop - Openstack, Cloud Computing, VirtualizationWorkshop - Openstack, Cloud Computing, Virtualization
Workshop - Openstack, Cloud Computing, Virtualization
Jayaprakash R
 
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaSOpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS
Daniel Krook
 
Just one-shade-of-openstack
Just one-shade-of-openstackJust one-shade-of-openstack
Just one-shade-of-openstack
Roberto Polli
 
AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...
AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...
AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...
Amazon Web Services
 
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdfDIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
confluent
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Lucidworks
 
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingOSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
Mark Hinkle
 
Building Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyBuilding Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technology
Domino Data Lab
 
Cloud Expo East 2013: Essential Open Source Software for Building the Open Cloud
Cloud Expo East 2013: Essential Open Source Software for Building the Open CloudCloud Expo East 2013: Essential Open Source Software for Building the Open Cloud
Cloud Expo East 2013: Essential Open Source Software for Building the Open Cloud
Mark Hinkle
 
OpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio Tavilla
OpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio TavillaOpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio Tavilla
OpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio Tavilla
Lorenzo Carnevale
 
Cisco Cloud Computing and Open Stack: Velocity 2011
Cisco Cloud Computing and Open Stack: Velocity 2011Cisco Cloud Computing and Open Stack: Velocity 2011
Cisco Cloud Computing and Open Stack: Velocity 2011
Cisco Service Provider
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
Cisco DevNet
 
Cloud Computing using OpenStack
Cloud Computing using OpenStackCloud Computing using OpenStack
Cloud Computing using OpenStack
Jobayer Almahmud Hossain (RHCA, RHCDS, RHCSS)
 
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
Amazon Web Services Korea
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
PivotalOpenSourceHub
 
[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...
[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...
[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...
DevDay.org
 

Similar to Spark on Azure HDInsight - spark meetup seattle (20)

Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 
Drupal In The Cloud
Drupal In The CloudDrupal In The Cloud
Drupal In The Cloud
 
Openstack workshop @ Kalasalingam
Openstack workshop @ KalasalingamOpenstack workshop @ Kalasalingam
Openstack workshop @ Kalasalingam
 
Workshop - Openstack, Cloud Computing, Virtualization
Workshop - Openstack, Cloud Computing, VirtualizationWorkshop - Openstack, Cloud Computing, Virtualization
Workshop - Openstack, Cloud Computing, Virtualization
 
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaSOpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS
OpenStack and Cloud Foundry - Pair the leading open source IaaS and PaaS
 
Just one-shade-of-openstack
Just one-shade-of-openstackJust one-shade-of-openstack
Just one-shade-of-openstack
 
AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...
AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...
AWS re:Invent 2016: Open Source at AWS—Contributions, Support, and Engagement...
 
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdfDIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingOSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
 
Building Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyBuilding Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technology
 
963
963963
963
 
Cloud Expo East 2013: Essential Open Source Software for Building the Open Cloud
Cloud Expo East 2013: Essential Open Source Software for Building the Open CloudCloud Expo East 2013: Essential Open Source Software for Building the Open Cloud
Cloud Expo East 2013: Essential Open Source Software for Building the Open Cloud
 
OpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio Tavilla
OpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio TavillaOpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio Tavilla
OpenStack Identity - Keystone (liberty) by Lorenzo Carnevale and Silvio Tavilla
 
Cisco Cloud Computing and Open Stack: Velocity 2011
Cisco Cloud Computing and Open Stack: Velocity 2011Cisco Cloud Computing and Open Stack: Velocity 2011
Cisco Cloud Computing and Open Stack: Velocity 2011
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
Cloud Computing using OpenStack
Cloud Computing using OpenStackCloud Computing using OpenStack
Cloud Computing using OpenStack
 
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
 
[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...
[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...
[DevDay 2016] OpenStack and approaches for new users - Speaker: Chi Le – Head...
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 

Spark on Azure HDInsight - spark meetup seattle

  • 1. Spark on HDInsight Seattle Spark Meetup on March 9, 2016 Presenters: Judy Nash & Lin Chan
  • 2. About Us  Azure HDInsight Service  Azure’s answer to big data with open source tech  deploy and manage clusters hosting Hadoop, HBase, Storm, and now Spark  Our Goal – Make Spark easy to use on Azure  How Do We Make It Happen  Deploy new spark clusters via SDK and Portal  Pre-configure and tune cluster for optimal experience  Adopt open source technologies to enhance spark workload  Contribute back to open source
  • 3. About the Talk  How to Build an Enterprise-ready Spark System  Deep Dive of HDInsight’s Spark Cluster  Cluster Architecture  Resource Manager  End-to-end Workflows  Business Intelligence  Remote Job Submission
  • 5. Why Yarn?  Standalone  Better UI  Less memory overhead  Faster application launch time  YARN  Better community support  More powerful resource management  Share resources with other job workflows  More user friendly to users who knew Hadoop on yarn already
  • 7. Addressing Multi-tenancy  Fair Scheduler  Allow sharing resources between queries within thrift server  Important for BI customers who share a cluster. Avoid bad query taking over a cluster.  To Use, set default queue type as “fair” scheduling  Dynamic Allocation  Allow sharing resources between thrift and other applications  Leave minimum footprints for customers who do not use thrift, but able to expand to maximum resource allowed when customers execute expensive queries
  • 8. What is Livy?  REST Server allowing remote job submission  2 modes currently: batch & interactive  Open source project  Co-development with Cloudera
  • 10. Sample Call  Submit a batch job curl -k --user "admin:mypassword1!" -v -H 'Content-Type: application/json' -X POST -d '{ "file":"wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/Spar kSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' "https://mysparkcluster.azurehdinsight.net/livy/batches"  Check the job status curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches/{batchId}"
  • 12. Sample Call  Start a Scala interactive session curl -k --user "admin:mypassword1!" -v -H 'Content-Type: application/json' -X POST -d '{ "kind":"spark" }' "https://mysparkcluster.azurehdinsight.net/livy/sessions"  Post a statement curl -k --user "admin:mypassword1!" -v -H 'Content-Type: application/json' -X POST -d '{"code":"1+1" }' "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"  Check the statement result curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}/statements"  Terminate a session curl -k --user "admin:mypassword1!" -v -X DELETE "https://mysparkcluster.azurehdinsight.net/livy/sessions/{sessionId}”
  • 14. Livy vs Job Server  Had Job Server initially  Job server is not easy to use for simple jar submission or notebook case  Job server is good for embedding Spark work within a bigger app  Client mode is coming to Livy soon  Partner with Cloudera is important
  • 15. More on Livy  HDI online documentation: https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark-livy-rest-interface  Livy Repo: https://github.com/cloudera/livy
  • 16. More on HDInsight  HDInsight Blog  https://blogs.msdn.microsoft.com/azuredatalake/  Contact Us  Lin Chan https://www.linkedin.com/in/linchanms  Judy Nash https://www.linkedin.com/in/judynash

Editor's Notes

  1. HDInsight – an Azure service dedicated to hosting big data solutions from open source communities. Azure service dedicated to deploy and manage clusters hosting big data solutions from open source
  2. Key concepts * What does the node types do * Introduce cluster daemons * Mentions HA, monitoring, telemetry – future spark talk topics 
  3. Talk Points What is business intelligence? Who are the customers? What is thrift? An open source protocol that handles data transfers between client and services. Similar to SOAP in functionality. Spark Thrift server -> at launch time creates a spark SQL application session -> sends queries to Spark SQL for processing