Intro to Spark & Zeppelin - Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
1.
Robert Hryniewicz Data Evangelist @RobHryniewicz Hands-on Intro to Spark & Zeppelin Crash Course
2.
The “Big Data” Problem
© Hortonworks Inc. 2011–2016. All Rights Reserved
Problem
• A single machine cannot process or even store all the data!
Solution
• Distribute data over large clusters
Difficulty
• How to split work across machines?
• Moving data over network is expensive
• Must consider data & network locality
• How to deal with failures?
• How to deal with slow nodes?
3.
Spark Background
4.
Access Rates
At least an order of magnitude difference between memory and hard drive / network speed
[Diagram: memory access labeled FAST; hard drive and network access labeled slow]
5.
What is Spark?
• Apache Open Source Project - originally developed at AMPLab (University of California Berkeley)
• Data Processing Engine - focused on in-memory distributed computing use-cases
• API - Scala, Python, Java and R
6.
Spark Ecosystem
[Diagram: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX]
7.
Why Spark?
• Elegant Developer APIs
  – Single environment for data munging and Machine Learning (ML)
• In-memory computation model
  – Fast!
  – Effective for iterative computations and ML
• Machine Learning
  – Implementation of distributed ML algorithms
  – Pipeline API (Spark ML)
8.
History of Hadoop & Spark
9.
Apache Spark Basics
10.
Spark Context
What is it?
• Main entry point for Spark functionality
• Represents a connection to a Spark cluster
• Represented as sc in your code
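In spark-shell and Zeppelin the context is already created and bound to sc. A standalone application has to construct it explicitly. The following is a minimal sketch, assuming Spark 1.x is on the classpath; the application name and the local[2] master are placeholders chosen for illustration, not values from the slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder settings: a hypothetical app name and a local master with 2 threads.
val conf = new SparkConf().setAppName("crash-course-sketch").setMaster("local[2]")
val sc = new SparkContext(conf)

// Use the context to distribute a local collection and compute on it in parallel.
val total = sc.parallelize(1 to 100).reduce(_ + _)
println(total) // 5050

sc.stop() // release the connection to the cluster
```

Skip the two construction lines (and the final stop) when working in spark-shell or Zeppelin, where sc already exists.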
11.
RDD - Resilient Distributed Dataset
• Primary abstraction in Spark
  – An immutable collection of objects (or records, or elements) that can be operated on in parallel
• Distributed
  – Collection of elements partitioned across nodes in a cluster
  – Each RDD is composed of one or more partitions
  – User can control the number of partitions
  – More partitions => more parallelism
• Resilient
  – Recover from node failures
  – An RDD keeps its lineage information -> it can be recreated from parent RDDs
• Created by starting with a file in Hadoop Distributed File System (HDFS) or an existing collection in the driver program
• May be persisted in memory for efficient reuse across parallel operations (caching)
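The properties listed above (partitioning, lineage, caching) can be observed directly on a small RDD. This is a sketch only, assuming it runs in spark-shell or Zeppelin where sc is predefined; the sample words and the partition count of 4 are arbitrary illustration values.

```scala
// Assumes `sc` is the SparkContext predefined by spark-shell or Zeppelin.
val words = sc.parallelize(Seq("spark", "zeppelin", "hadoop", "spark"), 4)
println(words.partitions.length) // 4, the partition count requested above

// Transformations are lazy; each one records its parent RDD (the lineage).
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
counts.cache() // mark for in-memory reuse across later actions

println(counts.toDebugString)       // prints the lineage back to the source collection
val result = counts.collect().toMap // action: triggers the computation
println(result("spark"))            // 2
```

An RDD backed by a file would start from sc.textFile(path) instead of sc.parallelize; the rest of the pipeline stays the same.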
12.
RDD – Resilient Distributed Dataset
[Diagram: RDD 1 (Partitions 1–4) and RDD 2 (Partitions 1–3) spread across cluster nodes]
13.
Spark SQL
14.
Spark SQL Overview
• Spark module for structured data processing (e.g. DB tables, JSON files)
• Three ways to manipulate data:
  – DataFrames API
  – SQL queries
  – Datasets API
• Same execution engine for all three
• Spark SQL interfaces provide more information about both structure and computation being performed than basic Spark RDD API
15.
DataFrames
• Conceptually equivalent to a table in relational DB or data frame in R/Python
• API available in Scala, Java, Python, and R
• Richer optimizations (significantly faster than RDDs)
• Distributed collection of data organized into named columns
• Underneath is an RDD
16.
DataFrames
Created from Various Sources
[Diagram: CSV, Avro, HIVE, Spark SQL and Text sources feeding a DataFrame (with RDD underneath) made of rows and columns Col1, Col2, …, ColN]
• DataFrames from HIVE:
  – Reading and writing HIVE tables, including ORC
• DataFrames from files:
  – Built-in: JSON, JDBC, ORC, Parquet, HDFS
  – External plug-in: CSV, HBASE, Avro
• DataFrames from existing RDDs
  – with toDF() function
Data is described as a DataFrame with rows, columns and a schema
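As an illustration of the last source listed above, an existing RDD can be turned into a DataFrame with toDF(). This is a sketch under stated assumptions: it expects spark-shell or Zeppelin on Spark 1.x, where sc and sqlContext are predefined, and the Flight case class and its rows are hypothetical sample data modeled on the flight columns used elsewhere in this deck.

```scala
// Assumes spark-shell or Zeppelin on Spark 1.x, where `sc` and `sqlContext` are predefined.
import sqlContext.implicits._ // brings toDF() into scope

// Hypothetical record type; field names become the DataFrame column names.
case class Flight(Origin: String, Dest: String, DepDelay: Int)

val flightsRdd = sc.parallelize(Seq(
  Flight("IAD", "TPA", 8),
  Flight("IND", "BWI", -4),
  Flight("IND", "BWI", 34)))

val flightsDf = flightsRdd.toDF() // schema is inferred from the case class
flightsDf.printSchema()
flightsDf.show()
```

File-based sources go through sqlContext.read instead, for example sqlContext.read.json(path) or sqlContext.read.parquet(path).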
17.
SQL Context and Hive Context

SQLContext
• Entry point into all functionality in Spark SQL
• All you need is SparkContext

    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)

HiveContext
• Superset of functionality provided by basic SQLContext
  – Read data from Hive tables
  – Access to Hive Functions → UDFs
• Use when your data resides in Hive

    import org.apache.spark.sql.hive.HiveContext
    val hc = new HiveContext(sc)
18.
Spark SQL Examples
19.
DataFrame Example
Reading Data From Table

    val df = sqlContext.table("flightsTbl")
    df.select("Origin", "Dest", "DepDelay").show(5)

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|       8|
    |   IAD| TPA|      19|
    |   IND| BWI|       8|
    |   IND| BWI|      -4|
    |   IND| BWI|      34|
    +------+----+--------+
20.
DataFrame Example
Using DataFrame API to Filter Data (show delays more than 15 min)

    // The $"col" syntax needs sqlContext.implicits._ (already imported in spark-shell)
    df.select("Origin", "Dest", "DepDelay").filter($"DepDelay" > 15).show(5)

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|      19|
    |   IND| BWI|      34|
    |   IND| JAX|      25|
    |   IND| LAS|      67|
    |   IND| MCO|      94|
    +------+----+--------+
21.
SQL Example
Using SQL to Query and Filter Data (again, show delays more than 15 min)

    // Register Temporary Table
    df.registerTempTable("flights")

    // Use SQL to Query Dataset
    sqlContext.sql("SELECT Origin, Dest, DepDelay FROM flights WHERE DepDelay > 15 LIMIT 5").show

    +------+----+--------+
    |Origin|Dest|DepDelay|
    +------+----+--------+
    |   IAD| TPA|      19|
    |   IND| BWI|      34|
    |   IND| JAX|      25|
    |   IND| LAS|      67|
    |   IND| MCO|      94|
    +------+----+--------+
22.
RDD vs. DataFrame
23.
RDDs vs. DataFrames
RDD
• Lower-level API (more control)
• Lots of existing code & users
• Compile-time type-safety
DataFrame
• Higher-level API (faster development)
• Faster sorting, hashing, and serialization
• More opportunities for automatic optimization
• Lower memory pressure
24.
Data Frames are Intuitive
Find average age by department?

    dept  name      age
    Bio   H Smith   48
    CS    A Turing  54
    Bio   B Jones   43
    Phys  E Witten  61

[Code panels: RDD Example; Equivalent Data Frame Example]
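The code panels of this slide did not survive in the transcript. The sketch below is not the author's original code; it is one plausible way to answer the same question both ways, assuming spark-shell or Zeppelin on Spark 1.x with sc and sqlContext predefined. The variable names are illustrative.

```scala
// Assumes spark-shell or Zeppelin on Spark 1.x with `sc` and `sqlContext` predefined.
import sqlContext.implicits._
import org.apache.spark.sql.functions.avg

val people = Seq(
  ("Bio", "H Smith", 48), ("CS", "A Turing", 54),
  ("Bio", "B Jones", 43), ("Phys", "E Witten", 61))

// RDD version: carry a (sum, count) pair per department, then divide.
val avgByDept = sc.parallelize(people)
  .map { case (dept, _, age) => (dept, (age.toDouble, 1)) }
  .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
  .mapValues { case (sum, count) => sum / count }
  .collect().toMap
println(avgByDept) // Bio -> 45.5, CS -> 54.0, Phys -> 61.0 (entry order may vary)

// DataFrame version: state the intent and let Spark SQL plan the aggregation.
val peopleDf = sc.parallelize(people).toDF("dept", "name", "age")
peopleDf.groupBy("dept").agg(avg("age")).show()
```

The RDD version has to spell out how the average is computed, while the DataFrame version only states what is wanted, which is the contrast the slide title points at.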
25.
Spark SQL Optimizations
• Spark SQL uses an underlying optimization engine (Catalyst)
  – Catalyst can perform intelligent optimization since it understands the schema
• Spark SQL does not materialize all the columns (as with RDDs), only what’s needed
26.
Catalyst: Spark SQL optimizer
• Query or data frame operations modeled as a tree
• Logical plan created and optimized
• Various physical plans created; best plan chosen
• Code generation and execution
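The stages listed above can be inspected for any DataFrame with explain. A minimal sketch, assuming spark-shell or Zeppelin on Spark 1.x with sc and sqlContext predefined; the two sample rows are hypothetical.

```scala
// Assumes spark-shell or Zeppelin on Spark 1.x with `sc` and `sqlContext` predefined.
import sqlContext.implicits._

// Hypothetical two-row sample with the flight columns used in this deck.
val flights = sc.parallelize(Seq(("IAD", "TPA", 8), ("IND", "BWI", 34)))
  .toDF("Origin", "Dest", "DepDelay")

val delayed = flights.filter($"DepDelay" > 15).select("Origin", "Dest")

// extended = true prints the parsed, analyzed and optimized logical plans
// followed by the physical plan that was chosen.
delayed.explain(true)
```

Comparing the analyzed and optimized plans in the output is a quick way to see which rewrites Catalyst applied to a query.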
27.
Spark Streaming
28.
Spark Streaming
Overview
• Extension of Spark Core API
• Stream processing of live data streams
  – Scalable
  – High-throughput
  – Fault-tolerant
29.
Spark Streaming
30.
Spark Streaming
Window Operations
• Apply transformations over a sliding window of data, e.g. rolling average
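A rolling average of the kind mentioned above might look like the following sketch. It assumes sc from spark-shell or Zeppelin and a text source that emits one number per line; the host, port, batch interval and window sizes are all hypothetical values picked for illustration.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Helpers: merge two (sum, count) pairs, then turn a pair into a mean.
val combine: ((Double, Long), (Double, Long)) => (Double, Long) =
  (a, b) => (a._1 + b._1, a._2 + b._2)
val mean: ((Double, Long)) => Double = p => p._1 / p._2

// Assumes `sc` from spark-shell or Zeppelin; host and port are hypothetical.
val ssc = new StreamingContext(sc, Seconds(5)) // 5-second micro-batches
val readings = ssc.socketTextStream("localhost", 9999).map(_.toDouble)

// 30-second window recomputed every 10 seconds; both must be
// multiples of the batch interval.
val rollingAvg = readings.map(x => (x, 1L))
  .reduceByWindow(combine, Seconds(30), Seconds(10))
  .map(mean)
rollingAvg.print()

ssc.start()
ssc.awaitTerminationOrTimeout(60000) // run for about a minute
ssc.stop(stopSparkContext = false)   // keep `sc` usable afterwards
```

Reducing (sum, count) pairs rather than averages keeps the window result correct when batches contain different numbers of readings. Lines that are not numeric would fail in toDouble, so a real job would validate input first.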
31.
Apache Zeppelin & HDP Sandbox
32.
Apache Zeppelin – A Modern Web-based Data Science Studio
• Data exploration and discovery
• Visualization
• Deeply integrated with Spark and Hadoop
• Pluggable interpreters
• Multiple languages in one notebook: R, Python, Scala
36.
What’s not included with Spark?
[Diagram labels: Resource Management; Storage; Applications; Spark Core Engine; Scala, Java, Python libraries; MLlib (Machine learning); Spark SQL*; Spark Streaming*]
37.
HDP Sandbox
What’s included in the Sandbox?
• Zeppelin
• Latest Hortonworks Data Platform (HDP)
  – Spark
  – YARN → Resource Management
  – HDFS → Distributed Storage Layer
  – And many more components...
[Diagram: Scala, Java, Python and R APIs over Spark Core Engine, Spark SQL, Spark Streaming, MLlib and GraphX, running on YARN over HDFS nodes 1…N]
38.
Access patterns enabled by YARN
[Diagram: Batch, Interactive and Real-Time applications on YARN: Data Operating System, over HDFS (Hadoop Distributed File System) nodes 1…N]
• Batch: Needs to happen, but no timeframe limitations
• Interactive: Needs to happen at Human time
• Real-Time: Needs to happen at Machine Execution time
39.
Why Spark on YARN?
• Utilize existing HDP cluster infrastructure
• Resource management – share Spark workloads with other workloads like PIG, HIVE, etc.
• Scheduling and queues
[Diagram: Spark Driver (client) and Spark Application Master in a YARN container; three Spark Executors, each in its own YARN container running two Tasks]
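The layout in the diagram (three executors, two tasks each, submitted to a YARN queue) could be requested with settings along these lines. This is a configuration sketch for Spark 1.x only; the application name, queue name and sizes are hypothetical, and it assumes the Hadoop client configuration is visible to Spark (for example through HADOOP_CONF_DIR).

```scala
import org.apache.spark.SparkConf

// Illustrative values only; the queue name and sizes are hypothetical.
val yarnConf = new SparkConf()
  .setAppName("crash-course-on-yarn")
  .setMaster("yarn-client")             // Spark 1.x: driver in the client, executors in YARN containers
  .set("spark.yarn.queue", "default")   // YARN scheduler queue to submit to
  .set("spark.executor.instances", "3") // number of executor containers
  .set("spark.executor.cores", "2")     // concurrent tasks per executor
  .set("spark.executor.memory", "2g")   // heap size per executor
```

From Spark 2.0 on, the master is written as yarn with a separate deploy mode, so the setMaster value above applies to the 1.x line this deck targets.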
40.
Why HDFS?
Fault Tolerant Distributed Storage
• Divide files into big blocks and distribute 3 copies randomly across the cluster
• Processing Data Locality
• Not Just storage but computation
[Diagram: a logical file split into blocks 1–4, with three copies of each block spread across the cluster]
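The block-and-replica idea can be mimicked in a few lines of plain Scala. This is a toy model, not HDFS code: the node names and file size are hypothetical, and real HDFS placement is rack-aware rather than purely random. The 128 MB block size and replication factor of 3 are the Hadoop 2.x defaults.

```scala
import scala.util.Random

// Toy model only: real HDFS placement is rack-aware, not purely random.
val blockSizeMb = 128                    // default HDFS block size in Hadoop 2.x
val replication = 3                      // default replication factor
val nodes = (1 to 8).map(i => s"node$i") // hypothetical cluster
val fileSizeMb = 500                     // hypothetical file

// A 500 MB file needs ceil(500 / 128) = 4 blocks, as in the diagram.
val numBlocks = math.ceil(fileSizeMb.toDouble / blockSizeMb).toInt

// Place each block's replicas on three distinct nodes.
val rng = new Random(42)
val placement = (1 to numBlocks).map { block =>
  block -> rng.shuffle(nodes).take(replication)
}
placement.foreach { case (block, ns) =>
  println(s"block $block -> ${ns.mkString(", ")}")
}
```

Because every block lives on three different nodes, losing one node leaves two copies of each affected block, and a scheduler can run tasks on whichever node already holds the data.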
41.
There’s more to HDP
Hortonworks Data Platform 2.4.x
• GOVERNANCE & INTEGRATION
  – Data Lifecycle & Governance: Falcon, Atlas
  – Data Workflow: Sqoop, Flume, Kafka, NFS, WebHDFS
• DATA ACCESS
  – Batch: MapReduce
  – Script: Pig
  – Search: Solr
  – SQL: Hive
  – NoSQL: HBase, Accumulo, Phoenix
  – Stream: Storm
  – In-memory
  – Others: ISV Engines
  – Tez, Slider
• DATA MANAGEMENT
  – YARN: Data Operating System
  – HDFS: Hadoop Distributed File System (nodes 1…N)
• SECURITY
  – Administration, Authentication, Authorization, Auditing, Data Protection
  – Ranger, Knox, Atlas, HDFS Encryption
• OPERATIONS
  – Provisioning, Managing, & Monitoring: Ambari, Cloudbreak, Zookeeper
  – Scheduling: Oozie
• Deployment Choice: Linux, Windows, On-Premise, Cloud
42.
HDP 2.5 TP
45.
View User Sessions
46.
Hortonworks Community Connection
47.
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
48.
Community Engagement
Participate now at: community.hortonworks.com
• 7,500+ Registered Users
• 15,000+ Answers
• 20,000+ Technical Assets
One Website!
49.
Lab Preview
50.
Link to Tutorial with Lab Instructions
http://tinyurl.com/hwx-intro-to-spark
51.
Robert Hryniewicz @RobHryniewicz Thanks!