SlideShare a Scribd company logo
1 of 6
Download to read offline
Getting Started with Apache
Spark and Scala
Dinko Srkoč
Instantor Technology Services
About Apache Spark
(inevitable but hopefully quick intro)
● Started at UC Berkeley in 2009
● General purpose cluster computing system
● Fast: 10x on disk, 100x in memory vs Hadoop MapReduce
● Runs locally, in the cloud, on Hadoop, Mesos
● High level APIs in:
○ Scala
○ Python
○ Java
○ R
About Apache Spark
The Stack:
● SQL - SQL and semi/structured data processing
● MLLib - machine learning algorithms
● GraphX - graph processing
● Streaming - stream processing of
live data streams
Data collections in Spark
Collections: immutable, distributed, partitioned across nodes, operated in parallel
● Resilient Distributed Dataset (RDD)
○ Basic abstraction
○ Low-level API
○ Suitable for unstructured data (media, streams of text)
● Dataset/DataFrame
○ Dataset[T] - typed API, DataFrame (a.k.a. DataSet[Row]) - untyped API
○ High-level expressions: filters/maps, aggregations, averages, SQL queries, columnar access
○ optimizations
Demo
The Menu:
● Starter - spark shell
○ Loading from different sources
○ The inevitable word count example
● Intermediate - spark notebook
○ Documentation, data visualization
● Main course - back to shell
○ streaming
○ Spark UI
● Dessert - mini project:
○ SBT
○ Deploying to Google Cloud Dataproc
Thank you!
Questions?

More Related Content

What's hot

FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetupTheodoros Vasiloudis
 
TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...
TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...
TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...Databricks
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Spark Summit
 
MLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reducedMLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reducedChao Chen
 
Apache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionDatabricks
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...Spark Summit
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceDatabricks
 
Apache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenchesApache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenchesVinay Shukla
 
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Spark Summit
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilSpark Summit
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueDatabricks
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to datasetdatamantra
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Summit
 
Spark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan ZvaraSpark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan ZvaraSpark Summit
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 

What's hot (20)

FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetup
 
TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...
TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...
TensorFlowOnSpark Enhanced: Scala, Pipelines, and Beyond with Lee Yang and An...
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
MLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reducedMLlib sparkmeetup_8_6_13_final_reduced
MLlib sparkmeetup_8_6_13_final_reduced
 
Apache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and Production
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
Building, Debugging, and Tuning Spark Machine Leaning Pipelines-(Joseph Bradl...
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Apache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenchesApache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenches
 
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And Toil
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and Fugue
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Spark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan ZvaraSpark Summit EU talk by Zoltan Zvara
Spark Summit EU talk by Zoltan Zvara
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 

Viewers also liked

Viewers also liked (20)

Javantura v4 - Let me tell you a story why Scrum is not for you - Roko Roić
Javantura v4 - Let me tell you a story why Scrum is not for you - Roko RoićJavantura v4 - Let me tell you a story why Scrum is not for you - Roko Roić
Javantura v4 - Let me tell you a story why Scrum is not for you - Roko Roić
 
Javantura v4 - KumuluzEE – Microservices with Java - Matjaž B. Jurič & Tilen ...
Javantura v4 - KumuluzEE – Microservices with Java - Matjaž B. Jurič & Tilen ...Javantura v4 - KumuluzEE – Microservices with Java - Matjaž B. Jurič & Tilen ...
Javantura v4 - KumuluzEE – Microservices with Java - Matjaž B. Jurič & Tilen ...
 
Javantura v4 - Support SpringBoot application development lifecycle using Ora...
Javantura v4 - Support SpringBoot application development lifecycle using Ora...Javantura v4 - Support SpringBoot application development lifecycle using Ora...
Javantura v4 - Support SpringBoot application development lifecycle using Ora...
 
Javantura v4 - Test-driven documentation with Spring REST Docs - Danijel Mitar
Javantura v4 - Test-driven documentation with Spring REST Docs - Danijel MitarJavantura v4 - Test-driven documentation with Spring REST Docs - Danijel Mitar
Javantura v4 - Test-driven documentation with Spring REST Docs - Danijel Mitar
 
Javantura v4 - Angular2 - Ionic2 - from birth to stable versions - Hrvoje Pek...
Javantura v4 - Angular2 - Ionic2 - from birth to stable versions - Hrvoje Pek...Javantura v4 - Angular2 - Ionic2 - from birth to stable versions - Hrvoje Pek...
Javantura v4 - Angular2 - Ionic2 - from birth to stable versions - Hrvoje Pek...
 
Javantura v4 - Spring Boot and JavaFX - can they play together - Josip Kovaček
Javantura v4 - Spring Boot and JavaFX - can they play together - Josip KovačekJavantura v4 - Spring Boot and JavaFX - can they play together - Josip Kovaček
Javantura v4 - Spring Boot and JavaFX - can they play together - Josip Kovaček
 
Javantura v4 - What’s NOT new in modular Java - Milen Dyankov
Javantura v4 - What’s NOT new in modular Java - Milen DyankovJavantura v4 - What’s NOT new in modular Java - Milen Dyankov
Javantura v4 - What’s NOT new in modular Java - Milen Dyankov
 
Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...
 
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
Javantura v4 - Java or Scala – Web development with Playframework 2.5.x - Kre...
 
Javantura v4 - CroDuke Indy and the Kingdom of Java Skills - Branko Mihaljevi...
Javantura v4 - CroDuke Indy and the Kingdom of Java Skills - Branko Mihaljevi...Javantura v4 - CroDuke Indy and the Kingdom of Java Skills - Branko Mihaljevi...
Javantura v4 - CroDuke Indy and the Kingdom of Java Skills - Branko Mihaljevi...
 
Javantura v4 - DMN – supplement your BPMN - Željko Šmaguc
Javantura v4 - DMN – supplement your BPMN - Željko ŠmagucJavantura v4 - DMN – supplement your BPMN - Željko Šmaguc
Javantura v4 - DMN – supplement your BPMN - Željko Šmaguc
 
Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...
Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...
Javantura v4 - (Spring)Boot your application on Red Hat middleware stack - Al...
 
Javantura v4 - JVM++ The GraalVM - Martin Toshev
Javantura v4 - JVM++ The GraalVM - Martin ToshevJavantura v4 - JVM++ The GraalVM - Martin Toshev
Javantura v4 - JVM++ The GraalVM - Martin Toshev
 
Javantura v4 - FreeMarker in Spring web - Marin Kalapać
Javantura v4 - FreeMarker in Spring web - Marin KalapaćJavantura v4 - FreeMarker in Spring web - Marin Kalapać
Javantura v4 - FreeMarker in Spring web - Marin Kalapać
 
Javantura v4 - The power of cloud in professional services company - Ivan Krn...
Javantura v4 - The power of cloud in professional services company - Ivan Krn...Javantura v4 - The power of cloud in professional services company - Ivan Krn...
Javantura v4 - The power of cloud in professional services company - Ivan Krn...
 
Javantura v4 - Cloud-native Architectures and Java - Matjaž B. Jurič
Javantura v4 - Cloud-native Architectures and Java - Matjaž B. JuričJavantura v4 - Cloud-native Architectures and Java - Matjaž B. Jurič
Javantura v4 - Cloud-native Architectures and Java - Matjaž B. Jurič
 
Javantura v4 - True RESTful Java Web Services with JSON API and Katharsis - M...
Javantura v4 - True RESTful Java Web Services with JSON API and Katharsis - M...Javantura v4 - True RESTful Java Web Services with JSON API and Katharsis - M...
Javantura v4 - True RESTful Java Web Services with JSON API and Katharsis - M...
 
Javantura v4 - Security architecture of the Java platform - Martin Toshev
Javantura v4 - Security architecture of the Java platform - Martin ToshevJavantura v4 - Security architecture of the Java platform - Martin Toshev
Javantura v4 - Security architecture of the Java platform - Martin Toshev
 
Javantura v4 - Keycloak – instant login for your app - Marko Štrukelj
Javantura v4 - Keycloak – instant login for your app - Marko ŠtrukeljJavantura v4 - Keycloak – instant login for your app - Marko Štrukelj
Javantura v4 - Keycloak – instant login for your app - Marko Štrukelj
 
Javantura v4 - Android App Development in 2017 - Matej Vidaković
Javantura v4 - Android App Development in 2017 - Matej VidakovićJavantura v4 - Android App Development in 2017 - Matej Vidaković
Javantura v4 - Android App Development in 2017 - Matej Vidaković
 

Similar to Javantura v4 - Getting started with Apache Spark - Dinko Srkoč

New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraCassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraAnant Corporation
 
Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)Datio Big Data
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache SparkAmir Sedighi
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and RDatabricks
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 
Apache Spark and Python: unified Big Data analytics
Apache Spark and Python: unified Big Data analyticsApache Spark and Python: unified Big Data analytics
Apache Spark and Python: unified Big Data analyticsJulien Anguenot
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Holden Karau
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the SurfaceJosi Aranda
 
Apache spark its place within a big data stack
Apache spark  its place within a big data stackApache spark  its place within a big data stack
Apache spark its place within a big data stackJunjun Olympia
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionRUHULAMINHAZARIKA
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 

Similar to Javantura v4 - Getting started with Apache Spark - Dinko Srkoč (20)

New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Cassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in CassandraCassandra Lunch #89: Semi-Structured Data in Cassandra
Cassandra Lunch #89: Semi-Structured Data in Cassandra
 
Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)
 
Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
Apache Spark and Python: unified Big Data analytics
Apache Spark and Python: unified Big Data analyticsApache Spark and Python: unified Big Data analytics
Apache Spark and Python: unified Big Data analytics
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
Apache spark its place within a big data stack
Apache spark  its place within a big data stackApache spark  its place within a big data stack
Apache spark its place within a big data stack
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Big_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_SessionBig_data_analytics_NoSql_Module-4_Session
Big_data_analytics_NoSql_Module-4_Session
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Hadoop and Spark
Hadoop and SparkHadoop and Spark
Hadoop and Spark
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 

More from HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association

More from HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association (20)

Java cro'21 the best tools for java developers in 2021 - hujak
Java cro'21   the best tools for java developers in 2021 - hujakJava cro'21   the best tools for java developers in 2021 - hujak
Java cro'21 the best tools for java developers in 2021 - hujak
 
JavaCro'21 - Java is Here To Stay - HUJAK Keynote
JavaCro'21 - Java is Here To Stay - HUJAK KeynoteJavaCro'21 - Java is Here To Stay - HUJAK Keynote
JavaCro'21 - Java is Here To Stay - HUJAK Keynote
 
Javantura v7 - Behaviour Driven Development with Cucumber - Ivan Lozić
Javantura v7 - Behaviour Driven Development with Cucumber - Ivan LozićJavantura v7 - Behaviour Driven Development with Cucumber - Ivan Lozić
Javantura v7 - Behaviour Driven Development with Cucumber - Ivan Lozić
 
Javantura v7 - The State of Java - Today and Tomowwow - HUJAK's Community Key...
Javantura v7 - The State of Java - Today and Tomowwow - HUJAK's Community Key...Javantura v7 - The State of Java - Today and Tomowwow - HUJAK's Community Key...
Javantura v7 - The State of Java - Today and Tomowwow - HUJAK's Community Key...
 
Javantura v7 - Learning to Scale Yourself: The Journey from Coder to Leader -...
Javantura v7 - Learning to Scale Yourself: The Journey from Coder to Leader -...Javantura v7 - Learning to Scale Yourself: The Journey from Coder to Leader -...
Javantura v7 - Learning to Scale Yourself: The Journey from Coder to Leader -...
 
JavaCro'19 - The State of Java and Software Development in Croatia - Communit...
JavaCro'19 - The State of Java and Software Development in Croatia - Communit...JavaCro'19 - The State of Java and Software Development in Croatia - Communit...
JavaCro'19 - The State of Java and Software Development in Croatia - Communit...
 
Javantura v6 - Java in Croatia and HUJAK - Branko Mihaljević, Aleksander Radovan
Javantura v6 - Java in Croatia and HUJAK - Branko Mihaljević, Aleksander RadovanJavantura v6 - Java in Croatia and HUJAK - Branko Mihaljević, Aleksander Radovan
Javantura v6 - Java in Croatia and HUJAK - Branko Mihaljević, Aleksander Radovan
 
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
Javantura v6 - On the Aspects of Polyglot Programming and Memory Management i...
 
Javantura v6 - Case Study: Marketplace App with Java and Hyperledger Fabric -...
Javantura v6 - Case Study: Marketplace App with Java and Hyperledger Fabric -...Javantura v6 - Case Study: Marketplace App with Java and Hyperledger Fabric -...
Javantura v6 - Case Study: Marketplace App with Java and Hyperledger Fabric -...
 
Javantura v6 - How to help customers report bugs accurately - Miroslav Čerkez...
Javantura v6 - How to help customers report bugs accurately - Miroslav Čerkez...Javantura v6 - How to help customers report bugs accurately - Miroslav Čerkez...
Javantura v6 - How to help customers report bugs accurately - Miroslav Čerkez...
 
Javantura v6 - When remote work really works - the secrets behind successful ...
Javantura v6 - When remote work really works - the secrets behind successful ...Javantura v6 - When remote work really works - the secrets behind successful ...
Javantura v6 - When remote work really works - the secrets behind successful ...
 
Javantura v6 - Kotlin-Java Interop - Matej Vidaković
Javantura v6 - Kotlin-Java Interop - Matej VidakovićJavantura v6 - Kotlin-Java Interop - Matej Vidaković
Javantura v6 - Kotlin-Java Interop - Matej Vidaković
 
Javantura v6 - Spring HATEOAS hypermedia-driven web services, and clients tha...
Javantura v6 - Spring HATEOAS hypermedia-driven web services, and clients tha...Javantura v6 - Spring HATEOAS hypermedia-driven web services, and clients tha...
Javantura v6 - Spring HATEOAS hypermedia-driven web services, and clients tha...
 
Javantura v6 - End to End Continuous Delivery of Microservices for Kubernetes...
Javantura v6 - End to End Continuous Delivery of Microservices for Kubernetes...Javantura v6 - End to End Continuous Delivery of Microservices for Kubernetes...
Javantura v6 - End to End Continuous Delivery of Microservices for Kubernetes...
 
Javantura v6 - Istio Service Mesh - The magic between your microservices - Ma...
Javantura v6 - Istio Service Mesh - The magic between your microservices - Ma...Javantura v6 - Istio Service Mesh - The magic between your microservices - Ma...
Javantura v6 - Istio Service Mesh - The magic between your microservices - Ma...
 
Javantura v6 - How can you improve the quality of your application - Ioannis ...
Javantura v6 - How can you improve the quality of your application - Ioannis ...Javantura v6 - How can you improve the quality of your application - Ioannis ...
Javantura v6 - How can you improve the quality of your application - Ioannis ...
 
Javantura v6 - Just say it v2 - Pavao Varela Petrac
Javantura v6 - Just say it v2 - Pavao Varela PetracJavantura v6 - Just say it v2 - Pavao Varela Petrac
Javantura v6 - Just say it v2 - Pavao Varela Petrac
 
Javantura v6 - Automation of web apps testing - Hrvoje Ruhek
Javantura v6 - Automation of web apps testing - Hrvoje RuhekJavantura v6 - Automation of web apps testing - Hrvoje Ruhek
Javantura v6 - Automation of web apps testing - Hrvoje Ruhek
 
Javantura v6 - Master the Concepts Behind the Java 10 Challenges and Eliminat...
Javantura v6 - Master the Concepts Behind the Java 10 Challenges and Eliminat...Javantura v6 - Master the Concepts Behind the Java 10 Challenges and Eliminat...
Javantura v6 - Master the Concepts Behind the Java 10 Challenges and Eliminat...
 
Javantura v6 - Building IoT Middleware with Microservices - Mario Kusek
Javantura v6 - Building IoT Middleware with Microservices - Mario KusekJavantura v6 - Building IoT Middleware with Microservices - Mario Kusek
Javantura v6 - Building IoT Middleware with Microservices - Mario Kusek
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Javantura v4 - Getting started with Apache Spark - Dinko Srkoč

  • 1. Getting Started with Apache Spark and Scala Dinko Srkoč Instantor Technology Services
  • 2. About Apache Spark (inevitable but hopefully quick intro) ● Started at UC Berkeley in 2009 ● General purpose cluster computing system ● Fast: 10x on disk, 100x in memory vs Hadoop MapReduce ● Runs locally, in the cloud, on Hadoop, Mesos ● High level APIs in: ○ Scala ○ Python ○ Java ○ R
  • 3. About Apache Spark The Stack: ● SQL - SQL and semi/structured data processing ● MLLib - machine learning algorithms ● GraphX - graph processing ● Streaming - stream processing of live data streams
  • 4. Data collections in Spark Collections: immutable, distributed, partitioned across nodes, operated in parallel ● Resilient Distributed Dataset (RDD) ○ Basic abstraction ○ Low-level API ○ Suitable for unstructured data (media, streams of text) ● Dataset/DataFrame ○ Dataset[T] - typed API, DataFrame (a.k.a. DataSet[Row]) - untyped API ○ High-level expressions: filters/maps, aggregations, averages, SQL queries, columnar access ○ optimizations
  • 5. Demo The Menu: ● Starter - spark shell ○ Loading from different sources ○ The inevitable word count example ● Intermediate - spark notebook ○ Documentation, data visualization ● Main course - back to shell ○ streaming ○ Spark UI ● Dessert - mini project: ○ SBT ○ Deploying to Google Cloud Dataproc