SlideShare a Scribd company logo
Apache Spark
Mate Gulyas
WHY WE DO IT?
54% average
viewability**
36% non-human
visitor *
What percentage of
digital ads reach
people?
Clickbots,
botnets
Invisible,
hidden ads
Transparency in
the market
*http://technorati.com/iab-keynote-36-percent-ad-traffic-from-bots-and-threatening-industry/
**http://www.statista.com/statistics/255061/viewability-rates-for-rich-media-ads-worldwide-by-industry/
JavaScript SegmentationBehaviour analysis
WHAT WE DO?
Distributed data processing
WHAT DO WE NEED?
Averge size client
30 GB / day
900 GB / month
20 average size clients
600 GB / day
18 TB / month
Recurring data
transformations
WHAT DO WE NEED?
Interactive / Batch /
Streaming / SQL /
Graph processing
IT’S SO COOL
In-memory
WHY SPARK?
Productive API
WHY SPARK?
Multiple language
WHY SPARK?
Active community
WHY SPARK?
friendly
WHY SPARK?
developer
analyst
CFO
friendly
WHY SPARK?
developer
analyst
CFO
friendly
WHY SPARK?
developer
analyst
CFO
Resilient Distributed
Dataset (RDD)
ONE THING TO REMEMBER
RDD
IT RUN’S ON
Mesos
YARN
Standalone
AWS EC2
IT GET’S DATA FROM?
Amazon S3, HDFS,
Cassandra, Hive, Hbase,
Tachyon, Local Filesystem,
ODBC databases, etc...
Batch processing
THE OLD WAY
Interactive analytics
THE NEW WAY
SPARK WITH IPYTHON
Spark SQL
THE NOT THAT OLD WAY
{"name": "Mate Gulyas", "twitter": "gulyasm"}
{"name": "John Doe", "email": "jdoe@freemail.com"}
{"name": "Jane Doe", "email": "janedoe@citromail.com"}
val input = hiveCtx.jsonFile(“example.json”)
input.registerAsTable(“users”)
hiveCtx.sql(“SELECT name, twitter FROM people;”)
SQL WITH JSON
Spark Streaming
THE LOW LATENCY WAY
DStream
MLlib
THE SKYNET WAY
GraphX
I LOVE GRAPHS
Third party modules
THE OTHERS WAY
On-premises
AWS
Databricks Cloud
BUT… WHERE TO GO?
TAKEAWAY I
Spark can provide one
platform to cover most of
the use-cases in data
analytics
TAKEAWAY II
Productive, fast data
processing framework that
helps you minimize to time
business impact.
MATE GULYAS
gulyasm@enbrite.ly
@gulyasm
@enbritely
THANK YOU!

More Related Content

Similar to Apache Spark: The modern data analytics platform

Microservics, serverless and real time; Building blocks of the modern data pi...
Microservics, serverless and real time; Building blocks of the modern data pi...Microservics, serverless and real time; Building blocks of the modern data pi...
Microservics, serverless and real time; Building blocks of the modern data pi...
Manisha Sule
 
Bigdata
BigdataBigdata
An Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech CompanyAn Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech Company
Roger Giuffre
 
Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Yahoo Microstrategy 2008
Yahoo Microstrategy 2008
Amr Awadallah
 
An Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech CompanyAn Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech Company
Roger Giuffre
 
When Open Source Meets the Enterprise
When Open Source Meets the EnterpriseWhen Open Source Meets the Enterprise
When Open Source Meets the Enterprise
MariaDB plc
 
[WSO2 Summit Brazil 2018] The API-driven World
[WSO2 Summit Brazil 2018] The API-driven World[WSO2 Summit Brazil 2018] The API-driven World
[WSO2 Summit Brazil 2018] The API-driven World
WSO2
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
Jos van Dongen
 
Hacking Marketing: The Amazing Convergence of Marketing & Software
Hacking Marketing: The Amazing Convergence of Marketing & SoftwareHacking Marketing: The Amazing Convergence of Marketing & Software
Hacking Marketing: The Amazing Convergence of Marketing & Software
Ensighten
 
Low-code is developing and will continue to progress in 2023. (1).pdf
Low-code is developing and will continue to progress in 2023.  (1).pdfLow-code is developing and will continue to progress in 2023.  (1).pdf
Low-code is developing and will continue to progress in 2023. (1).pdf
Argpnteq
 
Hacking Marketing Q&A Session
Hacking Marketing Q&A SessionHacking Marketing Q&A Session
Hacking Marketing Q&A Session
Scott Brinker
 
Real-World, Open Source, End-to-End JavaScript in IoT
Real-World, Open Source, End-to-End JavaScript in IoTReal-World, Open Source, End-to-End JavaScript in IoT
Real-World, Open Source, End-to-End JavaScript in IoT
All Things Open
 
CDS + Power Apps
CDS + Power Apps CDS + Power Apps
CDS + Power Apps
Juan Fabian
 
Transforma Insights
Transforma InsightsTransforma Insights
Transforma Insights
eduardo schettino
 
Why Software-Defined Storage Matters
Why Software-Defined Storage MattersWhy Software-Defined Storage Matters
Why Software-Defined Storage Matters
Red_Hat_Storage
 
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQBuilding a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
Dominik Obermaier
 
BSFI Technology Offerings by Value Innovation Labs
BSFI Technology Offerings by Value Innovation LabsBSFI Technology Offerings by Value Innovation Labs
BSFI Technology Offerings by Value Innovation Labs
Mount Talent Consulting
 
Accelerate IoT Development with KnowThings.io
Accelerate IoT Development with KnowThings.ioAccelerate IoT Development with KnowThings.io
Accelerate IoT Development with KnowThings.io
CA Technologies
 
WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...
WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...
WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...
Richard Harbridge
 

Similar to Apache Spark: The modern data analytics platform (20)

Microservics, serverless and real time; Building blocks of the modern data pi...
Microservics, serverless and real time; Building blocks of the modern data pi...Microservics, serverless and real time; Building blocks of the modern data pi...
Microservics, serverless and real time; Building blocks of the modern data pi...
 
Bigdata
BigdataBigdata
Bigdata
 
An Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech CompanyAn Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech Company
 
Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Yahoo Microstrategy 2008
Yahoo Microstrategy 2008
 
An Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech CompanyAn Innovative Big-Data Web Scraping Tech Company
An Innovative Big-Data Web Scraping Tech Company
 
When Open Source Meets the Enterprise
When Open Source Meets the EnterpriseWhen Open Source Meets the Enterprise
When Open Source Meets the Enterprise
 
[WSO2 Summit Brazil 2018] The API-driven World
[WSO2 Summit Brazil 2018] The API-driven World[WSO2 Summit Brazil 2018] The API-driven World
[WSO2 Summit Brazil 2018] The API-driven World
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
Hacking Marketing: The Amazing Convergence of Marketing & Software
Hacking Marketing: The Amazing Convergence of Marketing & SoftwareHacking Marketing: The Amazing Convergence of Marketing & Software
Hacking Marketing: The Amazing Convergence of Marketing & Software
 
Low-code is developing and will continue to progress in 2023. (1).pdf
Low-code is developing and will continue to progress in 2023.  (1).pdfLow-code is developing and will continue to progress in 2023.  (1).pdf
Low-code is developing and will continue to progress in 2023. (1).pdf
 
Hacking Marketing Q&A Session
Hacking Marketing Q&A SessionHacking Marketing Q&A Session
Hacking Marketing Q&A Session
 
Real-World, Open Source, End-to-End JavaScript in IoT
Real-World, Open Source, End-to-End JavaScript in IoTReal-World, Open Source, End-to-End JavaScript in IoT
Real-World, Open Source, End-to-End JavaScript in IoT
 
CDS + Power Apps
CDS + Power Apps CDS + Power Apps
CDS + Power Apps
 
Transforma Insights
Transforma InsightsTransforma Insights
Transforma Insights
 
Why Software-Defined Storage Matters
Why Software-Defined Storage MattersWhy Software-Defined Storage Matters
Why Software-Defined Storage Matters
 
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQBuilding a reliable and scalable IoT platform with MongoDB and HiveMQ
Building a reliable and scalable IoT platform with MongoDB and HiveMQ
 
BSFI Technology Offerings by Value Innovation Labs
BSFI Technology Offerings by Value Innovation LabsBSFI Technology Offerings by Value Innovation Labs
BSFI Technology Offerings by Value Innovation Labs
 
Accelerate IoT Development with KnowThings.io
Accelerate IoT Development with KnowThings.ioAccelerate IoT Development with KnowThings.io
Accelerate IoT Development with KnowThings.io
 
Greetings david cutler inform and connect
Greetings   david cutler inform and connectGreetings   david cutler inform and connect
Greetings david cutler inform and connect
 
WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...
WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...
WORKSHOP: STRATEGY AND SUCCESS WITH OFFICE 365: PRACTICAL TOOLS AND TECHNIQUE...
 

Recently uploaded

在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 

Recently uploaded (20)

在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 

Apache Spark: The modern data analytics platform