SlideShare a Scribd company logo
1 of 13
Download to read offline
Scalable Machine Learning with 
Apache Spark 
Evan Casey 
@ev_ancasey
Who am I? 
● Engineer at Tapad 
● HackNY 2014 Fellow 
● Things I work on: 
○ Scala 
○ Distributed systems 
○ Hadoop/Spark
Overview 
● Apache Spark 
○ Dataflow model 
○ Spark vs Hadoop MapReduce 
○ Programming with Spark 
● Machine Learning with Spark 
○ MLlib overview 
○ Gradient descent example 
○ Distributed implementation on Apache Spark 
○ Lessons learned
Apache Spark 
● Distributed data-processing 
framework built on top of HDFS 
● Use cases: 
○ Interactive analytics 
○ Graph processing 
○ Stream processing 
○ Scalable ML
Why Spark? 
● Up to 100x faster than 
Hadoop 
● Built on top of Akka 
● Expressive APIs in Scala, 
Java, and Python 
● Active open-source 
community
Spark vs Hadoop MapReduce 
● In-memory data flow model 
optimized for multi-stage 
jobs 
● Novel approach to fault 
tolerance 
● Similar programming style 
to Scalding/Cascading
Programming Model 
● Resilient Distributed Dataset (RDD) 
○ Textfile, parallelize 
● Parallel Operations 
○ Map, GroupBy, Filter, Join, etc 
● Optimizations 
○ Caching, shared variables
Wordcount Example 
val sc = new SparkContext() 
val file = sc.textFile("hdfs://...") 
val counts = file.flatMap(line => line.split(" 
")) 
.map(word => (word, 1)) 
.reduceByKey(_ + _) 
counts.saveAsTextFile("hdfs://...") 
// counts.cache 
// sc.broadcast(counts)
Machine Learning in Spark 
Algorithms: 
- classification: logistic regression, linear SVM, naive bayes, random 
forests 
- regression: generalized linear models, regression tree 
- collaborative filtering: alternating least squares (ALS), non-negative 
matrix factorization (NMF) 
- clustering: k-means 
- decomposition: singular value decompositions (SVD), principal 
component analysis (PCA)
K-Means Clustering 
val data = sc.textFile("hdfs://...") 
val parsedData = data.map(_.split(‘ ‘).map(_. 
toDouble)).cache() 
// Cluster the data into two classes 
val clusters = KMeans.train(parsedData, 2, 
numIterations = 20) 
// Compute the sum of squared errors 
val cost = clusters.computeCost(parsedData)
Gradient Descent Example 
val file = sc.textFile("hdfs://...") 
val points = file.map(parsePoint).cache() 
var w = Vector.zeros(d) 
for (i <- 1 to numIterations) { 
(1 / (1 + exp(-p.y * w.dot(p.x)) -1) * p.y * 
p.x).reduce(_+_) 
w -= alpha * gradient 
}
About Tapad 
● 350k QPS 
● Ingest multiple TBs daily 
● Kafka, Scalding, Spark, Zookeeper, Aerospike 
● We’re hiring! :)
Thanks! 
@ev_ancasey 
Questions?

More Related Content

What's hot

10 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 201910 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 2019JanBask Training
 
JDD 2016 - Michal Matloka - Small Intro To Big Data
JDD 2016 - Michal Matloka - Small Intro To Big DataJDD 2016 - Michal Matloka - Small Intro To Big Data
JDD 2016 - Michal Matloka - Small Intro To Big DataPROIDEA
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache HadoopKMS Technology
 
Spark intro iwomm_2017-07-19
Spark intro iwomm_2017-07-19Spark intro iwomm_2017-07-19
Spark intro iwomm_2017-07-19Erik Schmiegelow
 
Handling the growth of data
Handling the growth of dataHandling the growth of data
Handling the growth of dataPiyush Katariya
 
Hadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGHadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGPradeep MG
 
Geek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaGeek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaAtif Akhtar
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using sparkDatabricks
 
Apache spark its place within a big data stack
Apache spark  its place within a big data stackApache spark  its place within a big data stack
Apache spark its place within a big data stackJunjun Olympia
 
Sistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeSistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeGlobant
 
SparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big DataSparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big Datasamuel shamiri
 
Hadoop course curriculm
Hadoop course curriculm Hadoop course curriculm
Hadoop course curriculm alogarg
 
Workspace Management
Workspace ManagementWorkspace Management
Workspace Managementwaldotyson
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REXdidmarin
 

What's hot (20)

HBase introduction talk
HBase introduction talkHBase introduction talk
HBase introduction talk
 
The ABC of Big Data
The ABC of Big DataThe ABC of Big Data
The ABC of Big Data
 
10 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 201910 big data analytics tools to watch out for in 2019
10 big data analytics tools to watch out for in 2019
 
JDD 2016 - Michal Matloka - Small Intro To Big Data
JDD 2016 - Michal Matloka - Small Intro To Big DataJDD 2016 - Michal Matloka - Small Intro To Big Data
JDD 2016 - Michal Matloka - Small Intro To Big Data
 
Spark
SparkSpark
Spark
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
 
Spark intro iwomm_2017-07-19
Spark intro iwomm_2017-07-19Spark intro iwomm_2017-07-19
Spark intro iwomm_2017-07-19
 
Apache spark
Apache sparkApache spark
Apache spark
 
Handling the growth of data
Handling the growth of dataHandling the growth of data
Handling the growth of data
 
Hadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MGHadoop_EcoSystem_Pradeep_MG
Hadoop_EcoSystem_Pradeep_MG
 
Geek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaGeek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and Scala
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
 
Apache spark its place within a big data stack
Apache spark  its place within a big data stackApache spark  its place within a big data stack
Apache spark its place within a big data stack
 
Sistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta LakeSistema de recomendación entiempo real usando Delta Lake
Sistema de recomendación entiempo real usando Delta Lake
 
SparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big DataSparkR-Advance Analytic for Big Data
SparkR-Advance Analytic for Big Data
 
Hadoop course curriculm
Hadoop course curriculm Hadoop course curriculm
Hadoop course curriculm
 
CCI DAY PRESENTATION
CCI DAY PRESENTATIONCCI DAY PRESENTATION
CCI DAY PRESENTATION
 
Workspace Management
Workspace ManagementWorkspace Management
Workspace Management
 
Hive and querying data
Hive and querying dataHive and querying data
Hive and querying data
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 

Similar to Machine Learning with Apache Spark - HackNY Masters

Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferretAndrii Gakhov
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkDatabricks
 
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingAnimesh Chaturvedi
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and SharkYahooTechConference
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...DB Tsai
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...spinningmatt
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkVenkata Naga Ravi
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internalsAnton Kirillov
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25thSneha Challa
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study NotesRichard Kuo
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZDataFactZ
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfWalmirCouto3
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 

Similar to Machine Learning with Apache Spark - HackNY Masters (20)

Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferret
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
 
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computing
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
 
Spark core
Spark coreSpark core
Spark core
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internals
 
Bds session 13 14
Bds session 13 14Bds session 13 14
Bds session 13 14
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study Notes
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdf
 
Scala+data
Scala+dataScala+data
Scala+data
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 

Recently uploaded

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 

Recently uploaded (20)

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 

Machine Learning with Apache Spark - HackNY Masters

  • 1. Scalable Machine Learning with Apache Spark Evan Casey @ev_ancasey
  • 2. Who am I? ● Engineer at Tapad ● HackNY 2014 Fellow ● Things I work on: ○ Scala ○ Distributed systems ○ Hadoop/Spark
  • 3. Overview ● Apache Spark ○ Dataflow model ○ Spark vs Hadoop MapReduce ○ Programming with Spark ● Machine Learning with Spark ○ MLlib overview ○ Gradient descent example ○ Distributed implementation on Apache Spark ○ Lessons learned
  • 4. Apache Spark ● Distributed data-processing framework built on top of HDFS ● Use cases: ○ Interactive analytics ○ Graph processing ○ Stream processing ○ Scalable ML
  • 5. Why Spark? ● Up to 100x faster than Hadoop ● Built on top of Akka ● Expressive APIs in Scala, Java, and Python ● Active open-source community
  • 6. Spark vs Hadoop MapReduce ● In-memory data flow model optimized for multi-stage jobs ● Novel approach to fault tolerance ● Similar programming style to Scalding/Cascading
  • 7. Programming Model ● Resilient Distributed Dataset (RDD) ○ Textfile, parallelize ● Parallel Operations ○ Map, GroupBy, Filter, Join, etc ● Optimizations ○ Caching, shared variables
  • 8. Wordcount Example val sc = new SparkContext() val file = sc.textFile("hdfs://...") val counts = file.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://...") // counts.cache // sc.broadcast(counts)
  • 9. Machine Learning in Spark Algorithms: - classification: logistic regression, linear SVM, naive bayes, random forests - regression: generalized linear models, regression tree - collaborative filtering: alternating least squares (ALS), non-negative matrix factorization (NMF) - clustering: k-means - decomposition: singular value decompositions (SVD), principal component analysis (PCA)
  • 10. K-Means Clustering val data = sc.textFile("hdfs://...") val parsedData = data.map(_.split(‘ ‘).map(_. toDouble)).cache() // Cluster the data into two classes val clusters = KMeans.train(parsedData, 2, numIterations = 20) // Compute the sum of squared errors val cost = clusters.computeCost(parsedData)
  • 11. Gradient Descent Example val file = sc.textFile("hdfs://...") val points = file.map(parsePoint).cache() var w = Vector.zeros(d) for (i <- 1 to numIterations) { (1 / (1 + exp(-p.y * w.dot(p.x)) -1) * p.y * p.x).reduce(_+_) w -= alpha * gradient }
  • 12. About Tapad ● 350k QPS ● Ingest multiple TBs daily ● Kafka, Scalding, Spark, Zookeeper, Aerospike ● We’re hiring! :)