Spark After Dark
Real-time Advanced Analytics, Machine Learning, Graph Analytics, Text NLP, and Recommendations
Barcelona Spark Meetup
Oct 20th, 2015
Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center
** We’re Hiring!! Nice People Only, Please. **
spark.tc
Power of data. Simplicity of design. Speed of innovation.
IBM Spark
Who Am I?
Streaming Data Engineer
Netflix Open Source Committer

Data Solutions Engineer
Apache Contributor

Principal Data Solutions Engineer
IBM Spark Technology Center

Meetup Organizer
Advanced Apache Spark Meetup

Book Author
Advanced Spark (2016)
Advanced Apache Spark Meetup
Total Spark Experts: ~1350+ in 3 months!
#4 most active Spark Meetup in the world!
Main Goals
Dig deep into the Spark & extended-Spark codebase
Study integrations such as Cassandra, ElasticSearch, Tachyon, S3, BlinkDB, Mesos, YARN, Kafka, R, etc.
Surface and share the patterns and idioms of these well-designed, distributed, big data components
What is Spark?
  Spark Core
  Spark Streaming (real-time)
  Spark SQL (structured data)
  MLlib (machine learning)
  GraphX (graph analytics)
  BlinkDB (approx queries)
  …
Spark Deployments In Production
Tools of the Talk
  Redis
  Docker
  Cassandra
  MLlib, GraphX
  Parquet, JSON
  Apache Zeppelin
  Spark Streaming, Kafka
  Spark SQL, DataFrames
  Spark JDBC/ODBC Hive ThriftServer
  ElasticSearch, Logstash, Kibana (ELK)
and…
SMACK Stack!
Spark (Data Processing)
Mesos (Cluster Manager)
Akka (Actors)
Cassandra (NoSQL)
Kafka (Streaming)
Themes of this Talk
  Parallelism
  Performance
  Streaming
  Approximations
  Similarity Measures
  Recommendations
and…
Goals of Spark After Dark
  Generate high-quality recommendations
  Demonstrate Spark high-level libraries:
    Spark Streaming -> Kafka, Approximations
    Spark SQL -> DataFrames, Cassandra
    GraphX -> PageRank, Shortest Path
    MLlib -> Matrix Factorization, Word2Vec
Popular Dating Sites
Parallelism
My First Experience With Parallelism
Brady Bunch circa 1980
Season 5, Episode 18: “Two Petes in a Pod”
Parallel Algorithm: O(log n)
Non-Parallel Algorithm: O(n)
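The difference between the two slides can be made concrete by counting dependent steps. The plain-Python sketch below (function names are illustrative) simulates a sequential sum, which needs n - 1 additions one after another, and a pairwise tree reduction, where every addition within a round is independent and could run in parallel, so only about log2(n) rounds are sequential. It only counts rounds; nothing actually runs in parallel.

```python
import math


def sequential_sum(values):
    """Add values one at a time; returns (total, number of dependent steps)."""
    total, steps = values[0], 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_sum(values):
    """Pairwise tree reduction; returns (total, number of rounds).

    Within a round every pair is independent, so a parallel machine could
    add them all at once. Only the rounds themselves are sequential.
    """
    level, rounds = list(values), 0
    while len(level) > 1:
        # Combine neighbours; an odd leftover element is carried forward.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds


data = list(range(1, 1025))               # 1024 numbers
print(sequential_sum(data))               # (524800, 1023)
print(tree_sum(data))                     # (524800, 10)
print(math.ceil(math.log2(len(data))))    # 10
```

The same tree shape is the idea behind combining partial results across partitions, although Spark's own `reduce` and `treeReduce` differ in the details.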
Spark is Parallel!
Performance
Spark Beats Hadoop @ 100 TB GraySort
  On-disk only
  28,000 partitions
  No in-memory caching
Improved Shuffle and Network Layer
  “Sort-based shuffle”
  Minimize OS resources
  Switched to async Netty
  Keep CPUs hot
  Reuse byte buffers to minimize GC
  Use epoll for I/O to stay in kernel space
Project Tungsten: CPU and Memory
  More JVM bytecode generation and JIT optimization
  CPU-cache-aware data structures and algorithms
  Custom memory management
[Charts: serializer performance, new HashMap]
DataFrames and Catalyst Optimizer
https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/
Please use DataFrames!
[Diagram: DataFrames and Catalyst Optimizer --> JVM bytecode generation]
Columnar Storage Format
Skip whole chunks using the min-max statistics stored in each chunk (most effective on sorted data)
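A minimal sketch of min-max chunk skipping, assuming a single integer column held in memory and split into fixed-size chunks; `Chunk`, `build_chunks`, and `scan_range` are hypothetical names, not Parquet APIs. It also shows why sorted data benefits most: on a shuffled copy of the same values, far more chunks have ranges that overlap the query.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A column chunk with min/max statistics stored alongside the values."""
    values: list
    lo: int
    hi: int


def build_chunks(column, chunk_size):
    chunks = []
    for i in range(0, len(column), chunk_size):
        vals = column[i:i + chunk_size]
        chunks.append(Chunk(vals, min(vals), max(vals)))
    return chunks


def scan_range(chunks, lo, hi):
    """Return matching values and how many chunks were actually read."""
    matches, chunks_read = [], 0
    for c in chunks:
        # Skip the whole chunk if its [lo, hi] range cannot overlap the query.
        if c.hi < lo or c.lo > hi:
            continue
        chunks_read += 1
        matches.extend(v for v in c.values if lo <= v <= hi)
    return matches, chunks_read


sorted_col = list(range(1000))
hits, read = scan_range(build_chunks(sorted_col, 100), 250, 260)
print(len(hits), read)            # 11 1

shuffled = [(i * 37) % 1000 for i in range(1000)]  # a permutation of 0..999
hits2, read2 = scan_range(build_chunks(shuffled, 100), 250, 260)
print(len(hits2), read2 > read)   # 11 True
```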
Parquet File Format
 Based on Google Dremel 
 Implemented by Twitter and Cloudera
 Columnar storage format
 Optimized for fast columnar aggregations
 Tight compression
 Supports pushdowns
 Nested, self-describing, evolving schema
Types of Compression
  Run Length Encoding: Repeated data
  Dictionary Encoding: Fixed set of values
  Delta, Prefix Encoding: Sorted data
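Simplified, in-memory versions of three of these encodings follow (run length, dictionary, and delta). They only illustrate the idea; real columnar formats such as Parquet combine encodings with bit-packing and other details omitted here.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def run_length_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]


def dictionary_encode(values):
    """Replace each value with a small integer code into a value dictionary."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def delta_encode(sorted_values):
    """Store the first value, then the difference to each previous value."""
    return sorted_values[:1] + [b - a for a, b in zip(sorted_values, sorted_values[1:])]


def delta_decode(deltas):
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


print(run_length_encode(["US", "US", "US", "ES", "ES", "US"]))
# [('US', 3), ('ES', 2), ('US', 1)]
print(dictionary_encode(["red", "blue", "red", "red", "green"]))
# (['red', 'blue', 'green'], [0, 1, 0, 0, 2])
print(delta_encode([1000, 1003, 1004, 1010]))
# [1000, 3, 1, 6]
```

Small deltas and small dictionary codes need far fewer bits than the original values, which is where the compression comes from.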
Types of Query Optimizations
  Column, Partition Pruning
  Row, Predicate Pushdown
SELECT b FROM table WHERE a IN ('a2', 'a3')
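To make the two optimizations concrete, here is a toy column store (hypothetical data in a plain Python dict of lists) evaluating the query above. Column pruning means column `c` is never read; predicate pushdown means the filter on `a` runs first, so `b` is read only at the matching row positions.

```python
# A tiny column store: one list per column (hypothetical data).
table = {
    "a": ["a1", "a2", "a3", "a4", "a2"],
    "b": [10, 20, 30, 40, 50],
    "c": ["x", "y", "z", "w", "v"],  # never touched by the query below
}


def select_b_where_a_in(store, wanted):
    """Evaluate SELECT b FROM table WHERE a IN (...) against a column store.

    Returns the result rows and the set of columns that were read.
    """
    columns_read = {"a"}
    # Predicate pushdown: filter on "a" before touching any other column.
    positions = [i for i, v in enumerate(store["a"]) if v in wanted]
    # Column pruning: only "b" is needed for the output.
    columns_read.add("b")
    result = [store["b"][i] for i in positions]
    return result, columns_read


rows, cols = select_b_where_a_in(table, {"a2", "a3"})
print(rows, sorted(cols))  # [20, 30, 50] ['a', 'b']
```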



Streaming
Direct Kafka Streaming – KafkaRDD
  No single Receiver, no Write Ahead Log (WAL)
  Workers pull from Kafka in parallel
  Each KafkaRDD partition stores relevant offsets
  Upon Worker Node failure, rebuild from offsets
  Optimizes the happy path by avoiding the WAL
  At-least-once delivery guarantee
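The recovery path can be illustrated without a Kafka cluster. In this toy model, an in-memory dict stands in for a topic's partition logs, and `OffsetSpan` is a hypothetical record (not Spark's or Kafka's API) of the offsets one batch partition covers. Because the span is all a partition needs, a lost partition is rebuilt by simply re-reading the same range.

```python
from dataclasses import dataclass

# Toy stand-in for a Kafka topic: partition id -> durable, append-only log.
log = {
    0: ["m0", "m1", "m2", "m3"],
    1: ["n0", "n1", "n2"],
}


@dataclass(frozen=True)
class OffsetSpan:
    """Hypothetical record of which offsets one batch partition covers."""
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def plan_batch(committed):
    """One span per topic partition; no receiver and no write-ahead log."""
    return [OffsetSpan(p, committed[p], len(msgs)) for p, msgs in log.items()]


def read_span(span):
    """Pull exactly the recorded offsets; safe to call again after a failure."""
    return log[span.partition][span.from_offset:span.until_offset]


committed = {0: 1, 1: 0}
batch = plan_batch(committed)
first_attempt = [read_span(s) for s in batch]
# Simulate losing partition 0's result, then rebuild it from its offsets.
rebuilt = read_span(batch[0])
print(rebuilt == first_attempt[0], first_attempt)
# True [['m1', 'm2', 'm3'], ['n0', 'n1', 'n2']]
```

Re-reading a span means any output already written for it can be written again, which is why the end-to-end guarantee is at least once unless the output step is idempotent or transactional.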
Approximations
Count Min Sketch
  Approximate counters
  Far less memory than an exact HashMap of counters
  Low, fixed memory
  Known error bounds
  Scales to a large number of counters
  From Twitter’s Algebird
  Streaming example in Spark codebase
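A compact Count-Min Sketch in pure Python, assuming string-like items and using salted BLAKE2 hashes as the per-row hash functions (an illustrative choice, not Algebird's implementation). The `eps`/`delta` sizing follows the standard Cormode-Muthukrishnan bounds.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate counters in fixed memory: depth x width integers.

    With width = ceil(e / eps) and depth = ceil(ln(1 / delta)), an estimate
    exceeds the true count by more than eps * N (N = total increments)
    with probability at most delta. Estimates never undercount.
    """

    def __init__(self, eps=0.01, delta=0.01):
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _buckets(self, item):
        # One hash per row, derived by salting the item with the row id.
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(item))


cms = CountMinSketch()
for word in ["spark"] * 500 + ["kafka"] * 120 + ["cassandra"] * 30:
    cms.add(word)
print(cms.estimate("spark"), cms.estimate("kafka"), cms.estimate("mesos"))
```

Memory depends only on `eps` and `delta` (here 5 x 272 counters), not on how many distinct items are counted, which is the advantage over an exact HashMap.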
HyperLogLog
  Approximate cardinality (approx count distinct)
  From Twitter’s Algebird
  Low memory: 1.5 KB at 2% error for 10^9 elements
  Streaming example in Spark codebase
  RDD: countApproxDistinctByKey()
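A minimal HyperLogLog sketch in pure Python, with `p = 11` (2,048 registers) and SHA-1 as the hash; both are illustrative choices rather than Algebird's or Spark's internals. At 6 bits per register, 2,048 registers come to about 1.5 KB, and the expected standard error is roughly 1.04 / sqrt(2048), or about 2.3%, in line with the slide's figures (this Python list itself uses far more than 6 bits per register).

```python
import hashlib
import math


class HyperLogLog:
    """Approximate count-distinct with m = 2**p small registers."""

    def __init__(self, p=11):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128 (Flajolet et al.).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        x = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                 # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # close to 100000, typically within a few percent
```

Adding an item that was already seen never changes a register, so duplicates do not inflate the count.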
Monte Carlo Simulations
From the Manhattan Project (A-bomb)
  Simulate movement of neutrons
Law of Large Numbers (LLN)
  The average of the results of many trials
  converges on the expected value
SparkPi example in the Spark codebase
  Pi ~ 4 * (# red dots / # total dots)
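A single-machine version of the dart-throwing estimate. SparkPi distributes the same loop across partitions and sums the hits with a reduce; this sketch keeps everything in one process, with a fixed seed so runs are repeatable. Points fall uniformly in the unit square, and the fraction landing inside the quarter circle approaches pi / 4.

```python
import random


def estimate_pi(samples, seed=42):
    """Throw random darts at the unit square; count those inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area ratio (quarter circle / square) is pi / 4.
    return 4.0 * inside / samples


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))  # estimates tighten around 3.14159 as n grows (LLN)
```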
Recommendations
Interactive Demo!
Audience Participation Needed!
  Navigate to sparkafterdark.com
  Click 3 actresses and 3 actors
Types of Recommendations
Non-personalized
  Cold Start: no preference or behavior data for the user yet
Personalized
  User-Item Similarity: items that others with similar preferences have liked
  Item-Item Similarity: items similar to your previously liked items
Non-Personalized Recommendations
Summary Statistics and Aggregations
  Top Users by Like Count
“I might like users with the highest sum aggregation of likes overall.”
Spark SQL + DataFrames = Aggregations
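The aggregation behind "top users by like count" is a group-by and sort. A plain-Python sketch over hypothetical `(from_user, to_user)` like events; the equivalent Spark SQL query in the docstring uses assumed table and column names.

```python
from collections import Counter

# Hypothetical like events: (from_user, to_user).
likes = [
    ("u1", "u7"), ("u2", "u7"), ("u3", "u7"),
    ("u1", "u4"), ("u5", "u4"),
    ("u2", "u9"),
]


def top_users_by_likes(events, k):
    """Count likes received per user and return the k most-liked users.

    Same shape as the (hypothetical) Spark SQL query:
      SELECT to_user, COUNT(*) AS like_count
      FROM likes GROUP BY to_user
      ORDER BY like_count DESC LIMIT k
    """
    counts = Counter(to_user for _, to_user in events)
    return counts.most_common(k)


print(top_users_by_likes(likes, 2))  # [('u7', 3), ('u4', 2)]
```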
Graph Analytics
  Top Influencers by Like Graph
“I might like users who have the highest probability of me liking them randomly while walking the like graph.”
GraphX: PageRank
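A small power-iteration PageRank in plain Python over a hypothetical like graph, to show the random-walk idea in the quote. The damping factor 0.85 and the even spreading of rank from users who liked nobody are common conventions; GraphX's `PageRank` may normalize ranks differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed 'who liked whom' graph.

    A rank is the probability that a random walker, who follows a random
    outgoing like with probability `damping` and otherwise jumps to a random
    user, is standing on that user.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling user (liked nobody): spread their rank evenly.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


likes = [("u1", "u3"), ("u2", "u3"), ("u4", "u3"), ("u3", "u1"), ("u4", "u2")]
ranks = pagerank(likes)
print(max(ranks, key=ranks.get))  # u3
```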
Demo!
Spark SQL/DataFrames + GraphX/PageRank
Similarities
Types of Similarity
Euclidean: linear measure
  Magnitude bias
Cosine: angle measure
  Adjusts for magnitude bias
Jaccard: (intersection / union)
  Popularity bias
Log Likelihood
  Adjusts for popularity bias

User	Ali	Matei	Reynold	Patrick	Andy
Kimberly	1	1	1	1
Leslie	1	1
Meredith	1	1	1
Lisa	1	1	1
Holden	1	1	1	1	1
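The four measures, sketched in plain Python for small vectors and sets. The log-likelihood ratio follows the entropy-based 2x2 formulation from Ted Dunning's work (as used, for example, in Apache Mahout); the function and argument names are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Angle-based; scaling a vector does not change the result."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """|intersection| / |union| over sets of liked items."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def _xlogx(x):
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    return _xlogx(sum(counts)) - sum(_xlogx(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic on a 2x2 co-occurrence table.

    k11: both liked, k12: only A liked, k21: only B liked, k22: neither.
    Large values mean the overlap is more than popularity alone explains.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


print(euclidean_distance([1, 1, 0], [2, 2, 0]))  # 1.414... (magnitude differs)
print(cosine_similarity([1, 1, 0], [2, 2, 0]))   # 1.0 (same direction)
print(jaccard_similarity({"Ali", "Matei"}, {"Matei", "Andy"}))  # 0.333...
print(log_likelihood_ratio(10, 10, 10, 10))      # ~0 (independent)
```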
All-Pairs Similarity Comparison
Compare everything to everything
a.k.a. “pair-wise similarity” or “similarity join”
Naïve shuffle: O(m*n^2); m=rows, n=cols
Minimize shuffle through approximations!
Reduce m (rows)
  Sampling and bucketing
Reduce n (cols)
  Remove most frequent value (i.e., 0)
  Principal Component Analysis
Reduce m: DIMSUM Sampling
“Dimension Independent Matrix Square using MapReduce”
Remove rows with low similarity probability
MLlib: RowMatrix.columnSimilarities(…)
Twitter: 40% efficiency gain over Cosine Similarity
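A single-machine sketch of DIMSUM-style sampling, assuming a small dense matrix given as a list of rows. Following the scheme described for DIMSUM, column i is kept with probability min(1, sqrt(gamma) / ||c_i||) and its values are divided by min(sqrt(gamma), ||c_i||), so each emitted product has expected value a_ri * a_rj / (||c_i|| * ||c_j||), an unbiased share of the cosine similarity. The function name and the raw `gamma` parameter are assumptions; MLlib's `columnSimilarities` takes a similarity threshold instead.

```python
import math
import random


def dimsum_column_similarities(rows, gamma, seed=0):
    """Estimate cosine similarity between all column pairs by sampling.

    rows: list of equal-length numeric rows (m rows x n columns).
    gamma: oversampling parameter; larger keeps more pairs (more accurate,
    more expensive). Returns {(i, j): estimate} for pairs that were sampled.
    """
    rng = random.Random(seed)
    n = len(rows[0])
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0 for j in range(n)]
    sg = math.sqrt(gamma)
    p = [min(1.0, sg / c) for c in norms]   # per-column keep probability
    q = [min(sg, c) for c in norms]         # per-column rescaling factor
    sims = {}
    for r in rows:
        nz = [(j, v / q[j]) for j, v in enumerate(r) if v != 0]
        for a in range(len(nz)):
            i, vi = nz[a]
            if rng.random() >= p[i]:
                continue
            for b in range(a + 1, len(nz)):
                j, vj = nz[b]
                if rng.random() < p[j]:
                    sims[(i, j)] = sims.get((i, j), 0.0) + vi * vj
    return sims


m = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
exact = dimsum_column_similarities(m, gamma=1e12)  # huge gamma keeps every pair
sampled = dimsum_column_similarities(m, gamma=1.0)  # aggressive sampling
print(exact)
print(sampled)
```

With a very large `gamma` every pair survives and the result equals exact cosine similarity; smaller values trade accuracy for far fewer emitted products, which is where the shuffle savings come from.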
Reduce m: LSH Bucketing
“Locality Sensitive Hashing”
Split m into b buckets
Use a similarity-preserving hash algorithm (e.g., MinHash)
Requires pre-processing of data
Compare bucket contents in parallel
Converts O(m*n^2) -> O(m*n/b*b^2); m=rows, n=cols, b=buckets
e.g., 500k x 500k matrix: O(1.25e17) -> O(1.25e13); b=50
github.com/mrsqueeze/spark-hash
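A minimal MinHash-with-banding sketch, one common way to build LSH buckets for Jaccard similarity. The hash family, band sizes, data, and names are assumptions for illustration and are not taken from spark-hash. Sets whose signatures agree on every row of at least one band land in the same bucket, and only those candidate pairs get compared.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime larger than any 32-bit CRC value


def make_hashers(k, seed=7):
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, hashers):
    """k MinHash values; two sets agree on each with probability = Jaccard."""
    codes = [zlib.crc32(str(x).encode()) for x in items]
    return [min((a * c + b) % PRIME for c in codes) for a, b in hashers]


def lsh_candidate_pairs(sets, bands=4, rows_per_band=5, seed=7):
    """Bucket each signature band; sets sharing any bucket become candidates."""
    hashers = make_hashers(bands * rows_per_band, seed)
    buckets = {}
    for name, items in sets.items():
        sig = minhash_signature(items, hashers)
        for band in range(bands):
            key = (band, tuple(sig[band * rows_per_band:(band + 1) * rows_per_band]))
            buckets.setdefault(key, []).append(name)
    pairs = set()
    for members in buckets.values():
        # Only sets inside the same bucket are compared (independently per bucket).
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add(tuple(sorted((members[i], members[j]))))
    return pairs


profiles = {  # hypothetical like sets
    "kimberly": {"ali", "matei", "reynold", "patrick"},
    "holden": {"ali", "matei", "reynold", "patrick"},
    "leslie": {"andy"},
}
print(lsh_candidate_pairs(profiles))  # {('holden', 'kimberly')}
```

More bands with fewer rows each make it easier for moderately similar sets to collide; fewer, wider bands keep only near-duplicates.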
Reduce n: Remove Most Frequent Value
Eliminate most-frequent value
Represent other values with (index,value) pairs
Converts O(m*n^2) -> O(m*nnz^2); nnz=num nonzeros, nnz << n
Note: Choose most frequent value (may not be 0)
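A plain-Python sketch of the idea: pick the row's most frequent value as the implicit default (here 3, not 0) and store only the remaining entries as (index, value) pairs. Function names are illustrative.

```python
from collections import Counter


def encode_row(row):
    """Drop the most frequent value; keep the rest as (index, value) pairs."""
    default = Counter(row).most_common(1)[0][0]
    pairs = [(i, v) for i, v in enumerate(row) if v != default]
    return default, len(row), pairs


def decode_row(default, length, pairs):
    row = [default] * length
    for i, v in pairs:
        row[i] = v
    return row


row = [3, 3, 3, 7, 3, 3, 1, 3]
encoded = encode_row(row)
print(encoded)                       # (3, 8, [(3, 7), (6, 1)])
print(decode_row(*encoded) == row)   # True
```

Only the two stored pairs participate in pairwise work, which is where the O(m*nnz^2) cost comes from.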
Personalized Recommendations
Recommendation Terminology
User
User seeking recommendations
Item
Item that has been liked or rated
Feedback
Explicit: like, rating
Implicit: search, click, hover, view, scroll
Feature Engineering
Dimension reduction
Collaborative Filtering Personalized Recs
  Like behavior of similar users
“I like the same people that you like. What other people did you like that I haven’t seen?”
MLlib: Matrix Factorization, User-Item Similarity
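The slide names matrix factorization (MLlib's ALS); the sketch below swaps in a simpler technique, neighborhood-based user-user collaborative filtering, because it maps directly onto the quote: weight other users by how similar their likes are to mine, then suggest what they liked that I have not seen. Data and names are hypothetical.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(likes, user, k=2):
    """User-based collaborative filtering over sets of liked profiles.

    Score every profile the user hasn't liked by summing the similarity of
    the other users who did like it; return the k highest-scoring profiles.
    """
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


likes = {
    "me": {"p1", "p2", "p3"},
    "you": {"p1", "p2", "p3", "p4"},
    "sam": {"p3", "p5"},
    "kim": {"p6"},
}
print(recommend(likes, "me"))  # ['p4', 'p5']
```

Matrix factorization replaces the explicit neighbor search with learned low-dimensional user and item factors, which scales better and generalizes across sparse data.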
Demo!
Spark SQL/DataFrames + MLlib/Alternating Least Squares
Text-based Personalized Recs (1/3)
  Similar profiles to me
“Our profiles have similar, unique k-skip n-grams. We might like each other.”
MLlib: Word2Vec, TF/IDF, Doc Similarity
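k-skip-n-grams allow up to k tokens to be skipped while forming each n-gram. A small generator follows, assuming whitespace-tokenized text and using a commonly cited example sentence; comparing two profiles' skip-gram sets with Jaccard similarity gives a rough overlap score. This is an illustration, not the talk's pipeline.

```python
from itertools import combinations


def skip_grams(tokens, n, k):
    """All k-skip-n-grams: n tokens in order, skipping at most k tokens in total."""
    grams = []
    for start in range(len(tokens)):
        # The other n - 1 tokens must come from the next n - 1 + k positions.
        window = range(start + 1, min(len(tokens), start + n + k))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


def profile_similarity(text_a, text_b, n=2, k=2):
    """Jaccard overlap of the two profiles' k-skip-n-gram sets."""
    a = set(skip_grams(text_a.lower().split(), n, k))
    b = set(skip_grams(text_b.lower().split(), n, k))
    return len(a & b) / len(a | b) if a | b else 0.0


tokens = "insurgents killed in ongoing fighting".split()
print(len(skip_grams(tokens, 2, 2)))  # 9
print(len(skip_grams(tokens, 3, 2)))  # 10
print(profile_similarity("loves hiking in the mountains", "loves hiking the mountains"))
```

The slide's word "unique" suggests weighting rare n-grams more heavily, for example with TF/IDF; plain Jaccard treats every shared n-gram equally.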
Text-based Personalized Recs (2/3)
  Similar profiles from my past likes
“Your profile shares a similar feature vector space to others that I’ve liked. I might like you.”
MLlib: Word2Vec, TF/IDF, Doc Similarity
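A sketch of "similar to others that I've liked" using TF/IDF vectors and cosine similarity in plain Python. The smoothed idf, ln((1 + n) / (1 + df)) + 1, is one of several conventions (MLlib's `IDF` uses a slightly different formula), and averaging the similarity over past likes is an assumed scoring rule; the profile texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf so a term present in every document still gets weight > 0.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def score_candidate(candidate, liked_profiles):
    """Average TF/IDF cosine similarity between a candidate and my past likes."""
    docs = [p.lower().split() for p in liked_profiles + [candidate]]
    vecs = tfidf_vectors(docs)
    cand = vecs[-1]
    return sum(cosine(cand, v) for v in vecs[:-1]) / len(liked_profiles)


liked = ["loves spark and hiking", "hiking climbing and coffee"]
print(score_candidate("coffee and hiking every weekend", liked))
print(score_candidate("opera and ballet", liked))
```

Word2Vec would replace the sparse term vectors with dense embeddings, so profiles using different but related words can still score as similar.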
Text-based Personalized Recs (3/3)
  Relevant, High-Value Emails
“Your initial email has similar named entities to my profile. I might like you just for making the effort.”
MLlib: Word2Vec, TF/IDF, Entity Recognition
[Figure: her email compared with my profile]
The Future of Recommendations!
Facial Recognition
  Eigenfaces
“Your face looks similar to others that I’ve liked. I might like you.”
MLlib: RowMatrix, PCA, Item-Item Similarity
Image courtesy of http://crockpotveggies.com/2015/02/09/automating-tinder-with-eigenfaces.html
Natural Language Processing: Convo Bot
  NLP and Decision Trees
“If your responses to my trite opening lines are positive, I may read your profile.”
MLlib: TF/IDF, DecisionTree, Sentiment Analysis
[Figure: positive vs. negative responses]
Maintaining the Spark!
Recommendations for Couples
  Pathways of Similarity
“I want Mad Max. You want Message in a Bottle. Let’s find something in between to watch tonight.”
MLlib: RowMatrix, Item-Item Similarity
GraphX: Nearest Neighbors, Shortest Path
[Figure: path between the two movies via similar plots and similar actors]
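Finding something "in between" can be framed as a shortest path over a movie similarity graph. Below is a breadth-first search in plain Python over a hypothetical graph whose edges stand for "similar plot" or "shared actors" (the intermediate movies are placeholders); the middle of the path is one candidate pick. GraphX's `ShortestPaths` computes hop counts to landmark vertices rather than returning paths, so this illustrates the idea, not that API.

```python
from collections import deque

# Hypothetical movie graph: an edge means "similar plot" or "shared actors".
graph = {
    "Mad Max": ["movie_a"],
    "movie_a": ["Mad Max", "movie_b"],
    "movie_b": ["movie_a", "Message in a Bottle"],
    "Message in a Bottle": ["movie_b"],
}


def shortest_path(g, start, goal):
    """Breadth-first search: fewest similarity hops from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in g.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable


path = shortest_path(graph, "Mad Max", "Message in a Bottle")
print(path)                  # ['Mad Max', 'movie_a', 'movie_b', 'Message in a Bottle']
print(path[len(path) // 2])  # movie_b, a candidate "in between" pick
```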
Final Recommendation!
 Get Off the Computer & Meet People!
Thank you!!
Chris Fregly @cfregly
IBM Spark Tech Center
San Francisco, CA, USA
Relevant Links
advancedspark.com
Sign up for the book and meetup!
github.com/fluxcapacitor/pipeline
Clone all code used today!
hub.docker.com/r/fluxcapacitor/pipeline
Run all demos presented today!

Image courtesy of http://www.duchess-france.org/

More Related Content

What's hot

Stockholm Spark Meetup Nov 23 2015 Spark After Dark 1.5 (Chris Fregly)
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap... (Chris Fregly)
Zurich, Berlin, Vienna Spark and Big Data Meetup Nov 02 2015 (Chris Fregly)
Helsinki Spark Meetup Nov 20 2015 (Chris Fregly)
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures... (Chris Fregly)
Copenhagen Spark Meetup Nov 25, 2015 (Chris Fregly)
Toronto Spark Meetup Dec 14 2015 (Chris Fregly)
USF Seminar Series: Apache Spark, Machine Learning, Recommendations Feb 05 2016 (Chris Fregly)
Melbourne Spark Meetup Dec 09 2015 (Chris Fregly)
Sydney Spark Meetup Dec 08, 2015 (Chris Fregly)
DC Spark Users Group March 15 2016 - Spark and Netflix Recommendations (Chris Fregly)
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016 (Chris Fregly)
Spark Summit East NYC Meetup 02-16-2016 (Chris Fregly)
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016 (Chris Fregly)
Singapore Spark Meetup Dec 01 2015 (Chris Fregly)
Budapest Big Data Meetup Nov 26 2015 (Chris Fregly)
Dallas DFW Data Science Meetup Jan 21 2016 (Chris Fregly)
Advanced Analytics and Recommendations with Apache Spark - Spark Maryland/DC ... (Chris Fregly)
Istanbul Spark Meetup Nov 28 2015 (Chris Fregly)
Spark After Dark: Real time Advanced Analytics and Machine Learning with Spark (Chris Fregly)


Viewers also liked

Chicago Spark Meetup 03 01 2016 - Spark and Recommendations (Chris Fregly)
Atlanta MLconf Machine Learning Conference 09-23-2016 (Chris Fregly)
Atlanta Spark User Meetup 09 22 2016 (Chris Fregly)
Text analytics for verbal identity and branding (a first play with Semantria) (Verbal Identity)
Sentimental Market Segmentation (Shlomo Argamon)
Boston Spark Meetup May 24, 2016 (Chris Fregly)
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor... (Chris Fregly)
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24... (Chris Fregly)
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No... (Chris Fregly)
Sentiment tool Project presentaion (Ravindra Chaudhary)
Sentiment Analyzer (Ankit Raj)
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,... (Chris Fregly)
Hallmark Emotional Branding Study (Crimson Hexagon)
Advanced Spark and Tensorflow Meetup - London - Nov 15, 2016 - Deploy Spark M... (Chris Fregly)
Emotional branding presentation (Jared Woods)
Advanced Spark and TensorFlow Meetup May 26, 2016 (Chris Fregly)
Sentimental analysis (Ankit Khera)
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An... (Chris Fregly)


Real-time Advanced Analytics, Machine Learning, Graph Analytics, Text NLP, and Recommendations Using Apache Spark

  • 1. Click to edit Master text styles Click to edit Master text styles After Dark Real-time Advanced Analytics, Machine Learning, 
 Graph Analytics, Text NLP, and Recommendations Barcelona Spark Meetup Oct 20th, 2015 Chris Fregly Principal Data Solutions Engineer IBM Spark Technology Center ** We’re Hiring!! Nice People Only, Please. **
  • 2. Click to edit Master text styles Click to edit Master text styles spark.tc Power of data. Simplicity of design. Speed of innovation. IBM Spark Who Am I? 2 Streaming Data Engineer Netflix Open Source Committer Data Solutions Engineer
 Apache Contributor Principal Data Solutions Engineer IBM Technology Center Meetup Organizer Advanced Apache Meetup Book Author Advanced (2016)
  • 3. Advanced Apache Spark Meetup. Total Spark Experts: ~1,350+ in 3 months! #4 most active Spark Meetup in the world! Main Goals: dig deep into the Spark and extended-Spark codebase; study integrations such as Cassandra, ElasticSearch, Tachyon, S3, BlinkDB, Mesos, YARN, Kafka, R, etc.; surface and share the patterns and idioms of these well-designed, distributed, big data components.
  • 4. What is Spark? Spark Core, plus Spark Streaming (real-time), Spark SQL (structured data), MLlib (machine learning), GraphX (graph analytics), …, BlinkDB (approximate queries).
  • 5. Spark Deployments In Production
  • 6. Tools of the Talk: Redis; Docker; Cassandra; MLlib, GraphX; Parquet, JSON; Apache Zeppelin; Spark Streaming, Kafka; Spark SQL, DataFrames; Spark JDBC/ODBC Hive ThriftServer; ElasticSearch, Logstash, Kibana (ELK); and…
  • 7. SMACK Stack! Spark (Data Processing), Mesos (Cluster Manager), Akka (Actors), Cassandra (NoSQL), Kafka (Streaming)
  • 8. Themes of this Talk: Parallelism; Performance; Streaming; Approximations; Similarity Measures; Recommendations; and…
  • 9. Goals of Spark After Dark: generate high-quality recommendations; demonstrate Spark's high-level libraries: Spark Streaming -> Kafka, Approximations; Spark SQL -> DataFrames, Cassandra; GraphX -> PageRank, Shortest Path; MLlib -> Matrix Factorization, Word2Vec
  • 10. Popular Dating Sites
  • 11. Parallelism
  • 12. My First Experience With Parallelism: Brady Bunch, circa 1980, Season 5, Episode 18: “Two Petes in a Pod”
  • 13. Parallel Algorithm: O(log n)
  • 14. Non-Parallel Algorithm: O(n)
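The gap between these two slides can be illustrated with a minimal plain-Python sketch (not Spark code; `tree_sum` and `sequential_sum` are hypothetical names). Combining neighbours pairwise makes every addition within a round independent, so with enough workers a sum finishes in ceil(log2 n) rounds, while a running total needs n - 1 dependent steps. The sketch only counts rounds; it does not actually execute them in parallel.

```python
from typing import List, Tuple


def tree_sum(values: List[int]) -> Tuple[int, int]:
    """Pairwise (tree) reduction; returns (total, rounds).

    Hypothetical helper: all pairs in one round are independent and could
    run on separate workers, so rounds = ceil(log2 n).
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Combine neighbours; an odd element out is carried to the next round.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return (level[0] if level else 0), rounds


def sequential_sum(values: List[int]) -> Tuple[int, int]:
    """Running total; returns (total, steps). Each step waits on the previous one."""
    total = values[0] if values else 0
    steps = 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps
```

For example, `tree_sum(list(range(1, 1025)))` returns `(524800, 10)`, while `sequential_sum` over the same list returns `(524800, 1023)`.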
  • 15. Spark is Parallel!
  • 16. Performance
  • 17. Spark Beats Hadoop @ 100 TB GraySort: on-disk only; 28,000 partitions; no in-memory caching. [Results table: Hadoop (2013) vs. Spark (2014)]
  • 18. Improved Shuffle and Network Layer: “sort-based shuffle”; minimize OS resources; switched to async Netty; keep CPUs hot; reuse byte buffers to minimize GC; use epoll for I/O to stay in kernel space
  • 19. Project Tungsten: CPU and Memory: more JVM bytecode generation and JIT optimization; CPU-cache-aware data structures and algorithms; custom memory management. [Charts: Serializers Performance; New HashMap]
  • 20. DataFrames and Catalyst Optimizer: Please Use DataFrames! [Diagram annotation: JVM bytecode generation] Source: https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/
  • 21. Columnar Storage Format: skip whole chunks using the min/max statistics stored in each chunk (sorted data only)
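Min/max chunk skipping can be sketched over a toy in-memory column split into fixed-size chunks. `Chunk`, `build_chunks`, and `scan_equal` are hypothetical names, not Parquet APIs. On sorted data only the chunk whose range covers the target is read; on unsorted data nearly every chunk's range covers the target, so nothing can be skipped, which is why the slide says "sorted data only".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A column chunk plus the min/max statistics stored alongside it (hypothetical layout)."""
    values: List[int]
    min_value: int
    max_value: int


def build_chunks(column: List[int], chunk_size: int) -> List[Chunk]:
    chunks = []
    for start in range(0, len(column), chunk_size):
        vals = column[start:start + chunk_size]
        chunks.append(Chunk(vals, min(vals), max(vals)))
    return chunks


def scan_equal(chunks: List[Chunk], target: int) -> Tuple[List[int], int]:
    """Return (matches, chunks_read), skipping chunks whose [min, max] excludes target."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        if target < chunk.min_value or target > chunk.max_value:
            continue  # the statistics prove no row in this chunk can match
        chunks_read += 1
        matches.extend(v for v in chunk.values if v == target)
    return matches, chunks_read
```

Scanning `list(range(1000))` in chunks of 100 for `437` reads one chunk; scanning `[i % 10 for i in range(1000)]` for `7` reads all ten.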
  • 22. Parquet File Format: based on Google Dremel; implemented by Twitter and Cloudera; columnar storage format; optimized for fast columnar aggregations; tight compression; supports pushdowns; nested, self-describing, evolving schema
  • 23. Types of Compression: Run Length Encoding for repeated data; Dictionary Encoding for a fixed set of values; Delta and Prefix Encoding for sorted data
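The first three encoding ideas can be sketched in a few lines of plain Python. These are simplified illustrations assuming plain lists as input; Parquet's actual on-disk encodings (for example its RLE/bit-packing hybrid) differ in detail, and prefix encoding is not shown. Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def run_length_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of repeated values into (value, run_length) pairs."""
    runs: List[Tuple[str, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer index into a dictionary of distinct values."""
    dictionary: List[str] = []
    index_of: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then differences between neighbours (small when sorted)."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]
```

For sorted input such as `[1000, 1001, 1003, 1006]`, delta encoding yields `[1000, 1, 2, 3]`, and the small deltas need far fewer bits than the original values.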
  • 24. Types of Query Optimizations: Column and Partition Pruning; Row and Predicate Pushdown. Example: SELECT b FROM table WHERE a IN (a2, a3)
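The example query can be traced over a toy columnar table held as a dict of column lists (the layout and the function name are hypothetical). The filter is evaluated on column `a` inside the scan (predicate pushdown), and only column `b` is read for output (column pruning), so columns `c` and `d` are never touched.

```python
from typing import Dict, List, Set, Tuple


def select_b_where_a_in(table: Dict[str, List[str]],
                        allowed: Set[str]) -> Tuple[List[str], Set[str]]:
    """Evaluate SELECT b FROM table WHERE a IN allowed over a columnar layout.

    Hypothetical helper; returns the selected values and the columns read.
    """
    columns_read = {"a"}
    # Predicate pushdown: filter on column a while scanning, before rows leave the reader.
    matching_rows = [i for i, a in enumerate(table["a"]) if a in allowed]
    # Column pruning: only column b is read for output; other columns are skipped.
    columns_read.add("b")
    b = table["b"]
    return [b[i] for i in matching_rows], columns_read
```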
  • 25. Streaming
  • 26. Direct Kafka Streaming: KafkaRDD. No single Receiver and no Write Ahead Log (WAL); Workers pull from Kafka in parallel; each KafkaRDD partition stores its relevant offsets; upon Worker node failure, rebuild from the offsets; optimizes the happy path by avoiding the WAL. At-least-once delivery guarantee.
  • 27. Approximations
  • 28. Count-Min Sketch: approximate counters; far smaller than an exact HashMap of counts; low, fixed memory; known error bounds; supports a large number of counters; from Twitter’s Algebird; streaming example in the Spark codebase
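A minimal Count-Min Sketch in plain Python, offered as an illustration rather than Algebird's implementation or API. It assumes salted SHA-256 digests are an adequate stand-in for independent hash functions. Each item increments one counter per row, and the estimate is the minimum of its counters, so it can overcount on collisions but never undercount. Under the usual independence assumptions, width ≈ e/ε and depth ≈ ln(1/δ) keep the overcount below ε·N (N = total count added) with probability at least 1 - δ.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Approximate counters in fixed memory: depth x width integers (illustrative sketch)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salting with the row number gives each row its own hash function (assumption).
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the smallest one is the tightest estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))
```

With `width=272, depth=5` (roughly ε = 0.01, δ = 0.01), memory stays at 1,360 counters no matter how many distinct users are counted.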
  • 29. HyperLogLog: approximate cardinality (approximate count distinct); from Twitter’s Algebird; low memory: 1.5 KB at 2% error for 10^9 elements; streaming example in the Spark codebase; RDD: countApproxDistinctByKey()
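A compact HyperLogLog in plain Python (not Algebird's or Spark's implementation), assuming the first 8 bytes of a SHA-1 digest serve as a 64-bit hash. The top `p` bits pick one of 2^p registers; each register remembers the largest "position of the first 1-bit" seen in the remaining bits; a bias-corrected harmonic mean over registers gives the estimate, with linear counting taking over when many registers are still empty. The standard error is roughly 1.04 / sqrt(2^p), about 3% for `p=10`.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate count-distinct with 2**p small registers (illustrative sketch)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash from the first 8 bytes of a SHA-1 digest (assumption).
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                  # top p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over empty registers.
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding an item that was already seen never changes a register, which is why duplicates do not inflate the count.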
  • 30. Monte Carlo Simulations: from the Manhattan Project (A-bomb), used to simulate the movement of neutrons. Law of Large Numbers (LLN): the average of the results of many trials converges on the expected value. SparkPi example in the Spark codebase: Pi ≈ 4 × (# red dots / # total dots)
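The SparkPi idea reduces to a short loop. This single-process sketch (hypothetical function name, fixed seed for reproducibility) throws random points at the square [-1, 1] x [-1, 1]; the fraction landing inside the unit circle approaches pi/4 by the law of large numbers. SparkPi distributes the same sampling across partitions and sums the hit counts with a reduce.

```python
import random


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate: pi ~ 4 * (points inside the circle / total points)."""
    rng = random.Random(seed)
    inside = 0  # the "red dots"
    for _ in range(num_samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return 4 * inside / num_samples
```

The error shrinks roughly with 1/sqrt(num_samples), so each extra digit of accuracy costs about 100 times more samples.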
  • 31. Recommendations
  • 32. Interactive Demo!
  • 33. Audience Participation Needed! Navigate to sparkafterdark.com and click 3 actresses and 3 actors. [Screenshot callout: You are here]
  • 34. Types of Recommendations
 Non-personalized: Cold Start (no preference or behavior data for the user yet)
 Personalized: User-Item Similarity (items that others with similar preferences have liked); Item-Item Similarity (items similar to your previously liked items)
  • 35. Non-Personalized Recommendations
  • 36. Summary Statistics and Aggregations: Top Users by Like Count. “I might like users with the highest sum aggregation of likes overall.” Spark SQL + DataFrame = Aggregations
  • 37. Graph Analytics: Top Influencers by Like Graph. “I might like users who have the highest probability of me liking them randomly while walking the like graph.” GraphX: PageRank
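The random-walk intuition above is what PageRank computes. The sketch below is plain power iteration in Python, not GraphX's implementation; it assumes a damping factor of 0.85 (a 0.15 chance of jumping to a random user at each step) and a hypothetical `likes` dict mapping each user to the users they liked. Ranks here are normalized to sum to 1.

```python
from typing import Dict, List


def pagerank(likes: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed 'like' graph (illustrative sketch)."""
    nodes = set(likes)
    for targets in likes.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random jump: every user gets an equal share of (1 - damping).
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = likes.get(u, [])
            if out:
                # Follow one of u's likes uniformly at random.
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling user with no likes: spread their rank evenly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank
```

On `{"ann": ["cat"], "bob": ["cat"], "dan": ["cat", "ann"], "cat": ["ann"]}`, `cat` receives the most likes and ends up with the highest rank, while `bob` and `dan`, whom nobody likes, keep only the random-jump share.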
  • 38. Demo! Spark SQL/DataFrames + GraphX/PageRank
  • 39. Similarities
  • 40. Types of Similarity: Euclidean: linear measure, magnitude bias; Cosine: angle measure, adjusts for magnitude bias; Jaccard: (intersection / union), popularity bias; Log Likelihood: adjusts for popularity bias. [Example like matrix: rows Kimberly, Leslie, Meredith, Lisa, Holden; columns Ali, Matei, Reynold, Patrick, Andy; 1 = like]
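The four measures can be written directly. This is a plain-Python sketch with illustrative function names; the log-likelihood ratio follows Dunning's G² statistic in the form popularized by Apache Mahout, computed from a 2x2 table of co-occurrence counts.

```python
import math
from typing import Sequence, Set


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; grows with vector magnitude (magnitude bias)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors; ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|intersection| / |union| of two like-sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _x_log_x(x: int) -> float:
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts: int) -> float:
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 score for a 2x2 table.

    k11: users who liked both items; k12: only the first; k21: only the second;
    k22: neither. High scores mean the overlap is unlikely under independence,
    so popular items do not score high merely for being popular.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off
```

For instance, `cosine([1, 1, 0], [1, 0, 1])` is 0.5 and `jaccard({"a", "b"}, {"a", "c"})` is 1/3, while a perfectly independent table such as `(10, 10, 10, 10)` scores 0.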
  • 41. All-Pairs Similarity Comparison: compare everything to everything, a.k.a. “pair-wise similarity” or “similarity join”. Naïve shuffle: O(m*n^2); m=rows, n=cols. Minimize shuffle through approximations! Reduce m (rows): sampling and bucketing. Reduce n (cols): remove the most frequent value (e.g. 0); Principal Component Analysis
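A brute-force all-pairs cosine similarity over the columns of a small dense matrix shows where the O(m*n^2) cost comes from: n*(n-1)/2 column pairs, each scanning all m rows. This is a plain-Python sketch with a hypothetical function name; MLlib's `RowMatrix.columnSimilarities()` computes the same column similarities in a distributed way.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def all_pairs_cosine(rows: List[List[float]]) -> Dict[Tuple[int, int], float]:
    """Exact cosine similarity for every pair of columns (brute force)."""
    n = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    sims: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(n), 2):         # n*(n-1)/2 pairs
        dot = sum(x * y for x, y in zip(cols[i], cols[j]))  # m multiplications each
        denom = norms[i] * norms[j]
        sims[(i, j)] = dot / denom if denom else 0.0
    return sims
```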
  • 42. Reduce m: DIMSUM Sampling. “Dimension Independent Matrix Square using MapReduce”: remove rows with low similarity probability. MLlib: RowMatrix.columnSimilarities(…). Twitter: 40% efficiency gain over Cosine Similarity
  • 43. Reduce m: LSH Bucketing. “Locality Sensitive Hashing”: split m into b buckets; use a similarity hash algorithm; requires pre-processing of the data; compare bucket contents in parallel. Converts O(m*n^2) -> O(m*n/b*b^2); m=rows, n=cols, b=buckets, e.g. a 500k x 500k matrix goes from O(1.25e17) to O(1.25e13) with b=50. github.com/mrsqueeze/spark-hash
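One common similarity hash for this is MinHash with banding, which is what the sketch below assumes; it is a plain-Python illustration, not the spark-hash code, and the function names are hypothetical. Each row's set of likes gets a MinHash signature (the pre-processing step); the signature is cut into bands, and rows whose values match within a band land in the same bucket. Only pairs that share a bucket become candidates for an exact comparison. The sketch assumes every row has at least one like.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def minhash_signature(items: Set[str], num_hashes: int) -> List[int]:
    """For each salted hash function, keep the minimum hash over the set."""
    return [
        min(int.from_bytes(hashlib.md5(f"{k}:{x}".encode()).digest()[:8], "big")
            for x in items)
        for k in range(num_hashes)
    ]


def lsh_candidate_pairs(rows: Dict[str, Set[str]], bands: int,
                        rows_per_band: int) -> Set[Tuple[str, str]]:
    """Return pairs of row keys that share a bucket in at least one band."""
    sigs = {key: minhash_signature(items, bands * rows_per_band)
            for key, items in rows.items()}
    candidates: Set[Tuple[str, str]] = set()
    for b in range(bands):
        buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        for key, sig in sigs.items():
            band = tuple(sig[b * rows_per_band:(b + 1) * rows_per_band])
            buckets[band].append(key)
        # Buckets can be compared independently, i.e. in parallel.
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates
```

More rows per band makes buckets stricter (fewer false candidates); more bands gives similar pairs more chances to collide.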
  • 44. Reduce n: Remove Most Frequent Value. Eliminate the most frequent value and represent the other values as (index,value) pairs. Converts O(m*n^2) -> O(m*nnz^2); nnz = number of nonzeros, nnz << n. Note: choose the most frequent value (it may not be 0).
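A sketch of the (index, value) representation in plain Python, with hypothetical helper names. `to_sparse` drops whichever value you pass as the default (ideally the most frequent one); `sparse_dot` then touches only stored entries, so its cost scales with nnz rather than n. As written, `sparse_dot` assumes the omitted value is 0; a nonzero default would need a correction term.

```python
from collections import Counter
from typing import Dict, List


def most_frequent(row: List[float]) -> float:
    """The value worth omitting: whichever occurs most often (not always 0)."""
    return Counter(row).most_common(1)[0][0]


def to_sparse(row: List[float], default: float = 0.0) -> Dict[int, float]:
    """Keep only (index, value) pairs whose value differs from `default`."""
    return {i: v for i, v in enumerate(row) if v != default}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product over stored entries only; assumes both vectors omit 0."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer stored entries
    return sum(v * b[i] for i, v in a.items() if i in b)
```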
  • 45. Personalized Recommendations
  • 46. Recommendation Terminology: User: user seeking recommendations; Item: item that has been liked or rated; Feedback: explicit (like, rating) or implicit (search, click, hover, view, scroll); Feature Engineering: dimension reduction
  • 47. Collaborative Filtering Personalized Recs: like behavior of similar users.
 “I like the same people that you like. What other people did you like that I haven’t seen?”
 MLlib: Matrix Factorization, User-Item Similarity
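The quote above describes neighbourhood-based collaborative filtering. The sketch below implements that simpler technique, swapped in for illustration; it is not MLlib's matrix factorization (ALS), which the demo uses. It assumes a hypothetical `likes` dict from each user to the set of profiles they liked, weights every other user by Jaccard overlap with the target user, and scores the profiles those users liked that the target user has not seen.

```python
from collections import defaultdict
from typing import Dict, List, Set


def recommend(likes: Dict[str, Set[str]], user: str, top_n: int = 3) -> List[str]:
    """User-based collaborative filtering with Jaccard weights (illustrative sketch)."""
    mine = likes[user]
    scores: Dict[str, float] = defaultdict(float)
    for other, theirs in likes.items():
        if other == user:
            continue
        union = mine | theirs
        sim = len(mine & theirs) / len(union) if union else 0.0
        if sim == 0.0:
            continue  # no shared likes: this user says nothing about my taste
        for item in theirs - mine:  # "what did you like that I haven't seen?"
            scores[item] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

With `me` liking `p1, p2`, `u1` liking `p1, p2, p3`, and `u2` liking `p1, p4`, `recommend(likes, "me")` returns `["p3", "p4"]`, because `u1` overlaps more with `me` than `u2` does.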
  • 48. Demo! Spark SQL/DataFrames + MLlib/Alternating Least Squares
  • 49. Text-based Personalized Recs (1/3): similar profiles to me.
 “Our profiles have similar, unique k-skip n-grams. We might like each other.”
 MLlib: Word2Vec, TF/IDF, Doc Similarity
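Document similarity with TF/IDF can be sketched as follows. This assumes plain whitespace tokens and the unsmoothed weight count × log(N / df); MLlib's `IDF` uses a smoothed variant, and k-skip n-grams could be used as the terms instead of single words. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by its count in a doc times log(N / number of docs containing it)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A term that appears in every profile (such as "and") gets weight log(1) = 0, so only distinctive, shared terms raise the similarity, which matches the slide's emphasis on "unique" n-grams.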
  • 50. Text-based Personalized Recs (2/3): similar profiles from my past likes.
 “Your profile shares a similar feature vector space to others that I’ve liked. I might like you.”
 MLlib: Word2Vec, TF/IDF, Doc Similarity
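One simple reading of “similar feature vector space to others that I’ve liked” is to average the vectors of previously liked profiles and rank new profiles by cosine similarity to that average. The sketch assumes raw word counts as features for brevity (TF/IDF or Word2Vec vectors would slot in the same way); the centroid approach and all names are assumptions, not the talk's code.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Raw term counts; a stand-in for TF/IDF or Word2Vec features."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def centroid(vectors: List[Vector]) -> Vector:
    """Average the feature vectors of previously liked profiles."""
    total: Vector = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(liked: List[str], candidates: Dict[str, str]) -> List[str]:
    """Order candidate profiles by similarity to the centroid of liked profiles."""
    center = centroid([bag_of_words(text) for text in liked])
    scores = {name: cosine(center, bag_of_words(text))
              for name, text in candidates.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))
```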
  • 51. Text-based Personalized Recs (3/3): relevant, high-value emails.
 “Your initial email has similar named entities to my profile. I might like you just for making the effort.”
 MLlib: Word2Vec, TF/IDF, Entity Recognition. [Figure: her email vs. my profile]
  • 52. The Future of Recommendations!
  • 53. Facial Recognition: Eigenfaces.
 “Your face looks similar to others that I’ve liked. I might like you.”
 MLlib: RowMatrix, PCA, Item-Item Similarity. Image courtesy of http://crockpotveggies.com/2015/02/09/automating-tinder-with-eigenfaces.html
  • 54. Natural Language Processing: Convo Bot. NLP and DecisionTrees.
 “If your responses to my trite opening lines are positive, I may read your profile.”
 MLlib: TF/IDF, DecisionTree, Sentiment Analysis. [Figure: Positive / Negative]
  • 55. Maintaining the Spark!
  • 56. Recommendations for Couples: Pathways of Similarity.
 “I want Mad Max. You want Message In a Bottle. Let’s find something in between to watch tonight.”
 MLlib: RowMatrix, Item-Item Similarity. GraphX: Nearest Neighbors, Shortest Path. [Diagram: titles linked as “similar” through plots and actors]
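Finding “something in between” can be sketched as a fewest-hop path search in a similarity graph, then picking a title from the middle of the path. This is plain breadth-first search in Python, not GraphX's `ShortestPaths`; the graph and the intermediate titles `film_x` and `film_y` are hypothetical placeholders.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], start: str,
                  goal: str) -> Optional[List[str]]:
    """Breadth-first search for the fewest-hop path in an unweighted similarity graph."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable
```

With edges Mad Max → film_x → film_y → Message in a Bottle, the path has four titles, so `film_x` or `film_y` would be the compromise picks.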
  • 57. Final Recommendation!
  • 58. Get Off the Computer & Meet People! Thank you!! Chris Fregly, @cfregly, IBM Spark Tech Center, San Francisco, CA, USA. Relevant Links: advancedspark.com (sign up for the book and meetup!); github.com/fluxcapacitor/pipeline (clone all code used today!); hub.docker.com/r/fluxcapacitor/pipeline (run all demos presented today!). Image courtesy of http://www.duchess-france.org/