SlideShare a Scribd company logo
1 of 38
Download to read offline
FAIZAN AHMED
Using Spark-Solr at Scale:
Productionizing Spark for Search
with Apache Solr
#ExpSAIS18
Data Engineer on Flipp’s Search Team
2
medium.com/@faizanahmed_18678
About Me
Role
Education
Responsibilities
M.Sc in Computer Science from University of Waterloo
Building pipelines to support machine learning models
Developing scalable systems and working on products to
enhance search relevancy.
Core work includes using Spark for machine learning at
scale & data warehousing for analytics.
#ExpSAIS18
• About Flipp
• The Search Experience
• The Problem Domain
• The Spark-Solr Journey
– Solution Architecture
– Deep Dive: Content Categorization
– Deep Dive: Understanding User Intent
• Improvements
• Lessons Learned
• Key Takeaways
3
Agenda
#ExpSAIS18
About Flipp
4
Digital Shopping
Marketplace
The Flipp App is a digital
shopping marketplace with
over 40M+ downloads.
Storefront Visual
Merchandising
Enterprise Platform converts
analog content to provide a
mobile browsing experience,
which showcases retailers’
digital storefront.
The Flipp Platform
Flipp’s platform is used by
over 90% of tier one
retailers in North America.
#ExpSAIS18
Search Platform At a Glance
• 2M Searches/day
• 30M Indexed Documents
• Team of 5 engineers
5
#ExpSAIS18
Searchable Entities
6
#ExpSAIS18
Merchant Flyers
7
The Search Experience
Items Coupons Online Deals
#ExpSAIS18 8
The Problem Domain
#ExpSAIS18
1
Basic
Keyw
ord
Search
Inverted-index,tf-idf,BM
-25
2
Q
uery
Taxonom
ies
Synonym
s,Spell-correction,
Stem
m
ing
3
Q
uery
Intent
Entity
recognition,dom
ain-specific
classification
4
Self-learning
Learning-to-rank
Discovering and leveraging query intent is dependent on domain specific user & content signals.
9
Search Platform Evolution
#ExpSAIS18
User Signals
10
Signals: The Building Blocks Of Search
Content Signals
#ExpSAIS18
“Bose SoundSport In-Ear Sport Headphones”
• Non-uniform category structure
Consumer Electronics -> Consumer Audio Headphones
Audio-> Earbuds & In-Ear Headphones-> Headphones
11
Content Signal Challenges: Product Categorization
Every item would have a merchant-specific category
#ExpSAIS18
Tagged as “Kale”
Tagged as “Greens”
Food, Beverages & Tobacco > Food Items > Fruits & Vegetables > Fresh & Frozen Vegetables > Greens > Kale
Categorization hierarchy for “Kale”
• Content tagged with correct categories lacked complete hierarchy
12
Content Signal Challenges: Product Categorization
#ExpSAIS18 13
User Signal Challenges
● Query categorization not deep enough
Food, Beverages & Tobacco > Food Items > Fruits & Vegetables
{!boost b=log(popularity)}foo
“cheese” -> Dairy Products -> Cheese
“shredded cheese” ->
● Unable to leverage Solr features
#ExpSAIS18 14
What Was Needed?
ML, ETL & Analytics Combine Signals Spark-Solr Integration
Reduce Productionization Time Scalable & Efficient Cost-friendly
#ExpSAIS18 15
Implementing the solution
The Spark-Solr Journey
#ExpSAIS18 16
Action Plan
Content Context Collaboration
#ExpSAIS18
Pre-processed
analytics data
17
Raw Content
Indexing
Spark-Solr
Search ML & ETL Platform
User Signals
Search
Analytics
Backend
Infrastructure
Usage
metrics
The Spark-Solr Architecture
Search Middleware
Reading content
signals
Writing
signals to
solr
Search Results
#ExpSAIS18 18
Separating wheat from the chaff
Content Categorization:
DEEP DIVE
19#ExpSAIS18
What Raw Content In Solr Looks Like
Same products but
different category
structure
TV accessories
that would show
up as noise
#ExpSAIS18
Categorization Algorithm
20
name: “Samsung TV”,
desc: “LED”,
category:{..}
Product metadata
Samsung TV
Retailer A category
structure
TV & Home Theatre
Televisions
53” TVs
53”TVs, Televisions, TV & Home Theatre
Electronics
Video
Televisions
Creating
training data
name:“Sony TV”
_L1: Electronics
_L2: Video
_L3: Televisions
Predicted Categories
Category key
#ExpSAIS18 21
Content
indexing
Read
product
data
Get
Clickstream
Signals
Classification
model
Items categorized
with popularity
metrics
Training
data
Content Categorization Workflow
22#ExpSAIS18
Voilà!
TVs are now under
similar categories with
popularity weight
The stands are now
correctly classified as
‘Furniture’ and can be
filtered
#ExpSAIS18 23
Understanding
User Intent
DEEP DIVE
#ExpSAIS18
A query for “nuts”
ranks cereal,
guitar & car oil
filter on top due
to noisy keyword
signals
24
Why Decipher Query Intent?
#ExpSAIS18
● Using past impression data to improve how latest impression data is interpreted
Users behaviour informs
algorithmic improvements
Users perform
certain actions (
click, impression
etc)
Users see
search results
Users query
25
Reinforced Intelligence
26#ExpSAIS18
What is the probability that a user will click on a particular category class given the query “nuts”?
One query can be
associated with
multiple classes
The query “nuts” will be
interpreted to have category
“Dried Fruits”
The category with the most clicks for a particular search query would be interpreted as the query category
Narrowing Down The Query Intent
27#ExpSAIS18
Persisting Query Classifier Model In Solr
Model schema in solr Category classifications for “nuts” variations
#ExpSAIS18
Predict
● Query model
● Boost content ranking
Train
● Compute category classes
● Classify queries as domain
specific entities
Persist
● Query as a searchable
entity.
● Taxonomic features.
28
Query Classification Workflow In Solr
#ExpSAIS18
User
Query
Identify Query
Intent
Raw
Content
Categorize
Content
29
Collaboration!
#ExpSAIS18 30
Improvement
#ExpSAIS18 31
Measuring Search Relevance
● Cumulative Gain: 0 => Not relevant, 1 => Near relevant, 2 => Relevant
● Discounting: Positional bias
● Normalization: Ideal DCG (iDCG)
Normalized Discounted Cumulative Gain (NDCG) is a popular method for
measuring the quality of a set of search results. It asserts the following:
#ExpSAIS18
Q: “Olive Oil”
2+2+1+0+0+1= 6 2+2+2+2+2+2= 12
2 2
1 0
0 1
2 2
2 2
2 2
32
CG Improvement
#ExpSAIS18
1.44 1.82
0.62
1.11
1.44
0
2.89 1.82
1.24
0.56
0.72
0
1.44+1.82+1.44+0.62+1.11+0= 6.43 2.89+1.82+0.72+1.24+0.56= 7.23
33
Q: “Fish”
DCG Improvement
#ExpSAIS18 34
Spark-Solr:
Lessons Learned
#ExpSAIS18 35
Lessons Learned
● Latency rise in p95
by 40% during
indexing
Rise in p95
latency during
indexing
#ExpSAIS18 36
Lessons Learned
● Read Parallelism
○ Match the number of spark executor nodes to the number of shards in your solr collection.
○ Prefer using solr filter params over where clause in Spark.
● Indexing improvements
○ Multivalued fields may require extra processing.
○ Avoid using soft_commit_secs param in production.
○ Consider using smaller batch_size.
#ExpSAIS18
● The Spark-Solr framework unlocks a wide variety of machine learning techniques
to be applied to search applications.
● Enables faster data analytics in SparkSQL than traditional SQL frameworks.
● Spark-Solr can be used to optimize solr indexing with signal data.
● Storing text classification models in solr has significant semantic benefits over
other persistence layers.
37
Key Takeaways
#ExpSAIS18
Thank You!
38

More Related Content

What's hot

How to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectHow to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectSease
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Developers
 
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Shotaro Sano
 
[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...
[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...
[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...GangSeok Lee
 
A Learning to Rank Project on a Daily Song Ranking Problem
A Learning to Rank Project on a Daily Song Ranking ProblemA Learning to Rank Project on a Daily Song Ranking Problem
A Learning to Rank Project on a Daily Song Ranking ProblemSease
 
Advanced nGrinder 2nd Edition
Advanced nGrinder 2nd EditionAdvanced nGrinder 2nd Edition
Advanced nGrinder 2nd EditionJunHo Yoon
 
Design by contract(계약에의한설계)
Design by contract(계약에의한설계)Design by contract(계약에의한설계)
Design by contract(계약에의한설계)Jeong-gyu Kim
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesLucidworks (Archived)
 
Sql Injection attacks and prevention
Sql Injection attacks and preventionSql Injection attacks and prevention
Sql Injection attacks and preventionhelloanand
 
Top 50 Node.js Interview Questions and Answers | Edureka
Top 50 Node.js Interview Questions and Answers | EdurekaTop 50 Node.js Interview Questions and Answers | Edureka
Top 50 Node.js Interview Questions and Answers | EdurekaEdureka!
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...lucenerevolution
 
Demystifying Angular Animations
Demystifying Angular AnimationsDemystifying Angular Animations
Demystifying Angular AnimationsGil Fink
 
Let'Swift 2023 Swift Macro, 어디다 쓰죠?
Let'Swift 2023 Swift Macro, 어디다 쓰죠?Let'Swift 2023 Swift Macro, 어디다 쓰죠?
Let'Swift 2023 Swift Macro, 어디다 쓰죠?williciousk
 
Open Close Principle
Open Close PrincipleOpen Close Principle
Open Close PrincipleThaichor Seng
 
Basic overview of Angular
Basic overview of AngularBasic overview of Angular
Basic overview of AngularAleksei Bulgak
 
AngularJS $http Interceptors (Explanation and Examples)
AngularJS $http Interceptors (Explanation and Examples)AngularJS $http Interceptors (Explanation and Examples)
AngularJS $http Interceptors (Explanation and Examples)Brian Swartzfager
 

What's hot (20)

Talking About SSRF,CRLF
Talking About SSRF,CRLFTalking About SSRF,CRLF
Talking About SSRF,CRLF
 
How to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectHow to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank Project
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We Do
 
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
Microsoft Malware Classification Challenge 上位手法の紹介 (in Kaggle Study Meetup)
 
[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...
[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...
[2013 CodeEngn Conference 08] pwn3r - Pwning multiplayer game - case Starcraf...
 
A Learning to Rank Project on a Daily Song Ranking Problem
A Learning to Rank Project on a Daily Song Ranking ProblemA Learning to Rank Project on a Daily Song Ranking Problem
A Learning to Rank Project on a Daily Song Ranking Problem
 
Security testing
Security testingSecurity testing
Security testing
 
Solr Payloads
Solr PayloadsSolr Payloads
Solr Payloads
 
Advanced nGrinder 2nd Edition
Advanced nGrinder 2nd EditionAdvanced nGrinder 2nd Edition
Advanced nGrinder 2nd Edition
 
Design by contract(계약에의한설계)
Design by contract(계약에의한설계)Design by contract(계약에의한설계)
Design by contract(계약에의한설계)
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Sql Injection attacks and prevention
Sql Injection attacks and preventionSql Injection attacks and prevention
Sql Injection attacks and prevention
 
Top 50 Node.js Interview Questions and Answers | Edureka
Top 50 Node.js Interview Questions and Answers | EdurekaTop 50 Node.js Interview Questions and Answers | Edureka
Top 50 Node.js Interview Questions and Answers | Edureka
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
 
Demystifying Angular Animations
Demystifying Angular AnimationsDemystifying Angular Animations
Demystifying Angular Animations
 
Let'Swift 2023 Swift Macro, 어디다 쓰죠?
Let'Swift 2023 Swift Macro, 어디다 쓰죠?Let'Swift 2023 Swift Macro, 어디다 쓰죠?
Let'Swift 2023 Swift Macro, 어디다 쓰죠?
 
Open Close Principle
Open Close PrincipleOpen Close Principle
Open Close Principle
 
Basic overview of Angular
Basic overview of AngularBasic overview of Angular
Basic overview of Angular
 
AngularJS $http Interceptors (Explanation and Examples)
AngularJS $http Interceptors (Explanation and Examples)AngularJS $http Interceptors (Explanation and Examples)
AngularJS $http Interceptors (Explanation and Examples)
 

Similar to Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr with Faizan Ahmed

Connecting the Dots: Integrating Apache Spark into Production Pipelines
Connecting the Dots: Integrating Apache Spark into Production PipelinesConnecting the Dots: Integrating Apache Spark into Production Pipelines
Connecting the Dots: Integrating Apache Spark into Production PipelinesDatabricks
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksDatabricks
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Databricks
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBMongoDB
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...OpenSource Connections
 
Denys Kovalenko "Scaling Data Science at Bolt"
Denys Kovalenko "Scaling Data Science at Bolt"Denys Kovalenko "Scaling Data Science at Bolt"
Denys Kovalenko "Scaling Data Science at Bolt"Fwdays
 
Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...Michael Li
 
Apache Spark Data Validation
Apache Spark Data ValidationApache Spark Data Validation
Apache Spark Data ValidationDatabricks
 
Content Analytics Studio – The visualization, machine learning and applicatio...
Content Analytics Studio – The visualization, machine learning and applicatio...Content Analytics Studio – The visualization, machine learning and applicatio...
Content Analytics Studio – The visualization, machine learning and applicatio...Lucidworks
 
Activate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifiedsActivate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifiedsRoger Rafanell Mas
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsInside Analysis
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingDatabricks
 
Fried toronto sps14 91 wcm intranet
Fried toronto sps14 91 wcm intranetFried toronto sps14 91 wcm intranet
Fried toronto sps14 91 wcm intranetJeff Fried
 
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...BrightEdge Technologies
 
Age of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide DiscoveryAge of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide DiscoveryInside Analysis
 
Getting the right mix of digital analytics tracking, processes and tools
Getting the right mix of digital analytics tracking, processes and toolsGetting the right mix of digital analytics tracking, processes and tools
Getting the right mix of digital analytics tracking, processes and toolsDigital Science Consulting Ltd
 
The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...
The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...
The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...ProductCamp Boston
 
Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...
Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...
Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...Authoritas
 

Similar to Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr with Faizan Ahmed (20)

Connecting the Dots: Integrating Apache Spark into Production Pipelines
Connecting the Dots: Integrating Apache Spark into Production PipelinesConnecting the Dots: Integrating Apache Spark into Production Pipelines
Connecting the Dots: Integrating Apache Spark into Production Pipelines
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber Attacks
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDB
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...
 
Denys Kovalenko "Scaling Data Science at Bolt"
Denys Kovalenko "Scaling Data Science at Bolt"Denys Kovalenko "Scaling Data Science at Bolt"
Denys Kovalenko "Scaling Data Science at Bolt"
 
Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...
 
Data Science and Analytics
Data Science and Analytics Data Science and Analytics
Data Science and Analytics
 
Apache Spark Data Validation
Apache Spark Data ValidationApache Spark Data Validation
Apache Spark Data Validation
 
Content Analytics Studio – The visualization, machine learning and applicatio...
Content Analytics Studio – The visualization, machine learning and applicatio...Content Analytics Studio – The visualization, machine learning and applicatio...
Content Analytics Studio – The visualization, machine learning and applicatio...
 
Activate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifiedsActivate 2019 - Search and relevance at scale for online classifieds
Activate 2019 - Search and relevance at scale for online classifieds
 
Hadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both WorldsHadoop and the Relational Database: The Best of Both Worlds
Hadoop and the Relational Database: The Best of Both Worlds
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Fried toronto sps14 91 wcm intranet
Fried toronto sps14 91 wcm intranetFried toronto sps14 91 wcm intranet
Fried toronto sps14 91 wcm intranet
 
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
 
Age of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide DiscoveryAge of Exploration: How to Achieve Enterprise-Wide Discovery
Age of Exploration: How to Achieve Enterprise-Wide Discovery
 
A6 big data_in_the_cloud
A6 big data_in_the_cloudA6 big data_in_the_cloud
A6 big data_in_the_cloud
 
Getting the right mix of digital analytics tracking, processes and tools
Getting the right mix of digital analytics tracking, processes and toolsGetting the right mix of digital analytics tracking, processes and tools
Getting the right mix of digital analytics tracking, processes and tools
 
The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...
The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...
The Science of Software Developing Data-Driven Product Roadmaps (ProductCamp ...
 
Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...
Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...
Big Data graph Clustering with Laurence O'Toole - Digital Marketing Show, Nov...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Recently uploaded (20)

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr with Faizan Ahmed