SlideShare a Scribd company logo
OCTOBER	
  11-­‐14,	
  2016	
  	
  •	
  	
  BOSTON,	
  MA	
  
Near	
  Real	
  8me	
  Indexing	
  
Building	
  Real	
  Time	
  Search	
  Index	
  For	
  E-­‐Commerce	
  
	
  
Umesh	
  Prasad	
  
Tech	
  Lead	
  	
  @	
  Flipkart	
  
	
  
Thejus	
  V	
  M	
  
Data	
  Architect	
  @	
  Flipkart	
  
	
  
	
  
Agenda	
  
•  Search	
  @	
  Flipkart	
  
•  Need	
  for	
  Real	
  Time	
  Search	
  
•  SolrCloud	
  Solu;on	
  
•  Our	
  approach	
  
•  Q	
  &	
  A	
  
Traffic	
  @	
  Flipkart	
  
•  Peak	
  Traffic	
  	
  
–  ~	
  800K	
  ac;ve	
  users	
  
–  ~	
  160K	
  	
  requests	
  per	
  second	
  	
  
•  Search	
  Traffic	
  	
  
–  ~	
  40K	
  searches	
  per	
  second	
  (Service)	
  
–  ~	
  10K	
  searches	
  per	
  second	
  (Solr	
  )	
  
•  Latency	
  
–  	
  Median	
  :	
  11	
  ms	
  
–  	
  99th	
  percen;le	
  :	
  1.1	
  second	
  
Search	
  @	
  Flipkart	
  
•  Catalogue	
  	
  
–  ~	
  50	
  main	
  categories	
  
– ~	
  5000	
  sub-­‐categories	
  
– ~	
  231	
  million	
  documents	
  
– ~	
  90	
  million	
  SKUs	
  
– ~	
  160	
  million	
  lis;ngs	
  
	
  
•  E-­‐commerce	
  Marketplace	
  	
  
– ~	
  100K	
  	
  Sellers	
  
– Local	
  Sellers	
  
– Regional	
  Availability	
  
– Logis;cs	
  Constraints	
  	
  
E-­‐commerce	
  Search	
  
•  Heavy	
  usage	
  of	
  drill	
  down	
  filters	
  
•  Heavy	
  usage	
  of	
  face;ng	
  
•  Only	
  top	
  results	
  maer	
  
•  Results	
  grouped/collapsed	
  by	
  products	
  
	
  
•  Serviceability	
  and	
  delivery	
  experience	
  MATTERS	
  	
  
Agenda	
  
•  Search	
  @	
  Flipkart	
  
•  Need	
  for	
  Real	
  Time	
  Search	
  
•  SolrCloud	
  Solu;on	
  
•  Our	
  approach	
  
•  Q	
  &	
  A	
  
Sorry,	
  	
  	
  Stock	
  Over	
  	
  	
  !!?	
  
Damn	
  !!	
  Is	
  Offer	
  Over	
  ??	
  
What	
  !!	
   	
  All	
  Steal	
  Deals	
  Gone	
  ??	
  
Product	
  /Lis;ng:	
  Important	
  Aributes	
  
Seller	
  
Ra;ng	
  
Service	
  
catalogue	
  
service	
  
Promise	
  
Service	
  
Availability
Service
Offer	
  
Service	
  
Pricing	
  
Service	
  
Product	
  aka	
  SKU	
  
Lis;ngs	
  
Summary	
  :	
  	
  Lucene	
  Document	
  
•  Product/SKU	
  	
  (Parent	
  Document)	
  
–  Lis;ng	
  (Child	
  Document)	
  	
  
•  Query	
  :	
  	
  Mostly	
  	
  SKU	
  Aributes	
  	
  	
  	
   	
   	
  (Free	
  Text)	
  
•  Filters	
  :	
  SKU	
  +	
  	
  Lis;ng	
  Aributes	
   	
   	
   	
  (Drill	
  Down)	
  
•  Ranking	
  :	
  SKU	
  +	
  Lis;ng	
  	
  Aributes	
  	
   	
   	
  (Explicit/
Relevance)	
  	
  
•  Index	
  Time	
  Join	
  aka	
  Block	
  Join	
   	
   	
   	
  (Best	
  
Performance)	
  
	
  
	
  
Out	
  Of	
  Stock,	
  but	
  Why	
  Show?	
  
Index has Stale
Availability Data
234K	
  
Products	
  
Challenge	
  1	
  :	
  High	
  Update	
  Rates	
  
updates	
  /	
  sec	
   updates	
  /hr	
  	
  
normal	
   Peak	
  
text	
  /	
  catalogue	
   ~10	
   ~100	
   ~100K	
  
pricing	
   ~100	
   ~1K	
   ~10	
  million	
  
availability	
   ~100	
   ~10K	
   ~10	
  million	
  
offer	
   ~100	
   ~10K	
   ~10	
  million	
  
seller	
  ra8ng	
   ~10	
   ~1K	
   ~1	
  million	
  
signal	
  6	
   ~10	
   ~100	
   ~1	
  million	
  
signal	
  7	
   ~100	
   ~10K	
   ~10	
  million	
  
signal	
  8	
   ~100	
   ~10K	
   ~10	
  million	
  
Challenge	
  2	
  :	
  Micro	
  Services	
  	
  
Ingestion pipeline
Catalogue Pricing Availability Offers ...
Document Builder
Solr/Lucene
Change
Propagation
Documents
{L1,L2 … P1}
Updates Stream 1
Updates Stream 2
Updates Stream 3
●  Lucene doesn’t support Partial Updates
●  Update = Delete + Add
Agenda	
  
•  Search	
  @	
  Flipkart	
  
•  Need	
  for	
  Real	
  Time	
  Search	
  
•  SolrCloud	
  Solu;on	
  
•  Our	
  approach	
  
•  Q	
  &	
  A	
  
SolrCloud	
  for	
  NRT	
  
Shard
Replica
Shard
Replica
Shard
Replica
Shard
Replica
Shard
Replica
Shard
Replica
Re-open
searcher
Re-open
searcher
Re-open
searcher
Re-open
searcher
Re-open
searcher
Re-open
searcher
Ingestion pipeline Shard
Leader
Auto commit
Soft Commit
Batch of
documents
For Document
Versioning
Update Log
Forward to Replica
SolrCloud	
  Evalua;on	
  
•  Update	
  =	
  Delete	
  +	
  Add	
  
–  Block	
  Join	
  Index	
   	
  Update	
  Whole	
  Block	
  (Product	
  +	
  Lis;ngs)	
  
•  Updated	
  Document	
  gets	
  streamed	
  to	
  all	
  replicas	
  in	
  sync	
  
–  Reduces	
  indexing	
  throughput	
  
•  Sol	
  commit	
  is	
  Not	
  Free	
  
–  Sol	
  commit	
   	
  In	
  Memory	
  Segment	
  
–  Lots	
  of	
  Merges	
  
–  Huge	
  document	
  churn	
  /	
  deletes	
  
–  All	
  caches	
  s;ll	
  need	
  to	
  be	
  re-­‐generated	
  
–  Filter	
  Cache	
  miss	
  specially	
  hurts	
  performance	
  
Agenda	
  
•  Search	
  @	
  Flipkart	
  
•  Need	
  for	
  Real	
  Time	
  Index	
  
•  SolrCloud	
  Solu;on	
  
• Our	
  approach	
  
•  Q	
  &	
  A	
  
ProductA
brand : Apple
availability : T
price : 45000
ProductB
brand : Samsung
availability : T
price : 23000
ProductC
brand : Apple
availability : F
price : 5000
Document ID
Mappings
Posting List
(Inverted Index)
DocValues
(columunar data)
Lucene Segment
Lucene	
  Index	
  
0 ProductA
1 ProductB
2 ProductC
45000 23000 5000Price
availability : T
brand : Samsung
brand : Apple 0 , 2
1
0 , 1
Terms
Sparse
Bitsets
A	
  Typical	
  Search	
  Flow	
  
Query Rewrite
Results
Query
Matching
Ranking Faceting
Stats
Posting List
Doc Values
Other
Components
Lucene Segment
Inverted Index
Forward Index
NRT Store
samsung mobiles
Offer : exchange offer
price desc
category : mobiles
brand : samsung
Offer : exchange offer
NRT	
  Forward	
  Index	
  -­‐	
  Considera;ons	
  
●  Lookup	
  efficiency	
  	
  
–  50th	
  percen;le	
  :	
  ~10K	
  matches	
  
–  99th	
  percen;le	
  :	
  ~1	
  million	
  matches	
  
●  Data	
  on	
  Java	
  heap	
  
–  Memory	
  efficiency	
  
	
  
NRT	
  Forward	
  Index	
  -­‐	
  Naive	
  Implementa;on	
  
NRT Forward IndexLucene Segment
Lookup Engine
0 ProductB
1 ProductA
2 ProductC
3 ProductD
ProductD
ProductA
ProductB
ProductC
ProductD
True
False
False
True
100
150
200
250
ProductId(3) <ProductD,price>
DocId : 3
field: price
250
ProductId Availability Price
Latency : ~10 secs for ~1 Million
lookups
NRT	
  Store	
  -­‐	
  Forward	
  Index	
  Op;mized	
  
Lookup Engine
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
DocId : 3
Field : price
250
DocId - NrtId
0
1
2
3
3
0
1
2
NrtId(3)
2
Price(2)
NRT Forward Index (Segment Independent)
100 200 250 150Price
0 ProductA
1 ProductC
2 ProductD
3 ProductB
Availability T F F T
Status 01 10 01 00
Latency : ~100 ms for ~1 Million lookups
NRT	
  Store	
  Filter	
  -­‐	
  PostFilter	
  
PostFilter(Price:[100 TO 150])
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
DocId : 3
Don’t
Delegate
DocId - NrtId
0
1
2
3
3
0
1
2
NrtId(3)
2
Price(2)
NRT Forward Index (Segment Independent)
100 200 250 150Price
0 ProductA
1 ProductC
2 ProductD
3 ProductB
Availability T F F T
Status 01 10 01 00
NRT Filter
NRT	
  Store	
  -­‐	
  Invert	
  index	
  
NRT Forward StoreNRT Inverter
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
NRT DocIdSet Cache
Availability : T 0 3
Offer : O1 2 3
Offer:O1 DocIdSet
Solr	
  Integra;on	
  Points	
  
•  ValueSources	
  
•  Filtering	
  
–  Custom	
  Filter	
  Implementa;on	
  for	
  cached	
  DocIdSet	
  
–  Custom	
  PostFilter	
  
•  Query	
  
–  Wrapper	
  over	
  Filter	
  
•  Custom	
  FacetComponent	
  
Near	
  Real	
  Time	
  Solr	
  Architecture	
  
Solr
Kafka
Ingestion pipeline
NRT Forward
Index
Ranking
Matching
Faceting
Redis
Bootstrap
NRT Inverted
store
Solr Master
NRT Updates
Lucene Updates
Catalogue
Pricing
Availability
Offers
Seller
Quality
Commit
+
Replicate
+
Reopen
Lucene
Others
Accomplishments	
  
•  Real	
  ;me	
  sor;ng	
  
•  Real	
  ;me	
  filtering	
  :	
  PostFilter	
  
–  Higher	
  latency	
  
•  Near	
  real	
  ;me	
  filtering	
  :	
  cached	
  DocIdSet	
  
–  No	
  consistency	
  between	
  lookup	
  and	
  filtering	
  
•  Independent	
  of	
  lucene	
  commits	
  
•  Query	
  latency	
  comparable	
  to	
  DocValues	
  
–  Consistent	
  99%	
  performance	
  
Accomplishments	
  @	
  Flipkart	
  
●  Real	
  ;me	
  consump;on	
  for	
  ~150	
  Signals	
  
●  Reduc;on	
  in	
  shown	
  out	
  of	
  stock	
  products	
  by	
  2X	
  
●  Produc;on	
  instances	
  of	
  ~50K	
  updates/second	
  real	
  ;me	
  
Thank	
  you	
  
&	
  
Ques8ons	
  

More Related Content

What's hot

The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge Graph
Trey Grainger
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog Project
Kendrick Lo
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Sujit Pal
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
ObjectRocket
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
Patrick McFadin
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
OpenSource Connections
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
confluent
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
Danny Yuan
 
Neural Search Comes to Apache Solr
Neural Search Comes to Apache SolrNeural Search Comes to Apache Solr
Neural Search Comes to Apache Solr
Sease
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
Dmitry Kan
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Guozhang Wang
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
Imply
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
Tommaso Teofili
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
Justin Basilico
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 
How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...
HostedbyConfluent
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
Paul Wlodarczyk
 
An Introduction to Druid
An Introduction to DruidAn Introduction to Druid
An Introduction to Druid
DataWorks Summit
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
Databricks
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 

What's hot (20)

The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge Graph
 
Hyperloglog Project
Hyperloglog ProjectHyperloglog Project
Hyperloglog Project
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Neural Search Comes to Apache Solr
Neural Search Comes to Apache SolrNeural Search Comes to Apache Solr
Neural Search Comes to Apache Solr
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
 
An Introduction to Druid
An Introduction to DruidAn Introduction to Druid
An Introduction to Druid
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 

Similar to Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart

near real time search in e-commerce
near real time search in e-commerce  near real time search in e-commerce
near real time search in e-commerce
Umesh Prasad
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
Amazon Web Services
 
Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl new
GreenM
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
Trey Grainger
 
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Lucidworks
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Lucidworks
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
Peter Bakas
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...eswcsummerschool
 
Spark + AI Summit recap jul16 2020
Spark + AI Summit recap jul16 2020Spark + AI Summit recap jul16 2020
Spark + AI Summit recap jul16 2020
Guido Oswald
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Tech Triveni
 
History of Apache Pinot
History of Apache Pinot History of Apache Pinot
History of Apache Pinot
Kishore Gopalakrishna
 
Patterns of Streaming Applications
Patterns of Streaming ApplicationsPatterns of Streaming Applications
Patterns of Streaming Applications
C4Media
 
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Lucidworks
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Inside Analysis
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising Tech
Apache Apex
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
lucenerevolution
 
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
Amazon Web Services
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
Trey Grainger
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with Java
Timothy Spann
 
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDVSalesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Amit Chaudhary
 

Similar to Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart (20)

near real time search in e-commerce
near real time search in e-commerce  near real time search in e-commerce
near real time search in e-commerce
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl new
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
 
Spark + AI Summit recap jul16 2020
Spark + AI Summit recap jul16 2020Spark + AI Summit recap jul16 2020
Spark + AI Summit recap jul16 2020
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
History of Apache Pinot
History of Apache Pinot History of Apache Pinot
History of Apache Pinot
 
Patterns of Streaming Applications
Patterns of Streaming ApplicationsPatterns of Streaming Applications
Patterns of Streaming Applications
 
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising Tech
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
 
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with Java
 
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDVSalesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
Salesforce Apex Hours : How Lightning Platform Query Optimizer works for LDV
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart

  • 1. OCTOBER  11-­‐14,  2016    •    BOSTON,  MA  
  • 2. Near  Real  8me  Indexing   Building  Real  Time  Search  Index  For  E-­‐Commerce     Umesh  Prasad   Tech  Lead    @  Flipkart     Thejus  V  M   Data  Architect  @  Flipkart      
  • 3. Agenda   •  Search  @  Flipkart   •  Need  for  Real  Time  Search   •  SolrCloud  Solu;on   •  Our  approach   •  Q  &  A  
  • 4.
  • 5.
  • 6. Traffic  @  Flipkart   •  Peak  Traffic     –  ~  800K  ac;ve  users   –  ~  160K    requests  per  second     •  Search  Traffic     –  ~  40K  searches  per  second  (Service)   –  ~  10K  searches  per  second  (Solr  )   •  Latency   –   Median  :  11  ms   –   99th  percen;le  :  1.1  second  
  • 7. Search  @  Flipkart   •  Catalogue     –  ~  50  main  categories   – ~  5000  sub-­‐categories   – ~  231  million  documents   – ~  90  million  SKUs   – ~  160  million  lis;ngs     •  E-­‐commerce  Marketplace     – ~  100K    Sellers   – Local  Sellers   – Regional  Availability   – Logis;cs  Constraints    
  • 8. E-­‐commerce  Search   •  Heavy  usage  of  drill  down  filters   •  Heavy  usage  of  face;ng   •  Only  top  results  maer   •  Results  grouped/collapsed  by  products     •  Serviceability  and  delivery  experience  MATTERS    
  • 9. Agenda   •  Search  @  Flipkart   •  Need  for  Real  Time  Search   •  SolrCloud  Solu;on   •  Our  approach   •  Q  &  A  
  • 10. Sorry,      Stock  Over      !!?  
  • 11. Damn  !!  Is  Offer  Over  ??  
  • 12. What  !!    All  Steal  Deals  Gone  ??  
  • 13. Product  /Lis;ng:  Important  Aributes   Seller   Ra;ng   Service   catalogue   service   Promise   Service   Availability Service Offer   Service   Pricing   Service   Product  aka  SKU   Lis;ngs  
  • 14. Summary  :    Lucene  Document   •  Product/SKU    (Parent  Document)   –  Lis;ng  (Child  Document)     •  Query  :    Mostly    SKU  Aributes            (Free  Text)   •  Filters  :  SKU  +    Lis;ng  Aributes        (Drill  Down)   •  Ranking  :  SKU  +  Lis;ng    Aributes        (Explicit/ Relevance)     •  Index  Time  Join  aka  Block  Join        (Best   Performance)      
  • 15. Out  Of  Stock,  but  Why  Show?   Index has Stale Availability Data 234K   Products  
  • 16. Challenge  1  :  High  Update  Rates   updates  /  sec   updates  /hr     normal   Peak   text  /  catalogue   ~10   ~100   ~100K   pricing   ~100   ~1K   ~10  million   availability   ~100   ~10K   ~10  million   offer   ~100   ~10K   ~10  million   seller  ra8ng   ~10   ~1K   ~1  million   signal  6   ~10   ~100   ~1  million   signal  7   ~100   ~10K   ~10  million   signal  8   ~100   ~10K   ~10  million  
  • 17. Challenge  2  :  Micro  Services     Ingestion pipeline Catalogue Pricing Availability Offers ... Document Builder Solr/Lucene Change Propagation Documents {L1,L2 … P1} Updates Stream 1 Updates Stream 2 Updates Stream 3 ●  Lucene doesn’t support Partial Updates ●  Update = Delete + Add
  • 18. Agenda   •  Search  @  Flipkart   •  Need  for  Real  Time  Search   •  SolrCloud  Solu;on   •  Our  approach   •  Q  &  A  
  • 19. SolrCloud  for  NRT   Shard Replica Shard Replica Shard Replica Shard Replica Shard Replica Shard Replica Re-open searcher Re-open searcher Re-open searcher Re-open searcher Re-open searcher Re-open searcher Ingestion pipeline Shard Leader Auto commit Soft Commit Batch of documents For Document Versioning Update Log Forward to Replica
  • 20. SolrCloud  Evalua;on   •  Update  =  Delete  +  Add   –  Block  Join  Index    Update  Whole  Block  (Product  +  Lis;ngs)   •  Updated  Document  gets  streamed  to  all  replicas  in  sync   –  Reduces  indexing  throughput   •  Sol  commit  is  Not  Free   –  Sol  commit    In  Memory  Segment   –  Lots  of  Merges   –  Huge  document  churn  /  deletes   –  All  caches  s;ll  need  to  be  re-­‐generated   –  Filter  Cache  miss  specially  hurts  performance  
  • 21. Agenda   •  Search  @  Flipkart   •  Need  for  Real  Time  Index   •  SolrCloud  Solu;on   • Our  approach   •  Q  &  A  
  • 22. ProductA brand : Apple availability : T price : 45000 ProductB brand : Samsung availability : T price : 23000 ProductC brand : Apple availability : F price : 5000 Document ID Mappings Posting List (Inverted Index) DocValues (columunar data) Lucene Segment Lucene  Index   0 ProductA 1 ProductB 2 ProductC 45000 23000 5000Price availability : T brand : Samsung brand : Apple 0 , 2 1 0 , 1 Terms Sparse Bitsets
  • 23. A  Typical  Search  Flow   Query Rewrite Results Query Matching Ranking Faceting Stats Posting List Doc Values Other Components Lucene Segment Inverted Index Forward Index NRT Store samsung mobiles Offer : exchange offer price desc category : mobiles brand : samsung Offer : exchange offer
  • 24. NRT  Forward  Index  -­‐  Considera;ons   ●  Lookup  efficiency     –  50th  percen;le  :  ~10K  matches   –  99th  percen;le  :  ~1  million  matches   ●  Data  on  Java  heap   –  Memory  efficiency    
  • 25. NRT  Forward  Index  -­‐  Naive  Implementa;on   NRT Forward IndexLucene Segment Lookup Engine 0 ProductB 1 ProductA 2 ProductC 3 ProductD ProductD ProductA ProductB ProductC ProductD True False False True 100 150 200 250 ProductId(3) <ProductD,price> DocId : 3 field: price 250 ProductId Availability Price Latency : ~10 secs for ~1 Million lookups
  • 26. NRT  Store  -­‐  Forward  Index  Op;mized   Lookup Engine Lucene Segment 0 ProductB 1 ProductA 2 ProductC 3 ProductD DocId : 3 Field : price 250 DocId - NrtId 0 1 2 3 3 0 1 2 NrtId(3) 2 Price(2) NRT Forward Index (Segment Independent) 100 200 250 150Price 0 ProductA 1 ProductC 2 ProductD 3 ProductB Availability T F F T Status 01 10 01 00 Latency : ~100 ms for ~1 Million lookups
  • 27. NRT  Store  Filter  -­‐  PostFilter   PostFilter(Price:[100 TO 150]) Lucene Segment 0 ProductB 1 ProductA 2 ProductC 3 ProductD DocId : 3 Don’t Delegate DocId - NrtId 0 1 2 3 3 0 1 2 NrtId(3) 2 Price(2) NRT Forward Index (Segment Independent) 100 200 250 150Price 0 ProductA 1 ProductC 2 ProductD 3 ProductB Availability T F F T Status 01 10 01 00
  • 28. NRT Filter NRT  Store  -­‐  Invert  index   NRT Forward StoreNRT Inverter Lucene Segment 0 ProductB 1 ProductA 2 ProductC 3 ProductD NRT DocIdSet Cache Availability : T 0 3 Offer : O1 2 3 Offer:O1 DocIdSet
  • 29. Solr  Integra;on  Points   •  ValueSources   •  Filtering   –  Custom  Filter  Implementa;on  for  cached  DocIdSet   –  Custom  PostFilter   •  Query   –  Wrapper  over  Filter   •  Custom  FacetComponent  
  • 30. Near  Real  Time  Solr  Architecture   Solr Kafka Ingestion pipeline NRT Forward Index Ranking Matching Faceting Redis Bootstrap NRT Inverted store Solr Master NRT Updates Lucene Updates Catalogue Pricing Availability Offers Seller Quality Commit + Replicate + Reopen Lucene Others
  • 31. Accomplishments   •  Real  ;me  sor;ng   •  Real  ;me  filtering  :  PostFilter   –  Higher  latency   •  Near  real  ;me  filtering  :  cached  DocIdSet   –  No  consistency  between  lookup  and  filtering   •  Independent  of  lucene  commits   •  Query  latency  comparable  to  DocValues   –  Consistent  99%  performance  
  • 32. Accomplishments  @  Flipkart   ●  Real  ;me  consump;on  for  ~150  Signals   ●  Reduc;on  in  shown  out  of  stock  products  by  2X   ●  Produc;on  instances  of  ~50K  updates/second  real  ;me  
  • 33. Thank  you   &   Ques8ons