Slash n near real time indexing

Solr/Lucene expert, Search & Data@Flipkart. Payments@Amazon. Customer Exp/Startup India Champ
Dec. 10, 2015

More Related Content

What's hot

Dense Retrieval with Apache Solr Neural Search.pdf (Sease)
Apache Druid 101 (Data Con LA)
The Apache Spark File Format Ecosystem (Databricks)
Real Time search using Spark and Elasticsearch (Sigmoid)
Taxonomies for Users (Heather Hedden)
The Parquet Format and Performance Optimization Opportunities (Databricks)

Viewers also liked

Webinar: Replace Google Search Appliance with Lucidworks Fusion (Lucidworks)
Apache Solr 5.0 and beyond (Anshum Gupta)
Webinar: Fusion for Business Intelligence (Lucidworks)
Webinar: Search and Recommenders (Lucidworks)
Understanding the Solr security framework - Lucene Solr Revolution 2015 (Anshum Gupta)
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati... (Lucidworks)

Similar to Slash n near real time indexing

near real time search in e-commerce (Umesh Prasad)
Deploying your Data Warehouse on AWS (Amazon Web Services)
Data Architecture at Vente-Exclusive.com - TOTM Exellys (Wout Scheepers)
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB (Amazon Web Services)
Getting Started with Amazon Redshift (Amazon Web Services)
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics... (Yann Cluchey)

Editor's Notes

  1. Going from Page 1 to Page could be a matter of seconds on a sale day (Big Billion Day).
  2. Hierarchical documents (Product → Listing). Highly structured: free text, numeric, and tag fields. Microservices handle individual field updates, so fields have different update rates and update independently.
  3. Availability has been used in ranking, but the data is stale, so out-of-stock (OOS) products still show up. Explain the challenge of 234K.
  4. This means the entire index is recreated every hour.
  5. Product documents + seller SKU documents in a block-join index. Block: a composite document containing the product and all its seller SKUs. Con: any update = delete + recreate the entire block, which aggravates the delete + recreate problem.
  6. Remove animation, don’t spend too much time on it.
  7. Posting =
  8. Keep the fast-changing data outside the index and update it independently of Solr updates. Lucene/Solr provide hooks for retrieval: ValueSource, Filter, Collector.
  9. Explain the API Hook
  10. Lucene APIs: data keyed by the internal document id, stored in columnar data structures. The implementation depends on the data type and is chosen for memory efficiency: boolean: 1 bit; enum: ⌈log₂(#enumerations)⌉ bits; int: 4 bytes; multi-valued: an array of the above data structures.
  11. Filter API of Lucene: DocIdSet getDocIdSet(LuceneIndex). At regular intervals, invert the data to adhere to Lucene's internal doc id order.
  12. Extract segment structure in a different slide
  13. Extract segment structure in a different slide
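Note 5's con, that any update forces a delete + recreate of the whole block, can be illustrated with a toy model. This is a minimal Python sketch, not Lucene's API: `Block`, `BlockJoinIndex`, and `update_sku` are hypothetical names, and the two counters only model how many documents get rewritten or deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block-join unit: one product (parent) plus its seller SKUs (children)."""
    product_id: str
    skus: dict[str, dict] = field(default_factory=dict)

    def size(self) -> int:
        # One parent document plus one child document per seller SKU.
        return 1 + len(self.skus)


class BlockJoinIndex:
    """Toy model: a block is indexed as one unit, so changing any child
    means deleting the old block and re-adding the whole block."""

    def __init__(self) -> None:
        self.blocks: dict[str, Block] = {}
        self.docs_written = 0  # parent + child documents (re)indexed so far
        self.docs_deleted = 0  # documents marked deleted so far

    def add_block(self, block: Block) -> None:
        old = self.blocks.get(block.product_id)
        if old is not None:
            self.docs_deleted += old.size()
        self.blocks[block.product_id] = block
        self.docs_written += block.size()

    def update_sku(self, product_id: str, sku_id: str, **changes) -> None:
        old = self.blocks[product_id]
        skus = {k: dict(v) for k, v in old.skus.items()}  # copy every child
        skus[sku_id].update(changes)
        self.add_block(Block(product_id, skus))  # delete + recreate entire block


# Usage: one availability flip on one of 50 SKUs rewrites all 51 documents.
index = BlockJoinIndex()
index.add_block(Block("P1", {f"S{i}": {"in_stock": True} for i in range(50)}))
index.update_sku("P1", "S7", in_stock=False)
print(index.docs_written, index.docs_deleted)  # 102 51
```

The cost grows with the number of seller SKUs per product, which is why frequent per-SKU updates make the delete + recreate problem worse.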
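Note 8's approach, keeping fast-changing data outside the index and consulting it at query time, can be sketched in Python. `ExternalFieldStore` and `search` are hypothetical stand-ins: in Lucene/Solr the lookup would sit behind a ValueSource, Filter, or Collector hook, whose real signatures are not modelled here, and `matches` simply stands in for the doc ids a text query already matched.

```python
import threading


class ExternalFieldStore:
    """Hypothetical store for fast-changing fields (here: availability), kept
    outside the search index and updated without touching it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_stock: dict[str, bool] = {}

    def set_in_stock(self, product_id: str, value: bool) -> None:
        # Called by an availability update stream, independent of index commits.
        with self._lock:
            self._in_stock[product_id] = value

    def in_stock(self, product_id: str) -> bool:
        # Unknown products are treated as out of stock.
        with self._lock:
            return self._in_stock.get(product_id, False)


def search(doc_to_product: dict[int, str], matches: list[int],
           store: ExternalFieldStore) -> list[str]:
    """Filter-style hook: keep only matched docs that are in stock right now."""
    return [doc_to_product[d] for d in matches if store.in_stock(doc_to_product[d])]


# Usage: availability changes are visible to the next query with no reindexing.
doc_to_product = {0: "P1", 1: "P2", 2: "P3"}
store = ExternalFieldStore()
for pid in ("P1", "P2", "P3"):
    store.set_in_stock(pid, True)
before = search(doc_to_product, [0, 1, 2], store)  # ['P1', 'P2', 'P3']
store.set_in_stock("P2", False)
after = search(doc_to_product, [0, 1, 2], store)   # ['P1', 'P3']
```

The design choice is that the index only holds slowly changing, structured data, while availability flips touch a small in-memory map instead of triggering a reindex.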
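The per-type column layouts in note 10 can be illustrated with a fixed-width, bit-packed column indexed by internal doc id. This is a hypothetical Python sketch assuming a dense doc id range 0..max_doc-1; it is not Lucene's own packed-ints implementation, the `delivery` enum is an invented example field, and multi-valued columns are not shown.

```python
from array import array


def bits_for_enum(num_values: int) -> int:
    """Bits per value for an enum with `num_values` ordinals: ceil(log2(n)), at least 1."""
    return max(1, (num_values - 1).bit_length())


class PackedColumn:
    """Fixed-width, bit-packed column indexed by internal doc id 0..max_doc-1."""

    def __init__(self, max_doc: int, bits_per_value: int) -> None:
        self.bits = bits_per_value
        self.max_value = (1 << bits_per_value) - 1
        self.data = bytearray((max_doc * bits_per_value + 7) // 8)

    def set(self, doc: int, value: int) -> None:
        if not 0 <= value <= self.max_value:
            raise ValueError("value does not fit in this column's bit width")
        start = doc * self.bits
        for i in range(self.bits):
            byte, bit = divmod(start + i, 8)
            if (value >> i) & 1:
                self.data[byte] |= 1 << bit
            else:
                self.data[byte] &= ~(1 << bit) & 0xFF

    def get(self, doc: int) -> int:
        start = doc * self.bits
        value = 0
        for i in range(self.bits):
            byte, bit = divmod(start + i, 8)
            value |= ((self.data[byte] >> bit) & 1) << i
        return value


max_doc = 1_000_000
in_stock = PackedColumn(max_doc, 1)                  # boolean: 1 bit/doc
delivery = PackedColumn(max_doc, bits_for_enum(5))   # hypothetical 5-value enum: 3 bits/doc
price = array("i", [0]) * max_doc                    # int: C int, 4 bytes on common platforms
print(len(in_stock.data), len(delivery.data))        # 125000 375000
```

For a million documents the boolean column costs about 122 KiB and the 5-value enum about 366 KiB, which is the memory-efficiency argument behind choosing the layout per data type.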
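Note 11's periodic inversion step, turning externally keyed data into Lucene's internal doc id order, might look like the following Python sketch. It assumes a hypothetical `doc_id_of` map from product id to internal doc id taken from the currently open index view; because internal doc ids can change when segments merge, the inversion is rerun at regular intervals rather than once.

```python
def invert_to_doc_order(in_stock: dict[str, bool], doc_id_of: dict[str, int],
                        max_doc: int) -> bytearray:
    """Rebuild a doc-id-ordered bitset from data keyed by product id.

    `doc_id_of` is a hypothetical product id -> internal doc id map for the
    current index view; rerun this whenever that view changes."""
    bits = bytearray((max_doc + 7) // 8)
    for product_id, available in in_stock.items():
        doc = doc_id_of.get(product_id)
        if available and doc is not None:  # skip products not in this index view
            bits[doc >> 3] |= 1 << (doc & 7)
    return bits


def iter_doc_ids(bits: bytearray, max_doc: int):
    """Yield set doc ids in ascending order, the order Lucene iterates doc ids in."""
    for doc in range(max_doc):
        if bits[doc >> 3] & (1 << (doc & 7)):
            yield doc


# Usage: externally keyed availability becomes a sorted doc id set.
doc_id_of = {"P1": 2, "P2": 0, "P3": 5}
in_stock = {"P1": True, "P2": False, "P3": True, "P9": True}  # P9 is not indexed
snapshot = invert_to_doc_order(in_stock, doc_id_of, max_doc=8)
print(list(iter_doc_ids(snapshot, 8)))  # [2, 5]
```

A Filter built on this idea could hand out the current snapshot from its getDocIdSet call until the next refresh replaces it.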