Ubiquitous Solr - A Database’s not-so-evil Twin
Ayon Sinha
Data Foundation @WalmartLabs
October 13-16, 2016 • Austin, TX
2
[Slide graphics: text search, search suggestions]
Search Engine… Lucene… Solr
•  Internet and Intranet Search
•  Relevance
•  Search Suggestions
•  Faceting
•  Recommendations
•  Time series
•  Log search
•  Geo-spatial search
•  Analytics
•  Graph search
•  Document Store
[Slide graphics: recommendations, relevance, facets]
3
Overview
•  How to scale any data infrastructure with Apache Solr
•  Build a high-performance, highly available data platform for
internal and external users alike
•  Walmart’s commitment to open source
4
About me
•  Team lead on the Data Foundation team at the largest retailer and the
largest private employer in the world
•  Prior to Walmart, worked at startups building recommendation and
analytics systems
•  Before that, spent 6 years building search applications, recommendation
systems, and Hadoop-based analytics systems at eBay, the largest online
auction company
•  Manuscript reviewer for Manning Publications for 4 years; helped shape
the contents of “Hadoop in Practice” and “Big Data”
5
About Walmart
•  11,000+ stores in 27 countries
•  11 eCommerce sites
•  250M customers weekly in stores and online
•  Millions of database transactions per day
•  Sales, holidays, and massive volume shifts
6
It starts-up so simple
An idea implemented on the LAMP stack
7
Turns out to be a great idea!
Users seem to like the new product
8
Users REALLY like this…
Higher volume, more use cases. Quick-fix scaling
alternatives add some headroom … and complexity
9
We need more Business Intelligence
Business is looking good, but the source-of-truth data store,
not so much …
10
Scale up (in a hurry) with hardware
Least risk. Diminishing returns. What next?
11
Design to scale out
•  Offload queries to Search Engines
•  Offload recurring reads to Cache
•  Offload analytics to OLAP datastores
•  Shard the database
… and of course do something to hide the complexity. It is
worth it.
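
The offloading idea on this slide can be sketched as a small router that picks a backend per request. This is a minimal illustration under stated assumptions, not the routing used at Walmart: the class name `ReadRouter`, the request kinds, and the backend labels are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical backend labels; the slide names the categories, not these identifiers.
SEARCH, CACHE, OLAP, DATABASE = "search", "cache", "olap", "database"


@dataclass
class ReadRouter:
    """Pick a backend per request so the source-of-truth DB sees less load."""

    num_shards: int = 4
    routed: dict = field(default_factory=dict)  # request counts per backend

    def shard_for(self, key: str) -> int:
        # Deterministic hash so the same key always lands on the same shard.
        return sum(key.encode("utf-8")) % self.num_shards

    def route(self, kind: str, key: str = "") -> str:
        if kind == "query":        # free-text or filtered lookups
            target = SEARCH
        elif kind == "hot_read":   # recurring reads of the same record
            target = CACHE
        elif kind == "analytics":  # aggregations and reporting
            target = OLAP
        else:                      # writes and everything else go to a DB shard
            target = f"{DATABASE}-{self.shard_for(key)}"
        self.routed[target] = self.routed.get(target, 0) + 1
        return target
```

Callers only ever see `route`, which is one way to "hide the complexity" the slide mentions: the sharding scheme or the set of backends can change without touching application code.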
12
The Inspiration
Integration tools for Lucene-based search engines are
abundant
13
The “not-so-evil” Twin to protect your Source of Truth DB
•  What if a copy of your source-of-truth data were available just about
anywhere you want it?
•  How could you use a search engine to protect and augment your
database?
–  Redirect queries
•  Helps scale by reducing demand for
–  database indexing
–  database connections
–  scarce database resources like memory and storage
•  Not-so-evil Twin
–  Adding multiple near real-time search copies adds complexity, and it
comes at a cost; but done right, the benefits far outweigh the costs
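
A toy version of the twin idea, assuming an in-memory SQLite table as the source of truth and a hand-rolled inverted index standing in for Solr. The names `SearchTwin` and `Store` are hypothetical, and the synchronous update is a simplification; the deck only says the copy is near real-time.

```python
import sqlite3
from collections import defaultdict


class SearchTwin:
    """Toy inverted index standing in for a search engine such as Solr."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # token -> set of document ids

    def apply(self, doc_id, text):
        # Drop stale postings before re-indexing an updated document.
        for token in self.docs.get(doc_id, "").lower().split():
            self.index[token].discard(doc_id)
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.index[token].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


class Store:
    """Writes go to the source-of-truth DB; text queries go to the twin."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
        self.twin = SearchTwin()

    def save(self, item_id, title):
        with self.db:  # commit to the source of truth first
            self.db.execute(
                "INSERT OR REPLACE INTO items (id, title) VALUES (?, ?)",
                (item_id, title),
            )
        # Simplification: a real system would likely send an asynchronous change event.
        self.twin.apply(item_id, title)

    def find(self, term):
        # Redirected query: no table scan and no extra index on the database.
        return self.twin.search(term)
```

The database keeps only its primary key; the text lookup that would otherwise need a database index or a scan is answered by the twin.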
14
Our Approach
•  Abstract the complexity of managing
–  source-of-truth database
–  cache coherence
–  search queries
–  message bus
•  Abstract connection pool management
•  Provide a scalable way to query across shards with full control of the Solr
schema
•  Analyze big data without affecting real-time systems, while
isolating individual data domains
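
One way to picture the abstraction described above is a single facade that owns the database, the cache, and the bus, so callers never deal with coherence themselves. This is a sketch under assumptions: `MessageBus`, `DataFacade`, and the `item.changed` topic are hypothetical names, and plain dictionaries stand in for the real database and cache.

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-process publish/subscribe bus (hypothetical stand-in)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)


class DataFacade:
    """One entry point that hides the database, cache, and bus from callers."""

    def __init__(self, bus):
        self.database = {}  # stand-in for the source-of-truth database
        self.cache = {}     # stand-in for a read cache
        self.bus = bus
        bus.subscribe("item.changed", self._invalidate)

    def _invalidate(self, event):
        # Cache coherence: drop the cached copy whenever the record changes.
        self.cache.pop(event["id"], None)

    def write(self, item_id, value):
        self.database[item_id] = value
        self.bus.publish("item.changed", {"id": item_id})

    def read(self, item_id):
        if item_id not in self.cache:
            self.cache[item_id] = self.database[item_id]
        return self.cache[item_id]
```

A search indexer could subscribe to the same topic, which keeps the cache and the search copy driven by one stream of change events.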
15
From a situation like…
16
DB, Solr and Hadoop
17
Sharded DB with Solr
18
The Eco-system
Separation of concerns
19
The Result
Scatter-gather vs Powered by Apache Solr
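
The contrast named on this slide can be illustrated with a toy comparison. It assumes each shard is a list of row dictionaries and that the search-backed alternative is one index spanning all shards; both functions are hypothetical illustrations, and no measurements from the talk are reproduced here.

```python
import heapq


def scatter_gather(shards, predicate, sort_key, limit):
    """Ask every shard, then merge the sorted per-shard results."""
    partials = []
    for shard in shards:  # one round trip per shard in a real deployment
        hits = sorted((row for row in shard if predicate(row)), key=sort_key)
        partials.append(hits[:limit])
    return list(heapq.merge(*partials, key=sort_key))[:limit]


def build_index(shards, field):
    """Build one lookup over all shards, standing in for a search index."""
    index = {}
    for shard in shards:
        for row in shard:
            index.setdefault(row[field], []).append(row)
    return index


def indexed_lookup(index, value, sort_key, limit):
    """Answer the same question with a single lookup instead of a fan-out."""
    return sorted(index.get(value, []), key=sort_key)[:limit]
```

Both paths return the same rows; the difference is that the scatter-gather path touches every shard on every query, while the indexed path touches none of them.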
20
Lessons learned
A search engine like Apache Solr is…
•  not limited to search-based business applications.
•  a first-class citizen in your persistence technology stack; it
complements the SoT database.
•  easy to adopt and has all of us as a community for support.
21
The Future
•  Symbiotic existence of Solr/Lucene with RDBMS, NoSQL, and Big
Data systems
•  Walmart is committed to being part of the community building it
22
Questions? Reach us at:
•  You can reach me, Ayon Sinha, at:
–  asinha@walmartlabs.com
–  https://www.linkedin.com/in/ayonsinha
•  Jason Sardina, our Lead Persistence Architect
–  jsardina@walmartlabs.com
•  @WalmartLabs is always hiring the best
