SlideShare a Scribd company logo
1 of 34
Download to read offline
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Developing a Big Data Search Engine 2.0
Where we have gone, where we are going.
Mark Miller
Software Engineer, Cloudera
3
01
Who am I?
I’m Mark Miller
I’m a Lucene junkie (2006)
I’m a Lucene committer (2008)
And a Solr committer (2009)
And a member of the ASF (2011)
And a former Lucene PMC Chair (2014-2015)
I co-created SolrCloud (????)
4
02
A Quick Tour Through History
First there was Lucene.
It took a little while, but soon it was “good enough” to
replace most search engines. And faster. And more
efficient.
Lots of Search Engines built on Lucene (I made one!)
Then there was Solr.

And then there were Others.
5
03
Family Ties
...
6
01
What Search Engines Matter?
Lucene search engines lead the pack.

How can you tell? 

I like to look at db-engines.org - these are the skills devs
have, users are talking about and employers are hiring
for.

Also, plenty of anecdotal evidence that others are using
Lucene for the core.
Classic Enterprise Search Engines matter a little bit.
7
01
DB-Engines
2 Lucene engines in top usage - signs of success
8
01
Enterprise Search Engines
Oracle buys Endeca (2011 - $1.075B), Microsoft buys Fast
(2008 - $1.2B), HP buys Autonomy (2011 - $10.3B)
World Happiness Decreases.
The old leaders.
2006: Autonomy, FAST, Endeca tops in Gartner site search
study
2015 Leaders: Coveo, HP, Sinequa, Attivio, Lexmark
You can bet any large company using one of these also uses a
Lucene based solution.
The new leaders.
9
But Search is a general tool and Open Source is the core of the
Future and a virtuous cycle
Open source has become the default approach for
software with more than 66 percent of respondents
saying they consider OSS before other options.
(2015 BlackDuck)

It’s an open-source world: 78 percent of companies
run open-source software. (2015 BlackDuck)

Less than 3% DON’T USE OSS IN ANY WAY.
10
It’s the age of Lucene
11
“It is hopeless to talk to both of
you, you don't understand
virtual memory.”
Uwe Schindler @thetaph1 @uwesays
12
01
What is the future of Search?
More NoSQL
More SQL
More Realtime Analytics
More System of Record
More Graph
More Scale
Search will eat away at the stack.

Search focuses on pre processing and efficient in memory
data structures for fast responses.
13
01
The Solr Beginnings - Single node, then DIY distributed
Solr started as a single node solution, followed by master/
slave replication, followed by simple distributed search.
This was ‘good enough’ for a long time.
Classic ‘innovators dilemma’ problem.

Scaling out was super important, but not as soon as some
thought and sooner than others thought.
The challenge was, how do we evolve the system and can we
move the Solr user base to this evolution without disrupting
current users?
14
01
SolrCloud - Solr, ‘Clusterized’
15
01
Solr Meets Hadoop
First Class Solr Integrations
HDFS
MapReduce
Spark
Flume
HBase
Sentry
Etc
SolrCloud was already built on ZooKeeper
16
01
Now it’s all about scale and correctness.
The search features for the big data world are here and
rapidly advancing.

The next step is being able to handle Hadoop scale in the
‘general’ case.

And to be able to handle that correctly ‘enough’ of the
time.
17
“In my opinion the whole
code is a bug by itself.”
Uwe Schindler @thetaph1 @uwesays
18
01
The Call Me Maybe Tests
https://aphyr.com/tags/jepsen

Some basic testing around how systems live up to their CAP
promises. Heavy focus on partitions.

Most systems fail pretty badly. ZooKeeper rocked it.
SolrCloud did pretty darn well*.
Kyle
Kingsbury
19
01
Call Me … Maybe ??
Passing is actually like a very minimum bar. It doesn’t at all
mean your system is correct.

Your system could be complete crap and still pass.
In fact, in the general case, all the current best search
engines are still flakey at scale.
20
01
Search at Scale is still Flakey?
Yes, yes it is. Most systems at scale are still flakey. Most systems
don’t deliver on their promises. It’s a matter of degree.

How does search in particular get away with it?

Users are already used to not considering it the system of record.
Its easier to scale specialized than general - project has to scale
general but massive users can scale specialized.

We want the project to easily scale generally - no expertise
needed. You can already scale pretty large, but it takes a
‘vertical’ and expertise.
21
01
Search in Particular is HARD
The search engine is a many faceted beast.

There is a lot of surface area.
You need many very different features to all integrate well
together, usually in near realtime.

It sounds a lot easier than it is.
22
"Lucene is maybe the world's
most tested open source
project."
Uwe Schindler @thetaph1 #bbuzz 2014
23
01
The Lucene Testing Framework
Lucene regularly finds bugs in new Java releases.
Seriously. Regularly.

Many of those bugs are fixed and fixed quickly. Many are
not.

Randomized testing, reproducible master seeds.

“Test Beasting” and seti@home type resource requirements.
24
01
The Lucene Testing Framework
Code checkers and build enforcers galore, as well as test
level checkers and enforcers.



Who is policing the policeman?



You need a vibrant community that gives a damn.
25
“The stack trace is only
impossible if you look at the
code.”
Uwe Schindler @thetaph1 @uwesays
26
01
Testing is the Key and the Answer
Just because your tests don't normally fail doesn't mean
they are great. You probably just don’t normally see the
problems.

Our test framework exposes the problems - quickly.

This has pluses and minuses, but the pluses greatly
outweigh the minuses!
27
01
More on Testing
Integration and unit tests are equally important.

Integration tests are a little more important.

Testing, testing, and more testing is your best friend.

Communities grow, communities change, one or two can’t
hold the code together.
28
01
More on Testing
Distributed testing takes more.
You want testing to the hardware level, not as just as part
of a simple test framework.
You want to test on large, expensive clusters.
Debugging grows as an issue.
Companies are taking on this work and the results are and
will be funneled back into the project.
Age is a virtue.
29
01
Regular Large Scale Testing will be a challenge!
1000 nodes
with
SolrCloud Radial
View
30
01
RAW
TBD: At Cloudera we are building htrace, chaos
monkey w/ fault injection, etc - higher level is
important, beast test cluster
31
01
The Race for Scalable search is on!
My approach will be to leverage Hadoop as much as possible!

Many companies are focused on Solr - there will be many
approaches!

It’s still early in the game.
32
01
Leverage Hadoop
A distributed filesystem is a beautiful crutch to lean on!
Consider a single index shared by all replicas.

ETL at scale is not Solr’s strength.
A marriage with Hadoop is natural and has been long
ongoing

Hadoop will push Solr to it’s limits and 

beyond.
33
"As a good policeman I have
all open source ‘guns’ for
code checking available."
Uwe Schindler @thetaph1 @uwesays
https://code.google.com/p/forbidden-apis/
http://labs.carrotsearch.com/randomizedtesting.html
34
01
Thank You!
Mark Miller
@heismark
Software
Engineer
Cloudera

More Related Content

What's hot

Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systemsTrey Grainger
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Kai Li
 
Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Enginelucenerevolution
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrTrey Grainger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 
Question Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep LearningQuestion Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep LearningLucidworks
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildSujit Pal
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engineLars Marius Garshol
 
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...Grokking VN
 
Stack Overflow slides Data Analytics
Stack Overflow slides Data Analytics Stack Overflow slides Data Analytics
Stack Overflow slides Data Analytics Rahul Thankachan
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectorsSimon Hughes
 
Analyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAnalyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAmrith Krishna
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 

What's hot (20)

Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...
 
Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Vespa, A Tour
Vespa, A TourVespa, A Tour
Vespa, A Tour
 
Self-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache SolrSelf-learned Relevancy with Apache Solr
Self-learned Relevancy with Apache Solr
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
Question Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep LearningQuestion Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep Learning
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
 
Stack Overflow slides Data Analytics
Stack Overflow slides Data Analytics Stack Overflow slides Data Analytics
Stack Overflow slides Data Analytics
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
Analyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAnalyzing Stack Overflow - Problem
Analyzing Stack Overflow - Problem
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 

Similar to Developing A Big Data Search Engine - Where we have gone. Where we are going: Presented by Mark Miller, Cloudera

Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
Wed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedWed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedSalman Ahmed
 
The route towards cloud automation
The route towards cloud automationThe route towards cloud automation
The route towards cloud automationjcrugzz
 
Reactive Microservice Architecture with Groovy and Grails
Reactive Microservice Architecture with Groovy and GrailsReactive Microservice Architecture with Groovy and Grails
Reactive Microservice Architecture with Groovy and GrailsSteve Pember
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Lucidworks
 
Geek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent Ozar
Geek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent OzarGeek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent Ozar
Geek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent OzarIDERA Software
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
Diagnosability vs The Cloud
Diagnosability vs The CloudDiagnosability vs The Cloud
Diagnosability vs The CloudBob Rhubart
 
Diagnosability versus The Cloud, Redwood Shores 2011-08-30
Diagnosability versus The Cloud, Redwood Shores 2011-08-30Diagnosability versus The Cloud, Redwood Shores 2011-08-30
Diagnosability versus The Cloud, Redwood Shores 2011-08-30Cary Millsap
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Moving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed TracesMoving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed TracesKP Kaiser
 
devops, microservices, and platforms, oh my!
devops, microservices, and platforms, oh my!devops, microservices, and platforms, oh my!
devops, microservices, and platforms, oh my!Andrew Shafer
 
Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!
Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!
Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!VMware Tanzu
 
Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...
Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...
Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...Michael Rosenblum
 
SAD07 - Project Management
SAD07 - Project ManagementSAD07 - Project Management
SAD07 - Project ManagementMichael Heron
 
stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...
stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...
stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...NETWAYS
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
 

Similar to Developing A Big Data Search Engine - Where we have gone. Where we are going: Presented by Mark Miller, Cloudera (20)

Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
Wed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmedWed-12-05pm-box-salmanahmed
Wed-12-05pm-box-salmanahmed
 
Goto chicago
Goto chicagoGoto chicago
Goto chicago
 
The route towards cloud automation
The route towards cloud automationThe route towards cloud automation
The route towards cloud automation
 
Reactive Microservice Architecture with Groovy and Grails
Reactive Microservice Architecture with Groovy and GrailsReactive Microservice Architecture with Groovy and Grails
Reactive Microservice Architecture with Groovy and Grails
 
Micro services
Micro servicesMicro services
Micro services
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
 
Geek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent Ozar
Geek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent OzarGeek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent Ozar
Geek Sync | Planning a SQL Server to Azure Migration in 2021 - Brent Ozar
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going Distributed
 
Diagnosability vs The Cloud
Diagnosability vs The CloudDiagnosability vs The Cloud
Diagnosability vs The Cloud
 
Diagnosability versus The Cloud, Redwood Shores 2011-08-30
Diagnosability versus The Cloud, Redwood Shores 2011-08-30Diagnosability versus The Cloud, Redwood Shores 2011-08-30
Diagnosability versus The Cloud, Redwood Shores 2011-08-30
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
Moving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed TracesMoving to Microservices with the Help of Distributed Traces
Moving to Microservices with the Help of Distributed Traces
 
devops, microservices, and platforms, oh my!
devops, microservices, and platforms, oh my!devops, microservices, and platforms, oh my!
devops, microservices, and platforms, oh my!
 
Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!
Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!
Cloud Foundry Summit 2015: Devops, microservices and platforms, oh my!
 
On Impact in Software Engineering Research (Dagstuhl 2020)
On Impact in Software Engineering Research (Dagstuhl 2020)On Impact in Software Engineering Research (Dagstuhl 2020)
On Impact in Software Engineering Research (Dagstuhl 2020)
 
Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...
Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...
Managing the Earthquake: Surviving Major Database Architecture Changes (rev.2...
 
SAD07 - Project Management
SAD07 - Project ManagementSAD07 - Project Management
SAD07 - Project Management
 
stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...
stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...
stackconf 2023 | Better Living by Changing Less – IncrativeOps by Michael Cot...
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 

Developing A Big Data Search Engine - Where we have gone. Where we are going: Presented by Mark Miller, Cloudera

  • 1. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
  • 2. Developing a Big Data Search Engine 2.0 Where we have gone, where we are going. Mark Miller Software Engineer, Cloudera
  • 3. 3 01 Who am I? I’m Mark Miller I’m a Lucene junkie (2006) I’m a Lucene committer (2008) And a Solr committer (2009) And a member of the ASF (2011) And a former Lucene PMC Chair (2014-2015) I co-created SolrCloud (????)
  • 4. 4 02 A Quick Tour Through History First there was Lucene. It took a little while, but soon it was “good enough” to replace most search engines. And faster. And more efficient. Lots of Search Engines built on Lucene (I made one!) Then there was Solr.
 And then there were Others.
  • 6. 6 01 What Search Engines Matter? Lucene search engines lead the pack.
 How can you tell? 
 I like to look at db-engines.org - these are the skills devs have, users are talking about and employers are hiring for.
 Also, plenty of anecdotal evidence that others are using Lucene for the core. Classic Enterprise Search Engines matter a little bit.
  • 7. 7 01 DB-Engines 2 Lucene engines in top usage - signs of success
  • 8. 8 01 Enterprise Search Engines Oracle buys Endeca (2011 - $1.075B), Microsoft buys Fast (2008 - $1.2B), HP buys Autonomy (2011 - $10.3B) World Happiness Decreases. The old leaders. 2006: Autonomy, FAST, Endeca tops in Gartner site search study 2015 Leaders: Coveo, HP, Sinequa, Attivio, Lexmark You can bet any large company using one of these also uses a Lucene based solution. The new leaders.
  • 9. 9 But Search is a general tool and Open Source is the core of the Future and a virtuous cycle Open source has become the default approach for software with more than 66 percent of respondents saying they consider OSS before other options. (2015 BlackDuck)
 It’s an open-source world: 78 percent of companies run open-source software. (2015 BlackDuck)
 Less than 3% DON’T USE OSS IN ANY WAY.
  • 10. 10 It’s the age of Lucene
  • 11. 11 “It is hopeless to talk to both of you, you don't understand virtual memory.” Uwe Schindler @thetaph1 @uwesays
  • 12. 12 01 What is the future of Search? More NoSQL More SQL More Realtime Analytics More System of Record More Graph More Scale Search will eat away at the stack.
 Search focuses on pre processing and efficient in memory data structures for fast responses.
  • 13. 13 01 The Solr Beginnings - Single node, then DIY distributed Solr started as a single node solution, followed by master/ slave replication, followed by simple distributed search. This was ‘good enough’ for a long time. Classic ‘innovators dilemma’ problem.
 Scaling out was super important, but not as soon as some thought and sooner than others thought. The challenge was, how do we evolve the system and can we move the Solr user base to this evolution without disrupting current users?
  • 14. 14 01 SolrCloud - Solr, ‘Clusterized’
  • 15. 15 01 Solr Meets Hadoop First Class Solr Integrations HDFS MapReduce Spark Flume HBase Sentry Etc SolrCloud was already built on ZooKeeper
  • 16. 16 01 Now it’s all about scale and correctness. The search features for the big data world are here and rapidly advancing.
 The next step is being able to handle Hadoop scale in the ‘general’ case.
 And to be able to handle that correctly ‘enough’ of the time.
  • 17. 17 “In my opinion the whole code is a bug by itself.” Uwe Schindler @thetaph1 @uwesays
  • 18. 18 01 The Call Me Maybe Tests https://aphyr.com/tags/jepsen
 Some basic testing around how systems live up to their CAP promises. Heavy focus on partitions.
 Most systems fail pretty badly. ZooKeeper rocked it. SolrCloud did pretty darn well*. Kyle Kingsbury
  • 19. 19 01 Call Me … Maybe ?? Passing is actually like a very minimum bar. It doesn’t at all mean your system is correct.
 Your system could be complete crap and still pass. In fact, in the general case, all the current best search engines are still flakey at scale.
  • 20. 20 01 Search at Scale is still Flakey? Yes, yes it is. Most systems at scale are still flakey. Most systems don’t deliver on their promises. It’s a matter of degree.
 How does search in particular get away with it?
 Users are already used to not considering it the system of record. Its easier to scale specialized than general - project has to scale general but massive users can scale specialized.
 We want the project to easily scale generally - no expertise needed. You can already scale pretty large, but it takes a ‘vertical’ and expertise.
  • 21. 21 01 Search in Particular is HARD The search engine is a many faceted beast.
 There is a lot of surface area. You need many very different features to all integrate well together, usually in near realtime.
 It sounds a lot easier than it is.
  • 22. 22 "Lucene is maybe the world's most tested open source project." Uwe Schindler @thetaph1 #bbuzz 2014
  • 23. 23 01 The Lucene Testing Framework Lucene regularly finds bugs in new Java releases. Seriously. Regularly.
 Many of those bugs are fixed and fixed quickly. Many are not.
 Randomized testing, reproducible master seeds.
 “Test Beasting” and seti@home type resource requirements.
  • 24. 24 01 The Lucene Testing Framework Code checkers and build enforcers galore, as well as test level checkers and enforcers.
 
 Who is policing the policeman?
 
 You need a vibrant community that gives a damn.
  • 25. 25 “The stack trace is only impossible if you look at the code.” Uwe Schindler @thetaph1 @uwesays
  • 26. 26 01 Testing is the Key and the Answer Just because your tests don't normally fail doesn't mean they are great. You probably just don’t normally see the problems.
 Our test framework exposes the problems - quickly.
 This has pluses and minuses, but the pluses greatly outweigh the minuses!
  • 27. 27 01 More on Testing Integration and unit tests are equally important.
 Integration tests are a little more important.
 Testing, testing, and more testing is your best friend.
 Communities grow, communities change, one or two can’t hold the code together.
  • 28. 28 01 More on Testing Distributed testing takes more. You want testing to the hardware level, not as just as part of a simple test framework. You want to test on large, expensive clusters. Debugging grows as an issue. Companies are taking on this work and the results are and will be funneled back into the project. Age is a virtue.
  • 29. 29 01 Regular Large Scale Testing will be a challenge! 1000 nodes with SolrCloud Radial View
  • 30. 30 01 RAW TBD: At Cloudera we are building htrace, chaos monkey w/ fault injection, etc - higher level is important, beast test cluster
  • 31. 31 01 The Race for Scalable search is on! My approach will be to leverage Hadoop as much as possible!
 Many companies are focused on Solr - there will be many approaches!
 It’s still early in the game.
  • 32. 32 01 Leverage Hadoop A distributed filesystem is a beautiful crutch to lean on! Consider a single index shared by all replicas.
 ETL at scale is not Solr’s strength. A marriage with Hadoop is natural and has been long ongoing
 Hadoop will push Solr to it’s limits and 
 beyond.
  • 33. 33 "As a good policeman I have all open source ‘guns’ for code checking available." Uwe Schindler @thetaph1 @uwesays https://code.google.com/p/forbidden-apis/ http://labs.carrotsearch.com/randomizedtesting.html