Big Search w/ Big Data
      Principles
          Lucene Revolution 2012
  Eric Pugh | epugh@o19s.com | @dep4b
What is Big Search?
Who am I?
• Principal of OpenSource Connections
    - Solr/Lucene Search Consultancy

• Member of the Apache Software Foundation

• SOLR-284 UpdateRichDocuments (July '07)

• Fascinated by the art of software development
[Image: AUTHOR ("2nd edition!") and AGILISTA]
Telling some stories

• Prototyping
• Application Development
• Maintaining Your Cluster
Not an intro to cloud computing
• See "Indexing Big Data on Amazon AWS" by Scott Stults @ 1:15 Thursday
• See "How is the Government Spending Your Money? How GCE is Using Lucene and the GCE Big Data Cloud" by Seshu Simhadi @ 2:55 Thursday
Not an intro to SolrCloud!
• See "How SolrCloud Changes the User Experience In a Sharded Environment" by Erick Erickson @ 2:55 Today
• See "Solr 4: The SolrCloud Architecture" by Mark Miller @ 10:45 Tomorrow
My Assumptions for Client X
• Big Data is any data set that is primarily at rest due to the difficulty of working with it.
• Limited selection of tools available.
• Aggressive timeline.
• All the data must be searched per query.
• On the Solr 3.x line.
Telling some stories

• Prototyping
• Application Development
• Maintaining Your Cluster
Boy meets Girl Story

[Diagram: Metadata and Content Files flow into an Ingest Pipeline that feeds four Solr servers]
Bash Rocks
• Remote Solr stop/start scripts
• Remote Indexer stop/start scripts
• Performance Monitoring
• Content Extraction scripts (+Java)
• Ingestor Scripts (+Java)
• Artifact Deployment (CM)
Make it easy to change sharding
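import java.util.List;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

// IndexStrategy and com.o19s.solr.ModShardIndexStrategy are project classes, not part of Solr.
// Only the class name below has to change to switch sharding schemes.
public void run(Map<String, String> options, List<SolrInputDocument> docs)
        throws ReflectiveOperationException {
    IndexStrategy indexStrategy = Class.forName("com.o19s.solr.ModShardIndexStrategy")
            .asSubclass(IndexStrategy.class)
            .getDeclaredConstructor()
            .newInstance();
    indexStrategy.configure(options);

    // The strategy decides which shard each document is sent to.
    for (SolrInputDocument doc : docs) {
        indexStrategy.addDocument(doc);
    }
}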
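ModShardIndexStrategy itself is not shown in the deck. A minimal sketch of what a mod-based strategy might do, assuming each document is routed by hashing its unique id modulo the number of shards; the class, its methods and the shard URLs are hypothetical, and plain maps stand in for SolrInputDocument so it compiles without SolrJ:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical mod-based sharding: shard index = floorMod(id.hashCode(), numShards). */
public class ModShardSketch {
    private final List<String> shardUrls;
    // Documents buffered per shard URL, ready to be posted in batches.
    private final Map<String, List<Map<String, Object>>> batches = new LinkedHashMap<>();

    public ModShardSketch(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = List.copyOf(shardUrls);
        for (String url : this.shardUrls) {
            batches.put(url, new ArrayList<>());
        }
    }

    /** floorMod keeps the result in [0, numShards) even when hashCode() is negative. */
    public static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    public void addDocument(Map<String, Object> doc) {
        String id = Objects.requireNonNull((String) doc.get("id"), "document has no id");
        String url = shardUrls.get(shardFor(id, shardUrls.size()));
        batches.get(url).add(doc);
    }

    public Map<String, List<Map<String, Object>>> batches() {
        return batches;
    }

    public static void main(String[] args) {
        ModShardSketch strategy = new ModShardSketch(List.of(
                "http://search1.o19s.com:8983/solr/run2_1",
                "http://search1.o19s.com:8983/solr/run2_2"));
        for (int i = 0; i < 10; i++) {
            strategy.addDocument(Map.of("id", "DOC" + i));
        }
        strategy.batches().forEach((url, docs) ->
                System.out.println(url + " -> " + docs.size() + " docs"));
    }
}
```

String.hashCode() is defined by the Java language specification, so the same id lands on the same shard in every JVM. Changing the shard count remaps most ids, which is one reason to keep the strategy swappable.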
Separate JVM from Solr Cores
• Step 1: Fire up empty Solr instances on all the servers (nohup &).
• Step 2: Verify they started cleanly.
• Step 3: Create cores (curl 'http://search1.o19s.com:8983/solr/admin/cores?action=CREATE&name=run2').
• Step 4: Create an "aggregator" core, passing in the URLs of the cores (&property.shards=).
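Steps 3 and 4 can be scripted. A minimal sketch that only builds the URLs (it does not call Solr), assuming the CoreAdmin handler is mounted at /solr/admin/cores, each core's instanceDir equals its name, and cores are named run2_1, run2_2, ...; the hosts come from the deck, the helper itself is hypothetical. Entries in the distributed `shards` list are written as host:port/solr/core, without a scheme:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds CoreAdmin CREATE URLs for a "go wide" layout. */
public class CoreAdminUrls {
    static final String ADMIN = "/solr/admin/cores?action=CREATE";

    /** One core per (host, slot), numbered globally: run2_1, run2_2, ... */
    static List<String> shardAddresses(List<String> hostPorts, String run, int coresPerHost) {
        List<String> shards = new ArrayList<>();
        int n = 1;
        for (String hp : hostPorts) {
            for (int i = 0; i < coresPerHost; i++) {
                // Distributed-search "shards" entries are host:port/path, no scheme.
                shards.add(hp + "/solr/" + run + "_" + n++);
            }
        }
        return shards;
    }

    /** Turns "host:port/solr/core" into the CREATE call for that core. */
    static String createShardUrl(String shardAddress) {
        String hostPort = shardAddress.substring(0, shardAddress.indexOf('/'));
        String core = shardAddress.substring(shardAddress.lastIndexOf('/') + 1);
        return "http://" + hostPort + ADMIN + "&name=" + core + "&instanceDir=" + core;
    }

    /** The aggregator core receives the shard list as a core property. */
    static String createAggregatorUrl(String hostPort, String name, List<String> shards) {
        String joined = String.join(",", shards);
        return "http://" + hostPort + ADMIN + "&name=" + name + "&instanceDir=" + name
                + "&property.shards=" + URLEncoder.encode(joined, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> shards = shardAddresses(
                List.of("search1.o19s.com:8983", "search2.o19s.com:8983"), "run2", 2);
        shards.forEach(s -> System.out.println(createShardUrl(s)));
        System.out.println(createAggregatorUrl("search1.o19s.com:8983", "run2", shards));
    }
}
```

How the aggregator core consumes the property (for example as ${shards} in a request handler's defaults in its solrconfig.xml) is likewise an assumption about this setup.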
Go Wide Quickly
[Diagram: hosts search1.o19s.com, search2.o19s.com and search3.o19s.com, each running several Solr JVMs on ports :8983, :8984 and :8985, every JVM hosting a stack of shard cores numbered from shard1 up to shard12]
Simple Pipeline

• Simple pipeline

• mv is atomic: within one filesystem it is a rename, so the next stage sees the whole file or nothing
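The same hand-off from Java: a minimal sketch, assuming each stage polls an inbox directory that sits on the same filesystem as its work directory. The file is written under a temporary name and then renamed with ATOMIC_MOVE, so a poller never sees a half-written file; the directory layout and names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical staging step: write a file fully, then atomically publish it. */
public class AtomicHandoff {
    /**
     * Writes content to work/name.tmp, then renames it to inbox/name. The rename is
     * atomic on one filesystem, so a poller on inbox sees either nothing or the whole file.
     */
    static Path publish(Path work, Path inbox, String name, String content) throws IOException {
        Path tmp = work.resolve(name + ".tmp");
        Files.writeString(tmp, content, StandardCharsets.UTF_8);
        return Files.move(tmp, inbox.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("pipeline");
        Path work = Files.createDirectories(root.resolve("work"));
        Path inbox = Files.createDirectories(root.resolve("inbox"));
        Path published = publish(work, inbox, "DOC45242.xml", "<add><doc/></add>");
        System.out.println(published + " exists=" + Files.exists(published));
    }
}
```

If the two directories are on different filesystems, Files.move with ATOMIC_MOVE throws AtomicMoveNotSupportedException rather than silently falling back to a copy.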
Don't Move Files
• SCP across machines is slow and error-prone.
• An NFS share is a single point of failure.
• A clustered file system like GFS (Global File System) can have "fencing" issues.
• HDFS shines here.
• ZooKeeper shines here.
Can you test your changes?
JVM tuning is a black art
-verbose:gc
-XX:+PrintGCDetails
-server
-Xmx8G
-Xms8G
-XX:MaxPermSize=256m
-XX:PermSize=256m
-XX:+AggressiveHeap
-XX:+DisableExplicitGC
-XX:ParallelGCThreads=16
-XX:+UseParallelOldGC
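Testing a flag change starts with confirming what the JVM actually picked up. A minimal sketch using the standard java.lang.management API to print the startup arguments, the effective maximum heap and the active collectors; the class name is hypothetical. Note that -XX:PermSize and -XX:MaxPermSize no longer apply from Java 8 onward, where PermGen was removed:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical check: what did this JVM really start with? */
public class JvmFlagsCheck {
    /** The JVM options this process was started with, as reported by the runtime. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Effective maximum heap in MiB; roughly comparable with -Xmx. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        inputArguments().forEach(arg -> System.out.println("arg: " + arg));
        System.out.println("max heap (MiB): " + maxHeapMb());
        // Collector names show which GC is really in use (e.g. after -XX:+UseParallelOldGC).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName()
                    + ", collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```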
Run, don't Walk
Telling some stories

• Prototyping
• Application Development
• Maintaining Your Cluster
Grab some Data
#!/bin/sh
# Export documents from the source Solr as CSV, then load them into the local core.
SOURCE_SOLR='http://ec2-107-20-92-190.compute-1.amazonaws.com:8983/solr/core0/select?q=*%3A*&start=0&rows=500000&wt=csv'
TARGET_SOLR='http://localhost:8983/solr/us_patent_grant/update/csv'

wget -O output.csv "$SOURCE_SOLR"

# Post the CSV to the target core, then commit and optimize.
curl "$TARGET_SOLR?skipLines=1&commit=true&optimize=true" --data-binary @output.csv \
  -H 'Content-type:text/plain; charset=utf-8'
Using Solr as a key/value store
    • Thousands of queries per second without real-time get:
http://localhost:8983/solr/run2_enrichment/select?q=id:DOC45242&fl=entities,html

    • ??? with real-time get?
http://localhost:8983/solr/run2_enrichment/get?id=DOC45242&fl=entities,html
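A minimal sketch of a key/value client over the enrichment core, assuming JSON responses (wt=json) and Java 11's java.net.http client; the class and method names are hypothetical, while the core URL and fields come from the slide. The /select form works on Solr 3.x but only sees committed documents; the /get form is Solr 4's real-time get, which also returns documents that have been added but not yet committed:

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical key/value facade over a Solr core. */
public class SolrKeyValue {
    private final String coreUrl;
    private final HttpClient http = HttpClient.newHttpClient();

    public SolrKeyValue(String coreUrl) {
        this.coreUrl = coreUrl;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Search-based lookup (Solr 3.x): only sees committed documents. */
    public String selectUrl(String id, String fields) {
        return coreUrl + "/select?q=" + enc("id:" + id) + "&fl=" + enc(fields) + "&wt=json";
    }

    /** Real-time get (Solr 4): also sees documents added but not yet committed. */
    public String getUrl(String id, String fields) {
        return coreUrl + "/get?id=" + enc(id) + "&fl=" + enc(fields) + "&wt=json";
    }

    /** Performs the HTTP GET and returns the raw JSON body. */
    public String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        SolrKeyValue kv = new SolrKeyValue("http://localhost:8983/solr/run2_enrichment");
        System.out.println(kv.selectUrl("DOC45242", "entities,html"));
        System.out.println(kv.getUrl("DOC45242", "entities,html"));
    }
}
```

Ids containing query-syntax characters (such as a colon or a space) would need escaping in the /select form; the /get form takes the id as a plain parameter.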
Using Solr as a key/value store
[Diagram: Metadata and Content Files flow into an Ingest Pipeline that consults a Solr key/value cache and feeds four Solr servers]
Push schema deļ¬nition
      to the application
    ā€¢ Not ā€œschema lessā€
    ā€¢ Just different owner of schema!
    ā€¢ Schema may have common set of ļ¬elds like
       id, type, timestamp, version
    ā€¢ Nothing required.
q=intensity_i:[70 TO 0]&fq=TYPE:streetlamp_monitor
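The _i suffix in the query relies on a dynamic field. A minimal sketch of the application choosing field names and types itself, assuming the dynamic fields from Solr's example schema (*_i, *_l, *_f, *_d, *_b, *_s); the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: the application, not schema.xml, decides each field's type. */
public class AppSchema {
    /** Maps a Java value to a dynamic-field suffix from Solr's example schema. */
    static String suffixFor(Object value) {
        if (value instanceof Integer) return "_i";
        if (value instanceof Long) return "_l";
        if (value instanceof Float) return "_f";
        if (value instanceof Double) return "_d";
        if (value instanceof Boolean) return "_b";
        if (value instanceof String) return "_s";
        throw new IllegalArgumentException("unsupported type: " + value.getClass().getName());
    }

    /** Common fields (id, type) stay unsuffixed; everything else is typed by suffix. */
    static Map<String, Object> toSolrFields(String id, String type, Map<String, Object> attrs) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("type", type);
        attrs.forEach((name, value) -> doc.put(name + suffixFor(value), value));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("intensity", 70);
        attrs.put("location", "Main St");
        System.out.println(toSolrFields("lamp-17", "streetlamp_monitor", attrs));
        // {id=lamp-17, type=streetlamp_monitor, intensity_i=70, location_s=Main St}
    }
}
```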
Don't do expensive things in Solr

• Tika content extraction aka Solr Cell

• UpdateRequestProcessorChain
Avro!
• Supports serialization of data readable from multiple languages
• Like a smarter XML: the schema travels with the data
• Handles forward and backward versions of an object (schema evolution)
• Compact and fast to read.
Avro!
[Diagram: Metadata and Content Files flow through the Ingest Pipeline into four Solr servers and a Solr key/value cache, with data passed between them as .avro files]
No JavaBin
[Image: "Give me /update/avro!"]
• Avoid Jarmageddon
• Reflection? Ugh.
No JavaBin
[Diagram: Metadata and Content Files flow through the Ingest Pipeline into four Solr servers; the Solr key/value cache runs Solr 3.4 while the search cluster runs Solr 4. Which SolrJ version do I use?]
Telling some stories

• Prototyping
• Application Development
• Maintaining Your Cluster
Upgrade Lucene Indexes Easily
    • Don't reindex!
    • Try out new features!
                                          David Lyle
java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir
Indexing is Easy and Quick
CHEAP AND CHEERFUL
NRT versus BigData
The tension between scale and update rate
[Chart: index size from 10 million (< 10,000,000) to hundreds of millions (> 100,000,000) of documents, with a "Bad Place to Be" marked between the two]
Grim Reaper
Delayed Replication
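<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- ReplicationHandler on the master -->
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <!-- HH:mm:ss: this slave polls the master only once every 36 hours -->
    <str name="pollInterval">36:00:00</str>
  </lst>
</requestHandler>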
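With a 36-hour poll the slave effectively replicates only when told to. A minimal sketch of telling it, assuming the slave's ReplicationHandler lives at the hypothetical URL below; fetchindex, disablepoll and enablepoll are ReplicationHandler commands, while the class and host are illustrative:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical operator helper for a slave that polls its master only rarely. */
public class ReplicationControl {
    private final String replicationUrl;
    private final HttpClient http = HttpClient.newHttpClient();

    public ReplicationControl(String replicationUrl) {
        this.replicationUrl = replicationUrl;
    }

    /** Builds a ReplicationHandler command URL, e.g. fetchindex, disablepoll, enablepoll. */
    String commandUrl(String command) {
        return replicationUrl + "?command=" + command;
    }

    /** Sends the command to the slave and returns the HTTP status code. */
    int send(String command) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(commandUrl(command))).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) {
        ReplicationControl slave =
                new ReplicationControl("http://search2.o19s.com:8983/solr/run2_1/replication");
        // Once the master's new index has been checked, pull it now instead of waiting.
        System.out.println(slave.commandUrl("fetchindex"));
        // Or stop scheduled polling entirely until it is re-enabled.
        System.out.println(slave.commandUrl("disablepoll"));
    }
}
```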
Enable/Disable
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
<lst name="invariants">
  <str name="q">MY HARD QUERY</str>
  <str name="shards">http://search1.o19s.com:8983/solr/run2_1,http://search1.o19s.com:8983/solr/run2_2,http://search1.o19s.com:8983/solr/run2_2</str>
</lst>
<lst name="defaults">
  <str name="echoParams">all</str>
</lst>
<str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>
Enable/Disable

• SOLR-3301
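Without that patch, taking a node out of rotation means removing the healthcheck file named in the ping handler config; while it is absent, /admin/ping returns an error and a load balancer that polls it stops routing queries there. A minimal sketch of that toggle, assuming the caller knows where server-enabled.txt resolves on disk (which directory Solr resolves it against varies by version, so the path here is an assumption); the class is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that enables/disables a node by creating/deleting the healthcheck file. */
public class HealthcheckToggle {
    private final Path healthcheckFile;

    public HealthcheckToggle(Path healthcheckFile) {
        this.healthcheckFile = healthcheckFile;
    }

    /** Creates the file if missing, putting the node back into rotation. */
    public void enable() throws IOException {
        if (!Files.exists(healthcheckFile)) {
            Files.createFile(healthcheckFile);
        }
    }

    /** Deletes the file (if present), so ping fails and the balancer drains the node. */
    public void disable() throws IOException {
        Files.deleteIfExists(healthcheckFile);
    }

    public boolean isEnabled() {
        return Files.exists(healthcheckFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("solr-core");
        HealthcheckToggle node = new HealthcheckToggle(dir.resolve("server-enabled.txt"));
        node.enable();
        System.out.println("enabled=" + node.isEnabled());
        node.disable();
        System.out.println("enabled=" + node.isEnabled());
    }
}
```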
Provisioning

• Chef/Puppet
• ZooKeeper
• Have you versioned everything needed to build an index?
TYPICAL ENVIRONMENT
FLEXIBLE ENVIRONMENT
Do I need Failover?

• Can I build quickly?
• Do I have a reliable cluster?
• Am I spread across data centers?
• Is sooo '90s...
Telling some stories

• Prototyping
• Application Development
• Maintaining Your Cluster
Some Other Thoughts
Don't be Mesmerized
Scientific Method
Thank you!

• epugh@o19s.com
• @dep4b
• www.opensourceconnections.com

Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Ā 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Ā 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Ā 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Ā 

Big Search with Big Data Principles

  • 1. Big Search w/ Big Data Principles LuceneRevolution 2012 Eric Pugh | epugh@o19s.com | @dep4b
  • 2. What is Big Search?
  • 3. Who am I? • Principal of OpenSource Connections - Solr/Lucene Search Consultancy • Member of the Apache Software Foundation • SOLR-284 UpdateRichDocuments (July 2007) • Fascinated by the art of software development
  • 4. 2nd edition! AUTHOR
  • 6. Telling some stories • Prototyping • Application Development • Maintaining Your Cluster
  • 7. Not an intro to cloud computing • See “Indexing Big Data on Amazon AWS” by Scott Stults @ 1:15 Thursday • See “How is the Government Spending Your Money? How GCE is Using Lucene and the GCE Big Data Cloud” by Seshu Simhadi @ 2:55 Thursday
  • 8. Not an intro to SolrCloud! • See “How SolrCloud Changes the User Experience In a Sharded Environment” by Erick Erickson @ 2:55 Today • See “Solr 4: The SolrCloud Architecture” by Mark Miller @ 10:45 Tomorrow
  • 9. My Assumptions for Client X • Big Data is any data set that is primarily at rest due to the difficulty of working with it. • Limited selection of tools available. • Aggressive timeline. • All the data must be searched per query. • On the Solr 3.x line.
  • 10. Telling some stories • Prototyping • Application Development • Maintaining Your Cluster
  • 11. Boy meets Girl Story (diagram: Metadata and Content Files → Ingest Pipeline → several Solr instances)
  • 13. Bash Rocks • Remote Solr stop/start scripts • Remote Indexer stop/start scripts • Performance Monitoring • Content Extraction scripts (+Java) • Ingestor Scripts (+Java) • Artifact Deployment (CM)
  • 14. Make it easy to change sharding
  • 15. Make it easy to change sharding
        // The strategy class is looked up by name, so swapping sharding schemes is a one-line change.
        public void run(Map options, List<SolrInputDocument> docs)
                throws InstantiationException, IllegalAccessException, ClassNotFoundException {
            IndexStrategy indexStrategy = (IndexStrategy) Class.forName(
                    "com.o19s.solr.ModShardIndexStrategy").newInstance();
            indexStrategy.configure(options);
            for (SolrInputDocument doc : docs) {
                indexStrategy.addDocument(doc);
            }
        }
  • 16. Separate JVM from Solr Cores • Step 1: Fire up empty Solrs on all the servers (nohup &). • Step 2: Verify they started cleanly. • Step 3: Create cores (curl 'http://search1.o19s.com:8983/solr/admin?action=create&name=run2'). • Step 4: Create an “aggregator” core, passing in the URLs of the cores (&property.shards=).
  • 18. (Diagram: shard cores spread across multiple Solr JVMs on search1.o19s.com, search2.o19s.com and search3.o19s.com, on ports :8983, :8984 and :8985, with shard names running up to shard12.)
  • 19. Simple Pipeline • Simple pipeline • mv is atomic (within a single filesystem)
  • 20. Don’t Move Files • SCP across machines is slow and error-prone. • An NFS share is a single point of failure. • A clustered file system like GFS (Global File System) can have “fencing” issues. • HDFS shines here. • ZooKeeper shines here.
  • 21. Can you test your changes?
  • 22. JVM tuning is a black art: -verbose:gc -XX:+PrintGCDetails -server -Xmx8G -Xms8G -XX:MaxPermSize=256m -XX:PermSize=256m -XX:+AggressiveHeap -XX:+DisableExplicitGC -XX:ParallelGCThreads=16 -XX:+UseParallelOldGC
  • 23.
  • 25. Telling some stories • Prototyping • Application Development • Maintaining Your Cluster
  • 26. Grab some Data
        #!/bin/sh
        # Export up to 500,000 documents (stored fields only) from the source core as CSV...
        SOURCE_SOLR='http://ec2-107-20-92-190.compute-1.amazonaws.com:8983/solr/core0/select?q=*%3A*&start=0&rows=500000&wt=csv'
        TARGET_SOLR='http://localhost:8983/solr/us_patent_grant/update/csv'
        wget -O output.csv "$SOURCE_SOLR"
        # ...then post the file to the target core's CSV handler, skipping the first line, and commit/optimize.
        curl "$TARGET_SOLR?skipLines=1&commit=true&optimize=true" --data-binary @output.csv -H 'Content-type:text/plain; charset=utf-8'
  • 27. Using Solr as a key/value store • Thousands of queries per second without real-time get: http://localhost:8983/solr/run2_enrichment/select?q=id:DOC45242&fl=entities,html • ??? with real-time get? http://localhost:8983/solr/run2_enrichment/get?id=DOC45242&fl=entities,html
  • 28. Using Solr as key/value store (diagram: the slide 11 pipeline, Metadata and Content Files → Ingest Pipeline → several Solr instances, plus a Solr Key/Value Cache)
  • 30. Push schema deļ¬nition to the application ā€¢ Not ā€œschema lessā€ ā€¢ Just different owner of schema! ā€¢ Schema may have common set of ļ¬elds like id, type, timestamp, version ā€¢ Nothing required. q=intensity_i:[70 TO 0]&fq=TYPE:streetlamp_monitor
  • 31. Don’t do expensive things in Solr • Tika content extraction, aka Solr Cell • UpdateRequestProcessorChain
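  • 33. Avro! • Supports serialization of data readable from multiple languages • Think of it as a smarter XML: schemas are defined in JSON and the data is written in a compact binary format • Handles forward and backward versions of an object (schema evolution) • Compact and fast to read.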
  • 34. Avro! (diagram: the same pipeline with the Solr Key/Value Cache, now with an .avro component added)
  • 35. No JavaBin • Give me /update/avro! • Avoid Jarmageddon • Reflection? Ugh.
  • 36. No JavaBin (diagram: the ingest architecture from slide 28, with the Solr Key/Value Cache alongside Metadata and Content Files → Ingest Pipeline → Solr shards)
  • 37. No JavaBin (same diagram, now with a Solr 3.4 label)
  • 38. No JavaBin (same diagram, with both Solr 3.4 and Solr 4 labels)
  • 39. No JavaBin (same diagram) Which SolrJ version do I use?
  • 40. Telling some stories • Prototyping • Application Development • Maintaining Your Cluster
  • 41. Upgrade Lucene Indexes Easily • Don’t reindex! • Try out new features! • David Lyle • java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir
  • 42. Indexing is Easy and Quick
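  • 45. The tension between scale and update rate (chart with an axis running from < 10,000,000 to > 100,000,000 and a region labeled “Bad Place to Be”)
  • 46. The tension between scale and update rate (chart running from 10 million to hundreds of millions, with a region labeled “Bad Place”)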
  • 48. Delayed Replication
        <requestHandler name="/replication" class="solr.ReplicationHandler">
          <lst name="slave">
            <str name="masterUrl">http://localhost:8983/solr/replication</str>
            <str name="pollInterval">36:00:00</str>
          </lst>
        </requestHandler>
  • 49. Enable/Disable
        <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
          <lst name="invariants">
            <str name="q">MY HARD QUERY</str>
            <str name="shards">http://search1.o19s.com:8983/solr/run2_1,http://search1.o19s.com:8983/solr/run2_2,http://search1.o19s.com:8983/solr/run2_2</str>
          </lst>
          <lst name="defaults">
            <str name="echoParams">all</str>
          </lst>
          <str name="healthcheckFile">server-enabled.txt</str>
        </requestHandler>
  • 51. Provisioning • Chef/Puppet • ZooKeeper • Have you versioned everything needed to build an index?
  • 54. Do I need failover? • Can I build quickly? • Do I have a reliable cluster? • Am I spread across data centers? • Failover is sooo ’90s....
  • 55. Telling some stories • Prototyping • Application Development • Maintaining Your Cluster
  • 59.
  • 60. Thank you! • epugh@o19s.com • @dep4b • www.opensourceconnections.com
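Slide 15 (“Make it easy to change sharding”) loads its indexing strategy by class name, and the speaker notes suggest trying mod sharding at different sizes, or sharding by month, week, or hour. Below is a minimal, self-contained sketch of that idea under stated assumptions: `ShardStrategy`, `ModShardStrategy`, `MonthShardStrategy`, and the option names `shard.count` and `shard.strategy` are hypothetical (the talk's `IndexStrategy` and `com.o19s.solr.ModShardIndexStrategy` are not shown), and documents are plain `Map`s instead of SolrJ `SolrInputDocument`s so it runs without SolrJ. It only decides which shard each document goes to; sending documents to Solr is left out.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical strategy interface (not the talk's actual IndexStrategy).
interface ShardStrategy {
    void configure(Map<String, String> options);

    // Returns the name of the shard (core) a document should be indexed into.
    String shardFor(Map<String, Object> doc);
}

// Mod sharding: non-negative hash of the id modulo the shard count, named shard1..shardN.
class ModShardStrategy implements ShardStrategy {
    private int shardCount = 1;

    @Override
    public void configure(Map<String, String> options) {
        shardCount = Integer.parseInt(options.getOrDefault("shard.count", "1"));
    }

    @Override
    public String shardFor(Map<String, Object> doc) {
        String id = (String) doc.get("id");
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return "shard" + (Math.floorMod(id.hashCode(), shardCount) + 1);
    }
}

// Time sharding: one shard per calendar month (UTC) of the document's timestamp.
class MonthShardStrategy implements ShardStrategy {
    private static final DateTimeFormatter MONTH =
            DateTimeFormatter.ofPattern("yyyy_MM").withZone(ZoneOffset.UTC);

    @Override
    public void configure(Map<String, String> options) {
        // Nothing to configure in this sketch.
    }

    @Override
    public String shardFor(Map<String, Object> doc) {
        Instant ts = (Instant) doc.get("timestamp");
        return "shard_" + MONTH.format(ts);
    }
}

public class ShardingDemo {
    // Instantiates the strategy named in the options (mod sharding by default).
    static ShardStrategy load(Map<String, String> options) throws ReflectiveOperationException {
        String className = options.getOrDefault("shard.strategy", ModShardStrategy.class.getName());
        ShardStrategy strategy = (ShardStrategy) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        strategy.configure(options);
        return strategy;
    }

    // Groups documents by target shard, keeping shards in first-seen order.
    static Map<String, List<Map<String, Object>>> route(ShardStrategy s, List<Map<String, Object>> docs) {
        Map<String, List<Map<String, Object>>> byShard = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            byShard.computeIfAbsent(s.shardFor(doc), k -> new ArrayList<>()).add(doc);
        }
        return byShard;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> options = new HashMap<>();
        options.put("shard.count", "8");
        ShardStrategy strategy = load(options);
        List<Map<String, Object>> docs = List.of(
                Map.<String, Object>of("id", "DOC45242", "timestamp", Instant.parse("2012-05-09T10:00:00Z")),
                Map.<String, Object>of("id", "DOC45243", "timestamp", Instant.parse("2012-06-01T00:00:00Z")));
        route(strategy, docs).forEach((shard, ds) ->
                System.out.println(shard + " <- " + ds.size() + " doc(s)"));
    }
}
```

Because the class name comes from the options, switching from mod shards to monthly shards becomes a configuration change rather than a code change, which is the point the slide makes.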
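Steps 3 and 4 of slide 16 boil down to one CoreAdmin CREATE call per core plus a comma-separated shard list for the aggregator core. The sketch below only builds those strings, under several assumptions. The CoreAdmin handler is mounted at `/solr/admin/cores`, the conventional `adminPath` in a Solr 3.x `solr.xml`; the slide shortens it to `/solr/admin`. Each core's `instanceDir` matches its name, and cores are assigned round-robin across JVMs. Shard entries use the scheme-less `host:port/solr/core` form from the Solr 3.x distributed search examples. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CoreLayout {
    // CoreAdmin CREATE call for one core; assumes adminPath="/admin/cores" and instanceDir == core name.
    static String createCoreUrl(String hostPort, String coreName) {
        String name = URLEncoder.encode(coreName, StandardCharsets.UTF_8);
        return "http://" + hostPort + "/solr/admin/cores?action=CREATE&name=" + name
                + "&instanceDir=" + name;
    }

    // Value for the aggregator's "shards" parameter, with cores assigned round-robin to JVMs.
    static String shardsParam(List<String> hostPorts, List<String> coreNames) {
        List<String> shards = new ArrayList<>();
        for (int i = 0; i < coreNames.size(); i++) {
            String hostPort = hostPorts.get(i % hostPorts.size());
            shards.add(hostPort + "/solr/" + coreNames.get(i));
        }
        return String.join(",", shards);
    }

    public static void main(String[] args) {
        List<String> jvms = List.of("search1.o19s.com:8983", "search2.o19s.com:8983");
        List<String> cores = List.of("run2_1", "run2_2", "run2_3");
        // Same round-robin assignment as shardsParam, so the two outputs agree.
        for (int i = 0; i < cores.size(); i++) {
            System.out.println(createCoreUrl(jvms.get(i % jvms.size()), cores.get(i)));
        }
        System.out.println("shards=" + shardsParam(jvms, cores));
    }
}
```

Each printed CREATE URL can then be issued with curl (quoted, so the shell does not treat `&` as a background operator).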
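Slide 19's “mv is atomic” holds because `mv` within one filesystem is a single `rename(2)`, so a process watching the target directory never sees a half-written file. Across filesystems, `mv` falls back to copy-and-delete and the guarantee is gone. A minimal Java sketch of the same hand-off follows, assuming hypothetical `staging` and `inbox` directories on the same filesystem. Here `Files.move` with `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException` rather than silently copying.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicHandoff {
    // Writes the file in full under staging/, then renames it into inbox/ in one step.
    static Path handOff(Path staging, Path inbox, String fileName, String content) throws IOException {
        Files.createDirectories(staging);
        Files.createDirectories(inbox);
        Path tmp = staging.resolve(fileName);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // Fails with AtomicMoveNotSupportedException instead of copying if the dirs are on different filesystems.
        return Files.move(tmp, inbox.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("pipeline");
        Path ready = handOff(root.resolve("staging"), root.resolve("inbox"), "batch-0001.xml", "<add/>");
        System.out.println("ready for the indexer: " + ready);
    }
}
```

An indexer that only ever reads from `inbox` therefore sees each batch either not at all or complete.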
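The speaker notes for slide 22 say `-verbose:gc` and `-XX:+PrintGCDetails` let you grep for how often partial versus full collections happen. Below is a small sketch of that count. It assumes the pre-Java 9 HotSpot log format, where minor collections are logged as `[GC ...` and full ones as `[Full GC ...`, optionally after a timestamp. Java 9+ unified logging uses a different format and would need different patterns. The two sample lines in `main` are made up to show the shape, not real measurements.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class GcLogCounter {
    // Returns {minor, full} collection counts for pre-Java 9 HotSpot GC log lines.
    static long[] count(List<String> lines) {
        long minor = 0;
        long full = 0;
        for (String line : lines) {
            if (line.contains("[Full GC")) {
                full++;
            } else if (line.contains("[GC")) {
                minor++;
            }
        }
        return new long[] {minor, full};
    }

    public static void main(String[] args) throws IOException {
        // Pass a GC log path, or run without arguments to use two invented sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of(
                        "12.345: [GC 4096K->3584K(8192K), 0.0031 secs]",
                        "98.765: [Full GC 6512K->3000K(8192K), 0.2100 secs]");
        long[] c = count(lines);
        System.out.println("minor=" + c[0] + " full=" + c[1]);
    }
}
```

Comparing these two counts between two Solr versions under the same load is the kind of evidence the notes describe using to decide on a rollback.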
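Slide 27 compares two ways to use Solr as a key/value store, and the lookups differ only in handler and parameters. `/select` runs a normal query on the `id` field. `/get` (real-time get, new in Solr 4) fetches a document by its unique key, including updates not yet committed when the update log is enabled. The sketch below builds both URLs. The helper names are hypothetical, and parameter values are URL-encoded, so `:` and `,` become `%3A` and `%2C`. Ids are assumed to contain no Lucene query-syntax characters, which the `/select` form does not escape.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class KeyValueLookup {
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Lookup through the standard search handler: a query on the id field.
    static String selectUrl(String solrBase, String core, String id, String fl) {
        return solrBase + "/" + core + "/select?q=" + enc("id:" + id) + "&fl=" + enc(fl);
    }

    // Lookup through the real-time get handler (Solr 4+): fetch by unique key.
    static String getUrl(String solrBase, String core, String id, String fl) {
        return solrBase + "/" + core + "/get?id=" + enc(id) + "&fl=" + enc(fl);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(selectUrl(base, "run2_enrichment", "DOC45242", "entities,html"));
        System.out.println(getUrl(base, "run2_enrichment", "DOC45242", "entities,html"));
    }
}
```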
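Slide 30's query uses `intensity_i`, the `*_i` integer dynamic field from Solr's example schema. That is one way to let the application own the schema while `schema.xml` stays generic. Below is a sketch of an application-side mapper under several assumptions: it uses the example-schema suffixes `_i`, `_l`, `_d`, `_b`, `_s`, and `_dt`, and it treats the common fields `id`, `type`, `timestamp`, and `version` as declared explicitly. `AppSchema` and `toSolrFields` are hypothetical names, and the sample record is invented.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class AppSchema {
    // Fields assumed to be declared explicitly in schema.xml, so they keep their names.
    private static final Set<String> COMMON = Set.of("id", "type", "timestamp", "version");

    // Picks a dynamic-field suffix from the Java type of the value (strings by default).
    static String suffixFor(Object value) {
        if (value instanceof Integer) return "_i";
        if (value instanceof Long) return "_l";
        if (value instanceof Double) return "_d";
        if (value instanceof Boolean) return "_b";
        if (value instanceof Instant) return "_dt";
        return "_s";
    }

    // Renames application fields to Solr field names, leaving common fields as-is.
    static Map<String, Object> toSolrFields(Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        record.forEach((name, value) ->
                out.put(COMMON.contains(name) ? name : name + suffixFor(value), value));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> reading = new LinkedHashMap<>();
        reading.put("id", "lamp-17");
        reading.put("type", "streetlamp_monitor");
        reading.put("intensity", 72);
        reading.put("operational", true);
        System.out.println(toSolrFields(reading));
    }
}
```

Adding a new field then means changing only application code; Solr accepts it as long as its suffix matches a dynamic field.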
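On slide 48 the `pollInterval` is written as `HH:mm:ss`, so `36:00:00` tells the slave to poll the master only every 36 hours; that long interval is what makes the replication delayed. A tiny sketch of that arithmetic, using a hypothetical helper that reads the value as hours, minutes, and seconds:

```java
import java.time.Duration;

public class PollInterval {
    // Parses an "HH:mm:ss" string; the hours part may exceed 24.
    static Duration parse(String hhmmss) {
        String[] parts = hhmmss.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected HH:mm:ss, got " + hhmmss);
        }
        return Duration.ofHours(Long.parseLong(parts[0]))
                .plusMinutes(Long.parseLong(parts[1]))
                .plusSeconds(Long.parseLong(parts[2]));
    }

    public static void main(String[] args) {
        Duration d = parse("36:00:00");
        System.out.println(d.toHours() + " hours = " + d.getSeconds() + " seconds");
    }
}
```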
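On slide 49, `healthcheckFile` turns `/admin/ping` into an on/off switch. When the file is absent, the ping reports the node as unavailable (HTTP 503), so a load balancer drops it from rotation, and Solr 4's PingRequestHandler can create or delete the file via `action=enable` and `action=disable`. The sketch below only simulates that file-based switch in plain Java so the logic is visible. It does not call Solr. The class and method names are hypothetical, and placing the file in the core's data directory is an assumption about how the relative path is resolved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HealthcheckSwitch {
    private final Path file;

    HealthcheckSwitch(Path dataDir) {
        // Mirrors <str name="healthcheckFile">server-enabled.txt</str>, assumed relative to the data dir.
        this.file = dataDir.resolve("server-enabled.txt");
    }

    // Puts the node in rotation by creating the file (idempotent).
    void enable() throws IOException {
        if (!Files.exists(file)) {
            Files.createFile(file);
        }
    }

    // Takes the node out of rotation by removing the file (idempotent).
    void disable() throws IOException {
        Files.deleteIfExists(file);
    }

    // Status a load balancer would see: 200 in rotation, 503 out of rotation.
    int pingStatus() {
        return Files.exists(file) ? 200 : 503;
    }

    public static void main(String[] args) throws IOException {
        HealthcheckSwitch node = new HealthcheckSwitch(Files.createTempDirectory("solr-data"));
        node.enable();
        System.out.println("after enable: " + node.pingStatus());
        node.disable();
        System.out.println("after disable: " + node.pingStatus());
    }
}
```

The design choice is that removing a node from service never touches the load balancer configuration: deleting one file is enough.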

Editor's Notes

  2. Search was the original big data problem. Then Google search came along, and search wandered in the wilderness of internal enterprise search and e-commerce search. Now search is back, with a cool new name: “Big Data”. Search interfaces are the dominant metaphor for non-data scientists working with big data sets.
  3. SOLR-284, back in July 2007, was a first cut at a content extraction library before Tika came along.
  5. And I love agile development processes. I think of agile as business -> requirements -> development -> testing -> systems administration.
  12. And I don’t mean this as a shot against Hadoop, but with the right hardware you can get a lot done in bash, with a bit of Java or Perl sprinkled in. There is a lot of value in getting started today building large, scaled-out ingestors.
  14. Notice our property style? It made it easy to read properties in both Bash and Java!
  15. Try sharding at different sizes using mod. Try sharding by month, week, or hour, depending on your volume of data.
  18. We had huge leftover “enterprise” boxes with ginormous amounts of RAM and CPU.
  22. The -verbose:gc and -XX:+PrintGCDetails flags let you grep for the frequency of partial versus full garbage collections. On one project we rolled back from 3.4 to 3.1 based on this data.
  23. Again, horse-racing two slaves can help. You can also pass the connection information on the jconsole command line, which makes it easier to monitor a set of Solrs.
  26. I love working with CSV and Solr. The CSV response writer is great for moving data between Solrs. (Don’t forget to store everything: only stored fields come out!)
  31. You have many fewer Solrs than you do indexer processors.
  41. Dollar Tree makes crap. The stores are always empty or missing items. You don’t want your indexing to be like that. The Space Shuttle cost 500 MILLION dollars every time it launched. You don’t want your indexing process to be like launching the Space Shuttle either.
  45. Runs every hour. Looks at log files to determine if a Solr cluster is misbehaving.
  46. HAL 9000 misbehaved. Runs every hour. Looks at log files to determine if a Solr cluster is misbehaving. This matters especially on a cloud platform, where providers implement their servers on the cheapest commodity hardware.
  57. Kaa the snake from The Jungle Book hypnotizing Mowgli. Danah Boyd, among others, has said that Big Data sometimes throws out thousands of years
  59. Nathan Marz