SlideShare a Scribd company logo
Solr 4 Highlights
                         Mark Miller, LucidImagination.com
                             markrmiller@gmail.com




Friday, October 21, 11
What I Will Cover
               What are some cool features coming in Solr
                4?
               Mark Miller - Lucene/Solr committer, PMC
                member, Apache member, blah blah blah
               DirectSolrSpellChecker
               NRT
               Realtime Get
               SolrCloud - search side
               SolrCloud - indexing side (WIP)


                                   4



Friday, October 21, 11
DirectSolrSpellchecker

        New spellcheck implementation that works with
         the new automaton code.
        A practical benefit of this spellchecker is that it
         requires no additional data structures (neither in
         RAM nor on disk) to do its work.
        Also, no build phase - suggestion are always up
         to date.




                                     7



Friday, October 21, 11
First Came...

  LUCENE-1606, LUCENE-2089:

  Adds AutomatonQuery, a MultiTermQuery
  that matches terms against a finite-state
  machine. Implement WildcardQuery and
  FuzzyQuery with finite-state methods.
  Adds RegexpQuery.

    (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)

                                 4



Friday, October 21, 11
Leading too...

  LUCENE-2507, SOLR-2571, SOLR-2576:


  Added DirectSolrSpellChecker, which
  uses Lucene's DirectSpellChecker to
  retrieve correction candidates directly
  from the term dictionary using levenshtein
  automata.

  (James Dyer, rmuir)

                                 5



Friday, October 21, 11
Automaton




                              6



Friday, October 21, 11
SolrConfig


     <searchComponent name="spellcheck" class="solr.SpellCheckComponent">



          <!-- a spellchecker built from a field of the main index -->
          <lst name="spellchecker">
            <str name="name">default</str>
            <str name="field">name</str>
            <str name="classname">solr.DirectSolrSpellChecker</str>




                                          7



Friday, October 21, 11
Realtime-Get

   SOLR-2656:

   realtime-get, efficiently retrieves the latest
   stored fields for specified documents,
   even if they are not yet searchable (i.e.
   without reopening a searcher)
     (yonik)



                               8



Friday, October 21, 11
Realtime Get
       <updateLog class="solr.FSUpdateLog">
            <str name="dir">${solr.data.dir:}</str>
       </updateLog>




      <!-- realtime get handler, guaranteed to return the latest stored fields of
           any document, without the need to commit or open a new searcher. The
           current implementation relies on the updateLog feature being enabled.
    -->
      <requestHandler name="/get" class="solr.RealTimeGetHandler">
         <lst name="defaults">
           <str name="omitHeader">true</str>
         </lst>
      </requestHandler>




                                           10



Friday, October 21, 11
Realtime Get

      curl 'http://localhost:8983/solr/update/json?commitWithin=10000000'
      -H 'Content-type:application/json' -d '
      [{
         "id" : "mydoc",
         "title" : "realtime-get test!"
      }]'



      http://localhost:8983/solr/get?ids=mydoc,mydoc

      http://localhost:8983/solr/get?id=mydoc&id=mydoc




      {"doc":{"id":"mydoc","title":["realtime-get test!"]}}




                                                10



Friday, October 21, 11
NRT
                         UpdateHandler Re-Architecture

                  Fix commit waits for background merges
                  Fix reloaded core does not share
                   IndexWriter
                  Remove unnecessary locking
                  Add NRT
                  SoftCommit
                  SoftAutoCommit


                                      11



Friday, October 21, 11
Trying to do NRT


           Commit - 1 second.
           Commit - 0.5 seconds.
           Commit - 1.2 seconds.
           Commit - 1.5 minutes. BACKGROUND MERGE
           Commit - 0.9 seconds.




                                 12



Friday, October 21, 11
Previously....


                         SolrCore   Reload ->   SolrCore




                          IW                     IW


          Usually worked because IW is lazy instantiated...but
          overlap is possible. What happens when
          continuously feeding Solr and you do a core reload?
          IW Lock timeout!


                                         13



Friday, October 21, 11
Solution

                         SolrCore    Reload ->   SolrCore




                                       IW




                 Reloaded SolrCore now shares a State object



                                            14



Friday, October 21, 11
NRT AutoCommit

             <updateHandler class="solr.DirectUpdateHandler2">

                         <autoCommit>
                           <maxTime>60000</maxTime>
                         </autoCommit>

                         <autoSoftCommit>
                           <maxTime>1000</maxTime>
                         </autoSoftCommit>




                                             15



Friday, October 21, 11
SolrCloud - Search Side




                                     16



Friday, October 21, 11
SolrCloud - Search Side

     SOLR-1873:

     SolrCloud - added shared/central config
     and core/shard management via
     zookeeper, built-in load balancing, and
     infrastructure for future SolrCloud work.
     (yonik, Mark Miller)



                                     17



Friday, October 21, 11
• /solr
                         ZooKeeper Layout
  /solr/configs
   /solr/configs/conf1
    /solr/configs/conf1/mapping-ISOLatin1Accent.txt
    DATA: ...supressed...
    /solr/configs/conf1/protwords.txt
    DATA: ...supressed...
    /solr/configs/conf1/old_synonyms.txt          Centralized Configuration
    DATA: ...supressed...
    /solr/configs/conf1/solrconfig.xml
    DATA: ...supressed...
    /solr/configs/conf1/stopwords.txt
    DATA: ...supressed...
    /solr/configs/conf1/schema.xml
                                                  Cluster Composition
    DATA: ...supressed...

 /solr/collections/collection1/shards
  /solr/collections/collection1/shards/shard4/localhost:50270_solr_
   DATA:
       node_name=localhost:50270_solr
       url=http://localhost:50270/solr/

                                         18



Friday, October 21, 11
Improvements Coming...
        Modeling the cluster as ZooKeeper nodes is
         cool and convenient. But reading the cluster is
         costly - many requests.
        This can be mitigated with an incremental
         update mode.
        But that makes it difficult to monitor for on the
         fly property changes...too many ZooKeeper
         Watches needed.
        Solution - store the cluster state as the data for
         a single node.
        Downsides: 1MB max data size on zk node.
         Possibly less convenient.
                                19



Friday, October 21, 11
Ephemeral Nodes


       •      /solr/live_nodes
               /solr/live_nodes/localhost:50272_solr
               /solr/live_nodes/localhost:50268_solr
               /solr/live_nodes/localhost:50274_solr
               /solr/live_nodes/localhost:50270_solr
               /solr/live_nodes/localhost:50266_solr

               Nodes know which other nodes are down. There is
               some lag - nodes also manage their own list of good
               nodes to handle the lag time.

                                       20



Friday, October 21, 11
Getting Started
   http://wiki.apache.org/solr/SolrCloud




                                  21



Friday, October 21, 11
SolrCloud - Index Side
                                     Goals

                             Distributed Indexing
                             Fault Tolerant
                             Elastic
                             Durable




                                          22



Friday, October 21, 11
Current Vision
        You figure out you need 20 servers at least to
         serve your index.
        You specify 20 shards to Solr and launch 60
         servers with minimum configuration needed.
        This gives you 20 shards, each with 3 replicas.
        You begin indexing to any of the 60 servers and
         documents are pushed around to where they
         belong and are visible to search within seconds.
        If servers die, you don’t lose data (unless a full
         shard goes down and the data on each server
         is unrecoverable).

                                23



Friday, October 21, 11
SolrCloud - Index Side
                  What’s Being worked on this very moment?


                  Realtime Get
                  Transaction Log
                  Versions
                  Leader per Shard
                  Auto Assign Nodes to Shards
                  Distributed IDF



                                       24



Friday, October 21, 11
Versions
        If two updates to a document are issues around
         the same time, how do we know which
         document should be the latest? More important,
         how can we be sure the same document ‘wins’
         on each replica.
        Lamport Clocks, Vector Clocks, a Leader per
         shard?




                              25



Friday, October 21, 11
Leader Per Shard
        Use ZooKeeper for Leader election.
        Leader can easily version documents.
        Leader is the true record - recovery.
        If the Leader goes down, a replica takes it’s
         place.
        Updates briefly block or are buffered with
         transaction log.




                                 26



Friday, October 21, 11
Auto Assign Shards


                           Collection lock?
                           Optimistic lock?
                           Overseer?




                                      27



Friday, October 21, 11
Cluster

                         Shard1 Shard2   Shard3

              Leaders



              Replicas




                                   28



Friday, October 21, 11
Wrap Up
        These are only a few features - there are many
         more already, and more coming.
        Pseudo Join, Pivot Faceting, Pseudo Fields,
         Per Segment Faceting, Configure Codec Per
         Filed, Use Alternate Scoring Algorithms...

        When? Your guess is as good as mine ;) I hope
         sometime early next year.




                               11



Friday, October 21, 11
Sources
        Links

              • http://wiki.apache.org/solr/
                SpellCheckComponent
              • http://wiki.apache.org/solr/NearRealtimeSearch
              • http://www.lucidimagination.com/blog/
                2011/09/07/realtime-get/




                                     13



Friday, October 21, 11
Contact
        Mark Miller
              • markrmiller@gmail.com /
                mark.miller@lucidimagination.com
              • http://www.lucidimagination.com/blog/author/
                markmiller/
              • http://twitter.com/heismark




                                     14



Friday, October 21, 11

More Related Content

What's hot

Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Tanel Poder
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
SUSE Labs Taipei
 
Oracle linux kube
Oracle linux kubeOracle linux kube
Oracle linux kube
Ahmed Mekawy
 
Vbox virtual box在oracle linux 5 - shoug 梁洪响
Vbox virtual box在oracle linux 5 - shoug 梁洪响Vbox virtual box在oracle linux 5 - shoug 梁洪响
Vbox virtual box在oracle linux 5 - shoug 梁洪响
maclean liu
 
Using Entity Command Buffers – Unite Copenhagen 2019
Using Entity Command Buffers – Unite Copenhagen 2019Using Entity Command Buffers – Unite Copenhagen 2019
Using Entity Command Buffers – Unite Copenhagen 2019
Unity Technologies
 
A MySQL Odyssey - A Blackhole Crossover
A MySQL Odyssey - A Blackhole CrossoverA MySQL Odyssey - A Blackhole Crossover
A MySQL Odyssey - A Blackhole Crossover
Keith Hollman
 
Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...
Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...
Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...InSync2011
 
Where Did My Cpu Go?
Where Did My Cpu Go?Where Did My Cpu Go?
Where Did My Cpu Go?Enkitec
 
Server startup
Server startupServer startup
Server startup
reina200305
 
Linux Du Jour
Linux Du JourLinux Du Jour
Linux Du Jour
mwedgwood
 
CoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdCoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love Systemd
Richard Lister
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Sematext Group, Inc.
 
Kernel debug log and console on openSUSE
Kernel debug log and console on openSUSEKernel debug log and console on openSUSE
Kernel debug log and console on openSUSE
SUSE Labs Taipei
 
New em12c kscope
New em12c kscopeNew em12c kscope
New em12c kscope
Kellyn Pot'Vin-Gorman
 
Enterprise managerclodcontrolinstallconfiguration emc12c
Enterprise managerclodcontrolinstallconfiguration emc12cEnterprise managerclodcontrolinstallconfiguration emc12c
Enterprise managerclodcontrolinstallconfiguration emc12c
Rakesh Gujjarlapudi
 
Oracle upgrade
Oracle upgradeOracle upgrade
Oracle upgrade
Raj p
 
使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder
Yoshiro Tokumasu
 
Windows Debugging with WinDbg
Windows Debugging with WinDbgWindows Debugging with WinDbg
Windows Debugging with WinDbg
Arno Huetter
 

What's hot (18)

Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
 
Oracle linux kube
Oracle linux kubeOracle linux kube
Oracle linux kube
 
Vbox virtual box在oracle linux 5 - shoug 梁洪响
Vbox virtual box在oracle linux 5 - shoug 梁洪响Vbox virtual box在oracle linux 5 - shoug 梁洪响
Vbox virtual box在oracle linux 5 - shoug 梁洪响
 
Using Entity Command Buffers – Unite Copenhagen 2019
Using Entity Command Buffers – Unite Copenhagen 2019Using Entity Command Buffers – Unite Copenhagen 2019
Using Entity Command Buffers – Unite Copenhagen 2019
 
A MySQL Odyssey - A Blackhole Crossover
A MySQL Odyssey - A Blackhole CrossoverA MySQL Odyssey - A Blackhole Crossover
A MySQL Odyssey - A Blackhole Crossover
 
Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...
Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...
Database & Technology 1 _ Clancy Bufton _ Flashback Query - oracle total reca...
 
Where Did My Cpu Go?
Where Did My Cpu Go?Where Did My Cpu Go?
Where Did My Cpu Go?
 
Server startup
Server startupServer startup
Server startup
 
Linux Du Jour
Linux Du JourLinux Du Jour
Linux Du Jour
 
CoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdCoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love Systemd
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
 
Kernel debug log and console on openSUSE
Kernel debug log and console on openSUSEKernel debug log and console on openSUSE
Kernel debug log and console on openSUSE
 
New em12c kscope
New em12c kscopeNew em12c kscope
New em12c kscope
 
Enterprise managerclodcontrolinstallconfiguration emc12c
Enterprise managerclodcontrolinstallconfiguration emc12cEnterprise managerclodcontrolinstallconfiguration emc12c
Enterprise managerclodcontrolinstallconfiguration emc12c
 
Oracle upgrade
Oracle upgradeOracle upgrade
Oracle upgrade
 
使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder
 
Windows Debugging with WinDbg
Windows Debugging with WinDbgWindows Debugging with WinDbg
Windows Debugging with WinDbg
 

Similar to Solr 4 highlights - Mark Miller

How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
Sematext Group, Inc.
 
Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...
Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...
Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...
Marco Vigelini
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
Lucidworks (Archived)
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
lucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
Rafał Kuć
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
Sematext Group, Inc.
 
Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1
Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1
Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1
Osama Mustafa
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
Lucidworks
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
StampedeCon
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovationsLucidworks (Archived)
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
thelabdude
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)
searchbox-com
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr Cloud
Cominvent AS
 
SAOUG - Connect 2014 - Flex Cluster and Flex ASM
SAOUG - Connect 2014 - Flex Cluster and Flex ASMSAOUG - Connect 2014 - Flex Cluster and Flex ASM
SAOUG - Connect 2014 - Flex Cluster and Flex ASM
Alex Zaballa
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
DataStax Academy
 
Understanding Oracle RAC 12c Internals OOW13 [CON8806]
Understanding Oracle RAC 12c Internals OOW13 [CON8806]Understanding Oracle RAC 12c Internals OOW13 [CON8806]
Understanding Oracle RAC 12c Internals OOW13 [CON8806]
Markus Michalewicz
 
RACATTACK Lab Handbook - Enable Flex Cluster and Flex ASM
RACATTACK Lab Handbook - Enable Flex Cluster and Flex ASMRACATTACK Lab Handbook - Enable Flex Cluster and Flex ASM
RACATTACK Lab Handbook - Enable Flex Cluster and Flex ASM
Maaz Anjum
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
Cominvent AS
 
How to operate containerized OpenStack
How to operate containerized OpenStackHow to operate containerized OpenStack
How to operate containerized OpenStack
Nalee Jang
 

Similar to Solr 4 highlights - Mark Miller (20)

How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...
Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...
Oracle Enterprise Manager Cloud Control 12c: how to solve 'ERROR: NMO Not Set...
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1
Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1
Step by Step to Install oracle grid 11.2.0.3 on solaris 11.1
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovations
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr Cloud
 
SAOUG - Connect 2014 - Flex Cluster and Flex ASM
SAOUG - Connect 2014 - Flex Cluster and Flex ASMSAOUG - Connect 2014 - Flex Cluster and Flex ASM
SAOUG - Connect 2014 - Flex Cluster and Flex ASM
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 
Understanding Oracle RAC 12c Internals OOW13 [CON8806]
Understanding Oracle RAC 12c Internals OOW13 [CON8806]Understanding Oracle RAC 12c Internals OOW13 [CON8806]
Understanding Oracle RAC 12c Internals OOW13 [CON8806]
 
RACATTACK Lab Handbook - Enable Flex Cluster and Flex ASM
RACATTACK Lab Handbook - Enable Flex Cluster and Flex ASMRACATTACK Lab Handbook - Enable Flex Cluster and Flex ASM
RACATTACK Lab Handbook - Enable Flex Cluster and Flex ASM
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
 
How to operate containerized OpenStack
How to operate containerized OpenStackHow to operate containerized OpenStack
How to operate containerized OpenStack
 
les09.pdf
les09.pdfles09.pdf
les09.pdf
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
lucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
lucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
lucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
lucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
lucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
lucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
lucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
lucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
lucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
lucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
lucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
lucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
lucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Solr 4 highlights - Mark Miller

  • 1. Solr 4 Highlights Mark Miller, LucidImagination.com markrmiller@gmail.com Friday, October 21, 11
  • 2. What I Will Cover  What are some cool features coming in Solr 4?  Mark Miller - Lucene/Solr committer, PMC member, Apache member, blah blah blah  DirectSolrSpellChecker  NRT  Realtime Get  SolrCloud - search side  SolrCloud - indexing side (WIP) 4 Friday, October 21, 11
  • 3. DirectSolrSpellchecker  New spellcheck implementation that works with the new automaton code.  A practical benefit of this spellchecker is that it requires no additional data structures (neither in RAM nor on disk) to do its work.  Also, no build phase - suggestion are always up to date. 7 Friday, October 21, 11
  • 4. First Came... LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that matches terms against a finite-state machine. Implement WildcardQuery and FuzzyQuery with finite-state methods. Adds RegexpQuery. (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller) 4 Friday, October 21, 11
  • 5. Leading too... LUCENE-2507, SOLR-2571, SOLR-2576: Added DirectSolrSpellChecker, which uses Lucene's DirectSpellChecker to retrieve correction candidates directly from the term dictionary using levenshtein automata. (James Dyer, rmuir) 5 Friday, October 21, 11
  • 6. Automaton 6 Friday, October 21, 11
  • 7. SolrConfig <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <!-- a spellchecker built from a field of the main index --> <lst name="spellchecker"> <str name="name">default</str> <str name="field">name</str> <str name="classname">solr.DirectSolrSpellChecker</str> 7 Friday, October 21, 11
  • 8. Realtime-Get SOLR-2656: realtime-get, efficiently retrieves the latest stored fields for specified documents, even if they are not yet searchable (i.e. without reopening a searcher) (yonik) 8 Friday, October 21, 11
  • 9. Realtime Get <updateLog class="solr.FSUpdateLog"> <str name="dir">${solr.data.dir:}</str> </updateLog> <!-- realtime get handler, guaranteed to return the latest stored fields of any document, without the need to commit or open a new searcher. The current implementation relies on the updateLog feature being enabled. --> <requestHandler name="/get" class="solr.RealTimeGetHandler"> <lst name="defaults"> <str name="omitHeader">true</str> </lst> </requestHandler> 10 Friday, October 21, 11
  • 10. Realtime Get curl 'http://localhost:8983/solr/update/json?commitWithin=10000000' -H 'Content-type:application/json' -d ' [{ "id" : "mydoc", "title" : "realtime-get test!" }]' http://localhost:8983/solr/get?ids=mydoc,mydoc http://localhost:8983/solr/get?id=mydoc&id=mydoc {"doc":{"id":"mydoc","title":["realtime-get test!"]}} 10 Friday, October 21, 11
  • 11. NRT UpdateHandler Re-Architecture  Fix commit waits for background merges  Fix reloaded core does not share IndexWriter  Remove unnecessary locking  Add NRT  SoftCommit  SoftAutoCommit 11 Friday, October 21, 11
  • 12. Trying to do NRT  Commit - 1 second.  Commit - 0.5 seconds.  Commit - 1.2 seconds.  Commit - 1.5 minutes. BACKGROUND MERGE  Commit - 0.9 seconds. 12 Friday, October 21, 11
  • 13. Previously.... SolrCore Reload -> SolrCore IW IW Usually worked because IW is lazy instantiated...but overlap is possible. What happens when continuously feeding Solr and you do a core reload? IW Lock timeout! 13 Friday, October 21, 11
  • 14. Solution SolrCore Reload -> SolrCore IW Reloaded SolrCore now shares a State object 14 Friday, October 21, 11
  • 15. NRT AutoCommit <updateHandler class="solr.DirectUpdateHandler2"> <autoCommit> <maxTime>60000</maxTime> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit> 15 Friday, October 21, 11
  • 16. SolrCloud - Search Side 16 Friday, October 21, 11
  • 17. SolrCloud - Search Side SOLR-1873: SolrCloud - added shared/central config and core/shard management via zookeeper, built-in load balancing, and infrastructure for future SolrCloud work. (yonik, Mark Miller) 17 Friday, October 21, 11
  • 18. • /solr ZooKeeper Layout /solr/configs /solr/configs/conf1 /solr/configs/conf1/mapping-ISOLatin1Accent.txt DATA: ...supressed... /solr/configs/conf1/protwords.txt DATA: ...supressed... /solr/configs/conf1/old_synonyms.txt Centralized Configuration DATA: ...supressed... /solr/configs/conf1/solrconfig.xml DATA: ...supressed... /solr/configs/conf1/stopwords.txt DATA: ...supressed... /solr/configs/conf1/schema.xml Cluster Composition DATA: ...supressed... /solr/collections/collection1/shards /solr/collections/collection1/shards/shard4/localhost:50270_solr_ DATA: node_name=localhost:50270_solr url=http://localhost:50270/solr/ 18 Friday, October 21, 11
  • 19. Improvements Coming...  Modeling the cluster as ZooKeeper nodes is cool and convenient. But reading the cluster is costly - many requests.  This can be mitigated with an incremental update mode.  But that makes it difficult to monitor for on the fly property changes...too many ZooKeeper Watches needed.  Solution - store the cluster state as the data for a single node.  Downsides: 1MB max data size on zk node. Possibly less convenient. 19 Friday, October 21, 11
  • 20. Ephemeral Nodes • /solr/live_nodes /solr/live_nodes/localhost:50272_solr /solr/live_nodes/localhost:50268_solr /solr/live_nodes/localhost:50274_solr /solr/live_nodes/localhost:50270_solr /solr/live_nodes/localhost:50266_solr Nodes know which other nodes are down. There is some lag - nodes also manage their own list of good nodes to handle the lag time. 20 Friday, October 21, 11
  • 21. Getting Started http://wiki.apache.org/solr/SolrCloud 21 Friday, October 21, 11
  • 22. SolrCloud - Index Side Goals  Distributed Indexing  Fault Tolerant  Elastic  Durable 22 Friday, October 21, 11
  • 23. Current Vision  You figure out you need 20 servers at least to serve your index.  You specify 20 shards to Solr and launch 60 servers with minimum configuration needed.  This gives you 20 shards, each with 3 replicas.  You begin indexing to any of the 60 servers and documents are pushed around to where they belong and are visible to search within seconds.  If servers die, you don’t lose data (unless a full shard goes down and the data on each server is unrecoverable). 23 Friday, October 21, 11
  • 24. SolrCloud - Index Side What’s Being worked on this very moment?  Realtime Get  Transaction Log  Versions  Leader per Shard  Auto Assign Nodes to Shards  Distributed IDF 24 Friday, October 21, 11
  • 25. Versions  If two updates to a document are issues around the same time, how do we know which document should be the latest? More important, how can we be sure the same document ‘wins’ on each replica.  Lamport Clocks, Vector Clocks, a Leader per shard? 25 Friday, October 21, 11
  • 26. Leader Per Shard  Use ZooKeeper for Leader election.  Leader can easily version documents.  Leader is the true record - recovery.  If the Leader goes down, a replica takes it’s place.  Updates briefly block or are buffered with transaction log. 26 Friday, October 21, 11
  • 27. Auto Assign Shards  Collection lock?  Optimistic lock?  Overseer? 27 Friday, October 21, 11
  • 28. Cluster Shard1 Shard2 Shard3 Leaders Replicas 28 Friday, October 21, 11
  • 29. Wrap Up  These are only a few features - there are many more already, and more coming.  Pseudo Join, Pivot Faceting, Pseudo Fields, Per Segment Faceting, Configure Codec Per Filed, Use Alternate Scoring Algorithms...  When? Your guess is as good as mine ;) I hope sometime early next year. 11 Friday, October 21, 11
  • 30. Sources  Links • http://wiki.apache.org/solr/ SpellCheckComponent • http://wiki.apache.org/solr/NearRealtimeSearch • http://www.lucidimagination.com/blog/ 2011/09/07/realtime-get/ 13 Friday, October 21, 11
  • 31. Contact  Mark Miller • markrmiller@gmail.com / mark.miller@lucidimagination.com • http://www.lucidimagination.com/blog/author/ markmiller/ • http://twitter.com/heismark 14 Friday, October 21, 11