tl;dr: Solr




     
Dumbledore: "I use the Pensieve. One simply siphons the excess thoughts from one's mind, 
                          pours them into the basin, and examines them at one's leisure. It becomes 
                          easier to spot patterns and links, you understand, when they are in this form."
    Harry:           "You mean... that stuff's your thoughts?"
    Dumbledore: "Certainly."




                                   




                                   
Solr is Lucene-based
    
        Lucene = text search engine library written in Java
    
        All kinds of crazy goodies:
        
          Ranked search
        
          Multiple indexing
        
          Simultaneous read & write
        
          Date-range search
        
          ...the list goes on
    
        Platform-independent (thanks, Java!)
    
        Fast & efficient
          
             Index size ~= 20-30% of the size of the indexed data
          
             Very high throughput indexing (95GB/hour)




                             
Solr is NoSQL
    
        NoSQL == Non-relational database
    
        RDBMS metaphor:
        
          One database
        
          One table
        
          Denormalized data
        
          Query parameters instead of SQL
        
          “Documents” instead of rows
    
        Bottom line: it's a persistent datastore, and we use it to store data 
        persistently.
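
        To make "query parameters instead of SQL" concrete, here is a minimal sketch. The host, port, and field names (title, id) are assumptions for illustration, not taken from the talk; q, fl, rows, and wt are standard Solr request parameters.

```shell
# Assumed local Solr instance; adjust host, port, and core path as needed.
SOLR_URL="http://localhost:8983/solr"

# Roughly "SELECT id, title FROM docs WHERE title = 'solr' LIMIT 10",
# expressed as Solr query parameters instead of SQL.
solr_select_url() {
    # $1 = query in Solr syntax, $2 = comma-separated field list, $3 = row limit
    printf '%s/select?q=%s&fl=%s&rows=%s&wt=json\n' "$SOLR_URL" "$1" "$2" "$3"
}

solr_select_url 'title:solr' 'id,title' 10
```

        Against a running instance, the printed URL could be fetched with curl -s. Queries containing spaces or special characters would need URL-encoding first.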




                              
Vocabulary
    
      Master
    
      Slave
    
      Replication
    
      Document
    
      API




                     
Master
    
      There can be only one
    
      Read & write operations
    
      Must be secure
    
      Younger, stronger brother of production DB
    
      Home base for Solr slaves




                    
Slave
    
      There are many copies
    
      They have a plan: replication
    
      Read-only
    
      Gets a copy of the index from the Solr master every k 
      minutes
    
      Responds to queries  




                    
Replication
    
      Slaves --HTTP GET--> Master
    
      Replication is differential
    
      Configuration is set in solrconfig.xml
    
      http://tinyurl.com/DESolrRepl
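
      Because replication runs over plain HTTP, its state can be inspected with the same tools as any other URL. The sketch below assumes the stock ReplicationHandler is mounted at /replication; the host names are hypothetical.

```shell
# Hypothetical hosts; substitute the real master and slave addresses.
MASTER="http://solr-master.example.com:8983/solr"
SLAVE="http://solr-slave.example.com:8983/solr"

# Build a ReplicationHandler URL: $1 = base URL, $2 = command name.
replication_url() {
    printf '%s/replication?command=%s&wt=json\n' "$1" "$2"
}

replication_url "$MASTER" indexversion   # which index version is the master serving?
replication_url "$SLAVE" details         # what has the slave replicated so far?
replication_url "$SLAVE" fetchindex      # ask the slave to pull now instead of waiting
```

      Each printed URL can be passed to curl -s. Comparing the master's indexversion with the slave's details is one quick way to see whether a slave has fallen behind.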




                     
Document
    
      RDBMS = row; Solr = document
    
      Denormalized relational data
    
      Flatten a bunch of related RDBMS rows into a 
      single Solr document
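
      A rough illustration of the flattening step, using made-up data: three joined rows (id, title, tag) collapse into two documents, with the repeated tag column becoming a multi-valued field. The column names and values are hypothetical.

```shell
# Hypothetical joined rows (id, title, tag): one row per tag in the RDBMS.
rows='1,Intro to Solr,search
1,Intro to Solr,lucene
2,Unix Tools,shell'

# Collapse rows sharing an id into one document with a multi-valued "tags" field.
flatten_rows() {
    awk -F',' '
        # First time an id is seen: remember its position and title.
        !($1 in title) { order[++n] = $1; title[$1] = $2 }
        # Every row: append the quoted tag to the list for that id.
        {
            if ($1 in tags) tags[$1] = tags[$1] ", \"" $3 "\""
            else            tags[$1] = "\"" $3 "\""
        }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                printf "{\"id\": \"%s\", \"title\": \"%s\", \"tags\": [%s]}\n", id, title[id], tags[id]
            }
        }'
}

printf '%s\n' "$rows" | flatten_rows
```

      The sketch assumes values contain no commas or quotes; real data would need a proper CSV parser and JSON escaping.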
                   
API
    
      Application programming interface
    
      Primary means of communicating with Solr is an 
      HTTP API
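
      Writes go through the same HTTP API as reads. The sketch below builds a one-document JSON body and defines (but does not call) a helper that would post it. It assumes a local instance whose /update handler accepts JSON, which depends on the Solr version and configuration; the field names are illustrative.

```shell
SOLR_URL="http://localhost:8983/solr"   # assumed local instance

# Build a one-document JSON update body (field names are illustrative).
solr_doc_json() {
    # $1 = id, $2 = title; values are assumed to contain no quotes or backslashes
    printf '[{"id": "%s", "title": "%s"}]\n' "$1" "$2"
}

# Send the body to the update handler and commit immediately.
# Defined only; calling it requires a running Solr.
solr_add() {
    solr_doc_json "$1" "$2" |
        curl -s -H 'Content-Type: application/json' \
             --data-binary @- "$SOLR_URL/update?commit=true"
}

solr_doc_json 42 'Intro to Solr'
```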




                    
The Good Stuff:
                    Unix & Diagnostics
                       “This  is  the Unix  philosophy:  Write programs  that 
                       do  one  thing  and  do  it  well.  Write  programs  to 
                       work  together.  Write  programs  to  handle  text 
                       streams, because that is a universal interface.” 
                                                               ­ Doug McIlroy


    
        Examples of things beyond the scope of this talk:
        
          cat
        
          awk
        
          grep
        
          sed
        
          cut
        
          wc
        
          sort
        
          tail
        
          head
    
        Great read: http://matt.might.net/articles/sql-in-the-shell/


                                
The Good Stuff:
                      Unix & Diagnostics
    
        You cannot effectively troubleshoot without parsing logs
    
        You cannot effectively parse logs without good text-parsing tools:
        
          cat
        
          awk
        
          grep
        
          sed
        
          cut
        
          wc
        
          sort
        
          tail
        
          head
    
        No *nix OS? PowerShell!
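
        As a small taste of how these tools compose, the sketch below counts log lines per level, roughly a GROUP BY in the shell. The log lines and their layout (date, time, level, message) are invented for the example.

```shell
# Hypothetical log lines: date, time, level, message.
sample_log='2012-01-05 10:00:01 INFO indexing started
2012-01-05 10:00:02 ERROR DocumentInvalid
2012-01-05 10:00:03 INFO indexing finished'

# Roughly "SELECT level, COUNT(*) ... GROUP BY level ORDER BY COUNT(*) DESC".
count_by_level() {
    cut -d' ' -f3 |               # keep only the level column
        sort |                    # group identical levels together
        uniq -c |                 # count each group
        sort -rn |                # most frequent first
        awk '{print $2 " " $1}'   # print "LEVEL COUNT"
}

printf '%s\n' "$sample_log" | count_by_level
```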




                                
The Good Stuff:
                   Unix & Diagnostics
    
        Example commands:
        
          tail -f /var/log/celery/project.log
          
            Output the Celery log to stdout, in real time
        
          cat /ebs2/log/celery/project.log |
            grep -oE 'BUID:([0-9]{1,5})' | grep -oE '[0-9]{1,5}' | sort --unique
          
            Parse the Celery log, printing a list of unique BUIDs
        
          cat /ebs2/log/celery/project.log |
            grep -B 15 "DocumentInvalid" |
            grep -E 'Download complete for BUID ([0-9]{1,5})' |
            awk '{sub(/\[/, ""); print $1 " " $2 " " $7 ":" $8}'
          
            Parse the Celery log, listing the BUIDs whose feed
            files failed for some reason
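
        To try the unique-BUID pipeline without access to the real log, it can be run against a few invented lines. The log format below is an assumption; only the BUID:<digits> token matters to the pipeline. sort -u is the short form of sort --unique.

```shell
# Hypothetical log excerpt; the real Celery log format may differ.
sample_log='[2012-01-05 10:00:01] INFO Starting task for BUID:101
[2012-01-05 10:00:07] INFO Starting task for BUID:202
[2012-01-05 10:01:13] INFO Retrying task for BUID:101'

# Extract every BUID token, strip the prefix, and de-duplicate.
unique_buids() {
    grep -oE 'BUID:[0-9]{1,5}' | grep -oE '[0-9]{1,5}' | sort -u
}

printf '%s\n' "$sample_log" | unique_buids
```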




                            
Conclusion
    
        RTFreakingM
        
           http://wiki.apache.org/solr/SolrQuerySyntax
        
           http://wiki.apache.org/solr/SolrCaching
        
           http://wiki.apache.org/solr/SchemaXml
        
           http://django-haystack.readthedocs.org/en/latest/
    
        Experiment & tinker & reinvent the wheel
    
        Get comfortable with the command line – you can't effectively administer Solr 
         (or any sufficiently complex system) with a web GUI
    
        Read the logs
    
        Connect Solr behavior to application operations




                                
     
