
Introduction to Apache Solr


Introduction to Solr, presented at a Bangkok meetup in April 2014.

Covers high-level use cases for Solr. Demos include support for the Thai language (with a GitHub link to the source).

Includes slides showcasing the Solr ecosystem, as well as a couple of ideas for possible Solr-specific learning projects.

Published in: Education, Technology

  1. Introduction to Apache Solr. "Software is eating the world"; the search is eating the software. April 2014
  2. Alexandre Rafalovitch
  3. Web search engines are quite sophisticated
  4. [no text]
  5. But the real search needs are much DEEPER and BROADER
  6. Searching code
  7. Searching people and companies
  8. Searching products
  9. Searching library material
  10. Searching languages
  11. Understanding full-text search
      - SELECT * FROM database WHERE field LIKE '%word%' - this DOES NOT scale
      - Instead:
        - break text into tokens
        - domain-specific processing (e.g. lower-casing)
        - build fast-access structures
        - algorithms for term, phrase, and proximity search
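The steps on slide 11 (tokenize, normalize, build a fast-access structure, then answer term and phrase queries) can be sketched as a tiny positional inverted index. This is an illustrative toy, not how Lucene or Solr is implemented; the function names and sample documents are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Break text into tokens and lower-case them."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build the fast-access structure: token -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(position)
    return index


def search_term(index, term):
    """Return the ids of documents containing a single term."""
    return sorted(index.get(term.lower(), {}))


def search_phrase(index, phrase):
    """Return the ids of documents containing the tokens at consecutive positions."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    # Only documents that contain every token can contain the phrase.
    candidates = set(index.get(tokens[0], {}))
    for token in tokens[1:]:
        candidates &= set(index.get(token, {}))
    hits = []
    for doc_id in candidates:
        for start in index[tokens[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(tokens)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents.
docs = {
    1: "Solr is built on Lucene",
    2: "Lucene powers full-text search",
    3: "Search is eating the software",
}
index = build_index(docs)
print(search_term(index, "LUCENE"))
print(search_phrase(index, "full text search"))
```

A lookup touches only the entries for the query tokens, whereas a `LIKE '%word%'` filter has to scan every row; storing positions is what makes phrase and proximity matching possible.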
  12. Basic search engine features
      - Search (Duh!): keyword, phrase, field-specific
      - Positive and negative terms
      - Sort: relevancy, recency
      - Pagination
      - Compact summary in results
      - SPEED
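Most of the features on slide 12 map onto request parameters of Solr's select handler: `q` for the query (with `field:value`, quoted phrases, and `+`/`-` prefixes for required and excluded terms), `sort`, `start` and `rows` for pagination, and `fl` to trim the returned fields. A minimal sketch that only builds the request URL; `build_select_url` is a hypothetical helper, and the core name `collection1` and the fields `name` and `price` are assumptions based on the Solr 4 example setup.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, query, sort=None, page=0, page_size=10, fields=None):
    """Build a Solr /select URL with sorting, pagination and a field list."""
    params = {
        "q": query,
        "start": page * page_size,  # offset of the first result on this page
        "rows": page_size,          # number of results per page
        "wt": "json",               # response format
    }
    if sort:
        params["sort"] = sort
    if fields:
        params["fl"] = ",".join(fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/collection1",  # assumed core location
    'name:"hard drive" +memory -ipod',         # phrase, required and excluded terms
    sort="price asc",
    page=2,
    page_size=10,
    fields=["id", "name", "price"],
)
print(url)
```

Leaving `sort` unset falls back to Solr's default ordering by relevancy score.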
  13. Advanced search engine features
      - Facets/Taxonomy-based navigation with live counts
      - Language-specific processing
      - Domain-specific text processing (WiFi = Wi-Fi = WIFI)
      - Geographic search
      - More-like-this, did-you-mean, autocomplete
      - Scaling/Clustering
      - NOT web crawling - different, but related
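The "WiFi = Wi-Fi = WIFI" item on slide 13 boils down to mapping spelling variants to one index key, applied identically at index time and at query time. A minimal sketch of the idea in plain Python; Solr does this through analyzer chains configured in the schema, and the `normalize` helper below is a hypothetical stand-in rather than Solr's implementation. It is also deliberately crude: it strips anything that is not a letter or digit, which would be too aggressive for scripts that rely on combining marks.

```python
import re


def normalize(token):
    """Lower-case a token and drop punctuation and spaces so variants collapse."""
    return re.sub(r"[\W_]+", "", token.lower())


def same_term(a, b):
    """True when two spellings normalize to the same index key."""
    return normalize(a) == normalize(b)


variants = ["WiFi", "Wi-Fi", "WIFI", "wi fi"]
keys = {normalize(v) for v in variants}
print(keys)  # {'wifi'}
```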
  14. Search engine solutions? Solr, Elastic Search, Xapian, Sphinx, Zoie, Groonga, Searchdaimon, {F}lexSearch, Algolia (SaaS), Searchify (SaaS), ForageJS, Lunr.js, FACT-Finder, DtSearch, MarkLogic, Verity, Fast, most databases ...AND MORE
  15. Open Source Search Evolution (used with permission from SemaText)
  16. Secret Ingredient - Lucene
      - Solr, Elastic Search, Zoie, SwiftType, PyLucene (Python wrapper), (C# port)
      - Scalable, high-performance indexing
      - Incremental indexing
      - Full-text search
      - Information-Retrieval algorithms
      - Implemented in Java
      - Written in 1999, still going strong
  17. Secret Ingredient - Solr
      - Certified distributions: LucidWorks, HelioSearch
      - Big Data platforms: Cloudera, Hortonworks HDP
      - Hosted and SaaS: Amazon CloudSearch, WebSolr, SolrHQ, SearchBox
      - Lucene full-text search
      - XML and REST config
      - Schema/Schemaless
      - SolrCloud (clustering)
      - Caching
      - Near real-time
      - Rich-document indexing (Tika inside)
      - Plugins, components, processors
  18. Solr Ecosystem sample: Drupal, Project Blacklight, LuxDB, SolrMeter, CrafterCMS, Typo3, Magenta, HippoCMS, ColdFusion, SolrNet, DataStax, Dovecot, NGData Lily, Basho Riak, YaCy, Apache ManifoldCF, Apache Camel, Franz AllegroGraph, BitNami Solr Stack, Carrot2, Broadleaf Commerce, Cloudera CDK, CodeLibs Fess (フェス), Splunk, Alfresco, Rosette by BasisTech, Luwak by Flax, Quepid by OSC, TwigKit, SPM by SemaText, SILK by LucidWorks, Banana (O/S Solr Kibana)
  19. DEMO Time
  20. DEMO - Basic
      - Unzip
      - Go to example directory
      - Run Solr
      - Import some documents from example docs
      - grep -l store *.xml | xargs ./
      - Show off Solr 4 admin panel
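The example documents imported in this demo are XML files in Solr's update format: an `<add>` element wrapping one `<doc>` per document, with one `<field name="...">` per value. A minimal sketch that produces such a payload from Python dictionaries; `to_solr_add_xml` is a hypothetical helper, and the field names are assumptions loosely modelled on the example schema.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(docs):
    """Serialize dictionaries into a Solr <add><doc>...</doc></add> payload."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes several <field> elements with the same name.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


payload = to_solr_add_xml([
    {"id": "demo-1", "name": "Sample hard drive", "cat": ["electronics", "hard drive"]},
])
print(payload)
```

The resulting string is what would be sent to the core's update handler, followed by a commit, to make the documents searchable.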
  21. DEMO - Browse handler
      - Restart Solr with -Dsolr.clustering.enabled=true
      - Visit http://localhost:8983/solr/browse/
      - Show off:
        - Search
        - Facets - Categories and Ranges
        - Spatial/Geo-distance
        - Clusters
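The facets and geo-distance filtering that the browse page shows off can also be requested directly through query parameters: `facet.field` for category counts, `facet.range` with start, end and gap for ranges, and a `{!geofilt}` filter query for distance. A sketch under stated assumptions: `build_facet_params` is a hypothetical helper, and the fields `cat`, `price` and `store` are assumed to exist as in the Solr 4 example schema.

```python
from urllib.parse import urlencode, parse_qsl


def build_facet_params(query, facet_fields=(), price_range=None, near=None):
    """Build a query string with field facets, a price range facet and a geo filter."""
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    for field in facet_fields:
        params.append(("facet.field", field))
    if price_range:
        start, end, gap = price_range
        params += [
            ("facet.range", "price"),  # assumed numeric field
            ("facet.range.start", start),
            ("facet.range.end", end),
            ("facet.range.gap", gap),
        ]
    if near:
        lat, lon, distance_km = near
        # geofilt keeps documents whose 'store' location lies within distance_km of the point
        params.append(("fq", "{!geofilt sfield=store pt=%s,%s d=%s}" % (lat, lon, distance_km)))
    return urlencode(params)


query_string = build_facet_params(
    "*:*",
    facet_fields=["cat"],
    price_range=(0, 1000, 250),
    near=(45.15, -93.85, 5),
)
print(query_string)
```

A list of pairs is used instead of a dictionary because parameters such as `facet.field` and `fq` may legitimately repeat.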
  22. DEMO - Thai specific
      - Index Thai and English text
      - Search in English, Thai, auto-transliterated Thai
      - Show Analysis screen
      - Code at:
  23. Getting into Solr
  24. Start for free
      - Download, unzip, cd example; java -jar start.jar
      - Go through the basic tutorial in docs/tutorial.html
      - Copy the example directory, modify schema.xml until happy
      - If coming from ElasticSearch, look at example-schemaless
      - Do NOT follow this path to production: the example schema is a kitchen sink!!!
  25. Accelerate your learning
      - Buy my book - seriously. That's what it's for
      - All code/data is at:
      - Buy Solr in Action - just published and a great reference
      - Use my resource and join the mailing list
      - Join the solr-user mailing list - full of advanced hackers
      - Watch Lucid Revolution videos for background
      - Start helping out on Stack Overflow #solr
      - Blog what you learned, tweet with #Solr
  26. Pick a project - make it happen
      - Solr + Dart => Better search experience for Dart packages
      - Solr consultants discovery website
      - Visualise Solr search request - step by step
      - Solr + your language => is the client library up to date?
      - ToDoMVC for Solr clients
      - Package a LARGE dataset for others (e.g. Project Gutenberg)
      - Rebuild an Esperanto dictionary with a Solr backend
  27. With Solr, how far can I go?
      - Cloudera (Big Data) has more than USD 1,000,000,000 in investments - opportunities?
      - 8M+ searches/day, 40 languages, 100ms NRT, 1024 cores, 256 shards, 32 servers on #solr at Bloomberg 1jmG72G (via @FlaxSearch)
  28. Other search-related books
      - Designing the Search Experience: The Information Architecture of Discovery - by a TwigKit creator +1
      - Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld - see also Quepid
      - Enterprise Search by Martin White
  29. Alexandre Rafalovitch