Keynote: Yonik Seeley & Steve Rowe, Lucene/Solr Roadmap


Published in: Education, Technology

  1. 1. Lucene Roadmap
        Steve Rowe, LucidWorks
  2. 2. Some Lucene (& Solr) History & Stats
        • 1997: Doug Cutting creates Lucene
        • 2000-2001: SourceForge hosts Lucene
        • 2001-present: Lucene @ Apache Software Foundation
        • 2006: Flexible indexing planning starts
        • 2007: Solr graduates from the Apache Incubator to join the Lucene PMC as a sub-project
        • 2008: Flexible indexing implementation begins
        • 2010: Lucene and Solr development merge
        • 2011: Lucene and Solr 3.1 and all further releases coordinated (13 joint releases so far)
        • 2012: Lucene/Solr 4.0 released
  3. 3. Lucene 4.0 Highlights
        • Flexible indexing: pluggable codecs (index format suites)
        • Flexible scoring: more index stats & similarities that use them
        • Faster multithreaded indexing via concurrent flushing: DWPT
        • Doc Values: typed single-valued fields for flexible sorting and scoring
        • Norms are now doc values: you can have more than one byte!
        • More RAM-efficient data structures, e.g. terms dict/idx & fieldcache
        • Faster search filtering
        • Merge I/O can be rate-limited, to reduce I/O contention
        • IndexReader is now per-segment
        • Completely reworked spatial search
  4. 4. Lucene 4.1 & 4.2 Highlights
        • Seeks eliminated when writing out index files
        • Compressed stored fields and term vectors
        • AnalyzingSuggester and FuzzySuggester
        • Lucene facet module improvements: speedups, NRT support, DrillSideways
        • PostingsHighlighter: uses postings offsets
        • CommonTermsQuery: speeds up queries with very highly frequent terms
        • Doc Values API and performance improvements
        • The FST package supports FSTs over 2GB in size
        • LiveFieldValues: real-time get for Lucene
        • New classification module
  5. 5. Lucene 4.3 Highlights
        • minShouldMatch BooleanQuery: major performance improvement
        • SortingAtomicReader and SortingMergePolicy
        • DocIdSetIterator and Scorer now have a cost API
        • Analyzing/FuzzySuggester now enable recording an arbitrary byte[] as a payload
        • Spatial module: support for query relations Within, Contains, and Disjoint
        • Facet module: new method computes facet counts using SortedSetDocValuesField, without a separate taxonomy index
  6. 6. On the horizon
        • More efficient positional queries
        • Incremental field updates
        • Korean Analyzer
  7. 7. Solr Dev/User Survey Results
  8. 8. Solr Developer/User survey, April 2013
        • Survey invitation emailed to 4,136 people:
          – LucidWorks training class attendees
          – Revolution attendees
          – LucidWorks webinar registrants
        • 177 have responded so far
  9. 9. Please rank the following features by priority
        Answered: 165, Skipped: 12
  10. 10. More questions
          1. How many attendees are Eclipse developers?
          2. How many attendees are running Solr Cloud in production?
  11. 11. Solr: Past, Present & Future
          Yonik Seeley, LucidWorks
  12. 12. Origins of Solr
          • CNET driven to find alternatives to discontinued commercial enterprise search product
          • Plan A: ATOMICS (Apache TO MySQL In CNET Search)
            – Standalone server speaking XML over HTTP
            – Meet majority of “search” needs
          • Plan B: “Something based on Lucene”
            – Started Summer 2004
            – First prototype called “Fusion”, later renamed SOLAR (Search On Lucene And Resin)
  13. 13. Origins of the first Solr admin UI
  14. 14. New admin UI
  15. 15. Timeline (up to 1.4)
          [Timeline figure; labels listed in extraction order]
          • Initial prototype
          • CNET production
          • CNET contributes Solr to ASF
          • Solr graduates from Incubator
          • Simple faceting, replication, highlighting, dismax
          • Spellchecking, CSV, Luke, MLT, UpdateRequestProcessors
          • QParsers, SearchComponents
          • Multi-core
          • Distributed Search
          • Data Import Handler
          • JMX
          • Release markers: 1.3, 1.4
          • StatisticsComponent
          • Java Replication
          • Terms and TermVector Components
          • Multi-select faceting
          • Dynamic Clustering
  16. 16. Solr 4
          • Solr Cloud
            – Distributed Indexing
            – No single points of failure
            – Near Real Time friendly (push replication)
          • NoSQL feature set
            – Update Durability
            – Real-time get
            – Atomic Updates
            – Optimistic Concurrency
          • Pseudo-join, Pivot Faceting, Pseudo-fields, etc.
  17. 17. What search solution/version are you currently using?
  18. 18. Recent Enhancements
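  19. 19. Document Routing
          [Diagram: hash ring divided among shard1, shard2, shard3, and shard4]
          • numShards=4, router=compositeId
          • Hash ranges on the ring: 00000000-3fffffff, 40000000-7fffffff, 80000000-bfffffff, c0000000-ffffffff
          • Indexing: id = BigCo!doc5 → (MurmurHash3) → 1f27 3c71
          • Querying: q=my_query, shard.keys=BigCo! → hash range 1f27 0000 to 1f27 ffff
          • (hash) → shard1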
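The routing on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Solr's implementation: it assumes the composite hash takes its upper 16 bits from the hash of the prefix before `!` and its lower 16 bits from the hash of the rest of the id, which is consistent with the slide's `1f27 0000` to `1f27 ffff` range. `zlib.crc32` is only a stand-in for MurmurHash3, so the numbers will not match the slide, and the shard name attached to each range (other than shard1) is an assumption.

```python
import zlib

# Hash ranges copied from the slide. Which shard name owns which range is an
# assumption for this sketch (the slide only shows the example landing on shard1).
SHARD_RANGES = {
    "shard1": (0x00000000, 0x3FFFFFFF),
    "shard2": (0x40000000, 0x7FFFFFFF),
    "shard3": (0x80000000, 0xBFFFFFFF),
    "shard4": (0xC0000000, 0xFFFFFFFF),
}


def stand_in_hash(text):
    """32-bit stand-in hash; the slide uses MurmurHash3, this uses CRC32."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id):
    """Upper 16 bits from the prefix hash, lower 16 bits from the remainder hash."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:  # no "!" in the id: hash the whole id
        return stand_in_hash(doc_id)
    return (stand_in_hash(prefix) & 0xFFFF0000) | (stand_in_hash(rest) & 0x0000FFFF)


def prefix_range(shard_key):
    """Hash range covering every id that shares the given prefix (e.g. 'BigCo!')."""
    top = stand_in_hash(shard_key.rstrip("!")) & 0xFFFF0000
    return top, top | 0x0000FFFF


def shard_for(hash_value):
    """Find the shard whose range contains the hash."""
    for name, (lo, hi) in SHARD_RANGES.items():
        if lo <= hash_value <= hi:
            return name
    raise ValueError(f"hash {hash_value:08x} is outside all shard ranges")


if __name__ == "__main__":
    h = composite_hash("BigCo!doc5")
    lo, hi = prefix_range("BigCo!")
    print(f"{h:08x} lies in {lo:08x}-{hi:08x} on {shard_for(h)}")
```

Because every id with the same prefix shares its upper 16 bits, a query restricted with `shard.keys=BigCo!` only needs the shard (or shards) covering that one 64K-wide range.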
  20. 20. Seamless Online Shard Splitting
          [Diagram: Shard1, Shard2, and Shard3, each with a leader and a replica; Shard2 is split into sub-shards Shard2_0 and Shard2_1 while an update arrives]
          1. New sub-shards created in “construction” state
          2. Leader starts forwarding applicable updates, which are buffered by the sub-shards
          3. Leader index is split and installed on the sub-shards
          4. Sub-shards apply buffered updates then become “active” leaders and old shard becomes “inactive”
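The four steps above can be simulated with plain dictionaries. This is a toy model under stated assumptions, not Solr code: the `SubShard` class and `split_shard` function are hypothetical, an index is a dict keyed by document hash, and the parent's hash range is assumed to be cut into two equal halves.

```python
class SubShard:
    """Hypothetical sub-shard: buffers updates until it becomes active."""

    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi
        self.state = "construction"  # step 1: created in "construction" state
        self.index = {}              # doc hash -> document
        self.buffer = []             # updates received while under construction

    def covers(self, doc_hash):
        return self.lo <= doc_hash <= self.hi


def split_shard(parent_name, parent_index, lo, hi, updates_during_split):
    # Step 1: create two sub-shards, each covering half of the parent's range
    # (the equal split is an assumption of this sketch).
    mid = lo + (hi - lo) // 2
    subs = [
        SubShard(f"{parent_name}_0", lo, mid),
        SubShard(f"{parent_name}_1", mid + 1, hi),
    ]

    # Step 2: the leader forwards each applicable update; sub-shards buffer it.
    for doc_hash, doc in updates_during_split:
        for sub in subs:
            if sub.covers(doc_hash):
                sub.buffer.append((doc_hash, doc))

    # Step 3: the leader's index is split and installed on the sub-shards.
    for doc_hash, doc in parent_index.items():
        for sub in subs:
            if sub.covers(doc_hash):
                sub.index[doc_hash] = doc

    # Step 4: sub-shards replay buffered updates and become active;
    # the old shard becomes inactive.
    for sub in subs:
        for doc_hash, doc in sub.buffer:
            sub.index[doc_hash] = doc
        sub.buffer.clear()
        sub.state = "active"
    return subs, "inactive"


if __name__ == "__main__":
    subs, parent_state = split_shard(
        "Shard2", {0x10: "a", 0x90: "b"}, 0x00, 0xFF, [(0x20, "c"), (0x90, "b2")]
    )
    for sub in subs:
        print(sub.name, sub.state, sub.index)
    print("Shard2 is now", parent_state)
```

Buffering in step 2 is what keeps the split online: updates that arrive while the index is being split are replayed afterwards instead of being lost or blocked.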
  21. 21. Cloud Enhancements
          • Request forwarding
            – In a multi-collection cluster, any node can handle/forward requests for any collection
          • Collection Aliases
            http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=northeast&collections=NY,NJ,PA,CT,ME,MA,NH,RI,VT
          • Coming Soon: Shard Aliases
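As a small illustration, the CREATEALIAS URL from this slide can be assembled programmatically. The helper name `create_alias_url` is hypothetical; only the endpoint and the `action`, `name`, and `collections` parameters come from the slide.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that aliases several collections under one name."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the comma-separated collection list readable in the URL.
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    states = ["NY", "NJ", "PA", "CT", "ME", "MA", "NH", "RI", "VT"]
    print(create_alias_url("http://localhost:8983/solr", "northeast", states))
```

Issuing a GET against the resulting URL requires a running Solr Cloud cluster; the sketch only builds the string.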
  22. 22. Schema REST API
          • Restlet is now integrated with Solr
          • Get a specific field
            curl http://localhost:8983/solr/schema/fields/price
            {"field":{"name":"price","type":"float","indexed":true,"stored":true }}
          • Get all fields
            curl http://localhost:8983/solr/schema/fields
          • Get Entire Schema!
            curl http://localhost:8983/solr/schema
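The same read-only calls can be made from Python's standard library. This is a minimal sketch: the helper names are hypothetical, and the base URL and response shape are taken from the slide's `price` example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

SCHEMA_URL = "http://localhost:8983/solr/schema"  # base URL as shown on the slide


def field_url(name=None):
    """URL for one field, or for all fields when no name is given."""
    if name is None:
        return f"{SCHEMA_URL}/fields"
    return f"{SCHEMA_URL}/fields/{quote(name, safe='')}"


def parse_field(body):
    """Extract the field definition from a single-field JSON response."""
    return json.loads(body)["field"]


def get_field(name):
    """Fetch and parse one field definition (needs a running Solr instance)."""
    with urlopen(field_url(name)) as resp:
        return parse_field(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Parse the sample response from the slide without contacting a server.
    sample = '{"field":{"name":"price","type":"float","indexed":true,"stored":true }}'
    print(parse_field(sample))
```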
  23. 23. Dynamic Schema
          • Add a new field (Solr 4.4)
            curl -XPUT http://localhost:8983/solr/schema/fields/strength -d '{"type":"float", "indexed":"true"}'
          • Works in distributed (cloud) mode too!
          • Future: More schemaless
            – Reality: there is no such thing for Lucene-based systems
            – Type guessing for fields we haven’t seen before
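For comparison with the curl command, here is a hedged Python sketch that prepares the same PUT request. The helper `add_field_request` is hypothetical, and the `Content-Type` header is an assumption that the slide's curl command does not set; the URL and JSON body mirror the slide.

```python
import json
from urllib.request import Request


def add_field_request(name, properties, base_url="http://localhost:8983/solr"):
    """Prepare (but do not send) a PUT that adds a field to the schema."""
    return Request(
        f"{base_url}/schema/fields/{name}",
        data=json.dumps(properties).encode("utf-8"),
        method="PUT",
        # Assumption: the slide's curl command does not set this header.
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = add_field_request("strength", {"type": "float", "indexed": "true"})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it, which requires a running Solr instance that accepts schema modifications.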
  24. 24. Future
          • Greater scalability
          • More “NoSQL”
            – More ways to update & manipulate documents
          • Analytics
            – More powerful faceting, functions, statistics
          • Improved relational queries
          • More dynamic (settings & configuration)
          • Continued focus on ease of use
  25. 25. Thank You!