SOLR - An Apache Product
777 Washington Road #5
Parlin, NJ 08859
Phone: 732 307 2655
Email: sathish@egrovesys.com
CONTENTS
INTRODUCTION
FEATURES
FUNCTIONS
ARCHITECTURE
PERFORMANCE
PROs & CONs
FUTURE TRENDS
WEBSITES USING SOLR
INTRODUCTION
• A full-text search server based on Lucene
• XML/HTTP interfaces
• Loose schema to define types and fields
• Web administration interface
• Extensive caching
• Index replication
• Extensible open architecture
• Written in Java 5, deployable as a WAR
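The XML/HTTP interface means a search is an ordinary HTTP GET request. A minimal sketch of building such a request URL follows. The host, port, and the core name `articles` are assumptions for illustration; `q`, `rows`, and `wt` are standard query parameters of the `/select` handler. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10, fmt="json"):
    """Build a URL for the /select handler of a core (no request is sent)."""
    params = {"q": query, "rows": rows, "wt": fmt}
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical local instance and core name, used only for illustration.
url = build_select_url("http://localhost:8983/solr", "articles", "title:lucene")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client, which is why the interface is easy to integrate from any language.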
FEATURES
• Advanced full-text search
• Optimized for high traffic volume
• Standards-based open interfaces – XML, JSON & HTTP
• Comprehensive administration interfaces
• Near real-time indexing
• Extensible plugin architecture
• Multiple search indices
• Apache UIMA
• Rich document parsing
• Advanced storage options
• Performance optimization
FUNCTIONS
• XML/HTTP and JSON APIs
• Hit highlighting
• Faceted Search and Filtering
• Geospatial Search
• Fast Incremental Updates and Index Replication
• Caching
• Replication
• Web administration interface
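Faceting, filtering, and hit highlighting are all switched on through request parameters. The sketch below builds such a query string and reads facet counts out of a JSON response. The field names `category` and `title`, the filter `inStock:true`, and the sample response are hand-written assumptions, not output from a real server. The response layout assumes the default flat JSON form, in which each facet field is a list alternating value and count.

```python
import json
from urllib.parse import urlencode


def facet_query_params(query, facet_field, highlight_field, filter_query=None):
    """Build a query string that enables faceting and hit highlighting."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("hl", "true"),
        ("hl.fl", highlight_field),
    ]
    if filter_query:
        params.append(("fq", filter_query))  # filter query narrows the result set
    return urlencode(params)


def facet_counts(response, facet_field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample response (an assumption, for illustration only).
sample_response = json.loads(
    '{"facet_counts": {"facet_fields": {"category": ["books", 12, "music", 5]}}}'
)

query_string = facet_query_params("lucene", "category", "title", "inStock:true")
counts = facet_counts(sample_response, "category")
print(query_string)
print(counts)
```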
ARCHITECTURE
[Figure: SOLR architecture diagram] Source: www.xaviermorera.com
PERFORMANCE
Performance Factors
• Schema design
– # of indexed fields
– omitNorms
– Term vectors
– docValues
• Configuration
– mergeFactor
– Caches
• Indexing
– Bulk updates
– Commit strategy
– Optimize
• Querying
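The indexing factors above (bulk updates and commit strategy) can be sketched as follows: documents are sent in batches rather than one at a time, and the `commitWithin` parameter asks the server to commit within a time window instead of forcing a commit per request. The update URL, the core name `articles`, the batch size of 1000, and the 10-second window are assumptions chosen for illustration. The requests are only constructed here, not sent.

```python
import json
from urllib.request import Request


def batches(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_update_request(update_url, batch, commit_within_ms=10000):
    """Build (but do not send) a JSON bulk-update POST request."""
    url = f"{update_url}?commitWithin={commit_within_ms}"
    body = json.dumps(batch).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical documents and endpoint, used only for illustration.
docs = [{"id": str(i), "title": f"doc {i}"} for i in range(2500)]
update_url = "http://localhost:8983/solr/articles/update"
requests = [build_update_request(update_url, b) for b in batches(docs, 1000)]
print(len(requests))  # 3 requests: 1000 + 1000 + 500 documents
```

Each request could then be passed to `urllib.request.urlopen` against a running server. Suitable batch sizes and commit windows depend on document size and how quickly new documents must become searchable.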
1. Memory testing: SOLR response time for a 1-million-volume index on 8 GB and 32 GB instances.
Source: www.hathitrust.org
2. SOLR index size analysis for a Twitter dataset
Source: www.dzone.com
PROs & CONs
PROS
• Easy monitoring
• Highly scalable
• Fault tolerant
• Flexible and adaptable with easy configuration
• Performance optimization
• Highly configurable and user-extensible caching
• Freely available
• Multilingual support
• Easy implementation and setup
• Less resource utilization
CONS
• A general lack of commitment towards SOLR
• Less attention on JVM settings & garbage collection
• Increased latency
• Occasional large IO load to replicate large merges
• Complicated load balancing and management
• Reconfiguration if the master is lost
FUTURE TRENDS
• OOTB Simple Faceted Browsing
• Automatic Database Indexing
• Federated Search
– HA with failover
• Alternate output formats (JSON, Ruby)
• Highlighter integration
• Spellchecker
• Alternate APIs (Google Data, OpenSearch)
WEBSITES USING SOLR
• Whitehouse.gov
• Buy.com
• Cnet
• Netflix
• Apple
• Disney
• eTrade
• NASA
• MTV
• Zappos
• AOL
• Digg
eGrove Systems - "SOLR" An Apache Product
