SlideShare a Scribd company logo
1 of 25
Download to read offline
Rapid Prototyping
           with Solr

          Erik Hatcher, Lucid Imagination
erik.hatcher @ lucidimagination.com, May 25, 2011
Abstract
§  Got data? Let's make it searchable! This interactive
    presentation will demonstrate getting documents into
    Solr quickly, will provide some tips in adjusting Solr's
    schema to match your needs better, and finally will
    discuss how showcase your data in a flexible search
    user interface. We'll see how to rapidly leverage
    faceting, highlighting, spell checking, and debugging.
    Even after all that, there will be enough time left to
    outline the next steps in developing your search
    application and taking it to production.




                                                               3
My Background
§  Erik Hatcher
   •  Lucid Imagination
      §  Technical Staff
   •  Co-author
      §  Java Development with Ant / Ant in Action (Manning)
      §  Lucene in Action (Manning)
   •  Apache Software Foundation
      §  Committer – Lucene / Solr
      §  PMC – Lucene TLP
      §  Member




                                                                4
Why prototype?
§  Demonstrate Solr can handle your data and
    searching needs; mitigate risk, learn the
    unknown
§  It’s quick and easy, with very little time
    investment
§  Immediate functional user interface impresses
    decision makers and target users;
    get buy-in
  •  The user interface IS the app



                                                    5
Prior Art
§  Hoss’ amazing ISFDB work
   •  http://www.lucidimagination.com/blog/tag/isfdb/
§  Previous “Rapid Prototyping with Solr” presentations
   •  Data.gov Catalog on Solr:
      http://www.lucidimagination.com/blog/2010/11/05/data-gov-
      on-solr/
   •  Rich text files on Solr:
      http://www.lucidimagination.com/Community/Hear-from-
      the-Experts/Podcasts-and-Videos/Rapid-Prototyping-
      Search-Applications-Solr
   •  CSV (conference attendee data) on Solr:
      http://www.slideshare.net/erikhatcher/rapid-prototyping-
      with-solr-4312681



                                                                  6
Rapid Prototyping using CSV
§  Fired up Solr’s example configuration
§  /update/csv
   •  http://localhost:8983/solr/update/csv?
      commit=true&stream.file=EuroCon2010.csv&fieldnames=fi
      rst,last,company,title,country&header=true&f.country.map
      =Great+Britain:United+Kingdom
§  Tweak configuration
   •  schema: domain-centric field names
   •  solrconfig: /browse request handler
   •  Template adjustments
§  Instant classic search results view, tree map
    visualization of facet data, and random selection of
    contest winners

                                                                 7
CSV results




              8
… using rich text files
§  curl "http://localhost:8983 /solr/update/extract?
    stream.file=/docs/file.pdf &literal.id=/docs/file.pdf




                                                            9
… using Data.Gov catalog data
§  /update/csv – again!




                                 10
Explaining




             11
Suggest




          12
Venn Viz




           13
E-commerce data
§  http://bbyopen.com/
§  Product data, via easy HTTP JSON API




                                           14
Ingesting the data
require 'solr’!
#...!
1.upto(max_pages) do |page|!
  puts "Processing page #{page}"!
  json = fetch_page(page)!
  !
  response = JSON.parse(json, :symbolize_names=>true)!
  puts "Total products: #{response[:total]}" if page == 1!
!
  mapping = {!
     :id           => :sku,!
     :name_t       => :name,!
     :thumbnail_s => :thumbnailImage,!
     :url_s        => :url,!
     :type_s       => :type,!
     :category_s   => Proc.new {|prod| !
                        prod[:categoryPath].collect {|cat| cat[:name]}.join(' >> ')},!
     :department_s => :department,!
     :class_s      => :class,!
     :subclass_s   => :subclass,!
     :sale_price_f => :salePrice!
  }!
!
  Solr::Indexer.new(response[:products], mapping, !
                     {:debug => debug, :buffer_docs => 500}).index!
end!



                                                                                         15
solr-ruby’s secret power
§  Solr::Indexer.new(
        source, mapping, options
    ).index
§  “Quacks like a duck”
§  source simply #each’s
§  mapping simply #[]’s




                                   16
… on Prism




             17
What is Prism?
§  Yet another opinionated brainstorm from Erik
§  https://github.com/lucidimagination/Prism
§  Under the covers
    •  Ruby
        §  because it’s beautiful
    •  Sinatra
        §  to be lightweight and have elegant flexible routing
    •  Velocity
        §  because it is easy to learn and use, and has powerful features, facilitates
            edit/refresh work
§  Separate from Solr, Rack-savvy, allows easy coding of new routes
    and capabilities
§  Designed to work with any arbitrary Solr instance, and already has
    some basic LucidWorks Enterprise capability
§  Totally a proof-of-concept at this point – just a quick hack

                                                                                          18
… on Solritas




                19
Solritas?
§  Pronounced: so-LAIR-uh-toss
§  Celeritas is a Latin word, translated as "swiftness" or
    "speed". It is often given as the origin of the symbol c,
    the universal notation for the speed of light - http://
    en.wikipedia.org/wiki/Celeritas
§  Technically it’s the VelocityResponseWriter
    (wt=velocity)
   •  simply passes the Solr response through the Apache
      Velocity templating engine
§  http://wiki.apache.org/solr/VelocityResponseWriter
§  Built into Solr, available instantly out of the box at:
    http://localhost:8983/solr/browse

                                                                20
… on Blacklight




                  21
Blacklight?
§  http://projectblacklight.org/
§  Blacklight is a free and open source Ruby on Rails based
    discovery interface (a.k.a. “next-generation catalog”) especially
    optimized for heterogeneous collections. You can use it as a library
    catalog, as a front end for a digital repository, or as a single-search
    interface to aggregate digital content that would otherwise be
    siloed.
§  Production sites:
       •  http://search.lib.virginia.edu/
       •  http://searchworks.stanford.edu/
§    Features:
       •  Authentication
       •  Saved searches
       •  Bookmarks – saved result items
       •  Selected items – for exporting to 3rd party systems
       •  Customizable / extensible UI

                                                                              22
Prototyping Tips and Tools
§  Get data into Solr in the simplest possible way
    •  CSV – if it fits, it’s really nice
§  Schema adjusting
    •  <dynamicField name="*" type="string" multiValued="true"/>
    •  <copyField source="*" dest="text"/>
§  Data analysis
    •  Understand what Solr is doing with your fields
    •  Solr’s Schema Browser and /admin/luke request handler
§  UI
    •  /browse – easy tweaking of <solr-home>/conf/velocity/*.vm
       templates




                                                                   23
Now what?
§  Script the indexing process: full and
    incremental/delta
§  Work with real users on real needs
§  Integrate into production systems
§  Iterate on schema enhancements and
    configuration tweaks
§  Deploy to staging/production environments and
    work at scale: collection size, real queries and
    volume, hardware and JVM settings


                                                       24
Test
§    Performance
§    Scalability
§    Relevance
§    Automate all of the above, start baselines,
      avoid regressions




                                                    25
Thanks!




          26

More Related Content

What's hot

Drupal + ApacheSolr
Drupal + ApacheSolrDrupal + ApacheSolr
Drupal + ApacheSolrDropsolid
 
RESTful Api practices Rails 3
RESTful Api practices Rails 3RESTful Api practices Rails 3
RESTful Api practices Rails 3Anton Narusberg
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisOfer Zelig
 
The things we found in your website
The things we found in your websiteThe things we found in your website
The things we found in your websitehernanibf
 
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to SphinxMYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to SphinxPythian
 
Saving Time with WP-CLI
Saving Time with WP-CLISaving Time with WP-CLI
Saving Time with WP-CLITaylor Lovett
 
My site is slow
My site is slowMy site is slow
My site is slowhernanibf
 
Perl in the Real World
Perl in the Real WorldPerl in the Real World
Perl in the Real WorldOpusVL
 
PLAT-16 Using Enterprise Content in Grails
PLAT-16 Using Enterprise Content in GrailsPLAT-16 Using Enterprise Content in Grails
PLAT-16 Using Enterprise Content in GrailsAlfresco Software
 
Jasig rubyon rails
Jasig rubyon railsJasig rubyon rails
Jasig rubyon rails_zaMmer_
 
Modernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with ElasticsearchModernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with ElasticsearchTaylor Lovett
 
Building web framework with Rack
Building web framework with RackBuilding web framework with Rack
Building web framework with Racksickill
 
Ruby w/o Rails (Олександр Сімонов)
Ruby w/o Rails (Олександр Сімонов)Ruby w/o Rails (Олександр Сімонов)
Ruby w/o Rails (Олександр Сімонов)Fwdays
 
Rails 3 (beta) Roundup
Rails 3 (beta) RoundupRails 3 (beta) Roundup
Rails 3 (beta) RoundupWayne Carter
 
Simplify your integrations with Apache Camel
Simplify your integrations with Apache CamelSimplify your integrations with Apache Camel
Simplify your integrations with Apache CamelKenneth Peeples
 
Asset Pipeline
Asset PipelineAsset Pipeline
Asset PipelineEric Berry
 
Rails Girls: Programming, Web Applications and Ruby on Rails
Rails Girls: Programming, Web Applications and Ruby on RailsRails Girls: Programming, Web Applications and Ruby on Rails
Rails Girls: Programming, Web Applications and Ruby on RailsDonSchado
 

What's hot (19)

Drupal + ApacheSolr
Drupal + ApacheSolrDrupal + ApacheSolr
Drupal + ApacheSolr
 
RESTful Api practices Rails 3
RESTful Api practices Rails 3RESTful Api practices Rails 3
RESTful Api practices Rails 3
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
The things we found in your website
The things we found in your websiteThe things we found in your website
The things we found in your website
 
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to SphinxMYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
 
Saving Time with WP-CLI
Saving Time with WP-CLISaving Time with WP-CLI
Saving Time with WP-CLI
 
My site is slow
My site is slowMy site is slow
My site is slow
 
Perl in the Real World
Perl in the Real WorldPerl in the Real World
Perl in the Real World
 
PLAT-16 Using Enterprise Content in Grails
PLAT-16 Using Enterprise Content in GrailsPLAT-16 Using Enterprise Content in Grails
PLAT-16 Using Enterprise Content in Grails
 
Jasig rubyon rails
Jasig rubyon railsJasig rubyon rails
Jasig rubyon rails
 
Modernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with ElasticsearchModernizing WordPress Search with Elasticsearch
Modernizing WordPress Search with Elasticsearch
 
Building web framework with Rack
Building web framework with RackBuilding web framework with Rack
Building web framework with Rack
 
Ruby w/o Rails (Олександр Сімонов)
Ruby w/o Rails (Олександр Сімонов)Ruby w/o Rails (Олександр Сімонов)
Ruby w/o Rails (Олександр Сімонов)
 
Supa fast Ruby + Rails
Supa fast Ruby + RailsSupa fast Ruby + Rails
Supa fast Ruby + Rails
 
Drupal Camp Melbourne
Drupal Camp MelbourneDrupal Camp Melbourne
Drupal Camp Melbourne
 
Rails 3 (beta) Roundup
Rails 3 (beta) RoundupRails 3 (beta) Roundup
Rails 3 (beta) Roundup
 
Simplify your integrations with Apache Camel
Simplify your integrations with Apache CamelSimplify your integrations with Apache Camel
Simplify your integrations with Apache Camel
 
Asset Pipeline
Asset PipelineAsset Pipeline
Asset Pipeline
 
Rails Girls: Programming, Web Applications and Ruby on Rails
Rails Girls: Programming, Web Applications and Ruby on RailsRails Girls: Programming, Web Applications and Ruby on Rails
Rails Girls: Programming, Web Applications and Ruby on Rails
 

Similar to Rapid prototyping with solr - By Erik Hatcher

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksLucidworks
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesAnant Corporation
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Using Solr in Online Travel Shopping to Improve User Experience
Using Solr in Online Travel Shopping to Improve User ExperienceUsing Solr in Online Travel Shopping to Improve User Experience
Using Solr in Online Travel Shopping to Improve User ExperienceLucidworks (Archived)
 
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Nilesh Panchal
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrLucidworks (Archived)
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on RailsAvi Kedar
 

Similar to Rapid prototyping with solr - By Erik Hatcher (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Solr Flair
Solr FlairSolr Flair
Solr Flair
 
Data Science
Data ScienceData Science
Data Science
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Using Solr in Online Travel Shopping to Improve User Experience
Using Solr in Online Travel Shopping to Improve User ExperienceUsing Solr in Online Travel Shopping to Improve User Experience
Using Solr in Online Travel Shopping to Improve User Experience
 
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
Ruby on-rails-101-presentation-slides-for-a-five-day-introductory-course-1194...
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Solr 101
Solr 101Solr 101
Solr 101
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for Solr
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
 
Big Search with Big Data Principles
Big Search with Big Data PrinciplesBig Search with Big Data Principles
Big Search with Big Data Principles
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Rapid prototyping with solr - By Erik Hatcher

  • 1. Rapid Prototyping with Solr Erik Hatcher, Lucid Imagination erik.hatcher @ lucidimagination.com, May 25, 2011
  • 2. Abstract §  Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production. 3
  • 3. My Background §  Erik Hatcher •  Lucid Imagination §  Technical Staff •  Co-author §  Java Development with Ant / Ant in Action (Manning) §  Lucene in Action (Manning) •  Apache Software Foundation §  Committer – Lucene / Solr §  PMC – Lucene TLP §  Member 4
  • 4. Why prototype? §  Demonstrate Solr can handle your data and searching needs; mitigate risk, learn the unknown §  It’s quick and easy, with very little time investment §  Immediate functional user interface impresses decision makers and target users; get buy-in •  The user interface IS the app 5
  • 5. Prior Art §  Hoss’ amazing ISFDB work •  http://www.lucidimagination.com/blog/tag/isfdb/ §  Previous “Rapid Prototyping with Solr” presentations •  Data.gov Catalog on Solr: http://www.lucidimagination.com/blog/2010/11/05/data-gov- on-solr/ •  Rich text files on Solr: http://www.lucidimagination.com/Community/Hear-from- the-Experts/Podcasts-and-Videos/Rapid-Prototyping- Search-Applications-Solr •  CSV (conference attendee data) on Solr: http://www.slideshare.net/erikhatcher/rapid-prototyping- with-solr-4312681 6
  • 6. Rapid Prototyping using CSV §  Fired up Solr’s example configuration §  /update/csv •  http://localhost:8983/solr/update/csv? commit=true&stream.file=EuroCon2010.csv&fieldnames=fi rst,last,company,title,country&header=true&f.country.map =Great+Britain:United+Kingdom §  Tweak configuration •  schema: domain-centric field names •  solrconfig: /browse request handler •  Template adjustments §  Instant classic search results view, tree map visualization of facet data, and random selection of contest winners 7
  • 8. … using rich text files §  curl "http://localhost:8983 /solr/update/extract? stream.file=/docs/file.pdf &literal.id=/docs/file.pdf 9
  • 9. … using Data.Gov catalog data §  /update/csv – again! 10
  • 11. Suggest 12
  • 12. Venn Viz 13
  • 13. E-commerce data §  http://bbyopen.com/ §  Product data, via easy HTTP JSON API 14
  • 14. Ingesting the data require 'solr’! #...! 1.upto(max_pages) do |page|! puts "Processing page #{page}"! json = fetch_page(page)! ! response = JSON.parse(json, :symbolize_names=>true)! puts "Total products: #{response[:total]}" if page == 1! ! mapping = {! :id => :sku,! :name_t => :name,! :thumbnail_s => :thumbnailImage,! :url_s => :url,! :type_s => :type,! :category_s => Proc.new {|prod| ! prod[:categoryPath].collect {|cat| cat[:name]}.join(' >> ')},! :department_s => :department,! :class_s => :class,! :subclass_s => :subclass,! :sale_price_f => :salePrice! }! ! Solr::Indexer.new(response[:products], mapping, ! {:debug => debug, :buffer_docs => 500}).index! end! 15
  • 15. solr-ruby’s secret power §  Solr::Indexer.new( source, mapping, options ).index §  “Quacks like a duck” §  source simply #each’s §  mapping simply #[]’s 16
  • 17. What is Prism? §  Yet another opinionated brainstorm from Erik §  https://github.com/lucidimagination/Prism §  Under the covers •  Ruby §  because it’s beautiful •  Sinatra §  to be lightweight and have elegant flexible routing •  Velocity §  because it is easy to learn and use, and has powerful features, facilitates edit/refresh work §  Separate from Solr, Rack-savvy, allows easy coding of new routes and capabilities §  Designed to work with any arbitrary Solr instance, and already has some basic LucidWorks Enterprise capability §  Totally a proof-of-concept at this point – just a quick hack 18
  • 19. Solritas? §  Pronounced: so-LAIR-uh-toss §  Celeritas is a Latin word, translated as "swiftness" or "speed". It is often given as the origin of the symbol c, the universal notation for the speed of light - http:// en.wikipedia.org/wiki/Celeritas §  Technically it’s the VelocityResponseWriter (wt=velocity) •  simply passes the Solr response through the Apache Velocity templating engine §  http://wiki.apache.org/solr/VelocityResponseWriter §  Built into Solr, available instantly out of the box at: http://localhost:8983/solr/browse 20
  • 21. Blacklight? §  http://projectblacklight.org/ §  Blacklight is a free and open source Ruby on Rails based discovery interface (a.k.a. “next-generation catalog”) especially optimized for heterogeneous collections. You can use it as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed. §  Production sites: •  http://search.lib.virginia.edu/ •  http://searchworks.stanford.edu/ §  Features: •  Authentication •  Saved searches •  Bookmarks – saved result items •  Selected items – for exporting to 3rd party systems •  Customizable / extensible UI 22
  • 22. Prototyping Tips and Tools §  Get data into Solr in the simplest possible way •  CSV – if it fits, it’s really nice §  Schema adjusting •  <dynamicField name="*" type="string" multiValued="true"/> •  <copyField source="*" dest="text"/> §  Data analysis •  Understand what Solr is doing with your fields •  Solr’s Schema Browser and /admin/luke request handler §  UI •  /browse – easy tweaking of <solr-home>/conf/velocity/*.vm templates 23
  • 23. Now what? §  Script the indexing process: full and incremental/delta §  Work with real users on real needs §  Integrate into production systems §  Iterate on schema enhancements and configuration tweaks §  Deploy to staging/production environments and work at scale: collection size, real queries and volume, hardware and JVM settings 24
  • 24. Test §  Performance §  Scalability §  Relevance §  Automate all of the above, start baselines, avoid regressions 25
  • 25. Thanks! 26