Apache Solr Masterclass
From zero to hero
June 2014
www.slideshare.net/arafalov/solr-masterclass-bangkok-june-2014
2
Alexandre Rafalovitch
www.outerthoughts.com
Web search engines are quite sophisticated
3
4
But real search needs are much DEEPER and BROADER
5
Searching code
6
Searching people and companies
7
Searching products
8
Searching library material
9
Searching languages
10
Understanding full-text search
SELECT * FROM database WHERE field LIKE '%word%'
This DOES NOT scale: the leading wildcard rules out a normal index, so every row gets scanned
Instead:
break text into tokens
apply domain-specific processing (e.g. lower-casing)
build fast-access structures (an inverted index)
use algorithms for term, phrase, and proximity search
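A toy sketch of the "instead" approach, in Python: tokenize and lower-case each document, build a positional inverted index (term to document to positions), then answer term queries with a lookup and phrase queries by checking consecutive positions. This only illustrates the idea; Lucene's real index structures are far more elaborate, and proximity search is not shown.

```python
import re
from collections import defaultdict


def analyze(text):
    """Break text into tokens and lower-case them (a toy analyzer)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


def term_search(index, term):
    """Documents containing the (analyzed) term: one dictionary lookup, no scan."""
    return sorted(index.get(term.lower(), {}))


def phrase_search(index, phrase):
    """Documents where the phrase's terms appear at consecutive positions."""
    terms = analyze(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can match the phrase.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits


docs = {
    1: "Solr is built on Lucene",
    2: "Lucene is fast; Solr adds a server",
    3: "Built on Lucene: Solr",
}
index = build_index(docs)
print(term_search(index, "SOLR"))               # [1, 2, 3]
print(phrase_search(index, "built on lucene"))  # [1, 3]
```

The cost of a query now depends on the number of matching terms and postings, not on the total amount of text, which is why this scales where `LIKE '%word%'` does not.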
11
Basic search engine features
Search (Duh!): keyword, phrase, field-specific
Positive and negative terms
Sort: relevancy, recency
Pagination
Compact summary in results
SPEED
12
Advanced search engine features
Facet/taxonomy-based navigation with live counts
Language-specific processing
Domain-specific text processing (WiFi = Wi-Fi = WIFI)
Geographic search
More-like-this, did-you-mean, autocomplete
Scaling/clustering
NOT web crawling: different, but related
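Domain-specific processing means deciding which spellings should count as the same term. A minimal sketch, assuming a hypothetical `normalize_term` step that drops hyphens and folds case so that WiFi, Wi-Fi and WIFI collapse into one term. In a real Solr schema this belongs in the field's analyzer chain (for example a synonym list), applied identically at index and query time, not in client code.

```python
def normalize_term(token):
    """Collapse spelling variants: drop hyphens and fold case (hypothetical helper)."""
    return token.replace("-", "").lower()


variants = ["WiFi", "Wi-Fi", "WIFI", "wi-fi"]
print({normalize_term(v) for v in variants})  # {'wifi'}
```

Dropping every hyphen is deliberately blunt; in some domains (part numbers, for instance) it would merge terms that must stay distinct, which is exactly why this processing is domain-specific.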
13
Search engine solutions?
Solr
Elasticsearch
Xapian
Sphinx
Groonga
Searchdaimon
{F}lexSearch
Algolia (SaaS)
Searchify (SaaS)
ForageJS
Lunr.js
FACT-Finder
DtSearch
MarkLogic
Verity
Fast
Most databases
…AND MORE
14
Used with permission from SemaText
Open Source Search Evolution
15
Secret Ingredient - Lucene
Solr
Elasticsearch
SwiftType
Galene (LinkedIn’s)
PyLucene (Python wrapper)
Lucene.net (C# port)

Scalable, high-performance indexing
Incremental indexing
Full-text search
Information-retrieval algorithms
Implemented in Java
Written in 1999, still going strong
16
Secret Ingredient - Solr
Certified distributions: LucidWorks, HelioSearch
Big Data platforms: Cloudera, Hortonworks HDP
Hosted and SaaS: Amazon CloudSearch, WebSolr, SolrHQ, SearchBox

Lucene full-text search
XML and REST config
Schema/schemaless
SolrCloud (clustering)
Caching
Near real-time
Rich-document indexing (Tika inside)
Plugins, components, processors
17
Solr Ecosystem sample
Drupal
Project Blacklight
LuxDB
SolrMeter
CrafterCMS
Typo3
Magento
HippoCMS
ColdFusion
SolrNet
DataStax
Dovecot
NGData Lily
Basho Riak
YaCy
Apache ManifoldCF
Apache Camel
Franz AllegroGraph
BitNami Solr Stack
Carrot2
Broadleaf Commerce
Cloudera CDK
CodeLibs Fess (フェス)
Splunk
Alfresco
Rosette by BasisTech
Luwak by Flax
Quepid by OSC
TwigKit
SPM by SemaText
SILK by LucidWorks
Banana (open-source Kibana for Solr)
18
DEMO Time
19
DEMO - Basic
Unzip
Go to the example directory
Run Solr
Import some documents from example/exampledocs:
grep -l store *.xml | xargs ./post.sh
Show off the Solr 4 admin panel
20
DEMO - Browse handler
Restart Solr with -Dsolr.clustering.enabled=true
Visit http://localhost:8983/solr/browse/
Show off:
Search
Facets: categories and ranges
Spatial/geo-distance
Clusters
21
Getting into Solr
22
Start for free
Download, unzip, cd example; java -jar start.jar
Go through the basic tutorial in docs/tutorial.html
Copy the example directory, modify schema.xml until happy
If coming from Elasticsearch, look at example-schemaless
Do NOT follow this path to production
The example schema is a kitchen sink!!! Read it as a story:
<solr>/example/solr/collection1/conf/{schema.xml|solrconfig.xml}
23
Simplest Solr - directory layout
solr-home/ - point here with -Dsolr.solr.home
  collection1/ - default collection name, used when there is no solr.xml
    conf/ - configuration directory for the collection
      schema.xml - defines fields and types
      solrconfig.xml - defines low-level configuration, but also components, handlers, and UpdateRequestProcessor chains
24
Simplest Solr - schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema version="1.5" name="simplest-solr">
  <fieldType name="string" class="solr.StrField"/>

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>

  <uniqueKey>id</uniqueKey>
</schema>
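To see what this schema does to incoming data, here is a small Python model (not Solr code, and the function name is hypothetical): `id` must be present and single-valued, and every other field matches the `*` dynamicField, so it is stored as a multivalued string. This is why, with this schema, numbers and dates come back as arrays of strings in query results.

```python
def apply_simplest_schema(doc):
    """Model (not Solr code) of how the simplest schema treats a document:
    'id' is a required, single-valued string; every other field matches the
    '*' dynamicField and becomes a multivalued string."""
    if "id" not in doc:
        raise ValueError("missing required uniqueKey field 'id'")
    if isinstance(doc["id"], list):
        raise ValueError("'id' is single-valued")
    result = {"id": str(doc["id"])}
    for name, value in doc.items():
        if name != "id":
            values = value if isinstance(value, list) else [value]
            result[name] = [str(v) for v in values]
    return result


print(apply_simplest_schema({"id": 42, "price": 19.95, "cat": ["electronics", "memory"]}))
# {'id': '42', 'price': ['19.95'], 'cat': ['electronics', 'memory']}
```

Python's `str()` only approximates the text a client would actually send; the point is the shape of the result, not the exact formatting.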
25
Simplest Solr - solrconfig.xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_4_9</luceneMatchVersion>
  <requestDispatcher handleSelect="false">
    <httpCaching never304="true" />
  </requestDispatcher>
  <requestHandler name="/select" class="solr.SearchHandler" />
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/admin" class="solr.admin.AdminHandlers" />
  <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" startup="lazy" />
</config>
26
DEMO
https://github.com/arafalov/simplest-solr-config
java -Dsolr.solr.home=…./simplest-solr
Go to <solr>/example/exampledocs
grep -l store *.xml | xargs ./post.sh (same as before)
Check the Admin UI
Query: same, but different (every field is multivalued; dates are just strings)
Schema browser
27
Lots of things missing
Some admin UI items disabled (Ping, Files)
No near-real-time or atomic/partial updates
No types (apart from string)
No dynamic schema
No SolrCloud
DOES NOT MATTER. NOT YET!
28
Two ways of learning
You can follow a path (going forward):
A tutorial
A book
Learn what it teaches
You can reach for the goal (going backwards):
Have an idea
Try to achieve it
Learn what’s on the critical path
Both are valuable. The second is harder, but gives you more.
29
Goal-driven Solr
1. Start with the simplest configuration that works
2. Get something in (import data)
3. Get something out (display data)
4. Celebrate!!
5. Decide/fine-tune what you want to find and how
6. Change the schema to match
7. Change the import/display to match
8. GOTO 5 (never really stops)
30
Getting data in
curl
post.jar (in example/exampledocs); try "java -jar post.jar -h" for help
Admin UI (core/Documents)
Clients (SolrJ, among 33 at various levels of support: https://leanpub.com/solr-clients/)
Formats: XML, JSON, CSV, and other formats (processed with Tika)
DataImportHandler to pull data from external sources
Big Data connectors (Hadoop, Flume, etc.)
Big Data integrations (DataStax for Solr on Cassandra, Cloudera for Solr on HDFS)
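As a rough picture of what "getting data in" over HTTP looks like from a client, here is a Python sketch that builds, but does not send, a JSON update request. It assumes the Solr 4 example running locally with the default `collection1` core, and that the `/update` handler picks the JSON format from the `Content-Type` header; `SOLR_URL` and `build_update_request` are hypothetical names.

```python
import json
import urllib.request

# Assumption: Solr 4 example running locally with the default collection1 core.
SOLR_URL = "http://localhost:8983/solr/collection1"


def build_update_request(docs, commit=True):
    """Build (but do not send) a JSON update request for the /update handler."""
    url = SOLR_URL + "/update" + ("?commit=true" if commit else "")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_request([{"id": "book1", "name": "Solr in Action"}])
print(req.get_method(), req.full_url)
# With Solr running, urllib.request.urlopen(req) would send the documents.
```

Committing on every request keeps the demo simple; for bulk loads you would normally batch documents and commit less often.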
31
Getting data out
curl
Web browser
Admin UI (core/Query)
Clients (ResponseWriters for JSON, XML, Python, Ruby, PHP, CSV)
UI toolkits (Cloudera HUE, TwigKit)
Internal post-processors (we saw VelocityResponseWriter at /browse)
Needs middleware or a strong proxy in front: Solr is not secure otherwise
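The middleware side of "getting data out" can be sketched in Python: build a `/select` URL (with `start`/`rows` for pagination and `wt=json` for the response format), then pull the hit count and ids out of Solr's standard JSON response. `SOLR_URL`, the field names, and the sample response are assumptions; the sample is hand-made in the usual response shape, not real output.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: Solr 4 example running locally with the default collection1 core.
SOLR_URL = "http://localhost:8983/solr/collection1"


def select_url(q, page=0, rows=10, fields="id,name"):
    """Build a /select URL; 'start' and 'rows' give pagination, 'wt' the response format."""
    params = {"q": q, "fl": fields, "start": page * rows, "rows": rows, "wt": "json"}
    return SOLR_URL + "/select?" + urlencode(params)


def summarize(response):
    """Pull the hit count and document ids out of Solr's standard JSON response."""
    body = response["response"]
    return body["numFound"], [doc["id"] for doc in body["docs"]]


# Hand-made sample in the shape of a Solr JSON response (not real output).
sample = {
    "responseHeader": {"status": 0, "QTime": 1},
    "response": {"numFound": 2, "start": 0, "docs": [{"id": "a"}, {"id": "b"}]},
}
print(select_url("name:ipod", page=2, rows=5))
print(summarize(sample))  # (2, ['a', 'b'])
```

Code like this lives in the middleware, which decides which parameters users may control, rather than letting a browser talk to Solr directly.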
32
Celebrate!
You achieved a basic end-to-end test:
You got Solr running
You figured out how to display the data
You now know where the issues are
FIX THOSE NEXT
33
Fine-tune schema
Solr is not friends with your data; it’s here to get your documents found.
<field name="features" stored="true" indexed="true" type="text_general" multiValued="true"/>
stored="true" - that’s for you (the original value comes back in results)
indexed="true" - that’s for Solr, where the magic happens
type="type_name" - defines which analyser chain to use!
See Admin UI core/Analysis
See http://www.solr-start.com/info/analyzers/ for the full list
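The two flags do different jobs, which a toy Python model makes concrete (the function is hypothetical, not Solr's implementation): `stored` keeps the original value so it can be returned, while `indexed` feeds analyzed terms into the inverted index that queries run against. A field that is indexed but not stored is searchable yet cannot be shown; one that is stored but not indexed can be shown but not searched.

```python
import re


def index_field(doc, field, stored=True, indexed=True):
    """Toy model of one field's flags: 'stored' keeps the original value so it can be
    returned to you; 'indexed' adds analyzed terms to an inverted index Solr searches."""
    stored_values, postings = {}, {}
    value = doc[field]
    if stored:
        stored_values[field] = value
    if indexed:
        for term in re.findall(r"\w+", value.lower()):
            postings.setdefault(term, set()).add(doc["id"])
    return stored_values, postings


doc = {"id": "1", "features": "Fast Search"}
print(index_field(doc, "features", stored=False))   # ({}, {'fast': {'1'}, 'search': {'1'}})
print(index_field(doc, "features", indexed=False))  # ({'features': 'Fast Search'}, {})
```

Note that the stored copy keeps the original casing, while the indexed terms are whatever the analyser chain produced.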
34
Analyzers - English
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    …
  </analyzer>
  …
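The analyzer is a pipeline: one tokenizer, then filters applied in order. The Python sketch below mirrors the order of the index analyzer above, under loud assumptions: the stopword and protected-word sets are tiny stand-ins for the real files, the regex only approximates StandardTokenizer, and `stem` is a toy suffix stripper, not the Porter algorithm.

```python
import re

# Tiny stand-ins for lang/stopwords_en.txt and protwords.txt (assumed contents).
STOPWORDS = {"a", "an", "and", "is", "of", "the"}
PROTECTED = {"running"}


def tokenize(text):
    """Rough stand-in for StandardTokenizer: words, keeping inner apostrophes."""
    return re.findall(r"\w+(?:'\w+)*", text)


def stop(tokens):
    """Like StopFilter with ignoreCase="true"."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def possessive(tokens):
    """Like EnglishPossessiveFilter: strip a trailing 's."""
    return [t[:-2] if t.endswith("'s") else t for t in tokens]


def stem(tokens):
    """Toy suffix stripper standing in for PorterStemFilter (NOT the Porter algorithm).
    Words in PROTECTED are left alone, as KeywordMarkerFilter arranges in Solr."""
    out = []
    for t in tokens:
        if t not in PROTECTED:
            for suffix in ("ing", "s"):
                if t.endswith(suffix) and len(t) > len(suffix) + 2:
                    t = t[: -len(suffix)]
                    break
        out.append(t)
    return out


CHAIN = [stop, lowercase, possessive, stem]  # same order as the index analyzer above


def analyze(text):
    tokens = tokenize(text)
    for step in CHAIN:
        tokens = step(tokens)
    return tokens


print(analyze("The Indexer's running and indexing the Documents"))
# ['indexer', 'running', 'index', 'document']
```

Admin UI core/Analysis shows the same step-by-step view for the real chain, which is the better tool once Solr is running.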
35
Analyzers - Persian
<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PersianCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.PersianNormalizationFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
  </analyzer>
</fieldType>
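A char filter works on raw characters before tokenization, and the normalization filters work on tokens afterwards. This Python sketch models only a subset: the ZWNJ-to-space replacement that PersianCharFilter performs, tatweel removal (one of ArabicNormalizationFilter's steps), and two of PersianNormalizationFilter's mappings. Lower-casing and stopwords are left out, and `str.split()` stands in for StandardTokenizer.

```python
ZWNJ = "\u200c"     # zero-width non-joiner
TATWEEL = "\u0640"  # Arabic stretching character (kashida)
# Subset of PersianNormalizationFilter's mappings:
PERSIAN_MAP = {
    "\u06cc": "\u064a",  # Farsi yeh -> Arabic yeh
    "\u06a9": "\u0643",  # keheh -> Arabic kaf
}


def persian_char_filter(text):
    """Like PersianCharFilter: turn ZWNJ into an ordinary space before tokenizing."""
    return text.replace(ZWNJ, " ")


def normalize(token):
    token = token.replace(TATWEEL, "")  # one of ArabicNormalizationFilter's steps
    return "".join(PERSIAN_MAP.get(ch, ch) for ch in token)


def analyze_fa(text):
    # str.split() stands in for StandardTokenizer here.
    return [normalize(t) for t in persian_char_filter(text).split()]


# "ketab" (book) typed with keheh and a tatweel; then two parts joined by a ZWNJ.
print(analyze_fa("\u06a9\u062a\u0640\u0627\u0628"))
print(analyze_fa("\u0645\u06cc" + ZWNJ + "\u0631\u0648\u0645"))
```

ZWNJ is not whitespace, so without the char filter the second input would stay a single token; that is why the char filter has to run before the tokenizer.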
36
copyField FTW
<copyField source="cat" dest="text"/>
<copyField source="*_t" dest="text" maxChars="3000"/>
Indexing book authors:
"Schildt, Herbert; Wolpert, Lewis; Davies, P."
For searching: tokenized, case-folded, punctuation-stripped:
schildt / herbert / wolpert / lewis / davies / p
For sorting: untokenized, case-folded, punctuation-stripped:
schildt herbert wolpert lewis davies p
For faceting: primary author only, using a solr.StrField:
Schildt, Herbert
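copyField only duplicates the raw value; each destination field's type then decides what gets indexed. The Python sketch below reproduces the three forms listed above from the same input string. The slide does not show how only the primary author reaches the facet field; splitting on `;` is an assumption, and since a StrField does no analysis, in Solr that step would likely need an update processor or client-side code rather than copyField alone.

```python
import re


def search_tokens(raw):
    """Tokenized, case-folded, punctuation-stripped (what a text field would index)."""
    return re.findall(r"\w+", raw.lower())


def sort_key(raw):
    """Untokenized: a single case-folded, punctuation-stripped string for sorting."""
    return " ".join(search_tokens(raw))


def facet_value(raw):
    """Primary author only, kept verbatim (what a solr.StrField facet would show).
    Assumption: authors are separated by ';'."""
    return raw.split(";")[0].strip()


authors = "Schildt, Herbert; Wolpert, Lewis; Davies, P."
print(search_tokens(authors))  # ['schildt', 'herbert', 'wolpert', 'lewis', 'davies', 'p']
print(sort_key(authors))       # schildt herbert wolpert lewis davies p
print(facet_value(authors))    # Schildt, Herbert
```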
37
Fine-tune search
The default query parser supports Lucene search syntax:
text +compulsory -negated field:value
uses the default field or an explicit field
not very good for complex analysis
eDisMax supports that, plus searching across many fields
Many more specialised parsers: https://cwiki.apache.org/confluence/display/solr/Other+Parsers
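To make the syntax concrete, here is a deliberately tiny Python parser for the example query above. It only handles `+`/`-` prefixes and `field:value`, not quotes, parentheses, or AND/OR; the clause labels mirror Lucene's SHOULD/MUST/MUST_NOT occur values, and `default_field="text"` is an assumption. The eDisMax parameters at the end use real parameter names (`defType`, `qf`) with example field names.

```python
def parse_lucene_query(q, default_field="text"):
    """Split a simple Lucene-syntax query into (occur, field, term) clauses.
    Handles only +/- prefixes and field:value; no quotes, parentheses, or AND/OR."""
    clauses = []
    for part in q.split():
        occur = "SHOULD"
        if part.startswith("+"):
            occur, part = "MUST", part[1:]
        elif part.startswith("-"):
            occur, part = "MUST_NOT", part[1:]
        field, sep, term = part.partition(":")
        if not sep:
            field, term = default_field, part
        clauses.append((occur, field, term))
    return clauses


print(parse_lucene_query("text +compulsory -negated field:value"))

# eDisMax instead takes plain user input and searches several fields at once;
# 'qf' lists the fields (with optional ^boosts). Field names here are examples.
edismax_params = {"defType": "edismax", "q": "ipod", "qf": "name^2 features text"}
```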
38
Fine-tune indexing
UpdateRequestProcessor:
runs after you send your data to Solr
and before it hits the schema
Deals with missing values, does pre-processing, identifies languages; it is the secret behind schemaless mode (see example-schemaless)
Defined in solrconfig.xml; search for updateRequestProcessorChain
Full list at: http://www.solr-start.com/info/update-request-processors/
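Conceptually, a chain is an ordered list of steps, each taking a document and handing a (possibly modified) document to the next. The Python sketch below is only an analogy: its two processors are hypothetical stand-ins in the spirit of Solr's TrimFieldUpdateProcessorFactory and DefaultValueUpdateProcessorFactory, not their actual code or configuration.

```python
def trim_strings(doc):
    """Strip surrounding whitespace from string values (cf. TrimFieldUpdateProcessorFactory)."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in doc.items()}


def default_value(field, value):
    """Return a processor that fills in a missing field (cf. DefaultValueUpdateProcessorFactory)."""
    def process(doc):
        if field in doc:
            return doc
        return {**doc, field: value}
    return process


def run_chain(doc, chain):
    """Apply each processor in order, as a chain does before the schema sees the document."""
    for processor in chain:
        doc = processor(doc)
    return doc


chain = [trim_strings, default_value("status", "new")]
print(run_chain({"id": " 1 ", "title": "  Solr "}, chain))
# {'id': '1', 'title': 'Solr', 'status': 'new'}
```

Order matters in a real chain too: the processors that store the document come last in Solr's chains, so everything before them sees the raw input.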
39
Fine-tune display
Sorting
Faceting: automatic taxonomy with counts (on indexed values)
Highlighting
MoreLikeThis
Statistics
Grouping, pivoting
Debug for troubleshooting
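Faceting is, at heart, counting matching documents per indexed value. A minimal Python sketch over an in-memory result set (sample documents are made up): each document is counted once per distinct value, and results are ordered by count descending with ties broken by value, roughly what Solr's count-sorted facets return.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count documents per indexed value; order by count descending, then by value
    (roughly what facet.sort=count does, with ties in index order)."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        for value in set(values):  # count each document once per value
            counts[value] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [{"cat": ["electronics", "memory"]}, {"cat": ["electronics"]}, {"cat": "software"}, {}]
print(facet_counts(docs, "cat"))  # [('electronics', 2), ('memory', 1), ('software', 1)]
```

Because the counts come from indexed values, a tokenized text field would facet on individual words; that is why facet fields are usually untokenized strings.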
40
Documentation
Solr Wiki: old, but still has a lot of information
Solr Reference Guide: new; online and downloadable
http://www.solr-start.com/ - my resources for learners
http://heliosearch.org/author/joel-bernstein/ - about new features
41
With Solr, how far can I go?
Cloudera (Big Data) has over USD 1,000,000,000 in investments: opportunities?
8M+ searches/day, 40 languages, 100ms NRT, 1024 cores, 256 shards, 32 servers on #solr at Bloomberg http://bit.ly/1jmG72G (via @FlaxSearch)
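A quick back-of-the-envelope reading of the Bloomberg figures, in Python. It assumes traffic is spread evenly over 24 hours (real peaks will be higher), and the slide does not say whether "cores" means Solr cores or CPU cores, so both ratios are shown.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

searches_per_day = 8_000_000
avg_qps = searches_per_day / SECONDS_PER_DAY  # assumes traffic spread evenly over 24 h

shards, servers, cores = 256, 32, 1024
shards_per_server = shards / servers
cores_per_shard = cores / shards    # if "cores" means Solr cores (copies of each shard)
cores_per_server = cores / servers  # if "cores" means CPU cores

print(f"{avg_qps:.1f} searches/s on average")  # 92.6 searches/s on average
print(shards_per_server, cores_per_shard, cores_per_server)  # 8.0 4.0 32.0
```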
42
Hackathon
43
First steps
Install Solr 4.9
Go through the tutorial: it gives you the basics and an end-to-end test
Join the Slack chat (invitations are coming)
Tweet #SolrMasterclassBkk, @SolrStart, if you have space :-)
Attend breakout sessions
Choose your own adventure (next)
44
Path 1 - Solr indexing book
Great for first-timers
Gets you from zero to comfortable
All examples are provided
If you are stuck, I will help you
Probably will not win you any prizes...
Do it for the skills
45
Path 2 - Your own dataset
Get it in at any cost
Get it displayed
Start iterating
Book a time slot to discuss your questions
Demo tips:
Explain the problem domain (what is your dataset?)
Show how far you got
Discuss the challenges
46
Path 3 - Need a dataset
Index your favourite Git repository (e.g. Solr): https://github.com/arafalov/git-to-solr
Your own WordPress blog export (with DataImportHandler)
Your own hard drive
Demo tips:
How far did you get?
Concentrate on displaying something cool (statistics?)
Coolest Solr feature you found
47
Path 4 - A bigger challenge
Project Gutenberg (ask me for a copy of the RDF dump)
WorldCup matches data: http://worldcup.sfg.io/
Twitter feed (e.g. with Spring XD/Integration)
Your own photograph collection (Tika extracts metadata)
48
DEMO Rules
There are no rules
And the prizes are not terribly important
What we are looking for is learning
Make something new out of something old
Learn a new feature and show others
Learn, teach, share: everybody wins
49
For later
50
Accelerate your learning
If you still feel like a beginner, buy my book. Seriously, that’s what it’s for
All code/data is at: https://github.com/arafalov/solr-indexing-book
Buy Solr in Action: recently published and a great reference; follow @ManningBooks for discounts
Use my www.solr-start.com resources and join the mailing list (I’ll do that for you this time)
Join the solr-user mailing list: full of advanced hackers
Watch Lucene/Solr Revolution videos for background
Start helping out on Stack Overflow #solr
Blog what you learned, tweet with #Solr
51
Other Search-related books
Designing the Search Experience: The Information Architecture of Discovery (by a TwigKit creator) +1
Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld (see also Quepid)
Enterprise Search by Martin White
52
53
Alexandre Rafalovitch
www.outerthoughts.com

Solr Masterclass Bangkok, June 2014
