These slides introduce how Solr is used in the Search Engine Back End API Solution for Fast Prototyping (LDSP). You will learn how to create a new core, update the schema, and query and sort in Solr.
1. Search Engine Back End API
Solution for Fast Prototyping
https://github.com/jimmylai/search_engine_backend_template
Solr
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/28
2. Outline - Solr
• Add a new core
• Define the schema
• Feed data and update partial data
• Query and Ranking
• Spatial Query
• Prerequisite: Introduction and Tutorial
http://www.slideshare.net/jimmy_lai/search-engine-back-end-api-solution-for-fast-prototyping
LDSP
3. Add a New Core
• Core in Solr: a set of schema and indexing configuration.
• Cmd: fab create_core:$core_name
– What happened:
1. Copy the dir data/solr_core_template to dir solr_conf under the new name $core_name
2. Set name=$core_name in core.properties
3. With the new core under solr_conf, run Solr with solr.solr.home=solr_conf so that Solr can find the new core.
• Can be done with: fab start_solr
Ref:
http://wiki.apache.org/solr/CoreAdmin
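The first two steps above can be sketched in plain Python. This is only an illustration of the copy-and-rename idea, not the actual code behind fab create_core. The function name create_core, the overwrite of core.properties, and the throwaway directories in the demo are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def create_core(core_name, template_dir, solr_home):
    """Hypothetical helper: copy the core template and name the new core."""
    target = Path(solr_home) / core_name
    if target.exists():
        raise FileExistsError(f"core directory already exists: {target}")
    # Step 1: copy the template directory under the new core name.
    shutil.copytree(template_dir, target)
    # Step 2: record the core name in core.properties
    # (assumption: the file holds only the name= line).
    (target / "core.properties").write_text(f"name={core_name}\n")
    return target


# Demo on temporary directories so no real Solr home is touched.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "data" / "solr_core_template"
    (template / "conf").mkdir(parents=True)
    (template / "conf" / "schema.xml").write_text("<schema/>\n")
    home = Path(tmp) / "solr_conf"
    home.mkdir()
    core = create_core("movie", template, home)
    print((core / "core.properties").read_text().strip())
```

Step 3 (starting Solr with solr.solr.home=solr_conf) is left to fab start_solr.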
4. Define the Schema
• Edit file solr_conf/$core_name/conf/schema.xml
• Add more fields in <fields> section:
– <field name="sourcesite" type="string" indexed="true" stored="true" required="false" />
• Define your type in <types> section:
<fieldType name="chinese" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Ref:
http://wiki.apache.org/solr/SchemaXml
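If you prefer to script the schema edit instead of editing the file by hand, a minimal sketch with the standard library might look like this. The helper add_field and the tiny inline SCHEMA string are hypothetical; a real schema.xml is much larger, and you would parse and write back the actual file.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for solr_conf/$core_name/conf/schema.xml (assumption).
SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
  </fields>
</schema>"""


def add_field(root, name, field_type, indexed=True, stored=True, required=False):
    """Hypothetical helper: append a <field> element to the <fields> section."""
    fields = root.find("fields")
    if fields is None:
        raise ValueError("schema has no <fields> section")
    if any(f.get("name") == name for f in fields.findall("field")):
        raise ValueError(f"field already defined: {name}")
    attrs = {
        "name": name,
        "type": field_type,
        # Solr expects lowercase true/false in attribute values.
        "indexed": str(indexed).lower(),
        "stored": str(stored).lower(),
        "required": str(required).lower(),
    }
    return ET.SubElement(fields, "field", attrs)


root = ET.fromstring(SCHEMA)
add_field(root, "sourcesite", "string")
print(ET.tostring(root, encoding="unicode"))
```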
5. Feed Data and Update Partial Data
• Run Solr after the schema is updated:
– fab run_solr
• Prepare new feed data in JSON format:
– [{"id": id1, "field1": value11, "field2": value21}, {"id": id2, "field_1": value21}]
• Send by HTTP POST to Solr:
– curl "http://localhost:8983/solr/$core_name/update/json?commit=true" --data-binary @$json_file_name -i -H 'Content-type:application/json'
• Partial update JSON format:
– [{"id": id1, "field1": {"set": new_value}}]
Ref:
http://wiki.apache.org/solr/UpdateJSON
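The same POST can be prepared from Python. The sketch below only builds the request and does not send it, so it runs without a Solr server. The helper names build_update_request and set_field, the core name movie, and the sample values are assumptions for illustration.

```python
import json
import urllib.request


def build_update_request(core_name, docs, base_url="http://localhost:8983/solr"):
    """Hypothetical helper: build the JSON update POST for a list of documents."""
    url = f"{base_url}/{core_name}/update/json?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_field(doc_id, field, new_value):
    """Partial update document: replace one field of an existing document."""
    return {"id": doc_id, field: {"set": new_value}}


# Full documents for a first feed, then a partial update of one field.
feed = [{"id": "id1", "field1": "value11", "field2": "value21"}]
partial = [set_field("id1", "field1", "new_value")]

request = build_update_request("movie", partial)
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it (needs a running Solr):
#   urllib.request.urlopen(request)
```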
6. Query and Ranking
• Let's use pysolr:
– solr = pysolr.Solr('http://localhost:8983/solr/movie')
– results = [i for i in solr.search('*:*', **{'rows': 100})]
• Query (1st parameter of search())
– Specify the value of a field: 'field:value'
– Range query: 'field:[1 TO 100]'
– Boolean operation: 'field1:v1 AND field2:v2'
• Sort (put in the keyword arguments of search())
– {'sort': 'field desc'}
Ref:
http://wiki.apache.org/solr/SolrQuerySyntax
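The query strings above are plain text, so they can be assembled with small helpers before being handed to pysolr. This is a sketch only: the helper names and the year field and imdb value are hypothetical, and values containing spaces or special characters would need quoting or escaping, which is not handled here.

```python
def field_query(field, value):
    """Match a single field value, e.g. 'field:value'."""
    return f"{field}:{value}"


def range_query(field, low, high):
    """Inclusive range query, e.g. 'field:[1 TO 100]'."""
    return f"{field}:[{low} TO {high}]"


def all_of(*clauses):
    """Combine clauses with the boolean AND operator."""
    return " AND ".join(clauses)


def search_params(rows=100, sort=None):
    """Keyword arguments for search(): result count and optional sort."""
    params = {"rows": rows}
    if sort is not None:
        params["sort"] = sort
    return params


query = all_of(field_query("sourcesite", "imdb"), range_query("year", 2000, 2013))
params = search_params(rows=100, sort="year desc")
print(query)
print(params)
# With pysolr installed and Solr running:
#   solr = pysolr.Solr('http://localhost:8983/solr/movie')
#   results = list(solr.search(query, **params))
```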
8. Spatial Query
• In schema.xml, set the type of the field to location:
– <field name="location" type="location" indexed="true" stored="true" required="false" />
• Query:
– {'pt': '23.853,120.498', 'sfield': 'location', 'sort': 'geodist() asc'}
• Get the distance (km) by:
– {'fl': '*,distance:geodist()'}
Ref:
http://wiki.apache.org/solr/SpatialSearch
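The spatial parameters can be collected in one dict and passed to search() as keyword arguments. The sketch below also includes a local great-circle (haversine) estimate that can be used to sanity-check the distances Solr returns. The helper names and the 6371 km Earth radius are assumptions, and Solr's own result may differ slightly.

```python
import math


def spatial_params(lat, lon, sfield="location"):
    """Hypothetical helper: sort by distance from (lat, lon) and return it."""
    return {
        "pt": f"{lat},{lon}",
        "sfield": sfield,
        "sort": "geodist() asc",
        "fl": "*,distance:geodist()",
    }


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing sqrt(a) just above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


params = spatial_params(23.853, 120.498)
print(params)
# With pysolr installed and Solr running:
#   results = solr.search('*:*', **params)
#   each result then carries a 'distance' field in km
```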
9. The next steps
• Understand the mechanism of:
– Solr:
• Update schema
• Update partial data
• Query language of Solr
– Djangorestframework:
• Add more parameters
• Input validation
• Output rendering
• Deploy to production environment