Introduction to Apache Solr (2012-06-29)

An open-source search server based on the Lucene Java search library


Published in: Technology, News & Politics

Transcript

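  • 1. Introduction to Apache Solr. 2012.06.29. Dosang Yoon (윤도상), dsyoon@ncue.net
  • 2. Solr Features
      • Schema
          – Easily define the fields of the documents to be indexed and their field types
          – Uses Lucene Analyzers
          – Supports dynamic fields
          – Copy fields can combine several fields into a single searchable field
          – Stop words and similar word lists can be configured through external files
      • Query
          – HTTP interface with selectable response formats such as XML/XSLT, JSON, Python, and Ruby
          – Faceted search based on queries and field values
          – The sort order can be defined in the query
          – Search scoring is easy to configure
          – Individual fields can be boosted in the query
      • Core
          – Query handlers and an extensible XML format
          – Duplicate documents are detected by the unique key field
      • Caching
          – Configurable caches for query results, filters, and documents
          – Supports user-level cache configuration
      • Replication
          – Efficient index distribution over rsync transport
      • Admin Interface
          – Reports cache, update, and query status
          – Provides a debugger for text analyzers
          – Provides a web query interface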
  • 3. Architecture
  • 4. Overall Architecture
  • 5. Component
  • 6. High Availability
  • 7. Replication
  • 8. Configure
  • 9. Schema.xml
      • Overall
          <schema>
            <types> … </types>
            <fields> … </fields>
            <uniqueKey />
            <solrQueryParser />
            <copyField />
            <dynamicField />
          </schema>
  • 10. Schema.xml
      • Type
          <types>
            <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
            <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
            <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
            <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" />
            <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
              <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
                <filter class="solr.LowerCaseFilterFactory"/>
              </analyzer>
              <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" />
                <filter class="solr.LowerCaseFilterFactory"/>
              </analyzer>
            </fieldType>
          </types>
  • 11. Schema.xml
      • Fields
          <fields>
            <field name="id" type="string" indexed="true" stored="true" required="true" />
            <field name="release_dt" type="date" indexed="true" stored="true" />
            <field name="title" type="text_general" indexed="true" stored="true" />
            <field name="content" type="text_general" indexed="true" stored="true" />
            <field name="text" type="text_general" indexed="true" stored="true" />
          </fields>
      • uniqueKey
          – <uniqueKey>id</uniqueKey>
      • solrQueryParser
          – <solrQueryParser defaultOperator="OR"/>
      • copyField
          – <copyField source="title" dest="text"/>
          – <copyField source="content" dest="text"/>
      • dynamicField
          – <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
          – <dynamicField name="*_text" type="string" indexed="true" stored="true"/>
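The dynamicField entries above can be read as "any field name matching this pattern gets this type". The following is a minimal Python sketch of that idea only, not Solr's actual resolution code. The function name `resolve_field_type` is hypothetical, and the rule used here (explicit fields first, then the first matching pattern) is a simplifying assumption.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic-field patterns copied from the slide.
EXPLICIT_FIELDS = {
    "id": "string",
    "release_dt": "date",
    "title": "text_general",
    "content": "text_general",
    "text": "text_general",
}
DYNAMIC_FIELDS = [("*_dt", "date"), ("*_text", "string")]


def resolve_field_type(name):
    """Return the field type for a field name, or None if nothing matches.

    Simplified model (an assumption): explicit fields win, then the first
    dynamic pattern that matches the name.
    """
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return field_type
    return None


if __name__ == "__main__":
    for field in ("title", "open_dt", "memo_text", "unknown"):
        print(field, "->", resolve_field_type(field))
```

Under this model a document may carry a field such as `open_dt` without any explicit declaration, because the `*_dt` pattern assigns it the `date` type.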
  • 12. Schema.xml
      • Example for bigram analyzer
          <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.CJKWidthFilterFactory"/>
              <filter class="solr.LowerCaseFilterFactory"/>
              <filter class="solr.CJKBigramFilterFactory"/>
            </analyzer>
          </fieldType>
      • Dynamically Reload
          $ curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
          [e.g. $ curl 'http://localhost:8981/solr/admin/cores?action=RELOAD&core=news']
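To give a feel for what a bigram analyzer produces, here is a rough Python sketch that splits a run of CJK characters into overlapping two-character tokens. It illustrates the technique only. The helper `cjk_bigrams` is hypothetical and ignores what the real filter chain also handles, such as mixed scripts, width folding, and lowercasing.

```python
def cjk_bigrams(text):
    """Split a run of CJK characters into overlapping two-character tokens.

    A single character is returned as-is; an empty string yields no tokens.
    """
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


if __name__ == "__main__":
    print(cjk_bigrams("아이패드"))
```

Because the same splitting is applied to both the indexed text and the query, a query for part of a word can still match without a language-specific dictionary.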
  • 13. Multi-Core
  • 14. Configuration Files
      1. Edit the solr.xml configuration file in the solr directory.
          <solr persistent="true" sharedLib="lib">
            <cores adminPath="/admin/cores" defaultCoreName="core1">
              <core name="core1" instanceDir="core_dir1" />
              <core name="core2" instanceDir="core_dir2" />
            </cores>
          </solr>
      2. Create a home directory for each core under the solr directory.
          - solr
            - core_dir1
            - core_dir2
      3. Create conf and data directories inside each of those directories.
          The data path can be set in solrconfig.xml through the following element:
          <dataDir>${solr.data.dir:}</dataDir>
          - solr
            - core_dir1
              - conf
              - data
            - core_dir2
              - conf
              - data
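Steps 2 and 3 above amount to creating a small directory tree. A minimal Python sketch of that layout follows. The helper name `create_core_dirs` is hypothetical, and the demo writes into a temporary directory rather than a real Solr home.

```python
from pathlib import Path
import tempfile


def create_core_dirs(solr_home, core_dirs):
    """Create <solr_home>/<core_dir>/conf and .../data for every core."""
    created = []
    for core_dir in core_dirs:
        for sub in ("conf", "data"):
            path = Path(solr_home) / core_dir / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; point solr_home at a real path to use it.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_core_dirs(tmp, ["core_dir1", "core_dir2"]):
            print(path.relative_to(tmp))
```

Each conf directory still needs its own solrconfig.xml and schema.xml before the core can load.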
  • 15. Web Admin Interface
  • 16. Web Admin Interface
      • View Config, Schema, and Distribution information
      • Query interface
      • Statistics
          – Caches: lookups, hits, hitratio, inserts, evictions, size
          – RequestHandlers: requests, errors
          – UpdateHandler: adds, deletes, commits, optimizes
          – IndexReader: open-time, index-version, numDocs, maxDocs
      • Analysis debugger
          – Shows the output of each analysis step
          – Shows how the query matches the indexed terms
  • 17. Solr Document
  • 18. XML
      • Document
          <add>
            <doc>
              <field name="employeeId">05991</field>
              <field name="office">Bridgewater</field>
              <field name="skills">Perl</field>
              <field name="skills">Java</field>
            </doc>
          </add>

          <delete>
            <id>05991</id>
            <id>06000</id>
            <query>office:Bridgewater</query>
            <query>office:Osaka</query>
          </delete>
      • Indexing
          $ curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'
      • Update
          $ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<add><doc boost="2.5"><field name="employeeId">05991</field><field name="office" boost="2.0">Bridgewater</field></doc></add>'
      • Commit
          $ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false" waitSearcher="false"/>'
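Hand-writing the XML inside a curl command gets fragile once field values contain characters such as < or &. As a hedged sketch, the <add> message from this slide can be generated with Python's standard library. The function name `build_add_message` is hypothetical, and sending the result to the /update endpoint is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr-style <add> XML message from a list of dicts.

    A list value produces one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message([
        {"employeeId": "05991", "office": "Bridgewater", "skills": ["Perl", "Java"]},
    ]))
```

The serializer escapes special characters in field values, so the message stays well-formed without manual quoting.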
  • 19. JSON
      • Document
          [ { "id" : "MyTestDocument", "title" : "This is just a test" } ]
      • Indexing
          $ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '[ { "id" : "MyTestDocument", "title" : "This is just a test" } ]'
      • Update/Delete
          $ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
          {
            "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
            "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} },
            "delete": {"id" : "TestDoc1" },
            "delete": {"query" : "Test", "commitWithin" : 500 }
          }'
      • Commit
          $ curl 'http://localhost:8983/solr/update?commit=true'
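The JSON bodies above can also be produced programmatically, which avoids quoting mistakes. This is a small sketch under stated assumptions: the helper names are hypothetical, the payload shapes simply mirror the slide, and because a Python dict cannot hold the repeated "add" or "delete" keys shown above, each operation is built as its own message.

```python
import json


def build_add_payload(docs):
    """Serialize documents as a JSON array, as in the Indexing example."""
    return json.dumps(list(docs), ensure_ascii=False)


def build_delete_payload(doc_id=None, query=None, commit_within_ms=None):
    """Serialize one delete command, by id or by query (exactly one of them)."""
    if (doc_id is None) == (query is None):
        raise ValueError("pass exactly one of doc_id or query")
    body = {"id": doc_id} if doc_id is not None else {"query": query}
    if commit_within_ms is not None:
        body["commitWithin"] = commit_within_ms
    return json.dumps({"delete": body}, ensure_ascii=False)


if __name__ == "__main__":
    print(build_add_payload([{"id": "MyTestDocument", "title": "This is just a test"}]))
    print(build_delete_payload(doc_id="TestDoc1"))
    print(build_delete_payload(query="Test", commit_within_ms=500))
```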
  • 20. CSV
      • Document
          [test.csv]
          id,title,category
          100,"title","This Value is ""food"""
          (Field names can instead be passed as a request parameter, e.g. fieldnames=id,title,category; an empty name, as in fieldnames=id,,category, skips that column.)
      • Indexing
          $ curl http://localhost:8983/solr/update/csv --data-binary @test.csv -H 'Content-type:text/plain; charset=utf-8'
      • Example from a MySQL dump
          $ curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=&stream.file=/tmp/result.text'
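The doubled quotes in the CSV document are the standard way to embed a quote character inside a quoted value. As an illustrative sketch, Python's csv module produces that escaping automatically. The function name `to_csv` is hypothetical, and the header row is assumed to carry the field names.

```python
import csv
import io


def to_csv(fieldnames, rows):
    """Render a header row plus data rows as CSV text with standard quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fieldnames)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(["id", "title", "category"],
                 [["100", "title", 'This Value is "food"']]), end="")
```

Only values that need it are quoted, so the plain `title` value is written without quotes while the value containing quotes is wrapped and its inner quotes doubled.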
  • 21. Data Handler Interface
  • 22. Full-import
      • Example test database setup
          Create database solr;
          Grant alter, select, insert, update, delete on solr.* to solr@localhost identified by 'solr';
          Create table maker (
            mid int primary key auto_increment,
            name varchar(30) not null,
            lastmodified datetime
          );
          Create table product (
            id int primary key auto_increment,
            mid int not null,
            name varchar(30) not null,
            hname varchar(30) not null,
            lastmodified datetime
          );
          Insert into maker(name, lastmodified) values('apple', '2012-05-11 17:00:00');
          Insert into maker(name, lastmodified) values('sony', '2012-05-11 17:00:00');
          Insert into maker(name, lastmodified) values('microsoft', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(1, 'iphone', '아이폰', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(1, 'ipod', '아아팟', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(1, 'ipad', '아이패드', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(2, 'walkman', '워크맨', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(2, 'vaio', '바이오', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(3, 'windowsxp', '윈도우xp', '2012-05-11 17:00:00');
          Insert into product(mid, name, hname, lastmodified) values(3, 'windowx7', '윈도우7', '2012-05-11 17:00:00');
  • 23. Full-import
      • MySQL connection settings
          – Specify the DB configuration file in solrconfig.xml.
              <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
                <lst name="defaults">
                  <str name="config">db-data-config.xml</str>
                </lst>
              </requestHandler>
          – Define the SQL statements for the data in db-data-config.xml.
              <dataConfig>
                <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
                <document>
                  <entity name="product" query="select id, mid, name, hname from product">
                    <field column="id" name="pid" />
                    <field column="mid" name="mid" />
                    <field column="name" name="pname" />
                    <field column="hname" name="hname" />
                    <entity name="maker" query="select mid, name from maker where mid = ${product.mid}">
                      <field column="mid" name="mid" />
                      <field column="name" name="mname" />
                    </entity>
                  </entity>
                </document>
              </dataConfig>
  • 24. Full-import
      • Index configuration
          – Define the search fields in schema.xml
              <field name="pid" type="string" indexed="true" stored="true" required="true" />
              <field name="mid" type="int" indexed="true" stored="true" multiValued="false" />
              <field name="pname" type="text" indexed="true" stored="true" multiValued="true" />
              <field name="mname" type="text" indexed="true" stored="true" multiValued="true" />
              ……..
              <defaultSearchField>pname</defaultSearchField>
              <defaultSearchField>mname</defaultSearchField>
              ……..
              <uniqueKey>pid</uniqueKey>
              ……..
              <copyField source="pname" dest="text"/>
              <copyField source="mname" dest="text"/>
          – Start Solr
              java -Dsolr.solr.home="./example-DIH/solr/" -jar start.jar
          – Run the import
              http://localhost:8983/solr/db/dataimport?command=full-import
  • 25. Delta-import
      • Example test database setup
          Insert into maker(name, lastmodified) values('Samsung', '2012-05-14 14:00:00');
          Insert into maker(name, lastmodified) values('LG', '2012-05-14 14:00:00');
          Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyS', '겔럭시S', '2012-05-14 14:00:00');
          Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyA', '겔럭시A', '2012-05-14 14:00:00');
          Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyNote', '겔럭시노트', '2012-05-14 14:00:00');
          Insert into product(mid, name, hname, lastmodified) values(5, 'OptimusLTE', '옵티머스LTE', '2012-05-14 14:00:00');
          Insert into product(mid, name, hname, lastmodified) values(5, 'VegaLTE', '베가LTE', '2012-05-14 14:00:00');
  • 26. Delta-import
      • MySQL connection settings
          – Define the SQL statements for the data in db-data-config.xml.
              <dataConfig>
                <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
                <document>
                  <entity name="product" pk="id"
                          query="select * from product"
                          deltaImportQuery="select * from product where id=${dataimporter.delta.id}"
                          deltaQuery="select id from product where lastmodified > '${dataimporter.last_index_time}'">
                    <field column="id" name="pid" />
                    <field column="mid" name="mid" />
                    <field column="name" name="pname" />
                    <entity name="maker" pk="mid" query="select mid, name from maker where mid=${product.mid}">
                      <field column="mid" name="mid" />
                      <field column="name" name="mname" />
                    </entity>
                  </entity>
                </document>
              </dataConfig>
          – Run the import
              http://localhost:8983/solr/db/dataimport?command=delta-import
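The core of delta-import is the deltaQuery: select only the primary keys of rows modified after the last index time, then re-fetch those rows. The sketch below imitates that selection step in Python. It uses an in-memory SQLite database as a stand-in for MySQL, the helper names are hypothetical, and only a few of the sample rows are loaded.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory product table with old and new sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "id INTEGER PRIMARY KEY, mid INTEGER NOT NULL, "
        "name TEXT NOT NULL, lastmodified TEXT)"
    )
    conn.executemany(
        "INSERT INTO product (mid, name, lastmodified) VALUES (?, ?, ?)",
        [
            (1, "iphone", "2012-05-11 17:00:00"),
            (2, "walkman", "2012-05-11 17:00:00"),
            (4, "GalaxyS", "2012-05-14 14:00:00"),
            (5, "OptimusLTE", "2012-05-14 14:00:00"),
        ],
    )
    return conn


def find_changed_ids(conn, last_index_time):
    """Mimic the deltaQuery: ids of rows modified after last_index_time."""
    rows = conn.execute(
        "SELECT id FROM product WHERE lastmodified > ? ORDER BY id",
        (last_index_time,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_demo_db()
    # Rows inserted on 2012-05-14 are newer than the 2012-05-11 index time.
    print(find_changed_ids(conn, "2012-05-11 17:00:00"))
```

The timestamps compare correctly as strings here only because they share the fixed `YYYY-MM-DD hh:mm:ss` layout; a real MySQL datetime column compares natively.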
  • 27. Index
  • 28. Index
      • Delete all existing data
          $ java -Durl=http://localhost:$port/solr/update/?commit=true -Ddata=args -jar $dir/post.jar "<delete><query>*:*</query></delete>"
      • Index with the post.jar file as follows
          $ java -Durl=http://localhost:8983/solr/core1/update/?commit=true -jar post.jar core1_data.xml
          $ java -Durl=http://localhost:8983/solr/core2/update/?commit=true -jar post.jar core1_data.xml
      ※ Note
          – When creating the index file for the first time
              <doc>
                <field name="id">id1</field>
                <field name="title">title1</field>
              </doc>
          – When updating the index file
              <update>
                <doc>
                  <field name="id">id1</field>
                  <field name="title">title1</field>
                </doc>
              </update>
  • 29. Search
  • 30. Search Parameter
      Parameter    Default   Description
      q                      Search query. e.g. q=video or q=title:spiderman^10 text:spiderman
      start        0         Offset into the list of results
      rows         10        Number of result documents to return
      fl           *         Fields to return (field names separated by commas). e.g. fl=*,score or fl=id,name
      qf                     Fields to search (query fields). e.g. q=superman&qf=title subject
      sort                   Fields to sort by, ascending or descending. e.g. sort=inStock asc, price desc or sort=price asc
      wt                     Writer type (response format). e.g. wt=json or wt=xml
      fq                     Filter query (search within results). e.g. q=video&fq=superman
      hl                     Highlighting fields. e.g. hl=true&hl.fl=name,description
      facet                  Faceted search. e.g. facet=true&facet.field=cat, facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]
      debugQuery             Adds debug output to the search results
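These parameters are ordinary URL query parameters, so they need URL encoding when values contain spaces, brackets, or colons. A minimal sketch of assembling a search URL in Python follows. The function name `build_select_url` is hypothetical, the base URL is the localhost address used throughout the slides, and /select is assumed to be the search handler path.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, params):
    """Append URL-encoded search parameters to the /select handler path.

    List values are repeated, e.g. two facet.query parameters.
    """
    return base_url.rstrip("/") + "/select?" + urlencode(params, doseq=True)


if __name__ == "__main__":
    url = build_select_url("http://localhost:8983/solr", {
        "q": "video",
        "fq": "superman",
        "start": 0,
        "rows": 10,
        "fl": "*,score",
        "wt": "json",
        "facet": "true",
        "facet.query": ["price:[0 TO 100]", "price:[100 TO *]"],
    })
    print(url)
    # Decoding the query string recovers the original parameter values.
    print(parse_qs(urlsplit(url).query))
```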
  • 31. Query Examples
      • Documents containing mission or impossible, sorted by releaseDate in descending order
          – q=mission impossible; releaseDate desc
      • Documents that contain mission and whose actor field does not contain cruise
          – q=+mission -actor:cruise
      • Documents that contain the phrase "mission impossible" and whose actor field does not contain cruise
          – q="mission impossible" -actor:cruise
      • Boost spiderman in title by a factor of 10 over spiderman in description
          – q=title:spiderman^10 description:spiderman
      • Documents whose description field has spiderman and movie within 10 words of each other
          – q=description:"spiderman movie"~10
      • Documents that must contain HDTV and have a weight of 40 or more
          – q=+HDTV +weight:[40 TO *]
      • Wildcard queries
          – q=te?t
          – q=te*t
          – q=test*
  • 32. Search Relevancy
  • 33. Faceted Browsing
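The wildcard queries at the end of the slide use ? for exactly one character and * for any run of characters. Python's fnmatch patterns happen to share those two rules, so the following sketch shows which terms each pattern would accept. It is an illustration of the matching rules only; the term list is made up, and Lucene evaluates wildcards against indexed terms rather than raw text.

```python
from fnmatch import fnmatchcase


def match_terms(pattern, terms):
    """Return the terms accepted by a ?/* wildcard pattern, in input order."""
    return [term for term in terms if fnmatchcase(term, pattern)]


if __name__ == "__main__":
    # Hypothetical indexed terms, for illustration only.
    terms = ["test", "text", "tent", "tempest", "tests", "testing", "toast"]
    for pattern in ("te?t", "te*t", "test*"):
        print(pattern, match_terms(pattern, terms))
```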
  • 34. Autocomplete
  • 35. Suggest
      • Configuration
          – Add the suggest feature to solrconfig.xml.
              <searchComponent name="suggest" class="solr.SpellCheckComponent">
                <lst name="spellchecker">
                  <str name="name">suggest</str>
                  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
                  <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
                  <str name="field">name_autocomplete</str>
                </lst>
              </searchComponent>
              <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
                <lst name="defaults">
                  <str name="spellcheck">true</str>
                  <str name="spellcheck.dictionary">suggest</str>
                  <str name="spellcheck.count">10</str>
                </lst>
                <arr name="components">
                  <str>suggest</str>
                </arr>
              </requestHandler>
  • 36. Suggest
      • Configuration
          – Add the suggest field to schema.xml.
              <fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
                <analyzer>
                  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                  <filter class="solr.LowerCaseFilterFactory"/>
                </analyzer>
              </fieldType>
              <field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
              <copyField source="name" dest="name_autocomplete" />
      • Run a search
          (http://localhost:8983/solr/db/suggest?spellcheck.build=true)
          http://localhost:8983/solr/db/suggest?q=겔
          http://localhost:8983/solr/db/suggest?q=윈도
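Conceptually, the two suggest queries above ask for the indexed terms that start with a given prefix. The sketch below shows that lookup with a sorted list and binary search. This is a deliberately simpler technique than the ternary search tree (TSTLookup) configured above, and it ignores term weights. The function names are hypothetical, and the terms are taken from the sample product data.

```python
from bisect import bisect_left


def build_index(terms):
    """Lower-case, de-duplicate, and sort terms for prefix lookup."""
    return sorted({term.lower() for term in terms})


def suggest(index, prefix, count=10):
    """Return up to `count` terms from the sorted index that start with prefix."""
    prefix = prefix.lower()
    start = bisect_left(index, prefix)  # first term >= prefix
    matches = []
    for term in index[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
        if len(matches) == count:
            break
    return matches


if __name__ == "__main__":
    index = build_index(["겔럭시S", "겔럭시A", "겔럭시노트", "윈도우xp", "윈도우7", "아이폰"])
    print(suggest(index, "겔"))
    print(suggest(index, "윈도"))
```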
  • 37. Basic Dictionary: Synonym and Stop Word Dictionaries
  • 38. Synonym Dictionary
      • Entries (synonyms.txt)
          Window => windowxp window7 window8 window 7, door
      • Test query [Query: window 7]
  • 39. Synonym Dictionary
      • Test query [Query: window]
      • Test query [Query: door]
  • 40. Stop Word Dictionary
      • Entries (stopwords.txt)
          Window
      • Test query [Query: window 7]
      • Test query [Query: window]
      • Test query [Query: door]
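The effect of the stop word entry on the three test queries can be approximated in a few lines. This is a rough sketch that assumes case-insensitive stop word matching followed by lowercasing, as in the text_general index analyzer shown earlier. Whitespace splitting stands in for the real tokenizer, and the function name `analyze` is hypothetical.

```python
def analyze(text, stopwords):
    """Drop stop words (ignoring case), then lower-case the remaining tokens."""
    stop = {word.lower() for word in stopwords}
    tokens = text.split()  # stand-in for a real tokenizer
    kept = [token for token in tokens if token.lower() not in stop]
    return [token.lower() for token in kept]


if __name__ == "__main__":
    stopwords = ["Window"]  # the stopwords.txt entry from the slide
    for query in ("window 7", "window", "door"):
        print(repr(query), "->", analyze(query, stopwords))
```

With `Window` as a stop word, the query `window` is left with no tokens at all, which is why such a query cannot match on that term.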
