
Apache Solr Introduction 20120629

An open-source search server based on the Lucene Java search library


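  1. 1. Apache Solr Introduction, 2012.06.29, 윤도상, dsyoon@ncue.net
  2. 2. Solr Features
     • Schema
       – Easily define the fields of the documents to index and their field types
       – Uses Lucene Analyzers
       – Supports Dynamic Fields
       – Copy Field can combine several fields into a single searchable field
       – Stop words and similar word lists can be configured through external files
     • Query
       – HTTP interface with configurable response formats such as XML/XSLT, JSON, Python, and Ruby
       – Faceted Search based on queries and field values
       – Sort order can be specified in the query
       – Easy configuration of search scores
       – Weights can be assigned to specific fields in the query
     • Core
       – Query handlers and an extensible XML format
       – Duplicate document detection based on the unique key field
     • Caching
       – Configurable caches for query results, filters, and documents
       – Supports user-level cache configuration
     • Replication
       – Efficient distributed indexing via rsync transport
     • Admin Interface
       – Reports cache, update, and query status
       – Provides a debugger for text analyzers
       – Provides a web query interface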
  3. 3. Architecture
  4. 4. Overall Architecture
  5. 5. Component
  6. 6. High Availability
  7. 7. Replication
  8. 8. Configure
  9. 9. Schema.xml
     • Overall
       <schema>
         <types> … </types>
         <fields> … </fields>
         <uniqueKey />
         <solrQueryParser />
         <copyField />
         <dynamicField />
       </schema>
  10. 10. Schema.xml
      • Type
        <types>
          <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
          <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
          <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
          <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" />
          <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" />
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
          </fieldType>
        </types>
  11. 11. Schema.xml
      • Fields
        <fields>
          <field name="id" type="string" indexed="true" stored="true" required="true" />
          <field name="release_dt" type="date" indexed="true" stored="true" />
          <field name="title" type="text_general" indexed="true" stored="true" />
          <field name="content" type="text_general" indexed="true" stored="true" />
          <field name="text" type="text_general" indexed="true" stored="true" />
        </fields>
      • uniqueKey
        – <uniqueKey>id</uniqueKey>
      • solrQueryParser
        – <solrQueryParser defaultOperator="OR"/>
      • copyField
        – <copyField source="title" dest="text"/>
        – <copyField source="content" dest="text"/>
      • dynamicField
        – <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
        – <dynamicField name="*_text" type="string" indexed="true" stored="true"/>
  12. 12. Schema.xml
      • Example for bigram analyzer
        <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.CJKWidthFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.CJKBigramFilterFactory"/>
          </analyzer>
        </fieldType>
      • Dynamically Reload
        $ curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
        [e.g. $ curl 'http://localhost:8981/solr/admin/cores?action=RELOAD&core=news']
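
The dynamicField rules on the Schema.xml slides above can be pictured as a name lookup: an explicitly declared field is used when one exists, otherwise the first matching wildcard pattern supplies the type. The following is only a rough sketch of that idea, not Solr's own code. The helper name `resolve_field_type` is hypothetical, and the sketch assumes simple glob matching and does not model how Solr ranks overlapping patterns.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamicField patterns copied from the slides.
FIELDS = {
    "id": "string",
    "release_dt": "date",
    "title": "text_general",
    "content": "text_general",
    "text": "text_general",
}
DYNAMIC_FIELDS = [("*_dt", "date"), ("*_text", "string")]


def resolve_field_type(name):
    """Return the field type for `name`, or None when nothing matches.

    Hypothetical helper: explicit fields win, then the first matching
    wildcard pattern is used.
    """
    if name in FIELDS:
        return FIELDS[name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return field_type
    return None


for name in ("title", "updated_dt", "summary_text", "price"):
    print(name, "->", resolve_field_type(name))
```

A name such as `price` matches neither list, which corresponds to a document field the schema does not know about.
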
  13. 13. Multi-Core
  14. 14. Configuration Files
      1. Edit the solr.xml configuration file in the solr directory
         <solr persistent="true" sharedLib="lib">
           <cores adminPath="/admin/cores" defaultCoreName="core1">
             <core name="core1" instanceDir="core_dir1" />
             <core name="core2" instanceDir="core_dir2" />
           </cores>
         </solr>
      2. Create a home directory for each core under the solr directory
         - solr
           - core_dir1
           - core_dir2
      3. Create conf and data directories inside each of those directories.
         The data path can be set in solrconfig.xml through the following element:
         <dataDir>${solr.data.dir:}</dataDir>
         - solr
           - core_dir1
             - conf
             - data
           - core_dir2
             - conf
             - data
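
Steps 2 and 3 of the multi-core setup above only create directories, so they are easy to script. The sketch below builds the same layout with `pathlib`. The function name `create_core_dirs` is hypothetical, and a temporary directory stands in for the real solr home so the example can run anywhere.

```python
from pathlib import Path
import tempfile


def create_core_dirs(solr_home, core_dirs):
    """Create <solr_home>/<core>/conf and <solr_home>/<core>/data for each core."""
    created = []
    for core in core_dirs:
        for sub in ("conf", "data"):
            path = Path(solr_home) / core / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# A temporary directory is used here in place of the real solr directory.
with tempfile.TemporaryDirectory() as home:
    for path in create_core_dirs(home, ["core_dir1", "core_dir2"]):
        print(path.relative_to(home))
```
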
  15. 15. Web Admin Interface
  16. 16. Web Admin Interface
      • View Config, Schema, and Distribution information
      • Query Interface
      • Statistics
        – Caches: lookups, hits, hitratio, inserts, evictions, size
        – RequestHandlers: requests, errors
        – UpdateHandler: adds, deletes, commits, optimizes
        – IndexReader, open-time, index-version, numDocs, maxDocs
      • Analysis Debugger
        – Shows the output of each analysis stage
        – Shows how query terms match indexed terms
  17. 17. Solr Document
  18. 18. XML
      • Document
        <add>
          <doc>
            <field name="employeeId">05991</field>
            <field name="office">Bridgewater</field>
            <field name="skills">Perl</field>
            <field name="skills">Java</field>
          </doc>
        </add>

        <delete>
          <id>05991</id>
          <id>06000</id>
          <query>office:Bridgewater</query>
          <query>office:Osaka</query>
        </delete>
      • Indexing
        $ curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'
      • Update
        $ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<add><doc boost="2.5"><field name="employeeId">05991</field> <field name="office" boost="2.0">Bridgewater</field> </doc> </add>'
      • Commit
        $ curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false" waitSearcher="false"/>'
  19. 19. JSON
      • Document
        [ { "id" : "MyTestDocument", "title" : "This is just a test" } ]
      • Indexing
        $ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '[ { "id" : "MyTestDocument", "title" : "This is just a test" } ]'
      • Update/Delete
        $ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
        {
          "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
          "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} },
          "delete": {"id" : "TestDoc1" },
          "delete": {"query" : "Test", "commitWithin" : 500 }
        }'
      • Commit
        $ curl 'http://localhost:8983/solr/update?commit=true'
  20. 20. CSV
      • Document
        [test.csv]                      [test.csv]
        fieldnames=id,,category         fieldnames=id,title,category
        100,"title","This Value is ""food"""    100,"title","This Value is ""food"""
      • Indexing
        $ curl http://localhost:8983/solr/update/csv --data-binary @test.csv -H 'Content-type:text/plain; charset=utf-8'
      • Example from MySQL Dump
        $ curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=&stream.file=/tmp/result.text'
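
The XML and JSON update messages on the Solr Document slides above are easier to get right when generated rather than typed by hand, because quoting and escaping are handled by the library. The following is a minimal sketch using only the Python standard library. The function names `build_add_xml` and `build_add_json` are hypothetical, and the sketch only builds the request bodies; sending them to the update URLs shown on the slides is left out.

```python
import json
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    A list value becomes one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def build_add_json(docs):
    """Serialize a list of documents as the JSON array form shown on the slide."""
    return json.dumps(docs, ensure_ascii=False)


print(build_add_xml({"employeeId": "05991", "office": "Bridgewater",
                     "skills": ["Perl", "Java"]}))
print(build_add_json([{"id": "MyTestDocument", "title": "This is just a test"}]))
```
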
  21. 21. Data Handler Interface
  22. 22. Full-import
      • Example test DB setup
        Create database solr;
        Grant alter, select, insert, update, delete on solr.* to solr@localhost identified by 'solr';
        Create table maker (
          mid int primary key auto_increment,
          name varchar(30) not null,
          lastmodified datetime
        );
        Create table product (
          id int primary key auto_increment,
          mid int not null,
          name varchar(30) not null,
          hname varchar(30) not null,
          lastmodified datetime
        );
        Insert into maker(name, lastmodified) values('apple', '2012-05-11 17:00:00');
        Insert into maker(name, lastmodified) values('sony', '2012-05-11 17:00:00');
        Insert into maker(name, lastmodified) values('microsoft', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(1, 'iphone', '아이폰', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(1, 'ipod', '아이팟', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(1, 'ipad', '아이패드', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(2, 'walkman', '워크맨', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(2, 'vaio', '바이오', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(3, 'windowsxp', '윈도우xp', '2012-05-11 17:00:00');
        Insert into product(mid, name, hname, lastmodified) values(3, 'windows7', '윈도우7', '2012-05-11 17:00:00');
  23. 23. Full-import
      • MySQL connection settings
        – In solrconfig.xml, specify the DB configuration file.
          <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
            <lst name="defaults">
              <str name="config">db-data-config.xml</str>
            </lst>
          </requestHandler>
        – In db-data-config.xml, define the SQL statements that select the data.
          <dataConfig>
            <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
            <document>
              <entity name="product" query="select id, mid, name, hname from product">
                <field column="id" name="pid" />
                <field column="mid" name="mid" />
                <field column="name" name="pname" />
                <field column="hname" name="hname" />
                <entity name="maker" query="select mid, name from maker where mid = ${product.mid}">
                  <field column="mid" name="mid" />
                  <field column="name" name="mname" />
                </entity>
              </entity>
            </document>
          </dataConfig>
  24. 24. Full-import
      • Index settings
        – Define the search fields in schema.xml
          <field name="pid" type="string" indexed="true" stored="true" required="true" />
          <field name="mid" type="int" indexed="true" stored="true" multiValued="false" />
          <field name="pname" type="text" indexed="true" stored="true" multiValued="true" />
          <field name="mname" type="text" indexed="true" stored="true" multiValued="true" />
          ……..
          <defaultSearchField>pname</defaultSearchField>
          <defaultSearchField>mname</defaultSearchField>
          ……..
          <uniqueKey>pid</uniqueKey>
          ……..
          <copyField source="pname" dest="text"/>
          <copyField source="mname" dest="text"/>
        – Start Solr
          java -Dsolr.solr.home="./example-DIH/solr/" -jar start.jar
        – Run the import
          http://localhost:8983/solr/db/dataimport?command=full-import
  25. 25. Delta-import
      • Example test DB setup
        Insert into maker(name, lastmodified) values('Samsung', '2012-05-14 14:00:00');
        Insert into maker(name, lastmodified) values('LG', '2012-05-14 14:00:00');
        Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyS', '겔럭시S', '2012-05-14 14:00:00');
        Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyA', '겔럭시A', '2012-05-14 14:00:00');
        Insert into product(mid, name, hname, lastmodified) values(4, 'GalaxyNote', '겔럭시노트', '2012-05-14 14:00:00');
        Insert into product(mid, name, hname, lastmodified) values(5, 'OptimusLTE', '옵티머스LTE', '2012-05-14 14:00:00');
        Insert into product(mid, name, hname, lastmodified) values(5, 'VegaLTE', '베가LTE', '2012-05-14 14:00:00');
  26. 26. Delta-import
      • MySQL connection settings
        – In db-data-config.xml, define the SQL statements that select the data.
          <dataConfig>
            <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/solr" user="solr" password="solr" name="solr"/>
            <document>
              <entity name="product" pk="id"
                      query="select * from product"
                      deltaImportQuery="select * from product where id=${dataimporter.delta.id}"
                      deltaQuery="select id from product where lastmodified > '${dataimporter.last_index_time}'">
                <field column="id" name="pid" />
                <field column="mid" name="mid" />
                <field column="name" name="pname" />
                <entity name="maker" pk="mid" query="select mid, name from maker where mid=${product.mid}">
                  <field column="mid" name="mid" />
                  <field column="name" name="mname" />
                </entity>
              </entity>
            </document>
          </dataConfig>
        – Run the import
          http://localhost:8983/solr/db/dataimport?command=delta-import
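
The deltaQuery and deltaImportQuery attributes above describe a two-step lookup: first find the primary keys changed since the last index time, then fetch the full row for each key. The sketch below imitates those two steps against an in-memory SQLite table so that it runs without MySQL or Solr. It illustrates the idea only and is not the DataImportHandler itself. The function name `delta_rows` is hypothetical, and the table is a trimmed version of the product table from the slides.

```python
import sqlite3


def delta_rows(conn, last_index_time):
    """Imitate a delta-import: find changed ids, then load each changed row."""
    # Step 1 (role of deltaQuery): primary keys modified after the last index time.
    changed_ids = [row[0] for row in conn.execute(
        "SELECT id FROM product WHERE lastmodified > ? ORDER BY id",
        (last_index_time,))]
    # Step 2 (role of deltaImportQuery): fetch the full row for every changed key.
    rows = []
    for pk in changed_ids:
        rows.append(conn.execute(
            "SELECT id, mid, name FROM product WHERE id = ?", (pk,)).fetchone())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, mid INTEGER, name TEXT, lastmodified TEXT)")
conn.executemany(
    "INSERT INTO product (mid, name, lastmodified) VALUES (?, ?, ?)",
    [
        (1, "iphone", "2012-05-11 17:00:00"),
        (4, "GalaxyS", "2012-05-14 14:00:00"),
        (5, "OptimusLTE", "2012-05-14 14:00:00"),
    ])

# Only rows modified after the assumed last index time are returned.
for row in delta_rows(conn, "2012-05-12 00:00:00"):
    print(row)
```

Timestamps are stored as text in `YYYY-MM-DD HH:MM:SS` form here, so the string comparison orders them chronologically.
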
  27. 27. Index
  28. 28. Index
      • Delete all existing data
        $ java -Durl=http://localhost:$port/solr/update/?commit=true -Ddata=args -jar $dir/post.jar "<delete><query>*:*</query></delete>"
      • Index with the post.jar file as follows
        $ java -Durl=http://localhost:8983/solr/core1/update/?commit=true -jar post.jar core1_data.xml
        $ java -Durl=http://localhost:8983/solr/core2/update/?commit=true -jar post.jar core1_data.xml
      ※ Note
        – When creating the index file for the first time
          <doc>
            <field name="id">id1</field>
            <field name="title">title1</field>
          </doc>
        – When updating the index file
          <update>
            <doc>
              <field name="id">id1</field>
              <field name="title">title1</field>
            </doc>
          </update>
  29. 29. Search
  30. 30. Search Parameter
      Parameter  | Default | Description
      q          |         | Search query. e.g. q=video or q=title:spiderman^10 text:spiderman
      start      | 0       | Offset into the list of search results
      rows       | 10      | Number of result documents to return
      fl         | *       | Fields to return (field names separated by commas). e.g. fl=*,score or fl=id, name
      qf         |         | Fields to search (query fields, used by the DisMax query parser). e.g. q=superman&qf=title subject
      sort       |         | Fields to sort by, ascending or descending. e.g. sort=inStock asc, price desc or sort=price asc
      wt         |         | Writer type (response format). e.g. wt=json or wt=xml
      fq         |         | Filter query (search within results). e.g. q=video&fq=superman
      hl         |         | Highlighting and the fields to highlight. e.g. hl=true&hl.fl=name, description
      facet      |         | Faceted Search. e.g. facet=true&facet.field=cat, or facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]
      debugQuery |         | Adds debug information to the search results
  31. 31. Query Examples
      • Documents containing mission or impossible, sorted by releaseDate in descending order
        – q=mission impossible; releaseDate desc
      • Documents containing mission whose actor field does not contain cruise
        – q=+mission -actor:cruise
      • Documents containing the phrase "mission impossible" whose actor field does not contain cruise
        – q="mission impossible" -actor:cruise
      • Boost spiderman in title by a factor of 10 relative to spiderman in description
        – q=title:spiderman^10 description:spiderman
      • Documents where spiderman and movie occur within 10 words of each other in the description field
        – q=description:"spiderman movie"~10
      • Documents that must contain HDTV and have a weight of 40 or more
        – q=+HDTV +weight:[40 TO *]
      • Wildcard queries
        – q=te?t
        – q=te*t
        – q=test*
  32. 32. Search Relevancy
  33. 33. Faceted Browsing
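
The search parameters and query examples in this section end up in a URL query string, and characters such as `+`, `^`, spaces, and brackets must be percent-encoded before they are sent. The sketch below builds such a URL with the Python standard library. The `/solr/select` path and the helper name `build_search_url` are assumptions made for illustration, so substitute the handler path of your own installation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed search handler URL; adjust to the core and handler actually in use.
SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(q, **params):
    """Return a search URL with q and any extra parameters percent-encoded."""
    query = {"q": q}
    query.update(params)
    return SELECT_URL + "?" + urlencode(query)


url = build_search_url(
    "+HDTV +weight:[40 TO *]",
    start=0, rows=10, fl="*,score", sort="price asc", wt="json")
print(url)

# Decoding the query string again recovers the original query text.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Encoding matters most for the required-term operator: an unescaped `+` in a URL is decoded as a space, which would silently turn `+HDTV` into an optional term.
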
  34. 34. Autocomplete
  35. 35. Suggest
      • Configuration
        – Add the suggest feature to solrconfig.xml.
          <searchComponent name="suggest" class="solr.SpellCheckComponent">
            <lst name="spellchecker">
              <str name="name">suggest</str>
              <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
              <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
              <str name="field">name_autocomplete</str>
            </lst>
          </searchComponent>
          <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
            <lst name="defaults">
              <str name="spellcheck">true</str>
              <str name="spellcheck.dictionary">suggest</str>
              <str name="spellcheck.count">10</str>
            </lst>
            <arr name="components">
              <str>suggest</str>
            </arr>
          </requestHandler>
  36. 36. Suggest
      • Configuration
        – Add the suggest field to schema.xml.
          <fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
              <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
              <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
          </fieldType>
          <field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false" />
          <copyField source="name" dest="name_autocomplete" />
      • Run a search
        (http://localhost:8983/solr/db/suggest?spellcheck.build=true)
        http://localhost:8983/solr/db/suggest?q=겔
        http://localhost:8983/solr/db/suggest?q=윈도
  37. 37. Basic Dictionary - Synonym and Stop Word Dictionaries -
  38. 38. Synonym Dictionary
      • Entries (synonyms.txt)
        Window => windowxp window7 window8 window 7, door
      • Test query [Query: window 7]
  39. 39. Synonym Dictionary
      • Test query [Query: window]
      • Test query [Query: door]
  40. 40. Stop Word Dictionary
      • Entries (stopwords.txt)
        Window
      • Test query [Query: window 7]
      • Test query [Query: window]
      • Test query [Query: door]
