Apache Solr guide

Who are we? Experts in search systems, digital repositories and recommendation. Notable references: 24Symbols, BBVA and Biblioteca Nacional.

Contents

What is Apache Solr?

Main components

Installing Apache Solr: requirements

Installing Apache Solr: deployment

Configuring Apache Solr: files

Configuring Apache Solr: solr.xml

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="pdg" instanceDir="/etc/solr/cores/pdg" />
    <core name="geoeuskadi" instanceDir="/etc/solr/geoeuskadi" />
    <core name="24symbols" instanceDir="/etc/solr/24symbols" />
    <core name="cmt" instanceDir="/etc/cores/cmt" />
    <core name="disofic" instanceDir="/etc/solr/disofic" />
    <core name="cdl" instanceDir="/etc/solr/cdl" />
    <core name="24sac" instanceDir="/etc/solr/24symbolsac" />
  </cores>
</solr>
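As a minimal sketch (assuming Python 3.9+ and a Solr instance at the hypothetical base URL `http://localhost:8983/solr`), the cores declared in `solr.xml` can be listed with the standard library, and the `adminPath` attribute gives the CoreAdmin endpoint. `action=STATUS` is a CoreAdmin action; the helper names are illustrative only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Abbreviated copy of the solr.xml above (two of its cores).
SOLR_XML = """<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="pdg" instanceDir="/etc/solr/cores/pdg" />
    <core name="24symbols" instanceDir="/etc/solr/24symbols" />
  </cores>
</solr>"""


def list_cores(solr_xml: str) -> dict[str, str]:
    """Map each declared core name to its instanceDir."""
    root = ET.fromstring(solr_xml)
    return {core.get("name"): core.get("instanceDir") for core in root.iter("core")}


def core_status_url(solr_xml: str, core: str,
                    base: str = "http://localhost:8983/solr") -> str:
    """Build a CoreAdmin STATUS request from the adminPath declared in <cores>.

    The default base URL is a hypothetical placeholder.
    """
    admin_path = ET.fromstring(solr_xml).find("cores").get("adminPath")
    return f"{base}{admin_path}?" + urlencode({"action": "STATUS", "core": core})


print(list_cores(SOLR_XML))
print(core_status_url(SOLR_XML, "pdg"))
```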

Configuring Apache Solr: schema.xml

<schema name="opensearch" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <!-- Needed by the "fecha*" dynamic field below -->
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="docName" type="text" stored="true" indexed="true"/>
    <dynamicField name="fecha*" type="date" indexed="true" stored="true"/>
    <field name="buscable" type="text" stored="false" indexed="true" multiValued="true" positionIncrementGap="50"/>
  </fields>
  <copyField source="docName" dest="buscable"/>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>buscable</defaultSearchField>
  <solrQueryParser defaultOperator="AND"/>
</schema>
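To get a rough idea of which terms the index-time analyzer of the `text` type produces, the chain can be approximated in Python. This is only an approximation: a regular expression stands in for `HTMLStripStandardTokenizerFactory`, Unicode NFKD decomposition stands in for `ISOLatin1AccentFilterFactory`, and the stopword set is a hypothetical excerpt of `stopwords.txt`. The filter order matches the schema: lowercase, then accent folding, then stopword removal.

```python
import re
import unicodedata


def fold_accents(token: str) -> str:
    """Approximate ISOLatin1AccentFilterFactory: drop combining marks after NFKD."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze_index(text: str, stopwords: set[str]) -> list[str]:
    """Approximate the index analyzer of the "text" fieldType (not Solr's exact output)."""
    text = re.sub(r"<[^>]+>", " ", text)          # crude HTML stripping
    tokens = re.findall(r"\w+", text)             # simplified tokenizer
    tokens = [t.lower() for t in tokens]          # LowerCaseFilterFactory
    tokens = [fold_accents(t) for t in tokens]    # ISOLatin1AccentFilterFactory
    stop = {w.lower() for w in stopwords}         # ignoreCase="true"
    return [t for t in tokens if t not in stop]   # StopFilterFactory


# Hypothetical excerpt of stopwords.txt
print(analyze_index("<p>Cádiz en la Guerra de la Independencia</p>", {"en", "la", "de"}))
```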

Configuring Apache Solr: schema.xml

Configuring Apache Solr: solrconfig.xml

Index properties:

<dataDir>/var/solr/data</dataDir>
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <maxFieldLength>10000</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>
<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</mainIndex>

Configuring Apache Solr: solrconfig.xml

Query configuration:

<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <cache name="myUserCache" class="solr.LRUCache" size="4096" initialSize="1024" autowarmCount="1024" regenerator="org.mycompany.mypackage.MyRegenerator"/>
  <queryResultWindowSize>20</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
    </arr>
  </listener>
  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>
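The `size` attribute of `filterCache`, `queryResultCache` and `documentCache` bounds how many entries each cache holds; once it is full, the least recently used entry is evicted. The toy class below models only that eviction policy plus a hit ratio. The class name is hypothetical, and it does not reproduce the internals of `solr.LRUCache` or `solr.FastLRUCache`, `initialSize`, or autowarming.

```python
from collections import OrderedDict
from collections.abc import Hashable
from typing import Any, Optional


class ToyLRUCache:
    """Hypothetical toy model of a size-bounded LRU cache (eviction policy only)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._entries: OrderedDict = OrderedDict()
        self.lookups = 0
        self.hits = 0

    def get(self, key: Hashable) -> Optional[Any]:
        self.lookups += 1
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def hit_ratio(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


# Keys stand for filter queries (fq); values stand for the matching document sets.
cache = ToyLRUCache(size=2)
cache.put('locale:"es_ES"', {1, 2, 3})
cache.put('locale:"en_US"', {4})
cache.get('locale:"es_ES"')        # hit: es_ES becomes most recently used
cache.put('locale:"gl_ES"', {5})   # over capacity: en_US is evicted
print(cache.get('locale:"en_US"'), cache.hit_ratio())
```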

Configuring Apache Solr: solrconfig.xml

RequestHandler configuration:

<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
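The `spellcheck` component proposes dictionary terms that are close to a misspelled input word. As a simplified sketch, assuming plain Levenshtein edit distance and a hypothetical in-memory word list in place of the dictionary Solr builds under `spellcheckIndexDir`, candidate ranking could look like this; `count` plays the role of `spellcheck.count`.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word: str, dictionary: list[str], count: int = 1) -> list[str]:
    """Return up to `count` dictionary terms closest to `word` (cf. spellcheck.count)."""
    ranked = sorted((levenshtein(word, term), term) for term in dictionary if term != word)
    return [term for _, term in ranked[:count]]


# Hypothetical dictionary; Solr would build it from the configured field instead.
terms = ["hipoteca", "basura", "factura", "biblioteca"]
print([suggest(w, terms) for w in "ipoteca vasura".split()])
```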

Modifying the index: indexing

Modifying the index: indexing

XML indexing format:

<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="identificador">http://url.cualquiera.com/webclient/</field>
    <field name="pid">1865905</field>
    <field name="thumbnail">http://url.cualquiera.com/webclient/Delivery</field>
    <field name="autor">Castro, Adolfo de 1823-1898</field>
    <field name="autor_facet">Castro, Adolfo de</field>
    <field name="autor_abreviada">Castro, Adolfo de-1823-1898-</field>
    <field name="autor_completa">Castro, Adolfo de-1823-1898-#</field>
    <field name="titulo">Cádiz en la Guerra de la Independencia : cuadro histórico</field>
    <field name="editor">Cádiz Revista Médica</field>
    <field name="materia">Cádiz Historia S.XIX</field>
    …
  </doc>
  ...
</add>
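A minimal sketch for generating this `<add>` format from Python dictionaries with the standard library, so that characters such as `&` or `<` in field values are escaped correctly. List values become repeated `<field>` elements, which suits `multiValued` fields. The `post_update` helper and any update URL passed to it are assumptions: it simply POSTs the XML to a core's `/update` handler, and the new documents typically become searchable only after a commit.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen


def build_add_xml(docs: list[dict[str, object]]) -> str:
    """Serialize documents into Solr's <add><doc><field name=...> update format."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list becomes repeated <field> elements (for multiValued fields).
            for v in value if isinstance(value, list) else [value]:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def post_update(xml_body: str, update_url: str) -> int:
    """POST an update message to a hypothetical URL such as http://host:8983/solr/core/update."""
    req = Request(update_url, data=xml_body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.status


print(build_add_xml([{"pid": 1865905,
                      "autor_facet": "Castro, Adolfo de",
                      "materia": ["Cádiz", "Historia S.XIX"]}]))
```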

Modifying the index: deletion

By IDs:

<?xml version="1.0" encoding="UTF-8"?>
<delete>
  <id>5c9e2d7e-7114-31cf-8582-56f11cefbcce</id>
  <id>d0b58756-8a8b-3de2-8327-55023cbdf16f</id>
  <id>4d9cb11a-2e2a-34e5-821d-5c7f77a41f8f</id>
  <id>fe722eac-d4cb-30f3-b2b5-b9e72630f523</id>
  ...
</delete>

By query:

<?xml version="1.0" encoding="UTF-8"?>
<delete>
  <query>category:"Facturas"</query>
</delete>
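Both deletion messages can be generated the same way. This sketch (hypothetical helper names) mirrors the by-IDs and by-query forms; like `<add>` messages, they are sent to the `/update` handler, and the deletions are reflected in searches after a commit.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids: list[str]) -> str:
    """<delete> message listing the unique keys to remove."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query: str) -> str:
    """<delete> message removing every document that matches a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_ids(["5c9e2d7e-7114-31cf-8582-56f11cefbcce"]))
print(delete_by_query('category:"Facturas"'))
```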

Modifying the index: operations

Modifying the index: hands-on

Querying the index: basics

Querying the index: basics

curl "http://host:8983/opensearch/select/?q=Pilar%20rojo&fl=title,contenttype&start=0&rows=10&wt=xml"

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">293</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="q">Pilar rojo</str>
    </lst>
  </lst>
  <result name="response" numFound="2894" start="0">
    <doc>...</doc>
    <doc>...</doc>
    <doc>...</doc>
    <doc>
      <str name="contenttype">text/html; charset=utf-8</str>
      <str name="title">Pilar Rojo participa nos actos do Xacobeo 2010</str>
    </doc>
    <doc>...</doc>
    <doc>...</doc>
  </result>
</response>
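The same request can be built, and its XML response read, from Python. In this sketch `urlencode` encodes the space as `+` rather than `%20`, which Solr decodes the same way; the base URL is the placeholder host from the `curl` example, the helper names are hypothetical, and multi-valued fields (returned as `<arr>`) are mapped to lists.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def select_url(base: str, q: str, fields: list[str], start: int = 0, rows: int = 10) -> str:
    """Build a /select request like the curl example (base URL is a placeholder)."""
    params = {"q": q, "fl": ",".join(fields), "start": start, "rows": rows, "wt": "xml"}
    return base.rstrip("/") + "/select/?" + urlencode(params)


def parse_select(xml_text: str) -> tuple[int, list[dict[str, object]]]:
    """Return (numFound, docs) from an XML /select response."""
    result = ET.fromstring(xml_text).find("result")
    docs = []
    for doc in result.findall("doc"):
        fields: dict[str, object] = {}
        for child in doc:
            # Multi-valued fields arrive as <arr name="..."> with one child per value.
            fields[child.get("name")] = [v.text for v in child] if child.tag == "arr" else child.text
        docs.append(fields)
    return int(result.get("numFound")), docs


print(select_url("http://host:8983/opensearch", "Pilar rojo", ["title", "contenttype"]))
```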

Querying the index: advanced

Querying the index: faceting

Querying the index: faceting

<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="locale:&quot;es_ES&quot;">16684</int>
    <int name="locale_americano">844</int>
  </lst>
  <lst name="facet_fields">
    <lst name="locale">
      <int name="es_ES">16684</int>
      <int name="en_US">844</int>
      <int name="ca_ES">110</int>
      <int name="eu_ES">110</int>
      <int name="gl_ES">110</int>
    </lst>
  </lst>
  <lst name="facet_dates">
    <lst name="fecha_resolucion">
      <int name="2007-08-31T16:17:37.304Z">4</int>
      <int name="2008-08-31T16:17:37.304Z">41</int>
      <int name="2009-08-31T16:17:37.304Z">49</int>
      <str name="gap">+1YEARS</str>
      <date name="end">2010-08-31T16:17:37.304Z</date>
    </lst>
  </lst>
</lst>
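A sketch for requesting facets and reading the `facet_fields` section of a response like the one above. `facet`, `facet.field` and `facet.query` are standard faceting parameters, and the last two may be repeated; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def facet_params(q: str, fields: list[str], queries: list[str]) -> str:
    """Query string enabling faceting; facet.field and facet.query may repeat."""
    params = [("q", q), ("facet", "true")]
    params += [("facet.field", f) for f in fields]
    params += [("facet.query", fq) for fq in queries]
    return urlencode(params)


def parse_facet_fields(xml_text: str) -> dict[str, dict[str, int]]:
    """Return {field: {value: count}} from the facet_fields section."""
    counts: dict[str, dict[str, int]] = {}
    for lst in ET.fromstring(xml_text).iter("lst"):
        if lst.get("name") == "facet_fields":
            for field in lst.findall("lst"):
                counts[field.get("name")] = {i.get("name"): int(i.text) for i in field.findall("int")}
    return counts


print(facet_params("*:*", ["locale"], ['locale:"es_ES"']))
```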

Querying the index: faceting

Querying the index: highlighting

Querying the index: highlighting

<lst name="highlighting">
  <lst name="lp_es_ES2567024">
    <arr name="titulo">
      <str>La CMT abre expediente sancionador a Proyecto Atarfe</str>
    </arr>
  </lst>
  <lst name="lp_es_ES2574813">
    <arr name="asunto">
      <str>ACUERDO PARA LA INSCRIPCIÓN DE AUTORIZACIÓN GENERAL EN EL EXPEDIENTE AUT-001/98 DE</str>
    </arr>
    <arr name="expediente">
      <str>Expediente AUT</str>
    </arr>
  </lst>
  <lst name="lp_es_ES2627601">
    <arr name="asunto">
      <str>RESOLUCION DEL ARCHIVO DEL EXPEDIENTE DE LA ENTIDAD GOYA SERVICIOS</str>
    </arr>
  </lst>
  ...
</lst>
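Highlighting is requested with parameters such as `hl=true` and `hl.fl` (the fields to highlight). The response is keyed first by each document's unique key and then by field, which is what this sketch assumes when flattening it into nested dictionaries; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_highlighting(xml_text: str) -> dict[str, dict[str, list[str]]]:
    """Flatten <lst name="highlighting"> into {doc key: {field: [snippets]}}."""
    root = ET.fromstring(xml_text)
    hl = root if root.get("name") == "highlighting" else root.find(".//lst[@name='highlighting']")
    if hl is None:  # no highlighting section in this response
        return {}
    return {
        doc.get("name"): {arr.get("name"): [s.text for s in arr.findall("str")]
                          for arr in doc.findall("arr")}
        for doc in hl.findall("lst")
    }
```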

Querying the index: spellchecking

Querying the index: spellchecking

/select?q=ipoteca%20vasura&spellcheck=true&spellcheck.collate=true&spellcheck.count=1

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="ipoteca">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">7</int>
      <arr name="suggestion">
        <str>hipoteca</str>
      </arr>
    </lst>
    <lst name="vasura">
      <int name="numFound">1</int>
      <int name="startOffset">8</int>
      <int name="endOffset">14</int>
      <arr name="suggestion">
        <str>basura</str>
      </arr>
    </lst>
    <str name="collation">hipoteca basura</str>
  </lst>
</lst>
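A sketch for extracting the per-word suggestions and the collation from a response like the one above. It assumes the layout shown: a `suggestions` list whose entries hold a `suggestion` array, plus an optional `collation` string. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_spellcheck(xml_text: str) -> tuple[dict[str, list[str]], Optional[str]]:
    """Return ({misspelled word: [suggestions]}, collation or None)."""
    root = ET.fromstring(xml_text)
    sc = root if root.get("name") == "spellcheck" else root.find(".//lst[@name='spellcheck']")
    suggestions: dict[str, list[str]] = {}
    collation = None
    if sc is None:
        return suggestions, collation
    block = sc.find("lst[@name='suggestions']")
    for child in (block if block is not None else []):
        if child.tag == "lst":
            arr = child.find("arr[@name='suggestion']")
            suggestions[child.get("name")] = [s.text for s in arr.findall("str")] if arr is not None else []
        elif child.tag == "str" and child.get("name") == "collation":
            collation = child.text
    return suggestions, collation
```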

Custom analysis: CharFilterFactories

Custom analysis: tokenizers

Custom analysis: analyzer

Custom analysis: analyzer

<fieldtype name="teststop" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldtype>
<fieldtype name="testedgengrams" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldtype>
<fieldtype name="testkeep" class="solr.TextField">
  <analyzer>
    <!-- An analyzer chain needs a tokenizer before its filters -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldtype>
<fieldtype name="syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldtype>
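What the `testedgengrams` and `syn` types do to a token stream can be approximated as follows. Assumptions: a letters-only regular expression stands in for `LowerCaseTokenizerFactory`, and the synonym group is a hypothetical stand-in for a line of `syn.txt`. With `expand="false"`, every word in an equivalence group is reduced to the group's first word.

```python
import re


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Front edge n-grams of one token (EdgeNGramFilterFactory, side="front")."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def lowercase_tokenize(text: str) -> list[str]:
    """Approximate LowerCaseTokenizerFactory: split on non-letters, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def reduce_synonyms(tokens: list[str], groups: list[list[str]]) -> list[str]:
    """expand="false": each word of an equivalence group becomes the group's first word."""
    mapping = {word.lower(): group[0] for group in groups for word in group}  # ignoreCase="true"
    return [mapping.get(t.lower(), t) for t in tokens]


print([g for t in lowercase_tokenize("Apache Solr") for g in edge_ngrams(t)])
# Hypothetical syn.txt line: "television, tv, tele"
print(reduce_synonyms("ver la TV".split(), [["television", "tv", "tele"]]))
```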

Custom analysis: testing

Clustering: replication

High availability: replication

Clustering: replication

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Set the master server URL here -->
    <str name="masterUrl">http://localhost:8080/solr/replication</str>
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>
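Two small helpers, as a sketch: one converts a slave `pollInterval` value (`HH:MM:SS`) to seconds, and the other builds a `ReplicationHandler` command URL. `indexversion`, `details` and `fetchindex` are commands of that handler; the helper names are hypothetical and the URL is the one from the configuration above.

```python
from urllib.parse import urlencode


def poll_interval_seconds(value: str) -> int:
    """Convert a slave pollInterval in HH:MM:SS form (e.g. "00:00:10") to seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def replication_command_url(handler_url: str, command: str) -> str:
    """Build a ReplicationHandler command URL, e.g. command=indexversion or details."""
    return handler_url + "?" + urlencode({"command": command})


print(poll_interval_seconds("00:00:10"))
print(replication_command_url("http://localhost:8080/solr/replication", "details"))
```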

Clustering: sharding
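A sketch of sharding as it is commonly set up in Solr versions of this era, under stated assumptions: the indexing client decides which shard receives each document (here by a stable `zlib.crc32` hash of the unique key, a hypothetical routing choice; Python's built-in `hash()` is randomized per process for strings), and a query sent to any node fans out to every shard listed in the `shards` parameter. The shard addresses and helper names are placeholders.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses (host:port/path, as the shards parameter expects).
SHARDS = ["host1:8983/solr", "host2:8983/solr"]


def shard_for(doc_id: str, shards: list[str]) -> str:
    """Pick the shard that should index a document; crc32 is stable across runs."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_select_url(entry: str, q: str, shards: list[str]) -> str:
    """Query one node and ask it to fan the request out to every listed shard."""
    return f"http://{entry}/select?" + urlencode({"q": q, "shards": ",".join(shards)})


print(shard_for("5c9e2d7e-7114-31cf-8582-56f11cefbcce", SHARDS))
print(distributed_select_url(SHARDS[0], "Pilar", SHARDS))
```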

OpenSearch maintenance: operations
