Integrating Apache Solr into TYPO3

Milan, 14-15 March 2014
Advanced search
with Apache Solr
Mauro Lorenzutti
T3Camp Italia
The fourth Italian event dedicated to TYPO3

Mauro Lorenzutti
CTO at Webformat
TYPO3 developer and consultant since 2004
TYPO3 Certified Integrator since 2009
Developer of numerous extensions, including:
• DB Integration (wfqbe)
• Webformat Shop System (extendedshop)
• TYPO3-Alfresco Connector
• TYPO3-Magento Connector
Speaker at various conferences:
T3DD07
T3CON07
T3CON09US
T3CON13DE
MageDay
4 x T3CampItalia ;-)

What we will cover
TYPO3's built-in search
Indexed Search Engine
Introduction to Apache Solr
Integrating Solr into TYPO3

50 slides…
Everybody ready?

Site search
Very often neglected and underestimated
But…
What if I have a database with >100k events?
What if I have a catalog of 20k products organized into
categories, with attributes that users should be able
to filter the products by?
…

TYPO3's standard search

A standard content element

Features of the standard search
Lets the user choose whether to search the page keywords
or the page content
Multiple tables can be configured via TypoScript
Output is configurable via TypoScript
The search relies on the “LIKE %...%” operator
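To make the consequence of “LIKE %...%” concrete, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for the TYPO3 database; the `pages` table, its columns and the sample rows are assumptions made up for illustration, not TYPO3's actual schema or query.

```python
import sqlite3


def like_search(conn, term):
    """Return titles whose title or body contains `term` as a raw substring."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM pages WHERE title LIKE ? OR body LIKE ? ORDER BY uid",
        (pattern, pattern),
    ).fetchall()
    return [title for (title,) in rows]


# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (uid INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages (title, body) VALUES (?, ?)",
    [
        ("Art exhibition", "Opening night in Milan"),
        ("Party planning", "How to start a party"),
        ("Cart checkout", "Paying for products"),
    ],
)

# Searching for "art" also hits "Party" and "Cart": substring matching,
# with no notion of words and no relevance ranking.
art_results = like_search(conn, "art")
milan_results = like_search(conn, "milan")
```

Every row is scanned on each search and the results come back unranked, which is the limitation the following slides address.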

Pros & cons of the standard search
+ Integrated
+ Easy to configure
- Basic functionality only
- Searches only the database
- LIKE %...%

Indexed Search Engine

A system extension

Index

Statistics

Features of Indexed Search Engine
Indexes pages, database records,
documents, images, external URLs, etc.
Builds a keyword index for each piece of
content and runs the search against that index
Provides a relevance percentage for each result
Content is indexed when it is first
displayed
A crawler is available to index the whole site
in bulk
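The idea of a per-content keyword index with a relevance percentage can be sketched as a toy inverted index. This is only an illustration in Python: the tokenizer, the scoring (occurrences relative to the best hit) and the sample documents are assumptions, and Indexed Search Engine's real weighting is more elaborate.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, word):
    """Return (doc_id, percent) pairs, best first.

    The percentage is relative to the top hit; this scoring is a
    simplification chosen for the sketch.
    """
    hits = index.get(word.lower(), {})
    if not hits:
        return []
    best = max(hits.values())
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(doc_id, round(100 * count / best)) for doc_id, count in ranked]


# Hypothetical documents, only for illustration.
documents = {
    1: "Solr search and more search",
    2: "TYPO3 search",
    3: "No match here",
}
index = build_index(documents)
results = search(index, "search")
```

Unlike the LIKE approach, the lookup touches only the index entry for the searched word, and whole words rather than substrings are matched.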

Pros & cons of Indexed Search Engine
+ Integrated
+ Indexes pages, records, files, URLs, …
+ Ordering by relevance
+ Search statistics
+ Many extensions available
- Configuration is not straightforward
- Can slow down browsing
- Only cached pages are
indexed
- Index stored in the database (performance
issues)

Apache Solr?
• Solr™ is the popular, blazing fast open source enterprise
search platform from the Apache Lucene™ project. Its major
features include powerful full-text search, hit highlighting,
faceted search, near real-time indexing, dynamic clustering,
database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly reliable,
scalable and fault tolerant, providing distributed indexing,
replication and load-balanced querying, automated failover
and recovery, centralized configuration and more. Solr powers
the search and navigation features of many of the world's
largest internet sites.
https://lucene.apache.org/solr/

Download
Download:
http://www.apache.org/dyn/closer.cgi/lucene/solr/4.7.0
Apache License, version 2.0
http://www.apache.org/licenses/LICENSE-2.0
Requirements:
Java 1.6
An application server (Tomcat, Jetty, JBoss, etc.)

But what is Apache Solr?
Solr is an “enterprise search server”
It exposes REST APIs for interaction
Documents (that is, any kind of content) can be
loaded as XML, JSON or CSV via HTTP calls
Documents can be searched and retrieved as XML,
JSON or CSV via HTTP calls
A NoSQL database?
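As an illustration of loading documents over HTTP, here is a minimal sketch in Python that prepares (but does not send) a JSON update request. The base URL, the core name `news` and the document fields are assumptions; the `/update` endpoint with `commit=true` and a JSON array of documents is the standard Solr update call.

```python
import json
import urllib.request

# Assumption: a local Solr instance on the default example port.
SOLR_BASE = "http://localhost:8983/solr"


def build_update_request(core, documents):
    """Prepare a POST that sends `documents` (a list of dicts) as JSON."""
    url = f"{SOLR_BASE}/{core}/update?commit=true"
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document, only for illustration.
request = build_update_request("news", [{"uid": 1, "title": "Hello Solr"}])

# To actually send it against a running Solr:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable without a Solr server.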

Main features
Full-text search
Faceted navigation
Spellchecking: “Did you mean…”
Recommendations: “More like this”
Document indexing (PDF, DOC, etc.)
Synonym and stopword handling
Geospatial search
Optimized for high traffic and large data volumes
Extensible via plugins

Schemaless
Lets you define dynamically the fields that describe
the document

Administration interface

How do we integrate it into TYPO3?
Two options:
Apache Solr for TYPO3 (ext: solr)
DIY (ext: arrangiati, roughly “sort it out yourself”)

Apache Solr for TYPO3

DIY
Several PHP clients are available, e.g.:
http://www.solarium-project.org/
https://code.google.com/p/solr-php-client/
Two “quick and dirty” solutions:
cURL
file_get_contents()
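The talk's own code is PHP. Purely to illustrate how little the “quick and dirty” approach needs, here is the same idea sketched in Python with the standard library: one HTTP GET and one JSON decode. The function name `solr_get` and the injectable `opener` parameter are assumptions added so the sketch can be exercised without a running Solr.

```python
import json
import urllib.request


def solr_get(url, opener=urllib.request.urlopen):
    """Fetch a Solr URL that returns JSON (wt=json) and decode the body.

    `opener` defaults to urllib's urlopen; passing another callable lets
    the function be tried without a server.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Against a running Solr (not executed here):
#   data = solr_get("http://localhost:8983/solr/news/select?q=*:*&wt=json")
#   print(data["response"]["numFound"])
```

A dedicated client library adds query building, error handling and result objects on top of this; the raw call is enough for a first experiment.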

DIY
Let's build a very simple
example
Let's index the news on
our site
Let's implement a
search feature
Let's build faceted
filters

Let's configure the core
core.properties
name=news
config=solrconfig.xml
schema=schema.xml
loadOnStartup=true

solrconfig.xml
[…]
<requestHandler name="/newsImporter"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">data-extraction</str>
    <str name="config">newsImporter.xml</str>
  </lst>
</requestHandler>
[…]

newsImporter.xml
<dataConfig>
  <!-- The "&" in the JDBC URL must be written as &amp; inside an XML attribute -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/typo3_61?user=typo3_61&amp;password=typo3_61"
      batchSize="1" />
  <document name="news_document">
    <!-- script:GenerateId refers to a function in a <script> block, not shown on this slide -->
    <entity name="news" pk="uid" transformer="script:GenerateId"
        query="SELECT uid, pid, title, teaser, bodytext, keywords, author
               FROM tx_news_domain_model_news
               WHERE deleted=0 AND hidden=0">
    </entity>
  </document>
</dataConfig>

schema.xml 1/3
<schema name="typo3_news" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" />
    <fieldType name="long" class="solr.TrieLongField" />
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mappa-accenti.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>

schema.xml 2/3
  <fields>
    <field name="uid" type="long" indexed="true" stored="true" multiValued="false"
        required="true" />
    <field name="title" type="text_general" indexed="true" stored="true"
        multiValued="false" />
    <field name="teaser" type="text_general" indexed="true" stored="true" />
    <field name="bodytext" type="text_general" indexed="true" stored="true" />
    <field name="author" type="string" indexed="true" stored="true" multiValued="false"
        />
    <field name="keywords" type="string" indexed="true" stored="true" />

schema.xml 3/3
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />
    <field name="fullText" type="text_general" indexed="true" stored="true"
        multiValued="true" />
    <copyField source="title" dest="fullText" />
    <copyField source="teaser" dest="fullText" />
    <copyField source="bodytext" dest="fullText" />
    <copyField source="author" dest="fullText" />
    <copyField source="keywords" dest="fullText" />
  </fields>
  <defaultSearchField>fullText</defaultSearchField>
  <solrQueryParser defaultOperator="OR" />
  <uniqueKey>uid</uniqueKey>
</schema>
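To see what this schema does to a news record, the following Python sketch imitates the copyField rules and the text_general analysis chain. It is an approximation under stated assumptions: the contents of mappa-accenti.txt are not shown in the slides, so the sketch assumes it maps accented letters to their plain equivalents; a regular expression stands in for StandardTokenizerFactory; the sample record is made up.

```python
import re
import unicodedata

# Source fields copied into fullText by the <copyField> rules in schema.xml.
COPY_SOURCES = ["title", "teaser", "bodytext", "author", "keywords"]


def strip_accents(text):
    """Assumed effect of mappa-accenti.txt: accented letters become plain ones."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Rough stand-in for text_general: char mapping, tokenizing, lowercasing."""
    mapped = strip_accents(text)
    tokens = re.findall(r"\w+", mapped)
    return [token.lower() for token in tokens]


def full_text_values(record):
    """copyField copies the raw source values; fullText is multiValued."""
    return [record[name] for name in COPY_SOURCES if record.get(name)]


def full_text_terms(record):
    """Terms that would end up indexed in fullText, in this approximation."""
    terms = []
    for value in full_text_values(record):
        terms.extend(analyze(value))
    return terms


def matches_prefix(record, prefix):
    """Imitate q=fullText:<prefix>* for an already lowercase prefix."""
    return any(term.startswith(prefix) for term in full_text_terms(record))


# Hypothetical news record, only for illustration.
record = {
    "uid": 1,
    "title": "Celebrazioni in città",
    "teaser": "",
    "bodytext": "Una festa a Milano",
    "author": "Mario Rossi",
    "keywords": "festa",
}
terms = full_text_terms(record)
```

Note that `author` and `keywords` are plain strings in their own fields (useful for facets), while their copies in `fullText` go through the text analysis.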

DataImport
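Besides the admin interface, the import handler registered in solrconfig.xml can be driven over HTTP. The sketch below, in Python, only builds the URLs; the host and port are assumed to be the local defaults, while the core name `news` and the handler path `/newsImporter` come from the configuration shown earlier. `full-import` and `status` are standard DataImportHandler commands, and `clean` and `commit` are standard options of `full-import`.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance on the default example port.
SOLR_BASE = "http://localhost:8983/solr"


def dataimport_url(core, handler, command, **options):
    """Build a DataImportHandler URL for the given command and options."""
    params = {"command": command, "wt": "json"}
    params.update(options)
    return f"{SOLR_BASE}/{core}/{handler}?{urlencode(params)}"


# Re-index everything, clearing the index first and committing at the end.
full_import = dataimport_url(
    "news", "newsImporter", "full-import", clean="true", commit="true"
)

# Poll this URL to see whether the import is still running.
status = dataimport_url("news", "newsImporter", "status")
```

Calling the first URL on a schedule (for example from cron) is one simple way to keep the index in step with the database.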

Let's query the index
http://localhost:8983/solr/news/select?
q=fullText:celebr*
&wt=json
&indent=true
&facet=true
&facet.field=author
&facet.sort=index
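The same query can be assembled and its facet counts read programmatically. This Python sketch builds the URL with proper encoding and parses the `facet_fields` section of a JSON response, which by default is a flat list alternating value and count. The function names and the sample response (author names and counts) are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base, core, term, facet_field):
    """Build a prefix query on fullText with one facet field."""
    params = [
        ("q", f"fullText:{term}*"),
        ("wt", "json"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.sort", "index"),
    ]
    return f"{base}/{core}/select?{urlencode(params)}"


def parse_facets(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


url = build_select_url("http://localhost:8983/solr", "news", "celebr", "author")

# Hypothetical, heavily trimmed response, only for illustration.
sample_response = {
    "response": {"numFound": 7, "docs": []},
    "facet_counts": {
        "facet_fields": {"author": ["Anna Bianchi", 2, "Mario Rossi", 5]}
    },
}
facets = parse_facets(sample_response, "author")
```

The resulting dictionary maps each author to the number of matching news items, which is what a faceted filter needs to render.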

Let's do it from TYPO3

Setup.txt

NewsSolrController.php

SolrService.php 1/2

SolrService.php 2/2

Search.html

One detail:

Pros & cons of Apache Solr
+ Lots and lots and lots of features!
+ Suited to handling large volumes of data
+ Very fast
- Not integrated in TYPO3
- Demanding to configure
- Requires Java
- There is a lot to study…
- Can be addictive :-)

Some useful books

Q & A
mauro.lorenzutti@webformat.com
http://it.linkedin.com/in/maurolorenzutti
https://twitter.com/MauroLorenzutti
http://www.slideshare.net/mauro.lorenzutti
