Infinispan, Lucene, Hibernate OGM

Padova, InfoCamere
JBoss User Group
12 April 2012

Who am I?
Sanne Grinovero
Italian, Dutch, Newcastle
Red Hat: JBoss, Engineering

• Hibernate team
– Hibernate Search
– Hibernate OGM

• Infinispan
– Infinispan Core
– Infinispan Query
– JGroups

• Apache Lucene

Infinispan
• Distributed cache
• Scalable, transactional data grid:
extreme performance and cloud
• NoSQL “database”: key-value store
– How do you query a data grid?

SELECT * FROM GRID

Querying a “grid”

Object v = cache.get("c7");

Without the key, you cannot
get the value.

Is key-only access practical?

A test on my bookshelf
• Where is Hibernate Search in Action?
• Can you pass me ISBN 978-1-933988-17-7?
• Can you fetch the books about Gaudí?

How do we implement
these functions on a
key/value store?
• Where is Hibernate Search in Action?
• Can you pass me ISBN 978-1-933988-17-7?
• Can you find the books about Gaudí?
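
One naive answer can be sketched with plain JDK collections instead of the Infinispan API: a lookup by key is a single get(), while a lookup by title has to visit every entry. The class BookScan, its nested Book type, the keys and the second title are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a key/value store: a plain JDK map, not the Infinispan API.
class BookScan {

    static final class Book {
        final String title;
        Book(String title) {
            this.title = title;
        }
    }

    // Without an index, finding a book by title means scanning every entry.
    static List<String> findKeysByTitle(Map<String, Book> store, String title) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Book> entry : store.entrySet()) {
            if (title.equals(entry.getValue().title)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    // Two sample entries under made-up keys.
    static Map<String, Book> sampleStore() {
        Map<String, Book> store = new ConcurrentHashMap<>();
        store.put("book-1", new Book("Hibernate Search in Action"));
        store.put("book-2", new Book("A Book About Gaudi"));
        return store;
    }
}
```

On a distributed grid the same scan would have to run on every node, which is the work a Map/Reduce task spreads across the cluster.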

Document-based NoSQL:
Map/Reduce
Infinispan is not strictly document based, but it
offers Map/Reduce.
Even so, nothing rules out using JSON, XML, YAML or Java:
import java.io.Serializable;

public class Book implements Serializable {

   final String title;
   final String author;
   final String editor;

   public Book(String title, String author, String editor) {
      this.title = title;
      this.author = author;
      this.editor = editor;
   }

}

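Iterate & collect
import java.util.Iterator;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

class TitleBookSearcher implements
      Mapper<String, Book, String, Book> {
   final String title;
   public TitleBookSearcher(String t) { title = t; }
   // Emit every book whose title matches exactly
   public void map(String key, Book value, Collector<String, Book> collector) {
      if ( title.equals( value.title ) )
         collector.emit( key, value );
   }
}

class BookReducer implements
      Reducer<String, Book> {
   // Each key maps to a single book: return the first value
   public Book reduce(String reducedKey, Iterator<Book> iter) {
      return iter.next();
   }
}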

Implementing these
simple functions:
✔ Find “Hibernate Search in Action”?
✔ Find by code “ISBN 978-1-933988-17-7”?
✗ How many books about
“Shakespeare”?
• A correct score in full-text searches
requires the frequencies of text fragments
relative to the corpus.
• Pre-tagging is impractical and limiting
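
Why corpus-relative frequencies matter can be shown with a textbook tf-idf score. This is a minimal sketch under assumptions: the class TfIdfSketch, its tokenizer and the formula idf = ln(N / df) are illustrative, and Lucene's own Similarity is more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative scoring sketch; not Lucene's actual Similarity implementation.
class TfIdfSketch {

    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Term frequency: occurrences of the term in one document.
    static int termFrequency(String term, String document) {
        int count = 0;
        for (String t : tokenize(document)) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Document frequency: number of documents in the corpus that contain the term.
    static int documentFrequency(String term, List<String> corpus) {
        int count = 0;
        for (String doc : corpus) {
            if (tokenize(doc).contains(term)) {
                count++;
            }
        }
        return count;
    }

    // tf * idf with idf = ln(N / df); 0 when the term never appears in the corpus.
    static double score(String term, String document, List<String> corpus) {
        int df = documentFrequency(term, corpus);
        if (df == 0) {
            return 0.0;
        }
        double idf = Math.log((double) corpus.size() / df);
        return termFrequency(term, document) * idf;
    }
}
```

A term that occurs in every document scores zero, and a rare term scores high. Neither can be known by looking at one value at a time, which is why a per-entry scan cannot rank results.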

Apache Lucene
• Apache™ open source project

• Integrated into countless projects

• ... including Hibernate, via Hibernate Search

• Can be clustered via Infinispan
– Performance
– Real time
– High availability

What does Lucene offer?
• Searches ranked by similarity score

• Text analysis
– Synonyms, Stopwords, Stemming, ...

• Reusable declarative Filters

• TermVectors

• MoreLikeThis

• Faceted Search

• Fast!

Lucene: Stopwords
a, able, about, across, after, all, almost, also, am,
among, an, and, any, are, as, at, be, because, been,
but, by, can, cannot, could, dear, did, do, does,
either, else, ever, every, for, from, get, got, had,
has, have, he, her, hers, him, his, how, however, i, if,
in, into, is, it, its, just, least, let, like, likely,
may, me, might, most, must, my, neither, no, nor, not,
of, off, often, on, only, or, other, our, own, rather,
said, say, says, she, should, since, so, some, than,
that, the, their, them, then, there, these, they, this,
tis, to, too, twas, us, wants, was, we, were, what,
when, where, which, while, who, whom, why, will, with,
would, yet, you, your
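
A minimal sketch of what stopword removal does to a piece of text, using only the JDK. The class StopwordSketch and its tokenizer are assumptions for illustration, and the set holds only a handful of the words listed above. In Lucene this step happens inside the Analyzer chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stopword filter; not Lucene's Analyzer API.
class StopwordSketch {

    // A small subset of the stopword list, for illustration only.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "about", "in", "of", "on", "the", "to"));

    // Lower-case, split on non-alphanumeric characters, drop the stopwords.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With this filter, "Hibernate Search in Action" is indexed as the three terms hibernate, search and action.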

Shall we build a nice
search engine that returns
its results in alphabetical
order?

Where's the catch?
• It needs an index: physical resources and
administration.
– in memory
– on filesystem
– in Infinispan

• Essentially immutable segments
– Optimized for data mining / queries, not for
updates.

• A world of strings and frequency vectors

Infinispan Query quickstart
• Enable indexing=true in the
configuration
• Add the infinispan-query.jar module
to the classpath
• Annotate the POJOs stored in the cache
to specify how they are indexed
<dependency>
   <groupId>org.infinispan</groupId>
   <artifactId>infinispan-query</artifactId>
   <version>5.1.3.FINAL</version>
</dependency>

Programmatic
configuration
Configuration c = new Configuration()
   .fluent()
      .indexing()
      .addProperty(
         "hibernate.search.default.directory_provider",
         "ram")
   .build();

CacheManager manager = new DefaultCacheManager(c);

Configuration / XML
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="urn:infinispan:config:5.0
   http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
   xmlns="urn:infinispan:config:5.0">
   <default>
      <indexing enabled="true" indexLocalOnly="true">
         <properties>
            <property name="hibernate.search.option1" value="..." />
            <property name="hibernate.search.option2" value="..." />
         </properties>
      </indexing>
   </default>
</infinispan>

Annotations on the model
@ProvidedId @Indexed
public class Book implements Serializable {

   @Field String title;
   @Field String author;
   @Field String editor;

   public Book(String title, String author, String editor) {
      this.title = title;
      this.author = author;
      this.editor = editor;
   }

}

Running queries
SearchManager sm = Search.getSearchManager(cache);

Query query = sm.buildQueryBuilderForClass(Book.class)
.get()
.phrase()
.onField("title")
.sentence("in action")
.createQuery();

List<Object> list = sm.getQuery(query).list();

Architecture
• Integrates Hibernate Search (engine)
– Listens to Hibernate events &
transactions
• Infinispan events & transactions
– Maps Java types and model graphs to
Lucene Documents
– Thin-layer design

Tests for
Infinispan Query
https://github.com/infinispan/infinispan

org.apache.lucene.search.Query luceneQuery =
   queryBuilder.phrase()
      .onField( "description" )
      .andField( "title" )
      .sentence( "a book on highly scalable query engines" )
      .enableFullTextFilter( "ready-for-shipping" )
      .createQuery();

CacheQuery cacheQuery =
   searchManager.getQuery( luceneQuery, Book.class);
List<Book> objectList = cacheQuery.list();

Architecture: Infinispan
Query

Scalability problems
• Global writer locks
• Sharing over NFS is
very problematic

Queue-based clustering
(filesystem)

Quickstart Hibernate
Search
• Add the hibernate-search dependency:
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-orm</artifactId>
   <version>4.1.0.Final</version>
</dependency>

Quickstart Hibernate Search
• Everything else is optional:
– How the indexes are managed
– Extension modules, custom Analyzers
– Performance tuning
– Custom type mapping
– Clustering
• JGroups
• Infinispan
• JMS

Quickstart Hibernate Search
@Entity
public class Essay {
   @Id
   public Long getId() { return id; }

   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...

Quickstart Hibernate Search
The annotations are added one step at a time: @Indexed on the class, @Field on
getSummary(), @Field @Boost(0.8f) on getText(), @IndexedEmbedded on getAuthor().
@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   @Lob @Field @Boost(0.8f)
   public String getText() { return text; }
   @ManyToOne @IndexedEmbedded
   public Author getAuthor() { return author; }
...

A second example
@Entity
public class Author {
   @Id @GeneratedValue
   private Integer id;
   private String name;
   @OneToMany
   private Set<Book> books;
}

@Entity
public class Book {
   private Integer id;
   private String title;
}

Index structure
@Entity @Indexed
public class Author {
   @Id @GeneratedValue
   private Integer id;
   @Field(store=Store.YES)
   private String name;
   @OneToMany
   @IndexedEmbedded
   private Set<Book> books;
}

@Entity
public class Book {
   private Integer id;
   @Field(store=Store.YES)
   private String title;
}

Query
String[] productFields = {"summary", "author.name"};

Query luceneQuery = // query builder or any Lucene Query

FullTextEntityManager ftEm =
   Search.getFullTextEntityManager(entityManager);

FullTextQuery query =
   ftEm.createFullTextQuery( luceneQuery, Product.class );

List<Product> items =
   query.setMaxResults(100).getResultList();

int totalNbrOfResults = query.getResultSize();
totalNbrOfResults = 8,320,000
(0.002 seconds)

About the results:
• Managed POJOs: changes to the entities are applied to both
Lucene and the database

• Familiar (standard) JPA pagination:
– .setMaxResults( 20 ).setFirstResult( 100 );

• Type restrictions, polymorphic full-text queries:
– .createQuery( luceneQuery, A.class, B.class, ..);

• Projection

• Result mapping
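
The pagination window selected by setFirstResult and setMaxResults can be pictured on a plain list. This is a JDK-only sketch; the class PageSketch and its page method are illustrative assumptions, not part of JPA or Hibernate Search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-JDK illustration of a firstResult / maxResults window; not the JPA API.
class PageSketch {

    // Returns at most maxResults hits, starting at the zero-based index firstResult.
    static <T> List<T> page(List<T> allHits, int firstResult, int maxResults) {
        if (firstResult >= allHits.size()) {
            return Collections.emptyList();
        }
        int end = Math.min(allHits.size(), firstResult + maxResults);
        return new ArrayList<>(allHits.subList(firstResult, end));
    }
}
```

With 250 hits, a window of firstResult 100 and maxResults 20 covers the hits at indexes 100 to 119, and the last page is simply shorter.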

Filters
FullTextQuery ftQuery = s // s is a FullTextSession
   .createFullTextQuery( query, Product.class )
   .enableFullTextFilter( "filtroMinori" )
   .enableFullTextFilter( "offertaDelGiorno" )
      .setParameter( "day", "20120412" )
   .enableFullTextFilter( "inStockA" )
      .setParameter( "location", "Padova" );

List<Product> results = ftQuery.list();

Using Infinispan to
distribute the indexes

Clustering “direct”
Lucene usage

• Using org.apache.lucene
– Traditionally hard to distribute
across multiple nodes
– On any cloud

Single node
a rough idea of performance
[Chart: Write ops/sec (axis 0 to 400) and Queries/sec (axis 0 to 25000) for RAMDirectory, Infinispan 0, Infinispan D4, FSDirectory and Infinispan Local; bar values not recoverable]

Multiple nodes
a rough idea of performance
[Chart: Write ops/sec (axis 0 to 400) and Queries/sec (axis 0 to 25000) for RAMDirectory, Infinispan 0, FSDirectory and Infinispan Local; bar values not recoverable]

Tips for
optimal performance
• Tune chunk_size to the actual usage
of your index (avoid read locks
by avoiding fragmentation)
• Check the size of network packets:
blob size, JGroups packets,
network interface and hardware.
• Choose and configure a suitable
CacheLoader
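
To see what chunk_size controls, assume that an index file is stored as a sequence of fixed-size chunks, each under its own cache entry. The sketch below is illustrative only: the class ChunkSketch is hypothetical and is not the code of the Infinispan Lucene Directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative chunking of a file's bytes; not the actual Infinispan Lucene Directory code.
class ChunkSketch {

    // Cuts the content into pieces of at most chunkSize bytes, in order.
    static List<byte[]> split(byte[] fileContent, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < fileContent.length; offset += chunkSize) {
            int end = Math.min(fileContent.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(fileContent, offset, end));
        }
        return chunks;
    }

    // Number of entries that reading the whole file has to fetch (ceiling division).
    static int chunkCount(int fileLength, int chunkSize) {
        return (fileLength + chunkSize - 1) / chunkSize;
    }
}
```

A file that fits within a single chunk is never fragmented. A chunk size much smaller than the typical segment multiplies the number of entries each read has to fetch.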

Memory requirements
• RAMDirectory: the whole index (and more) in RAM.

• FSDirectory: a good OS does an excellent
job of caching I/O, often better than
RAMDirectory.

• Infinispan: configurable, up to memory
shared across nodes
– Flexible
– Fast
– Network vs. disk

Modules for scalable
cloud deployments
One Infinispan to rule them all
– Store Lucene indexes
– Hibernate second level cache
– Application managed cache
– Datagrid
– EJB, session replication in AS7
– As a JPA “store” via Hibernate OGM

Ingredients for the cloud
• JGroups DISCOVERY protocol
– MPING
– TCPPING
– JDBC_PING
– S3_PING

• Choose a CacheLoader
– Database based, jclouds,
Cassandra, ...

Near future
• Simplify write scalability
• Auto-tuning of the clustering
parameters: ergonomics!

• Parallel searching: multi-core +
multi-node

• A component of
– http://www.cloudtm.eu

NoSQL:
flexibility has a cost
• Programming model
• one per product :-(
• no schema => app driven schema
• query (Map Reduce, specific DSL, ...)
• data structure transpires
• Transaction
• durability / consistency

Example: Infinispan
• Distributed Key/Value store
– (or Replicated, local only efficient cache,
invalidating cache)
• Each node is equal
– Just start more nodes, or kill some
• No bottlenecks
– by design
• Cloud-network friendly
– JGroups
• And “cloud storage” friendly too!

The ABC of Infinispan

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

It's a ConcurrentMap!
map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

map.putIfAbsent( "user-38", another );
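
Because a Cache is a ConcurrentMap, its atomic operations follow the JDK contract. The sketch below uses a ConcurrentHashMap as a stand-in so that it runs without a cache manager. The class name and the string values are illustrative assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// JDK ConcurrentHashMap used as a stand-in for a cache; the ConcurrentMap contract is the same.
class ConcurrentMapSketch {

    static ConcurrentMap<String, String> demo() {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        map.put("user-34", "first");
        // putIfAbsent leaves an existing mapping untouched and returns the current value.
        map.putIfAbsent("user-34", "ignored");
        // For a missing key it stores the value and returns null.
        map.putIfAbsent("user-38", "another");
        // replace(key, expected, updated) succeeds only if the current value equals expected.
        map.replace("user-34", "first", "second");
        return map;
    }
}
```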

A few more details about
Infinispan
● Support for Transactions (XA)
● CacheLoaders
– Cassandra, JDBC, Amazon S3 (jclouds), ...
● Tree API for JBossCache compatibility
● Lucene integration
– Two-fold
● Some Hibernate integrations
– Second level cache
– Hibernate Search indexing backend

Goals of Hibernate
OGM
• Encourage new data usage patterns
• Familiar environment
• Ease of use
• easy to jump in
• easy to jump out
• Push NoSQL exploration in enterprises
• “PaaS for existing API” initiative

What is it?

• JPA front end to key/value stores
• Object CRUD (incl polymorphism and
associations)
• OO queries (JP-QL)
• Reuses
• Hibernate Core
• Hibernate Search (and Lucene)
• Infinispan
• Is not a silver bullet
• not for all NoSQL use cases

Entities as serialized
blobs?
• Serialize objects into the (key) value
• store the whole graph?

• maintain consistency with duplicated
objects
• guaranteed identity a == b
• concurrency / latency
• structure change and (de)serialization,
class definition changes

OGM’s approach to
schema
• Keep what’s best from relational model
• as much as possible
• tables / columns / pks
• Decorrelate object structure from data
structure
• Data stored as (self-described) tuples
• Core types limited
• portability
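
The tuple idea can be sketched with plain maps. Everything in this block is a hypothetical illustration (the class TupleSketch, the key format and the column names), not Hibernate OGM's internal types. The entity is stored as column/value pairs under a key built from table and primary key, instead of as one serialized object graph.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "entity as tuple"; these names are not Hibernate OGM classes.
class TupleSketch {

    // Key made of table name and primary key, for example "Book#42".
    static String entityKey(String table, Object id) {
        return table + "#" + id;
    }

    // The value maps column names to core-typed values.
    static Map<String, Object> bookTuple(int id, String title, int authorId) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", id);
        tuple.put("title", title);
        tuple.put("author_id", authorId); // the association is stored as a foreign key value
        return tuple;
    }
}
```

Since the tuple carries its own column names, a change to the Java class does not invalidate the stored data the way a changed serialized form would.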

Query

• Hibernate Search indexes entities
• Store Lucene indexes in Infinispan
• JP-QL to Lucene query transformation

• Works for simple queries
• Lucene is not a relational SQL engine

What next?

• MongoDB
• EHCache / Terracotta
• Redis
• Voldemort
• Neo4J
• Dynamo
• ... Git? Spreadsheet? ...CapeDwarf?

Q&A

http://infinispan.org @Infinispan
http://in.relation.to @Hibernate
http://jboss.org @SanneGrinovero
