Infinispan,Lucene,Hibername OGM
Upcoming SlideShare
Loading in...5

Infinispan,Lucene,Hibername OGM



Sanne Grinovero - JBug Milano

Sanne Grinovero - JBug Milano



Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Infinispan,Lucene,Hibername OGM Infinispan,Lucene,Hibername OGM Presentation Transcript

  • Padova, InfoCamere JBoss User Group 12 Aprile 2012
  • Chi sono?• Team Hibernate Sanne Grinovero– Hibernate Search Italiano, Olandese, Newcastle Red Hat: JBoss, Engineering– Hibernate OGM• Infinispan– Infinispan Core– Infinispan Query– JGroups• Apache Lucene
  • Infinispan• Cache distribuita• Datagrid scalabile e transazionale: performance estreme e cloud• NoSQL “DataBase”: key-value store– Come si interroga un data grid ? SELECT * FROM GRID View slide
  • Interrogareuna “Grid”Object v =cache.get(“c7”); View slide
  • Senza chiave, non puoiottenere il valore.
  • É pratico il soloaccesso per chiave?
  • Test sulla mia libreria• Dové Hibernate Search in Action?• Mi passi ISBN 978-1- 933988-17-7 ?• Prendi i libri su Gaudí ?
  • Come implementare queste funzioni su un Key/Value store?• Dové Hibernate Search in Action?• Mi passi ISBN 978-1-933988-17-7 ?• Trovi i libri su Gaudí ?
  • document based NoSQL: Map/ReduceInfinispan non é propriamente document based maoffre Map/Reduce.Eppure non é escluso luso di JSON, XML, YAML, Java: public class Book implements Serializable { final String title; final String author; final String editor; public Book(String title, String author, String editor) { this.title = title; = author; this.editor = editor; } }
  • Iterate & collectclass TitleBookSearcher implements Mapper<String, Book, String, Book> { final String title; public TitleBookSearcher(String t) { title = t; } public void map(String key, Book value, Collector collector){ if ( title.equals( value.title ) ) collector.emit( key, value ); }class BookReducer implements Reducer<String, Book> { public Book reduce(String reducedKey, Iterator<Book> iter) { return; }}
  • Implementare queste semplici funzioni:✔ Trova “Hibernate Search in Action”?✔ Trova per codice “ISBN 978-1-933988-17-7” ?✗ Quanti libri a proposito di“Shakespeare” ? • Per uno score corretto in ricerche fulltext servono le frequenze dei frammenti di testo relative al corpus. • Il Pre-tagging é poco pratico e limitante
  • Apache Lucene• Progetto open source Apache™• Integrato in innumerevoli progetti• .. tra cui Hibernate via Hibernate Search• Clusterizzabile via Infinispan– Performance– Real time– High availability
  • Cosa offre Lucene?• Ricerche per Similarity score• Analisi del testo– Sinonyms, Stopwords, Stemming, ...• Reusable declarative Filters• TermVectors• MoreLikeThis• Faceted Search• Veloce!
  • Lucene: Stopwordsa, able, about, across, after, all, almost, also, am,among, an, and, any, are, as, at, be, because, been,but, by, can, cannot, could, dear, did, do, does,either, else, ever, every, for, from, get, got, had,has, have, he, her, hers, him, his, how, however, i, if,in, into, is, it, its, just, least, let, like, likely,may, me, might, most, must, my, neither, no, nor, not,of, off, often, on, only, or, other, our, own, rather,said, say, says, she, should, since, so, some, than,that, the, their, them, then, there, these, they, this,tis, to, too, twas, us, wants, was, we, were, what,when, where, which, while, who, whom, why, will, with,would, yet, you, your
  • Filters
  • Faceted Search
  • Facciamo un bel motore diricerca che restituisce irisultati in ordinealfabetico?
  • Chi usa Lucene?Nexus
  • Dové la fregatura?• Necessita di un indice: risorse fisiche e di amministrazione.– in memory– on filesystem– in Infinispan• Sostanzialmente immutable segments– Ottimizzato per data mining / query, non per updates.• Un mondo di stringhe e vettori di frequenze
  • Infinispan Query quickstart • Abilita indexing=true nella configurazione • Aggiungi il modulo infinispan- query.jar al classpath • Annota i POJO inseriti nella cache per le modalitá di indicizzazione <dependency> <groupId>org.infinispan</groupId> <artifactId>infinispan-query</artifactId> <version>5.1.3.FINAL</version> </dependency>
  • Configurazione tramite codiceConfiguration c = new Configuration() .fluent() .indexing() .addProperty("","ram") .build();CacheManager manager = new DefaultCacheManager(c);
  • Configurazione / XML<?xml version="1.0" encoding="UTF-8"?><infinispan xmlns:xsi="" xsi:schemaLocation="urn:infinispan:config:5.0" xmlns="urn:infinispan:config:5.0"><default> <indexing enabled="true" indexLocalOnly="true"> <properties> <property name="" value="..." /> <property name="" value="..." /> </properties> </indexing></default>
  • Annotazioni sul modello@ProvidedId @Indexedpublic class Book implements Serializable { @Field String title; @Field String author; @Field String editor; public Book(String title, String author, String editor) { this.title = title; = author; this.editor = editor; }}
  • Esecuzione di QuerySearchManager sm = Search.getSearchManager(cache);Query query = sm.buildQueryBuilderForClass(Book.class) .get() .phrase() .onField("title") .sentence("in action") .createQuery();List<Object> list = sm.getQuery(query).list();
  • Architettura• Integra Hibernate Search (engine)– Listener a eventi Hibernate & transazioni • Eventi Infinispan & transazioni– Mappa tipi Java e grafi del modello a Documents di Lucene– Thin-layer design
  • Index mapping
  • Tests per Infinispan Query
  • luceneQuery = queryBuilder.phrase() .onField( "description" ) .andField( "title" ) .sentence( "a book on highly scalable query engines" ) .enableFullTextFilter( “ready-for-shipping” ) .createQuery();CacheQuery cacheQuery = searchManager.getQuery( luceneQuery, Book.class);List<Book> objectList = cacheQuery.list();
  • Architettura: Infinispan Query
  • Problemi di scalabilitá• Writer locks globali• Sharing su NFS molto problematico
  • Queue-based clustering (filesystem)
  • Index stored in Infinispan
  • Quickstart Hibernate Search • Aggiungi la dipendenza ad hibernate- search:<dependency>   <groupId>org.hibernate</groupId>   <artifactId>hibernate­search­orm</artifactId>   <version>4.1.0.Final</version></dependency>
  • Quickstart Hibernate Search• Tutto il resto é opzionale:– Come gestire gli indici– Moduli di estensione, Analyzer custom– Performance tuning– Mapping custom dei tipi– Clustering • JGroups • Infinispan • JMS
  • Quickstart Hibernate@Entity Searchpublic class Essay {   @Id   public Long getId() { return id; }   public String getSummary() { return summary; }   @Lob    public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  • Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   public String getSummary() { return summary; }   @Lob    public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  • Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   @Field   public String getSummary() { return summary; }   @Lob    public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  • Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   @Field   public String getSummary() { return summary; }   @Lob @Field @Boost(0.8)   public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  • Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   @Field   public String getSummary() { return summary; }   @Lob @Field @Boost(0.8)   public String getText() { return text; }   @ManyToOne @IndexedEmbedded    public Author getAuthor() { return author; }...
  • Un secondo esempio@Entity @Entitypublic class Author { public class Book { @Id @GeneratedValue private Integer id; private Integer id; private String title; private String name; } @OneToMany private Set<Book>books;}
  • Struttura dellindice@Entity @Indexed @Entitypublic class Author { public class Book { @Id @GeneratedValue private Integer id; private Integer id; @Field(store=Store.YES) private String title;@Field(store=Store.YES) } private String name; @OneToMany @IndexedEmbedded private Set<Book>books;}
  • QueryString[] productFields = {"summary", ""};Query luceneQuery = // query builder or any Lucene QueryFullTextEntityManager ftEm =   Search.getFullTextEntityManager(entityManager);FullTextQuery query =   ftEm.createFullTextQuery( luceneQuery, Product.class );List<Product> items =   query.setMaxResults(100).getResultList();int totalNbrOfResults = query.getResultSize(); TotalNbrOfResults= 8.320.000 (0.002 seconds)
  • Uso della DSL
  • Sui risultati:• Managed POJO: modifiche alle entitá applicati sia a Lucene che al database• Paginazione JPA, familiari (standard):– .setMaxResults( 20 ).setFirstResult( 100 );• Restrizioni sul tipo, query fulltext polimorifiche:– .createQuery( luceneQuery, A.class, B.class, ..);• Projection• Result mapping
  • FiltersFullTextQuery ftQuery = s // s is a FullTextSession   .createFullTextQuery( query, Product.class )   .enableFullTextFilter( "filtroMinori" )   .enableFullTextFilter( "offertaDelGiorno" )      .setParameter( "day", “20120412” )   .enableFullTextFilter( "inStockA" )      .setParameter( "location", "Padova" );List<Product> results = ftQuery.list();
  • Uso di Infinispan per ladistribuzione degli indici
  • Clustering di un uso Lucene “diretto”• Usando org.apache.lucene– Tradizionalmente difficile da distribuire su nodi multipli– Su qualsiasi cloud
  • Nodo singolo idea di performance Write ops/sec Queries/sec RAMDirectory RAMDirectory Infinispan 0 Infinispan 0 queries per second Infinispan D4 Infinispan D4 Infinispan D40 Infinispan D40 FSDirectory FSDirectoryInfinispan Local Infinispan Local 0 50 100 150 200 250 300 350 400 0 5000 10000 15000 20000 25000
  • Nodi multipli idea di performance Write ops/sec Queries/sec RAMDirectory RAMDirectory Infinispan 0 Infinispan 0 queries per second Infinispan D4 Infinispan D4 Infinispan D40 Infinispan D40 FSDirectory FSDirectoryInfinispan Local Infinispan Local 0 50 100 150 200 250 300 350 400 0 5000 10000 15000 20000 25000
  • Le scritture non scalano?
  • Suggerimenti per performance ottimali• Calibra il chunk_size per luso effettivo del vostro indice (evita i read lock evitando la frammentazione)• Verifica la dimensione dei pacchetti network: blob size, JGroups packets, network interface and hardware.• Scegli e configura un CacheLoader adatto
  • Requisiti di memoria• RAMDirectory: tutto lindice (e piú) in RAM.• FSDirectory: un buon OS sa fare un ottimo lavoro di caching di IO – spesso meglio di RAMDirectory.• Infinispan: configurabile, fino alla memoria condivisa tra nodi– Flexible– Fast– Network vs. disk
  • Moduli per cloud deployment scalabili One Infinispan to rule them all– Store Lucene indexes– Hibernate second level cache– Application managed cache– Datagrid– EJB, session replication in AS7– As a JPA “store” via Hibernate OGM
  • Ingredienti per la cloud• JGroups DISCOVERY protocol– MPING– TCP_PING– JDBC_PING– S3_PING• Scegli un CacheLoader– Database based, Jclouds, Cassandra, ...
  • Futuro prossimo• Semplificare la scalabilitá in scrittura• Auto-tuning dei parametri di clustering – ergonomics!• Parallel searching: multi/core + multi/node• A component of–
  • JPA for NoSQL
  • NoSQL: la flessibilitá costa• Programming model • one per product :-(• no schema => app driven schema• query (Map Reduce, specific DSL, ...)• data structure transpires• Transaction• durability / consistency
  • Esempio: InfinispanDistributed Key/Value store (or Replicated, local only efficient cache, • invalidating cache)Each node is equal Just start more nodes, or kill some •No bottlenecks by design •Cloud-network friendly JGroups • And “cloud storage” friendly too! •
  • ABC di Infinispanmap.put( “user-34”,userInstance );map.get( “user-34” );map.remove( “user-34” );
  • É una ConcurrentMap !map.put( “user-34”, userInstance );map.get( “user-34” );map.remove( “user-34” );map.putIfAbsent( “user-38”,another );
  • Qualche altro dettaglio su Infinispan ● Support for Transactions (XA) ● CacheLoaders ● Cassandra, JDBC, Amazon S3 (jclouds),... ● Tree API for JBossCache compatibility ● Lucene integration ● Two-fold ● Some Hibernate integrations ● Second level cache ● Hibernate Search indexing backend
  • Obiettivi di Hibernate OGM Encourage new data usage patterns • Familiar environment • Ease of use • easy to jump in • easy to jump out • Push NoSQL exploration in enterprises • “PaaS for existing API” initiative •
  • Cosé• JPA front end to key/value stores • Object CRUD (incl polymorphism and associations) • OO queries (JP-QL)• Reuses • Hibernate Core • Hibernate Search (and Lucene) • Infinispan• Is not a silver bullet • not for all NoSQL use cases
  • Entitá come blob serializzati?• Serialize objects into the (key) value • store the whole graph?• maintain consistency with duplicated objects • guaranteed identity a == b • concurrency / latency • structure change and (de)serialization, class definition changes
  • OGM’s approach to schema• Keep what’s best from relational model • as much as possible • tables / columns / pks• Decorrelate object structure from data structure• Data stored as (self-described) tuples• Core types limited • portability
  • Query• Hibernate Search indexes entities• Store Lucene indexes in Infinispan• JP-QL to Lucene query transformation• Works for simple queries • Lucene is not a relational SQL engine
  • E ora?• MongoDB• EHCache / Terracotta• Redis• Voldemort• Neo4J• Dynamo• ... Git? Spreadsheet? ...CapeDwarf?
  • Q&A @Infinispan @Hibernate @SanneGrinovero