Infinispan,Lucene,Hibername OGM


Published on

Sanne Grinovero - JBug Milano

Published in: Technology, Business
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Infinispan,Lucene,Hibername OGM

  1. 1. Padova, InfoCamere JBoss User Group 12 Aprile 2012
  2. 2. Chi sono?• Team Hibernate Sanne Grinovero– Hibernate Search Italiano, Olandese, Newcastle Red Hat: JBoss, Engineering– Hibernate OGM• Infinispan– Infinispan Core– Infinispan Query– JGroups• Apache Lucene
  3. 3. Infinispan• Cache distribuita• Datagrid scalabile e transazionale: performance estreme e cloud• NoSQL “DataBase”: key-value store– Come si interroga un data grid ? SELECT * FROM GRID
  4. 4. Interrogareuna “Grid”Object v =cache.get(“c7”);
  5. 5. Senza chiave, non puoiottenere il valore.
  6. 6. É pratico il soloaccesso per chiave?
  7. 7. Test sulla mia libreria• Dové Hibernate Search in Action?• Mi passi ISBN 978-1- 933988-17-7 ?• Prendi i libri su Gaudí ?
  8. 8. Come implementare queste funzioni su un Key/Value store?• Dové Hibernate Search in Action?• Mi passi ISBN 978-1-933988-17-7 ?• Trovi i libri su Gaudí ?
  9. 9. document based NoSQL: Map/ReduceInfinispan non é propriamente document based maoffre Map/Reduce.Eppure non é escluso luso di JSON, XML, YAML, Java: public class Book implements Serializable { final String title; final String author; final String editor; public Book(String title, String author, String editor) { this.title = title; = author; this.editor = editor; } }
  10. 10. Iterate & collectclass TitleBookSearcher implements Mapper<String, Book, String, Book> { final String title; public TitleBookSearcher(String t) { title = t; } public void map(String key, Book value, Collector collector){ if ( title.equals( value.title ) ) collector.emit( key, value ); }class BookReducer implements Reducer<String, Book> { public Book reduce(String reducedKey, Iterator<Book> iter) { return; }}
  11. 11. Implementare queste semplici funzioni:✔ Trova “Hibernate Search in Action”?✔ Trova per codice “ISBN 978-1-933988-17-7” ?✗ Quanti libri a proposito di“Shakespeare” ? • Per uno score corretto in ricerche fulltext servono le frequenze dei frammenti di testo relative al corpus. • Il Pre-tagging é poco pratico e limitante
  12. 12. Apache Lucene• Progetto open source Apache™• Integrato in innumerevoli progetti• .. tra cui Hibernate via Hibernate Search• Clusterizzabile via Infinispan– Performance– Real time– High availability
  13. 13. Cosa offre Lucene?• Ricerche per Similarity score• Analisi del testo– Sinonyms, Stopwords, Stemming, ...• Reusable declarative Filters• TermVectors• MoreLikeThis• Faceted Search• Veloce!
  14. 14. Lucene: Stopwordsa, able, about, across, after, all, almost, also, am,among, an, and, any, are, as, at, be, because, been,but, by, can, cannot, could, dear, did, do, does,either, else, ever, every, for, from, get, got, had,has, have, he, her, hers, him, his, how, however, i, if,in, into, is, it, its, just, least, let, like, likely,may, me, might, most, must, my, neither, no, nor, not,of, off, often, on, only, or, other, our, own, rather,said, say, says, she, should, since, so, some, than,that, the, their, them, then, there, these, they, this,tis, to, too, twas, us, wants, was, we, were, what,when, where, which, while, who, whom, why, will, with,would, yet, you, your
  15. 15. Filters
  16. 16. Faceted Search
  17. 17. Facciamo un bel motore diricerca che restituisce irisultati in ordinealfabetico?
  18. 18. Chi usa Lucene?Nexus
  19. 19. Dové la fregatura?• Necessita di un indice: risorse fisiche e di amministrazione.– in memory– on filesystem– in Infinispan• Sostanzialmente immutable segments– Ottimizzato per data mining / query, non per updates.• Un mondo di stringhe e vettori di frequenze
  20. 20. Infinispan Query quickstart • Abilita indexing=true nella configurazione • Aggiungi il modulo infinispan- query.jar al classpath • Annota i POJO inseriti nella cache per le modalitá di indicizzazione <dependency> <groupId>org.infinispan</groupId> <artifactId>infinispan-query</artifactId> <version>5.1.3.FINAL</version> </dependency>
  21. 21. Configurazione tramite codiceConfiguration c = new Configuration() .fluent() .indexing() .addProperty("","ram") .build();CacheManager manager = new DefaultCacheManager(c);
  22. 22. Configurazione / XML<?xml version="1.0" encoding="UTF-8"?><infinispan xmlns:xsi="" xsi:schemaLocation="urn:infinispan:config:5.0" xmlns="urn:infinispan:config:5.0"><default> <indexing enabled="true" indexLocalOnly="true"> <properties> <property name="" value="..." /> <property name="" value="..." /> </properties> </indexing></default>
  23. 23. Annotazioni sul modello@ProvidedId @Indexedpublic class Book implements Serializable { @Field String title; @Field String author; @Field String editor; public Book(String title, String author, String editor) { this.title = title; = author; this.editor = editor; }}
  24. 24. Esecuzione di QuerySearchManager sm = Search.getSearchManager(cache);Query query = sm.buildQueryBuilderForClass(Book.class) .get() .phrase() .onField("title") .sentence("in action") .createQuery();List<Object> list = sm.getQuery(query).list();
  25. 25. Architettura• Integra Hibernate Search (engine)– Listener a eventi Hibernate & transazioni • Eventi Infinispan & transazioni– Mappa tipi Java e grafi del modello a Documents di Lucene– Thin-layer design
  26. 26. Index mapping
  27. 27. Tests per Infinispan Query
  28. 28. luceneQuery = queryBuilder.phrase() .onField( "description" ) .andField( "title" ) .sentence( "a book on highly scalable query engines" ) .enableFullTextFilter( “ready-for-shipping” ) .createQuery();CacheQuery cacheQuery = searchManager.getQuery( luceneQuery, Book.class);List<Book> objectList = cacheQuery.list();
  29. 29. Architettura: Infinispan Query
  30. 30. Problemi di scalabilitá• Writer locks globali• Sharing su NFS molto problematico
  31. 31. Queue-based clustering (filesystem)
  32. 32. Index stored in Infinispan
  33. 33. Quickstart Hibernate Search • Aggiungi la dipendenza ad hibernate- search:<dependency>   <groupId>org.hibernate</groupId>   <artifactId>hibernate­search­orm</artifactId>   <version>4.1.0.Final</version></dependency>
  34. 34. Quickstart Hibernate Search• Tutto il resto é opzionale:– Come gestire gli indici– Moduli di estensione, Analyzer custom– Performance tuning– Mapping custom dei tipi– Clustering • JGroups • Infinispan • JMS
  35. 35. Quickstart Hibernate@Entity Searchpublic class Essay {   @Id   public Long getId() { return id; }   public String getSummary() { return summary; }   @Lob    public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  36. 36. Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   public String getSummary() { return summary; }   @Lob    public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  37. 37. Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   @Field   public String getSummary() { return summary; }   @Lob    public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  38. 38. Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   @Field   public String getSummary() { return summary; }   @Lob @Field @Boost(0.8)   public String getText() { return text; }   @ManyToOne    public Author getAuthor() { return author; }...
  39. 39. Quickstart Hibernate@Entity @Indexed Searchpublic class Essay {   @Id   public Long getId() { return id; }   @Field   public String getSummary() { return summary; }   @Lob @Field @Boost(0.8)   public String getText() { return text; }   @ManyToOne @IndexedEmbedded    public Author getAuthor() { return author; }...
  40. 40. Un secondo esempio@Entity @Entitypublic class Author { public class Book { @Id @GeneratedValue private Integer id; private Integer id; private String title; private String name; } @OneToMany private Set<Book>books;}
  41. 41. Struttura dellindice@Entity @Indexed @Entitypublic class Author { public class Book { @Id @GeneratedValue private Integer id; private Integer id; @Field(store=Store.YES) private String title;@Field(store=Store.YES) } private String name; @OneToMany @IndexedEmbedded private Set<Book>books;}
  42. 42. QueryString[] productFields = {"summary", ""};Query luceneQuery = // query builder or any Lucene QueryFullTextEntityManager ftEm =   Search.getFullTextEntityManager(entityManager);FullTextQuery query =   ftEm.createFullTextQuery( luceneQuery, Product.class );List<Product> items =   query.setMaxResults(100).getResultList();int totalNbrOfResults = query.getResultSize(); TotalNbrOfResults= 8.320.000 (0.002 seconds)
  43. 43. Uso della DSL
  44. 44. Sui risultati:• Managed POJO: modifiche alle entitá applicati sia a Lucene che al database• Paginazione JPA, familiari (standard):– .setMaxResults( 20 ).setFirstResult( 100 );• Restrizioni sul tipo, query fulltext polimorifiche:– .createQuery( luceneQuery, A.class, B.class, ..);• Projection• Result mapping
  45. 45. FiltersFullTextQuery ftQuery = s // s is a FullTextSession   .createFullTextQuery( query, Product.class )   .enableFullTextFilter( "filtroMinori" )   .enableFullTextFilter( "offertaDelGiorno" )      .setParameter( "day", “20120412” )   .enableFullTextFilter( "inStockA" )      .setParameter( "location", "Padova" );List<Product> results = ftQuery.list();
  46. 46. Uso di Infinispan per ladistribuzione degli indici
  47. 47. Clustering di un uso Lucene “diretto”• Usando org.apache.lucene– Tradizionalmente difficile da distribuire su nodi multipli– Su qualsiasi cloud
  48. 48. Nodo singolo idea di performance Write ops/sec Queries/sec RAMDirectory RAMDirectory Infinispan 0 Infinispan 0 queries per second Infinispan D4 Infinispan D4 Infinispan D40 Infinispan D40 FSDirectory FSDirectoryInfinispan Local Infinispan Local 0 50 100 150 200 250 300 350 400 0 5000 10000 15000 20000 25000
  49. 49. Nodi multipli idea di performance Write ops/sec Queries/sec RAMDirectory RAMDirectory Infinispan 0 Infinispan 0 queries per second Infinispan D4 Infinispan D4 Infinispan D40 Infinispan D40 FSDirectory FSDirectoryInfinispan Local Infinispan Local 0 50 100 150 200 250 300 350 400 0 5000 10000 15000 20000 25000
  50. 50. Le scritture non scalano?
  51. 51. Suggerimenti per performance ottimali• Calibra il chunk_size per luso effettivo del vostro indice (evita i read lock evitando la frammentazione)• Verifica la dimensione dei pacchetti network: blob size, JGroups packets, network interface and hardware.• Scegli e configura un CacheLoader adatto
  52. 52. Requisiti di memoria• RAMDirectory: tutto lindice (e piú) in RAM.• FSDirectory: un buon OS sa fare un ottimo lavoro di caching di IO – spesso meglio di RAMDirectory.• Infinispan: configurabile, fino alla memoria condivisa tra nodi– Flexible– Fast– Network vs. disk
  53. 53. Moduli per cloud deployment scalabili One Infinispan to rule them all– Store Lucene indexes– Hibernate second level cache– Application managed cache– Datagrid– EJB, session replication in AS7– As a JPA “store” via Hibernate OGM
  54. 54. Ingredienti per la cloud• JGroups DISCOVERY protocol– MPING– TCP_PING– JDBC_PING– S3_PING• Scegli un CacheLoader– Database based, Jclouds, Cassandra, ...
  55. 55. Futuro prossimo• Semplificare la scalabilitá in scrittura• Auto-tuning dei parametri di clustering – ergonomics!• Parallel searching: multi/core + multi/node• A component of–
  56. 56. JPA for NoSQL
  57. 57. NoSQL: la flessibilitá costa• Programming model • one per product :-(• no schema => app driven schema• query (Map Reduce, specific DSL, ...)• data structure transpires• Transaction• durability / consistency
  58. 58. Esempio: InfinispanDistributed Key/Value store (or Replicated, local only efficient cache, • invalidating cache)Each node is equal Just start more nodes, or kill some •No bottlenecks by design •Cloud-network friendly JGroups • And “cloud storage” friendly too! •
  59. 59. ABC di Infinispanmap.put( “user-34”,userInstance );map.get( “user-34” );map.remove( “user-34” );
  60. 60. É una ConcurrentMap !map.put( “user-34”, userInstance );map.get( “user-34” );map.remove( “user-34” );map.putIfAbsent( “user-38”,another );
  61. 61. Qualche altro dettaglio su Infinispan ● Support for Transactions (XA) ● CacheLoaders ● Cassandra, JDBC, Amazon S3 (jclouds),... ● Tree API for JBossCache compatibility ● Lucene integration ● Two-fold ● Some Hibernate integrations ● Second level cache ● Hibernate Search indexing backend
  62. 62. Obiettivi di Hibernate OGM Encourage new data usage patterns • Familiar environment • Ease of use • easy to jump in • easy to jump out • Push NoSQL exploration in enterprises • “PaaS for existing API” initiative •
  63. 63. Cosé• JPA front end to key/value stores • Object CRUD (incl polymorphism and associations) • OO queries (JP-QL)• Reuses • Hibernate Core • Hibernate Search (and Lucene) • Infinispan• Is not a silver bullet • not for all NoSQL use cases
  64. 64. Entitá come blob serializzati?• Serialize objects into the (key) value • store the whole graph?• maintain consistency with duplicated objects • guaranteed identity a == b • concurrency / latency • structure change and (de)serialization, class definition changes
  65. 65. OGM’s approach to schema• Keep what’s best from relational model • as much as possible • tables / columns / pks• Decorrelate object structure from data structure• Data stored as (self-described) tuples• Core types limited • portability
  66. 66. Query• Hibernate Search indexes entities• Store Lucene indexes in Infinispan• JP-QL to Lucene query transformation• Works for simple queries • Lucene is not a relational SQL engine
  67. 67. E ora?• MongoDB• EHCache / Terracotta• Redis• Voldemort• Neo4J• Dynamo• ... Git? Spreadsheet? ...CapeDwarf?
  68. 68. Q&A @Infinispan @Hibernate @SanneGrinovero
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.