Database Full-text Search....                                 making it not suck                                          ...
Why Full-text Search                             • Add a rich search dimension to your                               persi...
Integrate full-text search and persistent                     domain                             •   SQL Search vs Full-te...
SQL search limits                             •    Wildcard / word search                                 • ‘%hibernate%’ ...
Full Text Search                             •   Search information                                 •   by word           ...
Mismatches with a                                      domain model                             •   Structural mismatch   ...
Hibernate Search                             •   Under the Hibernate platform                                 •   LGPL    ...
Architecture                             •   Transparent indexing through event system (JPA)                              ...
Architecture (Backend)                             •   Lucene Directory                                     Architecture (...
Architecture (Backend)                      • JMS (Cluster)(Backend)                     Architecture                     ...
DEMObase2Services Pty Ltd 2010
Configuration and                                        Mapping                             •   Configuration              ...
Mapping example                             Mapping example                             @Entity @Indexed(index="indexes/es...
Fluent Mapping API                             SearchMapping mapping = new SearchMapping();                             ma...
Query                             •   Retrieve objects, not documents                                 •   no boilerplate c...
Query example                             Query example                             org.apache.lucene.search.Query luceneQ...
More query features                             •   Fine grained fetching strategy                             •   Sort by...
DEMObase2Services Pty Ltd 2010
Some more features                             •   Automatic index optimisation                             •   Index shar...
Full-Text search                                 without the hassle                             • Transparent index synchr...
Questions                             • http://search.hibernate.org                             • Hibernate Search in Acti...
Upcoming SlideShare
Loading in...5
×

OSDC-2010 Database Full-text Search.... making it not suck

1,329

Published on

Providing rich "Google like" search capabilities in traditional database backed applications has generally meant relying on the full-text support of your chosen database. This leads to database lock-in not to mention the need of having a dba with some serious kung-fu to tune it.

Hibernate Search brings the power of full text search engines to the persistence domain model by combining Hibernate Core with the capabilities of the Apache Lucene search engine.

In this session, you will learn what problems Hibernate Search can solve and you will follow the steps of adding it to a Hibernate based application.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,329
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

OSDC-2010 Database Full-text Search.... making it not suck

  1. 1. Database Full-text Search.... making it not suck Emmanuel Bernard > Aaron Walker Hibernate Search in Actio a.walker@base2services.com > blog.emmanuelbernard.c @aaronwalker > twitter.com/emmanuelbe www.base2services.combase2Services Pty Ltd 2010
  2. 2. Why Full-text Search • Add a rich search dimension to your persistent domain model • Standard SQL sucks for doing “Google like” searchesbase2Services Pty Ltd 2010
  3. 3. Integrate full-text search and persistent domain • SQL Search vs Full-text Search • Object model / Full-text Search mismatches • Hibernate Search architecture Configuration and Mapping • Demo • Full-text based object queries • Demo • Some more featuresbase2Services Pty Ltd 2010
  4. 4. SQL search limits • Wildcard / word search • ‘%hibernate%’ • Approximation (or synonym) • ‘hybernat’ • Proximity • ‘Java’ close to ‘Persistence’ • Relevance or (result scoring) • multi-”column” searchbase2Services Pty Ltd 2010
  5. 5. Full Text Search • Search information • by word • inverted indices (word frequency, position) • In RDBMS engines • portability (proprietary add-on on top of SQL) • flexibility • Standalone engine • Apache Lucene(tm) http://lucene.apache.orgbase2Services Pty Ltd 2010
  6. 6. Mismatches with a domain model • Structural mismatch • full text index are text only • no reference/association between document • Synchronisation mismatch • keeping index and database up to date • Retrieval mismatch • the index does not store objects • certainly not managed objectsbase2Services Pty Ltd 2010
  7. 7. Hibernate Search • Under the Hibernate platform • LGPL • Built on top of Hibernate Core • Use Apache Lucene(tm) under the hood • In top 10 downloaded at Apache • Very powerful • Somewhat low level • easy to use it the “wrong” way • Solve the mismatchesbase2Services Pty Ltd 2010
  8. 8. Architecture • Transparent indexing through event system (JPA) • PERSIST / UPDATE / DELETE • Convert the object structure into Index structure • Operation batching per transaction • better Lucene performance • “ACID”-ity • (pluggable scope) • Backend • synchronous / asynchronous mode • Lucene, JMSbase2Services Pty Ltd 2010
  9. 9. Architecture (Backend) • Lucene Directory Architecture (Backend) • standalone or symmetric cluster • immediate Directory • Lucene visibility • standalone or symmetric cluster • Can •affect front end runtime immediate visibility • Can affect front end runtime Hibernate + Hibernate Search Search request Index update Lucene Directory (Index) Database Search request Index update Hibernate + Hibernate Search Copyright Red Hat Middleware LLC. Do not redistribute unless authorized | v1.2 10base2Services Pty Ltd 2010
  10. 10. Architecture (Backend) • JMS (Cluster)(Backend) Architecture • Search processed locally • JMS (Cluster) • •Search processedto a master node (JMS) Changes sent locally • • synchronous a master node (JMS) Changes sent to indexing (delay) • asynchronous indexing (delay) • •No No front extraextra cost front end end cost Slave Hibernate Database Lucene + Directory Search request Hibernate Search (Index) Copy Index update order Copy Hibernate Master Lucene + Directory Index update JMS Process Hibernate Search (Index) queue Master Copyright Red Hat Middleware LLC. Do not redistribute unless authorized | v1.2 11base2Services Pty Ltd 2010
  11. 11. DEMObase2Services Pty Ltd 2010
  12. 12. Configuration and Mapping • Configuration • event listener wiring • transparent in Hibernate Annotations • Backend configuration • Mapping: annotation based • @Indexed • @Field(store, index) • @IndexedEmbedded • @FieldBridge • @Boost / @Analyzerbase2Services Pty Ltd 2010
  13. 13. Mapping example Mapping example @Entity @Indexed(index="indexes/essays") public class Essay { ... @Id @DocumentId public Long getId() { return id; } @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES) public String getSummary() { return summary; } @Lob @Field(index=Index.TOKENIZED) public String getText() { return text; } @ManyToOne @IndexedEmbedded public Author getAuthor() { return author; } } Copyright Red Hat Middleware LLC. Do not redistribute unless authorized | v1.2 13base2Services Pty Ltd 2010
  14. 14. Fluent Mapping API SearchMapping mapping = new SearchMapping(); mapping .analyzerDef( "ngram", StandardTokenizerFactory.class ) .filter( LowerCaseFilterFactory.class ) .filter( NGramFilterFactory.class ) .param( "minGramSize", "3" ) .param( "maxGramSize", "3" ) .entity(Address.class) .indexed() .property("addressId", METHOD) .documentId() .property("street1", METHOD) .field();base2Services Pty Ltd 2010
  15. 15. Query • Retrieve objects, not documents • no boilerplate conversion code! • Objects from the Persistence Context • same semantic as a JPA-QL or Criteria query • Use org.hibernate.Query/javax.persistence.Query • common API for all your queries • pagination support • list / scroll / iterate support • Query on correlated objects • “JOIN”-like querybase2Services Pty Ltd 2010
  16. 16. Query example Query example org.apache.lucene.search.Query luceneQuery; String queryString = "summary:Festina Or brand:Seiko" luceneQuery = parser.parse( queryString ); org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery ); fullTextQuery.setMaxResult(200); List result = fullTextQuery.list(); //return a list of managed objects queryString = "title:hybernat~" //Approximate search queryString = ""Hibernate JBoss"~10" //Proximity search queryString = "author.address.city:Atlanta" //correlated search Copyright Red Hat Middleware LLC. Do not redistribute unless authorized | v1.2 15base2Services Pty Ltd 2010
  17. 17. More query features • Fine grained fetching strategy • Sort by property rather than relevance • Projection (both field and metadata) • metadata: SCORE, ID, BOOST etc • no DB / Persistence Context access • Filters • security / temporal data / category • Total number of results • New Type Safe Fluent Query API (Experimental)base2Services Pty Ltd 2010
  18. 18. DEMObase2Services Pty Ltd 2010
  19. 19. Some more features • Automatic index optimisation • Index sharding • Programmatic Mapping API • Manual indexing and purging • non event-based system • Mass Indexing/Re-indexing • Shared Lucene resources • Native Lucene accessbase2Services Pty Ltd 2010
  20. 20. Full-Text search without the hassle • Transparent index synchronisation • Automatic Structural conversion through Mapping • No paradigm shift when retrieving data • Clustering capability out of the box • Easier / transparent optimized Lucene usebase2Services Pty Ltd 2010
  21. 21. Questions • http://search.hibernate.org • Hibernate Search in Action (Manning) • http://lucene.apache.org • a.walker@base2services.com • github.com/aaronwalkerbase2Services Pty Ltd 2010

×