JavaOne 2009 - Full-Text Search: Human Heaven and Database Savior in the Cloud

  1. Full-Text Search: Human Heaven and Database Savior in the Cloud
     Emmanuel Bernard, JBoss, a Division of Red Hat
     Aaron Walker, base2Services
  2. Goals
     > Happier users
     > Happier DBAs
     > Simplicity in the cloud
  3. Emmanuel Bernard
     > Hibernate Search in Action
     > blog.emmanuelbernard.com
     > twitter.com/emmanuelbernard
  4. Aaron Walker
     > CTO base2Services
     > blog.base2services.com/aaron
     > twitter.com/aaronwalker
  5. Full-Text Search and Hibernate Search
  6. What is searching?
     > Searching is asking a question
     > Different ways to answer
       • Categorize data up-front
       • Offer a detailed search screen
       • Offer a simple search box
  7. SQL search limits
     > Wildcard / word search
       • '%hibernate%'
     > Approximation (or synonym)
       • 'hybernat'
     > Proximity
       • 'Java' close to 'Persistence'
     > Relevance (result scoring)
     > Multi-"column" search
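To make the approximation limit concrete: a `LIKE '%hibernate%'` pattern is a literal substring test, so the misspelling 'hybernat' never matches, while full-text engines can compare terms by edit distance (Lucene's fuzzy queries are built on this idea). The sketch below is plain Java, not Lucene code; the class and method names are hypothetical.

```java
public class FuzzyMatchSketch {

    // Emulates SQL LIKE '%term%': a plain (here case-insensitive) substring test.
    static boolean likeContains(String text, String term) {
        return text.toLowerCase().contains(term.toLowerCase());
    }

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions or substitutions that turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String title = "Hibernate Search in Action";
        // The substring pattern misses the typo entirely...
        System.out.println(likeContains(title, "hybernat"));       // false
        // ...while edit distance shows it is only two edits away.
        System.out.println(editDistance("hybernat", "hibernate")); // 2
    }
}
```

A fuzzy search would accept any indexed term within a small distance threshold, something a single SQL wildcard cannot express.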
  8. Full-Text Search
     > Search information
       • by word
       • inverted indices (word frequency, position)
     > In RDBMS engines
       • portability (proprietary add-on on top of SQL)
       • flexibility
       • scalability
     > Standalone engine
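An inverted index maps each word to the documents containing it, and recording positions gives both word frequency and proximity for free. Here is a toy in-memory sketch of that idea; real engines such as Lucene store compressed postings on disk and use proper analyzers, and every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class InvertedIndexSketch {

    // term -> (document id -> positions where the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Naive analysis: lowercase, then split on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}]+");
    }

    void add(int docId, String text) {
        int pos = 0;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue; // leading separators yield an empty first token
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos++);
        }
    }

    private List<Integer> positions(String term, int docId) {
        return postings.getOrDefault(term.toLowerCase(), Map.of())
                       .getOrDefault(docId, List.of());
    }

    // Documents containing the term, found without scanning any document text.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Map.of()).keySet();
    }

    // Term frequency: how many times the term occurs in one document.
    int frequency(String term, int docId) {
        return positions(term, docId).size();
    }

    // Proximity: do the two terms occur within `slop` positions of each other?
    boolean near(String a, String b, int docId, int slop) {
        for (int x : positions(a, docId)) {
            for (int y : positions(b, docId)) {
                if (Math.abs(x - y) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Java Persistence with Hibernate");
        index.add(2, "Java concurrency; persistence comes later");
        System.out.println(index.search("persistence"));             // [1, 2]
        System.out.println(index.frequency("java", 1));              // 1
        System.out.println(index.near("java", "persistence", 1, 1)); // true
        System.out.println(index.near("java", "persistence", 2, 1)); // false
    }
}
```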
  9. Mismatches with a domain model
     > Structural mismatch
       • full-text indexes are text only
       • no reference/association between documents
     > Synchronization mismatch
       • keeping index and database up to date
     > Retrieval mismatch
       • the index does not store objects
       • certainly not managed objects
     [Diagram: application framework, persistence, domain model and search]
  10. Hibernate Search
      > Transparent indexing through the event system
        • PERSIST / UPDATE / DELETE
      > Converts the object structure into the index structure
        • metadata (annotations) driven
      > Uses Lucene under the hood
        • optimizations
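The object-to-index conversion can be pictured with a hand-written equivalent of what the Essay mapping annotations describe declaratively: every value becomes text, and the Author association survives only as a dotted field name such as author.name (the path used by the projection example). This is a minimal sketch under stated assumptions: the classes, the `Author.name` property and the `toDocument` helper are hypothetical, and Hibernate Search itself builds Lucene `Document` objects, not maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentFlatteningSketch {

    // Hypothetical domain classes modelled on the Essay/Author mapping.
    static class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    static class Essay {
        final Long id;
        final String summary;
        final String text;
        final Author author;

        Essay(Long id, String summary, String text, Author author) {
            this.id = id;
            this.summary = summary;
            this.text = text;
            this.author = author;
        }
    }

    // A flat, text-only "document": no object references, just named string fields.
    static Map<String, String> toDocument(Essay essay) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", String.valueOf(essay.id)); // like @DocumentId
        doc.put("Abstract", essay.summary);      // like @Field(name="Abstract")
        doc.put("text", essay.text);             // like @Field on getText()
        if (essay.author != null) {
            doc.put("author.name", essay.author.name); // like @IndexedEmbedded
        }
        return doc;
    }

    public static void main(String[] args) {
        Essay essay = new Essay(42L, "Why full-text search", "Essay body", new Author("Emmanuel"));
        System.out.println(toDocument(essay));
        // {id=42, Abstract=Why full-text search, text=Essay body, author.name=Emmanuel}
    }
}
```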
  11. Queries and indexing
      > Query
        • managed objects
        • extends the Query APIs
        • minimal intrusion
      > Indexing
        • synchronous / asynchronous
        • plain Lucene / clustered through JMS
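The synchronous/asynchronous and local/JMS choices are configuration, not code changes. A sketch of the relevant settings, assuming the property names of the Hibernate Search 3.x reference guide (verify them against the version you use); the JNDI names and pool size are example values only.

```java
import java.util.Properties;

public class SearchWorkerConfigSketch {

    // Local Lucene indexing, executed asynchronously by a small worker pool.
    static Properties asyncLocalIndexing() {
        Properties props = new Properties();
        props.setProperty("hibernate.search.worker.execution", "async"); // "sync" is the default
        props.setProperty("hibernate.search.worker.thread_pool.size", "2"); // example value
        return props;
    }

    // Clustered indexing: index work is sent to a JMS queue consumed by a master node.
    static Properties jmsClusteredIndexing() {
        Properties props = new Properties();
        props.setProperty("hibernate.search.worker.backend", "jms");
        // Example JNDI names; use whatever your application server defines.
        props.setProperty("hibernate.search.worker.jms.connection_factory", "java:/ConnectionFactory");
        props.setProperty("hibernate.search.worker.jms.queue", "queue/hibernatesearch");
        return props;
    }

    public static void main(String[] args) {
        // Normally these entries live in persistence.xml or hibernate.properties.
        asyncLocalIndexing().list(System.out);
        jmsClusteredIndexing().list(System.out);
    }
}
```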
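  12. Mapping
        import javax.persistence.Entity;
        import javax.persistence.Id;
        import javax.persistence.Lob;
        import javax.persistence.ManyToOne;
        import org.hibernate.search.annotations.DocumentId;
        import org.hibernate.search.annotations.Field;
        import org.hibernate.search.annotations.Index;
        import org.hibernate.search.annotations.Indexed;
        import org.hibernate.search.annotations.IndexedEmbedded;
        import org.hibernate.search.annotations.Store;

        @Entity @Indexed
        public class Essay {
          ...
          @Id @DocumentId
          public Long getId() { return id; }

          // indexed under the name "Abstract" and stored, so it can be projected
          @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
          public String getSummary() { return summary; }

          @Lob @Field(index=Index.TOKENIZED)
          public String getText() { return text; }

          // the Author's indexed fields are embedded as author.*
          @ManyToOne @IndexedEmbedded
          public Author getAuthor() { return author; }
        }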
  13. Query
        // JPA: org.hibernate.search.jpa.Search
        FullTextEntityManager ftEm = Search.getFullTextEntityManager(em);
        // Native Hibernate: org.hibernate.search.Search
        FullTextSession ftSession = Search.getFullTextSession(session);

        // No entity type given: searches all indexed entities
        org.hibernate.Query query = ftSession.createFullTextQuery(luceneQuery);
        List<?> results = query.setMaxResults(100).list();

        // Restricted to Author
        FullTextQuery authorQuery = ftSession.createFullTextQuery(luceneQuery, Author.class);
        @SuppressWarnings("unchecked")
        List<Author> authors = authorQuery.setMaxResults(100).list();
        int totalNbrOfResults = authorQuery.getResultSize();
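On this slide, `setMaxResults(100)` limits how many entities are loaded, while `getResultSize()` reports how many documents matched in the index, so the two numbers can differ. The plain-Java sketch below only illustrates that page-versus-total distinction; `Page` and `firstPage` are hypothetical and not part of the Hibernate Search API.

```java
import java.util.List;

public class PagedResultSketch {

    // What a paged full-text query gives back: the loaded page plus the total hit count.
    static final class Page<T> {
        final List<T> results;
        final int totalHits;

        Page(List<T> results, int totalHits) {
            this.results = results;
            this.totalHits = totalHits;
        }
    }

    // Loads at most maxResults of the ranked hits but still reports how many matched.
    static <T> Page<T> firstPage(List<T> rankedHits, int maxResults) {
        int end = Math.min(maxResults, rankedHits.size());
        return new Page<>(List.copyOf(rankedHits.subList(0, end)), rankedHits.size());
    }

    public static void main(String[] args) {
        List<String> hits = List.of("a1", "a2", "a3", "a4", "a5");
        Page<String> page = firstPage(hits, 2);
        System.out.println(page.results);   // [a1, a2]
        System.out.println(page.totalHits); // 5
    }
}
```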
  14. Clustering search in a Java EE environment without compromising scalability
  15. What are the problems we are trying to solve?
      > SQL limitations
        • proprietary full-text search
          MSSQL: SELECT * FROM articles WHERE CONTAINS((title, body), 'database');
          MySQL: SELECT * FROM articles WHERE MATCH (title, body) AGAINST ('database');
      > Performance bottlenecks
        • limited resources
        • non-linear performance
      > Scaling complexities
        • limited to scaling up
        • vendor lock-in
  16. Case study
  17. Just Magazines
      > Australia's number 1 selling automotive magazine
      > Specializes in niche & custom vehicles
      > 525,000 readers across all magazines
  18. Just Auto - Online automotive classifieds & communities
      > Classifieds
        • private & dealer ads
      > Community features
        • blogs
        • projects
        • clubs
        • videos
        • and more cool Web 2.0 stuff!!! :)
  19. Technology Stack
      > Standard JEE APIs
        • primarily EJB 3.0, JPA & JAX-RS
      > Front-end
        • FreeMarker templating engine
        • AJAX - MooTools
      > Hibernate Search!!!!!
  20. Deployed in the Cloud
      > Amazon Web Services
        • EC2, EBS, S3 & CloudFront
      > JBoss AS on CentOS/RHEL
        • CMS admin tool
        • light-weight front-end (stripped-down JBoss AS)
        • JOPR - JBoss management console
      > Load-balancing
        • Apache httpd, mod_cluster + DNS round-robin
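DNS round-robin spreads clients across several load-balancers by rotating the order of the address records returned for one host name; clients that take the first address end up on different balancers. A minimal sketch of that rotation, assuming hypothetical addresses from the 192.0.2.0/24 documentation range; real DNS servers apply their own ordering policies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {

    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> addresses) {
        if (addresses.isEmpty()) throw new IllegalArgumentException("need at least one address");
        this.addresses = List.copyOf(addresses);
    }

    // Each lookup returns the same records, rotated one step further than the last.
    List<String> resolve() {
        int shift = Math.floorMod(counter.getAndIncrement(), addresses.size());
        List<String> rotated = new ArrayList<>(addresses);
        Collections.rotate(rotated, -shift);
        return rotated;
    }

    public static void main(String[] args) {
        // Hypothetical addresses standing in for two Apache load-balancers.
        RoundRobinSketch dns = new RoundRobinSketch(List.of("192.0.2.10", "192.0.2.11"));
        for (int i = 0; i < 3; i++) {
            System.out.println(dns.resolve().get(0)); // 192.0.2.10, 192.0.2.11, 192.0.2.10
        }
    }
}
```

Behind each balancer, mod_cluster then distributes requests across the JBoss AS front-ends.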
  21. Deployment
      [Diagram: deployment on Amazon EC2. Users reach the web front-ends (JBoss AS, each with local Lucene indexes) through Apache load-balancers. Admins use the CMS (JBoss AS) backed by Postgres and its own Lucene indexes; index updates flow from the CMS to the front-end indexes. Images, video etc. are kept on EBS/S3 and served via CloudFront.]
  22. Techniques for building highly scalable Web sites and Web applications
  23. Overview of using Hibernate Search query projection
      > Hibernate Search allows you to return a subset of properties directly from the Lucene index
      > Avoids a database hit!!
      > Requirements
        • the projected properties must be stored in the index: @Field(store=Store.YES)
        • only simple properties of the indexed entity or its embedded associations
  24. Hibernate Search query projection - APIs
      > Example - Result Transformer
          org.hibernate.search.FullTextQuery query = s.createFullTextQuery( luceneQuery, Blog.class );
          query.setProjection( "title", "author.name" );
          query.setResultTransformer(
              new StaticAliasToBeanResultTransformer( BlogView.class, "title", "author" ) );
          List<BlogView> results = (List<BlogView>) query.list();
          for (BlogView view : results) {
              log.info( "Blog: " + view.getTitle() + ", " + view.getAuthor() );
          }
        • See the org.hibernate.transform.ResultTransformer interface for more details
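The slide does not show how `StaticAliasToBeanResultTransformer` works, so the following is only a guess at the idea: each projected row arrives as an `Object[]` in projection order, and the transformer copies position i into the bean property named by alias i (which is how "author.name" can land in a property called `author`). The sketch is plain Java with reflection and does not implement Hibernate's `ResultTransformer` interface; the class name and `BlogView` shape are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class StaticAliasTransformerSketch<T> {

    // Hypothetical DTO shaped like the slide's BlogView.
    public static class BlogView {
        private String title;
        private String author;

        public String getTitle() { return title; }
        public String getAuthor() { return author; }
    }

    private final Class<T> beanClass;
    private final String[] aliases; // bean field name for each projected column, by position

    StaticAliasTransformerSketch(Class<T> beanClass, String... aliases) {
        this.beanClass = beanClass;
        this.aliases = aliases.clone();
    }

    // One projected row, e.g. the values of ("title", "author.name"), becomes one bean.
    T transformTuple(Object[] tuple) {
        try {
            T bean = beanClass.getDeclaredConstructor().newInstance();
            for (int i = 0; i < aliases.length; i++) {
                Field field = beanClass.getDeclaredField(aliases[i]);
                field.setAccessible(true);
                field.set(bean, tuple[i]);
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + beanClass.getName(), e);
        }
    }

    List<T> transformList(List<Object[]> rows) {
        List<T> beans = new ArrayList<>();
        for (Object[] row : rows) {
            beans.add(transformTuple(row));
        }
        return beans;
    }

    public static void main(String[] args) {
        StaticAliasTransformerSketch<BlogView> transformer =
                new StaticAliasTransformerSketch<>(BlogView.class, "title", "author");
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Scaling out on EC2", "Aaron Walker"});
        rows.add(new Object[] {"Projections 101", "Emmanuel Bernard"});
        for (BlogView view : transformer.transformList(rows)) {
            System.out.println("Blog: " + view.getTitle() + ", " + view.getAuthor());
        }
    }
}
```

Because every value comes from stored index fields, building the view needs no database round trip.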
  25. Overview of Hibernate Search index replication
      > Automatic replication
      > Local indexes
      > Updates delegated to a master process
        • via JMS queue
      > Can easily add more slaves
      [Diagram: slave nodes (Hibernate Search, local Lucene index used for searches) send index updates through a JMS queue to the master (Hibernate Search, master Lucene index); the master index is copied back to each slave]
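The master/slave flow can be simulated in a single process: slaves never write an index, they enqueue the work; the master drains the queue into the only writable index; slaves search a local copy that is refreshed from the master. This is a conceptual sketch only: an in-memory queue stands in for JMS, a map stands in for a Lucene index, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class ReplicationSketch {

    // Stand-in for the JMS queue: slaves only enqueue index work.
    private final Queue<Map.Entry<Long, String>> queue = new ArrayDeque<>();

    // The master owns the only index that is ever written (entity id -> indexed text).
    private final Map<Long, String> masterIndex = new HashMap<>();

    // A slave node: searches its local copy, delegates every write to the master.
    class Slave {
        private Map<Long, String> localCopy = new HashMap<>();

        void onEntityChanged(long id, String text) {
            queue.add(Map.entry(id, text));
        }

        boolean search(String word) {
            return localCopy.values().stream().anyMatch(text -> text.contains(word));
        }

        // In a real deployment this is a periodic copy of the master's index files.
        void refreshFromMaster() {
            localCopy = new HashMap<>(masterIndex);
        }
    }

    // The master drains the queue and applies the pending work to its index.
    void masterProcessQueue() {
        Map.Entry<Long, String> work;
        while ((work = queue.poll()) != null) {
            masterIndex.put(work.getKey(), work.getValue());
        }
    }

    public static void main(String[] args) {
        ReplicationSketch cluster = new ReplicationSketch();
        Slave slaveA = cluster.new Slave();
        Slave slaveB = cluster.new Slave();

        slaveA.onEntityChanged(1L, "custom bikes for sale");
        System.out.println(slaveB.search("bikes")); // false: not indexed yet
        cluster.masterProcessQueue();
        slaveB.refreshFromMaster();
        System.out.println(slaveB.search("bikes")); // true: the copy picked up the update
    }
}
```

The design trade-off is visible in the middle step: searches on a slave lag behind writes until its next copy, in exchange for slaves never contending for index locks.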
  26. Overview of Hibernate Search index sharding
      > Allows you to index a given entity type into several sub-indexes
        • default strategy uses a hash of the id field
      > Can specify a custom sharding strategy
        • shard on a business field, e.g. geographic location, product category, etc.
      [Diagram: Dealer entity, custom sharding strategy, Dealer Lucene index shards: Just Cars index, Just Bikes index]
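Both strategies on the slide reduce to "pick a shard number for this entity". A plain-Java sketch of the two ideas, hashing the id versus routing on a business field such as the site; the method names and the site list are hypothetical, and Hibernate Search's own strategy interface is not shown here.

```java
import java.util.List;

public class ShardingSketch {

    // Hash-of-id idea: the same id always lands in the same shard.
    static int shardForId(Object id, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(String.valueOf(id).hashCode(), shardCount);
    }

    // Hypothetical business-field strategy: one Dealer index shard per site.
    static final List<String> SITE_SHARDS = List.of("JustCars", "JustBikes");

    static int shardForSite(String site) {
        int shard = SITE_SHARDS.indexOf(site);
        if (shard < 0) {
            throw new IllegalArgumentException("No shard configured for site " + site);
        }
        return shard;
    }

    public static void main(String[] args) {
        for (long dealerId = 1; dealerId <= 4; dealerId++) {
            System.out.println("dealer " + dealerId + " -> hash shard " + shardForId(dealerId, 2));
        }
        // In this sketch, a bikes-only search would only need to open this shard.
        System.out.println("JustBikes -> shard " + shardForSite("JustBikes")); // 1
    }
}
```

Hashing spreads data evenly but says nothing about meaning; sharding on a business field keeps related documents together so that a site-specific query touches a smaller index.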
  27. Techniques for building applications that are cloud-ready
      > Break the architecture into small discrete pieces
        • separated CMS from content delivery
        • individual sites for Cars, Bikes, etc.
        • JBoss Microcontainer
      > Independently deployable components
        • can deploy the CMS across a number of servers
        • mix and match site deployments
  28. Take control of your cloud
      > JOPR
        • more than just a JBoss management console
        • monitor OS, app servers, databases and more
        • pluggable agents with a simple API
      > EC2
        • scriptable AMIs for rapid server configuration
        • change an instance's personality at runtime
        • automate, automate, automate
  29. So why Amazon Web Services?
      > Flexibility
        • easily add and remove instances
        • scale on demand!!!
      > Play space
        • can quickly bring up environments to experiment with
        • production migration
      > No lock-in
      > Complete cloud offering
  30. More Amazon Web Services
      > S3 - Simple Storage Service
      > Elastic Block Store - EBS
        • fast persistent storage
        • multiple volumes mounted in RAID 0
        • snapshot backups to S3
      > CloudFront
        • content delivery network
        • used for static content: images & video
  31. Summary
      > Hibernate Search
        • unified programmatic model
        • feels like Hibernate, searches like Lucene
      > Scalability
        • avoid inessential database hits
        • simple is better
      > Simplicity in the Cloud
        • design to scale out, not up!!!
  32. Questions?
      > http://search.hibernate.org
      > Hibernate Search in Action (Manning)
      > http://lucene.apache.org
      > a.walker@base2services.com
      > emmanuel@hibernate.org
  33. Emmanuel Bernard
      emmanuel@hibernate.org
      Hibernate Search in Action - Manning
      http://search.hibernate.org
      http://in.relation.to/Bloggers/Emmanuel
      Aaron Walker
      a.walker@base2services.com
      http://blog.base2services.com/aaron
