Full-Text Search: Human
Heaven and Database
Savior in the Cloud
Emmanuel Bernard
JBoss a Division of Red Hat
Aaron Walker
base2Services
Goals

>   Happier users
>   Happier DBAs
>   Simplicity in the cloud




Emmanuel Bernard

>   Hibernate Search in Action
>   blog.emmanuelbernard.com
>   twitter.com/emmanuelbernard




Aaron Walker

>   CTO base2Services
>   blog.base2services.com/aaron
>   twitter.com/aaronwalker




Full-text Search and
Hibernate Search
What is searching?

>   Searching is asking a question

>   Different ways to answer
    • Categorize data up-front
    • Offer a detailed search screen
    • Offer a simple search box



SQL search limits

>   Wildcard / word search
    • ‘%hibernate%’
>   Approximation (or synonym)
    • ‘hybernat’
>   Proximity
    • ‘Java’ close to ‘Persistence’
>   Relevance (or result scoring)
>   Multi-"column" search
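To make the "approximation" limit concrete, here is a minimal sketch of fuzzy matching by edit distance, something a `LIKE '%hibernate%'` pattern cannot express. The class and method names (`FuzzyMatch`, `editDistance`, `suggest`) are illustrative assumptions, not part of Hibernate Search or Lucene, and a real engine uses far more efficient techniques.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: approximate matching by edit distance. */
class FuzzyMatch {

    /** Levenshtein distance: minimum number of single-character inserts, deletes, or substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the last completed row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the vocabulary terms within maxEdits of the (possibly misspelled) query term. */
    static List<String> suggest(String term, List<String> vocabulary, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String candidate : vocabulary) {
            if (editDistance(term, candidate) <= maxEdits) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

With this sketch, `FuzzyMatch.editDistance("hybernat", "hibernate")` is 2 (one substitution, one insert), so a search for 'hybernat' with a tolerance of two edits would still find 'hibernate'.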

Full Text Search

>   Search information
    • by word
    • inverted indices (word frequency, position)

>   In RDBMS engines
    • portability (proprietary add-on on top of SQL)
    • flexibility
    • scalability
>   Standalone engine
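The "inverted indices (word frequency, position)" bullet can be sketched as a small data structure: a map from each word to the documents that contain it, with the positions where it occurs. This is a toy under stated assumptions (the class name `TinyInvertedIndex` and its methods are made up for illustration); Lucene's real index format is far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative inverted index: term -> (document id -> positions of the term in that document). */
class TinyInvertedIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Lowercases the text, splits it on anything that is not a letter or digit, and records word positions. */
    void add(int docId, String text) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        int position = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // a leading separator produces an empty first token
            }
            postings.computeIfAbsent(word, w -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Ids of the documents containing the term, in ascending order. */
    List<Integer> search(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs == null) {
            return Collections.<Integer>emptyList();
        }
        return new ArrayList<>(docs.keySet());
    }

    /** Positions of the term inside one document; the size of the list is the term frequency. */
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs == null) {
            return Collections.<Integer>emptyList();
        }
        return docs.getOrDefault(docId, Collections.<Integer>emptyList());
    }
}
```

Looking a word up is a single map access instead of a scan over every row, and the stored positions are what would make proximity queries ('Java' close to 'Persistence') possible.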
Mismatches with a domain model

>   Structural mismatch
    • full-text indexes are text only
    • no reference/association between documents
>   Synchronization mismatch
    • keeping index and database up to date
>   Retrieval mismatch
    • the index does not store objects
    • certainly not managed objects

[Diagram labels: Appl Fwk, Persistence, Search, Domain Model]
Hibernate Search

>   Transparent indexing through event system
    • PERSIST / UPDATE / DELETE
>   Convert the object structure into Index structure
    • metadata (annotations) driven
>   Uses Lucene under the hood
    • optimizations



Queries and indexing

>   Query
    • Managed objects
    • extends Query APIs
    • Minimal intrusion
>   Indexing
    • synchronous / asynchronous
    • Plain Lucene / Clustered through JMS


Mapping

@Entity @Indexed
public class Essay {
    ...
    @Id @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
    public String getSummary() { return summary; }

    @Lob @Field(index=Index.TOKENIZED)
    public String getText() { return text; }

    @ManyToOne @IndexedEmbedded
    public Author getAuthor() { return author; }
}



Query


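// JPA entry point (org.hibernate.search.jpa.Search)
FullTextEntityManager ftEm = Search.getFullTextEntityManager(em);

// Hibernate Session entry point (org.hibernate.search.Search)
FullTextSession ftSession = Search.getFullTextSession(session);

// Untyped query: results may be of any indexed type
org.hibernate.Query query = ftSession.createFullTextQuery(luceneQuery);
List<?> results = query.setMaxResults(100).list();

// Query restricted to the Author type
FullTextQuery authorQuery = ftSession.createFullTextQuery(luceneQuery, Author.class);
@SuppressWarnings("unchecked")
List<Author> authors = authorQuery.setMaxResults(100).list();

// Total number of matching results, regardless of pagination
int totalNbrOfResults = authorQuery.getResultSize();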




Clustering search in a Java EE
environment without
compromising scalability
What are the problems we are trying to solve?
>   SQL limitations
    • proprietary full text search
>   performance bottlenecks
    • limited resources
    • non linear performance
>   scaling complexities
    • limited to scaling up
    • Vendor lock-in

MSSQL>
SELECT * FROM articles
WHERE CONTAINS((title, body), 'database');

MySQL>
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('database');
Case study
Just Magazines

>   Australia’s number 1 selling automotive magazine
>   Specializes in niche & custom vehicles
>   525,000 readers across all magazines




Just Auto - Online automotive classifieds &
communities
>   Classifieds
    • private & dealer ads
>   Community features
    • blogs
    • projects
    • clubs
    • videos
    • and more cool web 2.0 stuff!!! :)
Technology Stack

>   Standard JEE APIs
    • primarily EJB 3.0, JPA & JAX-RS
>   Front-end
    • Freemarker templating engine
    • AJAX - mootools
>   Hibernate Search!!!!!




Deployed in the Cloud

>   Amazon Web Services
    • EC2, EBS, S3 & CloudFront
>   JBoss AS on CentOS/RHEL
    • CMS Admin tool
    • Light-weight front-end (Stripped down JBoss AS)
    • JOPR - JBoss management console
>   Load-balancing
    • Apache httpd, mod_cluster + DNS round-robin
Deployment

[Deployment diagram, Amazon EC2. Components shown: Users; load-balancers (apache); multiple web front-ends (JBoss AS); Lucene indexes (EBS/S3); CMS instances (JBoss AS) with Admin access; Postgres; index updates; images, video, etc. via CloudFront.]

Techniques for building
highly scalable Web sites
and Web applications
Overview of using Hibernate Search query
projection
>   Hibernate Search allows you to return a subset of
    properties directly from the Lucene index
>   Avoids a database hit!!
>   Requirements
    • the properties projected must be stored in the index
      @Field(store=Store.YES)
    • only simple properties of the indexed entity or its
      embedded associations


Hibernate Search query projection - APIs

>   Example - Result Transformer
org.hibernate.search.FullTextQuery query = s.createFullTextQuery( luceneQuery, Blog.class );

query.setProjection( "title", "author.name" );

query.setResultTransformer(
   new StaticAliasToBeanResultTransformer( BlogView.class, "title", "author" )
);

List<BlogView> results = (List<BlogView>) query.list();
for(BlogView view : results) {
   log.info( "Blog: " + view.getTitle() + ", " + view.getAuthor() );
}

    •   See org.hibernate.transform.ResultTransformer Interface for more details
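The slide does not show `BlogView`. A minimal sketch of what such a view class could look like follows, assuming a plain bean whose property names match the aliases "title" and "author" passed to the transformer. The `fromRow` helper is hypothetical and only mimics by hand what a result transformer would do with one projected row (an `Object[]` holding the values of "title" and "author.name" in projection order).

```java
/** Hypothetical view class for the projection example; not shown in the original slides. */
class BlogView {
    private String title;   // filled from the projected "title" field
    private String author;  // filled from the projected "author.name" field, under the alias "author"

    /** No-arg constructor, assumed to be required by a bean-style transformer. */
    public BlogView() { }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    /** Hand-written equivalent of the transformation: one projected row becomes one view object. */
    static BlogView fromRow(Object[] row) {
        BlogView view = new BlogView();
        view.setTitle((String) row[0]);   // first projected property: "title"
        view.setAuthor((String) row[1]);  // second projected property: "author.name"
        return view;
    }
}
```

Because the view carries only the projected values, it is a detached read-only snapshot rather than a managed entity, which is the trade-off for skipping the database.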


Overview of Hibernate Search index replication

>   Automatic replication
>   Local indexes
>   Updates delegated to a master
    • via JMS Queue
>   Can easily add more slaves

[Diagram: slaves, each running Hibernate Search and searching a local Lucene index copy, send index updates to a JMS queue; Hibernate Search on the master processes the updates into the master Lucene index, which is copied to the slaves.]
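The flow of this topology can be sketched with plain Java collections. This is a toy model under stated assumptions: `UpdateQueue` stands in for the JMS queue, the "index" is just a list of strings, and none of the names below are Hibernate Search API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of master/slave index replication; every name here is illustrative. */
class ReplicationSketch {

    /** Stands in for the JMS queue: slaves enqueue index updates instead of applying them. */
    static final class UpdateQueue {
        private final Queue<String> pending = new ArrayDeque<>();
        void send(String document) { pending.add(document); }
        String poll() { return pending.poll(); }
    }

    /** The single writer: drains the queue into the master index. */
    static final class Master {
        private final List<String> index = new ArrayList<>();

        void process(UpdateQueue queue) {
            for (String doc = queue.poll(); doc != null; doc = queue.poll()) {
                index.add(doc);
            }
        }

        /** Copy of the master index, as handed to slaves. */
        List<String> snapshot() { return new ArrayList<>(index); }
    }

    /** A slave searches only its local copy and never writes to it directly. */
    static final class Slave {
        private List<String> localCopy = new ArrayList<>();

        void change(UpdateQueue queue, String document) { queue.send(document); }

        void refreshFrom(Master master) { localCopy = master.snapshot(); }

        boolean search(String word) {
            for (String doc : localCopy) {
                if (doc.contains(word)) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

In this model a change made on one slave becomes searchable on another only after the master has processed the queue and the slave has refreshed its copy, which is the price paid for keeping searches local.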




Overview of Hibernate Search index sharding

>   Allows you to index a given entity type into several
    sub indexes
    • default strategy uses hash of id field
>   Can specify a custom sharding strategy
    • shard on a business field, e.g. geographic location,
      product category, etc.

[Diagram: a custom sharding strategy routes the Dealer entity to a Just Cars Dealer index shard or a Just Bikes Dealer index shard, each a Lucene index]
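The two ways of choosing a shard can be sketched as plain functions. This is only an illustration of the idea: `ShardingSketch`, `shardByIdHash` and `shardBySite` are assumed names, not the Hibernate Search sharding interface, and the exact hash used by the default strategy is not reproduced here.

```java
import java.util.List;

/** Illustrative shard selection; not the Hibernate Search sharding interface. */
class ShardingSketch {

    /** Hash-based choice, in the spirit of the default strategy: spreads ids across all shards. */
    static int shardByIdHash(String id, int shardCount) {
        return Math.floorMod(id.hashCode(), shardCount); // floorMod keeps the result non-negative
    }

    /** Business-field choice: one shard per site, e.g. "Just Cars" and "Just Bikes". */
    static int shardBySite(String site, List<String> sites) {
        int shard = sites.indexOf(site);
        if (shard < 0) {
            throw new IllegalArgumentException("No shard configured for site: " + site);
        }
        return shard;
    }
}
```

Sharding on a business field keeps all dealers of one site in the same sub index, so a query that is known to concern a single site would only need to look at that shard.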




Techniques for building applications that are cloud-ready
>   Break the architecture into small discrete pieces
    • separated CMS from content delivery
    • individual sites for Cars, Bikes etc...
    • JBoss micro-container
>   Independently deployable components
    • can deploy CMS across number of servers
    • mix and match site deployments


Take control of your cloud

>   JOPR
    • more than just a JBoss management console
    • monitor OS, App Servers, Database and more
    • pluggable agents with simple API
>   EC2
    • scriptable AMIs for rapid server configuration
    • change an instance's personality at runtime
    • automate automate automate
So why Amazon Web Services?

>   Flexibility
    • easily add and remove instances
    • scale on demand!!!
>   Play space
    • can quickly bring up environments to experiment with
    • production migration
>   No lock-in
>   Complete cloud offering

More Amazon Web Services

>   S3 - Simple Storage
>   Elastic Block Storage - EBS
    • fast persistent storage
    • mounted multiple volumes in RAID 0
    • snapshot backups to S3
>   CloudFront
    • content delivery network
    • used for static content images & video
Summary

>   Hibernate Search
    • unified programmatic model
    • feels like Hibernate, searches like Lucene
>   Scalability
    • avoid inessential database hits
    • simple is better
>   Simplicity in the Cloud
    • design to scale out, not up!!!
Questions?

>   http://search.hibernate.org
>   Hibernate Search in Action (Manning)
>   http://lucene.apache.org

>   a.walker@base2services.com
>   emmanuel@hibernate.org




Emmanuel Bernard
emmanuel@hibernate.org
Hibernate Search in Action - Manning
http://search.hibernate.org
http://in.relation.to/Bloggers/Emmanuel


Aaron Walker
a.walker@base2services.com
http://blog.base2services.com/aaron

More Related Content

What's hot

02.egovFrame Development Environment training book
02.egovFrame Development Environment training book02.egovFrame Development Environment training book
02.egovFrame Development Environment training bookChuong Nguyen
 
Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...
Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...
Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...Dr. Spock
 
OSGi-enabled Java EE Applications using GlassFish at JCertif 2011
OSGi-enabled Java EE Applications using GlassFish at JCertif 2011OSGi-enabled Java EE Applications using GlassFish at JCertif 2011
OSGi-enabled Java EE Applications using GlassFish at JCertif 2011Arun Gupta
 
Weblogic Server
Weblogic ServerWeblogic Server
Weblogic Serveracsvianabr
 
Sql server indexed views speed up your select queries part 1 - code-projec
Sql server indexed views   speed up your select queries  part 1 - code-projecSql server indexed views   speed up your select queries  part 1 - code-projec
Sql server indexed views speed up your select queries part 1 - code-projecKaing Menglieng
 
RESTful Web services using JAX-RS
RESTful Web services using JAX-RSRESTful Web services using JAX-RS
RESTful Web services using JAX-RSArun Gupta
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...Stephan H. Wissel
 
F428435966 odtug web-logic for developers
F428435966 odtug   web-logic for developersF428435966 odtug   web-logic for developers
F428435966 odtug web-logic for developersMeng He
 
Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010
Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010
Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010Arun Gupta
 
0012
00120012
0012none
 
Building a SharePoint Platform That Scales
Building a SharePoint Platform That ScalesBuilding a SharePoint Platform That Scales
Building a SharePoint Platform That ScalesScott Hoag
 
Developer’s intro to the alfresco platform
Developer’s intro to the alfresco platformDeveloper’s intro to the alfresco platform
Developer’s intro to the alfresco platformAlfresco Software
 

What's hot (20)

Jdbc
JdbcJdbc
Jdbc
 
JBoss AS7 Reloaded
JBoss AS7 ReloadedJBoss AS7 Reloaded
JBoss AS7 Reloaded
 
JBoss AS / EAP and Java EE6
JBoss AS / EAP and Java EE6JBoss AS / EAP and Java EE6
JBoss AS / EAP and Java EE6
 
02.egovFrame Development Environment training book
02.egovFrame Development Environment training book02.egovFrame Development Environment training book
02.egovFrame Development Environment training book
 
Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...
Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...
Avoiding Java EE Application Design Traps to Achieve Effective Use of Cloud C...
 
OSGi-enabled Java EE Applications using GlassFish at JCertif 2011
OSGi-enabled Java EE Applications using GlassFish at JCertif 2011OSGi-enabled Java EE Applications using GlassFish at JCertif 2011
OSGi-enabled Java EE Applications using GlassFish at JCertif 2011
 
MySQL高可用
MySQL高可用MySQL高可用
MySQL高可用
 
Weblogic Server
Weblogic ServerWeblogic Server
Weblogic Server
 
Sql server indexed views speed up your select queries part 1 - code-projec
Sql server indexed views   speed up your select queries  part 1 - code-projecSql server indexed views   speed up your select queries  part 1 - code-projec
Sql server indexed views speed up your select queries part 1 - code-projec
 
RESTful Web services using JAX-RS
RESTful Web services using JAX-RSRESTful Web services using JAX-RS
RESTful Web services using JAX-RS
 
Lo nuevo en Spring 3.0
Lo nuevo  en Spring 3.0Lo nuevo  en Spring 3.0
Lo nuevo en Spring 3.0
 
Nick harris-sic-2011
Nick harris-sic-2011Nick harris-sic-2011
Nick harris-sic-2011
 
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
SHOW107: The DataSource Session: Take XPages data boldly where no XPages data...
 
F428435966 odtug web-logic for developers
F428435966 odtug   web-logic for developersF428435966 odtug   web-logic for developers
F428435966 odtug web-logic for developers
 
Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010
Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010
Tools Coverage for the Java EE Platform @ Silicon Valley Code Camp 2010
 
Java EE and Glassfish
Java EE and GlassfishJava EE and Glassfish
Java EE and Glassfish
 
0012
00120012
0012
 
Building a SharePoint Platform That Scales
Building a SharePoint Platform That ScalesBuilding a SharePoint Platform That Scales
Building a SharePoint Platform That Scales
 
ORACLE 9i
ORACLE 9iORACLE 9i
ORACLE 9i
 
Developer’s intro to the alfresco platform
Developer’s intro to the alfresco platformDeveloper’s intro to the alfresco platform
Developer’s intro to the alfresco platform
 

Similar to JavaOne 2009 - Full-Text Search: Human Heaven and Database Savior in the Cloud

Enterprise Java Web Application Frameworks Sample Stack Implementation
Enterprise Java Web Application Frameworks   Sample Stack ImplementationEnterprise Java Web Application Frameworks   Sample Stack Implementation
Enterprise Java Web Application Frameworks Sample Stack ImplementationMert Çalışkan
 
BP-1 Performance and Scalability
BP-1 Performance and ScalabilityBP-1 Performance and Scalability
BP-1 Performance and ScalabilityAlfresco Software
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBJustin Smestad
 
Spring 3 - Der dritte Frühling
Spring 3 - Der dritte FrühlingSpring 3 - Der dritte Frühling
Spring 3 - Der dritte FrühlingThorsten Kamann
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
JUDCon London 2011 - Elastic SOA on the Cloud, Steve Millidge
JUDCon London 2011 - Elastic SOA on the Cloud, Steve MillidgeJUDCon London 2011 - Elastic SOA on the Cloud, Steve Millidge
JUDCon London 2011 - Elastic SOA on the Cloud, Steve MillidgeC2B2 Consulting
 
Introduction to Ember.js and how we used it at FlowPro.io
Introduction to Ember.js and how we used it at FlowPro.ioIntroduction to Ember.js and how we used it at FlowPro.io
Introduction to Ember.js and how we used it at FlowPro.ioPaul Knittel
 
Summer training oracle
Summer training   oracle Summer training   oracle
Summer training oracle Arshit Rai
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopDmitry Kan
 
Extending Oracle E-Business Suite with Ruby on Rails
Extending Oracle E-Business Suite with Ruby on RailsExtending Oracle E-Business Suite with Ruby on Rails
Extending Oracle E-Business Suite with Ruby on RailsRaimonds Simanovskis
 
Java EE 6 & GlassFish = Less Code + More Power at CEJUG
Java EE 6 & GlassFish = Less Code + More Power at CEJUGJava EE 6 & GlassFish = Less Code + More Power at CEJUG
Java EE 6 & GlassFish = Less Code + More Power at CEJUGArun Gupta
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012Amazon Web Services
 
Do We Need Esb Any More
Do We Need Esb Any MoreDo We Need Esb Any More
Do We Need Esb Any Morekaraznie
 
Summer training oracle
Summer training   oracle Summer training   oracle
Summer training oracle Arshit Rai
 
Java EE 6 = Less Code + More Power
Java EE 6 = Less Code + More PowerJava EE 6 = Less Code + More Power
Java EE 6 = Less Code + More PowerArun Gupta
 
Java EE 6 & GlassFish = Less Code + More Power @ DevIgnition
Java EE 6 & GlassFish = Less Code + More Power @ DevIgnitionJava EE 6 & GlassFish = Less Code + More Power @ DevIgnition
Java EE 6 & GlassFish = Less Code + More Power @ DevIgnitionArun Gupta
 
Play Framework and Activator
Play Framework and ActivatorPlay Framework and Activator
Play Framework and ActivatorKevin Webber
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repositorynobby
 
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...J V
 

Similar to JavaOne 2009 - Full-Text Search: Human Heaven and Database Savior in the Cloud (20)

Enterprise Java Web Application Frameworks Sample Stack Implementation
Enterprise Java Web Application Frameworks   Sample Stack ImplementationEnterprise Java Web Application Frameworks   Sample Stack Implementation
Enterprise Java Web Application Frameworks Sample Stack Implementation
 
BP-1 Performance and Scalability
BP-1 Performance and ScalabilityBP-1 Performance and Scalability
BP-1 Performance and Scalability
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Spring 3 - Der dritte Frühling
Spring 3 - Der dritte FrühlingSpring 3 - Der dritte Frühling
Spring 3 - Der dritte Frühling
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
JUDCon London 2011 - Elastic SOA on the Cloud, Steve Millidge
JUDCon London 2011 - Elastic SOA on the Cloud, Steve MillidgeJUDCon London 2011 - Elastic SOA on the Cloud, Steve Millidge
JUDCon London 2011 - Elastic SOA on the Cloud, Steve Millidge
 
Log Analysis At Scale
Log Analysis At ScaleLog Analysis At Scale
Log Analysis At Scale
 
Introduction to Ember.js and how we used it at FlowPro.io
Introduction to Ember.js and how we used it at FlowPro.ioIntroduction to Ember.js and how we used it at FlowPro.io
Introduction to Ember.js and how we used it at FlowPro.io
 
Summer training oracle
Summer training   oracle Summer training   oracle
Summer training oracle
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
 
Extending Oracle E-Business Suite with Ruby on Rails
Extending Oracle E-Business Suite with Ruby on RailsExtending Oracle E-Business Suite with Ruby on Rails
Extending Oracle E-Business Suite with Ruby on Rails
 
Java EE 6 & GlassFish = Less Code + More Power at CEJUG
Java EE 6 & GlassFish = Less Code + More Power at CEJUGJava EE 6 & GlassFish = Less Code + More Power at CEJUG
Java EE 6 & GlassFish = Less Code + More Power at CEJUG
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
 
Do We Need Esb Any More
Do We Need Esb Any MoreDo We Need Esb Any More
Do We Need Esb Any More
 
Summer training oracle
Summer training   oracle Summer training   oracle
Summer training oracle
 
Java EE 6 = Less Code + More Power
Java EE 6 = Less Code + More PowerJava EE 6 = Less Code + More Power
Java EE 6 = Less Code + More Power
 
Java EE 6 & GlassFish = Less Code + More Power @ DevIgnition
Java EE 6 & GlassFish = Less Code + More Power @ DevIgnitionJava EE 6 & GlassFish = Less Code + More Power @ DevIgnition
Java EE 6 & GlassFish = Less Code + More Power @ DevIgnition
 
Play Framework and Activator
Play Framework and ActivatorPlay Framework and Activator
Play Framework and Activator
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repository
 
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...
 

More from Aaron Walker

Just Enough Infrastructure
Just Enough InfrastructureJust Enough Infrastructure
Just Enough InfrastructureAaron Walker
 
Amazon VPC Lattice: The Service Mesh you actually want!!
Amazon VPC Lattice: The Service Mesh you actually want!!Amazon VPC Lattice: The Service Mesh you actually want!!
Amazon VPC Lattice: The Service Mesh you actually want!!Aaron Walker
 
Berlin AWS User Group - 10 May 2022
Berlin AWS User Group - 10 May 2022 Berlin AWS User Group - 10 May 2022
Berlin AWS User Group - 10 May 2022 Aaron Walker
 
Do you REALLY know what is going on in your AWS Accounts?
Do you REALLY know what is going on in your AWS Accounts?Do you REALLY know what is going on in your AWS Accounts?
Do you REALLY know what is going on in your AWS Accounts?Aaron Walker
 
Berlin CI/CD Meetup - Reusable Serverless CI/CD pipelines with Jenkins
Berlin CI/CD Meetup - Reusable Serverless CI/CD pipelines with JenkinsBerlin CI/CD Meetup - Reusable Serverless CI/CD pipelines with Jenkins
Berlin CI/CD Meetup - Reusable Serverless CI/CD pipelines with JenkinsAaron Walker
 
Meetup - AWS Berlin October 2018 - Account Management and AWS Organizations
Meetup - AWS Berlin October 2018 - Account Management and AWS OrganizationsMeetup - AWS Berlin October 2018 - Account Management and AWS Organizations
Meetup - AWS Berlin October 2018 - Account Management and AWS OrganizationsAaron Walker
 
Meetup AWS Berlin July 2018 - You're writing WAY too much CloudFormation
Meetup AWS Berlin July 2018 - You're writing WAY too much CloudFormationMeetup AWS Berlin July 2018 - You're writing WAY too much CloudFormation
Meetup AWS Berlin July 2018 - You're writing WAY too much CloudFormationAaron Walker
 
Berlin DevOps Meetup 2018-07-12
Berlin DevOps Meetup 2018-07-12Berlin DevOps Meetup 2018-07-12
Berlin DevOps Meetup 2018-07-12Aaron Walker
 
Enabling your DevOps culture with AWS-webinar
Enabling your DevOps culture with AWS-webinarEnabling your DevOps culture with AWS-webinar
Enabling your DevOps culture with AWS-webinarAaron Walker
 
Enabling your DevOps culture with AWS
Enabling your DevOps culture with AWSEnabling your DevOps culture with AWS
Enabling your DevOps culture with AWSAaron Walker
 
OSDC 2010 - You've Got Cucumber in my Java and it Tastes Great
OSDC 2010 - You've Got Cucumber in my Java and it Tastes GreatOSDC 2010 - You've Got Cucumber in my Java and it Tastes Great
OSDC 2010 - You've Got Cucumber in my Java and it Tastes GreatAaron Walker
 
OSDC-2010 Database Full-text Search.... making it not suck
OSDC-2010 Database Full-text Search.... making it not suckOSDC-2010 Database Full-text Search.... making it not suck
OSDC-2010 Database Full-text Search.... making it not suckAaron Walker
 
Java EE Behave!!!!
Java EE Behave!!!!Java EE Behave!!!!
Java EE Behave!!!!Aaron Walker
 

JavaOne 2009 - Full-Text Search: Human Heaven and Database Savior in the Cloud

  • 1. Full-Text Search: Human Heaven and Database Savior in the Cloud. Emmanuel Bernard (JBoss, a Division of Red Hat); Aaron Walker (base2Services)
  • 2. Goals > Happier users > Happier DBAs > Simplicity in the cloud
  • 3. Emmanuel Bernard > Hibernate Search in Action > blog.emmanuelbernard.com > twitter.com/emmanuelbernard
  • 4. Aaron Walker > CTO base2Services > blog.base2services.com/aaron > twitter.com/aaronwalker
  • 6. What is searching? > Searching is asking a question > Different ways to answer • Categorize data up-front • Offer a detailed search screen • Offer a simple search box
  • 7. SQL search limits > Wildcard / word search • ‘%hibernate%’ > Approximation (or synonym) • ‘hybernat’ > Proximity • ‘Java’ close to ‘Persistence’ > Relevance (or result scoring) > Multi-"column" search
  • 8. Full Text Search > Search information • by word • inverted indices (word frequency, position) > In RDBMS engines • portability (proprietary add-on on top of SQL) • flexibility • scalability > Standalone engine
  • 9. Mismatches with a domain model > Structural mismatch • full-text indexes are text only • no reference/association between documents > Synchronization mismatch • keeping index and database up to date > Retrieval mismatch • the index does not store objects • certainly not managed objects [Diagram labels: Appl Fwk, Persistence, Domain Model, Search]
  • 10. Hibernate Search > Transparent indexing through event system • PERSIST / UPDATE / DELETE > Convert the object structure into Index structure • metadata (annotations) driven > Uses Lucene under the hood • optimizations
  • 11. Queries and indexing > Query • Managed objects • extends Query APIs • Minimal intrusion > Indexing • synchronous / asynchronous • Plain Lucene / Clustered through JMS
  • 12. Mapping
        @Entity @Indexed
        public class Essay {
            ...
            @Id @DocumentId
            public Long getId() { return id; }

            @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
            public String getSummary() { return summary; }

            @Lob @Field(index=Index.TOKENIZED)
            public String getText() { return text; }

            @ManyToOne @IndexedEmbedded
            public Author getAuthor() { return author; }
        }
  • 13. Query
        // Entry points: one for JPA, one for the Hibernate Session API
        FullTextEntityManager ftEm = Search.getFullTextEntityManager(em);
        FullTextSession ftSession = Search.getFullTextSession(session);

        // Untyped query over the indexed entities
        org.hibernate.Query query = ftSession.createFullTextQuery(luceneQuery);
        List<?> results = query.setMaxResults(100).list();

        // Alternative to the untyped query above: restrict results to Author
        FullTextQuery query = ftSession.createFullTextQuery(luceneQuery, Author.class);
        @SuppressWarnings("unchecked")
        List<Author> results = query.setMaxResults(100).list();
        int totalNbrOfResults = query.getResultSize();
  • 14. Clustering search in a Java EE environment without compromising scalability
  • 15. What are the problems we are trying to solve? > SQL limitations • proprietary full text search > Performance bottlenecks • limited resources • non-linear performance > Scaling complexities • limited to scaling up • vendor lock-in
        MSSQL> SELECT * FROM articles WHERE CONTAINS((title, body), 'database');
        MySQL> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database');
  • 17. Just Magazines > Australia’s number 1 selling automotive magazine > Specializes in niche & customs vehicles > 525,000 readers across all magazines
  • 18. Just Auto - Online automotive classifieds & communities > Classifieds • private & dealer ads > Community features • blogs • projects • clubs • videos • and more cool web 2.0 stuff!!! :)
  • 19. Technology Stack > Standard JEE APIs • primarily EJB 3.0, JPA & JAX-RS > Front-end • Freemarker templating engine • AJAX - mootools > Hibernate Search!!!!!
  • 20. Deployed in the Cloud > Amazon Web Services • EC2, EBS, S3 & CloudFront > JBoss AS on CentOS/RHEL • CMS Admin tool • Light-weight front-end (Stripped down JBoss AS) • JOPR - JBoss management console > Load-balancing • Apache httpd, mod_cluster + DNS round-robin
  • 21. Deployment [Diagram labels: Amazon EC2, Users, apache load-balancers, JBoss AS web front-ends with Lucene indexes, CMS Admin (JBoss AS), Postgres, Index Updates, EBS/S3 (images, video etc.), CloudFront]
  • 22. Techniques for building highly scalable Web sites and Web applications
  • 23. Overview of using Hibernate Search query projection > Hibernate Search allows you to return a subset of properties directly from the Lucene index > Avoids a database hit!! > Requirements • the properties projected must be stored in the index: @Field(store=Store.YES) • only simple properties of the indexed entity or its embedded associations
  • 24. Hibernate Search query projection - APIs > Example - Result Transformer
        org.hibernate.search.FullTextQuery query =
            s.createFullTextQuery( luceneQuery, Blog.class );
        query.setProjection( "title", "author.name" );
        query.setResultTransformer(
            new StaticAliasToBeanResultTransformer( BlogView.class, "title", "author" ) );
        List<BlogView> results = (List<BlogView>) query.list();
        for (BlogView view : results) {
            log.info( "Blog: " + view.getTitle() + ", " + view.getAuthor() );
        }
    • See org.hibernate.transform.ResultTransformer Interface for more details
  • 25. Overview of Hibernate Search index replication > Automatic replication > Local indexes > Updates delegated to a master • via JMS Queue > Can easily add more search slaves [Diagram labels: JMS Queue, Index updates, process, Master and Slave Hibernate Search nodes, Lucene Index, search, copy]
  • 26. Overview of Hibernate Search index sharding > Allows you to index a given entity type into several sub-indexes • default strategy uses hash of id field > Can specify a custom sharding strategy • shard on a business field, e.g. geographic location, product category, etc. [Diagram labels: Dealer Entity, Custom sharding Strategy, Dealer Lucene Index, Just Cars Index Shard, Just Bikes Index Shard]
  • 27. Techniques for building applications that are cloud-ready > Break the architecture into small discrete pieces • separated CMS from content delivery • individual sites for Cars, Bikes etc. • JBoss micro-container > Independently deployable components • can deploy CMS across a number of servers • mix and match site deployments
  • 28. Take control of your cloud > JOPR • more than just a JBoss management console • monitor OS, App Servers, Database and more • pluggable agents with simple API > EC2 • scriptable AMIs for rapid server configuration • change an instance's personality at runtime • automate, automate, automate
  • 29. So why Amazon Web Services? > Flexibility • easily add and remove instances • scale on demand!!! > Play space • can quickly bring up environments to experiment with • production migration > No lock-in > Complete cloud offering
  • 30. More Amazon Web Services > S3 - Simple Storage > Elastic Block Storage - EBS • fast persistent storage • mounted multiple volumes in RAID 0 • snapshot backups to S3 > CloudFront • content delivery network • used for static content: images & video
  • 31. Summary > Hibernate Search • unified programmatic model • feels like Hibernate, searches like Lucene > Scalability • avoid inessential database hits • simple is better > Simplicity in the Cloud • design to scale out, not up!!!
  • 32. Questions? > http://search.hibernate.org > Hibernate Search in Action (Manning) > http://lucene.apache.org > a.walker@base2services.com > emmanuel@hibernate.org
  • 33. Emmanuel Bernard: emmanuel@hibernate.org, Hibernate Search in Action (Manning), http://search.hibernate.org, http://in.relation.to/Bloggers/Emmanuel. Aaron Walker: a.walker@base2services.com, http://blog.base2services.com/aaron
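
As a rough sketch of the index sharding idea from slide 26, the plain-Java example below routes a document to a shard, either by a hash of its id (the default approach the slides mention) or by a business field. It is not Hibernate Search's sharding API: the class ShardingSketch, its methods, and the site names "cars" and "bikes" are hypothetical, chosen only for illustration.

```java
import java.util.Map;

// Illustrative only: a hypothetical shard router, not the Hibernate Search API.
public class ShardingSketch {

    // Assumed mapping from a business field (the site) to a fixed shard number.
    static final Map<String, Integer> SITE_TO_SHARD = Map.of("cars", 0, "bikes", 1);

    /** Hash-based routing: spreads documents across shards using the id's hash. */
    public static int shardByIdHash(Object id, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(id.hashCode(), shardCount);
    }

    /** Business-field routing: known sites get a dedicated shard, others fall back to the id hash. */
    public static int shardBySite(String site, Object id, int shardCount) {
        Integer shard = SITE_TO_SHARD.get(site);
        return shard != null ? shard : shardByIdHash(id, shardCount);
    }

    public static void main(String[] args) {
        System.out.println("id 42 of 4 shards -> " + shardByIdHash(42L, 4));
        System.out.println("bikes dealer      -> " + shardBySite("bikes", 7L, 2));
        System.out.println("unknown site      -> " + shardBySite("boats", 8L, 2));
    }
}
```

Routing on a business field means a query that only concerns one site can be sent to that site's shard alone, whereas hash-based routing balances shard sizes but requires searching every shard.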