© 2016 headwire.com, Inc. All Rights Reserved.
A virtual developer conference for Adobe Experience Manager
Gaston Gonzalez | Do you need an external search platform for AEM?
About Me
Agenda
1 | What’s the problem?
2 | How can an external search platform help?
3 | How do you get started?
4 | Demo
5 | Q&A
What’s the problem?
• Sites are becoming increasingly dynamic
• Not all content is in AEM (or should be)
• Enterprise integration can be difficult
Where is your enterprise data?
Web site data is typically distributed in disparate
locations:
• Digital media services
• Analytics
• Ratings & reviews
• Social & collaboration
• Product data
• CMS content
• Digital assets
• Internal RDBMSs
• Legacy Systems
• Cloud-based APIs
The Voice of the Customer & Marketing
Enterprise Data Integration & AEM
Real-time integration
  Pro:
  • Eliminates content synchronization
  Con:
  • Performance dependent on weakest link
  • Data merging not possible
  • Multiple calls required for aggregated data views
Load data into AEM
  Pro:
  • Improved rendering performance
  Con:
  • Data duplication
  • Content synchronization
  • Clustering and/or replication
Hybrid approach
  Pro:
  • A balance of the above
  Con:
  • A balance of the above
How can an external search platform help?
Search Platform: Features
• Full Text Search
• Text Analysis
• Linguistics & NLP
• Federated Search
• Autosuggest
• Geospatial Search
• Rich Documents
• Faceted Search
• Multilingual Search
• Hit Highlighting
• More Like This
• Did You Mean
Search Platform: Use Cases
• Full Text Search
• Federated Search
• Dynamic Navigation
  • Site Navigation
  • Breadcrumbs
  • List Pages
• Content Aggregation
  • Landing Pages
  • Product Pages
• Dynamic Content Pods
  • Recommendations
  • Carousels, Spotlights
Federated Search
Search across all content types
Integration Approach
1. Normalize common fields across content types.
2. Allocate a content type field for filtering and boosting.
3. Load all content type documents into a single collection.
4. Consider using the eDisMax query parser.
5. Consider layering on boost queries or function queries to meet relevancy goals (e.g., popularity, freshness, etc.)
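The steps above can be sketched as a set of Solr request parameters. This is only an illustration: the field names (title, description, body, content_type, publish_date), the boost weights, and the "federated" collection name are assumptions, not part of the talk. The parameter names (defType, qf, bq, bf, fq) are standard eDisMax and common query parameters.

```python
from urllib.parse import urlencode


def federated_search_params(user_query, content_types=None):
    """Build eDisMax parameters for one query across every content type."""
    params = {
        "defType": "edismax",                 # step 4: eDisMax query parser
        "q": user_query,
        "qf": "title^3 description^2 body",   # step 1: normalized common fields (hypothetical names)
        "bq": "content_type:product^2",       # step 5: boost query on the content type field
        # step 5: function query that favours recently published documents
        "bf": "recip(ms(NOW,publish_date),3.16e-11,1,1)",
        "fl": "id,title,content_type,score",
    }
    if content_types:
        # step 2: filter on the dedicated content type field
        params["fq"] = "content_type:(" + " OR ".join(content_types) + ")"
    return params


# "federated" is a hypothetical name for the single shared collection (step 3).
params = federated_search_params("blade runner", ["product", "article"])
print("/solr/federated/select?" + urlencode(params))
```

Keeping the filter (fq) separate from the boosts (bq, bf) means a component can narrow results to certain content types without changing how the remaining documents are ranked.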
Dynamic Navigation – List Pages
Category pages such as product list pages
Integration Approach
1. Index category hierarchy information along with your documents and encode path levels.
2. Implement navigation components using Solr’s facet.prefix along with a wildcard query (*:*).
3. Consider layering on boost queries or function queries to meet relevancy goals (e.g., higher margin products, popularity, freshness, etc.)
https://wiki.apache.org/solr/HierarchicalFaceting
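A minimal sketch of the depth-encoding idea described on the wiki page above: each level of a category path is indexed as a token prefixed with its depth, and a navigation component asks for the children of a category with facet.prefix. The field name category_path and the sample categories are assumptions made for illustration.

```python
from urllib.parse import urlencode


def encode_path_levels(category_path):
    """Return one depth-prefixed token per level of a category path."""
    parts = [p for p in category_path.split("/") if p]
    return [
        str(depth) + "/" + "/".join(parts[: depth + 1])
        for depth in range(len(parts))
    ]


def child_category_facet_params(parent_path="", field="category_path"):
    """Facet parameters that list the direct children of parent_path."""
    parts = [p for p in parent_path.split("/") if p]
    # Children sit one level below the parent, so the prefix carries that depth.
    prefix = str(len(parts)) + "/" + "".join(p + "/" for p in parts)
    return {
        "q": "*:*",              # match-all query, as suggested in step 2
        "rows": "0",             # a navigation component only needs facet counts
        "facet": "true",
        "facet.field": field,    # hypothetical field holding the encoded tokens
        "facet.prefix": prefix,
        "facet.mincount": "1",
    }


tokens = encode_path_levels("Movies/Sci-Fi/Classics")
print(tokens)
print(urlencode(child_category_facet_params("Movies")))
```

The tokens would be stored in a multi-valued string field at index time. A list page would typically also add a filter query on the selected category and request rows greater than zero.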
Content Aggregation (1 of 2)
Entertainment Industry
• Shows
• Series
• Episodes
• Images
• Videos
Integration Approach
1. Establish a well-defined tag taxonomy.
2. Tag related, disparate content types with tags.
3. Implement components to query across tags.
4. Consider layering on boost queries or function queries to meet relevancy goals (e.g., popularity, freshness, etc.)
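As a rough illustration of step 3, a component that aggregates content by tag could build filter queries like the ones below. The field names (tags, content_type, publish_date) and the tag value are hypothetical. Sorting by date is used here as a simple stand-in for a freshness boost.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def aggregation_params(tags, content_types, rows=10):
    """Parameters that pull documents of several types sharing any given tag."""
    tag_clause = " OR ".join(quote_term(t) for t in tags)
    type_clause = " OR ".join(quote_term(t) for t in content_types)
    return {
        "q": "*:*",
        "fq": [
            "tags:(" + tag_clause + ")",           # hypothetical multi-valued tag field
            "content_type:(" + type_clause + ")",  # hypothetical content type field
        ],
        "sort": "publish_date desc",               # newest first (assumed date field)
        "rows": str(rows),
    }


params = aggregation_params(["shows/blade-runner"], ["episode", "video", "image"])
# doseq=True emits one fq parameter per list entry.
print(urlencode(params, doseq=True))
```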
Content Aggregation (2 of 2)
Life Sciences Industry
• Product data
• Material Safety & Data Sheets (MSDS)
• Certificates of Analysis (COA)
• FAQs
Item-based Recommendations
More Like This finds similar documents based on term vectors.
Integration Approach
1. Leverage the More Like This request handler* and/or search component*.
2. Identify fields that capture the “aboutness” of the item and enable term vectors.
3. Issue a field query using the document ID for which you want related documents.
4. Consider layering on boost queries or function queries to meet relevancy goals (e.g., popularity, freshness, etc.)
5. Consider using copy fields and filtering out “noisy” tokens.
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis
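A sketch of step 3, assuming a More Like This request handler has been registered at /mlt in solrconfig.xml (that path, the "products" collection name, and the field names are assumptions). The mlt.* parameters are the standard MoreLikeThis parameters.

```python
from urllib.parse import urlencode


def more_like_this_params(doc_id, similarity_fields=("title", "description"), rows=5):
    """Parameters asking for documents similar to the one with this ID."""
    escaped_id = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": 'id:"' + escaped_id + '"',          # field query on the source document
        "mlt.fl": ",".join(similarity_fields),   # fields that capture "aboutness"
        "mlt.mintf": "1",                        # minimum term frequency in the source doc
        "mlt.mindf": "1",                        # minimum document frequency in the index
        "mlt.boost": "true",                     # weight interesting terms by relevance
        "fl": "id,title,score",
        "rows": str(rows),
    }


params = more_like_this_params("movie-42")
# "products" and "/mlt" are hypothetical names for the collection and handler.
print("/solr/products/mlt?" + urlencode(params))
```

The low mintf and mindf values suit a small demo index. On a real corpus they would usually be raised to suppress rare, noisy terms.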
Full Text Search in AEM
Jackrabbit Oak
  Pros:
  • OOTB
  • JCR based model
  • Can be used with the custom approach (i.e. shared Solr: oak collection + custom collections)
  Cons:
  • JCR property-based model
  • Does not map well to UI view
  • Limited search features*
External Search Platform (custom integration)
  Pros:
  • Leverage existing investment
  • Full search API support (indexing and search)
  • Full control over document model
  • Full control of ranking/scoring
  • Scale Solr independently of AEM
  Cons:
  • Implementation effort
  • Additional infrastructure needed
How do you get started?
Getting Started with Solr
Infrastructure Sizing
• Document corpus
• Document size
• Index & search latency
• Query volume…
Deployment Mode
• SolrCloud
• Standalone
• Master / Slave
Analysis
• Content Inventory
• Requirements
• Data Modeling
• Signal Modeling…
Solr Implementation
• Collections
• Schema Definition
• Query Definition
• Scoring Functions…
AEM Integration
• Indexing
• Search
• Presentation
Other Data Store Integration
• Indexing
Assumptions & Recommendations
• Apache Solr as a shared search platform
• Each content source controls its own indexing.
• AEM “owns the glass” and provides the user experience.
• Leverage SolrJ directly.
• Consider finding Solr-specific resources for search-specific development.
AEM Solr Search: A Reference Implementation
• Open source reference integration between Apache Solr and AEM
• Rapidly prototype front-end, search-driven experiences
• Rich set of extendable UI components (results, facets, pagination, etc.)
• Sample search site – Geometrixx Media Sample
• SolrJ OSGi bundle
• Quickstart distributions
  • Solr 4.x – Maven sub project
  • Solr 5.x – Vagrant + VirtualBox
• SolrCloud, Standalone, Master/Slave support
http://www.aemsolrsearch.com/
AEM Indexing Approaches
Event Driven (Direct)
  Details: 1. Event Listener 2. Adapter/Sling Model 3. SolrJ API
  AEM Solr Search: Yes
  Notes: Triggered on content add/update/delete
Event Driven (Indirect)
  Details: 1. Event Listener 2. Adapter/Sling Model 3. Send to ETL
  AEM Solr Search: No. Indexing interface in next release.
  Notes: Triggered on content add/update/delete
On-Demand (Direct)
  Details: 1. Adapter/Sling Model 2. Walk JCR 3. SolrJ API
  AEM Solr Search: No. Future release.
  Notes: Triggered by user or scheduler
Poor Man's Polling (Direct)
  Details: 1. Content serialization Servlet 2. External shell script 3. Post to Solr’s Update Request Handler
  AEM Solr Search: Yes
  Notes: Only recommended for small sites. Triggered by user or scheduler
ETL Polling (Indirect)
  Details: 1. Connect to data source 2. Transform, merge, enrich,… 3. Index Solr
  AEM Solr Search: Yes. AEM Solr Search: Product Demo
  Notes: Approach covered in demo. Triggered by user or scheduler
http://www.slideshare.net/therealgaston/adapt-to2014-integratingopensourcesearchwithaemfinalr2
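To make the "Poor Man's Polling" row concrete, here is a minimal sketch of its last step: taking already-serialized page data and preparing a JSON post to Solr's update request handler. The shape of the page dictionary, the Solr field names, the base URL, and the collection name are all assumptions. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def to_solr_doc(page):
    """Map a serialized AEM page (hypothetical dict shape) to a flat Solr document."""
    return {
        "id": page["path"],                      # the JCR path doubles as the unique key
        "content_type": "page",
        "title": page.get("jcr:title", ""),
        "description": page.get("jcr:description", ""),
        "tags": page.get("cq:tags", []),
    }


def build_update_request(solr_base_url, collection, pages):
    """Prepare a POST of JSON documents to the collection's /update handler."""
    docs = [to_solr_doc(page) for page in pages]
    url = solr_base_url + "/" + collection + "/update?commit=true"
    return Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


pages = [{"path": "/content/site/en/home", "jcr:title": "Home", "cq:tags": ["site:home"]}]
req = build_update_request("http://localhost:8983/solr", "collection1", pages)
print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req); omitted so the sketch has no side effects.
```

Committing on every request is convenient for a small site. Larger indexing jobs would more likely batch documents and rely on Solr's autoCommit settings instead.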
AEM + Product + Solr Architecture
AEM + Product + Solr Demo
• Data Sources
  • AEM Geometrixx Media Site
  • Best Buy Movie Product Data
• Platforms
  • AEM 6.1
  • Apache Solr 5.4.1 (SolrCloud)
  • Apache Camel 2.17
• Application Code
  • AEM Solr Search
  • AEM Solr Search – Product Demo
Apache Camel as an ETL (Best Buy -> Solr)
http://www.gastongonzalez.com/tech-blog?tag=Apache+Camel
AEM Solr Search: Product Demo
Demo
Reference Architecture
Multi-Collection in SolrCloud Mode
Multi-Core in Master/Slave Mode
Resources
• Sample Code
  • AEM Solr Search: http://www.aemsolrsearch.com
  • AEM Solr Search: Product Sample: https://github.com/GastonGonzalez/aem-solr-search-product-sample
• Related Talks
  • CIRCUIT 2016 - Advanced AEM Search - Consuming External Content and Enriching Content with Apache Camel
  • adaptTo() 2014 - Integrating Open Source with AEM: http://www.slideshare.net/therealgaston/adapt-to2014-integratingopensourcesearchwithaemfinalr2
• Blogs
  • My tech blog: http://www.gastongonzalez.com
Q&A
Gaston Gonzalez
aemsolr@headwire.com
Twitter: @therealgaston
Blog: www.gastongonzalez.com

Editor's Notes

  • #6 Not all content is in AEM (or should be). Enterprise integration can be difficult. Sites are becoming increasingly dynamic. Customers expect an intelligent and relevant experience.
  • #9 So, what are our options?
  • #11 Let’s look at what the search platform provides. (* Image credit: the Apache Solr Project.)
  • #14 Depth encoding resources: https://wiki.apache.org/solr/HierarchicalFaceting
  • #17 MLT Search Component limitations: filter queries operate on document. MLT Request Handler: does not work in distributed mode (https://issues.apache.org/jira/browse/SOLR-5480); the document must exist on the shard.
  • #18 Jackrabbit Oak provides several index providers:
      • Property Index – standard, does not support full-text search
      • Ordered Index
      • Lucene Index – full-text and property index
      • Solr Index – full-text, requires an external Solr server
    Oak Query Support (https://jackrabbit.apache.org/oak/docs/query/query-engine.html):
      • Native query support
      • Facets
      • Suggestions
      • Similarity
  • #22 Event listener options: JCR Observation, Sling Eventing