
Consuming External Content and Enriching Content with Apache Camel


While AEM Solr Search provides a framework for indexing and searching content within AEM, it does not address other real-world use cases such as indexing and searching content external to AEM (e.g., products). It also assumes that the final indexable AEM document will be produced entirely by AEM. This is often not the case, as advanced search applications typically need to enrich the document with data from external sources prior to indexing.

In this talk we will extend the AEM Solr Search reference architecture to include document processing capabilities using Apache Camel. Two real-world use cases serve as examples: 1) ingesting an external product data set via Apache Camel into a shared Solr instance and delivering the results via AEM, and 2) enriching AEM content with analytics and ratings data for the purpose of applying popularity boosting.

Published in: Technology


  1. Advanced AEM Search: Consuming External Content and Enriching Content with Apache Camel
     Presented by: Gaston Gonzalez, headwire.com, Inc.
  2. About Me
     • Senior Technical Architect at headwire.com, Inc.
     • Search Engineer / Developer
     • AEM Architect / Developer
     • Creator of AEM Solr Search
     • Tech Blogger
     • UNIX Systems Administrator
  3. Typical AEM + Search Integration
  4. Typical AEM + Search Architecture
  5. Typical AEM + Search Architecture
     Pros
     • Straightforward implementation
     • Simple architecture (AEM + Search)
     Cons
     • Complete data model in AEM?
       • Not all data may be in AEM
     • Processing overhead
       • Data cleansing, transformation and enrichment handled in AEM
     • Fault tolerance
       • What if Solr is down?
     • Tight coupling to search platform
  6. Is there another way?
  7. Goals for a better Architecture
     • Offload processing outside of AEM
     • Improve fault tolerance
     • Provide a flexible platform for data cleansing, transformation and aggregation
     • Allow for changes to indexing logic without impacting AEM
     • Search engine agnostic
  8. Introduce an ETL / Document Processor
  9. Document Processing
  10. Document Processing Platform
      • Roles & Responsibilities
        • Enriches submitted documents prior to indexing.
        • Submits documents for indexing.
      • Terms & Definitions
        • Enrichment: Data cleansing, filtering, transformation, aggregation, etc.
        • Processing Stage: Independent processing unit responsible for contributing to the enrichment process.
        • Pipeline: Consists of one or more processing stages or sub pipelines.
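
The terms on this slide can be sketched in plain Java. This is a minimal illustration and not Camel's API: the `ProcessingStage` and `Pipeline` types, the `title` and `title_sort` field names, and the two sample stages are all assumptions made for the example. Because a pipeline is itself a stage, it can be nested as a sub pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** A processing stage contributes one enrichment step to a document (hypothetical type). */
interface ProcessingStage {
    Map<String, Object> process(Map<String, Object> doc);
}

/** A pipeline runs its stages in order; it is itself a stage, so pipelines can be nested. */
final class Pipeline implements ProcessingStage {
    private final List<ProcessingStage> stages;

    Pipeline(ProcessingStage... stages) {
        this.stages = new ArrayList<>(Arrays.asList(stages));
    }

    @Override
    public Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> current = new HashMap<>(doc); // work on a copy, not the caller's map
        for (ProcessingStage stage : stages) {
            current = stage.process(current);
        }
        return current;
    }
}

final class PipelineDemo {
    /** Cleansing stage: trims whitespace from the (assumed) title field. */
    static final ProcessingStage TRIM_TITLE = doc -> {
        Object title = doc.get("title");
        if (title instanceof String) {
            doc.put("title", ((String) title).trim());
        }
        return doc;
    };

    /** Transformation stage: adds a lower-cased copy of the title, e.g. for sorting. */
    static final ProcessingStage ADD_SORT_TITLE = doc -> {
        Object title = doc.get("title");
        if (title instanceof String) {
            doc.put("title_sort", ((String) title).toLowerCase(Locale.ROOT));
        }
        return doc;
    };

    static Map<String, Object> run(Map<String, Object> doc) {
        ProcessingStage cleansing = new Pipeline(TRIM_TITLE);            // sub pipeline
        ProcessingStage full = new Pipeline(cleansing, ADD_SORT_TITLE);  // outer pipeline
        return full.process(doc);
    }
}
```

Calling `PipelineDemo.run(doc)` returns an enriched copy and leaves the submitted document unchanged, which keeps stages easy to test in isolation.
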
  11. Document Processing Platform
  12. Document processing is really an integration problem, right?
      Options, ordered from low to high complexity:
      • Integration Library: Apache Camel
      • Integration Framework & Stream Processing: Spring Integration, Spring Cloud Data Flow & Cloud Stream
      • Enterprise Service Bus: Mule ESB
  13. Apache Camel
  14. Apache Camel
      • A lightweight, open source integration library.
      • Mediation engine
      • Implements well-known Enterprise Integration Patterns (EIPs)
        • Aggregator
        • Content Enricher
        • Content-based router
        • Message
        • Message Translator
        • Pipes and Filters
        • Splitter…
  15. Why Apache Camel?
      • Lightweight: it’s a JAR
      • Imposes no runtime constraints
      • Routing engine
      • Powerful, fluent Java DSL
      • Mature open source project
      • Extensive list of integration components
      • Avoid writing boilerplate code: leverage EIPs
  16. Apache Camel & EIP Concepts
      • Message: Unit of information exchange between applications
      • Exchange: Wraps inbound & outbound message + headers
      • Message Channel: Allows applications to communicate using messaging
      • Pipes and Filters: Perform loosely coupled processing on a message (Routes and Processors in Camel)
  17. Camel’s Data Model
  18. Camel’s Architecture
  19. Importing Product Content into Solr
      Problem: “As an AEM developer, I need to import product content into Solr so that I can display products via search and on PDPs on my AEM-powered site.”
      Let’s use Best Buy’s Product API as an example…
      1. Fetch product data ZIP file via HTTP request.
      2. Unzip product data.
      3. Parse each JSON file to extract individual products.
      4. Transform, enrich and cleanse each product as necessary.
      5. Submit each product to Solr for indexing.
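
Steps 2 through 5 can be sketched with the Java standard library alone. This is not the talk's Camel route and not the Best Buy API format: the class names, the line-based stand-in parser (used in place of JSON parsing), the `sku` and `id` fields, and the in-memory sample ZIP are assumptions chosen so the example runs without network access or extra dependencies. Step 1 (the HTTP fetch) is left out, and the indexer is any callback the caller supplies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ProductImport {
    /** Step 2: unzip the archive into one UTF-8 string per file entry. */
    static List<String> unzip(byte[] zipBytes) {
        List<String> files = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                files.add(new String(out.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    /** Steps 3-5: split each file into products, transform each, submit each. Returns the count. */
    static int importProducts(byte[] zipBytes,
                              Function<String, List<Map<String, Object>>> parser,
                              Function<Map<String, Object>, Map<String, Object>> transformer,
                              Consumer<Map<String, Object>> indexer) {
        int submitted = 0;
        for (String file : unzip(zipBytes)) {
            for (Map<String, Object> product : parser.apply(file)) { // one message per product
                indexer.accept(transformer.apply(product));
                submitted++;
            }
        }
        return submitted;
    }
}

final class ProductImportDemo {
    /** Builds a tiny in-memory ZIP so the sketch runs without network access. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("products_1.txt"));
            zip.write("sku-1\nsku-2".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("products_2.txt"));
            zip.write("sku-3".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<Map<String, Object>> run() {
        List<Map<String, Object>> indexed = new ArrayList<>();
        // Stand-in parser: one product per line. A real import would parse JSON here.
        Function<String, List<Map<String, Object>>> parser = file -> {
            List<Map<String, Object>> products = new ArrayList<>();
            for (String line : file.split("\n")) {
                Map<String, Object> product = new HashMap<>();
                product.put("sku", line);
                products.add(product);
            }
            return products;
        };
        // Stand-in transformer: derive a document id from the SKU.
        Function<Map<String, Object>, Map<String, Object>> transformer = product -> {
            product.put("id", "product-" + product.get("sku"));
            return product;
        };
        ProductImport.importProducts(sampleZip(), parser, transformer, indexed::add);
        return indexed;
    }
}
```

The inner loop plays the role of the Splitter pattern: each file becomes several independent product messages, so a failure or a change in the transformation affects one product at a time.
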
  20. A solution using EIPs
  21. A solution using Camel
  22. A short list of Camel Components
      AMQP            Git               RabbitMQ
      ATOM            HTTP / HTTP4      Rest
      AWS             JCR               RSS
      Bean            JDBC              Solr
      Box             JMS               Apache Spark
      Cache           Jsch              SQL
      CouchDB         Log               Timer
      Elasticsearch   MongoDB           XSLT
      File            Netty / Netty4    Quartz
      http://camel.apache.org/components.html
  23. Back to AEM and indexing AEM content…
  24. A Better AEM + Search Architecture
  25. Enrichment Use Cases for AEM
      • Search Relevancy
        • Merge ratings and review signals
        • Merge analytics signals (visits, page views…)
        • Merge social signals (likes, shares, …)
      • Cleanse data for search
      • Rich content processing (Tika)
      • Natural Language Processing (OpenNLP)
      • Filter / drop documents
      • Classify content
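
As a rough illustration of merging analytics signals for relevancy, the sketch below adds a page-view count and a derived popularity score to a document. The analytics source is modeled as a plain map keyed by page path, and the `pageViews` and `popularity` field names and the log scaling are assumptions for the example, not something the talk prescribes.

```java
import java.util.HashMap;
import java.util.Map;

final class PopularityEnricher {
    // Hypothetical analytics export: page path -> number of page views.
    private final Map<String, Long> pageViewsByPath;

    PopularityEnricher(Map<String, Long> pageViewsByPath) {
        this.pageViewsByPath = pageViewsByPath;
    }

    /** Returns a copy of the document with page views and a log-scaled popularity score merged in. */
    Map<String, Object> enrich(Map<String, Object> doc) {
        Map<String, Object> enriched = new HashMap<>(doc);
        long views = pageViewsByPath.getOrDefault(String.valueOf(doc.get("id")), 0L);
        enriched.put("pageViews", views);
        // log1p dampens the influence of very popular pages; the scaling is illustrative only.
        enriched.put("popularity", Math.log1p(views));
        return enriched;
    }
}
```

A search platform could then use the stored score as a boost at query time; pages without analytics data get a popularity of zero rather than being dropped.
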
  26. AEM: Data Model (1/3)
      • Use a serializable object to represent your document
      • In fact, use a HashMap
        • No dependency object graph
        • Most search platforms already think of documents as a series of key/value pairs
      • Use key name prefixes to model:
        • Index operation type (aem.op)
        • Document Fields (aem.field.<field>)
        • Metadata (aem.meta.<field>)
  27. AEM: Data Model (1/3)
      HashMap<String, Object> jmsDoc = new HashMap<String, Object>();
      // Operation Type
      jmsDoc.put("aem.op.type", "ADD_DOC");
      // Document fields
      jmsDoc.put("aem.field.id", page.getPath());
      jmsDoc.put("aem.field.crxPath", page.getPath());
      jmsDoc.put("aem.field.url", page.getPath() + ".html");
      jmsDoc.put("aem.field.title", page.getTitle());
      jmsDoc.put("aem.field.description", page.getDescription());
      // Metadata
      jmsDoc.put("aem.meta.foo", "bar");
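
One practical effect of the key prefix convention is that a consumer can separate index fields from operation and metadata entries without knowing the schema. The helper below is a hypothetical sketch of that idea (the `AemDocModel` class and `toIndexFields` method are not from the talk): it keeps only the `aem.field.` entries and strips the prefix, producing plain field names that could be handed to a search client.

```java
import java.util.HashMap;
import java.util.Map;

final class AemDocModel {
    static final String FIELD_PREFIX = "aem.field.";

    /** Keeps the entries under "aem.field." and strips the prefix, yielding plain field names. */
    static Map<String, Object> toIndexFields(Map<String, Object> jmsDoc) {
        Map<String, Object> fields = new HashMap<>();
        for (Map.Entry<String, Object> entry : jmsDoc.entrySet()) {
            if (entry.getKey().startsWith(FIELD_PREFIX)) {
                fields.put(entry.getKey().substring(FIELD_PREFIX.length()), entry.getValue());
            }
        }
        return fields;
    }
}
```

For the document built on the slide, this would yield the keys `id`, `crxPath`, `url`, `title` and `description`, while `aem.op.type` and `aem.meta.foo` are left out of the indexable fields.
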
  28. AEM: Listener / JMS Producer (2/3)
      • Create an AEM Listener
        • Implement EventHandler interface
        • Listen for the PageEvent topics
      • Convert the Page resource to our data model
        • Add operation type
        • Add document fields
        • Add metadata fields
      • Send the message to JMS index topic
      • Example: JmsIndexListener.java
  29. AEM: JMS Camel Consumer (3/3)
      • Define your Camel runtime (e.g., standalone, OSGi, etc.)
      • Define your Camel routes
        • Consume JMS topic
        • Route operation type using content-based router
        • Enrich document as needed
        • Convert JMS document model to Solr model
        • Submit index request
      • Example: AemToSolr.java
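
The content-based routing step can be illustrated without Camel. The sketch below is plain Java, not the AemToSolr.java route: `OperationRouter` is a hypothetical class, and `DELETE_DOC` is an assumed operation name (only `ADD_DOC` appears in the slides). It inspects the `aem.op.type` key and forwards the message to the matching handler, with a fallback for unknown operations. In Camel's Java DSL the same pattern is usually written with `choice()`, `when()` and `otherwise()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class OperationRouter {
    private final Map<String, Consumer<Map<String, Object>>> handlers = new HashMap<>();
    private final Consumer<Map<String, Object>> fallback;

    /** The fallback receives messages whose operation type has no registered handler. */
    OperationRouter(Consumer<Map<String, Object>> fallback) {
        this.fallback = fallback;
    }

    /** Registers a handler for one operation type; returns this for chaining. */
    OperationRouter when(String opType, Consumer<Map<String, Object>> handler) {
        handlers.put(opType, handler);
        return this;
    }

    /** Inspects the message content and forwards it to the matching handler. */
    void route(Map<String, Object> jmsDoc) {
        String opType = String.valueOf(jmsDoc.get("aem.op.type"));
        handlers.getOrDefault(opType, fallback).accept(jmsDoc);
    }
}
```

A consumer might register one handler that indexes the document for `ADD_DOC` and another that removes it for the assumed `DELETE_DOC`, keeping the routing decision separate from the indexing logic.
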
  30. Demo
  31. Demo Prerequisites
      • Java 8 / Maven 3.2.x
      • AEM 6.1
      • http://www.aemsolrsearch.com
      • https://github.com/GastonGonzalez/aem-solr-search-product-sample
      • Best Buy API Key
      • Vagrant and VirtualBox
  32. Camel Runtime Options
  33. Java main: CamelContext
  34. Java main: Wrapper
  35. OSGi Runtime
  36. Resources
      • My Blog - http://www.gastongonzalez.com/
      • AEM Solr Search - http://www.aemsolrsearch.com
      • Apache Camel
        • http://camel.apache.org/index.html
        • https://www.manning.com/books/camel-in-action-second-edition
      • Contact Us: aemsolr@headwire.com
  37. In summary…
      • If you do not need enrichment, keep it simple and use a direct indexing approach.
      • If you need to enrich your AEM content, consider using Camel as your document processing platform.
      • This architecture is NOT search-specific!
        • Syndicate AEM content to other systems
        • Workflow replacement
  38. THANK YOU.
