
Endeca Performance Considerations



Presentation for Spark::Red Insight Conference in Cambridge, MA on August 25, 2015. This deck summarizes tools, considerations, and common issues with Oracle Endeca Guided Search performance.



Endeca Performance Considerations

  1. Endeca Performance and Scalability – Hard-won lessons from the field. Peter Curran, Founder, Cirrus10. Art by Liam Brazier.
  2. Seattle HQ, distributed team • ~50 resources (25 EE + subs) • All onshore labor • Endeca or Oracle partner since 2010 • End-to-end implementations • Relevance tuning • Architecture & process analysis • Program roadmaps • Upgrades & migrations • Time & materials • Fixed fee with risk premium • Cost + bonus • Easy contracts • ROI guarantees • ~70 Endeca customers, B2C and B2B • CMS gurus • Marquee presenter at OOW 2014 • 100% referenceable
  3. MDEX Performance • Update Performance • Case study: Auto Parts
  4. ITL – Index ingestion (Forge, CAS) • MDEX – The index itself (Dgraphs) • Assembler – Application interface (Service / Process)
  5. The primary consideration
  6. What do I need tools for? • Why did it break? • Will it break this year? Tools: 1. MDEX Request Logs 2. Request Log Analyzer (Cheetah) 3. MDEX Perf – Load Testing (Eneperf). Art by Liam Brazier.
  7. What is the request log? • The MDEX's main log file – it dumps every query to a log • Includes query latency and time of day. Why is it useful? • Parse it to see what the heck happened • Replay or spoof it to answer "what if". Where do you find it? • <working-dir>/logs/dgraphs/Dgraph1/
  8. Cheetah is an MDEX log analysis tool • Reports performance stats • Helps identify trends • Downloadable from Oracle
  9. MDEXperf is a load-testing utility • Ships with Endeca. What is MDEX load testing? • Send simulated user traffic against the MDEX and site • Learn how the site performs under specific traffic conditions. Keys to a successful load test: • Stress the system in a way that represents expected production usage • Monitor performance during and after each test iteration • Test all scenarios, functionality, and technology
  10. Avoid default setNavAllRefinements / allgroups=1 if possible • Exact, Phrase, and Proximity relevance-ranking modules are expensive • Avoid response sizes > 500 KB • Use record filters before text searches • Avoid large flat dimensions. Art by Liam Brazier.
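The "record filters before text searches" advice can be illustrated with a small sketch. The parameter names (N, Ntk, Ntt, Nr) follow common Endeca URL conventions, but the search key and the filter expression here are hypothetical placeholders, not a real schema.

```python
# Sketch: attach a record filter (Nr) to a text search so the engine
# narrows the record set before matching terms. Parameter names follow
# common Endeca URL conventions; "All" and "product.active:1" are
# illustrative placeholders.
from urllib.parse import urlencode

def search_query(terms, record_filter=None):
    """Build an Endeca-style query string for a text search."""
    params = {"N": "0", "Ntk": "All", "Ntt": terms}
    if record_filter:
        params["Nr"] = record_filter
    return urlencode(params)

qs = search_query("brake pads", record_filter="AND(product.active:1)")
```

Restricting the candidate records first keeps the expensive text-matching work (and any relevance-ranking modules) off records that could never be returned.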
  11. Wildcarding • Interactions of large thesaurus + spelling + stemming on large datasets • Frequent partial updates • Not enough physical RAM on the server. Art by Liam Brazier.
  12. The primary consideration, 2 years after you implement
  13. Is a hot dog a sandwich? Is a pizza an open-faced sandwich? Can an American city be truly great w/o a signature sandwich? • If so: Los Angeles? Is a taco a sandwich? • New Orleans: Po' Boy or Muffaletta? • Which city should claim the hot dog? • Correct answer: Chicago
  14. Total Index Time = Forge (Step 1: join data sources and manipulate the data) + Dgidx (Step 2: generate the index files) + Index Distribution (Step 3: distribute the files across the Dgraphs)
  15. Size of the index • 1,000,000+ records. Type of records in index • Catalog, web content, social content, analytical content. Features and functionality • Store inventory, store-level pricing • Compatibility (fitment) • Endeca Recommendations. Data model • Wide record vs. RRN • Internationalization • Type of joins. Data manipulations • Data cleanups – Java/Perl/XML manipulators. Component usage • Traditional Forge • CAS (multi-threaded)
  16. De-normalized model • Adds store inventory to the product record • Joins happen at indexing • PRO: Fast queries • CON: Slow updates • CON: More back-end code. Normalized model • Inventory stored in separate records from products • Joins happen at query time • PRO: Fast updates • CON: Slower-ish queries • CON: More front-end code
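The two data-model shapes compared above can be sketched as plain records. The field names are illustrative, not an Endeca schema; the point is where the inventory join happens.

```python
# Sketch of the two models. Field names are hypothetical.

# De-normalized ("wide") record: store inventory is joined onto the
# product record at indexing time, so queries need no join.
wide_record = {
    "product.id": "P100",
    "product.name": "Brake Pad Set",
    "inventory.store1001": 4,
    "inventory.store1002": 0,
}

# Normalized model: inventory lives in separate records and is joined
# to the product at query time (e.g. via record relationship navigation).
product_record = {"product.id": "P100", "product.name": "Brake Pad Set"}
inventory_records = [
    {"product.id": "P100", "store.id": "1001", "qty": 4},
    {"product.id": "P100", "store.id": "1002", "qty": 0},
]

def stock_for(product, inventories):
    """Query-time join: collect per-store quantities for one product."""
    return {r["store.id"]: r["qty"] for r in inventories
            if r["product.id"] == product["product.id"]}
```

The trade-off in the slide falls out directly: the wide record answers a stock question with a single lookup but must be re-indexed when any store's quantity changes, while the normalized model updates one small inventory record but pays for the join on every query.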
  17. Use a real ETL tool if you can • Use a record cache when joining data sources in the pipeline • CAS is multi-threaded, but it's not as flexible as traditional Forge • Beware Forge left joins • Dgidx is multi-threaded: configure the optimal thread count to hasten this step. Art by Liam Brazier.
  18. Use Dgidx flags carefully; specifying many pre-computed sorts can hurt performance • If index distribution time is slow, consider rolling your own approach to compress the index before distributing it. Art by Liam Brazier.
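A roll-your-own compression step for index distribution can be as simple as archiving the index output before shipping it to each Dgraph host. The paths below are placeholders; a real deployment would point at the actual Dgidx output directory and push the archive to the Dgraph hosts before extracting.

```python
# Sketch: compress an index directory before distributing it.
# Paths are placeholders standing in for real Dgidx output.
import os
import shutil
import tempfile

def pack_index(index_dir, dest_dir):
    """Archive index_dir as <dest_dir>/index.tar.gz and return its path."""
    return shutil.make_archive(os.path.join(dest_dir, "index"),
                               "gztar", index_dir)

# Demo with a throwaway directory standing in for the index output.
src = tempfile.mkdtemp()
with open(os.path.join(src, "index_part.bin"), "w") as f:
    f.write("index data")
archive = pack_index(src, tempfile.mkdtemp())
```

Index files tend to compress well, so for large indexes the CPU cost of gzip is often repaid many times over in reduced transfer time between hosts.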
  19. Major Auto Parts Company
  20. 3 major sites live since 2003 • Originally a bridged multi-MDEX • Large index due to fitment • Re-engineered for wide records • <100ms MDEX response time • 3 updates/wk at many hours each • Tried partial updates but failed. Art by Liam Brazier
  21. • 110,000,000 very wide records
  22. • 4,500,000 narrow records

Editor's Notes

  • So let's start talking about some of the tools you'll need, to implement your performance testing plan. As I mentioned earlier, the focus of this presentation will be on monitoring and testing MDEX performance.

    What Tools?
    You'll want to know how to take a snapshot of your current and past performance.

    You'll also need a way to test how changes and traffic growth will affect your future MDEX performance.

    The tools we'll discuss are the MDEX request log, a reporting tool called Cheetah, and Eneperf, Endeca's load testing utility. We'll dive into these tools in a moment, but let's first talk about some important terminology we'll use throughout this presentation.
  • So how can we measure throughput and latency in the MDEX engine? How can we determine the amount of concurrent traffic the MDEX is capable of handling, and how can we tell if that throughput capacity has been exceeded, resulting in slowness for the end user? Well, the MDEX engine logs every query it processes in its request log, including the time the query came in, and how long it took to process.

    The request log has two very important uses.

    First, it's a record of what happened in the past. If your MDEX ever experiences a problem processing queries, the request log will allow for a post mortem of what happened, or simply tell you how much traffic was received during a certain time period. By analyzing this request log, we can determine how much traffic the engine was able to handle each hour, as well as identify queries that resulted in long latency for the end user.
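    The post-mortem analysis described above can be sketched in a few lines. The log format here is a hypothetical simplification ("timestamp latency_ms query" per line); the real MDEX request log has more columns, so the field index would need adjusting for your version.

```python
# Sketch: summarize query latencies from an MDEX-style request log.
# Assumes a simplified whitespace-delimited "timestamp latency_ms query"
# format; the real request log has more columns.
import statistics

def latency_stats(lines):
    """Return (query count, mean latency, approximate p95) in ms."""
    latencies = sorted(float(line.split()[1])
                       for line in lines if len(line.split()) >= 3)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return len(latencies), statistics.mean(latencies), p95

sample = [
    "1440512345 12.5 /graph?node=0",
    "1440512346 250.0 /graph?node=0&attrs=search",
    "1440512347 18.0 /graph?node=123",
]
```

    Grouping the same parse by hour of day (from the timestamp field) gives the traffic-per-hour view mentioned above, and sorting by latency surfaces the slowest individual queries.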

    Second, the request log is essential for running load tests, which are a fundamental component of your performance testing strategy. Using various load testing tools, including Endeca's own eneperf and mdexperf utilities, you can replay the request log against a live MDEX, effectively simulating real user traffic. You might choose to do this in a test environment after you make changes, to see if your changes have had an impact on MDEX performance. We'll get into how to set up this kind of test shortly.

    Where can you locate them?
    You can locate these logs by checking your project's app config definition if you're using the deployment template, or you can check your control script if you're on an older Endeca version. If you can't locate these logs, feel free to ask Endeca support for assistance, or ask your Endeca administrator.
  • Our third tool is Eneperf, a load-testing utility that ships with Endeca. As I mentioned earlier, Eneperf can be used to load-test an MDEX engine by "replaying" queries from a request log back against the MDEX.

    The purpose of load testing is to stress your MDEX in a way that replicates real user activity, but doing so in an isolated environment, where production traffic won't be impacted by the test. Using a load testing tool like Eneperf, you can learn how your system will perform under specific conditions.

    The keys to a successful load test include:

    Developing a test plan that will stress your system in a way that reflects real-world traffic and user behavior, ensuring you have proper monitoring in place during each test iteration, and making sure your tests cover all functionality and technology used by your site. Let's talk a bit about how to use eneperf to run your load tests.
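    The replay idea behind Eneperf can be sketched in miniature. This is not Eneperf itself: the snippet only illustrates replaying logged queries at a fixed concurrency, with a stand-in fetch function where a real load test would issue HTTP requests against an MDEX in an isolated environment.

```python
# Sketch: replay logged queries at a fixed concurrency, in the spirit
# of what Eneperf does. `fetch` is a stand-in for a real HTTP call.
import time
from concurrent.futures import ThreadPoolExecutor

def replay(queries, fetch, threads=4):
    """Replay queries concurrently; return per-query latencies in seconds."""
    def timed(query):
        start = time.perf_counter()
        fetch(query)
        return time.perf_counter() - start
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed, queries))

# Demo against a stand-in backend that just sleeps briefly per query.
latencies = replay(["/graph?node=0"] * 8, lambda q: time.sleep(0.01),
                   threads=4)
```

    Varying the thread count while watching the measured latencies is the essence of the capacity question above: throughput keeps rising with concurrency until the engine saturates, at which point per-query latency climbs instead.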