Presentation for Spark::Red Insight Conference in Cambridge, MA on August 25, 2015. This deck summarizes tools, considerations, and common issues with Oracle Endeca Guided Search performance.
Endeca Performance Considerations
1. Endeca Performance and Scalability
Hard-won lessons from the field – Peter Curran, Founder Cirrus10
art by Liam Brazier, buy it here! liambrazier.com/Shop
2. Seattle HQ, distributed team
~50 resources (25 EE + subs)
All onshore labor
Endeca or Oracle partner since 2010
End-to-end implementations
Relevance tuning
Architecture & process analysis
Program roadmaps
Upgrades & migrations
Time & materials
Fixed fee with risk premium
Cost + bonus
Easy contracts
ROI guarantees
~70 Endeca customers
B2C and B2B
CMS Gurus
Marquee Presenter at OOW 2014
100% Referenceable
6. What do I need tools for?
• Why did it break?
• Will it break this year?
Tools
1. MDEX Request Logs
2. Request Log Analyzer (Cheetah)
3. MDEX Perf – Load Testing (Eneperf)
7. What is the request log?
• MDEX’s main log file – dumps every query to a log
• Includes query latency and time of day
Why is it useful?
• Parse it to see what the heck happened
• Replay or spoof it up to answer “what if”
Where do you find it?
• <working-dir>/logs/dgraphs/Dgraph1/
8. Cheetah is an MDEX Log analysis tool
Reports performance stats
Helps identify trends
Downloadable from Oracle
9. Eneperf is a load-testing utility
• Ships with Endeca
What is MDEX load testing?
• Send simulated user traffic against MDEX and site
• Learn how site performs under specific traffic conditions
Keys to a successful load test…
• Stress system in way that represents expected production usage
• Monitor performance during and after each test iteration
• Test all scenarios, functionality, and technology
10. • Avoid the default setNavAllRefinements / allgroups=1 if possible
• Exact, Phrase, and Proximity relevance ranking modules are expensive
• Response sizes > 500 KB
• Use record filters before text searches
• Avoid large flat dimensions
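The record-filter tip can be sketched with the standard Endeca URL parameters (`Ntk`/`Ntt` for the text search key and terms, `Nr` for a record filter). The property name `in_stock` below is hypothetical — a minimal Python sketch of building such a query string:

```python
from urllib.parse import urlencode

def build_query(search_key, terms, record_filter=None, nav=0):
    """Build an MDEX query string. Supplying a record filter (Nr)
    narrows the candidate set before the text search runs."""
    params = {"N": nav, "Ntk": search_key, "Ntt": terms}
    if record_filter:
        # e.g. restrict the search to in-stock records first
        params["Nr"] = record_filter
    return urlencode(params)

print(build_query("All", "running shoes", "AND(in_stock:1)"))
```

Because the filter prunes records before the (more expensive) text-match and relevance-ranking work, it usually pays to apply it whenever the business logic allows.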
11. • Wildcarding
• Interactions of a large thesaurus + spelling + stemming on large datasets
• Frequent partial updates
• Not enough physical RAM on the server
13. Is a hot dog a sandwich?
Is a pizza an open-faced sandwich?
Can an American city be truly great
w/o a signature sandwich?
• If so: Los Angeles? Is a taco a sandwich?
• New Orleans: Po’ Boy or Muffaletta?
• Which city should claim the hot dog?
• Correct answer: Chicago
14. Forge → Dgidx → Index Distribution
• Forge: join data sources and manipulate the data (Step 1)
• Dgidx: generate the index files (Step 2)
• Index Distribution: distribute the files across the Dgraphs (Step 3)
• Steps 1–3 together make up Total Index Time
15. Size of the index
• 1,000,000+ records
Type of records in index
• Catalog, Web Content, Social Content, Analytical Content
Features and functionality
• Store inventory, Store level pricing
• Compatibility (Fitment)
• Endeca Recommendations
Data Model
• Wide record vs. RRN
• Internationalization
• Type of joins
Data Manipulations
• Data cleanups – Java/Perl/XML manipulators
Components Usage
• Traditional Forge
• CAS (Multi-threaded)
16. De-normalized model
• Adds store inventory to the product record
• Joins happen at indexing
• PRO: Fast queries
• CON: Slow updates
• CON: More back-end code
Normalized model
• Inventory stored in a separate record from products
• Joins happen at query time
• PRO: Fast updates
• CON: Slower-ish queries
• CON: More front-end code
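The trade-off can be illustrated with hypothetical records (the field names below are made up for illustration). In the wide-record model the join is done once, during Forge; in the normalized model front-end code pays for the join on every query:

```python
# De-normalized (wide record): inventory joined into the product at index time.
wide_record = {
    "product_id": "P100",
    "name": "Trail Shoe",
    "inventory": {"store_01": 4, "store_02": 0},  # joined during Forge
}

# Normalized (RRN-style): inventory kept as separate records,
# joined to the product at query time.
product_record = {"product_id": "P100", "name": "Trail Shoe"}
inventory_records = [
    {"product_id": "P100", "store": "store_01", "qty": 4},
    {"product_id": "P100", "store": "store_02", "qty": 0},
]

def stores_with_stock(product, inventory):
    # Query-time join: the front end does this work on every request.
    return [r["store"] for r in inventory
            if r["product_id"] == product["product_id"] and r["qty"] > 0]

print(stores_with_stock(product_record, inventory_records))  # ['store_01']
```

Updating one store's quantity in the normalized model touches a single small record; in the wide-record model it means re-indexing the whole product — which is exactly the fast-updates vs. fast-queries trade shown on the slide.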
17. Use a real ETL tool if you can
Use a record cache when joining the data sources in the pipeline
CAS is multi-threaded, but it's not as flexible as traditional Forge
Beware Forge left joins
Dgidx is multi-threaded; configure the optimal number of threads to hasten this step
18. Use Dgidx flags carefully; specifying many pre-computed sorts can affect performance
If index distribution time is slow, consider rolling your own approach to compress the index before distributing it
20. 3 major sites live since 2003
Originally a bridged multi-MDEX
Large index due to fitment
Re-engineered for wide records
• <100ms MDEX response time
• 3 updates/wk, many hours each
• Tried partial updates but failed
So let's start talking about some of the tools you'll need to implement your performance testing plan. As I mentioned earlier, the focus of this presentation is on monitoring and testing MDEX performance.
What Tools?
You'll want to know how to take a snapshot of your current and past performance.
You'll also need a way to test how changes and traffic growth will affect your future MDEX performance.
Tools
The tools we'll discuss are the MDEX request log, a reporting tool called Cheetah, and Eneperf, Endeca's load testing utility. We'll dive into these tools in a moment, but let's first talk about some important terminology we'll use throughout this presentation.
So how can we measure throughput and latency in the MDEX engine? How can we determine the amount of concurrent traffic the MDEX is capable of handling, and how can we tell if that throughput capacity has been exceeded, resulting in slowness for the end user? Well, the MDEX engine logs every query it processes in its request log, including the time the query came in, and how long it took to process.
What is it used for?
The request log has two very important uses.
First, it's a record of what happened in the past. If your MDEX ever experiences a problem processing queries, the request log will allow for a post mortem of what happened, or simply tell you how much traffic was received during a certain time period. By analyzing this request log, we can determine how much traffic the engine was able to handle each hour, as well as identify queries that resulted in long latency for the end user.
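A post-mortem like this is a straightforward parsing job. The exact column layout of the MDEX request log varies by version, so the column positions below (epoch-milliseconds timestamp in column 0, total request time in milliseconds in column 5) are assumptions to adjust for your install — a minimal Python sketch:

```python
import io
from collections import Counter

# Assumed layout: whitespace-delimited, column 0 = epoch-ms timestamp,
# column 5 = total request time in ms. Check your MDEX version's docs.
TS_COL, LATENCY_COL = 0, 5

def analyze(log_lines, slow_ms=500):
    per_hour = Counter()  # queries received per hour
    slow = []             # queries that exceeded the latency threshold
    for line in log_lines:
        cols = line.split()
        if len(cols) <= max(TS_COL, LATENCY_COL):
            continue      # skip malformed lines
        hour = int(cols[TS_COL]) // 3_600_000
        per_hour[hour] += 1
        latency = float(cols[LATENCY_COL])
        if latency >= slow_ms:
            slow.append((latency, line.rstrip()))
    return per_hour, sorted(slow, reverse=True)

# Two fabricated sample lines in the assumed layout:
sample = io.StringIO(
    "1440500000000 10.0.0.1 200 1234 12.0 35.0 /graph?node=0\n"
    "1440500001000 10.0.0.2 200 9876 400.0 810.0 /graph?node=0&Ntt=shoes\n"
)
hours, slow = analyze(sample)
print(len(slow))  # 1 slow query in the sample
```

The per-hour counts give you the throughput the engine actually handled, and the sorted slow list points you straight at the queries worth investigating — the same questions Cheetah answers with far less effort.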
Second, the request log is essential for running load tests, which are a fundamental component of your performance testing strategy. Using various load testing tools, including Endeca's own eneperf and mdexperf utilities, you can replay the request log against a live MDEX, effectively simulating real user traffic. You might choose to do this in a test environment after you make changes, to see if your changes have had an impact on MDEX performance. We'll get into how to set up this kind of test shortly.
Where can you locate them?
You can locate these logs by checking your project's app config definition if you're using the deployment template, or by checking your control script if you're on an older Endeca version. If you can't locate these logs, ask Endeca support for assistance, or ask your Endeca administrator.
Our third tool is Eneperf, a load-testing utility that ships with Endeca. As I mentioned earlier, Eneperf can be used to load-test an MDEX engine by "replaying" queries from a request log back against the MDEX.
The purpose of load testing is to stress your MDEX in a way that replicates real user activity, but doing so in an isolated environment, where production traffic won't be impacted by the test. Using a load testing tool like Eneperf, you can learn how your system will perform under specific conditions.
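Conceptually, a replay-style load test just fires the logged query URLs at the MDEX from N concurrent workers and measures throughput and latency. Eneperf does this natively; the Python sketch below only illustrates the idea, and `fake_send` is a stand-in you would replace with a real HTTP GET against the MDEX port:

```python
import concurrent.futures
import time

def replay(urls, send, concurrency=4):
    """Replay query URLs with a fixed worker pool and report basic stats.
    `send` takes a URL and returns the observed latency in seconds."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(send, urls))
    elapsed = time.perf_counter() - start
    return {
        "requests": len(urls),
        "throughput_qps": len(urls) / elapsed if elapsed else 0.0,
        "avg_latency_s": sum(latencies) / len(latencies),
    }

def fake_send(url):
    # Simulated request; swap in a real GET (e.g. urllib.request.urlopen).
    t0 = time.perf_counter()
    time.sleep(0.01)
    return time.perf_counter() - t0

stats = replay(["/graph?node=0"] * 20, fake_send)
print(stats["requests"])  # 20
```

Stepping the `concurrency` value up across test iterations, while watching latency, is one way to find the point where the engine's throughput capacity is exceeded.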
The keys to a successful load test include:
Developing a test plan that will stress your system in a way that reflects real-world traffic and user behavior, ensuring you have proper monitoring in place during each test iteration, and making sure your tests cover all functionality and technology used by your site. Let's talk a bit about how to use eneperf to run your load tests.