Presentation for Spark::Red Insight Conference in Cambridge, MA on August 25, 2015. This deck summarizes tools, considerations, and common issues with Oracle Endeca Guided Search performance.
Endeca Performance Considerations
1. Endeca Performance and Scalability
Hard-won lessons from the field – Peter Curran, Founder Cirrus10
art by Liam Brazier, buy it here! liambrazier.com/Shop
2. Seattle HQ, distributed team
~50 resources (25 EE + subs)
All onshore labor
Endeca or Oracle partner since 2010
End-to-end implementations
Relevance tuning
Architecture & process analysis
Program roadmaps
Upgrades & migrations
Time & materials
Fixed fee with risk premium
Cost + bonus
Easy contracts
ROI guarantees
~70 Endeca customers
B2C and B2B
CMS Gurus
Marquee Presenter at OOW 2014
100% Referenceable
6. What do I need tools for?
• Why did it break?
• Will it break this year?
Tools
1. MDEX Request Logs
2. Request Log Analyzer (Cheetah)
3. MDEX Perf – Load Testing (Eneperf)
7. What is the request log?
• MDEX’s main log file – dumps every query to a log
• Includes query latency and time of day
Why is it useful?
• Parse it to see what the heck happened
• Replay or spoof it up to answer “what if”
Where do you find it?
• <working-dir>/logs/dgraphs/Dgraph1/
8. Cheetah is an MDEX Log analysis tool
Reports performance stats
Helps identify trends
Downloadable from Oracle
9. Eneperf is a load-testing utility
• Ships with Endeca
What is MDEX load testing?
• Send simulated user traffic against MDEX and site
• Learn how site performs under specific traffic conditions
Keys to a successful load test…
• Stress system in way that represents expected production usage
• Monitor performance during and after each test iteration
• Test all scenarios, functionality, and technology
10. • Avoid the default setNavAllRefinements / allgroups=1 if possible
• Exact, Phrase, and Proximity relevance ranking modules are expensive
• Response sizes > 500 KB
• Use record filters before text searches
• Avoid large flat dimensions
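The record-filter tip can be sketched with the standard Endeca URL parameters (`Ntk`/`Ntt` for the text search key and terms, `Nr` for a record filter). The property name `in_stock` below is hypothetical — a minimal Python sketch of building such a query string:

```python
from urllib.parse import urlencode

def build_query(search_key, terms, record_filter=None, nav=0):
    """Build an MDEX query string. Supplying a record filter (Nr)
    narrows the candidate set before the text search runs."""
    params = {"N": nav, "Ntk": search_key, "Ntt": terms}
    if record_filter:
        # e.g. restrict the search to in-stock records first
        params["Nr"] = record_filter
    return urlencode(params)

print(build_query("All", "running shoes", "AND(in_stock:1)"))
```

Because the filter prunes records before the (more expensive) text-match and relevance-ranking work, it usually pays to apply it whenever the business logic allows.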
11. • Wildcarding
• Interactions of a large thesaurus + spelling + stemming on large datasets
• Frequent partial updates
• Not enough physical RAM on the server
13. Is a hot dog a sandwich?
Is a pizza an open-faced sandwich?
Can an American city be truly great
w/o a signature sandwich?
• If so: Los Angeles? Is a taco a sandwich?
• New Orleans: Po’ Boy or Muffaletta?
• Which city should claim the hot dog?
• Correct answer: Chicago
14. Forge → Dgidx → Index Distribution
• Forge: join data sources and manipulate the data (Step 1)
• Dgidx: generate the index files (Step 2)
• Index Distribution: distribute the files across the Dgraphs (Step 3)
• Steps 1–3 together make up Total Index Time
15. Size of the index
• 1,000,000+ records
Type of records in index
• Catalog, Web Content, Social Content, Analytical Content
Features and functionality
• Store inventory, Store level pricing
• Compatibility (Fitment)
• Endeca Recommendations
Data Model
• Wide record vs. RRN
• Internationalization
• Type of joins
Data Manipulations
• Data cleanups – Java/Perl/XML manipulators
Components Usage
• Traditional Forge
• CAS (Multi-threaded)
16. De-normalized model
• Adds store inventory to the product record
• Joins happen at indexing
• PRO: Fast queries
• CON: Slow updates
• CON: More back-end code
Normalized model
• Inventory stored in a separate record from products
• Joins happen at query time
• PRO: Fast updates
• CON: Slower-ish queries
• CON: More front-end code
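The trade-off can be illustrated with hypothetical records (the field names below are made up for illustration). In the wide-record model the join is done once, during Forge; in the normalized model front-end code pays for the join on every query:

```python
# De-normalized (wide record): inventory joined into the product at index time.
wide_record = {
    "product_id": "P100",
    "name": "Trail Shoe",
    "inventory": {"store_01": 4, "store_02": 0},  # joined during Forge
}

# Normalized (RRN-style): inventory kept as separate records,
# joined to the product at query time.
product_record = {"product_id": "P100", "name": "Trail Shoe"}
inventory_records = [
    {"product_id": "P100", "store": "store_01", "qty": 4},
    {"product_id": "P100", "store": "store_02", "qty": 0},
]

def stores_with_stock(product, inventory):
    # Query-time join: the front end does this work on every request.
    return [r["store"] for r in inventory
            if r["product_id"] == product["product_id"] and r["qty"] > 0]

print(stores_with_stock(product_record, inventory_records))  # ['store_01']
```

Updating one store's quantity in the normalized model touches a single small record; in the wide-record model it means re-indexing the whole product — which is exactly the fast-updates vs. fast-queries trade shown on the slide.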
17. Use a real ETL tool if you can
Use a record cache when joining the data sources in the pipeline
CAS is multi-threaded, but it's not as flexible as traditional Forge
Beware Forge left joins
Dgidx is multi-threaded; configure the optimal number of threads to hasten this step
18. Use Dgidx flags carefully; specifying many pre-computed sorts can affect performance
If index distribution time is slow, consider rolling your own approach to compress the index before distributing it
20. 3 major sites live since 2003
Originally a bridged multi-MDEX
Large index due to fitment
Re-engineered for wide records
• <100ms MDEX response time
• 3 updates/wk, many hours each
• Tried partial updates but failed
So let's start talking about some of the tools you'll need to implement your performance testing plan. As I mentioned earlier, the focus of this presentation is on monitoring and testing MDEX performance.
What Tools?
You'll want to know how to take a snapshot of your current and past performance.
You'll also need a way to test how changes and traffic growth will affect your future MDEX performance.
Tools
The tools we'll discuss are the MDEX request log, a reporting tool called Cheetah, and Eneperf, Endeca's load testing utility. We'll dive into these tools in a moment, but let's first talk about some important terminology we'll use throughout this presentation.
So how can we measure throughput and latency in the MDEX engine? How can we determine the amount of concurrent traffic the MDEX is capable of handling, and how can we tell if that throughput capacity has been exceeded, resulting in slowness for the end user? Well, the MDEX engine logs every query it processes in its request log, including the time the query came in, and how long it took to process.
What is it used for?
The request log has two very important uses.
First, it's a record of what happened in the past. If your MDEX ever experiences a problem processing queries, the request log will allow for a post mortem of what happened, or simply tell you how much traffic was received during a certain time period. By analyzing this request log, we can determine how much traffic the engine was able to handle each hour, as well as identify queries that resulted in long latency for the end user.
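A post-mortem like this is a straightforward parsing job. The exact column layout of the MDEX request log varies by version, so the column positions below (epoch-milliseconds timestamp in column 0, total request time in milliseconds in column 5) are assumptions to adjust for your install — a minimal Python sketch:

```python
import io
from collections import Counter

# Assumed layout: whitespace-delimited, column 0 = epoch-ms timestamp,
# column 5 = total request time in ms. Check your MDEX version's docs.
TS_COL, LATENCY_COL = 0, 5

def analyze(log_lines, slow_ms=500):
    per_hour = Counter()  # queries received per hour
    slow = []             # queries that exceeded the latency threshold
    for line in log_lines:
        cols = line.split()
        if len(cols) <= max(TS_COL, LATENCY_COL):
            continue      # skip malformed lines
        hour = int(cols[TS_COL]) // 3_600_000
        per_hour[hour] += 1
        latency = float(cols[LATENCY_COL])
        if latency >= slow_ms:
            slow.append((latency, line.rstrip()))
    return per_hour, sorted(slow, reverse=True)

# Two fabricated sample lines in the assumed layout:
sample = io.StringIO(
    "1440500000000 10.0.0.1 200 1234 12.0 35.0 /graph?node=0\n"
    "1440500001000 10.0.0.2 200 9876 400.0 810.0 /graph?node=0&Ntt=shoes\n"
)
hours, slow = analyze(sample)
print(len(slow))  # 1 slow query in the sample
```

The per-hour counts give you the throughput the engine actually handled, and the sorted slow list points you straight at the queries worth investigating — the same questions Cheetah answers with far less effort.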
Second, the request log is essential for running load tests, which are a fundamental component of your performance testing strategy. Using various load testing tools, including Endeca's own eneperf and mdexperf utilities, you can replay the request log against a live MDEX, effectively simulating real user traffic. You might choose to do this in a test environment after you make changes, to see if your changes have had an impact on MDEX performance. We'll get into how to set up this kind of test shortly.
Where can you locate them?
You can locate these logs by checking your project's app config definition if you're using the deployment template, or by checking your control script if you're on an older Endeca version. If you can't locate these logs, ask Endeca support for assistance, or ask your Endeca administrator.
Our third tool is Eneperf, a load-testing utility that ships with Endeca. As I mentioned earlier, Eneperf can be used to load-test an MDEX engine by "replaying" queries from a request log back against the MDEX.
The purpose of load testing is to stress your MDEX in a way that replicates real user activity, but doing so in an isolated environment, where production traffic won't be impacted by the test. Using a load testing tool like Eneperf, you can learn how your system will perform under specific conditions.
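Conceptually, a replay-style load test just fires the logged query URLs at the MDEX from N concurrent workers and measures throughput and latency. Eneperf does this natively; the Python sketch below only illustrates the idea, and `fake_send` is a stand-in you would replace with a real HTTP GET against the MDEX port:

```python
import concurrent.futures
import time

def replay(urls, send, concurrency=4):
    """Replay query URLs with a fixed worker pool and report basic stats.
    `send` takes a URL and returns the observed latency in seconds."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(send, urls))
    elapsed = time.perf_counter() - start
    return {
        "requests": len(urls),
        "throughput_qps": len(urls) / elapsed if elapsed else 0.0,
        "avg_latency_s": sum(latencies) / len(latencies),
    }

def fake_send(url):
    # Simulated request; swap in a real GET (e.g. urllib.request.urlopen).
    t0 = time.perf_counter()
    time.sleep(0.01)
    return time.perf_counter() - t0

stats = replay(["/graph?node=0"] * 20, fake_send)
print(stats["requests"])  # 20
```

Stepping the `concurrency` value up across test iterations, while watching latency, is one way to find the point where the engine's throughput capacity is exceeded.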
The keys to a successful load test include:
Developing a test plan that will stress your system in a way that reflects real-world traffic and user behavior, ensuring you have proper monitoring in place during each test iteration, and making sure your tests cover all functionality and technology used by your site. Let's talk a bit about how to use eneperf to run your load tests.