
"A Study of I/O and Virtualization Performance with a Search Engine based on an XML database and Lucene"


Documentum xPlore provides an integrated search facility for the Documentum Content Server. The standalone search engine is based on EMC's xDB (a native XML database) and Lucene. In this talk we introduce xPlore and some of its key components and capabilities. These include aspects of the tight integration of Lucene with the XML database: XQuery translation and optimization into Lucene queries/APIs, as well as transactional updates to Lucene. In addition, xPlore is being deployed aggressively into virtualized environments (both disk and host virtualization). We cover some performance results and tuning tips in these areas.

Published in: Technology


  1. A Study of I/O and Virtualization Performance with a Search Engine based on an XML database and Lucene. Ed Bueché, EMC, May 25, 2011
  2. Agenda
     • My Background
     • Documentum xPlore Context and History
     • Overview of Documentum xPlore
     • Tips and Observations on I/O and Host Virtualization
  3. My Background
     • Ed Bueché, Information Intelligence Group within EMC
     • EMC Distinguished Engineer & xPlore Architect
     • Areas of expertise:
       – Content Management (especially performance & scalability)
       – Database (SQL and XML) and full text search
       – Previous experience: Sybase and Bell Labs
     • Part of the EMC Documentum xPlore development team: Pleasanton (CA), Grenoble (France), Shanghai, and Rotterdam (Netherlands)
  4. Documentum Search 101
     • Documentum Content Server provides an object/relational data model and query language
       – Object metadata called attributes (sample: title, subject, author)
       – Sub-types can be created with customer-defined attributes
       – Documentum Query Language (DQL)
       – Example: SELECT object_name FROM foo WHERE subject = 'bar' AND customer_id = 'ID1234'
     • DQL also supports full text extensions
       – Example: SELECT object_name FROM foo SEARCH DOCUMENT CONTAINS 'hello world' WHERE subject = 'bar' AND customer_id = 'ID1234'
  5. Introducing Documentum xPlore
     • Provides integrated search for Documentum, but is built as a standalone search engine to replace FAST InStream
     • Built over EMC xDB, Lucene, and leading content extraction and linguistic analysis software
  6. Documentum Search History at-a-glance: almost 15 years of structured/unstructured integrated search
     • Verity Integration, 1996–2005
       – Basic full text search through DQL; basic attribute search
       – 1 day → 1 hour latency; embedded implementation
     • FAST Integration, 2005–2011
       – Combined structured/unstructured search
       – 2–5 min latency; score-ordered results
     • xPlore Integration, 2010–???
       – Replaces FAST in DCTM; integrated security
       – Deep facet computation; HA/DR improvements
       – Latency: typically seconds; improved administration; virtualization support
  7. Enhancing Documentum Deployments with Search (diagram: a DCTM client issues DQL to the Content Server, which translates it into SQL against the RDBMS)
  8. Enhancing Documentum Deployments with Search (diagram: the Documentum client issues DQL to the Content Server, which now issues SQL to the RDBMS and XQuery to xPlore, which indexes metadata + content)
  9. Some Basic Design Concepts behind Documentum xPlore
     • Inverted indexes are not optimized for all use-cases
       – B+-tree indexes can be far more efficient for simple, low-latency/highly dynamic scenarios
     • De-normalization can't efficiently solve all problems
       – The update propagation problem can be deadly
       – Joins are a necessary part of most applications
     • Applications need fine control over not only search criteria, but also result sets
  10. Design Concepts (cont'd)
     • Applications need fluid, changing metadata schemas that can be efficiently queried
       – Adding metadata through joins with side-tables can be inefficient to query
     • Users want the power of Information Retrieval on their structured queries
     • Data management, HA, and DR shouldn't be an after-thought
     • When possible, operate within standards
     • Lucene is not a database; most Lucene applications deploy with databases
  11. Lessons Learned… Fit to Use-Case (chart: fit-to-use-case plotted across a spectrum from structured query use-cases to unstructured query use-cases)
  12. Indexes, DB, and IR (chart: relational DB technology fits structured query use-cases but falls off for full text searches, hierarchical data representations (XML), constantly changing schemas, and scoring/relevance/entities)
  13. Indexes, DB, and IR (chart: full text index technology fits unstructured query use-cases but falls off for metadata queries, JOINs, advanced data management such as partitions, and transactions)
  14. Indexes, DB, and IR (chart: relational DB technology and full text index technology each cover one end of the structured-to-unstructured spectrum)
  15. Documentum xPlore
     • Brings a best-of-breed XML database together with the powerful Apache Lucene full-text engine
     • Provides structured and unstructured search leveraging the XML and XQuery standards
     • Designed for enterprise readiness, scalability, and ingestion
     • Advanced data management functionality necessary for large-scale systems
     • Industry-leading linguistic technology and comprehensive format filters
     • Metrics and analytics
     (architecture: the xPlore API sits over Indexing Services, Search Services, Content Processing Services, Node & Data Management Services, Analytics, and Admin Services; below them the xDB API, xDB query processing & optimization, and xDB transaction, index & page management)
  16. EMC xDB: Native XML Database
     • Formerly the X-Hive database
       – 100% Java
       – XML stored in persistent DOM format: each XML node can be located through a 64-bit identifier; structure is mapped to pages; easy to operate on GB-sized XML files
       – Full transactional database
       – Query language: XQuery with full text extensions
     • Indexing & optimization
       – Palette of index options the optimizer can pick from
       – At its simplest: indexLookup(key) → node id
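The "indexLookup(key) → node id" shortcut above can be sketched as a tiny value index. This is an illustrative toy, not xDB's actual API: the class name, keys, and node ids are invented, but it shows the idea of a sorted (B-tree-like) index resolving an element value directly to a 64-bit node identifier.

```python
import bisect

class ValueIndex:
    """Toy sorted value index: element value -> 64-bit node identifier."""

    def __init__(self, entries):
        # entries: (key, node_id) pairs, kept sorted for B-tree-like lookup
        self._entries = sorted(entries)
        self._keys = [k for k, _ in self._entries]

    def index_lookup(self, key):
        # Binary search, standing in for a B-tree descent
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._entries[i][1]
        return None

idx = ValueIndex([("invoice-17", 0x1A2B3C), ("invoice-42", 0x4D5E6F)])
print(hex(idx.index_lookup("invoice-42")))  # -> 0x4d5e6f
```

The point of the slide is that when a query is this simple, the optimizer can answer it from such an index and skip the inverted index entirely.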
  17. Libraries / Collections & Indexes (diagram: a library/collection with its indexes maps to an xDB segment)
  18. Lucene Integration
     • Transactional
       – Non-committed index updates live in separate (typically in-memory) Lucene indexes
       – Recently committed (but dirty) indexes are backed by the xDB log
       – Queries leverage a Lucene multi-searcher with a filter to apply update/delete blacklisting
     • Lucene indexes are managed to fit into xDB's ARIES-based recovery mechanism
     • No changes to Lucene
       – Goal: no obstacles to staying as current as possible
  19. Lucene Integration (cont'd)
     • Both value and full text queries supported
       – XML elements mapped to Lucene fields
       – Tokenized and value-based fields available
     • Composite key queries supported
       – Lucene is much more flexible than traditional B-tree composite indexes
     • ACL and facet information stored in a Lucene field array
       – Documentum's ACL security model is highly complex and potentially dynamic
       – Enables secure facet computation
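The transactional query path from slide 18 can be sketched in miniature. This is a hedged illustration, not Lucene's real API: indexes are plain dicts of doc-id → term set, and the "multi-searcher plus blacklist filter" is a loop. It shows how results from the committed index and the in-memory, not-yet-committed index are merged while updated/deleted documents are filtered out.

```python
def search(term, committed, in_memory, blacklist):
    """Toy multi-searcher: query both indexes, drop blacklisted doc-ids."""
    hits = []
    for index in (committed, in_memory):        # multi-searcher over both
        for doc_id, terms in index.items():
            if term in terms and doc_id not in blacklist:
                hits.append(doc_id)
    return sorted(set(hits))

committed = {1: {"lucene", "xml"}, 2: {"xquery"}}
in_memory = {3: {"lucene"}}    # uncommitted update, re-indexed as doc 3
blacklist = {1}                # old version of the updated doc is masked

print(search("lucene", committed, in_memory, blacklist))  # -> [3]
```

The blacklist is what makes uncommitted updates invisible as stale hits: the old committed version of a changed document is screened out while its fresh in-memory version is already searchable within the transaction.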
  20. xPlore has Lucene search engine capabilities plus…
     • XQuery provides a powerful query & data manipulation language
       – A typical search engine can't even express a join
       – Creation of arbitrary structure for result sets
       – Ability to call language-based functions or Java-based methods
     • Ability to use B-tree based indexes when needed (the xDB optimizer decides this)
     • Transactional update and recovery of data/indexes
     • Hierarchical data modeling capability
  21. Tips and Observations on I/O and Host Virtualization
     • Virtualization offers huge savings for companies through consolidation and automation
     • Both disk and host virtualization are available
     • However, there are pitfalls to avoid:
       – One-size-fits-all
       – Consolidation contention
       – Availability of resources
  22. Tip #1: Don't assume that one size fits all
     • Most IT shops create VM or SAN templates with a fixed resource allocation
       – Reduces admin costs
       – Example: two-CPU VM with 2 GB of memory
       – Deviations from this must be made via a special request
     • Recommendations:
       – Size correctly; don't accept insufficient resources
       – Test pre-production environments
  23. The same concept applies for disk virtualization
     • The capacity of a disk volume is typically expressed in two metrics: space and I/O capacity
       – Space defined in GBytes
       – I/O capacity defined in I/Os per second
     • NAS and SAN are forms of disk virtualization
       – The space associated with a SAN volume (for example) can be striped over multiple disks
       – The more disks allocated, the higher the I/O capacity
     (diagram: three 50 GB volumes with 100, 200, and 400 I/Os-per-sec capacity)
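The diagram's arithmetic is worth making explicit: striping the same 50 GB over more spindles keeps the space constant while multiplying I/O capacity. The per-disk figure of ~100 random I/Os per second is an assumption chosen to reproduce the slide's numbers, not a measurement from the talk.

```python
def volume_capacity(disks, iops_per_disk=100, space_gb=50):
    """Space stays fixed; I/O capacity scales with the number of spindles.
    iops_per_disk=100 is an assumed illustrative figure."""
    return {"space_gb": space_gb, "iops": disks * iops_per_disk}

for n in (1, 2, 4):
    print(volume_capacity(n))
# Matches the diagram: 50 GB at 100, 200, and 400 I/Os per sec
```

This is why two volumes of identical size can behave very differently under a random-I/O workload like a Lucene search: what matters is how many drives the space is spread across.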
  24. Linear mappings and LUNs
     • When a logical volume with a linear mapping is placed directly on physical disks (diagram: four LUNs, with the space allocated for the index and the free space in the volume landing on different LUNs), I/O can be concentrated onto fewer drives than desired
     • High-end SANs like Symmetrix can handle this situation with virtual LUNs
  25. EMC Symmetrix: Nondisruptive Mobility (diagram: Virtual LUN VP Mobility across virtual pools of Flash 400 GB RAID 5, Fibre Channel 600 GB 15K RAID 1, and SATA 2 TB RAID 6)
     • Fast, efficient mobility
     • Maintains replication and quality of service during relocations
     • Supports up to thousands of concurrent VP LUN migrations
     • Recommendation: work with storage technicians to ensure backend storage has sufficient I/O
  26. Tip #2: Consolidation Contention
     • Virtualization provides benefit from consolidation
     • Consolidation provides resources to the active
       – Your resources can be consumed by other VMs, other apps
       – Physical resources can be over-stretched
     • Recommendations:
       – Track actual capacity vs. planned
         VMware: track the number of times your VM is denied CPU
         SANs: track % I/O utilization vs. the number of I/Os
       – For VMware, leverage guaranteed minimum resource allocations and/or allocate to non-overloaded hardware
  27. Some VMware statistics
     • Ready metric
       – Generated by vCenter; represents the time (across all vCPUs) during which the VM was denied CPU
       – Reported in milliseconds; real-time samples arrive at best every 20 secs
       – For interactive apps, more than 10% of offered capacity is considered worrisome
     • Pages-in, pages-out
       – Can indicate over-subscription of memory
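Turning a raw Ready sample into the percentage the slide refers to is a one-line calculation: ready milliseconds divided by the CPU time offered in the interval (interval length times the number of vCPUs). The 2,200 ms sample value below is illustrative, picked to match the magnitude of the samples shown two slides later.

```python
def ready_pct(ready_ms, interval_s=20, vcpus=1):
    """Percent of offered CPU capacity during which the VM was denied CPU."""
    offered_ms = interval_s * 1000 * vcpus
    return 100.0 * ready_ms / offered_ms

# A 2200 ms ready sample in one 20-sec interval on a 1-vCPU VM:
pct = ready_pct(2200)
print(f"{pct:.0f}% ready")  # 11% - above the ~10% worry threshold
```

Note that on a multi-vCPU VM the denominator grows, so the same millisecond count corresponds to a smaller percentage of offered capacity.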
  28. Sample %Ready for a production VM with an xPlore deployment over an entire week (chart: %Ready ranging 0–16%, with excursions above the ~10% line that officially indicates pain; in this case avg resp time doubled and max resp time grew by 5x)
  29. Actual Ready samples during a several-hour period (chart: Ready samples, i.e. the number of milliseconds the VM was denied CPU in 20-sec intervals, ranging roughly 0–2500 ms)
  30. Some subtleties with interactive CPU denial
     • The Ready metric represents denial upon demand
       – Interactive workloads can be bursty
       – If there is no demand, the Ready counter will be low
     • Poor user response encourages less usage
       – Like walking on a broken leg
       – Causing fewer Ready samples (diagram: a denial spike within a single 20-sec interval)
  31. Sharing I/O capacity
     • If multiple VMs (or servers) share the same underlying physical volumes and the capacity is not managed properly, the available I/O capacity of a volume can be less than its theoretical capacity
     • This can be seen when OS tools show the disk is very busy (high utilization) while the number of I/Os is lower than expected
     (diagram: a volume for another application and a volume for the Lucene application spread over the same set of drives, effectively sharing the I/O capacity)
  32. Recommendations on diagnosing disk I/O related issues
     • On Linux/UNIX
       – Have the IT group install sar and iostat, and also a disk I/O testing tool (like Bonnie)
       – Compare Bonnie output with sar & iostat data: high disk utilization at much lower achieved rates could indicate contention from other applications; also, high sar I/O wait time might be an indication of slow disks
     • On Windows
       – Leverage the Windows Performance Monitor (objects: Processor, Physical Disk, Memory)
  33. Sample output from the Bonnie tool¹
     bonnie -s 1024 -y -u -o_direct -v 10 -p 10
     • -s 1024: increases the size of each file to 2 GB
     • -o_direct: direct I/O (by-passing the buffer cache) will be done
     • -v 10: 10 different 2 GB files will be created
     • -p 10: 10 different threads will query those files
     Examine the output and focus on the random I/O area. For machine Mach2 (10*2024 MB): sequential output ran at 73928 K/sec (97% CPU) per-char and 104142 K/sec (5.3% CPU) per-block; the random seek test saw 735.7 random I/Os per sec at 15.2% CPU.
     ¹ Bonnie is an open source disk I/O driver tool for Linux that can be useful for pretesting Linux disk environments prior to an xPlore/Lucene install.
  34. Linux indicators compared to Bonnie output
     Notice that at 200+ I/Os per sec the underlying volume is ~80% busy. Although there could be multiple causes, one could be that some other VM is consuming the remaining I/O capacity (735 – 209 = 500+).
     iostat output:
       Device:  tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
       sde      206.10  2402.40    0.80       24024    8
     sar -d output:
       09:29:17  DEV      tps     rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
       09:29:27  dev8-65  209.24  4877.97   1.62      23.32     1.62      7.75   3.80   79.59
     sar -u output (note the high I/O wait):
       09:29:17 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
       09:29:27 PM  all  41.37  0.00   5.56     29.86    0.00    23.21
       09:29:27 PM  0    62.44  0.00   10.56    25.38    0.00    1.62
       09:29:27 PM  1    30.90  0.00   4.26     35.56    0.00    29.28
       09:29:27 PM  2    36.35  0.00   3.96     30.76    0.00    28.93
       09:29:27 PM  3    35.77  0.00   3.46     27.64    0.00    33.13
     See for additional example
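The contention inference on this slide is simple enough to write down: the bonnie pretest measured the volume's random-I/O capacity, sar/iostat show what the application is actually achieving, and the gap at high utilization points at other consumers of the same spindles. The 70%/50% thresholds in the check below are illustrative assumptions, not values from the talk.

```python
measured_capacity_iops = 735   # random I/Os per sec, from the bonnie pretest
observed_tps = 209             # achieved I/Os per sec, from sar -d / iostat
observed_util_pct = 79.59      # %util, from sar -d

# Capacity the application is not getting, per the slide: 735 - 209 = 500+
missing_iops = measured_capacity_iops - observed_tps

# Heuristic: device nearly saturated while achieving well under its
# measured capacity suggests another VM/app shares the spindles.
contended = observed_util_pct > 70 and observed_tps < 0.5 * measured_capacity_iops
if contended:
    print(f"likely contention: ~{missing_iops} I/O/s unaccounted for")
```

If instead the device showed ~80% utilization while delivering close to 735 I/Os per sec, the volume would simply be busy with your own workload, which is a capacity-planning problem rather than a contention problem.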
  35. Tip #3: Try to ensure availability of resources
     • Similar to the previous issue, but here the resource displacement is not caused by overload: inactivity can cause Lucene resources to be displaced
       – Not different from running on a large shared native OS host
     • Recommendation: periodic, non-intrusive warmup (see next example)
  36. I/O / caching test use-case
     • Unselective term search
       – 100 sample queries
       – Avg(hits per term) = 4,300+, max ~60,000
       – Searching over 100s of DCTM object attributes + content
     • Medium result window
       – Avg(results returned per query) = 350 (max: 800)
     • Stored fields utilized
       – Some security & facet info
     • Goal: pre-cache portions of the index to improve response time in scenarios such as reboot, buffer cache contention, & VM memory contention
  37. Some xPlore structures for search¹ (diagram: dictionary of terms; posting list (doc-ids for a term); stored fields (facets and node-ids) for the 1st through N-th doc; the xDB XML store, which contains the text for summaries; security indexes; and the facet decompression map (B-tree based)) ¹ Frequency and position structures ignored for simplicity
  38. I/O model for search in xPlore (diagram: search terms, e.g. term1 and term2, go through the dictionary to the posting lists (doc-ids per term); the stored fields yield the xDB node-id plus facet/security info; the security lookup and the B-tree-based facet decompression map are consulted; the xDB XML store supplies text for the summary; results flow into the result set)
  39. Separation of covering values in stored fields and summary (diagram: potentially thousands of hits flow through the facet calculation and security lookup, both small structures, with the final facet values computed over thousands of results; only the small number of results in the result window, Res-1 through Res-350, needs random access to the stored fields and to the xDB docs holding the summary text)
  40. xPlore memory pool areas at-a-glance (diagram: other VM working memory; xPlore instance memory (fixed size), containing the xPlore caches & working memory and the xDB buffer cache; native-code content extraction & linguistic processing working memory; and the dynamically sized Operating System file buffer cache)
  41. Lucene data resides primarily in the OS buffer cache (diagram: the dictionary of terms, posting lists (doc-ids for a term), stored fields (facets and node-ids), and the xDB XML doc store with summary text all sit in the dynamically sized OS file buffer cache, outside the fixed-size xPlore instance memory, so many things can potentially sweep Lucene data from that cache)
  42. Test environment
     • 32 GB memory
     • Direct attached storage (no SAN)
     • 1.4 million documents
     • Lucene index size = 10 GB
     • Size of the internal parts of the Lucene CFS file:
       – Stored fields (fdt, fdx): 230 MB (2% of index)
       – Term dictionary (tis, tii): 537 MB (5% of index)
       – Positions (prx): 8.78 GB (80% of index)
       – Frequencies (frq): 1.4 GB (13% of index)
     • Text in xDB stored compressed separately
  43. Some results of the query suite

     Test                  Avg resp,     MB pre-loaded   I/O per   Total MB to consume
                           cached (sec)  into memory     result    all results (cached + test)
     Nothing cached        1.89          0               0.89      77
     Stored fields cached  0.95          241             0.38      272
     Term dict cached      1.73          537             0.79      604
     Positions cached      1.58          8,789           0.74      8,800
     Frequencies cached    1.65          1,406           0.63      1,436
     Entire index cached   0.59          10,970          < 0.05    10,970

     • Linux buffer cache cleared completely before each run
     • Resp as seen by the final user in Documentum
     • Facets not computed in this example; just a result set returned. With facets the response time difference is more pronounced.
     • Mileage will vary depending on a series of factors that include query complexity, composition of the index, and number of results consumed
  44. Other notes
     • Caching 2% of the index yields a response time only 60% greater than if the entire index were cached
       – Caching cost only 9 secs on a mirrored drive pair
       – Caching cost 6,800 large sequential I/Os vs. potentially 58,000 random I/Os
     • Mileage will vary; factors include phrase search, wildcard search, and multi-term search
     • SANs can grow I/O capacity as search complexity increases
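The warmup arithmetic above is worth spelling out. The I/O counts and the 9-second figure come from the slide; applying the ~735 random I/Os per sec measured by bonnie earlier is an assumption (it presumes a volume of the same class), used only to show the scale of the difference between sequential pre-loading and faulting the same pages in on demand.

```python
seq_ios, seq_secs = 6_800, 9      # large sequential I/Os; ~9 s, mirrored pair
random_ios = 58_000               # potential random I/Os if faulted on demand
random_iops = 735                 # assumed rate, from the earlier bonnie test

on_demand_secs = random_ios / random_iops
print(f"warmup: ~{seq_secs} s sequential vs ~{on_demand_secs:.0f} s on demand")
# roughly 9 s of sequential pre-loading vs ~79 s of random faulting
```

Sequential pre-loading wins because large sequential reads move far more data per I/O than the small random reads a cold search workload generates.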
  45. Contact: Ed Bueché