Efficient content structures and queries in CRX/CQ
 

Presentation “Efficient content structures and queries in CRX/CQ” by Marcel Reutegger at CQCON2013 in Basel on 19 and 20 June 2013.

    Presentation Transcript

    • © 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
    • Efficient content structures and queries in CRX/CQ
      Marcel Reutegger | Senior Software Engineer
    • Agenda
      • Repository storage basics
      • Efficient content structures
      • Query analysis and optimization
    • Repository storage basics
      • Nodes & properties stored in one entity -> bundle
      • Every node/bundle has a UUID (random)
      • Child nodes are linked from the parent node
      • Binaries go into the DataStore
    • Repository storage basics: Bundle structure
      • UUID
      • Parent UUID
      • Properties: a list of Name / Value pairs
      • Child node references: a list of Name / UUID pairs
    • Repository storage basics – TarPM
      • Nodes & properties (bundles) stored in tar files
      • Tar files are append only
      • Data is never overwritten
      • Garbage is removed by TarPM optimization (scheduled, incremental)
    • Efficient content structures: Number of nodes
    • Number of nodes
      • Increasing number of nodes affects performance
      • Random UUIDs cause random I/O -> Jackrabbit design
        • 15k rpm drive: 200-400 IOPS
      • Tar index file sizes (64 bytes per bundle)
        • 1 million nodes: 70 MB
        • 10 million nodes: 700 MB
        • 100 million nodes: 7 GB
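The index sizes and the I/O concern above come down to simple multiplication. The following is a minimal sketch of that arithmetic, not a measurement: the 64 bytes per bundle and the 200-400 IOPS range come from the slide, while the function names, the use of decimal megabytes, and the "one disk seek per bundle, no cache hits" model are assumptions made for illustration. The raw products land slightly below the slide's 70 MB / 700 MB / 7 GB, which appear to be rounded up.

```python
BYTES_PER_INDEX_ENTRY = 64  # from the slide: 64 bytes per bundle


def tar_index_bytes(node_count: int) -> int:
    """Raw index size in bytes for a given number of nodes (bundles)."""
    return node_count * BYTES_PER_INDEX_ENTRY


def random_read_seconds(bundle_count: int, iops: int) -> float:
    """Assumed worst case: every bundle read costs one random disk I/O."""
    return bundle_count / iops


if __name__ == "__main__":
    for nodes in (1_000_000, 10_000_000, 100_000_000):
        size_mb = tar_index_bytes(nodes) / 1e6  # decimal megabytes
        slow = random_read_seconds(nodes, 200)
        fast = random_read_seconds(nodes, 400)
        print(f"{nodes:>11,} nodes: {size_mb:,.0f} MB index, "
              f"{fast:,.0f}-{slow:,.0f} s to read every bundle once from disk")
```

Under these assumptions, touching one million uncached bundles on a single 15k rpm drive already takes on the order of an hour, which is why the node count matters.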
    • Number of nodes: How to reduce number of nodes
      • Use version purge tool
      • Remove archived workflow instances
      • Purge audit events
      • Application specific
        • Bad: document view 'import' of XML
        • Good: Pack properties on few nodes
      • Other benefits: DataStore GC will be faster
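To illustrate the "bad" versus "good" application-specific case above, here is a small sketch that compares node counts for one made-up XML fragment. It assumes that a document view import creates at least one node per XML element, and it models a "packed" node as a plain dictionary of properties. The sample XML, the function names, and the dictionary model are hypothetical and are not part of any CRX or JCR API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
SAMPLE_XML = (
    "<address>"
    "<street>Main St 1</street>"
    "<city>Basel</city>"
    "<zip>4051</zip>"
    "</address>"
)


def count_elements(xml_text: str) -> int:
    """Lower bound on nodes created when every XML element becomes a node."""
    return sum(1 for _ in ET.fromstring(xml_text).iter())


def pack_leaves(xml_text: str) -> dict:
    """Fold leaf child elements into properties of a single node."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root if len(child) == 0}


if __name__ == "__main__":
    print("nodes when imported element by element:", count_elements(SAMPLE_XML))
    print("properties packed on one node:", pack_leaves(SAMPLE_XML))
```

In this toy case, four nodes shrink to one node with three properties; the saving grows with the size of the imported documents.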
    • Efficient content structures: Number of child nodes
    • Number of child nodes: Frequently asked questions
      • «What is the maximum supported number of child nodes?»
      • «I have X number of child nodes. Will performance be OK?»
      • It depends!
    • Number of child nodes: Maximum number of child nodes
      • (Diagram: bundle structure with UUID, Parent UUID, Properties as Name / Value pairs, and Child node references as Name / UUID pairs, annotated "Heap is the limit")
    • Number of child nodes: Adding a single child node
    • Number of child nodes: Large number of child nodes
      • OK for: Static content
        • /libs/wcm/core/i18n/de has ~8k child nodes
      • Not OK for: Dynamic content
        • E.g. user generated content
    • Number of child nodes - Recommendations
      • Structure content
        • E.g. date/time based: 2012/09/26
        • Use utilities like Jackrabbit BTreeManager
      • Keep number of child nodes within limits (e.g. 1000)
      • Save in batches when possible
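The date-based structure and the batching advice above can be sketched in a few lines. This is an illustration only: the base path `/content/usergenerated`, the node name, and both helper functions are hypothetical, and the sketch builds path strings and batches in plain Python rather than calling a repository API.

```python
from datetime import date
from itertools import islice


def dated_path(base: str, day: date, name: str) -> str:
    """Place a node under <base>/YYYY/MM/DD so no single parent grows unbounded."""
    return f"{base}/{day:%Y/%m/%d}/{name}"


def batches(items, size: int):
    """Yield lists of at most `size` items, e.g. to save once per batch."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Hypothetical base path and node name.
    print(dated_path("/content/usergenerated", date(2012, 9, 26), "comment-1"))
    for batch in batches(range(2500), 1000):
        # In a real application, a save would happen here, once per batch.
        print("would save", len(batch), "nodes")
```

With daily buckets, each day's folder only needs to stay within the chosen limit, instead of one flat parent collecting every item ever created.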
    • Query analysis & optimization
    • Query analysis and optimization
      • Query debug log
        • http://dev.day.com/kb/home/Crx/Troubleshooting/HowToDebugJCRQueries.html
        • “executed in <time> ms. (<query>)”
      • JMX (CQ 5.5)
        • QueryStat: slow and most frequent queries
        • TimeSeries: count, duration, average
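Where the JMX statistics are not available, the debug log message quoted above can be aggregated by hand. The sketch below assumes only the message fragment shown on the slide, "executed in <time> ms. (<query>)"; the "DEBUG" prefix and the sample lines are made up for illustration, and real log lines will carry a different prefix (timestamp, thread, logger name).

```python
import re

# Matches only the fragment quoted on the slide; the greedy group runs to the
# last closing parenthesis so queries containing parentheses stay intact.
PATTERN = re.compile(r"executed in (?P<ms>\d+) ms\. \((?P<query>.*)\)")


def summarize(lines):
    """Return {query: (count, total_ms)} for all lines that match."""
    stats = {}
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        count, total = stats.get(match.group("query"), (0, 0))
        stats[match.group("query")] = (count + 1, total + int(match.group("ms")))
    return stats


if __name__ == "__main__":
    # Hypothetical log lines, invented for this sketch.
    sample = [
        "DEBUG executed in 12 ms. (//element(*, nt:hierarchyNode))",
        "DEBUG executed in 950 ms. (//*[jcr:contains(., '*rabbit')])",
        "DEBUG executed in 8 ms. (//element(*, nt:hierarchyNode))",
    ]
    stats = summarize(sample)
    slowest = max(stats, key=lambda q: stats[q][1] / stats[q][0])
    most_frequent = max(stats, key=lambda q: stats[q][0])
    print("slowest on average:", slowest)
    print("most frequent:", most_frequent)
```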
    • Query analysis and optimization
      • Fast: simple comparison
        • sling:resourceType = 'my/type'
      • Fast: node type match
        • //element(*, nt:hierarchyNode)
      • Fast: simple fulltext search
        • jcr:contains(@jcr:title, 'crx')
      • Fast: like on few distinct values
        • jcr:like(@jcr:mimeType, '%/plain')
    • Query analysis and optimization
      • Slow: jcr:contains with initial wildcard
        • jcr:contains(., '*rabbit')
        • Alternative: don't do it, unless you know exactly what you are doing!
      • Slow: jcr:like on many distinct values
        • jcr:like(@email, '%@gmail.com')
        • Alternative: store the data you want to query in a separate property, then you can write: @email-host = 'gmail.com'
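The separate-property alternative above means computing the queryable value once, at write time. A minimal sketch, assuming node properties are modelled as a plain dictionary: the property names `email` and `email-host` follow the slide, while the helper functions and the choice to lower-case the host are assumptions made for this example.

```python
def email_host(address: str) -> str:
    """Return the part after the last '@', lower-cased (an assumption here)."""
    _local, at, host = address.rpartition("@")
    if not at or not host:
        raise ValueError(f"not an email address: {address!r}")
    return host.lower()


def with_email_host(props: dict) -> dict:
    """Copy the properties and add the derived 'email-host' property."""
    enriched = dict(props)
    enriched["email-host"] = email_host(props["email"])
    return enriched


if __name__ == "__main__":
    print(with_email_host({"email": "jane@GMail.com"}))
```

A query can then use the simple comparison @email-host = 'gmail.com' instead of a jcr:like with a leading wildcard.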
    • Query analysis and optimization
      • Slow: ranges matching many distinct values
        • @jcr:lastModified > xs:dateTime('2001-09-17T18:17:13.000+02:00')
        • Alternative: reduce resolution (e.g. only store date and not time)
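Reducing resolution shrinks the number of distinct values a range has to match. The sketch below truncates ISO 8601 timestamps to the calendar day; the function name and the sample timestamps are made up for illustration, and the date is taken in each timestamp's own UTC offset without converting to UTC, which is an assumption of this sketch.

```python
from datetime import datetime


def to_day_resolution(timestamp: str) -> str:
    """Keep only the date part of an ISO 8601 timestamp."""
    return datetime.fromisoformat(timestamp).date().isoformat()


if __name__ == "__main__":
    # Hypothetical modification times on the same day.
    stamps = [
        "2001-09-17T18:17:13.000+02:00",
        "2001-09-17T18:17:14.000+02:00",
        "2001-09-17T23:59:59.999+02:00",
    ]
    print("distinct full timestamps:", len(set(stamps)))
    print("distinct days:", len({to_day_resolution(s) for s in stamps}))
```

Storing the truncated value in its own property keeps the original timestamp available for display while the range query runs against far fewer distinct terms.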
    • Query analysis and optimization - Recommendations
      • Test with real content
      • Structure content to avoid queries
      • Denormalize
      • Avoid path constraints