Demystifying Oak Search

Presented by Justin Edelson & Darin Kuntze | Adobe

3
AGENDA
• Oak Query Implementation
• Cost Calculation
• Oak Index Implementations

4
CAVEAT
Covers Oak 1.0.5 (AEM 6.0 SP1)

5
WHY SHOULD YOU CARE?
• Search is the most significant change for AEM developers between CRX2 and Oak.

6
CRX2 Search – Limited Optimization Opportunities
Baseline Search Performance – OK
No “Plan” Output
Single Index
Minimal Configuration

7
Oak Search – Many Optimization Opportunities
Baseline Performance – Slow
Viewable Plan
Different Index Types

8
OAK QUERY IMPLEMENTATION OVERVIEW

9
EXAMPLE
/jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and
@jcr:title = 'Triangle']

10
SEEING THE PLAN
Oak supports an “explain” query prefix, similar to what many RDBMSs support.
explain /jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title']
The plan shows which index was used. Read it from the first result row:
queryResult.getRows().nextRow().getValue("plan")

11
SEEING THE PLAN – EXPLAIN QUERY TOOL
Plan
Explanation

12
INDEX DEFINITIONS
Stored in the repository as nodes under /oak:index
Node Type is oak:QueryIndexDefinition
Single mandatory property – “type”
Optional generic properties:
async – set to “async” to do index updates asynchronously
reindex – set to true to trigger a reindex
declaringNodeTypes – one or more node types to restrict indexing
entryCount – used to weight indexes
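As an illustration of the properties listed above, the following Python sketch models an index definition as a plain dictionary and checks the rules from this slide. The dictionary model and the function name validate_index_definition are hypothetical, not an Oak API. The propertyNames property does not appear on this slide and is included only as an assumption about what a property index definition carries.

```python
# Hypothetical in-memory model of a node under /oak:index; this is not an Oak API.
REQUIRED_NODE_TYPE = "oak:QueryIndexDefinition"


def validate_index_definition(props):
    """Return a list of problems for a dict of index definition properties."""
    problems = []
    if props.get("jcr:primaryType") != REQUIRED_NODE_TYPE:
        problems.append("node type must be " + REQUIRED_NODE_TYPE)
    if not props.get("type"):
        problems.append("mandatory property 'type' is missing")
    # Per the slide, asynchronous updates are requested with async = "async".
    if "async" in props and props["async"] != "async":
        problems.append("async should be set to 'async' when present")
    return problems


# Example definition, loosely modelled on /oak:index/slingResourceType.
sling_resource_type_index = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "property",
    "propertyNames": ["sling:resourceType"],  # assumption: not shown on this slide
    "reindex": True,  # set to true to trigger a reindex
}

print(validate_index_definition(sling_resource_type_index))
print(validate_index_definition({"jcr:primaryType": "oak:QueryIndexDefinition"}))
```

The first call reports no problems; the second reports the missing "type" property, the only mandatory one.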

13
SYNC VS. ASYNC INDEX
Sync indexes (the default) update in the context of a save() call
Async indexes do not.
Every 5 seconds, the diff between the last successful indexed state and the
HEAD state is read and used to update the index
CONSEQUENCE - async indexes may not return up-to-date results
The OOTB ordered and Lucene indexes are defined as async.
All external indexes (e.g. Solr) should also be async.
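The staleness consequence can be pictured with a toy model. This Python sketch is not Oak code: the class AsyncIndexModel and its methods are hypothetical, and it only models added or changed values. It shows a query answered from the last indexed state, which lags behind HEAD until the next indexing cycle runs.

```python
class AsyncIndexModel:
    """Toy model: the index only sees changes when an indexing cycle runs."""

    def __init__(self):
        self.head = {}      # path -> property value at the repository HEAD
        self.indexed = {}   # snapshot as of the last successful index run

    def save(self, path, value):
        # A save() changes HEAD but does not touch the async index.
        self.head[path] = value

    def run_index_cycle(self):
        # Diff the last indexed state against HEAD and apply it to the index.
        changed = {p: v for p, v in self.head.items() if self.indexed.get(p) != v}
        self.indexed.update(changed)
        return len(changed)

    def query(self, value):
        # Queries are answered from the indexed snapshot, not from HEAD.
        return sorted(p for p, v in self.indexed.items() if v == value)


repo = AsyncIndexModel()
repo.save("/content/a", "Triangle")
stale = repo.query("Triangle")   # the index has not caught up yet
repo.run_index_cycle()           # in Oak this runs on the 5-second schedule
fresh = repo.query("Triangle")
print(stale, fresh)
```

A sync index would behave as if run_index_cycle() were part of every save().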

15
VIEWING INDEX CONTENT
Many indexes store their content in the repository, but hidden.
Cannot be viewed using CRXDE Lite.
Must use oak-run
TarMK – use either “explore” (GUI) or “console” (CLI) command
MongoMK – use “console” command
• Vote for OAK-2096 to get “explore” support working for MongoMK

16
CREATING AN INDEX
Created as content via CRXDE Lite / deployed using content package
Created through code.
Created through configuration.

17
WHEN SHOULD YOU REINDEX?
When the configuration changes
For example, changing the declaringNodeTypes
But not the entryCount
(Sometimes) After updating Oak
Check the Release Notes, this should be prominently indicated.
But not arbitrarily…
Reindexing is a resource intensive process.
Reindexing will NOT help query performance.

18
COST CALCULATION
Each Index calculates a relative cost for the query
Number between 0 and Infinity
0 = “Pick me!”
Infinity = “Don’t Pick Me!”

19
DEBUGGING COST CALCULATION
Enable DEBUG logging on org.apache.jackrabbit.oak.query.QueryImpl
Per Index Type Cost
Enable DEBUG logging on
org.apache.jackrabbit.oak.plugins.index.property.PropertyIndex
Detailed Property Cost
Enable DEBUG logging on
org.apache.jackrabbit.oak.plugins.index.property.OrderedPropertyIndex
Detailed Ordered Property Cost
Enable DEBUG logging on org.apache.jackrabbit.oak.plugins.index.lucene
Detailed Lucene Cost

20
SAMPLE DEBUG OUTPUT
Query = /jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and @jcr:title
= 'Triangle']
cost for aggregate lucene is Infinity
cost for reference is Infinity
cost for ordered is Infinity
cost for nodeType is Infinity
property cost for sling:resourceType is 10003.0
property cost for jcr:title is Infinity
Cheapest property cost is 10003.0 for property sling:resourceType
cost for property is 10003.0
cost for traverse is 199996.0

21
SAMPLE DEBUG OUTPUT
Query = /jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and
@type='large']
cost for aggregate lucene is Infinity
cost for reference is Infinity
cost for ordered is Infinity
cost for nodeType is Infinity
property cost for sling:resourceType is 10003.0
property cost for type is 21.0
Cheapest property cost is 21.0 for property type
cost for property is 21.0
cost for traverse is 199996.0
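Reading the two samples together: the lowest reported cost wins, both among the properties inside the property index and among the index types. The Python sketch below reproduces that selection using the figures from the debug output. The function name cheapest_index is hypothetical, and treating the choice as a plain minimum is a simplification of what the query engine does.

```python
import math


def cheapest_index(costs):
    """Return (name, cost) for the lowest reported cost."""
    name = min(costs, key=costs.get)
    return name, costs[name]


# Costs copied from the first sample (jcr:title = 'Triangle').
title_query = {
    "aggregate lucene": math.inf,
    "reference": math.inf,
    "ordered": math.inf,
    "nodeType": math.inf,
    "property": 10003.0,
    "traverse": 199996.0,
}
# Second sample (@type = 'large'): only the property index cost differs.
type_query = dict(title_query, property=21.0)

# Per-property costs inside the property index for the second sample.
property_costs = {"sling:resourceType": 10003.0, "type": 21.0}

print(cheapest_index(title_query))
print(cheapest_index(type_query))
print(cheapest_index(property_costs))
```

In both samples the property index beats traversal; in the second, the index on type is far cheaper than the one on sling:resourceType, so it determines the property index cost.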

22
INDEX IMPLEMENTATIONS
You can create new indexes of these types:
Property
Ordered Property
Solr
Lucene
You should not create new indexes of these types:
Reference
Node Type
And then there is a special one
Traversing

23
PROPERTY INDEX
Stores node paths indexed by a particular property value
Example: /oak:index/slingResourceType
Can be unique (unique = true)
Examples: rep:principalName & jcr:uuid
Only usable with sync indexes

24
PROPERTY INDEX – IN OAK EXPLORER

25
PROPERTY INDEX – IN OAK EXPLORER
A Match!

27
PROPERTY INDEX – COST CALCULATION
Generalized Cost Calculation:
Cost per Execution + (Estimated Matches * Cost per Entry)
Cost per Execution – 2
Cost per Entry – 1

28
PROPERTY INDEX – ESTIMATING MATCHES
For name=value queries (e.g. [@sling:resourceType='foundation/components/text']), including lists
If entry count provided, the estimated match count is (entry count / key count) + number
of values in the query
• Key count defaults to entry count / 10000, but can be manually specified
Otherwise, counts up to 100 matches across the first three values.
If > 100 matches, estimated matches are 1.1 ^ (the average depth of matches)
If > 3 values, estimated matches are extrapolated from the first three values.
For exists queries (e.g. [@sling:resourceType])
If entry count provided, it is the estimated count.
Otherwise, counts up to 100 matches across all values.
If > 100 matches, estimated matches are 1.1 ^ (the average depth of matches)
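A worked example of the arithmetic on the last two slides, as a Python sketch. It reads the entry-count rule as producing the estimated number of matches, which is then fed into the generalized cost formula. The function names and the entryCount value of 1,000,000 are hypothetical.

```python
COST_PER_EXECUTION = 2
COST_PER_ENTRY = 1


def estimated_matches_from_entry_count(entry_count, values_in_query, key_count=None):
    """Estimate for name=value queries when entryCount is set on the index."""
    if key_count is None:
        key_count = entry_count / 10000  # default stated on the slide
    return entry_count / key_count + values_in_query


def property_index_cost(estimated_matches):
    """Cost per execution + (estimated matches * cost per entry)."""
    return COST_PER_EXECUTION + estimated_matches * COST_PER_ENTRY


# Hypothetical entryCount; with the default key count the ratio is 10000.
matches = estimated_matches_from_entry_count(entry_count=1_000_000, values_in_query=1)
print(matches, property_index_cost(matches))

# Without entryCount, matches are counted; 19 counted matches would cost 2 + 19.
print(property_index_cost(19))
```

The two results, 10003.0 and 21, equal the sling:resourceType and type costs in the sample debug output. That those logged values arose exactly this way is an inference from the numbers, not something the slides state.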

29
ORDERED INDEX
Stores node paths indexed by a particular property value
Has extra :next property on each value node to handle ordering
Example: /oak:index/cqLastModified
WARNING – only supports lexicographic sorting

30
ORDERED INDEX – IN OAK EXPLORER

31
ORDERED INDEX – COST CALCULATION
1 + (Estimated Matches * 1.3)
Similar to Property Index
Doesn’t support entryCount
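To see how the 1.3 multiplier plays out, this Python sketch compares the ordered index formula with the property index formula (2 + estimated matches * 1) for the same match estimate. The function names are hypothetical, and the like-for-like comparison is only arithmetic: the two index types may arrive at different estimates for the same query.

```python
def ordered_index_cost(estimated_matches):
    """1 + (estimated matches * 1.3), as given on the slide."""
    return 1 + estimated_matches * 1.3


def property_index_cost(estimated_matches):
    """2 + (estimated matches * 1), from the property index slide."""
    return 2 + estimated_matches * 1


# Print both costs side by side for a few match estimates.
for matches in (1, 10, 100):
    print(matches, ordered_index_cost(matches), property_index_cost(matches))
```

Under these formulas the ordered index is the cheaper of the two only for very small estimates (fewer than about 3.3 matches); above that, the per-entry multiplier outweighs its lower fixed cost.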

32
REFERENCE INDEX
Flat list of UUIDs.
Each node points to a path.
Cost is always 1 if a match is available

33
REFERENCE INDEX – IN OAK EXPLORER

34
NODE TYPE INDEX
Special type of Property Index
Note that not all node types are indexed by default
Has a default entryCount of a very high value

35
Oak Index Implementation: LUCENE

36
LUCENE
What Oak Lucene is (and is not)

37
FLOW
jcr:contains query detected
Repo-based Lucene index queried
Results returned

38
FULL TEXT QUERIES
//*[jcr:contains(., 'Experience Manager')]
Any query that includes a full text condition
Native queries

39
LUCENE DEFINITION
oak:QueryIndexDefinition
type = lucene
async = async
includePropertyTypes[] = String, Binary
excludePropertyNames[] = …
reindex = true

40
LUCENE
What can’t you do?

41
LUCENE
Customize the Tika configuration
Configurable analyzers (OAK-2177)
Synonyms
Boost Terms at index time (OAK-2178)

42
SOLR
Based on Lucene
Fault Tolerant
Rich Document Handlers
Geospatial Search
Load Balancing
AEM 6.0 Configurable:
Full Text Search
Indexing
Native Queries

43
SOLR CONFIGURATION
There are 4 configurable components
Oak Solr embedded server
Oak Solr indexing / search
Oak remote server
Oak Solr server provider

44
SOLR DEFINITION
oak:QueryIndexDefinition
type = solr
async = async
reindex = true

45
SOLR FULL TEXT QUERIES
//*[jcr:contains(., 'Experience Manager')]
Solr enables restrictions based on:
• Path
• Property
• Primary Type

46
FLOW
jcr:contains query detected
Remote Solr index queried
Results returned
• In oak-solr-core 1.0.1+ (AEM 6 SP1) you can add property, path & primary type restrictions to your query

47
SOLR TYPES
Types of Solr that Oak uses
Embedded Solr
Primarily used for development work. The Solr instance runs within AEM and can be configured similarly to the remote instance.
Remote Solr
Used for non-development environments. Typically these instances take advantage of the fault-tolerant features of Solr Cloud. In many cases, existing Solr instances are used.

48
SOLR CONCEPTUAL ARCHITECTURE
[Diagram components: AEM 6 Node 1, AEM 6 Node 2, Zookeeper, Solr Shard 1, Solr Shard 2, Solr Cloud]

49
LUCENE VS. SOLR
Main differences with the Lucene index
You create and control the Solr config
Analyzers
Schema
• You must have a schema.xml that accurately reflects the properties and fields you want
indexed (and queried), similar to how the property indexes are configured.
Currency
Language
Enabling additional Solr native functionality (example: mlt - more like this)
Some indexing overhead offloaded
All of this is configured on the Solr servers

50
NATIVE QUERIES
//*[rep:native('lucene', 'wine OR beer')]
Parts of the call: native function (rep:native), query type ('solr' or 'lucene'), query ('wine OR beer')
select [jcr:path] from [nt:base]
where native('solr', 'mlt?q=Wine&mlt.fl=text&mlt.mindf=1&mlt.mintf=1')
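The more-like-this statement above is a JCR-SQL2 query wrapping a Solr request string. As a sketch of how such a statement could be assembled, the Python helper below builds it from its parameters. The function name solr_mlt_statement and its argument names are hypothetical; how Oak hands the inner string to Solr is not covered here.

```python
from urllib.parse import urlencode


def solr_mlt_statement(term, field="text", min_doc_freq=1, min_term_freq=1):
    """Build a JCR-SQL2 statement that wraps a Solr more-like-this request."""
    params = urlencode({
        "q": term,
        "mlt.fl": field,
        "mlt.mindf": min_doc_freq,
        "mlt.mintf": min_term_freq,
    })
    # urlencode percent-encodes quotes, so the SQL2 string literal stays intact.
    return "select [jcr:path] from [nt:base] where native('solr', 'mlt?" + params + "')"


print(solr_mlt_statement("Wine"))
```

With the defaults, the result is the statement shown on the slide, written on a single line.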

51
JCR BASED SOLR QUERIES
• Oak index cost is factored
• Transparent to executor
• Familiar JCR query syntax
• Easy access to repository objects

52
SOLR TROUBLESHOOTING
AEM 6 (Solrj)

53
ONE MORE THING…
Oak 1.0.8
Lucene Property Indexes
Copy on Write for Lucene Indexes

54
AND ONE MORE THING…
XPath still works.

55
QUERY RESOURCES
ACS AEM Commons & ACS AEM Tools - http://adobe-consulting-services.github.io/
AEM Docs - http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html
Oak Docs - http://jackrabbit.apache.org/oak/docs/query/query.html

56
ADOBE EXPERIENCE MANAGER
Developer Resources
Training: http://bit.ly/AEMTraining
Documentation: http://bit.ly/AEM5Docs & http://bit.ly/AEM6Docs
GEMs Webinar Knowledge Exchange: www.adobe.com/go/gems
Mobile Dev: Get started with Adobe PhoneGap
https://github.com/blefebvre/aem-phonegap-kitchen-sink
https://github.com/blefebvre/aem-phonegap-starter-kit
Community
Meet with your peers online and in person, get technical help from the community, access community articles
• AEM Technologist Community: http://adobe.ly/Qe5BBw
• Evolve for AEM Technologists: http://bit.ly/EvolveDev
• AEM Help Forum: http://adobe.ly/OYdtY0
PackageShare: sign in to the Adobe Marketing Cloud to access packages
http://bit.ly/AMCPKGSHARE
Marketing Cloud Exchange: http://bit.ly/MCXChange
