Alfresco Search Internals
This session first explains the index-related options available when developing a data model and how these choices affect indexing and searching. We will cover Alfresco FTS in detail and compare SQL-92 with CMIS QL. We will also consider sorting and other ways to control the results returned, and how query performance may be affected by ACL evaluation.

Presentation Transcript

  • Alfresco Search Internals
    0
    Andy Hind
    Senior Developer, Alfresco
    twitter: @andy_hind
  • Agenda
    1
    • Overview
    • Direction
    • Challenges
    • Alfresco FTS
    • CMIS Query Language
  • Overview
    2
    Data Modelling Options
    <property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
    </index>
    ....
    </property>
  • Overview
    3
    Data Modelling Options
    <property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
    </index>
    ....
    </property>
    Type drives analysis
  • Overview
    4
    Data Modelling Options
    <property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
    </index>
    ....
    </property>
    true
    false
  • Overview
    5
    Data Modelling Options
    <property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
    </index>
    ....
    </property>
    true
    false
    both
  • Overview
    6
    Data Modelling Options
    <property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
    </index>
    ....
    </property>
    true
    false
    (d:content)
  • Overview
    7
    Data Modelling Options
    <property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
    </index>
    ....
    </property>
    true
    false
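The three `tokenised` settings can be illustrated with a minimal sketch. The function name `index_forms` is hypothetical, and the lower-casing whitespace split is only a stand-in for whatever Lucene analyser the property's data type and locale select; how Alfresco actually lays these forms out in the index is not shown on the slides.

```python
def index_forms(value, tokenised):
    """Return the forms of a property value that would be searchable.

    tokenised="true"  -> analysed tokens only
    tokenised="false" -> the whole value as a single exact term
    tokenised="both"  -> both forms
    """
    if tokenised not in ("true", "false", "both"):
        raise ValueError("tokenised must be 'true', 'false' or 'both'")
    forms = {}
    if tokenised in ("true", "both"):
        # Stand-in analyser (assumption): lower-case and split on whitespace.
        forms["tokens"] = value.lower().split()
    if tokenised in ("false", "both"):
        # The untouched value supports exact matching and ordering.
        forms["exact"] = value
    return forms


forms = index_forms("Annual Report.pdf", "both")
```

With `both`, a term search can match `annual` while an exact match can still be made against the full name.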
  • Overview
    8
    Configuration
    • IndexerAndSearcher interface and related factory
    • Redirection by store protocol or value
    • Factories
    • AVM
    • DM
    • Unindexed
    • All lucene based with options set via properties
    • Analysis by data type and locale
    • alfresco/model/dataTypeAnalyzers_{locale}.properties
  • Overview
    9
    Configuration
    <bean id="indexerAndSearcherFactory" class="org.alfresco.repo.service.StoreRedirectorProxyFactory">
    <property name="proxyInterface">
    <value>org.alfresco.repo.search.impl.lucene.LuceneIndexerAndSearcher</value>
    </property>
    <property name="defaultBinding">
    <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
    </property>
    <property name="redirectedProtocolBindings">
    <map>
    <entry key="workspace">
    <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
    </entry>
    <entry key="avm">
    <ref bean="avmLuceneIndexerAndSearcherFactory"></ref>
    </entry>
    </map>
    </property>
    <property name="redirectedStoreBindings">
    <map>
    <entry key="workspace://lightWeightVersionStore">
    <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
    </entry>
    <entry key="workspace://version2Store">
    <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
    </entry>
    </map>
    </property>
    </bean>
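The redirection configured in this bean can be read as a lookup with three levels. The sketch below is an illustration only: the function `resolve_factory` is hypothetical, and the precedence (exact store first, then protocol, then the default binding) is an assumption about how the proxy resolves a store, not something the slide states. The bean names are copied from the configuration above.

```python
# Bindings copied from the Spring configuration on the slide.
STORE_BINDINGS = {
    "workspace://lightWeightVersionStore": "admLuceneUnIndexedIndexerAndSearcherFactory",
    "workspace://version2Store": "admLuceneUnIndexedIndexerAndSearcherFactory",
}
PROTOCOL_BINDINGS = {
    "workspace": "admLuceneIndexerAndSearcherFactory",
    "avm": "avmLuceneIndexerAndSearcherFactory",
}
DEFAULT_BINDING = "admLuceneIndexerAndSearcherFactory"


def resolve_factory(store_ref):
    """Pick the factory bean name for a store reference like 'workspace://SpacesStore'."""
    # Assumed precedence: a binding for the exact store wins.
    if store_ref in STORE_BINDINGS:
        return STORE_BINDINGS[store_ref]
    # Otherwise fall back to the protocol part, then to the default binding.
    protocol = store_ref.split("://", 1)[0]
    return PROTOCOL_BINDINGS.get(protocol, DEFAULT_BINDING)


chosen = resolve_factory("workspace://version2Store")
```

Under this reading, the two version stores are routed to the unindexed factory even though their protocol is `workspace`.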
  • Overview
    10
    Configuration properties
    lucene.maxAtomicTransformationTime=20
    lucene.query.maxClauses=10000
    lucene.indexer.cacheEnabled=true
    lucene.indexer.maxDocIdCacheSize=10000
    lucene.indexer.maxDocumentCacheSize=100
    lucene.indexer.maxParentCacheSize=10000
    lucene.indexer.maxIsCategoryCacheSize=-1
    lucene.indexer.maxLinkAspectCacheSize=10000
    lucene.indexer.maxPathCacheSize=10000
    lucene.indexer.maxTypeCacheSize=10000
  • Overview
    11
    Configuration properties
    lucene.indexer.mergerTargetIndexCount=5
    lucene.indexer.mergerTargetOverlayCount=5
    lucene.indexer.mergerTargetOverlaysBlockingFactor=1
    lucene.indexer.mergerMergeBlockingFactor=1
    lucene.indexer.maxDocsForInMemoryMerge=10000
    lucene.indexer.maxRamInMbForInMemoryMerge=16
    lucene.indexer.postSortDateTime=true
    lucene.indexer.defaultMLIndexAnalysisMode=EXACT_LANGUAGE_AND_ALL
    lucene.indexer.defaultMLSearchAnalysisMode=EXACT_LANGUAGE_AND_ALL
    lucene.indexer.maxFieldLength=10000
  • Overview
    12
    Authorization
    • Post query filter for READ
    • Configuration
    • system.acl.maxPermissionCheckTimeMillis=10000
    • system.acl.maxPermissionChecks=1000
    • Also set at query time
    • Read performance
    • 1.4 old model
    • 2.2/3.0 new
    • 3.4 improved read
    • system/admin/others
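A post-query READ filter bounded by a check count and a time budget might look roughly like the sketch below. The function `filter_readable` and the `can_read` callback are hypothetical; the two defaults mirror the `system.acl.maxPermissionChecks` and `system.acl.maxPermissionCheckTimeMillis` values on the slide, but exactly how Alfresco counts checks or measures time is an assumption here.

```python
import time


def filter_readable(results, can_read, max_checks=1000, max_time_ms=10000,
                    clock=time.monotonic):
    """Filter raw index hits down to those the user may read.

    Returns (readable, truncated). 'truncated' is True when either limit
    stopped the filter before every hit was evaluated.
    """
    started = clock()
    readable = []
    for checks_done, node in enumerate(results):
        elapsed_ms = (clock() - started) * 1000
        if checks_done >= max_checks or elapsed_ms >= max_time_ms:
            # Budget exhausted: return what has been confirmed so far.
            return readable, True
        if can_read(node):
            readable.append(node)
    return readable, False


# Ten hits, of which only even-numbered nodes are readable, with a budget of 4 checks.
hits, truncated = filter_readable(range(10), lambda n: n % 2 == 0, max_checks=4)
```

The sketch shows why a user who can read few of the matching nodes may see an incomplete result set: the budget is spent on hits that are then discarded.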
  • Overview
    13
    Query Support
    • Lucene based
    • Lucene with Alfresco extensions (PATH, ...)
    • Alfresco FTS
    • CMIS QL + extensions
    • DB based
    • XPath
    • Specific APIs – (using the child association table)
    • NodeService – selecting children
    • PersonService – lookup people
  • Overview
    14
    Issues
    • Factory abstraction
    • Transaction vs Snapshot
    • Query language abstraction
    • Repo reliance on the lucene index
    • Cross locale support
    • Rebuild
    • Cluster (loss of consistency)
    • Lucene limitations
    • Delete/add and reindexing
    • DB schema for properties
    • Read permission evaluation
    • One big store
    • Analyser configuration
    • Associations
    • Richer data model control - analysis
  • Direction
    15
    Query Language Abstraction
    • Alfresco FTS
    • CMIS QL
  • Direction
    16
    Query Language Abstraction
    • Alfresco FTS
    • CMIS QL
    SOLR
    DB/SQL
  • Direction
    17
    SOLR
    • Data model integration
    • Tracking – eventual consistency
    • Not suitable for RM
    • Query time ACL filtering
    • PATH support
    • SOLR scalability and elasticity
    • faceting etc
  • Alfresco FTS
    18
    Introduction
    • CMIS QL FTS (almost)
    • Google
    • Lucene
    • Developer/App Customisation
    • Define the default namespace (e.g. allow the user to drop cm:)
    • Disable/enable/modify certain language features
    • Define templates
    • Define the default field, simple templates for users
    • Share defines the “keywords” template as the default field
    • %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
  • Alfresco FTS
    19
    Syntax
    • Term (exact/tokenised)
    • Phrase
    • Conjunction/Disjunction/Negation/Boosting
    • Fields
    • Wildcards
    • Ranges
    • Fuzzy matching
    • Proximity
    • Templates
    • See http://wiki.alfresco.com/wiki/Full_Text_Search_Query_Syntax
  • Alfresco FTS
    20
    Template example
    %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
    =keywords:woof
    =cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof
  • Alfresco FTS
    21
    Template example – relevance tuning
    %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT^2)
    =keywords:woof
    =cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof^2
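The fan-out performed by a template, as shown on the two template slides above, can be sketched as follows. This is a simplified illustration, not the Alfresco FTS parser: `expand_template` is a hypothetical helper, it assumes the template lists its fields separated by spaces inside `%( ... )`, it carries an optional `^boost` through to the matching clause, and it does not reproduce the `=` prefix shown in the slide output.

```python
import re


def expand_template(template, term):
    """Expand '%(field field^boost ...)' into one 'field:term' clause per field."""
    match = re.fullmatch(r"%\((.*)\)", template.strip())
    if match is None:
        raise ValueError("expected a template of the form %(field field ...)")
    clauses = []
    for entry in match.group(1).split():
        # Split off an optional boost such as TEXT^2.
        field, _, boost = entry.partition("^")
        clause = f"{field}:{term}"
        if boost:
            clause += f"^{boost}"
        clauses.append(clause)
    return clauses


KEYWORDS = ("%(cm:name cm:title cm:description ia:whatEvent "
            "ia:descriptionEvent lnk:title lnk:description TEXT^2)")
clauses = expand_template(KEYWORDS, "woof")
```

Only the clause for the boosted field gains the `^2` suffix, which is how a single template entry can tune relevance towards content matches.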
  • CMIS QL
    22
    Introduction
    • Use via CMIS or the SearchService
    • Read-only relational view of the repository
    • Subset of SQL-92 with extensions
    • Type inheritance
    • Multi-valued properties
    • Full text search
    • CONTAINS()
    • SCORE()
    • Location
    • IN_FOLDER()
    • IN_TREE()
  • CMIS QL
    23
    Alfresco extensions
    • JOIN to aspects only
    • SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId
    • no JOIN between types/nodes yet
    • use Alfresco FTS instead of CMIS QL FTS
    • SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
    • relax some constraints
    • SCORE() can be used on its own
    • multi-valued properties (MVPs) can use single-valued property (SVP) syntax for IN, LIKE and comparisons
    • Queries more robust if the data model changes
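Embedding an Alfresco FTS expression inside CONTAINS() means quoting it as a CMIS QL string literal. The sketch below builds such a query string; the helper names are hypothetical, and the escaping (backslash before a single quote or a backslash) follows the CMIS QL string-literal convention as understood here, so check it against the repository in use.

```python
def cmis_string_literal(text):
    """Quote text as a CMIS QL string literal, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def contains_query(type_name, fts_expression):
    """Build a SELECT that passes an FTS expression to CONTAINS()."""
    return (f"SELECT * FROM {type_name} "
            f"WHERE CONTAINS({cmis_string_literal(fts_expression)})")


query = contains_query("cmis:document", "cmis:name:'test*'")
print(query)
```

The resulting string can be submitted through a CMIS binding or the SearchService, as noted on the CMIS QL introduction slide.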
  • Learn More
    24
    wiki.alfresco.com
    forums.alfresco.com
    twitter: @AlfrescoECM