Alfresco Search Internals

This session will first explain the index related options that are available when developing a data model and how these choices affect indexing and searching. We will cover Alfresco FTS in detail, and compare SQL 92 with the CMIS QL. We will also consider sorting and other ways to control the results returned, and how query performance may be affected by ACL evaluation.

© All Rights Reserved

    Presentation Transcript

    • Alfresco Search Internals
      0
      Andy Hind
      Senior Developer, Alfresco
      twitter: @andy_hind
    • Agenda
      1
      • Overview
      • Direction
      • Challenges
      • Alfresco FTS
      • CMIS Query Language
    • Overview
      2
      Data Modelling Options
      <property name="cmis:name">
        ...
        <type>d:text</type>
        ...
        <index enabled="true">
          <tokenised>both</tokenised>
          <atomic>true</atomic>
          <stored>false</stored>
        </index>
        ...
      </property>
    • Overview
      3
      Data Modelling Options
      • <type>d:text</type>: type drives analysis
    • Overview
      4
      Data Modelling Options
      • <index enabled>: true | false
    • Overview
      5
      Data Modelling Options
      • <tokenised>: true | false | both
    • Overview
      6
      Data Modelling Options
      • <atomic>: true | false (d:content)
    • Overview
      7
      Data Modelling Options
      • <stored>: true | false
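As a rough illustration of what the `tokenised` option controls, the sketch below derives the index terms a `d:text` value would contribute under each setting. It assumes a simple lower-casing tokeniser that splits on non-alphanumerics. This is a stand-in for the data-type and locale-specific analyser Alfresco actually configures. `Tokenised` and `IndexTermsSketch` are hypothetical names, not part of Alfresco.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of the three <tokenised> settings (not Alfresco's indexer). */
enum Tokenised { TRUE, FALSE, BOTH }

final class IndexTermsSketch {

    /** Stand-in analyser (assumption): lower-case, split on anything that is not a letter or digit. */
    static List<String> tokenise(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {          // a leading separator yields an empty first element
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Terms a property value would contribute to the index under each setting. */
    static List<String> termsFor(String value, Tokenised mode) {
        List<String> terms = new ArrayList<>();
        if (mode == Tokenised.TRUE || mode == Tokenised.BOTH) {
            terms.addAll(tokenise(value));   // analysed, word-level terms
        }
        if (mode == Tokenised.FALSE || mode == Tokenised.BOTH) {
            terms.add(value);                // the whole value kept as one untokenised term
        }
        return terms;
    }
}
```

For `"Report 2010.pdf"` with `BOTH`, the sketch yields the word terms `report`, `2010`, `pdf` plus the untouched value. With `FALSE`, only the untouched value is produced.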
    • Overview
      8
      Configuration
      • IndexerAndSearcher interface and related factory
      • Redirection by store protocol or value
      • Factories
      • AVM
      • DM
      • Unindexed
      • All Lucene-based, with options set via properties
      • Analysis by data type and locale
      • alfresco/model/dataTypeAnalyzers_{locale}.properties
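The `{locale}` placeholder suggests the familiar Java pattern of locale-specific property files that fall back to a default file. The sketch below lists the candidate file names such a lookup would try, most specific first. It uses the JDK's own `ResourceBundle.Control` candidate order. Treat that order as an assumption: the slide does not state Alfresco's exact search order, and `AnalyserConfigLookup` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

final class AnalyserConfigLookup {

    // Standard JDK control for .properties bundles; used here only for its fallback order
    private static final ResourceBundle.Control CONTROL =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);

    /** Candidate .properties resources for a locale, most specific first, ending with the default file. */
    static List<String> candidates(String baseName, Locale locale) {
        List<String> names = new ArrayList<>();
        for (Locale candidate : CONTROL.getCandidateLocales(baseName, locale)) {
            // toBundleName appends _language_country_variant; Locale.ROOT yields the bare base name
            names.add(CONTROL.toBundleName(baseName, candidate) + ".properties");
        }
        return names;
    }
}
```

For `Locale.FRANCE` this gives `dataTypeAnalyzers_fr_FR.properties`, then `dataTypeAnalyzers_fr.properties`, then `dataTypeAnalyzers.properties`.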
    • Overview
      9
      Configuration
      <bean id="indexerAndSearcherFactory" class="org.alfresco.repo.service.StoreRedirectorProxyFactory">
        <property name="proxyInterface">
          <value>org.alfresco.repo.search.impl.lucene.LuceneIndexerAndSearcher</value>
        </property>
        <property name="defaultBinding">
          <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
        </property>
        <property name="redirectedProtocolBindings">
          <map>
            <entry key="workspace">
              <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
            </entry>
            <entry key="avm">
              <ref bean="avmLuceneIndexerAndSearcherFactory"></ref>
            </entry>
          </map>
        </property>
        <property name="redirectedStoreBindings">
          <map>
            <entry key="workspace://lightWeightVersionStore">
              <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
            </entry>
            <entry key="workspace://version2Store">
              <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
            </entry>
          </map>
        </property>
      </bean>
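The redirector bean above maps each store to a factory. The sketch below models that lookup under one assumption that is consistent with the configuration shown but not confirmed by the slides. An exact store binding such as `workspace://version2Store` wins over a protocol binding such as `workspace`. A protocol binding in turn wins over `defaultBinding`. `StoreRedirectSketch` is hypothetical and only returns bean names; it is not `StoreRedirectorProxyFactory` itself.

```java
import java.util.Map;

final class StoreRedirectSketch {

    private final String defaultBinding;
    private final Map<String, String> protocolBindings;   // e.g. "avm" -> bean name
    private final Map<String, String> storeBindings;      // e.g. "workspace://version2Store" -> bean name

    StoreRedirectSketch(String defaultBinding, Map<String, String> protocolBindings,
                        Map<String, String> storeBindings) {
        this.defaultBinding = defaultBinding;
        this.protocolBindings = protocolBindings;
        this.storeBindings = storeBindings;
    }

    /** Resolve a store reference such as "workspace://SpacesStore" to a factory bean name. */
    String resolve(String storeRef) {
        String exact = storeBindings.get(storeRef);
        if (exact != null) {
            return exact;                                  // most specific: the full store reference
        }
        int sep = storeRef.indexOf("://");
        String protocol = sep < 0 ? storeRef : storeRef.substring(0, sep);
        return protocolBindings.getOrDefault(protocol, defaultBinding);   // then protocol, then default
    }
}
```

With the bindings from the bean, `workspace://version2Store` resolves to the unindexed factory. `avm://…` stores resolve to the AVM factory, and any unlisted protocol falls back to the ADM factory.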
    • Overview
      10
      Configuration properties
      lucene.maxAtomicTransformationTime=20
      lucene.query.maxClauses=10000
      lucene.indexer.cacheEnabled=true
      lucene.indexer.maxDocIdCacheSize=10000
      lucene.indexer.maxDocumentCacheSize=100
      lucene.indexer.maxParentCacheSize=10000
      lucene.indexer.maxIsCategoryCacheSize=-1
      lucene.indexer.maxLinkAspectCacheSize=10000
      lucene.indexer.maxPathCacheSize=10000
      lucene.indexer.maxTypeCacheSize=10000
    • Overview
      11
      Configuration properties
      lucene.indexer.mergerTargetIndexCount=5
      lucene.indexer.mergerTargetOverlayCount=5
      lucene.indexer.mergerTargetOverlaysBlockingFactor=1
      lucene.indexer.mergerMergeBlockingFactor=1
      lucene.indexer.maxDocsForInMemoryMerge=10000
      lucene.indexer.maxRamInMbForInMemoryMerge=16
      lucene.indexer.postSortDateTime=true
      lucene.indexer.defaultMLIndexAnalysisMode=EXACT_LANGUAGE_AND_ALL
      lucene.indexer.defaultMLSearchAnalysisMode=EXACT_LANGUAGE_AND_ALL
      lucene.indexer.maxFieldLength=10000
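`lucene.query.maxClauses` caps how many clauses one query may expand into. In Lucene versions of this era, a wildcard or prefix query can be rewritten into one clause per matching indexed term. A short prefix over a large vocabulary can therefore hit the cap. The sketch below models that rewrite over an in-memory sorted term list. `PrefixExpansionSketch` and its exception are hypothetical stand-ins for Lucene's own `BooleanQuery.TooManyClauses` behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class PrefixExpansionSketch {

    /** Hypothetical stand-in for the error raised when expansion exceeds the clause limit. */
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int limit) {
            super("query expands to more than " + limit + " clauses");
        }
    }

    /** Expand "prefix*" into the matching indexed terms, failing once maxClauses is exceeded. */
    static List<String> expand(SortedSet<String> indexedTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; stop at the first term that no longer matches
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new TooManyClausesException(maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }
}
```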
    • Overview
      12
      Authorization
      • Post query filter for READ
      • Configuration
      • system.acl.maxPermissionCheckTimeMillis=10000
      • system.acl.maxPermissionChecks=1000
      • Also set at query time
      • Read performance
      • 1.4 old model
      • 2.2/3.0 new
      • 3.4 improved read
      • system/admin/others
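Because READ is enforced by a post-query filter, every candidate hit costs a permission check. The two `system.acl` settings bound that work. The sketch below shows one bounded filter under an assumption the slide does not confirm: evaluation simply stops and flags the result as truncated when either limit is reached. `ReadFilterSketch`, the `canRead` predicate and the injected clock are all hypothetical; they are not Alfresco's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

final class ReadFilterSketch {

    /** Filtered hits plus a flag saying whether a limit cut the evaluation short. */
    static final class Result {
        final List<String> readable;
        final boolean truncated;

        Result(List<String> readable, boolean truncated) {
            this.readable = readable;
            this.truncated = truncated;
        }
    }

    static Result filter(List<String> hits, Predicate<String> canRead,
                         int maxPermissionChecks, long maxPermissionCheckTimeMillis,
                         LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        List<String> readable = new ArrayList<>();
        int checks = 0;
        for (String nodeId : hits) {
            boolean overCount = checks >= maxPermissionChecks;
            boolean overTime = clockMillis.getAsLong() - start >= maxPermissionCheckTimeMillis;
            if (overCount || overTime) {
                return new Result(readable, true);   // stop early; remaining hits are never checked
            }
            checks++;
            if (canRead.test(nodeId)) {
                readable.add(nodeId);
            }
        }
        return new Result(readable, false);
    }
}
```

The clock is injected so the time limit can be tested deterministically. In real use it would be `System::currentTimeMillis`.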
    • Overview
      13
      Query Support
      • Lucene based
      • Lucene with Alfresco extensions (PATH, ...)
      • Alfresco FTS
      • CMIS QL + extensions
      • DB based
      • XPath
      • Specific APIs (using the child association table)
      • NodeService – selecting children
      • PersonService – lookup people
    • Overview
      14
      Issues
      • Factory abstraction
      • Transaction vs Snapshot
      • Query language abstraction
      • Repo reliance on the lucene index
      • Cross locale support
      • Rebuild
      • Cluster (loss of consistency)
      • Lucene limitations
      • Delete/add and reindexing
      • DB schema for properties
      • Read permission evaluation
      • One big store
      • Analyser configuration
      • Associations
      • Richer data model control - analysis
    • Direction
      15
      Query Language Abstraction
      • Alfresco FTS
      • CMIS QL
    • Direction
      16
      Query Language Abstraction
      • Alfresco FTS
      • CMIS QL
      SOLR
      DB/SQL
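One way to read this slide: both query languages sit above interchangeable back ends such as SOLR or the database. Each back end declares which languages it can execute. The sketch below illustrates that routing only. `QueryLanguage`, `QueryBackEnd`, `SimpleBackEnd` and `QueryRouter` are hypothetical names, not Alfresco's `SearchService` signatures.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical query language identifiers. */
enum QueryLanguage { ALFRESCO_FTS, CMIS_QL }

/** Hypothetical back end abstraction (e.g. SOLR, DB/SQL). */
interface QueryBackEnd {
    String name();
    Set<QueryLanguage> supported();
}

final class SimpleBackEnd implements QueryBackEnd {
    private final String name;
    private final Set<QueryLanguage> supported;

    SimpleBackEnd(String name, Set<QueryLanguage> supported) {
        this.name = name;
        this.supported = supported;
    }

    @Override public String name() { return name; }
    @Override public Set<QueryLanguage> supported() { return supported; }
}

final class QueryRouter {
    private final List<QueryBackEnd> backEnds;   // in order of preference

    QueryRouter(List<QueryBackEnd> backEnds) {
        this.backEnds = backEnds;
    }

    /** First back end, in preference order, that can execute the given language. */
    Optional<QueryBackEnd> route(QueryLanguage language) {
        return backEnds.stream().filter(b -> b.supported().contains(language)).findFirst();
    }
}
```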
    • Direction
      17
      SOLR
      • Data model integration
      • Tracking – eventual consistency
      • Not suitable for RM
      • Query time ACL filtering
      • PATH support
      • SOLR scalability and elasticity
      • Faceting, etc.
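With tracking, the index is updated by polling the repository for committed transactions, not inside the transaction that made the change. A query can therefore briefly miss recent changes; this is the eventual consistency noted above. The sketch below shows that loop under two stated assumptions: transactions carry increasing ids, and `fetchSince` is a hypothetical stand-in for whatever the tracker really calls. `TrackerSketch` and `Txn` are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

final class TrackerSketch {

    /** A committed repository transaction and the node ids it touched (hypothetical shape). */
    static final class Txn {
        final long id;
        final List<String> nodeIds;

        Txn(long id, List<String> nodeIds) {
            this.id = id;
            this.nodeIds = nodeIds;
        }
    }

    private final LongFunction<List<Txn>> fetchSince;   // returns txns with id > argument, ascending
    private final List<String> indexed = new ArrayList<>();
    private long lastIndexedTxn;

    TrackerSketch(LongFunction<List<Txn>> fetchSince) {
        this.fetchSince = fetchSince;
    }

    /** One polling pass: index everything committed since the last pass. */
    void trackOnce() {
        for (Txn txn : fetchSince.apply(lastIndexedTxn)) {
            indexed.addAll(txn.nodeIds);       // stand-in for (re)indexing each node
            lastIndexedTxn = txn.id;           // advance only after the txn has been indexed
        }
    }

    long lastIndexedTxn() { return lastIndexedTxn; }

    List<String> indexed() { return indexed; }
}
```

Between a commit and the next `trackOnce()`, the committed nodes are in the database but not yet in the index.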
    • Alfresco FTS
      18
      Introduction
      • CMIS QL FTS (almost)
      • Google
      • Lucene
      • Developer/App Customisation
      • Define the default namespace (e.g. allow the user to drop cm:)
      • Disable/enable/modify certain language features
      • Define templates
      • Define the default field, simple templates for users
      • Share defines the “keywords” template as the default field
      • %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
    • Alfresco FTS
      19
      Syntax
      • Term (exact/tokenised)
      • Phrase
      • Conjunction/Disjunction/Negation/Boosting
      • Fields
      • Wildcards
      • Ranges
      • Fuzzy matching
      • Proximity
      • Templates
      • See http://wiki.alfresco.com/wiki/Full_Text_Search_Query_Syntax
    • Alfresco FTS
      20
      Template example
      keywords = %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
      Query: =keywords:woof
      Expands to: =cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof
    • Alfresco FTS
      21
      Template example – relevance tuning
      keywords = %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT^2)
      Query: =keywords:woof
      Expands to: =cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof^2
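The expansion in the two template slides can be sketched as a plain string rewrite. The `=` in front of the query term is repeated on every generated clause. A `^boost` attached to a template field moves to the end of that field's clause. This is a hypothetical illustration that assumes fields are space-separated inside `%( )`. It is not Alfresco's FTS parser, and `FtsTemplateSketch` is an invented name.

```java
import java.util.ArrayList;
import java.util.List;

final class FtsTemplateSketch {

    /**
     * Expand one templated term. template is e.g. "%(cm:name cm:title TEXT^2)",
     * prefix is a term modifier such as "=" (may be empty), term is the user's word.
     */
    static String expand(String template, String prefix, String term) {
        if (!template.startsWith("%(") || !template.endsWith(")")) {
            throw new IllegalArgumentException("expected %(field ...) template: " + template);
        }
        String body = template.substring(2, template.length() - 1).trim();
        List<String> clauses = new ArrayList<>();
        for (String field : body.split("\\s+")) {
            // A trailing ^boost on a template field moves to the end of the generated clause
            int caret = field.indexOf('^');
            String name = caret < 0 ? field : field.substring(0, caret);
            String boost = caret < 0 ? "" : field.substring(caret);
            clauses.add(prefix + name + ":" + term + boost);
        }
        return String.join(" ", clauses);
    }
}
```

Calling `expand` with the relevance-tuned template, prefix `"="` and term `"woof"` reproduces the expansion on the second slide, ending in `=TEXT:woof^2`.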
    • CMIS QL
      22
      Introduction
      • Use via CMIS or the SearchService
      • Read-only relational view of the repository
      • Subset of SQL-92 with extensions
      • Type inheritance
      • Multi-valued properties
      • Full text search
      • CONTAINS()
      • SCORE()
      • Location
      • IN_FOLDER()
      • IN_TREE()
    • CMIS QL
      23
      Alfresco extensions
      • JOIN to aspects only
      • SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId
      • no JOIN between types/nodes yet
      • use Alfresco FTS instead of CMIS QL FTS in CONTAINS()
      • SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
      • relax some constraints
      • SCORE() can be used on its own
      • multi-valued properties (MVPs) can use single-valued property (SVP) syntax for IN, LIKE and comparisons
      • Queries more robust if the data model changes
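The nested quoting in the `CONTAINS()` example is easy to get wrong. The Alfresco FTS expression is itself a CMIS QL string literal, so its inner single quotes must be escaped. The CMIS specification escapes `'` and `\` inside string literals with a backslash. The small helper sketch below builds such a query. `CmisQuerySketch` is a hypothetical name, not part of any CMIS client library.

```java
final class CmisQuerySketch {

    /** Quote a value as a CMIS QL string literal, escaping backslash and single quote with a backslash. */
    static String literal(String value) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '\'') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('\'').toString();
    }

    /** SELECT * FROM cmis:document WHERE CONTAINS(<fts>), with the FTS text embedded as a literal. */
    static String containsQuery(String ftsExpression) {
        return "SELECT * FROM cmis:document WHERE CONTAINS(" + literal(ftsExpression) + ")";
    }
}
```

`containsQuery("cmis:name:'test*'")` produces the query shown on the slide, with both inner quotes backslash-escaped.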
    • Learn More
      24
      wiki.alfresco.com
      forums.alfresco.com
      twitter: @AlfrescoECM