Alfresco Search Internals

This session will first explain the index-related options that are available when developing a data model and how these choices affect indexing and searching. We will cover Alfresco FTS in detail, and compare SQL-92 with CMIS QL. We will also consider sorting and other ways to control the results returned, and how query performance may be affected by ACL evaluation.

Published in: Technology, Business
  1. Alfresco Search Internals
       Andy Hind
       Senior Developer, Alfresco
       twitter: @andy_hind
  2. Agenda
       • Overview
       • Direction
       • Challenges
       • Alfresco FTS
       • CMIS Query Language
  3. Overview: Data Modelling Options
       <property name="cmis:name">
         ...
         <type>d:text</type>
         ...
         <index enabled="true">
           <tokenised>both</tokenised>
           <atomic>true</atomic>
           <stored>false</stored>
         </index>
         ....
       </property>
     Annotations added as the slide builds, in order:
       • <type>: type drives analysis
       • <index enabled>: true, false
       • <tokenised>: true, false, both
       • <atomic>: true, false (d:content)
       • <stored>: true, false
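The three `<tokenised>` settings on the data modelling slide can be pictured with a toy model. This is only a sketch: `index_terms` is a hypothetical helper, and a lowercase whitespace split stands in for the real analyser, which the slide says is driven by the property type (and, per the later configuration slide, by locale).

```python
def index_terms(value, tokenised):
    """Return the terms a toy index would hold for one property value.

    `tokenised` is "true", "false" or "both", mirroring the <tokenised>
    element. The lowercase whitespace split is an assumption standing in
    for the real, type-driven analyser.
    """
    tokens = value.lower().split()
    if tokenised == "true":
        # Only the analysed tokens are indexed.
        return tokens
    if tokenised == "false":
        # The value is indexed as a single, unanalysed term.
        return [value]
    if tokenised == "both":
        # Both forms are indexed side by side.
        return [value] + tokens
    raise ValueError(f"unknown tokenised mode: {tokenised!r}")


if __name__ == "__main__":
    for mode in ("true", "false", "both"):
        print(mode, index_terms("Annual Report", mode))
```

In this model, "both" is the only setting that supports an exact match on the whole value as well as a match on individual words, at the cost of holding more terms.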
  4. Overview: Configuration
       • IndexerAndSearcher interface and related factory
       • Redirection by store protocol or value
       • Factories
           - AVM
           - DM
           - Unindexed
       • All Lucene based, with options set via properties
       • Analysis by data type and locale
           - alfresco/model/dataTypeAnalyzers_{locale}.properties
  5. Overview: Configuration
       <bean id="indexerAndSearcherFactory" class="org.alfresco.repo.service.StoreRedirectorProxyFactory">
         <property name="proxyInterface">
           <value>org.alfresco.repo.search.impl.lucene.LuceneIndexerAndSearcher</value>
         </property>
         <property name="defaultBinding">
           <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
         </property>
         <property name="redirectedProtocolBindings">
           <map>
             <entry key="workspace">
               <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
             </entry>
             <entry key="avm">
               <ref bean="avmLuceneIndexerAndSearcherFactory"></ref>
             </entry>
           </map>
         </property>
         <property name="redirectedStoreBindings">
           <map>
             <entry key="workspace://lightWeightVersionStore">
               <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
             </entry>
             <entry key="workspace://version2Store">
               <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
             </entry>
           </map>
         </property>
       </bean>
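The "redirection by store protocol or value" idea in the bean definition can be sketched as a lookup. This is not the Alfresco implementation: `resolve_factory` is a hypothetical function, the bean names are copied from the configuration, and the lookup order (exact store first, then protocol, then default) is an assumption that the configuration implies, since `workspace://version2Store` has to override the `workspace` protocol binding.

```python
# Bindings copied from the bean definition on the slide.
DEFAULT_BINDING = "admLuceneIndexerAndSearcherFactory"

PROTOCOL_BINDINGS = {
    "workspace": "admLuceneIndexerAndSearcherFactory",
    "avm": "avmLuceneIndexerAndSearcherFactory",
}

STORE_BINDINGS = {
    "workspace://lightWeightVersionStore": "admLuceneUnIndexedIndexerAndSearcherFactory",
    "workspace://version2Store": "admLuceneUnIndexedIndexerAndSearcherFactory",
}


def resolve_factory(store_ref):
    """Pick the factory bean name for a store reference like 'workspace://SpacesStore'.

    Assumed order: exact store binding, then protocol binding, then default.
    """
    if store_ref in STORE_BINDINGS:
        return STORE_BINDINGS[store_ref]
    protocol, separator, _identifier = store_ref.partition("://")
    if not separator:
        raise ValueError(f"not a store reference: {store_ref!r}")
    return PROTOCOL_BINDINGS.get(protocol, DEFAULT_BINDING)


if __name__ == "__main__":
    for ref in ("workspace://SpacesStore", "workspace://version2Store", "avm://example"):
        print(ref, "->", resolve_factory(ref))
```

Under these assumptions the two version stores resolve to the unindexed factory, while every other `workspace` store gets the normal Lucene one.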
  6. Overview: Configuration properties
       lucene.maxAtomicTransformationTime=20
       lucene.query.maxClauses=10000
       lucene.indexer.cacheEnabled=true
       lucene.indexer.maxDocIdCacheSize=10000
       lucene.indexer.maxDocumentCacheSize=100
       lucene.indexer.maxParentCacheSize=10000
       lucene.indexer.maxIsCategoryCacheSize=-1
       lucene.indexer.maxLinkAspectCacheSize=10000
       lucene.indexer.maxPathCacheSize=10000
       lucene.indexer.maxTypeCacheSize=10000
  7. Overview: Configuration properties
       lucene.indexer.mergerTargetIndexCount=5
       lucene.indexer.mergerTargetOverlayCount=5
       lucene.indexer.mergerTargetOverlaysBlockingFactor=1
       lucene.indexer.mergerMergeBlockingFactor=1
       lucene.indexer.maxDocsForInMemoryMerge=10000
       lucene.indexer.maxRamInMbForInMemoryMerge=16
       lucene.indexer.postSortDateTime=true
       lucene.indexer.defaultMLIndexAnalysisMode=EXACT_LANGUAGE_AND_ALL
       lucene.indexer.defaultMLSearchAnalysisMode=EXACT_LANGUAGE_AND_ALL
       lucene.indexer.maxFieldLength=10000
  8. Overview: Authorization
       • Post query filter for READ
       • Configuration
           - system.acl.maxPermissionCheckTimeMillis=10000
           - system.acl.maxPermissionChecks=1000
           - Also set at query time
       • Read performance
           - 1.4 old model
           - 2.2/3.0 new
           - 3.4 improved read
           - system/admin/others
  9. Overview: Query Support
       • Lucene based
           - Lucene with Alfresco extensions (PATH, ...)
           - Alfresco FTS
           - CMIS QL + extensions
       • DB based
           - XPath
           - Specific APIs – (using the child association table)
           - NodeService – selecting children
           - PersonService – lookup people
 10. Overview: Issues
       • Factory abstraction
       • Transaction vs Snapshot
       • Query language abstraction
       • Repo reliance on the Lucene index
       • Cross locale support
       • Rebuild
       • Cluster (loss of consistency)
       • Lucene limitations
       • Delete/add and reindexing
       • DB schema for properties
       • Read permission evaluation
       • One big store
       • Analyser configuration
       • Associations
       • Richer data model control - analysis
 11. Direction: Query Language Abstraction
       • Alfresco FTS
       • CMIS QL
     SOLR
     DB/SQL
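The post-query READ filter from the Authorization slide, bounded by `system.acl.maxPermissionChecks` and `system.acl.maxPermissionCheckTimeMillis`, can be sketched as below. This is an illustration rather than Alfresco code: `filter_readable` and `can_read` are hypothetical names, and returning the partial result with a "truncated" flag when a limit is hit is an assumption about the behaviour.

```python
import time


def filter_readable(results, can_read, max_checks=1000, max_time_ms=10000,
                    clock=time.monotonic):
    """Filter raw query results down to those the caller may read.

    Returns (permitted, truncated). `truncated` is True when either limit
    stopped the filter before every result had been checked. The defaults
    mirror the two system.acl.* values on the slide; `clock` returns
    seconds and is injectable so the time limit can be tested.
    """
    start = clock()
    permitted = []
    for checks_done, node in enumerate(results):
        elapsed_ms = (clock() - start) * 1000
        if checks_done >= max_checks or elapsed_ms >= max_time_ms:
            # A limit was reached with results still unchecked.
            return permitted, True
        if can_read(node):
            permitted.append(node)
    return permitted, False


if __name__ == "__main__":
    # Hypothetical ACL: only even-numbered nodes are readable.
    print(filter_readable(range(10), lambda n: n % 2 == 0, max_checks=4))
```

The sketch makes the performance point concrete: the cost grows with the number of raw hits that have to be checked, not with the number of results the user finally sees, and a user who can read very little may hit a limit before a full page of results is found.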
 12. Direction: SOLR
       • Data model integration
       • Tracking – eventual consistency
       • Not suitable for RM
       • Query time ACL filtering
       • PATH support
       • SOLR scalability and elasticity
       • faceting etc
 13. Alfresco FTS: Introduction
       • CMIS QL FTS (almost)
       • Google
       • Lucene
       • Developer/App Customisation
           - Define the default namespace (e.g. allow the user to drop cm: )
           - Disable/enable/modify certain language features
       • Define templates
           - Define the default field, simple templates for users
           - Share defines the "keywords" template as the default field
           - %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
 14. Alfresco FTS: Syntax
       • Term (exact/tokenised)
       • Phrase
       • Conjunction/Disjunction/Negation/Boosting
       • Fields
       • Wildcards
       • Ranges
       • Fuzzy matching
       • Proximity
       • Templates
       • See http://wiki.alfresco.com/wiki/Full_Text_Search_Query_Syntax
 15. Alfresco FTS: Template example
       %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
       =keywords:woof
       =cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof
 16. Alfresco FTS: Template example – relevance tuning
       %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT^2)
       =keywords:woof
       =cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof^2
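The two template slides show a query on the `keywords` template being rewritten into one clause per field. A minimal sketch of that rewrite follows. It is not the Alfresco parser: `parse_template` and `expand_template` are hypothetical helpers, the space-separated field list is reconstructed from the expansions shown on the slides, and the placement of `^2` directly on the `TEXT` entry is an assumption based on the boosted expansion.

```python
# Reconstructed from the slides; the spacing between fields is inferred.
KEYWORDS_TEMPLATE = (
    "%(cm:name cm:title cm:description ia:whatEvent "
    "ia:descriptionEvent lnk:title lnk:description TEXT)"
)
# Assumption: a boost is attached to an individual field inside the template.
BOOSTED_TEMPLATE = KEYWORDS_TEMPLATE.replace("TEXT)", "TEXT^2)")


def parse_template(template):
    """Split '%(a b c^2)' into [(field, boost_or_None), ...]."""
    if not (template.startswith("%(") and template.endswith(")")):
        raise ValueError(f"unsupported template: {template!r}")
    fields = []
    for item in template[2:-1].split():
        name, _caret, boost = item.partition("^")
        fields.append((name, boost or None))
    return fields


def expand_template(template, term, exact=True):
    """Rewrite a term against a template into one clause per field.

    `exact=True` adds the '=' prefix used in the slide example.
    """
    prefix = "=" if exact else ""
    clauses = []
    for name, boost in parse_template(template):
        clause = f"{prefix}{name}:{term}"
        if boost is not None:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


if __name__ == "__main__":
    print(expand_template(KEYWORDS_TEMPLATE, "woof"))
    print(expand_template(BOOSTED_TEMPLATE, "woof"))
```

The second call differs from the first only in its last clause, which is the point of the relevance-tuning slide: the template, not the end user, decides which fields count for more.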
 17. CMIS QL: Introduction
       • Use via CMIS or the SearchService
       • Read-only relational view of the repository
       • Subset of SQL-92 with extensions
           - Type inheritance
           - Multi-valued properties
           - Full text search: CONTAINS(), SCORE()
           - Location: IN_FOLDER(), IN_TREE()
 18. CMIS QL: Alfresco extensions
       • JOIN to aspects only
           - SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId
           - no JOIN between types/nodes yet
       • use Alfresco FTS instead of SQL QL FTS
           - SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
       • relax some constraints
           - SCORE() can be used on its own
           - multi-valued properties (mvps) can use single-valued property (svp) syntax for IN, LIKE and comparisons
       • Queries more robust if the data model changes
 19. Learn More
       wiki.alfresco.com
       forums.alfresco.com
       twitter: @AlfrescoECM
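The CONTAINS() example on the CMIS QL extensions slide embeds an Alfresco FTS expression, itself containing quotes, inside a single-quoted CMIS QL string. A minimal sketch of building such a query safely is shown here. `cmis_contains` and `cmis_select` are hypothetical helpers, and the backslash escaping of quotes and backslashes follows the CMIS string-literal rule as I understand it; treat it as an assumption and check it against the repository you target.

```python
def cmis_contains(fts_expression):
    """Wrap an FTS expression in CONTAINS('...').

    Backslashes are escaped first, then single quotes, so the escapes
    added for the quotes are not escaped a second time.
    """
    escaped = fts_expression.replace("\\", "\\\\").replace("'", "\\'")
    return f"CONTAINS('{escaped}')"


def cmis_select(type_name, fts_expression):
    """Build a SELECT * query over one type, filtered by an FTS expression."""
    return f"SELECT * FROM {type_name} WHERE {cmis_contains(fts_expression)}"


if __name__ == "__main__":
    # Prints: SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
    print(cmis_select("cmis:document", "cmis:name:'test*'"))
```

Keeping the escaping in one helper avoids the unbalanced quotes that are easy to introduce when the query is assembled by hand.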