Discovering the 2 in Alfresco Search Services 2.0

  1. Discovering the 2 in Search Services 2.0, Tech Talk Live, October 14th, 2020
  2. Discovering the 2 in Search Services 2.0, Tech Talk Live, 14th October 2020 • Solr Core and Solr Schema • Security, Performance and Precision • Enterprise Enhancements • One more thing... • Q&A
  3. Solr Core and Solr Schema, Tom Page [COMMUNITY]
  4. Solr Content Store Removal [diagram]: ACS Repository (DB, Content Store) → Search Services 1.4 (Solr Index, Content Store) [COMMUNITY]
  5. Solr Content Store Removal [diagram]: ACS Repository (DB, Content Store) → Search Services 1.4 (Solr Index, Content Store); ACS Repository (DB, Content Store) → Search Services 2.0 (Solr Index only, no Content Store) [COMMUNITY]
  6. Solr Content Store Removal: Benefits
     • Removed custom code: 9,311 lines of code removed (https://github.com/Alfresco/SearchServices/blob/master/search-services/alfresco-search/doc/architecture/solr-content-store-removal/00001-solr-content-store-removal.md)
     • Helps leverage built-in Solr features: it is now possible to use built-in Solr features (e.g. replication and backups)
     • Reduces I/O work, particularly in systems with replication
     • Reduced disk usage:
         Search Services version               1.4       2.0
         Index size (bytes per doc)            1         3,000
         Content Store size (bytes per doc)    40,000    0
     [COMMUNITY]
  7. Solr Content Store Removal: Reindex • Moving data from the content store to the index requires a reindex. • Reindexing with sharding (demo later): https://github.com/aborroy/solr-sharding-reindex • For more information about reindexing, see Tech Talk Live #120: https://www.alfresco.com/events/webinars/tech-talk-live-reindexing-large-repositories [COMMUNITY]
  8. Solr Content Store Removal: Impact
     ● Replication is more efficient, as it now uses the default Solr mechanism.
       ○ A Docker Compose example is available at https://github.com/aborroy/search-services-replication
     ● Documents are now changed with atomic updates instead of being removed and recreated.
       ○ To achieve this, the SOLR Transaction Log has been enabled.
     ● Review your backup and restore procedures, as the folder $SOLR_HOME/contentstore is no longer created:
         $ du -h /opt/alfresco-search-services/data/alfresco
         4.7M ./index
         8.5M ./tlog
         4.0K ./snapshot_metadata
     [COMMUNITY]
  9. Solr Content Store Removal: Impact. The full information for a document can still be recovered using Solr queries, e.g. http://127.0.0.1:8983/solr/alfresco/select?fl=*,[cached]&indent=on&q=DBID:563 [COMMUNITY]
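The recovery query above can also be built programmatically, for example to inspect several DBIDs in a script. A minimal Python sketch; cached_document_url is a hypothetical helper, and it only builds the URL (open it in a browser or with curl to see the stored fields):

```python
from urllib.parse import urlencode


def cached_document_url(dbid: int, base: str = "http://127.0.0.1:8983/solr/alfresco") -> str:
    """Build a /select URL returning all fields plus [cached] for one DBID (hypothetical helper)."""
    params = {
        "fl": "*,[cached]",  # every field plus the [cached] transformer used on the slide
        "indent": "on",
        "q": f"DBID:{dbid}",
    }
    # Keep ',', '[', ']', '*' and ':' unescaped so the URL matches the slide verbatim
    return f"{base}/select?{urlencode(params, safe=',[]*:')}"


if __name__ == "__main__":
    print(cached_document_url(563))
```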
  10. New Destructured Date Fields • Solr schema simplification (solrhome/core/conf/schema.xml) • Improved storage of DATE fields, with new destructured parts: quarter, day_of_month, day_of_year, day_of_week [COMMUNITY]
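To make the destructured parts concrete, here is a small Python sketch that computes them for a single date. It assumes ISO conventions (quarters of three months, day_of_week 1 = Monday to 7 = Sunday); the encoding Search Services actually uses in the index may differ, and destructure is a hypothetical helper, not an Alfresco API:

```python
from datetime import date


def destructure(d: date) -> dict:
    """Split a date into parts analogous to the new destructured fields.

    Assumption: ISO conventions, so day_of_week runs from 1 (Monday) to 7 (Sunday).
    """
    return {
        "quarter": (d.month - 1) // 3 + 1,   # Jan-Mar = 1, ..., Oct-Dec = 4
        "day_of_month": d.day,
        "day_of_year": d.timetuple().tm_yday,
        "day_of_week": d.isoweekday(),
    }


if __name__ == "__main__":
    print(destructure(date(2020, 10, 14)))  # date of this Tech Talk Live
```

For 14 October 2020 this yields quarter 4, day_of_month 14, day_of_year 288 (2020 is a leap year) and day_of_week 3 (a Wednesday).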
  11. New Destructured Date Fields: the new *_unit_of_time_* fields can be used to build queries, e.g. to get all the documents created in 2020, in SOLR and FTS syntax. NB: CMIS is also supported, but not for this example, because cm:created is not supported: the cm:auditable aspect is not exposed through the CMIS protocol. [COMMUNITY]
  12. Asynchronous Actions and Maintenance [diagram: Search Services Administrator, Maintenance Queue, Commit Tracker, Index]: (t1) the administrator schedules a Retry action on the Maintenance Queue. See https://docs.alfresco.com/search-community/concepts/solr-admin-asynchronous-actions.html [COMMUNITY]
  13. Asynchronous Actions and Maintenance [diagram, continued]: (t1) Retry and (t2) Reindex are scheduled on the Maintenance Queue. [COMMUNITY]
  14. Asynchronous Actions and Maintenance [diagram, continued]: (t1) Retry, (t2) Reindex and (t3) Purge are scheduled on the Maintenance Queue. [COMMUNITY]
  15. Asynchronous Actions and Maintenance [diagram, continued]: (t1) Retry, (t2) Reindex, (t3) Purge and (t4) Fix are scheduled on the Maintenance Queue. [COMMUNITY]
  16. Asynchronous Actions and Maintenance [diagram, continued]: after (t1) Retry, (t2) Reindex, (t3) Purge and (t4) Fix are scheduled, (t5) the Commit Tracker dequeues the scheduled work. [COMMUNITY]
  17. Asynchronous Actions and Maintenance [diagram, continued]: (t5) the Commit Tracker dequeues the scheduled work; (t6) it then performs index management. [COMMUNITY]
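The flow in this diagram can be modelled in a few lines. This is a conceptual Python sketch only: MaintenanceQueue, CommitTracker and their methods are hypothetical names, not the Search Services (Java) classes, and the real tracker does far more than append to a log. It shows the property the diagram illustrates: the admin request returns as soon as the work is queued, and the work itself runs on the next Commit Tracker cycle.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MaintenanceQueue:
    """Toy model (hypothetical): admin actions are queued, not executed immediately."""
    pending: deque = field(default_factory=deque)

    def schedule(self, action: str, target: str = "") -> str:
        # The admin request only records the work and returns straight away
        self.pending.append((action, target))
        return "scheduled"


@dataclass
class CommitTracker:
    """Toy model (hypothetical) of the tracker that drains the queue on its own cycle."""
    queue: MaintenanceQueue
    log: list = field(default_factory=list)

    def run_cycle(self) -> None:
        # t5: dequeue all the work scheduled since the previous cycle, in FIFO order
        while self.queue.pending:
            action, target = self.queue.pending.popleft()
            self.log.append(f"{action} {target}".strip())
        # t6: index management (here simply recorded as a commit)
        self.log.append("commit")


if __name__ == "__main__":
    queue = MaintenanceQueue()
    for action, target in [("RETRY", ""), ("REINDEX", "txid=42"), ("PURGE", "txid=43"), ("FIX", "")]:
        queue.schedule(action, target)
    tracker = CommitTracker(queue)
    tracker.run_cycle()
    print(tracker.log)
```

In this model the admin call stays fast whatever the amount of work requested; the trade-off is that its effect only becomes visible after the next tracker cycle.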
  18. FIX Tool: the FIX tool finds transactions and ACL change sets that are mismatched between the DB and Solr, and adds them to be reindexed on the next maintenance cycle performed by the CommitTracker.
     Old response shape:
         {
           "responseHeader": { "QTime": 1, "status": 0 },
           "action": {
             "status": "scheduled",
             "txToReindex": [1, 2],
             "aclChangeSetToReindex": [3, 4]
           }
         }
     ● "status" is always "scheduled"
     ● Only two error categories
     ● Each category contains the corresponding transaction identifiers
     [COMMUNITY]
  19. FIX Tool: New Features
     New response shape:
         {
           "responseHeader": {
             // As before
           },
           "action": {
             "dryRun": true,
             "status": "notScheduled",
             "txToReindex": {
               "txInIndexNotInDb": {
                 "192": 282, // Tx 192 is associated with 282 nodes
                 "827": 99   // Tx 827 is associated with 99 nodes
               },
               "duplicatedTxInIndex": {...},
               "missingTxInIndex": {...}
             },
             "aclChangeSetToReindex": {
               // Very similar to txToReindex, but for ACLs
             }
           }
         }
     ● dryRun (defaults to true): if true, the output report is generated, but no reindex work is scheduled.
     ● fromTxCommitTime: the lower bound (the minimum transaction commit time) of the target transactions that you want to check or fix.
     ● toTxCommitTime: the upper bound (the maximum transaction commit time) of the target transactions that you want to check or fix.
     ● maxScheduledTransactions: the maximum number of transactions that will be scheduled. The default is 500, but this can be overridden in solrcore.properties.
     [COMMUNITY]
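The parameters and the new response shape lend themselves to a small client-side helper. A minimal Python sketch, assuming the tool is invoked as action=FIX with a core parameter on the /solr/admin/cores endpoint (as for the other Alfresco admin actions) and that commit times are passed as plain integers; fix_url and nodes_per_category are hypothetical helpers:

```python
from typing import Optional
from urllib.parse import urlencode

ADMIN_CORES = "http://localhost:8983/solr/admin/cores"


def fix_url(core: str = "alfresco", dry_run: bool = True,
            from_tx_commit_time: Optional[int] = None,
            to_tx_commit_time: Optional[int] = None,
            max_scheduled_transactions: Optional[int] = None) -> str:
    """Build a FIX request URL (assumed action=FIX); optional parameters are sent only when given."""
    params = {"action": "FIX", "core": core, "dryRun": str(dry_run).lower()}
    optional = {
        "fromTxCommitTime": from_tx_commit_time,
        "toTxCommitTime": to_tx_commit_time,
        "maxScheduledTransactions": max_scheduled_transactions,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{ADMIN_CORES}?{urlencode(params)}"


def nodes_per_category(response: dict) -> dict:
    """Sum the node counts per transaction category of a 2.0-style FIX response."""
    tx_to_reindex = response["action"]["txToReindex"]
    return {category: sum(tx_nodes.values())
            for category, tx_nodes in tx_to_reindex.items()}
```

Keeping dryRun=true as the default mirrors the tool itself: the report can be reviewed first, and the same call with dry_run=False schedules the reindex work. For the sample response on the slide, txInIndexNotInDb sums to 282 + 99 = 381 nodes.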
  20. Enable/Disable Indexing
     Motivation: disable indexing in order to cancel a huge maintenance load.
     • Enable/disable indexing on a specific core or on all master/standalone cores
     • MetadataTracker, ContentTracker, CascadeTracker and AclTracker are affected
     • CommitTracker, ModelTracker and ShardStatePublisher are not affected
     • When indexing is disabled, some admin endpoints (e.g. PURGE, INDEX) won't execute
     • When indexing is disabled, the FIX endpoint is forced to run in dryRun mode
     • If indexing is disabled in the middle of a tracking process, the trackers are set to rollback mode
     • Commands are idempotent
     • For more information see https://issues.alfresco.com/jira/browse/SEARCH-2330
     Examples:
     Disable indexing on all master/standalone cores: http://localhost:8983/solr/admin/cores?action=disable-indexing
     Disable indexing on a specific (master or standalone) core: http://localhost:8983/solr/admin/cores?action=disable-indexing&core=alfresco
     [COMMUNITY]
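Both variants of the command (all cores or one core, enable or disable) follow one pattern. A minimal Python sketch, assuming disable-indexing and enable-indexing as the paired admin actions; indexing_url is a hypothetical helper. Since the commands are idempotent, issuing the same URL twice is harmless:

```python
from typing import Optional
from urllib.parse import urlencode

ADMIN_CORES = "http://localhost:8983/solr/admin/cores"


def indexing_url(enable: bool, core: Optional[str] = None) -> str:
    """Build the admin URL that enables or disables indexing.

    Without `core`, the command targets all master/standalone cores;
    with `core`, only that (master or standalone) core is affected.
    """
    params = {"action": "enable-indexing" if enable else "disable-indexing"}
    if core is not None:
        params["core"] = core
    return f"{ADMIN_CORES}?{urlencode(params)}"


if __name__ == "__main__":
    print(indexing_url(enable=False))                   # all master/standalone cores
    print(indexing_url(enable=False, core="alfresco"))  # a single core
```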
  21. FIX Tool Demo: a Postman collection containing the example requests used in the demo is available at https://www.getpostman.com/collections/4c2fbe407a0134729546 [COMMUNITY]
  22. Security, Performance and Precision, Angel Borroy [COMMUNITY]
  23. New mTLS Configuration
     ● Communication between the Repository and SOLR (for searching and indexing) can be protected using the mTLS protocol with client authentication [1].
     ● A new password handling mechanism has been introduced starting with ASS 2.0 / ACS 6.2.N [2]:
       ○ Configuration switches from property files with plain-text passwords to JVM system properties.
       ○ The old way of configuring should still work for backwards compatibility, but is discouraged for security reasons.
     [1] https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-6-1-is-coming-with-mutual-tls-authentication-by-default/ba-p/287905
     [2] ACS 6.2.N is not released yet!
     [COMMUNITY]
  24. New mTLS Configuration
     • alfresco-ssl-generator: command-line tool to generate self-signed certificates (classic and current formats): https://github.com/Alfresco/alfresco-ssl-generator
     • alfresco-solr-docker-mtls: sample configuration (repository using the classic format, SOLR using the current one): https://github.com/aborroy/alfresco-solr-docker-mtls
     Additional resources:
     • Installing and configuring Search Services with mutual TLS using the distribution zip: https://docs.alfresco.com/search-community/tasks/solr-install.html
     • Alfresco mTLS Configuration Deep Dive: https://hub.alfresco.com/t5/alfresco-content-services-blog/alfresco-mtls-configuration-deep-dive/ba-p/296422
     [COMMUNITY]
  25. Trackers Reworking [diagram]: the Alfresco SOLR trackers request data as JSON from the Alfresco Repository (alfresco-remote-api, backed by the Alfresco DB) through the Get Transactions, Get Nodes and Get NodesMetadata calls. Three tuning points are marked: (1) Transaction Batch, (2) Node Batch and (3) Parallel Threads, several of which fetch NodesMetadata concurrently. [COMMUNITY]
  26. Trackers Reworking (all properties are set in solrcore.properties)
     1. Transaction Batch: the batch size for nodes and ACLs has an impact until the maximum number for your deployment is reached; beyond that, you can increase this batch size, but there will be no performance change.
        alfresco.transactionDocsBatchSize (default 2000)
        alfresco.changeSetAclsBatchSize (default 500)
     2. Node Batch: increasing the node batch size can improve performance up to an optimal point (the hotspot) for your deployment; beyond that, you can increase this batch size, but performance will be penalised.
        alfresco.nodeBatchSize (default 100)
        alfresco.cascade.tracker.nodeBatchSize (default 10)
        alfresco.contentUpdateBatchSize (default 2000)
        alfresco.aclBatchSize (default 100)
     3. Parallel Threads: increasing the maximum number of parallel threads improves performance until the maximum number for the deployment is reached.
        alfresco.metadata.tracker.maxParallelism (default 32)
        alfresco.cascade.tracker.maxParallelism (default 32)
        alfresco.content.tracker.maxParallelism (default 32)
        alfresco.acl.tracker.maxParallelism (default 32)
     (Charts: Execution Time against Parameter Size for each group, with HOTSPOT marks.)
     [COMMUNITY]
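The interplay between node batch size and parallel threads can be illustrated with a toy model. This Python sketch is an assumption-laden simplification, not the tracker code: fetch_metadata is a hypothetical stand-in for the remote NodesMetadata call, and the defaults simply mirror alfresco.nodeBatchSize (100) and alfresco.metadata.tracker.maxParallelism (32):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def batches(items: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_metadata(batch: Sequence[int]) -> List[str]:
    # Hypothetical stand-in for a remote "get nodes metadata" call: one fake document per node
    return [f"metadata:{node_id}" for node_id in batch]


def index_nodes(node_ids: Sequence[int], node_batch_size: int = 100,
                max_parallelism: int = 32,
                fetch: Callable[[Sequence[int]], List[str]] = fetch_metadata) -> List[str]:
    """Fetch node batches with a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        # pool.map yields results in batch order even though batches run concurrently
        results = pool.map(fetch, batches(node_ids, node_batch_size))
        return [doc for batch in results for doc in batch]


if __name__ == "__main__":
    docs = index_nodes(list(range(250)))
    print(len(docs), docs[0], docs[-1])
```

In this model, a larger node_batch_size means fewer but heavier requests, while max_parallelism caps how many requests run at once; where the optimum lies depends on the repository and network, which is why the slide recommends measuring per deployment.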
  27. Exact Search: the FTS operator = has changed behaviour in 2.0.0.
     ● Detailed information is available at https://hub.alfresco.com/t5/alfresco-content-services-blog/exact-term-queries-in-search-services-2-0/ba-p/302200
     ● Thanks to @AFaust for noticing this issue: https://issues.alfresco.com/jira/browse/SEARCH-2461
     [COMMUNITY]
  28. Enterprise Enhancements, Keerat Lalia [ENTERPRISE]
  29. ShardState Tracker: in previous releases, the Shard State was communicated to the repository as part of the Metadata Tracker's retrieval of information. That could cause problems when a Metadata Tracker cycle took a long time to execute. A new Shard State Publisher tracker has been added to report the state to the repository on a regular basis. Its configuration (solrcore.properties) includes the property alfresco.nodestate.tracker.cron; if this property is not specified, the default cron is applied: alfresco.cron=0/10 * * * * ? * [ENTERPRISE, Sharding]
  30. DB_ID_RANGE Sharding
     • When a shard goes down, search can now be restored more quickly. For more details see MNT-21591.
     [diagram: ACS Node 1 and ACS Node 2; SOLR Shard 1 and SOLR Shard 2, both using DB_ID_RANGE; Replica 1 and Replica 2]
     • ACS (alfresco-global.properties): search.solrShardRegistry.shardInstanceTimeoutInSeconds = 30 (historically, this would be set to something more like 300 seconds)
     • Insight Engine (solrcore.properties): alfresco.nodestate.tracker.cron=0/10 * * * * ? *
       This should be more frequent than the timeout set in ACS.
     [ENTERPRISE, Sharding]
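The rule on this slide (the tracker must fire more often than the ACS shard timeout) can be checked mechanically. A minimal Python sketch, assuming the cron expression has the simple "start/step" seconds form used here (Quartz field order: seconds, minutes, hours, day of month, month, day of week, year); the helper names are hypothetical, and any other expression is rejected rather than parsed:

```python
def seconds_interval(cron: str) -> int:
    """Return the firing interval of a Quartz cron whose seconds field is 'start/step'.

    Only the simple pattern used on the slide (e.g. '0/10 * * * * ? *') is handled;
    anything else raises ValueError instead of being guessed.
    """
    seconds_field, *rest = cron.split()
    if "/" not in seconds_field or any(f not in ("*", "?") for f in rest):
        raise ValueError(f"unsupported cron expression: {cron!r}")
    _, step = seconds_field.split("/")
    return int(step)


def tracker_is_frequent_enough(cron: str, shard_timeout_seconds: int) -> bool:
    # The shard state must be published more often than ACS times a shard out
    return seconds_interval(cron) < shard_timeout_seconds


if __name__ == "__main__":
    # Values from the slide: 10-second tracker interval, 30-second ACS timeout
    print(tracker_is_frequent_enough("0/10 * * * * ? *", 30))
```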
  31. Solr Sharding Reindex
     When re-indexing a living Alfresco Repository with SOLR Sharding and solr.useDynamicShardRegistration enabled, the new SOLR Shard Indexer services should be configured with the Alfresco NodeState Tracker turned off. With this approach, the SOLR Indexer services are not registered in the living Alfresco Repository as available SOLR Shards, and the living system can operate normally.
     Sharding Reindex (Demo): https://github.com/aborroy/solr-sharding-reindex
     This configuration uses two Docker Compose templates:
     ● living: an ACS server running 2 SOLR Shards configured with the DB_ID method and Alfresco Search Services 1.4.3
     ● indexer: an Indexer service running 2 SOLR Shards configured with the DB_ID method and Alfresco Search Services 2.0.0.1
     [ENTERPRISE, Sharding]
  32. BI Tool Support
     ● Improved SOLR JDBC support
     ● Added support for Excel and Tableau to Alfresco Search and Insight Engine, using an ODBC driver provided by a third-party company, CData
       ○ Download the driver from https://www.cdata.com/drivers/alfresco/
     [ENTERPRISE, BI Tools]
  33. Improvements to SQL Support (JDBC & ODBC)
     • Support for date functions in the SELECT clause
     • Support for date functions in the WHERE clause
     • Support for date functions in the GROUP BY clause
     • Support for SQL avg(field) with multiple GROUP BY
     • Support for date functions in the ORDER BY clause
     • Support for the SQL TIMESTAMP format
     • Support for the CAST AS TIMESTAMP function
     • Support for the QUARTER function
     • Support for the DAYOFMONTH, DAYOFWEEK and DAYOFYEAR functions
     • Support for the TIMESTAMPADD(timeUnit, integer, datetime) function
     [ENTERPRISE, BI Tools]
  34. JDBC Driver with DbVisualizer (Demo): a working JDBC client sample is available at https://github.com/aborroy/solr-jdbc-client [ENTERPRISE, BI Tools]
  35. CData ODBC Installation: the driver is simple to install on your machine by following the steps at http://cdn.cdata.com/help/SJF/odbc/. Installation and setup is a simple two-step process, performed on the end user's machine:
     1. Install the driver.
     2. Configure the ODBC data source.
     Configuration is fully documented by CData. [ENTERPRISE, BI Tools]
  36. ODBC for Tableau
     • Tableau can connect to your relevant data source and portray the results from the source in a table.
     • The results can be displayed by using the table directly, or by entering a custom SQL query to show the results specific to what the user wants to see.
     • Tableau consists of worksheets where we can build views of our data using the fields and graphs.
     • Each worksheet builds the results of one query through the use of the fields.
     • Results can be visualised as pie charts, bar charts, stacked bar charts, continuous line graphs and many more.
     • We can edit our results by applying filters within Tableau on our selected fields.
     • Tableau can create dashboards that keep all of our related queries from each of the sheets in one place.
     • Results can be previewed on different devices, such as desktop, tablet and more.
     [ENTERPRISE, BI Tools]
  37. ODBC for Excel
     • Start by doing a data dump into Excel.
     • Connecting to the ODBC source is a similar process to Tableau: you can connect and view all the results from the table, or provide a custom SQL query.
     • Excel shows a preview of the results before displaying them on a different sheet.
     • You can filter the data before displaying the results: in the preview, click the 'Transform' button and filter your data as needed.
     • You can use native Excel functionality on your chosen dataset without relying heavily on SQL, in contrast to using Zeppelin.
     [ENTERPRISE, BI Tools]
  38. Supported Stack
     • Server OS: Linux (Red Hat Enterprise v7.6 x64), CentOS 7 x64, Ubuntu 18.04, SUSE 12.0 SP1 x64, Windows Server 2012 R2 (x64), Windows Server 2016
     • Solr: Solr 6.6.5
     • Java: OpenJDK 11.0.8, Oracle JDK 11.0.1
     • Alfresco Content Services: Alfresco Enterprise Edition (ACS) 6.2, Alfresco Community Edition 201911 GA
     • Release notes: https://hub.alfresco.com/t5/alfresco-content-services-blog/search-services-2-0-0-release/ba-p/301308
     [COMMUNITY, ENTERPRISE]
  39. shared.properties: 2.0.0.0 vs 2.0.0.1
     • 2.0.x: Suggestable Properties and Cross Locale fields are enabled, so spellcheck and tokenisation work by default. This may have an impact on the SOLR index.
     • 2.0.0.1: these settings were changed back to commented out by default, as in previous versions.
     [COMMUNITY, ENTERPRISE]
  40. One more thing... [COMMUNITY]
  41. Is my SOLR Index (fully) updated?
     • Find the id of the workspace SpacesStore: SELECT id FROM alf_store WHERE protocol='workspace' AND identifier='SpacesStore';
     • Count its nodes in the database (here the store id is 6): SELECT count(1) FROM alf_node WHERE store_id=6; → count 835
     • Compare with the SOLR core summary: http://127.0.0.1:8983/solr/admin/cores?action=summary&core=alfresco
     • More details: https://hub.alfresco.com/t5/alfresco-content-services-blog/how-to-track-the-progress-of-the-indexing-process-in-alfresco/ba-p/301444
     [COMMUNITY]
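The two database queries on this slide can be folded into one, so the store id does not have to be looked up and hard-coded first. A minimal Python sketch using an in-memory SQLite database; the two tables are hypothetical, heavily reduced versions of alf_store and alf_node (the real schema has many more columns), and the toy rows are invented for illustration. The combined statement is plain SQL with a scalar subquery:

```python
import sqlite3

# Hypothetical, heavily reduced versions of the two Alfresco tables (toy schema)
SCHEMA = """
CREATE TABLE alf_store (id INTEGER PRIMARY KEY, protocol TEXT, identifier TEXT);
CREATE TABLE alf_node (id INTEGER PRIMARY KEY, store_id INTEGER REFERENCES alf_store(id));
"""

# The two queries from the slide combined, so the store id need not be hard-coded
COUNT_SPACES_STORE_NODES = """
SELECT count(1) FROM alf_node
WHERE store_id = (SELECT id FROM alf_store
                  WHERE protocol = 'workspace' AND identifier = 'SpacesStore')
"""


def count_spaces_store_nodes(conn: sqlite3.Connection) -> int:
    return conn.execute(COUNT_SPACES_STORE_NODES).fetchone()[0]


def demo_connection() -> sqlite3.Connection:
    """Toy data: 3 nodes in workspace://SpacesStore and 2 in archive://SpacesStore."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO alf_store VALUES (?, ?, ?)",
                     [(5, "archive", "SpacesStore"), (6, "workspace", "SpacesStore")])
    conn.executemany("INSERT INTO alf_node (store_id) VALUES (?)", [(6,)] * 3 + [(5,)] * 2)
    return conn


if __name__ == "__main__":
    print(count_spaces_store_nodes(demo_connection()))  # 3 with the toy data
```

Filtering on protocol matters because the archive store shares the SpacesStore identifier; only workspace nodes should be compared with the live SOLR core.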
  42. Index Checker Tool (https://github.com/AlfrescoLabs/index-checker), available from Search Services 1.4.3
     Simple report:
         $ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=false --run.fix.actions=false
         Count SOLR documents = 814
         Count DB nodes = 815
         The database contains 2 nodes more than SOLR Index for {http://www.alfresco.org/model/content/1.0}category
         SOLR indexed 1 nodes more than the existing in database for {http://www.alfresco.org/model/content/1.0}content
         Count SOLR permissions = 58
         Count DB permissions = 58
  43. Index Checker Tool
     Detailed report (batches of 1,000 elements):
         $ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=true --run.fix.actions=false
         Count SOLR documents = 814
         Count DB nodes = 815
         The database contains 2 nodes more than SOLR Index for {http://www.alfresco.org/model/content/1.0}category
         TYPE {http://www.alfresco.org/model/content/1.0}category: DbIds present in DB but missed in SOLR [212, 213]
         SOLR indexed 1 nodes more than the existing in database for {http://www.alfresco.org/model/content/1.0}content
         TYPE {http://www.alfresco.org/model/content/1.0}content: DbIds present in SOLR but missed in DB [584]
         Count SOLR permissions = 58
         Count DB permissions = 58
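The core of the detailed report is a two-way set difference on DbIds. A minimal Python sketch of that comparison, assuming batches are DbId windows of 1,000 ids (the actual tool may batch differently, and compare_db_ids is a hypothetical helper, not its code); the toy ids echo the ones in the report above:

```python
from typing import Iterable, List, Tuple


def compare_db_ids(db_ids: Iterable[int], solr_ids: Iterable[int],
                   batch_size: int = 1000) -> Tuple[List[int], List[int]]:
    """Return (present in DB but missing in SOLR, present in SOLR but missing in DB).

    DbIds are compared window by window, each window covering `batch_size` ids.
    """
    db, solr = set(db_ids), set(solr_ids)
    upper = max(db | solr, default=-1)
    missing_in_solr: List[int] = []
    missing_in_db: List[int] = []
    for start in range(0, upper + 1, batch_size):
        window = range(start, start + batch_size)  # one batch of DbIds
        missing_in_solr += [i for i in window if i in db and i not in solr]
        missing_in_db += [i for i in window if i in solr and i not in db]
    return missing_in_solr, missing_in_db


if __name__ == "__main__":
    db_ids = [100, 212, 213, 300]   # toy data, not from a real repository
    solr_ids = [100, 300, 584]
    print(compare_db_ids(db_ids, solr_ids))  # ([212, 213], [584])
```

Working in fixed windows keeps each query to the database and to SOLR bounded in size, which matters far more on a real repository than on this toy input.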
  44. Index Checker Tool: Fix actions
         $ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=true --run.fix.actions=true
         Count SOLR documents = 814
         Count DB nodes = 815
         ...
     No Database Rows Were Harmed in the Fixing of This Solr Index
         $ java -jar target/index-checker-0.0.1-SNAPSHOT.jar --report.detailed=false --run.fix.actions=false
         Count SOLR documents = 815
         Count DB nodes = 815
     Watch the live demo at https://youtu.be/YU-WyNgCH2U
  45. Questions? Join us at https://hub.alfresco.com
