This presentation covers the Alfresco node lifecycle, including which Alfresco database tables are affected by node operations such as creation and deletion. It also explains which of the similarly named, case-sensitive Alfresco service beans to use (nodeService vs NodeService, searchService vs SearchService) in order to keep your application secure. Finally, it covers zones in Alfresco (authentication-related zones and application-related zones).
3. NodeService vs nodeService
• When you inject Alfresco services, it is
a best practice (highly recommended) to
use the beans whose names start with an
upper-case letter (e.g. NodeService)
instead of nodeService.
• Reason: nodeService bypasses the security
and transaction checks and performs the
operation directly on the node.
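The difference can be pictured as a wrapper around the raw bean. The following is a minimal sketch in plain Java, not Alfresco code: `NodeOps`, `RawNodeOps` and `SecuredNodeOps` are hypothetical names, and the single permission check stands in for whatever security and transaction handling the public bean adds.

```java
import java.util.Set;

final class PublicServiceSketch {

    /** Hypothetical stand-in for a node service interface (not the Alfresco API). */
    interface NodeOps {
        String delete(String nodeRef, String user);
    }

    /** Plays the role of the lower-case bean: performs the operation, no checks. */
    static final class RawNodeOps implements NodeOps {
        @Override
        public String delete(String nodeRef, String user) {
            return "deleted " + nodeRef;
        }
    }

    /** Plays the role of the upper-case bean: same interface behind a permission check. */
    static final class SecuredNodeOps implements NodeOps {
        private final NodeOps target;
        private final Set<String> allowedUsers;

        SecuredNodeOps(NodeOps target, Set<String> allowedUsers) {
            this.target = target;
            this.allowedUsers = allowedUsers;
        }

        @Override
        public String delete(String nodeRef, String user) {
            if (!allowedUsers.contains(user)) {
                throw new SecurityException("Access denied for " + user);
            }
            return target.delete(nodeRef, user); // delegate only after the check
        }
    }

    public static void main(String[] args) {
        NodeOps raw = new RawNodeOps();
        NodeOps pub = new SecuredNodeOps(raw, Set.of("admin"));
        System.out.println(raw.delete("node-1", "guest")); // no check is made
        System.out.println(pub.delete("node-1", "admin")); // check passes
        try {
            pub.delete("node-1", "guest");
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Code that holds only the wrapped reference cannot skip the check, which is the reason for injecting the upper-case bean.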
4. Files involved
• public-services-context.xml - where NodeService
is defined (these are the public beans that
client code should use)
• node-services-context.xml - where the actual
nodeService is defined.
• core-services-context.xml - defines the bean for
registryService, into which NodeService and
SearchService are injected.
5. Alfresco Node Lifecycle
• The node can be called the heart of
the Alfresco content repository.
• The content repository is basically composed of
three important components -
– Database (containing node metadata)
– File system (storing the actual content)
– Indexes (containing node information from both the
database and the file system)
• If indexes are lost or corrupted, they can be
rebuilt using reindexing techniques.
6. Step 1 : Creation of node
• User creates a node in Alfresco.
• Content gets created at <alf_dir>/alf_data/contentstore/<date-time>/content_uuid.bin
• NOTE: This uuid is not the same as the nodeRef of the content.
• The nodeRef of the content is stored in the database; the node entry
is added to the tables alf_content_data, alf_content_url and alf_node.
• alf_content_url stores the content url and the content
nodeRef (short name).
• alf_node holds the full nodeRef of the content along with
the store id (6), which stands for workspace://SpacesStore.
• The search index will be created under
alf_data/lucene-indexes/workspace/SpacesStore.
• For the Solr search engine, the indexes will be created under
alf_data/solr/workspace/SpacesStore/index.
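To make the relation between a content url and the file on disk concrete, here is a minimal sketch in plain Java. It assumes the <date-time> part is a year/month/day/hour/minute folder chain and that content urls start with `store://`; both are assumptions to verify against your installation, and `ContentUrlSketch` is a hypothetical class, not part of Alfresco.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.UUID;

final class ContentUrlSketch {
    static final String PROTOCOL = "store://"; // assumed url prefix

    /** Builds a content url of the assumed form store://yyyy/M/d/H/m/<uuid>.bin. */
    static String newContentUrl(LocalDateTime created, UUID contentId) {
        // contentId names the file only; it is not the uuid of the node.
        return PROTOCOL + created.getYear() + "/" + created.getMonthValue() + "/"
                + created.getDayOfMonth() + "/" + created.getHour() + "/"
                + created.getMinute() + "/" + contentId + ".bin";
    }

    /** Resolves a content url against the root folder of the content store. */
    static Path toFile(Path storeRoot, String contentUrl) {
        if (!contentUrl.startsWith(PROTOCOL)) {
            throw new IllegalArgumentException("Not a store url: " + contentUrl);
        }
        return storeRoot.resolve(contentUrl.substring(PROTOCOL.length()));
    }

    public static void main(String[] args) {
        String url = newContentUrl(LocalDateTime.now(), UUID.randomUUID());
        System.out.println(url + " -> " + toFile(Paths.get("alf_data", "contentstore"), url));
    }
}
```

The node's own uuid never appears in the path, which is why the file name cannot be used to look up the node directly.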
7. Step 2 : Deletion of node
• User deletes the node.
• After deletion (from any UI: DM, Share or another interface), the content
file stays exactly where it was (alf_data/contentstore).
• In the db (alf_node table), the node is marked as living in a different store
(archive://SpacesStore, which has store id 5).
• The node also remains in the search index, but it is now
moved to the archive store
(alf_data/lucene-indexes/archive/SpacesStore).
• For Solr search, it is moved to
alf_data/solr/archive/SpacesStore/index.
• At this point, if a user goes to 'Manage deleted items' from the user
profile and restores the item, the node moves back to
workspace://SpacesStore.
• The db store id changes back from 5 to 6.
• And the index entry moves back from
alf_data/lucene-indexes/archive/SpacesStore to
alf_data/lucene-indexes/workspace/SpacesStore.
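The delete and restore steps boil down to switching the store of a node while its content url stays untouched. Below is a minimal in-memory sketch in plain Java; the `Node` class is a hypothetical simplification of an alf_node row, not the Alfresco data model.

```java
final class ArchiveRestoreSketch {
    static final String WORKSPACE = "workspace://SpacesStore";
    static final String ARCHIVE = "archive://SpacesStore";

    /** Hypothetical in-memory view of one alf_node row plus its content url. */
    static final class Node {
        final String uuid;
        final String contentUrl; // never changes on delete or restore
        String store = WORKSPACE;

        Node(String uuid, String contentUrl) {
            this.uuid = uuid;
            this.contentUrl = contentUrl;
        }

        String nodeRef() {
            return store + "/" + uuid;
        }
    }

    /** Step 2: the node moves to the archive store (the trashcan). */
    static void delete(Node node) {
        if (!WORKSPACE.equals(node.store)) {
            throw new IllegalStateException("Not a live node: " + node.nodeRef());
        }
        node.store = ARCHIVE;
    }

    /** 'Manage deleted items' > restore: the node moves back. */
    static void restore(Node node) {
        if (!ARCHIVE.equals(node.store)) {
            throw new IllegalStateException("Not in the trashcan: " + node.nodeRef());
        }
        node.store = WORKSPACE;
    }

    public static void main(String[] args) {
        Node node = new Node("abc-123", "store://2013/5/14/10/30/file.bin");
        delete(node);
        System.out.println(node.nodeRef() + " " + node.contentUrl);
        restore(node);
        System.out.println(node.nodeRef() + " " + node.contentUrl);
    }
}
```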
8. Step 3: Empty the trashcan
• User empties the trashcan.
• Let's assume the user empties the trashcan 30 days after deleting that node.
• What happens now?
• File system: The content file stays in the same place.
• DB: The node is not deleted yet; it is only marked as deleted.
• The alf_node table has a field named 'node_deleted' which is set to '1' to
indicate that it is a deleted node.
• NOTE: From Alfresco 4.1.x onwards, the node_deleted column is removed
from the table and the sys:deleted type is applied to identify deleted nodes.
• Alfresco now considers any related content file found in the file-system
content-store as 'orphan'.
• At this point where the 'node_deleted' field becomes '1', the orphan is
declared by updating the 'orphan_time' field in the table alf_content_url from
null to current_timestamp.
• This is done to quickly identify the orphaned items later on.
• From now on, all db queries made by Alfresco only read the rows where
node_deleted = 0.
• The search index no longer contains this node. It is removed from all search
indexes (i.e. it cannot be found either in workspace/SpacesStore or in
archive/SpacesStore).
9. Step 4: Node's last breath
• An orphan-cleaner job (a scheduler) runs; this is the contentStoreCleanerTrigger.
• By default, it runs at 4 AM every night.
• The orphan cleaner doesn't act on the orphans immediately. It waits for a
period of x protected days.
• That is, it queries the table (alf_content_url) for orphan_time values that are more than
14 days old.
• The value of x protected days is read from content-services-context.xml, which in turn
takes it from repository.properties.
• It does not actually delete the content files, instead it simply moves them out of the
'/alf_data/contentstore' folder location and into the folder
'alf_data/contentstore.deleted'.
• After moving an orphaned content file out of the active content-store, the relevant
line/row in alf_content_url table is deleted from the DB.
• Orphaned content files that have been deleted from the content store, sit around the
'contentstore.deleted' folder forever... until a system administrator either backs it up,
moves it, or deletes it.
• So, the content is finally gone from the active content store (moved from contentstore to contentstore.deleted).
• DB: The node is still unchanged in the alf_node table (however, the reference in alf_content_url
is removed).
• Search index: The node doesn't exist in any search index.
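The cleaner's behaviour described above (wait out the protected period, move rather than delete, then drop the row) can be sketched as follows. This is plain Java and not the Alfresco implementation: the map is a hypothetical stand-in for alf_content_url, and keeping the same relative path under contentstore.deleted is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class OrphanCleanerSketch {

    /**
     * orphanTimes maps a relative content path to its orphan_time
     * (null while the content is still in use). It must be a mutable map
     * that accepts null values, such as a HashMap.
     * Returns the paths that were moved; their rows are removed from the map.
     */
    static List<String> clean(Map<String, Instant> orphanTimes, Path store,
                              Path deletedStore, int protectDays, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(protectDays));
        List<String> moved = new ArrayList<>();
        Iterator<Map.Entry<String, Instant>> rows = orphanTimes.entrySet().iterator();
        while (rows.hasNext()) {
            Map.Entry<String, Instant> row = rows.next();
            String relativePath = row.getKey();
            Instant orphanTime = row.getValue();
            if (orphanTime == null || !orphanTime.isBefore(cutoff)) {
                continue; // still referenced, or still inside the protected period
            }
            try {
                Path target = deletedStore.resolve(relativePath);
                Files.createDirectories(target.getParent());
                Files.move(store.resolve(relativePath), target); // moved, not deleted
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            rows.remove(); // drop the row only after the file has been moved
            moved.add(relativePath);
        }
        return moved;
    }
}
```

Calling `clean(rows, contentstore, contentstoreDeleted, 14, Instant.now())` leaves untouched any row whose orphan_time is null or less than 14 days old, which mirrors the protected period.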
10. Step 5 : Removal from db
• A scheduled job (nodeServiceCleanupTrigger) runs at 9 PM
every day to clean the db.
• After 30 days from when the 'node_deleted' field was set to '1', this
process considers it safe to truly delete the node.
• Note: it doesn't use the audit_modified date, since this wasn't
changed when the row was marked for deletion. Instead, it uses the
commit_time_ms transaction time from the alf_transaction table.
• Note: this job also removes old transactions from the alf_transaction
table. Transactions are considered old using the same threshold as
the node removal work: 30 days, defined by the property
'index.tracking.minRecordPurgeAgeDays'.
• So, finally, the node is gone from:
• File system : Gone
• DB : Gone
• Index : Gone
• So, 14 days after a node is removed from the archive store, its content is
taken out of the content store on the file system, and after a further
16 days (approx) it is finally removed from the database too.
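The timeline can be checked with a few lines of date arithmetic. This is a minimal sketch in plain Java using the two default values quoted above (14 protected days, 30 days for 'index.tracking.minRecordPurgeAgeDays'); the method names are hypothetical, and the dates are the earliest possible ones, since the real jobs only act at their next scheduled run.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class PurgeTimelineSketch {

    /** Earliest day the orphan cleaner may move the content file. */
    static LocalDate earliestContentMove(LocalDate trashcanEmptied, int orphanProtectDays) {
        return trashcanEmptied.plusDays(orphanProtectDays);
    }

    /** Earliest day the db cleanup job may remove the node row. */
    static LocalDate earliestDbPurge(LocalDate trashcanEmptied, int minRecordPurgeAgeDays) {
        return trashcanEmptied.plusDays(minRecordPurgeAgeDays);
    }

    public static void main(String[] args) {
        LocalDate emptied = LocalDate.of(2013, 6, 1);
        LocalDate move = earliestContentMove(emptied, 14);
        LocalDate purge = earliestDbPurge(emptied, 30);
        long gap = ChronoUnit.DAYS.between(move, purge);
        System.out.println("content moved: " + move + ", db purged: " + purge
                + ", gap in days: " + gap);
    }
}
```

With the defaults, the gap between the two steps is 30 - 14 = 16 days, give or take the time of day at which each job runs.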
12. Zones
• Zones are used for classification of authorities.
• For example, Alfresco synchronization uses zones to record
from which LDAP server users and groups have been
synchronized.
• Zones are used to hide some groups that provide Role
Based Access Control (RBAC) role-like functionality from
the administration pages of the Alfresco Explorer and
Alfresco Share web clients.
• Examples of hidden groups are the roles used in
Alfresco Share.
• Only users and groups present in the default zone are
shown on the Alfresco Explorer/Share administration
pages.
• Every user and group in Alfresco falls into one
or more zones.
13.
• Zones cannot be managed from the administration pages of Alfresco
Explorer and Share.
• Zones are grouped into two areas: Application-related zones and
authentication-related zones.
• Application-related zones are prefixed with APP whereas
authentication-related zones are prefixed with AUTH.
• To view them: Node Browser > workspace://SpacesStore > System >
zones.
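Since the area of a zone is encoded in its name prefix, it can be derived with a simple string check. A minimal sketch in plain Java follows; `ZoneKindSketch` and its enum are hypothetical helpers, not Alfresco classes.

```java
final class ZoneKindSketch {

    enum Kind { APPLICATION, AUTHENTICATION, UNKNOWN }

    /** Classifies a zone by its name prefix (APP. or AUTH.). */
    static Kind kindOf(String zoneName) {
        if (zoneName.startsWith("APP.")) {
            return Kind.APPLICATION;
        }
        if (zoneName.startsWith("AUTH.")) {
            return Kind.AUTHENTICATION;
        }
        return Kind.UNKNOWN; // not one of the two documented areas
    }

    public static void main(String[] args) {
        System.out.println(kindOf("APP.DEFAULT"));
        System.out.println(kindOf("AUTH.ALF"));
    }
}
```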
14.
• AuthorityContainer and Person are sub-classes of
Authority and as such can be in any number of zones.
• Example: APP.SHARE (a zone) > GROUP_site_oreilly-clms
(and all the roles you see there), which is an
authorityContainer.
• Inside an authorityContainer (group) there are
members (persons).
15. Application-related zones
• Application-related zones, other than the default
(APP.DEFAULT), hide groups that implement RBAC-like
roles (e.g. APP.SHARE, APP.RM).
• APP.DEFAULT is for person and group nodes that should be
found by a normal search (through DM and Share).
• By default, every user and group you create in
Alfresco DM/Share belongs to this default zone.
• Also, ALFRESCO_ADMINS and EMAIL_CONTRIB..
(OOTB groups) belong to the default zone.
• But oreilly-clms_SiteManager, SiteContributor, etc. are
roles, so they don't belong to this default zone.
• APP.SHARE is for hidden authorities related to Alfresco
Share.
• APP.RM is added for authorities related to Records Management (RM).
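The visibility rule (only authorities in APP.DEFAULT show up on the administration pages) amounts to a filter over zone memberships. Here is a minimal sketch in plain Java; the map of authority name to zones is a hypothetical in-memory stand-in, and 'jdoe' is an invented example user.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class DefaultZoneFilterSketch {
    static final String DEFAULT_ZONE = "APP.DEFAULT";

    /** Returns, in sorted order, the authorities that belong to the default zone. */
    static List<String> visibleInAdminPages(Map<String, Set<String>> zonesByAuthority) {
        return zonesByAuthority.entrySet().stream()
                .filter(entry -> entry.getValue().contains(DEFAULT_ZONE))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> zones = Map.of(
                "GROUP_ALFRESCO_ADMINS", Set.of("APP.DEFAULT", "AUTH.ALF"),
                "GROUP_site_oreilly-clms_SiteManager", Set.of("APP.SHARE", "AUTH.ALF"),
                "jdoe", Set.of("APP.DEFAULT", "AUTH.ALF")); // hypothetical user
        System.out.println(visibleInAdminPages(zones));
    }
}
```

The site role group sits only in APP.SHARE, so it is filtered out, while the admin group and the user remain visible.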
16. Authentication-related zones
• Authentication-related zones are where roles come into the picture,
because the role of a person or group authenticates them to access a
resource.
• AUTH.ALF is for authorities defined within Alfresco and not
synchronized from an external source. This is the default zone for
authentication.
• Ex: Go to the AUTH.ALF zone. It shows the authorityContainers and
the users belonging to them, but not the ones
synced from LDAP.
• AUTH.EXT.<ID> is for authorities defined externally, such as in
LDAP.
• Ex: Go to AUTH.EXT.supplierLDAP and AUTH.EXT.internalLDAP.
• They show the users and groups that were synced from LDAP
(depending on which sync was run: full or differential).
• More on LDAP sync :
• http://wiki.alfresco.com/wiki/The_Synchronization_Subsystem
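Given the AUTH.ALF and AUTH.EXT.<ID> naming above, the source of an authority can be read back from its authentication zone. A minimal sketch in plain Java; `AuthZoneSketch` is a hypothetical helper, not an Alfresco class.

```java
import java.util.Optional;

final class AuthZoneSketch {
    static final String LOCAL_ZONE = "AUTH.ALF";
    static final String EXTERNAL_PREFIX = "AUTH.EXT.";

    /** True for authorities defined within Alfresco itself. */
    static boolean isLocal(String zoneName) {
        return LOCAL_ZONE.equals(zoneName);
    }

    /** Returns the <ID> part of AUTH.EXT.<ID>, or empty for any other zone. */
    static Optional<String> externalSourceId(String zoneName) {
        if (zoneName.startsWith(EXTERNAL_PREFIX)
                && zoneName.length() > EXTERNAL_PREFIX.length()) {
            return Optional.of(zoneName.substring(EXTERNAL_PREFIX.length()));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(isLocal("AUTH.ALF"));
        System.out.println(externalSourceId("AUTH.EXT.supplierLDAP").orElse("(not external)"));
    }
}
```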