Alfresco Backup and Recovery Tool: a real world backup solution for Alfresco



Presentation used in the Alfresco Summit 2014 (both Barcelona and Boston).
If you want to see the demo visit:

White Paper and presentation video can be found here:

Published in: Technology
  • Multiple types of hazards can occur while a system is operating: hardware or software failures, data corruption, natural disasters, human errors, performance issues, etc., as well as planned or unplanned interruptions.
    What are Backup, Archiving and DR?
    • Backup: a copy of data to restore in case of loss.
    • Archiving: moving data that is no longer used or required to separate storage.
    • DR: the process, policies and procedures for recovery or service continuation after a disaster.
    Business continuity: the financial impact to the business when the system is unavailable, performing poorly or corrupted.
    A Backup and DR strategy must be designed based on these metrics:
    • The time between backups is called the Recovery Point Objective (RPO).
    • The time taken to restore the application and make it available is called the Recovery Time Objective (RTO).
    Who needs to make the decision?
  • There are different backup levels:
    • Full backup: a complete copy of all of the files. This backup tends to be slow and is typically performed as the first backup or at a regular interval.
    • Incremental: only the changes since the last backup are backed up. Faster to back up than cumulative, but can be slower to restore because there may be more files to restore.
    • Cumulative or differential: only the changes since the most recent full backup are copied. This method may be slower than incremental but is usually faster to restore.
    Types of backup techniques depending on system availability:
    • Cold: a complete backup of all components of Alfresco with the entire system shut down.
    • Warm: backup performed while some Alfresco services are unavailable, e.g. with the repository set to read-only mode.
    • Hot: backup performed while the system is running and potentially being used.
    Other concepts to take into account:
    • Backup window: the time slot in which the backup runs. With Alfresco it depends on the type of backup chosen.
    • Backup rotation: how often incremental backups run between periodic full backups; daily, weekly or monthly are most common.
    • Backup destination: network device (NAS, Amazon S3, SCP, FTP, etc.), SAN, disk to tape, disk to disk. Each backup method suits different solutions, depending on the amount of data to back up. For disaster recovery consider using a remote backup method.
  • Geo or not geo. A disaster recovery plan and environment can be implemented in different ways:
    • Disaster recovery deployment with full capacity: backups and configuration are replicated to a target deployment with existing hardware and software that has the same capacity as the original one.
    • Disaster recovery deployment with reduced capacity: backups and configuration are replicated to a target deployment with existing hardware and software but with less capacity than the original one.
    • Data disaster recovery only: backups and configuration are replicated without hardware or software deployed.
    The Alfresco subscription includes a stand-by disaster recovery environment.
  • Static data:
    • Operating system (not covered by this procedure).
    • Application server installation and configuration files.
    • Database installation files (if the database is on the same server, which is not recommended).
    • Alfresco extensions (customizations).
    • 3rd party applications used by Alfresco (Open Office, ImageMagick, SWFTools).
    Dynamic data:
    • Alfresco indexes (Solr or Lucene).
    • Database (RDBMS data files, table spaces, archive logs and control files).
    • Alfresco content stores: the default one and any additional store used by the Content Store Selector. Content Store Deleted is not required.
    Indexes should be backed up first. If new rows are added to the database after the Lucene/SOLR backup is done, it is still possible to regenerate the missing Lucene/SOLR indexes from the SQL transaction data. The database backup should be performed next. If a SQL node points to a missing file, that node will be an orphan. If a file has no SQL node data, that file will not be included in the backup. DB tools for backup: talk about
  • Also called sets (Alfresco BART)
  • When a user deletes a node in Alfresco, its content remains in the trashcan indefinitely unless the user or the administrator empties the trashcan. Once the trashcan is emptied, the content is marked as orphaned (deleted) and after 14 days it is moved to contentstore.deleted. (See section “Other scheduled jobs to consider on a backup strategy” for more details).
  • The procedures and concepts covered by this guide can be used to implement your disaster recovery plan as an asynchronous procedure.
    • DB solutions
    • Storage solutions
    • AWS
  • Duplicity incrementally backs up files and folders into tar-format volumes encrypted with GnuPG and places them to a remote (or local) storage backend. See chapter URL FORMAT for a list of all supported backends and how to address them. Because duplicity uses librsync, incremental backups are space efficient and only record the parts of files that have changed since the last backup. Currently duplicity supports deleted files, full Unix permissions, uid/gid, directories, symbolic links, fifos, etc., but not hard links.
  • 5AM
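
The trade-off between incremental and cumulative (differential) backups described in the notes above can be illustrated with a minimal Python sketch. It is not part of Alfresco BART: the `Backup` class, the `restore_chain` function and the day-number timeline are hypothetical names introduced only for illustration. It computes which backups are needed to restore a given day under each strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on a hypothetical timeline
    kind: str  # "full", "incremental" or "differential"


def restore_chain(backups, target_day):
    """Return, in order, the backups needed to restore the state at target_day."""
    history = sorted((b for b in backups if b.day <= target_day),
                     key=lambda b: b.day)
    fulls = [b for b in history if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target day")
    last_full = fulls[-1]
    after_full = [b for b in history if b.day > last_full.day]
    chain = [last_full]
    # A differential holds every change since the last full backup,
    # so only the newest one is needed and it supersedes earlier backups.
    diffs = [b for b in after_full if b.kind == "differential"]
    if diffs:
        last_diff = diffs[-1]
        chain.append(last_diff)
        after_full = [b for b in after_full if b.day > last_diff.day]
    # Each remaining incremental only holds changes since the previous backup,
    # so all of them must be replayed in order.
    chain.extend(b for b in after_full if b.kind == "incremental")
    return chain


# One full backup on day 0, then six daily backups of each type.
incremental_week = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 7)]
differential_week = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 7)]

print(len(restore_chain(incremental_week, 6)))   # full + every incremental
print(len(restore_chain(differential_week, 6)))  # full + newest differential
```

With incrementals the restore has to replay the full backup plus every daily backup, while with differentials it needs only the full backup and the most recent differential, which is why the latter is usually faster to restore.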

    1. Alfresco Backup and Recovery Tool: a real world backup solution. November 2013. Toni de la Fuente – Alfresco Senior Solutions Engineer - @toniblyx. @toniblyx at #SummitNow
    2. Who am I? @toniblyx • Alfresco Senior Solutions Engineer, Americas • Working with Alfresco for 6 years • 3.5 years as part of the team • Former Consultant & Security Auditor: ethical hacking, penetration tests. • And writing at since 2002
    3. Agenda • Foundation Concepts • Alfresco Backup Overview • Backup and Restore Alfresco with Alfresco BART
    4. White Paper Status: • Draft • On review
    5. Backup and Disaster Recovery • Backup, Archiving, Disaster Recovery • Why? • Business impact • RPO and RTO
    6. Backup and Disaster Recovery • Methods • Full, incremental, differential • Techniques • Cold, warm, hot • Window, rotation, destination
    7. Geo Disaster Recovery • DR Primary Disaster Recovery preparedness: Backup • Daily content backups • Backup servers on standby (active/passive) • Regular backup and restore testing
    8. Backup Procedure and Methods • What? • Static • Dynamic • Order • Cold • Warm • Hot
    9. Alfresco Backup Overview • Components (diagram labels: Physical Storage; Lucene or SOLR; Relational Database; File System; Installation, Config and logs files) • Scheduled jobs • Other scheduled jobs to consider: system.content.orphanCleanup.cronExpression=0 0 4 * * ? ; system.content.orphanProtectDays=14 ; system.content.eagerOrphanCleanup=false
    10. Restore Procedure - User • Trashcan
    11. Restore Procedure – Sys Admin 1. Installation 2. Configuration 3. Customization 4. DB 5. Content Store 6. Indexes
    12. Disaster Recovery • DR • Active - Active • Active - Passive
    13. Alfresco BART • Description • Features • Concepts • Installation • Usage • Disaster Recovery • Demo • TODO
    14. Alfresco BART - Description • Alfresco Backup and Recovery Tool, written in shell script on top of Duplicity (Linux servers). • Local file system, FTP, SCP or Amazon S3. • Indexes, database, content store, and deployment and configuration files. • v0.2 at
    15. Alfresco BART - Features Bash + properties Recovery commands & wizard Backup Policies pg_dump mysqldump Indexes: Lucene / Solr Duplicity Oracle imp/exp librsync Log Reporting S3, FTP, SCP, Local Custom Tape Volume Size Pip Single File Recovery Cluster Aware Boto Fabric NcFTP Major DBs Supported Content Store Selector Aware Python GnuPG Full / Incremental Backup Geo Disaster Recovery Compress Encryption
    16. Alfresco BART - Concepts • Full, incremental • Backup sets • all • index (backup & config) • db • Cs • files • Dates • now, s, m, h, D, W, M or Y • YYYY/MM/DD, YYYY-MM-DD, MM/DD/YYYY, or MM-DD-YYYY
    17. Alfresco BART - Installation • Dependencies • python + duplicity + DB dump/export • Create PGP key • Copy files to Alfresco “scripts” dir • Configure • Add to crontab* • *"0 5 * * * /path/to/ backup"
    18. Alfresco BART - Usage Modes of work: • Backup: runs an incremental backup or a full if first time • Restore: runs the restore, wizard if no arguments • Verify: verifies the latest backup with current files • Collection: shows all the backup sets in the archive sorted by date and type (full or inc) • List: lists the files currently backed up in the archive
    19. Alfresco BART – Disaster Recovery Procedure 1. *Backup destination must be remote. 2. Install and configure Alfresco BART as on the source server. 3. Copy the directory ~/.gnupg from the original server to the new one (if gpg encryption is used). 4. Run the recovery wizard or command as usual. 5. Enjoy restoring your disaster recovery environment.
    20. Demo – Let’s Rock & Roll m/
    21. Alfresco BART - TODO • Documentation • Validators • PostgreSQL and Oracle single repo file recovery • Admin panel configuration • Suggestions?
    22. FAQ • Windows Support? • Really large repositories? • Alfresco versions supported
    23. Any questions?
    24. Conclusions The backup thing may not be the most amazing task but… it could always be worse!!
    25. if [ "$you" = applause ]; then echo "THANKS!"; fi Toni de la Fuente Alfresco Senior Solutions Engineer Blog: Twitter: @ToniBlyx
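
The interval date formats listed on the Concepts slide (now, s, m, h, D, W, M or Y) can be illustrated with a short Python sketch. It assumes the intervals follow Duplicity's convention, where a number is followed by a unit letter, pairs can be chained, a month counts as 30 days and a year as 365 days. The function names are hypothetical and are not part of Alfresco BART or Duplicity.

```python
import re
from datetime import datetime, timedelta

# Seconds per unit, assuming Duplicity's simplified calendar
# (month = 30 days, year = 365 days).
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}
_PAIR = re.compile(r"(\d+)([smhDWMY])")


def interval_to_timedelta(spec):
    """Convert an interval such as '3D' or '1h78m' into a timedelta."""
    position = 0
    total_seconds = 0
    for match in _PAIR.finditer(spec):
        if match.start() != position:
            raise ValueError(f"unexpected text in interval: {spec!r}")
        total_seconds += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(spec):
        raise ValueError(f"not a valid interval: {spec!r}")
    return timedelta(seconds=total_seconds)


def restore_point(spec, now):
    """Return the point in time that 'now' or an interval refers to."""
    if spec == "now":
        return now
    return now - interval_to_timedelta(spec)


reference = datetime(2014, 1, 8, 5, 0)
print(restore_point("now", reference))
print(restore_point("1W", reference))
print(restore_point("1h78m", reference))
```

Under these assumptions an interval always refers to a moment that far before the current time, so "1W" means the state of the backup one week ago.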