
Storage and Alfresco


  1. 1. Storage Foundation and Alfresco
     Toni de la Fuente, Principal Solutions Engineer, Americas
     toni.delafuente@alfresco.com
     Blog: blyx.com – Twitter: @ToniBlyx
  2. 2. Agenda
     •  Intro to Storage Concepts
     •  Hardware
     •  Alfresco Storage Related Solutions
        –  Alfresco S3
           •  Caching contentstore
        –  Alfresco XAM
        –  Content Store Selector
        –  Replication / Geo-clusters / Redundancy
     •  Partners Solutions
        –  Alf2CAS, Star Storage
     •  Storage Best Practices with Alfresco
     •  Backup and Recovery
  3. 3. Intro to Storage Concepts: stack (top to bottom)
     •  File Protocol: NFS, CIFS, SMB
     •  File System: Ext3, Ext4, ReiserFS, XFS, GFS, NTFS, FAT32, GlusterFS, OCFS, ZFS
     •  Block Management: MDM, LVM (Logical Volume Management)
     •  Block Protocol: SCSI, SATA, FC
     •  RAID (HW or SW): Mirrors, Stripes
     •  Hardware: Disks, connectors, racks, FC switches
  4. 4. Intro to Storage Concepts
     •  Hard drive types and interfaces
        –  PATA: Parallel Advanced Technology Attachment
           •  AKA IDE or EIDE; older, 40-pin connector, less efficient, used to run at 4K – 5K rpm.
        –  SATA: Serial ATA
           •  Similar to PATA, different connector, more energy efficient, between 5K and 10K rpm.
        –  SCSI: Small Computer System Interface
           •  Spin at 10K and 15K rpm, need a controller.
        –  SSD: Solid State Drives
           •  No mechanical parts, semiconductor based; much faster than mechanical drives and less likely to break down.
  5. 5. Intro to Storage Concepts
     •  Hard drive types and interfaces
        –  FC: Fibre Channel
           •  Successor to parallel SCSI, with broader usage than mere disk interfaces; used for SANs.
        –  SAS: Serial Attached SCSI
           •  Similar to SCSI but serial rather than parallel.
        –  Other, end-user-oriented interfaces:
           •  USB
           •  Firewire
           •  Thunderbolt
     •  CAS: content-addressable storage, a mechanism for storing information that can be retrieved based on its content, not its storage location. (EMC Centera / Caringo)
     •  XAM: standard interface for archiving in CAS.
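The CAS idea on this slide (retrieve by content, not by location) can be illustrated with a toy sketch. This is not how Centera or Caringo compute addresses; the class name and the choice of SHA-256 are assumptions made only for illustration.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS (hypothetical): the address of a blob is a hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, not from a path or LUN.
        address = hashlib.sha256(data).hexdigest()
        # Storing identical content twice keeps a single copy.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


store = ContentAddressableStore()
addr1 = store.put(b"quarterly report")
addr2 = store.put(b"quarterly report")  # same bytes, so same address
```

Because the address depends only on the bytes, identical documents collapse to one stored object, and any change to the content yields a different address.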
  6. 6. Intro to Storage Concepts
     •  RAID types (SW or HW) ← Faster with parity
  7. 7. Intro to Storage Concepts
     Main differences between SAN and NAS
     A SAN is a shared "network" of storage
     •  Block access to LUNs
     •  Online and offline storage
     •  SAN device = storage array
     •  Zoning: data integrity and security
     •  Dedicated fiber network
     Protocols:
     •  SCSI over Fibre Channel
     •  SCSI over IP/Ethernet (iSCSI) and FC, Infiniband
     NAS is a file system shared over a network
     •  File access to data
     •  Online storage only
     •  NAS device = file server or "filer", already formatted
     Protocols:
     •  NFS, CIFS over IP over Ethernet
  8. 8. Intro to Storage Concepts
     Who needs a SAN?
     •  Database servers and ECM: Oracle, SQL Server, DB2 and other database servers.
     •  File servers: Using SAN-based storage for file servers lets you expand file server resources quickly, makes them run better, and enables you to manage your file-based NAS storage through the SAN.
     •  Backup servers: SAN-based backup is dramatically faster than LAN-based backup.
     •  Voice/video servers: Manage large amounts of data very quickly.
     •  High-performance application servers: Applications such as document management, customer relationship management, billing, data warehouses, and other high-performance and critical applications all benefit from what a SAN can provide.
  9. 9. Intro to Storage Concepts
     •  Evolution
        –  Internal Storage
        –  Direct-Attach Storage (DAS)
        –  Network-Attached Storage (NAS)
  10. 10. Hardware
     •  HBA card
     •  Tape library
     •  Fibre cables
     •  Storage arrays
  11. 11. Alfresco Storage Related Solutions
     Alfresco S3 Connector
     •  An alternative contentstore implementation that uses S3 directly (S3 APIs)
     •  Somewhat equivalent to XAM, but not identical
        –  Unlike XAM, S3 doesn't offer retention policies
     •  Enterprise only
        –  USD10K for Alfresco Standard
        –  USD13.4K for Alfresco Enterprise
     •  Shipped as a single repo-side AMP
     •  Can only be installed into a new Alfresco instance (no migration!)
     •  Configuration must be done before first start.
     •  Can also configure caching content store (default cache size: 50GB)
     •  Only supported if Alfresco is running on Amazon EC2
     •  Amazon EBS still required for database files, indexes, etc.
     •  Does not support S3 Encryption yet.
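The caching content store mentioned on this slide sits in front of the slower remote store. The sketch below shows the general read-through idea with a bounded cache. It is not Alfresco's implementation: the class names, the in-memory backend, and the least-recently-used eviction policy are all assumptions for illustration.

```python
from collections import OrderedDict


class SlowBackend:
    """Hypothetical stand-in for a remote store such as S3 (in-memory here)."""

    def __init__(self):
        self.objects = {}
        self.reads = 0  # counts how often the slow path is taken

    def read(self, key):
        self.reads += 1
        return self.objects[key]

    def write(self, key, data):
        self.objects[key] = data


class CachingStore:
    """Read-through cache bounded by total bytes; evicts least recently used entries."""

    def __init__(self, backend, max_bytes):
        self.backend = backend
        self.max_bytes = max_bytes
        self._cache = OrderedDict()
        self._size = 0

    def read(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as recently used
            return self._cache[key]
        data = self.backend.read(key)     # cache miss: go to the backend
        self._remember(key, data)
        return data

    def write(self, key, data):
        self.backend.write(key, data)
        self._remember(key, data)

    def _remember(self, key, data):
        if key in self._cache:
            self._size -= len(self._cache.pop(key))
        if len(data) > self.max_bytes:
            return  # larger than the whole cache: do not cache it
        self._cache[key] = data
        self._size += len(data)
        while self._size > self.max_bytes:
            _, evicted = self._cache.popitem(last=False)
            self._size -= len(evicted)


backend = SlowBackend()
store = CachingStore(backend, max_bytes=10)
store.write("a", b"12345")
store.write("b", b"12345")
hit = store.read("a")        # served from the cache
store.write("c", b"12345")   # cache is full, so "b" (least recently used) is evicted
miss = store.read("b")       # has to go back to the backend
```

The cache size plays the role of the 50GB default on the slide: reads of recently used content avoid the remote round trip, while everything remains stored in the backend.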
  12. 12. Alfresco Storage Related Solutions
     Alfresco XAM Connector (deprecated)
     •  Made to give Alfresco access to XAM-enabled storage devices.
     •  New XAM connector available
     •  Only EMC Centera supported
     •  Released with 3.4, Jan 2011.
     •  Enterprise only
     •  Still being supported for existing customers
        –  until November 30th 2014 or until their current subscription runs out, whichever comes first.
  13. 13. Alfresco Storage Related Solutions
     Content Store Selector
     •  Storage policies based on business rules
     •  Since Alfresco 3.2
     •  Examples
        o  By type: Large video files on fast, expensive drives. Office documents on slower, more cost-effective drives.
        o  By business unit, by age, by usage, by ...
     •  Leverage Rules and Actions to drive
     [Diagram: policy rules route content to storage tiers: SSD ($$$), FC drives ($$), SATA drive ($). SSD = Solid State Drives, FC = Fibre Channel.]
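The "by type" example on this slide amounts to an ordered list of rules that map a document to a store. The sketch below shows only that idea. The store names, the 100 MB threshold for "large", and the MIME type list are hypothetical, and this is not the Content Store Selector API.

```python
from dataclasses import dataclass

LARGE_BYTES = 100 * 1024 ** 2  # hypothetical threshold for a "large" file

OFFICE_TYPES = {
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.oasis.opendocument.text",
}


@dataclass
class Document:
    name: str
    mimetype: str
    size_bytes: int


# Ordered policy rules: the first matching rule wins. Store names are made up.
RULES = [
    (lambda d: d.mimetype.startswith("video/") and d.size_bytes >= LARGE_BYTES, "ssd_store"),
    (lambda d: d.mimetype in OFFICE_TYPES, "sata_store"),
]


def select_store(doc, rules=RULES, default="default_store"):
    """Return the name of the store that the first matching rule points to."""
    for matches, store_name in rules:
        if matches(doc):
            return store_name
    return default


video = Document("keynote.mp4", "video/mp4", 500 * 1024 ** 2)
memo = Document("memo.doc", "application/msword", 40_000)
photo = Document("logo.png", "image/png", 12_000)
```

Here `select_store(video)` picks the fast tier, `select_store(memo)` the cheaper one, and anything without a matching rule stays in the default store.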
  14. 14. Alfresco Storage Related Solutions
     Content Replication (Alfresco on-premise to Alfresco on-premise)
     •  Distributed repository replication
        –  Selective replication of spaces and content
        –  Support for full, incremental and delete
        –  One source – multiple destinations
        –  Replicas are read-only (update at source only; redirect if needed)
     •  Benefits
        –  Support geographically dispersed companies
        –  Provide fast local access
        –  Remove single point of failure
        –  Reduce wide area network traffic
  15. 15. Alfresco Storage Related Solutions
     Content Replication / Geo-clusters / Redundancy
     •  Alfresco Cloud Sync: On-premise ↔ Cloud
        –  Content oriented, not for storage replication
     •  Synchronization feature between Alfresco on-premises instances (not available yet).
     •  Alfresco Desktop Sync: from Windows or Mac desktop to Alfresco on-premise (not available yet)
  16. 16. Alfresco Storage Related Solutions
     Geo-clusters and Redundancy
     •  Geo-clusters can be built by replicating the DB and the content store. Supported?
        –  Low-level replication/sync
        –  Some customers have this.
        –  Some customers use NetApp NAS storage and GoldenGate for DB replication
        –  Other replication tools: EMC Clariion, EMC Symmetrix or IBM Total Storage.
  17. 17. Partners Solutions
     •  Xenit Alf2Cas
        –  Caringo Castor integration
        –  Deprecated?
     •  Star Storage – Hitachi Content Platform (HCP)
        –  Content archiving, additional storage and faster content backup
        –  Alfresco Enterprise: 3.4.x, 4.0.x
        –  Hitachi Content Platform (HCP): 4.x, 5.x, 6.x
  18. 18. Third Party – Community Solutions
     •  StorNext
        –  It is not a connector but a solution for data life-cycle management in the background
        –  Alfresco sees it as a mount point and is not aware of what happens behind it
        –  Runs over FC
     •  EMC Atmos
        –  XAM connector for Alfresco
     •  Alfresco Cloud Store
        –  Amazon S3
        –  https://code.google.com/p/alfresco-cloud-store/
     •  Amazon S3 for on-premise
        –  https://issues.alfresco.com/jira/browse/AMZNSSS-26
     •  Walrus? The S3 alternative for Eucalyptus
  19. 19. Storage Best Practices
     •  Content Store
        –  Use Content Store Selector to manage content of different sizes.
        –  The default content store should be faster for writes than the others to avoid bottlenecks (content lands in the default store first and is then copied to the other content store)
        –  WORM disks as non-default content store (cleaner - Jefferies)
        –  SAN if possible
        –  If NAS, use a dedicated LAN if possible
        –  LVM if possible (scalability, snapshot)
        –  Clean the trash bin often
        –  Delete "contentstore.deleted" often
  20. 20. Storage Best Practices
     •  Indexes (SOLR or Lucene)
        –  Dedicated disk, local or SAN.
        –  Avoid NAS.
        –  Keep at least 50-75% of the space free (backup and merge)
        –  Consider using a different file system for the Lucene backup and the Solr backup.
     •  Logs
        –  Put your logs directory on a different file system from the Content Store and the Indexes.
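The free-space guideline for index volumes can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the 0.5 default simply takes the lower bound of the 50-75% range on the slide.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Fraction of the volume that is still free (0.0 to 1.0)."""
    return free_bytes / total_bytes


def index_volume_ok(path, min_free=0.5):
    """True if the volume holding `path` keeps at least `min_free` of its space free."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return free_fraction(usage.total, usage.free) >= min_free


# Example: check the volume of the current directory; in practice, point this
# at the directory that holds the Lucene or Solr indexes.
ok = index_volume_ok(".")
```

A check like this could run from cron or a monitoring agent and raise an alert before an index backup or merge runs out of room.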
  21. 21. Backup and Recovery
     •  Recovery Time Objective (RTO): The amount of time that it takes to get your systems back online.
     •  Recovery Point Objective (RPO): The last consistent data transaction prior to the disaster. If you had a disaster, how much data would be lost?
     •  The Disaster Recovery plan (DR) focuses on getting your business back up and running after a major outage.
     •  The Business Continuance plan (BCP) focuses on keeping your business running DURING the disaster.
  22. 22. Backup and Recovery
     •  Alfresco Backup and Recovery Tool is available:
        –  http://blyx.com/open-source-contributions/alfresco-bart/
     •  Alfresco Backup and Recovery White Paper:
        –  http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
  23. 23. Common Questions to SE?
     •  Best practices for storage.
        –  You got it
     •  NAS or SAN?
        –  SAN if possible! NAS backed by a SAN is common as well. NAS is not bad, but now you know why it is different.
     •  Required space for DB, Indexes, Content Store?
        –  It depends on the case, but the DB and the Indexes each tend to take about 20% of the Content Store space.
     •  Do you have an archiving solution?
        –  Alfresco can be integrated with archiving solutions like those mentioned above and implemented with Content Store Selector.
     •  Do you have a backup/recovery solution?
        –  http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
     •  Do you have a data encryption solution?
        –  Yes, Alfresco Encryption at Rest: http://docs.alfresco.com/5.0/concepts/encrypted-overview.html
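The sizing rule of thumb on this slide (DB and indexes at roughly 20% of the content store space each) is simple arithmetic. The sketch below only restates it; the function name is hypothetical and the 20% figure is a starting estimate, not a guarantee for any particular deployment.

```python
def estimate_storage_gb(content_store_gb, percent_each=20):
    """Rough sizing: DB and indexes each take `percent_each` % of the content store."""
    db_gb = content_store_gb * percent_each / 100
    index_gb = content_store_gb * percent_each / 100
    return {
        "content_store": content_store_gb,
        "database": db_gb,
        "indexes": index_gb,
        "total": content_store_gb + db_gb + index_gb,
    }


estimate = estimate_storage_gb(1000)  # 1 TB of content
```

For 1000 GB of content this gives 200 GB for the database, 200 GB for the indexes, and 1400 GB in total, before adding the free-space headroom recommended for the index volume.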
  24. 24. What kind of storage can I use with Alfresco?
     •  Any mountable volumes that can be made to appear as standard local filesystems (local disks, NAS, SAN, etc.)
     •  Amazon S3 (for Alfresco installations in AWS)
     •  Centera (through the now open source connector)
     •  EMC Atmos (through a partner-created integration)
     •  CAStor (through a dated partner-created integration)
  25. 25. Appendix 1: Deleting content
  26. 26. Deleting Content
     •  A complex process
     •  You need to know this because it impacts
        –  Disk space management
        –  Backup and recovery procedures (and their integrity)
        –  Security and auditing
     •  You have a wide degree of control over what happens and when
     •  You need to do some work
     •  More info: page 24 of http://www.slideshare.net/toniblyx/alfresco-security-best-practices-guide
  27. 27. Node deletion
     Step: User deletes document
     [Diagram, before and after the step. Database: stores workspace://SpacesStore and archive://SpacesStore; tables alf_node, alf_content_data, alf_content_url, alf_node_properties, others. Filesystem: ~/alf_data with contentstore and contentstore.deleted; content file 2e3839d2d345.bin.]
  28. 28. Node deletion
     Step: Wastebasket empties
     [Diagram, before and after the step. Same database and filesystem elements as the previous slide; alf_content_url is now marked with orphan_time = 'now'.]
  29. 29. Node deletion
     Step: contentStoreCleaner runs
     [Diagram, before and after the step. Before: alf_content_url carries orphan_time = 'now'. After: the orphan_time marker is no longer shown. Filesystem: ~/alf_data with contentstore and contentstore.deleted; content file 2e3839d2d345.bin.]
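The three node-deletion slides can be read as a small state machine. The sketch below models that reading only and is not Alfresco code. In particular, the assumption that the cleaner moves the orphaned file from contentstore to contentstore.deleted, and the `protect_seconds` grace period, are interpretations, since the positions in the diagrams did not survive in this transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentItem:
    """Hypothetical model of one document and its binary file."""
    node_store: Optional[str] = "workspace://SpacesStore"
    orphan_time: Optional[int] = None       # set once no node references the binary
    file_location: str = "contentstore"     # directory under ~/alf_data


def user_deletes(item):
    # The node moves to the archive store (the wastebasket); the file is untouched.
    item.node_store = "archive://SpacesStore"


def empty_wastebasket(item, now):
    # The node is gone; the content URL is flagged as orphaned at time `now`.
    item.node_store = None
    item.orphan_time = now


def run_content_store_cleaner(item, now, protect_seconds=0):
    # Assumed behavior: orphans older than the grace period are moved aside.
    if item.orphan_time is not None and now - item.orphan_time >= protect_seconds:
        item.file_location = "contentstore.deleted"


doc = ContentItem()
user_deletes(doc)
empty_wastebasket(doc, now=1000)
run_content_store_cleaner(doc, now=1000 + 60, protect_seconds=3600)   # too early, nothing moves
location_during_grace = doc.file_location
run_content_store_cleaner(doc, now=1000 + 7200, protect_seconds=3600)
```

Under this reading, disk space is only reclaimed after a final manual or scheduled purge of contentstore.deleted, which matches the earlier advice to delete that directory often.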
  30. 30. Questions?
