IBM Spectrum Scale (formerly Elastic Storage) provides software-defined storage capabilities using standard commodity hardware. It delivers automated, policy-driven storage services through orchestration of the underlying storage infrastructure. Key features include massive scalability (up to a yottabyte), built-in high availability, data integrity, and the ability to non-disruptively add or remove storage resources. The software provides a single global namespace, inline and offline data tiering, and integration with interfaces like HDFS to enable analytics on existing storage infrastructure.
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5 - Doug O'Flaherty
The document discusses IBM Spectrum Scale, a software-defined storage product. It provides a unified file and object storage system with integrated analytics support. New features in versions 4.2 and 3.5 include reducing costs through compression and quality of service policies, accelerating analytics with native HDFS support, and simplifying deployment with new graphical user interfaces.
This document discusses IBM's Elastic Storage product. It provides an overview of Elastic Storage's key features such as extreme scalability, high performance, support for various operating systems and hardware, data lifecycle management capabilities, integration with Hadoop, and editions/pricing. It also compares Elastic Storage to alternative storage solutions and discusses how Elastic Storage can be used to build private and hybrid clouds with OpenStack.
Introduction to IBM Spectrum Scale and Its Use in Life Science - Sandeep Patil
IBM Spectrum Scale is a scalable file system that can be used to support life science research. It provides high scalability, high availability, and a software read cache called Local Read Only Cache (LROC) that uses SSDs to improve performance. The University of Basel uses Spectrum Scale in their scientific computing and storage infrastructure to support various research areas including bioinformatics, structural biology, and hosting reference services. It provides features such as cluster file systems, data migration, hierarchical storage management, encryption, and disaster recovery between two sites using asynchronous file migration.
Reduce Storage Costs by 5x Using the New HDFS Tiered Storage Feature - DataWorks Summit
This document discusses how HDFS tiered storage can be used to reduce storage costs by 5x. It introduces the new HDFS storage model that supports multiple storage types like ARCHIVE, DISK, SSD, and RAM_DISK. Block storage policies like HOT, WARM, and COLD can be defined to control where blocks are stored. eBay uses HDFS tiered storage to archive older data to cheaper ARCHIVE nodes, analyzing access patterns to define archival policies. Data is moved between storage types using the HDFS mover tool while maintaining replication and rack requirements.
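The mapping from the HOT, WARM, and COLD policies described above to replica placement can be sketched as follows. This is a minimal illustrative model, not HDFS code; `replica_storage_types` is a hypothetical helper, but the placement rules follow the standard HDFS policy definitions (HOT keeps all replicas on DISK, WARM keeps one on DISK and the rest on ARCHIVE, COLD moves everything to ARCHIVE).

```python
def replica_storage_types(policy: str, replication: int = 3) -> list[str]:
    """Return the storage type chosen for each block replica under a policy.

    Toy model of the HDFS block storage policies; real HDFS also handles
    fallback types when the preferred tier has no space.
    """
    if policy == "HOT":
        return ["DISK"] * replication            # all replicas on fast disk
    if policy == "WARM":
        return ["DISK"] + ["ARCHIVE"] * (replication - 1)  # one hot copy
    if policy == "COLD":
        return ["ARCHIVE"] * replication         # all replicas on cheap nodes
    raise ValueError(f"unknown policy: {policy}")

print(replica_storage_types("WARM"))  # ['DISK', 'ARCHIVE', 'ARCHIVE']
```

In real deployments the policy is attached to a path (e.g. with `hdfs storagepolicies`) and the mover tool migrates existing blocks to match it, as the summary above notes.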
IBM Spectrum Scale is software-defined storage that provides file storage for cloud, big data, and analytics solutions. It offers data security through native encryption and secure erase, scalability via snapshots, and high performance using flash acceleration. Spectrum Scale is proven at over 3,000 customers handling large datasets for applications such as weather modeling, digital media, and healthcare. It scales to over a billion petabytes and supports file sharing in on-premises, private, and public cloud deployments.
ss0885 spectrum-scale-elastic-edge2015-v5 - Tony Pearson
IBM Spectrum Scale offerings include the Spectrum Scale software, which you can deploy on your own choice of hardware, as well as the Elastic Storage Server and Storwize V7000 Unified pre-built systems.
Snapshots have been a key feature of primary storage infrastructures that IT professionals have relied on for years. But storage systems have traditionally been able to support only a limited number of active snapshots. And snapshots, being pointers and not actual data, are also susceptible to a primary storage system failure. As a result, most IT professionals use snapshots sparingly for protecting data. In this webinar Storage Switzerland and Nexenta show you how primary storage can be architected so that snapshots are able to meet almost all of the data protection requirements an organization has.
Ben Golub gives insight into the latest storage trends, including EMC's acquisition of Isilon.
http://blog.gluster.com/2010/11/storage-is-sexy-again/
GPFS (General Parallel File System) is a high-performance clustered file system developed by IBM that can be deployed in shared disk or shared-nothing distributed parallel modes. It was created to address the growing imbalance between increasing CPU, memory, and network speeds, and the relatively slower growth of disk drive speeds. GPFS provides high scalability, availability, and advanced data management features like snapshots and replication. It is used extensively by large companies and supercomputers due to its ability to handle large volumes of data and high input/output workloads in distributed, parallel environments.
IBM General Parallel File System - Introduction - IBM Danmark
The document provides information about IBM's General Parallel File System (GPFS) 3.5 and introduces the GPFS Storage Server (GSS). GPFS is a high-performance file management system that scales from 1 to 8,192 nodes. The GSS is a new storage solution that uses IBM servers and JBOD storage to provide high-capacity, high-performance storage in a scalable building-block approach. The GSS has no storage controllers and provides a single integrated storage solution built on GPFS software.
Introduction to GlusterFS Webinar - September 2011 - GlusterFS
Looking for a high performance, scale-out NAS file system? Or are you a new user of GlusterFS and want to learn more? This educational monthly webinar provides an introduction and review of the GlusterFS architecture and key functionalities. Learn how GlusterFS is deployed in the datacenter, in the cloud, or between the two.
Analytics with Unified File and Object - Sandeep Patil
This presentation walks you through one way to achieve in-place Hadoop-based analytics for your file and object data. It also gives an example of storage integration with cloud cognitive services.
IBM Spectrum Scale Fundamentals Workshop for Americas, Part 5: ESS GNR Use Cases... - xKinAnx
This document provides an overview of Spectrum Scale 4.1 system administration. It describes the Elastic Storage Server options and components, Spectrum Scale native RAID (GNR), and tips for best practices. GNR implements sophisticated data placement and error correction algorithms using software RAID to provide high reliability and performance without additional hardware. It features auto-rebalancing, low rebuild overhead through declustering, and end-to-end data checksumming.
This document proposes a design for tiered storage in HDFS that allows data to be stored in heterogeneous storage tiers including an external storage system. It describes challenges in synchronizing metadata and data across clusters and proposes using HDFS to coordinate an external storage system in a transparent way to users. The "PROVIDED" storage type would allow blocks to be retrieved directly from the external store via aliases, handling data consistency and security while leveraging HDFS features like quotas and replication policies. Implementation would start with read-only support and progress to full read-write capabilities.
Award-winning scale-up and scale-out storage for Xen - GlusterFS
This webinar discusses the Gluster Virtual Storage Appliance for Xen which packages GlusterFS in a virtual machine container optimized for ease of use with little to no configuration required. The Virtual Appliance seamlessly integrates with existing virtualization environments such as Citrix Xen, allowing you to deploy virtual storage the same way you deploy virtual machines. Deploy on premise to create a private cloud using any certified Xen server hardware platforms and certified storage: JBOD, DAS, or SAN.
The document discusses IBM Spectrum Scale, a software-defined storage solution from IBM. It provides:
1) A family of software-defined storage products including IBM Spectrum Control, IBM Spectrum Protect, IBM Spectrum Archive, IBM Spectrum Virtualize, IBM Spectrum Accelerate, and IBM Spectrum Scale.
2) IBM Spectrum Scale allows storing data everywhere and running applications anywhere. It provides highly scalable, high-performance storage for files, objects, and analytics workloads.
3) The document provides an overview of the IBM Spectrum Scale product and its capabilities for optimizing storage costs, improving data protection, enabling global collaboration, and ensuring data availability, integrity and security.
IBM Spectrum Scale for File and Object Storage - Tony Pearson
This document discusses IBM Spectrum Scale, which provides universal access to files and objects across data centers. It can scale to support up to 18 quintillion files per file system and 256 file systems per cluster. IBM Spectrum Scale provides high performance, proven reliability, and flexible access to data through various file and object protocols. It can be deployed as software on various systems, as pre-built systems, or as cloud services. The document outlines the various capabilities and uses of IBM Spectrum Scale, such as file management policies, caching, encryption, protocol servers, integration with Hadoop and backup/disaster recovery.
InterConnect 2016 yss1841-cloud-storage-options-v4 - Tony Pearson
This session will cover private and public cloud storage options, including flash, disk, and tape, to address the different types of cloud storage requirements. It will also explain the use of Active File Management for local space management and global access to files, and support for file sync and share.
IBM Spectrum Scale Fundamentals Workshop for Americas, Part 1: Components Archi... - xKinAnx
The document provides instructions for installing and configuring Spectrum Scale 4.1. Key steps include: installing Spectrum Scale software on nodes; creating a cluster using mmcrcluster and designating primary/secondary servers; verifying the cluster status with mmlscluster; creating Network Shared Disks (NSDs); and creating a file system. The document also covers licensing, system requirements, and IBM and client responsibilities for installation and maintenance.
Hadoop and Spark Analytics over Better Storage - Sandeep Patil
This document discusses using IBM Spectrum Scale to provide a colder storage tier for Hadoop & Spark workloads using IBM Elastic Storage Server (ESS) and HDFS transparency. Some key points discussed include:
- Using Spectrum Scale to federate ESS with existing HDFS or Spectrum Scale filesystems, allowing data to be seamlessly accessed even if moved to the ESS tier.
- Extending HDFS across multiple HDFS and Spectrum Scale clusters without needing to move data using Spectrum Scale's HDFS transparency connector.
- Integrating ESS tier with Spectrum Protect for backup and Spectrum Archive for archiving to take advantage of their policy engines and automation.
- Examples of using the unified storage for analytics workflows, life…
This document summarizes the benefits of SoftLayer cloud infrastructure services. It highlights testimonials from customers in the UK and Germany who have improved reliability, reduced development times, and avoided issues like scaling by using SoftLayer. Data shows SoftLayer is nearly three times faster than competitors and provides lower total cost of ownership. SoftLayer offers flexible, reliable cloud services across 28 data centers globally.
IBM Spectrum Scale 4.2.3 provides security capabilities including:
1) Secure data at rest through encryption and secure deletion capabilities as well as support for NIST algorithms.
2) Secure data in transit with support for Kerberos, SSL/TLS, and configurable security levels for cluster communication.
3) Role-based access control and support for directory services like Active Directory for authentication and authorization.
4) Secure administration through SSH/TLS for commands and REST APIs, role-based access in the GUI, and limited admin nodes.
5) Additional features like file and object access control lists, firewall support, immutability mode for compliance, and audit logging.
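The encryption and secure-deletion pairing in item 1 above rests on a general idea: if data only ever touches disk in encrypted form, destroying the key amounts to a "cryptographic erase" of the data. The sketch below is a toy model of that idea only; a real system such as Spectrum Scale uses NIST-approved ciphers and external key servers, whereas this sketch builds a throwaway XOR keystream from the standard library purely for illustration.

```python
import hashlib
import os

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = os.urandom(32)
stored = encrypt(key, b"patient record")     # only ciphertext hits disk
assert encrypt(key, stored) == b"patient record"  # decrypt = re-encrypt (XOR)
key = None  # "secure erase": with the key destroyed, `stored` is irrecoverable
```

The design point is that erasure becomes a key-management operation rather than an overwrite of every block, which is what makes secure deletion practical at file-system scale.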
In-Place Analytics for File and Object Data - Sandeep Patil
The document discusses IBM Spectrum Scale's unified file and object access feature. It introduces Spectrum Scale and its support for file and object access. The unified file and object access feature allows data to be accessed as both files and objects without copying, through a single management plane. Use cases like in-place analytics for object data and common identity management across file and object access are enabled. A demo is presented where a file is uploaded as an object, analytics is run on it, and the result downloaded as an object, without data movement.
Ibm spectrum scale_backup_n_archive_v03_ash - Ashutosh Mate
IBM Spectrum Scale can be used as both the source and destination for backup and archiving. As a source, Spectrum Scale data can be backed up to products like Spectrum Protect, Spectrum Archive, and third-party backup software. As a destination, Spectrum Protect can use Spectrum Scale and ESS storage for storing backed up or archived data, providing scalability, performance, and cost benefits over other solutions. Case studies demonstrate how large enterprises and regional hospital networks have consolidated backup infrastructure and improved availability, capacity, and backup/restore speeds by combining Spectrum Scale and Spectrum Protect.
Aerospike: The Enterprise-Class NoSQL Database for Real-Time Applications - Brillix
Aerospike: The Enterprise-Class NoSQL Database for Real-Time Applications, a session by Srini V. Srinivasan at the Brillix and Aerospike conference in Israel on June 14, 2017.
IBM Spectrum Scale Fundamentals Workshop for Americas, Part 8: Spectrum Scale Ba... - xKinAnx
The document provides an overview of key concepts covered in a GPFS 4.1 system administration course, including backups using mmbackup, SOBAR integration, snapshots, quotas, clones, and extended attributes. The document includes examples of commands and procedures for administering these GPFS functions.
Hierarchical Data Management with SUSE Enterprise Storage and HPE DMF - SUSE Italy
In this session, HPE and SUSE use real-world cases to show how HPE Data Management Framework and SUSE Enterprise Storage solve the management problems of exponential data growth by building a flexible, scalable, and cost-effective software-defined architecture. (Alberto Galli, HPE Italia, and SUSE)
Introduction to types of cloud storage and an overview and comparison of the SoftLayer Storage Services. Topics covered include Block and File offerings "Codename: Prime", Consistent Performance, Mass Storage Servers (QuantaStor), Backup (EVault, R1Soft), Object Storage (OpenStack Swift), CDN, Data Transfer Service, and Aspera.
SIS Storage Services offers various managed storage services including storage management, backup/recovery, data protection, replication, and archiving. As a Storage Service Provider, SIS offers storage space and management services with options for pure-play or traditional storage, capacity-on-demand or utility storage, and on-site or off-site hosting. SIS aims to address customer needs around reduced management overhead, regulatory compliance, data sharing, high availability, and quick storage provisioning.
#MFSummit2016 Operate: The Race for Space - Micro Focus
The Race for Space: File Storage Challenges and Solutions. Facing escalating storage requirements? Being held to ransom by your vendors? Would secure, scalable, highly available, and cost-effective file storage that works with your current infrastructure help? Micro Focus and SUSE could help. Presenters: David Shepherd, Solutions Consultant, Micro Focus, and Stephen Mogg, Solutions Consultant, SUSE.
SoftLayer Storage Services Overview (for Interop Las Vegas 2015) - Michael Fork
Introduction to SoftLayer's Storage Services. Topics covered include Block and File offerings (Endurance, Performance), Mass Storage Servers (QuantaStor), Backup (EVault, R1Soft), Object Storage (OpenStack Swift), CDN, Data Transfer Service, and Aspera.
Big Data Architecture Workshop - Vahid Amiri (datastack)
This slide deck covers big data tools, technologies, and layers that can be used in enterprise solutions. TopHPC Conference, 2019.
This document provides an overview of various data storage technologies including RAID, DAS, NAS, and SAN. It discusses RAID levels like RAID 0, 1, 5 which provide data striping and redundancy. Direct attached storage (DAS) connects directly to servers but cannot be shared, while network attached storage (NAS) uses file sharing protocols over IP networks. Storage area networks (SAN) use dedicated storage networks like Fibre Channel and iSCSI to provide block-level access to consolidated storage. The key is choosing the right solution based on capacity, performance, scalability, availability, data protection needs, and budget.
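The redundancy mechanism behind RAID 5 mentioned above can be sketched with XOR parity: data is striped across disks, and one parity block per stripe lets the contents of any single failed disk be rebuilt. This is a toy model under simplifying assumptions (fixed-size stripes, a dedicated parity stripe); real RAID 5 operates at the block-device level and rotates parity across all disks.

```python
def xor_parity(stripes: list[bytes]) -> bytes:
    """Compute the byte-wise XOR parity of equal-length stripes."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover a lost stripe: XOR of the parity with all survivors."""
    return xor_parity(surviving + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # stripes on three data disks
p = xor_parity(data)                 # parity stored on a fourth disk
lost = data.pop(1)                   # disk 1 fails
assert rebuild(data, p) == lost      # contents rebuilt from parity
```

This is why RAID 5 tolerates exactly one disk failure with only one disk's worth of overhead, a trade-off the document contrasts with the mirroring of RAID 1 and the unprotected striping of RAID 0.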
This document discusses storage virtualization techniques. It covers what can be virtualized (file system and block levels), where virtualization can occur (host-based, network-based, storage-based), and how virtualization is implemented (in-band and out-of-band). Examples of storage virtualization include logical volume management (LVM) on Linux hosts, SAN volume controllers, and virtualization features in disk arrays. Key benefits are improved manageability, availability, scalability and security of storage resources.
With AWS, you can choose the right storage service for the right use case. Given the myriad of choices, from object storage to block storage, this session will profile details and examples of some of the choices available to you, with details on real world deployments from customers who are using Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), Amazon Glacier, and AWS Storage Gateway.
File virtualization technology simplifies access to consolidated file storage, provides flexibility in data placement, and optimizes resource utilization. It breaks the constraints of today's complex, inflexible, and inefficient storage infrastructures by decoupling data access from physical storage locations. Intelligent policies can then automate data movement based on file classification and business value to optimize costs and performance by matching data with the most appropriate storage tier.
This document discusses Hitachi's Unified Storage (HUS) and Hitachi NAS Platform (HNAS) solutions for file storage. It summarizes that these solutions provide high performance, scalability, and efficiency to help organizations consolidate more file data using less storage. This allows organizations to reduce costs through features like deduplication while improving productivity. The solutions include a range of models and flexibility to address various workload sizes and requirements.
002-Storage Basics and Application Environments V1.0.pptxDrewMe1
Storage Basics and Application Environments is a document that discusses storage concepts, hardware, protocols, and data protection basics. It begins by defining storage and describing different types including block storage, file storage, and object storage. It then covers basic concepts of storage hardware such as disks, disk arrays, controllers, enclosures, and I/O modules. Storage protocols like SCSI, NVMe, iSCSI, and Fibre Channel are also introduced. Additional concepts like RAID, LUNs, multipathing, and file systems are explained. The document provides a high-level overview of fundamental storage topics.
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Amazon Web Services
AWS gives designers of enterprise storage systems a completely new set of options. Aimed at enterprise storage specialists and managers of cloud-integration teams, this session gives you the tools and perspective to confidently integrate your storage workloads with AWS. We show working use cases, a thorough TCO model, and detailed customer blueprints. Throughout we analyze how data-tiering options measure up to the design criteria that matter most: performance, efficiency, cost, security, and integration.
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
This document provides an overview of IBM Spectrum Scale's Active File Management (AFM) capabilities and use cases. AFM uses a home-and-cache model to cache data from a home site at local clusters for low-latency access. It expands GPFS' global namespace across geographical distances and provides automated namespace management. The document discusses AFM caching basics, global sharing, use cases like content distribution and disaster recovery. It also provides details on Spectrum Scale's protocol support, unified file and object access, using AFM with object storage, and configuration.
Software Defined Analytics with File and Object Access Plus Geographically Di...Trishali Nayar
Introduction to Spectrum Scale Active File Management (AFM)
and its use cases. Spectrum Scale Protocols - Unified File & Object Access (UFO) Feature Details
AFM + Object : Unique Wan Caching for Object Store
This document introduces scale-out file servers and storage solutions from Microsoft. It discusses how Storage Spaces provides storage tiering across SSDs and HDDs for optimized data placement. It also describes how clustered file servers using Storage Spaces can provide high availability, data deduplication, and flexible storage for Hyper-V clusters. The document then introduces StorSimple hybrid storage solutions that provide efficient primary storage, archival storage in the cloud, and automated offsite data protection using cloud snapshots.
Data is gravity. Your workloads and processing is dependent on where your data is and how it is stored. With AWS, you have a host of storage options and the key to successfully leverage them is to know when to use which option. This session will explain in details about each of the AWS Storage offerings along with data ingestion optins into the Cloud using Snowball and Snowmobile
Marc Trimuschat,
Head - Business Developement, AWS Storage, AWS APAC
This document provides an overview and agenda for a presentation on Dell storage solutions for mid-market organizations. It discusses Dell Storage and Fluid Data Architecture, provides a deep dive on the Dell PowerVault MD3 and Dell EqualLogic storage arrays, and covers storage tools. Key points include Dell's vision for making data fluid by optimizing storage across primary, offsite, backup and cloud storage. It also summarizes features and benefits of the Dell PowerVault MD3 such as scalability, performance, availability, manageability and reliable data protection capabilities like dynamic disk pools and remote replication.
1. beyond mission critical virtualizing big data and hadoopChiou-Nan Chen
Virtualizing big data platforms like Hadoop provides organizations with agility, elasticity, and operational simplicity. It allows clusters to be quickly provisioned on demand, workloads to be independently scaled, and mixed workloads to be consolidated on shared infrastructure. This reduces costs while improving resource utilization for emerging big data use cases across many industries.
Similar to Elastic storage in the cloud session 5224 final v2 (20)
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
2. Agenda
• What is IBM Spectrum Scale (a.k.a. Elastic Storage)?
• IBM Spectrum Scale Key Features
• Real World Use Cases
3. Software Defined Storage
• Software defined storage is enterprise-class storage that uses standard commodity hardware, with all the important storage and management functions performed in intelligent software
• Software defined storage delivers automated, policy-driven, application-aware storage services through orchestration of the underlying storage infrastructure, in support of an overall software defined environment (SDE)
4. Software Defined Storage Benefits
• Save acquisition costs by using standard servers and storage instead of expensive, special-purpose hardware
• Realize extreme scale and performance through linear, building-block scale-out
• Increase resource and operational efficiency by pooling redundant, isolated resources and optimizing utilization
• Achieve greater IT agility by being able to quickly react, provision, and redeploy resources in response to new requirements
• Lower data management costs through policy-driven automation and tiered storage management
5. What is Elastic Storage?
Elastic Storage is the infrastructure for global data management:
• Virtualized access to data
• Software virtualized: centrally deployed, managed, backed up, and grown
• Clustered file system: all nodes access the data
• Seamless capacity and performance scaling
6. Elastic Storage Strengths
Extreme Scalability
• Maximum file system size: 2^99 bytes (~1 million yottabytes)
• 2^63 (~9 quintillion) files per file system
• Maximum file size equals file system size
• Customers with 18 PB file systems
• IPv6
• Future proof
• Commodity hardware
Proven Reliability
• Snapshots, replication
• Declustered RAID
• Built-in heartbeat, automatic failover/failback
• Add/remove nodes and storage on the fly
• Rolling upgrades
• End-to-end data integrity
• Administer from any node
• Commodity hardware
High Performance
• Parallel file access
• Distributed, scalable, high-performance metadata
• Flash acceleration
• Automatic tiering
• Over 400 GB/s
• Commodity hardware
7. Global Name Space
[Diagram: a single global namespace spanning single-site, multi-site, and global data access. Computation and analytics get local data access at each site (Site B, Site C), with automatic tiering and migration to tape behind a GPFS storage and server cluster.]
8. Elastic Storage: A deployment scenario
[Diagram: high-data-output industries (life sciences, research, energy, financial services) ingest data into a GPFS storage and server cluster with flash, SSD, and internal disk tiers and failover between sites; Hadoop, file, and RDBMS workloads all access the same storage.]
9. IBM Spectrum Scale: Cloud Data Plane Vision
• Elastic Storage software is a single scale-out data plane for the entire data center
• Unifies storage for VMs, analytics, objects, HPC, and file serving
• Single namespace view, no matter where data resides
• De-clustered parity: software RAID for commodity storage, with no hardware RAID
• Abstracts storage "pools" across various back-end storage (direct-attached, SAN-based, JBOD arrays, integrated storage)
• Data in the best location, on the best tier (performance and cost), at the right time
• ICStore and AFM for data movement across cloud geographies
[Diagram: a single namespace offering file sharing (POSIX, NFS, SMB3), block (iSCSI), object access (OpenStack Swift), analytics (Hadoop), and virtualization (OpenStack Cinder, Manila, Glance, Nova; VMware VASA, vVols, VAAI, vSphere), backed by Active File Management and ILM from flash to cloud across SSD, fast disk, tape, optical/DVD, SAN, software RAID, and cloud storage through the ICStore gateway.]
10. Elastic Storage Cluster Models
• Symmetric cluster with direct-attached disks (FPO)
• Dedicated storage nodes (ESS)
• SAN storage
A global namespace combining multiple cluster models is also possible.
11. Elastic Storage parallel architecture
• Clients use data; Network Shared Disk (NSD) servers serve shared data
• All NSD servers export to all clients in active-active mode
• Elastic Storage stripes files across NSD servers and NSDs in units of the file-system block size
• The NSD client communicates with all the servers
• File-system load is spread evenly across all the servers and storage: no hot spots
• Easy to scale file-system capacity and performance while keeping the architecture balanced
• The NSD client does real-time parallel I/O to all the NSD servers and storage volumes/NSDs
• Files are stored in blocks
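The striping described above can be sketched as a toy model: consecutive file-system blocks are assigned to NSD servers round-robin, so each server carries an even share of the I/O. This is illustrative only; the block size and the four-server layout are invented for the example, not GPFS internals.

```python
BLOCK_SIZE = 4  # MiB per file-system block (example value)
NSD_SERVERS = ["nsd01", "nsd02", "nsd03", "nsd04"]

def stripe_layout(file_size_mib: int) -> dict[str, list[int]]:
    """Map each block index of a file onto an NSD server, round-robin."""
    layout: dict[str, list[int]] = {s: [] for s in NSD_SERVERS}
    n_blocks = -(-file_size_mib // BLOCK_SIZE)  # ceiling division
    for block in range(n_blocks):
        # Block i lands on server i mod N: no server becomes a hot spot.
        layout[NSD_SERVERS[block % len(NSD_SERVERS)]].append(block)
    return layout

layout = stripe_layout(file_size_mib=40)  # 10 blocks over 4 servers
for server, blocks in layout.items():
    print(server, blocks)
```

Because every file is spread over every server, adding a server both grows capacity and raises aggregate bandwidth, which is the "scale while keeping the architecture balanced" point above.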
12. Manage the full data lifecycle cost effectively
Lifecycle stages: data ingestion or creation, data processing, access, archival
• High-performance disk tier (flash, SSD, SAS) with parallel access provides the highest performance for the most demanding applications
• High-volume storage lowers costs by allocating the right tier of storage to the right need
• Single global namespace across all tiers
• Archival storage with low-cost disk or tape; integration with Tivoli Storage Manager/LTFS
• Policy-based archival and remote disaster recovery
13. Leverage the Metadata for Management Purposes
• Use the right tool for the right job
• Average utilization >80%
• Automated tiered storage
• Policy-driven file movement between tiers
• Store petabytes of storage on terabytes of disk (inactive files are auto-migrated to tape)
• Migration as granular as a per-file basis
[Diagram: LUNs from FlashSystems, SSDs, and high-capacity HDDs grouped into storage pools as Tier 1 and Tier 2, at decreasing cost per GB.]
14. Elastic Storage Virtualization
[Diagram: the same logical paths, e.g.
/home/appl/data/web/important_big_spreadsheet.xls
/home/appl/data/web/big_architecture_drawing.ppt
/home/appl/data/web/unstructured_big_video.mpg
are presented through a virtualized global namespace (/home/appl/data/web), while a policy engine decides physical placement across Pool 1 (FlashSystems), Pool 2 (solid-state drives), and Pool 3 (nearline SAS drives), served by GPFS nodes and storage controllers. The logical view is fully decoupled from the physical layout.]
15. Elastic Storage: Flash for optimization
• Solid-state drives for metadata
  • Metadata includes directories, inodes, indirect blocks
  • All metadata will fit in relatively few SSDs (metadata is typically 1% of total storage)
• Solid-state drives for data caching (LROC)
  • Extend the page pool memory to include SSD for read caching
    – Writes invalidate the cache and are consistent across nodes
• Highly Available Write Cache (HAWC)
  • Data is written to the GPFS recovery log and committed by forcing the log to flash
  • As log blocks fill, they are rewritten to their home location
    – A relatively small amount of NVDIMM is required to maximize bandwidth to disk
    – The recovery log will be bandwidth bound, not IOP bound
  • Solves problems with small writes
• Select data for the highest tiers based on the file's "heat"
  • On-line: SSD, SAS, SATA; off-line: tape, cloud
  • GPFS policy transparently moves data between on-line and off-line storage
    – File name and directory do not indicate where the data resides
  • The file system tracks access to file data and computes a "temperature" for the data
  • Tracking is done at the file level and assumes uniform access to file data
  • Users define policy rules for migration and choose the time for execution
EXTERNAL POOL 'bronze' EXEC '/var/mmfs/etc/GlueCode'
RULE 'DefineTiers' GROUP POOL 'TIERS'
IS 'gold' LIMIT(80)
THEN 'silver' LIMIT(90)
THEN 'bronze'
RULE 'Rebalance' MIGRATE FROM POOL 'TIERS' TO POOL 'TIERS' WEIGHT(FILE_HEAT)
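The GROUP POOL rule above can be modeled as a simple placement loop: files are ranked by heat and poured into gold until it reaches its LIMIT (80% occupancy), then silver (90%), with everything else landing in bronze. The capacities and file sizes below are invented for illustration; this is the placement idea, not the mmapplypolicy engine.

```python
# (tier name, capacity in GB, occupancy limit) -- example values only
TIERS = [("gold", 100, 0.80), ("silver", 200, 0.90), ("bronze", float("inf"), 1.0)]

def place_by_heat(files: list[tuple[str, int, float]]) -> dict[str, str]:
    """files: (name, size_gb, heat). Returns name -> tier."""
    placement: dict[str, str] = {}
    used = {name: 0.0 for name, _, _ in TIERS}
    # Hottest files first, as WEIGHT(FILE_HEAT) orders migration candidates.
    for name, size, _heat in sorted(files, key=lambda f: f[2], reverse=True):
        for tier, capacity, limit in TIERS:
            if used[tier] + size <= capacity * limit:
                used[tier] += size
                placement[name] = tier
                break
    return placement

files = [("a.db", 40, 9.5), ("b.log", 60, 7.0), ("c.mpg", 120, 2.1), ("d.bak", 500, 0.1)]
print(place_by_heat(files))
# a.db is hot enough for gold; b.log and c.mpg overflow to silver; d.bak to bronze
```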
16. Local Read-Only Cache (LROC) Architecture
• Cache data and directory blocks on the client side of the switch
  • Lowest latency (close to the consumer)
• Cached data must be treated as volatile
  – Read-only cache
  – Synchronous write-through to disk
• Largest benefit comes from caching metadata
• Extends the node's buffer pool
  • Cheaper than adding more DRAM -> more capacity
  • Buffer priority/LRU list maintained like standard memory buffers
• Cache consistency ensured by standard GPFS byte-range tokens
  – Remote caching is very hard for block devices to do
• Hot-swappable SSD
  • Increase/decrease LROC space while the file system is on-line
  • Dynamic configuration: the file system stays on-line
[Diagram: GPFS client nodes with 500 GB of LROC SSD added to each interface node.]
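The LROC behavior above reduces to a small invariant: reads populate and are served from the cache, while writes go through synchronously to the backing store and invalidate the cached copy, so the SSD never holds data newer than disk and can be treated as volatile. A minimal sketch, not GPFS code:

```python
class ReadOnlyCache:
    def __init__(self, backing: dict[str, bytes]):
        self.backing = backing             # stands in for the disk subsystem
        self.cache: dict[str, bytes] = {}  # stands in for the local SSD
        self.hits = self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self.cache:
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = self.backing[block_id]
        self.cache[block_id] = data        # populate the cache on the read path
        return data

    def write(self, block_id: str, data: bytes) -> None:
        self.cache.pop(block_id, None)     # writes invalidate the cached copy
        self.backing[block_id] = data      # synchronous write-through to disk

disk = {"blk0": b"old"}
lroc = ReadOnlyCache(disk)
lroc.read("blk0")           # miss: fetched from disk, now cached
lroc.read("blk0")           # hit: served from the SSD
lroc.write("blk0", b"new")  # write-through + invalidate
assert lroc.read("blk0") == b"new"
```

Because a write only invalidates (never updates) the cache entry, losing the SSD at any moment loses nothing: every valid byte is already on disk.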
17. LROC Flash Cache Example Speed Up
• Initially, with all data coming from the disk storage system, the client reads data from the 10K RPM SAS disks at ~5,000 IOPS
• As more data is cached in flash, client performance increases to ~32,000 IOPS (~6x) while reducing the load on the disk subsystem by more than 95%
• Two consumer-grade 200 GB SSDs cache a forty-eight-drive 300 GB 10K SAS disk storage system
18. Highly-Available Write Cache (HAWC), GPFS Client Side
• Place both metadata and small writes in the recovery log, which is now stored in fast storage (NVRAM) such as flash-backed DIMMs or fast SSDs
• Log small writes; send large writes directly to disk
• Scales linearly with the number of nodes
  • Each file system and server has its own recovery log
  • On node failure, quickly replay the recovery log to recover the file system
• Designed to handle *bursts* of small I/O requests
• Optimizes write requests to slow disk
  • Allows aggregation of small writes into more efficient large write requests to storage
  • Workloads that will benefit include VMs, DBs, logging
• GPFS flash optimizations
  • Creating a "hot file" policy with a fast storage pool improves HAWC steady-state performance
Write path: 1. write data to HAWC → 2. write data to fast storage → 3. down-tier to slow storage
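The small-write aggregation above can be sketched as follows: synchronous writes under a size threshold are appended to a fast recovery log and acknowledged immediately; when the log fills, the buffered bytes are flushed to their home location as one large sequential write, while large writes bypass the log entirely. The sizes and thresholds are invented for the example; this is the idea, not the GPFS recovery-log format.

```python
SMALL_WRITE_LIMIT = 64 * 1024   # bytes; larger writes go straight to disk
LOG_CAPACITY = 256 * 1024       # bytes of NVRAM-backed recovery log

class HAWC:
    def __init__(self):
        self.log: list[tuple[int, bytes]] = []   # (offset, data) records
        self.log_bytes = 0
        self.disk_writes: list[int] = []         # sizes of writes hitting disk

    def write(self, offset: int, data: bytes) -> None:
        if len(data) >= SMALL_WRITE_LIMIT:
            self.disk_writes.append(len(data))   # large write: direct to disk
            return
        self.log.append((offset, data))          # small write: commit to fast log
        self.log_bytes += len(data)
        if self.log_bytes >= LOG_CAPACITY:
            self.flush()

    def flush(self) -> None:
        if not self.log:
            return
        # One large, bandwidth-friendly write replaces many small IOPs.
        self.disk_writes.append(self.log_bytes)
        self.log.clear()
        self.log_bytes = 0

hawc = HAWC()
for i in range(64):                  # burst of 4 KiB writes
    hawc.write(i * 4096, b"x" * 4096)
hawc.write(0, b"y" * (1 << 20))      # 1 MiB write bypasses the log
hawc.flush()
print(hawc.disk_writes)              # [262144, 1048576]
```

Sixty-four 4 KiB IOPs reach disk as a single 256 KiB write, which is why the slide says the recovery log is bandwidth bound rather than IOP bound.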
19. HAWC Configuration 1: Store in Fast Storage
• Store small data write requests in the recovery log on fast storage
• Fast storage could be a separate system (FAS840) or integrated (SSDs in a storage server or V7K)
• Once the recovery log is full, or memory is low, write data to the primary storage system
[Diagram: a sync write lands in the GPFS client's page pool and is (1) stored in the recovery log on fast storage, then (2) written to primary GPFS storage (SSD, fast disk, slow disk, tape).]
20. HAWC Configuration 2: Replicate Across GPFS Clients
• Replicate small data write requests in the recovery log on a fast storage device
• Once the recovery log is full, or memory is low, write data to the primary storage system
[Diagram: a sync write lands in one GPFS client's page pool (RAM) and is (1) replicated across clients into NVRAM-backed recovery logs, then (2) written to primary GPFS storage (SSD, fast disk, slow disk, tape).]
21. Elastic Storage: Active File Management (AFM) for Global Data Mobility
• Global WAN caching removes latency effects
• A "Global" namespace, not just a "Common" namespace
• Geo-dispersed replicas
• Ownership and relationships are all defined on a fileset boundary
• Data migration/ingest from legacy NAS
[Diagram: multiple data centers connected over the network, each running GPFS/AFM with the same protocol stack (CIFS, NFS, HTTP, FTP, SCP), management stack (central administration, monitoring, file management), and availability stack (data migration, replication, backup).]
22. Global Namespace and Caching with AFM
Clients on every cluster access the same paths: /global/data1 through /global/data6
• File system store1: local filesets /data1, /data2; cache filesets /data3, /data4, /data5, /data6
• File system store2: local filesets /data3, /data4; cache filesets /data1, /data2, /data5, /data6
• File system store3: local filesets /data5, /data6; cache filesets /data1, /data2, /data3, /data4
Key points:
• Map a local fileset to any remote export
• See all data from any cluster
• Cache as much data as required, or fetch data on demand
• All VFS operations are trapped at the cache cluster, so caching is transparent to all applications
• Multi-writer support (Independent Writer)
• Multi-protocol support (NFS, GPFS RPC)
• Multiple end-point support (legacy filers, other POSIX local filesystems)
• Multi-node parallel read/write support
• WAN performance improvements: Aspera transport
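The home-and-cache model behind these filesets can be sketched in a few lines: a cache fileset serves reads locally when it already holds the file and transparently fetches from the home cluster on a miss, so every site sees one namespace with low-latency access to cached data. Illustrative only; the class names and the single-file "WAN" are assumptions, not the AFM protocol.

```python
class HomeSite:
    def __init__(self, files: dict[str, bytes]):
        self.files = files
        self.wan_fetches = 0  # count of reads that had to cross the WAN

    def fetch(self, path: str) -> bytes:
        self.wan_fetches += 1
        return self.files[path]

class CacheFileset:
    def __init__(self, home: HomeSite):
        self.home = home
        self.local: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self.local:            # cache miss: go to home
            self.local[path] = self.home.fetch(path)
        return self.local[path]               # subsequent reads stay local

home = HomeSite({"/global/data1/report.csv": b"q1,q2\n10,12\n"})
cache = CacheFileset(home)
cache.read("/global/data1/report.csv")   # first read crosses the WAN
cache.read("/global/data1/report.csv")   # second read is served locally
print(home.wan_fetches)                  # 1
```

The application only ever sees `read(path)`; whether the bytes came from the local fileset or the remote home is invisible, which is the "transparent to all applications" point above.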
23. IBM Spectrum Scale RAID: De-clustered RAID
• ESS: Elastic Storage Server with GPFS Native RAID (de-clustered RAID)
  ‒ Data and parity stripes are uniformly partitioned and distributed across the array
  ‒ Rebuilds that take days on other systems take minutes on Elastic Storage
• 2-fault and 3-fault tolerance
  ‒ Reed-Solomon parity encoding, 2-fault or 3-fault tolerant
  ‒ 3- or 4-way mirroring
• End-to-end checksum and dropped-write detection
  ‒ From the disk surface to the Elastic Storage user/client
  ‒ Detects and corrects off-track and lost/dropped disk writes
• Asynchronous error diagnosis while affected I/Os continue
  ‒ If media error: verify and restore if possible
  ‒ If path problem: attempt alternate paths
• Supports live replacement of disks
  ‒ I/O operations continue for tracks whose disks are removed during service
24. Declustered RAID Example
[Diagram: three 1-fault-tolerant mirrored groups (RAID 1) of 6 disks plus a spare disk are declustered over 7 disks. The 21 stripes (42 strips; 7 stripes of 2 strips per group) and 7 spare strips (49 strips in all) are spread uniformly across all 7 disks, so every disk participates in a rebuild.]
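The numbers in the example above fall out of simple arithmetic, sketched here as a quick check: 3 mirrored groups of 7 stripes with 2 strips per stripe, plus 7 spare strips, laid out evenly over 7 disks.

```python
GROUPS = 3
STRIPES_PER_GROUP = 7
STRIPS_PER_STRIPE = 2   # RAID 1 mirror: a data strip and its copy
SPARE_STRIPS = 7
DISKS = 7

stripes = GROUPS * STRIPES_PER_GROUP            # 21 stripes
data_strips = stripes * STRIPS_PER_STRIPE       # 42 strips
total_strips = data_strips + SPARE_STRIPS       # 49 strips
strips_per_disk = total_strips // DISKS         # 7 strips on every disk

print(stripes, data_strips, total_strips, strips_per_disk)  # 21 42 49 7

# Declustering spreads a failed disk's mirror copies over all survivors,
# so all 6 remaining disks contribute to the rebuild in parallel -- the
# reason rebuilds finish in minutes rather than days.
assert total_strips % DISKS == 0
```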
25. Elastic Storage: Native Encryption and Secure Erase
Encryption of data at rest
• Files are encrypted before they are stored on disk
• Master keys are never written to file-system disks
• Protects data from security breaches, unauthorized access, and being lost, stolen, or improperly discarded
• Complies with NIST SP 800-131A and is FIPS 140-2 certified
• Supports HIPAA, Sarbanes-Oxley, and EU and national data-privacy-law compliance
Secure deletion
• Ability to destroy arbitrarily large subsets of a file system
• No "digital shredding", no overwriting: secure deletion is a cryptographic operation
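"Secure deletion is a cryptographic operation" means this: if files only ever reach disk in encrypted form, destroying a file's key renders its ciphertext permanently unreadable, with no overwriting of data blocks. The sketch below illustrates the idea; the SHA-256 counter keystream is a toy stand-in for a real cipher, and the class and key-handling details are invented, not GPFS's implementation.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256(key || counter) keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

class EncryptedStore:
    def __init__(self):
        self.keys: dict[str, bytes] = {}    # key store, kept off the data disks
        self.blocks: dict[str, bytes] = {}  # only ciphertext ever hits "disk"

    def write(self, path: str, data: bytes) -> None:
        key = self.keys.setdefault(path, secrets.token_bytes(32))
        self.blocks[path] = keystream_xor(key, data)

    def read(self, path: str) -> bytes:
        return keystream_xor(self.keys[path], self.blocks[path])

    def secure_erase(self, path: str) -> None:
        # No shredding: the ciphertext blocks stay on disk untouched,
        # but without the key they can never be decrypted again.
        del self.keys[path]

store = EncryptedStore()
store.write("/fs1/patient.rec", b"confidential")
assert store.read("/fs1/patient.rec") == b"confidential"
store.secure_erase("/fs1/patient.rec")
assert "/fs1/patient.rec" in store.blocks  # blocks survive, data does not
```

This is also why the operation scales to arbitrarily large subsets of a file system: deleting a handful of keys is constant-time regardless of how many petabytes they protected.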
26. Elastic Storage: Application data access
A single software defined storage solution across all these application types:
• Technical Computing: POSIX file access
• Big Data & Analytics: Elastic Storage Hadoop connector
• Cloud: block (Cinder), object (Swift), and file (NFS) access
All on a GPFS Storage Server cluster, with linear capacity and performance scale-out, enterprise storage on commodity hardware, and a single namespace.
27. Elastic Storage: HDFS
• Elastic Storage Hadoop connector
• Supports IBM BigInsights analytics and Apache Hadoop
• Existing infrastructure can do Hadoop-based analytics
  ‒ No need to purchase a dedicated analytics infrastructure, lowering CAPEX and OPEX
• No need to move data in and out of a dedicated analytics silo
  ‒ Speeds results
• Enterprise-class protection and efficiency
  ‒ Full data lifecycle management
  ‒ Policy-based tiering from flash to disk to tape
• Reduce cost, simplify management
[Diagram: a compute cluster using HDFS APIs on top of a GPFS Storage Server cluster.]
28. GPFS Hadoop Connector Overview
Figure: the Hadoop software stack on GPFS and on HDFS, side by side. Applications sit on higher-level languages (Hive, BigSQL, JAQL, Pig, …), which abstract them from the Map/Reduce API; Map/Reduce in turn uses the Hadoop FileSystem API, which abstracts the file system interface. That abstraction is the reason applications don't see a difference between GPFS (via the GPFS Hadoop connector) and HDFS.
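The layering above is easy to see in miniature: code written against an abstract file-system interface cannot tell the concrete implementations apart. The sketch below is a hypothetical Python analogue of the Hadoop FileSystem API (the class and method names are invented for illustration; the real API is Java):

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """Stand-in for the Hadoop FileSystem API: applications code
    against this interface, never a concrete file system."""
    @abstractmethod
    def read(self, path: str) -> bytes: ...

class HDFSClient(FileSystem):
    def __init__(self, store): self.store = store
    def read(self, path): return self.store[path]

class GPFSClient(FileSystem):
    def __init__(self, store): self.store = store
    def read(self, path): return self.store[path]

def word_count(fs: FileSystem, path: str) -> int:
    # The "application" only ever sees the FileSystem interface.
    return len(fs.read(path).split())

data = {"/in/doc.txt": b"gpfs and hdfs look the same here"}
assert word_count(HDFSClient(data), "/in/doc.txt") == \
       word_count(GPFSClient(data), "/in/doc.txt") == 7
```

Swapping the client class changes the storage backend without touching the application, which is exactly what the GPFS Hadoop connector does underneath the real FileSystem API.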
29. High-Level Comparison of GPFS and HDFS

Category | Feature | GPFS | HDFS
Enterprise readiness | POSIX compliance | Full support | Limited support
Enterprise readiness | Meta-data replication | Triplication, scale-out metadata management for many years | High Availability for HDFS since v2.2
Enterprise readiness | Access protocols | Data lake with rich access protocols: file, NFSv3/v4, CNFS, FTP, M/R, object, block, etc. | M/R, NFSv3 gateway (no HA yet) since v2.2
Protection & Recovery | Snapshot | Yes, mature feature for years | Yes since v2.2
Protection & Recovery | Geographical distribution, DR | Yes (AFM) | No
Protection & Recovery | Backup | Scalable, fast recovery, TSM | No
Protection & Recovery | Tape integration | LTFS, DMAPI, HPSS, TSM | No
Storage efficiency & cost optimization | Erasure code | GPFS Native RAID for storage efficiency, and end-to-end data availability, reliability, and integrity in tier-1 storage | Non-tier-1 storage implementations, for data archive
Storage efficiency & cost optimization | Heterogeneous storage pools | Yes for many years, with policy-driven ILM capability; no application modification needed | Yes since v2.3/2.6, API driven; need to modify applications
Storage efficiency & cost optimization | Block sizes | Variable block sizes, suited to multiple types of data and data access patterns | Large block sizes; poor support for small files
Workload | Access pattern | All, update in place | Write-once-read-many, append only
Workload | Preferred workload | Covers the whole spectrum | Write-once-read-many apps, large files
30. High-Level Comparison of Elastic Storage and HDFS (cont.)

Category | Feature | GPFS | HDFS
Privacy and security | Encryption | Encryption and Secure Erase | Encryption for data at rest since v2.6
Privacy and security | Access control lists | ACLs for years | ACL support since v2.4
Privacy and security | Data retention | Immutability and retention features | No
Ease of use | Policy-based ILM | Policy driven, automatic | API driven; no policy and automation
Ease of use | Fine-grained policy control | Fileset, user, group, file, etc. | No
Ease of use | Disk maintenance & replacement | Yes; GNR/GSS disk management features such as disk LED control | No
Ease of use | Rolling upgrades | Yes for many years | Yes since v2.4, with limitations such as downtime in rollback & downgrade
Ease of use | User-defined node classes | Yes | No
Flexible architecture | Server | x86, Power | x86
Flexible architecture | OS | Linux (x/p), AIX, Windows | Linux
Flexible architecture | SSD as dynamic read and write cache | LROC, HAWC | No
Flexible architecture | Hybrid storage architecture | External shared storage, GSS, server internal storage | Server internal storage
31. Elastic Storage and OpenStack: Cinder Driver
• The OpenStack Havana release includes an Elastic Storage Cinder driver
• Giving architects access to the features and capabilities of the industry's leading enterprise scale-out software-defined storage
• With OpenStack on Elastic Storage, all nodes see all data
‒ Copying data between services, such as Glance to Cinder, is minimized or eliminated
‒ Speeding instance creation and conserving storage space
• Rich set of data management and information lifecycle features
‒ Efficient file clones
‒ Policy-based automation optimizing data placement for locality or performance tier
‒ Backup
• Industrial-strength reliability, minimizing risk
• In short: the Elastic Storage Cinder driver provides resilient block storage, minimal data copying between services, speedy instance creation and efficient space utilization
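Enabling the driver is a configuration exercise. A hedged sketch of the relevant cinder.conf section (option names follow the Havana/Icehouse-era OpenStack GPFS driver documentation; paths are examples, and the driver module path moved under an ibm subpackage in later releases):

```ini
[DEFAULT]
# GPFS (Elastic Storage) Cinder driver
volume_driver = cinder.volume.drivers.gpfs.GPFSDriver
# GPFS directory where volume files are created
gpfs_mount_point_base = /gpfs/fs1/cinder
# thin-provisioned volume files
gpfs_sparse_volumes = True

# If Glance images live on the same GPFS file system, new volumes can be
# copy-on-write file clones of the image instead of full copies:
gpfs_images_dir = /gpfs/fs1/glance/images
gpfs_images_share_mode = copy_on_write
```

The copy-on-write share mode is what delivers the "minimal data copying between services" and fast instance creation claimed above.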
32. Elastic Storage Object: OpenStack Swift
Challenge
• The world is not object today, and never will be completely
• Inefficient NAS "copy and change" gateways
Primary Use Cases
1. Single management plane: manage file and object within a single system
2. Create and share: create data via the file interface and share it globally using the object interface
3. Sync/archive and analyze: ingest through object and analyze the data (Hadoop) using the file interface
Collaborating with the open-source community on the SwiftOnFile StackForge project
Figure: Elastic Storage Object tiering data across SSD, fast disk, slow disk, and tape.
33. Elastic Storage Object: Design and Benefits
• Only configuration changes required
‒ Place the Swift proxy, Swift object service, and GPFS client on all nodes
• Object ring
‒ Set only a single object server per ring
‒ The proxy only contacts the local object service daemon
• Object-replicator
‒ Run infrequently, to clean up tombstone and deleted files
• Object-auditor (disk scrubbing)
‒ Compares the file-level checksum in an xattr with the data on disk
‒ Do not run it: leverage GSS checksums and disk scrubbing/auditing, and 'immutability' bits to prevent changes
• Swift virtual devices and partitions
‒ Create a 'reasonably' sized directory tree depending on the expected number of objects
• Current focus is shared-storage GPFS deployments
‒ E.g., GSS, NSD servers, SAN; but it will work with FPO
• Keystone authentication
‒ Integration into LDAP/AD
Figure: a load balancer distributing HTTP Swift requests across GPFS object nodes (each running the proxy service, object service, and GPFS), with the Keystone authentication service and Memcached as additional services in the cluster, all backed by a geo-distributed GPFS object store tiered across SSD, fast disk, slow disk, and tape.
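The object-auditor's job, as described above, is essentially "hash the file, compare with the stored checksum". A minimal sketch (the helper names are illustrative; in a GPFS deployment this pass is left switched off in favour of GSS end-to-end checksums and scrubbing):

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    # Hash the on-disk bytes in chunks, as a disk scrubber would.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def audit_object(path, stored_md5):
    # Swift keeps object metadata, including the MD5/ETag, in xattrs;
    # here the stored checksum is passed in to keep the sketch portable.
    return file_md5(path) == stored_md5
```

A mismatch means the bytes on disk no longer match what was written, and the replica would be quarantined.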
34. In Summary: Why Elastic Storage?
Figure: four separate compute/network/storage silos, one each for HPC, Big Data, Cloud, and File Sharing.
IT infrastructure silos lead to:
• Rigid and manual assignment of redundant IT resources
• Low utilization
• Needless data movement and copying
35. Why Elastic Storage?
Smart Storage
Versatile + flexible +“silo-less”
High performance + scaling
Low TCO + easy to manage
Reliable + proven
Advanced features: all data
37. Elastic Storage in Cloud Deployments
1. IBM Spectrum Scale Cloud Services (on IBM
SoftLayer)
2. University of Birmingham
3. eMedLab-UK
4. SuperVessel Cloud for Open Power
38. Use Case #1: Elastic Storage on SoftLayer Cloud
A complete, application-ready cluster in the cloud, optimized for technical computing & analytics
• More storage capability: easily meet additional resource demands without the cost of purchasing or managing in-house infrastructure
• Lower risks and upfront costs: increase storage incrementally as needed; no more guessing how much you will need 3 years from now
• Secure: ensure data security with physical isolation of storage and networks on the cloud
• Easy public cloud adoption: minimize administrative burden with fully supported, ready-to-run software-defined storage in the cloud
What's new
• Elastic Storage delivered as a service, bringing high-performance, scalable storage and integrated data governance for managing large amounts of data and files in the cloud
• Deployed on dedicated bare-metal resources at a named data center for optimal I/O performance & security
• Optimized for technical computing & analytics workloads
• Installed, integrated & administered by a skilled Cloud Ops team
Figure: the IBM Platform Computing Cloud Service stack: Platform LSF (SaaS) and Platform Symphony (SaaS) running on Elastic Storage (GPFS) on Cloud, on SoftLayer bare-metal infrastructure, with 24x7 CloudOps support.
39. Non-shared storage paradigm
Figure: each organization's private VLAN contains a Platform LSF Master & Platform Application Center server, compute nodes with GPFS clients, and NSD servers, with replication between Elastic Storage clusters.
• Elastic Storage servers and storage are isolated inside each organization's private VLAN, with no sharing, for maximum security
• Elastic Storage on Cloud is a fully integrated solution that includes server and client licenses, plus installation, support & maintenance of the Elastic Storage environment
40. Storage Requirements are Devouring CAPEX and OPEX Resources
Figure: data growth from gigabytes (1980) through terabytes, petabytes, and exabytes toward zettabytes (2014 onward), a factor of 1000x per era, while storage budgets increase only 1-5%.
• Data doubles approximately every 2 years
• Elastic Storage closes this gap 3 ways:
1) Easy to manage at scale
2) Data lifecycle automation
3) Commodity hardware
41. Use Case #2: CLIMB - University of Birmingham
• CLIMB project (Cloud Infrastructure for Microbial Bioinformatics)
• Funded by Medical Research Council (MRC) : ~£8m (~$13M) grant
• Four partner Universities
– Birmingham
– Cardiff
– Swansea
– Warwick
• The CLIMB goal is to develop and deploy a world-leading cyber infrastructure for microbial bioinformatics
• Private cloud, running 1000 VMs over 4 sites
43. CLIMB Specs
• Private cloud, running 1000 VMs over 4 sites
• Separate OpenStack region per site, with a single gateway for access
• Local GPFS high performance (~0.5PB per site)
• Storage cluster replicated across sites
• Takes advantage of the GPFS driver for Cinder in OpenStack
• Nova Compute, Swift, and Glance use GPFS directly
GPFS magic sauce & OpenStack
• Swift : Object storage, separate file-set
• Glance : Image service (where we store VM images)
• Cinder : Volume (block disk service)
• Share file-set with Cinder, using file-clone to create images
• Nova compute: The bit that runs on the Hypervisor servers
• Point Nova compute at GPFS: no GPFS magic, it's just 'local' storage for Nova to use
• It’s a shared file-system so can live migrate
• Normal GPFS storage so can use RDMA
• Will LROC improve performance here?
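The "file-clone to create images" trick mentioned above is what makes instance creation fast: only block pointers are copied, and data blocks stay shared until one side writes. A toy model of copy-on-write cloning (the class and its methods are invented for illustration; GPFS exposes the real mechanism via the mmclone command):

```python
class BlockStore:
    """Toy copy-on-write file clone, sketching why cloning a Glance
    image into a Cinder volume on GPFS is fast: no data moves until
    a block is actually written."""
    def __init__(self):
        self.blocks = []          # physical data blocks
        self.files = {}           # name -> list of block indices

    def create(self, name, data_blocks):
        self.files[name] = []
        for data in data_blocks:
            self.blocks.append(data)
            self.files[name].append(len(self.blocks) - 1)

    def clone(self, src, dst):
        # O(pointers), not O(data): no block is copied
        self.files[dst] = list(self.files[src])

    def write(self, name, block_no, data):
        # copy-on-write: the clone gets its own block; the parent's survives
        self.blocks.append(data)
        self.files[name][block_no] = len(self.blocks) - 1

    def read(self, name):
        return [self.blocks[i] for i in self.files[name]]

store = BlockStore()
store.create("glance/image", [b"boot", b"root"])
store.clone("glance/image", "cinder/vol1")        # instant "copy"
store.write("cinder/vol1", 1, b"root'")           # COW on first write
assert store.read("glance/image") == [b"boot", b"root"]
assert store.read("cinder/vol1") == [b"boot", b"root'"]
```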
44. CLIMB : GPFS @UoB
• GPFS @UoB
• BlueBEAR – Linux HPC running over FDR-10
• Research Data Store – multi-data centre, replicated, HA failover system for bulk data
for research projects
• Hadoop?
• CLIMB : Future work
• Tune GPFS environment – any thoughts?
• Add local SSDs to enable LROC for nova-compute nodes?
• AFM to replicate glance across sites
• Integrate OpenStack environment with GPFS and CEPH storage
• Contact : Simon Thompson, Research Computing Team, University of
Birmingham, England, UK
• S.J.Thompson@bham.ac.uk
• www.roamingzebra.co.uk (shameless blog plug)
• Project: www.climb.ac.uk
• Twitter: @MRCClimb
45. Use Case #3: eMedLab
• Background
• Funded by Medical Research Council for 5 years
• UK research council focusing on health, budget of £850M ($1.3B) in
2013/14
• Similar to NIH in the US
• Allocation of £6.8M ($10M) for capital equipment (including hosting and
power costs)
• Medical Bioinformatics: Data-Driven Discovery for Personalized
Medicine
• Objective of creating an off-site data center
• This resource will allow scientists to analyze human genome
data and medical images, together with clinical and other
physiological and social data, for the benefit of human health.
46. eMedLab Objectives
• To accumulate medical and biological data on an
unprecedented scale and complexity
• To coordinate it
• To store it safely and securely
• To make it readily available to interested researchers
• To allow customized use of resources
• To enable innovative ways of working collaboratively
• To allow a distributed support model
• To help generate new insights and clinical outcomes by
combining data from diverse sources
47. eMedLab Infrastructure
• High Performance Scratch (1.3PB @ ~25GB/s)
• VM storage (475TB) – storage for VMs and snapshots
• Project storage (1.5PB) – project specific, medium term
• Reference data (2.7PB) – where data will be shared
• 252 servers
• Why GPFS/GSS?
• Good pedigree
• Great scalability as and when we
grow
• POSIX compatible
• Active in OpenStack projects
• 4.1 Security model
• Management tools improving
49. eMedLab: Courtesy and contacts
Dr Bruno Silva
High Performance Computing Lead
The Francis Crick Institute
Thomas King
Head of Research Infrastructure
Queen Mary University of London
t.king@qmul.ac.uk
50. Use Case #4 – SuperVessel: Cloud for OpenPOWER
• In China, around 300,000 computer-science students graduate from universities each year; only 1,000~2,000 of them have training on POWER (3~6 months).
• China STG donated POWER servers to 10~15 universities years ago. Most of them are POWER6 machines, AIX only, and too old to install today's new platforms (e.g. Hadoop).
• Limited by the teachers' expertise at each university, the POWER machines could not be effectively shared by students or used to create up-to-date content (e.g. IaaS, PaaS) for POWER learning.
• The POWER Technology Open Lab is the first lab to support the OpenPOWER ecosystem in GCG
‒ Co-led by IBM Research – China and the IBM STG lab in China
‒ Endorsed by GCG (Andy Ho) and global OpenPOWER (Ken King)
51. Architecture of SuperVessel - Cloud for OpenPOWER
Figure: the SuperVessel architecture. An OpenStack controller (Horizon, Nova, Neutron, Glance, Cinder, Keystone, Manila, and HEAT) manages five resource pools on IBM POWER servers and x86: KVM pools for POWER8 LE, POWER8 BE, and x86, and LXC/Docker container pools for POWER8 LE and POWER8 BE, with GPFS storage, FPGA/GPU resources, and OpenPOWER servers underneath. Services for the cloud admin include system maintenance, system monitoring, resource-usage metering, and system analysis; the admin interface adds user account & authentication management, virtual point management, statistics and analysis, bare-metal management, and image management. End-user services comprise the Cloud Infrastructure Service, Big Data Service, OpenPOWER enablement service, Super Class service, and Super Project Service.
52. GPFS providing a shared file system for Cloud IaaS and Big Data Service on SuperVessel Cloud
Figure: the OpenStack controller (Horizon, HEAT, Nova, Neutron, Glance, Manila, Keystone, Cinder) driving a HEAT template for a big data cluster of Docker instances (Symphony, SPARK) and KVM/Docker web-app instances on POWER7/POWER8 servers, with per-user data folders on GPFS FPO.
• From the Big Data Service, users select the big data computing framework (MapReduce, SPARK), the cluster size, and the data folder size
• HEAT orchestrates the docker instances, subnet, and data folder based on the user's request
• Manila provides the NFS service using GPFS as the backend, and the folder is mounted via nova-docker (with –v support)
• A folder created by Manila can be accessed by the KVM/docker instances created for big data and other purposes
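A hedged sketch of what the "HEAT template for big data cluster" might contain (OS::Heat::ResourceGroup and OS::Nova::Server are standard Heat resource types; the Manila share resource and the image and flavor names are illustrative and depend on the installed plugins):

```yaml
heat_template_version: 2013-05-23
description: Sketch of a SuperVessel-style big data cluster (names illustrative)
parameters:
  cluster_size:
    type: number
    default: 4
resources:
  data_share:            # Manila-managed NFS share backed by GPFS
    type: OS::Manila::Share
    properties:
      share_protocol: NFS
      size: 100
  workers:               # one Docker-backed worker per cluster_size
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: cluster_size }
      resource_def:
        type: OS::Nova::Server
        properties:
          image: symphony-docker
          flavor: m1.medium
```

Heat expands the resource group to the requested cluster size, matching the "select cluster size, select data folder size" flow described above.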
53. SuperVessel: IBM technologies in an integrated cloud environment
• IBM Cloud Management with OpenStack: product providing unified management for the cloud infrastructure and big data infrastructure on POWER
• IBM General Parallel File System (FPO): product providing the shared file system for cloud IaaS and big data
• Platform Symphony: product providing the big data service
• Manila: open-source project providing an NFS management service with OpenStack
• Nova-docker: open-source project providing a Docker driver for Nova in OpenStack
• Research technologies on cloud
56. Notices and Disclaimers (con’t)
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any
IBM patents, copyrights, trademarks or other intellectual property right.
• IBM, the IBM logo, ibm.com, Bluemix, Blueworks Live, CICS, Clearcase, DOORS®, Enterprise Document
Management System™, Global Business Services ®, Global Technology Services ®, Information on Demand,
ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™,
PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®,
pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, SoDA, SPSS, StoredIQ, Tivoli®, Trusteer®,
urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of
International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and
service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on
the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
57. Thank You
Your feedback is important! Access the InterConnect 2015 Conference CONNECT Attendee Portal to complete your session surveys from your smartphone, laptop or conference kiosk.