The document discusses the architecture of the next-generation parallel file system OrangeFS. OrangeFS distributes file data and metadata across multiple file servers and storage devices. It supports simultaneous access by multiple clients. Recent additions to OrangeFS include a scalable metadata operation protocol, support for SSD metadata storage, a Windows client, direct client access interface, client caching, WebDAV integration, and an S3 interface.
Architecture of the Upcoming OrangeFS v3 Distributed Parallel File System (All Things Open)
OrangeFS is a parallel file system that provides distributed, shared-nothing metadata and data storage across multiple servers. It allows for high performance parallel I/O and a unified namespace. The document discusses OrangeFS's architecture, performance advantages, and areas for future improvement including enhanced availability, security, integrity checking, and administration. Upcoming versions will feature distributed primary object replication, geographic file replication, capability-based security, parallel background jobs for maintenance and verification, and improved metadata and scaling performance.
HDFS is a distributed file system designed for storing very large data files across commodity servers or clusters. It works on a master-slave architecture with one namenode (master) and multiple datanodes (slaves). The namenode manages the file system metadata and regulates client access, while datanodes store and retrieve block data from their local file systems. Files are divided into large blocks which are replicated across datanodes for fault tolerance. The namenode monitors datanodes and replicates blocks if their replication drops below a threshold.
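The re-replication behavior described above can be sketched in a few lines. This is a hypothetical illustration (not Hadoop source code) of the namenode's check, assuming a simple block-to-datanode map and a set of datanodes still sending heartbeats:

```python
# Toy sketch of the namenode's re-replication check: blocks whose live
# replica count falls below the target factor are queued for copying.
REPLICATION_FACTOR = 3  # assumed default; configurable in real HDFS

def under_replicated(block_map, live_datanodes, target=REPLICATION_FACTOR):
    """Return {block: replicas_needed} for blocks below the target."""
    needed = {}
    for block, holders in block_map.items():
        live = [dn for dn in holders if dn in live_datanodes]
        if len(live) < target:
            needed[block] = target - len(live)
    return needed

# Example: datanode "dn3" has stopped sending heartbeats.
block_map = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn1", "dn3", "dn4"}}
live = {"dn1", "dn2", "dn4"}
print(under_replicated(block_map, live))  # {'blk_1': 1, 'blk_2': 1}
```

Each affected block would then be scheduled for copying from a surviving replica to another live datanode.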
Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It is designed for very large numbers of files (over 100 million) and is optimized for batch processing huge datasets across large clusters (over 10,000 nodes). HDFS stores multiple replicas of data blocks on different nodes to handle failures. It provides high aggregate bandwidth and allows computations to move to where data resides.
This document discusses file systems and file management. It begins by defining key file concepts like file attributes and operations. It then covers topics like access methods, directory structures, file sharing, protection, and file system implementation details. The objectives are to explain file system functions, describe interfaces, discuss design tradeoffs for components like access methods and directories, and explore file system protection.
Beyond Mission Critical: Virtualizing Big Data and Hadoop (Chiou-Nan Chen)
Virtualizing big data platforms like Hadoop provides organizations with agility, elasticity, and operational simplicity. It allows clusters to be quickly provisioned on demand, workloads to be independently scaled, and mixed workloads to be consolidated on shared infrastructure. This reduces costs while improving resource utilization for emerging big data use cases across many industries.
HDFS is a distributed file system designed for storing very large data sets reliably and efficiently across commodity hardware. It has three main components - the NameNode, Secondary NameNode, and DataNodes. The NameNode manages the file system namespace and regulates access to files. DataNodes store and retrieve blocks when requested by clients. HDFS provides reliable storage through replication of blocks across DataNodes and detects hardware failures to ensure data is not lost. It is highly scalable, fault-tolerant, and suitable for applications processing large datasets.
Hadoop Distributed File System (HDFS) is a distributed file system that stores large datasets across commodity hardware. It is highly fault tolerant, provides high throughput, and is suitable for applications with large datasets. HDFS uses a master/slave architecture where a NameNode manages the file system namespace and DataNodes store data blocks. The NameNode ensures data replication across DataNodes for reliability. HDFS is optimized for batch processing workloads where computations are moved to nodes storing data blocks.
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015 (NoSQLmatters)
There are many frameworks that can offer real time processing on top of Hadoop. This talk will show the usage of Pivotal HAWQ and how easy it is to use SQL for querying your Hadoop data. Come and see the power and ease of use that can help you get the most from the Hadoop ecosystem.
IBM SONAS and the Cloud Storage Taxonomy (Tony Pearson)
This document discusses IBM's Scale Out Network Attached Storage (SONAS) solution. SONAS provides a global namespace and can scale to support large amounts of unstructured file data across various cloud environments. The document outlines how SONAS utilizes IBM's General Parallel File System (GPFS) and provides features such as high performance, data replication, backups, antivirus integration, and information lifecycle management through migration to tape storage.
This document outlines and compares two NameNode high availability (HA) solutions for HDFS: AvatarNode used by Facebook and BackupNode used by Yahoo. AvatarNode provides a complete hot standby with fast failover times of seconds by using an active-passive pair and ZooKeeper for coordination. BackupNode has limitations including slower restart times of 25+ minutes and supporting only two-machine failures. While it provides hot standby for the namespace, block reports are sent only to the active NameNode, making it a semi-hot standby solution. The document also briefly mentions other experimental HA solutions for HDFS.
The 3.0 release of the Maginatics Cloud Storage Platform (MCSP) includes great improvements in Data Protection, Multi-tier Caching and APIs, as well as other significant new features that make Maginatics the ideal choice for enterprise businesses with demanding storage requirements.
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis (Sameer Tiwari)
There is a plethora of storage solutions for big data, each having its own pros and cons. The objective of this talk is to delve deeper into specific classes of storage types like Distributed File Systems, in-memory Key Value Stores, Big Table Stores and provide insights on how to choose the right storage solution for a specific class of problems. For instance, running large analytic workloads, iterative machine learning algorithms, and real time analytics.
The talk will cover HDFS and HBase, with a brief introduction to Redis.
Red Hat Storage - Introduction to GlusterFS (GlusterFS)
Red Hat Storage introduces GlusterFS, an open source scale-out file system. GlusterFS provides scalable, affordable storage using commodity hardware. It allows linearly scaling performance and capacity by adding servers. GlusterFS has a global namespace and supports various protocols, enabling flexible deployment across private and public clouds. Many enterprises rely on GlusterFS for applications, virtual machines, Hadoop, and hybrid cloud solutions.
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje... (Maginatics)
How did Maginatics build a strongly consistent and secure distributed file system? Niraj Tolia, Chief Architect at Maginatics, gave this presentation on the design of MagFS at the Storage Developer Conference on September 16, 2013.
For more information about MagFS—The File System for the Cloud, visit maginatics.com or contact us directly at info@maginatics.com.
The Hadoop Distributed File System (HDFS) has a master/slave architecture with a single NameNode that manages the file system namespace and regulates client access, and multiple DataNodes that store and retrieve blocks of data files. The NameNode maintains metadata and a map of blocks to files, while DataNodes store blocks and report their locations. Blocks are replicated across DataNodes for fault tolerance following a configurable replication factor. The system uses rack awareness and preferential selection of local replicas to optimize performance and bandwidth utilization.
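The rack-aware placement described above can be illustrated with a small sketch. This is an assumed simplification (names invented, not Hadoop code) of HDFS's default policy: first replica on the writer's node, second on a different rack, third on the second's rack but a different node; it assumes every rack holds at least two nodes:

```python
import random

# Simplified model of HDFS's default rack-aware replica placement.
def place_replicas(writer, racks):
    """racks: {rack_name: [nodes]}; returns three chosen nodes."""
    writer_rack = next(r for r, nodes in racks.items() if writer in nodes)
    first = writer                                     # replica 1: local node
    remote_rack = random.choice([r for r in racks if r != writer_rack])
    second = random.choice(racks[remote_rack])         # replica 2: remote rack
    third = random.choice(                             # replica 3: same remote
        [n for n in racks[remote_rack] if n != second])  # rack, different node
    return [first, second, third]

racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", racks)
# e.g. ['n1', 'n3', 'n4']: one rack failure can cost at most two replicas
```

Keeping two replicas on one remote rack trades a little failure independence for less cross-rack write traffic, which is the bandwidth optimization the summary mentions.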
This document discusses directory write leases in MagFS, a globally distributed file system. It introduces the concept of directory write leases, which allow clients to cache and execute namespace-modifying operations locally to improve performance over high-latency networks. Evaluation results show that directory write leases enable workloads to complete much faster with increasing network latency compared to synchronous approaches.
CBlocks - POSIX compliant file systems for HDFS (DataWorks Summit)
With YARN running Docker containers, it is possible to run applications that are not HDFS aware inside these containers. It is hard to customize these applications since most of them assume a POSIX file system with rewrite capabilities. In this talk, we will dive into how we created a block storage layer, how it is being tested internally, and the storage containers that make it all possible.
The storage container framework was developed as part of Ozone (HDFS-7240). This talk will also explore the current state of Ozone along with CBlocks, covering the architecture of storage containers, how replication is handled, scaling to millions of volumes, and I/O performance optimizations.
This document summarizes new file system and storage features in Red Hat Enterprise Linux (RHEL) 6 and 7. It discusses enhancements to logical volume management (LVM) such as thin provisioning and snapshots. It also covers expanded file system options like XFS, improvements to NFS including parallel NFS, and general performance enhancements.
This document discusses designing and building an inexpensive distributed file system (DFS). It begins with an overview of why DFS systems are used and their advantages over centralized storage, such as lower costs and better scalability. It then provides details on openAFS, an open-source DFS, including its main elements, implementation, and usage. The document also introduces new approaches using object-based storage and distributed systems like Hadoop.
Comparison between MongoDB and Cassandra using YCSB (sonalighai)
Performed YCSB benchmark tests to compare the performance of MongoDB and Cassandra across different workloads at a million operation counts, and generated a report discussing the resulting insights.
Interactive Hadoop via Flash and Memory (Chris Nauroth)
Enterprises are using Hadoop for interactive real-time data processing via projects such as the Stinger Initiative. We describe two new HDFS features – Centralized Cache Management and Heterogeneous Storage – that allow applications to effectively use low latency storage media such as Solid State Disks and RAM. In the first part of this talk, we discuss Centralized Cache Management to coordinate caching important datasets and place tasks for memory locality. HDFS deployments today rely on the OS buffer cache to keep data in RAM for faster access. However, the user has no direct control over what data is held in RAM or how long it's going to stay there. Centralized Cache Management allows users to specify which data to lock into RAM. Next, we describe Heterogeneous Storage support for applications to choose storage media based on their performance and durability requirements. Perhaps the most interesting of the newer storage media are Solid State Drives, which provide improved random IO performance over spinning disks. We also discuss memory as a storage tier, which can be useful for temporary files and intermediate data for latency sensitive real-time applications. In the last part of the talk we describe how administrators can use quota mechanism extensions to manage fair distribution of scarce storage resources across users and applications.
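The pool-plus-quota idea behind centralized cache management can be modeled briefly. This is a toy model with invented names (not the HDFS API): users add cache directives to a pool, and the pool's byte quota bounds how much data it may pin in datanode memory:

```python
# Toy model of cache pools with byte quotas, loosely mirroring the
# administrator-facing concepts described in the talk summary.
class CachePool:
    def __init__(self, name, limit_bytes):
        self.name, self.limit, self.used = name, limit_bytes, 0
        self.directives = []

    def add_directive(self, path, size_bytes):
        """Pin a path into cache, enforcing the pool's quota."""
        if self.used + size_bytes > self.limit:
            raise ValueError(f"pool {self.name}: quota exceeded")
        self.directives.append(path)
        self.used += size_bytes

pool = CachePool("analytics", limit_bytes=4 * 2**30)   # 4 GiB quota
pool.add_directive("/warehouse/dim_tables", 3 * 2**30)  # fits
try:
    pool.add_directive("/warehouse/fact_sample", 2 * 2**30)  # would exceed
except ValueError as e:
    print(e)  # pool analytics: quota exceeded
```

The quota check is what lets administrators divide scarce RAM fairly across users, the point made at the end of the summary.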
DAOS (Distributed Application Object Storage) is a high-performance storage architecture and software stack that delivers scalable object storage capabilities. It uses Intel Optane memory and NVMe SSDs to provide high IOPS, bandwidth, and low latency storage. DAOS supports various data models and interfaces like POSIX, HDF5, Spark, and Python. It allows applications to access storage with library calls instead of system calls for high performance.
Gluster Webinar: Introduction to GlusterFS (GlusterFS)
GlusterFS is an open source, scale-out network filesystem. It runs on commodity hardware and allows indefinite growth in capacity and performance by simply adding server nodes. Key benefits include flexibility to deploy on any hardware, linearly scalable performance, and superior storage economics compared to traditional storage solutions. GlusterFS uses a distributed hashing technique instead of a metadata server to provide high availability and reliability.
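The metadata-server-free design mentioned above rests on hashing. This is a minimal sketch of the idea, not GlusterFS's actual algorithm (real GlusterFS assigns per-directory hash ranges to bricks); here a digest of the file path alone selects the brick, so every client locates a file without any central lookup:

```python
import hashlib

# Locate a file by hashing its path over the brick list: no metadata
# server is consulted, and all clients agree on the placement.
def locate(path, bricks):
    digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return bricks[digest % len(bricks)]

bricks = ["server1:/brick", "server2:/brick", "server3:/brick"]
brick = locate("/home/alice/report.txt", bricks)
# every client computes the same brick for the same path
```

Removing the metadata server removes both a bottleneck and a single point of failure, which is the availability argument the summary makes.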
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB (Athiq Ahamed)
This document provides a summary of a presentation that benchmarked the performance of three popular NoSQL databases: Apache Cassandra, Apache HBase, and MongoDB. It describes the architectures and data models of each database. Benchmark tests were run using the Yahoo Cloud Serving Benchmark and found that Apache Cassandra consistently outperformed the other databases across different workloads in terms of load time, read and write performance, and latency. The presentation emphasizes the importance of benchmarks for evaluating NoSQL database performance and choosing the right database based on application requirements.
HBase and HDFS: Understanding FileSystem Usage in HBase (enissoz)
This document discusses file system usage in HBase. It provides an overview of the three main file types in HBase: write-ahead logs (WALs), data files, and reference files. It describes durability semantics, IO fencing techniques for region server recovery, and how HBase leverages data locality through short circuit reads, checksums, and block placement hints. The document is intended to help readers understand HBase's interactions with HDFS when tuning IO performance.
This document outlines the agenda for a training on Oracle RDBMS 12c new features. The training will cover 6 chapters: introduction, multitenant architecture, upgrade features, Flex Cluster, Global Data Service, and an overview of RDBMS features. The agenda provides a high-level overview of topics to be discussed in each chapter, including multitenant architecture concepts, upgrade options and tools, Flex Cluster configurations, Global Data Service components, and new features such as temporary undo and multiple indexes on the same columns.
Ceph is a distributed file system that provides excellent performance, reliability and scalability for IaaS platforms like OpenStack. It uses an object-based storage model with dynamic distributed metadata management and reliable replication to store data across multiple servers. While CephFS for POSIX file access is still maturing, Ceph block storage via RBD is stable and commonly used in IaaS to provide block-level volumes to VMs from images stored in Ceph.
This document discusses ORM cache hierarchies and distributed caching. It describes two levels of caching - a session level cache and a query/entry cache level. It addresses consistency problems with caching and discusses solutions like using distributed locks or JTA transaction managers. It also proposes executing queries directly in the entity cache to avoid invalidation issues, and discusses writing data behind asynchronously to caches instead of synchronously to databases.
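The write-behind approach mentioned at the end of that summary can be sketched compactly. This is a hedged illustration with invented names: writes update the cache synchronously and are flushed to the database asynchronously from a queue, trading durability lag for lower write latency:

```python
import queue
import threading

# Minimal write-behind cache: puts are acknowledged after the cache
# update; the database write happens later on a background thread.
class WriteBehindCache:
    def __init__(self, db):
        self.cache, self.db = {}, db
        self.pending = queue.Queue()
        threading.Thread(target=self._flusher, daemon=True).start()

    def put(self, key, value):
        self.cache[key] = value          # synchronous cache update
        self.pending.put((key, value))   # database write deferred

    def _flusher(self):
        while True:
            key, value = self.pending.get()
            self.db[key] = value         # the slow, durable write
            self.pending.task_done()

db = {}
c = WriteBehindCache(db)
c.put("user:1", {"name": "Ada"})
c.pending.join()  # demo only: wait until the async flush lands
```

A crash between `put` and the flush loses the write, which is exactly the consistency trade-off the document weighs against synchronous database writes.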
This document discusses containerization and the Docker ecosystem. It begins by describing the challenges of managing different software stacks across multiple environments. It then introduces Docker as a solution that packages applications into standardized units called containers that are portable and can run anywhere. The rest of the document covers key aspects of the Docker ecosystem like orchestration tools like Kubernetes and Docker Swarm, networking solutions like Flannel and Weave, storage solutions, and security considerations. It aims to provide an overview of the container landscape and components.
Spectrum Scale Unified File and Object with WAN Caching (Sandeep Patil)
This document provides an overview of IBM Spectrum Scale's Active File Management (AFM) capabilities and use cases. AFM uses a home-and-cache model to cache data from a home site at local clusters for low-latency access. It expands GPFS' global namespace across geographical distances and provides automated namespace management. The document discusses AFM caching basics, global sharing, use cases like content distribution and disaster recovery. It also provides details on Spectrum Scale's protocol support, unified file and object access, using AFM with object storage, and configuration.
Software Defined Analytics with File and Object Access Plus Geographically Di...Trishali Nayar
Introduction to Spectrum Scale Active File Management (AFM)
and its use cases. Spectrum Scale Protocols - Unified File & Object Access (UFO) Feature Details
AFM + Object : Unique Wan Caching for Object Store
Windows Server 2012 introduces new storage technologies like Storage Spaces and SMB 3.0 that can replace traditional SANs. These technologies provide high performance storage with easier administration and lower costs when used together. They enable virtualized storage through storage pools and spaces, storage resilience through hardware redundancy, and optimization of storage utilization.
Ceph Day London 2014 - The current state of CephFS development Ceph Community
The document discusses recent developments in CephFS. It provides an overview of CephFS architecture including components like clients, servers, storage and data placement. The focus is on improving resilience and making CephFS production-ready with features like online filesystem checking, journal resilience tools, client management and online diagnostics. The goal is to handle failures and diagnose problems in a distributed filesystem environment.
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
In questa sessione HPE e SUSE illustrano con casi reali come HPE Data Management Framework e SUSE Enterprise Storage permettano di risolvere i problemi di gestione della crescita esponenziale dei dati realizzando un’architettura software-defined flessibile, scalabile ed economica. (Alberto Galli, HPE Italia e SUSE)
Service Fabric is an open-source distributed systems platform from Microsoft for packaging, deploying and managing distributed applications and services at scale. Azure Service Fabric Mesh is a new fully-managed platform that allows developing and running microservices applications without having to manage infrastructure. Key features of Service Fabric Mesh include serverless infrastructure, lifecycle management, intelligent traffic routing, and health monitoring. It allows building applications using any programming language or framework that can run in containers.
VMworld 2015: Advanced SQL Server on vSphereVMworld
Microsoft SQL Server is one of the most widely deployed “apps” in the market today and is used as the database layer for a myriad of applications, ranging from departmental content repositories to large enterprise OLTP systems. Typical SQL Server workloads are somewhat trivial to virtualize; however, business critical SQL Servers require careful planning to satisfy performance, high availability, and disaster recovery requirements. It is the design of these business critical databases that will be the focus of this breakout session. You will learn how build high-performance SQL Server virtual machines through proper resource allocation, database file management, and use of all-flash storage like XtremIO. You will also learn how to protect these critical systems using a combination of SQL Server and vSphere high availability features. For example, did you know you can vMotion shared-disk Windows Failover Cluster nodes? You can in vSphere 6! Finally, you will learn techniques for rapid deployment, backup, and recovery of SQL Server virtual machines using an all-flash array.
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
Sanjay Sabnis presented on next generation storage solutions for modern big data applications. He discussed how NVMe storage provides significantly higher performance than SATA, with speeds over 6x faster for reads and over 40x faster for writes. Pavilion Data offers an all-NVMe rack scale storage array that provides 120GB/s of throughput with DAS-level latency. This solution can meet the performance and scalability demands of big data workloads like MongoDB, Splunk, and containerized applications.
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld
The document discusses the future of software-defined storage in 3 years. It predicts that storage media will continue to advance with higher capacities and lower latencies using technologies like 3D NAND and NVDIMMs. Networking and interconnects like NVMe over Fabrics will allow disaggregated storage resources to be pooled and shared across servers. Software-defined storage platforms will evolve to provide common services for distributed data platforms beyond just block storage, with advanced data placement and policy controls to optimize different workloads.
The document summarizes Novell's roadmap for Open Enterprise Server 2 (OES2), including upcoming support pack 3 (SP3). SP3 will include enhancements to Domain Services for Windows, CIFS, QuickFinder, and iFolder. It also discusses the "Remote Office Appliance" which will help centrally manage remote sites. Long term, Novell is focusing on simplification, interoperability, and the "Ponderosa" vision of decoupling workloads and deploying appliances for the cloud or on-premise.
The document provides an introduction to NVMe over Fabrics, including:
- What NVMe over Fabrics is and its advantages like end-to-end NVMe semantics and low latency remote storage.
- How NVMe is being expanded to support message-based operations over various fabrics like RDMA, Fibre Channel, and Ethernet.
- Examples of how NVMe over Fabrics is being implemented in data center architectures and storage solutions.
This document provides an overview of a NoSQL Night event presented by Clarence J M Tauro from Couchbase. The presentation introduces NoSQL databases and discusses some of their advantages over relational databases, including scalability, availability, and partition tolerance. It covers key concepts like the CAP theorem and BASE properties. The document also provides details about Couchbase, a popular document-oriented NoSQL database, including its architecture, data model using JSON documents, and basic operations. Finally, it advertises Couchbase training courses for getting started and administration.
This document summarizes new features in .NET Framework 4.5, including improvements to WeakReferences, streams, ReadOnlyDictionary, compression, and large objects. It describes enhancements to server GC, asynchronous programming, the Task Parallel Library, ASP.NET, Entity Framework, WCF, WPF, and more. The .NET 4.5 update focuses on performance improvements, support for asynchronous code and parallel operations, and enabling modern app development patterns.
TechTalk: Connext DDS 5.2 - Faster and Easier Development of Industrial Internet Systems and Applications
Watch on-demand: https://youtu.be/j1G0MHC0Vwc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
The document discusses the challenges of modern data platforms including disparate systems, multiple tools, high costs, and siloed insights. It introduces the Microsoft Data Platform as a way to manage all data in a scalable and secure way, gain insights across data without movement, utilize existing skills and investments, and provide consistent experiences on-premises, in the cloud, and hybrid environments. Key elements of the Microsoft Data Platform include SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Data Lake, and Analytics Platform System.
Big Data Architecture Workshop - Vahid Amiridatastack
Big Data Architecture Workshop
This slide is about big data tools, thecnologies and layers that can be used in enterprise solutions.
TopHPC Conference
2019
The document discusses accelerating Apache Hadoop through high-performance networking and I/O technologies. It describes how technologies like InfiniBand, RoCE, SSDs, and NVMe can benefit big data applications by alleviating bottlenecks. It outlines projects from the High-Performance Big Data project that implement RDMA for Hadoop, Spark, HBase and Memcached to improve performance. Evaluation results demonstrate significant acceleration of HDFS, MapReduce, and other workloads through the high-performance designs.
Similar to Architecture of a Next-Generation Parallel File System (20)
Breaking Free from Proprietary Gravitational PullGreat Wide Open
This document provides an overview and agenda for a presentation about breaking free from proprietary software and embracing open source. The presentation covers the business and legal considerations for open sourcing existing software projects, including ownership models, licensing strategies, and governance approaches. It also addresses how to structure R&D, sales, and support organizations to be successful with open source and how to build and invest in developer and user communities. The goal is to help companies chart a course to transition existing proprietary software to open source models and practices.
You Don't Know Node: Quick Intro to 6 Core FeaturesGreat Wide Open
This document provides an introduction to Node.js and discusses its core features including:
- Node.js is asynchronous and event-driven, allowing it to handle multiple requests simultaneously without blocking.
- It uses a single thread model with non-blocking I/O, utilizing an event loop to process tasks in parallel.
- Common data types like streams and buffers are used to handle binary data and large files efficiently without blocking the thread.
Andy Watson, an employee of Ionic Security, gave a presentation on properly using cryptography in applications. The presentation covered topics such as random number generation, hashing, salting passwords, key derivation functions, symmetric encryption algorithms and common mistakes made with cryptography. The goal was to help people avoid vulnerabilities like unsalted hashes, hardcoded keys, weak random number generation and improper encryption modes.
Lightning Talk - Getting Students Involved In Open SourceGreat Wide Open
Lightning Talks are presented by Opensource.com
Chris Aniszczyk
Executive Director (interim)
Cloud Native Computing Foundation
Great Wide Open 2016
Atlanta, GA
March 17th, 2016
The document discusses test automation using Selenium and provides guidance on best practices. It covers topics like test design approaches, automation-friendly test techniques, special test cases for things like data and graphics, and perspectives on test automation. The document also discusses test frameworks, libraries and patterns commonly used with Selenium. It provides examples of keyword-driven and behavior-driven test automation using domain-specific languages.
The document discusses how constraints can cultivate growth. It suggests 5 ways that constraints can help: 1) use fewer resources, 2) create regulations, 3) remove distractions, 4) self-organize, and 5) stretch your comfort zone. Constraints shape problems and provide clear challenges to overcome, helping to make decisions, improve experiences, increase productivity, work together, and grow and learn.
The document discusses best practices for running MySQL on Linux, covering choices for Linux distributions, hardware recommendations including using solid state drives, OS configuration such as tuning the filesystem and IO scheduler, and MySQL installation and configuration options. It provides guidance on topics like virtualization, networking, and MySQL variants to help ensure successful and high performance deployment of MySQL on Linux.
This document discusses search interfaces and principles. It begins with an introduction to the presenter and then covers topics like how search engines work, principles of good search design, and common front-end search patterns. Specific concepts discussed include indexing text, query analysis, scoring and ranking documents, filtering results, aggregations, autocomplete, highlighting search terms, and loading more results. The overall message is that search provides a powerful and flexible way to return relevant content to users.
This document provides an overview of open source software. It defines open source as software that is freely available with its source code and allows others to use, modify, and distribute the software. It discusses the main open source licenses like permissive, weak copyleft, and strong copyleft licenses. It also covers the different types of open source community governance models like walled gardens, benevolent dictators, and meritocracies. Finally, it provides tips for building open source communities through email lists, consensus, positivity, and sharing.
This document discusses principles of antifragile design. It emphasizes designing for diversity among users by understanding different mindsets and contexts through user research and data. It stresses iterating quickly based on feedback, sharing work publicly in early stages, and embracing uncertainty. Well-designed systems can evolve and adapt to users' changing needs over time by deciding on defaults instead of excessive options and customization.
This document discusses using Elasticsearch for SQL users. It covers search queries, data modeling, and architecture approaches. The agenda includes search queries, data modeling, and architecture. A live demo shows searching a single field, multiple fields, and phrases. Data modeling discusses analyzing or not analyzing fields. Relationships can be modeled through application joins, data denormalization, nested objects, or parent-child documents. Architecture approaches include using triggers, asynchronous replication, and forked writes from applications with or without Logstash.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
4. What is OrangeFS?
• OrangeFS is a next-generation parallel file system
• Based on PVFS
• Distributes file data across multiple file servers, leveraging any block-level file system
• Distributes metadata across 1 to all storage servers
• Supports simultaneous access by multiple clients, including Windows clients using the PVFS protocol directly
• Works with standard kernel releases and does not require custom kernel patches
• Easy to install and maintain
5. Why a Parallel File System?
HPC – Data Intensive (Parallel PVFS Protocol)
• Large datasets
• Checkpointing
• Visualization
• Video
• BigData
Unstructured Data Silos – Interfaces to Match Problems
• Unify dispersed file systems
• Simplify storage leveling
§ Multidimensional arrays
§ Typed data
§ Portable formats
6. Original PVFS Design Goals
§ Scalable
  § Configurable file striping
  § Non-contiguous I/O patterns
  § Eliminates bottlenecks in the I/O path
  § Needs no locks for metadata ops
  § Needs no locks for non-conflicting applications
§ Usability
  § Very easy to install; small VFS kernel driver
  § Modular design for disk, network, etc.
  § Easy to extend
-> Hundreds of research projects have used it, including dissertations, theses, etc.
7. OrangeFS Philosophy
• Focus on a broader set of applications
• Customer & community focused (>300-member-strong community & growing)
• Open source
• Commercially viable
• Enable research
9. System Architecture
• OrangeFS servers manage objects
• Objects map to a specific server
• Objects store data or metadata
• Request protocol specifies operations on one or more objects
• OrangeFS object implementation:
  • DB for indexing key/value data
  • Local block file system for the data stream of bytes
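The object model above can be sketched in a few lines: each object handle falls inside a range owned by exactly one server, so locating an object is a range lookup. This is a minimal illustration only; the class, method names, and the specific ranges are hypothetical, not OrangeFS's actual data structures or handle layout.

```python
# Sketch: mapping object handles to the server that owns them.
# Ranges and names are illustrative, not OrangeFS's real layout.
from bisect import bisect_right

class HandleMap:
    def __init__(self, ranges):
        # ranges: sorted, non-overlapping (first_handle, last_handle, server)
        self.ranges = sorted(ranges)

    def server_for(self, handle):
        # Find the last range whose start is <= handle, then check its end.
        starts = [r[0] for r in self.ranges]
        i = bisect_right(starts, handle) - 1
        first, last, server = self.ranges[i]
        if not (first <= handle <= last):
            raise KeyError("handle %d not in any range" % handle)
        return server

hmap = HandleMap([
    (1, 1000, "server0"),      # objects 1..1000 live on server0
    (1001, 2000, "server1"),
    (2001, 3000, "server2"),
])
print(hmap.server_for(1500))   # server1
```

Because every object (data or metadata) resolves to one server this way, a request naming several objects can be decomposed into per-server operations.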
11. Project Timeline
• 1994-2004: PVFS – design and development at CU (Dr. Ligon) + ANL (CU graduates)
• 2004-2010: PVFS2 – primary maintenance & development by ANL (CU graduates) + community
• 2007-2010: New PVFS branch – improved MD, stability, server-side operations, newer kernels, testing; services initially offered by Omnibond
• SC10 (fall 2010): OrangeFS announced with the community; now the mainline of future development as of 2.8.4; new development focused on a broader set of problems
• SC11 (fall 2011): 2.8.5 + Win – Windows client, stability, replicate on immutable
• Spring 2012: 2.8.6 + Webpack – performance improvements, Direct Lib + Cache, stability, WebDAV, S3
• Winter 2013: 2.8.7 + Webpack – performance improvements, stability
• Spring 2014: 2.8.8 + Webpack – performance improvements, stability, shared mmap, multi TCP/IP server homing, Hadoop MapReduce, user lib fixes, new spec file for RPMs + DKMS; available in the AWS Marketplace
• Summer 2014: 2.9.0 – distributed directory MD, capability-based security
• 2015: OrangeFS 3.0 – replicated MD and file data, 128-bit UUIDs for file handles, parallel background processes, web-based management UI, self-healing processes, data balancing
13. Server-to-Server Communications (2.8.5)
• Traditional metadata operation: a create request causes the client to communicate with all servers – O(p)
• Scalable metadata operation: a create request communicates with a single server, which in turn communicates with the other servers using a tree-based protocol – O(log p)
(Diagram: in both cases the app talks to the client middleware, which reaches the servers over the network.)
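A rough model of the savings the tree-based protocol buys: if every server that has received the request forwards it to one new server per round, the number of informed servers doubles each round. The functions below count communication rounds under that simple doubling assumption; they illustrate the asymptotics, not OrangeFS's actual request protocol.

```python
# Why tree-based metadata creates scale as O(log p) instead of O(p).
import math

def rounds_direct(p):
    # Client contacts all p servers itself: O(p) messages from one node.
    return p

def rounds_tree(p):
    # Informed servers double each round: ceil(log2(p)) rounds suffice.
    return math.ceil(math.log2(p)) if p > 1 else 0

for p in (4, 64, 1024):
    print(p, rounds_direct(p), rounds_tree(p))
```

At 1024 servers the direct scheme needs 1024 client messages while the tree finishes in 10 forwarding rounds, which is why the create path moved server-side in 2.8.5.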
14. Recent Additions (2.8.5)
• SSD metadata storage
• Replicate on immutable (file based)
• Windows client: supports 32/64-bit Windows Server 2008, R2, Vista, and 7
15. Direct Access Interface (2.8.6)
• Implements:
  • POSIX system calls
  • Stdio library calls
• Parallel extensions:
  • Noncontiguous I/O
  • Non-blocking I/O
  • MPI-IO library
• Found more boundary conditions; fixed in the upcoming 2.8.7
(Diagram: an app linked against the direct lib bypasses the kernel client core, reaching the servers over IB or TCP.)
16. Client Caching (2.8.6)
• The Direct Interface enables multi-process coherent client caching for a single client
(Diagram: client application → direct interface → client cache → file systems.)
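To make the role of the client cache concrete, here is a minimal sketch of a block cache shared by the processes of one client, in the spirit of what the Direct Interface's cache provides for a single node. The block granularity, LRU eviction, and write-through policy are illustrative assumptions, not the OrangeFS implementation.

```python
# Minimal sketch of a client-side block cache (illustrative only).
from collections import OrderedDict

class ClientBlockCache:
    def __init__(self, capacity=4):
        self.blocks = OrderedDict()     # (path, block_no) -> bytes, LRU order
        self.capacity = capacity

    def read(self, path, block_no, fetch):
        key = (path, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)        # cache hit: no server trip
            return self.blocks[key]
        data = fetch(path, block_no)            # miss: fetch from the servers
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data

    def write(self, path, block_no, data, store):
        store(path, block_no, data)             # write-through: servers stay authoritative
        self.blocks[(path, block_no)] = data    # then refresh the cached copy

# Usage with a stand-in fetch function:
cache = ClientBlockCache()
cache.read("/data/file", 0, lambda p, b: b"block-0")
```

Because all processes on the client share one cache, repeated reads of the same block by different processes are served locally while writes remain visible to everyone, which is the coherence property the slide highlights.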
17. WebDAV (2.8.6 webpack)
• Supports the DAV protocol; tested with the Litmus DAV test suite
• Supports DAV cooperative locking in metadata
(Diagram: DAV clients → Apache → OrangeFS via the PVFS protocol.)
18. S3 (2.8.6 webpack)
• Tested using the s3cmd client
• Files accessible via the other access methods
• Containers are directories
• Accounting pieces not implemented
(Diagram: S3 clients → Apache → OrangeFS via the PVFS protocol.)
19. Summary – Recently Added to OrangeFS
• In 2.8.3:
  • Server-to-server communication
  • SSD metadata storage
  • Replicate on immutable
• 2.8.4, 2.8.5 (fixes, support for newer kernels):
  • Windows client
• 2.8.6 – performance, fixes, IB updates:
  • Direct access libraries (initial release)
  • Preload library for applications, including optional client cache
  • Webpack: WebDAV (with file locking), S3
20. OrangeFS on the AWS Marketplace
Available on the Amazon AWS Marketplace, brought to you by Omnibond.
(Diagram: OrangeFS instances – a unified high-performance file system – backed by EBS volumes, with DynamoDB.)
22. Hadoop JNI Interface (2.8.8)
• OrangeFS Java Native Interface
• Extension of the Hadoop FileSystem class -> JNI
• Buffering
• Distribution
• Fast PVFS protocol for remote configuration
23. Additional Items (2.8.8)
• Updated user lib
• Shared mmap support in the kernel module
• Support for kernels up to 3.11
• Multi-homing servers over IP
  • Clients can access a server over multiple interfaces (say, clients on IPoIB + clients on IPoEthernet + clients on IPoMX)
• Enterprise installers (coming shortly):
  • Client (with DKMS for the kernel module)
  • Server
  • Devel
25. Scaling Tests
16 storage servers with 2 LVM’d 5+1 RAID sets were tested with up to 32 clients, with read performance reaching nearly 12 GB/s and write performance reaching nearly 8 GB/s.
26. MapReduce over OrangeFS
• 8 Dell R720 servers connected with 10 Gb/s Ethernet
• The remote case adds an additional 8 identical servers and does all OrangeFS work remotely; only local work is done on the compute node (the traditional HPC model)
• *25% improvement with OrangeFS running remotely
27. MapReduce over OrangeFS
• 8 Dell R720 servers connected with 10 Gb/s Ethernet
• Remote clients are R720s with single SAS disks for local data (vs. 12-disk arrays in the previous test)
31. Distributed Directory Metadata (2.9.0)
• State management based on Giga+ (Garth Gibson, CMU)
• Improves access times for directories with a very large number of entries
[Diagram: directory entries DirEnt1–DirEnt6 spread across Server0–Server3 by extensible hashing]
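The extensible-hashing placement above can be sketched in a few lines of Python. This is a conceptual illustration only, not OrangeFS code: the server list, the SHA-1 hash, and the fixed depth are assumptions (in Giga+ the depth grows as the directory splits).

```python
# Sketch of Giga+-style extensible hashing for distributed directory
# entries. Server names and the hash depth are illustrative.
import hashlib

SERVERS = ["Server0", "Server1", "Server2", "Server3"]

def server_for_entry(name: str, depth: int) -> str:
    """Map a directory entry to a server using the low `depth` bits
    of its hash (the bucket count doubles each time the depth grows)."""
    h = int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")
    index = h & ((1 << depth) - 1)        # low-order bits select a bucket
    return SERVERS[index % len(SERVERS)]  # bucket -> server

# With depth 2 the entries spread across all four servers:
placement = {e: server_for_entry(e, 2)
             for e in ["DirEnt1", "DirEnt2", "DirEnt3", "DirEnt4"]}
```

Because placement is computed from the entry name alone, any client can locate an entry's server without a central lookup, which is what makes very large directories scale.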
32. [Diagram: a client presents a cert or credential, obtains a signed capability, and performs I/O using the signed capability; signatures backed by OpenSSL PKI]
• 3 Security Modes
  • Basic – OrangeFS/PVFS classic mode
  • Key-Based – keys are used to authorize clients for use with the FS
  • User Certificate Based with LDAP – user certs are used for access to the file system and are generated based on LDAP uid/gid info
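The signed-capability flow can be sketched as issue-then-verify. This is a minimal conceptual sketch: OrangeFS signs capabilities with OpenSSL PKI, but a plain HMAC stands in here so the example stays standard-library-only, and all field names and the key are made up.

```python
# Conceptual sketch of capability-based security: the server issues a
# signed capability, and I/O is honored only if the signature verifies.
# HMAC stands in for the real PKI signature; names are illustrative.
import hashlib
import hmac
import json

SERVER_KEY = b"demo-signing-key"  # stand-in for a real private key

def issue_capability(uid: int, handle: str, ops: list) -> dict:
    cap = {"uid": uid, "handle": handle, "ops": ops}
    blob = json.dumps(cap, sort_keys=True).encode()
    cap["sig"] = hmac.new(SERVER_KEY, blob, hashlib.sha256).hexdigest()
    return cap

def verify_capability(cap: dict) -> bool:
    body = {k: v for k, v in cap.items() if k != "sig"}
    blob = json.dumps(body, sort_keys=True).encode()
    good = hmac.new(SERVER_KEY, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(cap["sig"], good)

cap = issue_capability(1000, "oid-1234", ["read", "write"])
assert verify_capability(cap)
cap["ops"].append("admin")        # tampering invalidates the signature
assert not verify_capability(cap)
```

The point of the design is that I/O servers can check the signature locally, without a round trip to whoever authenticated the client.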
34. Replication / Redundancy (OrangeFS 3.0)
• Redundant Metadata
  • Seamless recovery after a failure
  • Redundant objects from root directory down
  • Configurable
• Redundant Data
  • Update mode (real time, on close, on immutable, none)
  • Configurable number of replicas
  • Real-time “forked flow” work shows little overhead
• Replicate on Close
• Replicate to external (like LTFS)
  • Looking at supporting an HSM option to external (no local replica)
• Emphasis on continuous operation
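The update modes above amount to a policy about which copies receive a live write. A conceptual Python sketch, not the OrangeFS implementation; the enum and function names are illustrative.

```python
# Sketch of the configurable replication update modes
# (real time, on close, on immutable, none).
from enum import Enum

class UpdateMode(Enum):
    REAL_TIME = "real time"        # "forked flow": writes fan out to all copies
    ON_CLOSE = "on close"          # replicas sync when the file is closed
    ON_IMMUTABLE = "on immutable"  # replicate once the file is marked immutable
    NONE = "none"                  # no data replication

def targets_for_write(mode: UpdateMode, replicas: list) -> list:
    """Which copies receive a live write under each mode."""
    if mode is UpdateMode.REAL_TIME:
        return replicas            # forked flow: primary plus every replica
    return replicas[:1]            # otherwise only the primary; others sync later
```

The "little overhead" claim for real time makes sense under this model: the fork happens inside the data flow, so the client still issues one write.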
35. Handles -> UUIDs (OrangeFS 3.0)
• An OID (object identifier) is a 128-bit UUID that is unique to the data-space
• An SID (server identifier) is a 128-bit UUID that is unique to each server
• No more than one copy of a given data-space can exist on any server
• The (OID, SID) tuple is unique within the file system
• (OID, SID1), (OID, SID2), (OID, SID3) are copies of the same object on different servers
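The OID/SID scheme maps directly onto Python's `uuid` module; a conceptual sketch, not OrangeFS code.

```python
# Sketch of OrangeFS 3.0 handles as UUIDs: each data-space has a 128-bit
# OID, each server a 128-bit SID, and the (OID, SID) tuple names one copy.
import uuid

oid = uuid.uuid4()                       # object identifier (per data-space)
sids = [uuid.uuid4() for _ in range(3)]  # server identifiers (per server)

# The same object replicated on three different servers:
copies = {(oid, sid) for sid in sids}
assert len(copies) == 3                  # (OID, SID) tuples are unique

# "No more than one copy per server": a set keyed by (OID, SID)
# naturally rejects a duplicate placement on the same server.
copies.add((oid, sids[0]))
assert len(copies) == 3
```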
36. Server Location / SID Mgt (OrangeFS 3.0)
• In an Exascale environment with the potential for thousands of I/O servers, it will no longer be feasible for each server to know about all other servers
• Server Discovery
  • Servers will know a subset of their neighbors at startup (or neighbors may be cached from previous startups) – similar to DNS domains
  • Servers will learn about unknown servers on an as-needed basis and cache them – similar to DNS query mechanisms (root servers, authoritative domain servers)
• SID Cache: an in-memory DB to store server attributes
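The DNS-like discovery above might look like this in miniature. The class and the `resolve_via_neighbors` callback are hypothetical, not an OrangeFS API; the point is the resolve-on-miss-then-cache behavior.

```python
# Sketch of DNS-style lazy server discovery: each server starts with a
# subset of neighbors and resolves unknown SIDs on demand, caching results.
class SIDCache:
    def __init__(self, seed_neighbors: dict):
        self._cache = dict(seed_neighbors)   # sid -> server attributes

    def lookup(self, sid, resolve_via_neighbors):
        if sid not in self._cache:           # unknown: query neighbors,
            self._cache[sid] = resolve_via_neighbors(sid)
        return self._cache[sid]              # like a caching DNS resolver

cache = SIDCache({"sid-a": {"addr": "10.0.0.1"}})
info = cache.lookup("sid-b", lambda sid: {"addr": "10.0.0.2"})
assert info["addr"] == "10.0.0.2"
# Second lookup is served from cache; the resolver is never called again
# (the divide-by-zero resolver would raise if it were invoked):
assert cache.lookup("sid-b", lambda sid: 1 / 0)["addr"] == "10.0.0.2"
```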
37. Policy Based Location (OrangeFS 3.0)
• User-defined attributes for servers and clients
  • Stored in the SID cache
• Policy is used for data location, replication location, and multi-tenant support
• Completely flexible
  • Rack
  • Row
  • App
  • Region
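Policy-based placement can be sketched as attribute matching over the SID cache. All server names and attributes below are made up for illustration.

```python
# Sketch of policy-based data location: servers carry user-defined
# attributes (rack, row, region, ...) from the SID cache, and a policy
# is a set of attribute constraints to match.
servers = [
    {"sid": "s1", "rack": "r1", "region": "us-east"},
    {"sid": "s2", "rack": "r2", "region": "us-east"},
    {"sid": "s3", "rack": "r1", "region": "eu-west"},
]

def select_servers(policy: dict) -> list:
    """Return the servers matching every attribute in the policy."""
    return [s for s in servers
            if all(s.get(k) == v for k, v in policy.items())]

# Place data or replicas only in us-east, e.g. for tenant isolation:
assert [s["sid"] for s in select_servers({"region": "us-east"})] == ["s1", "s2"]
```

The same match could drive replica placement (e.g. "one copy per rack") or multi-tenant partitioning by tagging servers per tenant.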
38. Background Parallel Processing Infrastructure (3.0)
• Modular infrastructure to easily build background parallel processes for the file system
• Used for:
  • Gathering stats for monitoring
  • Usage calculation (can be leveraged for directory space restrictions, chargebacks)
  • Background safe FSCK processing (can mark bad items in MD)
  • Background checksum comparisons
  • Etc…
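The idea above can be caricatured as a pluggable task fanned out over a worker pool. In OrangeFS these jobs run in parallel across the file system's servers, so the thread pool here is only a stand-in, and the task and object names are illustrative.

```python
# Sketch of a modular background parallel job: any task (stats
# gathering, usage calculation, checksum comparison) is applied to a
# set of objects in parallel.
from multiprocessing.pool import ThreadPool

def usage_of(obj):
    """One unit of background work, e.g. per-owner usage accounting."""
    return obj["owner"], obj["bytes"]

def run_background_job(task, objects, workers=4):
    """Fan a pluggable task out across a worker pool."""
    with ThreadPool(workers) as pool:
        return pool.map(task, objects)

objs = [{"owner": "u1", "bytes": 10}, {"owner": "u2", "bytes": 20}]
assert dict(run_background_job(usage_of, objs, workers=2)) == {"u1": 10, "u2": 20}
```

Swapping `usage_of` for a checksum comparison or an FSCK check is what makes the infrastructure "modular": only the task changes, not the parallel machinery.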
40. Data Migration / Mgt (OrangeFS 3.x)
• Built on redundancy & DBG processes
• Migrate objects between servers
  • De-populate a server going out of service
  • Populate a newly activated server (HW lifecycle)
  • Moving computation to data
  • Hierarchical storage
• Use existing metadata services
• Possible - Directory Hierarchy Cloning
  • Copy on Write (Dev, QA, Prod environments with high % data overlap)
42. Attribute Based Metadata Search (OrangeFS 3.x)
• Client tags files with keys/values
• Keys/values indexed on metadata servers
• Clients query for files based on keys/values
• Returns file handles, with options for filename and path
[Diagram: key/value parallel query to metadata servers, then data file access]
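The tag/index/query cycle above can be sketched with an inverted index. The in-memory dictionary stands in for the metadata servers' real index, and the handle and key names are made up.

```python
# Sketch of attribute-based metadata search: clients tag file handles
# with key/value pairs, the index maps each pair to handles, and a
# query intersects the matching sets.
from collections import defaultdict

index = defaultdict(set)                 # (key, value) -> file handles

def tag(handle: str, **kv):
    """Client tags a file handle with key/value pairs."""
    for k, v in kv.items():
        index[(k, v)].add(handle)

def query(**kv) -> set:
    """Handles matching every key/value pair (intersection of postings)."""
    postings = [index[(k, v)] for k, v in kv.items()]
    return set.intersection(*postings) if postings else set()

tag("handle-1", project="apollo", type="log")
tag("handle-2", project="apollo", type="data")
assert query(project="apollo", type="log") == {"handle-1"}
```

Since the query returns handles rather than paths, resolving a filename or full path is an optional extra step, matching the slide above.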
44. Extend Capability-Based Security
• Enable certificate-level access (in process)
• Federated access capable
• Can be integrated with rules-based access control
  • Department x in company y can share with Department q in company z
  • Rules and roles establish the relationship
  • Each company manages its own control of who is in the company and in the department
45. SDN - OpenFlow
• Working with the OpenFlow research team at CU
• OpenFlow separates the control plane from delivery, giving the ability to control the network with software
• Looking at bandwidth optimization leveraging OpenFlow and OrangeFS
46. ParalleX
• ParalleX is a new parallel execution model
• Key components are:
  • Asynchronous Global Address Space (AGAS)
  • Threads
  • Parcels (message-driven instead of message-passing)
  • Locality
  • Percolation
  • Synchronization primitives
• High Performance ParalleX (HPX): a library implementation written in C++
47. PXFS
• Parallel I/O for ParalleX, based on PVFS
  • Common themes with OrangeFS Next
• Primary objective: unification of ParalleX and storage name spaces
  • Integration of AGAS and storage metadata subsystems
  • Persistent object model
• Extends ParalleX with a number of I/O concepts
  • Replication
  • Metadata
• Extending I/O with ParalleX concepts
  • Moving work to data
  • Local synchronization
• Effort with LSU, Clemson, and Indiana U.
  • Walt Ligon, Thomas Sterling
49. Johns Hopkins OrangeFS Selection
• JHU - HLTCOE selected OrangeFS
  • After evaluating: Ceph, GlusterFS, Lustre, and OrangeFS
“Leveraging OrangeFS for the parallel filesystem, the system as a whole is capable of delivering 30GB/s write, 46GB/s read, and between 37,260-237,180 IOPS of performance. The variation in IOPS performance is dependent on the file size and number of bytes written per commit as documented in the Test Results section.”*
“The final system design represents a 2,775% increase in read performance and a 1,763-11,759% increase in IOPS”*
* http://hltcoe.jhu.edu/uploads/publications/papers/14662_slides.pdf
50. Learning More
• www.orangefs.org web site
  • Releases
  • Documentation
  • Wiki
• pvfs2-users@beowulf-underground.org
  • Support for users
• pvfs2-developers@beowulf-underground.org
  • Support for developers
51. Support & Development Services
• www.orangefs.com & www.omnibond.com
• Professional support & development team
• Buy into the project
52. Omnibond Info - Solution Areas (Enterprise & Personal)
• Identity Manager Drivers & Sentinel Connectors
• Intelligent Transportation Solutions (Computer Vision)
• Parallel Scale-Out Storage Software
• Social Media Interaction System