Fabrizio Manfred Furuholmen
Use Distributed File Systems as a Storage Tier
6/23/10
Agenda
  Introduction
  Next Generation Data Center
  Distributed File system
  OpenAFS
  GlusterFS
  HDFS
  Ceph
  Case Studies
  Conclusion
Class Exam
  What do you know about DFS ?
  How can you create a Petabyte
storage ?
  How can you make a centralized
system log ?
  How can you allocate space for your
users or systems, when you have
thousands of users/systems ?
  How can you retrieve data from
everywhere ?
Introduction
Next Generation Data Center: the “FABRIC”
Key categories:
  Continuous data protection and disaster
recovery
  File and block data migration across
heterogeneous environments
  Server and storage virtualization
  Encryption for data in-flight and at-rest
In other words: Cloud data center
Introduction
Storage Tier in the “FABRIC”
  High Performance
  Scalability
  Simplified Management
  Security
  High Availability
Solutions
  Storage Area Network
  Network Attached Storage
  Distributed file system
Introduction
What is a Distributed File system ?
“A distributed file system takes advantage of the
interconnected nature of the network by storing
files on more than one computer in the network
and making them accessible to all of them..”
What do you expect from a distributed file system ?
•  Uniform Access: global file name support
•  Security: to provide a global authentication/authorization
•  Reliability: the elimination of each single point of failure
•  Availability: administrators perform routine maintenance while the file
server is in operation, without disrupting the user’s routines
•  Scalability: Handle terabytes of data
•  Standard conformance: compliance with IEEE POSIX file system semantics
•  Performance: high performance
Part II
Implementations
How many DFS do you know ?
OpenAFS: introduction
Key ideas:
  Make clients do work whenever possible.
  Cache whenever possible.
  Exploit file usage properties. Understand them. One-third of Unix
files are temporary.
  Minimize system-wide knowledge and change. Do not hardwire
locations.
  Trust the fewest possible entities. Do not trust workstations.
  Batch if possible to group operations.
OpenAFS is the open source implementation of
IBM's Andrew File System
OpenAFS: design
• A cell is a collection of file servers and
workstations
• The directories under /afs are cells,
forming a single unique tree
• A file server contains volumes
Cell
• Volumes are "containers", sets of
related files and directories
• Volumes have a size limit
• Three types: rw, ro, backup
Volumes
• Access to a volume is provided through
a mount point
• A mount point looks just like a regular
directory
Mount Point Directory
OpenAFS: components
(diagram: components distributed across servers A, A+B, and C)
OpenAFS: performance
(benchmark chart: OpenAFS vs. OpenAFS OSD, 2 servers)
OpenAFS: features
  Uniform name space: same path on all
workstations
  Security: based on krb4/krb5, extended ACLs,
traffic encryption
  Reliability: read-only replication, HA
database, read/write replicas in the OSD version
  Availability: maintenance tasks without
stopping the service
  Scalability: server aggregation
  Administration: delegation of administration
  Performance: client-side disk-based persistent
cache, high client-per-server ratio
•  Internal usage
•  Storage: 450 TB (ro) + 15 TB (rw)
•  Clients: 22,000
Morgan Stanley IT
•  Online picture album
•  Storage: 265 TB (planned growth to 425 TB in twelve months)
•  Volumes: 800,000
•  Files: 200,000,000
Pictage, Inc
• Internet Shared folder
• Storage: 500TB
• Server: 200 Storage server
• 300 App server
Embian
• Internal usage 210TB
RZH
openAFS: who uses it ?
Good
•  Wide Area Network
•  Heterogeneous System
•  Read operations > write operations
•  Large number of clients/systems
•  Usage directly by end-users
•  Federation
Bad
• Locking
• Database
• Unicode
• Large files
• Some limitations on ..
OpenAFS: good for ...
GlusterFS
“Gluster can manage data in a
single global namespace on
commodity hardware..”
Keys:
  Lower Storage Cost—Open source software runs on commodity
hardware
  Scalability—Linearly scales to hundreds of Petabytes
  Performance—No metadata server means no bottlenecks
  High Availability—Data mirroring and real time self-healing
  Virtual Storage for Virtual Servers—Simplifies storage and keeps VMs
always-on
  Simplicity—Complete web based management suite
GlusterFS: design
GlusterFS: components
• Volume is the basic element for data
export
• The volumes can be stacked for
extension
Volume
• Specific options (features) can be
enabled for each volume (cache, pre
fetch, etc.)
• Simple creation for custom extensions
with api interface
Capabilities
• Access to a volume is provided through
transports such as TCP, Unix sockets,
and InfiniBand
Services
volume posix1
  type storage/posix
  option directory /home/export1
end-volume

volume brick1
  type features/posix-locks
  option mandatory
  subvolumes posix1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  subvolumes brick1
  option auth.addr.brick1.allow *
end-volume
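The volume stacking in the config above can be modeled as translators wrapping one another. This is an illustrative Python sketch, not Gluster's actual C translator API; the class and method names are invented for the example.

```python
# Hypothetical model of translator stacking (real Gluster translators
# are C shared objects); names are invented for illustration only.
class Posix:
    """Bottom layer, like storage/posix: backed by a directory."""
    def __init__(self, directory):
        self.directory = directory
        self.store = {}                      # stands in for on-disk files

    def write(self, path, data):
        self.store[path] = data

    def read(self, path):
        return self.store[path]

class PosixLocks:
    """Middle layer, like features/posix-locks: refuses locked paths."""
    def __init__(self, subvolume):
        self.subvolume = subvolume           # the wrapped volume
        self.locked = set()

    def write(self, path, data):
        if path in self.locked:
            raise IOError(f"{path} is locked")
        self.subvolume.write(path, data)

    def read(self, path):
        return self.subvolume.read(path)

# brick1 (locks) stacked on posix1 (storage), mirroring the volfile
posix1 = Posix("/home/export1")
brick1 = PosixLocks(posix1)
brick1.write("/report", b"data")
assert brick1.read("/report") == b"data"
```

A protocol/server layer would wrap brick1 the same way, exposing the stacked volume over TCP.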
Gluster: components
Gluster: performance
Gluster: characteristics
  Uniform name space: same path on all
workstations
  Reliability: n-way replication, asynchronous
replication for disaster recovery
  Availability: No system downtime for
maintenance (better in the next release)
  Scalability: Truly linear scalability
  Administration: Self Healing, Centralized logging
and reporting, Appliance version
  Performance: stripe files across dozens of
storage bricks, automatic load balancing, per-
volume I/O tuning
Gluster: who uses it ?
  Avail TVN (USA)
400TB for Video on demand, video
storage
  Fido Film (Sweden)
visual FX and Animation studio
  University of Minnesota (USA)
142TB Supercomputing
  Partners Healthcare (USA)
336TB Integrated health system
  Origo (Switzerland)
open source software development
and collaboration platform
Good
• Large amount of data
• Access with different protocols
• Direct access from applications
(API layer)
• Disaster recovery (better in the
next release)
• SAN replacement, VM storage
Bad
• User-space
• Low granularity in security setting
• High volumes of operations on
same file
Gluster: good for ...
Implementations
Old way
  Metadata and data in the same place
  Single stream per file
New way
  Multiple streams are parallel channels
through which data can flow
  Files are striped across a set of nodes in
order to facilitate parallel access
  OSD: separation of file metadata
management (MDS) from the storage of
file data
HDFS: Hadoop
HDFS is part of the Apache
Hadoop project which
develops open-source software
for reliable, scalable,
distributed computing.
Hadoop was inspired by Google's
MapReduce and the Google File
System
HDFS: Google File System
“ Design of a file systems for a different environment
where assumptions of a general purpose file system
do not hold—interesting to see how new assumptions
lead to a different type of system…”
Key ideas:
  Component failures are the norm.
  Huge files (not just the occasional file)
  Append rather than overwrite is typical
  Co-design of application and file system API—specialization.
For example can have relaxed consistency.
Map
•  Input is split and mapped into key-
value pairs
Combine
•  For efficiency, the combiner works
directly on map operation outputs
Reduce
•  The files are then merged,
sorted and reduced
“Moving Computation is Cheaper than Moving Data”
HDFS: MapReduce
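The map, combine, and reduce steps above can be sketched as a single-process word count. This is only a hedged stand-in for what Hadoop distributes across nodes:

```python
# Hedged sketch of the map -> combine -> reduce pipeline as a word
# count; Hadoop runs each phase on many nodes, this runs in-process.
from collections import defaultdict
from itertools import groupby

def map_phase(line):
    """Map: split input into (key, value) pairs."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combine: pre-aggregate map output locally, for efficiency."""
    acc = defaultdict(int)
    for key, value in pairs:
        acc[key] += value
    return list(acc.items())

def reduce_phase(all_pairs):
    """Reduce: merge, sort, and reduce the combined outputs per key."""
    all_pairs = sorted(all_pairs)
    return {key: sum(v for _, v in group)
            for key, group in groupby(all_pairs, key=lambda p: p[0])}

lines = ["moving computation is cheaper", "than moving data"]
combined = [p for line in lines for p in combine(map_phase(line))]
counts = reduce_phase(combined)
assert counts["moving"] == 2 and counts["data"] == 1
```

The combiner matters in the distributed case: it shrinks map output before it crosses the network, which is exactly the "moving computation is cheaper than moving data" point.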
Goals
Scalable: can reliably store and
process petabytes.
Economical: distributes the data and
processing across clusters of
commonly available computers.
Efficient: can process data in parallel
on the nodes where the data is
located.
Reliable: automatically maintains
multiple copies of data and
automatically redeploys computing
tasks on failure.
HDFS: goals
HDFS: design
•  An HDFS cluster consists of a single
NameNode
•  It is a master server that manages
the file system namespace and
regulates access to files by clients.
Namenode
•  DataNodes manage storage attached
to the systems they run on
•  Apply the map step of MapReduce
Datanodes
•  A file is split into one or more blocks
and these blocks are stored in a set
of DataNodes
Blocks
HDFS: components
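The NameNode/DataNode split above can be modeled in a few lines. This is a toy illustration, not the HDFS API; block size, replication factor, and node names are invented for the sketch.

```python
# Toy model of the NameNode / DataNode split (illustrative only, not
# the HDFS API): the NameNode holds metadata, DataNodes hold blocks.
import itertools

BLOCK_SIZE = 8      # bytes here; real HDFS blocks are tens of MB
REPLICATION = 2     # copies kept of each block

datanodes = {f"dn{i}": {} for i in range(3)}   # node -> {block_id: bytes}
namenode = {}                                   # path -> [(block_id, nodes)]
_rr = itertools.cycle(sorted(datanodes))        # round-robin placement

def put(path, data):
    """Split data into blocks and replicate each across DataNodes."""
    namenode[path] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = f"{path}#{offset}"
        targets = [next(_rr) for _ in range(REPLICATION)]
        for node in targets:
            datanodes[node][block_id] = data[offset:offset + BLOCK_SIZE]
        namenode[path].append((block_id, targets))

def get(path):
    """Reassemble a file by reading one replica of each block."""
    return b"".join(datanodes[nodes[0]][bid]
                    for bid, nodes in namenode[path])

put("/logs/a", b"a file split into blocks")
assert get("/logs/a") == b"a file split into blocks"
```

Because every block has multiple replicas, the read path can fall back to another DataNode when one fails, which is how HDFS gets its reliability goal.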
HDFS: features
  Uniform name space: same path on all
workstations
  Reliability: rw replication, re-balancing, copy
in different locations
  Availability: hot deploy
  Scalability: server aggregation
  Administration: HOD (Hadoop on Demand)
  Performance: “grid” computation, parallel
transfer
HDFS: who uses it ?
Major players
Yahoo!
A9.com
AOL
Booz Allen Hamilton
EHarmony
Facebook
Freebase
Fox Interactive Media
IBM
ImageShack
ISI
Joost
Last.fm
LinkedIn
Metaweb
Meebo
Ning
Powerset (now part of Microsoft)
Proteus Technologies
The New York Times
Rackspace
Veoh
Twitter
…
Good
• Task distribution (Basic GRID
infrastructure)
• Distribution of content (High
throughput of data access )
• Archiving
• Heterogeneous environments
Bad
• Not a general-purpose file system
• Not POSIX-compliant
• Low granularity in security setting
• Java
HDFS: good for ...
Ceph
“Ceph is designed to handle workloads
in which tens of thousands of clients or
more simultaneously access the same
file or write to the same directory–
usage scenarios that bring typical
enterprise storage systems to their
knees.”
Keys:
  Seamless scaling — The file system can be seamlessly expanded by simply
adding storage nodes (OSDs). However, unlike most existing file systems, Ceph
proactively migrates data onto new devices in order to maintain a balanced
distribution of data.
  Strong reliability and fast recovery — All data is replicated across multiple
OSDs. If any OSD fails, data is automatically re-replicated to other devices.
  Adaptive MDS — The Ceph metadata server (MDS) is designed to dynamically
adapt its behavior to the current workload.
• Client
• Metadata Cluster
• Object Storage Cluster (OSD)
Ceph: design
• Metadata Storage
• Dynamic Subtree Partitioning
• Traffic Control
Dynamic Distributed Metadata
• Data Distribution
• Replication
• Data Safety
• Failure Detection
• Recovery and Cluster Updates
Reliable Autonomic Distributed Object
Storage
Ceph: features
Pseudo-random data distribution function (CRUSH)
Reliable object storage service (RADOS)
Extent and B-tree based object file system (today btrfs)
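CRUSH itself walks a hierarchy of devices and placement rules; as a hedged stand-in, rendezvous (highest-random-weight) hashing shows the core idea: any client computes an object's OSDs from a pure function, with no central lookup table to query.

```python
# Rendezvous hashing as an illustrative stand-in for CRUSH-style
# pseudo-random placement (CRUSH adds hierarchies and rules on top).
import hashlib

def place(obj, osds, replicas=2):
    """Rank OSDs by hash(obj, osd) and keep the top `replicas`."""
    def score(osd):
        return hashlib.sha256(f"{obj}/{osd}".encode()).hexdigest()
    return sorted(osds, key=score, reverse=True)[:replicas]

osds = [f"osd{i}" for i in range(5)]
placement = place("object-42", osds)
assert placement == place("object-42", osds)      # deterministic
assert len(placement) == 2

# Adding a device only disturbs objects whose new device wins the
# ranking, so data migration stays proportional to the growth:
grown = place("object-42", osds + ["osd5"])
assert set(grown) <= set(osds + ["osd5"])
```

This is what makes seamless scaling possible: when OSDs are added, clients recompute placement and only the rebalanced fraction of objects moves.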
Ceph: features
Splay Replication
•  Only after it has been safely committed to disk is a final commit
notification sent to the client.
Ceph: features
Good
• Scientific application, High
throughput of data access
• Heavy Read / Write operations
• It is the most advanced distributed
file system
Bad
• Young (Linux 2.6.34)
• Linux only
• Complex
Ceph: good for …
Lustre
Cloudstore (Kosmos)
XtreemFS
Tahoe-LAFS
pNFS
PVFS
MooseFS
…
Search Wikipedia…
Part III
Case Studies
Class Exam
  What can DFS do for you ?
  How can you create a Petabyte
storage ?
  How can you make a centralized
system log ?
  How can you allocate space for your
users or systems, when you have
thousands of users/systems ?
  How can you retrieve data from
everywhere ?
• Share documents across a wide
area network
• Share home folder across different
Terminal servers
Problem
• OpenAFS
• Samba
Solution
• Single ID, Kerberos/ldap
• Single file system
Results
• 800 users
• 15 branch offices
• File sharing /home dir
Usage
File sharing
• Big Storage on a little budget
Problem
• Gluster
Solution
• High Availability data storage
• Low price
Results
• 100 TB image archive
• Multimedia content for web site
Usage
Web Service
• Data from everywhere
• Disaster Recovery
Problems
• myS3
• Hadoop / OpenAFS
Solution
• High Availability
• Access through HTTP protocol (REST
Interface)
• Disaster Recovery
Results
• Users backup
• Application backend
• 200 Users
• 6 TB
Usage
Internet Disk: myS3
• Log concentrator
Problem
• Hadoop cluster
• Syslog-NG
Solution
•  High availability
• Fast search
• “Storage without limits”
Results
• Security audit and access control
Usage
Log concentrator
• Low cost VM storage
• VM self provisioning
Problems
• GlusterFS
• openAFS
• Custom provisioning
Solution
• Auto provisioning
• Low cost
• Flexible solution
Results
• Development env
• Production env
Usage
Private cloud
Conclusion: problems
  Failure
For 10 PB of storage, you will have an
average of 22 consumer-grade SATA drives
failing per day.
  Read/write time
Each 2 TB drive takes, best case,
approximately 24,390 seconds to be read
and written over the network.
  Data Replication
Replication multiplies the number of disk
drives required, plus the bandwidth to keep
the replicas in sync.
Do you have enough bandwidth ?
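The numbers above can be checked with a quick back-of-envelope calculation. The 2 TB drive size and the 24,390 s transfer time are taken from the slide; everything else is derived from them.

```python
# Back-of-envelope check of the capacity-planning figures above.
TB = 10**12

# 10 PB of storage on 2 TB drives (slide figures):
drives = 10_000 * TB // (2 * TB)
print(drives)                          # 5000 drives

# "24,390 s to read and write a 2 TB drive over the network"
# implies roughly this sustained per-drive network rate:
rate_mb_s = 2 * TB / 24_390 / 1e6
print(round(rate_mb_s))                # ~82 MB/s

# At that rate, streaming one failed drive's worth of data takes:
print(round(24_390 / 3600), "hours")   # ~7 hours
```

This is why the bandwidth question matters: rebuild and replication traffic at these rates competes directly with client I/O.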
Environment Analysis
•  No truly generic DFS
•  Moving 800 TB between different solutions is not simple
Dimension
•  Start with the right size
•  The number of servers is related to the speed needed and the number of clients
•  Network for replication
Divide the system into classes of service
•  Different disk types
•  Different computer types
System Management
•  Monitoring tools
•  System/software deployment tools
Conclusion
Conclusion: next step
OpenAFS
•  www.openafs.org
•  www.beolink.org
Gluster
•  www.gluster.org
Hadoop
•  hadoop.apache.org
•  Isabel Drost
Ceph
•  ceph.newdream.net
•  Publications
•  Mailing list
Links
I look forward to meeting you…
XVII European AFS meeting 2010
PILSEN - CZECH REPUBLIC
September 13-15
Who should attend:
  Everyone interested in deploying a globally accessible
file system
  Everyone interested in learning more about real
world usage of Kerberos authentication in single
realm and federated single sign-on environments
  Everyone who wants to share their knowledge and
experience with other members of the AFS and
Kerberos communities
  Everyone who wants to find out the latest
developments affecting AFS and Kerberos
More Info: http://afs2010.civ.zcu.cz/
Thank you!
manfred@zeropiu.com

The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
 
Optimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptxOptimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptx
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
 

OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred

  • 1. Fabrizio Manfred Furuholmen: Use Distributed File System as a Storage Tier
  • 2. Agenda   Introduction   Next Generation Data Center   Distributed File Systems   OpenAFS   GlusterFS   HDFS   Ceph   Case Studies   Conclusion
  • 3. Class Exam   What do you know about DFS?   How can you create a petabyte storage?   How can you make a centralized system log?   How can you allocate space for your users or systems, when you have thousands of users/systems?   How can you retrieve data from everywhere?
  • 4. Introduction   Next Generation Data Center: the “FABRIC”   Key categories:   Continuous data protection and disaster recovery   File and block data migration across heterogeneous environments   Server and storage virtualization   Encryption for data in-flight and at-rest   In other words: the cloud data center
  • 5. Introduction   Storage Tier in the “FABRIC”   High Performance   Scalability   Simplified Management   Security   High Availability   Solutions   Storage Area Network   Network Attached Storage   Distributed file system
  • 6. Introduction   What is a Distributed File System?   “A distributed file system takes advantage of the interconnected nature of the network by storing files on more than one computer in the network and making them accessible to all of them.”
  • 7. What do you expect from a distributed file system?   •  Uniform access: global file name support   •  Security: global authentication/authorization   •  Reliability: elimination of every single point of failure   •  Availability: administrators can perform routine maintenance while the file server is in operation, without disrupting users’ routines   •  Scalability: handles terabytes of data   •  Standards conformance: IEEE POSIX file system semantics   •  Performance: high performance   Introduction
  • 9. OpenAFS: introduction   OpenAFS is the open source implementation of IBM’s Andrew File System.   Key ideas:   Make clients do work whenever possible.   Cache whenever possible.   Exploit file usage properties, and understand them: one-third of Unix files are temporary.   Minimize system-wide knowledge and change; do not hardwire locations.   Trust the fewest possible entities; do not trust workstations.   Batch where possible to group operations.
  • 11. OpenAFS: components   Cell   • A cell is a collection of file servers and workstations   • The directories under /afs are cells, forming a unique tree   • A file server contains volumes   Volumes   • Volumes are “containers”, i.e. sets of related files and directories   • Volumes have a size limit   • Three types: rw, ro, backup   Mount Point   • Access to a volume is provided through a mount point   • A mount point looks just like a static directory   (diagram: volumes placed on Server A, Server A+B, Server C)
  • 13. OpenAFS: features   Uniform name space: same path on all workstations   Security: based on krb4/krb5, extended ACLs, traffic encryption   Reliability: read-only replication, HA databases, read/write replicas in the OSD version   Availability: maintenance tasks without stopping the service   Scalability: server aggregation   Administration: delegation of administration   Performance: client-side disk-based persistent cache, high client-per-server ratio
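The callback mechanism behind the client-side persistent cache can be sketched as a toy model. This is a hedged illustration of the idea only: `FileServer` and `CacheManager` are invented names, not OpenAFS APIs. The server hands out a callback promise with each fetch and breaks it when the file changes, so clients can serve reads from local cache in between.

```python
# Toy model of AFS-style callback-based caching (illustrative, not OpenAFS code).

class FileServer:
    """Holds file data plus, per file, the set of caches to notify (callbacks)."""
    def __init__(self):
        self.files = {}
        self.callbacks = {}  # path -> set of CacheManager instances

    def fetch(self, path, cache):
        # Register a callback promise for this cache, then return the data.
        self.callbacks.setdefault(path, set()).add(cache)
        return self.files.get(path)

    def store(self, path, data):
        self.files[path] = data
        # Break every outstanding callback promise for this file.
        for cache in self.callbacks.pop(path, ()):
            cache.invalidate(path)

class CacheManager:
    """Client-side cache: serves hits locally until the server breaks the callback."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:  # miss: fetch from server and get a callback
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def invalidate(self, path):
        self.cache.pop(path, None)
```

This is why AFS clients scale to a high client-per-server ratio: as long as the callback holds, reads never touch the network.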
  • 14. OpenAFS: who uses it?   Morgan Stanley IT   • Internal usage   • Storage: 450 TB (ro) + 15 TB (rw)   • Clients: 22,000   Pictage, Inc.   • Online picture album   • Storage: 265 TB (planned growth to 425 TB in twelve months)   • Volumes: 800,000   • Files: 200,000,000   Embian   • Internet shared folders   • Storage: 500 TB   • 200 storage servers, 300 app servers   RZH   • Internal usage, 210 TB
  • 15. OpenAFS: good for ...   Good   • Wide area networks   • Heterogeneous systems   • Read operations > write operations   • Large numbers of clients/systems   • Direct usage by end users   • Federation   Bad   • Locking   • Databases   • Unicode   • Large files   • Some limitations on ..
  • 16. GlusterFS   “Gluster can manage data in a single global namespace on commodity hardware.”   Keys:   Lower storage cost: open source software runs on commodity hardware   Scalability: scales linearly to hundreds of petabytes   Performance: no metadata server means no bottlenecks   High availability: data mirroring and real-time self-healing   Virtual storage for virtual servers: simplifies storage and keeps VMs always on   Simplicity: complete web-based management suite
  • 18. GlusterFS: components   Volume   • A volume is the basic element for data export   • Volumes can be stacked for extension   Capabilities   • Specific options (features) can be enabled for each volume (cache, prefetch, etc.)   • Simple creation of custom extensions through the API interface   Services   • Access to a volume is provided through services such as TCP, Unix sockets, InfiniBand

    volume posix1
      type storage/posix
      option directory /home/export1
    end-volume

    volume brick1
      type features/posix-locks
      option mandatory
      subvolumes posix1
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      option transport.socket.listen-port 6996
      subvolumes brick1
      option auth.addr.brick1.allow *
    end-volume
  • 21. Gluster: characteristics   Uniform name space: same path on all workstations   Reliability: replication, asynchronous replication for disaster recovery   Availability: no system downtime for maintenance (better in the next release)   Scalability: truly linear scalability   Administration: self-healing, centralized logging and reporting, appliance version   Performance: stripes files across dozens of storage bricks, automatic load balancing, per-volume I/O tuning
  • 22. Gluster: who uses it?   Avail TVN (USA): 400 TB for video on demand, video storage   Fido Film (Sweden): visual FX and animation studio   University of Minnesota (USA): 142 TB, supercomputing   Partners Healthcare (USA): 336 TB, integrated health system   Origo (Switzerland): open source software development and collaboration platform
  • 23. Gluster: good for ...   Good   • Large amounts of data   • Access via different protocols   • Direct access from applications (API layer)   • Disaster recovery (better in the next release)   • SAN replacement, VM storage   Bad   • User-space   • Low granularity in security settings   • High volumes of operations on the same file
  • 24. Implementations   Old way   Metadata and data in the same place   Single stream per file   New way   Multiple streams are parallel channels through which data can flow   Files are striped across a set of nodes in order to facilitate parallel access   OSD: separation of file metadata management (MDS) from the storage of file data
  • 25. HDFS: Hadoop   HDFS is part of the Apache Hadoop project, which develops open-source software for reliable, scalable, distributed computing. Hadoop was inspired by Google’s MapReduce and the Google File System.
  • 26. HDFS: Google File System   “Design of a file system for a different environment, where the assumptions of a general-purpose file system do not hold—interesting to see how new assumptions lead to a different type of system…”   Key ideas:   Component failures are the norm.   Huge files (not just the occasional file)   Append rather than overwrite is typical   Co-design of application and file system API (specialization); for example, consistency can be relaxed.
  • 27. HDFS: MapReduce   “Moving computation is cheaper than moving data”   Map   • Input is split and mapped into key-value pairs   Combine   • For efficiency, the combiner operates directly on the map outputs   Reduce   • The files are then merged, sorted, and reduced
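The map, combine, and reduce phases above can be illustrated with a single-process word count. This is a hedged sketch of the data flow only: real Hadoop runs each phase on separate nodes, moving the computation to where the data lives.

```python
# Single-process sketch of the MapReduce map -> combine -> reduce flow.
from collections import defaultdict
from itertools import groupby

def map_phase(lines):
    # Split the input and emit key-value pairs: (word, 1).
    for line in lines:
        for word in line.split():
            yield word, 1

def combine(pairs):
    # Local pre-aggregation of a mapper's output, done for efficiency.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return counts.items()

def reduce_phase(pairs):
    # Merge, sort, and reduce the combined outputs into final counts.
    result = {}
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        result[word] = sum(n for _, n in group)
    return result

def word_count(lines):
    return reduce_phase(combine(map_phase(lines)))

print(word_count(["a b a", "b"]))  # prints {'a': 2, 'b': 2}
```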
  • 28. HDFS: goals   Scalable: can reliably store and process petabytes   Economical: distributes the data and processing across clusters of commonly available computers   Efficient: can process data in parallel on the nodes where the data is located   Reliable: automatically maintains multiple copies of data and automatically redeploys computing tasks on failure
  • 30. HDFS: components   NameNode   • An HDFS cluster consists of a single NameNode   • It is a master server that manages the file system namespace and regulates client access to files   DataNodes   • DataNodes manage storage attached to the nodes they run on   • They apply the map step of MapReduce   Blocks   • A file is split into one or more blocks, and these blocks are stored on a set of DataNodes
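The block mechanics described above can be sketched as follows. This is a hedged illustration: the helper names are invented, the round-robin placement is a simplification (real HDFS placement is rack-aware), and the 3-replica default mirrors the HDFS defaults of this era.

```python
# Illustrative sketch of HDFS-style block splitting and replica placement.

def split_into_blocks(data, block_size):
    # A file is split into fixed-size blocks (the last one may be shorter).
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes, replication=3):
    # Assign each block to `replication` distinct DataNodes, round-robin.
    # Real HDFS is rack-aware; this only shows the fan-out.
    plan = []
    for b in range(num_blocks):
        plan.append([datanodes[(b + r) % len(datanodes)]
                     for r in range(replication)])
    return plan

blocks = split_into_blocks(b"0123456789", 4)
print([len(b) for b in blocks])                      # prints [4, 4, 2]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

The NameNode only keeps this placement map in memory; clients then talk to the DataNodes directly for the block data.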
  • 31. HDFS: features   Uniform name space: same path on all workstations   Reliability: read/write replication, re-balancing, copies in different locations   Availability: hot deployment   Scalability: server aggregation   Administration: HOD (Hadoop on Demand)   Performance: “grid” computation, parallel transfers
  • 32. HDFS: who uses it?   Major players: Yahoo!, A9.com, AOL, Booz Allen Hamilton, eHarmony, Facebook, Freebase, Fox Interactive Media, IBM, ImageShack, ISI, Joost, Last.fm, LinkedIn, Metaweb, Meebo, Ning, Powerset (now part of Microsoft), Proteus Technologies, The New York Times, Rackspace, Veoh, Twitter, …
  • 33. HDFS: good for ...   Good   • Task distribution (basic grid infrastructure)   • Distribution of content (high throughput of data access)   • Archiving   • Heterogeneous environments   Bad   • Not a general-purpose file system   • Not POSIX compliant   • Low granularity in security settings   • Java
  • 34. Ceph   “Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file or write to the same directory: usage scenarios that bring typical enterprise storage systems to their knees.”   Keys:   Seamless scaling: the file system can be seamlessly expanded by simply adding storage nodes (OSDs); unlike most existing file systems, Ceph proactively migrates data onto new devices in order to maintain a balanced distribution of data   Strong reliability and fast recovery: all data is replicated across multiple OSDs; if any OSD fails, data is automatically re-replicated to other devices   Adaptive MDS: the Ceph metadata server (MDS) dynamically adapts its behavior to the current workload
  • 36. Ceph: features   Dynamic distributed metadata   • Metadata storage   • Dynamic subtree partitioning   • Traffic control   Reliable autonomic distributed object storage   • Data distribution   • Replication   • Data safety   • Failure detection   • Recovery and cluster updates
  • 37. Ceph: features   Pseudo-random data distribution function (CRUSH)   Reliable object storage service (RADOS)   Extent and B-tree object file system (today btrfs)
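The idea behind CRUSH, namely that any client can compute an object's location from a deterministic pseudo-random function instead of consulting a central placement table, can be sketched with rendezvous (highest-random-weight) hashing. This is a hedged simplification, not the actual CRUSH algorithm, and the names are illustrative.

```python
# Deterministic pseudo-random placement, in the spirit of CRUSH
# (simplified to rendezvous hashing; not the real algorithm).
import hashlib

def place(object_name, osds, replicas=2):
    # Score every OSD with a hash of (object, osd); the top-scoring
    # OSDs hold the replicas. Any client computes the same answer.
    def score(osd):
        h = hashlib.sha1(f"{object_name}:{osd}".encode()).hexdigest()
        return int(h, 16)
    return sorted(osds, key=score, reverse=True)[:replicas]

osds = ["osd.0", "osd.1", "osd.2", "osd.3"]
print(place("object-A", osds))  # same answer on every client, no lookup table
```

A useful property of this scheme: adding an OSD only migrates the objects for which the new device now scores highest, which matches Ceph's goal of proactively rebalancing data onto new nodes.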
  • 38. Ceph: features   Splay replication   • Only after the data has been safely committed to disk is a final commit notification sent to the client.
  • 39. Ceph: good for …   Good   • Scientific applications, high throughput of data access   • Heavy read/write operations   • It is the most advanced distributed file system   Bad   • Young (Linux 2.6.34)   • Linux only   • Complex
  • 42. Class Exam   What can DFS do for you?   How can you create a petabyte storage?   How can you make a centralized system log?   How can you allocate space for your users or systems, when you have thousands of users/systems?   How can you retrieve data from everywhere?
  • 43. File sharing   Problem   • Share documents across a wide area network   • Share home folders across different terminal servers   Solution   • OpenAFS   • Samba   Results   • Single ID, Kerberos/LDAP   • Single file system   Usage   • 800 users   • 15 branch offices   • File sharing, /home dirs
  • 44. Web service   Problem   • Big storage on a little budget   Solution   • Gluster   Results   • Highly available data storage   • Low price   Usage   • 100 TB image archive   • Multimedia content for a web site
  • 45. Internet Disk: myS3   Problems   • Data from everywhere   • Disaster recovery   Solution   • myS3   • Hadoop / OpenAFS   Results   • High availability   • Access through the HTTP protocol (REST interface)   • Disaster recovery   Usage   • User backups   • Application backend   • 200 users   • 6 TB
  • 46. Log concentrator   Problem   • Log concentrator   Solution   • Hadoop cluster   • Syslog-NG   Results   • High availability   • Fast search   • “Storage without limits”   Usage   • Security audit and access control
  • 47. Private cloud   Problems   • Low-cost VM storage   • VM self-provisioning   Solution   • GlusterFS   • OpenAFS   • Custom provisioning   Results   • Auto-provisioning   • Low cost   • Flexible solution   Usage   • Development environments   • Production environments
  • 48. Conclusion: problems   Failure   For 10 PB of storage, you will have an average of 22 consumer-grade SATA drives failing per day.   Read/write time   Each 2 TB drive takes, best case, approximately 24,390 seconds to be read and written over the network.   Data replication   Replication multiplies the number of disk drives. Do you have enough bandwidth?
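A back-of-the-envelope check of the read/write figure above. The sustained transfer rate of roughly 82 MB/s is an assumption chosen to match the slide's number; real consumer SATA drives of the period varied around that figure.

```python
# Sanity-check the slide's numbers: 2 TB drives, ~82 MB/s sustained (assumed).

DRIVE_BYTES = 2 * 10**12   # one 2 TB drive
RATE = 82 * 10**6          # ~82 MB/s sustained transfer rate (assumption)

seconds_to_read = DRIVE_BYTES / RATE
print(round(seconds_to_read))   # prints 24390 -- about 6.8 hours per drive

# 10 PB at 2 TB per drive, before any replication:
drives_for_10pb = 10 * 10**15 // DRIVE_BYTES
print(drives_for_10pb)          # prints 5000
```

With thousands of drives per petabyte-scale installation, drive failure and re-replication traffic stop being exceptions and become a steady-state load the network must absorb.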
  • 49. Conclusion   Environment analysis   • There is no truly generic DFS   • It is not simple to move 800 TB between different solutions   Dimensioning   • Start with the right size   • The number of servers is related to the speed needed and the number of clients   • Network for replication   Divide the system into classes of service   • Different disk types   • Different computer types   System management   • Monitoring tools   • System/software deployment tools
  • 51. Links   OpenAFS   • www.openafs.org   • www.beolink.org   Gluster   • www.gluster.org   Hadoop   • hadoop.apache.org   • Isabel Drost   Ceph   • ceph.newdream.net   • Publications   • Mailing list
  • 52. I look forward to meeting you…   XVII European AFS Meeting 2010, Pilsen, Czech Republic, September 13-15   Who should attend:   Everyone interested in deploying a globally accessible file system   Everyone interested in learning more about real-world usage of Kerberos authentication in single-realm and federated single sign-on environments   Everyone who wants to share their knowledge and experience with other members of the AFS and Kerberos communities   Everyone who wants to find out the latest developments affecting AFS and Kerberos   More info: http://afs2010.civ.zcu.cz/