SDEC2011 Glory-FS development & Experiences

http://sdec.kr/
    Presentation Transcript

    • GLORY-FS Development & Experiences. GLORY: Global Resource Management System for Future Internet Services. 2011.6.28, 진기성 (ksjin@etri.re.kr), ETRI Storage System Research Team.
    • Table of Contents: Overview / Architecture / Key Features / Performance / Status.
    • Overview. What's GLORY-FS? A cluster file system S/W that provides large-capacity file storage using low-cost storage servers. Target data: computer modeling, 3-D modeling, CAD/CAM, PDF, satellite imagery, music/audio, video/graphics (UCC), CDN, archives. Data characteristics: throughput-oriented processing, sequential read/write. Research goal: storage system S/W for large-scale Internet services that uses thousands to tens of thousands of commodity servers as storage servers, minimizing storage TCO, delivering high performance, and providing efficient fault control. Thousands to tens of thousands of storage servers: more than petabytes. TCO minimization: autonomous storage management. High performance: linearly scalable I/O performance. Fault control: self-detection, self-healing, etc.
    • Overview - Target. [Positioning chart: capacity (500 GB to 100 PB) versus I/O performance (60 MBps to 1000 Gbps). GLORY-FS targets cloud storage for enterprise and Web 2.0 services in the 100 TB to 10 PB range, alongside Google FS and HadoopDFS; high-performance parallel file systems for supercomputing (Lustre, Panasas) sit above it, while clustered NAS (Isilon, IBRIX), SAN file systems (GPFS, StorageTank, ZFS), NAS for SMB business (CIFS, NFS), and local file systems (NTFS, ext3) cover the smaller-capacity range.]
    • GLORY Storage Solution. GLORY-FS(TM): the GLORY cluster file system storage S/W, which runs on all classes of hardware (PC / entry-level / mid-range / high-end). Key points: creates a single huge virtual drive; highest performance through an asymmetric cluster; easy expansion without service interruption; minimal storage cost through low-cost hardware and autonomous management; integrated web management tool.
    • Architecture - Components. GLORY-FS Metadata Server: manages and monitors data servers; manages file system metadata (directory tree, inodes, chunk locations); stores metadata in MySQL; minimum 1 node, 2 for availability, more than 2 for high performance. GLORY-FS Data Server: stores file data as variable-sized chunks on ext3, xfs, etc.; minimum 1 node, 2 or more for availability and capacity. GLORY-FS Client: provides a Linux POSIX API compatible mount point and a Windows FS API compatible network drive.
    • Architecture - Operation Flow. A client mounts a volume (e.g. /home) from the metadata server and then reads and writes file data (e.g. big.avi) directly from the data servers. GLORY-FS Client SW: mount -t gfs mds:/vol /mnt. GLORY-FS Metadata Server SW: gfs_mds start. GLORY-FS Data Server SW: gfs_ds start. All nodes are connected through a 1/10 Gbps Ethernet switch; data servers are low-cost x86 server boxes with PC-class HDDs (7200 rpm SATA, 1 TB, 80 MBps per disk; about 10 TB and 2 Gbps per server). A bring-up sketch follows below.
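    A minimal bring-up sketch based only on the commands shown on this slide; the host name mds1 and the volume name /vol are illustrative, and additional configuration may be required in practice.

      # On the metadata server node: start the GLORY-FS metadata server daemon
      gfs_mds start

      # On each data server node: start the GLORY-FS data server daemon
      # (file data is kept as chunks on a local ext3/xfs file system)
      gfs_ds start

      # On a Linux client: mount the volume exported by the metadata server
      mount -t gfs mds1:/vol /mnt
      df -h /mnt          # the whole cluster appears as one large drive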
    • Architecture - Process Structure. All user-mode architecture: the metadata server and data server processes are user-mode daemons, with no kernel dependency, binary-level distribution, and no kernel panics. The Linux client requires FUSE kernel support: FUSE is a user-level file system SDK for Linux (http://fuse.sourceforge.net) and is supported by nearly all Linux distributions (Linux 2.4.21 or later, Linux 2.6.x, FreeBSD, NetBSD, Mac OS X, OpenSolaris, GNU/Hurd). The Windows client requires Callback File System: CBFS is a user-level file system SDK for Windows (http://www.eldos.com), and the GLORY-FS Windows client is distributed with a free CBFS binary license. A quick FUSE check follows below.
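    A quick sanity check of the FUSE prerequisite on a Linux client; these are standard Linux commands, not GLORY-FS specific ones.

      modprobe fuse          # load the FUSE kernel module if it is not built in
      lsmod | grep fuse      # confirm the module is loaded
      ls -l /dev/fuse        # device node used by user-level file systems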
    • Online Capacity/Performance Expansion. GLORY-FS consists of at least 2 data servers (for reliability). Capacity, performance, and reliability all increase as data servers are added: each storage server contributes about 10 TB of capacity and Gigabit Ethernet bandwidth, so the cluster grows from roughly 20 TB / 2 Gbps with two servers to 70 TB / 7 Gbps with seven (an expansion sketch follows below).
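    A sketch of what online expansion might look like on a new data server node, reusing the gfs_ds daemon shown earlier; the device name, chunk-store mount point, and any registration step with the MDS are assumptions, since the slide only states that capacity and performance grow as servers are added.

      mkfs.xfs /dev/sdb1               # format the local chunk store (ext3/xfs)
      mkdir -p /gfs/chunks
      mount /dev/sdb1 /gfs/chunks      # illustrative chunk-store mount point
      gfs_ds start                     # start the data server daemon on the new node
      # Existing clients keep their mounts; no service interruption is needed.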
    • Replication-Based Automatic Recovery. Each file is sliced into pieces, called CHUNKs, and stored across multiple data servers. Once chunks are stored, REPLICA chunks are created on different data servers. When a data server failure occurs, lost chunks are RECOVERED from their replicas. All replicas are used for file read access (read load balancing).
    • Dedicated Replication Network Support. The replication network is separated from the service network, which guarantees stable service I/O quality: service I/O traffic flows between file system clients and data servers over a 1/10 Gigabit switch, while data replication traffic between data servers uses its own Gigabit switch.
    • Per-Volume/Directory/File Replication Count. The replication factor can be configured at the directory level: set critical and recent directories to a higher value and old directories to a lower value. For example, with a default volume replica count of 3, the replica count of old directories such as /2009/01 through /2009/12 can be reduced to 1 or 2, yielding more usable storage space without adding data servers (a hypothetical command sketch follows).
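    The slide does not show the actual administration command, so the following is a purely hypothetical illustration of per-directory replication factors; the gfs_admin tool name and its options are invented for the sketch.

      gfs_admin set-replica --path /vol/2011/06 --count 3   # recent, critical data (hypothetical)
      gfs_admin set-replica --path /vol/2009    --count 1   # old data, reclaim space (hypothetical)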
    • Advanced Automatic Recovery Features. Prioritized recovery: for example, a chunk with only 1 remaining replica is re-replicated before a chunk that still has 2 replicas. Physical disk relocation and IP transparency: the MDS assigns a UUID to each disk and supports relocating a disk to a different DS (e.g. upon node failure), which preserves the chunks on that disk and eliminates replica recovery. User-defined procedures on major events: when any DS starts, stops, crashes, etc.
    • Metadata Server Cluster. Scalable metadata capacity and performance: up to 10 MDS nodes and 1 billion files, up to 50,000 metadata IOPS. Cluster architecture: the Management Server (MGT) handles cluster resource management, while the Metadata Servers (MDS) hold inodes and chunk locations. Clients perform metadata lookups against the MDS cluster (unbounded metadata capacity and performance, over 1 billion files) and perform data I/O directly against the data servers (unbounded data capacity and performance, over 10 PB).
    • Multiple Volume Support. Service-oriented multiple volumes: online volume addition, support for 30,000 unique volumes. Online management of volume attributes: resizing volume quota, resizing volume replication level, real-time volume statistics monitoring through the web management tool (a hypothetical command sketch follows).
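    Likewise, a hypothetical command-line view of the online volume operations listed above; in the product they are exposed through the web management tool, and the command names here are assumptions.

      gfs_admin create-volume --name webdata --quota 10T --replica 2   # hypothetical
      gfs_admin resize-volume --name webdata --quota 20T               # online quota change (hypothetical)
      gfs_admin volume-stats  --name webdata                           # real-time statistics (hypothetical)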
    • Hotspot Avoidance. For massive read workloads such as streaming services, a hot file is detected and automatically replicated across more data servers, so read traffic for the hot file is spread over the additional replicas.
    • POSIX Standard Compatibility (Linux Client API). Supported calls: access, chdir, chmod, chown, chroot, close, create, fchdir, fchmod, fchown, fcntl, fdatasync, flock, fstat, fstatfs, fsync, ftruncate, getcwd, getdents, getxattr, lchown, link, listxattr, lockf, lseek, lstat, mkdir, mknod, mmap, mount, munmap, open, pread, pwrite, read, readlink, readv, rename, rmdir, setxattr, stat, statfs, symlink, sync, truncate, umount, unlink, ustat, utime. Limitations: fcntl POSIX locking not supported yet; flock not supported (NFS also does not support it); lockf POSIX locking not supported yet; writable shared mmap not supported; open with O_DIRECT mode not supported. The checks below illustrate the flock and O_DIRECT restrictions.
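    Two quick checks of the flock and O_DIRECT restrictions, run against a mounted client; the mount point /mnt is illustrative, and both of the first two commands are expected to fail on GLORY-FS.

      dd if=/dev/zero of=/mnt/testfile bs=1M count=1 oflag=direct   # O_DIRECT open: not supported
      flock -n /mnt/testfile -c 'echo got lock'                     # BSD flock(): not supported
      dd if=/dev/zero of=/mnt/testfile bs=1M count=1                # ordinary buffered write works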
    • Windows API Compatible Client. The Windows client uses the Windows Callback File System library (CBFS V3), is integrated into Windows Explorer, supports various Windows versions (XP, Windows Server 2003/2008, Vista, Win7), and includes a GLORY-FS client management tool for mount and unmount.
    • Monitoring and Web Management Tools. Both a command-line based management tool and a web-based management tool are provided.
    • Single-MDS Metadata Performance. [Bar chart of metadata operation rates (axis up to about 3,500) for GLORY, LAKE, NFS, and NFS (mds1).] Measured with the IOZone benchmark tool (fileop); an illustrative invocation is sketched below.
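    The slide only names the tool, so the parameters below are illustrative: fileop (shipped with IOZone) exercises metadata operations such as mkdir, create, stat, and unlink against the mounted file system.

      cd /mnt/benchdir       # working directory on the GLORY-FS mount (illustrative)
      fileop -f 10           # run the metadata operation mix with force factor 10 (illustrative)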
    • Multi-MDS Metadata Performance. [Bar chart of open, setattr, and create rates (axis up to about 35,000) for a single MDS versus clustered MDS configurations with 1, 2, and 4 nodes (CMDS #1, #2, #4).]
    • Single-Client I/O Performance: Initial Read (initial write and re-write look similar). [Surface chart of throughput (MBps) versus I/O size (KB) and file size (MB), peaking at roughly 100 to 120 MBps.] Measured with the IOZone benchmark tool.
    • Single-Client I/O Performance: Re-Read. [Surface chart of throughput (MBps) versus I/O size (KB) and file size (MB), reaching roughly 2,500 to 3,000 MBps from the client cache.] Measured with the IOZone benchmark tool; an illustrative invocation is sketched below.
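    An illustrative single-client IOZone run over ranges like those shown in the charts; the exact parameters used on the slides are not given.

      iozone -a -g 64m -i 0 -i 1 -f /mnt/iozone.tmp
      # -a automatic mode over record and file sizes, -g caps the file size at 64 MB,
      # -i 0 write/re-write, -i 1 read/re-read, -f test file on the GLORY-FS mount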
    • Single-Client I/O Performance Summary. [I/O path diagram: FM cache and DS cache I/O (4 GBps each), DS disk I/O (60 MBps), network I/O (1 GBps).] Read: cached read bandwidth above 2 GBps (depends on CPU and bus speed); bursty read bandwidth about 105 MBps; sustained read bandwidth about 80 MBps. Write: write-back caching is not supported by FUSE; bursty write bandwidth about 105 MBps; sustained write bandwidth about 80 MBps.
    • Multi-Client Aggregate I/O Performance. 20 clients, each performing 1 GB file I/O. [Chart of aggregate read and write throughput (MB/sec, axis up to 600) versus number of clients (1 to 20) for GLORY-FS and LAKE, compared against the ideal network limit.] Test setup: 1 MDS, 6 data servers, 20 clients; measured with the IOZone benchmark tool (a distributed-run sketch follows).
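    A sketch of a distributed IOZone run matching the setup described (20 clients, 1 GB per client); the client list file and its contents are illustrative.

      # clients.txt: one line per client -- hostname, working directory, path to iozone
      iozone -+m clients.txt -t 20 -s 1g -r 1m -i 0 -i 1
      # -+m distributes the test across the listed client nodes, -t 20 runs 20 streams,
      # -s 1g uses a 1 GB file per stream, -r 1m uses 1 MB records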