Use Alluxio to Unify Storage Systems in Suning
Alluxio, Inc.
Shanghai Meetup - Jan 2018
1.
Alluxio: A Unified Entry Point for Distributed Storage Systems
2.
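Outline
• HDFS usage at Suning and its problems
• Solutions for multiple HDFS clusters
• Implementing Alluxio Proxy as a unified entry point for distributed storage systems
• The future of Alluxio at Suning
• Q&A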
3.
Suning's Big Data Platform
Layers: data sources, storage layer, compute layer, service layer
4.
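HDFS Usage at Suning
Cluster 1
• 528 DataNodes × 40 TB/node
• DFS Used:
  – 1.3 PB
  – 130 million files and directories
  – 130 million blocks
Cluster 2
• 100 DataNodes × 40 TB/node
• DFS Used:
  – 1 PB
  – 50 million files and directories
  – 50 million blocks
Other: separate HDFS clusters built for HBase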
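The object counts above are what drive NameNode memory. The sketch below turns them into a rough heap estimate. It assumes the commonly cited rule of thumb of about 150 bytes of NameNode heap per file, directory, or block object; that figure is not from the slides. It also applies the roughly 2x Alluxio-vs-HDFS metadata ratio reported on slide 9, to show why putting every cluster's metadata into a single Alluxio Master is a concern.

```python
# Rule-of-thumb constant (assumption, not a figure from the slides):
# each file, directory, or block object costs roughly 150 bytes of NameNode heap.
BYTES_PER_OBJECT = 150


def estimate_heap_gb(files_and_dirs: int, blocks: int,
                     bytes_per_object: int = BYTES_PER_OBJECT) -> float:
    """Rough metadata heap estimate in GB (10**9 bytes)."""
    return (files_and_dirs + blocks) * bytes_per_object / 1e9


# Object counts taken from the slide: (files and directories, blocks)
clusters = {
    "cluster1": (130_000_000, 130_000_000),
    "cluster2": (50_000_000, 50_000_000),
}

namenode_gb = {name: estimate_heap_gb(*counts) for name, counts in clusters.items()}

# If one Alluxio Master held all of this at roughly 2x the HDFS footprint
# (the ratio reported on slide 9), it would need about twice the combined total.
alluxio_master_gb = 2 * sum(namenode_gb.values())

for name, gb in namenode_gb.items():
    print(f"{name} NameNode: ~{gb:.0f} GB")
print(f"single Alluxio Master: ~{alluxio_master_gb:.0f} GB")
```

Under these assumptions the two NameNodes need roughly 39 GB and 15 GB, and a single master holding both namespaces would need over 100 GB. Treat these as order-of-magnitude numbers only.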
5.
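Problems with a Single HDFS Cluster
• NameNode RPC latency is very high under heavy concurrency:
  – Clients fail with "could not complete file", which causes jobs to fail;
  – DataNode "Last contact" times grow long, most noticeably when the NameNode restarts.
• Under heavy concurrency, HDFS cannot scale out well enough.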
6.
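Solutions for Multiple HDFS Clusters
• Split HDFS into several clusters. Issues to consider:
  – The underlying HDFS clusters should be transparent to users;
  – Cross-cluster data access;
  – The dimension along which to split the clusters.
• HDFS community solutions:
  – Federation + ViewFs
  – HDFS Router
• Suning's solution:
  – Alluxio Proxy: use Alluxio's Unified Namespace feature to make Alluxio the unified entry point to multiple HDFS clusters or other storage clusters.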
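At its core, a unified namespace is a mount table. Each logical path prefix points at a location in one underlying cluster, and a request is routed by the longest matching prefix. The sketch below is a minimal illustration of that lookup, not Alluxio's actual implementation. The mount points and NameNode hostnames (`nn-default`, `nn-cluster1`, `nn-cluster2`) are hypothetical.

```python
from typing import Dict, Optional


def resolve(mount_table: Dict[str, str], path: str) -> str:
    """Translate a unified-namespace path into an under-storage URI.

    Uses the longest mount point that is a whole-component prefix of `path`,
    so a mount at "/user/alice" does not capture "/user/alicex".
    """
    best: Optional[str] = None
    for mount_point in mount_table:
        mp = mount_point.rstrip("/")  # the root mount "/" becomes ""
        if mp == "" or path == mp or path.startswith(mp + "/"):
            if best is None or len(mp) > len(best.rstrip("/")):
                best = mount_point
    if best is None:
        raise FileNotFoundError(f"no mount point covers {path}")
    # Keep whatever follows the mount point and append it to the target URI
    suffix = path[len(best.rstrip("/")):]
    return mount_table[best].rstrip("/") + suffix


# Hypothetical mount table: one default cluster plus per-user clusters
MOUNTS = {
    "/": "hdfs://nn-default:8020",
    "/user/alice": "hdfs://nn-cluster1:8020/user/alice",
    "/user/bob": "hdfs://nn-cluster2:8020/user/bob",
}

print(resolve(MOUNTS, "/user/alice/logs/a.txt"))  # routed to cluster 1
print(resolve(MOUNTS, "/tmp/x"))                  # falls back to the root mount
```

Matching on whole path components rather than raw string prefixes matters here. Without it, a mount for one user could silently capture another user's similarly named directory.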
7.
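Community Solutions
• Federation + ViewFs:
  – Solves HDFS's horizontal scaling problem;
  – Routing is implemented through client-side configuration, which makes large clusters hard to operate and manage.
• Router:
  – Released in HDFS 2.9.0.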
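The operational drawback of ViewFs is that its mount table lives in every client's configuration. ViewFs reads links from properties of the form `fs.viewfs.mounttable.<table>.link.<path>`. The sketch below pulls those links out of a client configuration flattened into a dict. The table name `ClusterX` and the hosts `nn1`/`nn2` are hypothetical.

```python
from typing import Dict


def viewfs_links(conf: Dict[str, str], table: str) -> Dict[str, str]:
    """Collect the `link` entries of one ViewFs mount table from client properties."""
    prefix = f"fs.viewfs.mounttable.{table}.link."
    return {key[len(prefix):]: target
            for key, target in conf.items() if key.startswith(prefix)}


# A hypothetical client-side core-site.xml, flattened into a dict.
client_conf = {
    "fs.defaultFS": "viewfs://ClusterX",
    "fs.viewfs.mounttable.ClusterX.link./user": "hdfs://nn1:8020/user",
    "fs.viewfs.mounttable.ClusterX.link./data": "hdfs://nn2:8020/data",
}

print(viewfs_links(client_conf, "ClusterX"))
```

Because each client resolves paths from its own copy of these properties, moving a directory to another cluster means pushing new configuration to every client. Router and Alluxio Proxy both avoid this by keeping the mount table on the server side.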
8.
Alluxio: Unify data at memory speed
9.
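Problems Encountered with Alluxio
• Metadata from all the HDFS clusters flows into the Alluxio Master, which then hits a memory bottleneck;
  – In our tests, Alluxio's metadata used about twice as much memory as HDFS's.
• The connection between the Alluxio client and the Master is long-lived.
• Alluxio does not support append.
• Client compatibility issues.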
10.
Alluxio Proxy Architecture
11.
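Alluxio Master Metadata Volume
• Solution: each system manages the metadata of the data in its own storage space:
  – Alluxio Master: manages only the metadata of data cached in Alluxio space;
  – HDFS NameNode: manages the metadata of data in HDFS space.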
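The split of responsibility above can be pictured as a two-level lookup. The proxy's master answers only for files it has cached. Every other request is delegated to the owning NameNode, so that metadata never has to be loaded into the master. The classes below (`UnderFs`, `ProxyMaster`) are hypothetical stand-ins that illustrate the idea, not Alluxio or HDFS APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileInfo:
    path: str
    length: int
    source: str  # "alluxio" if answered by the proxy master, "ufs" if by the NameNode


@dataclass
class UnderFs:
    """Hypothetical stand-in for one HDFS cluster's NameNode: path -> length."""
    files: Dict[str, int] = field(default_factory=dict)

    def get_status(self, path: str) -> Optional[FileInfo]:
        if path in self.files:
            return FileInfo(path, self.files[path], "ufs")
        return None


@dataclass
class ProxyMaster:
    """Hypothetical master that keeps metadata only for data cached in Alluxio space."""
    ufs: UnderFs
    cached: Dict[str, int] = field(default_factory=dict)

    def get_status(self, path: str) -> Optional[FileInfo]:
        if path in self.cached:
            return FileInfo(path, self.cached[path], "alluxio")
        # Not cached: ask the NameNode instead of copying its metadata here
        return self.ufs.get_status(path)


namenode = UnderFs({"/warehouse/a": 10, "/warehouse/b": 20})
master = ProxyMaster(namenode, {"/warehouse/a": 10})
print(master.get_status("/warehouse/a").source)  # served from Alluxio metadata
print(master.get_status("/warehouse/b").source)  # delegated to the NameNode
```

With this split, the master's memory grows with the size of the cache rather than with the combined namespaces of every mounted cluster.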
12.
13.
The Alluxio Master Long-Lived Connection Problem
• Solution:
  – The client actively closes the connection;
  – In our tests, a client reconnect takes < 1 ms, which is acceptable for Suning's workloads.
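The pattern described above, where the client opens a connection for an operation and closes it immediately, maps naturally onto a context manager. The sketch below uses a hypothetical `MasterConnection` class with a counter, so the effect is visible: no connection stays open after each block. The address assumes the default Alluxio master RPC port 19998, and the `getStatus` call string is illustrative only.

```python
import contextlib
from typing import Iterator


class MasterConnection:
    """Hypothetical stand-in for a client-to-master RPC connection."""
    open_count = 0  # number of connections currently open across all clients

    def __init__(self, address: str) -> None:
        self.address = address
        self.closed = False
        MasterConnection.open_count += 1

    def call(self, method: str, path: str) -> str:
        if self.closed:
            raise RuntimeError("connection already closed")
        return f"{method}({path}) via {self.address}"

    def close(self) -> None:
        if not self.closed:
            self.closed = True
            MasterConnection.open_count -= 1


@contextlib.contextmanager
def master_session(address: str) -> Iterator[MasterConnection]:
    """Open a connection for one unit of work and always close it afterwards."""
    conn = MasterConnection(address)
    try:
        yield conn
    finally:
        conn.close()  # release instead of holding a long-lived connection


with master_session("alluxio-master:19998") as conn:
    print(conn.call("getStatus", "/user/alice/a.txt"))
print(MasterConnection.open_count)  # 0
```

The trade-off is one reconnect per operation. The slide's measurement of under 1 ms per reconnect is what makes this acceptable.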
14.
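Client Support for Append
• Because management is tiered, with each system managing the metadata of its own space, the client can go straight through to the underlying distributed file system to perform appends.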
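Because the NameNode owns the metadata of files in HDFS space, an append can bypass Alluxio entirely. The sketch below shows a proxy client that forwards appends to the underlying file system. `InMemoryUfs` and `ProxyClient` are hypothetical. Dropping any cached copy before appending is an added assumption to keep later reads consistent; the slide does not describe it.

```python
from typing import Dict, Set


class InMemoryUfs:
    """Hypothetical stand-in for an HDFS cluster that supports append."""

    def __init__(self) -> None:
        self.data: Dict[str, bytearray] = {}

    def append(self, path: str, chunk: bytes) -> None:
        self.data.setdefault(path, bytearray()).extend(chunk)


class ProxyClient:
    def __init__(self, cached: Set[str], ufs: InMemoryUfs) -> None:
        self.cached = cached  # paths that currently have a copy in Alluxio space
        self.ufs = ufs

    def append(self, path: str, chunk: bytes) -> None:
        # Assumption: invalidate any cached copy so later reads are not stale
        self.cached.discard(path)
        # Pass the append straight through; the NameNode owns this file's metadata
        self.ufs.append(path, chunk)


ufs = InMemoryUfs()
client = ProxyClient({"/logs/app.log"}, ufs)
client.append("/logs/app.log", b"line1\n")
client.append("/logs/app.log", b"line2\n")
print(bytes(ufs.data["/logs/app.log"]))
```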
15.
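Client-Side Compatibility
• In practice, Alluxio Proxy is delivered to clients as a plugin, so the whole process is transparent to users;
• Because it runs inside the client, its dependencies can conflict with those of other components, causing jobs to fail;
• Solution: shade all of Alluxio's runtime JARs.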
16.
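Alluxio Proxy Summary
• Use Alluxio's Unified Namespace feature to provide a single entry point to multiple HDFS clusters:
  – The MountTable is stored on the Alluxio Master, which simplifies operations and management;
  – The Alluxio Master supports HA.
• On top of routing, cache hot data to accelerate computation;
• Keep temporary data that does not need to be persisted directly in Alluxio memory, reducing the frequent creation and deletion of metadata on the NameNode.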
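Alluxio controls where written data lands through its write types. `MUST_CACHE` keeps data in Alluxio only, `CACHE_THROUGH` writes to both Alluxio and the under storage synchronously, and `THROUGH` writes to the under storage only. The client-wide default is set with `alluxio.user.file.writetype.default`. The policy below is a hypothetical sketch of how temporary data could be kept out of the NameNode. The `/tmp` and `/staging` prefixes and the `hot` flag are assumptions, not Suning's configuration.

```python
from enum import Enum
from typing import Sequence


class WriteType(Enum):
    # Names mirror Alluxio's write types; this enum itself is only illustrative
    MUST_CACHE = "MUST_CACHE"        # kept in Alluxio only, never persisted
    CACHE_THROUGH = "CACHE_THROUGH"  # written to Alluxio and synchronously to the UFS
    THROUGH = "THROUGH"              # written to the UFS only


TEMP_PREFIXES = ("/tmp", "/staging")  # hypothetical temporary-data locations


def choose_write_type(path: str, hot: bool = False,
                      temp_prefixes: Sequence[str] = TEMP_PREFIXES) -> WriteType:
    """Hypothetical policy: temporary data stays in memory, hot data is also cached."""
    if any(path == p or path.startswith(p + "/") for p in temp_prefixes):
        return WriteType.MUST_CACHE  # never creates NameNode metadata
    return WriteType.CACHE_THROUGH if hot else WriteType.THROUGH


print(choose_write_type("/tmp/job-42/part-00000"))
print(choose_write_type("/warehouse/orders", hot=True))
```

One consequence of `MUST_CACHE` is that the data is lost if the cache is lost. That is only acceptable for data that, as the slide says, does not need to be persisted.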
17.
Alluxio Proxy at Suning
Alluxio cluster size
• 2 masters + 3 workers
• Alluxio is currently used only for routing across multiple HDFS clusters;
• HDFS clusters are split by user.
Integrated components
• Hadoop (HDFS + YARN)
• Hive
• Spark
• Flink
• Druid
• Sqoop
• HBase
• Flume
• OLAP
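Splitting clusters by user translates naturally into one mount point per user directory. The sketch below builds those entries from a user-to-NameNode assignment. The `/user/<name>` layout and the hostnames are hypothetical. Each resulting entry could then be registered on the master with a command of the form `alluxio fs mount <alluxioPath> <ufsURI>`.

```python
from typing import Dict


def user_mounts(assignment: Dict[str, str]) -> Dict[str, str]:
    """Map each user's home directory to the cluster that user was assigned to.

    `assignment` maps user name -> NameNode URI (hypothetical layout).
    """
    return {f"/user/{user}": f"{nn.rstrip('/')}/user/{user}"
            for user, nn in sorted(assignment.items())}


# Hypothetical assignment of users to clusters
assignment = {"alice": "hdfs://nn1:8020", "bob": "hdfs://nn2:8020/"}
for alluxio_path, ufs_uri in user_mounts(assignment).items():
    print(alluxio_path, "->", ufs_uri)
```

Moving a user to another cluster then only requires remounting that user's path on the master, with no client-side configuration change.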
18.
Alluxio Proxy Roadmap
• Make Alluxio Proxy the unified storage entry point for distributed systems;
• Take advantage of Alluxio's caching;
• Contribute actively to the community.
19.
Q&A
20.
Thanks