SlideShare a Scribd company logo
1 of 31
Manila on CephFS at CERN
Our way to production
Arne Wiebalck
Dan van der Ster
OpenStack Summit
Boston, MA, U.S.
May 11, 2017
http://home.cern
About CERN
European Organization for
Nuclear Research
(Conseil Européen pour la Recherche Nucléaire)
- Founded in 1954
- World’s largest particle
physics laboratory
- Located at Franco-Swiss
border near Geneva
- ~2’300 staff members
>12’500 users
- Budget: ~1000 MCHF (2016)
Primary mission:
Find answers to some of the fundamental
questions about the universe! http://home.cern
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 3
The CERN Cloud at a Glance
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 4
• Production service since July 2013
- Several rolling upgrades since, now on Newton
• Two data centers, 23ms distance
- One region, one API entry point
• Currently ~220’000 cores
- 7’000 hypervisors (+2’000 more soon)
- ~27k instances
• 50+ cells
- Separate h/w, use case, power, location, …
Setting the Scene: Storage
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 5
A
F
S
NFS
RB
D
CERNboxT
S
M
S3
HSM Data Archive
Developed at CERN
140PB – 25k tapes
Data Analysis
Developed at CERN
120PB – 44k HDDs
File share & sync
Owncloud/EOS
9’500 users
CVMFS
OpenStack backend
CephFS
Several PB-sized clusters
NFS Filer
OpenZFS/RBD/OpenStack
Strong POSIX
Infrastructure Services
NFS Filer Service Overview
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 6
• NFS appliances on top of OpenStack
- Several VMs with Cinder volume and OpenZFS
- ZFS replication to remote data center
- Local SSD as accelerator (l2arc, ZIL)
• POSIX and strong consistency
- Puppet, GitLab, OpenShift, Twiki, …
- LSF, Grid CE, Argus, …
- BOINC, Microelectronics, CDS, …
NFS Filer Service Limitations
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 7
• Scalability
- Metadata Operations (read)
• Availability
- SPoF per NFS volume
- ‘shared block device’
• Emerging use cases
- HPC, see next slide
Use Cases: HPC
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 8
• MPI Applications
- Beam simulations, accelerator physics, plasma
simulations, computation fluid dynamics, QCD …
• Different from HTC model
- On dedicated low-latency clusters
- Fast access to shared storage
- Very long running jobs
• Dedicated CephFS cluster in 2016
- 3-node cluster, 150TB usable
- RADOS: quite low activity
- 52TB used, 8M files, 350k dirs
- <100 file creations/sec
Can we converge on CephFS ?
(and if yes, how do we make it available to users?)
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 9
Part I: The CephFS Backend
CephFS Overview
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 10
POSIX-compliant shared FS on top of
RADOS
- Same foundation as RBD
Userland and kernel clients available
- ‘ceph-fuse’ and ‘mount -t ceph’
- Features added to ceph-fuse first, then ML kernel
‘jewel’ release tagged production-ready
- April 2016, main addition: fsck
- ‘Almost awesome’ before, focus on object and block
CephFS Meta-Data Server (MDS)
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 11
• Crucial component to build a fast and scalable FS
- Creates and manages inodes (persisted in RADOS, but cached in memory)
- Tracking client inode ‘capabilities’ (which client is using which inodes)
• Larger cache can speed up meta-data throughput
- More RAM can avoid meta-data stalls when reading from RADOS
• Single MDS can handle limited number of client reqs/sec
- Faster network / CPU enables more reqs/sec, but multiple MDSs needed for scaling
- MDS keeps nothing in disk, a flash-only RADOS pool may accelerate meta-data intensive
workloads
CephFS MDS Testing
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 12
• Correctness and single-user performance
- POSIX compliance: ok! (Tuxera POSIX Test Suite v20090130)
- Two-client consistency delays: ok! (20-30ms with fsping)
- Two-client parallel IO: reproducible slowdown
• Mimic Puppet master
- ‘stat’ all files in prod env from mutliple clients
- 20k stats/sec limit
• Meta-data scalability in multi-user scenarios
- Multiple active MDS fail-over w/ 1000 clients: ok!
- Meta-data load balancing heuristics: todo …
CephFS Issues Discovered
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 13
• ‘ceph-fuse’ crash in quota code, fixed in 10.2.5
- http://tracker.ceph.com/issues/16066
• Bug when creating deep directores, fixed in 10.2.6
- http://tracker.ceph.com/issues/18008
• Bug when creating deep directores, fixed in 10.2.6
- http://tracker.ceph.com/issues/18008
• Objectcacher ignores max objects limits when writing large files
- http://tracker.ceph.com/issues/19343
• Network outage can leave ‘ceph-fuse’ stuck
- Difficult to reproduce, have server-side work-around, ‘luminous’ ?
• Parallel IO to single file can be slower than expected
- If CephFS detects several writers to the same file, it switches clients to unbuffered IO
Desired:
Quotas
- should be mandatory
for userland and kernel
QoS
- throttle/protect users
‘CephFS is awesome!’
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 14
• POSIX compliance looks good
• Months of testing: no ‘difficult’ problems
- Quotas and QoS?
• Single MDS meta-data limits are close to our Filer
- We need multi-MDS!
• ToDo: Backup, NFS Ganesha (legacy clients, Kerberos)
• ‘luminous’ testing has started ...
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 15
Part II: The OpenStack Integration
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 16
LHC Incident in April 2016
Manila: Overview
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 17
• File Share Project in OpenStack
- Provisioning of shared file systems to VMs
- ‘Cinder for file shares’
• APIs for tenants to request shares
- Fulfilled by backend drivers
- Acessed from instances
• Support for variety of NAS protocols
- NFS, CIFS, MapR-FS, GlusterFS, CephFS, …
• Supports the notion of share types
- Map features to backends
Manila
Backend
1. Request share
2. Create share
4. Access share
User
instances
3. Provide handle
Manila: Components
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 18
manila-share
Driver
manila-share
Driver
manila-share
Driver
Message Queue (e.g. RabbitMQ)
manila-
scheduler
manila-api
DB
REST API
DB for storage of service data
Message queue for
inter-component communication
manila-share:
Manages backends
manila-api:
Receives, authenticates and
handles requestsmanila-scheduler:
Routes requests to appropriate
share service (by filters)
(Not shown: manila-data: copy, migration,
backup)
m-share
Manila: Our Initial Setup
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 19
- Three controllers running
m-{api, scheduler, share}
- Separate Rabbit cluster
- DB from external service
- Existing CephFS backend
- Set up and ‘working’ in <1h !
Driver
RabbitMQ
m-sched
m-api DBm-api
m-share
Driver
m-sched
m-share
Driver
m-sched
m-api
Our Cinder setup has been changed as well …
Sequential Share Creation/Deletion: ok!
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 20
• Create: ~2sec
• Delete: ~5sec
• “manila list” creates
two auth calls
- First one for discovery
- Cinder/Nova do only one … ?
Bulk Deletion of Shares: ok!
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 21
This is what users (and Magnum/Heat/K8s) do …
(... and this breaks our Cinder! Bug 1685818).
• Sequentially
- manila delete share-01
- manila delete share-02
- manila delete share-03
- …
• In parallel
- manila delete share-01 share-02 …
- Done with 100 shares in parallel
After successful 24h tests of constant creations/deletions …
Manila testing: #fouinehammer
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 22
m-share
Driver
RabbitMQ
m-sched
m-api DBm-api m-api
1 … 500 nodes
1 ... 10k PODs
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 23
https://github.com/johanhaleby/kubetail
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 24
k8s DNS
restarts
Image registry DDOS
DB connection limits
Connection pool limits
This way 
Stressing
the API (1)
PODs running
‘manila list’
in a loop
~linear until API
processes exhausted …
ok!
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 25
1 pod 10010 10 1000
Request time [sec]
Log messages
DB connection
pool exhausted
Stressing
the API (2)
PODs running
‘manila create’
‘manila delete’
in a loop
~works until DB limits
are reached …
ok!
Components ‘fouined’ on the way
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 26
• Right away: Image registry (when scaling the k8s cluster)
- 500 nodes downloading the image …
- Increased RAM on image registry nodes
- Used 100 nodes
• ~350 pods: Kubernetes DNS (~350 PODs)
- Constantly restarted: Name or service not known
- Scaled it out to 10 nodes
• ~1’000 pods: Central monitoring on Elastic Search
- Too many logs messages
• ~4’000 pods: Allowed DB connections (and the connection pool)
- Backtrace in Manila API logs
m-share
Manila: Our (Pre-)Prod Setup
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 27
• Three virtual controllers
- 4-core, 8GB
- Newton
- Puppet
• 3-node RabbitMQ cluster
- V3.6.5
• Three share types
- Test/Prod: Ceph ‘jewel’
- Dev: Ceph ‘luminous’
Driver RabbitMQ
m-sched
m-api DBm-api m-api
RabbitMQ
RabbitMQ
prod test dev
Early user: Containers in ATLAS
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 28
Lukas Heinrich: Containers in ATLAS
• Analysis workflows modeled as directed
acyclic graphs, built at run-time
- Graph’s nodes are containers
• Addresses the issue of workflow
preservation
- Store the parametrized container workload
• Built using a k8s cluster
- On top of Magnum
• Using CephFS to store intermediate job
stages
- Share creation via Manila
- Leverageing CephFS integration in k8s
‘Manila is also awesome!’
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 29
• Setup is straightforward
• No Manila issues during our testing
- Functional as well as stress testing
• Most features we found missing are in the plans
- Per share type quotas, service HA
- CephFS NFS driver
• Some features need some more attention
- OSC integration, CLI/UI feature parity, AuthID goes with last share
• Welcoming, helpful team!
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 30
Arne.Wiebalck@cern.ch
@ArneWiebalck
Q
U
E
S
T
I
O
N
S
?
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)

More Related Content

What's hot

Operational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in ProductionOperational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in ProductionArne Wiebalck
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring Tim Bell
 
Openstack Installation (ver. liberty)
Openstack Installation (ver. liberty)Openstack Installation (ver. liberty)
Openstack Installation (ver. liberty)Eggy Cheng
 
OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017Stacy Véronneau
 
UCX-Python - A Flexible Communication Library for Python Applications
UCX-Python - A Flexible Communication Library for Python ApplicationsUCX-Python - A Flexible Communication Library for Python Applications
UCX-Python - A Flexible Communication Library for Python ApplicationsMatthew Rocklin
 
OpenStack hands-on (All-in-One)
OpenStack hands-on (All-in-One)OpenStack hands-on (All-in-One)
OpenStack hands-on (All-in-One)JeSam Kim
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpStacy Véronneau
 
OpenStack Tutorial
OpenStack TutorialOpenStack Tutorial
OpenStack TutorialBret Piatt
 
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...DataWorks Summit
 
OpenStack Introduction
OpenStack IntroductionOpenStack Introduction
OpenStack IntroductionJimi Chen
 
Upcoming services in OpenStack
Upcoming services in OpenStackUpcoming services in OpenStack
Upcoming services in OpenStackCisco DevNet
 
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...Cloud Native Day Tel Aviv
 
Architecture Openstack for the Enterprise
Architecture Openstack for the EnterpriseArchitecture Openstack for the Enterprise
Architecture Openstack for the EnterpriseKeith Tobin
 
Operators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksOperators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksJakub Pavlik
 
Introduction to openstack
Introduction to openstackIntroduction to openstack
Introduction to openstackYaniv Zadka
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
 

What's hot (20)

Operational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in ProductionOperational War Stories from 5 Years of Running OpenStack in Production
Operational War Stories from 5 Years of Running OpenStack in Production
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 
Openstack Installation (ver. liberty)
Openstack Installation (ver. liberty)Openstack Installation (ver. liberty)
Openstack Installation (ver. liberty)
 
OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017
 
UCX-Python - A Flexible Communication Library for Python Applications
UCX-Python - A Flexible Communication Library for Python ApplicationsUCX-Python - A Flexible Communication Library for Python Applications
UCX-Python - A Flexible Communication Library for Python Applications
 
OpenStack hands-on (All-in-One)
OpenStack hands-on (All-in-One)OpenStack hands-on (All-in-One)
OpenStack hands-on (All-in-One)
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUp
 
OpenStack basics
OpenStack basicsOpenStack basics
OpenStack basics
 
OpenStack 101 update
OpenStack 101 updateOpenStack 101 update
OpenStack 101 update
 
OpenStack Tutorial
OpenStack TutorialOpenStack Tutorial
OpenStack Tutorial
 
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...
 
OpenStack Introduction
OpenStack IntroductionOpenStack Introduction
OpenStack Introduction
 
Upcoming services in OpenStack
Upcoming services in OpenStackUpcoming services in OpenStack
Upcoming services in OpenStack
 
OpenStack 101
OpenStack 101OpenStack 101
OpenStack 101
 
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...
 
OpenStack 101 @ ENEI 2014
OpenStack 101 @ ENEI 2014OpenStack 101 @ ENEI 2014
OpenStack 101 @ ENEI 2014
 
Architecture Openstack for the Enterprise
Architecture Openstack for the EnterpriseArchitecture Openstack for the Enterprise
Architecture Openstack for the Enterprise
 
Operators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksOperators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 Networks
 
Introduction to openstack
Introduction to openstackIntroduction to openstack
Introduction to openstack
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
 

Similar to Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)

Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopJim Dowling
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicTim Bell
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Databricks
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph Community
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveTim Bell
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Fb talk arch_summit
Fb talk arch_summitFb talk arch_summit
Fb talk arch_summitdrewz lin
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
Lessons learned from writing over 300,000 lines of infrastructure code
Lessons learned from writing over 300,000 lines of infrastructure codeLessons learned from writing over 300,000 lines of infrastructure code
Lessons learned from writing over 300,000 lines of infrastructure codeYevgeniy Brikman
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
CERN User Story
CERN User StoryCERN User Story
CERN User StoryTim Bell
 
Integrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private CloudIntegrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private CloudArne Wiebalck
 
BeeGFS Enterprise Deployment
BeeGFS Enterprise Deployment BeeGFS Enterprise Deployment
BeeGFS Enterprise Deployment Dirk Petersen
 
Deploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerDeploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerVu Nguyen Duy
 
Toward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackToward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackTon Ngo
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Community
 

Similar to Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017) (20)

Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der Ster
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
CERNBox: Site Report
CERNBox: Site ReportCERNBox: Site Report
CERNBox: Site Report
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Fb talk arch_summit
Fb talk arch_summitFb talk arch_summit
Fb talk arch_summit
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Lessons learned from writing over 300,000 lines of infrastructure code
Lessons learned from writing over 300,000 lines of infrastructure codeLessons learned from writing over 300,000 lines of infrastructure code
Lessons learned from writing over 300,000 lines of infrastructure code
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
CERN User Story
CERN User StoryCERN User Story
CERN User Story
 
Integrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private CloudIntegrating Bare-metal Provisioning into CERN's Private Cloud
Integrating Bare-metal Provisioning into CERN's Private Cloud
 
BeeGFS Enterprise Deployment
BeeGFS Enterprise Deployment BeeGFS Enterprise Deployment
BeeGFS Enterprise Deployment
 
Deploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and dockerDeploy data analysis pipeline with mesos and docker
Deploy data analysis pipeline with mesos and docker
 
Tensorflow vs MxNet
Tensorflow vs MxNetTensorflow vs MxNet
Tensorflow vs MxNet
 
Toward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackToward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStack
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)

  • 1.
  • 2. Manila on CephFS at CERN Our way to production Arne Wiebalck Dan van der Ster OpenStack Summit Boston, MA, U.S. May 11, 2017
  • 3. http://home.cern About CERN European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire) - Founded in 1954 - World’s largest particle physics laboratory - Located at Franco-Swiss border near Geneva - ~2’300 staff members >12’500 users - Budget: ~1000 MCHF (2016) Primary mission: Find answers to some of the fundamental questions about the universe! http://home.cern Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 3
  • 4. The CERN Cloud at a Glance Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 4 • Production service since July 2013 - Several rolling upgrades since, now on Newton • Two data centers, 23ms distance - One region, one API entry point • Currently ~220’000 cores - 7’000 hypervisors (+2’000 more soon) - ~27k instances • 50+ cells - Separate h/w, use case, power, location, …
  • 5. Setting the Scene: Storage Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 5 A F S NFS RB D CERNboxT S M S3 HSM Data Archive Developed at CERN 140PB – 25k tapes Data Analysis Developed at CERN 120PB – 44k HDDs File share & sync Owncloud/EOS 9’500 users CVMFS OpenStack backend CephFS Several PB-sized clusters NFS Filer OpenZFS/RBD/OpenStack Strong POSIX Infrastructure Services
  • 6. NFS Filer Service Overview Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 6 • NFS appliances on top of OpenStack - Several VMs with Cinder volume and OpenZFS - ZFS replication to remote data center - Local SSD as accelerator (l2arc, ZIL) • POSIX and strong consistency - Puppet, GitLab, OpenShift, Twiki, … - LSF, Grid CE, Argus, … - BOINC, Microelectronics, CDS, …
  • 7. NFS Filer Service Limitations Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 7 • Scalability - Metadata Operations (read) • Availability - SPoF per NFS volume - ‘shared block device’ • Emerging use cases - HPC, see next slide
  • 8. Use Cases: HPC Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 8 • MPI Applications - Beam simulations, accelerator physics, plasma simulations, computation fluid dynamics, QCD … • Different from HTC model - On dedicated low-latency clusters - Fast access to shared storage - Very long running jobs • Dedicated CephFS cluster in 2016 - 3-node cluster, 150TB usable - RADOS: quite low activity - 52TB used, 8M files, 350k dirs - <100 file creations/sec Can we converge on CephFS ? (and if yes, how do we make it available to users?)
  • 9. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 9 Part I: The CephFS Backend
  • 10. CephFS Overview Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 10 POSIX-compliant shared FS on top of RADOS - Same foundation as RBD Userland and kernel clients available - ‘ceph-fuse’ and ‘mount -t ceph’ - Features added to ceph-fuse first, then ML kernel ‘jewel’ release tagged production-ready - April 2016, main addition: fsck - ‘Almost awesome’ before, focus on object and block
  • 11. CephFS Meta-Data Server (MDS) Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 11 • Crucial component to build a fast and scalable FS - Creates and manages inodes (persisted in RADOS, but cached in memory) - Tracking client inode ‘capabilities’ (which client is using which inodes) • Larger cache can speed up meta-data throughput - More RAM can avoid meta-data stalls when reading from RADOS • Single MDS can handle limited number of client reqs/sec - Faster network / CPU enables more reqs/sec, but multiple MDSs needed for scaling - MDS keeps nothing in disk, a flash-only RADOS pool may accelerate meta-data intensive workloads
  • 12. CephFS MDS Testing Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 12 • Correctness and single-user performance - POSIX compliance: ok! (Tuxera POSIX Test Suite v20090130) - Two-client consistency delays: ok! (20-30ms with fsping) - Two-client parallel IO: reproducible slowdown • Mimic Puppet master - ‘stat’ all files in prod env from mutliple clients - 20k stats/sec limit • Meta-data scalability in multi-user scenarios - Multiple active MDS fail-over w/ 1000 clients: ok! - Meta-data load balancing heuristics: todo …
  • 13. CephFS Issues Discovered Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 13 • ‘ceph-fuse’ crash in quota code, fixed in 10.2.5 - http://tracker.ceph.com/issues/16066 • Bug when creating deep directores, fixed in 10.2.6 - http://tracker.ceph.com/issues/18008 • Bug when creating deep directores, fixed in 10.2.6 - http://tracker.ceph.com/issues/18008 • Objectcacher ignores max objects limits when writing large files - http://tracker.ceph.com/issues/19343 • Network outage can leave ‘ceph-fuse’ stuck - Difficult to reproduce, have server-side work-around, ‘luminous’ ? • Parallel IO to single file can be slower than expected - If CephFS detects several writers to the same file, it switches clients to unbuffered IO Desired: Quotas - should be mandatory for userland and kernel QoS - throttle/protect users
  • 14. ‘CephFS is awesome!’ Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 14 • POSIX compliance looks good • Months of testing: no ‘difficult’ problems - Quotas and QoS? • Single MDS meta-data limits are close to our Filer - We need multi-MDS! • ToDo: Backup, NFS Ganesha (legacy clients, Kerberos) • ‘luminous’ testing has started ...
  • 15. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 15 Part II: The OpenStack Integration
  • 16. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 16 LHC Incident in April 2016
  • 17. Manila: Overview Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 17 • File Share Project in OpenStack - Provisioning of shared file systems to VMs - ‘Cinder for file shares’ • APIs for tenants to request shares - Fulfilled by backend drivers - Acessed from instances • Support for variety of NAS protocols - NFS, CIFS, MapR-FS, GlusterFS, CephFS, … • Supports the notion of share types - Map features to backends Manila Backend 1. Request share 2. Create share 4. Access share User instances 3. Provide handle
  • 18. Manila: Components Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 18 manila-share Driver manila-share Driver manila-share Driver Message Queue (e.g. RabbitMQ) manila- scheduler manila-api DB REST API DB for storage of service data Message queue for inter-component communication manila-share: Manages backends manila-api: Receives, authenticates and handles requestsmanila-scheduler: Routes requests to appropriate share service (by filters) (Not shown: manila-data: copy, migration, backup)
  • 19. m-share Manila: Our Initial Setup Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 19 - Three controllers running m-{api, scheduler, share} - Separate Rabbit cluster - DB from external service - Existing CephFS backend - Set up and ‘working’ in <1h ! Driver RabbitMQ m-sched m-api DBm-api m-share Driver m-sched m-share Driver m-sched m-api Our Cinder setup has been changed as well …
  • 20. Sequential Share Creation/Deletion: ok! Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 20 • Create: ~2sec • Delete: ~5sec • “manila list” creates two auth calls - First one for discovery - Cinder/Nova do only one … ?
  • 21. Bulk Deletion of Shares: ok! Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 21 This is what users (and Magnum/Heat/K8s) do … (... and this breaks our Cinder! Bug 1685818). • Sequentially - manila delete share-01 - manila delete share-02 - manila delete share-03 - … • In parallel - manila delete share-01 share-02 … - Done with 100 shares in parallel After successful 24h tests of constant creations/deletions …
  • 22. Manila testing: #fouinehammer Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 22 m-share Driver RabbitMQ m-sched m-api DBm-api m-api 1 … 500 nodes 1 ... 10k PODs
  • 23. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 23 https://github.com/johanhaleby/kubetail
  • 24. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 24 k8s DNS restarts Image registry DDOS DB connection limits Connection pool limits This way  Stressing the API (1) PODs running ‘manila list’ in a loop ~linear until API processes exhausted … ok!
  • 25. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 25 1 pod 10010 10 1000 Request time [sec] Log messages DB connection pool exhausted Stressing the API (2) PODs running ‘manila create’ ‘manila delete’ in a loop ~works until DB limits are reached … ok!
  • 26. Components ‘fouined’ on the way Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 26 • Right away: Image registry (when scaling the k8s cluster) - 500 nodes downloading the image … - Increased RAM on image registry nodes - Used 100 nodes • ~350 pods: Kubernetes DNS (~350 PODs) - Constantly restarted: Name or service not known - Scaled it out to 10 nodes • ~1’000 pods: Central monitoring on Elastic Search - Too many logs messages • ~4’000 pods: Allowed DB connections (and the connection pool) - Backtrace in Manila API logs
  • 27. m-share Manila: Our (Pre-)Prod Setup Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 27 • Three virtual controllers - 4-core, 8GB - Newton - Puppet • 3-node RabbitMQ cluster - V3.6.5 • Three share types - Test/Prod: Ceph ‘jewel’ - Dev: Ceph ‘luminous’ Driver RabbitMQ m-sched m-api DBm-api m-api RabbitMQ RabbitMQ prod test dev
  • 28. Early user: Containers in ATLAS Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 28 Lukas Heinrich: Containers in ATLAS • Analysis workflows modeled as directed acyclic graphs, built at run-time - Graph’s nodes are containers • Addresses the issue of workflow preservation - Store the parametrized container workload • Built using a k8s cluster - On top of Magnum • Using CephFS to store intermediate job stages - Share creation via Manila - Leverageing CephFS integration in k8s
  • 29. ‘Manila is also awesome!’ Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 29 • Setup is straightforward • No Manila issues during our testing - Functional as well as stress testing • Most features we found missing are in the plans - Per share type quotas, service HA - CephFS NFS driver • Some features need some more attention - OSC integration, CLI/UI feature parity, AuthID goes with last share • Welcoming, helpful team!
  • 30. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 30 Arne.Wiebalck@cern.ch @ArneWiebalck Q U E S T I O N S ?

Editor's Notes

  1. European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire) Founded in 1954, today 22 member states World’s largest particle physics laboratory Located at Franco-Swiss border near Geneva ~2’300 staff members, >12’500 users Budget: ~1000 MCHF (2016) CERN’s mission Answer fundamental questions of the universe Advance the technology frontiers Train the scientists and engineers of tomorrow Bring nations together