Manila on CephFS at CERN
Our way to production
Arne Wiebalck
Dan van der Ster
OpenStack Summit
Boston, MA, U.S.
May 11, 2017
http://home.cern
About CERN
European Organization for
Nuclear Research
(Conseil Européen pour la Recherche Nucléaire)
- Founded in 1954
- World’s largest particle
physics laboratory
- Located at Franco-Swiss
border near Geneva
- ~2’300 staff members
- >12’500 users
- Budget: ~1000 MCHF (2016)
Primary mission:
Find answers to some of the fundamental
questions about the universe! http://home.cern
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 3
The CERN Cloud at a Glance
• Production service since July 2013
- Several rolling upgrades since, now on Newton
• Two data centers, 23ms distance
- One region, one API entry point
• Currently ~220’000 cores
- 7’000 hypervisors (+2’000 more soon)
- ~27k instances
• 50+ cells
- Separate h/w, use case, power, location, …
Setting the Scene: Storage
CERN storage services: AFS, NFS, RBD, CERNbox, TSM, S3, CVMFS, CephFS, NFS Filer
• HSM Data Archive: developed at CERN, 140PB on 25k tapes
• Data Analysis: developed at CERN, 120PB on 44k HDDs
• File share & sync (CERNbox): Owncloud/EOS, 9’500 users
• CVMFS: OpenStack backend
• CephFS: several PB-sized clusters
• NFS Filer: OpenZFS/RBD/OpenStack, strong POSIX, infrastructure services
NFS Filer Service Overview
• NFS appliances on top of OpenStack
- Several VMs with Cinder volume and OpenZFS
- ZFS replication to remote data center
- Local SSD as accelerator (l2arc, ZIL)
• POSIX and strong consistency
- Puppet, GitLab, OpenShift, Twiki, …
- LSF, Grid CE, Argus, …
- BOINC, Microelectronics, CDS, …
NFS Filer Service Limitations
• Scalability
- Metadata Operations (read)
• Availability
- SPoF per NFS volume
- ‘shared block device’
• Emerging use cases
- HPC, see next slide
Use Cases: HPC
• MPI Applications
- Beam simulations, accelerator physics, plasma simulations, computational fluid dynamics, QCD, …
• Different from HTC model
- On dedicated low-latency clusters
- Fast access to shared storage
- Very long running jobs
• Dedicated CephFS cluster in 2016
- 3-node cluster, 150TB usable
- RADOS: quite low activity
- 52TB used, 8M files, 350k dirs
- <100 file creations/sec
Can we converge on CephFS?
(and if yes, how do we make it available to users?)
Part I: The CephFS Backend
CephFS Overview
POSIX-compliant shared FS on top of
RADOS
- Same foundation as RBD
Userland and kernel clients available
- ‘ceph-fuse’ and ‘mount -t ceph’
- Features added to ceph-fuse first, then to the mainline kernel
‘jewel’ release tagged production-ready
- April 2016, main addition: fsck
- ‘Almost awesome’ before, focus on object and block
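The two client paths above can be sketched as follows; the monitor address, cephx user, keyring/secret paths and mount points are placeholders, not our actual configuration:

```shell
# Userland client: ceph-fuse (new features land here first).
# -r restricts the mount to a subtree of the file system.
ceph-fuse /mnt/cephfs \
    -m mon1.example.com:6789 \
    --id myuser \
    -r /volumes/myshare

# Kernel client: mount -t ceph (mainline kernel).
mount -t ceph mon1.example.com:6789:/volumes/myshare /mnt/cephfs \
    -o name=myuser,secretfile=/etc/ceph/myuser.secret
```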
CephFS Meta-Data Server (MDS)
• Crucial component to build a fast and scalable FS
- Creates and manages inodes (persisted in RADOS, but cached in memory)
- Tracks client inode ‘capabilities’ (which client is using which inodes)
• Larger cache can speed up meta-data throughput
- More RAM can avoid meta-data stalls when reading from RADOS
• Single MDS can handle limited number of client reqs/sec
- Faster network / CPU enables more reqs/sec, but multiple MDSs needed for scaling
- MDS keeps nothing on disk; a flash-only RADOS pool may accelerate meta-data intensive workloads
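As a rough illustration of the tuning knobs above (all names and values are placeholders, using the jewel-era CLI):

```shell
# A larger MDS cache (jewel counts inodes, not bytes); persistently
# via ceph.conf on the MDS host:
#   [mds]
#   mds cache size = 4000000
# or injected into a running MDS at runtime:
ceph tell mds.0 injectargs '--mds_cache_size 4000000'

# A flash-only pool for meta-data: create a pool and point it at a
# CRUSH ruleset that maps only to SSD-backed OSDs (ruleset 1 is
# assumed to exist for the SSDs):
ceph osd pool create cephfs-meta-ssd 128 128
ceph osd pool set cephfs-meta-ssd crush_ruleset 1
```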
CephFS MDS Testing
• Correctness and single-user performance
- POSIX compliance: ok! (Tuxera POSIX Test Suite v20090130)
- Two-client consistency delays: ok! (20-30ms with fsping)
- Two-client parallel IO: reproducible slowdown
• Mimic Puppet master
- ‘stat’ all files in prod env from multiple clients
- 20k stats/sec limit
• Meta-data scalability in multi-user scenarios
- Multiple active MDS fail-over w/ 1000 clients: ok!
- Meta-data load balancing heuristics: todo …
CephFS Issues Discovered
• ‘ceph-fuse’ crash in quota code, fixed in 10.2.5
- http://tracker.ceph.com/issues/16066
• Bug when creating deep directories, fixed in 10.2.6
- http://tracker.ceph.com/issues/18008
• Objectcacher ignores max objects limits when writing large files
- http://tracker.ceph.com/issues/19343
• Network outage can leave ‘ceph-fuse’ stuck
- Difficult to reproduce, have server-side work-around, ‘luminous’ ?
• Parallel IO to single file can be slower than expected
- If CephFS detects several writers to the same file, it switches clients to unbuffered IO
Desired:
Quotas
- should be mandatory
for userland and kernel
QoS
- throttle/protect users
‘CephFS is awesome!’
• POSIX compliance looks good
• Months of testing: no ‘difficult’ problems
- Quotas and QoS?
• Single MDS meta-data limits are close to our Filer
- We need multi-MDS!
• ToDo: Backup, NFS Ganesha (legacy clients, Kerberos)
• ‘luminous’ testing has started ...
Part II: The OpenStack Integration
LHC Incident in April 2016
Manila: Overview
• File Share Project in OpenStack
- Provisioning of shared file systems to VMs
- ‘Cinder for file shares’
• APIs for tenants to request shares
- Fulfilled by backend drivers
- Accessed from instances
• Support for variety of NAS protocols
- NFS, CIFS, MapR-FS, GlusterFS, CephFS, …
• Supports the notion of share types
- Map features to backends
[Diagram] Share workflow: (1) the user requests a share via Manila; (2) Manila creates the share on the backend; (3) Manila provides the handle; (4) user instances access the share.
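From a tenant's perspective, this workflow maps onto Manila CLI calls roughly as follows; the share name, share type and cephx user are illustrative:

```shell
# 1. Request a 10 GB share of a CephFS-backed share type
manila create --name myshare --share-type cephfstype CephFS 10

# 2./3. Manila creates the share on the backend; fetch the
# export location (the handle) once the share is available
manila share-export-location-list myshare

# 4. Grant a cephx client access, then mount from the instance
manila access-allow myshare cephx alice
```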
Manila: Components
[Diagram: manila-api, manila-scheduler and manila-share instances (with backend drivers) connected via a message queue and a DB]
manila-api: REST API; receives, authenticates and handles requests
manila-scheduler: routes requests to the appropriate share service (by filters)
manila-share: manages backends (via drivers)
DB: storage of service data
Message queue (e.g. RabbitMQ): inter-component communication
(Not shown: manila-data: copy, migration, backup)
Manila: Our Initial Setup
- Three controllers running m-{api, scheduler, share}
- Separate RabbitMQ cluster
- DB from external service
- Existing CephFS backend
- Set up and ‘working’ in <1h!
Our Cinder setup has been changed as well …
Sequential Share Creation/Deletion: ok!
• Create: ~2sec
• Delete: ~5sec
• “manila list” creates
two auth calls
- First one for discovery
- Cinder/Nova do only one … ?
Bulk Deletion of Shares: ok!
This is what users (and Magnum/Heat/K8s) do …
(... and this breaks our Cinder! Bug 1685818).
• Sequentially
- manila delete share-01
- manila delete share-02
- manila delete share-03
- …
• In parallel
- manila delete share-01 share-02 …
- Done with 100 shares in parallel
After successful 24h tests of constant creations/deletions …
Manila testing: #fouinehammer
[Diagram] Stress setup: a Kubernetes cluster of 1–500 nodes running 1–10k pods against the Manila controllers (m-api x3, m-sched, m-share with driver, RabbitMQ, DB)
https://github.com/johanhaleby/kubetail
Stressing the API (1)
• Pods running ‘manila list’ in a loop
- ~linear until API processes exhausted … ok!
• Issues hit along the way: k8s DNS restarts, image registry DDoS, DB connection limits, connection pool limits
[Plot: request time [sec] and log messages vs. number of pods (1, 10, 100, 1000); DB connection pool exhausted at high pod counts]
Stressing the API (2)
• Pods running ‘manila create’ / ‘manila delete’ in a loop
- ~works until DB limits are reached … ok!
Components ‘fouined’ on the way
• Right away: Image registry (when scaling the k8s cluster)
- 500 nodes downloading the image …
- Increased RAM on image registry nodes
- Used 100 nodes
• ~350 pods: Kubernetes DNS
- Constantly restarted: Name or service not known
- Scaled it out to 10 nodes
• ~1’000 pods: Central monitoring on Elastic Search
- Too many log messages
• ~4’000 pods: Allowed DB connections (and the connection pool)
- Backtrace in Manila API logs
Manila: Our (Pre-)Prod Setup
• Three virtual controllers
- 4-core, 8GB
- Newton
- Puppet
• 3-node RabbitMQ cluster
- V3.6.5
• Three share types
- Test/Prod: Ceph ‘jewel’
- Dev: Ceph ‘luminous’
Early user: Containers in ATLAS
Lukas Heinrich: Containers in ATLAS
• Analysis workflows modeled as directed
acyclic graphs, built at run-time
- Graph’s nodes are containers
• Addresses the issue of workflow
preservation
- Store the parametrized container workload
• Built using a k8s cluster
- On top of Magnum
• Using CephFS to store intermediate job
stages
- Share creation via Manila
- Leveraging CephFS integration in k8s
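The CephFS integration in Kubernetes at the time was the in-tree `cephfs` volume plugin; a minimal pod consuming a share could look like the sketch below (monitor address, path, user and secret name are placeholders):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: share          # mount the CephFS share at /data
      mountPath: /data
  volumes:
  - name: share
    cephfs:
      monitors: ["mon1.example.com:6789"]
      path: /volumes/myshare       # export path of the Manila share
      user: alice                  # cephx user with access to the share
      secretRef:
        name: ceph-secret          # k8s secret holding the cephx key
EOF
```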
‘Manila is also awesome!’
• Setup is straightforward
• No Manila issues during our testing
- Functional as well as stress testing
• Most features we found missing are in the plans
- Per share type quotas, service HA
- CephFS NFS driver
• Some features need some more attention
- OSC integration, CLI/UI feature parity, AuthID goes with last share
• Welcoming, helpful team!
Arne.Wiebalck@cern.ch
@ArneWiebalck
Questions?
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)

  • 1.
  • 2. Manila on CephFS at CERN Our way to production Arne Wiebalck Dan van der Ster OpenStack Summit Boston, MA, U.S. May 11, 2017
  • 3. http://home.cern About CERN European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire) - Founded in 1954 - World’s largest particle physics laboratory - Located at Franco-Swiss border near Geneva - ~2’300 staff members >12’500 users - Budget: ~1000 MCHF (2016) Primary mission: Find answers to some of the fundamental questions about the universe! http://home.cern Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 3
  • 4. The CERN Cloud at a Glance Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 4 • Production service since July 2013 - Several rolling upgrades since, now on Newton • Two data centers, 23ms distance - One region, one API entry point • Currently ~220’000 cores - 7’000 hypervisors (+2’000 more soon) - ~27k instances • 50+ cells - Separate h/w, use case, power, location, …
  • 5. Setting the Scene: Storage Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 5 (slide shows CERN’s storage landscape: AFS, NFS, RBD, CERNbox, TSM, S3, CVMFS) - HSM Data Archive: developed at CERN, 140PB – 25k tapes - Data Analysis: developed at CERN, 120PB – 44k HDDs - File share & sync: Owncloud/EOS, 9’500 users - OpenStack backend - CephFS: several PB-sized clusters - NFS Filer: OpenZFS/RBD/OpenStack, strong POSIX, infrastructure services
  • 6. NFS Filer Service Overview Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 6 • NFS appliances on top of OpenStack - Several VMs with Cinder volume and OpenZFS - ZFS replication to remote data center - Local SSD as accelerator (l2arc, ZIL) • POSIX and strong consistency - Puppet, GitLab, OpenShift, Twiki, … - LSF, Grid CE, Argus, … - BOINC, Microelectronics, CDS, …
  • 7. NFS Filer Service Limitations Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 7 • Scalability - Metadata Operations (read) • Availability - SPoF per NFS volume - ‘shared block device’ • Emerging use cases - HPC, see next slide
  • 8. Use Cases: HPC Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 8 • MPI Applications - Beam simulations, accelerator physics, plasma simulations, computational fluid dynamics, QCD … • Different from HTC model - On dedicated low-latency clusters - Fast access to shared storage - Very long running jobs • Dedicated CephFS cluster in 2016 - 3-node cluster, 150TB usable - RADOS: quite low activity - 52TB used, 8M files, 350k dirs - <100 file creations/sec Can we converge on CephFS ? (and if yes, how do we make it available to users?)
  • 9. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 9 Part I: The CephFS Backend
  • 10. CephFS Overview Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 10 POSIX-compliant shared FS on top of RADOS - Same foundation as RBD Userland and kernel clients available - ‘ceph-fuse’ and ‘mount -t ceph’ - Features added to ceph-fuse first, then ML kernel ‘jewel’ release tagged production-ready - April 2016, main addition: fsck - ‘Almost awesome’ before, focus on object and block
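The two client paths named on this slide can be exercised roughly as follows. This is an illustrative sketch, not CERN’s configuration: the monitor address, mount point, and secret-file path are placeholders, and both commands assume a reachable Ceph cluster and appropriate credentials.

```shell
# Userland client: FUSE mount of CephFS (keyring from /etc/ceph by default)
ceph-fuse -m mon1.example.org:6789 /mnt/cephfs

# Kernel client: mainline 'ceph' filesystem type
mount -t ceph mon1.example.org:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret
```

As the slide notes, new features tend to land in ceph-fuse first and reach the mainline kernel client later, so the two mounts may not behave identically on the same cluster.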
  • 11. CephFS Meta-Data Server (MDS) Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 11 • Crucial component to build a fast and scalable FS - Creates and manages inodes (persisted in RADOS, but cached in memory) - Tracks client inode ‘capabilities’ (which client is using which inodes) • Larger cache can speed up meta-data throughput - More RAM can avoid meta-data stalls when reading from RADOS • A single MDS can handle a limited number of client reqs/sec - Faster network / CPU enables more reqs/sec, but multiple MDSs needed for scaling - MDS keeps nothing on disk, so a flash-only RADOS pool may accelerate meta-data intensive workloads
  • 12. CephFS MDS Testing Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 12 • Correctness and single-user performance - POSIX compliance: ok! (Tuxera POSIX Test Suite v20090130) - Two-client consistency delays: ok! (20-30ms with fsping) - Two-client parallel IO: reproducible slowdown • Mimic Puppet master - ‘stat’ all files in prod env from multiple clients - 20k stats/sec limit • Meta-data scalability in multi-user scenarios - Multiple active MDS fail-over w/ 1000 clients: ok! - Meta-data load balancing heuristics: todo …
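The “stat all files and measure stats/sec” experiment above can be sketched as a small micro-benchmark. This is a minimal illustration, not the tooling used at CERN: the demo tree is a throwaway temp directory, where a real run would point `root` at a CephFS mount holding the production file set.

```python
import os
import tempfile
import time

def stat_throughput(root: str) -> float:
    """Walk a tree, stat() every entry, return stat calls per second."""
    count = 0
    start = time.monotonic()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.stat(os.path.join(dirpath, name))
            count += 1
    elapsed = time.monotonic() - start
    return count / elapsed if elapsed > 0 else float("inf")

# Tiny demo tree; a real test would use a CephFS mount point
with tempfile.TemporaryDirectory() as root:
    for i in range(100):
        open(os.path.join(root, f"f{i}"), "w").close()
    print(f"{stat_throughput(root):.0f} stats/sec")
```

Running this concurrently from several clients against one MDS is what exposes the per-MDS request ceiling the slide refers to.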
  • 13. CephFS Issues Discovered Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 13 • ‘ceph-fuse’ crash in quota code, fixed in 10.2.5 - http://tracker.ceph.com/issues/16066 • Bug when creating deep directories, fixed in 10.2.6 - http://tracker.ceph.com/issues/18008 • Objectcacher ignores max objects limits when writing large files - http://tracker.ceph.com/issues/19343 • Network outage can leave ‘ceph-fuse’ stuck - Difficult to reproduce, have server-side work-around, ‘luminous’ ? • Parallel IO to single file can be slower than expected - If CephFS detects several writers to the same file, it switches clients to unbuffered IO Desired: Quotas - should be mandatory for userland and kernel QoS - throttle/protect users
  • 14. ‘CephFS is awesome!’ Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 14 • POSIX compliance looks good • Months of testing: no ‘difficult’ problems - Quotas and QoS? • Single MDS meta-data limits are close to our Filer - We need multi-MDS! • ToDo: Backup, NFS Ganesha (legacy clients, Kerberos) • ‘luminous’ testing has started ...
  • 15. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 15 Part II: The OpenStack Integration
  • 16. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 16 LHC Incident in April 2016
  • 17. Manila: Overview Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 17 • File Share Project in OpenStack - Provisioning of shared file systems to VMs - ‘Cinder for file shares’ • APIs for tenants to request shares - Fulfilled by backend drivers - Accessed from instances • Support for variety of NAS protocols - NFS, CIFS, MapR-FS, GlusterFS, CephFS, … • Supports the notion of share types - Map features to backends Manila Backend 1. Request share 2. Create share 4. Access share User instances 3. Provide handle
  • 18. Manila: Components Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 18 manila-share Driver manila-share Driver manila-share Driver Message Queue (e.g. RabbitMQ) manila- scheduler manila-api DB REST API DB for storage of service data Message queue for inter-component communication manila-share: Manages backends manila-api: Receives, authenticates and handles requests manila-scheduler: Routes requests to appropriate share service (by filters) (Not shown: manila-data: copy, migration, backup)
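The scheduler’s filter-then-weigh routing can be pictured with a toy sketch. This is a simplification for illustration, not Manila’s actual code: the backend names, capability keys, and “most free capacity wins” weigher are invented stand-ins for Manila’s pluggable filters and weighers.

```python
# Toy filter scheduler: keep backends whose reported capabilities
# satisfy the request, then weigh the survivors.
def schedule(request, backends):
    candidates = [
        b for b in backends
        if b["free_capacity_gb"] >= request["size_gb"]
        and request["proto"] in b["protocols"]
    ]
    if not candidates:
        raise RuntimeError("No valid host was found")
    # Weigher: prefer the backend with the most free capacity
    return max(candidates, key=lambda b: b["free_capacity_gb"])

backends = [
    {"share_backend_name": "cephfs-prod", "protocols": ["CEPHFS"], "free_capacity_gb": 500},
    {"share_backend_name": "nfs-filer", "protocols": ["NFS"], "free_capacity_gb": 2000},
]
best = schedule({"size_gb": 100, "proto": "CEPHFS"}, backends)
print(best["share_backend_name"])  # cephfs-prod
```

Mapping share types to backends, as the previous slide describes, amounts to adding extra key/value constraints to this filtering step.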
  • 19. m-share Manila: Our Initial Setup Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 19 - Three controllers running m-{api, scheduler, share} - Separate Rabbit cluster - DB from external service - Existing CephFS backend - Set up and ‘working’ in <1h ! Driver RabbitMQ m-sched m-api DB m-api m-share Driver m-sched m-share Driver m-sched m-api Our Cinder setup has been changed as well …
  • 20. Sequential Share Creation/Deletion: ok! Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 20 • Create: ~2sec • Delete: ~5sec • “manila list” creates two auth calls - First one for discovery - Cinder/Nova do only one … ?
  • 21. Bulk Deletion of Shares: ok! Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 21 This is what users (and Magnum/Heat/K8s) do … (... and this breaks our Cinder! Bug 1685818). • Sequentially - manila delete share-01 - manila delete share-02 - manila delete share-03 - … • In parallel - manila delete share-01 share-02 … - Done with 100 shares in parallel After successful 24h tests of constant creations/deletions …
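The “100 shares deleted in parallel” pattern that Magnum/Heat/K8s teardown produces can be sketched with a thread pool. This is illustrative only: `delete_share` is a stub standing in for a `manila delete <id>` CLI call or a python-manilaclient request, and the share IDs are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def delete_share(share_id: str) -> str:
    # Stand-in for 'manila delete <share_id>' against the API
    return f"deleted {share_id}"

share_ids = [f"share-{i:02d}" for i in range(1, 101)]

# Fire all 100 deletions concurrently, as an orchestrator teardown does
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(delete_share, share_ids))

print(len(results))  # 100
```

It is exactly this burst of concurrent API requests, rather than the sequential loop, that surfaced the Cinder regression referenced above (Bug 1685818).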
  • 22. Manila testing: #fouinehammer Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 22 m-share Driver RabbitMQ m-sched m-api DB m-api m-api 1 … 500 nodes 1 ... 10k PODs
  • 23. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 23 https://github.com/johanhaleby/kubetail
  • 24. Stressing the API (1) Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 24 PODs running ‘manila list’ in a loop - ~linear until API processes exhausted … ok! (chart annotations: k8s DNS restarts, image registry DDOS, DB connection limits, connection pool limits)
  • 25. Stressing the API (2) Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 25 PODs running ‘manila create’ / ‘manila delete’ in a loop - ~works until DB limits are reached … ok! (chart: request time [sec] and log messages vs. number of pods, 1 to 1000; DB connection pool exhausted at scale)
  • 26. Components ‘fouined’ on the way Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 26 • Right away: Image registry (when scaling the k8s cluster) - 500 nodes downloading the image … - Increased RAM on image registry nodes - Used 100 nodes • ~350 pods: Kubernetes DNS - Constantly restarted: Name or service not known - Scaled it out to 10 nodes • ~1’000 pods: Central monitoring on Elastic Search - Too many log messages • ~4’000 pods: Allowed DB connections (and the connection pool) - Backtrace in Manila API logs
  • 27. m-share Manila: Our (Pre-)Prod Setup Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 27 • Three virtual controllers - 4-core, 8GB - Newton - Puppet • 3-node RabbitMQ cluster - V3.6.5 • Three share types - Test/Prod: Ceph ‘jewel’ - Dev: Ceph ‘luminous’ Driver RabbitMQ m-sched m-api DB m-api m-api RabbitMQ RabbitMQ prod test dev
  • 28. Early user: Containers in ATLAS Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 28 Lukas Heinrich: Containers in ATLAS • Analysis workflows modeled as directed acyclic graphs, built at run-time - Graph’s nodes are containers • Addresses the issue of workflow preservation - Store the parametrized container workload • Built using a k8s cluster - On top of Magnum • Using CephFS to store intermediate job stages - Share creation via Manila - Leveraging CephFS integration in k8s
  • 29. ‘Manila is also awesome!’ Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 29 • Setup is straightforward • No Manila issues during our testing - Functional as well as stress testing • Most features we found missing are in the plans - Per share type quotas, service HA - CephFS NFS driver • Some features need some more attention - OSC integration, CLI/UI feature parity, AuthID goes with last share • Welcoming, helpful team!
  • 30. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 30 Arne.Wiebalck@cern.ch @ArneWiebalck Q U E S T I O N S ?

Editor's Notes

  1. European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire) Founded in 1954, today 22 member states World’s largest particle physics laboratory Located at Franco-Swiss border near Geneva ~2’300 staff members, >12’500 users Budget: ~1000 MCHF (2016) CERN’s mission Answer fundamental questions of the universe Advance the technology frontiers Train the scientists and engineers of tomorrow Bring nations together