This document summarizes a presentation about Manila, an OpenStack file sharing service, running on CephFS at CERN. It discusses:
- Testing Manila against a CephFS backend, finding it works well for sequential share creation/deletion and bulk deletion.
- Stress testing Manila extensively using thousands of Kubernetes pods making concurrent file share requests, identifying components like the image registry, DNS, and database connections that were impacted.
- CERN's pre-production Manila setup using three controller nodes, separate RabbitMQ cluster, and existing CephFS backend, which was able to be set up in under an hour.
Slides from our Q3 meetup held in Montreal on September 27th 2017 at the Cloud.ca Center.
Video recording can be seen at: https://www.youtube.com/watch?v=_1btwHW39ms&list=PLSsQodeQD6LPyqrvvczcC5mkOOnPt469o
An introduction to OpenStack as project. This overview covers the basic components and architecture of the OpenStack platform, as well as presents facts around the global and local community.
Do you think that Nova, Cinder, Heat, Ceilometer, and Neutron are all references to global warming and looming apocalypse? For all those who come to the OpenStack community and wonder what all the fuss is about, this quick introduction will answer your many questions. It includes a short history of the largest Open Source project in history and will touch on
the basic OpenStack components, so you will be prepared the next time someone mentions Keystone, Nova and Swift in the same sentence.
This session was presented by Beth Cohen at the OpenStack meetup on Feb 19th, 2014 in Boston. Beth works for Verizon developing cool Cloud based products that she can't talk about without a strict NDA. She is a technical leader with over 25 years of experience architecting leading-edge system infrastructures and managing complex projects in the telecom, manufacturing, financial services, government, and technology industries. She has been involved in building some of the world's largest OpenStack architectures and has way too much fun at OpenStack Summits!
OpenStack is an open source cloud project and community with broad commercial and developer support. OpenStack is currently developing two interrelated technologies: OpenStack Compute and OpenStack Object Storage. OpenStack Compute is the internal fabric of the cloud creating and managing large groups of virtual private servers and OpenStack Object Storage is software for creating redundant, scalable object storage using clusters of commodity servers to store terabytes or even petabytes of data. In this tutorial, Bret Piatt will explain how to deploy OpenStack Compute and Object Storage, including an overview of the architecture and technology requirements.
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...DataWorks Summit
The proliferation of connected devices and sensors is leading the Digital Transformation. By 2020 there will be over 20 billion connected devices. Data from these devices need to be ingested at extreme speeds in order to be analyzed before the data decays. The life cycle of the data is critical in revealing what insight can be revealed and how quickly they can be acted upon.
In this session we will look at the past, present and future architecture trends streaming analytics. We will look at how to turn all the data from devices into actionable insights and dive into recommendations for streaming architecture depending on the data streams and time factor of the data. We will also discuss how to manage all the sensor data, understand the life cycle cost of the data, and how to scale capacity and capability easily with a modern infrastructure strategy.
A session in the DevNet Zone at Cisco Live, Berlin. There are several new OpenStack projects/services that are built on core OpenStack infrastructure services. This session will first briefly discuss the changes introduced for the project governance structure in OpenStack. Subsequently, the focus of the presentation will be to provide feature and architecture details on few of the new projects and services in OpenStack. These will include Trove-Database Service, Sahara-Dataprocessing Service, Congress - Policy Service and Magnum -- Container Service. A summary of other OpenStack related services will also be provided.
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...Cloud Native Day Tel Aviv
Many of the existing network functions, such as routers, firewalls, load balancers and such, have undergone the initial transition from a physical appliance to a virtual appliance. That transition required mostly performance optimization to accommodate the additional I/O overhead of the hypervisor and some configuration changes to accommodate the fact that a VM can be more dynamic in nature.
This shift to NFV, which is basically a cloud-based data center, has revolutionized the way network functions can be delivered. The transition to a cloud native world is considered far more disruptive as it touches changes in both the architecture, to accommodate hyper-scale and multi-tenancy, as well as the business model, which needs to be more consumption based, rather than fixed.
This talk will dive into the main requirements that differentiate a cloud native network function from the traditional network function, and, after making the leap from non-virtualized to virtualized network functions, what is then required to achieve cloud native capabilities, along with the challenges and benefits of this transition.
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
Many data scientists are already making heavy usage of the Jupyter ecosystem for analyzing data using interactive notebooks.
Apache Toree (incubating) is a Jupyter kernel designed to act as a gateway to Spark by enabling users Spark from standard Jupyter notebooks. This allows users to easily integrate Spark into their existing Jupyter deployments, This allows users to easily move between languages and contexts without needing to switch to a different set of tools.
Apache Toree is designed expressly for interactive work. It supports interpreters in Scala, Python, and R.
In this talk, I will cover the design of Toree, how it interacts with the Jupyter ecosystem and various ways in which users can extend the functionality of Apache Toree via a powerful plugin system.
Slides from our Q3 meetup held in Montreal on September 27th 2017 at the Cloud.ca Center.
Video recording can be seen at: https://www.youtube.com/watch?v=_1btwHW39ms&list=PLSsQodeQD6LPyqrvvczcC5mkOOnPt469o
An introduction to OpenStack as project. This overview covers the basic components and architecture of the OpenStack platform, as well as presents facts around the global and local community.
Do you think that Nova, Cinder, Heat, Ceilometer, and Neutron are all references to global warming and looming apocalypse? For all those who come to the OpenStack community and wonder what all the fuss is about, this quick introduction will answer your many questions. It includes a short history of the largest Open Source project in history and will touch on
the basic OpenStack components, so you will be prepared the next time someone mentions Keystone, Nova and Swift in the same sentence.
This session was presented by Beth Cohen at the OpenStack meetup on Feb 19th, 2014 in Boston. Beth works for Verizon developing cool Cloud based products that she can't talk about without a strict NDA. She is a technical leader with over 25 years of experience architecting leading-edge system infrastructures and managing complex projects in the telecom, manufacturing, financial services, government, and technology industries. She has been involved in building some of the world's largest OpenStack architectures and has way too much fun at OpenStack Summits!
OpenStack is an open source cloud project and community with broad commercial and developer support. OpenStack is currently developing two interrelated technologies: OpenStack Compute and OpenStack Object Storage. OpenStack Compute is the internal fabric of the cloud creating and managing large groups of virtual private servers and OpenStack Object Storage is software for creating redundant, scalable object storage using clusters of commodity servers to store terabytes or even petabytes of data. In this tutorial, Bret Piatt will explain how to deploy OpenStack Compute and Object Storage, including an overview of the architecture and technology requirements.
Future Architecture of Streaming Analytics: Capitalizing on the Analytics of ...DataWorks Summit
The proliferation of connected devices and sensors is leading the Digital Transformation. By 2020 there will be over 20 billion connected devices. Data from these devices need to be ingested at extreme speeds in order to be analyzed before the data decays. The life cycle of the data is critical in revealing what insight can be revealed and how quickly they can be acted upon.
In this session we will look at the past, present and future architecture trends streaming analytics. We will look at how to turn all the data from devices into actionable insights and dive into recommendations for streaming architecture depending on the data streams and time factor of the data. We will also discuss how to manage all the sensor data, understand the life cycle cost of the data, and how to scale capacity and capability easily with a modern infrastructure strategy.
A session in the DevNet Zone at Cisco Live, Berlin. There are several new OpenStack projects/services that are built on core OpenStack infrastructure services. This session will first briefly discuss the changes introduced for the project governance structure in OpenStack. Subsequently, the focus of the presentation will be to provide feature and architecture details on few of the new projects and services in OpenStack. These will include Trove-Database Service, Sahara-Dataprocessing Service, Congress - Policy Service and Magnum -- Container Service. A summary of other OpenStack related services will also be provided.
How Cloud Native VNFs Deployed on OpenStack Will Change the Telecom Industry ...Cloud Native Day Tel Aviv
Many of the existing network functions, such as routers, firewalls, load balancers and such, have undergone the initial transition from a physical appliance to a virtual appliance. That transition required mostly performance optimization to accommodate the additional I/O overhead of the hypervisor and some configuration changes to accommodate the fact that a VM can be more dynamic in nature.
This shift to NFV, which is basically a cloud-based data center, has revolutionized the way network functions can be delivered. The transition to a cloud native world is considered far more disruptive as it touches changes in both the architecture, to accommodate hyper-scale and multi-tenancy, as well as the business model, which needs to be more consumption based, rather than fixed.
This talk will dive into the main requirements that differentiate a cloud native network function from the traditional network function, and, after making the leap from non-virtualized to virtualized network functions, what is then required to achieve cloud native capabilities, along with the challenges and benefits of this transition.
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkSpark Summit
Many data scientists are already making heavy usage of the Jupyter ecosystem for analyzing data using interactive notebooks.
Apache Toree (incubating) is a Jupyter kernel designed to act as a gateway to Spark by enabling users Spark from standard Jupyter notebooks. This allows users to easily integrate Spark into their existing Jupyter deployments, This allows users to easily move between languages and contexts without needing to switch to a different set of tools.
Apache Toree is designed expressly for interactive work. It supports interpreters in Scala, Python, and R.
In this talk, I will cover the design of Toree, how it interacts with the Jupyter ecosystem and various ways in which users can extend the functionality of Apache Toree via a powerful plugin system.
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Databricks
The physicists at CERN are increasingly turning to Spark to process large physics datasets in a distributed fashion with the aim of reducing time-to-physics with increased interactivity. The physics data itself is stored in CERN’s mass storage system: EOS and CERN’s IT department runs on-premise private cloud based on OpenStack as a way to provide on-demand compute resources to physicists. This provides both opportunity and challenges to Big Data team at CERN to provide elastic, scalable, reliable spark-as-a-service on OpenStack.
The talk focuses on the design choices made and challenges faced while developing spark-as-a-service over kubernetes on openstack to simplify provisioning, automate management, and minimize the operating burden of managing Spark Clusters. In addition, the service tooling simplifies submitting applications on the behalf of the users, mounting user-specified ConfigMaps, copying application logs to s3 buckets for troubleshooting, performance analysis and accounting of spark applications and support for stateful spark streaming applications. We will also share results from running large scale sustained workloads over terabytes of physics data.
The Hadoop Distributed File System is the foundational storage layer in typical Hadoop deployments. Performance and stability of HDFS are crucial to the correct functioning of applications at higher layers in the Hadoop stack. This session is a technical deep dive into recent enhancements committed to HDFS by the entire Apache contributor community. We describe real-world incidents that motivated these changes and how the enhancements prevent those problems from reoccurring. Attendees will leave this session with a deeper understanding of the implementation challenges in a distributed file system and identify helpful new metrics to monitor in their own clusters.
This presentation was inspired post read of "TimeSeries Databases" -- Ted Dunning & Ellen Friedman.
I have tried to summarize a lot of the previous bench marks. Hope others find it useful. The slides were compiled early 2015 so some of the results might have changed but the core literature should still hold.
Lessons learned from writing over 300,000 lines of infrastructure codeYevgeniy Brikman
This talk is a concise masterclass on how to write infrastructure code. I share key lessons from the “Infrastructure Cookbook” we developed at Gruntwork while creating and maintaining a library of over 300,000 lines of infrastructure code that’s used in production by hundreds of companies. Come and hear our war stories, laugh about all the mistakes we’ve made along the way, and learn what Terraform, Packer, Docker, and Go look like in the wild.
CERN, the European Organization for Nuclear Research, is one of the world’s largest centres for scientific research. Its business is fundamental physics, finding out what the universe is made of and how it works. At CERN, accelerators such as the 27km Large Hadron Collider, are used to study the basic constituents of matter. This talk reviews the challenges to record and analyse the 25 Petabytes/year produced by the experiments and the investigations into how OpenStack could help to deliver a more agile computing infrastructure.
Comparing TensorFlow and MxNet from a cloud engineering and production app building perspective. Looks at adoption, support, deployment and cloud support cross vendors.
Charts for the presentation at the OpenStack Summit at Barcelona, October 2016. The video is available at:
https://www.openstack.org/videos/video/toward-10000-containers-on-openstack
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
1.
2. Manila on CephFS at CERN
Our way to production
Arne Wiebalck
Dan van der Ster
OpenStack Summit
Boston, MA, U.S.
May 11, 2017
3. http://home.cern
About CERN
European Organization for
Nuclear Research
(Conseil Européen pour la Recherche Nucléaire)
- Founded in 1954
- World’s largest particle
physics laboratory
- Located at Franco-Swiss
border near Geneva
- ~2’300 staff members
>12’500 users
- Budget: ~1000 MCHF (2016)
Primary mission:
Find answers to some of the fundamental
questions about the universe! http://home.cern
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 3
4. The CERN Cloud at a Glance
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 4
• Production service since July 2013
- Several rolling upgrades since, now on Newton
• Two data centers, 23ms distance
- One region, one API entry point
• Currently ~220’000 cores
- 7’000 hypervisors (+2’000 more soon)
- ~27k instances
• 50+ cells
- Separate h/w, use case, power, location, …
5. Setting the Scene: Storage
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 5
A
F
S
NFS
RB
D
CERNboxT
S
M
S3
HSM Data Archive
Developed at CERN
140PB – 25k tapes
Data Analysis
Developed at CERN
120PB – 44k HDDs
File share & sync
Owncloud/EOS
9’500 users
CVMFS
OpenStack backend
CephFS
Several PB-sized clusters
NFS Filer
OpenZFS/RBD/OpenStack
Strong POSIX
Infrastructure Services
6. NFS Filer Service Overview
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 6
• NFS appliances on top of OpenStack
- Several VMs with Cinder volume and OpenZFS
- ZFS replication to remote data center
- Local SSD as accelerator (l2arc, ZIL)
• POSIX and strong consistency
- Puppet, GitLab, OpenShift, Twiki, …
- LSF, Grid CE, Argus, …
- BOINC, Microelectronics, CDS, …
7. NFS Filer Service Limitations
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 7
• Scalability
- Metadata Operations (read)
• Availability
- SPoF per NFS volume
- ‘shared block device’
• Emerging use cases
- HPC, see next slide
8. Use Cases: HPC
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 8
• MPI Applications
- Beam simulations, accelerator physics, plasma
simulations, computation fluid dynamics, QCD …
• Different from HTC model
- On dedicated low-latency clusters
- Fast access to shared storage
- Very long running jobs
• Dedicated CephFS cluster in 2016
- 3-node cluster, 150TB usable
- RADOS: quite low activity
- 52TB used, 8M files, 350k dirs
- <100 file creations/sec
Can we converge on CephFS ?
(and if yes, how do we make it available to users?)
9. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 9
Part I: The CephFS Backend
10. CephFS Overview
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 10
POSIX-compliant shared FS on top of
RADOS
- Same foundation as RBD
Userland and kernel clients available
- ‘ceph-fuse’ and ‘mount -t ceph’
- Features added to ceph-fuse first, then ML kernel
‘jewel’ release tagged production-ready
- April 2016, main addition: fsck
- ‘Almost awesome’ before, focus on object and block
11. CephFS Meta-Data Server (MDS)
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 11
• Crucial component to build a fast and scalable FS
- Creates and manages inodes (persisted in RADOS, but cached in memory)
- Tracking client inode ‘capabilities’ (which client is using which inodes)
• Larger cache can speed up meta-data throughput
- More RAM can avoid meta-data stalls when reading from RADOS
• Single MDS can handle limited number of client reqs/sec
- Faster network / CPU enables more reqs/sec, but multiple MDSs needed for scaling
- MDS keeps nothing in disk, a flash-only RADOS pool may accelerate meta-data intensive
workloads
12. CephFS MDS Testing
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 12
• Correctness and single-user performance
- POSIX compliance: ok! (Tuxera POSIX Test Suite v20090130)
- Two-client consistency delays: ok! (20-30ms with fsping)
- Two-client parallel IO: reproducible slowdown
• Mimic Puppet master
- ‘stat’ all files in prod env from mutliple clients
- 20k stats/sec limit
• Meta-data scalability in multi-user scenarios
- Multiple active MDS fail-over w/ 1000 clients: ok!
- Meta-data load balancing heuristics: todo …
13. CephFS Issues Discovered
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 13
• ‘ceph-fuse’ crash in quota code, fixed in 10.2.5
- http://tracker.ceph.com/issues/16066
• Bug when creating deep directores, fixed in 10.2.6
- http://tracker.ceph.com/issues/18008
• Bug when creating deep directores, fixed in 10.2.6
- http://tracker.ceph.com/issues/18008
• Objectcacher ignores max objects limits when writing large files
- http://tracker.ceph.com/issues/19343
• Network outage can leave ‘ceph-fuse’ stuck
- Difficult to reproduce, have server-side work-around, ‘luminous’ ?
• Parallel IO to single file can be slower than expected
- If CephFS detects several writers to the same file, it switches clients to unbuffered IO
Desired:
Quotas
- should be mandatory
for userland and kernel
QoS
- throttle/protect users
14. ‘CephFS is awesome!’
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 14
• POSIX compliance looks good
• Months of testing: no ‘difficult’ problems
- Quotas and QoS?
• Single MDS meta-data limits are close to our Filer
- We need multi-MDS!
• ToDo: Backup, NFS Ganesha (legacy clients, Kerberos)
• ‘luminous’ testing has started ...
15. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 15
Part II: The OpenStack Integration
16. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 16
LHC Incident in April 2016
17. Manila: Overview
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 17
• File Share Project in OpenStack
- Provisioning of shared file systems to VMs
- ‘Cinder for file shares’
• APIs for tenants to request shares
- Fulfilled by backend drivers
- Acessed from instances
• Support for variety of NAS protocols
- NFS, CIFS, MapR-FS, GlusterFS, CephFS, …
• Supports the notion of share types
- Map features to backends
Manila
Backend
1. Request share
2. Create share
4. Access share
User
instances
3. Provide handle
18. Manila: Components
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 18
manila-share
Driver
manila-share
Driver
manila-share
Driver
Message Queue (e.g. RabbitMQ)
manila-
scheduler
manila-api
DB
REST API
DB for storage of service data
Message queue for
inter-component communication
manila-share:
Manages backends
manila-api:
Receives, authenticates and
handles requestsmanila-scheduler:
Routes requests to appropriate
share service (by filters)
(Not shown: manila-data: copy, migration,
backup)
19. m-share
Manila: Our Initial Setup
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 19
- Three controllers running
m-{api, scheduler, share}
- Separate Rabbit cluster
- DB from external service
- Existing CephFS backend
- Set up and ‘working’ in <1h !
Driver
RabbitMQ
m-sched
m-api DBm-api
m-share
Driver
m-sched
m-share
Driver
m-sched
m-api
Our Cinder setup has been changed as well …
20. Sequential Share Creation/Deletion: ok!
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 20
• Create: ~2sec
• Delete: ~5sec
• “manila list” creates
two auth calls
- First one for discovery
- Cinder/Nova do only one … ?
21. Bulk Deletion of Shares: ok!
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 21
This is what users (and Magnum/Heat/K8s) do …
(... and this breaks our Cinder! Bug 1685818).
• Sequentially
- manila delete share-01
- manila delete share-02
- manila delete share-03
- …
• In parallel
- manila delete share-01 share-02 …
- Done with 100 shares in parallel
After successful 24h tests of constant creations/deletions …
22. Manila testing: #fouinehammer
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 22
m-share
Driver
RabbitMQ
m-sched
m-api DBm-api m-api
1 … 500 nodes
1 ... 10k PODs
23. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 23
https://github.com/johanhaleby/kubetail
24. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 24
k8s DNS
restarts
Image registry DDOS
DB connection limits
Connection pool limits
This way
Stressing
the API (1)
PODs running
‘manila list’
in a loop
~linear until API
processes exhausted …
ok!
25. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 25
1 pod 10010 10 1000
Request time [sec]
Log messages
DB connection
pool exhausted
Stressing
the API (2)
PODs running
‘manila create’
‘manila delete’
in a loop
~works until DB limits
are reached …
ok!
26. Components ‘fouined’ on the way
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 26
• Right away: Image registry (when scaling the k8s cluster)
- 500 nodes downloading the image …
- Increased RAM on image registry nodes
- Used 100 nodes
• ~350 pods: Kubernetes DNS (~350 PODs)
- Constantly restarted: Name or service not known
- Scaled it out to 10 nodes
• ~1’000 pods: Central monitoring on Elastic Search
- Too many logs messages
• ~4’000 pods: Allowed DB connections (and the connection pool)
- Backtrace in Manila API logs
27. m-share
Manila: Our (Pre-)Prod Setup
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 27
• Three virtual controllers
- 4-core, 8GB
- Newton
- Puppet
• 3-node RabbitMQ cluster
- V3.6.5
• Three share types
- Test/Prod: Ceph ‘jewel’
- Dev: Ceph ‘luminous’
Driver RabbitMQ
m-sched
m-api DBm-api m-api
RabbitMQ
RabbitMQ
prod test dev
28. Early user: Containers in ATLAS
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 28
Lukas Heinrich: Containers in ATLAS
• Analysis workflows modeled as directed
acyclic graphs, built at run-time
- Graph’s nodes are containers
• Addresses the issue of workflow
preservation
- Store the parametrized container workload
• Built using a k8s cluster
- On top of Magnum
• Using CephFS to store intermediate job
stages
- Share creation via Manila
- Leverageing CephFS integration in k8s
29. ‘Manila is also awesome!’
Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 29
• Setup is straightforward
• No Manila issues during our testing
- Functional as well as stress testing
• Most features we found missing are in the plans
- Per share type quotas, service HA
- CephFS NFS driver
• Some features need some more attention
- OSC integration, CLI/UI feature parity, AuthID goes with last share
• Welcoming, helpful team!
30. Arne Wiebalck & Dan van der Ster: Manila on CephFS at CERN, OpenStack Summit Boston, May 2017 30
Arne.Wiebalck@cern.ch
@ArneWiebalck
Q
U
E
S
T
I
O
N
S
?
Editor's Notes
European Organization for NuclearResearch (Conseil Européen pour la Recherche Nucléaire)
Founded in 1954, today 22 member states
World’s largest particle physics laboratory
Located at Franco-Swiss border near Geneva
~2’300 staff members, >12’500 users
Budget: ~1000 MCHF (2016)
CERN’s mission
Answer fundamental questions of the universe
Advance the technology frontiers
Train the scientists and engineers of tomorrow
Bring nations together