Henri DOREAU - Taking part in the Lustre Filesystem community
The open-source Lustre distributed filesystem is the cornerstone of numerous world-class High Performance Computing (HPC) sites. The CEA/DAM has been a user and contributor for years, working with major tech companies and other leading HPC sites all over the world. Beyond the fruitful technical collaboration, CEA/DAM has also participated in building community organizations such as EOFS (European Open FileSystem). Despite major organizational changes throughout its history, the Lustre project has always shown remarkable sustainability in its community of users and developers. This presentation will describe the multiple interactions between the CEA/DAM and the Lustre community in its broadest sense, how they are managed, and some of their direct outcomes.
Philippe DENIEL - NFS-Ganesha: an Opensource NFS server in the User Space
This paper describes why CEA developed its own NFS server and why this project was released as open-source software. It shows how it gave birth to an "NFS-Ganesha developers community" and how this community organized itself. The growth of the community year after year is shown too, along with the resulting benefits of such a collaboration between an institutional scientific research center like CEA and the industry.
1. Working with FOSS communities at
CEA
Philippe DENIEL (philippe.deniel@cea.fr)
Henri DOREAU (henri.doreau@cea.fr)
21 October 2014 | PAGE 1
2. CEA in a few words
Teaching and dissemination of knowledge
Valorization and technological dissemination
Low Carbon Energy
Very Large Research Infrastructures
Defense and deterrence
Technologies for Information and health
Fundamental Research
30% subsidy
6. Shared TERA/TGCC tools
CEA TERA/TGCC teams have expertise in HPC
Managing very large clusters
Managing high performance parallel file systems
Managing highly capacitive (~100PB) storage systems
Those teams have developed their own tools
ClusterShell: a Python-based parallel shell capable of dealing with large clusters
http://cea-hpc.github.io/clustershell/
Contribution to the development of the Lustre file system
http://lustre-shine.sourceforge.net/
Shine: a ClusterShell-based utility to administer large Lustre configurations
http://lustre-shine.sourceforge.net/
NFS-Ganesha: a generic NFS server running in user-space
https://github.com/nfs-ganesha/nfs-ganesha/wiki
Robinhood: advanced FS audit and monitoring software
http://robinhood.sf.net
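To give an idea of what a "parallel shell" does on a large cluster, here is a minimal sketch in Python. It is not ClusterShell's API: `run_on_node` is a hypothetical stand-in that fakes a remote execution (the node names and version strings are invented for illustration), and `fan_out` simply runs it on all nodes concurrently and groups the nodes that returned identical output.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_on_node(node: str, command: str) -> str:
    """Hypothetical stand-in for a remote execution (e.g. over ssh).

    The result is faked so that the sketch runs anywhere: every node
    reports the same version except node7, which simulates a node that
    missed an update.
    """
    if command == "uname -r":
        return "version-old" if node == "node7" else "version-new"
    return ""


def fan_out(nodes: list[str], command: str, max_workers: int = 64) -> dict[str, list[str]]:
    """Run `command` on all nodes concurrently and group nodes by output."""
    groups: dict[str, list[str]] = defaultdict(list)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so each output pairs with its node.
        outputs = pool.map(lambda n: run_on_node(n, command), nodes)
        for node, output in zip(nodes, outputs):
            groups[output].append(node)
    return dict(groups)


nodes = [f"node{i}" for i in range(1, 11)]
result = fan_out(nodes, "uname -r")
for output, members in sorted(result.items()):
    print(f"{len(members):2d} node(s) report {output}: {', '.join(members)}")
```

Grouping by output is what makes such a tool usable at scale: an administrator reads one line per distinct answer instead of one line per node.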
The rest of this talk focuses on Lustre and NFS-Ganesha
7. OpenSource products at TERA and TGCC
HPC is a "niche market"
HPC market brings good image to companies involved in it
...but HPC market brings less money than the enterprise market
Companies will put most of their effort into enterprise customers
Enterprise products do not fit HPC needs
Lack of scalability
Weak interoperability with HPC simulation codes
HPC puts a "pressure" on software that is beyond compare
CEA chose to develop its own tools
We get something that fits our needs perfectly
The estimated cost, in man-years, is smaller than the cost of maintaining a poorly suited solution in production
CEA policy is to collaborate and
share knowledge
Sharing home-made tools as Open Source is natural behavior
All other HPC sites will behave the same
8. Ganesha: a community born at CEA
Ganesha was born because of TERA's needs
We needed a server to export a proprietary HSM's namespace via NFSv3
We had to develop something of our own
We chose to make it generic and capable of dealing with various protocols and backends
Ganesha has been an open-source product since its design
The backend-specific parts of the product were isolated in dedicated libraries called FSALs (File System Abstraction Layers). Today FSALs exist for XFS, VFS, LUSTRE, CEPH, GPFS, GLUSTERFS, ZFS
Ganesha became the first "multi-usage" NFSv3+NFSv4 server in User Space for Linux
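The idea of isolating backend-specific code behind an abstraction layer can be sketched as follows. This is only an illustration in Python, not Ganesha's actual interface (Ganesha is written in C): the class names `FSAL`, `MemoryFSAL`, `UpperCaseFSAL` and `Server`, and their methods, are hypothetical. The point is that the protocol-side code only talks to the abstract interface, so a new backend can be added without touching it.

```python
from abc import ABC, abstractmethod


class FSAL(ABC):
    """Hypothetical minimal backend interface (not Ganesha's real API)."""

    @abstractmethod
    def lookup(self, path: str) -> bool:
        """Return True if `path` exists in the backend namespace."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the content stored at `path`."""


class MemoryFSAL(FSAL):
    """Toy backend that keeps files in a dictionary."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def lookup(self, path: str) -> bool:
        return path in self._files

    def read(self, path: str) -> bytes:
        return self._files[path]


class UpperCaseFSAL(MemoryFSAL):
    """Second toy backend with different behavior on read."""

    def read(self, path: str) -> bytes:
        return super().read(path).upper()


class Server:
    """Protocol-side code: it knows only the FSAL interface."""

    def __init__(self, backend: FSAL) -> None:
        self._backend = backend

    def handle_read(self, path: str) -> bytes:
        if not self._backend.lookup(path):
            raise FileNotFoundError(path)
        return self._backend.read(path)


files = {"/export/hello": b"hello"}
plain = Server(MemoryFSAL(files)).handle_read("/export/hello")
upper = Server(UpperCaseFSAL(files)).handle_read("/export/hello")
print(plain, upper)
```

The same `Server` code serves both backends unchanged, which is the property that lets contributors from different companies each maintain the backend they care about.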
The Industry is in love with the Open Source Model
Ganesha has been Open Source since 7/21/2007 (first release on SourceForge)
IBM became an active contributor in 2009
LinuxBox/CohortFS came in late 2009
Panasas joined the community in early 2011
Red Hat joined in 2013 (Ganesha will be part of Fedora 21)
The community now involves more than 35 steady committers from about 10 companies
9. Bringing up the Ganesha Community
Creating a community = communicating
Expose project's releases on SourceForge
Create mailing lists related to the project (at least one dedicated to users and one dedicated to developers). SourceForge can host such lists
Expose source repositories to encourage people to download dev versions and compile/modify them
Manage source using Git: managing remote committers is easy
Expose git tree on the web (for example on github.com)
Have a website and/or a wiki to give information
A centralized bug repository is critical
The Ganesha bug tracker is hosted on Red Hat's Bugzilla
There is nothing like verbal communication
Submit abstracts and papers to conferences
A 30-minute talk is really valuable: people will attend your talk and read the proceedings
Do not underestimate "lesser" sessions
BoF sessions: technically skilled people attend them; some may take an interest in your project and start collaborating. At the very least, they can report positively to their bosses
WiP sessions: very short talks (about 5 minutes), but people involved in technology watch often attend them
Poster sessions: make it possible to have long talks with potential contributors
10. The community in action (1/2)
Main issue: dealing with remote people
Contributors are spread across different countries and timezones
India is 4h30 "later"
USA Central Time is 7 hours "earlier"
The main problem is to keep people in sync.
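The offsets above can be checked with a few lines of Python, which also shows what a given meeting slot means for each site. This is a sketch under stated assumptions: the offsets are taken relative to France, they are fixed winter offsets (daylight saving time is ignored), and the 15:30 Paris slot is a hypothetical example, not the project's actual schedule.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed winter UTC offsets, daylight saving time ignored.
PARIS = timezone(timedelta(hours=1), "Paris")
INDIA = timezone(timedelta(hours=5, minutes=30), "India")
US_CENTRAL = timezone(timedelta(hours=-6), "US Central")


def offset_from_paris(tz: timezone) -> timedelta:
    """Difference between a site's UTC offset and the Paris UTC offset."""
    return tz.utcoffset(None) - PARIS.utcoffset(None)


def local_times(call_start: datetime) -> dict[str, str]:
    """Local wall-clock time of one call start for each site."""
    return {
        tz.tzname(None): call_start.astimezone(tz).strftime("%H:%M")
        for tz in (PARIS, INDIA, US_CENTRAL)
    }


# Hypothetical slot: 15:30 in Paris on a winter day.
call = datetime(2014, 1, 15, 15, 30, tzinfo=PARIS)
times = local_times(call)

print(offset_from_paris(INDIA))       # India relative to Paris
print(offset_from_paris(US_CENTRAL))  # US Central relative to Paris
print(times)
```

With these assumptions, a mid-afternoon slot in Paris falls in the evening in India and in the early morning in US Central Time, which leaves only a narrow window that suits everyone.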
Information channels
Use the mailing list as much as possible
It's easy to follow a discussion thread
Majordomo keeps an archive of the messages posted on the list.
People ask for patch reviews on the list
Currently, reviews are made via github.com website
For "synchronous" discussion, people prefer talking on IRC
After several years, the project finally has a logo!
11. The community in action (2/2)
Checkpoints
Weekly concalls (phone conference)
New features and patches are discussed
Status of branches in the official source repository is addressed
Decisions are taken during the concall
Attendees can introduce "open topics" to talk about possible new features or bugs
IRL meetings
The Ganesha community meets once a year during Connectathon, a larger conference dedicated to NFS interoperability
Part of the community attends the "bake-a-thon" (unofficial Connectathon) twice a year
Industrial contributors: good or bad?
90% of the Ganesha contributors come from the industry
Ganesha is part of a future product (we use LGPL)
People from the industry have very strict test suites and QA
The open-source economic model is recognized as a valuable one by everyone
BUT
People from different companies are competitors
They play the game of the open source but do not forget the rules of the market
The balance is quite positive: the project gains almost 10 man-years each year through FOSS collaboration
12. Lustre, the galactic filesystem
Scalable clustered filesystem
Powers the world's most powerful supercomputers
Tens of thousands of clients
Hundreds of petabytes of storage
Terabytes per second of I/O throughput
Pure software solution
Kernel-land (Linux)
Distributed under the terms of GNU GPLv2
Actively developed (~100 contributors per major release)
Drives an entire ecosystem (Robinhood policy engine, Hadoop adapter...)
13. Project history
Started in 1999 by P. Braam at Carnegie Mellon University
Cluster File Systems (CFS) company founded in 2001
Acquired by Sun Microsystems in 2007
Acquired by Oracle in 2010, which dropped it less than a year later
Creation of Whamcloud
Acquired by Intel in 2012
Xyratex Ltd. bought the IP in Feb. 2013 and gave it back to the community
The core developers mostly remained the same
The community organized itself to cope with these changes (OpenSFS, EOFS)
14. Lustre community today
Diverse backgrounds, same goals
Major HPC actors
Intel
Seagate
Cray
Bull
Large computing centers
USA: LLNL, ORNL, NASA, NCSA...
France: CEA, Total, EDF, MeteoFrance...
Germany: FZJ, HLRS
Italy: Cineca
Asia: RIKEN
Australia: NCI
...
Universities
Indiana University
University of Reims
University of Dresden
Stanford University
...
15. Working together
Sharing ideas, sharing code, sharing benefits
Continuous integration techniques
Each and every patch involves several developers
Improve code quality
Improve communication within the project
Regular, major events
Lustre User Group (OpenSFS, USA)
Lustre Admin & Dev workshop (EOFS, France)
China Lustre User Group
Japan Lustre User Group
Strong links between administrators and developers
Sysdevs get feedback from sysadmins...
...sometimes they are the same people!
Product architects are quite active on the mailing lists
Shared best practices
Code reviews and "official" branches hosted in Gerrit
Unified coding style and documentation writing
16. As a conclusion
Collaboration with FOSS communities is good
A way to bring more man-years to the project
Contributors with different use cases will highlight bugs
Sharing and communicating
A community is a good place to implement and share good practices
The community is structured by the common tools and common "virtuous" ways of using them
Contributors' "Tables of the Law" will provide a strong and reliable backbone to the project
Do not hesitate to work with the industry
Open Source software is a valuable economic model
The industry invests a lot in Open Source
Industry won't have the same goals as research institutions, but common "root needs" are easy to find to start collaborating
Choose a license which is compatible with such a collaboration (LGPLv3, CeCILL-C,...)