CvmFS Workshop
An introduction to CvmFS and subsequent deployment of CvmFS for the LHC experiments.

Transcript of "CvmFS Workshop"

1. CernVM File System Workshop
Steve Traylen, steve.traylen@cern.ch
CERN, IT-PS-PES
EGEE User Forum, 28th March 2012

2. Outline
• CvmFS Description
  – Design
  – Security
• CvmFS at Sites
  – Clients, Squids
• CvmFS at Stratum Ones and Zero
  – CvmFS for repository maintainers
• State of CvmFS within WLCG
• CvmFS Future Work

3. CvmFS Motivation
• CvmFS is a network file system
  – Designed with software distribution in mind:
    • Lots of small files.
    • Additions are not constant but arrive every few hours or days.
    • Distribution delay should be minimised.
    • Files are typically accessed more than once.
  – Written in one location:
    • The repository node.
  – Read in 100,000s of locations.

4. CvmFS Design
• Indistinguishable from a real filesystem
  – Easy for the end user.
• Security
  – File integrity is checked by every client on every file.
• Standard protocols and software
  – Uses plain http (not https) everywhere, as it is easy to proxy and cache.
  – apache httpd and squid are typical, but any web server or proxy will do.
  – Standard Linux fuse at the client.

5. CvmFS Deployed
• Stratum 0 shadow tree, e.g. /cvmfs/repo/MyFile: the one write location.
  – cvmfs_sync operates on all new files in the repo, e.g. MyFile.
• Stratum 0 public tree, e.g. /repo/A345....de43b: contains the hashed, compressed files.
  – Served by the stratum 0 web server; only stratum 1s ever connect.
• Stratum 1s hold full copies, geographically separated and fully redundant.
  – Stratum ones copy all new data with "cvmfs_replicate".
• Site squids provide on-demand partial caches in front of the batch workers at SiteA, SiteB, ...

6. Day in the Life of a File
• Scenario:
  – A repository maintainer wants a file on all batch workers.
• Steps:
  – File publication happens on the repository maintainer node.
  – File retrieval happens on all batch workers.

7. File Publication
• Maintainer copies or creates the file at stratum 0:
  – In its eventual correct path, e.g.
    • /cvmfs/biomed.example.org/MyFile
• Maintainer tests the new file system:
  – /cvmfs/biomed.example.org
• Maintainer "commits" all new files with cvmfs_sync:
  – Files are compressed and renamed to their SHA-1 hash.
  – A record for MyFile is added to an SQLite catalog.

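As a minimal sketch, the publication cycle above condenses to a few shell commands on the stratum 0 node; paths follow the slide's example, and cvmfs_sync options are omitted since the transcript does not give them:

  # Place the file in its eventual path in the shadow tree.
  cp MyFile /cvmfs/biomed.example.org/MyFile
  # Test the tree as clients will see it.
  ls -l /cvmfs/biomed.example.org
  # Commit: compress, hash and catalogue all new files.
  cvmfs_sync
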
8. File Publication (2)
cvmfs_sync turns /cvmfs/biomed.example.org/MyFile into three web-served objects:
• http://example.org/biomed/12a..edf2 - the actual compressed MyFile.
• http://example.org/biomed/23f..ad22C - an SQLite database containing one record per file, e.g. | /MyFile | 12a..edf2 |.
• http://example.org/biomed/.cvmfspublished - a pointer to the catalog: a simple text file carrying the catalog file name, 23f..ad22C.
• The .cvmfspublished file has a TTL of 15 minutes.
• All other files have a TTL of 3 days.

9. File Retrieval via fuse
• CvmFS clients are a plugin to fuse.
  – fuse intercepts all filesystem requests,
    • e.g. stat, ls, cat, gcc, open, ...
  – cvmfs handles all file retrieval and presents files normally to the application.
  – A local area of disk is configured as a disk cache.

10. File Retrieval (2)
• A batch job wants the file
  – /cvmfs/biomed.example.org/MyFile
• cvmfs performs the following:
  – The client downloads .cvmfspublished.
    • This provides the file name, 23f..ad22C, of the SQLite catalog of the user-visible file paths.
  – The client downloads the SQLite catalog.
    • This provides the real on-disk file name of MyFile, i.e. 12a..edf2.
  – The client downloads the data file 12a..edf2.
    • fuse presents 12a..edf2 to the batch job as MyFile.

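The same sequence can be mimicked by hand with curl, using the example URLs from the publication slide; this only illustrates the order of fetches, not the client's actual code:

  # 1. Fetch the catalog pointer (TTL 15 minutes).
  curl http://example.org/biomed/.cvmfspublished
  # 2. Fetch the SQLite catalog it names and look up /MyFile in it.
  curl -O http://example.org/biomed/23f..ad22C
  # 3. Fetch the compressed data file the catalog names for /MyFile.
  curl -O http://example.org/biomed/12a..edf2
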
11. What Was the Point of All That?
• Why bother with all that complication; why not serve the files as-is?
• The file system layout lives in an SQLite database.
  – Operations like ls, stat and find . -type f are very quick.
  – Data is only downloaded as files are opened.
• De-duplication, e.g. MyFile and MySameFile.
  – All files are saved under the name of their SHA-1 hash.
  – Duplicates are just extra rows in the SQLite db.
    • There is no point in caching two identical files or downloading the same file twice.
  – Cache slots never need to be overwritten with a new version of a file.

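The de-duplication follows directly from content addressing, as this small shell illustration shows:

  # Two files with identical content...
  echo "same content" > MyFile
  echo "same content" > MySameFile
  # ...hash to the same SHA-1 name, so both catalog rows point
  # at a single stored and cached object.
  sha1sum MyFile MySameFile
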
12. File Security/Integrity
• Main risks:
  – Files are delivered via http.
  – Files may pass through 3rd-party squids, ...
    • Files from CERN to CERN sometimes go via BNL.
• x509 keys and certificates are generated.
  – The public certificate is delivered in advance to all sites.
  – The release machine signs the first file, .cvmfspublished, at cvmfs_sync time.
• All files opened after this are located by their SHA-1 name only, and the SHA-1 is verified for each file.
• This is a simplified version of what actually happens.

13. CvmFS at Sites - Squid
• CvmFS clients should not connect directly to stratum one servers.
  – A squid or other http proxy should be installed:
    • a squid for a batch farm,
    • a university-level squid,
    • or a squid shared with another site.
  – Setting up two squids in a redundant fashion is easy:
    • The client supports random and/or ordered lists of squids.
• CvmFS clients are not blocked from ...

14. Squid Setup
• A standard squid from the OS vendor is perfectly good enough; a few configuration settings are important:
  – maximum_object_size - the maximum file size to cache.
    • Default is 4MB; recommended 4GB.
  – cache_dir - the size of the disk cache.
    • Default is 100MB; recommended 50GB minimum.
• Both values depend greatly on the total active data volume and the individual file sizes in the repository.

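A squid.conf fragment matching the recommendations above; the cache directory path and the ufs store type are conventional defaults, not anything mandated by CvmFS:

  # Allow objects up to 4GB to be cached (squid's default is 4MB).
  maximum_object_size 4096 MB
  # 50GB disk cache under /var/spool/squid (the size argument is in MB).
  cache_dir ufs /var/spool/squid 51200 16 256
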
15. Squid Setup (2)
• Site squids are contacted by all batch workers.
• The following settings matter for large clusters:
  – max_filedesc - increase the maximum number of open sockets.
    • Default 1024; increase to 8192.
    • Verify usage with squidclient mgr:info:
      Maximum number of file descriptors: 8192
      Largest file desc currently in use: 2839
      Number of file desc currently in use: 2753
  – net.ipv4.neigh.default.gc_thresh* - enlarge the kernel ARP table.

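A sketch of the corresponding settings; the gc_thresh values are illustrative and should be sized to the number of hosts the squid talks to:

  # squid.conf: raise the file descriptor limit (availability of the
  # max_filedesc directive depends on how your squid was built).
  max_filedesc 8192

  # /etc/sysctl.conf: enlarge the kernel ARP table for large clusters.
  net.ipv4.neigh.default.gc_thresh1 = 1024
  net.ipv4.neigh.default.gc_thresh2 = 2048
  net.ipv4.neigh.default.gc_thresh3 = 4096
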
16. Squid Setup (3)
• CvmFS clients support a list of squid servers:
  – Random list "SquidA|SquidB"
    • One site with two squid servers.
  – Ordered list "SquidSiteMine;SquidSiteOther"
    • One site using its own squid in preference to another site's squid server.
• CvmFS clients move to the next squid if files cannot be downloaded correctly.
  – Files are always checksummed after download.

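In the client configuration the two list types look like this; host names and ports are placeholders:

  # Random choice between two local squids:
  CVMFS_HTTP_PROXY="http://squidA.example.org:3128|http://squidB.example.org:3128"
  # Own squid first, another site's squid as fallback:
  CVMFS_HTTP_PROXY="http://squid.mysite.org:3128;http://squid.othersite.org:3128"
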
17. Squid and Cache Digests
• Cache digests allow a cluster of squids to work together.
  – A pair (or more) of site squids or stratum one squids can benefit.
• Squids peer with one another.
  – i.e. a site with 3 squid servers will download each file from upstream only once; after that, each squid fetches it from an adjacent squid rather than going to the higher level.
• http://wiki.squid-cache.org/SquidFaq/

18. CvmFS at Sites - Client
• Install the CvmFS packages via http://cernvm.cern.ch/portal/filesystem
  – An install guide is present.
  – RHEL 5 and 6 packages; debian has been built from source.
• Configure either with a script (cvmfs_config setup) or by hand:
  – /etc/fuse.conf # fuse configuration
    • Allow other users to read the fuse mount.
  – /etc/auto.master # autofs configuration
    • Enable /etc/auto.cvmfs.
• chkconfig cvmfs on && service cvmfs start
• CvmFS clients default to enable e.g. /cvmfs/...

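The whole client setup condenses to a few commands; this sketch assumes the RHEL packages and follows the steps the slide lists:

  # One-time setup of /etc/fuse.conf and /etc/auto.master.
  cvmfs_config setup
  # Start at boot and start now.
  chkconfig cvmfs on
  service cvmfs start
  # Quick check that a repository mounts (repository name is an example).
  ls /cvmfs/biomed.example.org
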
19. CvmFS Client
• CvmFS uses a default file plus overrides configuration method:
  – /etc/cvmfs/default.conf is in the package.
  – /etc/cvmfs/default.local holds custom overrides.
• Minimal changes to make:
  – Sites should specify a squid service for their site:
    • CVMFS_HTTP_PROXY=http://yoursquid:2138
  – Sites should specify an ordered list of stratum one servers.

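Put together, a minimal /etc/cvmfs/default.local might look as follows. The transcript only names the proxy variable, so treat CVMFS_SERVER_URL and the @org@ repository-name placeholder as assumptions drawn from contemporary client documentation; the stratum one hosts are placeholders:

  # /etc/cvmfs/default.local - site-wide overrides
  CVMFS_HTTP_PROXY="http://yoursquid:2138"
  # Ordered stratum one list (assumed variable name; @org@ is assumed to
  # expand to the repository name):
  CVMFS_SERVER_URL="http://stratum1-near.example.org/opt/@org@;http://stratum1-far.example.org/opt/@org@"
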
20. CvmFS Client
• Cache location and size:
  – CVMFS_QUOTA_LIMIT=10000 (MB)
  – CVMFS_CACHE_BASE=/var/cache/
• Note the cache is exclusive to each repository.
  – A future version of CvmFS will share a cache across all repositories.

21. CvmFS Client
• Per-domain/repository overrides are also possible:
  – /etc/cvmfs/default.conf
    • global configuration from the package.
  – /etc/cvmfs/default.local
    • global configuration from the site admin.
  – /etc/cvmfs/domain.d/example.org.conf
    • configuration for *.example.org repos from the package.
  – /etc/cvmfs/domain.d/example.org.local
    • configuration for *.example.org repos from the site admin.
  – /etc/cvmfs/config.d/biomed.example.org.conf
    • configuration for biomed.example.org from the package.
  – /etc/cvmfs/config.d/biomed.example.org.local
    • configuration for biomed.example.org from the site admin.

22. CvmFS Client
• The previous richness of config allows for per-repository specials. Use cases:
  – Repository A requires more cache space than the default.
    • Currently 4GB is enough for the LHC VOs, but LHCb requires 6GB.
  – Repository B is not supported on all stratum one services, or on different ones.
    • Currently ams.cern.ch is only on the CERN stratum one.

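For the LHCb use case above, a per-repository override file might look like this sketch; the repository name and the 6GB figure come from the slide, and the file placement follows the config.d convention just described:

  # /etc/cvmfs/config.d/lhcb.cern.ch.local - hypothetical per-repo override
  # Raise the cache quota for LHCb to 6GB (value in MB).
  CVMFS_QUOTA_LIMIT=6000
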
23. Debugging Clients
• Dump the resulting configuration; all those files make it complicated:
  – cvmfs_config showconfig
• Enable lots of verbosity to a log file:
  – CVMFS_DEBUGLOG=/tmp/cvmfs.log
    • The file grows quickly, so switch it off afterwards.
• Mount outside the automounter:
  – mkdir /tmp/mnt
  – mount -t cvmfs biomed /tmp/mnt
• Check syslog:
  – cvmfs dumps a stack trace on crash.

24. Interrogating Clients
• When a CvmFS file system is mounted it can be spoken to via a socket as root, e.g.
  – cvmfs-talk -i biomed host info - determine which stratum one is being used.
    • Active host 0: http://cvmfs1.example.ch/opt/biomed
  – The local cache can be inspected:
    • What space is pinned or can be purged.
  – The active site squid server can be found:
    • Are all my hosts using that remote squid server and not mine?

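A few further cvmfs-talk invocations covering the inspections mentioned above; treat the exact subcommand names as assumptions to be checked against the client's built-in help:

  cvmfs-talk -i biomed host info          # active stratum one
  cvmfs-talk -i biomed proxy info         # active site squid
  cvmfs-talk -i biomed cache size         # current cache usage
  cvmfs-talk -i biomed cache list pinned  # pinned entries that cannot be purged
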
25. CvmFS at Stratum 1
• The stratum one level provides all the redundancy for the clients.
• There should be several stratum ones at different sites.
• WLCG has 5 stratum ones; 2 or 3 (or even one) could easily handle the current load of 70,000 clients, provided site squids are used.
  – CERN's stratum one peaks at around 40 megabit/s.
• Stratum ones update once per hour from the stratum zero.

26. Stratum One Architecture
• The stratum 1 backend replicates all files from the stratum 0. It uses CvmFS metadata, i.e. the SQLite files, to download only new files.
• Stratum one frontends are reverse proxies, i.e. web servers that fetch and cache files from the backend node on behalf of sites A, B, C, ...
• The number of sites cannot impact replication from stratum 0 to stratum 1.
• A stratum 1 can be scaled up with more front-ends.

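One way to realise such a frontend is squid's accelerator mode, sketched below; host names are placeholders, and a plain apache httpd with mod_proxy would serve equally well:

  # squid.conf fragment on a stratum one frontend.
  http_port 80 accel defaultsite=cvmfs-backend.example.org
  # Fetch cache misses from the backend web server.
  cache_peer cvmfs-backend.example.org parent 80 0 no-query originserver name=backend
  acl repos urlpath_regex ^/opt/
  http_access allow repos
  cache_peer_access backend allow repos
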
27. Stratum 1 Downloads, February
• The spike on 7th February was caused by one batch cluster connecting directly due to a bug.
  – It more than trebled the sum of all other traffic.
  – The site was contacted and changed its configuration.
• The stratum 1 level is vulnerable to this, but plenty of capacity is available to absorb such spikes.

28. CvmFS at Stratum Zero
• The stratum 0 is the one write location.
• Typically a stratum zero is made up of:
  – A large NFS or similar diskspace with two areas:
    • the shadow tree, /cvmfs/biomed.example.org - the write version of the repository;
    • the public tree, /pub - the processed tree served via a web server.
  – One small virtual machine per repository:
    • Each repository must have its own dedicated node.
    • Write access to the repository is controlled via login access to the node.

29. CvmFS Stratum Zero
• The repository maintainer writes files to
  – /cvmfs/biomed.example.org
• A log of all file operations is kept.
  – This is done with a 3rd-party kernel module, redirfs.
• The repository maintainer can then validate the installation and decide whether to publish.
  – This provides a window of opportunity to uncover mistakes, bad software, ...

30. Stratum Zero Advice
• The stratum zero is the point where bad releases may have to be rolled back.
  – Once a bad release has been published it is visible at all sites, possibly rendering your whole infrastructure useless.
• Within the WLCG stratum zero, filesystem snapshots are in place to allow a rollback.
  – Various mechanisms have been used: Netapp, LVM and ZFS snapshots have all been tried.

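As one concrete rollback mechanism, an LVM snapshot taken before each release can be merged back if the release turns out bad; the volume names here are placeholders:

  # Before the release: snapshot the repository volume.
  lvcreate --size 10G --snapshot --name repo-pre-release /dev/vg0/repo
  # Bad release published? Merge the snapshot to roll the volume back.
  lvconvert --merge /dev/vg0/repo-pre-release
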
31. Stratum Zero Failure
• The stratum ones continue to serve all their existing files.
• Clients will not notice in any way that the stratum zero is missing.
• During the failure, new writes to the repository cannot be made.

32. Stratum Zero Security
• Two x509 key pairs are involved:
  – The repository manager's key:
    • The private key lives on the repository manager machine.
    • It is used to sign the .cvmfspublished file during a release of biomed.example.org.
    • Clients do not trust this signature in advance of a release.
  – The stratum zero manager's key:
    • The private key lives offline, e.g. on a crypto card.
    • The public certificate is deployed to every single CvmFS client.
  – CvmFS clients trust this service manager's key completely.

33. Stratum Zero Security (2)
• Once per month a file (.cvmfswhitelist) is injected into the biomed repository by the stratum 0 manager.
  – The whitelist file is signed by the stratum 0 manager and contains a list of repository manager identities.
• The file states to the client:
  – "Given that you trust me, please also trust these release manager machines for the next month."
• The client checks the whitelist first, to establish trust in the repository manager's key before verifying the signature on .cvmfspublished.

34. Atlas Comments on CvmFS
• Currently used for:
  – Software, both stable and nightly builds.
  – Conditions data.
  – Around 0.5TB of files are served.
• While CvmFS is recommended for sites, it is not yet universally used.
  – Some sites are unwilling or unable to install fuse clients.
    • Policy, diskless nodes, only NFS space or similar weirdness.
  – To use CvmFS at these sites they require both: ...

35. CvmFS Current/Future
• Migration from automake to cmake.
• MacOS client - available but no official release.
• Shared cache on the client across repositories.
• A cvmfs plugin for parrot, i.e. user space.
• Server side to use AUFS for release changes.
  – AUFS = Advanced multi-layered unification filesystem.

36. Support
• A mailing list hosted at http://cern.ch/egroups
  – cvmfs-talk@cern.ch
• Bug tracker:
  – https://savannah.cern.ch/projects/cernvm/
• Source code is migrating now:
  – Current release - CERN svn.
  – Development - http://github.com/cvmfs
• Releases and documentation:
  – http://cernvm.cern.ch/portal/filesystem

37. Conclusions
• CvmFS solves the problem of distributing files to 100,000s of clients in a fast, efficient and secure way.
• CvmFS is mission critical today for ATLAS and LHCb, and shortly for CMS.
• It is easy to set up the client so long as fuse is acceptable.
• The server side has been set up for VOs outside WLCG, in particular at SLAC and OSG. INFN and SARA have ...
