Handling High Energy Physics Data using Cloud Computing

  1. High Energy Physics Data Management using Cloud Computing
     Analysis of the famous BaBar experiment data handling
     Paper by: Abhishek Dey, CSE 2nd Year | Diya Ghosh, CSE 2nd Year | Mr. Somenath Roy Chowdhury
  2. Contents
     • Motivation
     • HEP Legacy Project
     • CANFAR Astronomical Research Facility
     • System Architecture
     • Operational Experience
     • Summary
  3. What exactly is BaBar?
     • Its design was motivated by the investigation of CP violation.
     • It was set up to understand the disparity between the matter and antimatter content of the universe by measuring CP violation.
     • BaBar focuses on the study of CP violation in the B meson system.
     • The name follows the nomenclature for the B meson (symbol B) and its antiparticle (symbol B̄, pronounced "B bar").
  4. BaBar: Data Point of View
     • 9.5 million lines of C++ and Fortran.
     • Compiled size is 30 GB.
     • A significant amount of manpower is required to maintain the software.
     • Each installation must be validated before generated results will be accepted.
     CANFAR is a partnership between:
     – University of Victoria
     – University of British Columbia
     – National Research Council, Canadian Astronomy Data Centre
     – Herzberg Institute for Astrophysics
     • It helps in providing infrastructure for VMs.
  5. Need for Cloud Computing
     • Jobs are embarrassingly parallel, much like HEP.
     • Each of these surveys requires a different processing environment, which requires:
       – A specific version of a Linux distribution.
       – A specific compiler version.
       – Specific libraries.
     • Applications have little documentation.
     • These environments are evolving rapidly.
  6. Data is precious, too precious.
     We need Infrastructure, which comes easily as a Service.
  7. A word about Cloud Computing:
  8. IaaS: What next?
     • With IaaS, we can easily create many instances of a VM image.
     • How do we manage the VMs once booted?
     • How do we get jobs to the VMs?
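As a side note (not part of the original slides), this is roughly what "creating many instances of a VM image" looks like against an EC2-compatible IaaS endpoint, sketched here with boto3. The region, image ID, and instance type are placeholders; the original work may have used a different IaaS interface entirely.

```python
import boto3

# Connect to an EC2-compatible IaaS endpoint; the region is a placeholder and
# credentials are taken from the environment.
ec2 = boto3.resource("ec2", region_name="us-west-2")

# Boot several instances of the same VM image (image ID and instance type are hypothetical).
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",
    MinCount=1,
    MaxCount=10,
    InstanceType="m1.large",
)
for instance in instances:
    print(instance.id, instance.state["Name"])
```

Booting instances is the easy part; the two questions on this slide, managing the booted VMs and routing jobs to them, are what the next slide's solution addresses.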
  9. Our Solution: Cloud Scheduler + Condor
     • Users create a VM with their experiment software installed.
     • A basic VM is created by one group, and users add on their analysis or processing software to create their custom VM.
     • Users then create batch jobs as they would on a regular cluster, but they specify which VM image should run their jobs (see the sketch below).
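A minimal sketch of that submission step, assuming an HTCondor pool fronted by Cloud Scheduler: the job description carries extra attributes naming the VM image that should run it. The attribute names (+VMType, +VMLoc), the image URL, and the executable are illustrative rather than taken from the paper.

```python
import subprocess
import textwrap

# A Condor submit description that also tells Cloud Scheduler which VM image
# should be booted to run the job.  Attribute names and the image URL are illustrative.
submit_description = textwrap.dedent("""\
    Universe   = vanilla
    Executable = run_analysis.sh
    Arguments  = run123
    Output     = analysis.out
    Error      = analysis.err
    Log        = analysis.log

    # Cloud Scheduler hints: which VM image to boot and its resource needs.
    +VMType     = "babar-analysis"
    +VMLoc      = "http://repo.example.org/vms/babar-analysis.img.gz"
    +VMCPUCores = "1"
    +VMMem      = "2048"

    Queue
""")

with open("analysis.sub", "w") as f:
    f.write(submit_description)

# Hand the job to the local HTCondor scheduler; Cloud Scheduler watches the
# queue and boots matching VMs on the IaaS cloud to run it.
subprocess.run(["condor_submit", "analysis.sub"], check=True)
```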
  10. Steps for the successful architecture setup:
  11. (Figure-only slide.)
  12. (Figure-only slide.)
  13. (Figure-only slide.)
  14. CANFAR: MAssive Compact Halo Objects (MACHO)
     • Detailed re-analysis of data from the MACHO experiment dark matter search.
     • Jobs perform a wget to retrieve the input data (40 MB) and have a 4–6 hour run time.
     • Low I/O, which is great for clouds.
     • Astronomers are happy with the environment.
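To illustrate the shape of such a job (this is not code from the project), a small wrapper that stages the input over HTTP and then runs the processing step; the URL and executable name are hypothetical.

```python
import subprocess
import urllib.request

# Stage the job's input file over HTTP, much as the MACHO jobs do with wget.
# The URL and the processing command below are placeholders.
input_url = "http://data.example.org/macho/field_042.fits.gz"
local_path = "field_042.fits.gz"
urllib.request.urlretrieve(input_url, local_path)

# Run the (hypothetical) re-analysis executable on the retrieved input.
subprocess.run(["./macho_reanalysis", local_path], check=True)
```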
  15. Data Handling in BaBar
     (Diagram: analysis jobs consume event data, both real and simulated, plus configuration from the BaBar conditions database.)
     • Data is approximately 2 PB.
     • The file system is hosted on a cluster of six nodes, consisting of a Management/Metadata Server (MGS/MDS) and five Object Storage Servers (OSS).
     • A single gigabit interface/VLAN is used to communicate both internally and externally.
  16. Xrootd: Need for Distributed Data
     • Xrootd is a file server providing byte-level access and is used by many high energy physics experiments.
     • It provides access to the distributed data.
     • A read-ahead value of 1 MB and a read-ahead cache size of 10 MB were set on each Xrootd client.
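For a sense of what byte-level access through Xrootd looks like from a client, here is a small sketch using the XRootD Python bindings; the server and file path are hypothetical, and the 1 MB read simply mirrors the read-ahead size mentioned above rather than configuring it.

```python
from XRootD import client

# Open a file over the xrootd protocol; the server and path are placeholders.
f = client.File()
status, _ = f.open("root://xrootd.example.org//store/babar/run123/events.root")
if not status.ok:
    raise RuntimeError(status.message)

# Byte-level access: read 1 MB starting at the beginning of the file.
status, data = f.read(offset=0, size=1024 * 1024)
print(len(data), "bytes read")
f.close()
```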
  17. How a DFS works
     • Blocks are replicated across several datanodes (usually 3).
     • A single namenode stores metadata (file names, block locations, etc.).
     • Optimized for large files and sequential reads.
     • Clients read from the closest replica available (note: locality of reference).
     • If the replication for a block drops below target, it is automatically re-replicated.
     (Diagram: numbered blocks spread across the datanodes, tracked by the namenode.)
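The re-replication rule in the last bullet can be illustrated with a toy model (this is not HDFS code): a namenode-style block map is scanned, and any block with fewer replicas than the target gets copied to additional datanodes.

```python
import random

REPLICATION_TARGET = 3
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

# Toy "namenode" metadata: each block ID maps to the datanodes holding a replica.
block_map = {
    "blk_0001": {"dn1", "dn2", "dn3"},
    "blk_0002": {"dn2", "dn4"},  # under-replicated
}

def re_replicate(block_map):
    """Copy any under-replicated block to extra datanodes until the target is met."""
    for block, holders in block_map.items():
        while len(holders) < REPLICATION_TARGET:
            candidates = [d for d in DATANODES if d not in holders]
            if not candidates:
                break  # no datanode left to place another replica on
            holders.add(random.choice(candidates))

re_replicate(block_map)
print(block_map)
```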
  18. Results and Analysis:
  19. Fault tolerant model:
  20. Acknowledgements
     • A special word of appreciation and thanks to Mr. Somenath Roy Chowdhury.
     • My heartiest thanks to the entire team who worked hard to build the cloud.
  21. Questions, please?
