Handling High Energy Physics Data using Cloud Computing

Transcript

  • 1. High Energy Physics Data Management using Cloud Computing. Analysis of the famous BaBar experiment data handling. Paper by: Abhishek Dey, CSE 2nd Year | Diya Ghosh, CSE 2nd Year | Mr. Somenath Roy Chowdhury
  • 2. Contents: Motivation, HEP Legacy Project, CANFAR Astronomical Research Facility, System Architecture, Operational Experience, Summary
  • 3. What exactly is BaBar? Its design was motivated by the investigation of CP violation: the experiment was set up to understand the disparity between the matter and antimatter content of the universe by measuring CP violation, and it focuses on the study of CP violation in the B meson system. The name comes from the nomenclature for the B meson (symbol B) and its antiparticle (symbol B̄, pronounced "B bar").
  • 4. BaBar: Data Point of View. 9.5 million lines of C++ and Fortran; the compiled size is 30 GB. A significant amount of manpower is required to maintain the software, and each installation must be validated before generated results will be accepted. CANFAR is a partnership between the University of Victoria, the University of British Columbia, the National Research Council's Canadian Astronomy Data Centre, and the Herzberg Institute of Astrophysics; it helps in providing infrastructure for VMs.
  • 5. Need for cloud computing: the jobs are embarrassingly parallel, much like HEP. Each of these surveys requires a different processing environment: a specific version of a Linux distribution, a specific compiler version, and specific libraries. Applications have little documentation, and these environments are evolving rapidly. (A minimal validation sketch follows below.)
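Because each installation must be validated before its results are accepted (slide 4), a small check script illustrates what that validation might look like. This is a minimal sketch: the pinned distro and gcc version are illustrative assumptions, not BaBar's actual requirements.

```python
# Sketch of a pre-run environment check. REQUIRED_* values are
# illustrative placeholders, not BaBar's real pins.
import subprocess
from pathlib import Path

REQUIRED_DISTRO = "Scientific Linux 5"  # assumed pin
REQUIRED_GCC = "4.1"                    # assumed pin

def check_environment():
    problems = []
    release = Path("/etc/redhat-release")
    distro = release.read_text().strip() if release.exists() else "unknown"
    if REQUIRED_DISTRO not in distro:
        problems.append(f"distro: expected '{REQUIRED_DISTRO}', got '{distro}'")
    gcc = subprocess.run(["gcc", "-dumpversion"],
                         capture_output=True, text=True).stdout.strip()
    if not gcc.startswith(REQUIRED_GCC):
        problems.append(f"gcc: expected {REQUIRED_GCC}.x, got {gcc}")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```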
  • 6. Data is precious, too precious. We need infrastructure, which comes easily as a service.
  • 7. A word about cloud computing. (Figure-only slide.)
  • 8. IaaS: what next? With IaaS, we can easily create many instances of a VM image. How do we manage the VMs once booted? How do we get jobs to the VMs? (A sketch of booting instances follows below.)
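To make "create many instances of a VM image" concrete, here is a rough sketch using Apache Libcloud against an EC2-compatible IaaS endpoint. The credentials, region, image ID, instance size, and count are all placeholders, not values from the deck.

```python
# Sketch: boot ten copies of a prepared worker image on an
# EC2-compatible cloud via Apache Libcloud. All identifiers are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

Driver = get_driver(Provider.EC2)
conn = Driver("ACCESS_KEY", "SECRET_KEY", region="us-east-1")

image = conn.list_images(ex_image_ids=["ami-babar"])[0]          # prepared VM image
size = [s for s in conn.list_sizes() if s.id == "m1.large"][0]   # machine flavour

nodes = [conn.create_node(name=f"worker-{i:02d}", image=image, size=size)
         for i in range(10)]
print(f"booted {len(nodes)} worker VMs")
```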
  • 9. Our solution: Cloud Scheduler + Condor. Users create a VM with their experiment software installed: a basic VM is created by one group, and users add their analysis or processing software on top to create their custom VM. Users then create batch jobs as they would on a regular cluster, but they specify which VM image should run their jobs. (A sketch of such a job follows below.)
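A hedged sketch of what such a batch job might look like: an ordinary Condor submit description that additionally names the VM image the job needs, in the style Cloud Scheduler uses. The custom attribute names (+VMType, +VMLoc), the image name, and the repository URL are illustrative assumptions; the Cloud Scheduler documentation gives the exact syntax.

```python
# Write a Condor submit file that tags the job with its required VM image.
# Attribute names and URLs below are assumptions for illustration only.
submit_description = """\
Universe     = vanilla
Executable   = run_babar_analysis.sh
Arguments    = --run 2007 --stream tau
Requirements = VMType =?= "babar-analysis"
+VMType      = "babar-analysis"
+VMLoc       = "http://vmrepo.example.org/babar-analysis.img.gz"
Output       = job.out
Error        = job.err
Log          = job.log
Queue
"""

with open("babar_job.sub", "w") as f:
    f.write(submit_description)
# Then submit as usual:  condor_submit babar_job.sub
```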
  • 10. Steps for the successful architecture setup:
  • 11. (Figure-only slide.)
  • 12. (Figure-only slide.)
  • 13. (Figure-only slide.)
  • 14. CANFAR: MAssive Compact Halo Objects (MACHO). A detailed re-analysis of data from the MACHO experiment's dark matter search. Jobs perform a wget to retrieve the input data (roughly 40 MB) and have a 4-6 hour run time; the low I/O is great for clouds. Astronomers are happy with the environment. (A sketch of such a job follows below.)
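A minimal sketch of a MACHO-style worker job, assuming a placeholder input URL and analysis executable; only the fetch-then-process shape comes from the slide.

```python
# Sketch of a worker job: fetch the ~40 MB input over HTTP (the slide's
# wget step), unpack it, and run the long analysis. URL and executable
# names are placeholders.
import subprocess
import urllib.request

INPUT_URL = "http://data.example.org/macho/field_042.tar.gz"   # placeholder

urllib.request.urlretrieve(INPUT_URL, "input.tar.gz")          # the "wget"
subprocess.run(["tar", "xzf", "input.tar.gz"], check=True)
subprocess.run(["./reanalyze_macho", "field_042"], check=True) # 4-6 h of CPU
```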
  • 15. Data Handling in BaBar. Analysis jobs consume event data (both real and simulated) together with configuration from the BaBar conditions database. The data is approximately 2 PB. The file system is hosted on a cluster of six nodes, consisting of a management/metadata server (MGS/MDS) and five object storage servers (OSS), with a single gigabit interface/VLAN used to communicate both internally and externally.
  • 16. Xrootd: the need for distributed data. Xrootd is a file server providing byte-level access and is used by many high energy physics experiments; it provides access to the distributed data. A read-ahead value of 1 MB and a read-ahead cache size of 10 MB were set on each Xrootd client. (A read sketch follows below.)
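For illustration, the sketch below reads the first megabyte of a file over Xrootd using the XRootD Python bindings. The server address and file path are placeholders, and the read-ahead and cache values quoted on the slide live in client configuration rather than in this code.

```python
# Sketch: byte-level read from an Xrootd server with the Python bindings.
# Host and path are placeholders.
from XRootD import client

f = client.File()
status, _ = f.open("root://xrootd.example.org//store/babar/events.root")
if status.ok:
    status, data = f.read(offset=0, size=1024 * 1024)  # first 1 MB
    print(f"read {len(data)} bytes")
    f.close()
```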
  • 17. How a DFS works. Blocks are replicated across several datanodes (usually 3). A single namenode stores the metadata (file names, block locations, etc.). The design is optimized for large files and sequential reads. Clients read from the closest replica available (note: locality of reference). If the replication for a block drops below target, it is automatically re-replicated. (The slide's diagram shows a namenode and four datanodes holding replicated blocks; a toy model follows below.)
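A toy model of this behaviour, purely for illustration (it is not HDFS code): a namenode object tracks block locations and re-replicates any block whose replica count falls below the target after a datanode failure.

```python
# Toy namenode: block placement and automatic re-replication on failure.
import random

TARGET_REPLICAS = 3

class NameNode:
    def __init__(self, datanodes):
        self.datanodes = set(datanodes)
        self.block_map = {}  # block id -> set of datanode names

    def store(self, block):
        # Place TARGET_REPLICAS copies on distinct datanodes.
        self.block_map[block] = set(
            random.sample(sorted(self.datanodes), TARGET_REPLICAS))

    def datanode_failed(self, node):
        # Drop the node, then top up every under-replicated block.
        self.datanodes.discard(node)
        for replicas in self.block_map.values():
            replicas.discard(node)
            while len(replicas) < TARGET_REPLICAS and self.datanodes - replicas:
                replicas.add(random.choice(sorted(self.datanodes - replicas)))

nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
for blk in ("block-1", "block-2", "block-3"):
    nn.store(blk)
nn.datanode_failed("dn2")
assert all(len(r) == TARGET_REPLICAS for r in nn.block_map.values())
```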
  • 18. Results and Analysis. (Figure-only slide.)
  • 19. Fault tolerant model. (Figure-only slide.)
  • 20. Acknowledgements. A special word of appreciation and thanks to Mr. Somenath Roy Chowdhury. My heartiest thanks to the entire team who worked hard to build the cloud.
  • 21. Questions, please?
