Opportunities for X-Ray science in future computing architectures

The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale systems, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.

Slide notes:
  • Trends: computers, storage, detectors, … It’s the ratios that matter: Cores/CPU, CPUs/computer, data/scientist. Experiment and simulation.
  • To show what this means, let’s look at the example of astronomy again. Tycho Brahe spent 30 years cataloging the positions of 777 stars and the known planets with great accuracy. His assistant Kepler then took the data, and from it derived his laws of planetary motion, which say that bodies sweep out equal areas in equal times. A precursor to Newton’s law of gravitation.
  • Some allege that Kepler took unusual steps to acquire his data. Hopefully not so common.
  • Photographic plates → we need computers! Here are some early computers at the Harvard Observatory, around 1890. Computing the consequences of equations became a profession: 1 multiplication per 2 seconds, maybe, × 8 people, so about 4 multiplications per second. However, they were unreliable and hard to get to work more than 8 hours per day.
  • By the late 1990s, in 5 years, 230 million celestial objects had been imaged, and the spectra of more than 1 million of them measured.
  • “Slices through the SDSS 3-dimensional map of the distribution of galaxies. Earth is at the center, and each point represents a galaxy, typically containing about 100 billion stars. Galaxies are colored according to the ages of their stars, with the redder, more strongly clustered points showing galaxies that are made of older stars. The outer circle is at a distance of two billion light years. The region between the wedges was not mapped by the SDSS because dust in our own Galaxy obscures the view of the distant universe in these directions. Both slices contain all galaxies between -1.25 and 1.25 degrees declination.”
  • http://xrds.acm.org/article.cfm?aid=1836552
  • Sequencing volumes are doubling every 4-6 months. Note the log scale! The bioinformatics cost is purely BLAST; values are for Amazon EC2. Lessons: 1) need computer scientists; 2) need more hardware; 3) need more collaboration on analysis. (A back-of-envelope projection of this kind of doubling appears in the sketches after these notes.)
  • In contrast, see SDSS, and also Google: volume; diversity and complexity; speed of analysis.
  • Research data management in 2011
  • Photon science recognizes the importance of computing. However, if we perform some simple textual analysis, we see that only about 1% of the report talks about computing and data: 670 out of 50,676 words, or 1.3%. (A sketch of this kind of word counting appears after these notes.)
  • Liz Lyon, U. Bath, Associate Director of the UK Digital Curation Centre. Generic Data Acquisition (GDA) software, developed at Daresbury initially and now at Diamond Light Source.
  • Chris Jacobsen
  • What about networking? Difficult to price, but many experts estimate a doubling time of 9 months for network capacity, thanks to WDM and optical doping. 10 Gbps per user is roughly 100-1000x shared Internet throughput.
  • Port pricing is falling and density is rising, dramatically; the cost of 10GbE is approaching that of cluster HPC interconnects.
  • Chicago is an international networking hub
  • Chicago railroads, 1950 (http://www.encyclopedia.chicagohistory.org/pages/1774.html)
  • Motivated by enormous parallelism, massive data, and complexity; enabled by networks.
  • What’s this got to do with that cloud thing? Recall that “cloud” is a term used to mean a few different things.
  • Next question: where does computing happen? Massive parallelism in computing and storage; operations costs go up. Google data center in Oregon. Note also the variation in the cost of power: a factor of 5.
  • Interestingly, if we look at the situation in business, things are quite different. There is a similarly long list of time-consuming tasks, and a large and growing SaaS industry that addresses many of them. If I start a business today, I can do it from a coffee shop: there is no need to acquire and run any IT at all. I can outsource …
  • Of course, people also make effective use of IaaS, but only for more specialized tasks
  • So let’s look at that list again. My colleagues and I started an effort a little while ago aimed at applying SaaS to one of these tasks …
  • The result of this work is something called Globus Online. This is something new, not just more of the same Globus Toolkit stuff. Globus Toolkit: hasn’t changed; it has been around 15 years and is still a toolkit for building custom Grids such as LHC, TeraGrid, ESG, BIRN, LIGO, etc. Globus Online: focused on outsourcing the time-consuming activities associated with data transfer. Register, transfer, monitor, and customize endpoints. Globus Online is a full Web 2.0-based solution. That means a few different things. First, it is architected using REST principles: important elements are exposed as resources, on which operations can be performed using HTTP operations. These operations can be used directly, or via powerful AJAX Web GUIs. (A hypothetical REST-style sketch appears after these notes.)
  • The deceptively simple task of moving data from one place to another. You might ask: what could be simpler? I simply stick it in the mail, right? But we’re talking about data that is too large to email. Maybe I need to move 100,000 files totaling 10 terabytes from a federal laboratory where they were generated to my home institution. That sort of thing can be very difficult. Hai Ah Nam, a nuclear physicist from Oak Ridge, spoke at GlobusWorld in March 2010 about her struggles with moving data: initially transferring 1.6 TB (86 large files) from Oak Ridge to NERSC; changing from SCP to GridFTP reduced that transfer from days to hours, and reduced a 137 TB transfer from months to days. But it was not easy... (A back-of-envelope transfer-time calculation appears in the sketches after these notes.)
  • Under the covers: built as a scale-out web application, hosted on Amazon Web Services. State data is replicated over multiple storage servers, and the number of VMs is scaled dynamically. (A generic autoscaling sketch appears after these notes.)
  • Explain attempts: a cornerstone of our failure-mitigation strategy. Through repeated attempts, GO was able to overcome transient errors at OLCF and Ranger. The expired host certs on Big Red were not updated until after the run had completed. (A retry-with-backoff sketch appears after these notes.)
  • 3,000 zebrafish mutants
  • Collect, move, store, index, analyze, share, update, iterate; millions of files; thousands of experiments.
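
To make the doubling-time arguments concrete (sequencing volumes doubling every 4-6 months, network capacity every 9 months), here is a minimal sketch in Python; the starting values and the 5-year horizon are illustrative assumptions, not figures from the slides.

    # Back-of-envelope projection of exponential growth under a given doubling time.
    # Starting values and the horizon are illustrative assumptions, not slide data.

    def project(initial, doubling_time_months, horizon_months):
        """Projected value after horizon_months of steady doubling."""
        return initial * 2 ** (horizon_months / doubling_time_months)

    # Sequencing output: assume 1 TB/month today, doubling every 5 months.
    print(f"Sequencing after 5 years: {project(1.0, 5, 60):,.0f} TB/month")

    # Network capacity: assume 10 Gbps per user today, doubling every 9 months.
    print(f"Network capacity after 5 years: {project(10.0, 9, 60):,.0f} Gbps")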
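
The "simple textual analysis" of the facility report could look something like the following sketch; the keyword list and the sample text are made-up placeholders, not the actual report.

    # What fraction of a report's words fall in sentences that mention computing or
    # data? Keyword list and sample text are illustrative assumptions.
    import re

    KEYWORDS = {"computing", "computational", "data", "software", "storage"}

    def computing_fraction(text):
        text = text.lower()
        words = re.findall(r"[a-z]+", text)
        hits = sum(len(re.findall(r"[a-z]+", s))
                   for s in re.split(r"[.!?]", text)
                   if any(k in s for k in KEYWORDS))
        return hits, len(words)

    sample = ("New sources will deliver unprecedented brightness. "
              "Data rates will require new computational approaches. "
              "Detector upgrades are planned for the coming decade.")
    hits, total = computing_fraction(sample)
    print(f"{hits} of {total} words ({hits / total:.0%}) relate to computing or data")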
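
The REST idea behind Globus Online, a transfer exposed as a resource that is created and then polled, can be illustrated roughly as below. The host name, URL paths, field names, and endpoint names are hypothetical placeholders for illustration; this is not the actual Globus Online API.

    # Hypothetical REST-style transfer request; host, paths, and field names are
    # placeholders, NOT the real Globus Online API.
    import requests

    API = "https://transfer.example.org"        # hypothetical service base URL
    AUTH = {"Authorization": "Bearer <token>"}  # placeholder credential

    # POST creates the transfer resource; GET polls its state.
    task = requests.post(f"{API}/transfers", headers=AUTH, json={
        "source_endpoint": "siteA#dtn",         # hypothetical endpoint names
        "destination_endpoint": "siteB#dtn",
        "files": [{"src": "/data/run42/", "dst": "/project/run42/", "recursive": True}],
    }).json()

    status = requests.get(f"{API}/transfers/{task['id']}", headers=AUTH).json()
    print(status["state"])                      # e.g. "ACTIVE" or "SUCCEEDED"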
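
For the "100,000 files, 10 terabytes" scenario, a rough transfer-time estimate looks like this; the effective throughput figures are assumptions for illustration, since real rates depend on per-file overhead, parallelism, and tuning of the wide-area path.

    # Rough transfer-time estimate; the throughput values below are assumptions.

    def transfer_hours(total_bytes, throughput_gbps):
        """Hours to move total_bytes at a sustained rate in gigabits per second."""
        return (total_bytes * 8) / (throughput_gbps * 1e9) / 3600

    TEN_TB = 10e12  # 10 terabytes in bytes

    for label, gbps in [("single-stream scp, ~0.2 Gbps", 0.2),
                        ("tuned GridFTP with parallel streams, ~5 Gbps", 5.0)]:
        print(f"{label}: {transfer_hours(TEN_TB, gbps):.0f} hours")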
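
The "dynamically scale the number of VMs" idea can be sketched as a simple policy that sizes the worker pool to the task backlog; the thresholds and bounds are illustrative assumptions, not Globus Online's actual policy.

    # Generic autoscaling rule: pick a worker count from the current backlog.
    # Thresholds and bounds are illustrative assumptions.

    def desired_workers(queued_tasks, tasks_per_worker=50, lo=2, hi=20):
        """Scale worker VMs with the backlog, within fixed bounds."""
        needed = -(-queued_tasks // tasks_per_worker)   # ceiling division
        return max(lo, min(hi, needed))

    for backlog in (10, 300, 5000):
        print(backlog, "queued ->", desired_workers(backlog), "workers")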
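
The "repeated attempts" strategy for riding out transient errors, while still surfacing persistent ones such as an expired host certificate, can be sketched generically as retry with exponential backoff; this is a generic pattern, not Globus Online's actual implementation.

    # Generic retry-with-backoff loop; not Globus Online's actual code.
    import random
    import time

    def with_retries(operation, max_attempts=5, base_delay=1.0):
        """Run operation(); on failure, wait with exponential backoff and retry."""
        for attempt in range(1, max_attempts + 1):
            try:
                return operation()
            except Exception as err:   # in practice, catch only transient errors
                if attempt == max_attempts:
                    raise              # persistent failure (e.g. expired certificate)
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
                print(f"attempt {attempt} failed ({err}); retrying in {delay:.1f}s")
                time.sleep(delay)

    # Example: an operation that fails twice with a transient error, then succeeds.
    calls = {"n": 0}
    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient network error")
        return "done"

    print(with_retries(flaky))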
  • ×