Cloud and Big Data Come Together in the Ocean
Observatories Initiative to Give Scientists Real-Time Access
to Environmental Measurements
Transcript of a BriefingsDirect podcast on how cloud and big data come together to offer
researchers a treasure trove of new real-time information.

Listen to the podcast. Find it on iTunes/iPod. Sponsor: VMware


Dana Gardner: Hi. This is Dana Gardner, Principal Analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on a fascinating global ocean studies initiative that defines some of the superlatives around big data, cloud, and middleware integration capabilities.

We'll be exploring the Ocean Observatories Initiative (OOI) and its accompanying Cyberinfrastructure Program. This undertaking aims to provide an unprecedented ability to study the Earth's oceans and climate impact using myriad distributed centers and oceans' worth of data.

The importance of the science is matched by the magnitude of the computer science needed to make that data accessible and actionable by scientists. In a sense, the OOI and its infrastructure program are constructing a big-data-scale, programmable, and integratable cloud fabric. [Disclosure: VMware is a sponsor of BriefingsDirect podcasts.]

We’ve gathered three leaders to explain the OOI and how the Cyberinfrastructure Program may
not only solve this set of data and compute problems, but perhaps establish how future massive
data and analysis problems are solved.

Here to share their story on OOI are our guests. Please join me in welcoming Matthew Arrott, Project Manager at the OOI Cyberinfrastructure. Matthew's career spans more than 20 years in design leadership and engineering management for software and network systems.

He’s held leadership positions at Currenex, DreamWorks SKG, Autodesk, and the National
Center for Supercomputing Applications. His most recent work has been with the University of
California as e-Science Program Manager while focusing on delivering the OOI
Cyberinfrastructure capabilities.

Also joining us is Michael Meisinger. He is the Managing Systems Architect for the Ocean
Observatories Initiative Cyberinfrastructure. Since 2007, Michael has been employed by the
University of California, San Diego. He leads a team of systems architects on the OOI Project.
Prior to UC San Diego, Michael was a lead developer in an Internet startup, developing a
platform for automated customer interactions and data analysis.

Michael holds a master's degree in computer science from the Technical University of Munich and will soon complete a PhD in formal service-oriented computing and distributed systems architecture.

Lastly, we’re joined by Alexis Richardson, Senior Director for the VMware Cloud Application
Platform. He is a serial entrepreneur and a technologist. Previously, he was a founder of
RabbitMQ and the CEO of Rabbit Technologies Limited, which was acquired by VMware in
April of 2010.

Alexis plays a leading role in both the cloud and messaging communities, in addition to his work with AMQP. He is a co-founder of the CloudCamp conferences and a co-chair of the Open Cloud Computing Interface at the Open Grid Forum.

Welcome to you all.

Michael Meisinger, let me start with you. Could you sum up the OOI for our audience? Let us
know a little bit about how it came about.


Ocean Observatories Initiative

Michael Meisinger: Thanks, Dana. The Ocean Observatories Initiative is a large project. It's a US National Science Foundation project that is intended to build a platform for ocean sciences end users and communities interested in this form of data, for an operational life span of 30 years.

It comprises a construction period of five years and will integrate a large number of resources and assets. These range from typical oceanographic assets, like instruments that are mounted on buoys deployed in the ocean, to networking infrastructure on the cyberinfrastructure side. It also includes a large number of sophisticated software systems.

I'm the managing architect for the cyberinfrastructure, so I'm primarily concerned with the interfaces to the oceanographic infrastructure, including data interfaces and networking interfaces, and then primarily with the design of the networked hardware and software system that comprises the cyberinfrastructure.

As I said, OOI’s goals include serving the science and education communities with their needs
for receiving, analyzing, and manipulating ocean sciences and environmental data. This will have
a large impact on the science community and the overall public, as a whole, because ocean
sciences data is very important in understanding the changes and processes of the earth, the
environment, and the climate as a whole.
Ocean sciences, as a discipline, hasn't yet received as much infrastructure and central attention as other communities, so the OOI is very important in bringing this to the community. It has a large volume: an almost $400 million construction budget and an annual operations budget of $70 million for a planned lifetime of 25-30 years.

Gardner: Matthew Arrott, what is the big hurdle here in terms of the compute issues that you've faced? Obviously, it's a tremendously important project with a tremendous amount of data, but from a purely compute-requirement perspective, what makes this so challenging?

Matthew Arrott: It has a number of key aspects that we had to address. It's best to start at the top of the functional requirements, which is to provide interactive mission planning and control of the overall instrumentation on the 65 independent platforms that are deployed throughout the ocean.

The issue there is how to provide a standard command-and-control infrastructure over a core set of 800 instruments, about 50 different classes of instrumentation, as well as be able to deploy, over the 30-year lifecycle, new instrumentation brought to us by different scientific communities for experimentation.

The next is that the mission planning and control is meant to be interactive and respond to
emergent changes. So we needed an event-response infrastructure that allowed us to operate on
scales from microseconds to hours in being able to detect and respond to the changes. We needed
an ability to move computing throughout the network to deal with the different latency
requirements that were needed for the event-response analysis.

Finally, we have computational nodes all the way down in the ocean, as well as in the shore stations, that are accepting or acquiring the data coming off the network. And we're distributing that data in real time to anyone who wants to listen to the signals to develop their own sense-and-response mechanisms, whether they're in the cloud, in their local institutions, or on their laptops.

Domain of control

The fundamental challenge was the ability to create a domain of control over instrumentation
that is deployed by operators and for processing and data distribution to be agile in its
deployment anywhere in the global network.

Gardner: Alexis Richardson, it sounds like a very interesting problem to solve. Why is this a good time to try to solve it? Of course, big data, cloud, and doing tremendous amounts of services orientation across middleware and a variety of different formats and transports are all very prominent in the enterprise now. Given that, what makes this such an interesting pursuit for you, thinking about it from a software distribution and data distribution perspective?

Alexis Richardson: It really comes down to the scale of the system and the ability of technologies to meet that scale today. If we had been talking about this 12 years ago, in the year 2000, we would have been talking about companies like Google and Yahoo, which we would now consider to be of only moderate scale.

Since then, many companies have appeared, for example, Facebook, which has many hundreds of millions of users connecting throughout the world, sharing vast amounts of data all the time.

It's that scale that's changed the architecture and deployment patterns that people have been using for these applications. In addition to that, many of these companies have brought out essentially a platform capability, whereby others, such as Zynga in the case of Facebook, can create applications that run inside these networks -- social networks in the case of Facebook.

We can see that the OOI project essentially brings together vast numbers of sensors and signals with a comparatively smaller number of scientists, research institutions, and scientific applications doing analytics, in a similar way to how Facebook combines what people say, what pictures they post, and what music they listen to with everybody's friends, and then allows applications to be attached to that.

So it’s a huge technology challenge that would have been simply infeasible 12 years ago in the
year 2000, when we thought things were big, but they were not. Now, when we talk about big
data being masses of terabytes and petabytes that need to be analyzed all the time, then we’re
starting to glimpse what's possible with the technology that’s been created in the last 10 years.

Arrott: I'd like to go one step further than that. The challenge goes beyond just the big data challenge. As Alexis talked about, with humans putting in what they say and their pictures, it also introduces the concept of the instrument as an equal partner with the human in participating in the network.

So you now have to think about what it means to have a device that's acting like a human in the network, and the notion that the instrument is, in fact, owned by someone and must be governed by someone, which is not the case with humans, because humans govern themselves. So it represents the notion of an autonomous agent in the network, as well as a notion of control over that agent that has to stay on the network.

Gardner: Thank you, Matthew. I’d like to try to explain for our audience a bit more about what
is going on here. We understand that we’ve got a tremendous diversity of sensors gathering in
real time a tremendous scale of data. But we’re also talking about automating the gathering and
distribution of that data to a variety of applications.

Numerical framework

We’re talking about having applications within this fabric, so that the output is not necessarily
data, but is a computational numerical framework that’s distributed. So there's computation being
done at the data level, and then it has to be regulated. Certain data goes to certain people for
certain reasons under certain circumstances.

So there's a lot of data, a lot of logic, and a lot of scale. Can one of you help step me through a
little bit more to understand the architecture of what’s being conducted here, so that we can then
move into how it’s being done?

Meisinger: The challenge, as you mentioned, is very heterogeneous. We deal with various
classes of sensors, classes of data, classes of users, or even communities of users, and with
classes of technological problems and solution spaces.

So the architecture is based on a tiered, layered model, with the most invariant things at the bottom: things that shouldn't change over the lifetime of 30 years and that receive the highest level of attention.

Then, we go into our more specialized layered architecture where we try to find optimal solutions
using today’s technologies for high-speed messaging, big data, and so on. Then, we go into
specialized solutions for specific groups of users and specific sensors that are there as last-mile
technologies to integrate them into the system.

So you basically see an onion-layer model of the architecture, with externalization on the outside. Then, as you go toward the core, you approach the invariants of the system. What are the invariants? We recognized that a system of this scale and heterogeneity cannot be reinvented every five years as part of typical maintenance. A strongly scalable and extensible system is distributed in its nature, and as part of that distribution, the most invariant parts are the protocols and the interactions between the distributed entities of the system.

We found that it's essential to define a common language, a common format, for the various
applications and participants of the network, including sensor and sensor agents, but also higher-
level software services to communicate in a common format.

This architecture is based on defining a common interaction format and a common data format. You mentioned the complex numerical models. A lot of things in this architecture are defined so that you have an easier model of reaching many heterogeneous communities: ingesting specific solutions into the system, representing them consistently, and then presenting them again in the specific format for each audience.

Our architecture is strongly communication-oriented, service-oriented, message-oriented, and
federated.
As Matthew mentioned, it’s an important means to have the individual resources, agents, provide
their own policies, not having a central bottleneck in the system or central governing entity in the
system that defines policies.
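A common interaction format of this kind can be pictured as a uniform message envelope that every participant, sensor agent or higher-level service, emits and consumes. The following Python sketch is purely illustrative; the field names (`sender`, `recipient`, `performative`, and so on) are assumptions for the example, not the actual OOI wire format.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict
import json
import time
import uuid

@dataclass
class Envelope:
    """Illustrative common interaction format: every message, whether from
    a sensor agent or a software service, shares the same envelope fields."""
    sender: str                    # federated identity of the producing agent
    recipient: str                 # service or topic the message is addressed to
    performative: str              # kind of interaction, e.g. "inform", "request"
    body: Dict[str, Any] = field(default_factory=dict)
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    def serialize(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def deserialize(raw: str) -> "Envelope":
        return Envelope(**json.loads(raw))

# A sensor agent reporting a reading uses the same format as any service:
msg = Envelope(sender="agent.ctd-04", recipient="topic.ocean.temperature",
               performative="inform", body={"temp_c": 11.7, "depth_m": 120})
assert Envelope.deserialize(msg.serialize()) == msg
```

The point of the sketch is the invariant: the envelope, not the transport or the programming language, is what every participant must agree on.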


Strongly federated

So it’s a strongly federated system. It’s a system that’s strongly technology-independent. The
communication product can be implemented by various technologies, and we’re choosing a
couple of programming languages and technologies for our initial reference implementation, but
it’s strongly extensible for future communities to use.

Gardner: One of the aspects of this that was particularly interesting to me is that this is very
much a two-way street. The scientists who are gathering their analysis can very rapidly go back
to these sensors, go back to this compute fabric, this fusion of data, and ask it to do other things
in real time or to bring in data from outside sources to compare and contrast, to find the
commonalities and to find what it is they’re looking for in terms of trends.

Could one of you help me understand why this is a two-way street and how that's possible given
the scale and complexity?

Arrott: The way to think about it, first and foremost, is as four core layers. There is the underlying network resource management layer. We talk about agents; they supply that capability to any process in the system, and we treat devices as processes.

The next layer up is the data layer, and the data layer consists of two core parts. One is the distribution system that allows data to be moved in real time from the source to the interested parties. It's fundamentally a publish-subscribe (pub-sub) model. We're currently using point-to-point as well as topic-based subscriptions, but we're quickly moving toward content-based routing, which is based on a selector provided by the consumer to direct traffic toward them.

The other part of the data layer is the traditional harvesting or retrieval of data from historical
repositories.
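The two subscription styles Matthew describes, topic-based delivery and content-based routing driven by a consumer-supplied selector, can be sketched in a few lines of Python. This is an in-memory illustration of the routing idea only, not the OOI distribution system itself; all names here are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

class DataDistribution:
    """Sketch of the data layer's distribution modes: topic-based
    subscriptions plus content-based routing, where the consumer supplies
    a selector predicate that directs matching traffic to it."""
    def __init__(self) -> None:
        self.topic_subs: Dict[str, List[Callable]] = defaultdict(list)
        self.content_subs: List[Tuple[Callable, Callable]] = []

    def subscribe_topic(self, topic: str, handler: Callable) -> None:
        self.topic_subs[topic].append(handler)

    def subscribe_content(self, selector: Callable[[dict], bool],
                          handler: Callable) -> None:
        self.content_subs.append((selector, handler))

    def publish(self, topic: str, sample: dict) -> None:
        for handler in self.topic_subs[topic]:
            handler(sample)
        for selector, handler in self.content_subs:
            if selector(sample):          # consumer-provided predicate
                handler(sample)

bus = DataDistribution()
received: list = []
# Topic subscriber: everything on the salinity stream.
bus.subscribe_topic("salinity", received.append)
# Content subscriber: only anomalously warm readings, on any topic.
bus.subscribe_content(lambda s: s.get("temp_c", 0) > 20, received.append)

bus.publish("salinity", {"psu": 35.1})
bus.publish("temperature", {"temp_c": 24.3})
assert len(received) == 2
```

The design difference matters: a topic subscription is resolved by the broker from the routing key alone, while content-based routing must inspect the message payload against each consumer's selector.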

The next layer up is the analytic layer. It looks a lot like the device layer, but it is focused on the management of processes that are using the big data and responding to the arrival of new data in the network, or to changes in data in the network. Finally, there is the fourth layer, the mission planning and control layer, which we'll talk about later.

Gardner: I'd like to go to Alexis Richardson. When you saw the problem that needed to be solved here, you had a lot of experience with the Advanced Message Queuing Protocol (AMQP), which I'd like you to explain to us, and you also understood the requirements of a messaging system that can accomplish what Matthew just described.

So tell me about AMQP, why this problem seems to be the right fit for that particular technology, RabbitMQ, and a messaging infrastructure in general.

Richardson: What Matthew and Michael have described can be broken down into three
fundamental pieces of technology.

Lot of chatter

Number one, you’ve got a lot of chatter coming from these devices -- machines, people, and
other kinds of processes -- and that needs to get to the right place. It's being chattered or twittered
away and possibly at high rates and high frequencies. It needs to get to just the set of receivers
following that stream, very similar to how we understand distribution to our computers. So you
need what’s called pub-sub, which is a fundamental technology.

In addition, that data needs to be stored somewhere. People need to go back and audit it, to pull it
out of the archive and replay it, or view it again. So you need some form of storage and
reliability built into your messaging network.

Finally, you need the ability to attach applications that will be written by autonomous groups,
scientists, and other people who don’t necessarily talk to one another, may choose these different
programming languages, and may be deploying our applications, as Matthew said, on their own
servers, on multiple different clouds that they are choosing through what you would like to be a
common platform. So you need this to be done in a standard way.

AMQP is unique in bringing together pub-sub with reliable messaging and with standards, so that this can happen. That is precisely why AMQP is important. It's like HTTP for the web and SMTP for email, but it's aimed at messaging: publish-subscribe and reliable message delivery in a standard way. And RabbitMQ is one of the first implementations; that's how we ended up working with the OOI team, because RabbitMQ provides these capabilities and does so well.
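For the pub-sub piece, AMQP topic exchanges route messages by matching a dot-separated routing key against binding patterns, where `*` matches exactly one word and `#` matches zero or more words. Here is a small Python sketch of that matching rule; it illustrates the semantics only and is not RabbitMQ's implementation, and the routing keys shown are invented examples.

```python
def topic_match(binding: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' matches exactly one dot-separated
    word, '#' matches zero or more words."""
    def match(b, k):
        if not b:
            return not k                  # both exhausted -> match
        if b[0] == "#":
            # '#' can swallow zero or more remaining words.
            return any(match(b[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False                  # pattern left over, key exhausted
        return (b[0] == "*" or b[0] == k[0]) and match(b[1:], k[1:])
    return match(binding.split("."), routing_key.split("."))

# A consumer bound to 'ooi.*.temperature' receives temperature readings
# from any single platform; 'ooi.#' receives everything under 'ooi'.
assert topic_match("ooi.*.temperature", "ooi.buoy42.temperature")
assert not topic_match("ooi.*.temperature", "ooi.buoy42.salinity")
assert topic_match("ooi.#", "ooi.buoy42.salinity.raw")
```

In a broker, each queue binding carries one such pattern, and the exchange delivers a copy of the message to every queue whose pattern matches the routing key.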

Gardner: Now we’ve talked a lot about computer science and some of the thorny issues that
have been created as a result of this project in going forward, but, I’d also like to go back to the
project itself, and give our listeners a sense of what this can accomplish. I’ve heard it described
as the Hubble Telescope of oceans.

Let's go back to the oceanography and the climate science. What can we accomplish with this, when this data is delivered in the fashion we've been discussing, where the programmability is there, where certain scientists can interact with these sensors and data, ask it to do things, and then get that information back in a format that's not raw, but is in fact actionable intelligence?

Matthew, what could possibly happen in terms of the change in our understanding of the oceans
from this type of undertaking?

Arrott: The way to think about this is not so much that we know exactly what will happen. It's the notion that we're providing capabilities that do not currently exist for oceanographers. It can be summed up as continual presence in the oceans at multiple scales and through multiple perspectives, also known as the different classes of instrumentation that observe the ocean.

Another class of instrumentation is deployed specifically for refocusing. The scope of the OOI is
such that it is considered to be observing the ocean at multiple scales -- coastal, regional, and
global. It is an expandable model such that other observatories, as well as additions to the OOI
network, can be considered and deployed in subsequent years.

This allows us now, as Alexis talked about, to attach many different classes of applications to the network. One of the largest classes of applications that we'll attach is modeling, in particular nowcast and forecast modeling.

Happening at scale


The ability to observe the ocean now, to forecast what the ocean will be, and to ground-truth those models going forward, based on data arriving at the same time as the forecasts, provides for a broad range of modeling that has been done for a fair amount of time, but now allows it to happen at scale.

Once you have that ability to actually model the oceans and predict where it’s going, you can use
that to refocus the instrumentation on emergent events. It's this ability to have long-term
presence in the ocean, and the ability to refocus the instrumentation on emergent events, that
really represents the revolutionary change in the formation of this infrastructure.

Meisinger: Let me add that I'm very fascinated by the Hubble Space Telescope as something that produces fantastic imagery and fantastic insights into the universe. For me, as a computer scientist, it's often very difficult to imagine what the users of a system will do with it.

I'd like to see the OOI as a platform that's developed by the experts in their fields, who deploy the platforms, the buoys, the cables, and the sensors into the ocean, and that then enables the users of the system over 25 years to produce unprecedented knowledge and results.

The primary mission of our project is to provide this platform, the space telescope in the ocean. And it's not a single telescope. In our case, it's a set of 65 buoys and locations in the ocean, and even a cable that runs 1,000 miles along the seafloor of the Pacific Northwest and provides 10-gigabit Ethernet connectivity and high power to the instruments.

It's a model where scientists have to compete. They have to compete for a slot on that infrastructure. They'll have to apply for grants and they'll have to reserve the spot, so that they can accomplish the best scientific discoveries out of that system.

It's the analogy of the space telescope that will bring ocean scientists to the next level. This is our large platform, our large infrastructure, that lets the best scientists develop and research the best results. That's the fascination that I see in this project.

Gardner: For the average listener to understand, is this comparable to tracking weather and the
climate on the surface? Many of us, of course, get our weather forecasts and they seem to be
getting better. We have satellites, radar, measurements, and historical data to compare, and we
have models of what weather should do. Is this in some ways taking the weather of the oceans?
Is it comparable?

Arrott: Quite comparable. There's a movement to instrument the earth, so that we can
understand from observation, as opposed to speculation, what the earth is actually doing, and
from a notion of climate and climate change, what we might be doing to the earth as participants
on it.

The weather community, because of the commercial demand for weather data, has been well in advance of the other environmental sciences in this regard. What you'll find is that OOI is just one of several ongoing initiatives to do exactly what weather has done.

The work that I did at NCSA was with the atmospheric sciences community, and it was very clear at the time what they could do if they had the kind of resources that we now have here in the 21st century. We've worked with them and modeled much of our system on the systems that they built, both in the research area and in the operational area, in programs such as Nova.


Science more mature


Gardner: So, in a sense, we're following the path of what we’ve done with the weather, and
understanding the climate on land. We’re now moving into the oceans, but at a time when the
computer science is more mature, and in fact, perhaps even more productive.

Back to you, Alexis Richardson. This is being sponsored by the US National Science Foundation, so being cost-efficient is of course very important. How is cloud computing being brought to bear, making this productive, and perhaps even putting it ahead of where weather prediction has been, because we can now avail ourselves of some of the newer tools and models around data and cloud infrastructure?

Richardson: Happily, that's an easy one. Imagine a person or scientist who wanted to process very quickly a large amount of data that's come from the oceans to build a picture of the climate, the ocean, or anything to do with the coastal properties of the North American coast. They might need to borrow 10,000 or 20,000 machines for an hour, and they might need to have a vast amount of data readily accessible to those machines.

In the cloud, you can do that, and with big data technologies today, that is a realistic proposition.
It was not 5-10 years ago. It's that simple.

Obviously, you need to have the technologies, like the messaging that we talked about, to get that data to those machines so it can be processed. But the cloud is really there to bring it all together and to make it seem, to the application owner, like something that's just ready for them to acquire, and when they don't need it anymore, they can put it back and someone else can use it.

Gardner: Back to you, Michael. How do you view the advent of cloud computing as a benefit to this sort of initiative? We've got a piece of it from Alexis, but I'd like to hear your perspective on why cloud models are enabling this, perhaps at an unprecedented scale, but also at the most efficient cost.

Meisinger: Absolutely. It does enable computing at unprecedented scale, for exactly the reasons that Alexis mentioned. A lot of the earth's environment is changing. Assume that you're interested in tracking the effect of a hurricane somewhere in the ocean, and in computing a very complex numerical model that provides certain predictions about currents and other variables of the ocean. You want to do that when the hurricane occurs, and you want to do it quickly. Part of the strategy is to enable quick computation on demand.

The OOI architecture, in particular its common execution infrastructure subsystem, is built to enable this access to computation and big data very quickly. You want to be able to make use of an execution provider's infrastructure-as-a-service very quickly, to run your own models with the infrastructure that the OOI provides.

Then there are other users who want to do things more regularly, and they might have their own hardware. They might run their own clusters, but in order to be interoperable, and in order to have overflow capacity, it's very important to have cloud infrastructure as a means of making the system more homogeneous.

So the cloud is a way of abstracting the compute resources of the various participants of the system, be they commercial or academic cloud computing providers, or institutions that provide their own clusters as cloud systems. They all form a large compute network, a compute fabric, so that they can run computation in a predictable way, but also in a very episodic way.
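The idea of a compute fabric that abstracts over heterogeneous providers, so that episodic demand like a hurricane model run can draw on whatever capacity is available, might be sketched as follows. The provider names and the greedy slot-claiming policy are invented for illustration; they are not the OOI's actual scheduling logic.

```python
from typing import Dict, List

class ComputeFabric:
    """Sketch of abstracting heterogeneous compute providers (commercial
    clouds, academic clouds, institutional clusters) behind one interface,
    so episodic jobs run wherever capacity is available."""
    def __init__(self) -> None:
        self.providers: Dict[str, int] = {}   # provider name -> free slots

    def register(self, name: str, slots: int) -> None:
        self.providers[name] = slots

    def acquire(self, slots_needed: int) -> List[str]:
        """Greedily claim slots, preferring the providers with the most
        free capacity, until the request is satisfied."""
        claimed: List[str] = []
        for name in sorted(self.providers, key=self.providers.get, reverse=True):
            while self.providers[name] > 0 and slots_needed > 0:
                self.providers[name] -= 1
                slots_needed -= 1
                claimed.append(name)
        if slots_needed > 0:
            raise RuntimeError("insufficient capacity across all providers")
        return claimed

fabric = ComputeFabric()
fabric.register("campus-cluster", 2)
fabric.register("commercial-cloud", 100)
# A hurricane-response model needs more than the local cluster has:
allocation = fabric.acquire(5)
assert allocation.count("commercial-cloud") == 5
```

The point is the abstraction: the model run asks the fabric for capacity and never needs to know which provider, commercial or institutional, actually supplied the machines.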


Cloud as enabler

I really see that the cloud paradigm is one of the enablers of doing this very efficiently, and it
enables us as a software infrastructure project to develop the systems, the architecture, to actually
manage this computation from a system’s point of view in a central way.

Gardner: Alexis, because of AMQP and the VMware Cloud Application Platform, it seems to
me that you’ve been able to shop around for cloud resources, using the marketplace, because
you’ve allowed for interoperability among and between platforms, applications, tools, and
frameworks.
Is it the case that leveraging AMQP has given you the opportunity to go to where the compute
resources are available at the lowest cost when that’s in your best interest?

Richardson: The dividend of interoperability for the end user and the end customer in this
platform environment is ultimately portability -- portability through being able to choose where
your application will run.

Michael described it very well. A hurricane is coming. Do you want to use the machines
provided by the cloud provider here for this price? Do you want to use your own servers? Maybe
your neighboring data center has servers available to you, provided those are visible and
provided there is this fundamental interoperability through cloud platforms of the type that we
are investing in. Then, you will be able to have that choice. And that lets you make these
decisions in a way that you could not do before.

Gardner: I’m afraid we’re almost out of time, but I want to try to compare this to what this will
allow in other areas. It’s been mentioned by Alexis and others that this has got some common
features to Twitter, Facebook, or Zynga. We think of the social environment because of the scale,
complexity, and the use of cloud models. But we’re doing far more advanced computational
activities here. This is not simply a display of 140 characters based on a very rudimentary search, for example. These are high-performance-computing, supercomputer-level types of requests and analysis.

So are we combining the best of a social-fabric approach and the architecture behind it with what we've traditionally been exposed to in high-performance computing and supercomputing? If so, what does that mean for how we could bring this to other types of uses in the future? I'll throw this out to any of you. How are we taking the best of the old and the new, and what does that mean for the future?

Meisinger: This is the direction in which the future will evolve: the combination of proven patterns of interaction, emerging out of how humans interact, applied to high-performance computing. Providing a strong platform, a strong technological footprint that's not specific to any technology, is a great benefit to the community out there.

Providing a reference architecture and a reference implementation that can solve these problems, that social network for sensor networks and for device computation, will be a pattern that can be leveraged by other interested participants, either by participating in the system directly or indirectly, or by just taking that pattern and the technologies that come with it and bringing it to the next level in the future. Developing it as one large project, in a coherent set, really yields a technology stack and architecture that will carry us far into the future.

Arrott: The incremental change that we're introducing takes the concepts of Facebook and Twitter and the notion of Dropbox, which is my ability to move a file to a shared place so someone else can pick it up later. That was really not possible long ago; I had to run an FTP server or put up an HTTP server to accomplish that.

Sharing processes


What we are now adding to the mix is sharing not just artifacts, but processes with one another, and specifically sharing instrumentation. I can say to you, "Here, have a look through my telescope." You can move it around and focus it.

Basically, we introduced the concept of artifacts, or information resources, as well as the concept of a taskable resource, and the things that we're adding that can be shared are taskable resources.

Gardner: I’m just going to throw out a few blue-sky ideas that it seems this could be applicable
to things like genetics and the human genome, but on an individual basis. Or crime statistics, in
order to have better insight into human behavior at a massive scale. Or perhaps even healthcare,
where you’re diagnosing specific types of symptoms and then correlating them across entire
regions or genetic patterns that would be brought to bear on those symptoms.

Am I off-base? Is this science fiction? Or am I perhaps pointing to where this sort of capability
might go next?

Richardson: The answer to your question is yes, if you add one little phrase into that: in real
time. If you're talking about crime statistics, as events happen on the streets, information is
gathered, shared, and processed. As people go on jobs, if information on how they're doing is
gathered, shared, and processed, then you will be able to have the kind of crime or healthcare
benefits that you described. I'm sure we could think of lots of use cases. Transport is another one.

Arrott: At the institution in which the OOI Cyberinfrastructure is housed, the California Institute
for Telecommunications and Information Technology (Calit2), all of the concerns that you've
mentioned are, in fact, active development research programs, all of which have yielded
significant improvements in the computational environment for that scientific community.

Gardner: Michael, last word to you. Where do you see this potentially going in terms of the
capability? Obviously, it's a very important activity with the oceans. But the methods that you’re
defining, the implementations that you’re perfecting, where do you see them being applied in the
not-too-distant future?

Meisinger: You're absolutely right. This pattern is very applicable, and it's not often that
a research and construction project of this size has the ability to provide an end-to-end technology
solution to this challenge of big data combined with real-time analysis and real-time command
and control of the infrastructure.

What I see this evolving into is, first of all, that you can take the solutions built in this project and
apply them to other communities in need of such a solution. But then it could go further.
Why not combine these communities into a larger system? Why not federate or connect all these
communities into a larger infrastructure that is based on common ideas and common standards,
and that still enables open participation?

It’s a platform where you can plug in your own system or subsystem that you can then make
available to whoever is connected to that platform, whoever you trust. So it can evolve into a
large ecosystem, and that does not have to happen under the umbrella of one organization such as
OOI.
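Meisinger's federation idea can be sketched as well. The sketch below is purely illustrative and not OOI code; the `Community` and `Federation` classes, the trust sets, and the sample resource ID are all assumptions, chosen to show how participants could plug into a shared platform while keeping their own sharing policies, with no single organization owning the whole.

```python
from typing import Dict, Set


class Community:
    """One participant domain with its own resources and trust policy."""

    def __init__(self, name: str, trusted: Set[str]):
        self.name = name
        self.trusted = trusted            # communities this one trusts
        self.resources: Dict[str, str] = {}

    def publish(self, resource_id: str, description: str) -> None:
        self.resources[resource_id] = description


class Federation:
    """Connects communities via a common interface; what is visible to whom
    follows each community's own policy, not a central owner's."""

    def __init__(self):
        self.members: Dict[str, Community] = {}

    def join(self, community: Community) -> None:
        self.members[community.name] = community

    def visible_to(self, consumer: str) -> Dict[str, str]:
        # A consumer sees only resources from communities that trust it,
        # so it takes what it wants and nothing it isn't granted.
        catalog: Dict[str, str] = {}
        for c in self.members.values():
            if consumer in c.trusted:
                catalog.update(c.resources)
        return catalog


fed = Federation()
ooi = Community("ooi", trusted={"calit2"})
ooi.publish("ctd-42", "real-time conductivity/temperature/depth stream")
fed.join(ooi)
print(fed.visible_to("calit2"))   # sees the CTD stream
print(fed.visible_to("unknown"))  # sees nothing
```

The point of the design is that the federation holds no policy of its own: each community decides whom it trusts, which is what lets the ecosystem grow without a single umbrella organization.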

Larger ecosystem


It can happen in a larger ecosystem of connected computing based on your own policies, your
own technologies, and your own standards, where everyone shares a common piece of the same
idea and can take whatever they want without consuming what they're not interested in.

Gardner: And as I said earlier, at that very interesting intersection, where you can find the
most efficient compute resources available and avail yourself of them with that portability, it
sounds like a really powerful combination.

We’ve been talking about how the Ocean Observatories Initiative and its accompanying
Cyberinfrastructure Program are providing the means not only for the ocean to be better
understood and its climate interaction to be better appreciated, but we're also seeing how the
architecture behind that is leading to the potential for many other big data, cloud fabric, real-
time, compute-intensive applications.

I’d like to thank our guests. We’ve been joined by Matthew Arrott. He is the Project Manager
for the OOI Cyberinfrastructure. Thank you so much, Matthew.

Arrott: Thank you.

Gardner: We’ve also been joined by Michael Meisinger. He is the Managing Systems Architect
for the OOI Cyberinfrastructure. Thank you, Michael.

Meisinger: Thanks, Dana.

Gardner: And Alexis Richardson, the Senior Director for VMware Cloud Application Platform.
Thank you, Alexis.

Richardson: Thank you very much.

Gardner: And this is Dana Gardner, Principal Analyst at Interarbor Solutions. Thanks to you,
our audience, for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod. Sponsor: VMware
Transcript of a BriefingsDirect podcast on how cloud and big data come together to offer
researchers a treasure trove of new real-time information. Copyright Interarbor Solutions, LLC,
2005-2012. All rights reserved.


You may also be interested in:
  •    Case Study: Strategic Approach to Disaster Recovery and Data Lifecycle Management
       Pays Off for Australia's SAI Global
  •    Virtualization Simplifies Disaster Recovery for Insurance Broker Myron Steves While
       Delivering Efficiency and Agility Gains Too
  •    SAP Runs VMware to Provision Virtual Machines to Support Complex Training Courses
  •    Case Study: How SEGA Europe Uses VMware to Standardize Cloud Environment for
       Globally Distributed Game Development
  •    Germany's Largest Travel Agency Starts a Virtual Journey to Get Branch Office IT Under
       Control

More Related Content

What's hot

DataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDATAVERSITY
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Edward Curry
 
Science Big, Science Connected
Science Big, Science ConnectedScience Big, Science Connected
Science Big, Science ConnectedDeepak Singh
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) DataEdward Curry
 
Ubiquitous computing Paper
 Ubiquitous computing Paper Ubiquitous computing Paper
Ubiquitous computing PaperAssem mousa
 
Software Sustainability: The Challenges and Opportunities for Enterprises and...
Software Sustainability: The Challenges and Opportunities for Enterprises and...Software Sustainability: The Challenges and Opportunities for Enterprises and...
Software Sustainability: The Challenges and Opportunities for Enterprises and...Patricia Lago
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigDataValarmathi V
 
cloud of things paper
cloud of things papercloud of things paper
cloud of things paperAssem mousa
 
Cloud computing Paper
Cloud computing Paper Cloud computing Paper
Cloud computing Paper Assem mousa
 
Building the European Cloud Computing Strategy
Building the European Cloud Computing StrategyBuilding the European Cloud Computing Strategy
Building the European Cloud Computing StrategyCarl-Christian Buhr
 
WoT 2016 - Seventh International Workshop on the Web of Things
WoT 2016 - Seventh International Workshop on the Web of ThingsWoT 2016 - Seventh International Workshop on the Web of Things
WoT 2016 - Seventh International Workshop on the Web of ThingsSimon Mayer
 
Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Dr. Kim (Kyllesbech Larsen)
 
New Research Articles 2020 July Issue International Journal of Software Engin...
New Research Articles 2020 July Issue International Journal of Software Engin...New Research Articles 2020 July Issue International Journal of Software Engin...
New Research Articles 2020 July Issue International Journal of Software Engin...ijseajournal
 
Closing the Loop - From Citizen Sensing to Citizen Actuation
Closing the Loop - From Citizen Sensing to Citizen ActuationClosing the Loop - From Citizen Sensing to Citizen Actuation
Closing the Loop - From Citizen Sensing to Citizen ActuationDavid Crowley
 

What's hot (18)

DataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big DataDataEd Online: Demystifying Big Data
DataEd Online: Demystifying Big Data
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
 
Science Big, Science Connected
Science Big, Science ConnectedScience Big, Science Connected
Science Big, Science Connected
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) Data
 
Ubiquitous computing Paper
 Ubiquitous computing Paper Ubiquitous computing Paper
Ubiquitous computing Paper
 
Iot2014program
Iot2014programIot2014program
Iot2014program
 
Software Sustainability: The Challenges and Opportunities for Enterprises and...
Software Sustainability: The Challenges and Opportunities for Enterprises and...Software Sustainability: The Challenges and Opportunities for Enterprises and...
Software Sustainability: The Challenges and Opportunities for Enterprises and...
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
cloud of things paper
cloud of things papercloud of things paper
cloud of things paper
 
Grid computing
Grid computingGrid computing
Grid computing
 
Cloud computing Paper
Cloud computing Paper Cloud computing Paper
Cloud computing Paper
 
Building the European Cloud Computing Strategy
Building the European Cloud Computing StrategyBuilding the European Cloud Computing Strategy
Building the European Cloud Computing Strategy
 
WoT 2016 - Seventh International Workshop on the Web of Things
WoT 2016 - Seventh International Workshop on the Web of ThingsWoT 2016 - Seventh International Workshop on the Web of Things
WoT 2016 - Seventh International Workshop on the Web of Things
 
Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.
 
New Research Articles 2020 July Issue International Journal of Software Engin...
New Research Articles 2020 July Issue International Journal of Software Engin...New Research Articles 2020 July Issue International Journal of Software Engin...
New Research Articles 2020 July Issue International Journal of Software Engin...
 
Knowledge of IoT
Knowledge of IoTKnowledge of IoT
Knowledge of IoT
 
Closing the Loop - From Citizen Sensing to Citizen Actuation
Closing the Loop - From Citizen Sensing to Citizen ActuationClosing the Loop - From Citizen Sensing to Citizen Actuation
Closing the Loop - From Citizen Sensing to Citizen Actuation
 
Security research trends in 2020
Security research trends in 2020Security research trends in 2020
Security research trends in 2020
 

Viewers also liked

Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldabaux singapore
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting PersonalKirsty Hulse
 
The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...Brian Solis
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source CreativitySara Cannon
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)maditabalnco
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsBarry Feldman
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome EconomyHelge Tennø
 

Viewers also liked (8)

Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
 
The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...The impact of innovation on travel and tourism industries (World Travel Marke...
The impact of innovation on travel and tourism industries (World Travel Marke...
 
Open Source Creativity
Open Source CreativityOpen Source Creativity
Open Source Creativity
 
Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)Reuters: Pictures of the Year 2016 (Part 2)
Reuters: Pictures of the Year 2016 (Part 2)
 
The Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post FormatsThe Six Highest Performing B2B Blog Post Formats
The Six Highest Performing B2B Blog Post Formats
 
The Outcome Economy
The Outcome EconomyThe Outcome Economy
The Outcome Economy
 

Similar to Cloud and Big Data Come Together in the Ocean Observatories Initiative to Give Scientists Real-Time Access to Environmental Measurements

Semantic Technology Solutions For Recovery Gov And Data Gov With Transparenc...
Semantic Technology Solutions For Recovery Gov And  Data Gov With Transparenc...Semantic Technology Solutions For Recovery Gov And  Data Gov With Transparenc...
Semantic Technology Solutions For Recovery Gov And Data Gov With Transparenc...Mills Davis
 
Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...
Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...
Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...ijceronline
 
A survey on software defined networking
A survey on software defined networkingA survey on software defined networking
A survey on software defined networkingredpel dot com
 
Unidata's Approach to Community Broadening through Data and Technology Sharing
Unidata's Approach to Community Broadening through Data and Technology SharingUnidata's Approach to Community Broadening through Data and Technology Sharing
Unidata's Approach to Community Broadening through Data and Technology SharingThe HDF-EOS Tools and Information Center
 
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...ajmalik
 
Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...GreenQloud
 
Telecom Clouds crossing borders, Chet Golding, Zefflin Systems
Telecom Clouds crossing borders, Chet Golding, Zefflin SystemsTelecom Clouds crossing borders, Chet Golding, Zefflin Systems
Telecom Clouds crossing borders, Chet Golding, Zefflin SystemsSriram Subramanian
 
Soderstrom
SoderstromSoderstrom
SoderstromNASAPMC
 
Safe Drinking Water In Bangladesh Essay
Safe Drinking Water In Bangladesh EssaySafe Drinking Water In Bangladesh Essay
Safe Drinking Water In Bangladesh EssaySusan Cox
 
Iaetsd efficient file transferring in
Iaetsd efficient file transferring inIaetsd efficient file transferring in
Iaetsd efficient file transferring inIaetsd Iaetsd
 
Modern data integration | Diyotta
Modern data integration | Diyotta Modern data integration | Diyotta
Modern data integration | Diyotta diyotta
 
COM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docxCOM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docxwrite31
 
A Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsA Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsIRJET Journal
 
The Internet of Things: Exploring revenue generating use cases
The Internet of Things: Exploring revenue generating use casesThe Internet of Things: Exploring revenue generating use cases
The Internet of Things: Exploring revenue generating use casesDeloitte United States
 
Big Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 EraBig Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 EraJohnWilson47710
 
A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...
A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...
A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...Andrei Ciortea
 
Toward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architectureToward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architectureredpel dot com
 
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docxBIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docxjasoninnes20
 
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docxBIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docxtangyechloe
 

Similar to Cloud and Big Data Come Together in the Ocean Observatories Initiative to Give Scientists Real-Time Access to Environmental Measurements (20)

Semantic Technology Solutions For Recovery Gov And Data Gov With Transparenc...
Semantic Technology Solutions For Recovery Gov And  Data Gov With Transparenc...Semantic Technology Solutions For Recovery Gov And  Data Gov With Transparenc...
Semantic Technology Solutions For Recovery Gov And Data Gov With Transparenc...
 
Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...
Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...
Tools and Techniques for Designing, Implementing, & Evaluating Ubiquitous Com...
 
A survey on software defined networking
A survey on software defined networkingA survey on software defined networking
A survey on software defined networking
 
Unidata's Approach to Community Broadening through Data and Technology Sharing
Unidata's Approach to Community Broadening through Data and Technology SharingUnidata's Approach to Community Broadening through Data and Technology Sharing
Unidata's Approach to Community Broadening through Data and Technology Sharing
 
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D...
 
Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...Open Source Clouds: Be The Change...
Open Source Clouds: Be The Change...
 
Telecom Clouds crossing borders, Chet Golding, Zefflin Systems
Telecom Clouds crossing borders, Chet Golding, Zefflin SystemsTelecom Clouds crossing borders, Chet Golding, Zefflin Systems
Telecom Clouds crossing borders, Chet Golding, Zefflin Systems
 
Soderstrom
SoderstromSoderstrom
Soderstrom
 
Safe Drinking Water In Bangladesh Essay
Safe Drinking Water In Bangladesh EssaySafe Drinking Water In Bangladesh Essay
Safe Drinking Water In Bangladesh Essay
 
Iaetsd efficient file transferring in
Iaetsd efficient file transferring inIaetsd efficient file transferring in
Iaetsd efficient file transferring in
 
Modern data integration | Diyotta
Modern data integration | Diyotta Modern data integration | Diyotta
Modern data integration | Diyotta
 
COM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docxCOM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docx
 
A Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsA Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and Trends
 
Ijet v5 i6p12
Ijet v5 i6p12Ijet v5 i6p12
Ijet v5 i6p12
 
The Internet of Things: Exploring revenue generating use cases
The Internet of Things: Exploring revenue generating use casesThe Internet of Things: Exploring revenue generating use cases
The Internet of Things: Exploring revenue generating use cases
 
Big Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 EraBig Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 Era
 
A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...
A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...
A Decade in Hindsight: The Missing Bridge Between Multi-Agent Systems and the...
 
Toward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architectureToward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architecture
 
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docxBIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
 
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docxBIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
BIG IOT AND SOCIAL NETWORKING DATA FOR SMART CITIES Alg.docx
 

Recently uploaded

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
He's held leadership positions at Currenex, DreamWorks SKG, Autodesk, and the National Center for Supercomputing Applications.
His most recent work has been with the University of California as e-Science Program Manager, while focusing on delivering the OOI Cyberinfrastructure capabilities.

Also joining us is Michael Meisinger. He is the Managing Systems Architect for the Ocean Observatories Initiative Cyberinfrastructure. Since 2007, Michael has been employed by the University of California, San Diego. He leads a team of systems architects on the OOI Project.
Prior to UC San Diego, Michael was a lead developer in an Internet startup, developing a platform for automated customer interactions and data analysis. Michael holds a master's degree in computer science from the Technical University of Munich and will soon complete a PhD in formal services-oriented computing and distributed systems architecture.

Lastly, we're joined by Alexis Richardson, Senior Director for the VMware Cloud Application Platform. He is a serial entrepreneur and a technologist. Previously, he was a founder of RabbitMQ and the CEO of Rabbit Technologies Limited, which was acquired by VMware in April of 2010.

Alexis plays a leading role in both the cloud and messaging communities, in addition to working with AMQP. He is a co-founder of the CloudCamp conferences and a co-chair of the Open Cloud Computing Interface at the Open Grid Forum.

Welcome to you all. Michael Meisinger, let me start with you. Could you sum up the OOI for our audience? Let us know a little bit about how it came about.

Ocean Observatories Initiative

Michael Meisinger: Thanks, Dana. The Ocean Observatories Initiative is a large project. It's a US National Science Foundation project that is intended to build a platform for ocean sciences end users and communities interested in this form of data, with an operational life span of 30 years.

It comprises a construction period of five years and will integrate a large number of resources and assets. These range from typical oceanographic assets, like instruments that are mounted on buoys deployed in the ocean, to networking infrastructure on the cyberinfrastructure side. It also includes a large number of sophisticated software systems.
I'm the managing architect for the cyberinfrastructure, so I'm primarily concerned with the interfaces to the oceanographic infrastructure, including data interfaces and networking interfaces, and then, primarily, the design of the system: the network hardware and software system that comprises the cyberinfrastructure.

As I said, OOI's goals include serving the science and education communities with their needs for receiving, analyzing, and manipulating ocean sciences and environmental data. This will have a large impact on the science community and the public as a whole, because ocean sciences data is very important in understanding the changes and processes of the earth, the environment, and the climate as a whole.
Ocean sciences, as a discipline, hasn't yet received as much infrastructure and central attention as other communities, so the OOI is very important in bringing this to the community. It is also large in volume: it has an almost $400 million construction budget and an annual operations budget of $70 million for a planned lifetime of 25-30 years.

Gardner: Matthew Arrott, what is the big hurdle here in terms of a compute issue that you've faced? Obviously, it's a tremendously important project with a tremendous amount of data, but from a purely compute requirement perspective, what makes this so challenging?

Matthew Arrott: It has a number of key aspects that we had to address. It's best to start at the top of the functional requirements, which is to provide interactive mission planning and control of the overall instrumentation on the 65 independent platforms that are deployed throughout the ocean.

The issue there is how to provide a standard command-and-control infrastructure over a core set of 800 instruments, about 50 different classes of instrumentation, as well as be able to deploy, over the 30-year lifecycle, new instrumentation brought to us by different scientific communities for experimentation.

The next is that the mission planning and control is meant to be interactive and respond to emergent changes. So we needed an event-response infrastructure that allowed us to operate on scales from microseconds to hours in detecting and responding to the changes. We needed an ability to move computing throughout the network to deal with the different latency requirements of the event-response analysis.

Finally, we have computational nodes all the way down in the ocean, as well as on the shore stations, that are accepting or acquiring the data coming off the network.
And we're distributing that data in real time to anyone who wants to listen to the signals and develop their own sense-and-response mechanisms, whether they're in the cloud, in their local institutions, or on their laptop.

Domain of control

The fundamental challenge was the ability to create a domain of control over instrumentation that is deployed by operators, and for processing and data distribution to be agile in its deployment anywhere in the global network.

Gardner: Alexis Richardson, it sounds like a very interesting problem to solve. Why is this a good time to try to solve this? Of course, big data, cloud, and doing tremendous amounts of services orientation across middleware and a variety of different formats and transports are all very prominent in the enterprise now.

Given that, what makes this such an interesting pursuit for you in thinking about it from a software distribution and data distribution perspective?
Alexis Richardson: It really comes down to the scale of the system and the ability of technologies to meet that scale today. If we had been talking about this 12 years ago, in the year 2000, we would have been talking about companies like Google and Yahoo, which we would not have considered to be of moderate scale.

Since then, many companies have appeared, for example Facebook, which has many hundreds of millions of users connecting throughout the world, sharing vast amounts of data all the time. It's that scale that's changed the architecture and deployment patterns that people have been using for these applications.

In addition to that, many of these companies have brought out essentially a platform capability, whereby others, such as Zynga in the case of Facebook, can create applications that run inside these networks, social networks in the case of Facebook.

We can see that the OOI project is essentially bringing the science needs: to collaborate between vast numbers of sensors and signals and a comparatively smaller number of scientists, research institutions, and scientific applications doing analytics, in a similar way to how Facebook combines what people say, what pictures they post, and what music they listen to with everybody's friends, and then allows an application to be attached to that.

So it's a huge technology challenge that would have been simply infeasible 12 years ago in the year 2000, when we thought things were big, but they were not. Now, when we talk about big data being masses of terabytes and petabytes that need to be analyzed all the time, we're starting to glimpse what's possible with the technology that's been created in the last 10 years.

Arrott: I'd like to go one step further than that. The challenge goes beyond just the big data challenge. It also now introduces, as Alexis talked about, the human putting in what they say in their pictures.
It introduces the concept of the instrument as an equal partner with the human in participation in the network. So you now have to think about what it means to have a device that's acting like a human in the network, and the notion that the instrument is, in fact, owned by someone and must be governed by someone, which is not the case with the human, because humans govern themselves.

So it represents the notion of an autonomous agent in the network, as well as that agent having a notion of control that has to stay on the network.

Gardner: Thank you, Matthew. I'd like to try to explain for our audience a bit more about what is going on here. We understand that we've got a tremendous diversity of sensors gathering, in real time, a tremendous scale of data. But we're also talking about automating the gathering and distribution of that data to a variety of applications.
Numerical framework

We're talking about having applications within this fabric, so that the output is not necessarily data, but a computational numerical framework that's distributed. So there's computation being done at the data level, and then it has to be regulated. Certain data goes to certain people for certain reasons under certain circumstances. So there's a lot of data, a lot of logic, and a lot of scale.

Can one of you help step me through a little bit more to understand the architecture of what's being conducted here, so that we can then move into how it's being done?

Meisinger: The challenge, as you mentioned, is very heterogeneous. We deal with various classes of sensors, classes of data, classes of users, or even communities of users, and with classes of technological problems and solution spaces.

So the architecture is based on a tiered, or layered, model, with the most invariant things at the bottom, things that shouldn't change over the lifetime of 30 years, receiving the highest level of attention. Then we go into a more specialized layered architecture, where we try to find optimal solutions using today's technologies for high-speed messaging, big data, and so on. Then we go into specialized solutions for specific groups of users and specific sensors, which are there as last-mile technologies to integrate them into the system.

So you basically see an onion-layer model of the architecture, with externalization on the outside. Then, as you go toward the core, you approach the invariants of the system.

What are the invariants? We recognized that a system of this scale and this heterogeneity cannot be reinvented every five years as part of typical maintenance. A strongly scalable and extensible system is distributed in its nature, and as part of that distribution, the most invariant parts are the protocols and the interactions between the distributed entities in the system.
We found that it's essential to define a common language, a common format, for the various applications and participants of the network, including sensors and sensor agents, but also higher-level software services, to communicate in a common format.

This architecture is based on defining a common interaction format. It's based on defining a common data format. You mentioned the complex numerical model. A lot of things in this architecture are defined so that you have an easier model of reaching many heterogeneous communities: by ingesting specific solutions into the system, representing them consistently, and then presenting them again in the specific format for the audience.

Our architecture is strongly communication-oriented, service-oriented, message-oriented, and federated.
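As a rough illustration of the kind of common interaction format described here, a message envelope might separate routing metadata from the science payload. This is only a sketch; the field names and structure below are hypothetical, not the OOI's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Envelope:
    """Hypothetical common message format: routing metadata wraps any payload."""
    sender: str          # originating agent, e.g. an instrument or a service
    recipient: str       # target service, topic, or agent
    performative: str    # the interaction type, e.g. "request" or "inform"
    payload: dict = field(default_factory=dict)  # content in any domain format

    def serialize(self) -> str:
        # A single wire format lets heterogeneous participants interoperate.
        return json.dumps(asdict(self))

msg = Envelope(
    sender="instrument.ctd-042",
    recipient="service.ingestion",
    performative="inform",
    payload={"temperature_c": 11.3, "depth_m": 120.0},
)
wire = msg.serialize()
decoded = json.loads(wire)
```

The point of the pattern is that the outer envelope stays stable over decades while payload formats come and go with each sensor community.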
As Matthew mentioned, it's an important means to have the individual resources and agents provide their own policies, not having a central bottleneck or a central governing entity in the system that defines policies.

Strongly federated

So it's a strongly federated system. It's a system that's strongly technology-independent. The communication protocol can be implemented by various technologies, and we're choosing a couple of programming languages and technologies for our initial reference implementation, but it's strongly extensible for future communities to use.

Gardner: One of the aspects of this that was particularly interesting to me is that this is very much a two-way street. The scientists who are gathering their analysis can very rapidly go back to these sensors, go back to this compute fabric, this fusion of data, and ask it to do other things in real time, or bring in data from outside sources to compare and contrast, to find the commonalities and to find what it is they're looking for in terms of trends.

Could one of you help me understand why this is a two-way street, and how that's possible given the scale and complexity?

Arrott: The way to think about it, first and foremost, is to think of it as its four core layers. There is the underlying network resource management layer. We talk about agents. They supply that capability to any process in the system, and we create devices that process.

The next layer up is the data layer, and the data layer consists of two core parts. One is the distribution system that allows for data to be moved in real time from the source to the interested parties. It's fundamentally a publish-subscribe (pub-sub) model. We're currently using point-to-point as well as topic-based subscriptions, but we're quickly moving toward content-based routing, which is based more on the selector that is provided by the consumer to direct traffic toward them.
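The two subscription styles Matthew contrasts can be sketched in miniature. The matcher below mimics AMQP topic-exchange semantics (`*` matches exactly one dot-separated word, `#` matches zero or more), while content-based routing is just a consumer-supplied predicate over the message body. This is an illustrative sketch of the semantics, not OOI or RabbitMQ code, and the routing keys are invented.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' = exactly one word, '#' = zero or more."""
    def match(pat, key):
        if not pat:
            return not key
        if pat[0] == "#":
            # '#' can absorb any number of words, including none.
            return any(match(pat[1:], key[i:]) for i in range(len(key) + 1))
        if not key:
            return False
        if pat[0] in ("*", key[0]):
            return match(pat[1:], key[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))

# Topic-based subscription: the broker routes on the key alone.
assert topic_matches("ooi.*.temperature", "ooi.buoy42.temperature")

# Content-based subscription: the consumer's selector inspects the message.
def content_selector(message: dict) -> bool:
    return message.get("region") == "pacific-nw" and message.get("depth_m", 0) > 100

sample = {"region": "pacific-nw", "depth_m": 150, "value": 7.9}
```

Topic routing is cheap because the broker only looks at a short key; content-based routing is more expressive because the selector can reference any field, which is the trade-off behind the migration Matthew describes.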
The other part of the data layer is the traditional harvesting, or retrieval, of data from historical repositories.

The next layer up is the analytic layer. It looks a lot like the device layer, but is focused on the management of processes that are using the big data and responding to the new arrival of data, or a change in data, in the network.

Finally, there is the fourth layer, the mission planning and control layer, which we'll talk about later.

Gardner: I'd like to go to Alexis Richardson. When you saw the problem that needed to be solved here, you had a lot of experience with the Advanced Message Queuing Protocol (AMQP), which I'd like you to explain to us, and you also understand the requirements of a messaging system that can accomplish what Matthew just described.
So tell me about AMQP, and why this problem seems to be the right fit for that particular technology, RabbitMQ, and a messaging infrastructure in general.

Richardson: What Matthew and Michael have described can be broken down into three fundamental pieces of technology.

Lot of chatter

Number one, you've got a lot of chatter coming from these devices, machines, people, and other kinds of processes, and that needs to get to the right place. It's being chattered or twittered away, possibly at high rates and high frequencies, and it needs to get to just the set of receivers following that stream, very similar to how we understand distribution to our computers. So you need what's called pub-sub, which is a fundamental technology.

In addition, that data needs to be stored somewhere. People need to go back and audit it, to pull it out of the archive and replay it, or view it again. So you need some form of storage and reliability built into your messaging network.

Finally, you need the ability to attach applications that will be written by autonomous groups, scientists, and other people who don't necessarily talk to one another, may choose different programming languages, and may be deploying their applications, as Matthew said, on their own servers, on multiple different clouds that they are choosing, through what you would like to be a common platform. So you need this to be done in a standard way.

AMQP is unique in bringing together pub-sub with reliable messaging and with standards, so that this can happen. That is precisely why AMQP is important. It's like HTTP, or SMTP for email, but it's aimed at messaging: publish-subscribe, reliable message delivery, in a standard way. And RabbitMQ is one of the first implementations, and that's how we ended up working with the OOI team, because RabbitMQ provides these capabilities and does them well.
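The store-and-replay piece Alexis describes can also be illustrated in-process. Real brokers such as RabbitMQ implement this with durable queues and persistent messages; the toy broker below only mimics the idea, with every name invented for the sketch.

```python
from collections import defaultdict
from typing import Callable

class ToyBroker:
    """In-memory sketch of pub-sub plus an archive for audit and replay.

    Real AMQP brokers persist messages durably across restarts; this class
    only illustrates the pattern, not the reliability guarantees."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)  # topic -> live callbacks
        self.archive = defaultdict(list)      # topic -> every message seen

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> None:
        self.archive[topic].append(message)   # store before fan-out delivery
        for cb in self.subscribers[topic]:
            cb(message)

    def replay(self, topic: str, callback: Callable[[dict], None]) -> None:
        # A late-joining consumer can audit or reprocess the history.
        for message in self.archive[topic]:
            callback(message)

broker = ToyBroker()
live, late = [], []
broker.subscribe("ooi.salinity", live.append)
broker.publish("ooi.salinity", {"psu": 34.1})
broker.publish("ooi.salinity", {"psu": 34.4})
broker.replay("ooi.salinity", late.append)    # late consumer sees both
```

The separation between delivery and archive is what lets the scientist of the transcript both follow a live signal and pull the same stream back out of a historical repository later.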
Gardner: Now, we've talked a lot about computer science and some of the thorny issues that have been created as a result of this project going forward, but I'd also like to go back to the project itself and give our listeners a sense of what this can accomplish. I've heard it described as the Hubble Telescope of the oceans.

Let's go back to the oceanography and the climate science. What can we accomplish when this data is delivered in the fashion we've been discussing, where the programmability is there, where scientists can interact with these sensors and data, ask it to do things, and then get that information back in a format that's not raw, but is in fact actionable intelligence?

Matthew, what could possibly happen in terms of the change in our understanding of the oceans from this type of undertaking?

Arrott: The way to think about this is not so much from the fact that we know exactly what will happen. It's the notion that we're providing capabilities that do not currently exist for
oceanographers. It can be summed up as continual presence in the oceans, at multiple scales, through multiple perspectives, also known as the different classes of instrumentation that observe the ocean. Another class of instrumentation is deployed specifically for refocusing.

The scope of the OOI is such that it is considered to be observing the ocean at multiple scales: coastal, regional, and global. It is an expandable model, such that other observatories, as well as additions to the OOI network, can be considered and deployed in subsequent years.

This allows us, as Alexis talked about, to attach many different classes of applications to the network. One of the largest classes of applications that we'll attach to the network is modeling, in particular nowcast and forecast modeling.

Happening at scale

Having those observations about the ocean now and about what the ocean will be, and being able to ground-truth those models going forward, based on data arriving at the same time as the forecasts, provides for a broad range of modeling that has been done for a fair amount of time, but now allows it to happen at scale.

Once you have that ability to model the oceans and predict where they're going, you can use that to refocus the instrumentation on emergent events. It's this ability to have long-term presence in the ocean, and the ability to refocus the instrumentation on emergent events, that really represents the revolutionary change in the formation of this infrastructure.

Meisinger: Let me add, I'm very fascinated by the Hubble Space Telescope as something that produces fantastic imagery and fantastic insights into the universe. For me, as a computer scientist, it's often very difficult to imagine what users of the system would do with the system.
I'd like to see the OOI as a platform that's developed by the experts in their fields, who deploy the platforms, the buoys, the cables, and the sensors into the ocean, and that then enables the users of the system, over 25 years, to produce unprecedented knowledge and results out of that system.

The primary mission of our project is to provide this platform, the space telescope in the ocean. And it's not a single telescope. In our case, it's a set of 65 buoys and locations in the ocean, and even a cable that runs 1,000 miles along the seafloor of the Pacific Northwest, providing 10-gigabit Ethernet connectivity to the instruments, and high power.

It's a model where scientists have to compete. They have to compete for a slot on that infrastructure. They'll have to apply for grants, and they'll have to reserve the spot, so that they can accomplish the best scientific discoveries out of that system.
It's the analogy of the space telescope that will bring ocean scientists to the next level. This is our large platform, our large infrastructure, that lets the best scientists develop and research the best results. That's the fascination that I see in this project.

Gardner: For the average listener to understand, is this comparable to tracking weather and the climate on the surface? Many of us, of course, get our weather forecasts, and they seem to be getting better. We have satellites, radar, measurements, and historical data to compare, and we have models of what weather should do. Is this in some ways taking the weather of the oceans? Is it comparable?

Arrott: Quite comparable. There's a movement to instrument the earth, so that we can understand from observation, as opposed to speculation, what the earth is actually doing, and, from a notion of climate and climate change, what we might be doing to the earth as participants on it.

The weather community, because of the commercial demand for that weather data, has been well in advance of the other environmental sciences in this regard. What you'll find is that OOI is just one of several ongoing initiatives to do exactly what weather has done.

In the work that I did at NCSA with the atmospheric sciences community, it was very clear at the time what they could do if they had the kind of resources that we now have here in the 21st century. We've worked with them and modeled much of our system based on the systems that they built, both in the research area and in the operational area, in programs such as Nova.

Science more mature

Gardner: So, in a sense, we're following the path of what we've done with the weather and understanding the climate on land. We're now moving into the oceans, but at a time when the computer science is more mature, and, in fact, perhaps even more productive. Back to you, Alexis Richardson.
This is being sponsored by the US National Science Foundation, so being cost-efficient is, of course, very important. How is it that cloud computing is being brought to bear, making this productive, and perhaps even ahead of where weather prediction has been, because we can now avail ourselves of some of the newer tools and models around data and cloud infrastructure?

Richardson: Happily, that's an easy one. Imagine if a person or scientist wanted to process very quickly a large amount of data that's come from the oceans to build a picture of the climate, the ocean, or anything to do with the coastal properties of the North American coast.

They might need to borrow 10,000 or 20,000 machines for an hour, and they might need to have a vast amount of data readily accessible to those machines. In the cloud, you can do that, and with big data technologies today, that is a realistic proposition. It was not 5-10 years ago. It's that simple.
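The acquire-process-release pattern Alexis describes looks structurally the same at any scale. The sketch below uses a local thread pool as a stand-in for borrowed cloud machines; the workload, data, and pool size are invented purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(chunk):
    # Stand-in for a per-chunk ocean-data computation (invented workload:
    # the mean of each chunk of readings).
    return sum(chunk) / len(chunk)

chunks = [[1, 3], [5, 7], [9, 11]]

# "Borrow" a pool of workers, fan the work out across them...
with ThreadPoolExecutor(max_workers=3) as pool:
    means = list(pool.map(analyze, chunks))
# ...and release the pool when the with-block exits, just as cloud
# machines are returned to the provider when the job is done.
```

Replace the thread pool with 20,000 cloud instances and `analyze` with a numerical ocean model, and the shape of the program, acquire, map, release, does not change; that is the portability dividend the speakers keep returning to.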
Obviously, you need to have the technologies, like the messaging that we talked about, to get that data to those machines so it can be processed. But the cloud is really there to bring it all together and to make it seem, to the application owner, like something that's just ready for them to acquire, and when they don't need it anymore, they can put it back and someone else can use it.

Gardner: Back to you, Michael. How do you view the advent of cloud computing as a benefit to this sort of initiative? We've got a piece of it from Alexis, but I'd like to hear your perspective on why cloud models are enabling this, perhaps at an unprecedented scale, but also at a most efficient cost.

Meisinger: Absolutely. It does enable computing at unprecedented scale, for exactly the reasons that Alexis mentioned. A lot of the earth's environment is changing. Assume that you're interested in tracking the effect of a hurricane somewhere in the ocean, and you're interested in computing a very complex numerical model that provides certain predictions about currents and other variables of the ocean. You want to do that when the hurricane occurs, and you want to do it quickly.

Part of the strategy is to enable quick computation on demand. The OOI architecture, in particular its common execution infrastructure subsystem, is built to enable this access to computation and big data very quickly. You want to be able to make use of an execution provider's infrastructure as a service very quickly, to run your own models on the infrastructure that the OOI provides.

Then there are other users who want to do things more regularly, and they might have their own hardware. They might run their own clusters. But in order to be interoperable, and in order to have excess overflow capabilities, it's very important to have cloud infrastructure as a means of making the system more homogeneous.
So the cloud is a way of abstracting the compute resources of the various participants of the system, be they commercial or academic cloud computing providers, or institutions that provide their own clusters as cloud systems. They all form a large compute network, a compute fabric, so that they can run the computation in a predictable way, but also in a very episodic way.

Cloud as enabler

I really see the cloud paradigm as one of the enablers of doing this very efficiently, and it enables us, as a software infrastructure project, to develop the systems and the architecture to manage this computation from a system's point of view in a central way.

Gardner: Alexis, because of AMQP and the VMware Cloud Application Platform, it seems to me that you've been able to shop around for cloud resources, using the marketplace, because you've allowed for interoperability among and between platforms, applications, tools, and frameworks.
Is it the case that leveraging AMQP has given you the opportunity to go to where the compute resources are available at the lowest cost, when that's in your best interest?

Richardson: The dividend of interoperability for the end user and the end customer in this platform environment is ultimately portability: portability through being able to choose where your application will run.

Michael described it very well. A hurricane is coming. Do you want to use the machines provided by the cloud provider here, for this price? Do you want to use your own servers? Maybe your neighboring data center has servers available to you. Provided those are visible, and provided there is this fundamental interoperability through cloud platforms of the type that we are investing in, then you will be able to have that choice. And that lets you make these decisions in a way that you could not do before.

Gardner: I'm afraid we're almost out of time, but I want to try to compare this to what this will allow in other areas. It's been mentioned by Alexis and others that this has some common features with Twitter, Facebook, or Zynga. We think of the social environment because of the scale, the complexity, and the use of cloud models. But we're doing far more advanced computational activities here.

This is simply not a display of 140 characters based on a very rudimentary search, for example. These are high-performance-computing-level, supercomputer-level types of requests and analysis.

So are we combining the best of a social fabric approach, and the architecture behind that, with what we've been traditionally exposed to in high-performance computing and supercomputing? If so, what does that mean for how we could bring this to other types of uses in the future? I'll throw this out to any of you. How are we doing the best of the old and the new, and what does that mean for the future?
Meisinger: This is the direction in which the future will evolve, and it's the combination of proven patterns of interaction that are emerging out of how humans interact, applied to high-performance computing.

Providing a strong platform, a strong technological footprint that's not specific to any technology, is a great benefit to the community out there. Providing a reference architecture and a reference implementation that can solve these problems, that social network for sensor networks and for device computation, will be a pattern that can be leveraged by other interested participants, either by participating in the system directly or indirectly, or by just taking that pattern and the technologies that come with it and bringing it to the next level in the future. Developing it as one large project, in a coherent set, really yields a technology stack and architecture that will carry us far into the future.

Arrott: The incremental change that we're introducing takes the concepts of Facebook and Twitter, and the notion of Dropbox, which is my ability to move a file to a shared place so someone else can pick it up later. That was really not possible long ago; I had to put up an FTP server or an HTTP server to accomplish that.
Sharing processes

What we are now adding to the mix is not just sharing artifacts; we're actually sharing processes with one another, and then, specifically, sharing instrumentation. I can say to you, "Here, have a look through my telescope." You can move it around and focus it. Basically, we introduced the concept of artifacts, or information resources, alongside the concept of a taskable resource, and the things that we're adding that can be shared are taskable resources.

Gardner: I'm just going to throw out a few blue-sky ideas. It seems this could be applicable to things like genetics and the human genome, but on an individual basis. Or crime statistics, in order to have better insight into human behavior at a massive scale. Or perhaps even healthcare, where you're diagnosing specific types of symptoms and then correlating them across entire regions, or finding genetic patterns that would be brought to bear on those symptoms.

Am I off-base? Is this science fiction? Or am I perhaps pointing to where this sort of capability might go next?

Richardson: The answer to your question is yes, if you add one little phrase into that: in real time. If you're talking about crime statistics, as events happen on the streets, information is gathered, shared, and processed. As people go on jobs, if information is gathered, shared, and processed on how people are doing, then you will be able to have the kind of crime or healthcare benefits that you described. I'm sure we could think of lots of use cases. Transport is another one.

Arrott: At the institution in which the OOI Cyberinfrastructure is housed, the California Institute for Telecommunications and Information Technology (Calit2), all of the concerns that you've mentioned are, in fact, active development research programs, all of which have yielded significant improvements in the computational environment for their scientific communities.

Gardner: Michael, last word to you.
Where do you see this potentially going in terms of capability? Obviously, it's a very important activity with the oceans, but the methods you're defining and the implementations you're perfecting, where do you see them being applied in the not-too-distant future?

Meisinger: You're absolutely right. This pattern is very applicable, and it's not often that a research and construction project of this size has the ability to provide an end-to-end technology solution to this challenge of big data combined with real-time analysis and real-time command and control of the infrastructure.

What I see this evolving into is, first of all, that you can take the solutions built in this project and apply them to other communities in need of such a solution. But it could go further. Why not combine these communities into a larger system? Why not federate or connect all these
communities into a larger infrastructure, all based on common ideas and common standards, that still enables open participation? It's a platform where you can plug in your own system or subsystem and make it available to whoever is connected to that platform, whoever you trust. So it can evolve into a large ecosystem, and that does not have to happen under the umbrella of one organization such as OOI.

Larger ecosystem

It can grow into a larger ecosystem of connected computing based on your own policies, your own technologies, and your own standards, where everyone shares a common piece of the same idea and can take whatever they want and not consume what they're not interested in.

Gardner: And, as I said earlier, at that very interesting intersection where you can find the most efficient compute resources available and avail yourself of them with that portability, it sounds like a really powerful combination.

We've been talking about how the Ocean Observatories Initiative and its accompanying Cyberinfrastructure Program are not only providing the means for the oceans to be better understood and climate interaction to be better appreciated, but also how the architecture behind them is leading to the potential for many other big data, cloud fabric, real-time, compute-intensive applications.

I'd like to thank our guests. We've been joined by Matthew Arrott, Project Manager at the OOI Cyberinfrastructure. Thank you so much, Matthew.

Arrott: Thank you.

Gardner: We've also been joined by Michael Meisinger, Managing Systems Architect for the OOI Cyberinfrastructure. Thank you, Michael.

Meisinger: Thanks, Dana.

Gardner: And Alexis Richardson, Senior Director for the VMware Cloud Application Platform. Thank you, Alexis.

Richardson: Thank you very much.

Gardner: And this is Dana Gardner, Principal Analyst at Interarbor Solutions.
Thanks to you, our audience, for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod. Sponsor: VMware
Copyright Interarbor Solutions, LLC, 2005-2012. All rights reserved.

You may also be interested in:

• Case Study: Strategic Approach to Disaster Recovery and Data Lifecycle Management Pays Off for Australia's SAI Global
• Virtualization Simplifies Disaster Recovery for Insurance Broker Myron Steves While Delivering Efficiency and Agility Gains Too
• SAP Runs VMware to Provision Virtual Machines to Support Complex Training Courses
• Case Study: How SEGA Europe Uses VMware to Standardize Cloud Environment for Globally Distributed Game Development
• Germany's Largest Travel Agency Starts a Virtual Journey to Get Branch Office IT Under Control