CLOUD COMPUTING: BIG DATA IS THE FUTURE OF IT
Winter 2009 | Ping Li | ping@accel.com

Cloud computing has been generating considerable hype these days. Every participant in the datacenter and IT ecosystem, from hardware vendors and ISVs to SaaS providers and Web 2.0 companies, has been rolling out “cloud” initiatives and strategies; start-ups and incumbents are equally active.

Cloud computing promises to transform IT infrastructure and deliver scalability, flexibility, and efficiency, as well as new services and applications that were previously unthinkable. Despite all of this activity, cloud computing remains as amorphous today as its name suggests. However, one critical trend shines through the cloud – Big Data. Indeed, it’s the core driver in cloud computing and will define the future of IT.

BIG DATA – THE PERFECT STORM

Cloud computing has been driven fundamentally by the need to process an exploding quantity of data. Data is no longer measured in gigabytes but in exabytes as we are “Approaching the ZettaByte Era.”1 Moreover, data types – structured, semi-structured, or unstructured – continue to proliferate at an alarming rate as more information is digitized, from family pictures to historical documents to genome mapping to financial transactions to utility metering. The list is truly unbounded. But today, data is not only being generated by users and applications. It is increasingly being “machine-generated,” and such data is growing exponentially, leading the charge in the Big Data world. In a recent article, The Economist called this phenomenon the “Data Deluge” (http://www.economist.com/opinion/displaystory.cfm?story_id=15579717).

One can argue that Web 2.0 companies have been pushing the upper bounds of large-scale data processing more than anyone. That being said, this data explosion is not sparing any vertical industries – financial, health care, biotech, advertising, energy, telecom, etc. All are grappling with this perfect storm. Below are just a few stats:

    • Two years ago, Google was processing more than 400PB of data/month in just one application
    • The New York Times is processing an 11-million-story archive dating back to 1851
    • eBay processes more than 50TB/day in its data warehouse
    • CERN is processing 2GB/second for its most recent particle accelerator
    • Facebook crunches 15TB/day into a 2.5PB data warehouse

Without question, data represents the competitive advantage of any enterprise, and every organization is now encumbered with the task of storing, managing, analyzing, and extracting value from this exponential data growth – as inexpensively as possible.

Previous computing platform transitions had technology dislocations similar to cloud computing but along different dimensions. The shift from mainframe to client-server was fueled by disruptive innovation in computing horsepower that enabled distributed microprocessing environments. The following shift to web applications/web services during the last decade was enabled by the open networking of applications and services through the internet buildout. While cloud computing will leverage these prior waves of technology – computing and networking – it will also embrace deep innovations in storage/data management to tackle big data.

Along these lines, many of the early uses of cloud computing have been focused less on “computing” and more on “storage.” For example, a significant portion of the initial applications on AWS was primarily leveraging just S3, with applications executing behind the firewall. Popular storage applications, like Jungle Disk and SmugMug, were early AWS customers. This explosion of data has driven enterprises (and consumers for that matter) to find cheap, on-demand storage in unlimited quantities – which cloud storage promises to deliver. Until now, massive tape archives in the middle of nowhere (like Iron Mountain) have been the only means to achieve that cheap storage. However, enterprises today need more; they need quick-access data retrieval for multiple reasons, from compliance to business analytics. It is simply no longer sufficient to have “cold” data; rather, it needs to be online and resilient (and cheap, of course); hence, the accelerating shift towards storing every piece of data in memory or on disks (Data Domain smartly rode this trend).

The need to balance data availability/usability and cost effectiveness has prompted significant innovation in both on-premise and hosted cloud storage. Cloud storage systems (Caringo, EMC Atmos, and ParaScale, to name just a few) and flash-based storage systems (Fusion IO, Nimble Storage, Pliant, etc.) are just some current examples. Furthermore, hierarchical storage management (HSM, which has always sounded great but has been implemented only rarely) will become an important element in storage workflows. Enterprises will require seamless capability to move data across different tiers of storage (both on-premise and into the cloud) based on policy and data type to optimize retrieval costs. As cloud computing matures, true cloud applications will be (re)written to leverage hierarchical and cloud-like storage tiers to retrieve data dynamically from different storage layers.
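
The policy-driven tiering described above can be made concrete with a small sketch. The Python below is purely illustrative: the tier names, per-GB prices, retrieval latencies, policy names, and the `choose_tier` rule are all hypothetical assumptions, not figures or APIs from any vendor mentioned here. It picks the cheapest tier that still meets a data set's retrieval-latency requirement, which is one simple way an HSM policy might trade availability against cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb_month: float  # hypothetical price in USD
    retrieval_seconds: float  # hypothetical time to first byte


# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
TIERS = [
    Tier("memory", 10.00, 0.001),
    Tier("flash", 2.00, 0.01),
    Tier("disk", 0.30, 0.1),
    Tier("cloud", 0.15, 2.0),
    Tier("tape", 0.02, 3600.0),
]

# Hypothetical policies: data type -> maximum acceptable retrieval latency (s).
POLICIES = {
    "clickstream_hot": 0.05,
    "business_analytics": 5.0,
    "compliance_archive": 86400.0,
}


def choose_tier(max_retrieval_seconds: float) -> Tier:
    """Return the cheapest tier whose retrieval time meets the requirement."""
    eligible = [t for t in TIERS if t.retrieval_seconds <= max_retrieval_seconds]
    if not eligible:
        raise ValueError("no tier is fast enough for this policy")
    return min(eligible, key=lambda t: t.cost_per_gb_month)


def monthly_cost(size_gb: float, tier: Tier) -> float:
    """Storage cost for one month at the tier's (assumed) per-GB price."""
    return size_gb * tier.cost_per_gb_month


if __name__ == "__main__":
    for data_type, max_latency in POLICIES.items():
        tier = choose_tier(max_latency)
        cost = monthly_cost(1000, tier)
        print(f"{data_type}: {tier.name} at {cost:.2f} USD/month per 1000 GB")
```

A real HSM policy would also weigh access frequency, data age, and migration cost; the single latency threshold here is only meant to show the shape of the decision.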

1 Source: “Approaching the Zettabyte Era.” Cisco, 16 June 2008. <http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481374_ns827_Networking_Solutions_White_Paper.html>
A NEW CLOUD STACK

In order for cloud computing to become a mainstream approach, a new “cloud” stack (like mainframe and OSI) will likely emerge. Just as in prior computing platform transitions (client/server, web services, etc.), core platform capabilities, such as security, access control, application management, virtualization, systems management, provisioning, availability, etc., will be prerequisites before IT organizations are able to adopt the cloud completely.

Clearly, this stack will exist in a different representation than prior platform layers to embrace a cloud environment. Simply replicating the current computing stack but allowing it to reside off-premise will not achieve the scale, capabilities, and economies of cloud computing. In particular, this new cloud framework needs the ability to process data in increasingly greater orders of magnitude – and do it at a fraction of the cost – by leveraging commodity, multi-threaded servers for storage and computing. In many ways, this cloud stack has been implemented already, albeit in a primitive form, at large-scale internet datacenters.

The challenge of processing terabytes of data daily at Google, Facebook, and Amazon drove them to adopt a new data architecture, which is essentially Martian to traditional enterprise datacenter architects. No longer are ACID and relational databases back-ending transactional applications. Internet datacenters quickly encountered the scaling limitations of SQL databases as the volume of data exploded. Instead, high-performance, scalable/distributed non-SQL data stores are being developed internally and implemented at scale. Bigtable and Cassandra are among the many variants, and this “non-database database” trend has proliferated to the point of having its own conference: NoSQL. Database caching layers (e.g., Northscale’s Memcached) are also being implemented to further drive application performance, and are now accepted as a “standard” tier in datacenters.

Managing non-transactional data has become even more daunting. From log files to click stream data to web indexing, internet data centers are collecting massive volumes of data that need to be processed cheaply in order to drive monetization value. Hadoop is an open source data management framework that has become widely deployed for massively parallel computation and distributed file systems in a cloud environment. Hadoop has allowed the largest web properties (Yahoo!, LinkedIn, Facebook, etc.) to store and analyze any data in near real-time at a fraction of the cost of traditional data management and data warehouse approaches. Although the framework has roots in internet datacenters, Hadoop is quickly penetrating broader enterprise use cases. The diverse set of participants at Hadoop World NYC hosted by Cloudera clearly points to this trend.

SECURING THE CLOUD

Given this data intensive nature, any widely adopted cloud computing platform will inevitably account for richer security requirements. The security challenges will be focused less on point network and data level security, although high bandwidth encryption solutions and sophisticated key management will be needed to match the massively parallel computational cloud environments. In this case, the primary security challenges will stem from “control.” User authentication will become increasingly challenging as applications are federated outside the firewall because of SaaS adoption. In addition, managing and reconciling user identities across individual user directories for each SaaS/Cloud application will present further security issues. Much as web applications in the 90s created an SSO layer, cloud computing is essentially abstracting a web services interface for infrastructure IT, and it will demand a similar unified “authentication/entitlement layer.”

In addition to federated user authentication, cloud computing will also require “data” authentication and security. Imperva’s database firewall is an example of an increasingly important cloud security product. As applications reside in different public and private clouds, it will be critical for the cloud applications to be able to “talk” to each other. This will drive the need for ensuring data authentication and policy control for the volumes of data flowing between cloud applications. Moreover, given the multi-tenancy paradigm of cloud environments, policy granularity will be paramount to ensure security and compliance. Data integration across cloud platforms will be more of an obstacle than application integration, as applications have become more open/standard. Standard “data” APIs will emerge as part of the new cloud stack to allow disparate environments to talk to each other and avoid vendor lock-in. Data migration challenges are perhaps the greatest factor today for locking users to a particular cloud platform.

Over time, these APIs and layers will harden and will become tailored, depending on use case and workload for particular applications. The adoption of these new frameworks will ultimately make cloud computing “safe” and broaden its penetration into enterprises of all sizes.

WHAT’S BREWING IN A CLOUD?

Despite constant comparisons to grid and utility computing, cloud computing has the potential to address a much broader set of applications and use cases beyond the limited HPC environments served traditionally by grid computing. This breadth of cloud computing is engendered by a new set of underlying technology forces. Virtualization technologies, high-powered commodity servers, low-cost/high bandwidth connectivity, concurrent/multi-threaded programming models and open source software stacks are all technology building blocks that can deliver the high performance and scalability of grid/utility computing, but importantly – and concurrently – with underlying commodity resources.

These technology drivers enable applications and users to be abstracted cleanly from particular IT infrastructure resources (computing, storage, networking, etc.) in new and powerful ways; i.e., location agnosticism and multi-tenancy are two critical
elements among others. Unlike traditional HPC grid environments, which were designed for a specific application in a single company, cloud computing enables disparate applications and entities to harness a shared pool of resources. In addition, applications can be “broken up” in the cloud where computing resources may reside on the client while the data is accessed portably from multiple cloud locations (as an example).

Many different definitions of cloud computing have surfaced. Rather than posit yet another, I note several characteristics that are resident in any cloud instance: (i) self-provisioning (either by user, developer, or IT); (ii) elasticity (on-demand allocation of any computing, storage and networking resources); (iii) multi-“anything” (multi-user, multi-application, multi-session, etc.); and (iv) portability (applications are abstracted from physical infrastructure and can be migrated easily). These capabilities allow enterprises to shift IT resources from capex to opex – a usage-based model that is particularly appealing under recent economic constraints.

These cloud prerequisites will yield a powerful set of use cases beyond grid computing that are unique to cloud platforms. Cloud computing will reach its full potential in the future when a whole new set of applications (never possible before) is created that is purpose-built for the cloud. For example, one can envision powerful collaboration applications emerging that enable internal enterprise and external users to cooperate seamlessly in ways that would have been impossible with users and data isolated on disparate enterprise islands. It’s likely these innovative applications will require new programming models and potentially languages yet to be hardened.

STILL IN THE EARLY DAYS

Despite the high energy surrounding cloud computing and early cloud offering successes, such as Amazon Web Services, cloud computing for enterprise services is definitely still in its formative stages. In contrast, consumers have already adopted cloud computing technologies. One could argue that web companies like Google, Yahoo!, Facebook, and Salesforce are examples of consumers leveraging cloud computing. These Web 2.0/SaaS offerings clearly exhibit the core cloud characteristics outlined above, and in turn are delivering new, value-added services previously considered unthinkable. Interestingly, this time the consumers, via their use of Web 2.0 services, have been teaching the typically early technology adopter enterprises the effectiveness of cloud computing.

Today, the enterprise use of cloud computing represents opposite ends of the spectrum: (i) Web 2.0 start-ups seeking to launch applications quickly and cheaply, and (ii) compute intensive enterprises that need batch processing for bursty, large-scale applications. Although these users are driving the early adoption of cloud technology, it’s unlikely these limited use cases will establish cloud computing as a pervasive platform. Cloud computing instead will need to penetrate mainstream IT infrastructure slowly and offer a broader set of enterprise applications.

It is important to note here that these Web 2.0 start-ups represent a powerful trend in the role of developers in driving cloud computing adoption. Many early users of cloud computing are examples of developers launching applications without requiring the involvement of IT (in the case of a Web 2.0 start-up, they don’t have an IT department). Increasingly, empowering developers and line-of-business owners to innovate and deploy new applications without the shackles of IT will be a motivating driver for cloud adoption. No longer do users need to have IT’s blessing and time to get their job done. This developer-centric nature was a primary motivator of VMware’s strategic acquisition of SpringSource. In addition to inheriting significant Java technology, VMware now has a distinct opportunity to transition SpringSource’s dominant Java developer mindshare to develop onto VMware’s private cloud platform. Amazon Web Services has experienced tremendous success from its developer-centric platform APIs. Unlike traditional hosting providers that cater to IT/operations, Amazon went after developers first and has only recently begun to add the functionality that will appeal to broader enterprise IT.

Within enterprises, there are early signs of developers (QA environments, batch processing, and developer prototyping) and line-of-business/departmental users leveraging cloud computing. It is not uncommon for new platform technologies to start at the “fringes” of IT before mainstream adoption takes place. Unlike typical three-tier “traditional” enterprise datacenters, the internet datacenters of Facebook, Google, etc. were not encumbered by legacy enterprise stacks, applications, and IT rules, which in turn enabled them to be built from the ground up with cloud stacks to handle elastically large-scale consumer transactions for multiple applications. Therefore, and unsurprisingly, Amazon’s internet datacenters were easily adapted to become the first and leading “public computing” provider. It will certainly take significant time/effort for enterprise IT infrastructure gatekeepers to evolve their current architectures to embrace a new cloud platform. Luckily, enterprises can reap the technology innovation from internet data centers (many of which are open source) to accelerate this transition.

MORE THAN ONE FLAVOR

There have been analogies drawn between cloud computing and public utilities (electric, gas, etc.) where the value is all about economies of scale. According to this hypothesis, the world will only have a few cloud providers that reach maximum efficient scale. It is quite unlikely that this will happen. Multiple cloud models will emerge depending on the user, the workload, and the application. For example, certain developers will prefer to interface with a cloud provider at a higher level of abstraction, such as Google App Engine, as opposed to a more bare metal API, such as Rackspace. Alternatively, an application may choose to run on MSFT Azure to leverage SQL/MSFT services or Salesforce’s Force.com for CRM integration and distribution advantages. Today, one can break cloud platforms into roughly two camps: developer-centric (Amazon, MSFT) and IT-centric (EMC, VMware).
Cloud platforms will remain distinct and diverse as long as they continue to deliver unique value-add for their particular use cases and users.

To drive this cloud diversity point further, the concept of a “cloud within a cloud” is also emerging where distinct services, such as data warehousing, can be built atop a more generic cloud platform to provide a higher layer cloud service.

In addition, “private clouds” behind the firewalls present yet another flavor of cloud computing as enterprises leverage the benefits of cloud frameworks while maintaining security/control as well as the compliance of their internal datacenters. Lastly, hybrid clouds that bridge private and public clouds on a permanent or temporary basis (the latter also known as “cloud bursting”) will come to fruition for certain applications or as a migration path for enterprises. Several start-ups (Cirtas, CloudSwitch and Zetta among them) are building products that make the cloud “safe” for enterprises. Innovation will abound to solve the specific issues in all of these various cloud environments.

LOOKING AHEAD

To further parse all this, I hosted a cloud computing panel with an esteemed group of technology thought leaders at Accel’s 15th Stanford Technology Symposium. Needless to say, these panelists had plenty of deep insights, opinions, and predictions about cloud computing.

The panel brought together technologists who view cloud computing through distinctly different lenses: private cloud innovators, public cloud providers, cloud enabling technology solutions and cloud infrastructure applications. In wrapping up the panel session, I asked each speaker to conjure up a single prediction for cloud computing in the next few years. Here’s what the experts said:

Jonathan Bryce, CTO/Founder, Mosso (Rackspace): “…I think cloud computing is going to be a mindshift; it’s going to take a while. But I think an economy like this is actually a huge opportunity for entrepreneurs…I think this is a time when resources are scarce – that’s when great businesses end up getting built. And I think part of what’s going to enable some of those businesses is cloud computing, and being able to get started with a lower barrier to entry, lower price point, all of those kind of things…”

Mike Olson, CEO/Co-founder, Cloudera: “I think that a lot of what’s been said around here about data is really right on. I predict that in the next 10 years, computer science as computer science isn’t really going to be the place that smart young guys are going to find tremendously rewarding careers. I think that the application of these new compute systems to large data in the sciences will advance human kind substantially. I think that science will be done maybe not even in the lab on the wet bench anymore, but with data, with computer systems looking at vast amounts of data….”

Raghu Ramakrishnan, Chief Scientist for Audience and Research Fellow, Yahoo! Research: “So a lot of the companies that are out there today – Yahoo!, Facebook, Google – they’re all exposing data APIs. Imagine what’s going to happen once large clouds are routinely available to build their own application and you start aggregating your own data, and you have the opportunity to fuse that with all the data that’s out there. Someone’s going to figure out the next big thing, by taking 2 + 2 and coming up with 20.”

Mike Schroepfer, VP Engineering, Facebook: “…one of the things that is going to happen is that people are going to figure out that we need a more blended workload between the cloud and the client. We’ve been operating kind of in the cycle of reincarnation and computer science, moved toward most of the computing happening in the cloud, and my browser effectively being its own terminal. You know, in the last 2 or 3 years, the speed and capability of browsers has been outpacing that of most chips. You’re seeing 2x to 4x improvements in core performance on the engines and VMs in those browsers year on year, which is way outpacing the speed of chip design…So I believe that there will be a couple of people who will figure out ways to blend computation and storage on the client, more gracefully with that on the server, but still provide you with all of the benefits of basically access to my data anywhere I need, and the kind of reliability of the cloud.”

Jayshree Ullal, President and CEO, Arista Networks: “Well, there’s a technology impact but I actually think it’s going to really make CIOs rethink their jobs. Today, you can have a server administrator, an application administrator, a network administrator, and they’re all silos… but you need your general practitioner. And that’s really missing right now in the cloud. So if I had to make a prediction, less on the technology, more on the operational side, I would say for the deployment of this, it’s got to be a generalized IT person, whether that’s the CIO or somebody he or she appoints…”

Rich Wolski, Professor of Computer Science, University of California, Santa Barbara and CTO/Founder, Eucalyptus Systems: “…there’s another revolution coming that’s going to intersect the cloud revolution and that has to do with data simulation…pretty much everything you own is going to be trying to send you data. And you’re going to need, personally, a great deal of storage and compute capacity to be able to deal with that. I think the cloud is going to make that revolution that much quicker to come to us.”

These predictions depict cloud computing as still in its formative phases but poised to emerge as a fundamental breakthrough in datacenter and IT infrastructure in the years to come. Despite the current macro headwinds, deep innovation and market opportunities in cloud computing will persist. Once this economic storm passes, I’m convinced the sun will shine through, and cloud computing is sure to have many silver linings.

Ping Li is a partner at Accel Partners in Palo Alto and focuses primarily on Information Technology infrastructure and digital media platforms.

INN530 - Assignment 2, Big data and cloud computing for managementINN530 - Assignment 2, Big data and cloud computing for management
INN530 - Assignment 2, Big data and cloud computing for management
 
Cloud computing white paper
Cloud computing white paperCloud computing white paper
Cloud computing white paper
 
The Distributed Cloud
The Distributed CloudThe Distributed Cloud
The Distributed Cloud
 
Widespread Cloud Adoption: What's Taking So Long?
Widespread Cloud Adoption: What's Taking So Long?Widespread Cloud Adoption: What's Taking So Long?
Widespread Cloud Adoption: What's Taking So Long?
 
Cloud computing applicatio
Cloud  computing  applicatioCloud  computing  applicatio
Cloud computing applicatio
 
Briefing 47
Briefing 47Briefing 47
Briefing 47
 
PRESENTATION ON CLOUD COMPUTING
PRESENTATION ON CLOUD COMPUTINGPRESENTATION ON CLOUD COMPUTING
PRESENTATION ON CLOUD COMPUTING
 
Cloud computing..
Cloud computing..Cloud computing..
Cloud computing..
 

Similar to Cloud Computing Big Data Is Future Of It

Economics of Cloud Computing_Jim Cooke
Economics of Cloud Computing_Jim CookeEconomics of Cloud Computing_Jim Cooke
Economics of Cloud Computing_Jim Cooke
Jim Cooke
 
Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...
Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...
Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...
IBM India Smarter Computing
 
A novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingA novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computing
João Gabriel Lima
 
Cloud Computing With SAS
Cloud Computing With SASCloud Computing With SAS
Cloud Computing With SAS
white paper
 

Similar to Cloud Computing Big Data Is Future Of It (20)

Cloud scenario infrastructure in Data Center
Cloud scenario infrastructure in Data CenterCloud scenario infrastructure in Data Center
Cloud scenario infrastructure in Data Center
 
Top data center trends and predictions to watch for in 2016.
Top data center trends and predictions to watch for in 2016.Top data center trends and predictions to watch for in 2016.
Top data center trends and predictions to watch for in 2016.
 
Cloud Infrastructure for Your Data Center
Cloud Infrastructure for Your Data CenterCloud Infrastructure for Your Data Center
Cloud Infrastructure for Your Data Center
 
Economics of Cloud Computing_Jim Cooke
Economics of Cloud Computing_Jim CookeEconomics of Cloud Computing_Jim Cooke
Economics of Cloud Computing_Jim Cooke
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
The Last Frontier- Virtualization, Hybrid Management and the Cloud
The Last Frontier-  Virtualization, Hybrid Management and the CloudThe Last Frontier-  Virtualization, Hybrid Management and the Cloud
The Last Frontier- Virtualization, Hybrid Management and the Cloud
 
cloud computing, touch screen, dms and cores
cloud computing, touch screen, dms and corescloud computing, touch screen, dms and cores
cloud computing, touch screen, dms and cores
 
Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...
Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...
Scale-Out Network-Attached Storage Addresses Storage Problems for Private Clo...
 
How Software.docx
How Software.docxHow Software.docx
How Software.docx
 
First step to the cloud white paper
First step to the cloud white paperFirst step to the cloud white paper
First step to the cloud white paper
 
E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2
 
Cloud Computing Essays
Cloud Computing EssaysCloud Computing Essays
Cloud Computing Essays
 
Openstack
OpenstackOpenstack
Openstack
 
Cloud infrastructure; Public or Private? A cost perspective
Cloud infrastructure; Public or Private? A cost perspectiveCloud infrastructure; Public or Private? A cost perspective
Cloud infrastructure; Public or Private? A cost perspective
 
A novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computingA novel solution of distributed memory no sql database for cloud computing
A novel solution of distributed memory no sql database for cloud computing
 
Cloud Computing With SAS
Cloud Computing With SASCloud Computing With SAS
Cloud Computing With SAS
 
Rio Info 2010 - Seminário de Tecnologia - Integracao de Servicos - Cesar Taur...
Rio Info 2010 - Seminário de Tecnologia - Integracao de Servicos - Cesar Taur...Rio Info 2010 - Seminário de Tecnologia - Integracao de Servicos - Cesar Taur...
Rio Info 2010 - Seminário de Tecnologia - Integracao de Servicos - Cesar Taur...
 
Seven data storage & networking trends in 2020
Seven data storage & networking trends in 2020Seven data storage & networking trends in 2020
Seven data storage & networking trends in 2020
 
Massive Data Analytics and the Cloud
Massive Data Analytics and the CloudMassive Data Analytics and the Cloud
Massive Data Analytics and the Cloud
 
Cloud Computing Essay
Cloud Computing EssayCloud Computing Essay
Cloud Computing Essay
 

More from Aman Ghei

First Annual Excellence In Investing Conference
First Annual Excellence In Investing ConferenceFirst Annual Excellence In Investing Conference
First Annual Excellence In Investing Conference
Aman Ghei
 

More from Aman Ghei (6)

Top 8 Digital Health Trends
Top 8 Digital Health TrendsTop 8 Digital Health Trends
Top 8 Digital Health Trends
 
Finch Capital FinTech Prediction 2019
Finch Capital FinTech Prediction 2019Finch Capital FinTech Prediction 2019
Finch Capital FinTech Prediction 2019
 
Finch Capital Predictions 2018
Finch Capital Predictions 2018Finch Capital Predictions 2018
Finch Capital Predictions 2018
 
Finch Capital 2018 Predictions Summary
Finch Capital 2018 Predictions SummaryFinch Capital 2018 Predictions Summary
Finch Capital 2018 Predictions Summary
 
Payments 2 0
Payments 2 0Payments 2 0
Payments 2 0
 
First Annual Excellence In Investing Conference
First Annual Excellence In Investing ConferenceFirst Annual Excellence In Investing Conference
First Annual Excellence In Investing Conference
 

Cloud Computing Big Data Is Future Of It
