CLOUD COMPUTING: BIG DATA IS THE FUTURE OF IT

Winter 2009 | Ping Li | ping@accel.com

Cloud computing has been generating considerable hype these days. Every participant in the datacenter and IT ecosystem has been rolling out “cloud” initiatives and strategies: hardware vendors, ISVs, SaaS providers, and Web 2.0 companies, with start-ups and incumbents equally active.

Cloud computing promises to transform IT infrastructure and deliver scalability, flexibility, and efficiency, as well as new services and applications that were previously unthinkable. Despite all of this activity, cloud computing remains as amorphous today as its name suggests. However, one critical trend shines through the cloud: Big Data. Indeed, it’s the core driver in cloud computing and will define the future of IT.

BIG DATA: THE PERFECT STORM

Cloud computing has been driven fundamentally by the need to process an exploding quantity of data. Data is no longer measured in gigabytes but in exabytes as we are “Approaching the Zettabyte Era.” [1] Moreover, data types (structured, semi-structured, or unstructured) continue to proliferate at an alarming rate as more information is digitized, from family pictures to historical documents to genome mapping to financial transactions to utility metering. The list is truly unbounded. But today, data is not only being generated by users and applications. It is increasingly “machine-generated,” and such data is growing exponentially and leading the charge in the Big Data world. In a recent article, The Economist called this phenomenon the “Data Deluge” (http://www.economist.com/opinion/displaystory.cfm?story_id=15579717).

One can argue that Web 2.0 companies have been pushing the upper bounds of large-scale data processing more than anyone. That being said, this data explosion is not sparing any vertical industry: financial, health care, biotech, advertising, energy, telecom, etc. All are grappling with this perfect storm. Below are just a few stats:

  • Two years ago, Google was already processing more than 400PB of data/month in just one application
  • The New York Times is processing an 11-million-story archive dating back to 1851
  • eBay processes more than 50TB/day in its data warehouse
  • CERN is processing 2GB/second for its most recent particle accelerator
  • Facebook crunches 15TB/day into a 2.5PB data warehouse

Without question, data represents the competitive advantage of any enterprise, and every organization is now encumbered with the task of storing, managing, analyzing, and extracting value from this exponential data growth as inexpensively as possible.

Previous computing platform transitions had technology dislocations similar to cloud computing but along different dimensions. The shift from mainframe to client-server was fueled by disruptive innovation in computing horsepower that enabled distributed microprocessing environments. The following shift to web applications/web services during the last decade was enabled by the open networking of applications and services through the internet buildout. While cloud computing will leverage these prior waves of technology (computing and networking), it will also embrace deep innovations in storage/data management to tackle big data.

Along these lines, many of the early uses of cloud computing have been focused less on “computing” and more on “storage.” For example, a significant portion of the initial applications on AWS were primarily leveraging just S3, with the applications themselves executing behind the firewall. Popular storage applications, like Jungle Disk and SmugMug, were early AWS customers. This explosion of data has driven enterprises (and consumers, for that matter) to find cheap, on-demand storage in unlimited quantities, which cloud storage promises to deliver. Until now, massive tape archives in the middle of nowhere (like Iron Mountain) have been the only means to achieve that cheap storage. However, enterprises today need more; they need quick-access data retrieval for multiple reasons, from compliance to business analytics. It is simply no longer sufficient to have “cold” data; rather, it needs to be online and resilient (and cheap, of course); hence the accelerating shift towards storing every piece of data in memory or on disks (Data Domain smartly rode this trend).

The need to balance data availability/usability and cost effectiveness has prompted significant innovation in both on-premise and hosted cloud storage. Cloud storage systems (Caringo, EMC Atmos, and ParaScale, to name just a few) and flash-based storage systems (Fusion IO, Nimble Storage, Pliant, etc.) are just some current examples. Furthermore, hierarchical storage management (HSM, which has always sounded great but has been implemented only rarely) will become an important element in storage workflows. Enterprises will require a seamless capability to move data across different tiers of storage (both on-premise and into the cloud) based on policy and data type to optimize retrieval costs. As cloud computing matures, true cloud applications will be (re)written to leverage hierarchical and cloud-like storage tiers to retrieve data dynamically from different storage layers.

[1] Source: “Approaching the Zettabyte Era.” Cisco, 16 June 2008. <http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481374_ns827_Networking_Solutions_White_Paper.html>
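To make the storage discussion above more concrete, the policy-based movement of data across storage tiers can be sketched in a few lines of Python. This is only an illustration, not a description of any vendor's HSM product: the tier names ("flash", "disk", "cloud"), the `DataObject` fields, and the age thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataObject:
    """A stored item, described only by the attributes the policy looks at."""
    name: str
    data_type: str                  # hypothetical labels, e.g. "transaction", "log"
    last_access: datetime
    compliance_hold: bool = False   # data that must stay quickly retrievable


def choose_tier(obj: DataObject, now: datetime) -> str:
    """Pick a tier from illustrative rules; every threshold here is an assumption."""
    age = now - obj.last_access
    if obj.compliance_hold:
        # Compliance data stays on-premise so retrieval is fast.
        return "flash" if age <= timedelta(days=7) else "disk"
    if obj.data_type == "transaction" and age <= timedelta(days=1):
        return "flash"              # hot transactional data
    if age <= timedelta(days=30):
        return "disk"               # warm data
    return "cloud"                  # everything else goes to the cheapest online tier


now = datetime(2010, 1, 1)
hot = DataObject("orders", "transaction", now - timedelta(hours=2))
cold = DataObject("clickstream", "log", now - timedelta(days=90))
held = DataObject("audit-trail", "log", now - timedelta(days=400), compliance_hold=True)

for obj in (hot, cold, held):
    print(obj.name, "->", choose_tier(obj, now))
```

In a real system the rules would more likely live in configuration than in code, and a background job would re-evaluate objects periodically and migrate them when the chosen tier changes.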
A NEW CLOUD STACK

In order for cloud computing to become a mainstream approach, a new “cloud” stack (like mainframe and OSI) will likely emerge. Just like prior computing platform transitions (client/server, web services, etc.), core platform capabilities, such as security, access control, application management, virtualization, systems management, provisioning, availability, etc., will be a prerequisite before IT organizations are able to adopt the cloud completely.

Clearly, this stack will exist in a different representation than prior platform layers to embrace a cloud environment. Simply replicating the current computing stack but allowing it to reside off-premise will not achieve the scale, capabilities, and economies of cloud computing. In particular, this new cloud framework needs the ability to process data in increasingly greater orders of magnitude, and do it at a fraction of the cost, by leveraging commodity, multi-threaded servers for storage and computing. In many ways, this cloud stack has been implemented already, albeit in a primitive form, at large-scale internet datacenters.

The challenge of processing terabytes of data daily at Google, Facebook, and Amazon drove them to adopt a new data architecture, which is essentially Martian to traditional enterprise datacenter architects. No longer are ACID and relational databases back-ending transactional applications. Internet datacenters quickly encountered the scaling limitations of SQL databases as the volume of data exploded. Instead, high-performance, scalable/distributed non-SQL data stores are being developed internally and implemented at scale. Bigtable and Cassandra are among the many variants, and this “non-database database” trend has proliferated to the point of having its own conference: NoSQL. Database caching layers (e.g., Northscale’s Memcached) are also being implemented to further drive application performance, and caching is now accepted as a “standard” tier in datacenters.

Managing non-transactional data has become even more daunting. From log files to click stream data to web indexing, internet data centers are collecting massive volumes of data that need to be processed cheaply in order to drive monetization value. Hadoop is an open source data management framework that has become widely deployed for massively parallel computation and distributed file systems in a cloud environment. Hadoop has allowed the largest web properties (Yahoo!, LinkedIn, Facebook, etc.) to store and analyze any data in near real-time at a fraction of the cost that traditional data management and data warehouse approaches could even contemplate. Although the framework has roots in internet datacenters, Hadoop is quickly penetrating broader enterprise use cases. The diverse set of participants at Hadoop World NYC, hosted by Cloudera, clearly points to this trend.

SECURING THE CLOUD

Given this data-intensive nature, any widely adopted cloud computing platform will inevitably have to account for richer security requirements. The security challenges will be focused less on point network and data level security, although high-bandwidth encryption solutions and sophisticated key management will be needed to match the massively parallel computational cloud environments. In this case, the primary security challenges will stem from “control.” User authentication will become increasingly challenging as applications are federated outside the firewall because of SaaS adoption. In addition, managing and reconciling user identities across individual user directories for each SaaS/cloud application will present further security issues. Much like web applications in the 90s created an SSO layer, cloud computing is essentially abstracting a web services interface for infrastructure IT, and it will demand a similar unified “authentication/entitlement layer.”

In addition to federated user authentication, cloud computing will also require “data” authentication and security. Imperva’s database firewall is an example of an increasingly important cloud security product. As applications reside in different public and private clouds, it will be critical for the cloud applications to be able to “talk” to each other. This will drive the need for ensuring data authentication and policy control for the volumes of data flowing between cloud applications. Moreover, given the multi-tenancy paradigm of cloud environments, policy granularity will be paramount to ensure security and compliance. Data integration across cloud platforms will be more of an obstacle than application integration, as applications have become more open/standard. Standard “data” APIs will emerge as part of the new cloud stack to allow disparate environments to talk to each other and avoid vendor lock-in. Data migration challenges are perhaps the greatest factor today in locking users to a particular cloud platform.

Over time, these APIs and layers will harden and will become tailored, depending on use case and workload, for particular applications. The adoption of these new frameworks will ultimately make cloud computing “safe” and broaden its penetration into enterprises of all sizes.

WHAT’S BREWING IN A CLOUD?

Despite constant comparisons to grid and utility computing, cloud computing has the potential to address a much broader set of applications and use cases beyond the limited HPC environments served traditionally by grid computing. This breadth of cloud computing is engendered by a new set of underlying technology forces. Virtualization technologies, high-powered commodity servers, low-cost/high-bandwidth connectivity, concurrent/multi-threaded programming models, and open source software stacks are all technology building blocks that can deliver the high performance and scalability of grid/utility computing, but, importantly and concurrently, with underlying commodity resources.

These technology drivers enable applications and users to be abstracted cleanly from particular IT infrastructure resources (computing, storage, networking, etc.) in new and powerful ways; location agnosticism and multi-tenancy are two critical elements, among others. Unlike traditional HPC grid environments, which were designed for a specific application in a single company, cloud computing enables disparate applications and entities to harness a shared pool of resources. In addition, applications can be “broken up” in the cloud; for example, computing resources may reside on the client while the data is accessed portably from multiple cloud locations.

Many different definitions of cloud computing have surfaced. Rather than posit yet another, we note that several characteristics are resident in any cloud instance: (i) self-provisioning (either by user, developer, or IT); (ii) elasticity (on-demand allocation of any computing, storage, and networking resources); (iii) multi-“anything” (multi-user, multi-application, multi-session, etc.); and (iv) portability (applications are abstracted from physical infrastructure and can be migrated easily). These capabilities allow enterprises to shift IT resources from capex to opex, a usage-based model that is particularly appealing under recent economic constraints.

These cloud prerequisites will yield a powerful set of use cases beyond grid computing that are unique to cloud platforms. Cloud computing will reach its full potential in the future when a whole new set of applications (never possible before) is created that is purpose-built for the cloud. For example, one can envision powerful collaboration applications emerging that enable internal enterprise and external users to cooperate seamlessly in ways that would have been previously impossible with users and data isolated on disparate enterprise islands. It’s likely these innovative applications will require new programming models and potentially languages yet to be hardened.

STILL IN THE EARLY DAYS

Despite the high energy surrounding cloud computing and early cloud offering successes, such as Amazon Web Services, cloud computing for enterprise services is definitely still in its formative stages. In contrast, however, consumers have already adopted cloud computing technologies. One could argue that web companies like Google, Yahoo!, Facebook, and Salesforce are examples of consumers leveraging cloud computing. These Web 2.0/SaaS offerings clearly exhibit the core cloud characteristics outlined above, and in turn are delivering new, value-added services previously considered unthinkable. Interestingly, this time the consumers, via their use of Web 2.0 services, have been teaching enterprises, typically the early technology adopters, the effectiveness of cloud computing.

Today, the enterprise use of cloud computing represents opposite ends of the spectrum: (i) Web 2.0 start-ups seeking to launch applications quickly and cheaply, and (ii) compute-intensive enterprises that need batch processing for bursty, large-scale applications. Although these users are driving the early adoption of cloud technology, it’s unlikely these limited use cases will establish cloud computing as a pervasive platform. Cloud computing instead will need to penetrate mainstream IT infrastructure slowly and offer a broader set of enterprise applications.

It is important to note here that these Web 2.0 start-ups represent a powerful trend in the role of developers in driving cloud computing adoption. Many early users of cloud computing are examples of developers launching applications without requiring the involvement of IT (in the case of a Web 2.0 start-up, they don’t have an IT department). Increasingly, empowering developers and line-of-business owners to innovate and deploy new applications without the shackles of IT will be a motivating driver for cloud adoption. No longer do users need to have IT’s blessing and time to get their job done. This developer-centric nature was a primary motivator of VMware’s strategic acquisition of SpringSource. In addition to inheriting significant Java technology, VMware now has a distinct opportunity to transition SpringSource’s dominant Java developer mindshare to develop onto VMware’s private cloud platform. Amazon Web Services has experienced tremendous success from its developer-centric platform APIs. Unlike traditional hosting providers that cater to IT/operations, Amazon went after developers first and has only recently begun to add the functionality that will appeal to broader enterprise IT.

Within enterprises, there are early signs of developers (QA environments, batch processing, and developer prototyping) and line-of-business/departmental users leveraging cloud computing. It is not uncommon for new platform technologies to start at the “fringes” of IT before mainstream adoption takes place. Unlike typical three-tier “traditional” enterprise datacenters, the internet datacenters of Facebook, Google, etc. were not encumbered by legacy enterprise stacks, applications, and IT rules, which in turn enabled them to be built from the ground up with cloud stacks to elastically handle large-scale consumer transactions for multiple applications. Therefore, and unsurprisingly, Amazon’s internet datacenters were easily adapted to become the first and leading “public computing” cloud provider. It will certainly take significant time/effort for enterprise IT infrastructure gatekeepers to evolve their current architectures to embrace a new cloud platform. Luckily, enterprises can reap the technology innovation from internet data centers (much of which is open source) to accelerate this transition.

MORE THAN ONE FLAVOR

There have been analogies drawn between cloud computing and public utilities (electric, gas, etc.) where the value is all about economies of scale. According to this hypothesis, the world will only have a few cloud providers that reach maximum efficient scale. It is quite unlikely that this will happen. Multiple cloud models will emerge depending on the user, the workload, and the application. For example, certain developers will prefer to interface with a cloud provider at a higher level of abstraction, such as Google App Engine, as opposed to a more bare metal API, such as Rackspace. Alternatively, an application may choose to run on MSFT Azure to leverage SQL/MSFT services or Salesforce Force for CRM integration and distribution advantages. Today, one can break cloud platforms into roughly two camps: developer-centric (Amazon, MSFT) and IT-centric (EMC, VMware).
Cloud platforms will remain distinct and diverse as long as they continue to deliver unique value-add for their particular use cases and users.

To drive this cloud diversity point further, the concept of a “cloud within a cloud” is also emerging, where distinct services, such as data warehousing, can be built atop a more generic cloud platform to provide a higher-layer cloud service.

In addition, “private clouds” behind the firewall present yet another flavor of cloud computing as enterprises leverage the benefits of cloud frameworks while maintaining security/control as well as the compliance of their internal datacenters. Lastly, hybrid clouds that bridge private and public clouds on a permanent or temporary basis (the latter also known as “cloud bursting”) will come to fruition for certain applications or as a migration path for enterprises. Several start-ups (Cirtas, CloudSwitch, and Zetta among them) are building products that make the cloud “safe” for enterprises. Innovation will abound to solve the specific issues in all of these various cloud environments.

LOOKING AHEAD

To further parse all this, I hosted a cloud computing panel with an esteemed group of technology thought leaders at Accel’s 15th Stanford Technology Symposium. Needless to say, these panelists had plenty of deep insights, opinions, and predictions about cloud computing.

The panel brought together technologists who view cloud computing through distinctly different lenses: private cloud innovators, public cloud providers, cloud-enabling technology solutions, and cloud infrastructure applications. In wrapping up the panel session, I asked each speaker to conjure up a single prediction for cloud computing in the next few years. Here’s what the experts said:

Jonathan Bryce, CTO/Founder, Mosso (Rackspace): “…I think cloud computing is going to be a mindshift; it’s going to take a while. But I think an economy like this is actually a huge opportunity for entrepreneurs…I think this is a time when resources are scarce; that’s when great businesses end up getting built. And I think part of what’s going to enable some of those businesses is cloud computing, and being able to get started with a lower [barrier to] entry, lower price point, all of those kind of things…”

Mike Olson, CEO/Co-founder, Cloudera: “I think that a lot of what’s been said around here about data is really right on. I predict that in the next 10 years, computer science as computer science isn’t really going to be the place that smart young guys are going to find tremendously rewarding careers. I think that the application of these new compute systems to large data in the sciences will advance humankind substantially. I think that science will be done maybe not even in the lab on the wet bench anymore, but with data, with computer systems looking at vast amounts of data….”

Raghu Ramakrishnan, Chief Scientist for Audience and Research Fellow, Yahoo! Research: “So a lot of the companies that are out there today (Yahoo!, Facebook, Google), they’re all exposing data APIs. Imagine what’s going to happen once large clouds are routinely available to build their own application and you start aggregating your own data, and you have the opportunity to fuse that with all the data that’s out there. Someone’s going to figure out the next big thing, by taking 2 + 2 and coming up with 20.”

Mike Schroepfer, VP Engineering, Facebook: “…one of the things that is going to happen is that people are going to figure out that we need a more blended workload between the cloud and the client. We’ve been operating kind of in the cycle of reincarnation and computer science, moved toward most of the computing happening in the cloud, and my browser effectively being its own terminal. You know, in the last 2 or 3 years, the speed and capability of browsers has been outpacing that of most chips. You’re seeing 2x to 4x improvements in core performance on the engines and VMs in those browsers year on year, which is way outpacing the speed of chip design…So I believe that there will be a couple of people who will figure out ways to blend computation and storage on the client, more gracefully with that on the server, but still provide you with all of the benefits of basically access to my data anywhere I need, and the kind of reliability of the cloud.”

Jayshree Ullal, President and CEO, Arista Networks: “Well, there’s a technology impact but I actually think it’s going to really make CIOs rethink their jobs. Today, you can have a server administrator, an application administrator, a network administrator, and they’re all silos… but you need your general practitioner. And that’s really missing right now in the cloud. So if I had to make a prediction, less on the technology, more on the operational side, I would say for the deployment of this, it’s got to be a generalized IT person, whether that’s the CIO or somebody he or she appoints…”

Rich Wolski, Professor of Computer Science, University of California, Santa Barbara and CTO/Founder, Eucalyptus Systems: “…there’s another revolution coming that’s going to intersect the cloud revolution and that has to do with data simulation…pretty much everything you own is going to be trying to send you data. And you’re going to need, personally, a great deal of storage and compute capacity to be able to deal with that. I think the cloud is going to make that revolution that much quicker to come to us.”

These predictions depict cloud computing as still in its formative phases, but poised to emerge as a fundamental breakthrough in datacenter and IT infrastructure in the years to come. Despite the current macro headwinds, deep innovation and market opportunities in cloud computing will persist. Once this economic storm passes, I’m convinced the sun will shine through, and cloud computing is sure to have many silver linings.

Ping Li is a partner at Accel Partners in Palo Alto and focuses primarily on Information Technology infrastructure and digital media platforms.