Big Data on AWS
 

Usage Rights

© All Rights Reserved


    Big Data on AWS: Presentation Transcript

    • Big Data on AWS, Paul Duffy
    • Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud in the Real World
    • Characteristics of Big Data
    • The cost of data generation is falling rapidly. Dramatic increase in volume, velocity and variety of data.
    • BIG DATA: A collection of tools, techniques and technologies that allow you to work productively with data at any scale.
    • Big Data is Getting Bigger: 2.7 zettabytes in 2012. Over 90% will be unstructured. Data spread across a wide array of silos.
    • Features driven by MapReduce
    • Variable data structures and sources. Computer generated: application server logs (web sites, games); sensor data (weather, water, smart grids); images/videos (traffic, security cameras). Human generated: Twitter “Fire Hose” (50m tweets/day, 1,400% growth per year); blogs/reviews/emails/pictures; social graphs (Facebook, LinkedIn, contacts).
    • The Role of Data is Changing
    • Traditional analytics required a fixed data model, based on pre-known questions. Big Data promotes data exploration and experimentation, which leads to innovation.
    • Generation; Collection & storage; Computation & analytics; Collaboration & sharing
    • Lower costs, faster throughput. Generation; Collection & storage; Computation & analytics; Collaboration & sharing. Increased pressure on traditional IT and tools.
    • Require tools designed for data collection and computation at any volume, velocity or format.
    • Software: designed for distribution; easy programming models; flexible language choice; platform for abstraction and ecosystem. Good example: Hadoop
    • Infrastructure: designed for distribution; easy programming models; flexible language choice; platform for abstraction and ecosystem. Good example: Cloud computing
    • Software Infrastructure
    • How the Cloud Is Big Data’s Best Friend
    • How do we define the cloud? By Benefits!
    • Cloud: No CapEx; Pay Per Use; Elasticity; Fast Time to Market; Focus on core competency
    • Why is the Cloud Big Data’s Best Friend?
    • We know we want to collect, store, organize, analyze and share it. But we have limited resources.
    • The Cloud Optimizes Precious IT Resources, i.e. Skilled People
    • “Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x. While the pool of IT staff available to manage them will grow only slightly. At 1.5x” - 2011 IDC Digital Universe Study
    • Deploying a Hadoop cluster is hard
    • Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”
    • Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”. Cloud-Based Infrastructure: 70% Analyzing and Using Big Data; 30% Configuring Cloud Assets
    • Managed Services: Reusability; Scale; Innovation
    • The Cloud Optimizes Capacity Resources
    • Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
    • Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks. WASTE. CUSTOMER DISSATISFACTION.
    • Elastic Compute Capacity [chart: capacity over time, showing traditional IT capacity and elastic cloud capacity against your IT needs]
    • Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
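The "waste" and "customer dissatisfaction" labels on the capacity slides can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the deck: the hourly demand figures and the function name `fixed_capacity_gap` are made up for illustration. It counts idle instance-hours (waste) and unmet instance-hours (shortfall) when capacity is fixed; elastic capacity that tracks demand would drive both toward zero.

```python
def fixed_capacity_gap(demand, capacity):
    """Return (waste, shortfall) in instance-hours for a fixed capacity.

    waste     = provisioned capacity that sits idle
    shortfall = demand that cannot be served
    """
    waste = sum(max(capacity - d, 0) for d in demand)
    shortfall = sum(max(d - capacity, 0) for d in demand)
    return waste, shortfall


# Hypothetical hourly demand (instances needed); values are illustrative only.
demand = [2, 2, 3, 10, 12, 4, 2, 2]

# Provisioning below the peak wastes capacity off-peak and still falls short at peak.
waste, shortfall = fixed_capacity_gap(demand, capacity=8)
print(f"capacity=8:  waste={waste}, shortfall={shortfall}")

# Provisioning for the peak removes the shortfall but increases the waste.
peak_waste, peak_shortfall = fixed_capacity_gap(demand, capacity=max(demand))
print(f"capacity={max(demand)}: waste={peak_waste}, shortfall={peak_shortfall}")
```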
    • The Cloud Empowers Users to Balance Cost and Time
    • 1 instance for 500 hours = 500 instances for 1 hour. I like this! I scale
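The equivalence on the slide above can be checked with a short calculation. This is a sketch under two stated assumptions that the deck does not spell out: the work parallelizes perfectly, and billing is a flat rate per instance-hour. The price used is hypothetical, as is the function name `run_plan`.

```python
def run_plan(total_instance_hours, instances, price_per_instance_hour):
    """Return (wall_clock_hours, cost) for a perfectly parallel job."""
    wall_clock_hours = total_instance_hours / instances
    cost = instances * wall_clock_hours * price_per_instance_hour
    return wall_clock_hours, cost


PRICE = 0.10  # hypothetical price per instance-hour, for illustration only

slow_hours, slow_cost = run_plan(500, instances=1, price_per_instance_hour=PRICE)
fast_hours, fast_cost = run_plan(500, instances=500, price_per_instance_hour=PRICE)

print(f"1 instance:    {slow_hours:.0f} h, cost {slow_cost:.2f}")
print(f"500 instances: {fast_hours:.0f} h, cost {fast_cost:.2f}")
```

Under these assumptions the cost is the same in both plans; only the time to a result changes, which is the trade-off the slide points at.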
    • The Cloud Reduces Cost For Experimentation
    • The Cloud Enables Collection and Storage of Big Data
    • Storage Costs are Declining
    • Simple Storage Service: 1 Trillion. 750k+ peak transactions per second. [chart]
    • Global Accessibility. Regions: US-WEST (N. California); US-WEST (Oregon); US-EAST (Virginia); GOV CLOUD; EU-WEST (Ireland); ASIA PAC (Tokyo); ASIA PAC (Singapore); SOUTH AMERICA (Sao Paulo)
    • Amazon DynamoDB: Managed NoSQL database service. Unlimited size. Unlimited scale. Flexible key/value store. Consistent, low latencies (single digit milliseconds, SSD). Robust, durable data storage. Integrated analytics with Elastic MapReduce.
    • Amazon Elastic MapReduce: On-demand, managed analytics platform. Powered by Hadoop. Integrated with Spot instances to lower costs. Vibrant ecosystem of tools. Elastic clusters. Flexible programming model (Java, Python, Ruby etc).
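The MapReduce programming model mentioned on the slides can be illustrated with a word count that runs locally. This is only a sketch of the model (map, shuffle, reduce) in plain Python; it does not use Hadoop or any Elastic MapReduce API, and all function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data on the cloud", "Big Data is getting bigger"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, a framework can spread both phases across many machines, which is what makes the model a fit for elastic clusters.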
    • Big Data on the Cloud In the Real World
    • Big Data Verticals. Media/Advertising: targeted advertising; image and video processing. Oil & Gas: seismic analysis. Retail: recommendations; transactions analysis. Life Sciences: genome analysis. Financial Services: Monte Carlo simulations; risk analysis. Security: anti-virus; fraud detection; image recognition. Social Network/Gaming: user demographics; usage analysis; in-game metrics.
    • Visualizations
    • Bank – Monte Carlo Simulations: 23 Hours to 20 Minutes. “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
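Monte Carlo simulations suit elastic capacity because the trials are independent and can be split into batches. The sketch below is a generic illustration and not Bankinter's model: the return parameters, the 10-day horizon, and the function names are all hypothetical. Each batch uses its own seed, so batches could in principle run on separate instances and be merged afterwards; here they run in a simple loop.

```python
import random


def simulate_batch(seed, trials, mean=0.0005, stdev=0.01, days=10):
    """Simulate `trials` portfolio paths and return the loss of each path.

    Daily returns are drawn from a normal distribution (an assumption
    made purely for illustration).
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        value = 1.0
        for _ in range(days):
            value *= 1.0 + rng.gauss(mean, stdev)
        losses.append(1.0 - value)
    return losses


def value_at_risk(losses, confidence=0.95):
    """Return the loss at the given confidence level (simple empirical quantile)."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Four independent batches; each one is a unit of work that could be farmed out.
batches = [simulate_batch(seed, trials=1000) for seed in range(4)]
all_losses = [loss for batch in batches for loss in batch]
var_95 = value_at_risk(all_losses)
print(f"{len(all_losses)} trials, 95% value at risk: {var_95:.4f}")
```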
    • Recommendations: The Taste Test http://www.etsy.com/tastetest
    • Recommendations: Gift Ideas for Facebook Friends etsy.com/gifts
    • Click Stream Analysis: User recently purchased a sports movie and is searching for video games. Targeted Ad (1.7 Million per day)
    • Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud in the Real World
    • Thank you…