Amazon Web Services: EMR (Elastic Map Reduce) with ITOC Australia - What Is EMR/Hadoop?
 

David Nedved (ITOC Australia) returns to give you the run down on using Amazon's Elastic MapReduce to complete complex queries on large scale data sets.


    Presentation Transcript

    • Elastic MapReduce on AWS Cloud
      http://linkedin.com/in/davidnedved
    • What is MapReduce?
      MapReduce is a programming model that comes from functional programming (like LISP).
      MapReduce is a "framework" for:
      ● Processing parallel problems
      ● Across HUGE datasets
      ● Using a LARGE number of computers!
      Made "popular" by Google
      ● Used to build the search index that maps "terms" to "pages" (an inverted index; this is separate from Google's PageRank ranking algorithm)
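
The term-to-page index mentioned on the slide above is a convenient way to see the model in miniature. The following is a minimal single-process sketch, not Hadoop or Google code: the sample pages and the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) are assumptions made for illustration. In a real cluster the map and reduce calls would run in parallel on many machines, and the framework would perform the shuffle.

```python
from collections import defaultdict

# Hypothetical sample "pages"; not taken from the talk.
PAGES = {
    "page1": "hadoop runs mapreduce jobs",
    "page2": "emr runs hadoop on aws",
}


def map_phase(page_id, text):
    """Map: emit one (term, page_id) pair for every word on the page."""
    return [(term, page_id) for term in text.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key (the term)."""
    grouped = defaultdict(list)
    for term, page_id in pairs:
        grouped[term].append(page_id)
    return grouped


def reduce_phase(term, page_ids):
    """Reduce: collapse one term's values into a sorted, de-duplicated list."""
    return term, sorted(set(page_ids))


def build_index(pages):
    """Run map, shuffle and reduce in sequence to build a term -> pages index."""
    pairs = [pair for page_id, text in pages.items()
             for pair in map_phase(page_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


index = build_index(PAGES)
print(index["hadoop"])  # ['page1', 'page2']
```

Because each map call only sees one page and each reduce call only sees one term, both phases can be spread over many machines without coordination, which is the point of the model.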
    • OK, so...?
      ● Primary data on the internet has grown exponentially in the last 10 years...
      ● Secondary data has gone "off the scale"...
      ● Uh?
        ○ We seem to log everything and "ask questions later"
      For instance:
      ● Recommendations (books, restaurants, etc.)
      ● Predicting trends (job skills in demand, Amazon's recent ...)
      ● Showing customised ads on my site, etc.
      ● Recording every query a user makes on my site http://w3dt.net/
      ● Big Data is no longer a problem only for the big boys (Google, Microsoft, etc.)
      ● Startups are "epically failing" to get on top of their big data...
    • The "Big Boys" You: SME/Startup
    • Hadoop
      ● Hadoop can help with BigData
        ○ It's proven in the field
        ○ Under active development
        ○ Will only get cheaper as hardware/AWS prices drop!
      ● Cheaper storage and retrieval (through a limited SQL interface)
      ● Easier parallel programming
      ● Scalability for storage/retrieval
      "OK, so is Hadoop a database?" NO, NO, NO! Hadoop is a processing platform. It combines data storage, retrieval and programming into a single highly scalable package.
    • Hadoop on AWS = EMR
      IT IS API DRIVEN :)
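
To make "API driven" concrete, here is a hedged sketch of what starting a Hive-enabled cluster through the API might look like from Python. It was not part of the talk. The helper `build_job_flow_request`, the cluster name, the instance types and the release label are all assumptions chosen for illustration; the dictionary keys follow the parameters of the `run_job_flow` call in the boto3 EMR client. The block only builds and prints the request, so it runs without AWS credentials.

```python
import json


def build_job_flow_request(name, core_nodes):
    """Build keyword arguments for a small Hive-enabled EMR cluster (illustrative values)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumption: pick a release you actually use
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption: example instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": core_nodes + 1,    # one master plus the core nodes
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # default EMR role names
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_job_flow_request("usergroup-demo", core_nodes=2)
print(json.dumps(request, indent=2))

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review or version before anything is launched and billed.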
    • EMR simply kicks ass
      ● Import/export your BigData to the AWS platform quickly
      ● Multipart upload (S3)
      ● Resize running job flows
      ● Balance cost and performance
      ● Resize based on usage patterns
      ● Access control --> IAM, VPC, everything else in standard EC2...
    • EMR in AWS console
    • For example...
      EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
      ● By exporting rarely used data to S3 you will save $$$.
      ● Exported data in S3 is directly queryable (via EMR).
      ● Join exported tables with current DynamoDB tables!
      Create a Hive table (notice the S3 location):
        -- code is the partition column, so Hive does not allow it to be
        -- repeated in the regular column list
        CREATE EXTERNAL TABLE sms_prices_s3 (
          country string,
          network int,
          networkname string,
          price string
        )
        PARTITIONED BY (code string)
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        LOCATION 's3://itoc-usergroup/sample';
    • For example...
      Querying the external table (data in S3):
        SELECT code, country, networkname, price
        FROM sms_prices_s3
        WHERE code = 'AU';
      ● Remember: you can run EMR (Hadoop) on just about ANY form of data!
      ● Use EMR to query your NoSQL DB with SQL-like queries (:
      ● Store your BigData in S3, DynamoDB, etc. and let AWS handle durability (S3 is designed for 99.999999999% durability)
      DynamoDB catch-out...
      If you want to query DynamoDB using Hadoop you MUST use EMR... The library for Hive isn't available for your own EC2 instances.
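
For readers without a cluster at hand, the effect of the query on the slide above can be mimicked locally. This is a rough sketch under stated assumptions: the sample rows are invented, the helper `select_by_code` is hypothetical, every row is assumed to carry all five comma-separated fields in the order listed on the slide (code, country, network, networkname, price), and partitioning is ignored. Hive on EMR would apply the same filter and projection, but in parallel over files stored in S3.

```python
import csv
import io

# Column order as listed on the slide's table definition.
COLUMNS = ["code", "country", "network", "networkname", "price"]

# Hypothetical comma-delimited rows; not real pricing data.
SAMPLE = """AU,Australia,1,NetworkA,0.05
AU,Australia,2,NetworkB,0.04
NZ,New Zealand,1,NetworkC,0.06
"""


def select_by_code(text, code):
    """Emulate: SELECT code, country, networkname, price ... WHERE code = <code>."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    return [
        (row["code"], row["country"], row["networkname"], row["price"])
        for row in reader
        if row["code"] == code
    ]


rows = select_by_code(SAMPLE, "AU")
for row in rows:
    print(row)
```

All values stay strings here, which matches the `price string` column type used on the slide.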
    • A few real life examples
      ● Data analytics: Google Analytics/Quantcast
      ● Crawling: Google Search
      ● Full-text indexing: just about every HUGE system
      ● Data mining: LinkedIn Maps (:
    • Thank You!
      http://aws.amazon.com/elasticache
      Amazon Elastic MapReduce
      Email: david.nedved@itoc.com.au
      http://linkedin.com/in/davidnedved