
Data Analysis - Journey Through the Cloud

Slides from the Data Analysis webinar in the 2015 Journey Through the Cloud webinar series.

Published in: Technology

Data Analysis - Journey Through the Cloud

  1. 1. Amazon Web Services: Journey Through the Cloud. Data Analysis. Ian Massingham, Technical Evangelist. ianmas@amazon.com, @IanMmmm
  2. 2. Journey Through the Cloud: Common use cases and adoption models for the AWS Cloud. Learn from the journeys taken by other AWS customers. Discover best practices that you can use to bootstrap your projects.
  3. 3. Data Analysis: Collect and store Big Data in the AWS Cloud. Meet the challenge of the increasing volume, variety, and velocity of data. Reduce costs, scale to meet demand & increase the speed of innovation. Make use of solutions for every stage of the big data lifecycle.
  4. 4. Agenda: Why Build Big Data Applications on AWS? Collecting Big Data in the AWS Cloud. Real-time Streaming and Analysis. Big Data Cloud Storage Solutions. AWS Database Services. Analytics with Hadoop with Amazon EMR. Case Studies & Useful Resources.
  5. 5. WHY BUILD BIG DATA APPLICATIONS ON AWS?
  6. 6. It's Never Been Easier And Less Expensive To Collect, Store, Analyze & Share Data
  8. 8. From all types of industries
  9. 9. From a diverse range of sources
  10. 10. AWS Services For Big Data Workloads. Sources of Truth: Amazon S3, Amazon EFS, Amazon Redshift. Real time: Amazon Kinesis. High Performance Databases: Amazon DynamoDB, Amazon Aurora. Analysis Platforms: Amazon EMR.
  11. 11. Broad Analytics Usage In The AWS Cloud. Customers include Novartis, Merck, Airbnb and Bristol-Myers Squibb. Use cases: Discovery, Development, Delivery, Sales, Risk, Marketing, Reporting, Trade.
  12. 12. Financial Times Uses AWS to Reduce Infrastructure Costs by 80%. Needed a way to increase speed, performance and flexibility of data analysis at a low cost. Using AWS enabled FT to run queries 98% faster than previously, helping FT make business decisions quickly. Easier to track and analyze trends. Reduced infrastructure costs by 80% over traditional data center model. "When our [...] to do queries on Redshift, they thought it was broken because it was working so fast." John O'Donovan, CTO, Financial Times. Find out more here: aws.amazon.com/solutions/case-studies/financial-times/
  13. 13. Data pipeline diagram labels: GENERATE, COLLECT, STREAM, STORE, RDBMS, DATA WAREHOUSE, NOSQL, ANALYTICS, ARCHIVE
  14. 14. COLLECTING BIG DATA IN THE AWS CLOUD
  15. 15. Amazon S3 Multipart upload, AWS Import/Export, AWS Direct Connect, AWS Storage Gateway
  16. 16. Amazon S3: Secure, durable, highly-scalable object storage. Accessible via a simple web services interface. Store & retrieve any amount of data. Use alone or together with other AWS services. Amazon S3 Masterclass webinar: https://youtu.be/VCOk-noNwOU
  17. 17. Amazon S3 Multipart Upload: Start with a large file (size < 5TB). Split file into parts. Send parts to S3. S3 rejoins the parts into a large object (size < 5TB).
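The split, send and rejoin steps of a multipart upload can be illustrated with a minimal in-memory sketch. This is a simulation only and does not call the Amazon S3 API; the function names, the part size and the shuffled arrival order are all assumptions made for illustration.

```python
import random


def split_into_parts(data: bytes, part_size: int) -> dict:
    """Split data into numbered parts; part numbers start at 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    offsets = range(0, len(data), part_size)
    return {
        number: data[offset:offset + part_size]
        for number, offset in enumerate(offsets, start=1)
    }


def rejoin_parts(received: dict) -> bytes:
    """Reassemble the object by concatenating parts in part-number order."""
    return b"".join(received[number] for number in sorted(received))


def simulate_multipart_upload(data: bytes, part_size: int) -> bytes:
    """Hypothetical stand-in for an upload: parts arrive in arbitrary order."""
    parts = split_into_parts(data, part_size)
    arrival_order = list(parts)
    random.shuffle(arrival_order)  # mimic independent, parallel sends
    received = {number: parts[number] for number in arrival_order}
    return rejoin_parts(received)


original = b"abcdefghij"
assert simulate_multipart_upload(original, part_size=3) == original
```

Because each part carries its own number, the order in which parts arrive does not matter; the receiver only needs every part before it rejoins them.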
  18. 18. AWS Import/Export: Move large amounts of data into and out of the AWS cloud using portable storage devices. Transfer your data directly onto and off of storage devices using Amazon's high-speed internal network. For significant data sets, AWS Import/Export is often faster than Internet transfer and more cost effective than upgrading your connectivity. Supports upload & download from S3 & upload to Amazon EBS snapshots & Amazon Glacier Vaults. aws.amazon.com/importexport/
  19. 19. When to Use AWS Import/Export. Columns: available Internet connection; theoretical minimum number of days to transfer 1TB at 80% network utilization; when to consider AWS Import/Export. T1 (1.544Mbps): 82 days; 100GB or more. 10Mbps: 13 days; 600GB or more. T3 (44.736Mbps): 3 days; 2TB or more. 100Mbps: 1 to 2 days; 5TB or more. 1000Mbps: less than 1 day; 60TB or more. aws.amazon.com/importexport/
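The "theoretical minimum days" column is simple arithmetic on link speed and utilization. The sketch below works it through for the five link speeds; it assumes 1 TB means 2**40 bytes and that Mbps means 10**6 bits per second, neither of which the slide states.

```python
SECONDS_PER_DAY = 86_400
ONE_TB_BITS = 2**40 * 8  # assumption: 1 TB taken as 2**40 bytes


def transfer_days(link_mbps: float, utilization: float = 0.8,
                  data_bits: int = ONE_TB_BITS) -> float:
    """Days needed to move data_bits over a link at the given utilization."""
    effective_bits_per_second = link_mbps * 1_000_000 * utilization
    return data_bits / effective_bits_per_second / SECONDS_PER_DAY


links = [("T1", 1.544), ("10Mbps", 10), ("T3", 44.736),
         ("100Mbps", 100), ("1000Mbps", 1000)]
for name, mbps in links:
    print(f"{name}: {transfer_days(mbps):.1f} days to move 1 TB")
```

Under these assumptions a T1 line needs roughly 82 days for a single terabyte, which is why shipping storage devices can beat the network for large data sets.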
  20. 20. AWS Direct Connect: Makes it easy to establish a dedicated network connection from your premises to AWS. Establish private connectivity between AWS & your datacenter, office, or colocation environment. Reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience. The dedicated connection can be partitioned into multiple virtual interfaces using 802.1q VLANs. aws.amazon.com/directconnect
  21. 21. AWS Direct Connect Locations & Partners: 1Gbps and 10Gbps ports are available from AWS. 50Mbps, 100Mbps, 200Mbps, 300Mbps, 400Mbps, and 500Mbps can be ordered from any APN partners supporting AWS Direct Connect. aws.amazon.com/directconnect/partners/
  22. 22. AWS Storage Gateway: An on-premises software appliance connecting with cloud-based storage. Supports industry-standard storage protocols that work with your existing applications and workflows. Provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in Amazon S3 or Amazon Glacier. aws.amazon.com/storagegateway/
  23. 23. AWS Storage Gateway: Designed for use with other AWS services. Enables you to easily mirror data from your on-premises environment for access within the AWS Cloud. Easy to integrate into existing ETL workflows. aws.amazon.com/storagegateway/
  24. 24. REAL-TIME STREAMING AND ANALYSIS
  25. 25. Amazon Kinesis
  26. 26. Amazon Kinesis: A fully managed, cloud-based service for real-time data processing over large, distributed data streams. Continuously capture and store terabytes of data per hour from hundreds of thousands of sources. Emit data to other AWS services such as Amazon S3, Amazon Redshift, Amazon Elastic MapReduce (Amazon EMR). aws.amazon.com/kinesis
  28. 28. Using AWS, Dash Streams More Than 1 TB of Real-Time Data Per Day. "As a startup, using AWS has allowed us to scale nicely and use resources without spending a lot of capital." Brian Langel, CTO, Dash. Needed to scale IT resources to create an app that would offer real-time information to drivers. Developed and deployed the Dash application on the AWS Cloud. Streams more than 1 TB of real-time data per day using Amazon Kinesis and processes billions of entries using Amazon DynamoDB. Scaled up to support large traffic spikes (several thousand updates per second) in app usage. Reduced operating costs by $200,000 per year. Find out more here: aws.amazon.com/solutions/case-studies/dash/
  29. 29. Amazon Kinesis Architecture (within an Amazon Web Services Region): Millions of sources producing 100s of TB per hour. Authentication and authorization. Ordered stream of events supporting multiple readers. Durable, consistent replicas across three AWS Availability Zones. Inexpensive: $0.0165 per million PUT Payload Units (in EU Ireland). Readers: aggregate and archive to S3; real-time dashboards and alarms; machine learning algorithms; aggregate analysis in Hadoop or a data warehouse.
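To get a feel for the quoted price, the sketch below estimates PUT charges for a batch of records. It uses the $0.0165 per million PUT Payload Units figure from the slide and assumes that one payload unit covers up to 25 KB of a record, with 1 KB taken as 1024 bytes; the function names are hypothetical and current pricing may differ.

```python
import math

PRICE_PER_MILLION_UNITS_USD = 0.0165  # EU (Ireland) figure quoted on the slide
PAYLOAD_UNIT_BYTES = 25 * 1024        # assumption: one unit per 25 KB chunk


def payload_units(record_bytes: int) -> int:
    """Number of PUT Payload Units consumed by a single record."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def put_cost_usd(record_count: int, record_bytes: int) -> float:
    """Estimated PUT charge for record_count records of equal size."""
    units = record_count * payload_units(record_bytes)
    return units / 1_000_000 * PRICE_PER_MILLION_UNITS_USD


# One million 1 KB records fit in one unit each.
small_batch_cost = put_cost_usd(1_000_000, 1024)
# Two million 30 KB records need two units each.
large_batch_cost = put_cost_usd(2_000_000, 30 * 1024)
print(f"{small_batch_cost:.4f} USD, {large_batch_cost:.4f} USD")
```

This covers PUT charges only; any other charges for running a stream are outside the scope of the sketch.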
  30. 30. Amazon Kinesis announces PUT pricing change, 1MB record support, and the Kinesis Producer Library. Posted on: Jun 2, 2015. Amazon Kinesis can continuously capture and store terabytes of data per hour from hundreds of thousands of sources. The announcement covers a PUT pricing change and two new features. Effective June 1, 2015, the "PUT Record" pricing dimension is replaced by "PUT Payload Unit"; customers putting records smaller than 25KB receive a 50% reduction in PUT price. The record size limit increases from 50KB to 1MB. The Kinesis Producer Library (KPL) is an easy-to-use and highly configurable library that helps put data into an Amazon Kinesis stream.
  31. 31. BIG DATA CLOUD STORAGE SOLUTIONS
  32. 32. Amazon S3, Amazon Glacier, Amazon EBS
  33. 33. Amazon S3: Secure, durable, highly-scalable object storage. Accessible via a simple web services interface. Store & retrieve any amount of data. Use alone or together with other AWS services. Amazon S3 Masterclass webinar: https://youtu.be/VCOk-noNwOU
  34. 34. Amazon S3: Allows you to decouple compute from storage for analytics workloads. Amazon S3 Masterclass webinar: https://youtu.be/VCOk-noNwOU
  35. 35. Amazon Glacier. Durable: designed for 99.999999999% durability of archives. Cost effective: write-once, read-never; cost effective for long term storage; pay for accessing data. aws.amazon.com/glacier
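A durability figure of 99.999999999% is easier to interpret as an expected loss rate. The sketch below assumes the figure is an average annual per-object durability and uses a hypothetical archive of 10,000 objects; exact fractions avoid floating-point rounding with so many nines.

```python
from fractions import Fraction


def expected_annual_losses(object_count: int, durability: str) -> Fraction:
    """Expected number of objects lost per year at the given durability.

    durability is a decimal string such as "0.99999999999" (11 nines).
    """
    annual_loss_rate = 1 - Fraction(durability)
    return object_count * annual_loss_rate


# Hypothetical archive of 10,000 objects at 99.999999999% durability.
losses = expected_annual_losses(10_000, "0.99999999999")
years_per_lost_object = 1 / losses
print(f"Expected losses per year: {float(losses):.1e}")
print(f"Average years per single lost object: {int(years_per_lost_object):,}")
```

On these assumptions, the design target corresponds to losing a single object out of 10,000 about once every ten million years on average.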
  36. 36. Amazon Elastic Block Store (EBS): Persistent block level storage volumes for use with Amazon EC2 instances. Automatically replicated within Availability Zones. Offer consistent and low-latency performance. Diagram: EC2 Instance, EBS Volume, EBS Snapshot (stored on S3). aws.amazon.com/ebs
  37. 37. Elastic Block Store (Amazon EBS): 1GB to 16TB volumes; up to 20,000 IOPS per volume with EBS PIOPS; very fast block devices to attach to EC2 instances. Simple Storage Service (Amazon S3): highly scalable object store; objects from 1 byte to 5TB; 99.999999999% durability; fast, API-accessible object storage. Amazon Glacier: long term archive storage; extremely low cost per GB; 99.999999999% durability; 3-5 hour access latency; intended for write once, read never use-cases.
  38. 38. AWS DATABASE SERVICES
  39. 39. Amazon RDS (RDBMS), Amazon Redshift (data warehouse), Amazon DynamoDB (NoSQL)
  40. 40. Amazon Relational Database Service (RDS): Easy to set up, operate, and scale a relational database. Provides cost-efficient and resizable capacity. Manages time-consuming database management tasks. aws.amazon.com/rds/
  41. 41. Amazon Redshift: A fast, fully managed, petabyte-scale data warehouse. Cost-effectively & efficiently analyze all your data. Use existing Business Intelligence tools. Fast query performance using columnar storage technology. aws.amazon.com/redshift/
  42. 42. Getting Started with Amazon Redshift: 2 Month Free Trial. 8 Step Getting Started Tutorial. Best Practices Guides: loading data, table design & performance tuning. Cluster Management Guide. aws.amazon.com/redshift/getting-started/
  43. 43. BI & ETL Tools for Amazon Redshift. aws.amazon.com/redshift/partners/
  44. 44. Amazon DynamoDB: A fast and flexible NoSQL database service. Consistent, single-digit millisecond latency at any scale. A fully managed cloud database. Supports both document and key-value store models. Flexible data model and reliable performance. aws.amazon.com/dynamodb/
  45. 45. ANALYTICS WITH HADOOP & AMAZON EMR
  46. 46. Amazon EMR ANALYTICS
  47. 47. AMAZON ELASTIC MAPREDUCE A MANAGED HADOOP FRAMEWORK
  48. 48. HADOOP DISTRIBUTED FILESYSTEM (HDFS) + DISTRIBUTED PROCESSING ENGINE (MAPREDUCE)
  49. 49. Amazon Elastic MapReduce (EMR): A managed Hadoop framework. Quickly & cost-effectively process vast amounts of data. Dynamically scale across fleets of Amazon EC2 instances. Run other popular distributed frameworks such as Spark. aws.amazon.com/emr/
  50. 50. Amazon Elastic MapReduce (EMR): Splits data in pieces using the HDFS filesystem. Manages distributed access to data and task execution. Gathers the results and deposits these in S3 for access.
  51. 51. Very large clickstream logging data (e.g. TBs)
  52. 52. Lots of actions by John Smith. Very large clickstream logging data (e.g. TBs)
  53. 53. Lots of actions by John Smith. Very large clickstream logging data (e.g. TBs). Split the log into many small pieces
  54. 54. Process in an EMR cluster. Lots of actions by John Smith. Very large clickstream logging data (e.g. TBs). Split the log into many small pieces
  55. 55. Process in an EMR cluster. Lots of actions by John Smith. Aggregate the results from all the nodes
  56. 56. Process in an EMR cluster. Lots of actions by John Smith. Aggregate the results from all the nodes. What John Smith did
  57. 57. Very large clickstream logging data (e.g. TBs). Insight in a fraction of the time. What John Smith did
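The clickstream walkthrough on the preceding slides (split the log, process the pieces, aggregate the results) follows the general map and reduce pattern, which can be sketched on a single machine in plain Python. This is not EMR or Hadoop code; the tab-separated log format, the sample records and the function names are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def split_log(lines, piece_size):
    """Split the log into many small pieces."""
    return [lines[i:i + piece_size] for i in range(0, len(lines), piece_size)]


def map_piece(piece):
    """Count (user, action) pairs in one piece; stands in for one node's work."""
    counts = Counter()
    for line in piece:
        user, action = line.split("\t")
        counts[(user, action)] += 1
    return counts


def aggregate(partials):
    """Merge the partial counts produced by all the nodes."""
    return reduce(lambda left, right: left + right, partials, Counter())


def actions_for(user, totals):
    """Extract what a single user did from the aggregated counts."""
    return {action: n for (name, action), n in totals.items() if name == user}


# Hypothetical sample log: one "user<TAB>action" record per line.
log = [
    "john_smith\tview",
    "jane_doe\tview",
    "john_smith\tclick",
    "john_smith\tview",
    "jane_doe\tbuy",
]
totals = aggregate(map_piece(piece) for piece in split_log(log, piece_size=2))
print(actions_for("john_smith", totals))
```

In a real cluster each piece would be processed on a different node in parallel, which is where the "insight in a fraction of the time" comes from; the merge step is the same idea at larger scale.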
  58. 58. Diagram labels: Data management; Analytics languages/engines; MapReduce; Amazon EMR; Amazon RDS; Amazon Kinesis; AWS Data Pipeline; Amazon Redshift; Amazon S3; Amazon DynamoDB
  59. 59. DEMO: ANALYZING AMAZON S3 ACCESS LOGS WITH EMR AND HUE
  60. 60. PREDICTIVE ANALYTICS WITH AMAZON MACHINE LEARNING
  61. 61. More & More Customers Are Using Prediction Technologies. Logos: change.org, Netflix, Prismatic, MetaMind. Uses: email targeting, recommendations, social news, digital health, language processing, auto-scaling.
  62. 62. Large opportunity to apply ML. Low barrier to entry.
  63. 63. Introducing Amazon Machine Learning: Easily create machine learning models. Visualize and optimize models. Put models into production in seconds. Battle-hardened technology. aws.amazon.com/ml/
  64. 64. Easy to Use, High Performance: Train and optimize models on GBs of data. Batch process predictions. Real-time prediction API in one-click. No servers to provision or manage.
  65. 65. Build model. Validate & optimize. Make predictions. Batch predictions: asynchronous predictions with trained model. Real-time predictions: synchronous, low latency, high throughput; mount API end-point with a single click.
  66. 66. RESOURCES YOU CAN USE TO LEARN MORE
  67. 67. aws.amazon.com/big-data/
  68. 68. aws.amazon.com/importexport, aws.amazon.com/directconnect, aws.amazon.com/kinesis, aws.amazon.com/rds, aws.amazon.com/redshift, aws.amazon.com/elasticmapreduce
  69. 69. AWS White Paper: Big Data Analytics Options on AWS
  72. 72. blogs.aws.amazon.com/bigdata/
  73. 73. Web Application Hosting: System Overview
  74. 74. AWS Training & Certification. Self-Paced Labs: Try products, gain new skills, and get hands-on practice working with AWS technologies. aws.amazon.com/training/self-paced-labs. Training: Build technical expertise to design and operate scalable, efficient applications on AWS. aws.amazon.com/training. Certification: Validate your proven skills and expertise with the AWS platform. aws.amazon.com/certification
  75. 75. Ian Massingham, Technical Evangelist. @IanMmmm. @AWS_UKI for local AWS events & news. @AWScloud for Global AWS News & Announcements.
