AWS Canberra WWPS Summit 2013 - Big Data with AWS

Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.

Published in: Technology
Transcript of "AWS Canberra WWPS Summit 2013 - Big Data with AWS"

  1. 2013 AWS WWPS Summit, Canberra, Australia
      Big Data with AWS
      Glenn Gore, Sr Manager, AWS
  2. 2013 AWS WWPS Summit, Canberra – May 23
      Overview
      • The Big Data Challenge
      • Big Data tools and what can we do with them?
      • Packetloop – Big Data Security Analytics
      • Intel technology on big data
  3. An engineer’s definition
      When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share it
  4. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  5. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
      Lower cost, higher throughput
  6. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
      Lower cost, higher throughput
      Highly constrained
  7. Chart labels: Data volume; Generated data; Available for analysis
      Sources:
      • Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
      • IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  8. Amazon Web Services helps remove constraints
  9. Remove constraints = More experimentation
      More experimentation = More innovation
      More innovation = Competitive edge
  10. Elastic MapReduce and Redshift: Big Data tools
  11. EMR is Hadoop in the Cloud
  12. What is Amazon Redshift?
      Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud
      • Easy to provision and scale
      • No upfront costs, pay as you go
      • High performance at a low price
      • Open and flexible with support for popular BI tools
  13. Elastic MapReduce and Redshift: Big Data tools
  14. How does EMR work?
      (Diagram: EMR, EMR Cluster, S3)
      • Put the data into S3
      • Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc.
      • Launch the cluster using the EMR console, CLI, SDK, or APIs
      • Get the output from S3
      • You can also store everything in HDFS
  15. What can you run on EMR…
      (Diagram: S3, EMR, EMR Cluster)
  16. Resize Nodes
      (Diagram: EMR, EMR Cluster, S3)
      You can easily add and remove nodes
  17. Resize Nodes with Spot Instances
      Cost without Spot: 10 node cluster running for 14 hours
      Cost = 1.2 * 10 * 14 = $168
  18. Resize Nodes with Spot Instances
      Cost without Spot: 10 node cluster running for 14 hours
      Cost = 1.2 * 10 * 14 = $168
      Add 10 nodes on Spot: 20 node cluster running for 7 hours
      Cost = 1.2 * 10 * 7 = $84 (on-demand nodes)
           + 0.6 * 10 * 7 = $42 (Spot nodes)
  19. Resize Nodes with Spot Instances
      Cost without Spot: 10 node cluster running for 14 hours
      Cost = 1.2 * 10 * 14 = $168
      Add 10 nodes on Spot: 20 node cluster running for 7 hours
      Cost = 1.2 * 10 * 7 = $84 (on-demand nodes)
           + 0.6 * 10 * 7 = $42 (Spot nodes)
           = Total $126
      25% reduction in price
      50% reduction in time
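
The Spot Instance arithmetic on slides 17 to 19 can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the function and variable names are made up for illustration, the rates 1.2 and 0.6 are read as dollars per node-hour for on-demand and Spot nodes respectively, and the run time is assumed to halve when the node count doubles, as the slides do.

```python
def cluster_cost(nodes, hours, hourly_rate):
    """Cost in dollars of running `nodes` instances for `hours` at `hourly_rate`."""
    return nodes * hours * hourly_rate


ON_DEMAND_RATE = 1.2  # $/node-hour, taken from the slides
SPOT_RATE = 0.6       # $/node-hour, taken from the slides (assumed to be the Spot price)

# Baseline: 10 on-demand nodes for 14 hours.
baseline = cluster_cost(10, 14, ON_DEMAND_RATE)

# With Spot: the same 10 on-demand nodes plus 10 Spot nodes, for 7 hours.
with_spot = cluster_cost(10, 7, ON_DEMAND_RATE) + cluster_cost(10, 7, SPOT_RATE)

price_reduction = 1 - with_spot / baseline
time_reduction = 1 - 7 / 14

print(f"baseline:  ${baseline:.0f}")
print(f"with Spot: ${with_spot:.0f}")
print(f"price reduction: {price_reduction:.0%}, time reduction: {time_reduction:.0%}")
```

The saving depends entirely on the Spot rate and on the job scaling linearly; real Spot prices vary and Spot nodes can be reclaimed, so the figures are illustrative only.
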
  20. Ad-Hoc Clusters – What are they? (1)
      (Diagram: EMR Cluster, S3)
      When processing is complete, you can terminate the cluster (and stop paying)
  21. Ad-Hoc Clusters – When to use (1)
      (Diagram: EMR Cluster, S3)
      • Not using HDFS
      • Not using the cluster 24/7
      • Transient jobs
  22. “Alive” Clusters – What are they? (2)
      (Diagram: EMR, EMR Cluster, S3)
      If you run your jobs 24 x 7, you can also run a persistent cluster and use RI (Reserved Instance) models to save costs
  23. “Alive” Clusters – When? (2)
      (Diagram: EMR, EMR Cluster, S3)
      • Frequently running jobs
      • Dependencies on map-reduce-map outputs
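
The difference between an ad-hoc and an "alive" cluster can be sketched as a single flag in the launch request. The example below is not from the slides: it builds the keyword arguments for the `run_job_flow` call of today's boto3 EMR client without contacting AWS. The helper name, cluster names, bucket, release label and instance types are all assumptions chosen for illustration, and the API shown is newer than the 2013 talk.

```python
def emr_cluster_request(name, log_uri, keep_alive, core_nodes=2):
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    keep_alive=False gives an ad-hoc cluster that terminates once its steps
    finish; keep_alive=True gives a persistent ("alive") cluster.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release, not from the slides
        "Applications": [{"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": core_nodes,
                },
            ],
            # The flag that separates the two cluster styles.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical bucket and cluster names.
ad_hoc = emr_cluster_request("nightly-report", "s3://example-bucket/logs/", keep_alive=False)
alive = emr_cluster_request("shared-analytics", "s3://example-bucket/logs/", keep_alive=True)

# To launch for real (requires AWS credentials and incurs cost):
#   import boto3
#   boto3.client("emr").run_job_flow(**ad_hoc)
```

Keeping the request as plain data makes it easy to review before any cluster is started.
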
  24. S3 instead of HDFS (3)
      (Diagram: S3, EMR, EMR Cluster)
      • S3 provides 99.99999999999% durability
      • Elastic
      • Version control against failure
      • Run multiple clusters with a single source of truth
      • Quick recovery from failure
      • Continuously resize clusters
  25. S3 and HDFS (4)
      (Diagram: S3, EMR, EMR Cluster, HDFS, S3DistCp)
      • Load data from S3 using S3DistCp
      • Benefits of HDFS
      • Master copy of the data in S3
      • Get all the benefits of S3
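
Loading data from S3 into HDFS with S3DistCp, as on slide 25, can be expressed as an EMR step. The sketch below is an illustration rather than the presenter's setup: it assumes a recent EMR release where `s3-dist-cp` is invoked through `command-runner.jar`, and the helper name, bucket and paths are hypothetical.

```python
def s3distcp_step(src, dest):
    """Build an EMR step definition that copies `src` to `dest` with S3DistCp."""
    return {
        "Name": "Copy input from S3 to HDFS",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", src, "--dest", dest],
        },
    }


# Hypothetical locations: the master copy stays in S3, a working copy goes to HDFS.
step = s3distcp_step("s3://example-bucket/input/", "hdfs:///data/input/")

# To submit to a running cluster (requires AWS credentials):
#   import boto3
#   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```
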
  26. Elastic MapReduce and Redshift: Big Data tools
  27. Reporting Data-warehouse (1)
      (Diagram: RDBMS, Redshift, OLTP, ERP, Reporting and BI)
  28. Live Archive for (Structured) Big Data (2)
      (Diagram: DynamoDB, Redshift, OLTP, Web Apps, Reporting and BI)
  29. Cloud ETL for Big Data (3)
      (Diagram: Redshift, Reporting and BI, Elastic MapReduce, S3)
  30. Columns: Streaming, Hive, Pig, DynamoDB, Redshift
      Unstructured Data: ✓ ✓
      Structured Data: ✓ ✓ ✓ ✓
      Language Support: Any* (Streaming), HQL (Hive), Pig Latin (Pig), Client (DynamoDB), SQL (Redshift)
      SQL: ✓
      SQL-Like: ✓
      Volume: Unlimited (Streaming), Unlimited (Hive), Unlimited (Pig), Relatively Low (DynamoDB), 1.6 PB (Redshift)
      Latency: Medium (Streaming), Medium (Hive), Medium (Pig), Ultra Low (DynamoDB), Low (Redshift)
  31. Collection & storage; Analytics & computation; Collaboration & sharing; Remove Constraints; Generation
  32. South Australia Water Data Management on AWS
      Carnegie Mellon University
      Dr. Murlikrishna Viswanathan, Srinivasan Vembuli, Rikio Chiba, Romeo Luka
  33. Agenda
      1. Project Background
      2. Water Management in South Australia
      3. Water Data on Cloud (Case in SA)
      4. Future Roadmap
  34. Project Background
      • “Australia is the driest inhabited continent on Earth, yet is among the world’s highest consumers of water.” - CSIRO: Water overview
  35. National Water Initiative
      • A shared agreement by State Governments to increase the efficiency of Australia’s water use. Under this initiative, State Governments have made commitments to:
        I. Prepare water plans with provisions for the environment
        II. Deal with over-allocated or stressed water systems
        III. Introduce registers of water rights and standards for water accounting
        IV. Expand the trade of water
        V. Improve pricing for water storage and delivery
        VI. Meet and manage urban water demands
      http://www.nationalwatermarket.gov.au/rules-restrictions/national-rules.html
  36. Water Data in South Australia (SA)
      • In SA, the Department of Environment Water and Natural Resources (DEWNR) collects water related data from various sources
      • The data is stored in multiple systems
        • Hydstra (Legacy Foxpro DB)
        • SQL Server Data Warehouse
      • This data is currently supplied to Bureau Of Meteorology (BOM) for its analytics applications and other agencies
  37. Current Process at DEWNR
      (Diagram columns: Data Source; Storage / Application; Output)
      (Diagram labels: Other Data, Field Sensors, Raw Data, Foxpro DB, Hydstra, SQL Server, GIS Application, WDTF, Analysis Data Mart)
  38. Water Data Transfer Format (WDTF)
      • DEWNR and BOM are using data generated from the current process in Water Data Transfer Format (WDTF)
      • Water Data Transfer Format is a National XML standard for exchanging water information
  39. Current Limitations
      • The current architecture relies on multiple systems running on legacy software, i.e., Hydstra (Foxpro DB)
      • This leads to increased costs and inefficiency in service delivery
      • Current architecture does not fully utilise WDTF as the universal data format standard
  40. Objectives
      • DEWNR wants to use data in WDTF format to generate analytical data similar to BOM for public consumption (Open Data: Open Technology Foundation is a facilitator for SA Gov.)
      • To reduce system operation cost by migrating from on premise system to on cloud
  41. Cloud-based Water Data Management & Analytics
  42. Data Pipeline
      Stages: Raw Files (On premise); Raw Files (S3); Clean Data (S3, Redshift); Analyzed Data (S3)
      Formats: Zip WDTF; WDTF; CSV, JSON; CSV, JSON
      Steps: Copy, Unzip; Parse; Query
      Output: Open Data Web Site (Dashboard)
      Observation Data
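
The unzip and parse steps of the pipeline on slide 42 can be sketched with the Python standard library. This is a rough illustration under stated assumptions, not the project's code: the XML layout (`<observation site=… time=… value=…/>`) is a made-up stand-in and not the real WDTF schema, the function names and site identifiers are hypothetical, and the copy to S3 and the load into Redshift are left out.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
import zipfile


def parse_observations(xml_bytes):
    """Parse observation records from a simplified, hypothetical XML layout."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            "site": obs.get("site"),
            "time": obs.get("time"),
            "value": float(obs.get("value")),
        }
        for obs in root.iter("observation")
    ]


def unzip_and_parse(zip_bytes):
    """Unzip an archive held in memory and parse every .xml member."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                rows.extend(parse_observations(archive.read(name)))
    return rows


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site", "time", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise rows to a JSON array."""
    return json.dumps(rows)


# Build a small sample archive in memory (hypothetical site IDs and readings).
sample_xml = (
    b"<observations>"
    b'<observation site="SITE-1" time="2013-05-01T00:00:00" value="1.5"/>'
    b'<observation site="SITE-2" time="2013-05-01T00:00:00" value="2.25"/>'
    b"</observations>"
)
archive_buffer = io.BytesIO()
with zipfile.ZipFile(archive_buffer, "w") as archive:
    archive.writestr("sample.xml", sample_xml)

rows = unzip_and_parse(archive_buffer.getvalue())
print(to_csv(rows))
print(to_json(rows))
```

In a real pipeline the parser would follow the published WDTF schema, and the CSV or JSON output would be written back to S3 for querying.
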
  43. Data Analysis
  44. Future Roadmap
      • Water Data from Entire Australia
      • Open Access To Water Data
      • Realtime Water Data Analysis
  45. Summary
      • Benefit of cloud for water data management
        • Streamlined data management process
        • Open data hosting
        • Cost effective
      • Project progress
        • Data migration onto cloud
        • Real-time data analysis
        • ☐ Open access to water data
  46. 2013 AWS WWPS Summit, Canberra, Australia
