
Working with Open Data on AWS


Learn how governments, research institutions, and private companies are using AWS to share massive amounts of data publicly. Discover best practices for sharing data in the cloud, how to find publicly available datasets through the Registry of Open Data on AWS, and how you can share your own data through AWS.


  1. Open Data on AWS. Jed Sundwall, Global Open Data Lead. https://opendata.aws © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
  2. Agenda: Overview of Open Data on AWS; How shared data on the cloud can accelerate research; Finding data shared on AWS; Sharing data on AWS
  3. AWS Cloud: No Up Front Expense; Pay for what you Use; Improve Time to Market & Agility; Scale Up and Down; Self-Service Infrastructure. Traditional Infrastructure: Equipment; Resources and Administration; Contracts; Cost
  4. Why does AWS care about open data? Many AWS customers supply data to the public to accelerate research and product development. Many AWS customers use data shared on AWS to create new products and services.
  5. Companies want more value from their data. Data is: growing exponentially; from new sources; increasingly diverse; used by many people; analyzed by many applications. Complications: Siloed approaches don’t work anymore; it’s too expensive and limiting to store data on-premises. Implication: A new approach is needed to extract insights and value.
  6. Undifferentiated heavy lifting: “…data must be organized, well-documented, consistently formatted, and error free. Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.” (Data Driven, by DJ Patil and Hilary Mason)
  7. Flipped data flow in the cloud. Traditional approach: Move data to computing resources. Cloud approach: Move computing resources to data. Services shown: Amazon S3, Amazon EC2, Amazon EMR, Amazon Athena
  8. Cloud data lakes are the future. Customers want: to eliminate data silos; to move to a single store, i.e. a data lake in the cloud; to store data securely in standard formats; to grow to any scale, with low costs; to analyze their data in a variety of ways; to have real-time analytics; to predict future outcomes
  9. Sharing data in the cloud lets data users spend more time on data analysis rather than data acquisition. https://opendata.aws
  10. Advantages of sharing data in the cloud: Global community of users; Faster pace of research; Lower cost of research; New services and tools
  11. AWS Public Datasets https://registry.opendata.aws
  12. AWS Public Datasets https://registry.opendata.aws/tag/satellite-imagery
  13. Data at work
  15. Landsat on AWS. Graph by Drew Bollinger (@drewbo19) at Development Seed
  16. Monitoring at-risk bodies of water from space. The Blue Dot Observatory uses Sentinel-2 satellite data on AWS to monitor water bodies around the world. “The cost to process one month of data for about 7,000 bodies of water currently in the system is 6 EUR. It is possible to set up world-scale systems with a shoestring budget.” opendata.aws/bluedot
  17. Opening high resolution elevation data. USGS shares over 10 trillion lidar point cloud records from across the US on AWS. “The democratization of elevation data […] promises to revolutionize approaches to applications from flood forecasting and geologic assessments to precision agriculture and infrastructure development.” (Kevin Gallagher, USGS) opendata.aws/usgs-lidar
  18. Detect earthquakes in real time. Grillo shares seismic data produced by their network of IoT-based sensors deployed in Mexico, Chile and Costa Rica. This data can be processed in real time to detect earthquakes occurring off the coast of Chile before they are felt in Santiago. opendata.aws/iot-earthquake
  19. Facilitating over 31 million journeys made in London every day. When Transport for London opened up access to its data, application developers and researchers used it to create more than 600 applications that provide services to 42 percent of Londoners, saving up to an estimated £130 million per year. opendata.aws/tfl
  20. Accelerating cybersecurity research. “By providing better research data, we can help fight cyberattacks across Canada and throughout the world and help networks be more secure.” (Mike Davie, Canadian Communications Security Establishment) opendata.aws/cse-cyber
  21. “Recently, the National Oceanic and Atmospheric Administration and Amazon Web Services (AWS) Cloud made available one of the largest datasets describing animal movement ever compiled: …” (Adriaan M. Dokter et al., Nature, 2018)
  22. “Recently, the National Oceanic and Atmospheric Administration and Amazon Web Services (AWS) Cloud made available one of the largest datasets describing animal movement ever compiled: the Next Generation Weather Radar (NEXRAD) archive.” (Adriaan M. Dokter et al., Nature, 2018) opendata.aws/bird-radar
  24. Finding data on AWS: Using the Registry of Open Data on AWS (RODA)
  25. Registry of Open Data on AWS https://registry.opendata.aws/
  26. Registry of Open Data on AWS: Tags https://registry.opendata.aws/tag/sustainability
  27. Registry of Open Data on AWS: Usage examples https://registry.opendata.aws/usage-examples
  28. Registry of Open Data on AWS: How to contribute https://github.com/awslabs/open-data-registry
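
As a rough illustration of how tag pages such as those on slides 12 and 26 group datasets, the sketch below filters a small in-memory catalog by tag. The entry layout (`Name`, `Tags`) and the sample tags are assumptions made for this example and are not copied from the registry; the `tag_url` helper simply follows the URL pattern visible on those two slides.

```python
# Hypothetical, hand-written catalog entries. Field names and tags are
# assumptions for illustration, not data read from the registry.
CATALOG = [
    {"Name": "Landsat on AWS", "Tags": ["satellite-imagery", "earth-observation"]},
    {"Name": "Sentinel-2", "Tags": ["satellite-imagery", "sustainability"]},
    {"Name": "NEXRAD", "Tags": ["weather"]},
]

REGISTRY_URL = "https://registry.opendata.aws"


def datasets_with_tag(catalog, tag):
    """Return the names of all entries that carry the given tag."""
    return [entry["Name"] for entry in catalog if tag in entry["Tags"]]


def tag_url(tag):
    """Build a tag page URL following the pattern shown on the slides."""
    return f"{REGISTRY_URL}/tag/{tag}"
```

Calling `datasets_with_tag(CATALOG, "satellite-imagery")` on this made-up catalog returns the two imagery entries, in catalog order.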
  29. Sharing data (on AWS): What we’ve learned
  30. What makes a dataset successful? It is treated like a product.
  32. What makes a dataset successful? It is treated like a product. It is optimized for analysis.
  33. Figure labels: Raw, Highly processed, Userbase
  34. Figure labels: Raw, Accessible, Documented, Trustworthy, Userbase
  35. Traditional GeoTIFF bundle .tar
  36. Cloud-optimized GeoTIFF
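
A cloud-optimized GeoTIFF is laid out so that a client can read part of an image with an HTTP range request instead of downloading and unpacking a whole bundle. The sketch below only builds such a request with the Python standard library and does not send it. The bucket name, key, and byte offsets are hypothetical placeholders; a real reader would first take the offsets from the file's header.

```python
from urllib.request import Request


def s3_https_url(bucket, key):
    """Virtual-hosted-style HTTPS URL for an object in a public bucket."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def range_request(url, offset, length):
    """Build (not send) a GET request for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical object and tile location, for illustration only.
url = s3_https_url("example-bucket", "scenes/example.tif")
req = range_request(url, offset=1024, length=4096)
```

Passing `req` to `urllib.request.urlopen` would fetch only the requested bytes from a server that honors range requests, which is what lets analysis code touch a small window of a large raster.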
  38. Patterns: S3 Key Index; External Index; Internal Index
  39. Example: Allen Brain Observatory Key Naming https://registry.opendata.aws/allen-brain-observatory/
  40. Example: IRS 990 CSV as External Index
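
The S3 key index and external index patterns can be sketched without any AWS calls. In the first, attributes are encoded in the object key so that a prefix narrows the search; in the second, a small CSV maps attributes to keys. The key layout, column names, and rows below are hypothetical and are not the actual layout of the Allen Brain Observatory or IRS 990 datasets.

```python
import csv
import io

# Hypothetical object keys with year and month encoded in the key itself.
KEYS = [
    "filings/2016/01/a.xml",
    "filings/2016/02/b.xml",
    "filings/2017/01/c.xml",
]


def keys_with_prefix(keys, prefix):
    """S3 key index: select objects by key prefix, as a prefix listing would."""
    return [k for k in keys if k.startswith(prefix)]


# Hypothetical external index: one CSV row per object.
INDEX_CSV = """organization,year,key
Example Org,2016,filings/2016/01/a.xml
Another Org,2016,filings/2016/02/b.xml
Example Org,2017,filings/2017/01/c.xml
"""


def lookup(index_csv, **criteria):
    """External index: return keys of rows matching every given column value."""
    rows = csv.DictReader(io.StringIO(index_csv))
    return [r["key"] for r in rows if all(r[c] == v for c, v in criteria.items())]
```

The trade-off in this sketch is that a key index only answers questions the key layout anticipated, while an external index can be searched on any column at the cost of keeping a second file in sync with the objects.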
  41. What makes a dataset successful? It is treated like a product. It is optimized for analysis. There is a community around it.
  42. Pangeo community platform http://pangeo.io
  44. Guide to sharing data on AWS. Over 40 pages of insights on data sharing and case studies from customers including: Transport for London; Canada’s Communications Security Establishment; The US Geological Survey; The Allen Institute for Brain Science. https://opendata.aws/guide-pdf
  45. Thank you! jed@amazon.com
