In this talk, you will learn how to use or create deep learning architectures for image recognition and other neural network computations in Apache Spark. Alex, Tim, and Sujee begin with an introduction to deep learning using BigDL. They then explain and demonstrate how image recognition works, using step-by-step diagrams and code that give you a fundamental understanding of how to perform image recognition tasks within Apache Spark. Finally, they give a quick overview of how to perform image recognition on a much larger dataset using the Inception architecture. BigDL was created specifically for Spark and takes advantage of Spark’s ability to distribute data processing workloads across many nodes. As an attendee, you will learn how to run the demos on your laptop, on your own cluster, or with the BigDL AMI in the AWS Marketplace. Whichever you choose, you will walk away with a much better understanding of how to run deep learning workloads using Apache Spark with BigDL. Presentation by Alex Kalinin, Tim Fox, Sujee Maniyam, and Dave Nielsen at re:Invent 2017. Session sponsored by Intel.
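As a flavor of the kind of code the session walks through, here is a minimal, hypothetical sketch of distributed prediction with BigDL's Python API on Spark. The model path and input data are placeholders, and the API names (create_spark_conf, init_engine, Model.loadModel, Sample.from_ndarray) are from my recollection of the BigDL 0.x Python bindings, so verify them against the BigDL docs; this is not the presenters' demo code.

```python
import numpy as np
from pyspark import SparkContext
from bigdl.util.common import init_engine, create_spark_conf, Sample
from bigdl.nn.layer import Model

# Spark conf tuned for BigDL, then initialize BigDL's engine on the cluster.
sc = SparkContext(conf=create_spark_conf().setAppName("bigdl-image-demo"))
init_engine()

# Load a pretrained image-classification model (path is a placeholder).
model = Model.loadModel("/models/inception-v1.bigdl")

# Pretend each image is already decoded into a 3x224x224 float array.
images = sc.parallelize([np.random.rand(3, 224, 224) for _ in range(8)])
samples = images.map(lambda a: Sample.from_ndarray(a, np.array([0.0])))

# predict() runs the forward pass in parallel across the Spark executors.
predictions = model.predict(samples)
print(predictions.take(1))
```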
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas... - Amazon Web Services
Amazon’s consumer business continues to grow, and so does the volume of data and the number and complexity of the analytics done in support of the business. In this session, we talk about how Amazon.com uses AWS technologies to build a scalable environment for data and analytics. We look at how Amazon is evolving the world of data warehousing with a combination of a data lake and parallel, scalable compute engines such as Amazon EMR and Amazon Redshift.
ABD327_Migrating Your Traditional Data Warehouse to a Modern Data Lake - Amazon Web Services
In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into its architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017 - Amazon Web Services
In this session, we detail Sysco's journey from a company focused on hindsight-based reporting to one focused on insights and foresight. For this shift, Sysco moved from multiple data warehouses to an AWS ecosystem, including Amazon Redshift, Amazon EMR, AWS Data Pipeline, and more. As the team at Sysco worked with Tableau, they gained agile insight across their business. Learn how Sysco decided to use AWS, how they scaled, and how they became more strategic with the AWS ecosystem and Tableau.
Session sponsored by Tableau
Building Data Lakes and Analytics on AWS: Patterns and Best Practices - BDA30... - Amazon Web Services
This document provides a summary of a presentation on building data lakes and analytics on AWS. It discusses:
- The challenges of big data, including volume, velocity, variety, and veracity.
- How an AWS data lake can address these challenges by quickly ingesting and storing any type of data while providing insights, security and the ability to run the right analytics tools without data movement.
- Key components of a data lake on AWS including storage, data catalog, analytics, machine learning capabilities, and tools for real-time and traditional data movement.
The document discusses migrating big data workloads from on-premises environments to AWS. It describes deconstructing current workloads, identifying challenges with on-premises architectures, and how to migrate components to AWS services like Amazon EMR and Amazon S3. The document also shares the experience of Vanguard migrating their big data workload to AWS.
Today’s organisations require a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. A data lake is a new and increasingly popular way to store all of your data, structured and unstructured, in a single, centralised repository. Because data can be stored as-is, there is no need to convert it to a predefined schema, and you no longer need to know in advance what questions you want to ask of your data.
In this webinar, you will discover how AWS gives you fast access to flexible and low-cost IT resources, so you can rapidly build and scale a data lake that can power any kind of analytics, such as data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and Internet of Things (IoT) processing, regardless of the volume, velocity, and variety of your data.
Learning Objectives:
• Discover how you can rapidly scale and build your data lake with AWS.
• Explore the key pillars behind a successful data lake implementation.
• Learn how to use the Amazon Simple Storage Service (S3) as the basis for your data lake.
• Learn about two recently launched AWS services, Amazon Athena and Amazon Redshift Spectrum, that help customers query the data lake directly (a minimal Athena query sketch follows this list).
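To make that last objective concrete, here is a minimal sketch of querying data-lake files in S3 through Amazon Athena with boto3. The database, table, and S3 output location are made-up placeholders; the client methods themselves (start_query_execution, get_query_execution, get_query_results) are real Athena APIs.

```python
import time
import boto3

athena = boto3.client("athena")

# Kick off a SQL query against files cataloged in the data lake
# (database/table names and the results bucket are placeholders).
run = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM weblogs.access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Athena is asynchronous: poll until the query finishes.
qid = run["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```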
ABD317_Building Your First Big Data Application on AWS - Amazon Web Services
This document provides instructions for building a big data application on AWS that collects and analyzes web server logs. It discusses using Amazon Kinesis to collect logs with a Firehose delivery stream into an S3 bucket. It then covers using Kinesis Analytics to process the logs in real-time by writing SQL queries that compute metrics and detect anomalies. Finally, it discusses loading the processed logs into Amazon Redshift for interactive querying and visualizing insights with Amazon QuickSight.
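As a taste of the ingestion step described above, here is a minimal sketch of pushing log lines into an Amazon Kinesis Data Firehose delivery stream with boto3 so they land in S3. The stream name and log line are placeholders; put_record is the real Firehose client call.

```python
import boto3

firehose = boto3.client("firehose")

# One Apache-style access log line (placeholder data).
log_line = '127.0.0.1 - - [27/Nov/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024\n'

# Firehose buffers records and delivers them to the configured S3 bucket.
firehose.put_record(
    DeliveryStreamName="web-log-ingestion-stream",  # placeholder stream name
    Record={"Data": log_line.encode("utf-8")},
)
```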
ABD206-Building Visualizations and Dashboards with Amazon QuickSight - Amazon Web Services
Just as a picture is worth a thousand words, a visual is worth a thousand data points. A key aspect of our ability to gain insights from our data is to look for patterns, and these patterns are often not evident when we simply look at data in tables. The right visualization will help you gain a deeper understanding in a much quicker timeframe. In this session, we will show you how to quickly and easily visualize your data using Amazon QuickSight. We will show you how you can connect to data sources, generate custom metrics and calculations, create comprehensive business dashboards with various chart types, and set up filters and drill-downs to slice and dice the data.
ABD316_American Heart Association Finding Cures to Heart Disease Through the ... - Amazon Web Services
Combining disparate datasets and making them accessible to data scientists and researchers is a prevalent challenge for many organizations, not just in healthcare research. The American Heart Association (AHA) has built a data science platform using Amazon EMR, Amazon Elasticsearch Service, and other AWS services that corrals multiple datasets and enables advanced research on phenotype and genotype datasets, aimed at curing heart disease. In this session, we present how AHA built this platform and the key challenges they addressed with the solution. We also provide a demo of the platform and leave you with suggestions and next steps so you can build similar solutions for your use cases.
How to Build a Data Lake with AWS Glue Data Catalog (ABD213-R) - re:Invent 2017 - Amazon Web Services
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
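As a rough illustration of the crawler workflow described above, here is a sketch that creates and starts an AWS Glue crawler over an S3 prefix using boto3. The role ARN, database name, and S3 path are placeholders; create_crawler and start_crawler are real Glue client methods.

```python
import boto3

glue = boto3.client("glue")

# Point a crawler at raw data in S3; discovered schemas become
# table definitions in the AWS Glue Data Catalog.
glue.create_crawler(
    Name="weblogs-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    DatabaseName="weblogs",
    Targets={"S3Targets": [{"Path": "s3://my-data-lake/raw/weblogs/"}]},
)

# Run it once; in practice you would schedule it or trigger it on new data.
glue.start_crawler(Name="weblogs-crawler")
```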
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with... - Amazon Web Services
Learn how to build a data lake for analytics in Amazon S3 and Amazon Glacier. In this session, we discuss best practices for data curation, normalization, and analysis on Amazon object storage services. We examine ways to reduce or eliminate costly extract, transform, and load (ETL) processes using query-in-place technology, such as Amazon Athena and Amazon Redshift Spectrum. We also review custom analytics integration using Apache Spark, Apache Hive, Presto, and other technologies in Amazon EMR. You'll also get a chance to hear from Airbnb & Viber about their solutions for Big Data analytics using S3 as a data lake.
Technology Trends in Data Processing - DAT311 - re:Invent 2017 - Amazon Web Services
In this talk, Anurag Gupta, VP for AWS Analytic and Transactional Database Services, talks about some of the key trends we see in data processing and how they shape the services we offer at AWS. Specific trends include the rise of machine-generated logs as the dominant source of data, the move toward serverless, API-centric computing, and the growing need for local access to data from users around the world.
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select... - Amazon Web Services
Amazon S3 & Amazon Glacier provide the durable, scalable, secure and cost-effective storage you need for your data lake. But, as your data lake grows, the resources needed to analyze all the data can become expensive, or queries may take longer than desired. AWS provides query-in-place services like Amazon Athena and Amazon Redshift Spectrum to help you analyze this data easily and more cost-effectively than ever before. In this session, we will talk about how AWS query-in-place services and other tools work with Amazon S3 & Amazon Glacier and the optimizations you can use to analyze and process this data, cheaply and effectively.
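In the same query-in-place spirit, here is a minimal sketch of S3 Select, which filters a CSV object server-side so only matching rows cross the wire. The bucket, key, and column positions are placeholders; select_object_content is the real boto3 S3 API.

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to run the SQL over the object and stream back only matching rows.
resp = s3.select_object_content(
    Bucket="my-data-lake",                    # placeholder bucket
    Key="raw/weblogs/2017-11-27.csv",         # placeholder key
    ExpressionType="SQL",
    Expression="SELECT s._1, s._3 FROM s3object s WHERE s._3 = '500'",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the result bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```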
Organizations need to gain insight and knowledge from a growing number of data sources: Internet of Things (IoT) devices, APIs, clickstreams, and unstructured and log data. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. In this session, we introduce key ETL features of AWS Glue and cover common use cases ranging from scheduled nightly data warehouse loads to near-real-time, event-driven ETL flows for your data lake. We discuss how to build scalable, efficient, and serverless ETL pipelines using AWS Glue. Additionally, Merck will share how they built an end-to-end ETL pipeline for their application release management system and launched it in production in less than a week using AWS Glue.
One of the biggest tradeoffs customers usually make when deploying BI solutions at scale is agility versus governance. Large-scale BI implementations with the right governance structure can take months to design and deploy. In this session, learn how you can avoid making this tradeoff using Amazon QuickSight. Learn how to easily deploy Amazon QuickSight to thousands of users using Active Directory and Federated SSO, while securely accessing your data sources in Amazon VPCs or on-premises. We also cover how to control access to your datasets, implement row-level security, create scheduled email reports, and audit access to your data.
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ... - Amazon Web Services
"Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users - from developers, to business analysts, to data scientists. Each of these user groups employs different tools, have different data needs and access data in different ways.
In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users."
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ... - Amazon Web Services
Customers with Oracle data warehouses find them complex and expensive to manage, and most struggle with data load and performance issues. They are looking to migrate to something that is easy to manage, cost-effective, and improves query performance. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Migrating your Oracle data warehouse to Amazon Redshift can substantially improve query and data load performance, increase scalability, and save costs. This workshop uses AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) to migrate an existing Oracle data warehouse to Amazon Redshift. When migrating a database from one engine to another, you have two major things to consider: the conversion of the schema and code objects, and the migration and conversion of the data itself. You can convert schema and code with AWS SCT and migrate data with AWS DMS, which helps you migrate your data easily and securely with minimal downtime. Prerequisites: an AWS account with IAM admin permissions and sufficient limits for the AWS resources above, plus a comfortable working knowledge of the AWS console, relational databases (Oracle), and Amazon Redshift.
DAT324_Expedia Flies with DynamoDB Lightning Fast Stream Processing for Trave... - Amazon Web Services
Building rich, high-performance streaming data systems requires fast, on-demand access to reference datasets in order to implement complex business logic. In this talk, Expedia discusses the architectural challenges the company faced and how DAX plus DynamoDB fits into the overall architecture and met their design requirements. You will also hear how DAX enabled Expedia to add caching to their existing applications in hours, a task that previously took much longer. Attendees will walk away with three key outputs: 1) Expedia’s overall architectural patterns for streaming data; 2) how they uniquely leverage DynamoDB, DAX, Apache Spark, and Apache Kafka to solve these problems; and 3) the value that DAX provides and how it enabled them to improve performance and throughput and reduce costs, all without having to write any new code.
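To show why the "no new code" claim is plausible, here is a sketch using the amazondax Python client, which mirrors the boto3 DynamoDB resource interface so reads can be pointed at a DAX cluster with essentially a one-line change. The endpoint and table name are placeholders, and the client and method names are my recollection of the amazondax package, so verify them against its documentation.

```python
import boto3
from amazondax import AmazonDaxClient

# Plain DynamoDB resource (what the app used before caching).
ddb = boto3.resource("dynamodb")

# DAX drop-in: same Table/get_item interface, but reads are served from
# the cluster's in-memory cache. Endpoint is a placeholder.
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com")

table = dax.Table("ReferenceData")  # placeholder table name

# Microsecond-scale cached read for hot reference data.
item = table.get_item(Key={"pk": "airport#SEA"})
print(item.get("Item"))
```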
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve... - Amazon Web Services
FINRA faced challenges with their on-premises data infrastructure, including difficulty tracking data, limited scalability, and high costs. They migrated to a managed data lake on AWS to address these issues. This provided centralized data management with a catalog, separation of storage and compute, encryption, and cost optimization. It enabled faster analytics through Presto querying and machine learning model development, and it reduced TCO by 30% compared to their on-premises environment. Lessons learned included embracing disruption, automating infrastructure, and treating infrastructure as code. FINRA is exploring additional AWS services like Athena, Lambda, and Step Functions to continue improving their analytics capabilities.
by J. Bako, Solutions Architect, AWS
Graph databases are purpose-built to store and navigate relationships. They have advantages for many use cases: social networking, recommendation engines, fraud detection, and others where you need to create relationships between data and quickly query these relationships. Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. We’ll discuss when you should use a graph database and look at how to use Neptune.
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi... - Amazon Web Services
Organizations processing mission-critical, high-volume data must be able to achieve high levels of throughput and durability in data processing workflows. In this session, we learn how DataXu is using Amazon Kinesis, Amazon S3, and Amazon EMR for its patented approach to programmatic marketing. Every second, the DataXu Marketing Cloud processes over 1 million ad requests and makes more than 40 billion decisions to select and bid on the ad impressions that are most likely to convert. In addition to addressing the scalability and availability of the platform, we explore Amazon Kinesis producer and consumer applications that support high levels of scalability and durability in mission-critical record processing.
The document discusses building data lakes with AWS. It recommends using Amazon S3 as the storage layer for the data lake due to its scalability, durability and integration with other AWS analytics services. It also recommends using AWS Glue to catalog and ingest data into the data lake through automated crawlers. This allows for easy discovery, querying and analysis of data in the lake.
How Twilio Scaled Its Data-Driven Culture - ABD309 - re:Invent 2017 - Amazon Web Services
As a leading cloud communications platform, Twilio has always been strongly data-driven. But as headcount and data volumes grew—and grew quickly—they faced many new challenges. One-off, static reports work when you’re a small startup, but how do you support a growth-stage company through a successful IPO and beyond? Today, Twilio's data team relies on AWS and Looker to provide data access to 700 colleagues. Departments have the data they need to make decisions, and cloud-based scale means they get answers fast. Data delivers real business value at Twilio, providing a 360-degree view of their customer, product, and business. In this session, you hear firsthand stories directly from the Twilio data team and learn real-world tips for fostering a truly data-driven culture at scale.
Session sponsored by Looker
The introductory morning session will discuss big data challenges and provide an overview of the AWS Big Data Platform. We will also cover:
• How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
• Reference architectures for popular use cases, including: connected devices (IoT), log streaming, real-time intelligence, and analytics.
• The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR) and Redshift.
• The latest relational database engine, Amazon Aurora – a MySQL-compatible, highly available relational database engine that provides up to five times the performance of MySQL at one-tenth the cost of a commercial database.
• Amazon Machine Learning – the latest big data service from AWS, which provides visualization tools and wizards that guide you through creating machine learning (ML) models without having to learn complex ML algorithms and technology.
by Rich Alberth, Solutions Architect, AWS
If you need to query relationships between data, you need a graph database. We’ll take a close look at Amazon Neptune, explore the differences between property graphs and RDF, then do graph data queries using Apache TinkerPop. You’ll need a laptop with a Firefox or Chrome browser.
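For a sense of what those TinkerPop queries look like, here is a minimal sketch using the gremlinpython driver against a Neptune endpoint. The endpoint, labels, and property names are placeholders; DriverRemoteConnection and the anonymous traversal source are real gremlinpython APIs.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Neptune speaks the Gremlin protocol over WebSockets (endpoint is a placeholder).
conn = DriverRemoteConnection(
    "wss://my-neptune.cluster-abc.us-east-1.neptune.amazonaws.com:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Create two vertices and an edge, then query the relationship back.
g.addV("person").property("name", "alice").as_("a") \
 .addV("person").property("name", "bob").as_("b") \
 .addE("knows").from_("a").to("b").iterate()

print(g.V().has("person", "name", "alice").out("knows").values("name").toList())
conn.close()
```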
The document discusses Amazon Web Services (AWS) machine learning and artificial intelligence tools, including Amazon Polly for text-to-speech, Amazon Lex for building conversational interfaces, and Amazon Rekognition for image and video analysis. It provides examples of how these tools work and how they can be used to build applications for tasks like flight booking, facial recognition, and chatbots.
by Avijit Goswami, Sr. Solutions Architect, AWS
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
AWS makes it easy to migrate databases to the cloud and then operate them faster and more cost-effectively. Our database capabilities also enable a number of methods to protect database volumes, and this session will help you understand best practices for backing up database instances in the cloud and storing the backups in S3 for durable, available storage.
by Drew Meyer, Sr. Manager, Product Marketing AWS
We will cover the core AWS storage services, which include Amazon Simple Storage Service (Amazon S3), Amazon Glacier, Amazon Elastic File System (Amazon EFS), and Amazon Elastic Block Store (Amazon EBS). We also discuss data transfer services such as AWS Snowball, Snowball Edge, and AWS Snowmobile, and hybrid storage solutions such as AWS Storage Gateway.
This document discusses building a data lake on AWS. It notes that organizations that successfully generate value from data will outperform their competitors, and it outlines the challenges of data visibility, multiple access mechanisms, and giving analysts access to the data. AWS is presented as a strong fit, with storage, analysis, and security capabilities at scale, and case studies of Celgene and IEP, which used AWS for their data lakes, are included. Data lakes extend traditional data warehousing by including more diverse data and analytical engines at larger scale and lower cost. The AWS portfolio for data lakes, analytics, and IoT is presented as the most complete toolset, and the document closes with ways to build value from the data lake through machine learning, analytics, data movement, and visualization.
Building Text Analytics Applications on AWS using Amazon Comprehend - AWS Onl... - Amazon Web Services
Learning Objectives:
- Get an introduction to Natural Language Processing (NLP)
- Learn the benefits of new approaches to analytics and the technologies that help empower better decisions, e.g., NLP and data prep
- Build a text analytics solution with Amazon Comprehend and Amazon Relational Database Service in a step-by-step demo (a minimal Comprehend sketch follows this list)
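As a flavor of what such a solution calls under the hood, here is a minimal sketch of Amazon Comprehend's NLP APIs via boto3. The sample text is a placeholder; detect_sentiment and detect_entities are real Comprehend client methods.

```python
import boto3

comprehend = boto3.client("comprehend")

text = "Amazon Comprehend makes it easy to add text analytics to my app."  # placeholder

# Sentiment: POSITIVE / NEGATIVE / NEUTRAL / MIXED, with confidence scores.
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])

# Named entities (organizations, dates, quantities, ...).
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
for ent in entities["Entities"]:
    print(ent["Type"], ent["Text"], round(ent["Score"], 2))
```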
GPSBUS206_Best Practices for Building a Partner Database Practice on AWS - Amazon Web Services
In this session, we walk through an overview of AWS database services. We discuss why customers choose to adopt AWS database services and how APN Partners can help customers by building a database practice using AWS services such as Amazon Aurora, Amazon Redshift, and Amazon DynamoDB. We share best practices for APN Partners to start building a successful database practice on AWS. We also talk about how APN Partners can use various resources offered by APN to accelerate their practice-building process.
Accelerate machine-learning workloads using Amazon EC2 P3 instances - CMP204 ... - Amazon Web Services
Organizations are tackling exponentially complex questions across advanced scientific, energy, high-tech, and medical fields. Machine learning (ML) makes it possible to quickly explore a multitude of scenarios and generate the best answers, ranging from image, video, and speech recognition to autonomous vehicle systems and weather prediction. In this interactive chalk talk, we discuss the latest advancements in compute to support your ML goals. We also discuss how, for data scientists, researchers, and developers who want to speed development of their ML applications, Amazon Elastic Compute Cloud (Amazon EC2) P3 instances are the most powerful, cost-effective, and versatile GPU-compute instances available.
Generative Adversarial Networks (GANs) using Apache MXNet - Apache MXNet
The document provides an overview of generative adversarial networks (GANs) using Apache MXNet. It introduces GANs and deep learning concepts. It then demonstrates how to implement GANs using MXNet with examples like DCGAN. Finally, it discusses other GAN models and provides resources for using MXNet on AWS.
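To ground the idea, here is a compact, hypothetical sketch of the two networks at the heart of a DCGAN-style GAN, defined with Apache MXNet's Gluon API. Layer sizes are arbitrary placeholders and the training loop is reduced to a comment, so treat this as a structural illustration, not the deck's actual code.

```python
from mxnet.gluon import nn

# Generator: maps a noise vector (as a 1x1 feature map) to a 64x64 RGB image.
netG = nn.Sequential()
netG.add(
    nn.Conv2DTranspose(512, 4, 1, 0, use_bias=False), nn.BatchNorm(), nn.Activation("relu"),
    nn.Conv2DTranspose(256, 4, 2, 1, use_bias=False), nn.BatchNorm(), nn.Activation("relu"),
    nn.Conv2DTranspose(128, 4, 2, 1, use_bias=False), nn.BatchNorm(), nn.Activation("relu"),
    nn.Conv2DTranspose(64, 4, 2, 1, use_bias=False), nn.BatchNorm(), nn.Activation("relu"),
    nn.Conv2DTranspose(3, 4, 2, 1, use_bias=False), nn.Activation("tanh"),
)

# Discriminator: maps an image down to a single real/fake logit.
netD = nn.Sequential()
netD.add(
    nn.Conv2D(64, 4, 2, 1, use_bias=False), nn.LeakyReLU(0.2),
    nn.Conv2D(128, 4, 2, 1, use_bias=False), nn.BatchNorm(), nn.LeakyReLU(0.2),
    nn.Conv2D(256, 4, 2, 1, use_bias=False), nn.BatchNorm(), nn.LeakyReLU(0.2),
    nn.Conv2D(512, 4, 2, 1, use_bias=False), nn.BatchNorm(), nn.LeakyReLU(0.2),
    nn.Conv2D(1, 4, 1, 0, use_bias=False),
)

# Training alternates: update netD to tell real images from netG(noise),
# then update netG so that netD classifies its samples as real.
```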
NEW LAUNCH! Amazon Neptune Overview and Customer Use Cases - DAT319 - re:Inve... - Amazon Web Services
In this session, we provide an overview of Amazon Neptune, AWS’s newest database service. Amazon Neptune is a fast, reliable graph database that makes it easy to build applications over highly connected data. We then explore how Siemens is building a knowledge graph using Amazon Neptune.
Serverless Text Analytics with Amazon Comprehend - Donnie Prakoso
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text.
This deck shows how to build your own text analytics using Amazon Comprehend and how to integrate it with other AWS services. It also provides an introduction to Amazon Lex.
GPSBUS216-GPS Applying AI-ML to Find Security Needles in the Haystack - Amazon Web Services
Security is about visibility and control. It starts with getting visibility (collecting as much data as possible about your environment), then deciding what is worth alerting on versus what is a distraction: a classic case of finding needles in a haystack. AWS Partners can leverage highly scalable machine learning (ML) services to process large amounts of log, event, flow, and other data to build AWS-specific security solutions that scale. Pass the undifferentiated heavy lifting to AWS so you can focus on your core value proposition! This session helps AWS Partners understand which services are available and applicable for building security solutions, and it provides use cases to help accelerate adoption.
Create an IoT Gateway and Establish a Data Pipeline to AWS IoT with Intel - I... - Amazon Web Services
This document provides an overview and agenda for a workshop on creating an IoT gateway and establishing a data pipeline from edge devices to AWS IoT using Intel technology. The workshop will include an overview of Intel IoT technology including NUC gateways, development tools, and libraries. It will also cover an overview of AWS IoT services and a hands-on lab connecting Intel devices to AWS IoT using MQTT protocol and visual programming with Node-RED.
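As an illustration of the device-to-cloud step in that pipeline, here is a minimal sketch of publishing telemetry to AWS IoT Core over MQTT with the AWSIoTPythonSDK. The endpoint, certificate paths, client ID, and topic are placeholders; verify the SDK's method names against its documentation before adapting this.

```python
import json
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

# All connection details are placeholders for your own AWS IoT setup.
client = AWSIoTMQTTClient("intel-nuc-gateway-01")
client.configureEndpoint("abc123-ats.iot.us-east-1.amazonaws.com", 8883)
client.configureCredentials("root-ca.pem", "private.key", "device-cert.pem")

client.connect()

# Publish one telemetry reading at QoS 1 on a placeholder topic.
payload = json.dumps({"deviceId": "nuc-01", "tempC": 22.5})
client.publish("gateway/telemetry", payload, 1)

client.disconnect()
```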
GAM311-How Linden Lab Built a Virtual World on the AWS Cloud - Amazon Web Services
The document discusses how Linden Lab moved their virtual world platform from an on-premises architecture to AWS to enable faster development, continuous delivery, and scalability. The key steps involved migrating the monolithic application to microservices running in containers on ECS, building automated pipelines for container builds and deployments using tools like Packer, CloudFormation, and CodeDeploy, and designing the infrastructure for high availability using multiple AWS services. This allowed Linden Lab to rapidly prototype and deploy new features for their Sansar virtual world project and better handle the uncertainty of how user traffic might grow over time.
Data modelling is an important tool in the toolbox of a developer. By building and communicating a shared understanding of the domain they're working with, their applications and APIs become more useable and maintainable. However, as you scale up your technical teams, how do you keep these benefits whilst avoiding time-consuming meetings every time something new comes along? This talk reminds us of key data modelling techniques and how our use of Kafka changes and informs them. It then examines how these patterns change as more teams join your organisation and how Kafka comes into its own in this world.
The document discusses Amazon AI services including Amazon SageMaker, an end-to-end machine learning platform, Amazon Comprehend for natural language processing, and Amazon Translate for machine translation. It highlights key benefits and use cases for these services, as well as launch customers.
Open Source at AWS: Code, Contributions, Collaboration, and Communication - Amazon Web Services
At OSCON 2018, Adrian Cockcroft detailed the many ways AWS participates in open source: contributing to open source projects, reporting bugs, contributing fixes and enhancements to a wide spectrum of projects ranging from the Linux kernel to PostgreSQL and Kubernetes, and managing hundreds of projects of its own.
The document discusses architectures for the 21st century. It emphasizes that 21st century architectures should be controllable, resilient, and adaptable. It also discusses focusing architectures on data. Other topics covered include accessibility in the present and future, with a focus on voice and interactions beyond just voice. Security practices for well-architected systems are presented, including security in continuous integration/continuous delivery pipelines. Automation to enhance security is also discussed.
The document discusses big data and machine learning solutions on AWS. It covers why organizations use big data, challenges they face, and how AWS solutions like S3 data lakes, Glue, Athena, Redshift, Kinesis, Elasticsearch, SageMaker, and QuickSight can help overcome these challenges. It also discusses how big data drives machine learning and how AWS machine learning services work. Core tenets discussed include building decoupled systems, using the right tool for the job, and leveraging serverless services.
Amir Sadoughi - Developing Large-Scale Machine Learning Algorithms on Amazon ... - MLconf
The document discusses Amazon SageMaker, a machine learning platform that allows users to build, train, and deploy machine learning models. It describes key aspects of developing machine learning algorithms on SageMaker such as interface design, system design, testing, and communications. Specific topics covered include storage optimization, compute resources, network design, unit testing, benchmarking, and hyperparameter tuning. The document provides an example of developing an exponential moving average algorithm on SageMaker.
Similar to BigDL Deep Learning in Apache Spark - AWS re:Invent 2017 (20)
10 Ways to Scale with Redis - LA Redis Meetup 2019 - Dave Nielsen
Redis has 10 different data structures (String, Hash, List, Set, Sorted Set, Bit Array, Bit Field, HyperLogLog, Geospatial Index, Streams) plus Pub/Sub and many Redis modules. In this talk, Dave gives 10 examples of how to use these data structures to scale your website. He starts with the basics, such as caching and user session management, then demonstrates user-generated tags, leaderboards, and counting things with HyperLogLog, and finishes with a demo of Redis Pub/Sub vs. Redis Streams, which can be used to scale your microservices-based architecture.
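A couple of those examples are easy to sketch with the redis-py client: a leaderboard on a Sorted Set and unique-visitor counting with HyperLogLog. Key names and scores are placeholders; zincrby, zrevrange, pfadd, and pfcount are real redis-py methods.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Leaderboard: a Sorted Set keeps members ordered by score.
r.zincrby("leaderboard", 10, "player:alice")   # alice scores 10 points
r.zincrby("leaderboard", 25, "player:bob")     # bob scores 25 points
print(r.zrevrange("leaderboard", 0, 9, withscores=True))  # top 10

# Counting unique visitors: HyperLogLog estimates cardinality in ~12 KB.
for user in ("u1", "u2", "u1", "u3"):
    r.pfadd("visitors:2019-10-05", user)
print(r.pfcount("visitors:2019-10-05"))  # approximately 3
```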
10 Ways to Scale Your Website - Silicon Valley Code Camp 2019 - Dave Nielsen
Redis has 10 different data structures (String, Hash, List, Set, Sorted Set, Bit Array, Bit Field, HyperLogLog, Geospatial Index, Streams) plus Pub/Sub and many Redis modules. In this talk, Dave gives 10 examples of how to use these data structures to scale your website. He starts with the basics, such as caching and user session management, then demonstrates user-generated tags, leaderboards, and counting things with HyperLogLog, and finishes with a demo of Redis Pub/Sub vs. Redis Streams, which can be used to scale your microservices-based architecture.
Redis Streams plus Spark Structured Streaming - Dave Nielsen
Continuous applications have three things in common: they collect data from sources (e.g., IoT devices), process it in real time (e.g., ETL), and deliver it to a machine learning serving layer for decision making. Continuous applications face many challenges as they grow to production. Often, due to a rapid increase in the number of devices, end users, or other data sources, the size of the dataset grows exponentially, creating a backlog of data that can no longer be processed in near real time.
Redis Streams enables you to collect both binary and text data in a time-series format. The consumer groups of Redis Streams help you match the data processing rate of your continuous application to the rate of data arrival from various sources.
Apache Spark’s Structured Streaming API enables real-time decision making for continuous applications.
In this session, Dave performs a live demonstration of how to integrate open source Redis with Apache Spark’s Structured Streaming API using the Spark-Redis library. He also walks through the code and runs a live continuous application.
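The integration described is straightforward to sketch in PySpark. The stream key and schema are placeholders, and the "redis" stream source with its stream.keys option is how I recall the Spark-Redis library exposing Redis Streams to Structured Streaming, so check the library's README before relying on it.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = (SparkSession.builder
         .appName("redis-structured-streaming")
         # Tell the Spark-Redis connector where Redis lives (placeholders).
         .config("spark.redis.host", "localhost")
         .config("spark.redis.port", "6379")
         .getOrCreate())

# Fields we expect each Redis Stream entry to carry (placeholder schema).
schema = StructType([
    StructField("device", StringType()),
    StructField("temperature", StringType()),
])

# Read the Redis Stream "sensors" as an unbounded streaming DataFrame.
events = (spark.readStream
          .format("redis")
          .option("stream.keys", "sensors")
          .schema(schema)
          .load())

# Continuously count events per device and print updates to the console.
query = (events.groupBy("device").count()
         .writeStream.outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```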
Microservices - Is It Time to Break Up? - Dave Nielsen
Microservices: Is it time to break up? discusses when and how to transition from a monolithic application architecture to a microservices architecture. While microservices allow for improved scalability, performance, and agility, they also introduce more complexity. The document recommends using Redis data services to address scalability needs within a monolithic application initially. As needs grow, the application can then be decomposed into microservices while still leveraging Redis to manage shared data. Redis Enterprise provides high availability, security, and performance advantages for microservices architectures in the cloud.
Add Redis to Postgres to Make Your Microservices Go Boom! - Dave Nielsen
Slides for talk delivered at PostgresOpen 2018 in San Francisco https://postgresql.us/events/pgopen2018/schedule/session/538-add-redis-to-postgres-to-make-your-microservice-go-boom/
Redis as a Main Database, Scaling and HA - Dave Nielsen
Iskren Chernev, an independent developer, uses a lot of Redis. In this talk, Iskren looks at a particular Redis use case: using it as the main database (not a cache). He shows how to achieve reasonable guarantees about data integrity, speed, high availability in the event of failure, and infinite horizontal scalability. This particular approach has proven successful in managing clusters of up to 2,400 nodes and storing upwards of 7 TB of data before replication. We'll cover ways to separate your data appropriately across many nodes; performing different types of migrations (from another database, from one cluster to another, scaling migrations, and migrating out of Redis); moving nodes without downtime; some configuration tips; and monitoring.
The document discusses a Cloud Storage API that provides a common library allowing developers to access either Amazon S3 or Nirvanix IMFS storage services with the same code. This avoids vendor lock-in and allows switching between providers by changing only configuration parameters, letting developers focus on website features rather than on storage integration challenges and the differences between provider APIs and pricing.
Mashery is an API management platform formed in 2006 to help companies enable their APIs for the cloud. It provides services like developer key provisioning, reporting, community management, monitoring, access control, and capacity management to offload these responsibilities from its clients. Mashery operates a highly redundant infrastructure across multiple public clouds to ensure reliability and uses techniques like scripted alert reactions, data replication, and DNS failover for high availability.
- Google App Engine is a platform for easily developing and hosting scalable web applications, with no need for complex server management. It automatically scales the applications and handles all the operational details.
- App Engine applications run on Google's infrastructure and benefit from automatic scaling across multiple servers. It also provides security isolation and quotas to prevent applications from disrupting others.
- The platform uses a stateless, request-based architecture and scales applications automatically as traffic increases by distributing requests across multiple servers. It also uses quotas to ensure fairness among applications.
This document discusses cloud computing and the need for a unified cloud storage API. It describes cloud computing as utilizing on-demand computing resources over the internet rather than local servers, and identifies various cloud computing layers from hardware to applications. It also notes that cloud storage is useful for persistent apps but that current solutions lead to vendor lock-in or lack redundancy. The proposed solution is a cloud storage API that provides abstraction from specific vendors, supports multiple languages and clients, and allows for custom business logic.
Integrating Wikis and Other Social Content - Dave Nielsen
The document discusses integrating social content like wikis into websites. It defines Web 2.0 as user-generated content and lists use cases like forums, reviews, and photo sharing. It introduces WYSIWYG wikis from Wetpaint that allow collaborative editing. It demonstrates how to add Wetpaint Injected social components to a site using server controls and WordPress plugins. Finally, it proposes ideas for a developer contest using social wikis and invites contact for more information.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Enhanced data collection methods can help uncover the true extent of child abuse and neglect. This includes Integrated Data Systems from various sources (e.g., schools, healthcare providers, social services) to identify patterns and potential cases of abuse and neglect.
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr... - Marlon Dumas
This webinar discusses the limitations of traditional approaches to business process simulation based on hand-crafted models with restrictive assumptions. It shows how process mining techniques can be assembled together to discover high-fidelity digital twins of end-to-end processes from event data.
Open Source Contributions to Postgres: The Basics - POSETTE 2024 - ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
32. ABOUT ME
Alex Kalinin
VP, AI/Machine Learning | Sizmek
alex.kalinin@sizmek.com
LinkedIn: linkedin.com/in/alexkalinin/