Data preparation is always a challenge. Why care about infrastructure?
Come learn how to deploy your Spark jobs in minutes using our managed services, Amazon EMR and AWS Glue, so you can focus on your business needs.
This document discusses different AWS services for data analytics and querying large datasets. It provides an overview of AWS Glue for ETL, Amazon Athena for interactive SQL queries on S3 data, and Amazon Redshift Spectrum for extending Amazon Redshift queries to data in S3. It then discusses a customer case study of NUVIAD, which moved from a traditional data warehouse to using different AWS analytics services on a single S3 data lake.
by Androski Spicer, Solutions Architect AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll learn how to get started, how to support applications, and how to scale.
by Bill Baldwin, Global Enterprise Support Lead, AWS
While a Data Lake can support completely unstructured data, getting performant analytics at scale requires some data preparation. We'll look at how to use Amazon Kinesis, AWS Glue, and Amazon EMR to make raw data ready for high-performance analytics.
by Avijit Goswami, Sr Solutions Architect AWS
by Mamoon Chowdry, Solutions Architect
This document discusses preparing data for a data lake on AWS. It describes ingesting data from various sources into Amazon S3 as the data lake. It then discusses tools for processing, analyzing, and consuming the data from S3, including Amazon Athena, EMR, Redshift, Elasticsearch, QuickSight, and Glue. It provides an example of ingesting IoT sensor data from Kinesis into S3 and Athena, creating daily aggregations with Glue, and performing real-time analytics with Kinesis Analytics. The overall architecture leverages various AWS services together with S3 at its core to build a scalable, flexible, and cost-effective data lake.
The document discusses Amazon's use of AWS analytics technologies. It describes Amazon's enterprise data warehouse, which stores over 5 petabytes of integrated data from multiple sources. It faces challenges from rapid data growth and limited IT budgets. Amazon is addressing this by building a data lake called "Andes" that stores data in S3 and serves as a common source. Teams can use services like Redshift, EMR, and Athena to analyze the data through subscriptions that synchronize datasets. This approach aims to provide scalability and choices for analytics at Amazon.
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent... - Amazon Web Services
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In this session, we give a live demo of new capabilities the team has been building. SendGrid, a leader in trusted email delivery, discusses how they used Athena to reinvent a popular feature of their platform.
A data lake is an architectural approach that allows you to store massive amounts of data in a central location, so it's readily available to be categorized, processed, analyzed and consumed by diverse groups within an organization. In this session, we will introduce the Data Lake concept and its implementation on AWS. We will explain the different roles our services play and how they fit into the Data Lake picture.
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018 - Amazon Web Services
Learn about the latest and hottest features of Amazon Redshift. We’ll deep dive into the architecture and inner workings of Amazon Redshift and discuss how the recent availability, performance, and manageability improvements we’ve made can significantly enhance your user experience. We’ll also share a glimpse of what we are working on and our plans for the future. Dow Jones will join us to share how they leverage a data lake powered by Redshift, Redshift Spectrum and Athena to get fast time to insights.
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - Amazon Web Services
This document discusses building data lakes and analytics on AWS. It covers challenges with big data like volume, velocity, and variety. An AWS data lake can quickly ingest and store any type of data. The data lake includes analytics, machine learning, real-time data movement, and traditional data movement. Metadata management is important for data lakes. AWS Glue crawlers can discover data in various formats and populate the data catalog. Different tools like Amazon Athena, Amazon EMR, and Amazon Redshift can be used for analytics depending on the user and use case. Machine learning benefits from big data, and a data lake supports agility in machine learning.
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent... - Amazon Web Services
Data lakes are emerging as the most common architecture built in data-driven organizations today. A data lake enables you to store unstructured, semi-structured, or fully-structured raw data as well as processed data for different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning. Well-designed data lakes ensure that organizations get the most business value from their data assets. In this session, you learn about the common challenges and patterns for designing an effective data lake on the AWS Cloud, with wisdom distilled from various customer implementations. We walk through patterns to solve data lake challenges, like real-time ingestion, choosing a partitioning strategy, file compaction techniques, database replication to your data lake, handling mutable data, machine learning integration, security patterns, and more.
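One of the patterns named above, choosing a partitioning strategy, is commonly realized with Hive-style `key=value` prefixes in object keys, which query engines such as Amazon Athena can use to skip data outside the requested partitions. The following is a minimal sketch and is not taken from the session; the `events` prefix, the function name, and the year/month/day layout are illustrative assumptions.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=).

    The timestamp is normalized to UTC first so that every writer
    places the same event in the same partition.
    """
    t = event_time.astimezone(timezone.utc)
    return (
        f"{prefix}/year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/{filename}"
    )


if __name__ == "__main__":
    # Hypothetical event time and file name, for illustration only.
    ts = datetime(2018, 11, 27, 15, 30, tzinfo=timezone.utc)
    print(partitioned_key("events", ts, "part-0000.parquet"))
```

Daily partitions are only one possible granularity; finer partitions reduce the data scanned per query but produce more, smaller files, which is where the file compaction techniques mentioned above come in.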
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018 - Amazon Web Services
Realizing the value of social media analytics can bolster your business goals. This type of analysis has grown in recent years due to the large amount of available information and the speed at which it can be collected and analyzed. In this workshop, we build a serverless data processing and machine learning (ML) pipeline that provides a multi-lingual social media dashboard of tweets within Amazon QuickSight. We leverage API-driven ML services, AWS Glue, Amazon Athena and Amazon QuickSight. These building blocks are put together with very little code by leveraging serverless offerings within AWS.
A Data Lake allows an organisation to store all of its data, structured and unstructured, in one centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema and you no longer need to know what questions you want to ask of your data beforehand. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
Speakers:
Tom McMeekin, Associate Solutions Architect, Amazon Web Services
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018 - Amazon Web Services
Amazon EMR provides a flexible range of service customization options, enabling customers to use it as a building block for their data platforms. In this session, AWS customers Salesforce.com and Vanguard discuss in detail how they use Amazon EMR to build a self-service, secure, and auditable data engineering platform. Customers who want to optimize their design and configurations should attend this session to learn best practices from customer experts. Topics include achieving cost-efficient scale, using notebooks, processing streaming data, rapid prototyping of applications and data pipelines, architecting for both transient and persistent clusters, setting up advanced security and authorization controls, and enabling easy self service for users.
by Jon Handler, Principal Solutions Architect and Sanjay Dhar, Solutions Architect, AWS
Nearly everything in IT - servers, applications, websites, connected devices, and other things - generates discrete, time-stamped records of events called logs. Processing and analyzing these logs to gain actionable insights is log analytics. We'll look at how to use centralized log analytics across multiple sources with Amazon Elasticsearch Service.
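To make the idea of a log as a discrete, time-stamped event record concrete, here is a minimal sketch that parses one line into a structured document of the kind a search or analytics service could index. The line format (`<ISO 8601 timestamp> <LEVEL> <message>`), the field names, and the function name are assumptions for illustration and do not come from the session.

```python
import json
import re
from datetime import datetime

# Assumed line format: "<ISO 8601 timestamp> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_log_line(line: str) -> dict:
    """Turn one raw log line into a dict with timestamp, level and message."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    record = match.groupdict()
    # Raises ValueError if the timestamp is not valid ISO 8601.
    datetime.fromisoformat(record["timestamp"])
    return record


if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    raw = "2018-11-27T15:30:00+00:00 ERROR disk full on /dev/sda1"
    print(json.dumps(parse_log_line(raw)))
```

Rejecting malformed lines early keeps bad records out of the index; a real pipeline would more likely route them to a separate location for inspection than raise.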
A whitepaper about how Qubole on AWS provides end-to-end data lake services, such as AWS infrastructure management, data management, continuous data engineering, analytics, and ML, with zero administration.
https://www.qubole.com/resources/white-papers/qubole-on-aws
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ... - Amazon Web Services
Data lakes are transforming the way enterprises store, analyze, and learn insights from their data. While data lakes are a relatively new concept, many enterprises have already generated significant business value from the insights gleaned. In this session, AWS experts and technology leaders from Sysco, a Fortune 50 company and leader in food distribution and marketing, explain why Sysco decided to evolve its data management capabilities to include data lakes and how they customized them to support diverse querying capabilities and data science use cases. They also discuss how to architect different aspects of a data lake—ingestion from disparate sources, data consumption, and usability layers—and how to track data ingestion and consumption, monitor associated costs, enforce wanted levels of user access, manage data file formats, synchronize production and non-production environments, and maintain data integrity. Services to be discussed include Amazon S3 and S3 Select, Amazon Athena, Amazon EMR, Amazon EC2, and Amazon Redshift Spectrum.
by Amy Che, Sr Solutions Delivery Manager AWS and Marie Yap, Technical Account Manager AWS
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P... - Amazon Web Services
Most companies are overrun with data, yet they lack critical insights to make timely and accurate business decisions. They are missing the opportunity to combine large amounts of new, unstructured big data that resides outside their data warehouse with trusted, structured data inside their data warehouse. In this session, we discuss the most common use cases with Amazon Redshift, and we take an in-depth look at how modern data warehousing blends and analyzes all your data to give you deeper insights to run your business. Equinox Fitness Clubs joins us to share their journey from static reports, redundant data, and inefficient data integration to a modern and flexible data lake and data warehouse architecture that delivers dynamic reports based on trusted data.
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018 - Amazon Web Services
Do you want to increase your knowledge of AWS big data web services and launch your first big data application on the cloud? In this session, we walk you through simplifying big data processing as a data bus comprising ingest, store, process, and visualize. You will build a big data application using AWS managed services, including Amazon Athena, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. Along the way, we review architecture design patterns for big data applications and give you access to a take-home lab so you can rebuild and customize the application yourself. To get the most from this session, bring your own laptop and have some familiarity with AWS services.
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL - Amazon Web Services
Citrix moved large amounts of customer usage data to Amazon Redshift for analytics using Matillion ETL. Initially, Citrix built custom workflows to transform and load the data, but this required more maintenance. Using Matillion, Citrix can now load millions of rows into Redshift in minutes, allowing faster and more granular analysis of user data to optimize their applications. The speed and simplicity of Matillion have increased the efficiency of Citrix's analytics initiatives.
Modern data is massive, quickly evolving, unstructured, and increasingly hard to catalog and understand across multiple consumers and applications. This session will guide you through the best practices for designing a robust data architecture, highlighting the benefits and typical challenges of data lakes and data warehouses. We will build a scalable solution based on managed services such as Amazon Athena, AWS Glue, and AWS Lake Formation.
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30... - Amazon Web Services
This document provides a summary of a presentation on building data lakes and analytics on AWS. It discusses:
- The challenges of big data including volume, velocity, variety and veracity.
- How an AWS data lake can address these challenges by quickly ingesting and storing any type of data while providing insights, security and the ability to run the right analytics tools without data movement.
- Key components of a data lake on AWS including storage, data catalog, analytics, machine learning capabilities, and tools for real-time and traditional data movement.
The Open Data Lake Platform Brief - Data Sheets | Whitepaper - Vasu S
An open data lake platform provides a robust and future-proof data management paradigm to support a wide range of data processing needs, including data exploration, ad-hoc analytics, streaming analytics, and machine learning.
by Sid Chauhan, Solutions architect, AWS
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN... - Amazon Web Services
Companies have valuable data that they might not be analyzing due to the complexity, scalability, and performance issues of loading the data into their data warehouse. With the right tools, you can extend your analytics to query data in your data lake—with no loading required. Amazon Redshift Spectrum extends the analytic power of Amazon Redshift beyond data stored in your data warehouse to run SQL queries directly against vast amounts of unstructured data in your Amazon S3 data lake. This gives you the freedom to store your data where you want, in the format you want, and have it available for analytics when you need it. Join a discussion with an Amazon Redshift lead engineer to ask questions and learn more about how you can extend your analytics beyond your data warehouse.
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo... - Amazon Web Services
This document discusses Amazon Neptune, a fully managed graph database service. It provides an overview of graph databases and their advantages over traditional databases for modeling connected data. It then describes Amazon Neptune's key features, like automatic scaling, high availability across Availability Zones, integration with open standards like Gremlin and SPARQL, and ease of use on AWS. Examples are given showing how to model and query graph data using Gremlin and SPARQL. Finally, it discusses Amazon Neptune's architecture and roadmap for general availability later in 2018.
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit... - Amazon Web Services
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop/Spark to AWS in order to save costs, increase availability, and improve performance. In this session, AWS customers Airbnb and Guardian Life discuss how they migrated their workload to Amazon EMR. This session focuses on key motivations to move to the cloud. It details key architectural changes and the benefits of migrating Hadoop/Spark workloads to the cloud.
A data lake is an architectural approach that allows you to store massive amounts of data into a central location, so it's readily available to be categorized, processed, analyzed and consumed by diverse groups within an organization.In this session, we will introduce the Data Lake concept and its implementation on AWS.We will explain the different roles our services play and how they fit into the Data Lake picture.
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018Amazon Web Services
Learn about the latest and hottest features of Amazon Redshift. We’ll deep dive into the architecture and inner workings of Amazon Redshift and discuss how the recent availability, performance, and manageability improvements we’ve made can significantly enhance your user experience. We’ll also share glimpse of what we are working on and our plans for the future. Dow Jones will join us to share how they leverage a data lake powered by Redshift, Redshift spectrum and Athena to get fast time to insights.
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
This document discusses building data lakes and analytics on AWS. It covers challenges with big data like volume, velocity, and variety. An AWS data lake can quickly ingest and store any type of data. The data lake includes analytics, machine learning, real-time data movement, and traditional data movement. Metadata management is important for data lakes. AWS Glue crawlers can discover data in various formats and populate the data catalog. Different tools like Amazon Athena, Amazon EMR, and Amazon Redshift can be used for analytics depending on the user and use case. Machine learning benefits from big data, and a data lake supports agility in machine learning.
In this session, we show you how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon ML services work together to build a successful data lake for various roles, including data scientists and business users.
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
Data lakes are emerging as the most common architecture built in data-driven organizations today. A data lake enables you to store unstructured, semi-structured, or fully-structured raw data as well as processed data for different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning. Well-designed data lakes ensure that organizations get the most business value from their data assets. In this session, you learn about the common challenges and patterns for designing an effective data lake on the AWS Cloud, with wisdom distilled from various customer implementations. We walk through patterns to solve data lake challenges, like real-time ingestion, choosing a partitioning strategy, file compaction techniques, database replication to your data lake, handling mutable data, machine learning integration, security patterns, and more.
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Amazon Web Services
Realizing the value of social media analytics can bolster your business goals. This type of analysis has grown in recent years due to the large amount of available information and the speed at which it can be collected and analyzed. In this workshop, we build a serverless data processing and machine learning (ML) pipeline that provides a multi-lingual social media dashboard of tweets within Amazon QuickSight. We leverage API-driven ML services, AWS Glue, Amazon Athena and Amazon QuickSight. These building blocks are put together with very little code by leveraging serverless offerings within AWS.
Data Lake allows an organisation to store all of their data, structured and unstructured, in one, centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema and you no longer need to know what questions you want to ask of your data beforehand. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
Speakers:
Tom McMeekin, Associate Solutions Architect, Amazon Web Services
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Amazon Web Services
Amazon EMR provides a flexible range of service customization options, enabling customers to use it as a building block for their data platforms. In this session, AWS customers Salesforce.com and Vanguard discuss in detail how they use Amazon EMR to build a self-service, secure, and auditable data engineering platform. Customers who want to optimize their design and configurations should attend this session to learn best practices from customer experts. Topics include achieving cost-efficient scale, using notebooks, processing streaming data, rapid prototyping of applications and data pipelines, architecting for both transient and persistent clusters, setting up advanced security and authorization controls, and enabling easy self service for users.
by Jon Handler, Principal Solutions Architect and Sanjay Dhar, Solutions Architect, AWS
Nearly everything in IT - servers, applications, websites, connected devices, and other things - generate discrete, time-stamped records of events called logs. Processing and analyzing these logs to gain actionable insights is log analytics. We'll look at how to use centralized log analytics across multiple sources with Amazon Elasticsearch Service.
A whitepaper is about Qubole on AWS provides end-to-end data lake services such as AWS infrastructure management, data management, continuous data engineering, analytics, & ML with zero administration
https://www.qubole.com/resources/white-papers/qubole-on-aws
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Amazon Web Services
Data lakes are transforming the way enterprises store, analyze, and learn insights from their data. While data lakes are a relatively new concept, many enterprises have already generated significant business value from the insights gleaned. In this session, AWS experts and technology leaders from Sysco, a Fortune 50 company and leader in food distribution and marketing, explain why Sysco decided to evolve its data management capabilities to include data lakes and how they customized them to support diverse querying capabilities and data science use cases. They also discuss how to architect different aspects of a data lake—ingestion from disparate sources, data consumption, and usability layers—and how to track data ingestion and consumption, monitor associated costs, enforce wanted levels of user access, manage data file formats, synchronize production and non-production environments, and maintain data integrity. Services to be discussed include Amazon S3 and S3 Select, Amazon Athena, Amazon EMR, Amazon EC2, and Amazon Redshift Spectrum.
by Amy Che, Sr Solutions Delivery Manager AWS and Marie Yap, Technical Account Manager AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, & Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll will learn how to get started, how to support applications, and how to scale.
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...Amazon Web Services
Most companies are overrun with data, yet they lack critical insights to make timely and accurate business decisions. They are missing the opportunity to combine large amounts of new, unstructured big data that resides outside their data warehouse with trusted, structured data inside their data warehouse. In this session, we discuss the most common use cases with Amazon Redshift, and we take an in-depth look at how modern data warehousing blends and analyzes all your data to give you deeper insights to run your business. Equinox Fitness Clubs joins us to share their journey from static reports, redundant data, and inefficient data intergration to a modern and flexible data lake and data warehouse architecture that delivers dynamic reports based on trusted data.
Build Your First Big Data Application on AWS (ANT213-R1) - AWS re:Invent 2018Amazon Web Services
Do you want to increase your knowledge of AWS big data web services and launch your first big data application on the cloud? In this session, we walk you through simplifying big data processing as a data bus comprising ingest, store, process, and visualize. You will build a big data application using AWS managed services, including Amazon Athena, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. Along the way, we review architecture design patterns for big data applications and give you access to a take-home lab so you can rebuild and customize the application yourself. To get the most from this session, bring your own laptop and have some familiarity with AWS services.
Citrix Moves Data to Amazon Redshift Fast with Matillion ETL (Amazon Web Services)
Citrix moved large amounts of customer usage data to Amazon Redshift for analytics using Matillion ETL. Initially, Citrix built custom workflows to transform and load the data, but this required more maintenance. Using Matillion, Citrix can now load millions of rows into Redshift in minutes, allowing faster and more granular analysis of user data to optimize their applications. The speed and simplicity of Matillion has increased the efficiency of Citrix's analytics initiatives.
Modern data is massive, quickly evolving, unstructured, and increasingly hard to catalog and understand from multiple consumers and applications. This session will guide you through the best practices for designing a robust data architecture, highlighting the benefits and typical challenges of data lakes and data warehouses. We will build a scalable solution based on managed services such as Amazon Athena, AWS Glue, and AWS Lake Formation.
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30... (Amazon Web Services)
This document provides a summary of a presentation on building data lakes and analytics on AWS. It discusses:
- The challenges of big data including volume, velocity, variety and veracity.
- How an AWS data lake can address these challenges by quickly ingesting and storing any type of data while providing insights, security and the ability to run the right analytics tools without data movement.
- Key components of a data lake on AWS including storage, data catalog, analytics, machine learning capabilities, and tools for real-time and traditional data movement.
The Open Data Lake Platform Brief - Data Sheets | Whitepaper (Vasu S)
An open data lake platform provides a robust and future-proof data management paradigm to support a wide range of data processing needs, including data exploration, ad-hoc analytics, streaming analytics, and machine learning.
by Sid Chauhan, Solutions architect, AWS
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Extending Analytics Beyond the Data Warehouse, ft. Warner Bros. Analytics (AN... (Amazon Web Services)
Companies have valuable data that they might not be analyzing due to the complexity, scalability, and performance issues of loading the data into their data warehouse. With the right tools, you can extend your analytics to query data in your data lake—with no loading required. Amazon Redshift Spectrum extends the analytic power of Amazon Redshift beyond data stored in your data warehouse to run SQL queries directly against vast amounts of unstructured data in your Amazon S3 data lake. This gives you the freedom to store your data where you want, in the format you want, and have it available for analytics when you need it. Join a discussion with an Amazon Redshift lead engineer to ask questions and learn more about how you can extend your analytics beyond your data warehouse.
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo... (Amazon Web Services)
This document discusses Amazon Neptune, a fully managed graph database service. It provides an overview of graph databases and their advantages over traditional databases for modeling connected data. It then describes Amazon Neptune's key features, like automatic scaling, high availability across Availability Zones, integration with open standards like Gremlin and SPARQL, and ease of use on AWS. Examples are given showing how to model and query graph data using Gremlin and SPARQL. Finally, it discusses Amazon Neptune's architecture and roadmap for general availability later in 2018.
Migrate Your Hadoop/Spark Workload to Amazon EMR and Architect It for Securit... (Amazon Web Services)
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop/Spark to AWS in order to save costs, increase availability, and improve performance. In this session, AWS customers Airbnb and Guardian Life discuss how they migrated their workload to Amazon EMR. This session focuses on key motivations to move to the cloud. It details key architectural changes and the benefits of migrating Hadoop/Spark workloads to the cloud.
Accelerate Analytics at Scale with Amazon EMR - AWS Summit Sydney 2018 (Amazon Web Services)
Accelerate Data Analytics at Scale with Amazon EMR
In this session you will learn the best practices and various use cases for performing data analytics at scale with Amazon EMR. We will introduce you to Amazon EMR design patterns and share how to use big data analytics to provide business insights.
Jonathan Fritz, Principal Product Manager, Amazon Web Services
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift (Amazon Web Services)
Osemeke Isibor, Solutions Architect, AWS
In this session, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Redshift Spectrum, a feature of Redshift that enables you to analyze data across Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos.
Data Transformation Patterns in AWS - AWS Online Tech Talks (Amazon Web Services)
Learning Objectives:
- Learn how to accelerate common data transformations from a variety of data sources
- Learn how to efficiently orchestrate transformation jobs
- Learn best practices and methodologies in data preparation for analytics
Leadership Session: AWS Database and Analytics (DAT206-L) - AWS re:Invent 2018 (Amazon Web Services)
Raju Gulabani, Vice President of Databases, Analytics, Machine Learning, and Blockchain at AWS, presented on AWS databases and analytics services. He discussed AWS's strategy of having a broad and deep portfolio of purpose-built analytics services including Redshift, Athena, EMR, QuickSight, and SageMaker. He also provided examples of customers like Epic Games and Anthropic using these services to build analytics solutions at large scale.
The document discusses strategies for migrating databases to the cloud. It begins by outlining objectives and factors that contribute to successful database migration projects. It then covers various migration options like rehosting, replatforming, or rearchitecting databases. The document also discusses tools like the Schema Conversion Tool (SCT) and Database Migration Service (DMS) that can assist with assessment, conversion between database engines, and migrating data between on-premises and cloud databases. The overall process involves planning, assessment, executing the migration in phases, and testing.
Data Warehouses & Data Lakes: Data Analytics Week at the SF Loft (Amazon Web Services)
Data Warehouses and Data Lakes: Data Analytics Week at the San Francisco Loft
Organizations use reports, dashboards, and analytics tools to extract insights from their data, monitor performance, and support decision making. To support these tools, data must be collected and prepared for use. We'll look at two approaches: the structured, centralized repository of a Data Warehouse and the less-structured repository of a Data Lake. We'll compare these approaches, examine the services that support each, and explore how they work together.
Level: Intermediate
Speakers:
Aser Moustafa - Data Warehouse Specialist Solutions Architect, AWS
Asim Kumar Sasmal - Big Data Consultant, AWS
A decade ago, relational databases were used for nearly every use case. Today, new technologies are enabling a revolution in databases, creating new options for document, key-value, in-memory, search, and graph capabilities that do not use relational tables. We’ll discuss this revolution in database options and who is using them.
Level: 200
Speaker: Samir Karande - Sr. Manager, Solutions Architecture, AWS
This document discusses big data processing at scale using AWS services. It begins with an overview of the increasing volume of data being generated. Common architectures for collecting, storing, processing, analyzing and consuming big data on AWS are then presented. Specific AWS services for each step of the big data workflow like Kinesis, S3, EMR, Redshift and Glue are described. Common architectural patterns for building event-driven batch analytics and combining real-time and batch analytics on AWS are shown. The document concludes with success stories of customers like Netflix, FINRA and Nasdaq using AWS for big data and analytics workloads.
Best practices for Running Spark jobs on Amazon EMR with Spot Instances | AWS... (Amazon Web Services)
In this session we are going to focus on cost-optimizing and efficiently running Spark applications on EMR by using Spot Instances. There are several best practices you should follow, in order to increase the fault-tolerance of your Spark applications and make use of Spot Instances, without compromising availability or impacting performance/duration of your jobs.
Quickly and easily build, train, and deploy machine learning models at any scale (AWS Germany)
The machine learning process often feels a lot harder than it should be to most developers because the process to build and train models, and then deploy them into production is too complicated and too slow.
This workshop starts with a brief review of the machine learning process, followed by an introduction and deep dive into the individual components of Amazon SageMaker. As part of the workshop we will train artificial neural networks, get insight into some of the built-in machine learning algorithms of SageMaker that you can use for a variety of problem types, and after successfully training a model, look at options on how to deploy and scale a model as a service.
This workshop is aimed at developers that are new to machine learning, as well as data scientists that continue to be challenged by the operational challenges of the machine learning process. Bring your own laptop with Python and Jupyter Notebook, and (ideally) your own activated AWS account to follow through the examples.
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture (Amazon Web Services)
Your customers probably want a better experience with your brand. Your different business teams want and need better insights in their decision making. Almost certainly, your finance and operations teams require this to happen at a fraction of the cost of traditional on-premises options. Modern data architectures on AWS help many of our best customers realise all of those goals. Your business data contains critical information about customer behaviours, operational decisions, and many factors that have financial impact on your organisation. Increasingly, this data sits beyond your transactional systems, and is too big, too fast, and too complex for existing systems to handle. AWS Data and Analytics services are designed from our customers' requirements to ingest, store, analyse, and consume information at record-breaking scale. In this session you will learn how these services work together to deliver business automation, enhance customer engagement and intelligence.
This document provides an overview of data warehousing and data lake concepts. It discusses key stages in data collection, storage, analysis and consumption. Different data storage options like Amazon S3, DynamoDB and Redshift are presented along with considerations for which tool to use based on data characteristics. The document also covers stream storage options and best practices for building cost-conscious and decoupled data architectures.
Database Week at the San Francisco Loft
Non-Relational Revolution
A decade ago, relational databases were used for nearly every use case. Today, new technologies are enabling a revolution in databases, creating new options for document, key-value, in-memory, search, and graph capabilities that do not use relational tables. We’ll discuss this revolution in database options and who is using them.
Level: 200
Speakers:
Smitty Weygant - Solutions Architect, AWS
Karan Desai - Solutions Architect, AWS
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift (Amazon Web Services)
In this session, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Redshift Spectrum, a feature of Redshift that enables you to analyze data across Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos. A customer is joining us to share how they were able to extend their data warehouse to their data lake to encompass multiple data sources and data formats. This modern architecture helps them tie together data sources to get actionable insights across their business units.
FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud - ENT328 - re:Inve... (Amazon Web Services)
FINRA faced challenges with their on-premises data infrastructure, including difficulty tracking data, limited scalability, and high costs. They migrated to a managed data lake on AWS to address these issues. This provided centralized data management with a catalog, separation of storage and compute, encryption, and cost optimization. It enabled faster analytics through Presto querying, machine learning model development, and reduced TCO by 30% compared to their on-premises environment. Lessons learned included embracing disruption, automating infrastructure, and treating infrastructure as code. FINRA is exploring additional AWS services like Athena, Lambda, and Step Functions to continue improving their analytics capabilities.
The first step towards knowing your customer is to collect and extract insights and actionable information from your data. Learn how AWS enables you to cost efficiently store any amount of data and build an agile approach to data mining and visualization - helping you to make efficient business decisions and targeted offerings.
Amazon Redshift Update and How Equinox Fitness Clubs Migrated to a Modern Dat... (Amazon Web Services)
In this session, we provide an update on Amazon Redshift, and look at a case study from Equinox Fitness Clubs. We show you how Amazon Redshift queries data across your data warehouse and data lake, without the need or delay of loading data, to deliver insights you cannot obtain by querying independent data silos. Discover how Equinox Fitness Clubs transitioned from on-premises data warehouses and data marts to a cloud-based, integrated data platform, built on AWS and Amazon Redshift. Learn about their journey from static reports, redundant data, and inefficient data integration to a modern and flexible data lake and data warehouse architecture that delivers dynamic reports based on trusted data.
Similar to Data preparation and transformation - Spin your straw into gold - Tel Aviv Summit 2018 (20)
How to build forecasting services using algorithms from ML and deep learn... (Amazon Web Services)
Forecasting is an important process for many companies and is used in various areas to accurately predict the growth and distribution of a product, the use of resources needed on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we show how to pre-process data that has a time component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to build Big Data applications in Server... mode (Amazon Web Services)
The variety and quantity of data created every day is growing at an ever faster pace and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment that only established companies can afford. But the elasticity of the cloud and, in particular, serverless services allow us to break through these limits.
Let's see how to develop Big Data applications quickly, without worrying about infrastructure, while dedicating all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we present the main features of the service and show how to deploy your application in a few steps.
Twenty years ago Amazon went through a radical transformation aimed at increasing the pace of innovation. During that time we learned how changing our approach to application development allowed us to significantly increase agility and release velocity and, ultimately, to build more reliable and scalable applications. In this session we explain how we define modern applications and how building modern apps affects not only the application architecture but also the organizational structure, the development release pipelines, and even the operating model. We also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances (Amazon Web Services)
The use of containers keeps growing.
When designed correctly, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, leading to average savings of 70% compared with On-Demand Instances. In this session we explore the characteristics of Spot Instances and how easily they can be used on AWS. We also learn how Spreaker uses Spot Instances to run different kinds of applications, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us the question – how to monetise Open APIs, simplify Fintech integrations and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Lea... services (Amazon Web Services)
To create value and build a distinctive, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we see how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployments of the... (Amazon Web Services)
With the traditional approach to IT, DevOps techniques were hard to implement for many years; until now they have often involved manual steps, occasionally causing application downtime that interrupts users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, ensuring greater system reliability and resulting in significant improvements in business continuity.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Find out how to use AWS OpsWorks to ensure the reliability of your application installed on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads (Amazon Web Services)
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis based on artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting existing VMware investments.
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, on top of the performance risks that can be introduced when moving applications out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS (Amazon Web Services)
Many companies today build applications with ledger-like functionality, for example to verify the history of credits and debits in banking transactions or to track the supply chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly to manage.
Amazon QLDB eliminates the need to build complex custom systems by providing a fully managed serverless ledger database.
In this session we find out how to build a complete serverless application that uses QLDB features.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for giving end users an exceptional user experience. In this session we learn how to tackle modern API design challenges with GraphQL, an open source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We dig into several scenarios to understand how AppSync can help solve these use cases by creating modern APIs with real-time and offline data update capabilities.
We also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: myths to debunk (Amazon Web Services)
Many organizations take advantage of the cloud by migrating their Oracle workloads, gaining significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, on top of the performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and simplify the migration of Oracle workloads and accelerate the move to the cloud; they go deeper into the architecture and demonstrate how to take full advantage of VMware Cloud™ on AWS.
1) The document discusses building a minimum viable product (MVP) using Amazon Web Services (AWS).
2) It provides an example of an MVP for an omni-channel messenger platform that was built from 2017 to connect ecommerce stores to customers via web chat, Facebook Messenger, WhatsApp, and other channels.
3) The founder discusses how they started with an MVP in 2017 with 200 ecommerce stores in Hong Kong and Taiwan, and have since expanded to over 5000 clients across Southeast Asia using AWS for scaling.
This document discusses pitch decks and fundraising materials. It explains that venture capitalists will typically spend only 3 minutes and 44 seconds reviewing a pitch deck. Therefore, the deck needs to tell a compelling story to grab their attention. It also provides tips on tailoring different types of decks for different purposes, such as creating a concise 1-2 page teaser, a presentation deck for pitching in-person, and a more detailed read-only or fundraising deck. The document stresses the importance of including key information like the problem, solution, product, traction, market size, plans, team, and ask.
This document discusses building serverless web applications using AWS services like API Gateway, Lambda, DynamoDB, S3 and Amplify. It provides an overview of each service and how they can work together to create a scalable, secure and cost-effective serverless application stack without having to manage servers or infrastructure. Key services covered include API Gateway for hosting APIs, Lambda for backend logic, DynamoDB for database needs, S3 for static content, and Amplify for frontend hosting and continuous deployment.
This document provides tips for fundraising from startup founders Roland Yau and Sze Lok Chan. It discusses generating competition to create urgency for investors, fundraising in parallel rather than sequentially, having a clear fundraising narrative focused on what you do and why it's compelling, and prioritizing relationships with people over firms. It also notes how the pandemic has changed fundraising, with examples of deals done virtually during this time. The tips emphasize being fully prepared before fundraising and cultivating connections with investors in advance.
AWS_HK_StartupDay_Building Interactive websites while automating for efficien... (Amazon Web Services)
This document discusses Amazon's machine learning services for building conversational interfaces and extracting insights from unstructured text and audio. It describes Amazon Lex for creating chatbots, Amazon Comprehend for natural language processing tasks like entity extraction and sentiment analysis, and how they can be used together for applications like intelligent call centers and content analysis. Pre-trained APIs simplify adding machine learning to apps without requiring ML expertise.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies managing Docker containers through an orchestration layer that controls deployment and the related lifecycle. In this session we present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
23. Founded in 2012 by Ran Sarig, Efi Cohen & Katrin Ribant
• +3000 Brands
• +300 Agencies
• +50 Publishers
• +25 Industry Verticals
• 16 Offices worldwide
• $50M Funding from Lightspeed, Innovation Endeavors
• 300+ Employees & growing quickly
32. Transformations at scale
• Extract and Transform data
• Calculated columns
• Vlookups/Fuzzy match
• Complex logic and iterations
• Sandboxed environment
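To make the "Calculated columns" and "Vlookups/Fuzzy match" bullets concrete, here is a minimal Python sketch. It is not the presenters' implementation (their transformations run on Spark). The function name `fuzzy_vlookup`, the sample campaign data, and the 0.8 similarity cutoff are all assumptions chosen for illustration. Matching uses the standard library's `difflib`.

```python
import difflib


def fuzzy_vlookup(key, table, cutoff=0.8):
    """Look up `key` in `table`, tolerating small spelling differences.

    Hypothetical helper: returns the value of the closest key, or None
    when no key is at least `cutoff` similar.
    """
    if key in table:
        return table[key]
    # Compare case-insensitively, but remember the original keys.
    normalized = {k.lower().strip(): k for k in table}
    matches = difflib.get_close_matches(
        key.lower().strip(), list(normalized), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return table[normalized[matches[0]]]


# Hypothetical lookup table and input rows.
campaign_owner = {"Summer Sale 2018": "alice", "Brand Awareness": "bob"}
rows = [
    {"campaign": "summer sale 2018", "clicks": 120, "impressions": 4000},
    {"campaign": "Brand Awarness", "clicks": 30, "impressions": 1500},
    {"campaign": "Unknown", "clicks": 0, "impressions": 0},
]


def enrich(row):
    """Add a looked-up column and a calculated column to one row."""
    out = dict(row)
    out["owner"] = fuzzy_vlookup(row["campaign"], campaign_owner)
    # Calculated column: click-through rate, guarding against division by zero.
    out["ctr"] = row["clicks"] / row["impressions"] if row["impressions"] else 0.0
    return out


enriched = [enrich(r) for r in rows]
```

In a Spark job the same idea would typically be expressed as a join plus a derived column. The plain-Python version only shows the row-level logic.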
33. Marketing data is NOT immutable
• External vendors have windows of reconciliations (up to 6 months)
• Our users want to update/delete specific rows/set
• Our users love to backdate
• Most (if not all) big data solutions are append only and updating the data is considered a heavy process
38. Atomic Update Flow
(1) Read and increment table upload id
(2) Read input file
(3) Read "to be updated" partitions from S3
(4) Merge the two dataframes
(5) Write out partitions to new locations, e.g. /20180314_27
(6) Update hive: ALTER TABLE table_name PARTITION (date='20180314') SET LOCATION "/20180314_27";
(7) Reclaim stale data offline, periodically
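The seven steps of the atomic update flow can be sketched in plain Python. This is a toy model under stated assumptions, not the presenters' Spark/Hive code. A local temporary directory stands in for S3, a dictionary stands in for the Hive partition locations, and the `Table` class and its method names are hypothetical. The point it illustrates is that readers keep seeing the old partition directory until the location is repointed, so an update never exposes half-written data.

```python
import json
import shutil
import tempfile
from pathlib import Path


class Table:
    """Toy partitioned table: data directories plus a partition -> location map."""

    def __init__(self, root):
        self.root = Path(root)
        self.upload_id = 0   # monotonically increasing upload counter
        self.locations = {}  # partition value -> directory served to readers

    def read_partition(self, partition):
        """Return the rows of a partition as {row_key: row}, or {} if absent."""
        loc = self.locations.get(partition)
        if loc is None:
            return {}
        return json.loads((loc / "data.json").read_text())

    def upsert(self, changes_by_partition):
        """Apply {partition: {row_key: row_or_None}}; None deletes a row."""
        self.upload_id += 1                                  # (1) increment upload id
        staged = {}
        for partition, changes in changes_by_partition.items():  # (2) read input
            merged = self.read_partition(partition)          # (3) read partitions to update
            for key, row in changes.items():                 # (4) merge old and new rows
                if row is None:
                    merged.pop(key, None)
                else:
                    merged[key] = row
            # (5) write the merged partition to a brand-new location
            new_loc = self.root / f"{partition}_{self.upload_id}"
            new_loc.mkdir(parents=True)
            (new_loc / "data.json").write_text(json.dumps(merged))
            staged[partition] = new_loc
        # (6) repoint partitions only after every write has succeeded
        self.locations.update(staged)

    def reclaim_stale(self):
        """(7) Delete directories that no partition points to any more."""
        live = set(self.locations.values())
        removed = []
        for d in list(self.root.iterdir()):
            if d.is_dir() and d not in live:
                shutil.rmtree(d)
                removed.append(d.name)
        return sorted(removed)


table = Table(tempfile.mkdtemp())
table.upsert({"20180314": {"r1": {"clicks": 10}, "r2": {"clicks": 5}}})
table.upsert({"20180314": {"r1": {"clicks": 12}, "r2": None}})  # backdated fix
current = table.read_partition("20180314")
stale = table.reclaim_stale()
```

In this sketch the old directory `20180314_1` stays readable until `reclaim_stale` runs, which mirrors the offline, periodic cleanup in step 7.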
39. Important Notes
• Load / Query / Storage are completely decoupled
• Linear scale out
• L microservice is the driver program
– Single spark context per microservice instance