The document discusses database choices and migrations at Deliveroo. It describes moving from PostgreSQL 9.6 to 10 using AWS Database Migration Service (DMS) with minimal downtime. It also discusses using DynamoDB for high-throughput services like order status because of its speed, reliability, and scalability. Key learnings included creating schemas properly, using replication slots, and following DynamoDB best practices such as avoiding scans.
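As a concrete illustration of the scan-avoidance point, here is a minimal boto3 sketch; the table, key, and attribute names are hypothetical rather than taken from the Deliveroo setup.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical order-status table with partition key "order_id".
table = boto3.resource("dynamodb").Table("orders")

# A Query touches only one partition, so its cost scales with the
# items returned rather than with the size of the table.
response = table.query(KeyConditionExpression=Key("order_id").eq("order-123"))
for item in response["Items"]:
    print(item.get("status"))

# table.scan(), by contrast, reads every item in the table and burns
# read capacity, which is why scans are avoided on hot tables.
```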
The document discusses different AWS database services and when to use each one. It provides overviews of Amazon Aurora, Amazon RDS, DynamoDB, DocumentDB, ElastiCache, Neptune, Timestream, and QLDB. It describes their common use cases, features, and benefits including performance, scalability, availability and security. It also briefly mentions AWS Database Migration Service and provides an example retail application using several AWS databases.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
A closer look at the MySQL and PostgreSQL compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. We’ll explore how Aurora uses the AWS cloud to provide high reliability, high durability, and high throughput.
Speakers:
Steve Abraham - Principal Database Specialist Solutions Architect, AWS
Peter Dachnowicz - Sr. Technical Account Manager, AWS
Big data architectures and the data lake - James Serra
The document provides an overview of big data architectures and the data lake concept. It discusses why organizations are adopting data lakes to handle increasing data volumes and varieties. The key aspects covered include:
- Defining top-down and bottom-up approaches to data management
- Explaining what a data lake is and how Hadoop can function as the data lake
- Describing how a modern data warehouse combines features of a traditional data warehouse and data lake
- Discussing how federated querying allows data to be accessed across multiple sources
- Highlighting benefits of implementing big data solutions in the cloud
- Comparing shared-nothing, massively parallel processing (MPP) architectures to symmetric multi-processing (SMP) architectures
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
This document is a training presentation on Databricks fundamentals and the data lakehouse concept by Dalibor Wijas from November 2022. It introduces Wijas and his experience. It then discusses what Databricks is, why it is needed, what a data lakehouse is, how Databricks enables the data lakehouse concept using Apache Spark and Delta Lake. It also covers how Databricks supports data engineering, data warehousing, and offers tools for data ingestion, transformation, pipelines and more.
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. In this session, we demonstrate how you can point Amazon QuickSight to AWS data stores, flat files, or other third-party data sources and begin visualizing your data in minutes. We also introduce SPICE - a new Super-fast, Parallel, In-memory Calculation Engine in Amazon QuickSight, which performs advanced calculations and renders visualizations rapidly without requiring any additional infrastructure, SQL programming, or dimensional modeling, so you can seamlessly scale to hundreds of thousands of users and petabytes of data. Lastly, you will see how Amazon QuickSight provides smart visualizations and graphs optimized for your different data types, ensuring the most suitable visualization for your analysis, and how to share these visualization stories using the built-in collaboration tools.
Presented by: Matthew McClean, AWS Partner Solutions Architect, Amazon Web Services
Learning Objectives:
- Learn the common use-cases for using Athena, AWS' interactive query service on S3
- Learn best practices for creating tables and partitions and performance optimizations (a short sketch follows this list)
- Learn how Athena handles security, authorization, and authentication
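To make the table-and-partition objective concrete, here is a hedged sketch that submits partitioned-table DDL through boto3; the bucket, table, and column names are illustrative, not from the session.

```python
import boto3

athena = boto3.client("athena")

# Illustrative names; partitioning on a date column and storing Parquet
# are the usual Athena performance recommendations.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
    request_id string,
    status int
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://my-example-bucket/access-logs/'
"""

athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://my-example-bucket/athena-results/"},
)
# New partitions still need to be registered, e.g. by running
# "MSCK REPAIR TABLE access_logs" or adding them explicitly.
```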
Data lineage and observability with Marquez - subsurface 2020 - Julien Le Dem
This document discusses Marquez, an open source metadata management system. It provides an overview of Marquez and how it can be used to track metadata in data pipelines. Specifically:
- Marquez collects and stores metadata about data sources, datasets, jobs, and runs to provide data lineage and observability.
- It has a modular framework to support data governance, data lineage, and data discovery. Metadata can be collected via REST APIs or language SDKs (see the sketch after this list).
- Marquez integrates with Apache Airflow to collect task-level metadata, dependencies between DAGs, and link tasks to code versions. This enables understanding of operational dependencies and troubleshooting.
- The Marquez community aims to build an open
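For a feel of the REST surface mentioned above, a minimal read against a local Marquez instance might look like the following; the port and path reflect Marquez defaults but should be treated as assumptions and checked against the docs for your version.

```python
import requests

MARQUEZ_URL = "http://localhost:5000"  # default Marquez HTTP port; adjust as needed

# Listing namespaces is one of the simplest reads in the Marquez API.
response = requests.get(f"{MARQUEZ_URL}/api/v1/namespaces")
response.raise_for_status()
for namespace in response.json().get("namespaces", []):
    print(namespace["name"])
```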
Building a Logical Data Fabric using Data Virtualization (ASEAN) - Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent Building the Unified Data Warehouse and Data Lake report by leading industry analysts TDWI, 64% of organizations stated that the objective of a unified Data Warehouse and Data Lake is to get more business value, and 84% of organizations polled felt that a unified approach to Data Warehouses and Data Lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, and how the associated technologies of machine learning, artificial intelligence, and data virtualization can reduce time to value, increasing the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
The document provides an overview of the Databricks platform, which offers a unified environment for data engineering, analytics, and AI. It describes how Databricks addresses the complexity of managing data across siloed systems by providing a single "data lakehouse" platform where all data and analytics workloads can be run. Key features highlighted include Delta Lake for ACID transactions on data lakes, auto loader for streaming data ingestion, notebooks for interactive coding, and governance tools to securely share and catalog data and models.
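To ground the Auto Loader mention, here is a minimal sketch of the streaming-ingestion pattern as typically written in a Databricks notebook (where `spark` is predefined); the paths and table name are illustrative.

```python
# Auto Loader: the "cloudFiles" source incrementally discovers new files.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")
    .load("s3://my-example-bucket/raw/events/")
)

# Write the stream into a Delta table, checkpointing progress so the
# job can resume where it left off.
(
    stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)
```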
Delta Lake is an open source storage layer that sits on top of data lakes and brings ACID transactions and reliability to Apache Spark. It addresses challenges with data lakes like lack of schema enforcement and transactions. Delta Lake provides features like ACID transactions, scalable metadata handling, schema enforcement and evolution, time travel/data versioning, and unified batch and streaming processing. Delta Lake stores data in Apache Parquet format and uses a transaction log to track changes and ensure consistency even for large datasets. It allows for updates, deletes, and merges while enforcing schemas during writes.
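A short sketch of two of those features, schema enforcement and time travel, using the open source delta-spark package; the paths are illustrative and this is not code from the document.

```python
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

spark.range(5).write.format("delta").save("/tmp/delta/numbers")

# Schema enforcement: appending a frame with a different schema fails.
# spark.createDataFrame([("a",)], ["name"]).write.format("delta") \
#     .mode("append").save("/tmp/delta/numbers")  # raises AnalysisException

# Time travel: read the table as of an earlier version in the log.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/numbers")
v0.show()
```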
An overview of the Amazon ElastiCache managed service, with examples of how it can be used to increase performance, lower costs and augment other database services and databases to make things faster, easier and less expensive.
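The classic way ElastiCache lowers cost and latency is the cache-aside pattern; below is a minimal sketch with the redis-py client, where the endpoint and the database call are placeholders.

```python
import json
import redis

# Placeholder endpoint; point this at your ElastiCache for Redis node.
cache = redis.Redis(host="my-cache.example.cache.amazonaws.com", port=6379)

def get_product(product_id, ttl_seconds=300):
    """Cache-aside: serve from Redis when possible, fall back to the DB."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    product = load_product_from_db(product_id)  # hypothetical database call
    cache.setex(key, ttl_seconds, json.dumps(product))  # expire stale data
    return product
```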
Data Warehouse or Data Lake, Which Do I Choose? - DATAVERSITY
Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al.) or the data lake (AWS S3 et al.). There are pros and cons to each approach. While data warehouses give you strong data management with analytics, they don’t do well with semi-structured and unstructured data, they tightly couple storage and compute, and they bring expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Modernizing to a Cloud Data Architecture - Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
Should I move my database to the cloud? - James Serra
So you have been running on-prem SQL Server for a while now. Maybe you have taken the step to move it from bare metal to a VM, and have seen some nice benefits. Ready to see a TON more benefits? If you said “YES!”, then this is the session for you as I will go over the many benefits gained by moving your on-prem SQL Server to an Azure VM (IaaS). Then I will really blow your mind by showing you even more benefits by moving to Azure SQL Database (PaaS/DBaaS). And for those of you with a large data warehouse, I’ve also got you covered with Azure SQL Data Warehouse. Along the way I will talk about the many hybrid approaches so you can take a gradual approach to moving to the cloud. If you are interested in cost savings, additional features, ease of use, quick scaling, improved reliability and ending the days of upgrading hardware, this is the session for you!
Making the move to a document database can be intimidating. Yes, its flexible data model gives you a lot of choices, but it also raises questions: Which way is the right way? Is a document database even the right tool? Join this live session on the basics of data modeling with JSON to learn:
- How a document database compares to a traditional RDBMS
- What JSON data modeling means for your application code
- Which tools might be helpful along the way
The examples in this session use the free, open-source Couchbase Server document database, but the principles you’ll learn can also be applied to Cosmos DB, MongoDB, RavenDB, and others.
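As a taste of what JSON modeling means in practice, here is an illustrative order document: data that would span several joined tables in an RDBMS is embedded and retrieved in one key lookup. The field names are invented for the example.

```python
# In a relational design this order would span orders, order_items,
# and addresses tables; as a document, the related data is embedded.
order_document = {
    "type": "order",
    "id": "order::1001",
    "customer": {"name": "Ada Lovelace", "email": "ada@example.com"},
    "ship_to": {"street": "1 Analytical Way", "city": "London"},
    "items": [
        {"sku": "SKU-1", "qty": 2, "price": 9.99},
        {"sku": "SKU-7", "qty": 1, "price": 24.50},
    ],
}

# With the Couchbase Python SDK the write is roughly:
#   collection.upsert(order_document["id"], order_document)
```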
DI&A Slides: Data Lake vs. Data Warehouse - DATAVERSITY
Modern data analysis is moving beyond the Data Warehouse to the Data Lake where analysts are able to take advantage of emerging technologies to manage complex analytics on large data volumes and diverse data types. Yet, for some business problems, a Data Warehouse may still be the right solution.
If you’re on the fence, join this webinar as we compare and contrast Data Lakes and Data Warehouses, identifying situations where one approach may be better than the other and highlighting how the two can work together.
Get tips, takeaways and best practices about:
- The benefits and problems of a Data Warehouse
- How a Data Lake can solve the problems of a Data Warehouse
- Data Lake Architecture
- How Data Warehouses and Data Lakes can work together
Learn about the new AWS Database Migration Service, which helps you migrate databases with minimal downtime from on-premises and Amazon EC2 environments to Amazon RDS, Amazon Redshift, Amazon Aurora and EC2 databases. We discuss homogeneous (e.g. Oracle-to-Oracle, PostgreSQL-to-PostgreSQL, etc.) and heterogeneous (e.g. Oracle to Aurora, SQL Server to MariaDB) database migrations. We also talk about the new AWS Schema Conversion Tool that saves you development time when migrating your Oracle and SQL Server database schemas, including PL/SQL and T-SQL procedural code, to their MySQL, MariaDB and Aurora equivalents.
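A hedged boto3 sketch of the kind of task DMS runs for a minimal-downtime migration; the ARNs are placeholders, and the endpoints and replication instance are assumed to exist already.

```python
import json
import boto3

dms = boto3.client("dms")

# "full-load-and-cdc" copies the existing data, then keeps applying
# ongoing changes, which is what keeps downtime minimal.
dms.create_replication_task(
    ReplicationTaskIdentifier="orders-migration",
    SourceEndpointArn="arn:aws:dms:REGION:ACCOUNT:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:REGION:ACCOUNT:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:REGION:ACCOUNT:rep:INSTANCE",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public-schema",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```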
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the Data Warehouse or to facilitate competitive Data Science and building algorithms in the organization, the Data Lake — a place for unmodeled and vast data — will be provisioned widely in 2019.
Though it doesn’t have to be complicated, the Data Lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the Data Swamp, but not the Data Lake! The tool ecosystem is building up around the Data Lake and soon many will have a robust Lake and Data Warehouse. We will discuss policy to keep them straight, send “horses to courses,” and keep up users’ confidence in the Data Platforms.
As for platform, although Hadoop received the early majority of Data Lakes, organizations are now weighing in that the Data Lake will be built in Cloud object storage. We’ll discuss these options as well.
Get this data point for your Data Lake journey.
This document provides an overview of AWS Lake Formation and related services for building a secure data lake. It discusses how Lake Formation provides a centralized management layer for data ingestion, cleaning, security and access. It also describes how Lake Formation integrates with services like AWS Glue, Amazon S3 and ML transforms to simplify and automate many data lake tasks. Finally, it provides an example workflow for using Lake Formation to deduplicate data from various sources and grant secure access for analysis.
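To make the centralized-access idea concrete, here is a minimal boto3 sketch of a Lake Formation grant; the account ID, role, database, and table names are placeholders.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Grant a hypothetical analyst role SELECT on one Glue Data Catalog
# table, so access is managed centrally instead of via S3 policies.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst"
    },
    Resource={"Table": {"DatabaseName": "sales", "Name": "orders"}},
    Permissions=["SELECT"],
)
```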
Databricks on AWS provides a unified analytics platform using Apache Spark. It allows companies to unify their data science, engineering, and business teams on one platform. Databricks accelerates innovation across the big data and machine learning lifecycle. It uniquely combines data and AI technologies on Apache Spark. Enterprises face challenges beyond just Apache Spark, including having data scientists and engineers in separate silos with complex data pipelines and infrastructure. Azure Databricks provides a fast, easy, and collaborative Apache Spark-based analytics platform on Azure that is optimized for the cloud. It offers the benefits of Databricks and Microsoft with one-click setup, a collaborative workspace, and native integration with Azure services. Over 500 customers participated in the
Building Modern Data Platform with Microsoft Azure - Dmitry Anoshin
This document provides an overview of building a modern cloud analytics solution using Microsoft Azure. It discusses the role of analytics, a history of cloud computing, and a data warehouse modernization project. Key challenges covered include lack of notifications, logging, self-service BI, and integrating streaming data. The document proposes solutions to these challenges using Azure services like Data Factory, Kafka, Databricks, and SQL Data Warehouse. It also discusses alternative implementations using tools like Matillion ETL and Snowflake.
Learn to Use Databricks for Data Science - Databricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Definitive Guide to Select Right Data Warehouse (2020) - Sprinkle Data Inc
Choosing the right data warehouse is a big challenge for organisations. In this doc, we have made an end-to-end comparison of leading data warehouses: Snowflake vs Redshift vs BigQuery vs Hive vs Athena.
Sprinkledata.com
This document provides an overview and summary of the author's background and expertise. It states that the author has over 30 years of experience in IT working on many BI and data warehouse projects. It also lists that the author has experience as a developer, DBA, architect, and consultant. It provides certifications held and publications authored as well as noting previous recognition as an SQL Server MVP.
All Databases Are Equal, But Some Databases Are More Equal than Others: How t... - javier ramirez
Data comes in different shapes and sizes, at different speeds, and with different processing needs. AWS offers managed databases covering SQL, NoSQL, in-memory, graphs, time series, and ledger databases. Learn which are the main use cases for each of those, which ones offer autoscaling or serverless capabilities, how you can start from scratch or via database migration services, and why Amazon Aurora is the fastest-growing service in the history of AWS.
This document discusses how companies are increasingly data-centric and how data has become a strategic asset. It introduces several AWS database and data storage services like Amazon Aurora, DynamoDB, DocumentDB, ElastiCache, Neptune, Timestream, and QLDB. These services provide different data models and use cases like relational, key-value, document, in-memory, graph, time-series, and ledger data. The document highlights features of each service like performance, scalability, availability, security, and ease of use. It also discusses how the AWS Database Migration Service can help migrate databases to AWS.
AWS Quick Start is a free event designed to educate you about the AWS platform with architectural best practices, expert tips, and demonstrations. Join us as we cover a broad range of getting started topics to help you fast-track your success and build on AWS.
Databases on AWS - The right tool for the right job - ADB203 - Santa Clara AWS Summit - Amazon Web Services
In this session, we cover a purpose-built strategy for databases, where you choose the right tool for the job. We explain why your application should drive the requirements of a database, not the other way around. We introduce AWS databases that are purpose-built for your application scenarios. We explain why different database services solve different aspects of an application. We demo application scenarios that lend themselves well to specific data services. If you’re building modern applications requiring high performance, scale, and functional databases, and trying to determine which relational and nonrelational data services to use, this session is for you.
Deriving Value with Next Gen Analytics and ML Architectures - Amazon Web Services
This presentation was delivered on March 19, 2019, at Gartner's Data and Analytics Summit in Orlando, FL. Rahul Pathak, GM at AWS, discusses Deriving Value with Next Gen Analytics and ML Architectures on AWS.
AWS Purpose-Built Database Strategy: The Right Tool for The Right Job - Amazon Web Services
Learn why AWS is building a comprehensive database and analytics platform with purpose-built databases designed to solve specific customer problems.
We dive deeper into the operational database services that AWS offers, such as Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, and the new Amazon Neptune graph database. Finally, through a demonstration of Amazon RDS, you get to see how easy it is to use a managed database service.
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AWS Summit - Amazon Web Services
In this session, we give an overview of how to build low-latency and high-throughput applications with Amazon ElastiCache. We also provide an introduction to Redis and Memcached, two of the world’s leading in-memory databases, and we cover the specific use cases customers are solving with them. From caching and session storage to leaderboards, a wide range of applications can see dramatic performance improvements with an added in-memory database.
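Of the use cases listed, the leaderboard maps most directly onto a Redis data structure; here is a small redis-py sketch with an illustrative endpoint and invented player names.

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # illustrative endpoint

# A sorted set keeps members ordered by score: ZADD inserts,
# ZINCRBY updates a score in place.
r.zadd("leaderboard", {"alice": 120, "bob": 95})
r.zincrby("leaderboard", 15, "bob")

# Top three players, highest score first.
print(r.zrevrange("leaderboard", 0, 2, withscores=True))
```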
Building data lakes for analytics on AWS - ADB201 - Santa Clara AWS Summit - Amazon Web Services
AWS provides the most comprehensive, secure, scalable, and cost-effective portfolio of services for building data lakes for analytics. In this session, learn how to discover, load, store, catalog, prepare, and secure your data in a data lake. Then, learn to analyze with the largest choice of analytics approaches, including big data, data warehouse, operational, and real-time streaming analytics, and even ML and AI. Ensure that your needs are met for existing and future analytics use cases. Hear how leading companies found success with their data lake initiatives.
Building with Purpose-Built Databases: Match Your Workload to the Right Database - AWS Summits
Learn how to evaluate a new workload for the best managed database option based on specific application needs related to data shape, data size at limit, computational requirements, programmability, throughput and latency needs, and more. This session explains the ideal use cases for relational and non-relational database services, including Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache for Redis, Amazon Neptune, and Amazon Redshift.
Laura Caicedo, Solutions Architect, Amazon Web Services
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Best Practices - AWS Summits
Speaker: Renee Lo, Head of Big Data, Analytics, and AI, ASEAN, AWS
Customer Speaker: Natalia Kozyura, Head of Innovation Center, FWD Group
We discuss architectural principles that simplify big data analytics. We'll apply these principles to various stages of big data processing: collect, store, process, analyse, and visualise. We'll discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Building with Purpose-Built Databases: Match Your Workload to the Right Database - Amazon Web Services
Learn how to evaluate a new workload for the best managed database option based on specific application needs related to data shape, data size at limit, computational requirements, programmability, throughput and latency needs, and more. This session explains the ideal use cases for relational and non-relational database services, including Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache for Redis, Amazon Neptune, and Amazon Redshift.
Using AWS Purpose-Built Databases to Modernize your Applications - Amazon Web Services
As you look to modernizing your applications, you will need to consider your database options to meet the new application requirements. AWS offers a series of purpose-built databases that include relational, key value, document, graph and cache use cases to help you deliver new and enhanced functionalities. In this webinar session, we share the different modern application architectures, and how to combine different database services to meet your requirements. Understand how to modernize your relational databases through easy upgrades with Amazon Relational Database Service and learn how to migrate from one database to another with AWS Database Migration Service and AWS Schema Conversion Tool.
Speaker:
Blair Layton, Business Development Manager, Amazon Web Services
As the cloud lowered the cost of storing and processing data and a new generation of applications emerged, new requirements for databases arose. These applications need databases that can store tera- or petabytes of data and new data types, respond in milliseconds, and handle millions of requests per second from millions of users anywhere in the world. To support such requirements, you need both relational and non-relational databases designed to meet the specific needs of your applications.
To learn more about which database systems you can use on AWS for your applications, join our next AWS Czech-Slovak webinar. We will demonstrate various database solutions on AWS, describe use cases and best practices, and show several demos.
Premiere: 09/07/2019
Building Data Lakes for Analytics on AWS - ADB201 - Anaheim AWS Summit - Amazon Web Services
AWS provides the most comprehensive, secure, scalable, and cost-effective portfolio of services for building data lakes for analytics. In this session, learn how to discover, load, store, catalog, prepare, and secure your data in a data lake. Then, learn to analyze with the largest choice of analytics approaches, including big data, data warehouse, operational, real-time streaming analytics, and even ML and AI. Ensure that your needs are met for existing and future analytics use cases, and discover how leading companies found success with their data lake initiatives.
The document discusses different database models and their common use cases. It describes relational, key-value, document, in-memory, graph, and time-series database models. For each model it provides examples of common data structures, features, and example use cases such as ERP/CRM systems, shopping carts, content management, fraud detection, and IoT applications. The goal is to help readers understand which database model to select based on their specific data and requirements.
Modern Data Platforms - Thinking Data Flywheel on the Cloud - Alluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Modern Data Platforms - Thinking Data Flywheel on the Cloud
Speaker:
Roy Ben-Alta, AWS
For more Alluxio events: https://www.alluxio.io/events/
The Future of Database Migration is Cloud, AWS Federal Pop-Up Loft - Amazon Web Services
According to Gartner, “the future of the DBMS is the cloud.” But what does it mean to migrate an on-premises database to the cloud? What database options are available? Do I lift-and-shift, re-platform, or re-factor to a modern database engine? How do I size my database workload in AWS? Join AWS experts for an in-depth discussion based on real-world use cases to explore these topics and more as you plan your journey to the cloud.
Building with AWS Databases: Match Your Workload to the Right Database | AWS ... - AWS Summits
In this session we will discuss the ideal use cases for relational and nonrelational data services, including Amazon ElastiCache for Redis, Amazon DynamoDB, Amazon Aurora, Amazon Neptune, Amazon Elasticsearch Service, Amazon Timestream, Amazon QLDB, and Amazon DocumentDB. This session will focus on how to evaluate a new workload for the best managed database option.
Building with AWS Databases: Match Your Workload to the Right Database | AWS ... - Amazon Web Services
The document discusses choosing the appropriate database for different types of workloads. It outlines several AWS databases and their best use cases based on data shape, size, and compute needs. Examples include using DynamoDB for key-value workloads, DocumentDB for flexible schemas, and Neptune for graph queries. The document emphasizes starting with application requirements and choosing a database tailored to the specific workload.
Similar to How to choose the right database for your workload
How to build Forecasting services using ML and deep learning algorithms - Amazon Web Services
Forecasting is an important process for many companies and is used in various areas to accurately predict the growth and distribution of a product, the resource utilization needed on production lines, financial reporting, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a time component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to build Big Data applications in Serverless mode - Amazon Web Services
The variety and volume of data created every day is accelerating ever faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment affordable only to established companies. But the elasticity of the cloud and, in particular, serverless services allow us to break through these limits.
Let's see how it is possible to develop Big Data applications quickly, without worrying about infrastructure, dedicating all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. During this period we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, enabled us to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only the application architecture, but also the organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances - Amazon Web Services
Container usage keeps growing.
When designed correctly, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, leading to average savings of 70% compared to On-Demand Instances. In this session we will explore the characteristics of Spot Instances and how they can be easily used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Learning services - Amazon Web Services
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, also through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployments of... - Amazon Web Services
With the traditional approach to IT, it was difficult for many years to implement DevOps techniques, which until now have often involved manual activities, occasionally leading to application downtime and interrupting users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach at low cost for any kind of workload, guaranteeing greater system reliability and resulting in significant improvements in business continuity.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Discover how to leverage AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads - Amazon Web Services
Do you want to learn about the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we will discuss options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis powered by artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are organizing a free virtual event next Wednesday, October 14, from 12:00 to 13:00 dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, fully exploiting the potential of the AWS cloud while protecting existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJS - Amazon Web Services
Many companies today build applications with ledger-type functionality, for example to verify the history of credits or debits in banking transactions, or to track the supply chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build custom, complex systems by providing a fully managed serverless ledger database.
In this session we will discover how to build a complete serverless application that uses QLDB's capabilities.
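The session uses NodeJS, but for consistency with the other sketches here is the same idea with the pyqldb Python driver; the ledger name is illustrative and the Transactions table is assumed to exist.

```python
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="orders-ledger")  # illustrative ledger name

def record_debit(txn):
    # Each statement runs in one transaction that QLDB appends to its
    # immutable, cryptographically verifiable journal.
    txn.execute_statement(
        "INSERT INTO Transactions ?",
        {"account": "acct-1", "amount": -25, "type": "debit"},
    )

driver.execute_lambda(record_debit)
```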
With the rise of microservices architectures and rich mobile and web applications, APIs are more important than ever for offering end users an exceptional experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dive into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data update capabilities.
In addition, we will learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle databases and VMware Cloud™ on AWS: myths to debunk - Amazon Web Services
Many organizations take advantage of the cloud by migrating their Oracle workloads, securing significant benefits in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, compounded by performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to facilitate and simplify the migration of Oracle workloads while accelerating the transformation to the cloud; they dive into the architecture and demonstrate how to fully leverage the potential of VMware Cloud™ on AWS.
- The document discusses building a minimum viable product (MVP) using Amazon Web Services (AWS).
- It provides an example of an MVP for an omni-channel messenger platform, built from 2017 onward, that connects ecommerce stores to customers via web chat, Facebook Messenger, WhatsApp, and other channels.
- The founder discusses how they started with an MVP in 2017 with 200 ecommerce stores in Hong Kong and Taiwan, and have since expanded to over 5,000 clients across Southeast Asia using AWS for scaling.
This document discusses pitch decks and fundraising materials. It explains that venture capitalists will typically spend only 3 minutes and 44 seconds reviewing a pitch deck. Therefore, the deck needs to tell a compelling story to grab their attention. It also provides tips on tailoring different types of decks for different purposes, such as creating a concise 1-2 page teaser, a presentation deck for pitching in-person, and a more detailed read-only or fundraising deck. The document stresses the importance of including key information like the problem, solution, product, traction, market size, plans, team, and ask.
This document discusses building serverless web applications using AWS services like API Gateway, Lambda, DynamoDB, S3 and Amplify. It provides an overview of each service and how they can work together to create a scalable, secure and cost-effective serverless application stack without having to manage servers or infrastructure. Key services covered include API Gateway for hosting APIs, Lambda for backend logic, DynamoDB for database needs, S3 for static content, and Amplify for frontend hosting and continuous deployment.
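To illustrate how the pieces fit, here is a hedged sketch of a Lambda handler that API Gateway could invoke, backed by a hypothetical DynamoDB table; S3 and Amplify would serve the frontend separately.

```python
import json
import boto3

# Hypothetical table storing notes keyed by "id".
table = boto3.resource("dynamodb").Table("notes")

def handler(event, context):
    """API Gateway (proxy integration) -> Lambda -> DynamoDB read path."""
    note_id = event["pathParameters"]["id"]
    item = table.get_item(Key={"id": note_id}).get("Item")
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(item, default=str)}
```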
This document provides tips for fundraising from startup founders Roland Yau and Sze Lok Chan. It discusses generating competition to create urgency for investors, fundraising in parallel rather than sequentially, having a clear fundraising narrative focused on what you do and why it's compelling, and prioritizing relationships with people over firms. It also notes how the pandemic has changed fundraising, with examples of deals done virtually during this time. The tips emphasize being fully prepared before fundraising and cultivating connections with investors in advance.
AWS_HK_StartupDay_Building Interactive websites while automating for efficien... - Amazon Web Services
This document discusses Amazon's machine learning services for building conversational interfaces and extracting insights from unstructured text and audio. It describes Amazon Lex for creating chatbots, Amazon Comprehend for natural language processing tasks like entity extraction and sentiment analysis, and how they can be used together for applications like intelligent call centers and content analysis. Pre-trained APIs simplify adding machine learning to apps without requiring ML expertise.
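Since these are pre-trained APIs, calling them takes a few lines of boto3; the sample text is an invented example.

```python
import boto3

comprehend = boto3.client("comprehend")
text = "The delivery was late, but the support team resolved it quickly."

# Pre-trained models: no training step is needed before calling.
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
entities = comprehend.detect_entities(Text=text, LanguageCode="en")

print(sentiment["Sentiment"])                     # e.g. "MIXED"
print([e["Text"] for e in entities["Entities"]])  # detected entities
```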
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer controlling deployment and the related lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.