Since Amazon Redshift launched last year, it has been adopted by a wide variety of companies for data warehousing. In this session, learn how customers NASDAQ, HauteLook, and Roundarch Isobar are taking advantage of Amazon Redshift for three unique use cases: enterprise, big data, and SaaS. Learn about their implementations and how they made data analysis faster, cheaper, and easier with Amazon Redshift.
Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges that IT teams struggle with. This session will show how to build business intelligence applications on AWS, from raw data ingestion, storage, and consumption through to producing actionable information. We will also cover best practices for services such as Amazon Redshift and Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft, and others.
(ISM303) Migrating Your Enterprise Data Warehouse to Amazon Redshift | Amazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third party solutions, and how they met their business intelligence needs in record time.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about how their use case takes advantage of fast performance on enormous datasets by leveraging economies of scale on the AWS platform.
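The columnar-storage advantage described above can be illustrated with a small simulation. This is a toy model in plain Python, not Redshift's actual storage engine, and the table dimensions are made up: an analytic query that references one column out of many reads far fewer bytes from a column-oriented layout than from a row-oriented one.

```python
# Toy illustration of why columnar layouts cut I/O for analytic queries.
# Dimensions are hypothetical; this is not Redshift's real storage engine.

ROWS, COLS, BYTES_PER_VALUE = 1_000_000, 20, 8

def bytes_scanned_row_store(cols_needed):
    # A row store must read every column of every row it scans.
    return ROWS * COLS * BYTES_PER_VALUE

def bytes_scanned_column_store(cols_needed):
    # A column store reads only the columns the query references.
    return ROWS * cols_needed * BYTES_PER_VALUE

# A query like SELECT SUM(price) FROM sales touches 1 of 20 columns:
row_io = bytes_scanned_row_store(1)
col_io = bytes_scanned_column_store(1)
print(f"row store: {row_io:,} bytes; column store: {col_io:,} bytes "
      f"({row_io // col_io}x less I/O)")
```

With 20 columns and one referenced, the column store scans 20x fewer bytes, which is the intuition behind the query-performance claims above.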
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent... | Amazon Web Services
Learn about architecture best practices for combining AWS storage and database technologies. We outline AWS storage options (Amazon EBS, Amazon EC2 Instance Storage, Amazon S3, and Amazon Glacier) along with AWS database options including Amazon ElastiCache (in-memory data store), Amazon RDS (SQL database), Amazon DynamoDB (NoSQL database), Amazon CloudSearch (search), Amazon EMR (Hadoop), and Amazon Redshift (data warehouse). Then we discuss how to architect your database tier by using the right database and storage technologies to achieve the required functionality, performance, availability, and durability, at the right cost.
In this session, you get an overview of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service. We'll cover how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also discuss new features, architecture best practices, and share how customers are using Amazon Redshift for their Big Data workloads.
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift | Amazon Web Services
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that costs less than $1,000 a TB a year, under a tenth the price of most traditional data warehousing solutions. Learn how Yahoo! uses Amazon Redshift to build a billion-event-a-day infrastructure that is fast, easy, and cost-effective. Dive into how Yahoo! performs advanced user retention and cohort analysis to make near-real-time product and marketing decisions.
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r... | Amazon Web Services
Amazon DynamoDB is a fully-managed, zero-admin, high-speed NoSQL database service. Amazon DynamoDB was built to support applications at any scale. With the click of a button, you can scale your database capacity from a few hundred I/Os per second to hundreds of thousands of I/Os per second or more. You can dynamically scale your database to keep up with your application's requirements while minimizing costs during low-traffic periods. The service has no limit on storage. You also learn about Amazon DynamoDB's design principles and history.
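The scaling behavior described above, dialing provisioned capacity up and down to track demand instead of holding peak capacity around the clock, can be sketched with a toy cost model. The traffic numbers and the 20% headroom factor below are invented for illustration; this is not DynamoDB's pricing model.

```python
# Toy model of scaling DynamoDB provisioned capacity to track demand,
# as described above. Traffic figures and headroom are hypothetical.

hourly_demand = [200, 150, 100, 100, 5000, 80000, 80000, 5000]  # reads/sec

def capacity_hours_fixed(demand):
    # Provision for the peak during every hour of the window.
    return max(demand) * len(demand)

def capacity_hours_tracking(demand, headroom=1.2):
    # Re-provision each hour with 20% headroom over observed demand.
    return sum(int(d * headroom) for d in demand)

fixed = capacity_hours_fixed(hourly_demand)
tracked = capacity_hours_tracking(hourly_demand)
print(f"fixed: {fixed}, tracking: {tracked}, saved: {1 - tracked / fixed:.0%}")
```

Even in this crude sketch, tracking demand cuts provisioned capacity-hours by more than two thirds for a bursty workload, which is the cost argument the abstract makes.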
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi... | Amazon Web Services
In this chalk talk, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Amazon Redshift Spectrum, a feature of Amazon Redshift that enables you to analyze data across Amazon Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos.
Powering Interactive Data Analysis at Pinterest by Amazon Redshift | Jie Li
In the last six months, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitment or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
In this Masterclass presentation we will:
• Explore the architecture and fundamental characteristics of Amazon Redshift
• Show you how to launch Redshift clusters and to load data into them
• Explain how to use the AWS Console to monitor and manage Redshift clusters
• Help you to discover best practices and other resources to help you get the most from Redshift
Watch the recording here: http://youtu.be/-FmCWcxRvXY
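The data-loading step in the Masterclass outline above is typically done with Redshift's COPY command pulling files from Amazon S3. As a sketch, here is a small Python helper that composes such a statement; the table, bucket, and IAM role names are hypothetical, and actually executing the statement requires a connection to a live cluster.

```python
# Sketch of composing a Redshift COPY statement for bulk loading from S3.
# Table, bucket, and IAM role names below are hypothetical placeholders.

def build_copy_statement(table, s3_path, iam_role, fmt="CSV", gzip=True):
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role}'",
        fmt,
    ]
    if gzip:
        parts.append("GZIP")  # compressed input files reduce transfer time
    return "\n".join(parts) + ";"

stmt = build_copy_statement(
    table="sales",
    s3_path="s3://example-bucket/sales/2016/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(stmt)
```

Loading via COPY from S3, rather than row-by-row INSERTs, lets all compute nodes ingest files in parallel, which is why it is the load path these sessions emphasize.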
Best Practices for Migrating Your Data Warehouse to Amazon Redshift | Amazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series | Amazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
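One reason the sort-key and schema-optimization topics above matter is block pruning: Redshift keeps per-block min/max metadata (zone maps) and skips blocks that cannot contain a query's predicate. The toy model below, with made-up data and a tiny block size, shows why sorted data prunes far better than unsorted data; it is a simplification, not Redshift's actual implementation.

```python
# Toy sketch of block pruning with per-block min/max metadata ("zone maps"),
# one reason sort keys matter for scan performance. Simplified model.

def make_blocks(values, block_size=4):
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]  # (zone-map min, max, rows)

def scan_equal(blocks, target):
    # Only read blocks whose [min, max] range could contain the target.
    read, hits = 0, []
    for lo, hi, rows in blocks:
        if lo <= target <= hi:
            read += 1
            hits.extend(v for v in rows if v == target)
    return read, hits

sorted_blocks = make_blocks(sorted(range(16)))  # data sorted on the key
unsorted_blocks = make_blocks([7, 0, 12, 3, 9, 1, 14, 5,
                               11, 2, 15, 4, 8, 6, 13, 10])

print("blocks read (sorted):", scan_equal(sorted_blocks, 9)[0])
print("blocks read (unsorted):", scan_equal(unsorted_blocks, 9)[0])
```

On sorted data the equality predicate touches a single block; on unsorted data every block's min/max range overlaps the target and all must be read.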
Redshift is a petabyte-scale data warehouse that is a lot faster, a lot less expensive, and a whole lot simpler to use. How can you get your data into Amazon Redshift? In this webinar, hear from representatives of Attunity (an Amazon Redshift Partner) and AWS as they present many of the options available for data integration. Whether your data is in an on-premises platform or a cloud-based database like DynamoDB, we will show you how you can easily load your data into Redshift.
Reasons to attend:
• Learn about best practices to efficiently integrate data into Redshift
• Attend a Q&A session with Redshift experts
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift | Amazon Web Services
No matter the industry, leading organizations need to closely integrate, deploy, secure, and scale diverse technologies to support workloads while containing costs. Nasdaq, Inc.—a leading provider of trading, clearing, and exchange technology—is no exception.
After migrating more than 1,100 tables from a legacy data warehouse into Amazon Redshift, Nasdaq, Inc. is now implementing a fully-integrated, big data architecture that also includes Amazon S3, Amazon EMR, and Presto to securely analyze large historical data sets in a highly regulated environment. Drawing from this experience, Nasdaq, Inc. shares lessons learned and best practices for deploying a highly secure, unified, big data architecture on AWS.
Attendees learn:
Architectural recommendations to extend an Amazon Redshift data warehouse with Amazon EMR and Presto.
Tips to migrate historical data from an on-premises solution and Amazon Redshift to Amazon S3, making it consumable.
Best practices for securing critical data and applications leveraging encryption, SELinux, and VPC.
Amgen discovers, develops, manufactures, and delivers innovative human therapeutics, helping millions of people in the fight against serious illnesses. In 2014, Amgen implemented a solution to offload ETL data across a diverse data set (U.S. pharmaceutical prescriptions and claims) using Amazon EMR. The solution has transformed the way Amgen delivers insights and reports to its sales force. To support Amgen’s entry into a much larger market, the ETL process had to scale to eight times its existing data volume. We used Amazon EC2, Amazon S3, Amazon EMR, and Amazon Redshift to generate weekly sales reporting metrics.
This session discusses highlights in Amgen's journey to leverage big data technologies and lay the foundation for future growth: benefits of ETL offloading in Amazon EMR as an entry point for big data technologies; benefits and challenges of using Amazon EMR vs. expanding on-premises ETL and reporting technologies; and how to architect an ETL offload solution using Amazon S3, Amazon EMR, and Impala.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct | Amazon Web Services
As data volumes grow, managing and scaling data pipelines for ETL and batch processing can be daunting. With more than 13.5 million learners worldwide, hundreds of courses, and thousands of instructors, Coursera manages over a hundred data pipelines for ETL, batch processing, and new product development.
In this session, we dive deep into AWS Data Pipeline and Dataduct, an open source framework built at Coursera to manage pipelines and create reusable patterns to expedite developer productivity. We share the lessons learned during our journey: from basic ETL processes, such as loading data from Amazon RDS to Amazon Redshift, to more sophisticated pipelines to power recommendation engines and search services.
Attendees learn:
Do's and don’ts of Data Pipeline
Using Dataduct to streamline your data pipelines
How to use Data Pipeline to power other data products, such as recommendation systems
What’s next for Dataduct
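At its core, the pipeline management discussed above comes down to running ETL steps in dependency order and passing results downstream. A minimal, self-contained sketch of that pattern follows; the step names and the tiny executor are illustrative and are not Dataduct's or AWS Data Pipeline's actual API.

```python
# Minimal sketch of dependency-ordered ETL steps, the core idea behind
# pipeline frameworks like those discussed above. Not Dataduct's real API.

from graphlib import TopologicalSorter  # Python 3.9+

def run_pipeline(steps, deps):
    """steps: name -> callable(results); deps: name -> set of upstream names."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        results[name] = steps[name](results)  # each step sees upstream output
    return order, results

steps = {
    "extract": lambda r: [3, 1, 2],
    "transform": lambda r: sorted(r["extract"]),
    "load": lambda r: f"loaded {len(r['transform'])} rows",
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

order, results = run_pipeline(steps, deps)
print(order, "->", results["load"])
```

Real frameworks add retries, scheduling, and backfills on top, but the reusable-pattern idea is the same: declare the DAG once and let the executor handle ordering.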
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- Amazon Aurora, the latest relational database engine: MySQL-compatible and highly available, providing up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Near Real-Time Data Analysis With FlyData | FlyData Inc.
This document describes our products. FlyData makes it easy to load data automatically and continuously into Amazon Redshift. You can also refer to our homepage ( http://flydata.com/ ) for more information.
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I... | Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
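The talk's "choose the right technology per stage based on criteria" idea can be sketched as a simple rule table. The mapping below is a rough caricature for illustration only, with invented thresholds; it is not an authoritative AWS selection guide.

```python
# Rough caricature of "right tool per stage" selection by criteria, as
# described above. Thresholds and mappings are illustrative, not official.

def suggest_store(data_structure, query_latency_ms):
    if data_structure == "key-value" and query_latency_ms < 10:
        return "Amazon DynamoDB"      # low-latency key-value access
    if data_structure == "relational" and query_latency_ms < 100:
        return "Amazon RDS"           # transactional SQL workloads
    if data_structure == "columnar-analytic":
        return "Amazon Redshift"      # large scans, aggregations
    return "Amazon S3"                # durable default landing zone

print(suggest_store("key-value", 5))
print(suggest_store("columnar-analytic", 2000))
```

The real decision weighs more axes (request rate, item size, durability, cost), but encoding even a few criteria this way makes the "data bus" stage choices concrete.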
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ... | Amazon Web Services
In this workshop, you migrate a sample sporting event and ticketing database from Oracle or Microsoft SQL Server to Amazon Aurora or PostgreSQL using the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). The workshop includes the migration of tables, indexes, procedures, functions, constraints, views, and more. We run SCT on an Amazon EC2 Windows instance, so bring a laptop with Remote Desktop (or some other method of connecting to the Windows instance). Ideally, you should be familiar with relational databases, especially Oracle or SQL Server and PostgreSQL or Aurora, to get the most from this session. Additionally, attendees should be familiar with SCT and DMS. Familiarity with SQL Developer and pgAdmin III will be helpful but is not required.
Prerequisites:
- Participants should have an AWS account established and available for use during the workshop.
- Please bring your own laptop.
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... | Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Implementation of linear regression and logistic regression on Spark | Dalei Li
This presentation was developed for a course project at Technical University of Madrid. The course is massively parallel machine learning supervised by Alberto Mozo and Bruno Ordozgoiti.
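The linear-regression half of the project above rests on a single update rule, batch gradient descent on squared error, which can be sketched in plain Python without Spark. The toy data below is generated from y = 2x + 1, so the fit should recover those coefficients; this is a minimal serial sketch, not the course's parallel Spark implementation.

```python
# The gradient-descent update behind linear regression, in plain Python
# rather than Spark: w <- w - lr * dL/dw for mean squared-error loss.

def fit_linear(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free data from y = 2x + 1; the fit should recover w≈2, b≈1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))
```

In a "massively parallel" setting like the course's, the per-point gradient terms in the two sums are what gets distributed across workers and then summed, since the gradient is a sum over data points.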
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
Amazon DynamoDB is a fully-managed, zero-admin, high-speed NoSQL database service. Amazon DynamoDB was built to support applications at any scale. With the click of a button, you can scale your database capacity from a few hundred I/Os per second to hundreds of thousands of I/Os per second or more. You can dynamically scale your database to keep up with your application's requirements while minimizing costs during low-traffic periods. The service has no limit on storage. You also learn about Amazon DynamoDB's design principles and history.
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Amazon Web Services
In this chalk talk, we take a deep dive on Amazon Redshift architecture and the latest performance enhancements that give you faster insights into your data. We also cover Amazon Redshift Spectrum, a feature of Amazon Redshift that enables you to analyze data across Amazon Redshift and your Amazon S3 data lake to deliver unique insights not possible by analyzing independent data silos.
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
In the last six month, we have set up Amazon Redshift to power our interactive data analysis at Pinterest. It has tremendously improved the speed of analyzing our data.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitment or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
In this Masterclass presentation we will:
• Explore the architecture and fundamental characteristics of Amazon Redshift
• Show you how to launch Redshift clusters and to load data into them
• Explain out how to use the AWS Console to monitor and manage Redshift clusters
• Help you to discover best practices and other resources to help you get the most from Redshift
Watch the recording here: http://youtu.be/-FmCWcxRvXY
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process.
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesAmazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
Redshift is a petabyte-scale data warehouse that is a lot faster, a lot less expensive and a whole lot simpler to use. How can you get your data into Amazon Redshift? In this webinar, hear from representatives of Attunity (Amazon Redshift Partner), and AWS as they present many of the options available for data integration. Whether your data is in an on premise platform or a cloud based database like DynamoDB, we will show you how you can easily load your data in to Re
dshift.
Reasons to attend: - Learn about best practices to efficiently integrate data into Redshift. - Attend Q&A session with Redshift experts
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon RedshiftAmazon Web Services
"No matter the industry, leading organizations need to closely integrate, deploy, secure, and scale diverse technologies to support workloads while containing costs. Nasdaq, Inc.—a leading provider of trading, clearing, and exchange technology—is no exception.
After migrating more than 1,100 tables from a legacy data warehouse into Amazon Redshift, Nasdaq, Inc. is now implementing a fully-integrated, big data architecture that also includes Amazon S3, Amazon EMR, and Presto to securely analyze large historical data sets in a highly regulated environment. Drawing from this experience, Nasdaq, Inc. shares lessons learned and best practices for deploying a highly secure, unified, big data architecture on AWS.
Attendees learn:
Architectural recommendations to extend an Amazon Redshift data warehouse with Amazon EMR and Presto.
Tips to migrate historical data from an on-premises solution and Amazon Redshift to Amazon S3, making it consumable.
Best practices for securing critical data and applications leveraging encryption, SELinux, and VPC."
"Amgen discovers, develops, manufactures, and delivers innovative human therapeutics, helping millions of people in the fight against serious illnesses. In 2014, Amgen implemented a solution to offload ETL data across a diverse data set (U.S. pharmaceutical prescriptions and claims) using Amazon EMR. The solution has transformed the way Amgen delivers insights and reports to its sales force. To support Amgen’s entry into a much larger market, the ETL process had to scale to eight times its existing data volume. We used Amazon EC2, Amazon S3, Amazon EMR, and Amazon Redshift to generate weekly sales reporting metrics.
This session discusses highlights in Amgen's journey to leverage big data technologies and lay the foundation for future growth: benefits of ETL offloading in Amazon EMR as an entry point for big data technologies; benefits and challenges of using Amazon EMR vs. expanding on-premises ETL and reporting technologies; and how to architect an ETL offload solution using Amazon S3, Amazon EMR, and Impala."
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & DataductAmazon Web Services
"As data volumes grow, managing and scaling data pipelines for ETL and batch processing can be daunting. With more than 13.5 million learners worldwide, hundreds of courses, and thousands of instructors, Coursera manages over a hundred data pipelines for ETL, batch processing, and new product development.
In this session, we dive deep into AWS Data Pipeline and Dataduct, an open source framework built at Coursera to manage pipelines and create reusable patterns to expedite developer productivity. We share the lessons learned during our journey: from basic ETL processes, such as loading data from Amazon RDS to Amazon Redshift, to more sophisticated pipelines to power recommendation engines and search services.
Attendees learn:
Do's and don’ts of Data Pipeline
Using Dataduct to streamline your data pipelines
How to use Data Pipeline to power other data products, such as recommendation systems
What’s next for Dataduct"
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
This webinar will provide an overview of Redshift with an emphasis on the many changes we recently introduced. In particular, we will address the newly released DW2 instance types and what you can do with them.
This content is designed for database developers and architects interested in Amazon Redshift.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora: MySQL-compatible, highly available, and up to five times faster than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak, Sr. Manager of Software Development
Near Real-Time Data Analysis with FlyData (FlyData Inc.)
This document describes our products. FlyData makes it easy to load data automatically and continuously into Amazon Redshift. You can also visit our homepage ( http://flydata.com/ ) for more information.
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:Invent (Amazon Web Services)
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
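The ingest/store/process/visualize data bus described above can be sketched, purely illustratively, as four chained Python functions; the event shape and the latency metric are invented for the example, and each stage stands in for a real AWS service (a stream, an object store, an analytics engine, a dashboard):

```python
from statistics import mean


def ingest(events):
    """Ingest stage: accept raw events (e.g., from a stream)."""
    return list(events)


def store(records, datastore):
    """Store stage: persist records (a list stands in for S3/DynamoDB)."""
    datastore.extend(records)
    return datastore


def process(datastore):
    """Process stage: aggregate average latency per endpoint."""
    by_endpoint = {}
    for rec in datastore:
        by_endpoint.setdefault(rec["endpoint"], []).append(rec["latency_ms"])
    return {ep: mean(vals) for ep, vals in by_endpoint.items()}


def visualize(summary):
    """Visualize stage: render a plain-text report."""
    return "\n".join(f"{ep}: {avg:.1f} ms" for ep, avg in sorted(summary.items()))


events = [
    {"endpoint": "/home", "latency_ms": 120},
    {"endpoint": "/home", "latency_ms": 80},
    {"endpoint": "/cart", "latency_ms": 200},
]
report = visualize(process(store(ingest(events), [])))
print(report)
```

The point of the data-bus framing is that each stage can be swapped independently based on the selection criteria the session lists (latency, cost, volume, durability, and so on).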
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ... (Amazon Web Services)
In this workshop, you migrate a sample sporting event and ticketing database from Oracle or Microsoft SQL Server to Amazon Aurora or PostgreSQL using the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS). The workshop includes the migration of tables, indexes, procedures, functions, constraints, views, and more. We run SCT on an Amazon EC2 Windows instance; bring a laptop with Remote Desktop (or some other method of connecting to the Windows instance). Ideally, you should be familiar with relational databases, especially Oracle or SQL Server and PostgreSQL or Aurora, to get the most from this session. Additionally, attendees should be familiar with SCT and DMS. Familiarity with SQL Developer and pgAdmin III is helpful but not required.
Prerequisites:
- Participants should have an AWS account established and available for use during the workshop.
- Please bring your own laptop.
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... (Amazon Web Services)
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Implementation of linear regression and logistic regression on Spark (Dalei Li)
This presentation was developed for a course project at the Technical University of Madrid, for the Massively Parallel Machine Learning course supervised by Alberto Mozo and Bruno Ordozgoiti.
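The presentation itself targets Spark, but the underlying algorithm can be illustrated with a minimal pure-Python batch-gradient-descent logistic regression on a toy one-dimensional dataset. This is a sketch of the technique under invented data, not the presenters' Spark code:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent for 1-D logistic regression (weight + bias)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error drives both gradients
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy separable data: label is 1 when x > 2
xs = [0.0, 1.0, 1.5, 2.5, 3.0, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
print([predict(x) for x in xs])
```

In a Spark implementation the inner loop over examples becomes a distributed aggregation of per-partition gradient sums, which is what makes the algorithm "massively parallel".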
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304) (Amazon Web Services)
In this session, you learn about the latest and hottest features of Amazon Redshift. Join Vidhya Srinivasan, General Manager of Amazon Redshift, to take a deep dive into the architecture and inner workings of Amazon Redshift. You discover how the recent availability, performance, and manageability improvements we’ve made can significantly enhance your end user experience. You also get a glimpse of what we are working on and our plans for the future.
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5 (Cloudera, Inc.)
Inefficient data workloads are all too common across enterprises - causing costly delays, breakages, hard-to-maintain complexity, and ultimately lost productivity. For a typical enterprise with multiple data warehouses, thousands of reports, and hundreds of thousands of ETL jobs being executed every day, this loss of productivity is a real problem. Add to all of this the complex handwritten SQL queries, and there can be nearly a million queries executed every month that desperately need to be optimized, especially to take advantage of the benefits of Apache Hadoop. How can enterprises dig through their workloads and inefficiencies to easily see which are the best fit for Hadoop and what’s the fastest path to get there?
Cloudera Navigator Optimizer is the solution: it analyzes existing SQL workloads to provide instant insight and turns that insight into an intelligent optimization strategy so you can unlock peak performance and efficiency with Hadoop. As the newest addition to Cloudera's enterprise Hadoop platform, now available in limited beta, Navigator Optimizer has helped customers profile over 1.5 million queries and ultimately save millions by optimizing for Hadoop.
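Workload profiling of this kind rests on normalizing queries so that structurally identical statements group together, then ranking the resulting patterns by frequency. The toy Python sketch below (my own illustration, not Navigator Optimizer's actual method) collapses literals before counting:

```python
import re
from collections import Counter


def normalize(sql):
    """Collapse string and numeric literals so structurally identical
    queries group together (a crude stand-in for workload normalization)."""
    sql = re.sub(r"'[^']*'", "'?'", sql)   # 'NYC' -> '?'
    sql = re.sub(r"\b\d+\b", "?", sql)     # 42    -> ?
    return re.sub(r"\s+", " ", sql.strip().lower())


query_log = [
    "SELECT * FROM orders WHERE id = 42",
    "SELECT * FROM orders WHERE id = 7",
    "SELECT name FROM users WHERE city = 'NYC'",
]
profile = Counter(normalize(q) for q in query_log)
top_pattern, count = profile.most_common(1)[0]
print(top_pattern, count)
```

The highest-frequency patterns are the natural candidates to evaluate first for offload to Hadoop.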
AWS Lambda is a new compute service that runs your code in response to events and automatically manages the compute resources for you. AWS Lambda enables powerful application architectures that simplify and accelerate development of connected applications. Together with Amazon Cognito, Amazon SNS push notifications, and Amazon DynamoDB, AWS Lambda is a powerful tool in your arsenal for developing IoT/mobile apps, and beyond. This session will show you how to get started quickly by covering key architectural design concepts and demonstrating the use of the AWS SDKs to simplify creating powerful applications for the always-on world that connects beyond the desktop.
Speaker: Adam Larter, Solutions Architect, Amazon Web Services
This presentation was delivered 14 times (in various forms) by AWS Evangelist Jeff Barr as part of his 2013 AWS Road Trip.
After introducing AWS, it covers the basics of S3, EC2, RDS, DynamoDB, Elastic Block Storage, Auto Scaling, Elastic Load Balancing, Redshift, the AWS Trusted Advisor, and more.
Are you looking to automate backup and archiving of your business-critical data workloads? Attend this session to understand key use cases, best practices, and considerations for protecting your data with AWS and CommVault. This session will feature lessons learned from CommVault customers that have: migrated onsite backup data into Amazon S3 to reduce hardware footprint and improve recoverability; implemented data-tiering and archived data in Amazon Glacier for long term retention and compliance; performed snapshot-based protection and recovery for applications running in Amazon EC2; and, provisioned and managed VMs in Amazon EC2.
Speaker: Michael Porfirio, Director Systems Engineering, CommVault
STP205 Making it Big Without Breaking the Bank - AWS re:Invent 2012 (Amazon Web Services)
Join Ray Bradford from Kleiner Perkins in a frank discussion with Yelp Engineering Manager Jim Blomo and Flipboard Chief Architect Greg Scallan, as they explore how they are optimizing their costs with AWS and how they think about owning vs. renting hardware as they grow. Ray will also share observations and trends on how successful VC-funded companies think about IT costs and the right things to be spending money on.
40, 1173 & 516. What do these numbers mean? Since inception, AWS has introduced more than 40 major new services and released over 1173 new services and features, with 516 announced in 2014 alone. How you used the AWS platform last year may be very different from how you utilise it today to maximise innovation and outcomes and remain competitive. In this advanced technical session, an AWS Solutions Architect will address technical requirements for successfully deploying and managing applications on the AWS platform, how solutions were typically architected previously, both off-cloud and on-cloud, and some of the best-practice recommendations on AWS today.
Speaker: Dean Samuels, Solutions Architect, Amazon Web Services
The AWS cloud infrastructure has been architected to be one of the most flexible and secure cloud computing environments available today. In this session, we’ll provide a practical understanding of the assurance programs that AWS provides; such as HIPAA, FedRAMP(SM), PCI DSS Level 1, MPAA, and many others. We’ll also address the types of business solutions that these certifications enable you to deploy on the AWS Cloud, as well as the tools and services AWS makes available to customers to secure and manage their resources.
In this session, learn how to move your existing database applications to the cloud. We cover the best practices for planning your migrations, moving your data over, sizing your AWS deployment appropriately, and minimizing downtime. You also hear from some of our customers who have successfully migrated their applications about the techniques they used and the reasons they moved onto the cloud.
Webinar: Delivering Static and Dynamic Content Using CloudFront (Amazon Web Services)
In this presentation from our webinar titled “Delivering Static and Dynamic Content using Amazon CloudFront”, we provide an overview on how you can use Amazon CloudFront to help architect your site to deliver both static and dynamic content (portions of your site that change for each end-user). Andy Rosenbaum, Director of Desktop Development at Earth Networks, also joined and presented on why Earth Networks chose Amazon CloudFront to deliver their dynamic weather content.
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish... (Amazon Web Services)
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift (Amazon Web Services)
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
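To see why schema choices like the distribution key matter, here is a simplified Python model of how an MPP warehouse hashes rows across compute slices so that each slice can aggregate its own share in parallel. The slice count and hashing details are illustrative only, not Redshift's internals:

```python
import hashlib


def slice_for(dist_key, num_slices):
    """Hash a distribution-key value to a slice, the way an MPP warehouse
    spreads rows across compute slices (a simplified model)."""
    digest = hashlib.md5(str(dist_key).encode()).hexdigest()
    return int(digest, 16) % num_slices


NUM_SLICES = 4
rows = [{"customer_id": cid, "amount": cid * 10} for cid in range(12)]

# Distribute rows: the same key always lands on the same slice,
# so joins and aggregations on that key need no data movement.
slices = {s: [] for s in range(NUM_SLICES)}
for row in rows:
    slices[slice_for(row["customer_id"], NUM_SLICES)].append(row)

# Each slice scans only its own rows, in parallel; partial sums are combined.
partials = {s: sum(r["amount"] for r in rs) for s, rs in slices.items()}
total = sum(partials.values())
print(total)
```

A skewed distribution key would pile rows onto a few slices and stall the parallel scan, which is why key selection is a recurring best-practice topic.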
Amazon Web Services provides a broad range of services to help you build and deploy big data analytics applications quickly and easily. AWS offers fast access to flexible, low-cost IT resources, letting you rapidly scale virtually any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL processing, serverless computing, and Internet of Things processing. With AWS you do not need to make large upfront investments of time or money to build and maintain infrastructure. Instead, you can provision exactly the right type and size of resources you need to power your big data analytics applications. You can access as many resources as you need, almost instantly, and pay only for what you use.
With AWS you can choose the right database technology and software for the job. Given the myriad of choices, from relational databases to non-relational stores, this session provides details and examples of some of the choices available to you. This session also provides details about real-world deployments from customers using Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, and Amazon Redshift.
Using real-time big data analytics for competitive advantage (Amazon Web Services)
Many organisations find it challenging to successfully perform real-time data analytics using their own on premise IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can potentially be quite a costly and time consuming exercise.
Most of the time, infrastructure is under-utilised, and it is nearly impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
The DoneDeal AWS Data Analytics Platform was built using AWS products: EMR, Data Pipeline, S3, Kinesis, Redshift, and Tableau. A custom-built ETL layer was written using PySpark.
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent ... (Amazon Web Services)
Big data technologies let you work with any velocity, volume, or variety of data in a highly productive environment. Join the General Manager of Amazon EMR, Peter Sirota, to learn how to scale your analytics, use Hadoop with Amazon EMR, write queries with Hive, develop real world data flows with Pig, and understand the operational needs of a production data platform.
5 Things that Make Hadoop a Game Changer
Webinar by Elliott Cordo, Caserta Concepts
There is much hype and mystery surrounding Hadoop's role in analytic architecture. In this webinar, Elliott presented, in detail, the services and concepts that make Hadoop a truly unique solution, a game changer for the enterprise. He talked about the real benefits of a distributed file system, the multi-workload processing capabilities enabled by YARN, and the three other important things you need to know about Hadoop.
To access the recorded webinar, visit the event site: https://www.brighttalk.com/webcast/9061/131029
For more information on the services and solutions that Caserta Concepts offers, please visit http://casertaconcepts.com/
In this talk, Ian will discuss Amazon Redshift, a managed petabyte-scale data warehouse, give an overview of its integration with Amazon Elastic MapReduce, a managed Hadoop environment, and cover some exciting new developments in the analytics space.
Learn more about the tools, techniques and technologies for working productively with data at any scale. This presentation introduces the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Jon Einkauf, Senior Product Manager, Elastic MapReduce, AWS
Alan Priestley, Marketing Manager, Intel and Bob Harris, CTO, Channel 4
Over 90% of today’s data has been generated in the last two years, and growth rates continue to climb. In this session, we’ll step through challenges and best practices with data capturing, how to derive meaningful insights to help predict the future, and common pitfalls in data analysis.
Come discover how integrated solutions involving Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning/Deep Learning result in effective data systems for data scientists and business users, alike.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
How to build Forecasting services using ML and deep learning algorithms (Amazon Web Services)
Forecasting is an important process for very many companies and is used in a variety of areas to try to accurately predict the growth and distribution of a product, the resources needed on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a temporal component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to create Big Data applications in Serverless mode (Amazon Web Services)
The variety and quantity of data created every day keeps accelerating and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment accessible only to established companies. But the elasticity of the Cloud and, in particular, Serverless services let us break through these limits.
Let's see, then, how it is possible to develop Big Data applications quickly, without worrying about the infrastructure, dedicating all our resources instead to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. In that time we learned how changing our approach to application development allowed us to greatly increase agility and release velocity and, ultimately, enabled us to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only the application architecture but also the organizational structure, the development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances (Amazon Web Services)
The use of containers keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can all take advantage of Spot Instances, leading to average savings of 70% compared to On-Demand Instances. In this session we will explore the characteristics of Spot Instances and how they can easily be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
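The economics behind claims like these are simple to model. The illustrative Python sketch below blends On-Demand and Spot capacity; the ~70% discount figure comes from the session description, while the instance price and fleet size are invented:

```python
ON_DEMAND_PRICE = 0.10   # $/hour, hypothetical instance price
SPOT_DISCOUNT = 0.70     # ~70% average saving vs On-Demand (figure from the talk)


def monthly_cost(instances, hours=730, spot_fraction=0.0):
    """Blend On-Demand and Spot capacity and return the monthly bill."""
    spot_price = ON_DEMAND_PRICE * (1 - SPOT_DISCOUNT)
    on_demand = instances * (1 - spot_fraction) * hours * ON_DEMAND_PRICE
    spot = instances * spot_fraction * hours * spot_price
    return on_demand + spot


all_on_demand = monthly_cost(10)
mostly_spot = monthly_cost(10, spot_fraction=0.9)
saving = 1 - mostly_spot / all_on_demand
print(f"{saving:.0%}")
```

Running 90% of a ten-instance fleet on Spot at a 70% discount yields a blended saving of about 63%, which is why stateless, interruption-tolerant container workloads are the natural fit for Spot.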
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. AWS and FinConecta would therefore like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Learning services (Amazon Web Services)
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative components built ad hoc.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your own offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, also through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of... (Amazon Web Services)
With the traditional approach to IT, for many years it was difficult to implement DevOps techniques, which until now have often involved manual activities, occasionally leading to application downtime and interrupting user operations. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, guaranteeing greater system reliability and resulting in significant improvements in business continuity.
AWS provides AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Learn how to leverage AWS OpsWorks to guarantee the reliability of your application installed on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads (Amazon Web Services)
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we will discuss options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis leveraging artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are organizing a free virtual event next Wednesday, October 14, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, taking full advantage of the AWS cloud while protecting your existing VMware investments.
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, compounded by performance risks that can be introduced when moving applications out of on-premises data centers.
Build your first serverless ledger-based app with QLDB and NodeJS (Amazon Web Services)
Many companies today build applications with ledger-type functionality, for example to verify the history of credits and debits in banking transactions, or to track the flow of their products through the supply chain.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build custom, complex systems by providing a fully managed serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
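The core idea of a cryptographically verifiable transaction log can be sketched in a few lines of Python using a hash chain. This is a toy model of the concept, not how Amazon QLDB is actually implemented:

```python
import hashlib
import json


def append_entry(ledger, data):
    """Append a record whose hash chains to the previous entry,
    mimicking a ledger's verifiable journal."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    entry = {"data": data, "prev": prev_hash,
             "hash": hashlib.sha256(body.encode()).hexdigest()}
    ledger.append(entry)
    return ledger


def verify(ledger):
    """Recompute every hash in order; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in ledger:
        body = json.dumps({"data": entry["data"], "prev": prev}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"debit": 50, "account": "A-1"})
append_entry(ledger, {"credit": 50, "account": "B-2"})
print(verify(ledger))              # True: the chain is intact
ledger[0]["data"]["debit"] = 5000  # tamper with history
print(verify(ledger))              # False: the chain is broken
```

Because each entry's hash covers the previous entry's hash, rewriting any historical record invalidates everything after it, which is what makes the log verifiable without trusting the writer.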
With the rise of microservices architectures and rich mobile and web applications, APIs are more important than ever for delivering an excellent user experience to end users. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dive into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data-update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: myths to debunk (Amazon Web Services)
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, compounded by performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and streamline the migration of Oracle workloads, accelerating the transformation toward the cloud; they dive into the architecture and show how to take full advantage of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report was prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
3. Amazon Redshift architecture
• Leader Node (JDBC/ODBC client connections)
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes (10 GigE HPC interconnect)
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3 (ingestion, backup, restore)
– Parallel load from Amazon DynamoDB
• Single node version available
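Since loads flow through Amazon S3 and are split across the compute nodes, ingestion is usually driven by a single COPY statement. As a minimal sketch of composing such a statement (the table, bucket, and IAM role names here are hypothetical placeholders, not from the deck):

```python
# Sketch: composing a Redshift COPY statement for a parallel load from S3.
# All identifiers below are invented for illustration.

def build_copy_statement(table, s3_prefix, iam_role, delimiter="|"):
    """Return a COPY command; Redshift's compute nodes divide the files
    under the S3 prefix among their slices and load them in parallel."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"DELIMITER '{delimiter}'\n"
        f"GZIP;"
    )

stmt = build_copy_statement(
    table="quotes",
    s3_prefix="s3://example-bucket/quotes/2013-08/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftLoadRole",
)
print(stmt)
```

Splitting input into many files under one prefix is what lets every slice load concurrently instead of funneling rows through the leader node.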
4. Amazon Redshift is priced to let you analyze all your data

                      Price per hour      Effective hourly   Effective annual
                      (HS1.XL, 1 node)    price per TB       price per TB
On-Demand             $0.850              $0.425             $3,723
1 Year Reservation    $0.500              $0.250             $2,190
3 Year Reservation    $0.228              $0.114             $999

Simple pricing:
• Number of nodes x cost per hour
• No charge for the leader node
• No upfront costs
• Pay as you go
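The effective prices in the table follow from simple arithmetic (an HS1.XL node stores 2 TB, which is why the per-TB rate is half the per-node rate); a quick check:

```python
# Effective annual price per TB = hourly per-TB price x 24 hours x 365 days.
# The leader node is free of charge, so only compute nodes are billed.
HOURS_PER_YEAR = 24 * 365  # 8,760

for plan, hourly_per_tb in [("On-Demand", 0.425),
                            ("1 Year Reservation", 0.250),
                            ("3 Year Reservation", 0.114)]:
    annual = hourly_per_tb * HOURS_PER_YEAR
    print(f"{plan}: ${annual:,.0f} per TB per year")

# Cluster cost is simply: number of compute nodes x cost per node-hour.
nodes, hourly_rate = 6, 0.850  # example: a 6-node on-demand XL cluster
print(f"Example cluster: ${nodes * hourly_rate:.2f}/hour")
```

This reproduces the table's $3,723 / $2,190 / $999 figures (the last rounding up from $998.64).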
6. Where innovation meets action
• Our technology is used to power more than 70 marketplaces in 50 countries
• We own and operate 26 markets, 3 clearinghouses, and 5 central securities depositories
• Our global platform can handle more than 1 million messages/second at a median speed of sub-55 microseconds
• We power 1 in 10 of the world's securities transactions
• More than 5,500 structured products are tied to our global indexes, with a notional value of at least $1 trillion
• We list ~3,300 global companies worth $6 trillion in market cap, representing diverse industries and many of the world's most well-known and innovative brands
7. What I do
New data and analytics platforms to store and serve data to internal and external customers.
8. The Challenge
• Archiving market data – a classic "Big Data" problem
• Power surveillance and business intelligence/analytics
• Minimize cost
– Not only infrastructure, but development/IT labor costs too
• Empower the business for self-service
9. SIP Total Monthly Message Volumes (OPRA, UQDF and CQS)
Market data is Big Data. Total monthly message volumes, Aug-12 through Aug-13:

Date     OPRA (monthly)    OPRA (avg daily)   UQDF (monthly)   CQS (monthly)    UQDF+CQS (avg daily)
Aug-12   80,600,107,361    3,504,352,494      2,317,804,321    8,241,554,280    459,102,548
Sep-12   77,303,404,427    4,068,600,233      1,948,330,199    7,452,279,225    494,768,917
Oct-12   98,407,788,187    4,686,085,152      1,016,336,632    7,452,279,225    403,267,422
Nov-12   104,739,265,089   4,987,584,052      2,148,867,295    9,552,313,807    557,199,100
Dec-12   81,363,853,339    4,068,192,667      2,017,355,401    8,052,399,165    503,487,728
Jan-13   82,227,243,377    3,915,583,018      2,099,233,536    7,474,101,082    455,873,077
Feb-13   87,207,025,489    4,589,843,447      1,969,123,978    7,531,093,813    500,011,463
Mar-13   93,573,969,245    4,678,698,462      2,010,832,630    7,896,498,260    495,366,545
Apr-13   123,865,614,055   5,630,255,184      2,447,109,450    9,805,224,566    556,924,273
May-13   134,587,099,561   6,117,595,435      2,400,946,680    9,430,865,048    537,809,624
Jun-13   162,771,803,250   8,138,590,163      2,601,863,331    11,062,086,463   683,197,490
Jul-13   120,920,111,089   5,496,368,686      2,142,134,920    8,266,215,553    473,106,840
Aug-13   136,237,441,349   6,192,610,970      2,188,338,764    9,079,813,726    512,188,750

[Chart: NASDAQ Exchange Daily Peak Messages, Jan-13 through Sep-13, with daily peaks in the 0–600,000,000 range. OPRA annual increase: 69%; CQS annual increase: 10%; UQDF annual decrease: 6%.]

Charts courtesy of the Financial Information Forum. Redistribution without permission from FIF prohibited; email: fifinfo@fif.com
10. Our legacy solution
• On-premises MPP DB
– Relatively expensive, finite storage
– Required periodic additional expenses to add more storage
– Ongoing IT (administrative) human costs
• Legacy BI tool
– Requires developer involvement for new data sources, reports, dashboards, etc.
11. New Solution: Amazon Redshift
• Cost Effective
– Redshift is 43% of the cost of legacy
• Assuming equal storage capacities
– Doesn’t include IT ongoing costs!
• Performance
– Easily outperforms our legacy BI/DB solution
– Insert 550K rows/second on a 2 node 8XL cluster
• Elastic
– Add additional capacity on demand, easy to grow our cluster
12. New Solution: Pentaho BI/ETL
• Amazon Redshift partner
– http://aws.amazon.com/redshift/partners/pentaho/
• Self service
– Tools empower BI users to integrate new data sources, create their own analytics, dashboards, and reports without requiring development involvement
• Cost effective
13. Net Result
• New solution is cheaper, faster, and offers capabilities that our business didn't have before
– Empowers our business users to explore data like they never could before
– Reduces IT and development as bottlenecks
– Margin improvement (expense reduction and supports business decisions to grow revenue)
15. Who am I? Kevin Diamond
• CTO of HauteLook, a Nordstrom Company
• Oversee all technology, infrastructure, data, engineering, etc.
• Major focus on great customer experience and the analytics to provide it
16. What is HauteLook?
• Private sale, members-only limited-time sale events
• Premium fashion and lifestyle brands at exclusive prices of 50-75% off
• Over 20 new sale events begin each morning at 8am PST
• Over 14 million members
• Acquired by Nordstrom in 2011
17. Why a Data Warehouse?
• Centralized storage of multiple data sources
• Singular reporting consistency for all departments
• Data model that supports analytics not transactions
• Operational reports vs. analytical reports
– Real-time vs. previous day
18. Why Amazon Redshift?
• Looked at some competitors:
– Ranged from $ to $$$
– All required Software, Implementation and BIG Hardware
• Skipped the RFP
• Jumped into the public beta of Amazon Redshift and never looked back
19. How We Implemented Amazon Redshift
• ETL from MySQL and Microsoft SQL Server into AWS across a Direct Connect line, storing on S3
• Also used S3 to dump flat files (iTunes Connect data, web analytics dumps, log files, etc.)
• Used AWS Data Pipeline for executing Sqoop and Hadoop running on EC2 to load data into Amazon Redshift
• Redshift data model based on a star schema, which looks something like …
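The schema diagram itself isn't reproduced in this transcript. As a generic illustration of what a star schema looks like – one fact table joined to surrounding dimension tables – here is a small runnable sketch. All table and column names are invented for this example (it is not HauteLook's actual model), and SQLite stands in for Redshift:

```python
import sqlite3

# Illustrative star schema: a fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_event  (event_id  INTEGER PRIMARY KEY, brand  TEXT);
CREATE TABLE fact_order (
    order_id  INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES dim_member(member_id),
    event_id  INTEGER REFERENCES dim_event(event_id),
    amount    REAL
);
INSERT INTO dim_member VALUES (1, 'West'), (2, 'East');
INSERT INTO dim_event  VALUES (10, 'BrandA'), (11, 'BrandB');
INSERT INTO fact_order VALUES
    (100, 1, 10, 59.0), (101, 2, 10, 41.0), (102, 2, 11, 30.0);
""")

# An analytical slice (revenue by region and brand) - the kind of query
# a star schema keeps simple: join out to dimensions, then GROUP BY.
rows = conn.execute("""
    SELECT m.region, e.brand, SUM(f.amount) AS revenue
    FROM fact_order f
    JOIN dim_member m ON m.member_id = f.member_id
    JOIN dim_event  e ON e.event_id  = f.event_id
    GROUP BY m.region, e.brand
    ORDER BY m.region, e.brand
""").fetchall()
print(rows)
```

The fact table holds the transactions; the dimensions hold descriptive attributes, which is what makes the model analytics-friendly rather than transaction-friendly.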
21. Usage with Business Intelligence
• Already selected a BI Tool
• Had difficulty deploying in the cloud
• But worked great on-premises
• Easily tied into Amazon Redshift using ODBC Drivers
• BUT, metadata for reports had to live in MSSQL
• Ported many SSIS/SSRS reports over
– But only the analytical reports!
23. Amazon Redshift Instances
• We use a little under 2TB
• Thought to use 2 big 8XL instances to get great performance (in passive failover mode)
• Cost us $$$
• Then we tested using 6 XL instances in a cluster
• Performed better and allowed for more concurrency of queries in all but a handful of cases that really needed the 8XL power
• Cost us $
• Duh! That's why we do distributed everything else!!
24. Some First Hand Experience
• ETL was the hardest part
• Amazon Redshift performs awesome
• Someone needs to make a great client SQL tool
• MicroStrategy works great on it (just wished it loved running in EC2)
• Saving a ton, thanks to:
– No hardware costs
– No maintenance/overhead (rack + power)
– Annual costs are equivalent to just the annual maintenance of some of the cheaper DW on-premises options
25. Conclusion/Last Advice
• Only use 8XL instances if you need >2TB of space
– Otherwise distribute on a bunch of XL nodes
• Buy reserved instances (we still need to do this!) since you likely will have this always on
• Although we haven't yet, the idea of a flexible scale-up/down DW is crazy awesome – maybe during Holiday we will
• Probably could have used Elastic MapReduce instead of Hadoop – wasn't sure how it would play with Sqoop
• Almost all BI tools play with Amazon Redshift now, so choose what is right for your business, and make sure it works in EC2 before just putting it there
• Communication between AWS and your DC is easy and fast, but I recommend a Direct Connect
• Passed our rigorous information security standards, but used in a VPC
27. roundarch isobar
Our services across bought, owned and earned media:
• Strategies – we digitally transform business processes and disrupt industries
• Campaigns – we create, measure and optimize digitally-focused campaigns
• Experiences – we produce joyful experiences that inspire consumer interaction
• Platforms – we design and build flexible and scalable technology solutions
• Products – we invent digital products that generate new revenue streams
Capabilities include:
• Audience insight
• Research: competitive, segmentation, persona development, heuristics
• Business planning: competitive & industry analysis, business cases, maturity models, roadmaps
• Strategies: brand, interactive, multi-channel, social, content
• Communications planning
• Creative: advertising, visual design, content creation, studio production
• Optimization: analytics, monitoring, SEO, MVT, media ROI analysis
• Requirements and specifications: content analysis and specs, functional requirements, functional specifications
• User experience design: information architecture, taxonomy and metadata, interaction design, mobile
• Platforms: content management, search, portals, mobile, front-end technology, internet-enabled devices/wearables, social apps, web services, security, big data, hosting
• Digital products
• Digital product extensions
• Brand as a service
28. U.S. Air Force
We have served the U.S. Air Force since 2001, building their enterprise portal and many mission-critical applications.
Key metrics for our USAF work include:
• 900,000+ registered users
• Portal availability over 99.9% of time
• 700,000+ PK-E users
• 28 production enterprise services
• Response time worldwide: 3 seconds for 80% of all pages
• Over 300 applications available
• Over 1.2 million logins/week
• Public-facing and secure private instances (NIPR & SIPR)
• 124,000 unique daily users
• 4-5+ million pages daily (40-70 Mbit/sec)
• Portal support for over 5,000 "Communities of Interest"
29. New York Jets
Transforming in-stadium operations through a touch-screen command center.
Our executive touch-screen environment provides real-time stadium and game data, allowing the Jets owner, Woody Johnson, to monitor the fan experience during game time and make operational decisions that help maximize sales. The command center provides summary-level and drill-down views of stadium operations such as tickets, parking and concessions. It also creates predictive algorithms that help identify pinch points and open revenue opportunities.
"We brought the big picture close enough to identify new, better ways to do business."
30. William Blair | Investment Research Management System
Through a joint venture with Copia Capital, we created a new product offering for William Blair.
• Facilitates collaboration between portfolio managers and analysts
• Provides a holistic view of a company/stock – what is everything our organization knows about AAPL?
• Digitizes PDF/Excel tools and reports to enable rich, dynamic interactions
• Simplifies content creation; e.g., comments, recommendation reports, document upload
• Rich charting and visualization of analytics
Technology:
• JavaScript, HTML5, CSS3
• jQuery, JavaScriptMVC, Less
• JSON web services
• Java, Spring, JPA, MongoDB
User comment: "We love how fast it is!"
31. What is the focus of your CMO today?
Optimize marketing spend across all channels (bought, earned and owned).
33. Marketing effectiveness stages
Stages – Learn, Analyze, Optimize – supported by tools including DLP, Scorecard (real-time and non-real-time), Sonar, AMNET, and Compass:
• Centralized cross-channel Big Data platform
• Standardized cross-channel reporting tools
• Discovery tools to identify channel optimization opportunities
• Modeling solutions
• Channel experience enhancements
• Improved media buying, planning & reporting functions
• Real-time integration into DSP
• A/B-testing-based micro-segment adjustments
34. So what have we accomplished?
Built the Marketing Analytics Platform – Radar:
• 200+ feeds (1TB/week) with varying frequency, granularity and classification
• In-time analytics, reporting and optimization for multiple clients with customized metrics
• Scalable multi-tenant SaaS platform on Amazon, with first launch in 3 months
36. Scorecard logical architecture
[Diagram: channel data sources feed the Scorecard App and detailed analytic reports, consumed by the media team, planners, the client team and client stakeholders.]
• Display: Google DFA
• Paid search: Google, Bing, Marin
• Organic search: Google, Bing, custom
• Digital video: Google
• Site metrics: Omniture
• Sales: TBD
• TV, radio, print, OOH: DDS
• Earned social: Facebook, Twitter, competitive, custom
• Paid social: Facebook, Twitter
37. Data sources
Voluminous data, spanning wide variety and granularity:
• Digital, CRM and research data: surveys, demographics, campaigns, search, mobile, attribution, site, social, display, cookie level, UGC, geospatial, weather, sales, competitive
39. ETL
• Extract: files loaded on Amazon S3 / Amazon Glacier
• Transform: utilize Pig on Amazon EMR (Hadoop) to cleanse, standardize and validate the data
• Load: use COPY to load Pig output into Redshift
Feeds include radio, display ads, search and social.
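In the deck's pipeline the cleanse/standardize/validate work is done with Pig on Amazon EMR; the same three steps can be sketched in plain Python. Field names and validation rules here are invented for illustration:

```python
import csv, io

# Minimal stand-in for the transform step: cleanse, standardize and
# validate raw feed rows before they are COPY-loaded into the warehouse.
RAW_FEED = """channel,date,impressions
display, 2013-08-01 ,1200
search,2013-08-01,
DISPLAY,2013-08-02,950
"""

def transform(raw):
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw)):
        # Cleanse/standardize: trim whitespace, lowercase the channel name.
        row = {k: v.strip() for k, v in row.items()}
        row["channel"] = row["channel"].lower()
        # Validate: impressions must be a non-empty integer.
        if row["impressions"].isdigit():
            row["impressions"] = int(row["impressions"])
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

valid, rejected = transform(RAW_FEED)
print(len(valid), len(rejected))
```

Keeping rejected rows separate (rather than dropping them silently) makes data-quality problems in a 1TB/week feed visible early.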
40. Data warehouse
• Performance: handles humongous aggregations quickly
• Cheap, fast, easily scalable
• ODBC and JDBC access for BI / ad-hoc analysis (Tableau and other BI tools, analysts)
41. Aggregation
• Mapping: join performance data (radio, display ads, search, social) with metadata (product, campaign)
• Multi-step aggregation in Amazon Redshift using SQL (views, clicks, CTR, CPC, etc.)
• Load aggregates into MySQL (RDS) for sub-second web response
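The multi-step aggregation can be sketched as follows, with SQLite standing in for both Redshift (where the SQL aggregation runs) and MySQL (which serves the small aggregate table). All table and column names are invented for this illustration:

```python
import sqlite3

# Sketch of the slide's flow: join performance data with metadata,
# roll up raw counts, derive rate metrics, keep only the small result.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ad_events (campaign_id INTEGER, views INTEGER, clicks INTEGER);
CREATE TABLE campaign_meta (campaign_id INTEGER PRIMARY KEY, product TEXT);
INSERT INTO ad_events VALUES (1, 1000, 50), (1, 500, 25), (2, 2000, 20);
INSERT INTO campaign_meta VALUES (1, 'shoes'), (2, 'bags');

-- Step 1: join performance data with metadata and sum raw counts.
-- Step 2: derive CTR from the summed aggregates.
CREATE TABLE aggregates AS
SELECT m.product,
       SUM(e.views)  AS views,
       SUM(e.clicks) AS clicks,
       ROUND(1.0 * SUM(e.clicks) / SUM(e.views), 3) AS ctr
FROM ad_events e
JOIN campaign_meta m ON m.campaign_id = e.campaign_id
GROUP BY m.product;
""")

# The compact aggregate table is what would be copied to MySQL
# so the web tier gets sub-second reads.
rows = db.execute("SELECT * FROM aggregates ORDER BY product").fetchall()
print(rows)
```

The design point: heavy scans and GROUP BYs stay in the columnar warehouse; only the small, pre-computed result moves to the serving database.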
42. Data workflow
• Jenkins for client+channel ETL, with a job control dashboard
• Ruby for provisioning, job flow, and data intake/extract
• Amazon DynamoDB for state management
• On-demand, job-initiated Amazon EMR (Hadoop) clusters
• Data moves across S3, EMR, Redshift and MySQL RDS
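The slide names Amazon DynamoDB for state management. A hedged sketch of the idea, with a plain dict standing in for the DynamoDB table (the item layout and state names are invented, not from the deck):

```python
import time

# Job-state tracking for an ETL workflow: record state before spinning up
# an on-demand EMR cluster, update it when the load completes.
state_table = {}  # stand-in for a DynamoDB table keyed by client#channel

def start_job(client, channel):
    key = f"{client}#{channel}"
    state_table[key] = {"state": "RUNNING", "started_at": time.time()}
    return key

def finish_job(key, ok=True):
    state_table[key]["state"] = "SUCCEEDED" if ok else "FAILED"

key = start_job("client1", "display")
finish_job(key, ok=True)
print(state_table[key]["state"])
```

Externalizing state this way is what lets the orchestration layer (Jenkins plus Ruby scripts, per the slide) resume or retry jobs without re-running completed steps.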
43. SaaS dashboard
• Designed for redundancy: hardware and location
• Multi-tenant: client1.com, client2.com
• Managed services: ElastiCache, MySQL RDS, EC2, Elastic Beanstalk, load balancing, DNS
• Automated stack provisioning for clients
44. AWS advantages
• Innovate quickly with reduced risk
• Time to market
• Lower operational overhead
• Highly scalable
[Diagram: the split between us and Amazon – our developers (Java, Ruby, Python) and DevOps build on top of AWS operations.]
45. learnings
Metadata is more important than the data
Design for scalability upfront
Always explore better ways to aggregate
Cost management is very important
Build Agile: Perform early end-to-end validation on smaller dataset
Separate data visualization, data cleansing, storage & data aggregation
Be smart about implementing data aggregation routines across multiple granularities
46. Please give us your feedback on this presentation (DAT205)
As a thank you, we will select prize winners daily for completed surveys!