For all organizations looking to glean insights from their data, it is essential to deploy the right environment to successfully support analytics workloads. Learn about the different block storage options from AWS and discuss with our experts how to select the best option for your big data analytics workloads. We will demonstrate how to set up, select, and modify volume types to right-size your environment.
The AWS cloud computing platform has disrupted big data. Managing big data applications used to be only for well-funded research organizations and large corporations, but not any longer. Hear from Ben Butler, Big Data Solutions Marketing Manager for AWS, to learn how our customers are using big data services in the AWS cloud to innovate faster than ever before. Not only is AWS technology available to everyone, but it is also self-service and on-demand, featuring innovative technology and flexible, low-cost pricing models with no commitments. Learn from customer success stories as Ben shares real-world case studies describing the specific big data challenges being solved on AWS. We will conclude with a discussion of the tutorials, public datasets, test drives, and our grants program - all of the resources needed to get you started quickly.
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud | Jaipaul Agonus
This presentation is a real-world case study about moving a large portfolio of batch analytical programs that process 30 billion or more transactions every day, from a proprietary MPP database appliance architecture to the Hadoop ecosystem in the cloud, leveraging Hive, Amazon EMR, and S3.
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas... | Amazon Web Services
Amazon’s consumer business continues to grow, and so does the volume of data and the number and complexity of the analytics done in support of the business. In this session, we talk about how Amazon.com uses AWS technologies to build a scalable environment for data and analytics. We look at how Amazon is evolving the world of data warehousing with a combination of a data lake and parallel, scalable compute engines such as Amazon EMR and Amazon Redshift.
The introductory morning session will discuss big data challenges and provide an overview of the AWS Big Data Platform. We will also cover:
• How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
• Reference architectures for popular use cases, including: connected devices (IoT), log streaming, real-time intelligence, and analytics.
• The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR) and Redshift.
• The latest relational database engine, Amazon Aurora - a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
• Amazon Machine Learning - the latest big data service from AWS, which provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology.
BigDL Deep Learning in Apache Spark - AWS re:Invent 2017 | Dave Nielsen
In this talk, you will learn how to use or create deep learning architectures for image recognition and other neural network computations in Apache Spark. Alex, Tim, and Sujee will begin with an introduction to deep learning using BigDL. Then they will explain and demonstrate how image recognition works using step-by-step diagrams and code, giving you a fundamental understanding of how you can perform image recognition tasks within Apache Spark. They will then give a quick overview of how to perform image recognition on a much larger dataset using the Inception architecture. BigDL was created specifically for Spark and takes advantage of Spark's ability to distribute data processing workloads across many nodes. As an attendee in this session, you will learn how to run the demos on your laptop, on your own cluster, or using the BigDL AMI in the AWS Marketplace. Whichever you choose, you will walk away with a much better understanding of how to run deep learning workloads using Apache Spark with BigDL. Presentation by Alex Kalinin, Tim Fox, Sujee Maniyam & Dave Nielsen at re:Invent.
(BDT310) Big Data Architectural Patterns and Best Practices on AWS | AWS re:I... | Amazon Web Services
The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
Instructor: Ivan Cheng, Solution Architect, AWS
Organizations around the world are facing a "data tsunami" as next-generation sensors produce enormous volumes of Earth observation data. Come learn how NASA is leveraging AWS to efficiently work with data and computing resources at massive scales. NASA is transforming its Earth Sciences EOSDIS (Earth Observing System Data Information System) program by moving data processing and archiving to the cloud. NASA anticipates that their Data Archives will grow from 16PB today to over 400PB by 2023 and 1 Exabyte by 2030, and they are moving to the cloud in order to scale their operations for this new paradigm. Learn More: https://aws.amazon.com/government-education/
Instructor: Xiaoyong Han, Solution Architect, AWS
Data collection and storage are primary challenges for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• An overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• How to leverage Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection (a minimal sketch follows this list)
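To make the collection layer concrete, here is a minimal boto3 sketch, not from the webinar itself, that writes one event to Amazon Kinesis, Amazon DynamoDB, and Amazon S3; the stream, table, and bucket names are hypothetical placeholders.

```python
import json
import boto3

# Hypothetical resource names, for illustration only.
STREAM = "clickstream-events"
TABLE = "user-events"
BUCKET = "my-ingest-bucket"

event = {"pk": "u-123", "action": "page_view", "ts": "2016-01-01T00:00:00Z"}
payload = json.dumps(event)

# Kinesis: ordered ingestion of streaming records, partitioned by key.
boto3.client("kinesis").put_record(
    StreamName=STREAM, Data=payload, PartitionKey=event["pk"])

# DynamoDB: low-latency key-value storage of the same event.
boto3.resource("dynamodb").Table(TABLE).put_item(Item=event)

# S3: durable object storage for later batch analytics (EMR, Athena).
boto3.client("s3").put_object(
    Bucket=BUCKET, Key="events/u-123.json", Body=payload)
```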
Processing the volume and variety of data that today’s organizations produce can be both challenging and costly – especially with a legacy data warehouse. Combining the scale and performance of the cloud with AWS and APN Partner solutions for migration, integration, analysis, and visualization can help overcome these obstacles. With a modern data warehouse architecture, organizations can store, process, and analyze massive volumes of data of virtually any type. Register for this upcoming webinar, where Pearson - an education and media conglomerate - will share in detail how they built a scalable and flexible business intelligence platform on the cloud, with Tableau and AWS.
Learn how you can seamlessly load and transform data in Amazon Redshift with Matillion ETL and analyze it with Tableau. Hear how 47Lining and NorthBay can provide insights to guide you through migration with ease. Tableau will discuss best practices to analyze your data on AWS and share new insights throughout your organization.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora - a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak, Sr. Manager of Software Development
Amazon big success using big data analytics | Kovid Academy
Today, big data is everywhere, but the key problem is that it is too big to tackle and too complex to evaluate and draw insights from. Also, because big data analytics is still a relatively new discipline, there is a shortage of knowledge and expertise in the field, which often leads organizations to misuse their data.
If you are interested to know more about AWS Chicago Summit, please use the following to register: http://amzn.to/1RooPPL
Many AWS customers store vast amounts of data in Amazon S3, a low-cost, scalable, and durable object store; Amazon DynamoDB, a NoSQL database; or Amazon Kinesis, a real-time data stream processing service. With large datasets in various AWS services, how do you derive value from this information in a cost-effective way? Using Amazon Elastic MapReduce (Amazon EMR) with applications in the Apache Hadoop ecosystem, you can directly interact with data in each of these storage services for scalable analytics workloads or ad hoc queries. You can quickly and easily launch an Amazon EMR cluster from the AWS Management Console, and scale your cluster to match the compute and memory resources needed for your workflow, independent of the storage capacity used in your AWS storage services. The webinar will accelerate your use of Amazon EMR by showing you how to create and monitor Amazon EMR clusters, and provide several use cases and architectures for using Amazon EMR with different AWS data stores (a short monitoring sketch follows this section).
Learning Objectives:
• Recognize when to use Amazon EMR
• Understand the steps required to set up and monitor an Amazon EMR cluster
• Architect applications that effectively use Amazon EMR
• Understand how to use HUE for ad hoc query of data in Amazon S3
Who Should Attend:
• Developers, LOB owners, and Continuous Integration & Continuous Delivery (CI/CD) practitioners
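As a taste of the set-up-and-monitor workflow listed above, here is a minimal boto3 sketch, assuming an existing cluster; the cluster ID is a hypothetical placeholder.

```python
import boto3

emr = boto3.client("emr")

# Hypothetical cluster ID; run_job_flow/create-cluster returns the real one.
CLUSTER_ID = "j-XXXXXXXXXXXXX"

# List clusters that are currently running or waiting for work.
for summary in emr.list_clusters(ClusterStates=["RUNNING", "WAITING"])["Clusters"]:
    print(summary["Id"], summary["Name"], summary["Status"]["State"])

# Drill into one cluster to monitor its state and installed applications.
cluster = emr.describe_cluster(ClusterId=CLUSTER_ID)["Cluster"]
print(cluster["Status"]["State"], [app["Name"] for app in cluster["Applications"]])
```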
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR | Provectus
Considering new ways to reduce the operational costs and increase the scaling flexibility of your Apache Hadoop/Spark clusters? Try migrating to Amazon EMR!
On-premises Apache Hadoop/Spark clusters are among the top sources of financial pressure for businesses. IT organizations want to reduce spend while still meeting demand, to keep their legacy data applications up and running. Come and learn from experts at Provectus & AWS how you can use Amazon EMR to start driving cost efficiencies in your organization!
Agenda
- Hadoop market and cost optimizations using Amazon EMR
- Cost-related and other challenges of on-prem Hadoop clusters
- Cost optimizations by using Amazon EMR and migration best practices
Intended audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Pritpal Sahota, Technical Account Manager, Provectus
- Nirav Shah, Senior Solutions Architect, AWS
- Perry Peterson, Business Development Manager, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/cost-optimization-for-apache-hadoop-spark-workloads-with-amazon-emr-june-2020/
Collecting, maintaining, and analyzing data is key to keeping pace within any industry today. In addition to being a critical competitive asset, maintaining corporate data requires careful foundational planning to ensure that the data is secure at all stages. Your big data may include not only proprietary non-public information, but also controlled data that must adhere to regulations such as HIPAA or ITAR. Securing this data while maintaining access for authorized data analytics and reporting workloads can pose significant challenges. In this talk, you’ll learn about strategies leveraging tools such as AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), Amazon S3, and Amazon EMR to secure your big data workloads in the cloud (a short encryption sketch follows below).
Level: 200
Speaker: Hannah Marlowe - Consultant, Federal, WWPS Professional Services
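One of the strategies above, encrypting S3 objects at rest with a KMS key, can be sketched in a few lines of boto3; the bucket name and key alias are hypothetical.

```python
import boto3

# Hypothetical bucket and customer-managed KMS key alias.
BUCKET = "secure-analytics-data"
KMS_KEY_ID = "alias/bigdata-key"

s3 = boto3.client("s3")

# Encrypt the object at rest with SSE-KMS.
s3.put_object(
    Bucket=BUCKET,
    Key="raw/records.csv",
    Body=b"id,value\n1,42\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId=KMS_KEY_ID,
)

# Reads succeed only for principals whose IAM policies allow both
# s3:GetObject on the object and kms:Decrypt on the key.
print(s3.get_object(Bucket=BUCKET, Key="raw/records.csv")["Body"].read())
```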
Instructor: George Chiu (邱志威), Sr. Industry Consultant, Teradata
Learn how Netflix engages customers by leveraging Teradata as a critical component of its data and analytics platform to create a data-driven, customer-focused business.
Big Data and High Performance Computing Solutions in the AWS Cloud | Amazon Web Services
Managing big data and running supercomputing jobs used to be only for well-funded research organizations and large corporations, but not any longer. AWS has democratized supercomputing and big data for the masses! AWS can provide you with the 64th-fastest supercomputer in the world, on demand and pay-as-you-go. Hear from Ben Butler, Head of AWS Big Data Marketing, to learn how our customers are using big data and high performance computing to change the world. Not only is AWS technology available to everyone, but it is also self-service and cheaper than ever before, featuring innovative technology and flexible pricing models; our AWS cloud computing platform has disrupted big data and HPC. Learn from customer successes as Ben shares real-world case studies describing the specific big data and high performance computing challenges being solved on AWS. We will conclude with a discussion of the tutorials, public datasets, test drives, and our grants program - all of the tools needed to get you started quickly.
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
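For a flavour of the key-value and JSON document model described above, here is a minimal boto3 sketch; the table, with a string partition key named "pk", is a hypothetical example.

```python
import boto3

# Hypothetical table with a string partition key named "pk".
table = boto3.resource("dynamodb").Table("sessions-demo")

# The item is a JSON-like document; non-key attributes need no
# predefined schema.
table.put_item(Item={
    "pk": "user#42",
    "name": "Ada",
    "preferences": {"theme": "dark", "alerts": True},
})

# Low-latency read by primary key.
item = table.get_item(Key={"pk": "user#42"})["Item"]
print(item["preferences"]["theme"])
```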
Learning Objectives:
- Learn how building computer vision apps with AWS DeepLens can help you learn machine learning
- Learn how you can train models in the cloud with Amazon SageMaker and deploy them to AWS DeepLens
- Learn how you can integrate with other AWS services such as AWS Lambda to extend the functionality of your AWS DeepLens projects
Today’s organisations require a data storage and analytics solution that offers more agility and flexibility than traditional data management systems. A data lake is a new and increasingly popular way to store all of your data, structured and unstructured, in one centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema, and you no longer need to know in advance what questions you want to ask of your data.
In this webinar, you will discover how AWS gives you fast access to flexible and low-cost IT resources, so you can rapidly scale and build your data lake that can power any kind of analytics such as data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing regardless of volume, velocity and variety of data.
Learning Objectives:
• Discover how you can rapidly scale and build your data lake with AWS.
• Explore the key pillars behind a successful data lake implementation.
• Learn how to use the Amazon Simple Storage Service (S3) as the basis for your data lake.
• Learn about the recently launched AWS services, Amazon Athena and Amazon Redshift Spectrum, that let customers query the data lake directly (a short Athena sketch follows this list).
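As a minimal sketch of querying the data lake in place, here is a hedged boto3 example for Amazon Athena; the database, table, and results bucket are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Athena queries data where it lives on S3; query results are written
# to the (hypothetical) output location below.
execution = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page LIMIT 10",
    QueryExecutionContext={"Database": "datalake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Query execution id:", execution["QueryExecutionId"])
```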
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,... | Amazon Web Services
AWS provides a platform that is ideally suited for deploying highly available and reliable systems that can scale with a minimal amount of human interaction. This talk describes a set of architectural patterns that support highly available services that are also scalable, low-cost, and low-latency, and that allow you to take your application global with the click of a button. We walk through the various architectural decisions taken to achieve high scale and address a global audience.
Amazon Web Services provides a number of database management alternatives for all types of customers. You can run managed relational databases, managed NoSQL databases, or a petabyte-scale data warehouse, or you can even operate your own online database in the cloud on Amazon EC2. Discover our database offerings and find out which service to use according to your existing needs, or how to deliver your next big project. Find out about data migration services, tools, and best practices for security, availability, and scalability, and hear some of the great database success stories from AWS customers.
Speaker: Ari Newman, Account Manager & Rob Carr, Solutions Architect, Amazon Web Services
Featured Customer - Atlassian
Amazon Web Services (AWS) can make hosting scalable, highly available websites and web applications easier and less expensive for enterprise education customers. Join us for an informative webinar on the tools AWS provides to elastically scale your architecture to avoid underutilized resources, while reducing complexity with templates, partners, and tools that do much of the heavy lifting of creating and running a website for you.
Covers the details and advantages that cloud computing has to offer in e-commerce, which is heavily used by high-tech customers in the present age of modern technology.
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc... | Amazon Web Services
Getting started with Amazon Web Services (AWS) is fast and simple. This presentation outlines best practice guidance from the Amazon Web Services team.
By Ryan Shuttleworth, AWS Technical Evangelist
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent... | Amazon Web Services
Big data technologies let you work with any velocity, volume, or variety of data in a highly productive environment. Join the General Manager of Amazon EMR, Peter Sirota, to learn how to scale your analytics, use Hadoop with Amazon EMR, write queries with Hive, develop real world data flows with Pig, and understand the operational needs of a production data platform.
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premises deployments to AWS in order to save costs, increase availability, and improve performance. AWS offers a broad set of analytics services, including solutions for batch processing, stream processing, machine learning, data workflow orchestration, and data warehousing. This session will focus on identifying the components and workflows in your current environment and providing the best practices to migrate these workloads to the right AWS data analytics product. We will cover services such as Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Kinesis, and more. We will also feature Vanguard, an American investment management company based in Malvern, Pennsylvania, with over $4.4 trillion in assets under management. Ritesh Shah, Sr. Program Manager for the Cloud Analytics Program at Vanguard, will describe how they orchestrated their migration to AWS analytics services, including Hadoop and Spark workloads, to Amazon EMR. Ritesh will highlight the technical challenges they faced and overcame along the way, as well as share common recommendations and tuning tips to accelerate the time to production.
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014 | Amazon Web Services
Leveraging big data and high performance computing (HPC) solutions enables your organization to make smarter and faster decisions that influence strategy, increase productivity, and ultimately grow your business. We kick off the Big Data and HPC track with the latest advancements in data analytics, databases, storage, and HPC at AWS. Hear customer success stories and discover how to put data to work in your own organization.
Learn more about the tools, techniques and technologies for working productively with data at any scale. This presentation introduces the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Jon Einkauf, Senior Product Manager, Elastic MapReduce, AWS
Alan Priestley, Marketing Manager, Intel and Bob Harris, CTO, Channel 4
In this webinar AWS Technical Evangelist, Ian Massingham, discusses the role that AWS services can play in helping you to derive value from your data, from stream processing with Amazon Kinesis, techniques for managing ingest of large data sets, through to processing data with Amazon Elastic MapReduce (EMR) and its ecosystem of tools and running large scale data warehouses on AWS with Redshift.
View the recording: http://youtu.be/7bkqopn19WY
With AWS, you can choose the right storage service for the right use case. Given the myriad of choices, from object storage to block storage, this session will profile details and examples of some of the choices available to you, with details on real world deployments from customers who are using Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), Amazon Glacier, and AWS Storage Gateway.
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift | Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
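To ground the schema-design and loading advice, here is a minimal sketch using psycopg2, assuming a hypothetical cluster endpoint, credentials, S3 path, and IAM role; it illustrates DISTKEY/SORTKEY and COPY, and is not the session's own material.

```python
import psycopg2

# Hypothetical endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="...",
)

with conn, conn.cursor() as cur:
    # Distribute on the join key and sort on the common filter column so
    # co-located joins and range-restricted scans stay fast.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales (
            customer_id BIGINT,
            sale_date   DATE,
            amount      DECIMAL(12,2)
        )
        DISTKEY (customer_id)
        SORTKEY (sale_date);
    """)
    # Bulk-load from S3 with COPY: parallel, and far faster than INSERTs.
    cur.execute("""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        CSV;
    """)
```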
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat... | Amazon Web Services
Amazon Elastic MapReduce (Amazon EMR) makes it easy to provision and manage Hadoop in the AWS Cloud. Hadoop is available in multiple distributions and Amazon EMR gives you the option of using the Amazon Distribution or the MapR Distribution for Hadoop.
This webinar will show you examples of how to use Amazon EMR with the MapR Distribution for Hadoop. You will learn how you can free yourself from the heavy lifting required to run Hadoop on-premises, and gain the advantages of using the cloud to increase flexibility and accelerate projects while lowering costs.
What we'll learn:
• A live demonstration of how you can quickly and easily launch your first Hadoop cluster in a few steps
• Examples of real-world applications and customer successes in production
• Best practices for maximizing the benefits of using MapR with AWS
Amazon Elastic MapReduce (Amazon EMR) is a web service that allows you to easily and securely provision and manage your Hadoop clusters. In this talk, we will introduce you to Amazon EMR design patterns, such as using various data stores like Amazon S3, how to take advantage of both transient and active clusters, and how to work with other Amazon EMR architectural patterns. We will dive deep on how to dynamically scale your cluster and address the ways you can fine-tune your cluster. We will discuss bootstrapping Hadoop applications from our partner ecosystem that you can use natively with Amazon EMR. Lastly, we will share best practices on how to keep your Amazon EMR cluster cost-effective.
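One of the patterns mentioned, submitting work as steps, looks roughly like this in boto3; the cluster ID and script path are hypothetical placeholders.

```python
import boto3

emr = boto3.client("emr")

# Submit a Hive script stored in S3 as a step on an existing cluster
# (long-running-cluster pattern; the ID and path are hypothetical).
response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[{
        "Name": "hive-etl",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script", "--args",
                     "-f", "s3://my-bucket/scripts/etl.hql"],
        },
    }],
)
print("Step ids:", response["StepIds"])
```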
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A... | Amazon Web Services
The key to fighting cancer through better therapeutics is a deep understanding of the basic biology of this disease at a cellular and molecular level. Comprehensive analysis of cancer mutations in specific tumors or cancer cell lines by using Life Technologies sequencing and real-time PCR systems generates gigabytes to terabytes of data every day. Our customers bring together this data in studies that seek to discover the genetic fingerprint of cancer. The data typically translates to millions of records in databases that require complex algorithmic processing, cross-application analysis, and interactive visualizations with real-time response (2-3 seconds) to enable users to consume large volumes of complex scientific information.
We have chosen the AWS platform to bring this new era of data analysis power to our customers by using technologies such as Amazon S3, ElastiCache, and DynamoDB for storage and fast access, and Amazon EMR for parallelizing complex computations. Our talk tells the story with rich details about challenges and roadblocks in building data-intensive, highly interactive applications in the cloud. We also highlight enhanced customer workflows and highly optimized applications with orders-of-magnitude improvements in performance and scalability.
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and... | Amazon Web Services
Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS... | Amazon Web Services
The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
A data lake can be used as a source for both structured and unstructured data - but how? We'll look at using open standards including Spark and Presto with Amazon EMR, Amazon Redshift Spectrum and Amazon Athena to process and understand data.
Speakers:
Neel Mitra - Solutions Architect, AWS
Roger Dahlstrom - Solutions Architect, AWS
by Mamoon Chowdry, Solutions Architect
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into the Amazon Redshift data warehouse; data lake services including Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum; log analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You'll learn how to get started, how to support applications, and how to scale.
Similar to Aaum Analytics event - Big data in the cloud
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
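As a rough illustration of just one of these ideas, here is a minimal pure-Python sketch, not the STICD algorithm itself, of power-iteration PageRank that skips vertices whose ranks have stopped changing; it assumes a graph with no dangling vertices.

```python
def pagerank_skip_converged(out_edges, d=0.85, tol=1e-10, eps=1e-9):
    """Power-iteration PageRank that skips vertices whose per-vertex
    rank change has fallen below eps. A heuristic sketch: assumes no
    dangling vertices, and trades a little accuracy for less work."""
    n = len(out_edges)
    in_edges = [[] for _ in range(n)]
    for u, targets in enumerate(out_edges):
        for v in targets:
            in_edges[v].append(u)
    rank = [1.0 / n] * n
    converged = [False] * n
    change = tol + 1.0
    while change > tol:
        nxt = rank[:]
        change = 0.0
        for v in range(n):
            if converged[v]:
                continue                     # skip converged vertices
            r = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            nxt[v] = (1.0 - d) / n + d * r
            delta = abs(nxt[v] - rank[v])
            converged[v] = delta < eps
            change += delta
        rank = nxt
    return rank

# Tiny 4-vertex example: adjacency lists of out-neighbours.
print(pagerank_skip_converged([[1, 2], [2], [0], [0, 2]]))
```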
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... | pchutichetpong
M Capital Group (“MCG”) expects demand to keep growing and supply to evolve, as institutional investment rotates out of offices and into work-from-home (“WFH”) assets, and as the need for data storage expands with global internet usage, which experts predict will reach 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, exemplified by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments; MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Adjusting OpenMP PageRank : SHORT REPORT / NOTES | Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... | Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
2. What is Big Data? When you do a directory listing on a folder and the lights start to flicker.
3. What is Big Data? When your data sets become so large that you have to start innovating how to Collect, Store, Organize, Analyze and Share them. It's tough because of Velocity, Volume and Variety.
6. GB → TB → PB → EB → ZB: unconstrained data growth. Big Data is now moving fast …
• IT / application server logs: IT infrastructure logs, metering, audit logs, change logs
• Web sites / mobile apps / ads: clickstream, user engagement
• Sensor data: weather, smart grids, wearables
• Social media, user content: 450MM+ tweets/day
10. Data Volume: chart contrasting data generated with data available for analysis (sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares").
11. Cloud Computing: elastic & highly scalable + no capital expense ($0 up front) + pay-per-use + on-demand = remove constraints.
14. Case study: Razorfish. 3.5 billion records, 71 million unique cookies, and 1.7 million targeted ads per day; analyzed customer clicks and impressions with Elastic MapReduce. Example targeted ad: a user recently purchased a sports movie and is searching for video games. "…no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired." "Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before."
20. What is Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the cloud. Hadoop is an open-source framework for parallel processing of huge amounts of data on a cluster of machines.
21. How does it work?
1. Put the data into S3 (or HDFS).
2. Launch your cluster, choosing the Hadoop distribution, how many nodes, the node type (hi-CPU, hi-memory, etc.), and the Hadoop apps (Hive, Pig, HBase).
3. Get the results.
(Diagram: S3 ↔ EMR cluster.)
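The three steps on this slide map directly onto a short boto3 sketch; bucket names, file names, and instance choices are hypothetical, and the release label reflects a modern EMR release rather than the 2013-era AMI versions in the original deck.

```python
import boto3

# Step 1: put the data into S3 (bucket and key are hypothetical).
boto3.client("s3").upload_file("access.log", "my-bucket", "input/access.log")

# Step 2: launch the cluster, choosing distribution, node count and
# type, and Hadoop applications.
emr = boto3.client("emr")
cluster = emr.run_job_flow(
    Name="demo-cluster",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Hive"}, {"Name": "Pig"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", cluster["JobFlowId"])

# Step 3: once your jobs write results back to S3, download them, e.g.
# boto3.client("s3").download_file("my-bucket", "output/part-00000", "out.txt")
```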
22. How does it work? You can easily resize the cluster.
23. How does it work? Use Spot Instances to save time and money.
24. How does it work? Launch parallel clusters against the same data source.
25. How does it work? When the work is complete, you can terminate the cluster (and stop paying).
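Slides 22 and 25, resizing and terminating, correspond to two boto3 calls; the cluster ID below is a hypothetical placeholder.

```python
import boto3

emr = boto3.client("emr")
CLUSTER_ID = "j-XXXXXXXXXXXXX"  # hypothetical cluster ID

# Resize (slide 22): change the instance count of the CORE group.
groups = emr.list_instance_groups(ClusterId=CLUSTER_ID)["InstanceGroups"]
core = next(g for g in groups if g["InstanceGroupType"] == "CORE")
emr.modify_instance_groups(
    ClusterId=CLUSTER_ID,
    InstanceGroups=[{"InstanceGroupId": core["Id"], "InstanceCount": 10}],
)

# Terminate (slide 25): shut the cluster down and stop paying.
emr.terminate_job_flows(JobFlowIds=[CLUSTER_ID])
```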
26. Cost to run a 100-node Elastic MapReduce cluster: INR 450/hour ($7.5/hour).
27. Cost to run a 100-node Elastic MapReduce cluster (photo slide). Photos: renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/ and Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/
28. Each day, AWS adds the equivalent server capacity of a global $7B enterprise.
31. EMR makes it easy to use Hive and Pig.
Hive: a data warehouse for Hadoop with a SQL-like query language (HiveQL); initially developed at Facebook.
Pig: a high-level programming language (Pig Latin); supports UDFs; ideal for data flow/ETL.
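To show what HiveQL over S3 data looks like, here is a small hedged example; the table, columns, and S3 locations are hypothetical. The script is staged in S3 so it could be run as an EMR Hive step.

```python
import boto3

# A small HiveQL script (table and S3 paths are hypothetical): define an
# external table over raw files in S3, then run a SQL-like aggregation.
hive_script = r"""
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs (
    ip  STRING,
    ts  STRING,
    url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my-bucket/weblogs/';

SELECT url, COUNT(*) AS hits
FROM weblogs
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
"""

# Stage the script in S3 so an EMR Hive step can reference it.
boto3.client("s3").put_object(
    Bucket="my-bucket", Key="scripts/top_urls.hql", Body=hive_script)
```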
32. EMR makes it easy to use HBase and Mahout.
HBase: a column-oriented database that runs on top of HDFS; random read/write access; ideal for sparse data and very large tables (billions of rows, millions of columns).
Mahout: a machine learning library; supports recommendation mining, clustering, classification, and frequent itemset mining.
33. EMR makes it easy to use Spark and Shark.
Spark: in-memory MapReduce; up to 100x faster than Hadoop; accesses HDFS, HBase, and S3; developed at UC Berkeley; download via BA (bootstrap action).
Shark: Hive on Spark; up to 100x faster; compatible with Hive; used at Yahoo, Airbnb, etc.; download via BA.
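For a minimal flavour of Spark, here is a word-count sketch in PySpark, using the modern SparkSession API rather than the 2013-era interfaces in the deck; the S3 paths are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# Read text from S3, count words in memory, write the counts back out.
lines = spark.sparkContext.textFile("s3://my-bucket/input/*.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("s3://my-bucket/output/wordcount")

spark.stop()
```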
34. EMR makes it easy to use other tools and applications.
Ganglia: scalable distributed monitoring; view the performance of the cluster and individual nodes; open source.
R: a language and software environment for statistical computing and graphics; open source.
35. EMR also supports MapR's Hadoop distributions. In addition to the Amazon Hadoop distribution, EMR supports the MapR M3, M5, and M7 Hadoop distributions. Key features of MapR: NFS, no NameNode, JobTracker high availability, cluster mirroring (disaster recovery), and compression.
39. Amazon Redshift: data warehousing the AWS way.
• Easy to provision, scale, and operate
• No up-front costs, pay-as-you-go
• Fast: columnar storage, compression, specialized nodes
• 1-100 node clusters, 2TB - 1.6PB
• $999 per TB per year
• Transparent backups, restore, and failover
• Security in transit, at rest, for backups, and within a VPC
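"Easy to provision" can be made concrete with one boto3 call; the identifier and credentials below are hypothetical, and dc2.large is just an example node type.

```python
import boto3

# Provision a 2-node cluster programmatically; Redshift then handles
# backups, restore, and failover for you.
boto3.client("redshift").create_cluster(
    ClusterIdentifier="analytics-demo",      # hypothetical
    NodeType="dc2.large",
    NumberOfNodes=2,
    MasterUsername="admin",
    MasterUserPassword="ExamplePassw0rd!",   # hypothetical credential
    DBName="dev",
)
```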
45. 3 million (100 × 30 thousand) queries are executed per hour on 100 HANA instances.
46. Each SAP HANA instance is deployed on an Amazon Web Services (AWS) Linux server. Each AWS server has 16 cores, 60 GB of RAM, and 600 million rows (total: 1,776 cores, 6.6TB of RAM, and 60 billion rows).