Learn about AWS business intelligence (BI) analytics, visualization, artificial intelligence, and machine learning services that can transform data into insights.
Data analytics is the stage where an organization can identify ways to increase revenue or reduce cost. Analytics and visualization deliver decision makers the insights to transform an organization, whether by identifying unmet customer needs or by optimizing operational processes. Data-driven decisions transform how managers allocate resources and evaluate results within an organization. Reliance on data reduces the role of hearsay and instinct when making choices. A manager’s intuition is now backed with data at the front end of the planning process, through the course of implementation, and when evaluating the impact of their decisions.
Key considerations in this phase include clearly defined analytics requirements, output aligned to the use cases, and data consumers within the organization finding the generated insights actionable. Let’s review some of the analytics solutions available in the AWS portfolio during this stage.
Picking the right analytical engine for your needs
AWS offers analytical engines for several use cases such as big data processing, data warehousing, ad-hoc analysis, real-time streaming, and operational/log analytics.
In this session, you will learn about what engines you can use for your use case to analyze all of your data stored in your Amazon S3 data lake in open formats.
You will also learn how to use these engines together for generating new insights, such as complementing your data warehouse workloads with ad-hoc and real-time analytics engines to incorporate new data into your reports.
We begin with data warehousing. Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds. With Amazon Redshift, you can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions.
Fast: Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes. Data load speed scales linearly with cluster size, with integrations to Amazon S3, Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and any SSH-enabled host.
Inexpensive: You only pay for what you use. You can have an unlimited number of users doing unlimited analytics on all your data for just $1,000 per terabyte per year, one-tenth the cost of traditional data warehouse solutions. Most customers see a 3-4x reduction of data size after compression, reducing their costs to $250-$333 per uncompressed terabyte per year.
Extensible: Redshift Spectrum enables you to run queries against exabytes of data in Amazon S3 as easily as you run queries against petabytes of data stored on local disks in Amazon Redshift, using the same SQL syntax and BI tools you use today. You can store highly structured, frequently accessed data on Redshift local disks, keep vast amounts of unstructured data in an Amazon S3 “Data Lake”, and query seamlessly across both.
Simple: Amazon Redshift allows you to easily automate most of the common administrative tasks to manage, monitor, and scale your data warehouse. By handling all these time-consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business.
Scalable: You can easily resize your cluster up and down as your performance and capacity needs change with just a few clicks in the console or a simple API call.
Secure: Security is built in. You can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate your clusters using Amazon VPC, and even manage your keys using AWS Key Management Service (KMS) and hardware security modules (HSMs).
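The compression arithmetic in the Inexpensive point above works out like this. A minimal sketch; the $1,000 per compressed terabyte figure is the rate quoted in this talk:

```python
# Sketch: effective yearly cost per *uncompressed* terabyte, given that the
# quoted $1,000/TB/year rate applies to compressed data on disk.
RATE_PER_COMPRESSED_TB = 1000.0  # $/TB/year, as quoted in the talk

def cost_per_uncompressed_tb(compression_ratio: float) -> float:
    """Effective cost per uncompressed TB at a given compression ratio."""
    return RATE_PER_COMPRESSED_TB / compression_ratio

# The 3-4x compression most customers see lands at roughly $250-$333.
print(round(cost_per_uncompressed_tb(4)))  # 250
print(round(cost_per_uncompressed_tb(3)))  # 333
```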
Traditionally, analytics ran through a relational data warehouse that collects data from multiple source systems and produces operational reports. This approach had a few characteristics:
It was optimized for relational data sources
It scaled up to PBs
It required the questions to be known before the DW was designed, because the schema had to be created up front to determine what data would be loaded into the data warehouse
It enabled operational reporting on top of the data in the DW
The belief that data is an asset is putting pressure on traditional architectures. It can’t be business as usual anymore, because new customer requirements break the traditional approach.
Customers need to:
Capture and store new non-relational data at EB scale. Customers want to store new non-relational data sourced from places not currently feeding the data warehouse. This includes machine-generated data (e.g., IoT devices), log files, clickstream data, social media, and more. These new sources of data are being generated at a volume that can scale to exabyte size. The traditional data warehouse was not optimized for storing all of this non-relational data because it was designed for relational data at PB scale.
Secure and combine data from new and existing sources. Customers want a single view of all of their data, and they want an easy way to catalog and search all of it to run analytics on top. Furthermore, they want their data secured to prevent unauthorized access. Traditional data architectures were not built to account for this. Data either sits in silos, or, if it is centralized into an enterprise data warehouse, it is extremely costly to build ETL to move the data, which will not scale at EB data volumes.
Do new types of analysis on their data (machine learning, big data processing, and real-time analytics). Customers increasingly need to do new types of analytics. They want to move from answering questions about the past to using statistical models and forecasting techniques to understand and answer what could happen. To do this, customers need to incorporate machine learning, big data processing, and real-time analytics. However, their traditional architecture could only accommodate reporting and ad hoc analysis on relational data.
This table provides a point of comparison, from the old world to the new…
AWS cloud is the best place to build a data lake…
A data lake on AWS gives you access to the most complete platform for big data. AWS provides you with secure infrastructure and offers a broad set of scalable, cost-effective services to collect, store, categorize, and analyze your data to get meaningful insights. AWS makes it easy to build and tailor your data lake to your specific data analytic requirements.
Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries against exabytes of data in Amazon S3. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “Data Lake” -- without having to load or transform any data. Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast – even with large data sets and complex queries.
Redshift Spectrum directly queries data in Amazon S3 using the open data formats you already use, including Avro, CSV, Grok, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV. Since Redshift Spectrum supports the same SQL syntax of Amazon Redshift, you can run sophisticated queries using the same Business Intelligence (BI) tools you use today. You can also run queries that span both the frequently accessed data stored locally in Amazon Redshift and your full data sets stored cost-effectively in Amazon S3.
Start Querying Instantly
Same SQL. Same BI tools. No loading required.
With Amazon Redshift Spectrum, you can start querying your data in Amazon S3 immediately, with no loading or transformation required. You just need to register your Amazon Athena Data Catalog, AWS Glue Data Catalog, or Apache Hive Metastore as an external schema. You can use the same SQL you use for querying Amazon Redshift tables and any BI tool that supports Redshift today.
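As a hedged sketch, that registration step boils down to one DDL statement. The schema name, Glue database, IAM role ARN, and region below are illustrative placeholders, not real resources:

```python
# Sketch: build the CREATE EXTERNAL SCHEMA statement that registers an AWS Glue
# Data Catalog database with Redshift, so Spectrum can query its S3-backed
# tables. All identifiers passed in are placeholders.
def external_schema_ddl(schema: str, glue_db: str, iam_role: str, region: str) -> str:
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{glue_db}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"REGION '{region}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )

print(external_schema_ddl(
    "spectrum", "salesdb",
    "arn:aws:iam::123456789012:role/MySpectrumRole", "us-east-1"))
```

You would run the resulting statement from any SQL client connected to your cluster; after that, tables in the Glue database are queryable as `spectrum.<table>`.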
Fast Performance
Leverage the powerful Amazon Redshift query optimizer.
Amazon Redshift delivers super-fast performance whether it is for ad-hoc analysis on large unstructured data sets in Amazon S3 or frequent analysis on structured data sets in Redshift tables. You can maintain hot data in your Amazon Redshift clusters to get the performance of local disks, and use Amazon Redshift Spectrum to extend your queries to cold data stored in Amazon S3 for unlimited scalability and low cost. The Amazon Redshift query optimizer will automatically determine how to minimize data scanned in Amazon S3 and the number of Redshift Spectrum nodes to use in the query.
Limitless Scalability
Separate compute and storage.
With Amazon Redshift Spectrum, you don’t have to worry about scaling your cluster. It lets you separate storage and compute, allowing you to scale each independently. You can even run multiple Amazon Redshift clusters against the same Amazon S3 Data Lake, enabling limitless concurrency. Redshift Spectrum automatically scales out to thousands of instances if needed, so queries run quickly, whether processing a terabyte, a petabyte or an exabyte.
Pay Per Query
Only pay for data processed.
With Amazon Redshift Spectrum, you only pay for the queries you run. You are charged $5 per terabyte of data processed to execute your query. Redshift Spectrum can query compressed data. You can both save 30% to 90% on your per-query costs and improve performance by compressing, partitioning, and converting your data to a columnar format. There are no charges for Redshift Spectrum when you’re not running queries. You pay standard Amazon S3 rates for data storage and Amazon Redshift instance rates for the clusters used.
If we take a look behind the scenes, we have the Redshift cluster in green, and the purple boxes are the auto-scaling, multi-tenant Spectrum fleet of compute nodes. The Redshift optimizer includes Spectrum-specific optimizations and uses high levels of parallelism to operate on S3 data: each slice in your cluster can call on up to 10 Spectrum compute nodes, whether you are querying gigabytes or even an exabyte. Because storage and compute are separated, loading data into your local cluster is now optional; your data can live in S3, in Redshift, or be reached through Spectrum. You can even point many clusters at the same S3 data, trading some performance for flexibility.
For big data processing, Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB. Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.
We have found EMR to be one of the most cost-effective Hadoop and Spark distributions because of the flexible ways customers can be billed, including the per-second billing introduced this year. Spot pricing can dramatically lower your bill, and Reserved Instances can lower it by 50-80%. Finally, with the ability to automatically resize your clusters based on scaling rules, EMR is a very cost-effective place to run your Hadoop and Spark workloads.
Easy to Use
You can launch an Amazon EMR cluster in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis.
Low Cost
Amazon EMR pricing is simple and predictable: You pay a per-second rate for every second used, with a one-minute minimum charge. You can launch a 10-node Hadoop cluster for as little as $0.15 per hour. Because Amazon EMR has native support for Amazon EC2 Spot and Reserved Instances, you can also save 50-80% on the cost of the underlying instances.
Elastic
With Amazon EMR, you can provision one, hundreds, or thousands of compute instances to process data at any scale. You can easily increase or decrease the number of instances manually or with Auto Scaling, and you only pay for what you use.
Reliable
You can spend less time tuning and monitoring your cluster. Amazon EMR has tuned Hadoop for the cloud; it also monitors your cluster, retrying failed tasks and automatically replacing poorly performing instances.
Secure
Amazon EMR automatically configures Amazon EC2 firewall settings that control network access to instances, and you can launch clusters in an Amazon Virtual Private Cloud (VPC), a logically isolated network you define. For objects stored in Amazon S3, you can use Amazon S3 server-side encryption or Amazon S3 client-side encryption with EMRFS, with AWS Key Management Service or customer-managed keys.
Flexible
You have complete control over your cluster. You have root access to every instance, you can easily install additional applications, and you can customize every cluster with bootstrap actions. You can also launch Amazon EMR clusters with custom Amazon Linux AMIs.
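To make the launch step concrete, here is a hedged sketch of the request body you might pass to EMR's RunJobFlow API (for example via boto3's emr client). The cluster name, release label, instance types, and node counts are illustrative placeholders:

```python
# Sketch: assemble a RunJobFlow request for a small Spark cluster.
# EMR_EC2_DefaultRole / EMR_DefaultRole are the service's default role names;
# everything else here is an illustrative placeholder.
def spark_cluster_request(name: str, core_nodes: int) -> dict:
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.20.0",
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": core_nodes},
            ],
            # Terminate the cluster once all steps finish (transient cluster).
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = spark_cluster_request("demo-cluster", core_nodes=9)  # 10 nodes total
print(sum(g["InstanceCount"] for g in request["Instances"]["InstanceGroups"]))
```

With credentials configured, the same dict would be passed as keyword arguments to `boto3.client("emr").run_job_flow(...)`.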
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use Glue’s fully-managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.
Start Querying Instantly
Serverless. No ETL. Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. Amazon Athena allows you to tap into all your data in S3 without the need to set up complex processes to extract, transform, and load the data (ETL).
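As a sketch of what "just point and query" looks like programmatically, here is the shape of an Athena StartQueryExecution request (the keyword arguments accepted by boto3's athena client); the database, table, and results bucket names are placeholders:

```python
# Sketch: build a StartQueryExecution request. Athena writes query results to
# the S3 location you specify; bucket/database/table names are placeholders.
def athena_query_request(sql: str, database: str, results_bucket: str) -> dict:
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }

request = athena_query_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    database="weblogs", results_bucket="my-query-results")
print(request["ResultConfiguration"]["OutputLocation"])
```

With credentials in place, `boto3.client("athena").start_query_execution(**request)` returns a query execution ID you poll for results.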
Pay Per Query
Only pay for data scanned. With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries. You can save from 30% to 90% on your per-query costs and get better performance by compressing, partitioning, and converting your data into columnar formats. Athena queries data directly in Amazon S3. There are no additional storage charges beyond S3.
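The pay-per-query arithmetic above can be sketched in a few lines; the $5 per terabyte scanned is the rate quoted here, and the 10x reduction is an illustrative figure for a well-partitioned columnar layout:

```python
# Sketch: cost scales with bytes scanned, so shrinking the data a query
# touches (compression, partitioning, columnar formats) shrinks the bill.
PRICE_PER_TB_SCANNED = 5.0  # $/TB, as quoted

def query_cost(tb_scanned: float) -> float:
    return PRICE_PER_TB_SCANNED * tb_scanned

full_scan = query_cost(1.0)   # 1 TB of raw CSV
pruned = query_cost(0.1)      # same query, columnar + partition-pruned (illustrative)
print(full_scan, pruned)                        # 5.0 0.5
print(f"{(1 - pruned / full_scan):.0%} saved")  # 90% saved
```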
Open. Powerful. Standard
Built on Presto. Runs standard SQL. ANSI SQL interface with JDBC/ODBC drivers. Handles multiple formats (CSV, JSON, Avro, Parquet, ORC, geospatial), compression types (GZIP, LZO, BZ2), and complex joins and data types (arrays, maps, structs).
Easy: Serverless. Zero infrastructure. Zero administration.
We are making rapid improvements to solve the three hardest challenges for customers to adopt AI/ML: Cost, Ease of Use, and Data. Our launches this year underscore that.
We see the Machine Learning stack having three key layers.
ML Frameworks:
The bottom layer is for expert machine learning practitioners—researchers and developers.
These are people who are comfortable building models, tuning models, training models, figuring out how to deploy into production, and manage them themselves.
And the vast majority of machine learning in the cloud today at this layer is being done through Amazon SageMaker, which provides a managed experience for frameworks, or through the AWS Deep Learning AMI that we built, which effectively embeds all the major frameworks.
Infrastructure:
AWS offers a broad array of compute options for training and inference with powerful GPU-based instances, compute and memory optimized instances, and even FPGAs.
Our P3 instances provide up to 14 times better performance than previous-generation Amazon EC2 GPU compute instances.
C5 instances offer higher memory to vCPU ratio and deliver 25% improvement in price/performance compared to C4 instances, and are ideal for demanding inference applications.
We also have Amazon EC2 F1, a compute instance with field programmable gate arrays (FPGAs) that you can program to create custom hardware accelerations for your machine learning applications. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code. You can reuse your designs as many times, and across as many F1 instances, as you like.
The new Amazon EC2 P3dn instance has four times the networking bandwidth and twice the GPU memory of the largest P3 instance, making it ideal for large-scale distributed training. No one else has anything close.
P3dn.24xlarge instances offer 96 vCPUs of Intel Skylake processors to reduce the preprocessing time of data required for machine learning training.
The enhanced networking of the P3dn instance allows GPUs to be used more efficiently in multi-node configurations, so training jobs complete faster.
Finally, the extra GPU memory allows developers to easily handle more advanced machine learning models, such as holding and processing multiple batches of 4K images for image classification and object detection systems.
ML Services:
But, if you want to enable most enterprises and companies to be able to scale machine learning, we’ve solved that problem for organizations by making ML accessible for everyday developers and scientists. Amazon SageMaker removes the heavy lifting, complexity, and guesswork from each step of the machine learning process.
SageMaker makes model building and training easier by providing pre-built development notebooks, popular machine learning algorithms optimized for petabyte-scale datasets, and automatic model tuning, enabling developers to build, train, and deploy models in a single click.
SageMaker is already helping thousands of developers easily get started with building, training, and deploying models.
AI Services:
At the top layer are AI services which are ready-made for all developers—no ML skills.
For example, customers say: here is an image, tell me what's in it; or here's a face, tell me if it's part of this facial group, using Amazon Rekognition
Or let me convert text to speech using Amazon Polly
Or let’s build conversational apps with Amazon Lex.
Convert speech to text with Amazon Transcribe
Translate text between languages using Amazon Translate
Understand relationships and find insights from unstructured text using Amazon Comprehend
We have a portfolio of solution-based AI services that can be accessed via a simple API call across vision, speech, language services, and conversational chatbots.
AWS has invested deeply in these services because they address some of the most common problems and opportunities customers face where AI can advance the state of the art.
AWS has the capability to invest at a level of scale that would be uneconomical for most customers, and our scale enables us to offer these services at low cost.
Customers can build these capabilities into their new and existing applications to reduce costs, increase speed, improve customer satisfaction and insight, and build ‘modern’ intelligent applications.
Our AI services are intentionally easy to use. They can be accessed via a simple API call.
When used in conjunction, they create compelling solutions that target common business problems and use cases.
ADDITIONAL COLOR:
Amazon Rekognition:
Rekognition makes it easy to add image and video analysis to your applications. You just provide an image or video to the Rekognition API, and the service can identify the objects, people, text, scenes, and activities, as well as detect any inappropriate content.
Amazon Rekognition also provides highly accurate facial analysis and facial recognition on images and video that you provide. You can detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases. Rekognition is a simple and easy-to-use API that can quickly analyze any image or video file stored in Amazon S3. Amazon Rekognition is always learning from new data, and we are continually adding new labels and facial recognition features to the service.
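As a minimal sketch, a label-detection call takes a request of this shape (boto3's `rekognition.detect_labels` accepts these keyword arguments); the bucket and key are placeholders:

```python
# Sketch: request shape for Rekognition label detection on an image already
# stored in S3. Bucket and key are placeholders, not real objects.
def detect_labels_request(bucket: str, key: str, max_labels: int = 10,
                          min_confidence: float = 80.0) -> dict:
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,           # cap on labels returned
        "MinConfidence": min_confidence,   # drop low-confidence labels
    }

request = detect_labels_request("my-photo-bucket", "photos/street.jpg")
print(request["Image"]["S3Object"]["Name"])
```

The response contains a `Labels` list, each entry carrying a `Name` and a `Confidence` score.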
More info: https://aws.amazon.com/rekognition/
Amazon Polly:
Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
Polly is a text to speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.
With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.
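A minimal sketch of the request shape for speech synthesis (boto3's `polly.synthesize_speech` accepts these keyword arguments); the text and voice are illustrative:

```python
# Sketch: request shape for Polly text-to-speech. "Joanna" is one of Polly's
# English voices; the text here is illustrative.
def synthesize_request(text: str, voice_id: str = "Joanna") -> dict:
    return {
        "Text": text,
        "OutputFormat": "mp3",  # also supports ogg_vorbis and pcm
        "VoiceId": voice_id,
    }

request = synthesize_request("Welcome to the webinar.")
print(request["VoiceId"], request["OutputFormat"])
```

The response's `AudioStream` would then be written to an audio file or played back directly.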
More info: https://aws.amazon.com/polly/
Amazon Transcribe:
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications.
Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.
Amazon Transcribe can be used for lots of common applications, including the transcription of customer service calls and generating subtitles on audio and video content.
The service can transcribe audio files stored in common formats, like WAV and MP3, with time stamps for every word so that you can easily locate the audio in the original source by searching for the text. Amazon Transcribe is continually learning and improving to keep pace with the evolution of language.
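A hedged sketch of the request that starts a transcription job (boto3's `transcribe.start_transcription_job` accepts these keyword arguments); the job name and media URI are placeholders:

```python
# Sketch: request shape for an asynchronous Transcribe job over an audio file
# in S3. Job name and media URI are placeholders.
def transcription_request(job_name: str, media_uri: str,
                          media_format: str = "mp3") -> dict:
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "MediaFormat": media_format,  # e.g. "wav" or "mp3"
        "Media": {"MediaFileUri": media_uri},
    }

request = transcription_request(
    "support-call-0001", "s3://my-call-audio/call-0001.mp3")
print(request["Media"]["MediaFileUri"])
```

The job runs asynchronously; you poll for completion and then fetch the transcript, which includes the per-word time stamps mentioned above.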
More info: https://aws.amazon.com/transcribe/
Amazon Translate:
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation.
Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms.
Amazon Translate allows you to localize content - such as websites and applications - for international users, and to easily translate large volumes of text efficiently.
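A minimal sketch of the translation request shape (boto3's `translate.translate_text` accepts these keyword arguments); the sample text and language pair are illustrative:

```python
# Sketch: request shape for Translate. Language codes follow the service's
# conventions (e.g. "en" for English, "es" for Spanish).
def translate_request(text: str, source: str, target: str) -> dict:
    return {
        "Text": text,
        "SourceLanguageCode": source,
        "TargetLanguageCode": target,
    }

request = translate_request("Hello, world", source="en", target="es")
print(request["TargetLanguageCode"])
```

The response carries the translated string in its `TranslatedText` field.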
More info: https://aws.amazon.com/translate/
Amazon Comprehend:
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text.
The service identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
Using these APIs, you can analyze text and apply the results in a wide range of applications including voice of customer analysis, intelligent document search, and content personalization for web applications.
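As a minimal sketch, the sentiment APIs take request shapes like these (boto3's `comprehend.detect_sentiment` and `batch_detect_sentiment` accept these keyword arguments); the sample text is illustrative:

```python
# Sketch: request shapes for Comprehend sentiment analysis. DetectSentiment
# takes one document; BatchDetectSentiment takes up to 25 per call.
def sentiment_request(text: str, language: str = "en") -> dict:
    return {"Text": text, "LanguageCode": language}

def batch_sentiment_request(texts: list, language: str = "en") -> dict:
    if len(texts) > 25:  # the batch API caps each call at 25 documents
        raise ValueError("BatchDetectSentiment accepts at most 25 documents")
    return {"TextList": texts, "LanguageCode": language}

request = sentiment_request(
    "The checkout flow was fast, but the shipping options confused me.")
print(request["LanguageCode"])
```

The response reports a sentiment label (POSITIVE, NEGATIVE, NEUTRAL, or MIXED) with per-label confidence scores.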
More info: https://aws.amazon.com/comprehend
Amazon Lex:
Amazon Lex is a service for building conversational interfaces into any application using voice and text.
Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions.
With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots.
More info: https://aws.amazon.com/lex
Amazon SageMaker is the most widely used machine learning service
And it is because SageMaker removes the complexity that holds back developer success
It allows companies of all sizes to easily build sophisticated machine learning models—from prediction engines to intelligent applications and processes
Industry leaders use SageMaker to transform their business—let’s take a look at some successes…