What is Amazon Athena

  1. What is Amazon Athena? Athena is an interactive query service that uses ANSI-standard SQL to work with “big data” stored in Amazon Simple Storage Service (S3). Typical use cases are data science, machine learning, visualization, ETL, and reporting. Because Athena is serverless, there is no infrastructure to manage, and you can tap into scalable storage on S3. You also pay only for the queries you run, which benefits anyone, such as a data analyst, who wants to minimize Athena costs. Athena is built on open-source frameworks and supports open table and file formats. It provides a simplified, flexible way to analyze petabytes of data where it lives: you can analyze data or build applications from an S3 data lake and 25+ data sources, including on-premises data sources or other cloud systems, using SQL or …
  2. Amazon Athena is a serverless interactive analytics service that can be readily used to gain insights from data residing in S3. Under the hood, Athena uses a distributed SQL engine called Presto to run SQL queries. For table definitions it relies on the popular open-source technology Apache Hive, which lets it describe structured, semi-structured and unstructured data.
  3. Amazon Athena is a serverless data query tool, which makes it both scalable and cost-effective. Customers are charged on a pay-per-query basis, where the cost of each query depends on the amount of data it scans rather than on the number of queries executed in a given period. The standard charge is 5 USD per 1 TB of data scanned in S3.
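The pricing rule above can be turned into a quick estimate. This is a minimal sketch, not an official calculator: the 5 USD per TB rate comes from the slide, while the function name, the binary definition of a terabyte, and the 10 MB per-query minimum (taken from AWS's published pricing description) are assumptions that should be checked against the current pricing page for your region.

```python
TB = 1024 ** 4  # assumption: a terabyte is treated as 2**40 bytes
MB = 1024 ** 2


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the cost in USD of one query from the bytes it scans.

    usd_per_tb is the rate quoted in the slide; min_bytes models a
    per-query minimum charge and is an assumption of this sketch.
    """
    billable = max(bytes_scanned, min_bytes)
    return billable / TB * usd_per_tb


if __name__ == "__main__":
    for scanned in (1 * TB, TB // 2, 200 * MB):
        print(f"{scanned:>15} bytes -> {estimate_query_cost(scanned):.6f} USD")
```

Because the cost scales with bytes scanned, anything that reduces the scan (compression, columnar formats, partitioning) reduces the bill proportionally.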
  4. Working with Athena. Athena can quickly analyze data in Amazon S3 using standard SQL, and the data does not need to be loaded into Athena first. All we need to do is point to the data in Amazon S3, define the schema, and start querying with standard SQL. Athena can process structured, semi-structured or unstructured data; for example, it can handle CSV files as well as nested data such as arrays and objects. Athena also provides a simple UI. Getting started is straightforward: create a database, choose a table name, and specify the location of the data on Amazon S3.
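The "point to the data, define the schema, start querying" workflow above can be sketched as a small helper that builds a Hive-style `CREATE EXTERNAL TABLE` statement for comma-separated files. The helper name, the database and table names, the columns, and the bucket path are all hypothetical and chosen only for illustration; the statement would still have to be run in the Athena console or through the API.

```python
def build_create_table(database, table, columns, s3_location):
    """Return DDL text for an external table over CSV files in S3.

    columns is a list of (name, type) pairs; s3_location is the S3
    prefix that holds the data files. All values are supplied by the
    caller; nothing here contacts AWS.
    """
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical example: a table over web log files.
ddl = build_create_table(
    database="weblogs_db",
    table="requests",
    columns=[("request_time", "string"), ("status", "int"), ("url", "string")],
    s3_location="s3://example-bucket/weblogs/",
)

if __name__ == "__main__":
    print(ddl)
```

The table is only metadata: the files stay in S3, and dropping the table later does not delete them.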
  5. How Athena works. Amazon Athena works directly with the data in S3. It uses a distributed SQL engine for running queries and Apache Hive for creating and altering tables and partitions. Some important points for working with Athena include:
     1. You must have an AWS account.
     2. You should enable your account to export cost and usage data into an S3 bucket.
     3. You can prepare buckets for Athena to connect to.
     4. AWS creates manifest files containing metadata each time it writes to the bucket. It also creates a folder named Athena within the AWS billing data bucket that contains only the data.
     5. To simplify the setup, the US-West-2 region can be used.
     6. The final step is downloading the credentials for the new user, because these credentials map to the database credentials.
  6. Athena Benefits. Amazon Athena makes it easy to run interactive queries against large volumes of data directly in Amazon S3, without worrying about managing infrastructure or loading the data. Athena is well suited to tasks such as running queries against web logs to troubleshoot issues on a site.
     • Based on SQL: You can use Athena to run SQL queries against tables configured in the AWS Glue Data Catalog, or against data sources that you connect to using the Athena Query Federation SDK. For users who already know SQL, there is no learning curve to get started.
     • Open architecture (no vendor lock-in): Athena enables open access to data rather than lock-in to a specific tool or technology. This manifests itself in several ways.
     • Ubiquitous access: Because your data is stored in an S3 bucket and the schema is defined in the Glue Data Catalog, you can switch between query engines that can read from these sources without redefining the schema or creating a separate copy of the data.
  7. Athena Benefits
     • Separated storage and compute resources: Athena completely separates compute from storage. Data is stored in your Amazon S3 account, while Amazon Web Services provides Athena's compute as a resource shared among all Athena users.
     • Open file formats: Unlike many high-performance databases, Athena does not use a proprietary file format but supports standard open formats such as Apache Parquet, ORC, CSV, and JSON.
     • Low cost: Athena's pricing model is based on terabytes of data scanned. You can keep costs down by scanning only the data needed to answer a specific query (this can be done using data partitioning).
     • Access to all your data: Most organizations load only 30 to 35 percent of their data into a traditional data warehouse because of the high operational and infrastructure costs of constantly resizing database clusters. Because Athena queries data in place on S3, no such resizing is involved.
  8. Speed and Performance. Because Amazon Athena is serverless, it is quick and easy to execute queries on Amazon S3 without setting up or managing servers or clusters. Initialization time is another difference: in Athena we can query the data on Amazon S3 straight away, whereas in Redshift we have to wait for the cluster to become active before we can query the data.
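Since there is no cluster to start, a query can be submitted as soon as an API client exists. The sketch below shows one way to submit a query and wait for it, using the `start_query_execution`, `get_query_execution`, and `get_query_results` operations of the boto3 Athena client. The function name, the polling interval, and the database and output location in the usage comment are assumptions for illustration; the client is passed in rather than created here so the code can be exercised without AWS credentials.

```python
import time

# Terminal states reported by the Athena API for a query execution.
FINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit sql, poll until it finishes, and return the first results page.

    client is expected to behave like boto3.client("athena").
    output_location is the S3 prefix where Athena writes result files.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in FINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")
    return client.get_query_results(QueryExecutionId=query_id)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   results = run_query(boto3.client("athena"), "SELECT 1",
#                       "weblogs_db", "s3://example-bucket/athena-results/")
```

Only the first page of results is returned here; larger result sets would need pagination or a direct read of the result file that Athena writes to the output location.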
  9. Speed and Performance
     • Optimization is limited to queries: You can optimize your queries, not your data. Because your data is already stored in Amazon S3, transforming it for use with Athena may affect other users who rely on the same data for other purposes.
     • Multi-tenancy means pooled resources: All Athena users receive a similar SLA for queries at any time. In other words, the entire global user base is “competing” for the same resources, and although AWS provides more as needed, query performance can fluctuate depending on other people's usage.
     • No indexing: Indexes are built into traditional databases but do not exist in Athena. This makes joining large tables a demanding operation that increases the load on Athena and hurts performance. For example, a query by key requires scanning all the data and searching for the desired key in the result list. Upsolver lookup tables are one way to address this.
     • Partitioning: Efficient queries in Athena require the data to be partitioned. Keeping the number of partitions at a level that meets your performance needs is essential. Every 500 partitions scanned adds 1 second to your query.
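The partitioning point can be illustrated with a small simulation. This is only a sketch of the idea, not Athena's implementation: it assumes Hive-style `name=value` folder names in the S3 keys, the helper names and sample keys are hypothetical, and the overhead figure simply applies the "1 second per 500 partitions" rule of thumb quoted above.

```python
def partition_values(key):
    """Parse Hive-style 'name=value' folder segments from an S3 key."""
    values = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        name, sep, value = segment.partition("=")
        if sep:
            values[name] = value
    return values


def prune(keys, **wanted):
    """Keep only keys whose partition values match every wanted filter."""
    return [
        key for key in keys
        if all(partition_values(key).get(n) == v for n, v in wanted.items())
    ]


def partition_overhead_seconds(partitions_scanned, partitions_per_second=500):
    """Apply the slide's rule of thumb: 500 partitions scanned ~ 1 second."""
    return partitions_scanned / partitions_per_second


# Hypothetical object keys laid out by year and month.
keys = [
    "logs/year=2022/month=01/c.csv",
    "logs/year=2023/month=01/a.csv",
    "logs/year=2023/month=02/b.csv",
]

if __name__ == "__main__":
    print(prune(keys, year="2023", month="01"))
    print(partition_overhead_seconds(1500))
```

A query that filters on the partition columns only needs to read the matching folders, which is how partitioning cuts both the data scanned and the cost; too many small partitions, however, add the per-partition overhead back.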
  10. Which data formats and types does Amazon Athena support? Athena can process structured and unstructured data in standard file formats such as CSV (comma-separated values), JSON (JavaScript Object Notation), ORC (Optimized Row Columnar), Apache Parquet and Apache Avro. Athena also supports data compressed in the Snappy, Zlib, LZO (Lempel-Ziv-Oberhumer) and Gzip (GNU Zip) formats. Examples of supported column data types include:
     • BOOLEAN
     • TINYINT
     • SMALLINT
     • VARCHAR
     • CHAR
     • BIGINT
     The source list also names Column, WorkGroupConfigurationUpdates and UnprocessedNamedQueryId; these are object types in the Athena API rather than column data types.
  11. Features of Athena
     • Serverless: The end user does not have to worry about configuration, infrastructure, scaling, or failure. Athena takes care of all of it.
     • Pay per query: Athena charges only for the queries you run, based on the amount of data scanned per query. You can save a lot by compressing the data and storing it in a suitable format.
     • Secure: Using AWS Identity and Access Management (IAM) policies, Athena offers control over access to the data set. Because the data is stored in S3 buckets, IAM policies can manage which users have access.
     • Available: Amazon Athena is highly available, and users can execute queries around the clock.
     • Machine learning: Developers can create and deploy machine learning models with Amazon SageMaker and use them from Amazon Athena.
  12. What are the limitations of Amazon Athena?
     • Optimization is limited to queries. For example, data already stored in S3 cannot be optimized.
     • No indexing options. Indexing options commonly appear in traditional databases. Without indexing, the operational load on Athena increases, potentially affecting performance.
     • Efficient queries require partitioning. Data must first be partitioned, and the partitions must then be managed to fit performance needs.
     • Stored procedures, parameterized queries and Presto federated connectors are not supported. Amazon Athena Federated Query is needed to connect data sources.
     • Athena can time out when querying a table with thousands of partitions.
     • Source files whose names start with an underscore or a dot are treated as hidden.
     • A row or column cannot exceed 32 megabytes in size.
     • Athena does not support querying data in the S3 Glacier and S3 Glacier Deep Archive storage classes.
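The hidden-file rule in the list above is easy to trip over when a job writes marker files such as `_SUCCESS` next to the data. The following minimal sketch flags object keys that would be skipped under that rule. It assumes the check applies to the final path component only, and the function name and sample keys are hypothetical.

```python
import posixpath


def is_hidden(key):
    """Return True if the file name starts with an underscore or a dot.

    Assumption: only the last path component (the file name) is checked.
    """
    name = posixpath.basename(key)
    return name.startswith(("_", "."))


# Hypothetical object keys under one table location.
sample_keys = [
    "weblogs/part-0000.csv",
    "weblogs/_SUCCESS",
    "weblogs/.part-0001.csv.crc",
]

visible_keys = [key for key in sample_keys if not is_hidden(key)]

if __name__ == "__main__":
    print(visible_keys)
```

Running a check like this over a bucket listing before creating a table can explain why some files never show up in query results.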
  13. Summary. Athena is an interactive query service offered by Amazon. It makes it easy to analyze data directly in Amazon S3 (Simple Storage Service) using standard SQL. For example, in the AWS Management Console it can be pointed at the location of data in Amazon S3 with a few clicks. SQL can then be used to run ad hoc queries, with results returned to the user in seconds.
     • Athena does not store data. Instead, storage is managed entirely on Amazon S3. The Athena query service is fully managed, so AWS automatically allocates resources as needed to execute a query.
     • Because your data is stored in an S3 bucket and the schema is defined in the Glue Data Catalog, you can switch between query engines that can read from these sources without redefining the schema or creating a separate copy of the data.
     • As a serverless service, Amazon Athena makes data queries easy to set up, easy to use and fast to run. Its pay-per-use model makes analytics affordable. Moreover, because Athena works with Amazon S3 and offers scalability, reliability, and durability, it is well suited to running analytics workloads.
