SlideShare a Scribd company logo
1 of 11
Download to read offline
R and Athena …
there is
another way!?
What is AWS Athena?
Amazon Athena is an interactive query service that makes it easy
to analyse data in Amazon S3 using standard SQL
select * from iris
My S3 Bucket
R_data
iris
iris.csv
mtcars
mtcars.csv
nyc_taxi
yellow_taxi_trip
year=2019
month=01
yellow_taxi_trip.csv
month=02
yellow_taxi_trip.csv
...
...
select * from yellow_taxi_trip
select * from yellow_taxi_trip
where year = ‘2019’
select * from yellow_taxi_trip
where year = ‘2019’
and month = ‘01’
How to connect to AWS Athena?
# connecting using ODBC
library(DBI)
con <- dbConnect(odbc::odbc(), ...)
# connecting using jdbc
library(DBI)
library(RJDBC)
drv <- jdbc(...)
con <- dbConnect(drv, ...)
# connecting using AWS SDK
????
A new Connection Method …
# connecting using AWS SDK
library(reticulate)
boto <- import(“boto3)
athena <- boto$Session()$client(“athena”)
# query athena
res <- athena$start_query_execution(QueryString = “select * from iris”,
QueryExecutionContext =
list(‘Database’ = ‘default’),
ResultConfiguration=
list(‘OutputLocation’= ‘s3://path/to/my/bucket/’))
# check status of query
status <- athena$get_query_execution(QueryExecutionId = res$QueryExecutionId)
# if query was successful then download output to local computer
if (status$QueryExecution$Status$State == “SUCCEEDED”) {
s3 <- boto$Session()$resource(“s3”)
key <- paste0(res$QueryExecutionId, “.csv”)
s3$Bucket(‘s3://path/to/my/bucket/’)$download_file(key, ‘c://my/computer/file.csv’)
}
That is a lot of code to just query a database! Plus there is no poll function to let us
know if the query has been successful or not. Is there a better way?
Introducing RAthena
library(DBI)
con <- dbConnect(RAthena::athena(),
s3_staging_dir = ‘s3://path/to/my/bucket/’)
dbGetQuery(con, “select * from iris”)
What a R SDK!?
# connecting using AWS SDK
athena <- paws::athena()
# query Athena
res <- athena$start_query_execution(QueryString = “select * from iris”,
QueryExecutionContext =
list(‘Database’ = ‘default’),
ResultConfiguration=
list(‘OutputLocation’= ‘s3://path/to/my/bucket/’))
# check status of query
status <- athena$get_query_execution(QueryExecutionId = res$QueryExecutionId)
# if query was successful then download output to local computer
if (status$QueryExecution$Status$State == “SUCCEEDED”) {
s3 <-paws::s3()
key <- paste0(res$QueryExecutionId, “.csv”)
obj <- s3$(Bucket = ‘s3://path/to/my/bucket/’, Key = key)
writeBin(obj$Body, “c://my/computer/file.csv”)
}
Introducing noctua
library(DBI)
con <- dbConnect(noctua::athena(),
s3_staging_dir = ‘s3://path/to/my/bucket/’)
dbGetQuery(con, “select * from iris”)
RStudio integration
Other features
library(DBI)
library(dplyr)
con <- dbConnect(noctua::athena(),
s3_staging_dir = ‘s3://path/to/my/bucket/’)
dbWriteTable(con, “mtcars”, mtcars)
# or dplyr method
copy_to(con, mtcars)
Useful links ….
• AWS Athena: https://aws.amazon.com/athena/
• Boto3: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
• reticulate: https://rstudio.github.io/reticulate/
• paws: https://paws-r.github.io/
• RAthena: https://dyfanjones.github.io/RAthena/
• noctua: https://dyfanjones.github.io/noctua/
Stuff about Me:
• https://github.com/DyfanJones
• https://dyfanjones.me/
• https://www.linkedin.com/in/dyfan-jones-a8261799/
Any
Questions?
Thanks for listening

More Related Content

What's hot

(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...
(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...
(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...Amazon Web Services
 
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMRAmazon Web Services
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands Onhkbhadraa
 
(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...
(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...
(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...Amazon Web Services
 
Azure Durable Functions (2018-06-13)
Azure Durable Functions (2018-06-13)Azure Durable Functions (2018-06-13)
Azure Durable Functions (2018-06-13)Paco de la Cruz
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Web Services
 
PUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPhilipp Fehre
 
AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAmazon Web Services
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_awsHirokuni Uchida
 
From Ceilometer to Telemetry: not so alarming!
From Ceilometer to Telemetry: not so alarming!From Ceilometer to Telemetry: not so alarming!
From Ceilometer to Telemetry: not so alarming!Nicolas (Nick) Barcet
 
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014Amazon Web Services
 
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014Amazon Web Services
 
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...Amazon Web Services
 
Streaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraStreaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraCédrick Lunven
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
 
Get Value from Your Data
Get Value from Your DataGet Value from Your Data
Get Value from Your DataDanilo Poccia
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisAmazon Web Services
 
Getting your data in and out of elasticsearch: let me count the ways
Getting your data in and out of elasticsearch: let me count the waysGetting your data in and out of elasticsearch: let me count the ways
Getting your data in and out of elasticsearch: let me count the ways🥑 Jay Miller
 
Presto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScalePresto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScaleDataWorks Summit
 

What's hot (20)

(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...
(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...
(SEC202) Closing the Gap: Moving Critical, Regulated Workloads to AWS | AWS r...
 
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands On
 
OpenStack Ceilometer
OpenStack CeilometerOpenStack Ceilometer
OpenStack Ceilometer
 
(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...
(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...
(SEC309) Amazon VPC Configuration: When Least Privilege Meets the Penetration...
 
Azure Durable Functions (2018-06-13)
Azure Durable Functions (2018-06-13)Azure Durable Functions (2018-06-13)
Azure Durable Functions (2018-06-13)
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
PUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY Riak
 
AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWS
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws
 
From Ceilometer to Telemetry: not so alarming!
From Ceilometer to Telemetry: not so alarming!From Ceilometer to Telemetry: not so alarming!
From Ceilometer to Telemetry: not so alarming!
 
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
 
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
 
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesi...
 
Streaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache CassandraStreaming, Analytics and Reactive Applications with Apache Cassandra
Streaming, Analytics and Reactive Applications with Apache Cassandra
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
Get Value from Your Data
Get Value from Your DataGet Value from Your Data
Get Value from Your Data
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for Redis
 
Getting your data in and out of elasticsearch: let me count the ways
Getting your data in and out of elasticsearch: let me count the waysGetting your data in and out of elasticsearch: let me count the ways
Getting your data in and out of elasticsearch: let me count the ways
 
Presto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScalePresto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte Scale
 

Similar to R and Athena … there is another way!?

(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules
(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules
(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & RulesAmazon Web Services
 
Auto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisAuto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisYi Hsuan (Jeddie) Chuang
 
Amazon Web Services for PHP Developers
Amazon Web Services for PHP DevelopersAmazon Web Services for PHP Developers
Amazon Web Services for PHP DevelopersJeremy Lindblom
 
Serverless Architectural Patterns & Best Practices
Serverless Architectural Patterns & Best PracticesServerless Architectural Patterns & Best Practices
Serverless Architectural Patterns & Best PracticesDaniel Zivkovic
 
AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)
AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)
AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)Amazon Web Services
 
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsAmazon Web Services
 
Living on the edge at Netflix - Adrian Cole
Living on the edge at Netflix - Adrian ColeLiving on the edge at Netflix - Adrian Cole
Living on the edge at Netflix - Adrian Colejaxconf
 
MapReduce with Scalding @ 24th Hadoop London Meetup
MapReduce with Scalding @ 24th Hadoop London MeetupMapReduce with Scalding @ 24th Hadoop London Meetup
MapReduce with Scalding @ 24th Hadoop London MeetupLandoop Ltd
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryWilliam Candillon
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowChetan Khatri
 
Integrating Jira Software Cloud With the AWS Code Suite
Integrating Jira Software Cloud With the AWS Code SuiteIntegrating Jira Software Cloud With the AWS Code Suite
Integrating Jira Software Cloud With the AWS Code SuiteAtlassian
 
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017Grant McAlister
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - PolybaseŁukasz Grala
 
AWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as CodeAWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as CodeAmazon Web Services
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for TrainingBryan Yang
 

Similar to R and Athena … there is another way!? (20)

(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules
(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules
(MBL312) NEW! AWS IoT: Programming a Physical World w/ Shadows & Rules
 
Auto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisAuto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and Redis
 
Amazon Web Services for PHP Developers
Amazon Web Services for PHP DevelopersAmazon Web Services for PHP Developers
Amazon Web Services for PHP Developers
 
Serverless Architectural Patterns & Best Practices
Serverless Architectural Patterns & Best PracticesServerless Architectural Patterns & Best Practices
Serverless Architectural Patterns & Best Practices
 
AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)
AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)
AWS re:Invent 2016: IoT Visualizations and Analytics (IOT306)
 
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
 
Living on the edge at Netflix - Adrian Cole
Living on the edge at Netflix - Adrian ColeLiving on the edge at Netflix - Adrian Cole
Living on the edge at Netflix - Adrian Cole
 
MapReduce with Scalding @ 24th Hadoop London Meetup
MapReduce with Scalding @ 24th Hadoop London MeetupMapReduce with Scalding @ 24th Hadoop London Meetup
MapReduce with Scalding @ 24th Hadoop London Meetup
 
Mysql python
Mysql pythonMysql python
Mysql python
 
Mysql python
Mysql pythonMysql python
Mysql python
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQuery
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
Integrating Jira Software Cloud With the AWS Code Suite
Integrating Jira Software Cloud With the AWS Code SuiteIntegrating Jira Software Cloud With the AWS Code Suite
Integrating Jira Software Cloud With the AWS Code Suite
 
Reduxing like a pro
Reduxing like a proReduxing like a pro
Reduxing like a pro
 
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
Amazon RDS for PostgreSQL: What's New and Lessons Learned - NY 2017
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - Polybase
 
WhizCard-CLF-C01-06-09-2022.pdf
WhizCard-CLF-C01-06-09-2022.pdfWhizCard-CLF-C01-06-09-2022.pdf
WhizCard-CLF-C01-06-09-2022.pdf
 
AWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as CodeAWS May Webinar Series - Deep Dive: Infrastructure as Code
AWS May Webinar Series - Deep Dive: Infrastructure as Code
 
Full Stack Scala
Full Stack ScalaFull Stack Scala
Full Stack Scala
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 

Recently uploaded

Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 

Recently uploaded (17)

Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 

R and Athena … there is another way!?

  • 1. R and Athena … there is another way!?
  • 2. What is AWS Athena? Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL select * from iris My S3 Bucket R_data iris iris.csv mtcars mtcars.csv nyc_taxi yellow_taxi_trip year=2019 month=01 yellow_taxi_trip.csv month=02 yellow_taxi_trip.csv ... ... select * from yellow_taxi_trip select * from yellow_taxi_trip where year = ‘2019’ select * from yellow_taxi_trip where year = ‘2019’ and month = ‘01’
  • 3. How to connect to AWS Athena? # connecting using ODBC library(DBI) con <- dbConnect(odbc::odbc(), ...) # connecting using jdbc library(DBI) library(RJDBC) drv <- jdbc(...) con <- dbConnect(drv, ...) # connecting using AWS SDK ????
  • 4. A new Connection Method … # connecting using AWS SDK library(reticulate) boto <- import(“boto3) athena <- boto$Session()$client(“athena”) # query athena res <- athena$start_query_execution(QueryString = “select * from iris”, QueryExecutionContext = list(‘Database’ = ‘default’), ResultConfiguration= list(‘OutputLocation’= ‘s3://path/to/my/bucket/’)) # check status of query status <- athena$get_query_execution(QueryExecutionId = res$QueryExecutionId) # if query was successful then download output to local computer if (status$QueryExecution$Status$State == “SUCCEEDED”) { s3 <- boto$Session()$resource(“s3”) key <- paste0(res$QueryExecutionId, “.csv”) s3$Bucket(‘s3://path/to/my/bucket/’)$download_file(key, ‘c://my/computer/file.csv’) } That is a lot of code to just query a database! Plus there is no poll function to let us know if the query has been successful or not. Is there a better way?
  • 5. Introducing RAthena library(DBI) con <- dbConnect(RAthena::athena(), s3_staging_dir = ‘s3://path/to/my/bucket/’) dbGetQuery(con, “select * from iris”)
  • 6. What a R SDK!? # connecting using AWS SDK athena <- paws::athena() # query Athena res <- athena$start_query_execution(QueryString = “select * from iris”, QueryExecutionContext = list(‘Database’ = ‘default’), ResultConfiguration= list(‘OutputLocation’= ‘s3://path/to/my/bucket/’)) # check status of query status <- athena$get_query_execution(QueryExecutionId = res$QueryExecutionId) # if query was successful then download output to local computer if (status$QueryExecution$Status$State == “SUCCEEDED”) { s3 <-paws::s3() key <- paste0(res$QueryExecutionId, “.csv”) obj <- s3$(Bucket = ‘s3://path/to/my/bucket/’, Key = key) writeBin(obj$Body, “c://my/computer/file.csv”) }
  • 7. Introducing noctua library(DBI) con <- dbConnect(noctua::athena(), s3_staging_dir = ‘s3://path/to/my/bucket/’) dbGetQuery(con, “select * from iris”)
  • 9. Other features library(DBI) library(dplyr) con <- dbConnect(noctua::athena(), s3_staging_dir = ‘s3://path/to/my/bucket/’) dbWriteTable(con, “mtcars”, mtcars) # or dplyr method copy_to(con, mtcars)
  • 10. Useful links …. • AWS Athena: https://aws.amazon.com/athena/ • Boto3: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html • reticulate: https://rstudio.github.io/reticulate/ • paws: https://paws-r.github.io/ • RAthena: https://dyfanjones.github.io/RAthena/ • noctua: https://dyfanjones.github.io/noctua/ Stuff about Me: • https://github.com/DyfanJones • https://dyfanjones.me/ • https://www.linkedin.com/in/dyfan-jones-a8261799/