Amazon Data Analysis
MEMBERS : Vinay Gupta (3521)
Yash Patil (3530)
Yash Thakur (3544)
INDEX
1. INTRODUCTION
2. Background of R
3. AWS
4. Use cases for R on AWS
• Big Data Processing
• Databases
• File Storage
5. Getting started with AWS in R
6. Connecting to Databases
7. Extracting Text and Tables
8. Uploading Data to Database
INTRODUCTION
• R is a language and environment for statistical computing and graphics.
• It is similar to the S language and environment.
• It is generally used through a command-line interface.
• It provides a wide variety of statistical and graphical techniques and is highly extensible.
• One of R's strengths is the ease with which well-designed, publication-quality plots can be produced.
• It is available as free software in source code form, and compiles and runs on a wide variety of UNIX platforms and similar systems, as well as Windows and macOS.
Background of R
• R is a leading tool for machine learning, statistics, and data analysis.
• It is a platform-independent language.
• It is free and open source.
• R is not only a statistics package; it can also be integrated with other languages.
• Another important part of the R ecosystem is the RStudio development environment.
• One of the most popular sets of packages in the R ecosystem is the tidyverse.
• These packages are designed to help users ingest, transform, and visualize data.
• R has a large community of users that keeps growing.
• R is currently one of the most in-demand programming languages.
AWS
• AWS (Amazon Web Services) is a comprehensive, evolving cloud computing platform.
• AWS services offer organizations tools such as compute power, database storage, and content delivery.
• AWS launched in 2006, built on the internal infrastructure that Amazon.com created to handle its online retail operations.
• AWS offers many tools and solutions for enterprises and software developers that can be used in up to 190 countries.
How AWS works
• AWS is divided into separate services, which makes it easier to manage.
• Each service can be configured in different ways based on the user's needs, and users can see the configuration options and individual server maps for a service.
• More than 100 services make up the Amazon Web Services portfolio, including those for compute, databases, infrastructure management, application development, and security.
Cloud service models: IaaS, PaaS, SaaS
Use Cases For R On AWS
Big Data Processing
• For big data problems, R can be limited by locally available memory; high-memory instance types help here.
• R deals with data in memory by default, so using an instance with more memory can make a problem tractable without having to make changes to code.
• Many problems are also parallelizable; modifying code to use R's parallel processing packages allows users to take advantage of instance types with a large number of cores.
• Between AWS's R-type (memory optimized) and C-type (compute optimized) instances, developers can choose an instance type that closely matches their compute and memory workload needs.
• Often, data scientists deal with these big problems only part of the time, and running permanent Amazon EC2 instances or containers would not be cost effective.
DATABASES
• Databases are a valuable resource for data science teams; they provide a single source of truth for datasets and offer performant reads and writes.
• We can take advantage of popular databases like PostgreSQL through Amazon Relational Database Service (Amazon RDS), while letting AWS take care of underlying instance and database maintenance.
• In many cases, R can interact with these services with only small modifications; the tidyverse packages let you write code irrespective of where it is going to run, then retarget that code to operate on data sourced from the database.
FILE STORAGE
• Lastly, Amazon Simple Storage Service (Amazon S3) allows developers to store raw input files, results, reports, artifacts, and anything else that we wouldn't want to store directly in a database.
• Items stored in S3 are accessible online, which makes sharing resources with collaborators easy, and S3 also offers fine-grained resource permissions so that access is limited to only those who should have it.
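As a rough illustration of storing a result in S3 and sharing it with a collaborator, here is a minimal sketch using boto3, the Python SDK. The helper name `share_result`, the bucket name, and the key are hypothetical; the function expects a client created with `boto3.client("s3")` and working AWS credentials. A presigned URL is only one way to grant limited access; bucket policies and IAM permissions are others.

```python
def share_result(s3_client, local_path, bucket, key, expires_in=3600):
    """Upload a local file to S3 and return a time-limited download link.

    s3_client is assumed to be a boto3 S3 client; share_result itself is a
    hypothetical helper written for this illustration.
    """
    # Store the file in the bucket under the given key.
    s3_client.upload_file(local_path, bucket, key)
    # Create a link that lets its holder download the object until it expires.
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Example use (needs boto3, AWS credentials, and an existing bucket):
#   import boto3
#   url = share_result(boto3.client("s3"), "report.pdf",
#                      "my-team-bucket", "reports/report.pdf")
```

The link stops working after `expires_in` seconds, so the object itself can stay private.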
AWS Cost & Usage Data
AWS Cost and Usage Reports can do the following:
• Deliver report files to your Amazon S3 bucket
• Update the report up to three times a day
• Create, retrieve, and delete your reports using the AWS CUR API
• The AWS Cost & Usage Report (CUR) contains the most comprehensive set of AWS cost and usage data available, including additional metadata about AWS services, pricing, credit, fees, taxes, discounts, cost categories, Reserved Instances, and Savings Plans.
• The CUR itemizes usage at the account or organization level by product code, usage type, and operation. These costs can be further organized by cost allocation tags and Cost Categories.
• The CUR is available at an hourly, daily, or monthly level of granularity, as well as at the management or member account level.
• With the right access, users can access the CUR at both the management and member account level, which saves management account holders from having to generate CUR reports for member accounts.
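To make the idea of itemizing costs by product code concrete, here is a small sketch in Python that totals cost per product from CSV text. It assumes the report uses the column names `lineItem/ProductCode` and `lineItem/UnblendedCost`; the three sample rows are made up for illustration and are not real billing data, and `cost_by_product` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def cost_by_product(report_csv_text):
    """Sum the unblended cost of every line item, grouped by product code."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal('0')
    for row in csv.DictReader(io.StringIO(report_csv_text)):
        totals[row["lineItem/ProductCode"]] += Decimal(row["lineItem/UnblendedCost"])
    return dict(totals)


# Made-up sample rows standing in for a report file downloaded from S3.
SAMPLE = """lineItem/ProductCode,lineItem/UsageType,lineItem/Operation,lineItem/UnblendedCost
AmazonEC2,BoxUsage:r5.large,RunInstances,1.26
AmazonS3,TimedStorage-ByteHrs,StandardStorage,0.04
AmazonEC2,BoxUsage:c5.xlarge,RunInstances,0.85
"""

for product, cost in sorted(cost_by_product(SAMPLE).items()):
    print(product, cost)
```

`Decimal` is used instead of floating point so that small currency amounts add up exactly.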
Getting Started With AWS in R
• To use AWS in R, you can use the Paws AWS software development kit, an R package developed by David Kretch and Adam Banker.
• Paws is an unofficial SDK, but it covers most of the same functionality as the official SDKs for other languages.
• You can also use the official Python SDK, boto3, through the botor and reticulate packages, but you will need to ensure Python is installed on your machine before using them.
Connecting to Databases
• You can use databases in R by setting up a connection to the database.
• Then you can refer to tables in the database as if they were datasets in R.
• The dplyr package in the tidyverse and the dbplyr database backend are what provide this functionality.
Extracting Text and Tables
• Here, we need to identify where the tables are, then reconstruct their rows and columns based on the position and spacing of the words or numbers on the page.
• To do this we use Amazon Textract, an AWS-managed AI service that extracts data from images and PDFs.
• With the Paws SDK for R, we can get a PDF document's text using the operation start_document_text_detection, and its tables and forms using the operation start_document_analysis.
• These are asynchronous operations: they start text detection and document analysis jobs and return a job identifier that we can poll to check the completion status.
• Once a job is finished, we retrieve the result with a second operation, get_document_text_detection or get_document_analysis respectively, by passing in the job ID.
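The start-then-poll pattern described above can be sketched as follows. This version is written in Python against the boto3 Textract client, whose operation names match the ones listed in the bullets; the helper `detect_text`, the polling interval, and the polling limit are assumptions made for illustration, and the document is assumed to already be in an S3 bucket.

```python
import time


def detect_text(textract, bucket, key, poll_seconds=5, max_polls=60):
    """Start a text detection job, wait for it, and return the detected lines.

    textract is assumed to be a client from boto3.client("textract");
    detect_text is a hypothetical helper, not part of any SDK.
    """
    # Step 1: start the asynchronous job and keep its identifier.
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Step 2: poll until the job leaves the IN_PROGRESS state.
    for _ in range(max_polls):
        result = textract.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status == "SUCCEEDED":
            break
        if status != "IN_PROGRESS":
            raise RuntimeError("Textract job ended with status " + status)
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Textract job did not finish in time")

    # Step 3: collect LINE blocks, following NextToken across result pages.
    lines = []
    while True:
        lines += [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]
        token = result.get("NextToken")
        if not token:
            return lines
        result = textract.get_document_text_detection(JobId=job_id, NextToken=token)


# Example use (needs boto3 and AWS credentials):
#   import boto3
#   lines = detect_text(boto3.client("textract"), "my-bucket", "scans/report.pdf")
```

Tables and forms would follow the same shape with start_document_analysis and get_document_analysis, plus extra work to rebuild rows and columns from the returned blocks.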
Uploading Data to Database
• A suitably configured PostgreSQL server running on RDS supports authentication via IAM, avoiding the need to store passwords.
• If we are using an IAM user or role with the appropriate permissions, we can connect to our PostgreSQL database from R using an IAM authentication token.
• The Paws package supports this feature as well; this functionality was developed with the support of the AWS Open Source program.
• We connect to our database using the token generated by build_auth_token from the Paws package.
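The token-based connection can be sketched like this in Python. In boto3 the counterpart of Paws's build_auth_token is `generate_db_auth_token` on the RDS client. The helper `rds_connection_params`, the host name, and the database and user names are hypothetical, and the final step assumes a PostgreSQL driver such as psycopg2 is installed.

```python
def rds_connection_params(rds_client, host, port, user, dbname, region):
    """Build connection settings that use an IAM token instead of a password.

    rds_client is assumed to be a client from boto3.client("rds");
    rds_connection_params is a hypothetical helper for this illustration.
    """
    # Ask AWS for a short-lived token tied to this host, port, and user.
    token = rds_client.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=user, Region=region
    )
    return {
        "host": host,
        "port": port,
        "user": user,
        "dbname": dbname,
        "password": token,     # the token takes the place of a stored password
        "sslmode": "require",  # IAM database authentication needs an SSL connection
    }


# Example use (needs boto3, psycopg2, and a database user set up for IAM auth):
#   import boto3, psycopg2
#   params = rds_connection_params(boto3.client("rds"),
#                                  "mydb.example.rds.amazonaws.com", 5432,
#                                  "analyst", "sales", "us-east-1")
#   conn = psycopg2.connect(**params)
```

Because the token expires quickly, it is generated right before connecting rather than stored.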
THANK YOU!
