All images in this presentation are subject to copyright and belong to respective
Hands-on hack session
Data Science & Cloud Computing
DISCLAIMER 2:
The opinions expressed in
this presentation are my own
views and not those of
JITHENDRA
BALAKRISHNAN
Technical Leader,
Cloud Product
Solutions
Head of Technology,
47Line Technologies
@jitcompile
/jithendrabalakrishnan
DISCLAIMER 1:
All copyrights and trademarks of images
belong to their respective IP owners and
are used under Fair Use for educational
AGENDA
Cloud Computing
Storage
Data Science
Compute
Learning
Hands on Hack
Harvard Business Review
“Data Scientist: The Sexiest Job
of the 21st Century”
Data Science
Process
Paul Maritz, Pivotal
“Cloud is about how you do
computing, not where you do
computing”
CLOUD COMPUTING SERVICES
Storage
Compute
Learning
AMAZON WEB SERVICES
W. Edwards Deming, Scholar & Teacher
“In God we trust. All others must
bring data”
Structured
Unstructured
Graph
Time Series
DATA IS THE NEW OIL
Value
Variety
Velocity
Volume
AMAZON S3
Object storage to store and retrieve any amount of data from anywhere.
AMAZON REDSHIFT
Fully managed petabyte-scale data warehouse.
AMAZON NEPTUNE
Fully managed graph database engine.
AMAZON RDS
Fully managed relational database service.
AMAZON DYNAMODB
Fast & flexible NoSQL database service.
AMAZON ELASTICACHE
Managed Redis & Memcached as a service.
AMAZON AURORA
Fully managed MySQL & PostgreSQL compatible cloud database.
AMAZON GLACIER
Secure, durable & low-cost data archival & long-term backup service.
AMAZON SIMPLEDB
Highly available, secure & inexpensive NoSQL data store.
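For the hands-on part, storing and retrieving an object in S3 can be sketched as below. This is a minimal sketch, not the session's actual code: `store_and_fetch` and `InMemoryS3` are hypothetical names, and the bucket and key are made up. The only AWS calls assumed are the boto3 S3 client methods `put_object` and `get_object`; the in-memory stand-in exists so the sketch runs without credentials.

```python
import io
from typing import Any


def store_and_fetch(s3_client: Any, bucket: str, key: str, payload: bytes) -> bytes:
    """Write an object to S3, then read it back and return its bytes."""
    # put_object / get_object are boto3 S3 client calls.
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # The response body is a stream; read() returns the stored bytes.
    return response["Body"].read()


class InMemoryS3:
    """Hypothetical stand-in for a real S3 client, for offline runs only."""

    def __init__(self) -> None:
        self._objects = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> dict:
        self._objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket: str, Key: str) -> dict:
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


if __name__ == "__main__":
    # With AWS credentials configured, a real client would be created with
    # boto3.client("s3") and passed in instead of the stand-in.
    client = InMemoryS3()
    data = store_and_fetch(client, "demo-bucket", "raw/events.csv", b"a,b\n1,2\n")
    print(len(data), "bytes round-tripped")
```

Passing the client in as a parameter keeps the function testable offline and leaves the choice of region and credentials to the caller.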
Peter Norvig, Google Research
“More data beats clever
algorithms, but better data beats
more data.”
SCALABLE PROCESSING
ELASTICITY
SCALABILITY
COST
COMPUTE SERVICES
EC2
Secure resizable elastic compute capacity in the cloud.
EMR
Managed Hadoop framework for easy, fast and cost-effective cluster for processing large amounts of data.
ATHENA
Interactive SQL query service to analyze data in S3.
GLUE
Fully managed ETL service to prepare and load data for analytics.
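As an illustration of querying data in S3 with Athena, the sketch below builds the request for the boto3 Athena client's `start_query_execution` call. It is an assumption-laden sketch: the function names, the database name and the S3 output location are hypothetical, and the client is passed in so nothing here contacts AWS by itself.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": sql,
        # Database the query runs against (hypothetical name in the demo below).
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(athena_client: Any, sql: str, database: str, output_location: str) -> str:
    """Submit the query and return the execution id reported by Athena."""
    response = athena_client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # With credentials configured, a real client would come from
    # boto3.client("athena"); here only the request is printed.
    request = build_athena_request(
        "SELECT COUNT(*) FROM events",   # hypothetical table
        "demo_db",                       # hypothetical database
        "s3://demo-bucket/athena-results/",
    )
    print(request)
```

The query runs asynchronously, so a real script would poll for completion with the returned execution id before reading results.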
COST OPTIONS
SPOT INSTANCES
Spare AWS capacity available at up to 90% discount. Recommended for stateless, low-cost and flexibly timed applications.
RESERVED INSTANCES
Provides up to 75% discount on committed usage over a 1 or 3 year period. Recommended for steady-state and planned capacity needs.
SPOT BLOCK
Spare AWS capacity available at up to 40% discount on committed usage of 6 hours. Recommended for low-cost, low-risk and known-duration workloads.
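The discount ceilings above translate into a quick best-case comparison. The sketch below is only arithmetic on the "up to" percentages from the slide; the on-demand rate and the hour count are made-up inputs, and real prices vary by instance type, region and time, so actual savings may be smaller.

```python
# Maximum discounts quoted on the slide ("up to" figures, not guarantees).
MAX_DISCOUNT = {
    "spot": 0.90,
    "reserved": 0.75,
    "spot_block": 0.40,
}


def best_case_cost(on_demand_hourly: float, hours: float, option: str) -> float:
    """Lowest possible cost for a workload if the full quoted discount applies."""
    return on_demand_hourly * hours * (1.0 - MAX_DISCOUNT[option])


if __name__ == "__main__":
    rate = 0.10   # hypothetical on-demand price in USD per hour
    hours = 6     # hypothetical job length
    print(f"on-demand: {rate * hours:.2f} USD")
    for option in MAX_DISCOUNT:
        print(f"{option}: {best_case_cost(rate, hours, option):.2f} USD")
```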
Andrew Ng, Chairman, Coursera
“Artificial Intelligence is the new
Electricity”
MACHINE LEARNING AS A SERVICE
o Machine Learning for everyone
o API-driven ML services
o GPU Instances
o Powerful Compute
o FPGA Hardware Acceleration
SUMMARY
1
DATA SCIENCE
Interdisciplinary field that involves the entire technology organization
2
CLOUD COMPUTING
Helps data science practitioners by simplifying usage of resources & tools
3
DATA STORAGE
Data is collected at volume, and a clear storage plan helps in reducing costs
4
DATA PROCESSING
Cheap compute resources help in cleaning & extracting value from data
5
MACHINE LEARNING
Automated algorithms available as a service with managed infrastructure
6
MODEL USAGE
API services to apply machine learning models in real-world applications

Data science and cloud computing

  • 1. All images in this presentation are subject to copyright and belong to respective Hands-on hack session Data Science & Cloud Computing
  • 2. All images in this presentation are subject to copyright and belong to respective DISCLAIMER 2: The opinions expressed in this presentation are my own views and not those of JITHENDRA BALAKRISHNAN Technical Leader, Cloud Product Solutions Head of Technology, 47Line Technologies @jitcompil e /jithendrabalakrishn an DISCLAIMER 1: All copyrights and trademarks of images belong to their respective IP owners and are used under Fair Use for educational
  • 3. All images in this presentation are subject to copyright and belong to respective AGENDA Cloud Computing Storage Data Science Compute Learning Hands on Hack
  • 4. All images in this presentation are subject to copyright and belong to respectiveAll images in this presentation are subject to copyright and belong to respective Harvard Business Review “Data Scientist: The Sexiest Job of the 21st Century”
  • 5. All images in this presentation are subject to copyright and belong to respective Data Science Process
  • 6. All images in this presentation are subject to copyright and belong to respectiveAll images in this presentation are subject to copyright and belong to respective Paul Maritz, Pivotal “Cloud is about how you do computing, not where you do computing”
  • 7. All images in this presentation are subject to copyright and belong to respective Storage Compute Learning CLOUD COMPUTING SERVICES
  • 8. All images in this presentation are subject to copyright and belong to respective AMAZONWEB SERVICES
  • 9. All images in this presentation are subject to copyright and belong to respectiveAll images in this presentation are subject to copyright and belong to respective W. Edwards Deming, Scholar & Teacher “In God we trust. All others must bring data”
  • 11. All images in this presentation are subject to copyright and belong to respective DATA IS THE NEW OIL Value Variety Velocity Volume
  • 12. All images in this presentation are subject to copyright and belong to respective AMAZON S3 Object storage to store and retrieve any amount of data from anywhere. AMAZON REDSHIFT Fully managed petabyte scale data warehouse. AMAZON NEPTUNE Fully managed graph database engine. AMAZON RDS Fully managed relational database service. AMAZON DYNAMODB Fast & Flexible NoSQL database service. AMAZON ELASTICACHE Managed Redis & MemCached as a Service. AMAZON AURORA Fully managed MySQL & PostgreSQL compliant cloud database. AMAZON GLACIER Secure, durable & low cost data archival & long term backup service. AMAZON SIMPLEDB Highly available, secure & inexpensive NoSQL data store.
  • 13. Peter Norvig, Google Research: “More data beats clever algorithms, but better data beats more data.”
  • 14. SCALABLE PROCESSING: ELASTICITY, SCALABILITY, COST
  • 15. COMPUTE SERVICES. EC2: Secure resizable elastic compute capacity in the cloud. EMR: Managed Hadoop framework providing easy, fast and cost-effective clusters for processing large amounts of data. ATHENA: Interactive SQL query service to analyze data in S3. GLUE: Fully managed ETL service to prepare and load data for analytics.
  • 16. COST OPTIONS. SPOT INSTANCES: Spare AWS capacity available at up to 90% discount. Recommended for stateless, low-cost and time-flexible applications. RESERVED INSTANCES: Provides up to 75% discount on committed usage over a 1 or 3 year period. Recommended for steady-state and planned capacity needs. SPOT BLOCK: Spare AWS capacity available at up to 40% discount on committed usage of 6 hours. Recommended for low-cost, low-risk and known-duration workloads.
  • 17. Andrew Ng, Chairman, Coursera: “Artificial Intelligence is the new Electricity”
  • 19. MACHINE LEARNING AS A SERVICE: Machine Learning for everyone; API-driven ML services; GPU Instances; Powerful Compute; FPGA Hardware Acceleration
  • 22. SUMMARY. 1. DATA SCIENCE: Interdisciplinary field that involves the entire technology organization. 2. CLOUD COMPUTING: Helps data science practitioners by simplifying usage of resources & tools. 3. DATA STORAGE: Data is collected at volume, and a clear storage plan helps reduce costs. 4. DATA PROCESSING: Cheap compute resources help in cleaning & extracting value from data. 5. MACHINE LEARNING: Automated algorithms available as a service with managed infrastructure. 6. MODEL USAGE: API services to apply machine learning models in real-world applications.
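
The cost options on slide 16 can be compared with a little arithmetic. The sketch below is only an illustration, not AWS pricing logic. The discounts are the "up to" ceilings quoted on the slide, so the results are best-case floors rather than quotes. The function names, the on-demand rate of 0.10 USD per hour, and the priority order in `suggest_option` are assumptions made for the example.

```python
# Best-case discount ceilings as quoted on slide 16 ("up to" figures).
MAX_DISCOUNT = {
    "spot": 0.90,        # spare capacity that can be interrupted
    "reserved": 0.75,    # 1 or 3 year commitment
    "spot_block": 0.40,  # spare capacity for a known, limited duration
}


def best_case_cost(on_demand_hourly, hours, option):
    """Lowest possible cost for `hours` of usage under a pricing option."""
    # Any option without a listed discount is treated as plain on-demand.
    discount = MAX_DISCOUNT.get(option, 0.0)
    return on_demand_hourly * hours * (1.0 - discount)


def suggest_option(interrupt_tolerant, known_short_duration, steady_state):
    """Map workload traits to the option the slides recommend for them.

    The order of the checks is an assumption of this sketch.
    """
    if interrupt_tolerant:
        return "spot"
    if known_short_duration:
        return "spot_block"
    if steady_state:
        return "reserved"
    return "on_demand"


if __name__ == "__main__":
    rate = 0.10  # hypothetical on-demand price in USD per hour
    for option in ("on_demand", "spot_block", "reserved", "spot"):
        print(option, round(best_case_cost(rate, 100, option), 2))
```

Because the discounts are ceilings, actual spot prices fluctuate and a real comparison would use observed prices instead of these constants.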

Editor's Notes

  1. Understand audience distribution. Set the context: basics of Data Science; how Cloud Computing helps in doing Data Science.
  2. Introduction. Explain cmpute.io & the data science work done there. Cisco acquisition of cmpute.io. Cisco disclaimer. Image fair use disclaimer.
  3. Agenda for the workshop: what the data science process looks like; how cloud computing has changed the way things are done today; storage concepts from a data science perspective; compute-specific services for data science; ML & Deep Learning; explain one problem of cmpute.io & walk through how it was resolved.
  4. Joke: LinkedIn inclusion of “Data Science” as a core skill increased after this article. Data Science history: a costly and difficult skill; very niche and not available everywhere. The increase in data storage increased the need to find value in the data. Data Science is a must-have skill in today’s information age.
  5. Explain the process. Continuous learning model: similarities to the cmpute.io bid model.
  6. Cloud brings the best processes into the organization: design for failure; unlimited scale.
  7. Data science specific topics in Cloud. Storage: store information. Compute: clean and process information. Learning: ready-to-use services for AI & Deep Learning.
  8. Showcasing AWS to demonstrate Cloud Computing: early innovator in the cloud space; has multiple choices of services for each of the previous areas; fit for beginner to expert level; presenter is familiar with this cloud.
  9. Data is the starting point for all analysis. Collect as much as you can. Collect in native forms and then transform them for analysis.
  10. Companies now need varied storage choices. Structured: traditional relational storage (SQL). Unstructured: modern storage (NoSQL). Graph: significant focus on relationships (social information). Time Series: streaming data (metrics).
  11. Data is classified based on origin and scale. Variety: a Twitter feed is saved to MongoDB; website form information is saved to an RDBMS. Velocity: downstream mainframes drop files once a day; Twitter sends an unending stream of support requests to the company’s social media handle. Volume: IoT devices send many metrics every second; a Leave Management System receives a few requests per day. Value: finding value in all information is the goal of Data Science.
  12. Storage choices on Amazon. Relational: Amazon RDS, Amazon Aurora, Amazon Redshift. NoSQL: Amazon SimpleDB, Amazon DynamoDB. Unstructured: Amazon S3, Amazon Glacier. Graph: Amazon Neptune. Metrics: Amazon CloudWatch. Specialty: Amazon ElastiCache.
  13. Collecting data is important. Processing the collected data to make meaningful training sets is primary. Computers work on the principle of GIGO: Garbage In, Garbage Out; Gold In, Gold Out.
  14. Cloud Computing solves 3 important needs of data science. Elasticity: scale up and down based on your needs. Scalability: aim for any size cluster and the cloud makes it available. Cost: cost-conscious computing choices are available based on needs.
  15. Basic services for processing and querying large data sets. EC2: write processing and scale based on your own framework. EMR: process and scale on top of Hadoop, Pig and Hive models. Glue: managed ETL without any code. Athena: query data directly without any servers.
  16. Available cost choices. Reserved: for predictable workloads. Spot Block: for checkpoint-based, time-limited workloads. Spot: for interrupt-tolerant processing.
  17. Machine Learning became a widely discussed topic due to the free AI course from Coursera.
  18. Machine Learning, once a niche skill, has become a common skill due to commoditization and open source.
  19. Ready-to-use Machine Learning services and comparison with other clouds. TensorFlow is backed by Google. Gluon is a deep learning project backed by Amazon & Microsoft. MXNet is open source and supported by Amazon. Amazon ML has limited choices: built for beginners; proprietary engine; supports only 3 algorithms (Binary Classification, Multiclass Classification and Regression). Amazon SageMaker has both ready-made algorithms and support for custom algorithms: built for data scientists; uses TensorFlow and MXNet. Azure and GCP have much more advanced support for ML, AI and Deep Learning. Cognitive services are outside the scope of this presentation: we are focused only on data; speech, image and other recognition services are considered cognitive. All clouds offer ready-to-use services which have advanced automation and are available over API.
  20. Cmpute.io initial days: what we did; how we did it; issues we faced. Why we turned to data science: need for predictions; need for classification. How we went about solving our problems: explain flowchart. Demo problem: predict spot prices using historical data. Disclaimer: Cmpute.io used multiple sources of data and not just historical information.
  21. A simple real-time Spot prediction system. Infrastructure: Amazon RDS Aurora (store information); AWS Fargate (scheduled container execution); AWS S3 (training data storage); AWS ML (machine learning model and evaluation); AWS API Gateway (REST service); AWS Lambda (actual functions for the API); React (front end). Background services: Spot fetcher (fetch prices every 5 minutes); Training data (convert daily data into training data every day, 1 file per day). Machine learning: create a data source from the S3 training data; create a model using regression; create an evaluation using the model; create a real-time API for evaluation. API: Get Current Prices (fetch information from the AWS API and save to the database); Get Prediction (call the AWS ML real-time prediction API). Front end: a simple grid that shows the matrix of region, availability zone, instance type, platform, current price and predicted price.
  22. Data science is an interdisciplinary process that involves the entire organization. Cloud computing is here to stay and offers significant advances to the data science process. Storage management solutions allow any type of data and are built for volume, variety and velocity. Cleaning and extraction bring out value in data. Democratization of AI has made data processing easy. API-based models help in real-world usage.
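
Notes 20 and 21 describe a pipeline that samples spot prices every 5 minutes, writes one training file per day, and fits a regression model to predict prices. The sketch below mimics those steps locally in plain Python. It is not the cmpute.io implementation, and the least-squares line is only a stand-in for the AWS ML regression model named in the notes. The function names, the `minute_of_day` feature, and the sample prices are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def split_by_day(samples):
    """Group (timestamp, price) samples by calendar day (ISO date string)."""
    days = defaultdict(list)
    for ts, price in samples:
        days[ts.date().isoformat()].append((ts, price))
    return dict(days)


def to_training_csv(day_samples):
    """Render one day's samples as CSV text, one row per sample."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["minute_of_day", "price"])  # hypothetical column names
    for ts, price in sorted(day_samples):
        writer.writerow([ts.hour * 60 + ts.minute, price])
    return buf.getvalue()


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up prices sampled every 5 minutes; not real spot data.
    samples = [
        (datetime(2018, 1, 1, 0, 0), 0.031),
        (datetime(2018, 1, 1, 0, 5), 0.032),
        (datetime(2018, 1, 1, 0, 10), 0.033),
        (datetime(2018, 1, 2, 0, 0), 0.030),
    ]
    for day, rows in sorted(split_by_day(samples).items()):
        print(day)
        print(to_training_csv(rows))
    first_day = sorted(split_by_day(samples)["2018-01-01"])
    xs = [ts.hour * 60 + ts.minute for ts, _ in first_day]
    ys = [price for _, price in first_day]
    print(predict(fit_line(xs, ys), 15))
```

A single time-of-day feature keeps the example short. The note for slide 20 says the production system used multiple sources of data, so a real model would need more features than this.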