Data Analytics using the Cloud - Challenges and Opportunities for India
Introduction
AJAY OHRI
Author 1,2,3 Thinker 1,2
Founder, DECISIONSTATS
ohri2007@gmail.com http://linkedin.com/in/ajayohri
What comes next?
Data Analytics- Older Paradigms
Thoughts on Stats and Computer Science
Overview - Data Storage, Cloud Computing
Data Analytics
older paradigms - SAS and SPSS languages, ETL and data warehouses (DWs)
newer paradigms - R and Python, Scala and Hadoop
More machine learning, less classical stats
Is statistics lagging behind computer science?
Classical statistics - too little data
Big Data era - the cost of throwing data away is more than the cost of storing it
Machine learning - seems to be the current flavor
Data Storage
older paradigms - RDBMS and spreadsheets: structure and interactivity
new paradigms - NoSQL, Hadoop, cloud-enabled spreadsheets (?)
Cloud Computing- defined by NIST
http://www.nist.gov/itl/csd/cloud-102511.cfm
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
or
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Service Models for Cloud Computing
SaaS - Software as a service
IaaS - Infrastructure as a service
PaaS - Platform as a service
Service Models for Cloud Computing
IaaS - Infrastructure as a service
http://media.amazonwebservices.com/IDC_Business_Value_of_AWS_Accelerates_Over_time.pdf
http://www.gartner.com/technology/reprints.do?id=1-1IMDMZ5&ct=130819&st=sb
Service Models for Cloud Computing
PaaS - Platform as a service
http://www.gartner.com/technology/research/cloud-computing/report/paas-cloud.jsp
http://www.forrester.com/search?N=20033+10001&sort=3&everything=true&source=browse&
Service Models for Cloud Computing
SaaS - Software as a service
http://www.forrester.com/Software--as--a--Service-%28SaaS%29
http://www.gartner.com/newsroom/id/1963815
http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/
http://my.gartner.com/portal/server.pt?open=512&objID=202&&PageID=5553&mode=2&in_hi_userid=2&cached=true&resId=2332215&ref=AnalystProfile
http://www.gartner.com/it-glossary/software-as-a-service-saas/
Deployment Models for Cloud Computing
Private-
Community-
Public-
Hybrid-
Data Analytics (traditional) - Porter’s Model
Threat of Mobility - Low (lock-in)
Industry Rivalry - Medium (many)
Supplier Power - High (software, hardware)
Buyer Power - Medium
Substitutes - Low (not many alternatives to SAS, SPSS)
Data Analytics (cloud based) - Porter’s Model
Threat of Mobility - High (easy to switch, as data and analytics are cloud based)
Industry Rivalry - High (global providers)
Supplier Power - Low (open source, free, GPL)
Buyer Power - High (lots of options: outsource, insource, crowdsource)
Substitutes - High (lots of options: Python, R, Julia, etc.)
Data Analytics in India - Porter’s Diamond Model
Chance - Favorable supply of engineers, mature outsourcing and service industry, rapid domestic growth
Factor Conditions - Good service industry
Firm Strategy - Relative lack of ecosystem hampers analytics entrepreneurs
Demand Conditions - High
Government - Little or no interference
India in traditional Data Analytics
Strengths - reliable pool of experienced engineering talent
Weaknesses - inability or unwillingness to invest in huge upfront capex for analytics hardware and software
Opportunities - ability to navigate upstream
Threats - based on cost-based arbitrage rather than skill-based value addition, thus vulnerable to competition
India in Cloud Based Data Analytics
Strengths - experienced service industry with a huge pool of trained engineering and analytical talent
Weaknesses - lack of domain depth; relative lack of ecosystem for cutting-edge analytics entrepreneurship; slow to embrace open source
Opportunities - no more capital expenditure needed in software and hardware; virtualization offers secure delivery from any location
Threats - risk management needs to be more mature; lack of data privacy regulations
Biggest Challenge to using Cloud
Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors
Most of the cloud infrastructure is based in the United States of America
Biggest Challenge to using Cloud == NSA?
Unfortunately, the US government taps this information for both security and economic advantage
Unfortunately, American companies seek and get economic advantages for such cooperation
Unfortunately, in the age of cyber war, with its biggest proponent across the border, we have no critical infrastructure as a service for economic players
In the future, you won't need the United Nations to sanction countries. You just switch off their internet and their economy will shut down.
Can foreign digital infrastructure be used to infiltrate Stuxnet-like viruses into the domestic supply chain?
India may be self-reliant in agriculture and semi-reliant in manufacturing arms, but we are totally dependent on others for new generation and even current generation computing
Biggest Opportunities to using Cloud
Build our critical digital grid using local companies - POSSIBLE
Build our next generation of cyber warriors and cyber farmers - VERY POSSIBLE
Teach more distributed computing earlier ;)
Regulation like the EU's to ensure Indian citizens' data stays within the Indian state’s administrative boundaries and within reach of the Indian legal system
Compare the Aadhaar card with information in emails, social networks, on the personal computer ??
Better regulation - POSSIBLE OR NOT POSSIBLE --- DEPENDS ON ELECTIONS?
Moving onto Cloud Based Data Analytics
Open source analytics tools like Python and R support distributed computing
Memory is no longer a problem on the cloud (especially for R)
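The distributed computing point can be illustrated with a minimal Python sketch of the split, summarize and combine pattern. It is an illustration only, not something from the slides: the function names are hypothetical, and a local thread pool stands in for the cluster workers that a real cloud deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def summarize(chunk):
    """Partial summary one worker would compute for its chunk: (count, total)."""
    return len(chunk), sum(chunk)


def combine(a, b):
    """Merge two partial summaries; associative, so merge order does not matter."""
    return a[0] + b[0], a[1] + b[1]


def distributed_mean(values, n_chunks=4):
    """Split values into chunks, summarize each in parallel, then combine."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Assumption: threads stand in for remote workers on separate machines.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(summarize, chunks))
    count, total = reduce(combine, partials, (0, 0))
    return total / count if count else float("nan")


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))
```

Because each worker only ever holds its own chunk, no single machine needs the whole dataset in memory, which is the sense in which memory stops being the constraint.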
Existing Data Analytics in India
Lots of Analytics Outsourcing
Both SAS and SPSS are present
Open source analytics on the rise, but still a palpable lack of awareness
Data - ETL - Data Warehouse - SQL Query - Stats Software MINDSET
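The Data - ETL - Data Warehouse - SQL Query - Stats Software chain can be sketched end to end with the Python standard library. This is a toy illustration under stated assumptions: the records, the table name `sales` and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3
import statistics

# Hypothetical raw records, standing in for an extract from a source system.
raw_rows = [
    ("north", "120.5"),
    ("south", "80.0"),
    ("north", "99.5"),
    ("south", "not available"),  # dirty value, dropped during transform
]


def transform(rows):
    """Transform step: keep only rows whose amount parses as a number."""
    clean = []
    for region, amount in rows:
        try:
            clean.append((region, float(amount)))
        except ValueError:
            continue
    return clean


def load(rows):
    """Load step: put clean rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def query_amounts(conn, region):
    """SQL query step: pull one region's amounts out of the warehouse."""
    cur = conn.execute(
        "SELECT amount FROM sales WHERE region = ? ORDER BY amount", (region,)
    )
    return [row[0] for row in cur.fetchall()]


conn = load(transform(raw_rows))
north = query_amounts(conn, "north")
north_mean = statistics.mean(north)  # the "stats software" step
print(north, north_mean)
```

Each stage hands a finished result to the next, so the statistics only ever see what the SQL query returns.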
Existing Data Analytics in India
Cloud computing explicitly uses Linux for efficiency
Your Windows CERTIFICATIONS can hinder your IT department’s mindset on the cloud
Data science requires cross-functional learning
Developments in Stats Software
A New Hope - Julia, Pandas
http://julialang.org/
http://pandas.pydata.org/
The Empire Strikes Back - SAS
http://www.sas.com/en_us/software/cloud.html
https://www.sas.com/en_us/software/sas-hadoop.html
Return of the Jedi
http://www.r-bloggers.com/
a few Developments in Analytics
Revolution R on the cloud (AWS)
www.revolutionanalytics.com/RRE-AWS
SAS on the cloud
http://blogs.sas.com/content/sascom/2013/04/29/start-planning-now-for-sas-9-4/
http://www.allanalytics.com/author.asp?section_id=1411&doc_id=262924
Apache Spark and R
http://amplab-extras.github.io/SparkR-pkg/
a few Developments on the Cloud
Amazon http://aws.amazon.com/
Google https://cloud.google.com/products/
IBM http://www.ibm.com/cloud-computing/in/en/
Oracle https://cloud.oracle.com/java
a few Developments in R
RHadoop Project
https://github.com/RevolutionAnalytics/RHadoop/wiki
OpenCPU Project
https://www.opencpu.org/
rOpenSci Project
http://blog.programmableweb.com/2013/03/20/pw-interview-karthik-ram-ropensci-wrapping-all-science-apis/
The future of Open Cloud
R + Python on OpenStack ?
There is a fair chance that Apache Hadoop related projects like Shark / Spark will be part of it, and we need Hadoop-based data warehouse solutions (?)
We need to hedge for US Policy Interference
Education and developer ecosystems have to keep pace
Thank You

Data Analytics using the Cloud - Challenges and Opportunities for India

  • 1. Data Analytics using the Cloud - Challenges and Opportunities for India
  • 2. Introduction AJAY OHRI Author 1,2,3 Thinker 1,2 Founder, DECISIONSTATS ohri2007@gmail.com http://linkedin.com/in/ajayohri
  • 3. What comes next? Data Analytics- Older Paradigms Thoughts on Stats and Computer Science Overview - Data Storage, Cloud Computing
  • 4. Data Analytics. Older paradigms: SAS and SPSS languages, ETL and DWs. Newer paradigms: R and Python, Scala and Hadoop. More machine learning, less classical stats.
  • 5. Is statistics lagging behind computer science? Classical statistics: too few data points. Big Data era: the cost of throwing data away is more than the cost of storing it. Machine learning seems to be the flavor.
  • 6. Data Storage. Older paradigms: RDBMS and spreadsheets (structure and interactivity). New paradigms: NoSQL, Hadoop, cloud-enabled spreadsheets (?)
  • 7. Cloud Computing- defined by NIST http://www.nist.gov/itl/csd/cloud-102511.cfm cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction or http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
  • 11. Service Models for Cloud Computing: SaaS (Software as a Service), IaaS (Infrastructure as a Service), PaaS (Platform as a Service)
  • 12. Service Models for Cloud Computing IaaS - Infrastructure as a service http://media.amazonwebservices.com/IDC_Business_Value_of_AWS_Accelerates_Over_time.pdf http://www.gartner.com/technology/reprints.do?id=1-1IMDMZ5&ct=130819&st=sb
  • 13. Service Models for Cloud Computing PaaS - Platform as a service http://www.gartner.com/technology/research/cloud-computing/report/paas-cloud.jsp http://www.forrester.com/search?N=20033+10001&sort=3&everything=true&source=browse&
  • 14. Service Models for Cloud Computing SaaS - Software as a service http://www.forrester.com/Software--as--a--Service-%28SaaS%29 http://www.gartner.com/newsroom/id/1963815 http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/ http://my.gartner.com/portal/server.pt?open=512&objID=202&&PageID=5553&mode=2&in_hi_userid=2&cached=true&resId=2332215&ref=AnalystProfile http://www.gartner.com/it-glossary/software-as-a-service-saas/
  • 15. Deployment Models for Cloud Computing: Private, Community, Public, Hybrid
  • 16. Data Analytics (traditional) - Porter’s Model. Threat of Mobility: Low (lock-in). Industry Rivalry: Medium (many). Supplier Power: High (S/W, H/W). Buyer Power: Medium. Substitutes: Low (not many alternatives to SAS, SPSS).
  • 17. Data Analytics (cloud based) - Porter’s Model. Threat of Mobility: High (easy switch, as data and analytics are cloud based). Industry Rivalry: High (global providers). Supplier Power: Low (open source, free, GPL). Buyer Power: High (lots of options: outsource, insource, crowdsource). Substitutes: High (lots of options: Python, R, Julia etc.).
  • 18. Data Analytics in India - Porter’s Diamond Model Chance- Favorable supply of engineers, mature outsource and service industry, rapid growth domestically Factor Conditions- Good Service Industry Firm Strategy- relative lack of ecosystem hampers analytics entrepreneurs Demand Conditions- High Government- Little or No interference
  • 19. India in traditional Data Analytics. Strengths: reliable pool of experienced engineering talent. Weakness: inability or unwillingness to invest in huge upfront capex for hardware and software for analytics. Opportunities: ability to navigate upstream. Threats: based on cost-based arbitrage rather than skill-based value addition, thus vulnerable to competition.
  • 20. India in Cloud Based Data Analytics. Strengths: experienced service industry with huge pool of trained engineering and analytical talent. Weakness: lack of deep domain depth; relative lack of ecosystem for cutting-edge analytics entrepreneurship; slow to embrace open source. Opportunities: no more capital expenditure needed in software and hardware; virtualization offers secure delivery from any location. Threats: risk management needs to be more mature; lack of data privacy regulations.
  • 21. Biggest Challenge to using Cloud. Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors. Most of the cloud infrastructure is based in the United States of America.
  • 22. Biggest Challenge to using Cloud ==NSA?
  • 23. Biggest Challenge to using Cloud. Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors. Most of the cloud infrastructure is based in the United States of America. Unfortunately, the USA Govt taps the information for both security and economic advantages. Unfortunately, American companies seek and get economic advantages for such cooperation. Unfortunately, in the age of cyber war and the biggest proponent across the border, we have no critical infrastructure as a service for economic players. In the future, you won't need the United Nations to sanction countries. You just switch off their internet and their economy will shut off. Foreign digital infrastructure can be used to infiltrate Stuxnet-like viruses into the domestic supply chain? India may be self-reliant in agriculture and semi-reliant in manufacturing arms, but we are totally dependent on new generation and even current generation computing.
  • 24. Biggest Opportunities to using Cloud. Build our critical digital grid using local companies - POSSIBLE. Build our next generation of cyber warriors and cyber farmers - VERY POSSIBLE. Teach more distributed computing earlier ;) Regulation like the EU's to ensure Indian citizen data stays within the Indian state’s administrative boundaries and within reach of the Indian legal system. Compare Aadhaar card with information in emails, social networks, on the personal computer?? Better regulation - POSSIBLE OR NOT POSSIBLE - DEPENDS ON ELECTIONS?
  • 25. Moving onto Cloud Based Data Analytics. Open source analytics tools like Python and R support distributed computing. Memory is no longer a problem (especially for R) on the cloud.
  • 26. Existing Data Analytics in India. Lots of analytics outsourcing. Both SAS and SPSS are present. Open source analytics is on the rise, but there is still a palpable lack of awareness. The prevailing MINDSET: Data - ETL - Data Warehouse - SQL Query - Stats Software.
  • 27. Existing Data Analytics in India. Cloud computing explicitly uses Linux for efficiency. Your Windows CERTIFICATIONS can hinder your IT department’s mindset on the cloud. Data science requires cross-functional learning.
  • 28. Developments in Stats Software A New Hope - Julia, Pandas http://julialang.org/ http://pandas.pydata.org/ The Empire Strikes Back - SAS http://www.sas.com/en_us/software/cloud.html https://www.sas.com/en_us/software/sas-hadoop.html Return of the Jedi http://www.r-bloggers.com/
  • 29. a few Developments in Analytics Revolution R on the cloud (AWS) www.revolutionanalytics.com/RRE-AWS SAS on the cloud http://blogs.sas.com/content/sascom/2013/04/29/start-planning-now-for-sas-9-4/ http://www.allanalytics.com/author.asp?section_id=1411&doc_id=262924 Apache Spark and R http://amplab-extras.github.io/SparkR-pkg/
  • 30. a few Developments on the Cloud Amazon http://aws.amazon.com/ Google https://cloud.google.com/products/ IBM http://www.ibm.com/cloud-computing/in/en/ Oracle https://cloud.oracle.com/java
  • 31. a few Developments in R RHadoop Project https://github.com/RevolutionAnalytics/RHadoop/wiki OpenCPU Project https://www.opencpu.org/ rOpenSci Project http://blog.programmableweb.com/2013/03/20/pw-interview-karthik-ram-ropensci-wrapping-all-science-apis/
  • 32. The future of Open Cloud. R + Python on OpenStack? There is a fair chance that Apache Hadoop related projects like Shark / Spark will be there. We need Hadoop-based data warehouse solutions (?). We need to hedge against US policy interference. Education and developer ecosystems have to keep pace.
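
Slide 25 notes that open source tools like Python support distributed computing. As a rough illustration only (not taken from the slides), the split-apply-combine idea behind Hadoop-style processing can be sketched with the Python standard library. The function names `chunk`, `partial_stats` and `distributed_mean` are hypothetical, and a local thread pool stands in here for the cloud workers a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(values, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_stats(piece):
    """Map step: each worker returns (sum, count) for its own piece."""
    return sum(piece), len(piece)


def distributed_mean(values, size=1000, workers=4):
    """Reduce step: combine the partial (sum, count) pairs into one mean.

    Assumes `values` is non-empty. The thread pool is only a local
    stand-in for a cluster of workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunk(values, size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

Because each piece is summarized independently, no single worker ever needs the whole data set in memory, which is the property the slide alludes to when it says memory is no longer a problem on the cloud.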