Big Data, Big Challenge.
Puneet Kacker, Kanpur
08-OCT-2015
What is Big Data?
• Big Data is data that is too large, complex and dynamic for
conventional data tools to capture, store, manage and
analyze.
• The right use of Big Data lets analysts spot trends and
gain niche insights that help create value and innovation
much faster than conventional methods.
• However, there is more to the big data deluge than mere
volume; in particular, increasing data heterogeneity and
complexity make it difficult to extract knowledge from such
data.
• If the use of big data for drug discovery is indeed to open
new frontiers, and not merely be hype, new visions and concepts
are required to reduce data complexity and increase data
consistency across different sources.
What is the Challenge?
The three “V’s”, i.e., the Volume, Variety and
Velocity of incoming data, are what create the
challenge.
http://hlwiki.slais.ubc.ca/images/1/1a/Big_data_2013.jpg
1 PB = 1000 TB
big challenges in data storage,
processing and analysis.
Coordinated efforts from both
experimental biologists and
bioinformaticists are required
to overcome these challenges.
Big Biological Data
Open Source
Chemical Compounds
Drug Targets
10,774 Targets
Drug Discovery Through Virtual Screening
One Target, One Compound
Disease
Enzyme, Drug Target
Potential Drug
Candidate
1 Target, 1 Compound, 1 Disease = 1 Molecular Docking Run
One Compound to Many Targets
10,000 Protein
Targets
Disease-1
Disease-2
Disease-N
Potential Drug
Candidate
10,000 Targets, 1 Compound, 10,000 Diseases = Total 10,000 Molecular
Docking Runs
One Compound to Many Targets and Their Conformations
10,000 Protein
Targets
Disease-1
Disease-2
Disease-N
Potential Drug
Candidate
10,000 × 2 Target Conformations (Conf-1, Conf-2), 1 Compound, 10,000 Diseases = Total 20,000 Molecular Docking Runs
Many Compounds to Many Targets and Their Conformations
10,000 Protein
Targets
Disease-1
Disease-2
Disease-N
60,826,590 Potential Compounds
10,000 × 2 Target Conformations (Conf-1, Conf-2), 60,826,590
Compounds, 10,000 Diseases = Total 1,216,531,800,000 Molecular Docking Runs
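The run counts on the screening slides above all follow one rule: one docking run per pairing of a target conformation with a compound. A minimal Python sketch of that rule is given here; the function name and parameter names are illustrative and do not come from the slides.

```python
# Docking-run counts for the four screening scenarios on the slides.
# docking_runs() and its parameter names are illustrative assumptions.

def docking_runs(targets: int, conformations_per_target: int, compounds: int) -> int:
    """One docking run per (target conformation, compound) pair."""
    return targets * conformations_per_target * compounds

scenarios = {
    "one target, one compound": docking_runs(1, 1, 1),
    "one compound, many targets": docking_runs(10_000, 1, 1),
    "one compound, many targets, two conformations": docking_runs(10_000, 2, 1),
    "many compounds, many targets, two conformations": docking_runs(10_000, 2, 60_826_590),
}

for name, runs in scenarios.items():
    print(f"{name}: {runs:,} docking runs")
```

Adding a conformation or a compound multiplies the total rather than adding to it, which is why the last scenario jumps from thousands to over a trillion runs.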
Calculation
Suppose one docking run takes 1 minute on a single processor. Run one after another, all the runs would take:
• 1,216,531,800,000 / 60 = 20,275,530,000 hours
• 1,216,531,800,000 / (60 × 24) = 844,813,750 days
• 1,216,531,800,000 / (60 × 24 × 30) ≈ 28,160,458 months
• 1,216,531,800,000 / (60 × 24 × 30 × 12) ≈ 2,346,704 years
• 1,216,531,800,000 / (60 × 24 × 30 × 12 × 60) ≈ 39,111 lifetimes (of 60 years each)
More than 84 crore (about 845 million) processors would be needed to complete all the docking runs in less than a day; 10 crore processors would still need about 8.4 days.
An Excel sheet can accommodate 1,048,576 rows by 16,384 columns.
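The arithmetic on this slide can be rechecked with a short script. The sketch below keeps the slide's assumptions (1 minute per run, 30-day months, 12-month years, 60-year lifetimes) and adds two hypothetical helpers for the processor count. The Excel estimate further assumes one docking result per cell, which the slide does not state.

```python
import math

TOTAL_RUNS = 1_216_531_800_000        # from the previous slide
MINUTES_PER_RUN = 1                   # slide's assumption
MINUTES_PER_DAY = 60 * 24
TOTAL_MINUTES = TOTAL_RUNS * MINUTES_PER_RUN

def serial_time(total_minutes: int) -> dict:
    """Convert total minutes into the units used on the slide."""
    hours = total_minutes / 60
    days = hours / 24
    months = days / 30                # 30-day months, as on the slide
    years = months / 12
    lifetimes = years / 60            # 60-year lifetimes, as on the slide
    return {"hours": hours, "days": days, "months": months,
            "years": years, "lifetimes": lifetimes}

def processors_needed(total_minutes: int, deadline_days: float = 1) -> int:
    """Processors required to finish within deadline_days, assuming perfect parallelism."""
    return math.ceil(total_minutes / (deadline_days * MINUTES_PER_DAY))

def days_with(total_minutes: int, processors: int) -> float:
    """Wall-clock days when the runs are spread evenly over `processors`."""
    return total_minutes / (processors * MINUTES_PER_DAY)

# Assumption: one docking result stored per spreadsheet cell.
EXCEL_CELLS = 1_048_576 * 16_384
sheets_needed = -(-TOTAL_RUNS // EXCEL_CELLS)   # integer ceiling division

for unit, value in serial_time(TOTAL_MINUTES).items():
    print(f"{unit}: {value:,.0f}")
print(f"processors for a one-day deadline: {processors_needed(TOTAL_MINUTES):,}")
print(f"days on 10 crore processors: {days_with(TOTAL_MINUTES, 100_000_000):.2f}")
print(f"Excel sheets to hold one result per cell: {sheets_needed}")
```

Perfect parallelism is an idealisation; scheduling, I/O and failed runs would push the real numbers higher.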
What if the same calculations are carried out by two different methods?
Big Data requires Big resources and smart data handling methods
Supporting Tools/Languages
R is a free software environment for
statistical computing and graphics.
https://www.r-project.org/
Hadoop is an open-source framework for
storing and processing big data in a
distributed environment across clusters of
computers using simple programming models.
https://hadoop.apache.org/
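The best-known of Hadoop's "simple programming models" is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step aggregates each group. The sketch below imitates those three steps in plain Python on a single machine; it does not use any Hadoop API. The docking records, the score threshold and all function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical docking results: (target, compound, score).
# Values are invented for illustration; lower score = better binding here.
RESULTS = [
    ("target_A", "cmpd_1", -9.1),
    ("target_A", "cmpd_2", -6.3),
    ("target_B", "cmpd_1", -7.8),
    ("target_B", "cmpd_3", -8.4),
    ("target_A", "cmpd_3", -8.0),
]
THRESHOLD = -7.5  # assumed cutoff for calling a result a "hit"

def map_phase(records):
    """Emit (target, 1) for every record that passes the threshold."""
    for target, _compound, score in records:
        if score <= THRESHOLD:
            yield target, 1

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key to get hits per target."""
    return {key: sum(values) for key, values in grouped.items()}

hits_per_target = reduce_phase(shuffle(map_phase(RESULTS)))
print(hits_per_target)
```

On a real cluster the map and reduce steps would run on many machines at once, each over its own slice of the records, which is what makes the model suit workloads as large as the docking screen described earlier.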
Let’s Learn Programming Interactively
http://tryr.codeschool.com/levels/1/challenges/1
Further Reading
And After That
Thank You!
www.puneetsclassroom.in

Dgpg college kanpur_2015
