Big Data, Big Challenge in Drug Discovery


A deliberately simple presentation on the current Big Data challenge in bioinformatics. It presents a case study using one of the computational methods for drug discovery. The cost of developing a new drug is increasing dramatically every year, along with the challenges associated with it. The big data approach is entering drug discovery slowly but steadily. We believe effective use of big data would be highly beneficial for making several crucial decisions throughout the drug discovery process. Data management using Hadoop and analysis using the R programming environment are also discussed.

Big Data, Big Challenge.
Puneet Kacker, Kanpur
08-OCT-2015

What is Big Data?
• Big Data is data that is too large, complex and dynamic for conventional data tools to capture, store, manage and analyze.
• The right use of Big Data allows analysts to spot trends and gain niche insights that help create value and innovation much faster than conventional methods.
• However, there is more to the big data deluge than mere volume; in particular, increasing data heterogeneity and complexity make it difficult to extract knowledge from such data.
• If the use of big data for drug discovery is indeed to open new frontiers, and not only be hype, new visions and concepts are required to reduce data complexity and increase data consistency across different sources.

What is the Challenge?
The three “V’s” (Volume, Variety and Velocity) of incoming data are what create the challenge.
http://hlwiki.slais.ubc.ca/images/1/1a/Big_data_2013.jpg
1 PB = 1000 TB
Data at this scale poses big challenges in storage, processing and analysis.
Coordinated efforts from both experimental biologists and bioinformaticists are required to overcome these challenges.

Drug Discovery Through Virtual Screening

One Target, One Compound
Disease
Enzyme, Drug Target
Potential Drug
Candidate
1 Target, 1 Compound, 1 Disease = 1 Molecular Docking Run

One Compound to Many Targets
10,000 Protein
Targets
Disease-1
Disease-2
Disease-N
Potential Drug
Candidate
10,000 Targets, 1 Compound, 10,000 Diseases = Total 10,000 Molecular
Docking Runs

One Compound to Many Targets and Their Conformations
10,000 Protein
Targets
Disease-1
Disease-2
Disease-N
Potential Drug
Candidate
10,000 × 2 Target Conformations (Conf-1, Conf-2), 1 Compound, 10,000 Diseases = Total 20,000 Molecular Docking Runs

Many Compounds to Many Targets and Their Conformations
10,000 Protein
Targets
Disease-1
Disease-2
Disease-N
60,826,590 Potential Compounds
10,000 × 2 Target Conformations (Conf-1, Conf-2), 60,826,590 Compounds, 10,000 Diseases = Total 1,216,531,800,000 Molecular Docking Runs
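
The run counts on these slides follow from multiplying targets, conformations per target and compounds. A minimal Python sketch of that arithmetic is given here; the function name `docking_runs` and the optional `methods` factor are illustrative assumptions, not part of the original slides.

```python
def docking_runs(targets: int, conformations: int, compounds: int, methods: int = 1) -> int:
    """Count docking runs when every compound is docked against every
    conformation of every target, once per docking method (assumed model)."""
    return targets * conformations * compounds * methods


# Scenarios taken from the slides (10,000 targets, 2 conformations,
# 60,826,590 compounds).
scenarios = {
    "one target, one compound": docking_runs(1, 1, 1),
    "one compound, many targets": docking_runs(10_000, 1, 1),
    "one compound, many targets and conformations": docking_runs(10_000, 2, 1),
    "many compounds, many targets and conformations": docking_runs(10_000, 2, 60_826_590),
}

for name, runs in scenarios.items():
    print(f"{name}: {runs:,} docking runs")
```

Passing `methods=2` to the same function doubles every count, which is one way to read the later question about repeating the calculations with two different methods.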

Calculation
Suppose one docking run takes 1 minute on a single processor. The 1,216,531,800,000 runs then need 1,216,531,800,000 minutes:
• 1,216,531,800,000 / 60 = 20,275,530,000 hours
• 1,216,531,800,000 / (60 × 24) = 844,813,750 days
• 1,216,531,800,000 / (60 × 24 × 30) ≈ 28,160,458 months
• 1,216,531,800,000 / (60 × 24 × 30 × 12) ≈ 2,346,704 years
• 1,216,531,800,000 / (60 × 24 × 30 × 12 × 60) ≈ 39,111 human lifetimes of 60 years each
About 844,813,750 processors (roughly 84.5 crore) would be needed to complete all the docking runs within a day; with 10 crore processors the runs would still take about 8.4 days.
An Excel sheet can accommodate only 1,048,576 rows by 16,384 columns.
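
The time estimate can be reproduced with a short Python sketch. It assumes, as the slide does, one minute per docking run, and additionally assumes perfect parallel scaling with no scheduling or I/O overhead; the function names are hypothetical.

```python
TOTAL_RUNS = 1_216_531_800_000   # docking runs from the previous slide
EXCEL_MAX_ROWS = 1_048_576       # row limit of one Excel worksheet
MINUTES_PER_DAY = 24 * 60


def wall_clock_days(runs: int, processors: int, minutes_per_run: int = 1) -> float:
    """Days needed if the runs are spread evenly over the processors
    (assumes ideal scaling, which real clusters will not reach)."""
    return runs * minutes_per_run / processors / MINUTES_PER_DAY


def processors_needed(runs: int, deadline_days: int, minutes_per_run: int = 1) -> int:
    """Smallest processor count that finishes within the deadline
    (integer ceiling division avoids floating-point rounding)."""
    total_minutes = runs * minutes_per_run
    deadline_minutes = deadline_days * MINUTES_PER_DAY
    return -(-total_minutes // deadline_minutes)


def excel_sheets_needed(rows: int) -> int:
    """Worksheets required to store one result row per docking run."""
    return -(-rows // EXCEL_MAX_ROWS)


print(f"single processor: {wall_clock_days(TOTAL_RUNS, 1):,.0f} days")
print(f"10 crore processors: {wall_clock_days(TOTAL_RUNS, 100_000_000):.2f} days")
print(f"processors for a 1-day deadline: {processors_needed(TOTAL_RUNS, 1):,}")
print(f"Excel sheets for one row per run: {excel_sheets_needed(TOTAL_RUNS):,}")
```

Changing `minutes_per_run` or the deadline shows how sensitive the resource estimate is to the per-run cost, which in practice varies with the docking program and the size of the target.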

What if the same calculations are carried out by two different methods?

Big Data requires Big resources and smart data handling methods

Supporting Tools/Languages
R is a free software environment for
statistical computing and graphics.
https://www.r-project.org/
Hadoop is an open-source framework for storing and
processing big data in a distributed environment
across clusters of computers using simple
programming models.
https://hadoop.apache.org/
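
To give a feel for the "simple programming models" mentioned above, here is a minimal sketch of the map, shuffle and reduce pattern in plain Python. It does not use the Hadoop API and runs on a single machine; the records, identifiers and function names are hypothetical, and it assumes that a lower docking score means a better binding pose.

```python
from collections import defaultdict

# Hypothetical docking results: (compound_id, target_id, docking_score).
RESULTS = [
    ("cmpd-1", "target-A", -7.2),
    ("cmpd-2", "target-A", -9.1),
    ("cmpd-1", "target-B", -5.4),
    ("cmpd-3", "target-B", -6.8),
]


def map_phase(record):
    """Map: emit one (key, value) pair per record, keyed by target."""
    compound, target, score = record
    yield target, (score, compound)


def shuffle(pairs):
    """Shuffle: group all values that share a key (a framework such as
    Hadoop would do this step across machines)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(target, values):
    """Reduce: keep the best-scoring compound for one target
    (assumption: lower score is better)."""
    score, compound = min(values)
    return target, compound, score


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return [reduce_phase(target, values) for target, values in sorted(grouped.items())]


print(run_job(RESULTS))
```

The point of the pattern is that the map and reduce steps work on independent pieces of data, so a distributed framework can run them on many machines at once.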

Let’s Learn Programming Interactively
http://tryr.codeschool.com/levels/1/challenges/1



5. Big Biological Data

6. Open Source

7. Chemical Compounds

8. Drug Targets 10,774 Targets


20. Further Reading

21. And After That

22. Thank You! www.puneetsclassroom.in
