Big Data and Genomics
Al Costa – Alkol Biotech
Low sequencing costs = lots of data
As the cost of sequencing a genome decreases, the DNA of more and
more organisms becomes publicly available, meaning more data.
Low sequencing costs = lots of data
The problem grows if we consider initiatives such as the 1000 Genomes
Project, or if we were to sequence everyone in the US today (313 exabytes).
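As a rough sanity check on the 313 exabyte figure, the sketch below divides it by an assumed US population of about 313 million. The population is an assumption (the slide does not state one), as is the use of decimal units. Under those assumptions the estimate works out to roughly one terabyte of data per person.

```python
# Back-of-the-envelope check of the "313 exabytes" figure from the slide.
# ASSUMPTION: a US population of ~313 million (not stated in the slide).
# ASSUMPTION: decimal (SI) units, i.e. 1 EB = 10**18 bytes.
EXABYTE = 10**18
TERABYTE = 10**12

total_bytes = 313 * EXABYTE
assumed_population = 313_000_000

bytes_per_person = total_bytes // assumed_population
terabytes_per_person = bytes_per_person / TERABYTE

print(f"{terabytes_per_person:.1f} TB per person")
```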
Genomics = lots of data
In fact, the number of “bytes” involved in each genome is in the
range of millions to billions.
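One way to see where "millions to billions" comes from: each base is one of four letters (A, C, G, T), so it needs at least 2 bits. The sketch below applies that to two approximate genome sizes, which are assumptions added here for illustration (about 4.6 million bases for E. coli, about 3.1 billion for human). The helper name `packed_genome_bytes` is hypothetical.

```python
def packed_genome_bytes(num_bases: int, bits_per_base: int = 2) -> int:
    """Minimum number of bytes needed to store num_bases bases.

    With four possible letters (A, C, G, T), 2 bits per base is the minimum.
    """
    total_bits = num_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


# ASSUMPTION: approximate genome sizes, used only for illustration.
approx_genome_sizes = {
    "E. coli": 4_600_000,
    "human": 3_100_000_000,
}

for name, bases in approx_genome_sizes.items():
    megabytes = packed_genome_bytes(bases) / 1e6
    print(f"{name}: about {megabytes:.2f} MB at 2 bits per base")
```

This is a lower bound on the finished sequence only. Raw sequencing output is considerably larger, since each position is read many times and each read carries quality scores.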
Genomics = lots of data
And if you are still unconvinced, take the “MinION”, by UK company
Oxford Nanopore Technologies, which sells for US$900, is the size of a
USB stick, and can sequence a human genome in 8 hours.
Comparative genomics = promising
However, if we compare the genomes of different species, we’ll realize
they share a lot of common ground
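A minimal sketch of how "common ground" between two sequences can be quantified: split each sequence into overlapping words of length k (k-mers) and measure the overlap of the two sets with the Jaccard index. This is a simplification chosen for illustration, not the method used in the talk, and the function names and the default k are assumptions.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all overlapping substrings of length k."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Fraction of distinct k-mers shared by both sequences (0.0 to 1.0)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0  # both sequences are shorter than k
    return len(a & b) / len(a | b)


# Toy sequences, invented for illustration (not real genomic data).
print(jaccard_similarity("ACGTACGTTAGC", "ACGTACGTTAGC"))  # identical inputs
print(jaccard_similarity("AAAAAAAA", "CCCCCCCC"))          # nothing in common
```

A value near 1.0 means the two sequences are built from largely the same k-mers. A value near 0.0 means they share almost none.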
Tools used = complicated
For genomics data we use ADAM, BLAST and several comparison tools.
ADAM is an open-source, high-performance, distributed platform for
genomic analysis. ADAM defines:
1 - A data schema and layout on disk
2 - A Scala API
3 - A command line interface
BLAST is an alignment tool that finds regions of similarity between
sequences, which helps place “shotgun” chunks within the entire strand.
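To illustrate what local alignment means, here is a minimal sketch of Smith-Waterman scoring, the classic dynamic-programming formulation of the problem. It is not BLAST's own algorithm (BLAST uses a faster heuristic search), and the scoring values for match, mismatch and gap are assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman).

    Only two rows of the scoring matrix are kept in memory at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below zero: a local alignment can restart anywhere.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences, invented for illustration: both contain the chunk "ACGT".
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

The score rewards the best-matching stretch and ignores the unrelated flanks, which is why local alignment suits short chunks compared against long sequences.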
An example = our project
We are currently using Big Data to find promising strands among millions
of DNA sequences, using the tools described above, as I’ll explain now.
How we use it = to build new crops
The biobased industry (biofuels, bioplastics, etc.) is currently
trying to adapt to unsuitable feedstocks. That is exactly the opposite of
what mankind did with food, where it adapted crops to its feeding needs.
Sugarcane = much more than sugar!
Among the feedstocks currently used by the biobased industries, one
stands out: sugarcane. However, it currently grows only in tropical
regions. A pity, considering the number of products derived from it.
EUnergyCane = European sugarcane
Thus, being able to adapt sugarcane to grow in Europe would mean a lot
of new products being sustainably produced. We are halfway through that
project with our EUnergyCane variety, the only genuinely European one.
A pine tree and an edelweiss?
Perhaps the only thing a pine tree and an edelweiss have in common
is that both can withstand cold places.
Looking for a philosopher’s stone
Thus, a comparison between the DNA of the pine tree and that of the
edelweiss should reveal common regions, one of which may be responsible,
for example, for giving a crop the ability to withstand the cold.
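The idea of looking for a common region can be sketched with the simplest possible case: the longest stretch that appears, letter for letter, in both sequences. This is an illustration under strong assumptions (exact matches only, toy sequences invented here, a hypothetical function name). Real conserved regions tolerate mutations and would need alignment tools such as those described earlier.

```python
def longest_common_region(a: str, b: str) -> tuple:
    """Return (region, start_in_a, start_in_b) for the longest exact shared substring.

    Uses dynamic programming over the lengths of common suffixes,
    keeping only two rows of the table in memory.
    """
    best_len, end_a, end_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the match ending just before
                if curr[j] > best_len:
                    best_len, end_a, end_b = curr[j], i, j
        prev = curr
    return a[end_a - best_len:end_a], end_a - best_len, end_b - best_len


# Toy sequences, invented for illustration (not real pine or edelweiss DNA).
species_a = "TTGACCATGG"
species_b = "CGACCATAAA"
region, pos_a, pos_b = longest_common_region(species_a, species_b)
print(region, pos_a, pos_b)
```

The positions are zero-based offsets into each input, so a shared region can be traced back to where it sits in each species.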
How we use it = to build new crops
This is how we develop our work: by analyzing DNA strands of crops which
can resist the cold in order to find that “Philosopher’s Stone” which, when
inserted into sugarcane, would make it able to grow in Europe. For that,
new techniques such as CRISPR/Cas9 avoid the use of plasmids and
GMOs.
Conclusion = big data is much more
Big Data is not only for gathering customer data at banks and telcos; it is
also a valuable tool for finding new and unsuspected data in any area of
human knowledge.
Its use in genomics may allow us to find cures for otherwise incurable
diseases, develop new crops with increased capabilities, and much more.
Thank you
alcosta@alkolbiotech.co.uk
