LARGE SCALE DATA IN LIFE SCIENCE
Update( new_suffix )
{
  current_suffix = active_point
  test_char = last_char in new_suffix
  done = false;
  while ( !done ) {
    if current_suffix ends at an explicit node {
      if the node has no descendant edge starting with test_char
        create new leaf edge starting at the explicit node
      else
        done = true;
    } else {
      if the implicit node's next char isn't test_char {
        split the edge at the implicit node
        create new leaf edge starting at the split in the edge
      } else
        done = true;
    }
    if current_suffix is the empty string
      done = true;
    else
      current_suffix = next_smaller_suffix( current_suffix )
  }
  active_point = current_suffix
}
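The Update pseudocode above appears to be the per-character step of an online, Ukkonen-style suffix tree construction. As a much simpler illustration of what such an index is for, here is a minimal sketch of a naive suffix trie. It is a swapped-in technique, not the algorithm in the pseudocode: it inserts every suffix character by character, so it takes quadratic time and space. The names build_suffix_trie and contains are hypothetical.

```python
# Naive suffix trie: NOT the linear-time update shown in the pseudocode.
# Every suffix is inserted character by character, so construction is O(n^2).

def build_suffix_trie(text):
    """Return a nested-dict trie containing every suffix of `text`."""
    root = {}
    for start in range(len(text)):
        node = root
        for char in text[start:]:
            # Reuse the child for this character, or create it.
            node = node.setdefault(char, {})
    return root


def contains(trie, pattern):
    """True if `pattern` is a substring of the indexed text."""
    node = trie
    for char in pattern:
        if char not in node:
            return False
        node = node[char]
    return True


if __name__ == "__main__":
    trie = build_suffix_trie("ATGCATGA")
    print(contains(trie, "GCAT"))  # substring present
    print(contains(trie, "GGG"))   # substring absent
```

Once the trie is built, a substring query costs time proportional to the pattern length only, which is why suffix structures are attractive for searching large sequence data.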
photo by http://www.photoxpress.com/stock-photos/1814937
photo by @meguu
Large-scale data in Life Science
Contents
fontin sans fonts by Jos Buivenga (exljbris). Thank You! -> www.exljbris.com
NOW IS THE NEXT-GENERATION
LARGE SCALE DATA
LIFE SCIENCE
DBCLS:
DATABASE CENTER
 FOR LIFE SCIENCE
Tazro Inutano Ohta @iNut
/ Technical Specialist
Large-scale data in Life Science
or                     or                       ?
ATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGC
ATGCATGATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCAGCATGCAT
GCATGCATGCATGCATGCATGCAT
GCATGCATGCATGCATGCATGCAT
GCATGCATGCATGCATGCATGCAT
or   or   ?
photo by Togopic, Licensed under CreativeCommons 2.1 JP Attribution
ATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGC
ATGCATGCATGCATGCATGCATGC
ATGCATGATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCATGCATGCA
TGCATGCATGCATGCAGCATGCAT
GCATGCATGCATGCATGCATGCAT
GCATGCATGCATGCATGCATGCAT
GCATGCATGCATGCATGCATGCAT

Data ID : 000001
organism : mouse
cell : nervous cell
sequencer : 454
date : 2011 12 08

photo by Togopic, Licensed under CreativeCommons 2.1 JP Attribution
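The slide above pairs raw reads with the metadata that makes them interpretable. A minimal sketch of how such a record might be modelled follows. The class name SequencingRecord and its field names are assumptions made for illustration; only the field values come from the slide, and the date is read as year, month, day.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SequencingRecord:
    """Reads plus the metadata fields shown on the slide (names are hypothetical)."""
    data_id: str
    organism: str
    cell: str
    sequencer: str
    run_date: date
    reads: tuple = ()

    def describe(self):
        """One-line summary of the metadata, without the reads."""
        return (f"{self.data_id}: {self.organism}, {self.cell}, "
                f"{self.sequencer}, {self.run_date.isoformat()}")


record = SequencingRecord(
    data_id="000001",
    organism="mouse",
    cell="nervous cell",
    sequencer="454",
    run_date=date(2011, 12, 8),
    reads=("ATGCATGCATGCATGCATGCATGC",),
)
print(record.describe())
```

Keeping the metadata next to the reads in one immutable record means a sequence cannot be passed around without the context needed to interpret it.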
Next-generation sequencing data
thanks for the information, @ma_ko   *   DNA   exon
*illumina   HiSeq 2000
https://twitter.com/#!/dritoshi/status/121817788200390656
short read from NGS
de novo assemble
reference genome: reference alignment
short read from NGS
de novo
de novo assemble
reference genome: reference alignment
Velvet
http://www.ebi.ac.uk/~zerbino/velvet/
SOAPdenovo
http://soap.genomics.org.cn/soapdenovo.html


sequence assembly in wikipedia
http://en.wikipedia.org/wiki/Sequence_assembly
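Velvet and SOAPdenovo are both de Bruijn graph assemblers. The sketch below shows the core idea on toy data only: reads are cut into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and a contig is read off by walking unambiguous edges. It is not how either tool is implemented; it ignores sequencing errors, repeats, reverse complements and coverage. The function names and the example reads are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k in `read`."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one
    successor, stopping if that successor was already visited."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy overlapping reads
    graph = build_de_bruijn(reads, k=4)
    print(walk_unambiguous(graph, "ATG"))
```

The walk stops at the first branch or dead end, which is where real assemblers need extra evidence to decide how to continue.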
short read from NGS
de novo assemble
reference genome: reference alignment
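Reference alignment places each short read on a known genome. The following sketch covers only the simplest case, exact matching by repeated substring search. Real aligners tolerate mismatches and gaps and rely on a precomputed index of the reference, neither of which is shown here. The function name align_exact and the toy sequences are hypothetical.

```python
def align_exact(reference, read):
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Search again one base further on so overlapping hits are found.
        start = reference.find(read, start + 1)
    return positions


reference = "ATGGCGTGCAATGGCG"      # toy reference genome
reads = ["GGCG", "TGCA", "CCCC"]    # toy short reads
hits = {read: align_exact(reference, read) for read in reads}
print(hits)
```

A read that maps to several positions, or to none, is reported as such. Deciding what to do with those reads is part of what makes alignment at scale difficult.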
Chr1   Chr2   Chr3




CPU1   CPU2   CPU3
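The Chr/CPU pairing above suggests splitting work by chromosome so that each CPU handles one independently. A minimal sketch with Python's standard concurrent.futures module follows. The per-chromosome task, a GC-fraction count, is a placeholder assumption standing in for whatever analysis is actually run, and the function names and toy sequences are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def gc_fraction(sequence):
    """Fraction of G and C bases; stands in for any per-chromosome analysis."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def analyze_by_chromosome(chromosomes, executor):
    """Run gc_fraction on each chromosome through the given executor
    and return the results keyed by chromosome name."""
    names = list(chromosomes)
    results = executor.map(gc_fraction, (chromosomes[name] for name in names))
    return dict(zip(names, results))


if __name__ == "__main__":
    chromosomes = {"Chr1": "ATGCATGC", "Chr2": "GGGGCCCC", "Chr3": "ATATATAT"}
    # One worker process per chromosome, mirroring the Chr1/CPU1 pairing.
    with ProcessPoolExecutor(max_workers=3) as pool:
        print(analyze_by_chromosome(chromosomes, pool))
```

This split only works when the chromosomes can be processed independently. Chromosomes also differ greatly in length, so the largest one tends to set the total running time.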
Troubles not yet solved
https://twitter.com/#!/dritoshi/status/110559890413600768
https://twitter.com/#!/dritoshi/status/113546074760822784
https://twitter.com/#!/dritoshi/status/114675417998311425
http://bcbio.wordpress.com/tag/galaxy/
ITPro   http://itpro.nikkeibp.co.jp/article/NEWS/20110927/369510/
thanks for the information, @yag_ays!   asahi.com   http://www.asahi.com/digital/bcnnews/BCN201111240007.html