Software and Hardware Infrastructures to conquer Data Explosion in Life Science - Life Science Network Basel

Published in: Technology, Business

  1. © 2012 IBM Corporation. Software and Hardware Infrastructures to conquer Data Explosion in Life Science - Life Science Network Basel. Romeo Kienzler, Data Scientist and Architect, Postgraduate in Information Systems and Bioinformatics, IBM Innovation Center Zurich, romeo.kienzler@ch.ibm.com, https://www.ibm.com/developerworks/mydeveloperworks/profiles/user/RomeoKienzler
  2. Outline ● Data Growth ● Data Growth in Life Science ● BigData in Life Science ● How to address BigData? ● Outlook
  3. Data Growth (chart labels: data AVAILABLE to an organization; data an organization can PROCESS; missed opportunity). 100 million tweets are posted every day, 35 hours of video are uploaded every minute, 6.1 x 10^12 text messages were sent in 2011, and 247 x 10^9 e-mails passed through the net, 80 % of them spam and viruses. => Filtering is becoming more and more important. As much data has been produced between 2003 and now as in all the time up to 2003.
  4. New Data Sources in Life Sciences ● DNA (RNA) Sequencing ● Next-Generation Sequencing ● DNA Transistor ● Imaging and Video ● Unstructured Text
  5. Data Growth in Life Sciences (Source: www.osehra.org)
  6. Data Growth in Life Sciences (Source: www.crops.org)
  7. Examples - NGS
  8. Images and Videos (Source: www.phys.org)
  9. Examples - Text Analytics (Source: www.theglobalistreport.com)
  10. Watson
  11. SIIB (Strategic IP Insight Platform) ● Integrated chemical, biological and textual search ● Deep analytics on scientific literature and patents ● Aggregation of worldwide patent data and scientific literature (30M+ docs) with ongoing updates
  12. The challenge ● Store a huge amount of data ● Process a huge amount of data (incl. Search/Find) ● Don't consume too much energy
  13. Use many Hard Drives
  14. Use many Hard Drives
  15. Use many Hard Drives - Limits (*): 300 crashes per day, data loss after two weeks. (*) Given a disk capacity of 25 TB.
  16. Separate the Signal From the Noise¹ ¹http://www.ibmsystemsmag.com/power/businessstrategy/BI-and-Analytics/signal_noise/
  17. Store only what you need
  18. Use many CPUs. Supercomputers before: ➔ Weather ➔ Atom Bombs ➔ Science ➔ Crash Tests. Supercomputer in a rack: ➔ 18 TB main memory, 1008 CPU cores, 113 TFLOPS (1st in TOP500, 2013: 17590 TFLOPS; 2004: 71 TFLOPS)
  19. (no text)
  20. Use specialized CPUs: GPUs (Sources: www.ethz.ch, www.nvidia.com)
  21. Use specialized CPUs: FPGAs (Source: www.virtex.com)
  22. Example FPGA: IBM Pure Data ● Up to 1.28 PB storage ● Up to 10 racks ● Up to 500 GB/s throughput ● Up to 1120 FPGAs + 1120 Intel CPU cores / 960 hard drives
  23. Example FPGA: Convey Computer ● Accelerates BWA by 15x ● Accelerates Smith-Waterman (Source: www.conveycomputer.com)
  24. Example: Algorithms (Source: www.biomedcentral.com/1471-2105/9/S2/S10)
  25. Example: Cloud ● Managed Infrastructure ● Dynamic Provisioning ● Specialized HW ● SaaS (Source: www.basespace.illumina.com)
  26. Conclusion ● Main BigData sources are sequences and plain text ● Many others to come (e.g. images and videos) ● Store data on many commodity hard drives (energy problem not solved) ● Filter signal from noise ● Process data on many CPUs ● Use specialized hardware / CPUs ● Research the performance of algorithms
  27. Outlook ● Currently very heterogeneous infrastructures ● Trends: Virtualization, Standardization, Consumerization ● Limits: Space, Energy consumption ● What shall I do? RELAX
  28. The future will be full of surprises ● A battery-powered, pocket-size supercomputer? ● Raspberry Pi ● Parallella
  29. Acknowledgements: Slides 14 to 16 and 21 were taken from a keynote speech by Axel Köster, IBM Germany
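
Slide 22 quotes "up to" 1.28 PB of storage and "up to" 500 GB/s of throughput. A rough back-of-the-envelope sketch of what those two figures imply for one full pass over the data follows. It assumes decimal units (1 TB = 10^12 bytes) and that the peak throughput is sustained for the whole scan; neither assumption comes from the slides, and the function name is purely illustrative.

```python
# Illustrative arithmetic only; unit sizes and sustained peak throughput are assumptions.
TB = 10**12  # decimal terabyte, assumed
GB = 10**9   # decimal gigabyte, assumed


def full_scan_seconds(capacity_bytes, throughput_bytes_per_s):
    """Time to read every stored byte once at a constant throughput."""
    return capacity_bytes / throughput_bytes_per_s


capacity = 1280 * TB     # 1.28 PB, the upper bound quoted on slide 22
throughput = 500 * GB    # 500 GB/s, the upper bound quoted on slide 22

seconds = full_scan_seconds(capacity, throughput)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")
```

Under these assumptions a full scan takes on the order of tens of minutes, which is why such appliances push filtering down to the hardware close to the disks.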
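
Slide 23 names Smith-Waterman as one of the algorithms that benefits from FPGA acceleration. For readers unfamiliar with it, here is a minimal plain-CPU sketch of the scoring step of Smith-Waterman local alignment. It is not the accelerated implementation referred to on the slide; the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local, not global)
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman("TTACGTTT", "GGACGGG"))
```

Every cell depends only on its three upper-left neighbours, so all cells on one anti-diagonal can be computed in parallel; this regular structure is what makes the algorithm a natural fit for specialized hardware.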
