Software and Hardware Infrastructures to conquer Data Explosion in Life Science - Life Science Network Basel

Transcript

  • 1. © 2012 IBM Corporation
    Software and Hardware Infrastructures to conquer Data Explosion in Life Science - Life Science Network Basel
    Romeo Kienzler, Data Scientist and Architect, Postgraduate in Information Systems and Bioinformatics
    IBM Innovation Center Zurich
    romeo.kienzler@ch.ibm.com
    https://www.ibm.com/developerworks/mydeveloperworks/profiles/user/RomeoKienzler
  • 2. Outline
    ● Data Growth
    ● Data Growth in Life Science
    ● BigData in Life Science
    ● How to address BigData?
    ● Outlook
  • 3. Data Growth
    [Figure: data AVAILABLE to an organization vs. data an organization can PROCESS; the gap between them is labeled "Missed opportunity"]
    100 million tweets are posted every day, 35 hours of video are uploaded every minute, 6.1 × 10^12 text messages were sent in 2011, and 247 × 10^9 e-mails passed through the net, 80% of them spam and viruses. Filtering is therefore becoming more and more important.
    The amount of data produced up to 2003 equals the amount produced between 2003 and now.
  • 4. New Data Sources in Life Sciences
    ● DNA (RNA) Sequencing
    ● Next-Generation Sequencing
    ● DNA Transistor
    ● Imaging and Video
    ● Unstructured Text
  • 5. Data Growth in Life Sciences Source: www.osehra.org
  • 6. Data Growth in Life Sciences Source: www.crops.org
  • 7. Examples - NGS
  • 8. Images and Videos Source: www.phys.org
  • 9. Examples – Text Analytics Source: www.theglobalistreport.com
  • 10. Watson
  • 11. SIIP (Strategic IP Insight Platform)
    ● Integrated chemical, biological and textual search
    ● Deep analytics on scientific literature and patents
    ● Aggregation of worldwide patent data and scientific literature (30M+ docs) with ongoing updates
  • 12. The challenge
    ● Store a huge amount of data
    ● Process a huge amount of data (incl. search/find)
    ● Don't consume too much energy
  • 13. Use many Hard Drives
  • 14. Use many Hard Drives
  • 15. Use many Hard Drives - Limits
    (*) Given a disk capacity of 25 TB: 300 crashes per day, data loss after two weeks
  • 16. Separate the Signal From the Noise¹
    ¹ http://www.ibmsystemsmag.com/power/businessstrategy/BI-and-Analytics/signal_noise/
  • 17. Store only what you need
  • 18. Use many CPUs
    Supercomputers, traditionally used for:
    ➔ Weather
    ➔ Atomic bombs
    ➔ Science
    ➔ Crash tests
    Supercomputer in a rack:
    ➔ 18 TB main memory, 1008 CPU cores, 113 TFLOPS (No. 1 on the TOP500 list: 17,590 TFLOPS in 2013, 71 TFLOPS in 2004)
  • 19.
  • 20. Use specialized CPUs: GPUs Sources: www.ethz.ch, www.nvidia.com
  • 21. Use specialized CPUs: FPGAs Source: www.virtex.com
  • 22. Example FPGA: IBM PureData
    ● Up to 1.28 PB storage
    ● Up to 10 racks
    ● Up to 500 GB/s throughput
    ● Up to 1120 FPGAs + 1120 Intel CPU cores / 960 hard drives
  • 23. Example FPGA: Convey Computer
    ● Accelerates BWA by 15x
    ● Accelerates Smith-Waterman
    Source: www.conveycomputer.com
  • 24. Example: Algorithms Source: www.biomedcentral.com/1471-2105/9/S2/S10
  • 25. Example: Cloud
    ● Managed Infrastructure
    ● Dynamic Provisioning
    ● Specialized HW
    ● SaaS
    Source: www.basespace.illumina.com
  • 26. Conclusion
    ● The main BigData sources are sequences and plain text
    ● Many others are to come (e.g. images and videos)
    ● Store data on many commodity hard drives (the energy problem is not solved)
    ● Filter the signal from the noise
    ● Process data on many CPUs
    ● Use specialized hardware/CPUs
    ● Research into the performance of algorithms
  • 27. Outlook
    ● Currently very heterogeneous infrastructures
    ● Trends:
      ● Virtualization
      ● Standardization
      ● Consumerization
    ● Limits:
      ● Space
      ● Energy consumption
    ● What shall I do?
      ● RELAX
  • 28. The future will be full of surprises
    A battery-powered, pocket-sized supercomputer? Raspberry Pi, Parallella
  • 29. Acknowledgements
    Slides 14-16 and 21 were taken from a keynote speech by Axel Köster, IBM Germany
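The headline figures on slide 3 are easier to compare once they are converted to common units. The sketch below uses only the numbers quoted on that slide; the unit conversions and variable names are added for illustration, the e-mail count is taken for whatever period the slide intends (it does not state one), and "after filtering" assumes a perfect filter that removes exactly the 80% spam share.

```python
# Headline figures quoted on slide 3; only the unit conversions below are added.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

tweets_per_day = 100e6        # "100 Million Tweets are posted every day"
video_hours_per_minute = 35   # "35 hours of video are being uploaded every minute"
emails = 247e9                # "247 x 10^9 E-Mails" (period as given on the slide)
spam_fraction = 0.80          # "80 % spam and viruses"

tweets_per_second = tweets_per_day / SECONDS_PER_DAY
video_hours_per_day = video_hours_per_minute * MINUTES_PER_DAY
# Assumption: a perfect filter removes exactly the spam share and nothing else.
legitimate_emails = emails * (1 - spam_fraction)

print(f"tweets per second:       {tweets_per_second:,.0f}")
print(f"hours of video per day:  {video_hours_per_day:,}")
print(f"e-mails after filtering: {legitimate_emails:.3g}")
```

That is roughly 1,157 tweets per second and 50,400 hours of new video per day, and even a perfect spam filter still leaves about 4.9 × 10^10 e-mails.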
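Slide 15 states the limit of simply adding hard drives without showing the calculation behind it. A common back-of-the-envelope model, assumed here rather than taken from the keynote: expected failures per day grow linearly with the number of drives, the rebuild window grows with drive capacity, and data is lost when another drive in the same redundancy group fails before the rebuild finishes. Only the 25 TB capacity comes from the slide; the annual failure rate, rebuild speed, group size and fleet size are hypothetical parameters.

```python
import math

HOURS_PER_YEAR = 24 * 365


def expected_failures_per_day(n_disks: int, annual_failure_rate: float) -> float:
    """Expected drive failures per day, assuming independent failures at a constant rate."""
    return n_disks * annual_failure_rate / 365


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite a failed drive's full capacity (decimal TB and MB)."""
    return capacity_tb * 1e12 / (rebuild_mb_per_s * 1e6) / 3600


def p_failure_during_rebuild(group_size: int, annual_failure_rate: float,
                             window_hours: float) -> float:
    """Probability that at least one other drive of the same redundancy group fails
    within the rebuild window, assuming exponentially distributed drive lifetimes."""
    rate_per_hour = annual_failure_rate / HOURS_PER_YEAR
    return 1 - math.exp(-(group_size - 1) * rate_per_hour * window_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 25 TB drives (from slide 15), 100 MB/s rebuild speed,
    # 3% annual failure rate, 8-drive redundancy group, 1,000,000 drives.
    window = rebuild_hours(25, 100)
    print(f"rebuild window: {window:.1f} h")
    print(f"failures per day: {expected_failures_per_day(1_000_000, 0.03):.0f}")
    print(f"P(second failure during rebuild): {p_failure_during_rebuild(8, 0.03, window):.4f}")
```

With these assumed inputs, a single 25 TB drive needs roughly 69 hours to rebuild at 100 MB/s; the longer that window, the more of the daily crashes overlap with a second failure in the same group.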
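Slides 16 and 17 ("Separate the Signal From the Noise", "Store only what you need") apply directly to sequencing output. As one possible illustration, not taken from the talk, the sketch below streams FASTQ records and keeps only reads whose mean base quality reaches a threshold. It assumes the Phred+33 (Sanger) quality encoding; the threshold of 20 and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality string) of one FASTQ record
Read = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Group a stream of FASTQ lines into 4-line records; an incomplete trailing record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_good_reads(reads: Iterable[Read], min_mean_q: float = 20.0) -> Iterator[Read]:
    """Yield only reads whose mean quality reaches the (hypothetical) threshold."""
    for read in reads:
        if mean_quality(read[2]) >= min_mean_q:
            yield read


if __name__ == "__main__":
    example = [
        "@read1", "ACGT", "+", "IIII",  # 'I' encodes Phred 40
        "@read2", "TTTT", "+", "####",  # '#' encodes Phred 2
    ]
    for header, seq, _ in keep_good_reads(parse_fastq(example)):
        print(header, seq)
```

Because both functions are generators, an open file handle can be passed to `parse_fastq` directly, so reads are filtered as they stream past instead of being stored first.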
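Slide 18's advice to use many CPUs only pays off when the work can be split into independent pieces. A minimal sketch of that split, process, merge pattern, using the GC content of a DNA sequence as a stand-in workload (an illustrative choice, not an example from the talk); the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(seq: str, n_chunks: int) -> List[str]:
    """Cut seq into at most n_chunks contiguous pieces of near-equal size."""
    size = max(1, -(-len(seq) // n_chunks))  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def gc_count(chunk: str) -> int:
    """Number of G and C bases in one chunk; chunks are independent, so no coordination is needed."""
    upper = chunk.upper()
    return upper.count("G") + upper.count("C")


def gc_content_parallel(seq: str, workers: int = 4) -> float:
    """Fraction of G/C bases, computed chunk by chunk and merged by summing partial counts."""
    if not seq:
        return 0.0
    chunks = split_into_chunks(seq, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(gc_count, chunks))
    return total / len(seq)


if __name__ == "__main__":
    print(gc_content_parallel("ACGT" * 1000, workers=8))
```

`ThreadPoolExecutor` keeps the sketch portable, but in CPython threads do not run pure-Python code on several cores at once. `ProcessPoolExecutor` offers the same `map` interface with one worker process per core; the calling code should then sit under the `if __name__ == "__main__":` guard.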
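A quick sanity check on the IBM PureData figures on slide 22: assuming the maximum capacity and the maximum throughput apply to the same configuration (the slide does not say so), reading the whole store once takes capacity divided by throughput. Decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) are assumed.

```python
PB = 10**15  # decimal petabyte (assumed)
GB = 10**9   # decimal gigabyte (assumed)


def full_scan_seconds(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to read every byte once at a sustained throughput."""
    return capacity_bytes / throughput_bytes_per_s


seconds = full_scan_seconds(1.28 * PB, 500 * GB)
print(f"full scan: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

With the slide's figures this gives about 2,560 s, roughly 43 minutes, for a full pass over 1.28 PB.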
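Slide 23 names Smith-Waterman as one of the accelerated kernels. For reference, the sketch below computes the Smith-Waterman local alignment score with a linear gap penalty in plain Python. The scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions, and the function returns only the best score, not the alignment itself. Its running time grows with the product of the two sequence lengths, which is why the algorithm is a frequent target for FPGA and GPU acceleration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    Only two rows of the scoring matrix are kept, so memory grows with len(b)
    while time grows with len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap in a
            # Clamping at 0 lets an alignment start anywhere, which makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTAA", "GGACGTCC"))
```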
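Slide 25 lists dynamic provisioning as a cloud benefit. One simple way to picture it, assumed here and not describing BaseSpace or any other specific service, is a rule that sizes the worker pool to the current job queue within fixed bounds. `jobs_per_worker`, `min_workers` and `max_workers` are hypothetical parameters.

```python
import math


def workers_needed(queued_jobs: int, jobs_per_worker: int,
                   min_workers: int = 0, max_workers: int = 100) -> int:
    """Number of workers to provision for the current queue, clamped to [min_workers, max_workers]."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    for queue in (0, 95, 5000):
        print(queue, "queued jobs ->", workers_needed(queue, jobs_per_worker=10), "workers")
```

Re-evaluated periodically, such a rule grows the pool when a batch of sequencing runs arrives and shrinks it again once the queue drains.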
