Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database

This presentation is from the 2012 Strata Conference and looks at the synergies of column storage and map-reduce.

The presentation, “Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database,” was given in March 2012 by Fernanda Foertter, HPC Scientific Programmer at Genus plc, and Jim Tommaney, CTO at Calpont. They discussed how the team at Genus found an innovative way to store and access the huge volumes of data generated by modeling genotypes. The presentation also covered the benefits of column storage and how InfiniDB’s built-in map-reduce supports high-performance Big Data analytics.

A copy of the presentation is on YouTube at:
http://www.youtube.com/watch?v=m55CeVYCTSk&feature=player_detailpage


Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database Presentation Transcript

  • 1. Calpont InfiniDB®: Accelerating Data Insights. Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database. Strata Conference 2012. Calpont Proprietary and Confidential
  • 2. Today’s Agenda
      • Introduction of today’s speakers
      • What is InfiniDB?
      • Announced today: InfiniDB 3
      • Huge Data Analytics: InfiniDB Empowers New Research with The World’s First Searchable Genotype Database
      • Questions
      • More information and resources
    InfiniDB® Scalable. Fast. Simple. Copyright © 2012 Calpont. All Rights Reserved.
  • 3. Today’s Presenters
      • Fernanda Foertter, HPC Administrator / Scientific Programmer, Genus plc
      • Jim Tommaney, Chief Technology Officer, Calpont Corporation
  • 4. What is InfiniDB?
  • 5. Calpont Corporation
      • Company
          o Privately held and backed
          o Offices: Dallas (Headquarters), Silicon Valley
      • Business
          o Scale-out MPP analytic database
          o MySQL Columnar + Map Reduction
          o Commercial Open Core model
      • Products
          o InfiniDB Enterprise: forthcoming 4th major release
          o InfiniDB Community: modified Open Source license
      • Calpont Mission: To provide a highly scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
  • 6. Innovative Companies Turning to InfiniDB
  • 7. What is InfiniDB®? Scalable. Fast. Simple.
  • 8. What is InfiniDB?
      • Big Data Analytics Engine
      • Full-Featured SQL
      • Familiar MySQL Look and Feel
      • Game Changing Performance
  • 9. Focus on Analytics Workloads. InfiniDB is …
      • Engineered for large queries
      • Engineered for ad-hoc flexibility
      • Analytics, not OLTP
      • Unique combination of columnar + map-reduce
  • 10. What is InfiniDB®? Scalable. Fast. Simple.
  • 11. InfiniDB: Two Tier Architecture or … Single Server. Purpose built for big data analytics.
      • User Module (UM): Understands SQL.
      • Performance Module (PM): Operates on data blocks.
  • 12. InfiniDB® Performance Foundations: The Power and Scale of Map-Reduce plus Transformational I/O Efficiency
  • 13. Power and Scalability of Map-Reduce
      • Map (↓): SQL operations are mapped to Performance Module threads
          o Parallel/Distributed Data Access
          o Parallel/Distributed Joins (Inner, Outer)
          o Parallel/Distributed Sub-queries (From, Where, Select)
          o Parallel/Distributed Group By, Distinct, and Aggregation
          o Extensible with Parallel/Distributed User Defined Functions
      • Reduce (↑): Results are returned to the User Module in the Reduce phase
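As a rough illustration of the map and reduce phases on slide 13, the sketch below splits a GROUP BY style sum across worker threads and merges the partial results. It is not InfiniDB code: the names `map_partial_sums`, `reduce_partials`, `parallel_group_sum`, and the sample partitions are hypothetical, and Python threads merely stand in for Performance Module threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each partition stands in for the rows held by one
# Performance Module. Rows are (group_key, value) pairs.
PARTITIONS = [
    [("A", 1), ("B", 2), ("A", 3)],
    [("B", 4), ("C", 5)],
    [("A", 6), ("C", 7)],
]

def map_partial_sums(rows):
    """Map phase: aggregate one partition locally (SUM(value) GROUP BY key)."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def reduce_partials(partials):
    """Reduce phase: merge the per-partition results into the final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)

def parallel_group_sum(partitions):
    """Run the map phase on one thread per partition, then reduce."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(map_partial_sums, partitions))
    return reduce_partials(partials)

result = parallel_group_sum(PARTITIONS)
print(result)
```

Because each partition is summarized before anything is merged, only small partial results travel to the reduce step, which is the property the slide attributes to the User Module and Performance Module split.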
  • 14. Power and Scalability of Map-Reduce. InfiniDB is not: … a Hadoop-style map-reduce framework.
  • 15. Power and Scalability of Map-Reduce. InfiniDB is: … a custom-built and highly optimized map-reduce framework for queries.
  • 16. Transformational I/O Efficiency: Techniques to Avoid Unnecessary I/O
      o Vertical Partitioning: read only the columns required
      o Horizontal Partitioning: focus on the rows required
      o Just-in-time materialization
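The three techniques on slide 16 can be sketched in a few lines. This is a toy model under stated assumptions, not InfiniDB's implementation: the `Block` class, the `scan` function, and the use of per-block minimum and maximum values to skip rows are all hypothetical choices made for illustration (the slides do not say how horizontal partitioning is done).

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One horizontal slice of a table, stored column by column."""
    columns: dict  # column name -> list of values, aligned by position

    def min_max(self, name):
        # A real engine would keep these as precomputed metadata;
        # this sketch computes them on the fly for brevity.
        values = self.columns[name]
        return min(values), max(values)

def scan(blocks, filter_col, low, high, output_cols):
    """Return rows with low <= filter_col <= high, touching as little data as possible."""
    rows = []
    blocks_read = 0
    for block in blocks:
        lo, hi = block.min_max(filter_col)
        if hi < low or lo > high:
            continue  # horizontal partitioning: the whole block is skipped
        blocks_read += 1
        # Vertical partitioning: only the filter column is read at this point.
        hits = [i for i, v in enumerate(block.columns[filter_col]) if low <= v <= high]
        # Just-in-time materialization: output columns are fetched for hits only.
        for i in hits:
            rows.append(tuple(block.columns[c][i] for c in output_cols))
    return rows, blocks_read

blocks = [
    Block({"id": [1, 2, 3], "weight": [10, 12, 11], "line": ["x", "y", "x"]}),
    Block({"id": [4, 5, 6], "weight": [50, 55, 52], "line": ["y", "y", "x"]}),
    Block({"id": [7, 8, 9], "weight": [90, 95, 51], "line": ["x", "x", "y"]}),
]
rows, blocks_read = scan(blocks, "weight", 50, 60, ["id"])
print(rows, blocks_read)
```

In this run the first block is never scanned and the `line` column is never read at all, which is the kind of I/O the slide says a columnar engine avoids.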
  • 17. Transformational I/O Efficiency: Techniques for Efficient I/O
      o Columnar compression reduces I/O from disk
      o Global data buffer cache can reduce disk I/O
      o Real-time decompression accelerates reads from disk
      o Avoidance of Random I/O
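To make the compression point on slide 17 concrete, the sketch below compresses a single column of repetitive values and restores it in memory. The slides do not name InfiniDB's compression algorithm, so Python's standard `zlib` module is used purely as a stand-in, and the sample column is hypothetical.

```python
import zlib

# Hypothetical genotype-like column: small integer codes with long runs,
# the kind of data that tends to compress well when stored column by column.
column = bytes([0] * 4000 + [1] * 3000 + [2] * 3000)

compressed = zlib.compress(column)      # what would be written to disk
restored = zlib.decompress(compressed)  # what is rebuilt in memory on read

ratio = len(compressed) / len(column)
print(f"raw={len(column)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.4f}")
```

Fewer bytes on disk means fewer bytes to read, and the slide's claim is that decompressing them in memory costs less time than the avoided disk I/O.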
  • 18. Simple: Automatic Everything
      • Vertical Partitioning
      • Horizontal Partitioning
      • Compression
      • Compression Algorithm Selection
      • Distribution of data across disk resources
      • Distribution of work across CPU resources
  • 19. InfiniDB®: Scalable. Fast. Simple.
  • 20. InfiniDB 3 Announced Today
  • 21. InfiniDB 3: It is Now Possible...
  • 22. Today’s Presenters
      • Fernanda Foertter, HPC Administrator / Scientific Programmer, Genus plc
      • Jim Tommaney, Chief Technology Officer, Calpont Corporation
  • 23. Where I Work
  • 24. Genetic Evaluation Breeding Values
  • 25. Phenotype: Meat Quality
  • 26. Selection for Lean Growth 1980 2005
  • 27. Selection for Lean Growth 1980 2005
  • 28. Halothane Gene (1991)
      • Gene is associated with:
          o High carcass yield (NN)
          o Stress triggers hyperthermia
          o Poor meat quality (Nn/nn)
  • 29. DNA Marker Use [Timeline, 1990 to 2009]
      • Markers by year: 1991 HAL; 1994 ESR; 1998 RN & MC4R; 1999 FUT1 & PRKAG3; 2003 MIS; 2004 Large-scale SNP discovery
      • 1991 - 2002: Single genes, QTL, Candidate genes
      • To 2009: Large-scale SNP discovery, genome scans, sequencing
  • 30. Sudden Data Growth [Chart: Porcine SNP Panel Density; y-axis: Number of SNPs, 0 to 70,000; x-axis: 2004 to 2009]
  • 31. Sudden Data Growth: Sample Collection [Two charts by Year, 1991 to 2011: Animals (cumulative), y-axis 0 to 16,000,000; Tissue (cumulative), y-axis 0 to 3,500,000]
  • 32. Genetic Evaluation: EBV economic weights
      • Traits: Lean Yield, Meat Quality, Robustness, Feed efficiency, etc.
      • Index = a1 × EBV1 + a2 × EBV2 + …
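The index formula on slide 32 is a weighted sum, which can be written out as a small function. The weights and EBV (estimated breeding value) numbers below are invented for illustration only; `selection_index` is a hypothetical helper name, not something from the presentation.

```python
def selection_index(weights, ebvs):
    """Index = a1*EBV1 + a2*EBV2 + ... for a single animal."""
    if len(weights) != len(ebvs):
        raise ValueError("need exactly one economic weight per EBV")
    return sum(a * ebv for a, ebv in zip(weights, ebvs))

# Hypothetical economic weights and EBVs, in the slide's trait order:
# lean yield, meat quality, robustness, feed efficiency.
weights = [2.0, 1.0, 0.5, 3.0]
animal_ebvs = [1.5, -0.5, 2.0, 1.0]

index = selection_index(weights, animal_ebvs)
print(index)
```

Ranking candidate animals then amounts to computing this index for each one and sorting, with the economic weights deciding how much each trait counts.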
  • 33. Data Pipeline
  • 34. Genomic Data Deluge
  • 35. Project: Genotyping DB
      • The Need
          o Accumulating SNP chip data
          o Difficulty searching through
          o Next Gen Sequencing
          o Cheaper SNP chips
          o LOTS of animals
          o Other projects needed the data
      • Other Considerations
          o Store large data…BIG data
          o Scalable
          o Alternative to Oracle
          o Minimally impact infrastructure
          o Easy for scientists to use
  • 36. What Do Vendors Provide for Genotype Data? Nothing.
  • 37. Think Outside the (Vendor’s) Box…
  • 38. All Databases are Not Created Equal
  • 39. All Vehicles are Not Created Equal
  • 40. Genomic Data
  • 41. SNP Data [Table with columns Animal ID, SNP1, SNP2, SNP3, …, SNP65K; flattened cell values: 1 0 1 2 1 2 2 1 1 0 0 0 3 4 5 1 2 2 0 2 … XXXX]
  • 42. Single Research Cohort: What about selection and cohort comparisons?
  • 43. Column Bases Make More Sense
  • 44. InfiniDB: Parallel Columnar DB
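To show why a column layout suits the SNP table from slide 41, the sketch below stores one list per column and answers two per-SNP questions while reading only the columns involved. The table contents, the column names, and the helpers `animals_with` and `genotype_counts` are hypothetical; the 0/1/2 genotype codes simply mirror the values visible on the slide.

```python
from collections import Counter

# Hypothetical column-oriented layout: one list per column, aligned by position.
# A real panel would have tens of thousands of SNP columns.
genotypes = {
    "animal_id": [1, 2, 3, 4, 5],
    "snp1": [0, 2, 1, 0, 2],
    "snp2": [1, 1, 2, 0, 2],
    "snp3": [2, 0, 0, 1, 2],
}

def animals_with(table, snp, code):
    """Animal IDs whose genotype at `snp` equals `code`; reads two columns only."""
    return [aid for aid, g in zip(table["animal_id"], table[snp]) if g == code]

def genotype_counts(table, snp):
    """Number of animals per genotype code at one SNP; reads one column only."""
    return Counter(table[snp])

carriers = animals_with(genotypes, "snp2", 2)
counts = genotype_counts(genotypes, "snp1")
print(carriers, dict(counts))
```

With a row layout, either question would pull every SNP value for every animal through memory; here the untouched SNP columns cost nothing.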
  • 45. Complicated Searches are Faster!
  • 46. Scales for a Fraction of the Cost
      • Compression: Up 75%
      • Speed vs RDBMS: 15X faster
      • Scalability: 100’s TB, parallel queries/ingest
      • Cost vs Oracle: 25%
  • 47. Future Projects: Imputation [Figure labels: $150, $15]
  • 48. Caution: Data multiplies in a BIG way
  • 49. Conclusions
      • Helps to have a deep understanding of the scientific problems being solved
      • Have a good understanding of the data access pattern
      • Tool should solve 80% of the highest use patterns
      • Use combination of software, hardware knowledge to improve performance
      • Think “out of the vendor box”, especially where research is cutting edge
      • Take the lead to show new tools users may not even be aware they want/need
  • 50. Questions
  • 51. More Information on InfiniDB. Visit us at:
      o www.Calpont.com
      o www.InfiniDB.org
      o Visit Booth #414 to register to win an iPad 3
  • 52. Enter for a Chance to Win an iPad 3