Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Learn how Calpont’s Analytic DBMS, InfiniDB®, combines the I/O benefits of columnar storage with map-reduction distribution of work to deliver a fast, scalable, and simple-to-implement data solution for large-scale BI and analytics.

In this 30-minute presentation, Jim Tommaney, CTO of Calpont, will discuss the architectural foundations and functionality of InfiniDB and how they combine to provide a complete DBMS offering for near real-time analytics, giving BI and analytics organizations the power to dive deep into their data and examine any and all attributes for a wide view of that data, as fast as the business dictates. NoSQL enables large data environments, but is often saddled with sub-optimal analytic query performance. InfiniDB releases the analytic performance constraints.

Transcript

  • 1. Calpont InfiniDB®: Accelerating Data Insights. Scalable Analytics for Your NoSQL Big Data. Jim Tommaney, CTO, Calpont. NoSQL Now, August 24, 2011. Calpont Proprietary and Confidential.
  • 2. Key Takeaways • Calpont and InfiniDB • Architecture – Columnar Storage • Architecture – Map Reduction Distribution of Work • Performance Characteristics • Ease of Use and Flexibility • Extensibility. InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
  • 3. Calpont Corporation • Company o Privately held and backed. o Headquartered in Frisco, TX. • Products o InfiniDB Enterprise: launched February 2010. o InfiniDB Community: launched October 2009. • Our Mission: To provide a scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
  • 4. InfiniDB Release Highlights • Version 1.0 – Oct. 2009/Feb. 2010 o Columnar storage. o Map-reduction distribution of work. o High speed data load. • Version 1.5 – June 2010 o Sub-query added to map-reduction framework: Select, From, Where clause support; correlated and non-correlated sub-query.
  • 5. InfiniDB Release Highlights • Version 2.0 – November 2010 o Compression with real-time decompression. o User-defined functions, fully parallel and distributed: latitude/longitude distance calculation; geo-fencing (is a location within a polygon). o Enhanced partition elimination. o Enhanced parallelization of reduction operations. • Version 2.1 – March 2011 o Statistical aggregate functions. o View support. o Auto-increment. o Insert-select.
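The two example user-defined functions named on slide 5 can be sketched in Python to show what each one computes. This is only an illustration of the calculations: the slides do not show InfiniDB's UDF interface or say which distance formula is used, so the haversine formula, the ray-casting test, and the names `haversine_km` and `point_in_polygon` are all assumptions made here.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def point_in_polygon(lat: float, lon: float, polygon: List[Tuple[float, float]]) -> bool:
    """Geo-fence test by ray casting, treating (lat, lon) as planar coordinates."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat_a, lon_a = polygon[i]
        lat_b, lon_b = polygon[(i + 1) % n]
        # Only edges that straddle the point's longitude can cross the ray.
        if (lon_a > lon) != (lon_b > lon):
            lat_cross = lat_a + (lon - lon_a) * (lat_b - lat_a) / (lon_b - lon_a)
            if lat < lat_cross:
                inside = not inside
    return inside


# A square fence with corners given as (lat, lon) pairs.
fence = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
half_circumference = haversine_km(0.0, 0.0, 0.0, 180.0)
```

Both functions are pure scalar functions of a single row's values, which is the property that lets such logic run independently on every row range in parallel.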
  • 6. InfiniDB Release Highlights • Version 2.2 – June 2011 o Group_concat and bit aggregate functions. o Additional scalar functions made parallel and distributed. o Improved performance and memory for large strings. • Version 3.0 – Q4/Q1 o Cloud shared nothing. o Distributed/parallel load.
  • 7. Technology Trends [Chart: "Moore's Law and Beyond"; y-axis: Percent Increase; x-axis: Years. Series: Data Warehouse Growth – 75%; Memory Capacity – 60%; Disk Capacity – 50%; Moore's Law (CPU) – 45%; Disk Bandwidth – 40%; Memory Bandwidth – 20%; Disk Latency – 10%; Memory Latency – 10%.]
  • 8. Trends Drive Demand for Alternate Solutions [Chart: "Moore's Law and Beyond", the same chart as slide 7.]
  • 9. Traditional Row/Index Based DBMS for Analytics [Chart: "Moore's Law and Beyond", the same chart as slide 7, with an added "Index Operations" annotation.]
  • 10. InfiniDB Technology Foundations [Chart: "Moore's Law and Beyond", the same chart as slide 7, annotated with:] • Scalable Disk • Scalable Cache • Real-time Decompression • Efficient I/O from cache • Efficient I/O from disk • No Random I/O Operations
  • 11. InfiniDB Architecture Columnar Storage
  • 12. InfiniDB Architecture – Columnar Storage What is Columnar Storage? [Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3.] • Stores each column for a table in a different file/block on disk. o Column 1 values stored in file 1. o Column 2 values stored in file 2. o Column 3 values stored in file 3.
  • 13. InfiniDB Architecture – Columnar Storage • Rows are identified by offset. Row 101 can be found at: o Column 1 value is at offset 101 in file 1. o Column 2 value is at offset 101 in file 2. o Column 3 value is at offset 101 in file 3. [Diagram: offset 101 holds 1234, 2012-01-01, and Smith across the three column files.]
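The offset addressing described on slides 12 and 13 can be sketched with a toy in-memory model in which one Python list stands in for one column file. The class `ColumnStore`, its method names, and the column names are hypothetical and exist only for this illustration; InfiniDB's on-disk format is not described in the slides.

```python
from typing import Any, Dict, List


class ColumnStore:
    """Toy columnar table: one list per column plays the role of one file per column."""

    def __init__(self, column_names: List[str]) -> None:
        self.columns: Dict[str, List[Any]] = {name: [] for name in column_names}

    def append_row(self, row: Dict[str, Any]) -> int:
        """Append one value to every column and return the new row's offset."""
        for name, values in self.columns.items():
            values.append(row[name])
        return len(next(iter(self.columns.values()))) - 1

    def read_value(self, column: str, offset: int) -> Any:
        """Read one value from one column; no other column is touched."""
        return self.columns[column][offset]

    def read_row(self, offset: int) -> Dict[str, Any]:
        """Reassemble a full row by reading the same offset in every column."""
        return {name: values[offset] for name, values in self.columns.items()}


store = ColumnStore(["id", "order_date", "last_name"])
for i in range(101):  # fill offsets 0 through 100 with placeholder rows
    store.append_row({"id": i, "order_date": "2011-12-31", "last_name": "filler"})
offset = store.append_row({"id": 1234, "order_date": "2012-01-01", "last_name": "Smith"})
row_101 = store.read_row(offset)
```

Because a row is nothing more than a shared offset, a query that needs only `last_name` can call `read_value` on that one column and skip the others entirely.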
  • 14. InfiniDB Architecture – Column Restriction [Diagram: Col 1, Col 2, Col 3 … Col 90 stored in File 1, File 2, File 3 … File 90.] Restriction – find rows based on filters: • Column Filter (filter 1, filter 2, filter 3) • Table Expression/Functions (exp 1, exp 2) • Join Filter (join 1, join 2, join 3) Just-in-time column access defers I/O until needed.
  • 15. InfiniDB Architecture – Column Projection [Diagram: Col 1, Col 2, Col 3 … Col 90 stored in File 1, File 2, File 3 … File 90.] Projection – display columns as selected. • Select Column Filter (filter 1, filter 2, filter 3, etc.) Just do I/O for: • Columns selected • Rows that pass the filters
  • 16. Column Restriction and Projection [Diagram: Column # Four, Column # Six, and Column # Seventeen, each spanning Extent # 5 through Extent # 27, with Filter 1, Filter 2, Filter 3, and two Projection steps.] • Automatic Vertical Partitioning and Horizontal Partitioning • Just-In-Time Materialization
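Slides 14 through 16 describe restricting on filter columns first and projecting other columns only for the rows that survive, with extents providing horizontal partitioning. A minimal sketch of that flow follows. It assumes an extent map that keeps a minimum and maximum value per extent, which is one common way to implement partition elimination; the slides do not spell out the mechanism, and the tiny extent size and all names are illustrative.

```python
from typing import Any, Dict, List, Tuple

EXTENT_ROWS = 4  # toy extent size, assumed for illustration


def build_extent_map(values: List[int]) -> List[Tuple[int, int]]:
    """Record (min, max) for each fixed-size run of rows in one column."""
    return [
        (min(values[i:i + EXTENT_ROWS]), max(values[i:i + EXTENT_ROWS]))
        for i in range(0, len(values), EXTENT_ROWS)
    ]


def query_range(
    columns: Dict[str, List[Any]],
    extent_map: List[Tuple[int, int]],
    filter_col: str,
    low: int,
    high: int,
    project_cols: List[str],
) -> Tuple[List[tuple], int]:
    """Restrict on one column, then read projected columns only at matching offsets."""
    filter_values = columns[filter_col]
    matches: List[int] = []
    extents_scanned = 0
    for extent_no, (ext_min, ext_max) in enumerate(extent_map):
        if ext_max < low or ext_min > high:
            continue  # this extent cannot contain a match, so it is never read
        extents_scanned += 1
        start = extent_no * EXTENT_ROWS
        stop = min(start + EXTENT_ROWS, len(filter_values))
        matches.extend(o for o in range(start, stop) if low <= filter_values[o] <= high)
    # Just-in-time materialization: projected columns are touched only now.
    rows = [tuple(columns[c][o] for c in project_cols) for o in matches]
    return rows, extents_scanned


table = {
    "day": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    "region": ["N", "S", "N", "S", "E", "W", "E", "W", "N", "N", "S", "S"],
    "notes": ["unused"] * 12,  # never read by the query below
}
day_extents = build_extent_map(table["day"])
rows, scanned = query_range(table, day_extents, "day", 3, 4, ["amount", "region"])
```

In this run only one of the three extents of `day` is scanned, and the `notes` column is never accessed, which mirrors the "just do I/O for columns selected and rows that pass the filters" point on slide 15.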
  • 17. InfiniDB Architecture – Columnar Storage InfiniDB Eliminates: • Full Table Scan • Random I/O • Index Load Overhead • Conditional Performance InfiniDB Adds: • Efficient I/O • Real-time Compression • Fast, predictable Load • Predictable Performance
  • 18. InfiniDB Architecture Map Reduction Framework
  • 19. InfiniDB – Two Tier Architecture Purpose built for big data analytics. • User Module (UM): understands SQL. • Performance Module (PM): operates on data blocks. [Diagram labels: "Single Server", "or …".]
  • 20. Tiered MPP Building Blocks (table columns: Module, Process, Functionality, Value) • MySQL – Functionality: hosts MySQL; connection management; SQL parsing & optimization. Value: familiar DBMS interface; leverages existing partner integrations; delivers full SQL syntax support. • Extent Map – Functionality: abstracts physical and logical storage; metadata store. Value: enables shared nothing and shared everything storage; enables partition elimination; built-in failover. • ExeMgr – Functionality: work distribution; final results management and aggregation. Value: independent scalability and tunable concurrency; multi-threaded to take advantage of multi-core HW platforms.
  • 21. Tiered MPP Building Blocks (table columns: Module, Process, Functionality, Value) • PrimProc – Functionality: scale-out cache management; distributed scan, filter, join and aggregation operations; resource management. Value: independent scalability and tunable performance; multi-threaded to take advantage of multi-core HW platforms. • Data – Functionality: high speed bulk load; transactional DML and DDL; online schema extensions. Value: enables concurrent reads and writes, non-blocking read enabled; multi-threaded to take advantage of multi-core HW platforms.
  • 22. Tiered MPP Building Blocks What is the basic unit of work within the Performance Module? • One thread working on a range of rows, typically 1/2 million rows, stored in a few hundred blocks of data. • Execute all column operations required (restriction and projection). • Execute any group by/aggregation against local data. • Return results to User Module. • Primitives are run in parallel and fully distributed (MPP).
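The unit of work on slide 22 follows a map-then-reduce shape: each worker filters and aggregates its own row range, and the partial results are merged afterwards. The sketch below models that shape with a thread pool. The function names, the four-worker pool, and the tiny range size are assumptions for illustration, and Python threads only show the structure here rather than real multi-core or multi-server parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def primitive(regions: List[str], amounts: List[int],
              start: int, stop: int, min_amount: int) -> Counter:
    """One unit of work: restrict a row range, then aggregate it locally."""
    partial: Counter = Counter()
    for offset in range(start, stop):
        if amounts[offset] >= min_amount:              # restriction
            partial[regions[offset]] += amounts[offset]  # local group-by sum
    return partial


def run_query(regions: List[str], amounts: List[int],
              rows_per_unit: int, min_amount: int) -> Dict[str, int]:
    """Split the table into row ranges, run primitives in parallel, merge results."""
    ranges: List[Tuple[int, int]] = [
        (start, min(start + rows_per_unit, len(amounts)))
        for start in range(0, len(amounts), rows_per_unit)
    ]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = pool.map(
            lambda r: primitive(regions, amounts, r[0], r[1], min_amount), ranges
        )
        for partial in partials:  # final aggregation, the User Module's role
            total.update(partial)
    return dict(total)


regions = ["N", "S", "E", "W"] * 3
amounts = list(range(1, 13))
result = run_query(regions, amounts, rows_per_unit=5, min_amount=5)
```

Only the small per-range `Counter` objects cross from the workers to the merging step, which is why local aggregation keeps the final reduction cheap.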
  • 23. InfiniDB Performance Characteristics
  • 24. InfiniDB Load Performance • Load rate capable of 1 million rows/second depending on disk and data model. • Consistent load rate over time. [Chart: x-axis: Time.]
  • 25. InfiniDB Load Performance • Through 60 billion rows. • Through 225 billion rows. • Through 1.031 trillion rows.
  • 26. InfiniDB Query Performance – Percona SSB
  • 27. Performance Benchmark – Percona SSB [Chart: "Percona External Test vs. Internal Tests vs. 16 PMs @ AWS", cached queries, scale factor 1000; y-axis: Seconds; x-axis: queries Q1.1 through Q4.3. Series: 1 PM; 2 PMs; 4 PMs; 16 PMs (AWS); InfoBright – Percona; Lucid – Percona; InfiniDB – Percona. Labeled values: 9,694.53 and 6,867.74.]
  • 28. SSB Queries on Amazon Web Services (AWS) [Chart: "InfiniDB Internal vs. InfiniDB @ AWS", cached queries, scale factor 1000; y-axis: seconds; x-axis: queries Q1.1 through Q4.3. Series: 1 PM; 2 PMs; 4 PMs; 16 PMs (AWS). Labeled value: 7.83.]
  • 29. Asia Region Distributor Benchmark [Chart legend: InfiniDB (1 PM); InfiniDB (2 PMs); Legacy Columnar DBMS-X Row-Based.]
  • 30. Typical Proof-of-Concept Results
  • 31. InfiniDB Ease of Use 
  • 32. InfiniDB Ease of Use – Load and Go InfiniDB Load and Go Experience: 1. Create Table. 2. Load Data. 3. Enjoy Performance.
  • 33. InfiniDB Ease of Use – Automatic Everything • Column storage happens automatically. • Compression happens automatically. • Which compression to use happens automatically. • No index build or maintenance. • Extent map partition behavior happens automatically. • Distribution of data across server/disk resources happens automatically. • Distribution of work happens automatically. • Ad-hoc performance happens automatically.
  • 34. Full Featured SQL to Map-Reduction Mapping Robust Column-Aware Optimizer Handles: o Filter order optimization. o Join order optimization. Powerful Join Optimizations Handle: o Inner join, outer join, semi-join (sub-query). o N-table single step hash-join (up to 60). Queue-Based Scheduling of Performance Module Handles: o Automatically parallelizes query. o Allows small queries to get in, and return, while larger query is running.
  • 35. Full Featured Mapping from SQL to Map-Reduce Robust Tools to Maximize Physical I/O Efficiency: o Reading only the columns selected to avoid I/O. o Just-in-time materialization to avoid I/O. o Automatic partition elimination to avoid I/O. o Scalable data buffer cache to avoid I/O from disk. o Compression to minimize the bytes read from disk. Extensible User Defined Function (UDF): o UDFs run as full-featured functions within InfiniDB. o Gain full benefits of Optimizer, Join, Scheduler, and Physical I/O features.
  • 36. InfiniDB Ease of Use – Avoiding Trade-Offs Traditional (and some current) DBMS technologies often involve significant trade-offs that just don’t exist within InfiniDB. • Load Rate vs. More Indexes • More Attributes vs. Better Performance • Summary Tables vs. Real-time access to data • Save Space vs. Query Performance
  • 37. InfiniDB Extensibility
  • 38. Extensibility for Big Data Big Data and Extensibility • Data size continues to escalate. • New uses of data to drive business actions. • New attributes and dimensions are continually being included.
  • 39. InfiniDB Extensibility – Scale Efficiently Handling Data Scale • InfiniDB scales with your data. • Scalability combined with very efficient I/O. o Columnar storage. o Just-in-time materialization. o Partition elimination. o Scalable cache. o Columnar compression.
  • 40. InfiniDB Extensibility – Online Schema Changes Schema Changes • The InfiniDB columnar architecture eliminates table rebuilds. • New column files are added without change to existing columns. • InfiniDB also allows for these column additions to be handled as on-line operations.
  • 41. InfiniDB Extensibility – Business Logic The Data Driven Business • Extend your analytics capability with InfiniDB’s User Defined (parallel and distributed) Functions. • Reactive and predictive analysis of your data: o Quickly o Predictably • Remove Barriers o No waiting for new aggregates to be built. o No waiting for new code to be written.
  • 42. InfiniDB Connectivity with Hadoop™ The bi-directional InfiniDB-Hadoop connector is designed to transfer data between the InfiniDB database and the Hadoop Cluster by implementing Hadoop versions of InfiniDBInputFormat and InfiniDBOutputFormat Classes for the Hadoop framework. Calpont InfiniDB® – Hadoop™ Connector: Coming September 2011.
  • 43. InfiniDB Customer  Experience
  • 44. InfiniDB Customer Experience A number of customer case studies are available at www.calpont.com for further detail, but the key differential features as to why customers are choosing InfiniDB include: • Performance at scale. • Large number of dimensions. • Ad-hoc query performance. • Unique record analysis. • Near real-time load capability. • Faster time to market. • Predictable query performance.
  • 45. Key Takeaways The InfiniDB Performance Architecture • Architecture – Columnar Storage • Architecture – Map Reduction Distribution of Work The InfiniDB Deployment Experience • Performance Characteristics • Ease of Use and Flexibility • Extensibility