Hi Speed Datawarehousing

Strategies and (emerging) technologies to boost the performance of your datawarehouse

Published in: Technology, Business

Transcript

  • 1. Hi-Speed DataWarehousing Jos van Dongen, Tholis Consulting
  • 2. Agenda
      • Introduction
          • Why Hi-Speed DWH?
          • Where do we ‘Hi-speed’ the DWH?
      • Part 1: Hi-Speed Strategies
          • Upgrade
          • Extend
          • Migrate
      • Part 2: New Hi-Speed DWH solutions
          • What’s new?
          • Which products?
          • How fast are they?
          • What does it cost?
      (Database Systems 2008, THOLIS CONSULTING)
  • 3. Hi-Speed Why?
      • Growing data volume:
          • Gartner group:
              – 2007: 50% of DWHs > 10 TB
              – 2011: 50% of DWHs > 50 TB
          • < 5 TB is considered small (!)
      • Increasing workload:
          • Operational BI
          • Pervasive BI
          • Advanced Analytics & Mining
  • 4. Hi-Speed Where?
      • Datawarehousing:
          • Development
          • ETL
          • Query & Analysis
          • Maintenance (Index, aggregate, backup, restore, authorization, etc.)
      • Presentation focus: Query & Analysis
  • 5. Hi-Speed How?
      • Upgrade (2-5×)
          • Hardware
              – Processing power
              – Memory
              – Disk
          • Software
              – 64bit
              – New OS versions
              – New RDBMS versions
      • Extend (5-100×)
          • Add datamarts
          • OLAP engines
          • (Datamart) Appliances
          • ‘Buddy’ system
          • Datastore replacement
      • Migrate (10-400×)
          • DWH Appliances
          • HW/SW packages
          • SW (roll your own)
          • Outsource
  • 6. Upgrade Hardware: Cost ‘no issue’
      • <2000: Solve performance problems in software (tuning, optimization)
      • 2008: hardware is cheaper than time!
  • 7. Upgrade Hardware: Memory
      • Feb 2008:
          • 2 GB €41,- (PC)
          • 4 GB €160,- (Server)
      • Entry level server: 32 GB à €1.280,-
  • 8. Upgrade Hardware: CPU
      [Figure: CPU in 2004 vs. 2008]
  • 9. Upgrade Hardware: disk
      • 2008: SSD disks
          + Access time 0,1 ms (vs. 4-5 ms SAS disk)
          + I/O *2-3
          + Power consumption 10-20% of HDD
          + Noise level 0 dB
          - Still very expensive: €2.300,- for 128 GB
      • EMC Symmetrix: SSD as ‘ultra high performance’ option
  • 10. Upgrade Software
      • 64 bit OS is mandatory (32 bit max 4 GB)
      • Oracle 11g:
          • Cube Organized Materialized View
          • (auto) Partitioning options
          • Data Compression
          • Information LifeCycle Management
          • Hot Standby DB for real-time reporting
      • SQL Server 2008:
          • Partitioning
          • Data Compression
          • Win2008/SQL2008 doubles Win2003/SQL2005 performance!
  • 11. Extend: Add datamart(s)
      • Default in Inmon (CIF) architecture.
      • Dimensional: Often DMs as views on DWH
          • Add OLAP engine (or replace RDBMS) for datamarts
          • Add Appliance for datamarts
              – Netezza started here (but is scaling up to EDW level)
              – TeraData scales ‘down’ to this level as well
              – Competitive ‘sweet spot’ for Appliance Vendors
          • Use alternative solution (see also ‘Roll your own’)
  • 12. Extend: ‘Buddy’ system
      • ParAccel ‘Amigo’: Q-Router handles all requests:
          • OLTP is executed on Database of Record
          • Analytical query is executed on ParAccel MPP Grid
  • 13. Extend: Datastore replacement
      • DatAupia ‘Satori’
  • 14. Migrate: Appliances
      • ‘Traditional’
          • TeraData
          • HP NeoView
          • Kognitio
          • DATAllegro
          • GreenPlum
          • Netezza
      • Characteristics:
          • Plug and play
          • Combination of HW, SW, Support & Services
  • 15. Migrate: ‘Roll your own’
      • Mostly column based, MPP, Shared Nothing architectures
      • 1 Established Vendor: Sybase IQ, since 1993
      • Wide choice of closed and open source products:
          • Open Source: LucidDB, MonetDB
          • Software only: Vertica*, ParAccel*, Brighthouse, ExaSol, Valentina, VectorStar, Tenbase, Sand, etc.#
          • Soft/hardware: Dataupia
          • ‘Lab’ware: Calpont
      * Also available as DWH Appliance
      # Mostly special purpose solutions, e.g. BigTable
  • 16. Part 2: New Hi-Speed Solutions
      • Since 2005, 4 new vendors on the market:
          • Vertica (Michael Stonebraker, $25 Mln funding)
          • ParAccel (Barry Zane*, $20 Mln funding)
          • DatAupia (Foster Hinshaw*, $16 Mln funding)
          • InfoBright (Warsaw University, $8 Mln funding)
      * Netezza founders
  • 17. What’s different?
      • Massive Parallel Processing (MPP)
          • Throw lots of commodity hardware at it (see ‘Upgrade’)
      • Column based data organization
          • Limit I/O by ‘pruning’ (compare horizontal partitioning)
          • 1 datatype per column allows for heavy compression
      • Data compression
          • CPU is not the bottleneck, I/O is
      • Read optimization
      • In memory operation
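The compression point on this slide can be illustrated with run-length encoding, one common technique for a column that holds a single datatype with many repeated values. This is a minimal sketch and not any vendor's actual format; the function names and the sample `country` column are made up for illustration.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, length in runs for _ in range(length)]

def rle_count(runs, target):
    """Count occurrences of target without decompressing the column."""
    return sum(length for value, length in runs if value == target)

# Hypothetical sorted column with 12 values and only 3 distinct ones.
country = ["DE"] * 4 + ["NL"] * 3 + ["US"] * 5
runs = rle_encode(country)  # 3 runs instead of 12 stored values
```

Because the query in `rle_count` works on the runs directly, fewer bytes have to be read, which matches the slide's argument that I/O rather than CPU is the bottleneck.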
  • 18. SMP vs MPP
      • Different storage approaches:
          • Shared Disk (clustering)
          • Shared Nothing
      • All DWH appliance & new software vendors use Shared Nothing architecture
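A shared-nothing setup can be sketched as follows: each node owns its own slice of the data, aggregates it locally, and only the small partial results are merged. This is a single-process toy under stated assumptions; the modulo partitioning rule, the field names and the sample rows are all hypothetical and do not describe any specific product.

```python
from collections import Counter

def partition(rows, n_nodes):
    """Assign each row to exactly one node by hashing its key (here: modulo)."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row["order_id"] % n_nodes].append(row)
    return nodes

def local_aggregate(node_rows):
    """Work done on one node, touching only that node's own rows."""
    totals = Counter()
    for row in node_rows:
        totals[row["country"]] += row["amount"]
    return totals

def merge(partials):
    """Coordinator step: combine the per-node partial sums."""
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)

rows = [
    {"order_id": 1, "country": "NL", "amount": 10},
    {"order_id": 2, "country": "DE", "amount": 20},
    {"order_id": 3, "country": "NL", "amount": 30},
    {"order_id": 4, "country": "US", "amount": 40},
    {"order_id": 5, "country": "DE", "amount": 50},
    {"order_id": 6, "country": "NL", "amount": 60},
]
nodes = partition(rows, n_nodes=3)
totals = merge(local_aggregate(node) for node in nodes)
```

In a real MPP system the `local_aggregate` calls would run in parallel on separate machines; here they run one after another only to keep the sketch self-contained.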
  • 19. Rows vs Columns
      • Nothing new about column storage: Taxir, 1969
      • Conceptual view: Rows vs. Columns
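The conceptual difference between the two layouts can be shown with a tiny in-memory table. The sketch below is illustrative only; the table contents and the helper `to_columns` are made up, and real engines store these layouts on disk rather than in Python lists.

```python
# Row layout: each record is kept together.
rows = [
    (1, "NL", 120.0),
    (2, "DE", 80.0),
    (3, "NL", 45.5),
]
column_names = ("order_id", "country", "amount")

def to_columns(rows, names):
    """Pivot a list of row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

# Column layout: each attribute is kept together.
columns = to_columns(rows, column_names)

# Row layout: summing one attribute still walks every full row.
total_from_rows = sum(row[2] for row in rows)

# Column layout: the same sum reads a single list and skips the other columns.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is how much unrelated data has to be read to get there, which is why typical BI queries that touch a few columns of wide tables favour the column layout.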
  • 20. Products: Vertica (1)
      • MPP
      • Shared Nothing
      • Column Storage
      • Compression
      • Read Optimized
      [Figure: Architecture: WOS & ROS]
      [Figure: Architecture: Columns & projections]
  • 21. Products: Vertica (2)
  • 22. Products: ParAccel (1)
      • Two implementation modes: Amigo* & Maverick
      • Two versions: in memory & disk based (no hybrid solution yet)
      • MPP
      • Shared Nothing
      • Compression
      • Parallel loader
      * SQL Server only; Oracle version in Beta
  • 23. Products: ParAccel (2)
      • High availability built in:
      • Shattered TPC-H benchmark:
      • Appliance partnership with Sun:
          • Phoenix all in memory DWH appliance
          • Sedona disk based VLDB
  • 24. Products: ExaSol
      • MPP
      • Column based
      • Auto tuning
      • In-Memory based
      • In-Memory Compression
      • ExaCluster OS
  • 25. Products: BrightHouse (1)
      • Uses MySQL as DBMS
      • Not columns but 64K Data Packs
      • Knowledge Grid and DP nodes replace traditional indexes
      • Heavy Compression (10:1)
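How pack-level metadata can stand in for an index is easiest to see in a small example. The sketch below assumes that each data pack carries at least a minimum and a maximum value; that assumption, the pack size of 4 and all names are illustrative and are not taken from the slide or from BrightHouse's actual implementation.

```python
PACK_SIZE = 4  # tiny for readability; the slide speaks of 64K Data Packs

def build_packs(values, pack_size=PACK_SIZE):
    """Split a column into packs and record min/max metadata per pack."""
    packs = []
    for start in range(0, len(values), pack_size):
        chunk = values[start:start + pack_size]
        packs.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return packs

def count_greater_than(packs, threshold):
    """Count values > threshold, opening a pack only when metadata is inconclusive."""
    count = 0
    opened = 0
    for pack in packs:
        if pack["max"] <= threshold:
            continue                      # no value can match: skip the pack
        if pack["min"] > threshold:
            count += len(pack["values"])  # every value matches: metadata suffices
            continue
        opened += 1                       # mixed pack: must read the actual data
        count += sum(1 for v in pack["values"] if v > threshold)
    return count, opened

packs = build_packs([1, 2, 3, 4, 5, 9, 6, 12, 20, 21, 22, 23])
matches, packs_opened = count_greater_than(packs, threshold=8)
```

In this run only one of the three packs has to be read, which is the kind of I/O saving that lets metadata of this sort replace a traditional index for range predicates.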
  • 26. Products: BrightHouse (2)
  • 27. Products: DatAupia
      • Database Appliance
      • Adds MPP capability to DB2, Oracle & SQL Server
      • ‘Invisible’ appliance
      • Lowest cost solution on the market
      • Plug and play
  • 28. How Fast: TPC-H Benchmark
      • Typical BI queries, e.g.
          • Top 10 of non-shipped orders on date x
          • Annual growth of market share
          • Profit per product type, year and country
          • Profit share of local suppliers
          • etc.
  • 29. Query Example: TPC-H Q9
      -- $ID$
      -- TPC-H/TPC-R Product Type Profit Measure Query (Q9)
      -- Functional Query Definition
      -- Approved February 1998
      select
          nation,
          o_year,
          sum(amount) as sum_profit
      from (
          select
              n_name as nation,
              extract(year from o_orderdate) as o_year,
              l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
          from
              part, supplier, lineitem, partsupp, orders, nation
          where
              s_suppkey = l_suppkey
              and ps_suppkey = l_suppkey
              and ps_partkey = l_partkey
              and p_partkey = l_partkey
              and o_orderkey = l_orderkey
              and s_nationkey = n_nationkey
              and p_name like '%green%'
      ) as profit
      group by
          nation, o_year
      order by
          nation, o_year desc
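To see what Q9 computes, it can be run against a handful of rows in SQLite. This is a sketch under clear assumptions: the tables contain only the columns the query touches, the sample rows are invented, and `extract(year from ...)` is replaced by SQLite's `strftime('%Y', ...)` because SQLite does not support the `extract` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table nation   (n_nationkey integer, n_name text);
    create table supplier (s_suppkey integer, s_nationkey integer);
    create table part     (p_partkey integer, p_name text);
    create table partsupp (ps_partkey integer, ps_suppkey integer, ps_supplycost real);
    create table orders   (o_orderkey integer, o_orderdate text);
    create table lineitem (l_orderkey integer, l_partkey integer, l_suppkey integer,
                           l_quantity integer, l_extendedprice real, l_discount real);
""")

# Invented sample data: two nations, two suppliers, one 'green' part.
conn.executemany("insert into nation values (?, ?)", [(1, "FRANCE"), (2, "GERMANY")])
conn.executemany("insert into supplier values (?, ?)", [(10, 1), (20, 2)])
conn.executemany("insert into part values (?, ?)",
                 [(100, "dark green metal"), (200, "red plastic")])
conn.executemany("insert into partsupp values (?, ?, ?)",
                 [(100, 10, 2.0), (100, 20, 3.0), (200, 10, 1.0)])
conn.executemany("insert into orders values (?, ?)",
                 [(1000, "1995-03-01"), (1001, "1996-07-15")])
conn.executemany("insert into lineitem values (?, ?, ?, ?, ?, ?)", [
    (1000, 100, 10, 5, 100.0, 0.25),  # 100 * 0.75 - 2.0 * 5 = 65
    (1001, 100, 20, 2, 50.0, 0.0),    # 50 - 3.0 * 2 = 44
    (1001, 200, 10, 1, 20.0, 0.0),    # not a 'green' part: filtered out
    (1001, 100, 10, 4, 80.0, 0.5),    # 80 * 0.5 - 2.0 * 4 = 32
])

# Q9 with the year extraction adapted for SQLite.
Q9_SQLITE = """
    select nation, o_year, sum(amount) as sum_profit
    from (
        select n_name as nation,
               cast(strftime('%Y', o_orderdate) as integer) as o_year,
               l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from part, supplier, lineitem, partsupp, orders, nation
        where s_suppkey = l_suppkey
          and ps_suppkey = l_suppkey
          and ps_partkey = l_partkey
          and p_partkey = l_partkey
          and o_orderkey = l_orderkey
          and s_nationkey = n_nationkey
          and p_name like '%green%'
    ) as profit
    group by nation, o_year
    order by nation, o_year desc
"""
result = conn.execute(Q9_SQLITE).fetchall()
```

The six-way join plus the unanchored `like '%green%'` filter is what makes this query a useful stress test: an engine has to scan and join large tables rather than answer from an index.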
  • 30. Remember last year?
      [Bar chart: average seconds per query]
          MonetDB        3,6
          LucidDB        224
          MySQL51        360
          'X'            548
          PostgreSQL     892
          MySQL50       2322
      • Qry 1-10 on SF2 (2 GB data), #sec/qry average
      • Single CPU, Single disk, 2GB Ram, Windows 2003
  • 31. How Fast: TPC-H 100 & 300GB
  • 32. How Fast: TPC-H 1 TB
               Exa 100GB   Par 100GB   Exa 1TB   Par 1TB
      Q1         28,3         3,2       104,7      13,4
      Q2          4,9        12,8         7,6     101,4
      Q3         10,3         6,8        49,2      36,2
      Q4          4,9         9,4        14,3     115,8
      Q5          9,8         9,6        26,2      35,3
      Q6          5,2         1,6        16,2      13,9
      Q7         13,2         9,6        65        30,3
      Q8          8,7         8,2        18,2      27,5
      Q9         29,3        14,7       122,3      50,5
      Q10        12,1        11,7        32,2      30,1
      Avg        12,67        8,76       45,36     45,67
      But:
      So: Always verify your own workload against your own data!
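The closing advice on this slide follows from how much the per-query results differ even when the averages are close. The sketch below copies the two 1 TB columns from the table (decimal commas written as points) and counts which product is faster per query; reading ‘Exa’ as ExaSol and ‘Par’ as ParAccel is an assumption based on the product slides, and the helper name is made up.

```python
from collections import Counter

# (Exa 1TB, Par 1TB) timings per query, copied from the slide.
timings_1tb = {
    "Q1": (104.7, 13.4),
    "Q2": (7.6, 101.4),
    "Q3": (49.2, 36.2),
    "Q4": (14.3, 115.8),
    "Q5": (26.2, 35.3),
    "Q6": (16.2, 13.9),
    "Q7": (65.0, 30.3),
    "Q8": (18.2, 27.5),
    "Q9": (122.3, 50.5),
    "Q10": (32.2, 30.1),
}

def faster_engine(exa, par):
    """Name the engine with the lower timing for one query."""
    return "ExaSol" if exa < par else "ParAccel"

winners = {query: faster_engine(exa, par) for query, (exa, par) in timings_1tb.items()}
wins = Counter(winners.values())

# Largest gap in either direction, as a ratio of slower to faster timing.
largest_gap = max(timings_1tb, key=lambda q: max(timings_1tb[q]) / min(timings_1tb[q]))
```

Neither product wins across the board, so a workload dominated by queries shaped like Q2 or Q4 would favour a different product than one dominated by queries shaped like Q1 or Q9.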
  • 33. Hi-Speed, what now?
      • Upgrading hardware might be the most time- and cost-effective short-term solution to performance problems
      • OK, this software is fast, but what about
          • ETL/ELT?
          • (physical) Design?
          • Maintenance?
          • Support?
      • When you hit the limits of your traditional DWH:
          • Evaluate & Proof of Value
      • When will Oracle, Microsoft and IBM enter this arena?
