The IBM Netezza Data Warehouse Appliance

Netezza: a simpler path to smart analytics.
This presentation was given at IBM Data Server Day on 22 May in Stockholm by Jacques Milman, Datawarehouse Architecture Leader, IBM.

Published in: Technology, Business
  • Netezza, the new IBM business intelligence appliance
  • That’s exactly what the Netezza TwinFin is in the data warehousing and analytics world: a true appliance, which sets it apart from the competition. It is engineered from the ground up for data warehousing and analytics, and offers a complete solution that integrates database, server and storage. It supports standard interfaces such as ODBC, JDBC and ANSI SQL, making it very easy to deploy: it takes 2 days to get up and running versus weeks for other solutions. The appliance characteristics translate into real value for customers (what we refer to as the 4 S’s). TwinFin is 10-100X faster than competitors like Oracle, Teradata and others. When analytic queries take seconds instead of hours to perform, customers get the opportunity to completely rethink their business processes and, in some cases, even launch entirely new businesses. The appliance is unlike anything that DBAs and IT teams have experienced in the past. Whereas Oracle and Teradata data warehouses require armies of specialists to manage, Netezza offers performance out of the box, without requiring any tuning, indexing, aggregations, etc. A single appliance scales to more than a petabyte of user data capacity, not just acting as a repository for information, but allowing complex analytics to be conducted at scale, on all the enterprise data. By embedding analytics deep into the data warehouse, TwinFin powers high-performance advanced analytics 100s or even 1000s of times faster than was possible before. But of all these benefits, simplicity is probably the most important, and impossible to achieve with conventional warehouse appliances. And simplicity is not just about initial commissioning of the appliance. It’s about the ease with which you can add more apps and more data without expensive and time-consuming re-tuning. So you get sustained agility and cost savings throughout the lifetime of the appliance.
  • This is a logical view of how all the components are configured in Netezza’s unique AMPP architecture, which combines an SMP front end with a shared-nothing MPP back end for query processing. As a purpose-built appliance for high-speed analytics, its power comes not from the most powerful and expensive components but from how the right components are assembled and work together to maximize performance. Each S-Blade operates on multiple data streams. The architecture offers linear scalability by adding more S-Blades and disk enclosures to the appliance in a balanced fashion. More than a thousand of these customized MPP streams work together to “divide and conquer” the workload in the largest TwinFin. The architecture also offers tremendous flexibility in creating different types of appliances by independently varying the number of disks, the number of S-Blades or the RAM available on the S-Blades.
  • Many first-generation warehouses are based on databases designed and optimized to run online transaction processing systems. A relational database such as Oracle RAC is deployed on a general-purpose server, for example from Sun, attached to storage from a vendor such as EMC. This data warehouse infrastructure is COMPLEX, with many moving parts to deploy, configure and manage. And complex systems are often very expensive to own and operate. More problematic is that this architecture is simply not good at analyzing big data. Transaction processing systems don’t need to move big data sets. They rely on an index to quickly find one or two records on disk and move them into memory for updates or deletes by the database management system. Data warehouse workloads are very different, typically reading very large data sets and then analyzing them to find threats and opportunities; to find the “needle in the haystack”. Vendors who ignore the fundamental differences between transaction processing and analytical workloads condemn their customers to constant tuning. Transaction processing databases are simply not designed for analytic processing; their query performance is never fast enough. Poorly performing queries frustrate users attempting to solve challenging problems. Not rectifying this situation risks the business losing confidence in the warehouse.
  • The high-capacity appliance provides even greater scalability for applications such as backup for multiple TwinFins or history for regulatory reporting. At the other end of the scale, the Skimmer is deployed as a development or testing appliance. And now, with IDAA, we can apply Netezza simplicity and speed to transparently accelerate mainframe analytics performance.
  • Let’s look at some examples of Netezza (true) appliances in real customer environments.
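The “divide and conquer” idea in the AMPP note above can be illustrated with a small scatter-gather sketch. This is not Netezza code: the function names `partial_sum` and `scatter_gather` are hypothetical, threads stand in for the independent MPP streams, and the in-memory lists stand in for data slices. Each worker filters and aggregates only its own slice, and the front end combines the small partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Work done independently on one data slice: filter, then aggregate."""
    total = 0
    count = 0
    for amount in rows:
        if amount > 0:  # the filter runs next to the data, not on the host
            total += amount
            count += 1
    return total, count


def scatter_gather(slices):
    """Run partial_sum on every slice in parallel and merge the partials."""
    workers = max(1, len(slices))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, slices))
    # The "host" only sees one (total, count) pair per slice.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


# Three hypothetical data slices holding transaction amounts.
slices = [[10, -2, 5], [7, 3], [-1, 4, 6, 8]]
print(scatter_gather(slices))  # (43, 7)
```

The point of the sketch is the shape of the work: adding a slice adds a worker, and the amount of data returned to the front end stays small regardless of how many rows each slice scans.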

    1. The IBM Netezza Data Warehouse Appliance: faster, simpler, more accessible analytics
         Jacques Milman, Big Data & Datawarehouse Solutions Leader, IBM Europe, jacques.milman@fr.ibm.com
         Frederik Oebius, Konsultchef (Head of Consulting), +46 708 66 00 52, frederik.oebius@lincube.se
         © 2012 IBM Corporation
    2. Information Management. The IBM Netezza Appliance: Revolutionizing Analytics
         • Purpose-built analytics engine
         • Integrated database, server & storage
         • Standard interfaces
         • Low total cost of ownership
         • Speed: 10-100x faster than traditional systems
         • Simplicity: Minimal administration
         • Scalability: Peta-scale user data capacity
         • Smart: High-performance advanced analytics
    3. IBM Netezza in the Data Warehouse Space
    4. The Enterprise Analytics Architecture (diagram)
         • Stages: Integrate, Govern; Manage; Analyze
         • Sources: Source Systems, Data Marts, Silos; Instrumentation Data; Unstructured Data
         • Information Integration: ETL, Replication; Data Quality; Data Profiling & Discovery
         • Front Line / BI Applications / Predictive Analytics
    5. Big data overwhelms traditional data warehouses
    6. Let’s simplify this mess …
    7. … and bring analytics into the warehouse
    8. The Netezza appliance in the ecosystem
         • Data Integration (via OLE-DB, JDBC, ODBC, SQL, nzload): Ab Initio, Business Objects/SAP, Composite Software, Expressor Software, GoldenGate Software (Oracle), Informatica, IBM Information Server, Oracle ODI (Sunopsis), WisdomForce, flat files, …
         • Data Access (via OLE-DB, JDBC, ODBC, SQL): Actuate, Business Objects/SAP, Cognos (IBM), Information Builders, Kalido, KXEN, MicroStrategy, Oracle OBIEE, QlikTech, Quest Software, SAS, SPSS (IBM), Unica (IBM), flat files, …
    9. A look inside
         • Optimized Hardware + Software: Purpose-built for high performance analytics; requires no tuning
         • True MPP: All processors fully utilized for maximum speed and efficiency
         • Streaming Data: Hardware-based query acceleration for blistering-fast results
         • Deep Analytics: Complex analytics executed in-database for deeper insights
    10. The Netezza AMPP™ Architecture (diagram)
         • Applications: Advanced Analytics, BI, ETL, Loader
         • Netezza Appliance: Hosts, Network Fabric, S-Blades™ (each with FPGA, CPU and Memory), Disk Enclosures
    11. How the customers see IBM Netezza
    12. Netezza Customers: Digital Media, Financial Services, Government, Health & Life Sciences, Retail / Consumer Products, Telecom, Other
    13. Netezza delivers speed
         • 15,000 users
         • Running 800,000+ queries per day
         “…when something took 24 hours I could only do so much with it, but when something takes 10 seconds, I may be able to completely rethink the business process …” (SVP Application Development, Nielsen) http://www.youtube.com/watch?v=yOwnX14nLrE&feature=player_embedded
    14. Netezza delivers scalability
         • 1 PB on IBM Netezza
         • 7 years of historical data
         • 100-200% annual data growth
         “NYSE … has replaced an Oracle 10 relational database with a data warehousing appliance from Netezza, allowing it to conduct rapid searches of 650 terabytes of data.” (ComputerWeekly.com) Roadshow 2011
    15. Some customers with (relatively) small warehouses benefit from Netezza
         • French retailer, 2000 stores worldwide
         • 600 GB of data, planning to scale to 2 TB in the next 3 years
         • Using Oracle, with performance not meeting expectations at 600 GB
         • Chose Netezza after a one-week Proof of Concept: better performance, simpler solution, best TCO
    16. Implementation project
    17. Design & Performance Tuning Dramatically Simplified
         • NO database space / tablespace sizing and configuration
         • NO redo/physical log sizing and configuration
         • NO journaling/logical log sizing and configuration
         • NO page/block sizing and configuration for tables
         • NO extent sizing and configuration for tables
         • NO temp space allocation and monitoring
         • NO RAID level decisions for dbspaces
         • NO logical volume creations of files
         • NO integration of OS kernel recommendations
         • NO maintenance of OS recommended patch levels
         • NO JAD sessions to configure host/network/storage
         • NO software to install
         • ONE simple partitioning strategy: HASH
         Object count for a FACT table (6 billion rows):
         Object                          Oracle   Netezza
         Tables                               1         1
         Indexes                             12         -
         Table Partitions                    47         -
         Index Partitions                   564         -
         Table Partitions tablespaces        47         -
         Index Partitions tablespaces        47         -
         Table Data Files                   170         -
         Index Data Files                   122         -
         TOTAL                            1,010         1
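Slide 17 names hash as the single partitioning strategy. The sketch below shows the general idea of hash distribution, under loud assumptions: `zlib.crc32` is only a stand-in (the appliance's real hash function is not described in this deck), and `data_slice_for`, `distribute` and the slice count of 8 are hypothetical names and values chosen for illustration.

```python
import zlib
from collections import Counter


def data_slice_for(key, num_slices):
    """Map a distribution-key value to a slice number.

    crc32 is a stand-in hash for illustration only.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % num_slices


def distribute(keys, num_slices):
    """Count how many rows land on each slice for the given key values."""
    counts = Counter(data_slice_for(key, num_slices) for key in keys)
    return [counts.get(i, 0) for i in range(num_slices)]


# 100,000 hypothetical order ids spread over 8 hypothetical data slices.
rows_per_slice = distribute(range(100_000), 8)
print(rows_per_slice)
```

Two properties matter for a design like this: the same key always maps to the same slice, and a high-cardinality key spreads rows roughly evenly. A low-cardinality key (for example a country code) would concentrate rows on a few slices and leave the others idle.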
    18. To start a project: Load n’Go approach
         • Physical data model migration
             – Export existing DDLs
             – No change of the logical model
             – Same table and column definitions
             – All other attributes are removed (extents, blocks, pages, locks, indexes)
             – Views are kept
         • Data migration
             – Export tables to CSV files (or equivalent)
             – Reload into Netezza using the nzload utility
         • Configure applications to access Netezza
             – Installation of the Netezza ODBC/JDBC drivers on users’ workstations
         • Application migrations
             – Almost no changes to SQL in BI applications
             – Some adjustments to loading processes to leverage MPP horsepower
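The data migration step on slide 18 (export to CSV, reload with nzload) could be scripted roughly as follows. This is a minimal sketch under assumptions: the database name `DW`, the table name `SALES` and the helper names are hypothetical, connection and credential options are omitted, and the nzload flags shown (`-db`, `-t`, `-df`, `-delim`) should be checked against the nzload documentation for the installed release. The command is only assembled, never executed.

```python
import csv
import tempfile
from pathlib import Path


def export_to_csv(rows, path):
    """Write rows (an iterable of sequences) to a comma-delimited file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path


def nzload_command(database, table, data_file):
    """Build an nzload argument list (flags assumed; verify before use)."""
    return [
        "nzload",
        "-db", database,
        "-t", table,
        "-df", str(data_file),
        "-delim", ",",
    ]


# Hypothetical rows standing in for a table exported from the source database.
rows = [(1, "2012-05-22", 199.0), (2, "2012-05-22", 49.5)]

with tempfile.TemporaryDirectory() as workdir:
    data_file = export_to_csv(rows, Path(workdir) / "sales.csv")
    command = nzload_command("DW", "SALES", data_file)
    print(" ".join(command))  # pass to subprocess.run(command) on a real system
```

Building the command as a list rather than a single string keeps file names with spaces intact when it is handed to `subprocess.run`.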
    19. Case study with a Swedish customer
    20. Customer Case: Modebolaget (“the fashion company”)
         • Sharply increasing volumes due to high growth (20%) and continuous demands for more functionality for the business.
         • Difficulty meeting SLAs because of poor performance.
         • Difficulty delivering new functionality because of growing complexity and poor performance.
         • High and rising costs with low predictability.
    21. Customer Case: target state
         • A platform with more than 5 times better performance than the current platform
         • Better cost control and significantly lower TCO
         • The platform must scale with the growth of the business and its constant need for new functionality
    22. Performance: Customer Benchmark
         Test      Current   HP Integrity  HP BL68x/RAC  HP BL680  Fujitsu RX600  IBM HS22  Netezza 1000-6
         Test 1    01:05:20  00:18:00      00:09:40      00:14:15  00:03:51       00:25:54  00:02:05
         Test 2    00:03:15  00:02:44      00:02:02      00:01:27  00:00:29       00:01:59  00:00:12
         Test 3    00:20:15  00:05:42      00:03:47      00:05:29  00:02:13       00:18:32  00:01:12
         Test 4a   00:01:38  00:00:11      00:00:05      00:00:05  00:00:02       00:00:31  00:00:01
         Test 4b   00:01:19  00:00:36      00:00:10      00:00:12  00:00:05       00:01:19  00:00:01
         Test 4c   00:11:20  00:01:34      00:00:40      00:00:55  00:00:27       00:03:30  00:00:06
         Test 6    00:10:00  00:00:22      00:00:17      00:01:19  00:00:48       00:09:12  00:00:13
         Total     1:53:07   0:29:09       0:16:41       0:23:42   0:07:55        1:00:57   0:03:50
         Factor    1.00      3.88          6.78          4.77      14.29          1.63      29.51
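The factor row in the slide 22 benchmark appears to be the total elapsed time on the current platform divided by the total on each candidate. The sketch below reproduces that arithmetic for the Current and Netezza 1000-6 columns only, using the per-test times from the slide; the helper name `to_seconds` is an illustrative choice.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Per-test elapsed times (Test 1, 2, 3, 4a, 4b, 4c, 6) from the slide.
current = ["01:05:20", "00:03:15", "00:20:15", "00:01:38",
           "00:01:19", "00:11:20", "00:10:00"]
netezza = ["00:02:05", "00:00:12", "00:01:12", "00:00:01",
           "00:00:01", "00:00:06", "00:00:13"]

total_current = sum(to_seconds(t) for t in current)   # 6787 s = 1:53:07
total_netezza = sum(to_seconds(t) for t in netezza)   # 230 s = 0:03:50
factor = total_current / total_netezza

print(total_current, total_netezza, round(factor, 2))  # 6787 230 29.51
```

Because the factor is a ratio of totals, it is dominated by the longest tests; Test 1 alone accounts for more than half of the elapsed time on the current platform.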
    23. Migration: starting point
         • Oracle 11 EE on HP-UX
         • ETL: Informatica PowerCenter mixed with PL/SQL
         • About 4,000 ETL objects in 500 workflows
         • About 400 tables, 200 PL/SQL objects and more than 10,000 Oracle objects in total
         • About 10 TB of SAN storage in use, with Oracle compression
         • Ad hoc queries and Cognos cubes
         • Minor use of Qlikview, SAS, Coremetrics, MS SSIS/SSRS/SSAS
    24. Customer Case: Migration
         • Scope
             – Oracle to Netezza
             – PL/SQL to NZ/SQL
             – Informatica 7 on HP-UX to Informatica 9 on Windows
             – Unix shell scripts to VBScript
         • Challenges
             – Data types
             – Three unique Oracle PL/SQL functions had to be rewritten as NZ UDFs
             – Spend time on making use of Netezza’s strengths
    25. Customer Case: Migration timeline, 2010/11 to 2011/05
         • 3 months of preparation
             – 4 + 1 people
             – Infrastructure
             – Migration guidelines
             – Data migration
             – Automated routines for upgrading Informatica (+1 person)
         • 4-month sprint to migrate ETL and Cognos
             – The whole development team
             – The sprint ended with new functionality delivered on the new platform.
    26. Customer Case: Results
         • The Oracle solution was shut down in 2011-06.
         • Up to 20x better performance without significant changes to design or architecture.
         • Database staff can focus on new functionality and delivery instead of administration and firefighting.
         • No specialist skills are required in the development teams.
         • Greater delivery capacity toward the business.
         • A scalable solution with no noticeable change in performance; when volume limits are reached, the upgrade comes at a fixed price.
         • 50% lower costs compared with the closest competitor.
    27. Next steps
         • Keep learning the “new” way of building data warehouses
         • Dare to think along new lines
         • Drive changes in the architecture toward a more information-oriented mindset
         • Have the stamina to drive these changes and improvements despite a continued high level of delivery to the business stakeholders
    28. Conclusion
    29. IBM Netezza data warehouse appliance family for data life-cycle management: from a few terabytes to 10s of petabytes
    30. The IBM Netezza Appliance: Revolutionizing Analytics
         • Customers: Digital Media, Financial Services, Government, Health & Life Sciences, Retail / Consumer Products, Telecom, Other
         • Speed: 10-100x faster than traditional systems
         • Simplicity: Minimal administration
         • Scalability: Peta-scale user data capacity
         • Smart: High-performance advanced analytics
