Backup Options for IBM PureData for Analytics powered by Netezza
Confused about what options there are to back up your Netezza or IBM PureData for Analytics solution? This presentation covers two alternatives, a file system approach and an external backup software approach, using IBM Storwize V7000 Unified and IBM Tivoli Storage Manager.

Published in Technology

Transcript

  • 1. Backup Options: IBM PureData™ System for Analytics, powered by Netezza
      Tony Pearson, IBM Master Inventor and Senior IT Specialist
      March 2014
      © 2014 IBM Corporation
  • 2. Part of the IBM Big Data Platform: Workload Optimized Solutions for All Your Analytic Needs
      Diagram labels: Analytics & Decision Management Solutions; Big Data Infrastructure; IBM Big Data Platform; Accelerators; Information Integration & Governance; Visualization & Discovery; Application Development; Systems Management; Stream Computing; Hadoop System; Data Warehouse; PureData System for Analytics
  • 3. Spend Less Time Managing and More Time Innovating: Simplicity and Ease of Administration
      • No dbspace/tablespace sizing and configuration
      • No redo/physical/logical log sizing and configuration
      • No page/block sizing and configuration for tables
      • No extent sizing and configuration for tables
      • No Temp space allocation and monitoring
      • No RAID level decisions for dbspaces
      • No logical volume creations of files
      • No integration of OS kernel recommendations
      • No maintenance of OS recommended patch levels
      • No JAD sessions to configure host/network/storage
      Data Experts, not Database Experts
      • Easy Administration Portal
      • No software installation
      • No indexes and tuning
      • No storage administration
      IBM’s Advantage: FPGA
      • A real-time silicon SQL accelerator
      • Dynamically reprogrammed for each individual query
      • Eradicates ~95% of system I/O before the CPU ever sees it
      • Completely unique to PDA
  • 4. PureData System for Analytics Hardware Overview: Model N200x
      System figures
      • User Data Capacity: 192 TB*
      • Data Scan Speed: 478 TB/hr*
      • Load Speed (per system): 5+ TB/hr
      • Active Data Slices: 96
      • Power Requirements: 7.5 kW
      • Cooling Requirements: 27,000 BTU/hr
      * Assuming 4X compression
      Scales from 1/4 Rack to 4 Racks
      2 Hosts (Active-Passive)
      • 2 Intel 2.7 GHz Sandy Bridge CPUs
      • 7 x 300 GB SAS Drives
      • Red Hat Linux 6 64-bit
      7 PureData for Analytics S-Blades™
      • 2 Intel 8 Core 2+ GHz CPUs
      • 2 8-Engine Xilinx Virtex-6 FPGAs
      • 128 GB RAM + 8 GB slice buffer
      • Linux 64-bit Kernel
      12 Disk Enclosures
      • 288 600 GB SAS2 Drives: 240 for User Data, 14 for S-Blades, 34 Spare
      • RAID 1 Mirroring
  • 5. IBM PureData for Analytics: Reasons for Backup
      Firmware (Linux, Code)
      • IBM will take care of Red Hat Enterprise Linux, Web Admin and other code as needed
        – No need for you to back it up yourself
      Metadata (Host Catalog; Global users, groups, permissions)
      • Back this up to protect the host configuration from data corruption (rare)
      User Data (Database 1: Table A, Table B; Database 2: Table X, Table Y, Table Z)
      • Various reasons to back up database schema and contents
        – As part of firmware upgrade/downgrade
        – To transfer data to another system
        – Protect against hardware failure / disaster
        – Protect against data corruption
  • 6. Compressed versus Text-format
      Diagram labels: User Data (Database 1: Table A, Table B; Database 2: Table X, Table Y, Table Z); Firmware +1 (Upgrade); Firmware; Firmware -1 (Downgrade); Other Database systems
      • Compressed database backup and compressed external tables: restore to same or higher firmware
      • Text-format external table: restore to any, but slower, takes up more space
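
The compatibility rule on slide 6 can be written down as a small check. The following is a minimal sketch, not IBM tooling: the `Backup` type, the release tuples, and the file paths are hypothetical, and the two `CREATE EXTERNAL TABLE` strings follow the Netezza external-table syntax as I understand it (`FORMAT 'internal'` with `COMPRESS true` for the compressed form, a delimiter for the text form), so verify them against the documentation for your release before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    fmt: str        # "compressed" or "text"
    release: tuple  # hypothetical release the backup was taken on, e.g. (7, 0)


def can_restore(backup, target_release, target_is_netezza=True):
    """Apply the slide-6 rule: compressed restores only to the same or a
    higher firmware level; text format restores anywhere."""
    if backup.fmt == "text":
        return True
    if backup.fmt == "compressed":
        return target_is_netezza and tuple(target_release) >= tuple(backup.release)
    raise ValueError(f"unknown backup format: {backup.fmt!r}")


def external_table_sql(table, path, compressed):
    """Build an unload statement (assumed syntax, see lead-in)."""
    options = "FORMAT 'internal' COMPRESS true" if compressed else "DELIMITER '|'"
    return f"CREATE EXTERNAL TABLE '{path}' USING ({options}) AS SELECT * FROM {table};"


if __name__ == "__main__":
    before_downgrade = Backup("compressed", (7, 1))
    # A compressed backup cannot go back to an older firmware level,
    # so a downgrade needs a text-format unload instead.
    print(can_restore(before_downgrade, (7, 0)))
    print(external_table_sql("table_a", "/mnt/backup/table_a.txt", compressed=False))
```

The trade-off the slide states still applies: the text-format unload is the portable choice, at the cost of speed and space.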
  • 7. Two Primary Approaches
      1. Filesystem Approach
      • Backup metadata and databases to external NAS storage devices
      • Built-in CLI commands included
      • Scripts for large databases available
      2. External Backup Software
      • Backup metadata and databases to external backup server/media
      • User-initiated and automatic scheduled backups
      • Supports disk, tape and virtual tape storage devices
      Diagram labels: Metadata (Host Catalog; Global users, groups, permissions); User Data (Database 1: Table A, Table B; Database 2: Table X, Table Y, Table Z)
  • 8. Network Configuration using SAN or LAN as Backup Network
      Diagram labels: User Network; Backup Network; External storage device; Metadata (Host Catalog; Global users, groups, permissions); User Data (Database 1: Table A, Table B; Database 2: Table X, Table Y, Table Z)
      Commands
      • nzhostbackup
      • nzbackup -users
      • nzbackup -db (up to 16 multiple streams)
      • CREATE EXTERNAL TABLE
      • nz_backup script for larger databases
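
The commands on slide 8 can be sketched as command lines. This is a minimal illustration that only builds and prints the argument lists; it does not run anything. The mount points under `/mnt/v7000u`, the database name `db1`, and the archive file name are hypothetical. The `-dir` and `-streams` options are used here as I understand the `nzbackup` CLI, so check them against `nzbackup -h` on your system.

```python
import shlex

MAX_FS_STREAMS = 16  # slide 8: "up to 16 multiple streams"


def host_backup_cmd(archive_path):
    """Back up the host catalog (metadata) to a single archive file."""
    return ["nzhostbackup", archive_path]


def users_backup_cmd(backup_dir):
    """Back up global users, groups and permissions."""
    return ["nzbackup", "-users", "-dir", backup_dir]


def db_backup_cmd(db, backup_dirs, streams=1):
    """Back up one database to one or more directories on the NAS mount."""
    if not 1 <= streams <= MAX_FS_STREAMS:
        raise ValueError(f"streams must be between 1 and {MAX_FS_STREAMS}")
    cmd = ["nzbackup", "-db", db, "-dir", *backup_dirs]
    if streams > 1:
        cmd += ["-streams", str(streams)]
    return cmd


if __name__ == "__main__":
    # Hypothetical NAS mount points exported by the external storage device.
    mounts = ["/mnt/v7000u/bk1", "/mnt/v7000u/bk2"]
    for cmd in (
        host_backup_cmd("/mnt/v7000u/bk1/hostbackup.tar.gz"),
        users_backup_cmd(mounts[0]),
        db_backup_cmd("db1", mounts, streams=4),
    ):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps quoting out of the picture if the lists are later handed to `subprocess.run`.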
  • 9. Proof-of-Concept (PoC) Configuration
      Storwize V7000 Unified comprising
        – Two file modules (2073-700)
        – One V7000 control enclosure (2076-324)
        – Code level 1.4.0.1
      File modules connected via 4 x 10 Gbit interfaces
      24 x 600 GB 10K SAS drives installed in V7000 control enclosure
      Test database:
  • 10. Test Conclusion / Best Practices
      Chart: throughput in MB/sec (axis 0 to 500) for 4 x RAID-5 4+P, 2 x RAID-5 8+P and 3 x RAID-10 4+4 arrays; bar labels: 4 NSD, 8 NSD, 20 NSD, 2 NSD, 10 NSD, 3 NSD, 6 NSD, 8 NSD
      * ~1.7 TB/h compressed data; 6+ TB/h uncompressed data
      • Matching the GPFS block size to RAID full stripe width is beneficial
      • Matching the number of NSDs to number of RAIDs is beneficial
      • When matching number of NSDs to number of RAIDs, usage of sequential NSDs is beneficial
      • Small RAID-5 arrays (4+P) with the matching number of NSDs and mdisks (RAIDs) and 2 mount points show best performance (multiple streams)*
      • Supports both nzbackup/nzrestore CLI and nz_backup/nz_restore scripts
      • Focusing on backup performance: run multiple backup streams
      • Focusing on restore performance: run single backup stream
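
The figures on slide 10 lend themselves to quick arithmetic. The sketch below is an illustration under stated assumptions: decimal units (1 TB = 1,000,000 MB), a hypothetical 10 TB compressed database, and a hypothetical 256 KiB per-disk strip size, which the slide does not state. Check the strip size configured on your arrays before picking a GPFS block size.

```python
def mb_per_sec_to_tb_per_hour(mb_per_sec):
    """Convert a chart reading in MB/sec to TB/h (decimal units assumed)."""
    return mb_per_sec * 3600 / 1_000_000


def backup_hours(size_tb, rate_tb_per_hour):
    """Estimated backup window for a database of the given size."""
    return size_tb / rate_tb_per_hour


def full_stripe_kib(data_disks, strip_kib):
    """Full stripe width of an array: data disks only, parity excluded."""
    return data_disks * strip_kib


if __name__ == "__main__":
    # A bar near the top of the chart (about 472 MB/sec) corresponds to the
    # "~1.7 TB/h compressed data" footnote.
    print(round(mb_per_sec_to_tb_per_hour(472), 2))  # 1.7

    # Hypothetical 10 TB of compressed data at the slide's ~1.7 TB/h.
    print(round(backup_hours(10, 1.7), 1))  # 5.9

    # RAID-5 4+P with an assumed 256 KiB strip gives a 1024 KiB full stripe,
    # so a 1 MiB GPFS block size would match it; 8+P would need 2048 KiB.
    print(full_stripe_kib(4, 256))  # 1024
    print(full_stripe_kib(8, 256))  # 2048
```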
  • 11. Two Primary Approaches
      1. Filesystem Approach
      • Backup metadata and databases to external NAS storage devices
      • Built-in CLI commands included
      • Scripts for large databases available
      2. External Backup Software
      • Backup metadata and databases to external backup server/media
      • User-initiated and automatic scheduled backups
      • Supports disk, tape and virtual tape storage devices
      Diagram labels: Metadata (Host Catalog; Global users, groups, permissions); User Data (Database 1: Table A, Table B; Database 2: Table X, Table Y, Table Z)
  • 12. Network Configuration
      Diagram labels: User Network; Backup Network; External Backup server; Metadata (Host Catalog; Global users, groups, permissions); User Data (Database 1: Table A, Table B; Database 2: Table X, Table Y, Table Z)
      Commands
      • nzhostbackup to local file, then transfer to backup server
      • nzbackup -users
      • nzbackup -db (up to 1000 multiple streams)
      • Specify -connector and -connectorArgs
      • Create scripts for automatic schedule
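
The connector options and the scheduling script mentioned on slide 12 can be sketched as follows. This only builds strings and runs nothing. The connector name `tsm`, the `NAME=value` connector argument, the database name and the schedule are all assumptions for illustration; take the real connector name and arguments from the documentation for your backup product and Netezza release.

```python
import shlex


def connector_backup_cmd(db, connector, connector_args=None, streams=1):
    """Build an nzbackup command line that sends a database backup to an
    external backup server through a connector instead of a directory."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    cmd = ["nzbackup", "-db", db, "-connector", connector]
    if connector_args:
        cmd += ["-connectorArgs", connector_args]
    if streams > 1:
        cmd += ["-streams", str(streams)]
    return cmd


def crontab_line(minute, hour, cmd):
    """Format a daily crontab entry (standard five-field cron syntax)."""
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour 0-23")
    return f"{minute} {hour} * * * {shlex.join(cmd)}"


if __name__ == "__main__":
    # "tsm" and "NAME=value" are placeholders, not verified settings.
    cmd = connector_backup_cmd("db1", "tsm", connector_args="NAME=value", streams=2)
    print(crontab_line(30, 1, cmd))
```

In practice the crontab entry would more likely call a wrapper script that sets the Netezza environment first; the line above only shows the shape of the schedule.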
  • 13. External Backup Architecture
      Diagram labels: Client code; LAN; Backup Server: IBM Tivoli Storage Manager (TSM) server (Master Catalog, Media Management); SAN; Storage Hierarchy (Disk, Physical Tape, Virtual Tape)
  • 14. External Backup Architecture: TSM Proxy Node
      Diagram labels: XBSA code; LAN; Proxy node (LAN Free Storage agent); Backup Server (Master Catalog, Media Management); SAN; Storage Hierarchy (Physical Tape, Virtual Tape)
      • TSM client code sends backup to Proxy node
      • Proxy node
        – Sends data directly to physical or virtual tape over SAN fabric
        – Registers copies with Master Catalog
        – Can support multiple PureData for Analytics systems
      • TSM server manages media, tape reclamation, backup copy pools, etc.
  • 15. External Backup Architecture: TSM LAN Free
      Diagram labels: XBSA code; LAN Free Storage agent; LAN; Backup Server (Master Catalog, Media Management); SAN; Storage Hierarchy (Physical Tape, Virtual Tape)
      • TSM client code sends backups directly to physical or virtual tape over SAN fabric
      • TSM client code registers backup copies with Master Catalog
      • TSM server manages media, tape reclamation, backup copy pools, etc.
      • LAN Free
        – Avoids LAN congestion by using the SAN directly
        – Will consume more CPU resources on the PureData for Analytics system
  • 16. Summary
      1. Use the Filesystem Method with a SAN or NAS storage device such as Storwize V7000 Unified
      2. Use IBM Tivoli Storage Manager server infrastructure to back up PureData for Analytics systems
  • 18. About the Speaker
      Mr. Tony Pearson, Master Inventor, Senior Managing Consultant, IBM System Storage™
      Tony Pearson is a Master Inventor and Senior IT storage consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products.
      In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, as well as various storage software products. He interacts with clients, speaks at conferences and events, and leads workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.
      Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs of 2006 for the IT storage industry by “Networking World” magazine. The blog was published in book form as Inside System Storage: Volume I and Volume II, both available from Lulu publishing.
      Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.
      9000 S. Rita Road, Bldg 9070, Mail 9070, Tucson, AZ 85744
      +1 520-799-4309 (Office)
      tpearson@us.ibm.com
  • 19. Additional Resources
      Email: tpearson@us.ibm.com
      Twitter: http://twitter.com/az990tony
      Blog: http://ibm.co/brAeZ0
      Books: http://www.lulu.com/spotlight/990_tony
      IBM Expert Network: http://www.slideshare.net/az990tony
  • 20. Trademarks and disclaimers
      © IBM Corporation 2011. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. 
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. ZSP03490-USEN-00