
Starting Small (Teradata Appliance Family)



As companies take steps to manage their information asset, choosing a platform and database management system (DBMS) is absolutely fundamental. In fact, the platform is the foundation of architecture and business intelligence and the starting point for tool selection, consultancy hires, and more. In short, a company’s platform is key in defining its information culture.

Published in: Technology, Business

Starting Small, but Thinking Large and Scaling Fast
provided by: William McKnight
www.mcknightcg.com

INTRODUCTION

As companies take steps to manage their information asset, choosing a platform and database management system (DBMS) is absolutely fundamental. In fact, the platform is the foundation of architecture and business intelligence and the starting point for tool selection, consultancy hires, and more. In short, a company’s platform is key in defining its information culture.

These platform decisions are taking place in a challenging context. Data volumes continue to soar as history accumulates, syndicated data is collected, and new sources with more detailed data are added. Furthermore, the communities consuming the data continue to grow, expanding well beyond usual company boundaries to customers, supply-chain partners, and even the internet. Companies need to choose a proven platform that not only meets initial, known requirements but can also scale to future, to-be-determined requirements as data, users, and applications grow.

These challenges no longer affect only the big players. Mid-size companies¹ have data management needs similar to those of Fortune companies, albeit with reduced data volume and, sometimes, fewer users. They, too, need:

  • Rapid development that can be built upon over time.
  • Quality data that is available.
  • Architectures that provide low, long-term total cost of ownership (TCO).
  • Good query performance that results in increased interactive usage.
  • Ability to get to real-time feeds.
  • A platform to support advanced workload management.
  • A scalable path forward as data, users, and application needs grow.

Table of Contents

  Introduction
  Information is of Major Importance
  The Enterprise Data Warehouse Approach
  Mid-market Data Warehousing and BI
  Criteria for an EDW Platform Selection
  Teradata Innovations for Performance and Availability
  The Teradata Data Warehouse Appliance
  The Teradata Data Mart Appliance
  The Teradata Extreme Data Appliance
  Scaling to the Teradata Active EDW
  Conclusion
  About the Author

¹ For purposes of this paper, mid-size companies will be defined as companies with $1B to $50B in annual revenue.
Complicating matters, in selecting the platform to support their data, companies now face far more options, many of them distinct departures from the traditional online transactional processing (OLTP) DBMS, than ever before.

In 2008, in concert with this increase in information management needs, Teradata Corporation, a successful data warehouse provider for the one-terabyte-plus market for nearly 30 years, began making its technology affordable to the mid-market customer. This move is ushering in a new era of scalability and performance in that segment, as the #1 platform provider is poised to provide its leadership and influence for companies off, as well as on, the Fortune charts.

INFORMATION IS OF MAJOR IMPORTANCE

The battleground on which many industries engage today extends well beyond customary core competencies to the collection, management, and use of data. As proof, even in a subdued economy, business intelligence remains at the forefront of IT-related spending. This is in large part due to the applicability of information, directly and indirectly, to the organization’s bottom line.

Information must be flexible, manageable, and actionable. And it must be all these things within the framework of a multitude of IT-related realities, such as:

  • Multiple, complex applications serving a variety of users
  • Exploding data size
  • Data latency becoming intolerable as real-time information becomes necessary to compete

As data is put to profitable use and platforms evolve to handle the workload, it is only a matter of time until new demands to leverage data arise, adding requirements on an ongoing basis. But there is a natural flow to information management maturity that Teradata is not only well aware of, but has helped define over the years. Today, this maturity includes using data to take advantage of relationships that extend beyond the company walls.

But acknowledging these requirements and realities and being able to support them are two different things.

THE ENTERPRISE DATA WAREHOUSE APPROACH

The efficacy of having a centralized data store with quality, integrated, accessible, high-performance, and scalable data cannot be denied, regardless of company size. Yet some organizations with a decentralized orientation believe that initiating an enterprise data warehouse (EDW) is too difficult an endeavor without a quick and clear ROI. The assumption here is that EDW architecture implementation has an unbearable, year-plus timeline when it comes to delivering business value.

Fortunately, this is no longer the reality. Today, an EDW represents a commitment to organize the information of the corporation, regardless of its size, in the most efficient manner possible. It is not put in place using a big bang approach. Instead, it is primarily accomplished by meeting the objectives of a key subject area, data source, business objective, or user department, and then progressively building the environment with scalability from there. Another manageable route to EDW implementation is the consolidation of smaller, independent data marts into a centralized, money-saving architecture.

The most efficient way to accomplish EDW objectives is to build a data warehouse that solves specific needs while leveraging previous investment in the architecture, tools, processes, and people, and without prohibiting future growth. This enables an efficient, programmatic approach to data warehousing created to serve information to the enterprise.

Planning for an EDW implementation is also particularly important for mid-market organizations that are just starting to develop their architectural foundations. Too often these decisions are made within departmental boundaries without consideration of an overarching data warehousing strategy. This has led many organizations down the path of data mart proliferation: the creation of non-integrated data sets developed to address specific application needs, usually with an inflexible design. In the vast majority of cases, data mart proliferation is not the result of a chosen architectural strategy, but a consequence of lacking one. In either case, bringing the EDW approach to bear at the outset of such development is critical to economically taking advantage of its vast promise down the road.
MID-MARKET DATA WAREHOUSING AND BUSINESS INTELLIGENCE

Business intelligence vendors have been slow to respond to the needs of the midmarket. This factor, combined with their own more limited budgets, has meant that many in the midmarket have had to take different paths to business intelligence than the Fortune 50. In fact, the multi-layered architectures and multi-quarter “timeframes-to-value” were barriers to business intelligence in the midmarket long before the current recession began.

Teradata is among the vendors that have mobilized solutions with the realities of the mid-market in mind. Enterprise-class business intelligence with simplicity and scalability is available now in a midmarket-oriented suite of affordable platforms delivered in the increasingly popular preconfigured “data warehouse appliance” model. The data warehouse appliance is a hardware/software/OS/DBMS/storage preconfiguration for data management requirements. Low TCO for a mixed-workload data warehouse environment follows from the appliance model.

Naturally, vendors can mix and match their components to best suit certain workloads. Without compromising on the criteria that experienced practitioners know to be required for success at any level, Teradata has done this with the Teradata® Data Warehouse Appliance, Teradata Data Mart Appliance, and the Teradata Extreme Data Appliance. All are designed and priced to meet midmarket needs, or the departmental needs of the larger enterprise.

Teradata appliances use the proven and powerful Teradata DBMS. They also benefit from Teradata’s industry-leading integration with multiple data integration and BI tools and vendors.

CRITERIA FOR AN ENTERPRISE DATA WAREHOUSE PLATFORM SELECTION

The decision process for choosing a data warehouse platform should go well beyond the usual consideration of the operational DBMS vendor. Several requirements carry nuances that should be weighed, including:

  • The immediate availability of information
  • Cross-functional complexity
  • The level of query concurrency
  • The scalability needs of the platform
  • The functionality of the DBMS

Given the state of the marketplace, the technical architecture for a data platform in a mid-size-or-larger company should be:

  • Scalable – The solution should be scalable in both performance capacity and incremental data volume growth. The solution should scale in a near-linear fashion and allow for growth in database size, the number of concurrent users, and the complexity of queries. Understanding hardware and software requirements for such growth is paramount.
  • Powerful – The platform should be designed for complex decision support in an advanced workload management environment. The optimizer should be mature enough to support every type of query with good performance and to determine the best execution plan based on changing data demographics. Check on conditional parallelism and the causes of variations in the parallelism deployed, and on dynamic and controllable prioritization of resources for queries.
  • Manageable – The solution should be manageable with minimal support tasks requiring DBA/System Administrator intervention. There should be no need for the proverbial army of DBAs to support an environment, and the system should provide a single point of control to simplify administration. You should be able to create and implement new tables and indexes at will.
  • Extensible – Look for flexible database design and system architecture that keeps pace with evolving business requirements and leverages existing investment in hardware and software applications. Know the answers to questions such as: What is required to add and delete columns? What is the impact of repartitioning tables?
  • Interoperable – The system should have integrated access to the web, internal networks, and corporate mainframes.
  • Recoverable – In the event of component failure, the system must continue providing value to the business. It also should allow the business to selectively recover the data to points in time, and provide an easy-to-use mechanism for doing this quickly.
  • Affordable – The proposed solution (hardware, software, services) should provide a relatively low TCO over a multi-year period.
  • Flexible – The system should provide optimal performance across the full range of normalized, star, and hybrid data schemas with large numbers of tables. Look for proven ability to support multiple applications from different business units, leveraging data that is integrated across business functions and subject areas.
  • Robust in Database Management Systems Features and Functions – Make sure there are DBA productivity tools, monitoring features, parallel utilities, a robust query optimizer, locking schemes, a security methodology, intra-query parallel implementation for all possible access paths, chargeback and accounting features, and remote maintenance capabilities.

There are few vendors who understand what it means to build mission-critical, well-performing data platforms that meet all of the above criteria. Of course, the vendor itself should be a major consideration, especially in these days of consolidation. When making this all-important decision, consider a vendor’s financial stability, the importance of data management to its overall business strategy, and its continued research and development in these areas towards a well-developed and relevant vision.

TERADATA INNOVATIONS FOR MAXIMUM PERFORMANCE AND AVAILABILITY

One of the hallmarks of Teradata’s unique approach is that all database functions (table scan, index scan, joins, sorts, insert, delete, update, load, and all utilities) are done in parallel all of the time. There is no conditional parallelism. All units of parallelism participate in each database action.

Also of special note is the table scan. One of Teradata Database’s main features is a technique called synchronous scan, which allows scan requests to “piggy back” onto scans already in process. So maximum concurrency is achieved through maximum leverage of every scan. Teradata Database keeps a detailed profile of the data under management to efficiently scan only the limited storage where query results might be found.²

The Teradata optimizer intelligently runs steps in a query in parallel wherever possible. For example, for a three-table join requiring three table scans, Teradata Database would start all three scans in parallel. When scans of tables B and C finished, it would begin the join step as the scan for table A finished.

Teradata’s optimizer is grounded in the knowledge that every query will be executing on a massively parallel processing (MPP) system. Such systems are generally acknowledged as the preferred architecture for analytic query, business intelligence, and data warehousing. Teradata systems do not share memory or disk across the nodes, the collections of CPU, memory, and bus connected in an MPP environment. Sharing disk and/or memory creates overhead. Sharing nothing minimizes disk access bottlenecks.

The Teradata BYNET®, the node-to-node interconnect, which scales linearly to more than a thousand nodes, has fault-tolerant characteristics designed specifically for a parallel processing environment.

Hot-pluggable components allow you to replace components without affecting your applications. If a component fails, built-in redundancy allows the application to continue running in Teradata systems. Furthermore, the growth path in the Teradata environment is a function of easily adding nodes and disk storage.

Continual feeding without table-level locks can be done with Teradata utilities, using multiple feeders at any point in time. And again, the impact of the data load on the resources is customizable. The process ensures no input data is missed regardless of the allocation.

Teradata has extended the concepts that are interesting to the midmarket and to a single-application focus from its Active Enterprise Data Warehouse into its new appliance family. In so doing, Teradata has ushered in true business intelligence affordability for the midmarket.

THE TERADATA DATA WAREHOUSE APPLIANCE

The Teradata Data Warehouse Appliance supports the EDW approach to building the data warehouse and is the Teradata appliance family flagship product. It is suitable for an upper midmarket true EDW or as the platform for a focused application. With four MPP nodes per cabinet and scaling up to 11 cabinets with 12.6 terabytes each, the Teradata Data Warehouse Appliance can manage up to 140 terabytes³, with the workload characteristics of a typical data warehouse: multiple, complex applications serving a wide variety of users. The experience can begin at two terabytes of fully redundant user data on two nodes and grow, node-by-node if necessary, up to 46 nodes. The nodes can be provided with Capacity on Demand as well, which means the capacity can be configured into the system unlicensed until it is needed. This makes adding the capacity simple.

THE TERADATA DATA MART⁴ APPLIANCE

The Teradata Data Mart Appliance is a more limited capacity equivalent of the Teradata Data Warehouse Appliance and is ideal for the data warehouse or another of the larger data stores in the midmarket. It’s a single-node, single-cabinet design with a total user data capacity of six terabytes⁵. It should be noted, though, that a single-node environment comes with the potential for downtime in the unlikely event that the node fails, since there is no other node to cover for the failure.

THE TERADATA EXTREME DATA APPLIANCE

Though not strictly a mid-market need, the Teradata Extreme Data Appliance is also part of the Teradata appliance family, and represents affordability for the management of large data. It out-scales even the Teradata Active EDW platform. While the Active EDW tops out at 10 petabytes, the Extreme Data Appliance will scale to 50 petabytes. A system of this size would have fewer concurrent access requirements due to access being spread out across the large data set.

The Teradata Extreme Data Appliance is designed for high-volume data capture such as that found in click stream capture, call detail records, high-end POS, scientific analysis, sensor data, and any other specialist system where the performance of straightforward, non-concurrent analytical queries is the overriding selection factor. It also can serve as a surrogate for near-line archival strategies that move interesting data to slow retrieval systems, and it will keep this data online.

SCALING TO THE TERADATA ACTIVE ENTERPRISE DATA WAREHOUSE

Any code built for a Teradata appliance is completely portable to the Teradata Active Enterprise Data Warehouse, in case you need to go beyond the chosen Teradata appliance. This platform for data warehousing, with nine nodes per cabinet scaling up to 1,024 nodes, has a total disk capacity of 10 petabytes. A superset of features is part of the Teradata Active EDW, including automatic node failover and recovery, active system management with full performance continuity with hot standby nodes, fallback, backup and recovery, and dual active systems. The system is designed to manage the most mission-critical systems. The need for such management could be one reason to upsize to this platform. Another reason, except for those using the Extreme Data Appliance, might be data sizing.

CONCLUSION

From straightforward mid-market data warehouse requirements to the global enterprise and beyond, Teradata’s platforms are built on a foundation that has served the largest and most complex environments in the world for nearly 30 years. By meeting the needs of the midmarket with the proven appliance model, as well as with a flexible combination of nodes, maximum data size, storage and cabinet configurations, and high availability features, Teradata is showing its leadership in the midmarket, as well as in the larger-company arena.

Teradata solutions allow you to start small, think big, and scale fast in terms of an EDW approach to data management and, if required, migrate to an Active EDW platform. The Teradata Data Mart Appliance is the robust selection for the mid-market data warehouse or data store. The Teradata Data Warehouse Appliance takes the data mart appliance benefits to another level, and the Teradata Extreme Data Appliance has the upper end of data size covered for any enterprise.

Whatever your information needs, Teradata’s principles of scalability, power, manageability, extensibility, interoperability, manageable long-term TCO, flexibility, and robust features and functions support the possibilities.

² Teradata Intelligent Scanning
³ Numbers do not assume compression, which should allow for 30% more user storage on average.
⁴ Data “Mart” (vs. Warehouse) is a product label only and is meant to address scale of the project and not the polar opposite of a Data Warehouse.
⁵ However, as noted, once the limits are approached, porting to the Teradata Active Enterprise Data Warehouse is an attractive option.
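
The capacity figures quoted for the Teradata Data Warehouse Appliance, together with footnote 3, reduce to simple arithmetic. The Python sketch below multiplies the stated 11 cabinets by 12.6 terabytes each and then applies the footnote's average 30% compression allowance. The function and constant names are illustrative assumptions, not part of any Teradata tool, and the compression gain is an average that will vary with the data.

```python
# Figures taken from the paper; all names here are hypothetical.
MAX_CABINETS = 11            # maximum cabinets quoted for the appliance
TB_PER_CABINET = 12.6        # terabytes per cabinet, uncompressed
AVG_COMPRESSION_GAIN = 0.30  # footnote 3: about 30% more user storage on average


def uncompressed_capacity_tb(cabinets: int, tb_per_cabinet: float = TB_PER_CABINET) -> float:
    """Return total capacity in terabytes for a given number of cabinets."""
    if cabinets < 0:
        raise ValueError("cabinets must be non-negative")
    return cabinets * tb_per_cabinet


def with_compression_tb(capacity_tb: float, gain: float = AVG_COMPRESSION_GAIN) -> float:
    """Estimate effective user storage after applying an average compression gain."""
    return capacity_tb * (1.0 + gain)


if __name__ == "__main__":
    full = uncompressed_capacity_tb(MAX_CABINETS)
    print(f"Uncompressed, {MAX_CABINETS} cabinets: {full:.1f} TB")
    print(f"With average compression: {with_compression_tb(full):.1f} TB")
```

The uncompressed product comes to 138.6 terabytes, which the paper appears to round to 140. The compressed estimate is only as good as the 30% average and should be treated as a planning figure.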
About the Author

William functions as Strategist, Lead Enterprise Information Architect, and Program Manager for complex, high-volume full life-cycle implementations worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, data quality, and operational business intelligence. Many of his clients have gone public with their success stories. William is a Southwest Entrepreneur of the Year Finalist and a frequent best practices judge, and has authored more than 150 articles and white papers and given over 150 international keynotes and public seminars. His team’s implementations from both IT and consultant positions have won Best Practices awards. William is a former IT VP of a Fortune company and a former engineer of DB2 at IBM, and holds an MBA.

William can be reached at 214-514-1444 or william@mcknightcg.com.

5960 W. Parker Rd., Suite 278-133
Plano, TX 75093
Tel (214) 514-1444

Teradata and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide.
EB-5933 > 0609