Building the Architecture for Analytic Competition




Published in: Education, Technology

  1. Building the Architecture for Analytic Competition: Why the Architecture Foundation is so Critical to Success
Prepared by William McKnight
Sponsored by
  2. EB-7592 > 0513 > PAGE 2 OF 11

Contents

  • Information Architecture Defined, page 2
      • Definition of Information Architecture, page 2
      • An Architecture Framework: Teradata’s Approach, page 3
      • Design Patterns and Implementation Alternatives, page 4
      • Architecture Principles and Advocated Positions, page 5
      • Balancing Acts: Delivery Versus Architecture, page 6
  • Architecture Development and Information Management Possibilities, page 7
      • Harnessing Workloads, page 7
      • What Determines the Success of a Workload?, page 7
      • Platform Selection Process, page 8
      • The No-Reference Architecture, page 8
      • The Analytic Ecosystem, page 8
  • The Building Blocks of Analytic Competition, page 10
      • Teradata Analytic Architecture Technology, page 10
      • Teradata Analytic Architecture Solution Model, page 10
      • A Consistent Approach Ensures Delivery, page 11

Information Architecture Defined

Definition of Information Architecture

Lost amid the conversation on big data and the accelerating advancement of just about every aspect of enterprise software that manages information are the things that hold it all together. Yet this is critical: information-management components must come together in a meaningful fashion or there will be unneeded redundancy and waste and opportunities missed. Considering that optimizing the information asset goes directly to the organization’s bottom line, it behooves us to play an exceptional game, not a haphazard one, with our technology building blocks.
The glue that brings information management components together is called “architecture”: the high-level plan for the data stores, the applications that use the data, and everything in between. The “everything in between” can be quite extensive, since it covers data transport, middleware, and transformation. Architecture dictates the level of data redundancy, summarization, and aggregation, since data can be consolidated or distributed across numerous data stores optimized for parochial needs, broad-ranging needs, and innumerable variations in between.

There must be a true north for enterprise information architecture. There needs to be a process to vet practices and ideas that accumulate in the industry and the enterprise, and assess their applicability to the architecture. We define this body of possibilities in terms of “Design Patterns,” “Implementation Alternatives,” “Architecture Principles” and “Advocated Positions.” These concepts will be defined later in this paper, but what is important to understand upfront is that analytic success requires focused attention on information architecture.

Analytics, not reporting, is forming the basis of competition today. Rearview-mirror reporting can be essential in support of operational needs. However, the large payback from information undoubtedly comes in the form of analytics.
  3. An Architecture Framework: Teradata’s Approach

Architecture is immensely important to information success, and thus the recipe for that success begins with a good, well-rounded and complete architectural approach. You can architect an environment in a way that encourages data use by making it perform well, putting up the architecture and data quickly, and keeping the impact on users and ongoing maintenance budgets minimal by building it well from the beginning. Falling short on any of these requirements can quickly send users retreating to the safety of status quo information usage, instead of taking on what might seem like a formidable challenge of progressive usage.

But consider that in the small windows of time most users have to engage with available data, they can only reach a certain level of depth with the information. If the data is architected well, that analysis will be deep, insightful and profitable. That is the power of architecture. If your service provider’s approach does not reflect this, the result will be less than successful.

By contrast, consider Teradata’s approach. Teradata defines its Architecture Framework using the BIAS approach, which focuses on four key components that comprise architecture, as well as two components that make it all work together:

     1. Business Architecture
     2. Information Architecture
     3. Application Architecture
     4. Systems Architecture
     5. Enablement
     6. Program Management

Teradata defines the Business Architecture as understanding the business requirements and providing vision to those requirements. It has to do with defining the organizational business model, structures, missions, goals, and processes, and understanding which business fundamentals are vital for organizational success. You can architect for known requirements effectively only by understanding the context of eventual requirements.
The trajectory of systems in an organization is never a linear projection from a near-recent state to a current state through known requirements. The architecture must include contingencies for the unknown and for the forked paths that systems can take in an organization. It must incorporate vision derived from similar organizations, especially more advanced and progressive ones. You do not invest in architecture to be status quo; you expect business success, supported by architecture. Business Architecture is supported by Information Architecture and Application Architecture.

Teradata’s Information Architecture supports Business Architecture through storing or otherwise processing the data that is required, both internally and externally generated. Information Architecture must take into consideration the numerous avenues for data today. Data must be put in the best place to succeed, which primarily means it must be enabled quickly, well-performing, and scalable. Information Architecture identifies the data (and the state of the data) needed to support the Business Architecture, includes logical and physical data models, and is supported by Systems Architecture.

Like Information Architecture, Teradata’s Application Architecture can subdivide applications in many ways. Application Architecture uses Information Architecture and Systems Architecture to support Business Architecture. While applications execute the functional side of the Business Architecture, effective cross-referencing of applications to the required tools and other applications is an important component of the Application Architecture challenge.

Where the architecture rubber meets the road in Teradata’s approach is Systems Architecture. This is the physical manifestation of architecture: the base upon which Information Architecture and Application Architecture reside and deliver for the Business Architecture. As in other areas, Systems Architecture has the issues of subdivision and optimization.
  4. Business, Information, Application and Systems Architecture are each disciplines unto themselves and may be optimized individually. But they must be prioritized through Enablement.

According to Teradata, “Enablement evaluates cultural and organizational readiness for the architectural advances and prioritizes resources and work effort accordingly. Enablement adds data management capabilities with each implementation, such as a data quality improvement program, a data governance capability or one of the ones reviewed below, that support current and future information initiatives.”

Much of the work building architecture for analytic competition should include “soft” factors like Enablement, especially early in the process.

Finally, according to Teradata, it is overall Program Management that will intelligently bring everything together into meaningful interim points that deliver analytics to address organizational goals in an agile fashion. Program Management extends throughout all implementations and ensures consistency and continuity among many projects and players.

In summary, Teradata has a comprehensive approach to information architecture. It acknowledges the importance of architecture and skillfully decomposes architecture into layers that can be worked on discretely in the context of a full approach.

Design Patterns and Implementation Alternatives

In daily information management activity, decisions are made with high frequency and major decisions are never far away. In order to support those decisions with program context and unbiased wisdom, it is necessary to make and implement design choices. To accomplish this, Teradata suggests addressing what it calls Design Patterns and Implementation Alternatives.
Design Patterns, according to Teradata, are a set of proven architectural options for meeting an array of requirements. They are reusable approaches to solve commonly occurring problems, whether they are affecting a program at present or are those that should be anticipated. It is important to have alternatives laid out for different situations that are likely to be encountered, and plan them out with an appropriate level of nuance and understanding of the pros and cons of architectural decisions.

While leaving room for personal judgment, which is always necessary, Teradata’s Design Patterns and their physical side, Implementation Alternatives, provide a strong basis for decision-making. This basis can be very beneficial in aligning people with ultimate decisions. If left to an unsupported process, decisions would not only take longer, they would be less accepted.

Design Patterns and Implementation Alternatives enable program agility and appropriately shift some balance in what constitutes success away from simply decision-making to the execution of decisions. Teradata’s Design Patterns and Implementation Alternatives reduce the chances of failure by enabling a shop with alternatives thought out in advance, without the pressure of an impending sprint deadline. So why fail, even if it is “fast”?

Well thought-out Design Patterns and Implementation Alternatives enable speed and reduce the chances for failure.

Enablement addresses where organizations are weak and reasons they may fail.
  5. Architecture Principles and Advocated Positions

While Design Patterns and Implementation Alternatives are actionable, they are built upon what Teradata refers to as Architecture Principles and Advocated Positions. These beliefs about information and how things should be done will change less frequently and may be advocated from higher company positions than the Design Patterns and Implementation Alternatives. Advocated Positions help balance between short- and long-term tradeoffs. They are the bedrock on which everything in the program rests; it is essential to get these right, then ensure that the Design Patterns and Implementation Alternatives are a correct interpretation of the positions.

One of Teradata’s most important Advocated Positions is to prioritize data access over data loading. Although both areas can have performance issues, users (customers) of the analytic infrastructure will always prioritize the time they are interfacing with the data over the currency of the data. While layers of intake and distribution may be physically separated in a data warehouse, and thus able to be optimized for purpose, it is the overall architecture that should first be optimized for data access. Today, that analytic architecture extends well beyond the data warehouse, increasing the need for architecture.

You need a process to make decisions as much as you need the decisions themselves.

With Architecture Principles and Advocated Positions, Teradata has completely encapsulated the necessary decision-making side of analytic architecture.

Architectural decision-making during development occurs with high frequency, but peaks at the beginning of an effort when decisions are made about what will be done in the sprint, and how. The team should then know what previous architecture decisions require of their work and be empowered to deliver.
Architecture provides proven, reusable components to accelerate development time.

Architecture is about facilitating prioritized data access, not done for its own sake or to satisfy an abstract standard.

Teradata’s Advocated Positions Include:

  • Load everything into the core physical data model
  • Touch it, take it (extract all columns)
  • Reversibility of data errors out of the core physical data model
  • Reusability of common components
  • Traceability of core data to its originating source system
  • Collect metadata, both technical and business
  • Abstracted core physical data model from business usage
  • Include acquisition/staging layer in the architecture
  • No production reporting from non-production systems
  • Integrated logical and physical data models
  • Permanently archive everything
  • Enforce referential integrity
  • Prioritize data access over data loading
  • Full copy of source data objects in acquisition area
  • A single route for data to flow into the core physical model
  6. Balancing Acts: Delivery Versus Architecture

Even business leaders tend to take a tactical approach to the execution of the requirements. However, it does not necessarily take longer to satisfy information requirements in an architected fashion. If architecture principles and technology possibilities are not on the table beforehand, the means used to satisfy the last requirement may be used to satisfy a new requirement. This may or may not be appropriate. It also disconnects the solution from prior solutions that may lead the way to requirement satisfaction.

For example, shops with countless multidimensional structures, and with more being built on almost a daily basis, can readily attest to a need for architecture. By taking a disciplined architectural approach, we have found that we are in a better position to solve the next business problem now.

Teradata Unified Data Architecture™

When organizations put all their data to work, they make smarter decisions and create a new data-driven approach to improving their business. Through deeper insights about customers and operations, the data delivers competitive advantage for leading organizations that are able to compete on analytics by leveraging all their data.

Companies should exploit this market opportunity to compete on analytics by creating a strong analytic foundation based on a comprehensive data architecture that leverages existing, new, and emerging technologies. This architecture should contain three main capabilities:

  • Data Warehousing: Integrated and shared data environments to manage the business, and deliver strategic and operational analytics to the extended organization
  • Data Discovery: Discovery analytics to unlock insights from big data through rapid exploration using a variety of analytic techniques that are accessible by mainstream business analysts
  • Data Staging: Loading, storing, and refining data in preparation for analytics

Teradata has responded to this market need by developing Teradata® Unified Data Architecture™ that allows organizations to leverage the complementary values of the Teradata® Database, Teradata Aster SQL-MapReduce®, and open-source Hadoop® technologies. This Unified Data Architecture™ helps companies define and deploy an architecture that makes use of these best-of-breed technologies in a way that unleashes the value of their data. Companies can apply the right technology to the right analytical opportunities so business users can isolate intelligent signals and have an architecture for analytic decisions.
  7. Architecture Development and Information Management Possibilities

There is a need for architecture that falls outside of captive project timeframes and may seem somewhat removed from user requirements, at least to users. However, the architecture requirements outlined here play a vital role in delivering user requirements. They are a skillful interpretation of user requirements.

The best way to look at an analytics program is as a series of architecture sprints.

Taking on analytics as architecture means analytics will be done to internally adjudicated current standards and built to company priorities. Architecture requires its own codified efforts.

The continuous activity of information management is architecture.

With discipline, Teradata Design Patterns and Implementation Alternatives as well as Architecture Principles and Advocated Positions will be continually used over time, providing ongoing value by limiting risk and not reinventing the wheel. Without architecture, analytic development is destined for high levels of wasted effort, restarts, redundancy and, most damaging, missed opportunity.

Harnessing Workloads

Workloads comprise the functionality that must be achieved with data, as well as the management of the data itself. Harnessing workloads for allocation to an architecture component is both an art and a science. There are user communities with a list of requirements upon a set of data. There are other user communities with their own list of requirements on the same data. Is this one workload? If ultimately it is best to store the data in one location and use the same tool(s) to satisfy the requirements, the practical answer is “yes.”

When does the “set of data” end and become a different workload? It could, practically speaking, be when a new data store is appropriate. Harnessing workloads can be puzzling, but ultimately workloads need to be ring-fenced for architecture purposes.
What Determines the Success of a Workload?

Many technology types have emerged in recent years to support the idea that analytic data needs to perform, and performance is the primary means of judging the success of a workload. As previously mentioned, it is the performance of the data access that constitutes the performance of a workload.

Getting to fast performance quickly is the second measure of the success of an analytic workload. In the end, if the good performance goes away quickly because the application is not scaling, all would be for naught. The third measure of workload success is scale. Note that this does not mean the initial Systems Architecture must last forever untouched. It does mean that Systems Architecture is maintained without user impact. As far as users are concerned, it hums along. Architecture component selection is more important than ever because the selected components must scale with exponentially increasing data volumes and user requirements.

Information Management is nothing more than the continuous activity of architecture.
  8. Platform Selection Process

Many companies are not having success with their workloads due to a lack of focus on architecture. Specifically, if the analytic architecture possibilities are not known or considered for a workload, it is quite likely that the platform used for the last workload will be used again for the new workload. The more the platform possibilities are considered for the workload, the better the chance for success of that workload.

There are many platform categories (each designed for specific types of workloads) for storing data in the analytic architecture. These will be discussed in the next section. There is no “one size fits all” when it comes to platform selection. There is a best platform for each workload and the odds of workload success go up tremendously if the correct platform is selected.

The No-Reference Architecture

We are in the post-reference architecture era of information management. The 1990s were the decade of vendors going in and out of shops holding up laminated, uncustomized reference architectures and convincing clients to strive to attain that picture.[1] Once they did, it was assumed, all their problems would be solved. It was also more palatable to the technology manager to hold out a technical standard to hit, as opposed to suggesting that the manager must hit business goals with architecture.

An analytic architecture approach keeps business goals foremost in mind. This also means that all shops will manifest different architectures. That “reference” architecture will also continually change. Leadership must have an agile mindset to keep it updated. This is the essence of “no-reference” architecture. It is not definable in laminate. It is empowered with support components to meet all foreseeable business goals and it will change to meet those goals.
And it considers all possibilities, knowing that it controls one of the most important assets that the company has (information) and one of the most important means of modern competition (analytics).

The Analytic Ecosystem

Analytics do not exist solely in the post-operational world. As a matter of fact, the whole notion of a hard boundary between the operational (characterized by the ERP) and the post-operational (characterized by the data warehouse) is going away. Analytics certainly can be operational. So can Business Intelligence (BI). So much of what we’ve learned with post-operational BI is now being applied to the operational environment in the form of operational BI like operational dashboards, stream processing, and master data management.

However, we must distinguish between creating and using analytics. Analytics are used everywhere and should be generated from data created everywhere. We must get beyond making that default data store selection discussed earlier. We must have knowledge of, and consider, a list of usual suspects for analytic workloads. It includes:

     1. The relational data warehouse, augmented with columnar capabilities
     2. An analytic database management system
     3. A data warehouse appliance

Leading examples of these data stores will be examined in the next section. For now, let us emphasize the interplay of the analytic components. There are no set rules for how data will flow in the analytic architecture.

Architecture is important, practical, and holistic, and drives analytic and organizational success.

Proceeding with analytics without an architecture approach is like trying to solve a Rubik’s Cube blindfolded. Sure, some extraordinary people, with extensive practice, can do it, but why make it so hard?

[1] Some vendors still do this.
  9. It is important to work with a company that understands the methodology and components of architecture, and has the experience to help create an analytic organization.

While directionally the data warehouse will feed data marts, there will be marts that do the reverse and stand alone. There are applications that need unadulterated source data, not data that has gone through the data warehouse first. Even if the data warehouse certifiably does not alter the data, applications in audit, security, and the like will prefer the nondependent (on the data warehouse) data mart.

This is not to say that nondependent data marts do not happen otherwise. They do. If the architecture is not sound and a focus of the program, the value-add of data passing through the data warehouse will not be clear. Architecture, and therefore ultimately business, may take a hit in these environments.

Analytic database management systems such as Teradata Aster’s (discussed in the next section) may also play a strong role in the post-operational analytic environment. Though these systems do not replace the data warehouse, they store the increasingly important unstructured and semi-structured data of an organization. This is data that largely has been ignored or force-fit into relational structure over the years, to mixed results. Obviously all of this big data will not be replicated into the data warehouse, so interplay between the warehouse and the analytic database management system is a must. This gets back to the support components mentioned earlier.

Data warehouse appliances, however, could in some circumstances play the role of the data warehouse, at a minimum for intake and distribution in the analytic environment and for storing historical data.

The other role necessary in the analytic environment is access. The role of access is perhaps the most complex.
Data is distributed from the data warehouse and other platforms to the best platform for the data access in an architected environment.
  10. The Building Blocks of Analytic Competition

Understanding the meaning and importance of architecture is not enough. It is imperative to implement the analytic environment with an architecture focus. This doesn’t happen by accident. Likewise, moving forward in an analytic program with agility means bringing support components to the table.

And just as we need to leverage the support components, we need to leverage our partner for the analytic architecture. The partner should bring extensive architectural understanding and experience, and the right components to bear to create the proper analytic environment. These components include not only technology, but also a portfolio of “jump starts” for the use of the technology. In the case of Teradata, all needed components are already in place, integrated, and delivering world-class analytic organizations all over the world with the BIAS approach.

Teradata Analytic Architecture Technology

Teradata’s offerings undoubtedly stand out for data warehouse and data mart appliance platforms. Its Active Enterprise Data Warehouse line, based on the Teradata® Database, supports more than 50 percent of large-scale data warehouses today. All database functions in Teradata systems are always done in parallel, using multiple server nodes and disks with all units of parallelism participating in each database function.

Teradata Optimizer is grounded in the knowledge that every query will be executing on a massively parallel processing system. Teradata manages contending requirements for resources through dynamic resource prioritization that is customizable by the customer. The server-nodes interconnect was designed specifically for a parallel processing multi-node environment. This interconnect is a linearly scalable, high-performance, fault-tolerant, self-configuring, multi-stage network.
In Teradata 14, Teradata added columnar structure to a table, effectively mixing row, column, and multi-column structures directly in the DBMS. With intelligent exploitation of Teradata Columnar, there is no longer the need to go outside the data warehouse DBMS for the power of performance that columnar provides, and it is no longer necessary to sacrifice robustness and support in the DBMS that holds the post-operational data to get the advantages of columnar.

Teradata has extended its leadership from its enterprise data warehouses (EDWs) into its appliance family for midmarket enterprise EDWs, as well as data marts for large companies.

The Teradata Data Warehouse Appliance supports the EDW approach to building the data warehouse and is the Teradata appliance family flagship product. It is suitable for an upper-midmarket true EDW or as the platform for a focused application. The Teradata Data Mart Appliance is a more limited-capacity equivalent of the Teradata Data Warehouse Appliance and is ideal for the departmental or midmarket platform. The Teradata Extreme Data Appliance is also part of the Teradata appliance family and represents affordability for the management of large quantities of data.

Teradata Aster’s analytic database management system has patent-pending In-Database MapReduce (MR), a hybrid row/column store with an MR approach. Its MPP architecture makes it work for predictable as well as ad-hoc analytic use cases. It blends the performance of a relational database (i.e., indexes, optimizers, and more) with the programming flexibility of MapReduce (Java, Perl, Python, .Net, etc.).

Teradata Analytic Architecture Solution Model

A semantic data model is a set of symbols and text describing the information needed to answer a defined set of business questions. It is a representation of the access layer whose purpose is to improve the simplicity, security, and speed of the data warehouse.
Its characteristics are:

  • Usually dimensional
  • Often implemented through views
  • Easy and quick access to data
  • Variety of ways to look at the same data
  • Primary point of entry for BI tools
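As a rough illustration of a semantic data model that is dimensional and implemented through views, the following minimal sketch builds a small integrated data layer and exposes it through one view. It uses Python's built-in sqlite3 module purely as a stand-in for a data warehouse DBMS; the table names, view name, column names, and sample rows are all hypothetical and are not taken from the paper or from any Teradata product.

```python
import sqlite3

# SQLite in memory stands in for the warehouse DBMS (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical integrated data layer: normalized tables.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    region      TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date   TEXT NOT NULL,
    gross_amount REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'East'), (2, 'West'), (3, 'West');
INSERT INTO sales_order VALUES
    (10, 1, '2013-01-15', 100.0),
    (11, 2, '2013-01-20', 250.0),
    (12, 3, '2013-02-03', 50.0),
    (13, 2, '2013-02-10', 75.0);

-- Hypothetical semantic layer: a dimensional view over the integrated tables.
-- Metrics (what to see) are grouped by region and month (what to see it by).
CREATE VIEW sales_by_region_month AS
SELECT c.region                      AS region,
       substr(o.order_date, 1, 7)    AS order_month,
       SUM(o.gross_amount)           AS gross_sales_amount,
       COUNT(DISTINCT o.customer_id) AS number_of_customers
FROM sales_order AS o
JOIN customer AS c ON c.customer_id = o.customer_id
GROUP BY c.region, substr(o.order_date, 1, 7);
""")


def sales_for_region(region):
    """Query the view only; the region argument is the constraint on the results."""
    return conn.execute(
        "SELECT order_month, gross_sales_amount, number_of_customers "
        "FROM sales_by_region_month WHERE region = ? ORDER BY order_month",
        (region,),
    ).fetchall()


print(sales_for_region("West"))
```

In this sketch a BI tool would query only the view, so the underlying tables can be remodeled without changing what the tool sees, and further views over the same tables could present the same orders in other ways.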
  11. The semantic data model is usually dimensional but can also represent other types such as Analytical Data Sets. There are two data modeling mindsets: relational and dimensional. A relational model captures the business rules. A dimensional data model captures the navigation paths and focuses on evaluating the meaning of the business being monitored through metrics such as Gross Sales Amount and Number of Customers. Most semantic data models are dimensional because such models support business questions that follow the pattern of:

  • What do I want to see?
  • What do I want to see it by?
  • What constraints are there on the results?

The semantic data model is often implemented through views. A semantic data model can be shown at conceptual, logical, and physical levels of detail. At a physical level, it is often implemented as views over the integrated data layer.

The semantic data model also provides quick and easy access to data: users and BI tools need to be able to answer business questions quickly and easily. In addition, the semantic data model must be designed to support a variety of ways to look at the same data. Although an order may be depicted just one way in the integrated data layer, it can be shown in multiple ways across multiple semantic data models depending on business needs. The semantic data model is also the primary point of entry for BI tools.

A Consistent Approach Ensures Delivery

Architecture is not easy to come by without focused effort. It can easily be shortchanged if it is not understood that it is the direct cause of analytic success. Architecture is a way of life for delivering analytics, and a consistent approach ensures that delivery.
Teradata’s consistent approach features:

  • A multi-component approach (BIAS) to architecture
  • A sound, repeatable, and successful methodology
  • Use of Architecture Principles and Advocated Positions
  • Use of Design Patterns and Implementation Alternatives
  • World-class technology building blocks
  • Use of architecture solution model building blocks

Teradata provides the building blocks for the analytic architecture solution model.

The Best Decision Possible and Unified Data Architecture are trademarks, and Teradata, the Teradata logo and SQL-MapReduce are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide. Apache and Hadoop are registered trademarks of the Apache Software Foundation.

William McKnight

William is a consultant specializing in information management. His company, McKnight Consulting Group, has served clients such as Fidelity Investments, Teva Pharmaceuticals, Scotiabank, Samba Bank, Pfizer, France Telecom, and Verizon; in total, 16 of the Global 2000. William is also a very popular speaker worldwide and a prolific writer who has published hundreds of articles and white papers. An Ernst & Young Entrepreneur of the Year Finalist, William is a former Fortune 50 technology executive and software engineer. He provides clients with action plans, architectures, strategies, complete programs, and vendor-neutral tool selection to manage information. He can be reached at 214-514-1444 or through his website at