Analytics are forming the basis of competition today. This white paper addresses what distinguishes analytics and answers the question, "Are we doing analytics in the data warehouse?" It then surveys the contending platforms for the analytics workload and introduces the ParAccel Analytic Database as a key component of information architecture.

Analytics and Information Architecture

Custom Research Report
by William McKnight
www.mcknightcg.com

© McKnight Consulting Group, 2013
Table of Contents
• Are We Doing Analytics in the Data Warehouse?
• What Distinguishes Analytics?
• Contending Platforms for the Analytics Workload
• ParAccel Analytic Database
• Information Architecture

Are We Doing Analytics in the Data Warehouse?

Companies have already begun to enter the "long tail" with the enterprise data warehouse. Its functions are becoming a steady but less interesting part of the information workload. While the data warehouse is still a dominant fixture, analytic workloads are finding their way to platforms in the marketplace more appropriate to them.

When information needs were primarily operational reporting, the data warehouse was the center of the known universe by a long shot. While we entertained the notion that we were doing analytics in the data warehouse, competitive pressures have trained the spotlight on what true analytics are all about. Many warehouses have proven not to be up to the task. And it is analytics, not reporting, that forms the basis of competition today.

Rearview-mirror reporting can support operational needs and pay for a data warehouse by virtue of being "essential" to running the applications that feed it. However, the large payback from information undoubtedly comes in the form of analytics.

If the analytics do not weigh down the data warehouse, big data volumes will. As companies advance their capabilities to utilize every piece of information, they are striving to get all information under management. This includes the "big data" of sensors, web clicks, social media, complete logs and the like. Many companies have limited their big data to subsets put into data warehouses and other relational structures. The NoSQL world, with much more limited functionality, provides cost advantages to those companies that can make the mind shift necessary for adoption.
The schema-less NoSQL motto when it comes to data seems to be "Give me your tired, your poor, your huddled masses." Dispensing with the formalities of deep requirements, metadata and the like, these solutions collect massive amounts of high-velocity data. Some provide operational support for their customers' internet experience. But those with analytical aspirations for the data - often the developers or those very close to them - hope to turn it into the information that drives analytics. While these platforms have proven adept at loading and storing information, doing modern analytics there is challenging. Furthermore, analytics require cross-enterprise information, and moving information out of the NoSQL stores would be as problematic as loading it there was. Data tends to stay where it lands in the information architecture.

The enterprise data warehouse must still exist and must still be advanced (or reengineered, as the case may be) with tremendous care. The data relationships must match the business relationships, and the data must have sufficient quality. It must scale to the level it needs to. It is not necessarily easier to do than five years ago; other than some minor innovations, the goals and the platforms remain the same.

So, if the data warehouse is not the end of the story for analytics, and NoSQL solutions have limited information and capabilities, where should a company actually "do" its analytics? This is perhaps the most legitimate question in information management today. This paper will provide input to that decision. But first, what are analytics?

What Distinguishes Analytics?

Many approach analytics as a set of categories of value propositions to the company. From a data-use perspective, however, the definition of analytics is in how they are formed. They are formed from more complex uses of information than reporting - from summaries of information. Addressing the propensity of a customer to make a purchase, for example, requires an in-depth look at her spending profile - perhaps by time slice, geography and other dimensions. It requires a look at those with similar demographics and how they responded. It requires a look at ad effectiveness. And it may require a recursive look at all of these and more.

Analytics should also be tied to business action. A business should have actions to take as a result of analytics - for example, customer-touch or customer-reach programs.

There are numerous categories that fit this perspective of analytics. Customer profiling, even for B2B customers, is an essential starting point. Companies need to understand their "whales" and how much they are worth comparatively. Companies need a sense of the states a customer goes through with them and the impact on revenue when a customer moves between states. Customer profiling sets up companies for greatly improved targeted marketing and deeper customer analytics. This form of analytics starts by segmenting the customer base according to personal preferences, usage
behavior, customer state, characteristics, and economic value to the enterprise. Economic value typically includes last-quarter, last-year-to-date, lifetime-to-date and projected lifetime values. Profit is the best measure to utilize in the calculations in the long run; however, spend (shown in the bullets below) will work too. More simplistic calculations that simply count "uses" of the company's product will provide far less reliable results. The key attributes should have a financial linkage that maps directly to the company's return on investment (ROI).

Where possible, analyze usage history by customer for the following econometric attributes at a minimum:
• Lifetime spend and percentile rank to date. This is a high-priority item.
• Last year-to-date spend and percentile rank.
• Last year spend and percentile rank. This is a high-priority item.
• Last quarter spend and percentile rank.
• Annual spend pattern by market season and percentile rank.
• Frequency of purchase patterns across product categories.
• Using commercial demographics (RL Polk, MediaMark or equivalent), match the customers to characteristic demographics at the census block and block group levels.
• If applicable, social rank within the customer community.
• If applicable, social group(s) within the customer community.

These calculations provide the basis for customer lifetime value and assorted customer rankings. The next step is to determine all of these attributes for projected future spend, assigning customers a lifetime spend based on (a) an n-year performance linear regression, or (b) the n-year performance of their assigned quartile if fewer than n years of history are available. Determine last-year spend quartile levels, determine the unique characteristics of each quartile (age, geography, initial usage), match new customers to their quartile, and assign that quartile's average projected spend to the new customers.

Defining the relevant and various levels of retention and value is an extension of customer profiling. These are customer profiling variables like the ones above, except that they address the need for more immediate preventative action as opposed to predicting the volume of future profit. Also, regardless of churn potential, determining the point at which customers tend to cross a customer state in a negative direction is essential to analytics.
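The percentile-rank and quartile-projection steps above can be sketched in a few lines of Python. This is an illustrative sketch only; the spend figures, quartile cut points and average projections are assumptions, not data from the paper.

```python
from bisect import bisect_left

def percentile_rank(all_spends, spend):
    """Percentile rank of one customer's spend within the customer base."""
    ranked = sorted(all_spends)
    return 100.0 * bisect_left(ranked, spend) / len(ranked)

def assign_quartile(spend, cuts):
    """Map a last-year spend to quartile 1-4 given three quartile cut points."""
    q1, q2, q3 = cuts
    if spend <= q1:
        return 1
    if spend <= q2:
        return 2
    if spend <= q3:
        return 3
    return 4

def projected_spend(quartile, quartile_avg_projection):
    """A new customer inherits the average projected spend of the quartile
    whose characteristics (age, geography, initial usage) they match."""
    return quartile_avg_projection[quartile]

# Illustrative customer base and quartile averages (assumed figures).
spends = [45, 90, 120, 340, 410, 560, 770, 980]
avg_by_quartile = {1: 80, 2: 250, 3: 500, 4: 900}
print(percentile_rank(spends, 560))  # 62.5
print(projected_spend(assign_quartile(560, (115, 375, 665)), avg_by_quartile))
```

The same quartile machinery serves both established customers (ranked on their own history) and new customers (assigned the projection of the quartile they resemble).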
Customer profiling and customer state modeling should combine to determine the who and when of customer interaction. Actions could be a personal note, free minutes, an ad-free experience or free community points. Also, in markets where customers are likely to utilize multiple providers for the services a company provides, the company should know the aspirant level of each customer by determining the 90th percentile of usage among the customers who share that customer's key characteristics (age band, geography, demographics, initial usage). This "gap" is an additional analytic attribute and should be utilized in customer actions.

This is simply a start on analytics, and I have focused only on the customer dimension, but hopefully it is evident that many factors make true analytics:
• Analytics are formed from summaries of information
• Inclusion of complete, and often large, customer bases
• Continual re-calculation of the metrics
• Continual re-evaluation of the calculation methods
• Continual re-evaluation of the resulting business actions, including automated actions
• Adding big data to the mix extends the list of attributes and the usability of analytics by a good margin

Big data - and the combination of big data and relational data - greatly increases the effectiveness of analytics. Using analytics is an effective business strategy that must be supported with high-quality, cross-platform data. I will now talk about the platforms in use and their potential for analytics.

Contending Platforms for the Analytics Workload

There are numerous data vessels that lay claim to a slice of data and/or processing today. There is no "one size fits all" as organizations pursue information strategies that give the data the best chance for success, with performance for the anticipated workload being an overriding factor in platform selection.
These contending platforms include the enterprise data warehouse, multidimensional databases, the NoSQL family, columnar databases, stream processing and master data management. Let us look at each and its appropriate workloads.

The Enterprise Data Warehouse

Enterprise data warehouses (EDWs) are based on relational theory, which supports the table as the basic structure. As the ubiquitous collection point for all operational data of interest in the post-operational world, the EDW has served reports, dashboards, performance indicators, basic analytics, ad-hoc access and more. Extended with solid-state components as well as automated archival abilities, the data warehouse will remain a very important component of an information architecture. It is also where historical information
will be saved.

Multidimensional Databases

Multidimensional databases (MDBs), or cubes, are denormalized and compacted selections of data. Often containing summarized data, MDBs support "slice and dice" of the selected data within the cube with great speed. The building of the MDBs, from both size and speed standpoints, becomes the bottleneck to their widespread use. They are also clearly best for financial applications, which remain a priority use of this approach.

The NoSQL Family

NoSQL includes Hadoop, Cassandra, MongoDB, Riak and many others - over 100 in all - that do not strictly conform to use of the SQL language against their data. This is largely because the data is not in a relational database. The solutions, largely open source, can be further broken into OLTP-mimicking key-value and column stores, relationship-based graph stores, and analytic Hadoop stores. These are scale-out, schema-less solutions on commodity hardware that do not provide full ACID compliance. As previously mentioned, it does not follow that analytics on the data collected in NoSQL will be done in the NoSQL environment. These NoSQL stores are excellent at screening, sorting and loading data (that ETL and ACID would crush). Any enterprise analytics solution needs to allow for the cost and performance advantages of NoSQL for loading big data.

Columnar Databases

Columnar databases physically isolate the values of each column in a relational table. This relieves the I/O bottleneck by bringing only the useful columns into query processing. It also greatly facilitates a compression strategy, due to repeating values and the ability to apply compression to the much more finite set of values found in a single column, as opposed to having to consider entire rows. There are also many databases with a hybrid row-and-column implementation.
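The compression advantage of column-wise storage can be seen with a toy run-length encoder applied to a single column. The column values are illustrative; real columnar engines combine several schemes (run-length, delta, dictionary) chosen per column.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse runs of repeated adjacent values into (value, count) pairs.
    Column stores benefit because one column's values repeat far more often
    than whole rows do."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# A low-cardinality column (e.g. a state code) stored column-wise:
state_column = ["TX"] * 4 + ["CA"] * 3 + ["TX"] * 2
encoded = rle_encode(state_column)
print(encoded)  # [('TX', 4), ('CA', 3), ('TX', 2)]
assert rle_decode(encoded) == state_column
```

Nine stored values collapse to three pairs here; on a real low-cardinality column with millions of rows, the ratio is far more dramatic.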
A columnar orientation has proven to be a requirement for the analytic workload, which tends to require a small subset of all data in the tables implicated in a query.

Stream Processing

Circumventing the need to store and then process information, stream processing observes data feeds and executes real-time business processes prior to optional data storage. Stream processing is a great way to execute immediate business processes in connection with a business condition evidenced by the most recent data across the enterprise. It is also an approach that can benefit tremendously from analytics brought into its decisions.
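The act-before-storing pattern described above can be sketched as follows. The event shape, spend threshold and alert action are assumptions for illustration, not part of any specific stream-processing product.

```python
def process_stream(events, spend_threshold, on_alert):
    """Act on each event as it arrives, before any optional storage step."""
    stored = []
    for event in events:
        # Execute an immediate business action when a condition is evidenced
        # by the most recent data, rather than after a store-then-query cycle.
        if event["spend"] > spend_threshold:
            on_alert(event["customer"])
        stored.append(event)  # optional storage happens after the action
    return stored

alerts = []
feed = [{"customer": "c1", "spend": 250},
        {"customer": "c2", "spend": 1800},
        {"customer": "c3", "spend": 40}]
process_stream(feed, spend_threshold=1000, on_alert=alerts.append)
print(alerts)  # ['c2']
```

The threshold in a real deployment would itself be an analytic value (e.g. a customer's aspirant-level gap), which is the sense in which stream processing benefits from analytics brought into its decisions.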
Master Data Management

Master data management (MDM) solutions pull together master sets of information for widespread use, a function once performed in the data warehouse, and match that with a real-time distribution capability for the data. The master data might be sourced from other systems in real time, or it might be supported with workflow components. Data quality is an essential element for the data put into master data. Due to its leveragability, master data management stands to benefit tremendously from analytics, as the attributes stored about its entities can extend beyond the basic ones and into analytics. These analytic values can support stream processing as well as reporting out of NoSQL stores.

The platform to support modern analytics must cost-effectively work for data sets from multiple terabytes to hundreds of terabytes. And it must be able to utilize big data in NoSQL sources like Hadoop; analytics are severely disadvantaged if restricted to one set of data or the other. Some redundancy is still a part of an effective information strategy, and federated queries can handle edge and unanticipated workloads that require cross-platform data. Some solutions support high data scale as well as the built-in ability to incorporate data from NoSQL stores like Hadoop into the analytic processing. This avoids redundancy and movement and provides access to a full data set. And they do it while keeping the relational model intact, with extended performance and scale-out architectures. These systems strongly contend for the analytic workload.

ParAccel Analytic Database

ParAccel Analytic Database (ParAccel) is one such system. We would need to call the platform police on ParAccel, as it has elements of many of the above platform categories in one product. ParAccel is a columnar database with extensive compression routines such as delta, run-length, LZ and null trim. The customer can choose which routines to utilize or allow ParAccel to choose automatically. Being columnar with extensive compression, which packs the data down on disk, strongly minimizes the I/O bottleneck found in many of the contenders for the analytic workload. The ParAccel architecture is shared-nothing and massively parallel, the scalable architecture for the vast majority of the world's largest databases.

ParAccel also supports rich transformation - the "T" in ETL. We often need to massage the data coming into the analytics system, whereas NoSQL systems focus only on the extract, load and basic screening capabilities of data integration. ParAccel has workload management that allows shorter queries to execute quickly, and it has concurrency control. These are some of the many aspects, both of being relational and of its unique properties, that give ParAccel advantages over NoSQL stores for analytics.
Another advantage of ParAccel over NoSQL is that ParAccel allows full SQL. It also allows third-party library functions and user-defined functions. Together, these abilities allow a ParAccel user to do analytics "in database," utilizing and growing the leveragable power of the database engine and keeping the analysis close to the data. These functions include Monte Carlo simulation, univariate analysis, (multiple) regression, time series and many more - most of the functionality of dedicated data mining software.

Perhaps the feature that makes ParAccel work best for analytics is its unique accommodation of Hadoop. Without the need to replicate Hadoop's enormous data, ParAccel treats Hadoop's data like its own. With a special connector, ParAccel is able to see and utilize Hadoop data directly. The queries it executes in Hadoop utilize fully parallelized MapReduce. This supports the information architecture, suggested below, of utilizing Hadoop for big data, ParAccel for analytics and the data warehouse for operational support, and it leverages Hadoop fully without performance overhead. Connectors to Teradata and ODBC also make it possible to see and utilize other data of interest where the analytics will be performed. ParAccel offers "parallel pipelining," which fully utilizes the spool space without pausing when a step in the processing completes. ParAccel is a compiled architecture on scale-out commodity hardware. With in-memory and cloud options, a growing blue-chip customer base and, most importantly, a rich feature base for analytics and integration with Hadoop, ParAccel is built to contend for the analytic workload.

Information Architecture

Information architecture has been getting more complicated as companies adopt a unique system for each workload. With ParAccel, it is beginning to simplify, at least when it comes to where analytics are calculated. While analytics will permeate the modern competitive enterprise, enterprises need a robust platform for calculating them. Enterprise data warehouses will support operations and light analytics, as well as remember history; the EDW remains of vital importance to every enterprise. Big data systems like Hadoop must enter many environments to cost-effectively pick up the abundant sensor, social, web-click and otherwise complete data of an enterprise. However, severely lacking tooling, transformation, schema, interactivity, ACID compliance, concurrency, workload management and other relational benefits, Hadoop is limited in its ability to be the analytics platform. In the information architecture, ParAccel will call MapReduce jobs to fetch Hadoop data and return it to ParAccel, where analytics can be performed.

Multidimensional databases will continue to serve the financial departments, and stream processing will begin to bring instant decision making to operational streams of data, utilizing analytics in the process. Although no one is mistaking master data management platforms for analytics platforms, MDM is another important vessel in utilizing - in this case, disseminating - analytics. Enterprises are growing in their ability to utilize analytics in many ways, and systems are supporting this strategy.
The architecture for analytics will be columnar. It will accommodate Hadoop's data loading abilities, and it will provide robust analytic functionality along with the ability to customize and extend that functionality. It will not force a data warehouse to hold hundreds of terabytes, nor force Hadoop to hold less than that.
About the Author

William McKnight functions as strategist, lead enterprise information architect and program manager for complex, high-volume, full life-cycle implementations worldwide, utilizing the disciplines of data warehousing, big data, master data management, business intelligence, data quality and operational business intelligence. Many of his clients have gone public with their success stories. William is a Southwest Entrepreneur of the Year finalist and a frequent best-practices judge, has authored hundreds of articles and white papers, and has given hundreds of international keynotes and public seminars. His teams' implementations, from both IT and consultant positions, have won best-practices awards. William is a former IT VP of a Fortune 50 company, a former DB2 engineer at IBM, and holds an MBA.

William can be reached at 214-514-1444 or wmcknight@mcknightcg.com.

5960 W. Parker Rd., Suite 278-133
Plano, TX 75093
Tel (214) 514-1444
