ANALYTICS AND INFORMATION ARCHITECTURE
Custom Research Report
By William McKnight
www.mcknightcg.com
© McKnight Consulting Group, 2013
Are We Doing Analytics in the Data Warehouse?
Companies have already begun to enter the "long tail" with the enterprise
data warehouse. Its functions are becoming a steady but less interesting
part of the information workload. While the data warehouse is still a
dominant fixture, analytic workloads are finding their way to marketplace
platforms better suited to them.
When information needs were primarily operational reporting, the data
warehouse was, by a long shot, the center of the known universe. While we
entertained the notion that we were doing analytics in the data warehouse,
competitive pressures have trained the spotlight on what true analytics are
all about. Many warehouses have proven not to be up to the task.

And it's analytics, not reporting, that forms the basis of competition
today. Rearview-mirror reporting can support operational needs and pay for
a data warehouse by virtue of being "essential" to running the applications
that feed it. However, the large payback from information undoubtedly comes
in the form of analytics.
If the analytics do not weigh down the data warehouse, big data volumes
will. As companies advance their capabilities to utilize every piece of
information, they are striving to get all information under management.
This includes the "big data" of sensors, web clicks, social media, complete
logs and the like. Many have limited their big data to subsets placed into
data warehouses and other relational structures. The NoSQL world, with much
more limited functionality, provides cost advantages to those companies
that can make the mind shift necessary for adoption.
Table of Contents

Are We Doing Analytics in the Data Warehouse?
What Distinguishes Analytics?
Contending Platforms for the Analytics Workload
ParAccel Analytic Database
Information Architecture
The schema-less NoSQL motto seems to be "Give me your tired, your poor, your huddled masses" when it
comes to data. Dispensing with the formalities of deep requirements, metadata and the like, these solutions
collect massive amounts of high-velocity data. Some provide operational support for their customers'
internet experience. But for those with analytical aspirations for the data, the users, often the developers
or people very close to them, hope to turn this data into the information that drives the analytics. While
these platforms have proven adept at loading and storing the information, doing modern analytics there is
challenging. Furthermore, analytics require cross-enterprise information, and moving information out of the
NoSQL stores would be as problematic as trying to load it there. Data tends to stay where it lands in the
information architecture.
The enterprise data warehouse must still exist and must still be advanced (or
reengineered, as the case may be) with tremendous care. The data
relationships must match the business relationships, and the data must have
sufficient quality. It must scale to the level it needs to. It is not
necessarily easier to do this than it was five years ago; however, other than
some minor innovations, the goals and the platforms remain the same.
So, if the data warehouse is not the end of the story for analytics, and NoSQL solutions have limited
information and capabilities, where should a company actually "do" its analytics? This is perhaps the
most legitimate question in information management today. This paper will provide input to the decision.
But first, what are analytics?
What Distinguishes Analytics?
Many approach analytics as a set of categories of value propositions to the company. However, from a
data-use perspective, the definition of analytics lies in how they are formed. They are formed from more
complex uses of information than reporting. Analytics are formed from summaries of information.
Addressing the propensity of a customer to make a purchase, for example, requires an in-depth look at her
spending profile - perhaps by time slice, geography and other dimensions. It requires a look at those with
similar demographics and how they responded. It requires a look at ad effectiveness. And it may require a
recursive look at all of these and more. Analytics should also be tied to business action. A business should
have actions to take as a result of analytics - for example, customer-touch or customer-reach programs.
There are numerous categories that fit this perspective of analytics. Customer profiling, even for B2B
customers, is an essential starting point for analytics.
Companies need to understand their “whales” and how much they are worth comparatively. Companies
need a sense of the states a customer goes through with them and the impact on revenue when a customer
moves states. Customer profiling sets up companies for greatly improved targeted marketing and deeper
customer analytics.
This form of analytics starts by segmenting the customer base according to personal preferences, usage
behavior, customer state, characteristics, and economic value to the enterprise. Economic value typically
includes last quarter, last year-to-date, lifetime-to-date and projected lifetime values.
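As a minimal sketch of these economic-value windows, the following pure-Python function computes lifetime, year-to-date, last-year and last-completed-quarter spend from a transaction history. The field layout and sample figures are invented for illustration.

```python
from datetime import date

def spend_windows(transactions, today):
    """transactions: list of (date, amount); returns the spend windows above."""
    year = today.year
    lifetime = sum(a for d, a in transactions)
    ytd = sum(a for d, a in transactions if d.year == year)
    last_year = sum(a for d, a in transactions if d.year == year - 1)
    cq = (today.month - 1) // 3                       # 0-based current quarter
    lq_year, lq = (year, cq - 1) if cq > 0 else (year - 1, 3)
    last_quarter = sum(a for d, a in transactions     # most recent completed quarter
                       if d.year == lq_year and (d.month - 1) // 3 == lq)
    return {"lifetime": lifetime, "ytd": ytd,
            "last_year": last_year, "last_quarter": last_quarter}

txns = [(date(2012, 6, 1), 100.0), (date(2013, 2, 10), 50.0),
        (date(2013, 5, 3), 75.0)]
windows = spend_windows(txns, date(2013, 5, 15))
# lifetime 225.0, ytd 125.0, last_year 100.0, last_quarter (Q1 2013) 50.0
```

Projected lifetime value would be layered on top of these windows, as described below.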
Profit is the best measure to use in these calculations in the long run. However, spend (shown in the
bullets below) will work too. More simplistic calculations that merely count "uses" of the company's
product will provide far less reliable results.
The key attributes to use should have a financial linkage that maps directly to the company's return on
investment (ROI). Where possible, analyze usage history by customer for at least the following
econometric attributes:
- Lifetime spend and percentile rank to date. This is a high-priority item.
- Last year-to-date spend and percentile rank.
- Last year spend and percentile rank. This is a high-priority item.
- Last quarter spend and percentile rank.
- Annual spend pattern by market season and percentile rank.
- Frequency of purchase patterns across product categories.
- Using commercial demographics (RL Polk, MediaMark or equivalent), match the customers to characteristic demographics at the census block and block group levels.
- If applicable, social rank within the customer community.
- If applicable, social group(s) within the customer community.
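The percentile ranks that recur in the list above can be sketched with a simple "percent of customers at or below this value" calculation; the customer IDs and spend figures here are made up.

```python
def percentile_rank(values, v):
    """Percent of customers whose value is at or below v (0-100)."""
    return 100.0 * sum(1 for x in values if x <= v) / len(values)

lifetime_spend = {"a": 1200.0, "b": 450.0, "c": 900.0, "d": 3100.0}
all_spend = list(lifetime_spend.values())
ranks = {cust: percentile_rank(all_spend, s)
         for cust, s in lifetime_spend.items()}
# the "whale" d ranks 100.0; b, the lightest spender, ranks 25.0
```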
These calculations provide the basis for customer lifetime value and assorted customer rankings. The next
step is to determine all of these attributes for projected future spend, assigning customers a projected
lifetime spend based on (a) a linear regression over n years of performance, or (b) the n-year performance
of their assigned quartile if less than n years of history is available.
Choose key characteristics of each customer quartile (determined by last year spend quartile levels),
determine the unique characteristics of each quartile (age, geography, initial usage), match new customers
to their quartile and assign the average projected spend of that quartile to new customers.
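The quartile approach above can be sketched as follows: bucket existing customers into last-year-spend quartiles, average each quartile's spend, then assign a new customer the average of the quartile whose profile their early behavior best matches. For brevity this sketch matches on initial usage only, whereas the text also names age and geography; all data is invented.

```python
import statistics
from collections import defaultdict

# (customer_id, initial_usage, last_year_spend) - illustrative values
history = [("a", 120, 1200.0), ("b", 40, 450.0),
           ("c", 90, 900.0), ("d", 300, 3100.0)]

spends = sorted(s for _, _, s in history)
cuts = statistics.quantiles(spends, n=4)      # three quartile boundaries

def quartile(spend):
    return sum(spend > c for c in cuts)       # 0 (bottom) .. 3 (top)

usage_q, spend_q = defaultdict(list), defaultdict(list)
for _, u, s in history:
    q = quartile(s)
    usage_q[q].append(u)
    spend_q[q].append(s)

def project_new_customer(initial_usage):
    # match to the quartile whose mean initial usage is closest
    q = min(usage_q, key=lambda q: abs(statistics.mean(usage_q[q]) - initial_usage))
    return statistics.mean(spend_q[q])
```

A new customer with early usage of 100 would land in the second quartile here and inherit its average spend.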
Defining the relevant and various levels of retention and value is an extension of customer profiling. These
are customer profiling variables like the ones above, except that they address the need for more immediate
preventative action as opposed to predicting the volume of future profit.

Also, regardless of churn potential, determining the point at which customers tend to cross a
customer state in a negative direction is essential to analytics.
Customer profiling and customer state modeling should combine to determine the who and when of
customer interaction. Actions could include a personal note, free minutes, free ad-free service or free
community points.
Also, in markets where customers are likely to utilize multiple providers for the services a company
provides, the company should know the aspirant level of each customer by determining the 90th percentile
of usage among the customers who share that customer's key characteristics (age band, geography,
demographics, initial usage). This "gap" is an additional analytic attribute and should be utilized in
customer actions.
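The "gap" analytic above reduces to: take the 90th percentile of usage among peers who share the customer's key characteristics, then subtract the customer's own usage. A minimal sketch, matching peers on age band only, with hypothetical field names:

```python
import math

def aspirant_gap(customer, peers):
    """customer/peers: dicts with 'age_band' and 'usage' keys (illustrative)."""
    cohort = sorted(p["usage"] for p in peers
                    if p["age_band"] == customer["age_band"])
    p90 = cohort[math.ceil(0.9 * len(cohort)) - 1]  # nearest-rank 90th percentile
    return max(0.0, p90 - customer["usage"])        # 0 if already above aspirant level

peers = [{"age_band": "25-34", "usage": u}
         for u in [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]
me = {"age_band": "25-34", "usage": 40}
gap = aspirant_gap(me, peers)   # 90th-percentile usage is 90, so the gap is 50
```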
This is simply a start on analytics, and I've focused only on the customer dimension, but hopefully it is
evident that many factors make true analytics:

- Analytics are formed from summaries of information
- Inclusion of complete, and often large, customer bases
- Continual re-calculation of the metrics
- Continual re-evaluation of the calculation methods
- Continual re-evaluation of the resulting business actions, including automated actions
- Adding big data to the mix extends the list of attributes and the usability of analytics by a good margin
Big data, and the combination of big data and relational data, greatly increases the effectiveness of
analytics. Using analytics is an effective business strategy that must be supported with high-quality data
that crosses platform borders. I'll now talk about the platforms in use and their potential for analytics.
Contending Platforms for the Analytics Workload
There are numerous data vessels that lay claim to a slice of data and/or processing today. There is
no "one size fits all" as organizations pursue information strategies that give the data the best chance for
success, with performance for the anticipated workload being an overriding factor in platform selection.
These contending platforms include the enterprise data warehouse, multidimensional databases, the NoSQL
family, columnar databases, stream processing and master data management. Let's look at each and its
appropriate workloads.
The Enterprise Data Warehouse
Enterprise data warehouses (EDWs) are based on relational theory, which supports the table as the basic
structure. As the ubiquitous collection point for all operational data of interest in a post-operational
world, the EDW has served reports, dashboards, performance indicators, basic analytics, ad-hoc access and
more. Extended with solid-state components as well as automated archival abilities, the data warehouse will
remain a very important component of an information architecture. It is also where historical information
will be saved.
Multidimensional Databases
Multidimensional databases (MDBs), or cubes, are denormalized, compacted stores of selective data. Often
containing summarized data, MDBs support "slice and dice" of the selected data within the cube with great
speed. The building of the MDBs, from both size and speed standpoints, becomes the bottleneck to their
widespread use. They are also clearly best for financial applications, which remain a priority use of this
approach.
The NoSQL Family
NoSQL includes Hadoop, Cassandra, MongoDB, Riak and many others, over 100 in all, that do not
strictly conform to use of the SQL language against their data. This is largely because the data is not in a
relational database. The solutions, largely open source, can be further broken into OLTP-mimicking
key-value and column stores, relationship-based graph stores and analytic Hadoop stores. These are
scale-out, schema-less solutions on commodity hardware that do not provide full ACID compliance.
As previously mentioned, it does not follow that analytics on the data collected in NoSQL will be done in
the NoSQL environment. These NoSQL stores are excellent at screening, sorting and loading data at volumes
that would crush ETL and ACID processing. Any enterprise analytics solution needs to allow for the cost and
performance advantages of NoSQL for loading big data.
Columnar Databases
Columnar databases physically isolate the values of each column in a relational table. This relieves the
I/O bottleneck by bringing only the useful columns into query processing. It also greatly facilitates a
compression strategy, due to repeating values and the ability to apply compression to the much more finite
set of values found in a single column, as opposed to having to consider entire rows.

There are also many databases with a hybrid row-and-column implementation. A columnar orientation has
proven to be a requirement for the analytic workload, which tends to require a small subset of all the data
in the tables implicated in a query.
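A toy sketch of why this works: store each column separately, run-length encode the low-cardinality ones (repeats compress well), and answer a query by touching only the columns it needs. The table and query are invented.

```python
rows = [("TX", 2013, 100), ("TX", 2013, 250), ("TX", 2012, 75),
        ("CA", 2013, 300), ("CA", 2012, 50)]

# column-wise layout: one list per column instead of one tuple per row
state, year, amount = (list(c) for c in zip(*rows))

def rle(col):
    """Run-length encode a column: [[value, run_length], ...]."""
    out = []
    for v in col:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

runs = rle(state)   # the repeating state values collapse to two runs

# a query touching two of the three columns: total amount for 2013;
# a row store would have to scan whole rows to answer the same question
total_2013 = sum(a for y, a in zip(year, amount) if y == 2013)
```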
Stream Processing
Circumventing the need to store and then process information, stream processing observes data feeds and
executes real-time business processes prior to optional data storage. Stream processing is a great way to
execute immediate business processes in response to a business condition evidenced by the most recent
data across the enterprise. It is also an approach that can benefit tremendously from analytics brought
into the decisions.
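The act-first, store-later pattern can be sketched as: evaluate a business condition on each event as it arrives, act immediately, and only then (optionally) persist. The threshold and field names below are invented.

```python
def process_stream(events, threshold=500.0):
    """Act on each event in arrival order; persist only after the decision."""
    alerts, archive = [], []
    for ev in events:                        # ev: {"customer": ..., "amount": ...}
        if ev["amount"] > threshold:         # the real-time business condition
            alerts.append(f"review {ev['customer']}: {ev['amount']}")
        archive.append(ev)                   # optional storage, after acting
    return alerts, archive

events = [{"customer": "a", "amount": 600.0},
          {"customer": "b", "amount": 100.0}]
alerts, archive = process_stream(events)
```

Feeding an analytic attribute (such as the customer's churn score) into the condition is where analytics "brought into the decisions" would enter.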
Master Data Management
Master Data Management (MDM) solutions pull together master sets of information for widespread use, a role
once played by the data warehouse, and match that with a real-time distribution capability for the data.
The master data might be sourced from other systems in real time, or it might be supported with workflow
components. Data quality is an essential element of the data put into master data.

MDM, due to its leveragability, stands to benefit tremendously from analytics, as the attributes stored
about its entities can extend beyond the basic ones and into analytics. These analytic values can support
stream processing as well as reporting out of NoSQL stores.
The platform to support modern analytics must work cost-effectively for data sets ranging from a few
terabytes to hundreds of terabytes. And it must be able to utilize big data in NoSQL sources like Hadoop.
Analytics are severely disadvantaged if restricted to one set of data or the other. Some redundancy is
still a part of an effective information strategy, and federated queries can handle edge and unanticipated
workloads that require cross-platform data.

Some solutions support high data scale as well as the built-in ability to incorporate data from NoSQL
stores like Hadoop into the analytic processing. This avoids redundancy and movement and provides
access to a full data set. And they do it while keeping the relational model intact, with extended
performance and scale-out architectures. These systems strongly contend for the analytic workload.
ParAccel Analytic Database
ParAccel Analytic Database (ParAccel) is one such system. We would need to call the platform police on
ParAccel, as it combines elements of many of the above platform categories in one platform.

ParAccel is a columnar database. It has extensive compression routines such as delta, run-length, LZ and
null trim. The customer can choose which routines to use or allow ParAccel to choose automatically. Being
columnar with extensive compression, which packs the data down on disk, strongly minimizes the I/O
bottleneck found in many of the contenders for the analytic workload.
ParAccel's architecture is shared-nothing and massively parallel, the scalable architecture behind the vast
majority of the world's largest databases.
ParAccel also supports rich transformation, the "T" in ETL. We often need to massage the data coming
into the analytics system, while NoSQL systems focus only on the extract, load and basic screening
capabilities of data integration. ParAccel has workload management that allows shorter queries to execute
quickly, and it has concurrency control. These are some of the many aspects of being relational, along
with its unique properties, that give ParAccel advantages over NoSQL stores for analytics.
Another advantage of ParAccel over NoSQL is that ParAccel allows full SQL. It also allows third-party
library functions and user-defined functions. Together, these abilities allow a ParAccel user to do their
analytics "in database," utilizing and growing the leveragable power of the database engine and keeping the
analysis close to the data. These functions include Monte Carlo, univariate, (multiple) regression, time
series and many more: most of the functionality of dedicated data mining software.
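ParAccel runs such functions inside the engine itself; purely as an outside illustration of what one of them computes, here is a univariate ordinary-least-squares regression in plain Python, fitting made-up spend figures against customer tenure. This is not ParAccel's implementation, only the kind of calculation kept "close to the data."

```python
def ols(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

tenure = [1, 2, 3, 4]                      # years as a customer (invented)
spend = [110.0, 205.0, 310.0, 395.0]       # annual spend (invented)
m, b = ols(tenure, spend)                  # spend grows ~96 per tenure year
```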
Perhaps the feature that makes it work best for analytics is its unique accommodation of Hadoop. Without
the need to replicate Hadoop's enormous data, ParAccel treats Hadoop's data like its own. With a special
connector, ParAccel is able to see and utilize Hadoop data directly. The queries it executes in Hadoop
utilize fully parallelized MapReduce. This supports the information architecture, suggested below, of
utilizing Hadoop for big data, ParAccel for analytics and the data warehouse for operational support. It
leverages Hadoop fully, without performance overhead. Connectors to Teradata and ODBC also make it possible
to see and utilize other data of interest wherever the analytics will be performed.
ParAccel offers "parallel pipelining," which fully utilizes the spool space without pausing when a step in
the processing is complete. ParAccel is a compiled architecture on scale-out commodity hardware. With
in-memory and cloud options, a growing blue-chip customer base and, most importantly, a rich feature base
for analytics and integration with Hadoop, ParAccel is built to contend for the analytic workload.
Information Architecture
Information architecture has been getting more complicated as companies adopt a unique system for
each workload. With ParAccel, it is beginning to simplify, at least when it comes to where analytics are
calculated. While analytics will permeate the modern competitive enterprise, enterprises need a robust
platform for calculating them.
Enterprise data warehouses will support operations and light analytics as well as retain historical data.
The EDW remains of vital importance to every enterprise.
Big data systems like Hadoop must enter many environments to cost-effectively pick up the abundant
sensor, social, web click and otherwise full data of an enterprise. However, severely lacking tooling,
transformation, schema, interactivity, ACID compliance, concurrency and workload management, along with
other relational benefits, Hadoop is limited in its ability to be the analytics platform. In the
information architecture, ParAccel will call MapReduce jobs to fetch Hadoop data and return it to ParAccel,
where analytics can be performed.
Multidimensional databases will continue to serve financial departments, and stream processing will
begin to bring instant decision making to operational streams of data, utilizing analytics in the process.
Although no one is mistaking master data management platforms for analytics platforms, MDM is
another important vessel for utilizing, in this case disseminating, analytics. Enterprises are growing in
their ability to utilize analytics in many ways, and systems are supporting this strategy.
The architecture for analytics will be columnar. It will accommodate Hadoop’s data loading abilities and it
will provide robust analytic functionality and the ability to customize and extend that functionality. It will
not force a data warehouse to hold hundreds of terabytes or force Hadoop to hold less than that.
About the Author
William functions as Strategist, Lead Enterprise Information Architect, and Program Manager for complex,
high-volume full life-cycle implementations worldwide utilizing the disciplines of data warehousing, big data,
master data management, business intelligence, data quality and operational business intelligence. Many of
his clients have gone public with their success stories. William is a Southwest Entrepreneur of the Year
Finalist, a frequent best practices judge, has authored hundreds of articles and white papers and given
hundreds of international keynotes and public seminars. His team’s implementations from both IT and
consultant positions have won Best Practices awards. William is a former IT VP of a Fortune 50 company, a
former engineer of DB2 at IBM and holds an MBA.
William can be reached at 214-514-1444 or wmcknight@mcknightcg.com.
5960 W. Parker Rd., Suite 278-133
Plano, TX 75093
Tel (214) 514-1444
Data mining & data warehousing
 
Operational Analytics: Best Software For Sourcing Actionable Insights 2013
Operational Analytics: Best Software For Sourcing Actionable Insights 2013Operational Analytics: Best Software For Sourcing Actionable Insights 2013
Operational Analytics: Best Software For Sourcing Actionable Insights 2013
 
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
 

Recently uploaded

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Analytics and Information Architecture

  • 1. ANALYTICS AND INFORMATION ARCHITECTURE
© McKnight Consulting Group, 2013
Custom Research Report by William McKnight
www.mcknightcg.com
  • 2. Are We Doing Analytics in the Data Warehouse?

Companies have already begun to enter the "long tail" with the enterprise data warehouse. Its functions are becoming a steady but less interesting part of the information workload. While the data warehouse is still a dominant fixture, analytic workloads are finding their way to platforms in the marketplace better suited to them.

When information needs were primarily operational reporting, the data warehouse was by far the center of the known universe. While we entertained the notion that we were doing analytics in the data warehouse, competitive pressures have trained the spotlight on what true analytics are all about. Many warehouses have proven not to be up to the task.

And it is analytics, not reporting, that forms the basis of competition today. Rearview-mirror reporting can support operational needs and pay for a data warehouse by virtue of being "essential" to running the applications that feed it. However, the large payback from information undoubtedly comes in the form of analytics.

If the analytics do not weigh down the data warehouse, big data volumes will. As companies advance their capabilities to utilize every piece of information, they are striving to get all information under management. This includes the "big data" of sensor, webclick, and social data, complete logs, and so on. Many have limited their big data to subsets put into data warehouses and other relational structures. The NoSQL world, with much more limited functionality, provides cost advantages to those companies that can make the mindshift necessary for adoption.

Table of Contents
  Are We Doing Analytics in the Data Warehouse? ... 2
  What Distinguishes Analytics? ... 3
  Contending Platforms for the Analytics Workload ... 5
  ParAccel Analytic Database ... 7
  Information Architecture ... 8

Provided by: William McKnight, www.mcknightcg.com
  • 3. The schema-less NoSQL motto, when it comes to data, seems to be "Give me your tired, your poor, your huddled masses." Dispensing with the formalities of deep requirements, metadata, and the like, these solutions collect massive amounts of high-velocity data. Some provide operational support for their customers' internet experience. But for those with analytical aspirations for the data, the users - often the developers or people very close to them - hope to turn this data into the information that drives the analytics.

While these platforms have proven adept at loading and storing the information, doing modern analytics there is challenging. Furthermore, analytics require cross-enterprise information, and moving information out of the NoSQL stores would be as problematic as loading it there was. Data tends to stay where it lands in the information architecture.

The enterprise data warehouse must still exist and must still be advanced (or reengineered, as the case may be) with tremendous care. The data relationships must match the business relationships, and the data must have sufficient quality. It must scale to the level it needs to. It is not necessarily easier to do this than it was five years ago. However, other than some minor innovations, the goals and the platforms remain the same.

So, if the data warehouse is not the end of the story for analytics, and NoSQL solutions have limited information and capabilities, where should a company actually "do" its analytics? This is perhaps the most legitimate question in information management today. This paper will provide input to the decision. But first, what are analytics?

What Distinguishes Analytics?

Many approach analytics as a set of categories of value propositions to the company. From a data-use perspective, however, the definition of analytics lies in how they are formed. They are formed from more complex uses of information than reporting. Analytics are formed from summaries of information. Addressing the propensity of a customer to make a purchase, for example, requires an in-depth look at her spending profile - perhaps by time slice, geography, and other dimensions. It requires a look at those with similar demographics and how they responded. It requires a look at ad effectiveness. And it may require a recursive look at all of these and more.

Analytics should also be tied to business action. A business should have actions to take as a result of analytics - for example, customer-touch or customer-reach programs.

There are numerous categories that fit this perspective of analytics. Customer profiling, even for B2B customers, is an essential starting point for analytics. Companies need to understand their "whales" and how much they are worth comparatively. Companies need a sense of the states a customer goes through with them and the impact on revenue when a customer moves between states. Customer profiling sets up companies for greatly improved targeted marketing and deeper customer analytics. This form of analytics starts by segmenting the customer base according to personal preferences, usage
  • 4. behavior, customer state, characteristics, and economic value to the enterprise. Economic value typically includes last-quarter, last-year-to-date, lifetime-to-date, and projected lifetime values. Profit is the best measure to use in these calculations over the long run; however, spend (shown in the bullets below) will work too. Simpler calculations based merely on "uses" of the company's product will provide far less reliable results. The key attributes used should have a financial linkage that maps directly to the company's return on investment (ROI). Where possible, analyze usage history by customer for at least the following econometric attributes:

• Lifetime spend and percentile rank to date. This is a high-priority item.
• Last year-to-date spend and percentile rank.
• Last year spend and percentile rank. This is a high-priority item.
• Last quarter spend and percentile rank.
• Annual spend pattern by market season and percentile rank.
• Frequency of purchase patterns across product categories.
• Using commercial demographics (RL Polk, MediaMark, or equivalent), match the customers to characteristic demographics at the census block and block group levels.
• If applicable, social rank within the customer community.
• If applicable, social group(s) within the customer community.

These calculations provide the basis for customer lifetime value and assorted customer rankings. The next step is to determine all of these attributes for projected future spend, assigning customers a projected lifetime spend based on (a) a linear regression over n years of performance, or (b) the n-year performance of their assigned quartile if less than n years of history is available.
Determine last-year spend quartile levels, identify the unique characteristics of each quartile (age, geo, initial usage), match new customers to their quartile, and assign those customers the average projected spend of that quartile.

Defining the relevant levels of retention and value is an extension of customer profiling. These are customer profiling variables like the ones above, except that they address the need for more immediate preventative action as opposed to predicting the volume of future profit. Also, regardless of churn potential, determining the point at which customers tend to cross a customer state in a negative direction is essential to analytics.
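The percentile ranks and quartile-based projection described above can be sketched in a few lines of Python. This is a minimal illustration only: the spend figures are hypothetical, and the "matched to top quartile" step stands in for the upstream matching on shared characteristics (age, geo, initial usage) that the paper describes.

```python
from statistics import mean

def percentile_rank(values, x):
    """Percent of observed values at or below x (0-100)."""
    return 100.0 * sum(v <= x for v in values) / len(values)

# Hypothetical last-year spend for an established customer base.
spend = [120, 450, 80, 900, 300, 150, 700, 60]

# Percentile rank for a customer who spent 450 last year.
rank = percentile_rank(spend, 450)  # -> 75.0

# Average spend per last-year-spend quartile. A new customer with less
# than n years of history is matched to a quartile on shared
# characteristics (age, geo, initial usage) and assigned its average.
s = sorted(spend)
per_quartile = len(s) // 4
quartile_avg = [mean(s[i * per_quartile:(i + 1) * per_quartile])
                for i in range(4)]
projected_spend_new_customer = quartile_avg[3]  # matched to top quartile
```

In practice these ranks would be recalculated continually as new history arrives, in line with the re-calculation points made later in the paper.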
  • 5. Customer profiling and customer state modeling should combine to determine the who and when of customer interaction. Actions could be a personal note, free minutes, free ad-free time, or free community points. Also, in markets where customers are likely to use multiple providers for the services a company provides, the company should know the aspirant level of each customer by determining the 90th percentile of usage among the customers who share that customer's key characteristics (age band, geo, demographics, initial usage). This "gap" is an additional analytic attribute and should be utilized in customer actions.

This is simply a start on analytics, and I have focused only on the customer dimension, but hopefully it is evident that many factors make true analytics:

• Analytics are formed from summaries of information
• Inclusion of complete, and often large, customer bases
• Continual re-calculation of the metrics
• Continual re-evaluation of the calculation methods
• Continual re-evaluation of the resulting business actions, including automated actions
• Adding big data to the mix extends the list of attributes and the usability of analytics by a good margin

Big data - and the combination of big data and relational data - greatly increases the effectiveness of analytics. Using analytics is an effective business strategy that must be supported with high-quality data that crosses platform borders. I will now talk about the platforms in use and their potential for analytics.

Contending Platforms for the Analytics Workload

There are numerous data vessels that lay claim to a slice of data and/or processing today. There is no "one size fits all" as organizations pursue information strategies that give the data the best chance for success, with performance for the anticipated workload being an overriding factor in platform selection.

These contending platforms include the enterprise data warehouse, multidimensional databases, the NoSQL family, columnar databases, stream processing, and master data management. Let's look at each of these and their appropriate workloads.

The Enterprise Data Warehouse

Enterprise data warehouses (EDWs) are based on relational theory, which supports the table as the basic structure. As the ubiquitous collection point for all operational data that is interesting in a post-operational world, the EDW has served reports, dashboards, performance indicators, basic analytics, ad-hoc access, and more. Extended with solid-state components as well as automated archival abilities, the data warehouse will remain a very important component of an information architecture. It is also where historical information
  • 6. will be saved.

Multidimensional Databases

Multidimensional databases (MDBs), or cubes, hold denormalized, compacted, selective data. Often containing summarized data, MDBs support "slice and dice" of the select data within the cube with great speed. The building of the MDBs, from both size and speed standpoints, becomes the bottleneck to their widespread use. They are also clearly best for financial applications, which remain a priority use of this approach.

The NoSQL Family

NoSQL includes Hadoop, Cassandra, MongoDB, Riak, and many others - over 100 in all - that do not strictly conform to use of the SQL language against their data. This is largely because the data is not in a relational database. The solutions, largely open source, can be further broken into OLTP-mimicking key-value and column stores, relationship-based graph stores, and analytic Hadoop stores. These are scale-out, schema-less solutions on commodity hardware that do not provide full ACID compliance. As previously mentioned, it does not follow that analytics on the data collected in NoSQL will be done in the NoSQL environment. These NoSQL stores are excellent at screening, sorting, and loading data (that ETL and ACID would crush). Any enterprise analytics solution would need to allow for the cost and performance advantages of NoSQL for loading big data.

Columnar Databases

Columnar databases physically isolate the values of each column in a relational table. This eases the I/O bottleneck by bringing only the useful columns into query processing. It also greatly facilitates a compression strategy, due to repeating values and the ability to apply compression to the much more finite set of values found in a single column, as opposed to having to consider entire rows. There are also many databases with a hybrid row-and-column implementation. A columnar orientation has proven to be a requirement for the analytic workload, which tends to require a small subset of all data in the tables implicated in a query.

Stream Processing

Circumventing the need to store and then process information, stream processing observes data feeds and executes real-time business processes prior to optional data storage. Stream processing is a great way to execute immediate business processes in connection with a business condition evidenced by the most recent data across the enterprise. It is also an approach that can benefit tremendously from analytics brought into the decisions.
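The act-now, store-later pattern described above can be sketched in a few lines. This is a hypothetical illustration of the pattern, not any streaming engine's API: the event feed, the threshold condition, and the alert action are all stand-ins.

```python
def stream_process(events, threshold, act):
    """Observe a feed and act on a business condition immediately,
    before (optionally) storing the data."""
    stored = []
    for event in events:                 # events arrive one at a time
        if event["amount"] > threshold:  # condition on the freshest data
            act(event)                   # immediate action, no store-then-query
        stored.append(event)             # optional downstream storage
    return stored

alerts = []
feed = [{"cust": "a", "amount": 40},
        {"cust": "b", "amount": 120},
        {"cust": "a", "amount": 15}]
stored = stream_process(feed, threshold=100, act=alerts.append)
# alerts now holds only the event that crossed the threshold
```

The key design point is that the business action fires while the event is in flight; persistence, if it happens at all, comes afterward.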
  • 7. Master Data Management

Master data management (MDM) solutions pull together master sets of information for widespread use - once done in the data warehouse - and match that with a real-time distribution capability for the data. The master data might be sourced from other systems in real time, or it might be supported with workflow components. Data quality is an essential element for the data put into master data. MDM, due to its leveragability, stands to benefit tremendously from analytics, as the attributes stored about its entities can extend beyond the basic ones and into analytics. These analytic values can support stream processing as well as reporting out of NoSQL stores.

The platform to support modern analytics must cost-effectively work for data sets ranging from multiple terabytes to hundreds of terabytes. And it must be able to utilize big data in NoSQL sources like Hadoop. Analytics are severely disadvantaged if restricted to one set of data or the other. Some redundancy is still a part of an effective information strategy. Federated queries can handle edge and unanticipated workloads that require cross-platform data. Some solutions support high data scale as well as the built-in ability to incorporate data from NoSQL stores like Hadoop into the analytic processing. This avoids redundancy and movement and provides access to a full data set. And they do it while keeping the relational model intact, with extended performance and scale-out architectures. These systems strongly contend for the analytic workload.

ParAccel Analytic Database

ParAccel Analytic Database (ParAccel) is one such system. We would need to call the platform police on ParAccel, as it has elements of many of the above platform categories in one platform. ParAccel is a columnar database. It has extensive compression routines such as delta, run-length, LZ, and null trim. The customer can choose which routines to apply or allow ParAccel to choose automatically. Being columnar with extensive compression, which packs the data down on disk, strongly minimizes the I/O bottleneck found in many of the contenders for the analytic workload. The ParAccel architecture is shared-nothing and massively parallel - the scalable architecture for the vast majority of the world's largest databases. ParAccel also supports rich transformation - the "T" in ETL. We often need to massage the data coming into the analytics system; NoSQL systems focus only on the extract, load, and basic screening capabilities of data integration. ParAccel has workload management that allows shorter queries to execute quickly, and it has concurrency control. These are some of the many aspects of being relational, along with unique properties, that give ParAccel advantages over NoSQL stores for analytics.
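To illustrate why single columns compress so well, here is a minimal sketch of run-length encoding, one of the routine families named above. This is a toy illustration of the principle, not any vendor's implementation, and the column data is hypothetical.

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

# A single column holds a far more finite set of values than whole rows
# do, so a sorted column slice repeats heavily and compresses well.
state_column = ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX", "TX"]
encoded = run_length_encode(state_column)
# encoded == [("CA", 3), ("NY", 2), ("TX", 4)]: 3 pairs instead of 9 values
```

The same column stored row-by-row alongside unrelated fields would rarely produce such runs, which is the crux of the columnar compression advantage.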
  • 8. Another advantage of ParAccel over NoSQL is that ParAccel allows for full SQL. It also allows for third-party library functions and user-defined functions. Together, these abilities allow a ParAccel user to do analytics "in database," utilizing and growing the leveragable power of the database engine and keeping the analysis close to the data. These functions include Monte Carlo, univariate, (multiple) regression, time series, and many more - most of the functionality of dedicated data mining software.

Perhaps the feature that makes ParAccel work best for analytics is its unique accommodation of Hadoop. Without the need to replicate Hadoop's enormous data, ParAccel treats Hadoop's data like its own. With a special connector, ParAccel is able to see and utilize Hadoop data directly. The queries it executes in Hadoop utilize fully parallelized MapReduce. This supports the information architecture, suggested below, of utilizing Hadoop for big data, ParAccel for analytics, and the data warehouse for operational support. It leverages Hadoop fully without performance overhead. Connectors to Teradata and ODBC also make it possible to see and utilize other data of interest where the analytics will be performed.

ParAccel offers "parallel pipelining," which fully utilizes the spool space without pausing when a step in the processing is complete. ParAccel is a compiled architecture on scale-out commodity hardware. With in-memory and cloud options, a growing blue-chip customer base, and, most importantly, a rich feature base for analytics and integration with Hadoop, ParAccel is built to contend for the analytic workload.

Information Architecture

Information architecture has been getting more complicated as companies adopt a unique system for each workload. With ParAccel, it is beginning to simplify, at least when it comes to where analytics are calculated. While analytics will permeate the modern competitive enterprise, enterprises need a robust platform for calculating them. Enterprise data warehouses will support operations and light analytics, as well as retain historical data. The EDW remains of vital importance to every enterprise. Big data systems like Hadoop must enter many environments to cost-effectively pick up the abundant sensor, social, webclick, and otherwise FULL data of an enterprise. However, severely lacking tooling, transformation, schema, interactivity, ACID, concurrency, workload management, and other relational benefits, Hadoop is limited in its ability to be the analytics platform. In the information architecture, ParAccel will call MapReduce jobs to fetch Hadoop data and return it to ParAccel, where analytics can be performed. Multidimensional databases will continue to serve the financial departments, and stream processing will begin to bring instant decision making to operational streams of data, utilizing analytics in the process. Although no one is mistaking master data management platforms for an analytics platform, MDM is another important vessel in utilizing - in this case disseminating - analytics. Enterprises are growing in their ability to utilize analytics in many ways, and systems are supporting this strategy.
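The architectural pattern described above - pushing parallel work down to the big data side and combining only the result with relational data at the analytics platform - can be sketched generically. This is an illustrative sketch of the pattern only, not ParAccel's connector API; all names, records, and figures are hypothetical.

```python
def bigdata_aggregate(click_log):
    """Stand-in for a pushed-down parallel job (e.g., MapReduce):
    aggregate in place so only the small result, not the raw big
    data, moves to the analytics platform."""
    clicks = {}
    for rec in click_log:
        clicks[rec["cust"]] = clicks.get(rec["cust"], 0) + 1
    return clicks

# Hypothetical relational customer data and raw webclick big data.
warehouse = {"a": {"lifetime_spend": 900}, "b": {"lifetime_spend": 120}}
click_log = [{"cust": "a"}, {"cust": "a"}, {"cust": "b"}, {"cust": "a"}]

clicks = bigdata_aggregate(click_log)  # only the aggregate crosses over
combined = {c: {"spend": w["lifetime_spend"], "clicks": clicks.get(c, 0)}
            for c, w in warehouse.items()}
# combined["a"] == {"spend": 900, "clicks": 3}
```

The point of the pattern is that the enormous raw data never moves; only a pre-aggregated result is joined with the relational side, where the analytic is computed.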
  • 9. The architecture for analytics will be columnar. It will accommodate Hadoop's data-loading abilities, and it will provide robust analytic functionality along with the ability to customize and extend that functionality. It will not force a data warehouse to hold hundreds of terabytes, or force Hadoop to hold less than that.
  • 10. About the Author

William functions as strategist, lead enterprise information architect, and program manager for complex, high-volume, full-life-cycle implementations worldwide, utilizing the disciplines of data warehousing, big data, master data management, business intelligence, data quality, and operational business intelligence. Many of his clients have gone public with their success stories. William is a Southwest Entrepreneur of the Year finalist and a frequent best-practices judge, has authored hundreds of articles and white papers, and has given hundreds of international keynotes and public seminars. His teams' implementations, from both IT and consultant positions, have won best-practices awards. William is a former IT VP of a Fortune 50 company and a former engineer of DB2 at IBM, and holds an MBA. William can be reached at 214-514-1444 or wmcknight@mcknightcg.com.

5960 W. Parker Rd., Suite 278-133, Plano, TX 75093
Tel (214) 514-1444