Analytics Tools
1
Chapter Synopsis
Wars are won by Armies and Strategies but fought with Weapons
--Anonymous
This chapter focuses on the tools available in the market for carrying out different types of analytics. We
begin with a quick look at the typical data flow in an organization, from the time a customer
interacts with the business system and generates activity data, through the various stages of data
preparation, to the point where it lands with a Business User as Insights/Recommendations. This is followed
by a quick breakdown of the types of analytics done at various life stages of the data, e.g., front-end analytics to
upsell solutions to Customers. We then give a quick overview of the factors that shape the
decision on which analytics tool to deploy, followed by a brief summary of the top tools for each type of
analytical need. Finally, we wrap up the chapter with a mention of other top tools available in the market,
which you might want to explore for your needs.
Structure of the Chapter:
As mentioned above, the chapter is broken down into six parts:
• Quick introduction into the typical data flow in an organization
• Type of Analytics and the top toolkits under each
• Factors to decide toolkits for each type of analytics
• Brief overview of Top Tools
• Detailed description of Top Tools
• Other worthy mentions
• Quick introduction into the typical data flow in an organization
Figure 1 illustrates, at a high level, the typical data flow in an organization. As shown in the figure, the
Presentation Tier comes first: it is where a Customer interacts with the Business and generates data. The data could
be of various types – Customer data, Transactional data, Web/Mobile Activity data, etc. To illustrate,
let’s take an example: a Customer, John, walks into a bank to open a Checking account. He
provides all the details the bank requires to open the account. When the executive enters his
information into the system, the warehouse takes in the data (creating a row) and assigns John
an Identification Number. When John walks to the Teller and deposits money into his Account, the
corresponding field in the warehouse is updated. Later, when John logs on to his online account to check
his balance, it generates web Activity and his row is updated. If John transfers money to someone, the
transaction is checked against his balance and the balance is then updated.
Now let’s move on to what happens in the backend. The data is stored in the warehouse, and whenever
John interacts with the system, the front-end system talks to the warehouse through an
intermediate Logic Tier and serves John. The Logic Tier holds all the logic required to perform the business
operations – commands, mathematical calculations, analytical decision-making structures, etc. It is
responsible for moving the data between the front and the back end and for ensuring that all of John’s
requests are served correctly.
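The three tiers in John’s example can be sketched in a few lines of Python. This is only a minimal illustration, not a banking system: the class names (`DataTier`, `LogicTier`), the in-memory dictionary standing in for the warehouse, and the sample amounts are all assumptions made for the example.

```python
class DataTier:
    """Stands in for the warehouse: one row per customer (hypothetical layout)."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def create_customer(self, name):
        customer_id = self._next_id  # the Identification Number
        self._next_id += 1
        self._rows[customer_id] = {"name": name, "balance": 0.0, "web_logins": 0}
        return customer_id

    def read(self, customer_id):
        return self._rows[customer_id]


class LogicTier:
    """Holds the business rules and moves data between front end and data tier."""

    def __init__(self, data_tier):
        self.data = data_tier

    def open_account(self, name):
        return self.data.create_customer(name)

    def deposit(self, customer_id, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.data.read(customer_id)["balance"] += amount

    def check_balance_online(self, customer_id):
        row = self.data.read(customer_id)
        row["web_logins"] += 1  # the web activity is recorded too
        return row["balance"]

    def transfer(self, sender_id, receiver_id, amount):
        sender = self.data.read(sender_id)
        if amount > sender["balance"]:  # check against the balance first
            return False
        sender["balance"] -= amount
        self.data.read(receiver_id)["balance"] += amount
        return True


# Presentation tier: the calls a teller screen or website would make.
bank = LogicTier(DataTier())
john = bank.open_account("John")
mary = bank.open_account("Mary")
bank.deposit(john, 500.0)
approved = bank.transfer(john, mary, 200.0)
rejected = bank.transfer(john, mary, 1000.0)
john_balance = bank.check_balance_online(john)
```

The front end never touches the rows directly; every request goes through the Logic Tier, which is where the balance check lives.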
The Data Tier is the layer where the data operations happen. The Logic Tier works directly with the front-end
tables, which store data for serving business queries, e.g.,
• Customer John’s Snapshot data – current account balance, risk profile, statement summary, etc.
• Location profile data – nearest ATMs, Branches, Merchants offering discounts, etc.
• Recommendations – use his credit card for a discount on a weekend movie
• Upsell or Cross-sell – apply for a Mortgage
Other front-end tables record the transactions/activities, e.g.,
• John used the ATM for a $500 withdrawal
• John requested a statement printout
• John logged on to the site/app and reached Customer service.
To run the business effectively, Businesses often need a complete view of their
Customers, for which they source 3rd-party data, e.g., Credit Bureau, Nielsen’s Ratings, Macroeconomic
data, etc.
Given that most of the data generated by the front end and/or received from 3rd-party systems is
unstructured/unorganized, it needs to be processed, cleaned and combined logically for eventual
storage and for use in analysis or in serving business requests. These operations are called ETL (Extraction,
Transformation and Loading) and are done at regular intervals, depending on Business requirements.
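A toy version of the three ETL steps is sketched below. It is a minimal illustration under stated assumptions: the raw dump, its column names and the `transactions` table are made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Extract: a raw front-end dump (made-up sample with messy values).
RAW_DUMP = """customer_id,amount,channel
101, 500.00 ,ATM
102,,web
101,75.5,WEB
"""


def extract(text):
    """Read the raw dump into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean the rows: trim whitespace, drop missing amounts, normalize case."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip records with no amount
        cleaned.append(
            (int(row["customer_id"]), float(amount), row["channel"].strip().lower())
        )
    return cleaned


def load(rows, conn):
    """Write the cleaned rows into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(customer_id INTEGER, amount REAL, channel TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the warehouse
load(transform(extract(RAW_DUMP)), conn)
loaded = conn.execute(
    "SELECT customer_id, amount, channel FROM transactions ORDER BY amount"
).fetchall()
```

In practice each step would be scheduled to run at whatever interval the Business requires; here the three functions are simply chained once.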
Post ETL, the structured data flows into various tables in the Enterprise Data Warehouse (EDW). The EDW
might have specific tables for specific types of information, e.g.,
• Customer table – with demographics, snapshot of activity, risk & marketing profile.
• Transaction tables – containing transactional information like Amount, Number of transactions,
Type of Product purchased, etc.
Business Users (Product Managers, Marketers, Sales Professionals, etc.) rely on some standard metrics
for running their day-to-day operations. They need to see these daily or at regular intervals to understand
what’s going on in their business and whether it needs more attention. Given the repetitive nature and
standardization of these requirements, it makes sense to create a structure where this information is
captured in the required format, constantly refreshed and available on multiple channels (Email, Cloud
or App) – this is called “Reporting”. Running these requests again and again on the granular tables discussed
above would be inefficient & slow, so Business Intelligence professionals typically pre-aggregate the data in a
standard structure to serve the various reporting requests. These pre-aggregations are called “OLAP (On-line
Analytical Processing)” roll-ups or cubes. The reports are then built off these cubes and so are efficient/quick.
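The idea of pre-aggregation can be shown with a small sketch. It assumes a made-up list of granular transactions and a single roll-up keyed by (product, region); real OLAP cubes carry many more dimensions and measures.

```python
from collections import defaultdict

# Granular transaction rows: (date, product, region, amount). Sample data is made up.
TRANSACTIONS = [
    ("2014-01-05", "Checking", "East", 500.0),
    ("2014-01-05", "Credit Card", "East", 120.0),
    ("2014-01-06", "Checking", "West", 250.0),
    ("2014-01-06", "Checking", "East", 80.0),
]


def build_rollup(rows):
    """Pre-aggregate once: total amount and count per (product, region)."""
    cube = defaultdict(lambda: {"amount": 0.0, "count": 0})
    for _date, product, region, amount in rows:
        cell = cube[(product, region)]
        cell["amount"] += amount
        cell["count"] += 1
    return dict(cube)


def report_by_product(cube):
    """A report reads the small roll-up, not the granular rows."""
    totals = defaultdict(float)
    for (product, _region), cell in cube.items():
        totals[product] += cell["amount"]
    return dict(totals)


cube = build_rollup(TRANSACTIONS)
product_totals = report_by_product(cube)
```

The roll-up is built once per refresh; every report that only needs product or region totals then works from a handful of cells instead of scanning all transactions.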
Analysts typically are interested in finding out what happened, why, where and when, how good or bad
it is, etc. and they do this by looking at various metrics and KPIs of the business. They might leverage the
reports or cubes or might hit the database directly for getting answers to their questions. Their analysis
might consist of charting, tabulations, simple/advanced math or statistical techniques. We will look at
the various types of analytical techniques in detail in the following sections.
• Type of Analytics and the top toolkits under each
Table 1 summarizes four broad types of Analytics, why they are done and the top tools used when
carrying out that type of Analytics.
A. Data Collection, ETL & Storage:
Whenever a customer interacts with the business system, data is generated that has to be captured
efficiently & accurately and stored in the system to meet customer service needs, business
operations needs and regulatory requirements. Given the dynamic nature of businesses today, data
collection, storage & retrieval technologies have proliferated, each with its own merits and limitations.
Many of them are best for a specific set of needs but might not be as useful in other
circumstances. Data Storage has matured from the early days, when data was simply stored as a
dump of information; this gave way to relational data structures (RDBMS), which were followed by
parallel processing, and now by an amalgamation of all these broad technologies. Requirements vary widely,
from fast & accurate delivery of structured business requests and efficiency of scale at the
back end to handling swathes of unstructured data from social media/videos/surveys, so no one tool can
run the business end to end.
Going into the detail of these technologies would require a dedicated book, but let’s attempt to
summarize them at a very top level:
• Front End Tables (OLTP), e.g., Oracle, DB2
Front End Tables or OLTP (Online Transaction Processing) tables are best suited to running client-facing
businesses. Their biggest strengths are speed, accuracy and low failure rates.
• Large-scale Historical Storage, e.g., Teradata, SQL Server
These systems are the repository of all the data generated; they store the information from the front end,
other internal systems (Clickstream, Survey systems, Testing Infrastructure) and 3rd-party sources. Data from
the various sources undergoes ETL (Extraction, Transformation & Loading) processing, is combined in logical
sequences and is fed to these systems. These tools are characterized by efficient processing and retrieval
of huge data sizes (typically via massively parallel processing). They also need to integrate easily with
reporting/analytics platforms.
• Unstructured Data, e.g., Hadoop
Over time, visionaries realized the need for systems that can capture non-traditional data (videos,
comments), which was going to be generated in quantities unforeseen in their times. They started
developing technologies that capture such data without putting any restrictions on its structure,
while retaining the flexibility to define the structure at the time of retrieval (reporting/analysis). This
strength is also its Achilles heel: no structure means slow retrieval. But with Web 2.0 the time of such
technologies has truly arrived, and the rapid development of reporting/analytical tools based on these
platforms, or at least of connectors to existing tools, points to a promising & mainstream future for
big data.
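Defining the structure at retrieval time (often called schema-on-read) can be illustrated with a short sketch. It is not Hadoop code: the list of JSON strings standing in for the raw store, the field names and the helper `read_with_schema` are assumptions made for the example.

```python
import json

# Raw events are stored as-is; records need not share the same fields.
RAW_STORE = [
    '{"type": "comment", "user": "john", "text": "Great service"}',
    '{"type": "video", "user": "mary", "duration_sec": 42}',
    '{"type": "comment", "user": "mary", "text": "Slow app", "rating": 2}',
]


def read_with_schema(raw_lines, fields, record_type):
    """Apply a structure only when reading: pick fields, fill gaps with None."""
    for line in raw_lines:
        record = json.loads(line)  # every read pays the parsing cost
        if record.get("type") == record_type:
            yield {field: record.get(field) for field in fields}


comments = list(read_with_schema(RAW_STORE, ["user", "text", "rating"], "comment"))
```

Nothing was rejected at write time, which is the flexibility described above; the price is that each query has to parse and filter the raw records, which is why retrieval tends to be slower.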
B. Reporting:
Reporting tools are primarily visualization (tables, charts, maps, etc.) tools and are used by
Business users/Executives to make sense of the data and to monitor & understand the dynamics (using KPIs) in
their portfolio on the fly. Analysts too leverage the reports for similar purposes; however, they are more
interested in using the data available in the reports to understand what drives the movements in KPIs. Analysts
also leverage reporting tools to understand the enterprise-wide standard logic for creating KPIs, which they
can reuse in their analysis.
Reporting tools are usually judged on the “30-60 rule”. The rule says that the broad story should
be conveyed in the first 30 secs of viewing and that the report should provide the capability to do a
one-level drill-down to get a directional sense of the story.
Reporting tools might need to deal with various kinds of data:
• Instrumentation Data: record of activity on the live business site, captured via instrumentation
• Call Log Data: dump of server calls from the live business site and what was delivered
• Transactional Data
• Active Customer Data
• Customer Feedback Data (social discussions, Survey data, etc.)
Some reporting tools also need to incorporate budgets, forecasts, competitor data & benchmarks for Users to
best understand where they stand.
Given the importance of Reporting in running a business, in regulatory compliance and the like, many
enterprises create dedicated “Reporting” products for specific needs/industries/domains, e.g., SAS CRMS,
which is the SAS Basel II compliance module.
Factors for deciding Tools for Reporting
Specific needs from a Reporting tool: a wide variety of visualizations; the ability to access reports from a
wide variety of channels – emails, texts, alerts, website, Apps; speed of report refresh; and the ability to
consolidate data from a wide variety of data sources (and now Big Data too).
Below is the list of factors that should be considered to zero in on a tool, in order of priority.
1. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC,
Mails, Text, Alerts, Tweets, Social Shares, etc.
2. Integration with other tools: How easily/seamlessly can it connect to various other
tools/systems both for output delivery or connecting to multiple data sources through ODBC or
other data pipes (Hadoop connectivity)?
3. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump & if
Hadoop Connectivity for Big Data analysis?
4. Data/User Limitations (if any): Specific data/user limitations, Query performance with increase
size or complexity, flexibility in data modeling, scalability issues?
5. Ease of Learning: Does it have GUI, How much is it Coding dependent, Availability of Trained
resources, Training materials & Training cost?
6. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs,
Scalability Costs, and cost of resources
7. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team
needed or Self-Serve, Support availability.
8. Editorial & Tagging Capabilities: Enabling users to check backend logic for debugging/single
source of truth.
9. Visualization Options: Tables, Charts, Maps, Heatmaps, etc., which can be dynamic
(slicing/dicing enabled) and visible across all channels
10. Types of Aggregations possible: OLAP Cubes, Simple/Advanced Math, Statistical techniques, etc.
A plethora of tools is available in the market for Reporting, hence the need for a structured
decision-making process like the one above, so that you end up with the tool that satisfies most of your needs.
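One way to turn a prioritized list of factors into a decision is a weighted score. The sketch below is only one possible approach, not a method prescribed in this chapter: the rank-based weights (10 for the first factor down to 1 for the last), the 1-5 rating scale and the two unnamed candidate tools are all assumptions.

```python
# Factors in the priority order of the list; the first is the most important.
FACTORS = [
    "Output Delivery System",
    "Integration with other tools",
    "Type of data it can handle",
    "Data/User Limitations",
    "Ease of Learning",
    "Cost",
    "Operational Efficiency",
    "Editorial & Tagging Capabilities",
    "Visualization Options",
    "Types of Aggregations possible",
]

# Assumption: rank-based weights, 10 for the first factor down to 1 for the last.
WEIGHTS = {factor: len(FACTORS) - rank for rank, factor in enumerate(FACTORS)}


def weighted_score(ratings):
    """ratings maps factor -> rating on a 1-5 scale; unrated factors count as 0."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[f] * ratings.get(f, 0) for f in FACTORS) / total_weight


# Hypothetical ratings for two unnamed candidate tools.
tool_a = {factor: 3 for factor in FACTORS}
tool_b = {**tool_a, "Output Delivery System": 5, "Visualization Options": 1}

scores = {"Tool A": weighted_score(tool_a), "Tool B": weighted_score(tool_b)}
best = max(scores, key=scores.get)
```

Because delivery channels carry more weight than visualization in this ordering, Tool B comes out ahead even though it is weaker on charts; a different set of weights could reverse the result.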
Table 2 gives a bird’s eye view of how each of the top tools sizes up against the criteria mentioned above.
ADOBE MARKETING CLOUD (OMNITURE SITECATALYST & AD HOC ANALYSIS)
Overview
Adobe Marketing Cloud (AMC), the erstwhile Omniture Web Reporting/Analytics suite, is the
leader in Web Analytics (Analysis of Clickstream data). Mobile reporting/analytics capabilities
are being ramped up.
AMC “instruments” actions on Web Pages, buttons, callouts in emails, etc., which it then tracks
in its warehouse on the Cloud, and provides front ends on top (SiteCatalyst for Reporting & Ad Hoc Analysis
3.2 for Slicing-and-Dicing Analytics).
AMC provides real-time data for a select subset of 100+ metrics and is slowly ramping up
capabilities to make all reporting real-time.
Adobe provides multiple solutions for e-businesses to track the UX of website visitors, online
campaign effectiveness, Social Media Activity, SEO and SEM, and to report on Product
performance.
Output Delivery System
SiteCatalyst & Ad Hoc Analysis (erstwhile Adobe Discover) are cloud solutions which can also be
accessed on Mobile via Apps.
Integration with Other Tools
Limited data import (Excel, CSV, TXT) functionality. Reports can be exported to Excel/PDF.
AMC does provide a data dump via FTP, which can then be used for additional analysis.
Type of Data it can handle
It typically works with Clickstream Data instrumented on Websites, Apps or Emails.
Recent efforts to expand into Mobile Web/Apps.
Data/User Limitations (if any)
Data/user limitations depend on the service contract. However, speed performance remains
pretty stable with increasing size/users.
FTP speed, on the other hand, varies with many factors.
Ease of Learning
Both SiteCatalyst and Ad Hoc Analysis are GUI based.
SiteCatalyst and Ad Hoc Analysis require <=1 month of training on Business Analytics &
Reporting.
Large pool of hands-on and/or trained professionals.
Lot of training materials are also available.
Cost
Cloud License: CPM $0.01 to $1. Per month or Annual?
To check if Ad Hoc Analysis inclusive cost?
Operational Efficiency
6-12 months for initial implementation. A significant effort should go into planning, esp. on what
metrics to implement, where, and the naming conventions, since the cost of errors is significantly
higher later. Given the amount of effort required in implementation (Omniture expert + Dev + QA), if
something goes wrong, it typically takes long & is costly to make changes.
AMC requires dedicated trained professionals to manage the system.
Editorial & Tagging Capabilities
Editorial & Tagging Capabilities within SiteCatalyst/Ad Hoc Analysis are not sufficient. Most
professionals maintain documentation outside of the system (MS-OFFICE, etc.).
Visualization Options
SiteCatalyst and AMC provide standard visualization options – Tables, Charts, Click Maps,
Funnels, etc.
Types of Aggregations possible
Profiling, not many advanced math functionalities.
Ideal for what type of users: Business Users (Product Managers, Marketers), Developers and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling).
Ideal for organization at what stage of Analytics Maturity: Preferred as Enterprise tool, since AMC is
very costly. For start-ups/organizations on a budget, Google Analytics is a cost-effective option.
MICROSTRATEGY
Overview
Microstrategy is a leading reporting solution and has seen widespread acceptance among Large
Enterprise Users.
Microstrategy integrates with the warehouse and/or other secondary sources (typically after
ETL).
Microstrategy has recently expanded its Big Data connectivity and Advanced Analytics
capabilities.
Output Delivery System
Microstrategy offers both on-premise and Cloud delivery solutions, which can also be accessed
on Mobile via Apps.
Integration with Other Tools
Microstrategy has among the widest range of integrations possible, from Warehouses to Hadoop
to ODBC to XML export/import.
Microstrategy cubes reside on the warehouse and so can be leveraged by other systems directly
from there too.
Type of Data it can handle
Works with structured data. Hadoop plug-in available.
Data/User Limitations (if any)
Depends on the Service contract if per-user pricing applies. If requirements are significant, Customers buy a
dedicated on-premise Microstrategy installation.
Ease of Learning
Reports/drilldown capabilities are GUI based. However coding in Microstrategy scripting
language/SQL is required for report creation.
Large pool of hands-on and/or trained professionals.
Lot of training materials are also available.
Cost
Report User Pricing: $500-1K per Report receiver. Per month or Annual?
Dedicated Server Pricing: >=$25K. Per month or Annual?
Operational Efficiency
6-12 months initial implementation, since Microstrategy experts (Programmers, Architects)
required for setting up of reporting framework.
Dedicated team required to manage Microstrategy reporting framework.
Editorial & Tagging Capabilities
Editorial & Tagging Capabilities within Microstrategy are pretty intuitive. Users can click on
“Report Details Page” and figure out the underlying logic behind the reports & metrics.
Microstrategy recommends both technical (SQL logic) and non-technical (plain English)
commentary.
Visualization Options
Amongst the widest range of visualizations provided – tables, charts, maps, heatmaps, word
clouds which can be dynamically linked to the back end data.
Types of Aggregations possible
Profiling, simple & advanced math and statistical capabilities.
Ideal for what type of users: Business Users (Product Managers, Marketers) and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Trend Analysis
and Correlation Analysis. Even though Sizing & Estimation is possible, it’s not very easy to execute.
Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since
Microstrategy is costly. For start-ups/organizations with a limited scale, other cost-effective reporting
options are available, like warehouse packages, Tableau or an Excel VBA reporting suite.
TABLEAU
Overview
Tableau is fast gaining ground among business and non-tech analytical users on account of
its powerful simplicity.
It takes data from the warehouse and/or other secondary sources (typically after ETL).
Data Import/Export, Analysis, Presentation (Tables/Graphs), Automated Reporting and Scenarios
can all be done intuitively, quickly and seamlessly, with easy transitions between them. Tableau is
incorporating some statistical capabilities, like simple predictive modeling, in recent versions.
Output Delivery System
Tableau reports need to be created on a PC, but can be hosted on the Cloud using Tableau Server.
Hosted Reports retain the OLAP structure of the tables in the backend to facilitate on-the-fly slicing
& dicing by the report consumers.
Tableau is now also on the Cloud and the outputs can be accessed using Apps.
Integration with Other Tools
Tableau has among the widest range of integrations possible, from Warehouses to Hadoop to
ODBC to XML exports/imports.
Type of Data it can handle
Works with structured data. Hadoop plug-in available.
Data/User Limitations (if any)
Depends on Hardware Configuration.
Ease of Learning
GUI based. Requires 1-2 weeks to be able to leverage most of the features of Tableau.
Large pool of hands-on and/or trained professionals.
Lots of training materials are also available.
Cost
Individual PC Licenses cost between $1-2K. Annual Maintenance of $400.
Server Licenses cost $1K per report receiver. Annual Maintenance of $200.
Operational Efficiency
The Desktop framework takes minutes to install/use. The first Tableau Server installation needs some
coordination effort between in-house DBAs and the Tableau Support team. Timelines depend on the
complexity of the problem but rarely exceed a week.
Once the initial set-up is completed, no major help is needed for ongoing needs/changes.
Editorial & Tagging Capabilities
Tableau provides many options for editorials – Title, Summary, sheet description for the reports
and dashboard. Given the nature of report creation, types of Aggregation can be checked
visually. “Describe option” talks more about the exact operation being done for Metrics.
Visualization Options
Amongst the widest range of visualizations provided – tables, charts, maps, heatmaps, word
clouds which can be dynamically linked to the back end data.
Types of Aggregations possible
Profiling, simple & advanced math and some simple statistical capabilities.
Ideal for what type of users: Business Users (Product Managers, Marketers) and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Trend Analysis,
Correlation Analysis and Sizing & Estimation. Tableau is the best tool for Sizing & Estimation and
Scenario Analysis.
Ideal for organization at what stage of Analytics Maturity: Tableau is useful for all types of users.
However it suffers from lack of advanced analytics capabilities.
FLURRY
Overview
Flurry is a leader in Mobile App Reporting. Over 100,000 companies use Flurry Analytics in more
than 300,000 applications for Reporting, Marketing Attribution and Operational Analytics.
Flurry, like Omniture, “instruments” actions on the front end & campaign outreach channels for
native Apps by integrating an SDK into the App libraries. This data is then tracked in Flurry’s
warehouse on the cloud, and reporting happens on this data.
Flurry also has other tools -
Output Delivery System
Flurry is a cloud solution.
Integration with Other Tools
Flurry offers the capability to download the metrics to CSV, on which additional analysis can be
performed.
Type of Data it can handle
Flurry works on Activity data from the Apps directly.
Data/User Limitations (if any)
Flurry doesn’t impose restrictions on data size. However Business version also exists, which
extends capabilities to xyz.
Ease of Learning
Flurry is a GUI-based solution. Requires 1-2 weeks to be able to leverage most of the features
of Flurry.
Large pool of hands-on and/or trained professionals.
Lots of training materials are also available.
Cost
Basic version is free. Check Business Version
Operational Efficiency
<=30 minutes for basic integration - a small piece of SDK needs to be added to the App libraries
and it starts tracking the standard metrics. Some custom events can also be defined in the App.
Once initial set-up is completed, no major help needed for ongoing needs/changes.
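What “instrumenting” an App amounts to can be sketched with a toy tracker. This is not Flurry’s SDK or API: the `EventTracker` class, its `log_event` method and the event names are hypothetical, and a real SDK would send the events to the vendor’s cloud rather than keep them in a list.

```python
import time
from collections import Counter


class EventTracker:
    """Hypothetical tracker: buffers events the way an analytics SDK might."""

    def __init__(self, app_id):
        self.app_id = app_id
        self.events = []

    def log_event(self, name, **params):
        """Record one event with optional parameters and a timestamp."""
        self.events.append(
            {"app": self.app_id, "event": name, "params": params, "ts": time.time()}
        )

    def counts(self):
        """The kind of roll-up a reporting screen would show."""
        return Counter(e["event"] for e in self.events)


tracker = EventTracker("demo-app")
tracker.log_event("session_start")            # a standard metric
tracker.log_event("level_complete", level=3)  # a custom event
tracker.log_event("level_complete", level=4)
event_counts = tracker.counts()
```

The App code only adds one call per action of interest; everything reported later is an aggregation over these event records.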
Editorial & Tagging Capabilities
Metrics are standard and fixed on Flurry reports. However some custom events can be defined
and tracked, whose definitions can also be tracked.
Documentation on the reports available within Flurry.
Visualization Options
Standard visualization options – tables, charts, funnels.
Types of Aggregations possible
Profiling, simple math.
Ideal for what type of users: Business Users (Product Managers, Marketers), Operational Analysts,
Developers and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling)
Ideal for organization at what stage of Analytics Maturity: Flurry is of great help to Start-ups,
individual developers and small-scale organizations. Given that Flurry supports a smaller range of
reporting/analytics, it’s not ideal for mature organizations or large-scale enterprises.
C. Business Analytics:
Business Analysts are one step further along the analytics food chain. They are entrusted with the
responsibility of making sense of the data deluge: finding hidden patterns, explaining fluctuations (up or down),
sizing opportunities and making high-level projections. They play a critical role in enterprise decision making. They
leverage reports or might query the data sources directly to answer the various business questions.
Factors for deciding Tools for Business Analytics
Below is the list of factors that should be considered to zero-in on a tool. We have listed them in the
order of priority.
Primary
1. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump & if
Hadoop Connectivity for Big Data analysis?
2. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Correlation Analysis
(pre-post, A/B), Trend Analysis, Sizing & Estimation, Scenarios
3. Visualization Options: Tables, Charts, Maps, Heatmaps, etc., which can be dynamic
(slicing/dicing enabled) and visible across all channels
4. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs,
Scalability Costs, and cost of resources
Secondary
1. Ease of Learning: Does it have GUI, How much is it Coding dependent, Availability of Trained
resources, Training materials & Training cost?
2. Integration with other tools: How easily/seamlessly can it connect to various other
tools/systems both for output delivery or connecting to multiple data sources through ODBC or
other data pipes (Hadoop connectivity)?
3. Data/User Limitations (if any) : Specific data/user limitations, Query performance with increase
size or complexity, flexibility in data modeling, scalability issues?
4. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team
needed or Self-Serve, Support availability.
5. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC,
Mails, Text, Alerts, Tweets, Social Shares, etc.
Now let’s look at each tool’s capabilities in detail,
MS-EXCEL
Overview
MS-Excel is a spreadsheet application packaged in MS-OFFICE.
It’s the most widely used tool for Business Analytics and, in recent years, has seen more powerful
additions for doing more sophisticated analysis.
It also has a programming language, VBA, which adds power for reporting/automation
needs.
Type of Data it can handle
Excel requires a traditional table structure (rows and columns of data).
It also has plug-ins that can connect it to Hadoop/PIG at the back end.
Type of Analytics
MS-EXCEL is typically used for Aggregate Analytics (Descriptive, Profiling), Correlation and Trend
Analysis, Sizing & Estimation and Simple Predictive Modeling & Time Series Forecasting.
Recent versions have seen added advanced statistical and math functionalities.
Visualization options
Recent versions incorporate sophisticated and powerful graphing options – both static
and dynamic (pivots).
Cost
Excel PC version comes packaged within MS-OFFICE.
Office 365 cost TBD?
Ease of Learning
Excel’s popularity stems from a very intuitive and easy-to-learn GUI.
Low learning curve (1-2 weeks) to be able to use for less sophisticated business
analysis/reporting. VBA coding requires a month of hands-on learning to realize full potential.
Large pool of hands-on and/or trained professionals.
Lot of training materials are also available.
Integration with Other tools
Excel can be accessed using PC, Cloud (Office 365) and through Apps on Smartphones.
Most major tools have Excel import/export options.
Excel also has XML import/export capabilities.
Data/User Limitations (if any)
Latest versions can handle a maximum of about 1 MM rows (1,048,576) per worksheet.
However, recent extensions like Power Pivot can handle 10 MM rows or more.
Operational Efficiency
Excel gets installed automatically as part of the Office package (<=2 hrs max). Cloud (Office 365) TBD? Power
Pivot and other extensions can be added as plug-ins online.
Output Delivery System
Excel outputs can be accessed on PC, Cloud (Office 365) and via Smartphone Apps.
Ideal for what type of users: Non-technical users who do not need to handle large datasets and who do
high-level analytics (simple analysis, reporting, simulations, scenarios or modeling).
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple
Correlation/Trend/Sizing & Estimations.
Ideal for organization at what stage of Analytics Maturity: Useful for all organizations as a simple, cost
effective tool for simpler analytical tasks.
HIVE
Overview
Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis. While initially developed by Facebook, Apache Hive is now
used and developed by other companies such as Netflix.
Apache Hive stores metadata in an RDBMS, significantly reducing the time to perform semantic
checks during query execution.
It has built-in User Defined Functions (UDFs) to manipulate dates, strings, and other data-mining
tools. Hive supports extending the UDF set to handle use-cases not supported by built-in
functions.
Type of Data it can handle
Unstructured/Structured data in Hadoop.
Type of Analytics
Hive can be used for Aggregate Analytics (Descriptive, Profiling).
User Defined Functions (UDFs) can be created for advanced querying needs – Trend Analysis,
Correlation Analysis, Sizing & Estimation.
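To make the idea of a user-defined function concrete, here is a minimal sketch. It deliberately uses Python’s built-in SQLite module as a stand-in for Hive, since both let you call a registered custom function from SQL; real Hive UDFs are typically written in Java and registered differently. The table, the `amount_band` function and its thresholds are assumptions made for the example.

```python
import sqlite3


def amount_band(amount):
    """User-defined function: bucket a transaction amount (bands are made up)."""
    if amount < 100:
        return "small"
    if amount < 1000:
        return "medium"
    return "large"


conn = sqlite3.connect(":memory:")
conn.create_function("amount_band", 1, amount_band)  # register the UDF with SQL
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 50.0), (1, 500.0), (2, 75.0), (2, 2500.0)],
)

# Descriptive profiling: count and total per band, using the UDF in the query.
profile = conn.execute(
    "SELECT amount_band(amount) AS band, COUNT(*), SUM(amount) "
    "FROM transactions GROUP BY band ORDER BY band"
).fetchall()
```

The built-in aggregates (`COUNT`, `SUM`) cover the descriptive part; the custom function supplies the business-specific logic that the query language does not have out of the box.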
Visualization options
TBD
Cost
Cloudera or HortonWorks pricing packages.
Ease of Learning
Medium learning curve (1-3 months) to be able to use for business analysis/reporting.
Given the increase in Big Data interest, pool of hands-on and/or trained professionals is
growing.
Training materials/content for Analysts are being ramped up.
Cloudera is the leader in training professionals on HIVE, PIG and Impala. It has dedicated training
modules for Developers, DBAs & Analytics professionals.
Integration with Other tools
TBD
Data/User Limitations (if any)
TBD
Operational Efficiency
TBD
Output Delivery System
TBD
Ideal for what type of users: Technical Users who are comfortable with SQL coding but would prefer
to avoid advanced scripting.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple
Correlation, Text Mining.
Ideal for organization at what stage of Analytics Maturity: Organizations ramping up their Big Data
framework.
Overview
Ksuite is a suite of products developed by Kontagent. Ksuite has three major tools – Ksuite
Mobile, Ksuite Social and Ksuite DataMine.
Ksuite Mobile is mobile app activity reporting tool and Ksuite, a social metrics reporting tool –
targeted for Business Users. Ksuite DataMine is advanced tool targeted for Analysts who need to
go beyond charts/tables and understand what’s happening behind the scenes. Ksuite is a SQL
like Querying platform.
Ksuite like Omniture “instruments” actions on the front end & campaigns outreach channels for
the native Apps by integrating a SDK in the App libraries. This data is then tracked in their
warehouse on the cloud and reporting happens on this data.
Ksuite is a real-time monitoring platform.
Type of Data it can handle
It operates on the App activity data stored on its cloud.
Type of Analytics
Ksuite helps with Aggregate Analytics (Descriptive, Profiling).
Visualization options
Broad range of advanced visualization options – Tables, Charts, etc.
Cost
Depends on data and number of apps tracked in Ksuite. Costs >$2,000 per month.
Ease of Learning
Low learning curve (1-2 weeks) to be able to use for business analysis/reporting. Ksuite also
provides Mobile Analysts and Data Scientists for Consulting.
Large pool of hands-on and/or trained professionals.
Lots of training materials are also available.
Integration with Other tools
Kontagent provides an FTP data pipe through which a raw data dump can be pulled for additional
in-house analysis.
Data/User Limitations (if any)
Depends on Service contract, since pricing is data size dependent.
Operational Efficiency
Kontagent installation takes minutes, since only the SDK has to be integrated with the App.
Kontagent also provides Mobile Analysts/Data Scientists as Consultants to assist with anything
during or after installation.
Output Delivery System
Ksuite is a cloud solution. Ksuite Mobile can be accessed via App.
Ideal for what type of users: Non-technical users/Analysts. Best suited for efficient reporting and
high-level analytics.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple
Correlation/Trend.
Ideal for organization at what stage of Analytics Maturity: Useful for established App developers with
scale, since Kontagent can be expensive. Flurry could be a cost-effective solution for organizations on
a budget or for individual developers.
C. Advanced Analytics:
Advanced Analytics can be quickly summarized as making sense of the data through in-depth analysis
beyond normal business analytics. It could be advanced Text mining (parsing of unstructured data) or
statistical (predictive or driver) analysis.
I. Front-end Analytics/Machine Learning:
Front end Analytics is performed on the raw front end tables. Two broad types of data in the front end
tables are,
 Instrumentation Data: record of activity, on live business site, captured via instrumentations
 Call Log Data: dump of server calls from the live business site and what was delivered
Front end Analytics differs from Business Analytics in the scope of deliverables. Traditionally, the biggest
users of Front end Analytics were Operational Users (e.g. IT Ops, Security), who monitored site stability,
security breaches, etc. However, because this data sits so close to user activity and is therefore rich,
businesses have started performing Machine learning on it to deliver more upstream solutions
like Transactional marketing (e.g. offering a Credit Card to an ATM user, or Netflix recommendations). Tools need
to be able to do String Operations, Text Mining and Associativity Analysis apart from the usual profiling and
descriptive analysis.
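To make "Associativity Analysis" concrete, here is a minimal sketch that counts which front-end actions occur together and derives support and confidence for each pair. The session data, the action names and the `min_support` threshold are hypothetical, and commercial tools use far more scalable algorithms than this pairwise count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical front-end sessions: each set lists the actions one user performed.
sessions = [
    {"atm_withdrawal", "credit_card_offer"},
    {"atm_withdrawal", "balance_check"},
    {"atm_withdrawal", "credit_card_offer", "balance_check"},
    {"balance_check"},
]


def association_rules(sessions, min_support=0.25):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(sessions)
    item_counts = Counter(item for s in sessions for item in s)
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both items
        if support < min_support:
            continue
        # Confidence of "a -> b" is P(b | a); emit the rule in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = association_rules(sessions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every session that contains a credit card offer also contains an ATM withdrawal, so that rule comes out on top with a confidence of 1.0.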
Factors for deciding Tools for Front End Analytics/Machine Learning
Below is the list of factors that should be considered to zero-in on a tool. We have listed them in the
order of priority.
1. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Machine learning (Text
Mining, String Operations, Associativity Analysis) & Operational Analytics (Alerts, Control
Charts)?
2. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump, and
whether it offers Hadoop connectivity for Big Data analysis?
3. Data/User Limitations (if any): Specific data/user limitations, Query performance with increased
size or complexity, flexibility in data modeling, scalability issues?
4. Ease of Learning: Does it have GUI, How much is it Coding dependent, Availability of Trained
resources, Training materials & Training cost?
5. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC,
Mails, Text, etc.
6. Integration with other tools: How easily/seamlessly can it connect to various other
tools/systems both for output delivery or connecting to multiple data sources through ODBC or
other data pipes (Hadoop connectivity)? Front end delivery systems?
7. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team
needed or Self-Serve, Support availability.
8. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs,
Scalability Costs, and cost of resources
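One way to put the priority ordering above to work is a weighted scorecard. The sketch below is illustrative only: the rank-based weighting scheme, the factor keys, the tool names and the 1-5 ratings are all assumptions, not values taken from this chapter.

```python
# Factors in the priority order listed above (keys are illustrative names).
FACTORS = [
    "type_of_analytics",
    "type_of_data",
    "data_user_limitations",
    "ease_of_learning",
    "output_delivery",
    "integration",
    "operational_efficiency",
    "cost",
]


def priority_weights(factors):
    """Rank-based weights summing to 1; the first factor weighs the most."""
    n = len(factors)
    total = n * (n + 1) / 2
    return {name: (n - rank) / total for rank, name in enumerate(factors)}


def score_tool(ratings, weights):
    """Weighted average of 1-5 ratings; an unrated factor counts as 0."""
    return sum(weight * ratings.get(name, 0) for name, weight in weights.items())


# Hypothetical ratings (1 = poor, 5 = excellent) for two made-up tools.
tool_ratings = {
    "tool_a": dict(zip(FACTORS, [5, 5, 5, 5, 2, 2, 2, 2])),
    "tool_b": dict(zip(FACTORS, [2, 2, 2, 2, 5, 5, 5, 5])),
}

weights = priority_weights(FACTORS)
ranked = sorted(
    ((name, score_tool(r, weights)) for name, r in tool_ratings.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Both hypothetical tools have the same unweighted average rating, but tool_a ranks first because it is strong on the four highest-priority factors.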
A plethora of tools are available in the market for Front end Analytics, hence the need for a structured
decision making process like the one above, so that you end up with the tool satisfying most of your needs.
Table 4 gives a bird’s eye view of how each of the top tools sizes up against the criteria mentioned above.
Now let’s look at each tool’s capabilities in detail,
SPLUNK
Overview
Splunk is the leader in API data Analytics (analysis of API log data) and is used in Operational
Reporting & Analytics.
Splunk is a cloud solution: Customers dump their data into it and use Splunk’s text
processing technology for their analytical/reporting requirements.
Type of Analytics
Splunk text analytics tool is primarily an operational analytics tool but can be leveraged for
Business Analytics, Machine Learning & Reporting also.
Aggregate Analysis (Descriptive, Profiling). This data can be then analyzed in other tools.
Recently some advanced math & statistical analytics capabilities have been added to SQL. Check?
Type of Data it can handle
It typically works with API log data, which records the service calls from the front end.
Data can be structured or unstructured, as text or name-value pairs.
Splunk recently launched Hunk, a Hadoop connectivity tool.
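As a rough illustration of the name-value log data described above, the sketch below parses a few made-up API log lines with a regular expression and aggregates them by service. It uses plain Python rather than Splunk's own query language, and the log format and field names (`service`, `status`, `latency_ms`, `query`) are assumptions; real logs differ from system to system.

```python
import re
from collections import Counter

# Hypothetical API log lines in name=value form.
LOG_LINES = [
    'ts=2013-06-01T10:00:01 service=login status=200 latency_ms=120',
    'ts=2013-06-01T10:00:02 service=search status=500 latency_ms=950',
    'ts=2013-06-01T10:00:03 service=login status=200 latency_ms=80',
    'ts=2013-06-01T10:00:04 service=search status=200 latency_ms=300 query="credit card"',
]

# A name, an equals sign, then either a double-quoted value or a run of non-spaces.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Turn one log line into a dict of field name -> value (quotes stripped)."""
    return {name: value.strip('"') for name, value in PAIR.findall(line)}


records = [parse_line(line) for line in LOG_LINES]

# Simple aggregate (descriptive) analysis: calls and server errors per service.
calls_per_service = Counter(r["service"] for r in records)
errors_per_service = Counter(
    r["service"] for r in records if r["status"].startswith("5")
)

for service, calls in sorted(calls_per_service.items()):
    print(f"{service}: calls={calls}, errors={errors_per_service[service]}")
```

The same parse-then-aggregate pattern is what a scripted, in-house alternative would have to maintain by hand as log formats change.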
Data/User Limitations (if any)
Query speed depends on size of data.
Max size of data on Splunk Cloud is specified by the service contract.
Ease of Learning
Splunk coding typically involves Regular expressions, PERL coding, but it also has a GUI.
It requires 1-3 months hands-on learning to familiarize with all capabilities of Splunk.
Large pool of hands-on and/or trained professionals.
Lots of training materials are also available.
Output Delivery System
Splunk is a cloud based solution, but its reports can also be accessed via Mobile Apps.
Integration with Other tools
TBD
Operational Efficiency
<=1 month for data FTP to be established. Once the data pipes are set-up, reporting/analytics
set up can be ramped up in another month.
One DBA is sufficient for maintaining/monitoring/troubleshooting the system. A warehouse DBA
can double up as Splunk Manager since protocols are similar.
Cost
Pricing is based on data size (amount of data indexed daily): Perpetual License ($5K) + Annual
Maintenance (20%) fees.
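The license-plus-maintenance arithmetic can be sketched as follows. This assumes that maintenance is a flat percentage of the license fee charged every year, including the first, and that the $5K figure applies unchanged; actual contract terms and data-size tiers may differ.

```python
def total_license_cost(license_fee, maintenance_pct, years):
    """Perpetual license paid once plus a yearly maintenance fee.

    Assumes maintenance is a flat percentage of the license fee and is
    charged every year, including the first (an assumption, not a quoted term).
    """
    annual_maintenance = license_fee * maintenance_pct / 100
    return license_fee + annual_maintenance * years


# Figures quoted above: $5K perpetual license, 20% annual maintenance.
for years in (1, 3, 5):
    print(f"{years} year(s): ${total_license_cost(5000, 20, years):,.0f}")
```

Under these assumptions, three years of ownership comes to $5,000 + 3 x $1,000 = $8,000, before any implementation or staffing costs.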
Ideal for what type of users: Operational Analytics or Front-end data profiling needs. Users need some
Regular expressions coding experience to build reports/perform analysis.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling).
Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since Splunk is
very costly. If scale is not a problem and in-house programmers are available, then the same analytics
can be performed using scripting languages like PERL/Python. There are some text analytics tools like
PolyAnalyst which can also double up as an Operational Analytics tool if the FTP can be easily
established. There are other inbuilt tools in other front-end monitoring systems too.
MEGAPUTER (POLYANALYST, TEXTANALYST & X-SELLANALYST)
Overview
Megaputer was founded after the development of ground-breaking machine learning techniques by
Moscow State University and Bauman Technical University in Moscow.
Its flagship product PolyAnalyst (a suite of reporting + text mining solutions) has been
consistently getting rave reviews from peers, users and industry and is now deployed by 8+ US
Federal Agencies, 200 Universities, 20 Fortune 100 Companies and so on.
TextAnalyst and X-SellAnalyst are two niche products developed for specific user groups. The
USP of these products is that they enable non-technical users to perform sophisticated analysis
easily, quickly and at a larger scale.
Type of Analytics
PolyAnalyst is a powerful text mining tool which can also be used for Aggregate (Descriptive,
Profiling), Trend & Correlation Analysis, Advanced Text Mining, Predictive Modeling,
Segmentation, Natural Language Processing and Machine Learning. Its strength is bringing
together analysis of traditional statistically analyzable data with non-traditional unstructured
text data.
TextAnalyst is a dedicated Natural Language Processing tool (based on a linguistic and neural
network model), which is most beneficial for Summarization of huge volumes of text data,
Clustering of Text, etc.
X-SellAnalyst is a cross sell recommendation engine (sold as a COM component) that works in
real time at the Point-Of-Sale. It analyzes historical transactions, profitability, recency and
other metrics.
Type of Data it can handle
PolyAnalyst can connect to both RDBMS warehouses through ODBC drivers and also work with
Unstructured Text data. Integrates with Microsoft Data Transformation Services and similar
software.
TextAnalyst can connect to text repositories on PCs, Web and in libraries, news agencies, etc.
X-SellAnalyst works with any RDBMS warehouse (structured data).
Data/User Limitations (if any)
PolyAnalyst: Depends on hardware configuration. Claims quick processing of gigabytes of data
and that the productivity can be increased by using 64 bit and cluster server architecture.
TextAnalyst:
X-SellAnalyst: Fast response time (<1 sec for 100K products in portfolio). Scales well with large
scale data. Calculation time increases linearly based on number of products already purchased.
Ease of Learning
GUI driven. No coding required. However some training necessary to understand all features
and functionalities available in the tool and how best to leverage them.
Megaputer provides training to help Customer Teams start using the tools to their full
potential. It claims <=2 weeks of training for complete hands-on independence.
Availability and abundance of 3rd-party training materials unknown.
Output Delivery System
PolyAnalyst: Resides on PC. Automated email alerts/logs functionalities. Organization wide
sharing features provided.
TextAnalyst:
X-SellAnalyst integrates with Web/Transaction Server to offer recommendations for Cross sell
on the fly.
Integration with Other tools
TBD
Operational Efficiency
TBD
Cost
TBD
Ideal for what type of Users & Analytics:
PolyAnalyst: Non-coding Data Analysts with sophisticated Text Mining needs.
TextAnalyst: Non-coding users looking for a quick black-box language processing tool. Journal Editors,
Researchers, Scientists, Investment Bankers, Lawyers
X-SellAnalyst: Retailers (Online & Offline) & Call Centers with needs to increase speed/RoI of cross-sales
for a large volume.
Ideal for organization at what stage of Analytics Maturity: Depends on when the organization needs
advanced text mining and on its budget. X-SellAnalyst is a solution aimed at large-scale
problems.
II. Statistical Analytics:
The ability to predict something correctly has always captured the fancy of humankind. Games of odds
can be seen everywhere around us – games, elections, stock markets, etc. We are always surrounded by
decisions where the future is unknown and uncertain, and no one can get every question right all the
time. No one is required to predict the future with 100% accuracy; all we want is someone
with vision and foresight. As sciences and mathematics advanced, scientists came up with
formulae and equations that relate one thing to another in a fairly reliable way. These
principles were formulated into the discipline of “Statistics”, and Economics
proved to be an ardent follower of its rules and laws. With the proven success of Statistics in
Economics, business leaders did not stay behind; they started applying the same discipline in running
business – predicting the odds of something happening, predicting the direction of markets, forecasting
inventory and sales, etc. Thus was born the era of Statistical Business Analytics.
Over time, many tools were developed and used by academicians in schools and universities and by
Statisticians and Analysts in the corporate world, but few could keep up with changes in technologies and
techniques. Some have stayed, grown and matured with the market and its requirements; some have
lagged behind and are lost to history with a golden mention. Some still find application in niche industries,
academia, government, research institutions and trading floors; some were acquired as part of vertical
integration by larger players in other domains; and some have grown into billion-dollar entities. Matlab
falls predominantly in the first group, SPSS in the second and SAS in the third. And finally some challengers
have been born whose meteoric rise is the stuff of legend and which are here to stay and become even more
mainstream – R falls in this bucket. Let’s first look at the factors that decide which tool to use when,
followed by a broader description of each tool.
Factors for deciding Tools for Advanced Analytics
Primary
1. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump, and
whether it offers Hadoop connectivity for Big Data analysis?
2. Ease of Learning: Does it have GUI, How much is it Coding dependent, Availability of Trained
resources, Training materials & Training cost?
3. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Text Mining, Correlation
Analysis (pre-post, A/B), Trend Analysis, Sizing & Estimation, Scenarios, Predictive Analysis, Time
Series Forecasting, Segmentation (Decision Trees and Clustering), Life Cycle analysis
4. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs,
Scalability Costs, and cost of resources
Secondary
5. Integration with other tools: How easily/seamlessly can it connect to various other
tools/systems both for output delivery or connecting to multiple data sources through ODBC or
other data pipes (Hadoop connectivity)?
6. Visualization Options: Ease of understanding and communicating insights through Tables,
Charts, Maps, Heatmaps, etc. with commenting and delivered across all channels
7. Data/User Limitations (if any): Specific data/user limitations, Query performance with increased
size or complexity, flexibility in data modeling, scalability issues?
8. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team
needed or Self-Serve, Support availability.
SAS
Overview
SAS has traditionally been a leader in the Analytics Industry.
SAS creates solutions for a wide variety of analytics across many industries and domains from
Banking to Pharma.
It has capabilities to host an Enterprise Data Warehouse, Business & Advanced Analytics,
Executive Reporting & Regulatory Compliance (e.g. BASEL II) and Analytical Solution Deployment
(e.g. Credit Score based Decision Framework).
Type of Data it can handle
SAS requires traditional table structures (rows and columns of data). SAS can also
host an Enterprise Data Warehouse dedicated to serving Analytical needs effectively and
efficiently.
The SAS DataFlux module extends its capabilities to handle unstructured text data.
It also has plug-ins which can connect it to Hadoop/PIG at the back end.
Ease of Learning
SAS coding requires 1-6 months of training to be able to do Business/Advanced Analytics &
Reporting. However the GUI version of SAS (SAS JMP) which is good for quick analysis requires
<=1 month of hands-on exposure.
Large pool of hands-on and/or trained professionals.
Lots of training materials are also available.
Type of Analytics
SAS works on a “Modules” concept - a module is a dedicated solution set, e.g., the ETS module for
Time Series Forecasting.
The SAS foundation sits on the BASE and STAT modules, which contain data preparation and some
statistical modeling capabilities. These modules also support many widely used statistical
analyses – A/B Testing, Clustering, Correlation and Trend Analysis.
However, for additional features like Decision Trees, Time Series, Text Mining, etc.,
dedicated modules have to be bought separately.
SAS EMiner is the End-to-End tool with a GUI frontend (with functions as drag-&-drop nodes). Sold
at a premium.
Cost
BASE/STAT SAS PC licenses can cost between $8-10K per license. Annual Maintenance $3K
BASE/STAT SAS Server licenses can cost $20-30K. Annual Maintenance TBD?
Significant scaling costs to include additional techniques.
E2E Eminer suite costs a premium TBD
Integration with Other tools
SAS requires a PC (Desktop or Laptop) for querying/analysis.
However SAS outputs can be taken across many platforms through reporting/delivery modules
and/or 3rd-party integrations.
Visualization Options
SAS offers many visualization options with comments on what each output stands for. Further
flexibility provided within coding framework to include editorials.
Data/User Limitations (if any) (Data size/users)
SAS has no limitation per se. Limitations dependent only on Hardware configurations or
Warehouse connections.
Certain plug-ins and modules can handle huge quantities of data (TBs).
Operational Efficiency
<=2 hrs for desktop. Server installation <=2 weeks of IT effort.
Complex installations (advanced server configurations, certain modules esp. EMiner, etc.) need
support from SAS.
Ideal for what type of users: Advanced Users with high-end statistical needs who prefer less complex
coding or a GUI-driven workflow. Typically suited for large enterprises or entities/teams with sufficient
budget to match the scaling costs (even though the BASE/STAT modules answer many needs, additional
modules must be purchased for some specific needs). SAS is best suited for a large-scale end-to-end
analytical framework.
Ideal for what type of analytics: Most types of Analytical needs from basic to advanced statistics.
Ideal for organization at what stage of Analytics Maturity: SAS adoption is driven more by available
budget, since SAS has modules for most statistical needs.
R
Overview
R is quickly becoming a leader in the Analytics Industry.
R was developed as an Open Source alternative and was very popular in the Academia/Research
circles. However with its value being proved there, it quickly gained ground in the corporate
arena as a cost-effective powerful tool.
Type of Data it can handle
R can take data from multiple sources through ODBC connectivity and various libraries. It also
has plug-ins which can connect it to Hadoop at the back end.
Ease of Learning
R is a coding-intensive tool and hence requires 1-12 months of training to be able to do
Business/Advanced Analytics. Recently there have been attempts to bring in GUI.
Given the growing popularity, pool of hands-on and/or trained professionals is growing in
recent years.
Lots of training materials are also available.
Type of Analytics
R works on a “Libraries” concept - these are “function-like” scripts which carry out specific
functionalities, e.g., Logistic Models or Decision Trees.
R has 3000+ libraries of advanced statistical techniques over the entire spectrum from
Aggregated Analytics to Text Mining to Predictive Analysis.
The capabilities of R keep extending, with new libraries being added and in-memory limitations
being overcome in some proprietary solutions. It was also one of the pioneers in bridging Big
Data with advanced analytics needs.
Cost
Revolution R packages: PC License $1,000, Server License >=$25K
R has “Zero Functionality Scaling Cost”- just use the new library to solve a specific problem
instead of buying a new module for every new problem.
Integration with Other tools
R requires a PC (Desktop or Laptop) for querying/analysis.
However R outputs can be taken across many platforms through reporting/delivery integrations.
Visualization Options
R offers many visualization options with comments on what each output stands for. Further
flexibility provided within coding framework to include editorials.
Data/User Limitations (if any) (Data size/users)
R works on in-memory functionalities, hence suffers from RAM limitations.
However some proprietary versions like Revolution R overcome those limitations via massively
parallel processing. TBD?
Operational Efficiency
<=2 hrs for desktop. Complex Server installations need support from vendors.
Ideal for what type of users: Advanced Users with high-end statistical needs who are willing/able to write
complex code. Typically used by start-ups/small organizations with a constrained budget but enough
flexibility in time/resources to spend on training and implementing R.
Ideal for what type of analytics: Most types of Analytical needs from basic to advanced statistics.
Ideal for organization at what stage of Analytics Maturity: R adoption is driven more by budget and
complexity of needs. The biggest adoption of R is in Academia/Research institutions with needs that can’t be
addressed by other commercially available solutions.
KNOWLEDGE SEEKER
Overview
Knowledge Seeker (KS) is famous among non-tech users primarily because it offers an intuitive,
easy to learn/execute GUI for advanced statistical techniques.
KS tools are used in broad range of domains from BASEL to Fraud protection to Loyalty
programs.
Type of Data it can handle
KS requires traditional table structures (rows and columns of data).
It’s currently missing plug-ins to Hadoop/PIG.
Ease of Learning
KS GUI requires <=1 month of training on KS/Strategy Builder.
Large pool of hands-on and/or trained professionals.
Lots of training materials are also available.
Type of Analytics
Even though KS has a broad set of statistical capabilities, it’s especially regarded for Decision
Trees and Strategy Builder functionality.
It offers a decent, cost-effective end-to-end framework (analysis to scenarios) which is sufficient
for most non-tech users.
Its primary limitations are scale, automation and advanced user needs (macros, loops, advanced
statistical techniques).
Cost
Individual PC license -TBD
Knowledge Seeker
Knowledge Studio
Strategy Builder
Server license -TBD
Knowledge Seeker
Knowledge Studio
Strategy Builder
Integration with Other tools
KS requires a PC (Desktop or Laptop) for querying/analysis.
It offers “In-Database Analytics mode” to perform data mining directly within databases
(Teradata, SQL Server, ORACLE and Netezza).
Visualization Options
KS offers many visualization options with comments on what each output stands for. Further
flexibility provided within coding framework to include editorials.
Data/User Limitations (if any) (Data size/users)
TBD?
Operational Efficiency
<=1 hr for desktop. Complex Server installations need support from vendors.
Ideal for what type of users: GUI users with needs for Advanced Statistical Techniques. Marketing
Professionals and Product Managers (in the Financial Services Domain) typically favor this not only for
Statistical Modeling but also for Strategy Builder, which offers excellent Scenario Analysis
capabilities.
Ideal for what type of analytics: Decision Trees, Scenario Building.
Ideal for organization at what stage of Analytics Maturity: KS adoption is driven primarily by how
comfortable users are with coding. KS and Strategy Builder together may cost >$5K, so adoption is also
dictated by budget.
 Other Worthy Mentions
Given the broad spectrum of data consumption from Reporting to Business Analytics to various types of
Advanced Analytics with flavors of Big data integrations, type of analyzable data (Video, Social
comments, Location, etc.), platforms analyzed (Web, Mobile, Tablets and now Google Glass), focus on
functions (Sales, Dev, PM), industry (High Tech, Banking, Pharma, etc.) no one list of Tools can do justice
to all the tools available in the market. Ours was a humble attempt to bring to you a list of strong
contenders which are instrumental in driving analytics in many areas.
In this section of the chapter, we list a few noteworthy tools that didn’t appear in the list above,
but which are leaders in their own right and/or are expected to become a force in the near future.
Google Analytics
Google Analytics is the default Analytics choice for many Small and Medium Enterprises (<=10 Million
hits per month and <=50 rows of data in reports), since it offers a broad suite of Reporting/Analytics
solutions for free. It’s quick and easy to set up and helpful in defining & monitoring KPIs. Data refresh
happens every 24 hours. Reports are best suited for KPI tracking, Advertising, Multi-Channel, Social,
Mobile & Video tracking. It can also be leveraged for Aggregate Analytics (Descriptive Analysis &
Profiling). However, the biggest limitation is Enterprise Scalability: even the Premium Version can support a
max of 1 Billion Hits per month. Also, Google Analytics allows App creation on the data but doesn’t
support data transfer via FTP yet. All said and done, Google Analytics is among the best RoI tool
investments for individual developers and SMEs.
RapidMiner
Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and
text mining. The company concentrates on automatic intelligent analyses on a large-scale base, i.e. for
large amounts of structured data like database systems and unstructured data like texts. The open-
source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data
mining and business intelligence. The discovery and leverage of unused business intelligence from
existing data enables better informed decisions and allows for process optimization.
 RapidMiner
The main product of Rapid-I, the data analysis solution RapidMiner, is the world-leading open-
source system for knowledge discovery and data mining. It is available as a stand-alone
application for data analysis and as a data mining engine which can be integrated into one’s own
products.
 RapidNet
Relation and Net explorer – identifies interrelationships in the data, defines KPIs at nodes and
intersperses geo relationships on Maps.
 RapidSentilyzer
RapidSentilyzer provides all relevant customer and market information in a single real-time
system. It combines efficient crawling techniques with the power of data and text mining and
automatically categorizes the latest news according to sentiments and opinions. The
RapidSentilyzer BuzzBoard can easily be inspected and gives all necessary information in a
central place. This is what competitive intelligence and customer intelligence should look like.
 RapidDoc
Automated Document classification engine offered over web.
IBM Analytics
IBM carried forward its warehousing expertise into the new “Analytics Era” through the acquisition of
industry stalwarts like Cognos for Reporting/Business Analytics and SPSS for Advanced Analytics
capabilities. With them, IBM now has a comprehensive, unified portfolio of business analytics software
(Cognos, SPSS, OpenPages and Algorithmics) with capabilities from Data Storage to Processing to
Reporting to Business and Advanced Analytics and even Analytics Delivery Management. Based on open
standards, IBM business analytics products can be used independently, in combination with each other,
and as part of broader solutions to key business challenges.
 IBM SPSS products
IBM SPSS predictive analytics software facilitates statistical analysis, data and text mining,
predictive modeling and decision optimization to anticipate change and take action to improve
outcomes.
 IBM Cognos products
IBM Cognos business intelligence and performance management software provides the
integrated dashboards, scorecards, reporting, analysis, and planning and budgeting capabilities
to gain and act on fact-based insights.
 IBM OpenPages products
OpenPages GRC software allows organizations to manage enterprise operational risk and
compliance initiatives using a single, integrated solution.
 IBM Algorithmics products
Algorithmics software helps businesses gain transparency into financial risks in advance,
providing information that is vital to organizations.
SAP Analytics
SAP is a world leader in Enterprise software applications. It has now forayed into the advanced data
insights world with the acquisition of Business Objects and the introduction of the HANA product suite.
 SAP Business Objects Products
SAP Business Objects suite contains solutions from BI platform management to OLAP
capabilities to Reporting solutions (customizable for various types of delivery – Lumira, Crystal
Report and ESRI integrations). Lumira helps in delivering self-service reports on cloud. Crystal
Reports assists in integrating reports within Business Applications and Processes. ESRI
integration is for geo-spatial reporting.
 SAP Predictive Analytics & HANA
The SAP Predictive Analytics solution offers an intuitive framework for building complex Analytical
models. It can work with an existing data environment as well as with the SAP BusinessObjects BI
Platform to help mine and analyze data.
 SAP HANA
HANA is a new in-memory platform offered by SAP to substantially speed up Analytics/Reporting
solutions.
ORACLE Analytics
ORACLE extended its leadership in Data Storage solutions to Business Analytics with the acquisition of
Hyperion Essbase and the launch of its Advanced Analytics solution kit.
 Oracle Hyperion Enterprise Performance Management combines market-leading performance
management applications with powerful analytics to align financial close, planning, reporting,
analysis, and modeling and unlock business potential. It helps customers leverage their ERP
investments through seamless data and process integration with Oracle E-Business Suite,
PeopleSoft, JD Edwards, Fusion, SAP and other ERP applications. Flexible deployment options
include on-premise, cloud, or on engineered systems designed for high performance and
scalability.
Oracle Hyperion Enterprise Performance Management delivers a comprehensive, integrated
suite of applications featuring common Web and Microsoft Office interfaces, reporting tools,
mobile information delivery, and administration. Best-in-class, in-memory analytics software
and hardware (optimized to work together) combines planning at the speed of business with
unique and powerful strategic and predictive modeling capabilities that improve analytic insight.
Best suited for Strategy Management, Planning, Budgeting and Forecasting, Financial Close and
Reporting and Profitability and Cost Management.
 Oracle Business Intelligence Enterprise Edition
Delivers a robust set of reporting, ad-hoc query and analysis, OLAP, dashboard, and scorecard
functionality with a rich end-user experience that includes visualization, collaboration & alerts.
Makes corporate data easier for business users to access. Provides a common infrastructure for
producing and delivering enterprise reports, scorecards, dashboards, ad-hoc analysis, and OLAP
analysis. Includes rich visualization, interactive dashboards, a vast range of animated charting
options, OLAP-style interactions and innovative search, and actionable collaboration capabilities
to increase user adoption. Reduces cost with a proven Web-based service-oriented architecture
that integrates with existing IT infrastructure.
It also has Mobile BI, Real Time Decision Management and Big Data Solutions.
 Analytic Applications
ORACLE offers a pre-configured suite of Analytics solutions for various business roles, product
lines and industries.
Market Share Research
Gartner publishes an annual performance report on business intelligence (BI), corporate performance
management (CPM) and analytics applications/performance management software. Revenue totaled
$13.1 billion in 2012, a 6.8 percent increase from 2011 revenue of $12.3 billion, according to Gartner,
Inc. Tough macro conditions and confusion related to emerging technology terms led to more muted
market growth than in previous years.
Source: Gartner Research http://www.gartner.com/newsroom/id/2507915
Table 5: Top 5 BI, CPM and Analytic Applications/Performance Management Vendors,
Worldwide, 2011-2012 (Millions of Dollars)
Company      2012 Revenue    2012 Market Share (%)    2011 Revenue
SAP          2,902.5         22.1                     2,884.0
Oracle       1,952.1         14.9                     1,913.5
IBM          1,625.6         12.4                     1,478.8
SAS          1,599.7         12.2                     1,542.9
Microsoft    1,189.3         9.1                      1,059.9
Others       3,861.9         29.3                     3,416.0
Total        13,131.1        100.0                    12,295.1
Note: SAP reports in Euros, and faced currency head wind that hampered growth in USD.
Source: Gartner (June 2013)
While all five of the top five BI software vendors retained their top five status, IBM and SAS exchanged
places to move IBM into third position and SAS into fourth (see Table 5). IBM grew 9.9 percent in 2012,
with revenue of $1.6 billion. The top five vendors together accounted for 70 percent of the total BI
software market revenue.
In first place, SAP once again had significantly higher revenue than any other vendor at $2.9 billion with
22.1 percent of the market, although this was up by just 0.6 percent from 2011. Second-place Oracle's
revenue grew by 2.0 percent from 2011 to reach $1.9 billion. Fifth-place Microsoft enjoyed the highest
growth of the top five vendors in 2012, with revenue rising by 12.2 percent compared with 2011, to
reach $1.2 billion.
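The growth and market share percentages quoted above follow directly from the revenue columns of Table 5. The short Python sketch below reproduces that arithmetic; the revenue figures are copied from the table, while the names REVENUE, TOTAL_2012, TOTAL_2011, growth_pct and share_pct are illustrative choices for this example and are not part of the Gartner report.

```python
# Revenue in millions of USD, copied from Table 5: (2012, 2011).
REVENUE = {
    "SAP": (2902.5, 2884.0),
    "Oracle": (1952.1, 1913.5),
    "IBM": (1625.6, 1478.8),
    "SAS": (1599.7, 1542.9),
    "Microsoft": (1189.3, 1059.9),
}

# Market totals from the "Total" row of Table 5.
TOTAL_2012 = 13131.1
TOTAL_2011 = 12295.1


def growth_pct(rev_2012, rev_2011):
    """Year-over-year revenue growth, in percent."""
    return (rev_2012 - rev_2011) / rev_2011 * 100


def share_pct(rev_2012, total=TOTAL_2012):
    """Share of the total 2012 market revenue, in percent."""
    return rev_2012 / total * 100


if __name__ == "__main__":
    for vendor, (rev_2012, rev_2011) in REVENUE.items():
        print(
            f"{vendor:<10} growth {growth_pct(rev_2012, rev_2011):5.1f}%  "
            f"share {share_pct(rev_2012):5.1f}%"
        )
    print(f"Market     growth {growth_pct(TOTAL_2012, TOTAL_2011):5.1f}%")
```

Running the script prints one line per vendor, so each figure in the text can be checked against the table, for example Microsoft's 12.2 percent growth or SAP's 22.1 percent share.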
Chapter Summary
This chapter attempts to impart an intuitive sense of how data moves through an organization: from the
front-end systems to the back-end analytical engines and back to consumers as services, e.g.,
personalized offers, information, or better customer service. Decision makers consume data in various
ways, as reports on the condition of their portfolio or as key insights and recommendations from
analysts. A plethora of tools is available in the market to facilitate efficient and effective insight
generation, so readers are advised to examine candidates through the lens of the factors suggested
above when deciding which tool will best serve their needs. This chapter is just a small door into the
larger universe of ever-evolving tools available for specific functions, and readers are encouraged to
do their own research before deciding on one.
 Pending content:
 Flowchart of decision making

An overview of popular analytics toolkits

  • 1. Analytics Tools 1 Chapter Synopsis Wars are won by Armies and Strategies but fought with Weapons --Anonymous This chapter focuses on the tools available in the market to carry out different types of analytics. In the beginning we give you a quick look on the typical data flow in an organization, from the time a customer interacts with the business system and generates activity data, through the various stages of data preparation and how it finally lands with a Business User as Insights/Recommendations. This is followed by a quick breakup of type of analytics done at various life stages of the data, e.g., frontend analytics to Upsell solutions to Customers. Then we give you a quick overview of the various factors that shape the decision on which analytics tool to deploy and then give you brief summary of top tools for each type of analytical needs. Finally we wrap up this chapter with mention of other top tools available in the market, which you might want to explore for your needs. Structure of the Chapter: As mentioned above the chapter is broken down into four major categories,  Quick introduction into the typical data flow in an organization  Type of Analytics and the top toolkits under each  Factors to decide toolkits for each type of analytics  Brief overview of Top Tools  Detailed description of Top Tools  Other worthy mentions  Quick introduction into the typical data flow in an organization Figure 1 illustrates at high level the typical data flow in an organization. As shown in the figure, the first Presentation Tier is where a Customer interacts with the Business and generates data. The data could be of various types – Customer data, Transactional data, Web/Mobile Activity data, etc. To illustrate better let’s take an example- a Customer John walks into a bank to open a Checking account. He provides all details required by the bank to open the account. 
When the executive enters his information into the system, the warehouse takes in the data (creating a row) and assigns John an Identification Number. When John walks to the Teller and deposits money into his Account, the corresponding field in the warehouse is updated. Later, when John logs onto his online account to check balances, it generates web Activity and his row is updated. If John transfers money to someone, the transaction is checked against his balance and the balance is then updated. Now let’s move on to what happens in the back end: the data is stored in the warehouse, and whenever John interacts with the system, the front-end system interacts with the warehouse through an intermediate Logic Tier and serves John. The Logic Tier stores all the logic required to perform the business operations: commands, mathematical calculations, analytical decision-making structures, etc. It is responsible for moving the data between the front end and the back end and ensuring that all of John’s requests are served correctly.
  • 3. Analytics Tools 3 The Data Tier is the layer where the data operations happen. The Logic Tier works directly with the front-end tables, which store data for serving business queries, e.g., • Customer John’s Snapshot data: current account balance, risk profile, statement summary, etc. • Location profile data: nearest ATMs, Branches, Merchants offering discounts, etc. • Recommendations: use his credit card for a discount on a weekend movie • Upsell or Cross-sell: apply for a Mortgage Other front-end tables record the transactions/activities, e.g., • John used an ATM for a $500 withdrawal • statement printout • logged on to the site/app and reached Customer service. To run effectively, Businesses often need a complete view of their Customers, for which they source 3rd party data, e.g., Credit Bureau, Nielsen’s Ratings, Macroeconomic data, etc. Given that most of the data generated by the front end and/or received from 3rd party systems is unstructured/unorganized, it needs to be processed, cleaned and combined logically for eventual storage and use in analysis or in serving business requests. These operations, called ETL (Extraction, Transformation and Loading), are done at regular intervals depending on Business requirements. Post ETL, the structured data flows into various tables in the Enterprise Data Warehouse (EDW). The EDW might have specific tables for specific types of information, e.g., • Customer table: demographics, snapshot of activity, risk & marketing profile. • Transaction tables: transactional information like Amount, Number of transactions, Type of Product purchased, etc. Business Users (Product Managers, Marketers, Sales Professionals, etc.) rely on some standard metrics for running their day-to-day operations. They need to see these daily or at regular intervals to understand what’s going on in their business and whether it needs more attention. 
Given the repetitive nature and standardization of these requirements, it makes sense to create a structure where this information is captured in the required format, constantly refreshed and made available on multiple channels (Email, Cloud or App); this is called “Reporting”. Running these reports again and again on the granular tables discussed above would be inefficient and slow, so Business Intelligence professionals typically pre-aggregate the data in a standard structure to serve the various reporting requests. These pre-aggregations are called “OLAP (On-line Analytical Processing)” roll-ups or cubes. The reports are then built off these cubes and so are efficient and quick. Analysts are typically interested in finding out what happened, why, where and when, how good or bad it is, etc., and they do this by looking at various metrics and KPIs of the business. They might leverage the reports or cubes, or might hit the database directly to get answers to their questions. Their analysis might consist of charting, tabulations, simple/advanced math or statistical techniques. We will look at the various types of analytical techniques in detail in the following sections.
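To make the idea of pre-aggregated roll-ups concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample rows, the dimension names (branch, product, month) and the helper functions build_cube and query are all hypothetical, and a real OLAP engine is far more elaborate. The point it shows is that totals are computed once for every combination of dimensions, so a report request becomes a lookup rather than a scan of the granular table.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical granular transaction rows (illustrative data, not from the chapter).
transactions = [
    {"branch": "North", "product": "Checking", "month": "2014-01", "amount": 500.0},
    {"branch": "North", "product": "Savings", "month": "2014-01", "amount": 200.0},
    {"branch": "South", "product": "Checking", "month": "2014-01", "amount": 300.0},
    {"branch": "North", "product": "Checking", "month": "2014-02", "amount": 100.0},
]

DIMENSIONS = ("branch", "product", "month")
ALL = "*"  # marker meaning "rolled up over this dimension"


def build_cube(rows, dimensions=DIMENSIONS, measure="amount"):
    """Pre-aggregate rows for every subset of dimensions (a simple roll-up cube)."""
    cube = defaultdict(lambda: {"total": 0.0, "count": 0})
    for row in rows:
        for k in range(len(dimensions) + 1):
            for kept in combinations(dimensions, k):
                # Keep the value for retained dimensions, roll up the rest.
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cell = cube[key]
                cell["total"] += row[measure]
                cell["count"] += 1
    return dict(cube)


def query(cube, **filters):
    """Answer a report request from the cube without touching the granular rows."""
    key = tuple(filters.get(d, ALL) for d in DIMENSIONS)
    return cube.get(key, {"total": 0.0, "count": 0})


cube = build_cube(transactions)
print(query(cube, branch="North"))
print(query(cube, product="Checking", month="2014-01"))
print(query(cube))
```

The trade-off is the usual one for cubes: the number of pre-computed cells grows quickly with the number of dimensions, which is why the set of roll-ups is normally limited to what the standard reports need.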
  • 4. Analytics Tools 4 • Types of Analytics and the top toolkits under each Table 1 summarizes four broad types of Analytics, why they are done and the top tools used when carrying out that type of Analytics. A. Data Collection, ETL & Storage: Whenever a customer interacts with the business system, data is generated which has to be captured efficiently and accurately and stored in the system to meet customer service, business operations and regulatory requirements. Given the ever-dynamic nature of businesses today, data collection, storage & retrieval technologies have proliferated, each with its own merits and limitations. Many of them are best for a specific set of needs but might not be as useful in other circumstances. Data Storage has matured considerably from the early days, when data was simply stored as a dump of information, which then gave way to relational data structures (RDBMS), which was followed by
  • 5. Analytics Tools 5 parallel processing and now by an amalgamation of all these broad technologies. Requirements vary widely, from fast & accurate delivery of structured business requirements, to efficiency of scale at the back end, to handling swathes of unstructured data from social media/videos/surveys, so no one tool can help run the business end to end. Going into the detail of these technologies would require a dedicated book by itself, but let’s attempt to summarize them at a very top level: • Front End Tables (OLTP), e.g., Oracle, DB2 Front End Tables or OLTP (Online Transaction Processing) tables are best suited to running client-facing businesses. Their biggest strengths are speed, accuracy and low failure rates. • Large scale Historical Storage, e.g., Teradata, SQL Server These systems are the repository of all the data generated and store the information from the front end & other internal (Clickstream, Survey systems, Testing Infrastructure) and 3rd party sources. Data from the various sources undergoes ETL (Extraction, Transformation & Loading) processing, is combined in logical sequences and is fed to these systems. These tools are characterized by efficient processing and retrieval of huge data sizes (typically massively parallel processing). They also need to be easily integrated with reporting/analytics platforms. • Unstructured Data, e.g., Hadoop Over time, visionaries realized the need for systems that can capture non-traditional data (videos, comments), which was going to be generated in quantities unforeseen in their times. They started developing technologies that capture such data without putting any restrictions on its structure, while retaining the flexibility to define the structure at the time of retrieval (reporting/analysis). 
This strength is also its Achilles heel: no structure means slow retrieval. But with Web 2.0 the time of such technologies has truly arrived, and the rapid development of reporting/analytical tools based on these platforms, or at least of connectivity between them and existing tools, points to a promising & mainstream future for big data. B. Reporting: Reporting tools are primarily visualization (tables, charts, maps, etc.) tools and are used by Business users/Executives to make sense of the data and to monitor & understand the dynamics (using KPIs) in their portfolio on the fly. Analysts too leverage the reports for similar purposes; however, they are more interested in the data available in the reports, to understand what drives the movements in KPIs. Analysts also leverage reporting tools to understand the enterprise-wide standard logic for creating KPIs, which they can reuse in their analysis. Reporting tools are usually judged on the “30-60 rule”. The “30-60” rule says that the broad story should be conveyed in the first 30 secs of viewing and that the tool should provide the capability to do a one-level drill-down to get a directional sense of the story.
  • 6. Analytics Tools 6 Reporting tools might need to deal with various kinds of data, • Instrumentation Data: record of activity on the live business site, captured via instrumentation • Call Log Data: dump of server calls from the live business site and what was delivered • Transactional Data • Active Customer Data • Customer Feedback Data (social discussions, Survey data, etc.) Some reporting tools also need to incorporate budgets, forecasts, competitors & benchmarks for Users to best understand where they stand. Given the importance of Reporting in running a business, in regulatory compliance and the like, many enterprises create dedicated “Reporting” products for specific needs/industries/domains, e.g., SAS CRMS, which is the SAS Basel II compliance module. Factors for deciding Tools for Reporting Specific needs from a Reporting tool: a wide variety of visualizations; the ability to access reports from a wide variety of channels (emails, texts, alerts, website, Apps); speed of report refresh; and the ability to consolidate data from a wide variety of data sources (and now Big Data too). Below is the list of factors that should be considered to zero in on a tool, in order of priority. 1. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC, Mails, Text, Alerts, Tweets, Social Shares, etc. 2. Integration with other tools: How easily/seamlessly can it connect to various other tools/systems, both for output delivery and for connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)? 3. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump & whether it has Hadoop Connectivity for Big Data analysis? 4. Data/User Limitations (if any): Specific data/user limitations, Query performance with increasing size or complexity, flexibility in data modeling, scalability issues? 5. 
Ease of Learning: Does it have a GUI, how coding-dependent is it, availability of trained resources, training materials & training cost? 6. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs, Scalability Costs, and cost of resources 7. Operational Efficiency: How easy/quick/cheap is it to implement? Is a dedicated management team needed or is it Self-Serve? Support availability. 8. Editorial & Tagging Capabilities: Enabling users to check the backend logic for debugging/single source of truth. 9. Visualization Options: Tables, Charts, Maps, Heatmaps, etc., which can be dynamic (slicing/dicing enabled) and visible across all channels 10. Types of Aggregations possible: OLAP Cubes, Simple/Advanced Math, Statistical techniques, etc. A plethora of tools is available in the market for Reporting, hence the need for a structured decision-making process like the one above, so that you end up with the tool that satisfies most of your needs.
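One way to turn the prioritized list of factors into a repeatable comparison is a simple weighted scoring matrix. The Python sketch below is an assumption-laden illustration, not a method prescribed by this chapter: the rank-based weighting scheme, the 1-5 rating scale, the helper names and the two made-up candidates ("Tool A", "Tool B") with their ratings are all hypothetical.

```python
# Factors from the Reporting list, in the chapter's order of priority.
FACTORS = [
    "Output Delivery System",
    "Integration with other tools",
    "Type of data it can handle",
    "Data/User Limitations",
    "Ease of Learning",
    "Cost",
    "Operational Efficiency",
    "Editorial & Tagging Capabilities",
    "Visualization Options",
    "Types of Aggregations possible",
]


def rank_weights(factors):
    """Assumed scheme: weight n for the top factor down to 1, normalized to sum to 1."""
    n = len(factors)
    raw = {f: n - i for i, f in enumerate(factors)}
    total = sum(raw.values())
    return {f: w / total for f, w in raw.items()}


def weighted_score(scores, weights):
    """Weighted average of 1-5 ratings; a missing factor counts as the lowest rating."""
    return sum(weights[f] * scores.get(f, 1) for f in weights)


# Hypothetical ratings (1 = poor, 5 = excellent) for two made-up tools.
candidates = {
    "Tool A": {**{f: 4 for f in FACTORS}, "Cost": 1, "Operational Efficiency": 2},
    "Tool B": {**{f: 3 for f in FACTORS}, "Cost": 5, "Ease of Learning": 5},
}

weights = rank_weights(FACTORS)
ranking = sorted(
    candidates,
    key=lambda tool: weighted_score(candidates[tool], weights),
    reverse=True,
)
for tool in ranking:
    print(f"{tool}: {weighted_score(candidates[tool], weights):.2f}")
```

Because the weights follow the priority order, a tool that is cheap and easy to learn can still rank below one that is stronger on the higher-priority factors; changing the order of FACTORS changes the outcome, which is exactly the judgment the list is meant to capture.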
  • 7. Analytics Tools 7 Table 2 gives a bird’s eye view of how each of the top tools sizes up against the criteria mentioned above. ADOBE MARKETING CLOUD (OMNITURE SITECATALYST & AD HOC ANALYSIS) Overview Adobe Marketing Cloud (AMC), the erstwhile Omniture Web Reporting/Analytics suite, is the leader in Web Analytics (Analysis of Clickstream data). Mobile reporting/analytics capabilities are being ramped up.
  • 8. Analytics Tools 8 AMC “instruments” actions on Web Pages, buttons, callouts in emails, etc., which it then tracks in its warehouse on the Cloud, and provides front ends on top of this data (SiteCatalyst for Reporting & Ad Hoc Analysis 3.2 for Slicing-and-Dicing Analytics). AMC provides real-time data for a select subset of ~100+ metrics and is slowly ramping up capabilities to make all reporting real-time. Adobe provides multiple solutions for e-businesses to track the UX of website visitors, online campaign effectiveness, Social Media Activity, SEO and SEM, and to report on Product performance. Output Delivery System SiteCatalyst & Ad Hoc Analysis (erstwhile Adobe Discover) are cloud solutions which can also be accessed on Mobile via Apps. Integration with Other Tools Limited data import (Excel, CSV, TXT) functionality. Reports can be exported in Excel/PDF. AMC does provide a data dump via FTP, which can then be utilized for additional analysis. Type of Data it can handle It typically works with Clickstream Data instrumented on Websites, Apps or Emails. Recent efforts aim to expand into Mobile Web/Apps. Data/User Limitations (if any) Data/user limitations depend on the service contract. Speed performance remains fairly stable with increasing size/users; FTP speed, however, varies with many factors. Ease of Learning Both SiteCatalyst and Ad Hoc Analysis are GUI based and require <=1 month of training on Business Analytics & Reporting. There is a large pool of hands-on and/or trained professionals, and a lot of training material is also available. Cost Cloud License: CPM $0.01 to $1. Per month or Annual? To check if Ad Hoc Analysis inclusive cost? Operational Efficiency 6-12 months initial implementation. A significant effort should go into planning, esp. on what metrics to implement, where, and the naming conventions, since the cost of errors is significantly
  • 9. Analytics Tools 9 higher. Given the amount of effort required in implementation (Omniture expert+Dev+QA), if something goes wrong, changes typically take long and are costly to make. AMC requires dedicated trained professionals to manage the system. Editorial & Tagging Capabilities Editorial & Tagging Capabilities within SiteCatalyst/Ad Hoc Analysis are not sufficient. Most professionals maintain documentation outside of the system (MS-OFFICE etc.) Visualization Options SiteCatalyst and AMC provide standard visualization options: Tables, Charts, Click Maps, Funnels, etc. Types of Aggregations possible Profiling, not many advanced math functionalities. Ideal for what type of users: Business Users (Product Managers, Marketers), Developers and Analysts. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling). Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since AMC is very costly. For start-ups/organizations on a budget, Google Analytics is a cost-effective option. MICROSTRATEGY Overview Microstrategy is a leading reporting solution and has seen widespread acceptance among Large Enterprise Users. Microstrategy integrates with the warehouse and/or other secondary sources (typically after ETL). Microstrategy has recently expanded its Big Data connectivity and Advanced Analytics capabilities. Output Delivery System Microstrategy offers both on-premise and Cloud delivery solutions which can also be accessed on Mobile via Apps. Integration with Other Tools Microstrategy has among the widest range of integrations possible, from Warehouses to Hadoop to ODBC to XML export/import. Microstrategy cubes reside on the warehouse and so can be leveraged by other systems directly from there too. Type of Data it can handle
  • 10. Analytics Tools 10 Works with structured data. Hadoop plug-in available. Data/User Limitations (if any) Depends on the Service contract in the case of per-user pricing. If requirements are significant, Customers buy a dedicated on-premise Microstrategy server. Ease of Learning Reports/drilldown capabilities are GUI based. However, coding in the Microstrategy scripting language/SQL is required for report creation. There is a large pool of hands-on and/or trained professionals, and a lot of training material is also available. Cost Report User Pricing: $500-1K per Report receiver. Per month or Annual? Dedicated Server Pricing: >=$25K. Per month or Annual? Operational Efficiency 6-12 months initial implementation, since Microstrategy experts (Programmers, Architects) are required for setting up the reporting framework. A dedicated team is required to manage the Microstrategy reporting framework. Editorial & Tagging Capabilities Editorial & Tagging Capabilities within Microstrategy are pretty intuitive. Users can click on the “Report Details Page” and figure out the underlying logic behind the reports & metrics. Microstrategy recommends both technical (SQL logic) and non-technical (plain English) commentary. Visualization Options Among the widest ranges of visualizations provided: tables, charts, maps, heatmaps and word clouds, which can be dynamically linked to the back-end data. Types of Aggregations possible Profiling, simple & advanced math and statistical capabilities. Ideal for what type of users: Business Users (Product Managers, Marketers) and Analysts. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Trend Analysis and Correlation Analysis. Even though Sizing & Estimation are possible, they are not very easy to execute. Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since Microstrategy is costly. 
For start-ups/organizations with a limited scale, other cost-effective reporting options are available, such as warehouse packages, Tableau or an Excel VBA reporting suite.
  • 11. Analytics Tools 11 TABLEAU Overview Tableau is fast gaining ground among business and non-tech analytical users on account of its powerful simplicity. It takes data from the warehouse and/or other secondary sources (typically after ETL). Data Import/Export, Analysis, Presentation (Tables/Graphs), Automated Reporting and Scenarios can all be done intuitively, quickly and seamlessly, and users can transition between them with ease. Tableau has been incorporating some statistical capabilities, like simple predictive modeling, in recent versions. Output Delivery System Tableau reports need to be created on a PC, but can be hosted on the Cloud using Tableau Server. Hosted Reports retain the OLAP structure of the tables in the back end to facilitate on-the-fly slicing & dicing by the report consumers. Tableau is now also on the Cloud, and the outputs can be accessed using Apps. Integration with Other Tools Tableau has among the widest range of integrations possible, from Warehouses to Hadoop to ODBC to XML exports/imports. Type of Data it can handle Works with structured data. Hadoop plug-in available. Data/User Limitations (if any) Depends on Hardware Configuration. Ease of Learning GUI based. It requires 1-2 weeks to be able to leverage most of the features of Tableau. There is a large pool of hands-on and/or trained professionals, and a lot of training material is also available. Cost Individual PC Licenses cost between $1-2K. Annual Maintenance of $400. Server Licenses cost $1K per report receiver. Annual Maintenance of $200. Operational Efficiency
  • 12. Analytics Tools 12 The Desktop framework takes minutes to install and use. The first installation of Tableau Server needs some coordination effort between in-house DBAs and the Tableau Support team. Timelines depend on the complexity of the problem but rarely exceed a week. Once the initial set-up is completed, no major help is needed for ongoing needs/changes. Editorial & Tagging Capabilities Tableau provides many options for editorials: Title, Summary and sheet descriptions for the reports and dashboards. Given the nature of report creation, the types of Aggregation can be checked visually. The “Describe” option gives more detail on the exact operation being done for each Metric. Visualization Options Among the widest ranges of visualizations provided: tables, charts, maps, heatmaps and word clouds, which can be dynamically linked to the back-end data. Types of Aggregations possible Profiling, simple & advanced math and some simple statistical capabilities. Ideal for what type of users: Business Users (Product Managers, Marketers) and Analysts. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Trend Analysis, Correlation Analysis and Sizing & Estimation. Tableau is the best tool for Sizing & Estimation and Scenario Analysis. Ideal for organization at what stage of Analytics Maturity: Tableau is useful for all types of users. However, it suffers from a lack of advanced analytics capabilities. Overview Flurry is a leader in Mobile App Reporting. Over 100,000 companies use Flurry Analytics in more than 300,000 applications for Reporting, Marketing Attribution and Operational Analytics. Like Omniture, Flurry “instruments” actions on the front end & campaign outreach channels for native Apps by integrating an SDK into the App libraries. This data is then tracked in Flurry’s warehouse on the cloud, and reporting happens on this data. Flurry also has other tools - Output Delivery System Flurry is a cloud solution. 
Integration with Other Tools Flurry offers capabilities to download the metrics to CSV on which additional analysis can be performed. FLURRY
  • 13. Analytics Tools 13 Type of Data it can handle Flurry works directly on Activity data from the Apps. Data/User Limitations (if any) Flurry doesn’t impose restrictions on data size. However, a Business version also exists, which extends capabilities to xyz. Ease of Learning Flurry is a GUI-based solution. It requires 1-2 weeks to be able to leverage most of the features of Flurry. There is a large pool of hands-on and/or trained professionals, and a lot of training material is also available. Cost Basic version is free. Check Business Version Operational Efficiency <=30 minutes for basic integration: a small piece of SDK code needs to be added to the App libraries and it starts tracking the standard metrics. Some custom events can also be defined in the App. Once the initial set-up is completed, no major help is needed for ongoing needs/changes. Editorial & Tagging Capabilities Metrics are standard and fixed on Flurry reports. However, some custom events can be defined and tracked, and their definitions can also be tracked. Documentation on the reports is available within Flurry. Visualization Options Standard visualization options: tables, charts, funnels. Types of Aggregations possible Profiling, simple math. Ideal for what type of users: Business Users (Product Managers, Marketers), Operational Analysts, Developers and Analysts. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling) Ideal for organization at what stage of Analytics Maturity: Flurry is of great help to Start-ups, individual developers and small-scale organizations. Given that Flurry supports a smaller range of reporting/analytics, it is not ideal for mature organizations or large-scale enterprises. B. Business Analytics:
  • 14. Analytics Tools 14 Business Analytics is one step further in the analytics food chain. Business Analysts are entrusted with the responsibility of making sense of the data deluge: finding hidden patterns, explaining fluctuations (up or down), sizing opportunities and making high-level projections. They play a critical role in enterprise decision making. They leverage reports or might query the data sources directly to answer the various business questions. Factors for deciding Tools for Business Analytics Below is the list of factors that should be considered to zero in on a tool. We have listed them in order of priority. Primary 1. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump & whether it has Hadoop Connectivity for Big Data analysis? 2. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Correlation Analysis (pre-post, A/B), Trend Analysis, Sizing & Estimation, Scenarios 3. Visualization Options: Tables, Charts, Maps, Heatmaps, etc., which can be dynamic (slicing/dicing enabled) and visible across all channels 4. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs, Scalability Costs, and cost of resources Secondary 1. Ease of Learning: Does it have a GUI, how coding-dependent is it, availability of trained resources, training materials & training cost? 2. Integration with other tools: How easily/seamlessly can it connect to various other tools/systems, both for output delivery and for connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)? 3. Data/User Limitations (if any): Specific data/user limitations, Query performance with increasing size or complexity, flexibility in data modeling, scalability issues? 4. Operational Efficiency: How easy/quick/cheap is it to implement? Is a dedicated management team needed or is it Self-Serve? Support availability. 5. 
Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC, Mails, Text, Alerts, Tweets, Social Shares, etc.
  • 15. Analytics Tools 15 Now let’s look at each tool’s capabilities in detail, MS-EXCEL Overview MS-Excel is a spreadsheet application packaged in MS-OFFICE. It is the most widely used tool for Business Analytics and, in recent years, has seen powerful additions that support more sophisticated analysis. It also has a programming language, VBA, which enhances its power for reporting/automation needs. Type of Data it can handle Excel requires a traditional table structure (rows and columns of data)
  • 16. Analytics Tools 16 It also has plug-ins which can connect it to Hadoop/PIG at the back end. Type of Analytics MS-EXCEL is typically used for Aggregate Analytics (Descriptive, Profiling), Correlation and Trend Analysis, Sizing & Estimation and Simple Predictive Modeling & Time Series Forecasting. Recent versions have added advanced statistical and math functionalities. Visualization options Recent versions incorporate sophisticated and powerful graphing options, both static and dynamic (pivots). Cost Excel PC version comes packaged within MS-OFFICE. Office 365 cost TBD? Ease of Learning Excel’s popularity stems from a very intuitive and easy-to-learn GUI. Low learning curve (1-2 weeks) to be able to use it for less sophisticated business analysis/reporting. VBA coding requires a month of hands-on learning to realize its full potential. There is a large pool of hands-on and/or trained professionals, and a lot of training material is also available. Integration with Other tools Excel can be accessed using a PC, the Cloud (Office 365) and through Apps on Smartphones. Most major tools have Excel import/export options. Excel also has XML import/export capabilities. Data/User Limitations (if any) Latest versions can handle a maximum of about 1 MM rows (1,048,576 per worksheet). However, recent extensions like Power Pivot can handle up to 10 MM rows. Operational Efficiency Excel gets installed automatically as part of the Office package (<=2 hrs max). Cloud360 TBD?. Power Pivot and other extensions can be added as plug-ins online. Output Delivery System Excel outputs can be accessed on a PC, the Cloud (Office 365) and via Smartphone Apps. Ideal for what type of users: Non-technical users who do not need to handle large datasets and who do high-level analytics (simple analysis, reporting, simulations, scenarios or modeling). Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple Correlation/Trend/Sizing & Estimations. 
Ideal for organization at what stage of Analytics Maturity: Useful for all organizations as a simple, cost effective tool for simpler analytical tasks. HIVE
  • 17. Analytics Tools 17 Overview Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix. Apache Hive stores metadata in an RDBMS, significantly reducing the time to perform semantic checks during query execution. It has built-in User Defined Functions (UDFs) to manipulate dates, strings, and other data-mining tools. Hive supports extending the UDF set to handle use cases not supported by built-in functions. Type of Data it can handle Unstructured/Structured data in Hadoop. Type of Analytics Hive can be used for Aggregate Analytics (Descriptive, Profiling). User Defined Functions (UDFs) can be created for advanced querying needs: Trend Analysis, Correlation Analysis, Sizing & Estimation. Visualization options TBD Cost Cloudera or HortonWorks pricing packages. Ease of Learning Medium learning curve (1-3 months) to be able to use it for business analysis/reporting. Given the increase in Big Data interest, the pool of hands-on and/or trained professionals is growing. Training materials/content for Analysts are being ramped up. Cloudera is the leader in training professionals on HIVE, PIG and Impala. It has dedicated training modules for Developers, DBAs & Analytics professionals. Integration with Other tools TBD Data/User Limitations (if any) TBD Operational Efficiency TBD Output Delivery System TBD Ideal for what type of users: Technical users who are comfortable with SQL coding and would prefer to avoid advanced scripting.
  • 18. Analytics Tools 18 Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple Correlation, Text Mining. Ideal for organization at what stage of Analytics Maturity: Organizations ramping up their Big Data framework. Ksuite Overview Ksuite is a suite of products developed by Kontagent. Ksuite has three major tools: Ksuite Mobile, Ksuite Social and Ksuite DataMine. Ksuite Mobile is a mobile app activity reporting tool and Ksuite Social is a social metrics reporting tool; both are targeted at Business Users. Ksuite DataMine is an advanced tool targeted at Analysts who need to go beyond charts/tables and understand what’s happening behind the scenes. Ksuite is a SQL-like querying platform. Like Omniture, Ksuite “instruments” actions on the front end & campaign outreach channels for native Apps by integrating an SDK into the App libraries. This data is then tracked in Kontagent’s warehouse on the cloud, and reporting happens on this data. Ksuite is a real-time monitoring platform. Type of Data it can handle It operates on the App activity data stored on its cloud. Type of Analytics Ksuite helps with Aggregate Analytics (Descriptive, Profiling). Visualization options Broad range of advanced visualization options: Tables, Charts, etc. Cost Depends on data and number of apps tracked in Ksuite. Costs >$2,000 per month. Ease of Learning Low learning curve (1-2 weeks) to be able to use it for business analysis/reporting. Ksuite also provides Mobile Analysts and Data Scientists for Consulting. There is a large pool of hands-on and/or trained professionals, and a lot of training material is also available. Integration with Other tools Kontagent provides an FTP data pipe through which a raw data dump can be taken for additional in-house analysis. Data/User Limitations (if any) Depends on the Service contract, since pricing is data size dependent. Operational Efficiency
  • 19. Analytics Tools 19 Kontagent installation takes minutes, since only the SDK has to be integrated with the App. Kontagent also provides Mobile Analysts/Data Scientists as Consultants to assist with anything during or after installation. Output Delivery System Ksuite is a cloud solution. Ksuite Mobile can be accessed via App. Ideal for what type of users: Non-technical users/Analysts. Best suited for efficient reporting and high-level analytics. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple Correlation/Trend. Ideal for organization at what stage of Analytics Maturity: Useful for established App developers with scale, since Kontagent can be expensive. Flurry could be a cost-effective solution for organizations on a budget or for individual developers. C. Advanced Analytics: Advanced Analytics can be summarized as making sense of the data through in-depth analysis that goes beyond normal business analytics. It could be advanced Text mining (parsing of unstructured data) or statistical (predictive or driver) analysis. I. Front-end Analytics/Machine Learning: Front-end Analytics is performed on the raw front-end tables. The two broad types of data in the front-end tables are, • Instrumentation Data: record of activity on the live business site, captured via instrumentation • Call Log Data: dump of server calls from the live business site and what was delivered Front-end Analytics differs from Business Analytics in the scope of deliverables. Traditionally, the biggest users of Front-end Analytics were Operational Users (e.g., IT Ops, Security), who monitor site stability, security breaches, etc. However, given the richness of the data that comes from being close to user activity, businesses have started performing Machine learning on this data to deliver more upstream solutions like Transactional marketing (e.g., offering a Credit Card to an ATM user, or Netflix recommendations). 
Tools need to be able to perform String Operations, Text Mining and Associativity Analysis in addition to the usual profiling and descriptive analysis.

Factors for deciding Tools for Front End Analytics/Machine Learning
Below is the list of factors that should be considered to zero in on a tool. They are listed in order of priority.
1. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Machine Learning (Text Mining, String Operations, Associativity Analysis) and Operational Analytics (Alerts, Control Charts)?
2. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dumps, and whether it offers Hadoop connectivity for Big Data analysis?
3. Data/User Limitations (if any): Specific data/user limitations, query performance as size or complexity increases, flexibility in data modeling, scalability issues?
4. Ease of Learning: Does it have a GUI? How coding-dependent is it? Availability of trained resources, training materials and training cost?
5. Output Delivery System: Channels through which the results can be accessed (Mobile App, Cloud, PC, Mail, Text, etc.).
6. Integration with other tools: How easily/seamlessly can it connect to other tools/systems, both for output delivery and for connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)? Front-end delivery systems?
7. Operational Efficiency: How easy/quick/cheap is it to implement? Is a dedicated management team needed or is it Self-Serve? Support availability.
8. Cost: License types and fees (Single User and Server), Implementation costs, Operational costs, Scalability costs, and cost of resources.
A plethora of tools is available in the market for Front-end Analytics, hence the need for a structured decision-making process like the one above, so that you end up with the tool that satisfies most of your needs. Table 4 gives a bird's eye view of how each of the top tools sizes up against the criteria mentioned above.
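The prioritized factor list above can be turned into a simple weighted scoring exercise. The following is only a minimal sketch: the linearly decreasing weights, the 1-5 rating scale, the factor keys and the two candidate tools (`tool_a`, `tool_b`) are all illustrative assumptions, not values taken from Table 4.

```python
# Factors in the priority order given above (keys are illustrative names).
FACTORS = [
    "type_of_analytics",
    "type_of_data",
    "data_user_limitations",
    "ease_of_learning",
    "output_delivery",
    "integration",
    "operational_efficiency",
    "cost",
]


def priority_weights(factors):
    """Assumed scheme: linearly decreasing weights that sum to 1."""
    n = len(factors)
    raw = [n - i for i in range(n)]  # 8, 7, ..., 1 for eight factors
    total = sum(raw)
    return {f: r / total for f, r in zip(factors, raw)}


def score_tool(ratings, weights):
    """Weighted sum of 1-5 ratings; a factor with no rating counts as 0."""
    return sum(weights[f] * ratings.get(f, 0) for f in weights)


def rank_tools(candidates, weights):
    """Return (tool, score) pairs sorted from best to worst."""
    scored = {name: score_tool(r, weights) for name, r in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical ratings: tool_a is strong on high-priority factors,
# tool_b is strong on low-priority ones.
CANDIDATES = {
    "tool_a": {f: (5 if i < 4 else 1) for i, f in enumerate(FACTORS)},
    "tool_b": {f: (1 if i < 4 else 5) for i, f in enumerate(FACTORS)},
}

WEIGHTS = priority_weights(FACTORS)
RANKING = rank_tools(CANDIDATES, WEIGHTS)

if __name__ == "__main__":
    for name, score in RANKING:
        print(f"{name}: {score:.2f}")
```

With these made-up ratings the tool that does well on the high-priority factors ranks first, which is the behaviour a priority-ordered list is meant to encourage. In practice the weights and ratings should come from the team's own assessment.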
Now let's look at each tool's capabilities in detail.

SPLUNK
Overview: Splunk is the leader in API data Analytics (analysis of API log data). It is used in Operational Reporting & Analytics. Splunk is a cloud solution where Customers dump their data and use Splunk's text processing technology for their analytical/reporting requirements.
Type of Analytics: Splunk is primarily an operational text analytics tool, but it can also be leveraged for Business Analytics, Machine Learning and Reporting. It supports Aggregate Analysis (Descriptive, Profiling), and the data can then be analyzed in other tools. Recently some advanced math and statistical analytics capabilities have been added to SQL (to be checked).
Type of Data it can handle: It typically works with API log data, which records the service calls from the front end. The data can be structured or unstructured, as text or name-value pairs. Splunk recently launched Hunk, a Hadoop connectivity tool.
Data/User Limitations (if any): Query speed depends on the size of the data. The maximum size of data on Splunk Cloud is specified by the service contract.
Ease of Learning: Splunk coding typically involves regular expressions and PERL coding, but it also has a GUI. It requires 1-3 months of hands-on learning to become familiar with all of Splunk's capabilities. Large pool of hands-on and/or trained professionals. Lots of training materials are also available.
Output Delivery System: Splunk is a cloud-based solution, but its reports can also be accessed via Mobile Apps.
Integration with Other tools: TBD
Operational Efficiency: <=1 month for the data FTP to be established. Once the data pipes are set up, reporting/analytics can be ramped up in another month. One DBA is sufficient for maintaining/monitoring/troubleshooting the system. A warehouse DBA can double up as Splunk Manager since the protocols are similar.
Cost: Pricing is based on data size (amount of data indexed daily). Perpetual license ($5K) plus annual maintenance fees (20%).
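To make the idea of profiling name-value log data with regular expressions concrete, here is a small sketch in plain Python. It is not Splunk's own search language; the log format, the field names (`service`, `status`, `latency_ms`) and the sample lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical API log: space-separated name=value pairs, one call per line.
SAMPLE_LOG = """\
ts=2013-06-01T10:00:01 service=login status=200 latency_ms=120
ts=2013-06-01T10:00:02 service=search status=500 latency_ms=950
ts=2013-06-01T10:00:03 service=login status=200 latency_ms=80
malformed line without pairs
ts=2013-06-01T10:00:04 service=search status=200 latency_ms=300
"""

# A value is either a double-quoted string or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Extract name=value pairs from one log line into a dict."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(line)}


def profile(log_text):
    """Aggregate call count, 5xx error count and mean latency per service."""
    calls = Counter()
    errors = Counter()
    latency_total = Counter()
    for line in log_text.splitlines():
        record = parse_line(line)
        if "service" not in record:
            continue  # skip lines that do not describe a service call
        service = record["service"]
        calls[service] += 1
        if record.get("status", "").startswith("5"):
            errors[service] += 1
        latency_total[service] += int(record.get("latency_ms", 0))
    return {
        service: {
            "calls": count,
            "errors": errors[service],
            "avg_latency_ms": latency_total[service] / count,
        }
        for service, count in calls.items()
    }


if __name__ == "__main__":
    for service, stats in profile(SAMPLE_LOG).items():
        print(service, stats)
```

This is the kind of descriptive profiling (counts, error rates, averages) that operational users run on call logs; a dedicated tool adds indexing, scale and alerting on top of it.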
Ideal for what type of users: Operational Analytics or front-end data profiling needs. Users need some regular expression coding experience to build reports/perform analysis.
Ideal for what type of analytics: Aggregated Analysis (Descriptive Analysis & Profiling).
Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since Splunk is very costly. If scale is not a problem and in-house programmers are available, then the same analytics can be performed using scripting languages like PERL/Python. Some text analytics tools, like PolyAnalyst, can also double up as an Operational Analytics tool if an FTP feed can be easily established. Other front-end monitoring systems also come with inbuilt tools.

MEGAPUTER (POLYANALYST, TEXTANALYST & X-SELLANALYST)
Overview: Megaputer was founded after the development of ground-breaking machine learning techniques at Moscow State University and Bauman Technical University in Moscow. Its flagship product PolyAnalyst (a suite of reporting and text mining solutions) has consistently received rave reviews from peers, users and the industry, and is now deployed by 8+ US Federal Agencies, 200 Universities, 20 Fortune 100 Companies and so on. TextAnalyst and X-SellAnalyst are two niche products developed for specific user groups. The USP of these products is that they enable non-technical users to perform sophisticated analysis easily, quickly and at a larger scale.
Type of Analytics: PolyAnalyst is a powerful text mining tool which can also be used for Aggregate (Descriptive, Profiling), Trend & Correlation Analysis, Advanced Text Mining, Predictive Modeling, Segmentation, Natural Language Processing and Machine Learning. Its strength is bringing together the analysis of traditional, statistically analyzable data with non-traditional unstructured text data.
TextAnalyst is a dedicated Natural Language Processing tool (based on a linguistic and neural network model), which is most beneficial for summarizing huge volumes of text data, clustering text, etc.
X-SellAnalyst is a cross-sell recommendation engine (sold as a COM component) that works in real time at the Point-Of-Sale. It uses historical transactions, profitability, recency and other metrics for its analysis.
Type of Data it can handle: PolyAnalyst can connect to RDBMS warehouses through ODBC drivers and can also work with unstructured text data. It integrates with Microsoft Data Transformation Services and similar software.
TextAnalyst can connect to text repositories on PCs, on the Web and in libraries, news agencies, etc.
X-SellAnalyst works with any RDBMS warehouse (structured data).
Data/User Limitations (if any): PolyAnalyst: Depends on hardware configuration. Claims quick processing of gigabytes of data and that productivity can be increased by using 64-bit and cluster server architectures.
TextAnalyst:
X-SellAnalyst: Fast response time (<1 sec for 100K products in the portfolio). Scales well with large-scale data. Calculation time increases linearly with the number of products already purchased.
Ease of Learning: GUI driven. No coding required. However, some training is necessary to understand all the features and functionalities available in the tool and how best to leverage them. Megaputer provides training to help Customer Teams start using the tools to their full potential. It claims <=2 weeks of training for complete hands-on independence. Availability and abundance of 3rd party training materials is unknown.
Output Delivery System: PolyAnalyst: Resides on a PC. Automated email alerts/logs functionality. Organization-wide sharing features provided.
TextAnalyst:
X-SellAnalyst: Integrates with the Web/Transaction Server to offer cross-sell recommendations on the fly.
Integration with Other tools: TBD
Operational Efficiency: TBD
Cost: TBD
Ideal for what type of Users & Analytics:
PolyAnalyst: Non-coding Data Analysts with sophisticated Text Mining needs.
TextAnalyst: Non-coding users looking for a quick black-box language processing tool, e.g. Journal Editors, Researchers, Scientists, Investment Bankers, Lawyers.
X-SellAnalyst: Retailers (Online & Offline) and Call Centers that need to increase the speed/RoI of cross-sales at large volume.
Ideal for organization at what stage of Analytics Maturity: Depends on when the organization needs advanced text mining, and on its budget. X-SellAnalyst is a solution geared to large-scale problems.

B. Statistical Analytics:
Being able to predict something correctly has always captured the fancy of humankind. Games of odds can be seen everywhere around us: games, elections, stock markets, etc. We are always surrounded by decisions where the future is unknown and uncertain, and no one can get it right all the time on all questions.
No one is required to predict the future with 100% accuracy; all we want is someone with vision and foresight. As the sciences and mathematics advanced, scientists came up with formulae and equations that relate one thing to another in a fairly reliable way. These principles were formalized into the discipline of "Statistics", and Economics has proved to be an ardent follower of its rules and laws. With the proven success of Statistics in Economics, business leaders did not stay behind: they started applying the same discipline to running a business, for example predicting the odds of something happening, predicting the direction of the market, and forecasting inventory and sales. Thus began the era of Statistical Business Analytics.

Over time, many tools were developed and used by academicians in schools and universities and by Statisticians and Analysts in the corporate world, but few could keep up with changes in technologies and techniques. Some have stayed, grown and matured with the market and its requirements; some have lagged behind and are now remembered only as part of history. Some still find application in niche industries, academia, government, research institutions and trading floors; some were acquired as part of vertical integration by larger players in other domains; and some have grown into billion-dollar entities. Matlab falls predominantly in the first group, SPSS in the second and SAS in the third. Finally, some challengers have emerged whose meteoric rise is the stuff of legend, and which are here to stay and become even more mainstream; R falls in this bucket.

Let's first look at the factors that decide which tool to use when, followed by a broader description of each tool.

Factors for deciding Tools for Advanced Analytics
Primary
1. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dumps, and whether it offers Hadoop connectivity for Big Data analysis?
2. Ease of Learning: Does it have a GUI? How coding-dependent is it? Availability of trained resources, training materials and training cost?
3. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Text Mining, Correlation Analysis (pre-post, A/B), Trend Analysis, Sizing & Estimation, Scenarios, Predictive Analysis, Time Series Forecasting, Segmentation (Decision Trees and Clustering), Life Cycle analysis.
4. Cost: License types and fees (Single User and Server), Implementation costs, Operational costs, Scalability costs, and cost of resources.
Secondary
5. Integration with other tools: How easily/seamlessly can it connect to other tools/systems, both for output delivery and for connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)?
6. Visualization Options: Ease of understanding and communicating insights through Tables, Charts, Maps, Heatmaps, etc., with commenting, delivered across all channels.
7. Data/User Limitations (if any): Specific data/user limitations, query performance as size or complexity increases, flexibility in data modeling, scalability issues?
8. Operational Efficiency: How easy/quick/cheap is it to implement? Is a dedicated management team needed or is it Self-Serve? Support availability.
SAS
Overview: SAS has traditionally been a leader in the Analytics Industry. SAS creates solutions for a wide variety of analytics across many industries and domains, from Banking to Pharma. It has capabilities to host an Enterprise Data Warehouse and to support Business & Advanced Analytics, Executive Reporting & Regulatory Compliance (e.g. BASEL II) and Analytical Solution Deployment (e.g. a Credit Score based Decision Framework).
Type of Data it can handle: SAS requires a traditional table structure (rows and columns of data). SAS can also host an Enterprise Data Warehouse dedicated to serving analytical needs effectively and efficiently. The SAS DataFlux module extends its capabilities to handle unstructured text data. It also has plug-ins which can connect it to Hadoop/PIG at the back end.
Ease of Learning: SAS coding requires 1-6 months of training to be able to do Business/Advanced Analytics & Reporting. However, the GUI version of SAS (SAS JMP), which is good for quick analysis, requires <=1 month of hands-on exposure. Large pool of hands-on and/or trained professionals. Lots of training materials are also available.
Type of Analytics: SAS works on a "Modules" concept: a module is a dedicated solution set, e.g., the ETS module for Time Series Forecasting. The SAS foundation sits on the BASE and STAT modules, which contain data preparation and some statistical modeling capabilities. These modules also support many widely used statistical analyses: A/B Testing, Clustering, Correlation and Trend Analysis. However, for additional features like Decision Trees, Time Series, Text Mining, etc., dedicated modules have to be bought separately. SAS Enterprise Miner (EMiner) is the end-to-end tool with a GUI front end (with functions as drag-and-drop nodes). It is sold at a premium.
Cost: BASE/STAT SAS PC licenses can cost between $8-10K per license, with annual maintenance of $3K. BASE/STAT SAS Server licenses can cost $20-30K; annual maintenance TBD. There are significant scaling costs to include additional techniques. The end-to-end EMiner suite costs a premium (TBD).
Integration with Other tools: SAS requires a PC (Desktop or Laptop) for querying/analysis. However, SAS outputs can be taken across many platforms through reporting/delivery modules and/or 3rd party integrations.
Visualization Options: SAS offers many visualization options, with comments on what each output stands for. Further flexibility is provided within the coding framework to include editorials.
Data/User Limitations (if any) (Data size/users): SAS has no limitation per se; limits depend only on hardware configuration or warehouse connections. Certain plug-ins and modules can handle huge quantities of data (TBs).
Operational Efficiency: <=2 hrs for desktop installation. Server installation takes <=2 weeks of IT effort.
Complex installations (advanced server configurations, certain modules, esp. EMiner, etc.) need support from SAS.
Ideal for what type of users: Advanced Users with high-end statistical needs who prefer less complex coding or a GUI-driven tool. Typically suited for large enterprises or entities/teams with a budget that can keep up with scaling costs (even though the BASE/STAT modules answer many needs, additional modules have to be purchased for some specific needs). SAS is best suited for a large-scale, end-to-end analytical framework.
Ideal for what type of analytics: Most types of analytical needs, from basic to advanced statistics.
Ideal for organization at what stage of Analytics Maturity: SAS adoption is driven more by available budget, since SAS has modules for most statistical needs.

R
Overview: R is quickly becoming a leader in the Analytics Industry. R was developed as an Open Source alternative and was very popular in Academia/Research circles. With its value proven there, it quickly gained ground in the corporate arena as a cost-effective, powerful tool.
Type of Data it can handle: R can take data from multiple sources through ODBC connectivity and various libraries. It also has plug-ins which can connect it to Hadoop at the back end.
Ease of Learning: R is a coding-intensive tool and hence requires 1-12 months of training to be able to do Business/Advanced Analytics. Recently there have been attempts to bring in a GUI. Given its growing popularity, the pool of hands-on and/or trained professionals has been growing in recent years. Lots of training materials are also available.
Type of Analytics: R works on a "Libraries" concept: these are "function-like" scripts which carry out specific functionalities, e.g., Logistic Models or Decision Trees. R has 3000+ libraries of advanced statistical techniques over the entire spectrum from Aggregated Analytics to Text Mining to Predictive Analysis. The capabilities of R keep extending, with new libraries being added and in-memory limitations being overcome in some proprietary solutions. It was also one of the pioneers in bridging Big Data with advanced analytics needs.
Cost: Revolution R packages: PC license $1,000, server license >=$25K. R has "Zero Functionality Scaling Cost": just use a new library to solve a specific problem instead of buying a new module for every new problem.
Integration with Other tools: R requires a PC (Desktop or Laptop) for querying/analysis.
However, R outputs can be taken across many platforms through reporting/delivery integrations.
Visualization Options: R offers many visualization options, with comments on what each output stands for. Further flexibility is provided within the coding framework to include editorials.
Data/User Limitations (if any) (Data size/users): R works in memory and hence suffers from RAM limitations. However, some proprietary versions like Revolution R overcome those limitations via large-scale parallel processing (TBD).
Operational Efficiency: <=2 hrs for desktop installation. Complex server installations need support from vendors.
Ideal for what type of users: Advanced Users with high-end statistical needs who are willing and able to write complex code. Typically used by start-ups/small organizations with a constrained budget but enough flexibility in time and resources to spend on training and implementing R.
Ideal for what type of analytics: Most types of analytical needs, from basic to advanced statistics.
Ideal for organization at what stage of Analytics Maturity: R adoption is driven more by budget and complexity of needs. The biggest adoption of R is in Academia/Research institutions with needs that can't be addressed by other commercially available solutions.

KNOWLEDGE SEEKER
Overview: Knowledge Seeker (KS) is popular among non-technical users, primarily because it offers an intuitive GUI for advanced statistical techniques that is easy to learn and use. KS tools are used in a broad range of domains, from BASEL to Fraud protection to Loyalty programs.
Type of Data it can handle: KS requires a traditional table structure (rows and columns of data). It is currently missing plug-ins to Hadoop/PIG.
Ease of Learning: The KS GUI requires <=1 month of training on KS/Strategy Builder. Large pool of hands-on and/or trained professionals. Lots of training materials are also available.
Type of Analytics: Even though KS has a broad set of statistical capabilities, it is especially well regarded for its Decision Trees and Strategy Builder functionality. It offers a decent, cost-effective end-to-end framework (analysis to scenarios) which is sufficient for most non-technical users. Its primary limitations are scale, automation and advanced user needs (macros, loops, advanced statistical techniques).
Cost: Individual PC license (Knowledge Seeker, Knowledge Studio, Strategy Builder): TBD. Server license (Knowledge Seeker, Knowledge Studio, Strategy Builder): TBD.
Integration with Other tools: KS requires a PC (Desktop or Laptop) for querying/analysis. It offers an "In-Database Analytics mode" to perform data mining directly within databases (Teradata, SQL Server, ORACLE and Netezza).
Visualization Options: KS offers many visualization options, with comments on what each output stands for. Further flexibility is provided within the coding framework to include editorials.
Data/User Limitations (if any) (Data size/users): TBD
Operational Efficiency: <=1 hr for desktop installation. Complex server installations need support from vendors.
Ideal for what type of users: GUI users who need Advanced Statistical Techniques. Marketing Professionals and Product Managers (in the Financial Services domain) typically favor it not only for Statistical Modeling but also for Strategy Builder, which offers excellent Scenario Analysis capabilities.
Ideal for what type of analytics: Decision Trees, Scenario Building.
Ideal for organization at what stage of Analytics Maturity: KS adoption is primarily driven by how comfortable the users are with coding. KS and Strategy Builder together may cost >$5K, so adoption is also dictated by budget.

 Other Worthy Mentions
No single list can do justice to all the tools available in the market, given the broad spectrum of data consumption (from Reporting to Business Analytics to various types of Advanced Analytics, with flavors of Big Data integration), the types of analyzable data (Video, Social comments, Location, etc.), the platforms analyzed (Web, Mobile, Tablets and now Google Glass), the focus on functions (Sales, Dev, PM) and industries (High Tech, Banking, Pharma, etc.). Ours was a humble attempt to bring you a list of strong contenders which are instrumental in driving analytics in many areas.
In this section of the chapter, we list a few noteworthy tools that didn't appear in the list above but are leaders in their own right and/or are expected to become a force in the near future.

Google Analytics
Google Analytics is the default Analytics choice for many Small and Medium Enterprises (<=10 Million hits per month and <=50 rows of data in reports), since it offers a broad suite of Reporting/Analytics solutions for free. It is quick and easy to set up, and helpful in defining and monitoring KPIs. Data refresh happens every 24 hours. Reports are best suited for KPI tracking and for Advertising, Multi-Channel, Social, Mobile and Video tracking. It can also be leveraged for Aggregate Analytics (Descriptive Analysis & Profiling). However, its biggest limitation is Enterprise scalability: even the Premium Version supports a maximum of 1 Billion hits per month. Also, Google Analytics APIs allow App creation on the data, but data transfer via FTP is not supported yet. All said and done, Google Analytics is among the best RoI tool investments for individual developers and SMEs.

RapidMiner
Rapid-I provides software, solutions and services in the fields of predictive analytics, data mining and text mining. The company concentrates on automatic intelligent analyses on a large scale, i.e. for large amounts of structured data like database systems and unstructured data like texts. The open-source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. The discovery and leverage of unused business intelligence from existing data enables better informed decisions and allows for process optimization.
 RapidMiner
The main product of Rapid-I, the data analysis solution RapidMiner, is the world-leading open-source system for knowledge discovery and data mining. It is available as a stand-alone application for data analysis and as a data mining engine which can be integrated into a company's own products.
 RapidNet
Relation and Net explorer: identifies interrelationships in the data, defines KPIs at nodes and overlays geographic relationships on Maps.
 RapidSentilyzer
RapidSentilyzer provides all relevant customer and market information in a single real-time system. It combines efficient crawling techniques with the power of data and text mining, and automatically categorizes the latest news according to sentiments and opinions. The RapidSentilyzer BuzzBoard can easily be inspected and gives all necessary information in a central place. This is what competitive intelligence and customer intelligence should look like.
 RapidDoc
Automated document classification engine offered over the web.

IBM Analytics
IBM carried forward its warehousing expertise into the new "Analytics Era" through the acquisition of industry stalwarts like Cognos for Reporting/Business Analytics and SPSS for Advanced Analytics capabilities. With them, IBM now has a comprehensive, unified portfolio of business analytics software (Cognos, SPSS, OpenPages and Algorithmics) with capabilities from Data Storage to Processing to Reporting to Business and Advanced Analytics, and even Analytics Delivery Management. Based on open standards, IBM business analytics products can be used independently, in combination with each other, and as part of broader solutions to key business challenges.
 IBM SPSS products
IBM SPSS predictive analytics software facilitates statistical analysis, data and text mining, predictive modeling and decision optimization to anticipate change and take action to improve outcomes.
 IBM Cognos products
IBM Cognos business intelligence and performance management software provides the integrated dashboards, scorecards, reporting, analysis, and planning and budgeting capabilities to gain and act on fact-based insights.
 IBM OpenPages products
OpenPages GRC software allows organizations to manage enterprise operational risk and compliance initiatives using a single, integrated solution.
 IBM Algorithmics products
Algorithmics software helps businesses gain transparency into financial risks in advance, providing information that is vital to organizations.

SAP Analytics
SAP is a world leader in Enterprise software applications. It has now forayed into the world of advanced data insights through the Business Objects and HANA product suites.
 SAP Business Objects Products
The SAP Business Objects suite contains solutions from BI platform management to OLAP capabilities to Reporting solutions (customizable for various types of delivery: Lumira, Crystal Reports and ESRI integrations). Lumira helps in delivering self-service reports on the cloud. Crystal Reports assists in integrating reports within Business Applications and Processes. ESRI integration is for geo-spatial reporting.
 SAP Predictive Analytics & HANA
The SAP Predictive Analytics solution offers an intuitive framework for building complex analytical models. It can work with an existing data environment as well as with the SAP BusinessObjects BI Platform to help mine and analyze data.
 SAP HANA
HANA is a new in-memory platform offered by SAP to increase the speed of Analytics/Reporting solutions.
ORACLE Analytics
ORACLE extended its leadership in Data Storage solutions to Business Analytics with the acquisition of Hyperion Essbase and the launch of its Advanced Analytics solution kit.
 Oracle Hyperion Enterprise Performance Management
Combines market-leading performance management applications with powerful analytics to align financial close, planning, reporting, analysis, and modeling and unlock business potential. It helps customers leverage their ERP investments through seamless data and process integration with Oracle E-Business Suite, PeopleSoft, JD Edwards, Fusion, SAP and other ERP applications. Flexible deployment options include on-premise, cloud, or on engineered systems designed for high performance and scalability. Oracle Hyperion Enterprise Performance Management delivers a comprehensive, integrated suite of applications featuring common Web and Microsoft Office interfaces, reporting tools, mobile information delivery, and administration. Best-in-class, in-memory analytics software and hardware (optimized to work together) combines planning at the speed of business with unique and powerful strategic and predictive modeling capabilities that improve analytic insight. Best suited for Strategy Management, Planning, Budgeting and Forecasting, Financial Close and Reporting, and Profitability and Cost Management.
 Oracle Business Intelligence Enterprise Edition
Delivers a robust set of reporting, ad-hoc query and analysis, OLAP, dashboard, and scorecard functionality with a rich end-user experience that includes visualization, collaboration and alerts. Makes corporate data easier for business users to access. Provides a common infrastructure for producing and delivering enterprise reports, scorecards, dashboards, ad-hoc analysis, and OLAP analysis. Includes rich visualization, interactive dashboards, a vast range of animated charting options, OLAP-style interactions and innovative search, and actionable collaboration capabilities to increase user adoption. Reduces cost with a proven Web-based service-oriented architecture that integrates with existing IT infrastructure.
It also has Mobile BI, Real Time Decision Management and Big Data Solutions.
 Analytic Applications
ORACLE offers a pre-configured suite of Analytics solutions for various business roles, product lines and industries.

Market Share Research
Gartner publishes an annual performance report on business intelligence (BI), corporate performance management (CPM) and analytics applications/performance management software. Revenue totaled $13.1 billion in 2012, a 6.8 percent increase from 2011 revenue of $12.3 billion, according to Gartner, Inc. Tough macro conditions and confusion related to emerging technology terms led to more muted market growth than in previous years.
Source: Gartner Research http://www.gartner.com/newsroom/id/2507915

Table 5: Top 5 BI, CPM and Analytic Applications/Performance Management Vendors, Worldwide, 2011-2012 (Millions of Dollars)
Company | 2012 Revenue | 2012 Market Share (%) | 2011 Revenue
SAP | 2,902.5 | 22.1 | 2,884.0
Oracle | 1,952.1 | 14.9 | 1,913.5
IBM | 1,625.6 | 12.4 | 1,478.8
SAS | 1,599.7 | 12.2 | 1,542.9
Microsoft | 1,189.3 | 9.1 | 1,059.9
Others | 3,861.9 | 29.3 | 3,416.0
Total | 13,131.1 | 100.0 | 12,295.1
Note: SAP reports in Euros and faced a currency headwind that hampered growth in USD.
Source: Gartner (June 2013)

While all five of the top BI software vendors retained their top-five status, IBM and SAS exchanged places, moving IBM into third position and SAS into fourth (see Table 5). IBM grew 9.9 percent in 2012, with revenue of $1.6 billion. The top five vendors together accounted for 70 percent of total BI software market revenue. In first place, SAP once again had significantly higher revenue than any other vendor at $2.9 billion, with 22.1 percent of the market, although this was up by just 0.6 percent from 2011. Second-place Oracle's revenue grew by 2.0 percent from 2011 to reach $1.9 billion. Fifth-place Microsoft enjoyed the highest growth of the top five vendors in 2012, with revenue rising by 12.2 percent compared with 2011 to reach $1.2 billion.

Chapter Summary
This chapter attempts to impart an intuitive sense of how data moves in an organization: how it flows from the front-end systems to the back-end analytical engines and back to consumers as different services, e.g., personalized offerings, information or better customer service. Decision makers consume data in various ways, as reports informing them about the condition of the portfolio or as key insights and recommendations from Analysts. A plethora of tools is available in the market to facilitate efficient and effective insight generation, so users are encouraged to examine their options through the lens of the factors suggested above to decide which tool will best serve their needs. This chapter is just a small door into the larger universe of ever-evolving tools available for specific functions, and readers are encouraged to perform their own research before deciding on them.

 Pending content:
 Flowchart of decision making
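The growth and share figures quoted in the Market Share Research section can be re-derived from the revenue columns of Table 5. The short sketch below does that arithmetic; the revenue figures are the ones in the table, while the function and variable names are illustrative.

```python
# (2012 revenue, 2011 revenue) in millions of dollars, from Table 5.
REVENUE = {
    "SAP": (2902.5, 2884.0),
    "Oracle": (1952.1, 1913.5),
    "IBM": (1625.6, 1478.8),
    "SAS": (1599.7, 1542.9),
    "Microsoft": (1189.3, 1059.9),
    "Others": (3861.9, 3416.0),
}


def growth_pct(current, previous):
    """Year-over-year growth as a percentage of the previous year."""
    return (current - previous) / previous * 100


def share_pct(revenue, total):
    """Revenue as a percentage of the market total."""
    return revenue / total * 100


TOTAL_2012 = sum(r2012 for r2012, _ in REVENUE.values())
TOTAL_2011 = sum(r2011 for _, r2011 in REVENUE.values())

# Growth per vendor, rounded to one decimal as in the quoted text.
GROWTH = {name: round(growth_pct(*rev), 1) for name, rev in REVENUE.items()}

# Combined 2012 share of the five named vendors.
TOP5_SHARE = sum(
    share_pct(r2012, TOTAL_2012)
    for name, (r2012, _) in REVENUE.items()
    if name != "Others"
)

if __name__ == "__main__":
    for name, pct in GROWTH.items():
        print(f"{name}: {pct}% growth")
    print(f"Market growth: {growth_pct(TOTAL_2012, TOTAL_2011):.1f}%")
    print(f"Top five share: {TOP5_SHARE:.1f}%")
```

The derived growth rates for SAP, Oracle, IBM and Microsoft, the overall market growth, and the roughly 70 percent combined share of the top five all agree with the quoted text. The computed share for "Others" comes out marginally above the 29.3 shown in the table, presumably because the published column was adjusted to sum to 100.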