Steering Away from Bolted-On Analytics

Whitepaper
info@connexica.com   www.connexica.com   +44 (0)1785 246777
Search Powered Data Discovery

Introduction
Since the dawn of the PC, the spreadsheet and the database, developers have been continually
enhancing the same products to eke out more and more life from what are now legacy
technologies.
The cost of ground-up development is extremely high, so it makes sense to “sweat the asset” and
try to prolong the shelf life of products.
In the context of data storage software, OLTP databases such as DB2 and Oracle were great
inventions. These technologies were designed specifically to allow data to be stored securely
inside highly optimised, normalised databases.
Thousands of systems were developed where a relational database was the heartbeat of the
application and SQL the de facto way of getting data in and out of it.
OLTP databases were not designed for fast data retrieval, so we came up with new ways of
storing data: designing warehouses and duplicating data in a de-normalised form. This meant
fewer SQL joins and consequently faster data retrieval.
When SQL over a Warehouse became too slow, new ways of pre-aggregating data were devised,
resulting in the rise of OLAP and its storage variants (MOLAP, ROLAP and HOLAP), chosen
according to how you wish to trade query performance against additional storage and
hardware overheads.
Technology Evolution

When OLAP became too restrictive, because users needed access to
data at a more granular, transaction level with reduced loading
times, in-memory technologies gained popularity, taking
advantage of 64-bit architectures and the ability to build
hardware with more and more RAM.
When in-memory technology struggles, or the specialised
hardware is deemed too expensive to cope with the sheer
volume of data being generated by the modern-day business, or
where aggregated data is insufficient for the ad-hoc information
needs of today’s business worker, we need to take stock and
consider whether we have finally come to a point where we need to
do things differently.
So what’s next?
All of these developments have been incremental add-ons that
have enabled the core data repository, a relational database, to
carry on doing what it’s best at: OLTP (Online Transaction
Processing), aka “storing” data.
All of the reporting technologies have been incremental add-ons
to SQL, to get around the limitations of a technology that was
never designed for fast data retrieval. The technology is there to
plug a gap, yet businesses are gambling their futures and cash on
these systems continuing to work and be fit for purpose, whilst
Big Data, social media and real-time analytics are
becoming a way of life.
90% of the world’s data was generated over the last 2 years.
Source: www.sciencedaily.com

Search-Based Business Analytics
CXAIR is a combined search engine and business analytics tool that is not based on OLTP, OLAP
or in-memory technology. Instead, CXAIR uses the same principles adopted by Internet search
engines such as Google and Bing.
Where traditional enterprise reporting tools report either directly off the source data or off a
pre-aggregated cube or in-memory aggregations, CXAIR users report off a search engine created
from an organisation’s source data, in a similar way to how a search engine crawls web sites for
content.
Whilst Google crawls web sites for text, documents, media and so on, CXAIR crawls databases and
stores a copy of that data in super-fast, highly scalable indexes. These indexes, when grouped
together, form a search engine that can be “Googled” with sub-second response times using natural
language search terms.
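The idea of crawling database rows into a searchable index can be illustrated with a toy inverted index. This is only a minimal sketch in Python and makes no claim about how CXAIR is implemented: the sample rows and the `build_index` and `search` functions are hypothetical, and the real index format is proprietary.

```python
from collections import defaultdict

# Hypothetical rows "crawled" from a source database table.
ROWS = [
    {"id": 1, "customer": "Acme Ltd", "city": "Stafford", "status": "overdue"},
    {"id": 2, "customer": "Birch Foods", "city": "Leeds", "status": "paid"},
    {"id": 3, "customer": "Acme Ltd", "city": "Leeds", "status": "paid"},
]


def build_index(rows):
    """Map every lower-cased term in every field to the ids of the rows containing it."""
    index = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            if field == "id":
                continue
            for term in str(value).lower().split():
                index[term].add(row["id"])
    return index


def search(index, query):
    """Return the ids of rows that contain every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_index(ROWS)
print(sorted(search(INDEX, "acme leeds")))  # prints [3]
```

Each search is a lookup per term followed by a set intersection, with no joins and no query language, which is the property the text above relies on.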
Unlike internet search engines, CXAIR can update its indexes in near real time, allowing users to
search and find data seconds after it has been entered into its originating system.
Unlike OLTP and OLAP databases, CXAIR indexes can be searched without the need to write SQL
or MDX.
CXAIR is both a flexible data store and a powerful search and analytical tool, providing many of
the best features associated with the most popular Business Intelligence tools and Warehouse
solutions - but in a fundamentally different way.
The CXAIR approach is focused very much around the business user and how they gain insight
from their corporate data assets without reliance on IT.
CXAIR uses search technology as both the “engine” and the “storage mechanism” for data from
disparate data sources and, like Google, handles extremely large data volumes and returns
sub-second response times to queries on commodity hardware.

Understanding Search BI
Below is a summary of some of the key differences between Search BI and other technologies
used to store and retrieve data.
What’s different between a search engine and OLTP?
• A search engine is built for fast data retrieval, not for transaction-based data entry.
• The technology itself is relatively new, arriving over 30 years after OLTP, and has evolved to
  handle huge data volumes and provide rapid retrieval times.
• Search engines are extremely easy to configure.
• Search engines are queried by typing in natural language search terms, not complex SQL.
• Search engines are not designed for data entry and do not implement a comprehensive
  transaction management system.
• OLTP is designed for secure data entry and does not provide an integrated reporting
  and analysis layer.
• Search engines are designed specifically for the web.

What’s different between a search engine and OLAP?
• In a search engine, data is accessed through natural language searches, rather than MDX.
• Search engine data is held at document / transaction level and is not pre-aggregated.
• A search engine does not require data to be in an organised structure.
• OLAP is designed as a fast aggregation engine that sits on top of one or more OLTP or
  Warehouse systems and does not provide an integrated reporting and analysis layer.
• In a search engine all fields and values are available for searching, whereas OLAP requires
  you to decide exactly what information is to be made available to the user by structuring
  cubes, dimensions and measures.
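The contrast between pre-aggregated cubes and document-level data can be sketched in a few lines of Python. The sales records, field names and helper functions below are hypothetical and exist purely for illustration; they do not represent any particular OLAP product or CXAIR itself.

```python
from collections import Counter

# Hypothetical transaction-level records (the "documents").
SALES = [
    {"region": "North", "product": "Widget", "rep": "Jones", "amount": 120},
    {"region": "North", "product": "Gadget", "rep": "Patel", "amount": 80},
    {"region": "South", "product": "Widget", "rep": "Jones", "amount": 200},
]


def build_cube(records, dimensions):
    """Pre-aggregate totals for the dimensions chosen at design time."""
    cube = Counter()
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        cube[key] += rec["amount"]
    return cube


def total_where(records, **criteria):
    """Sum amounts over the raw records matching any field/value criteria."""
    return sum(
        rec["amount"]
        for rec in records
        if all(rec.get(field) == value for field, value in criteria.items())
    )


# The cube was designed around region and product only.
CUBE = build_cube(SALES, ("region", "product"))
print(CUBE[("North", "Widget")])        # a question the cube was designed for
print(total_where(SALES, rep="Jones"))  # "rep" was never modelled in the cube
```

The cube answers the questions it was structured for very quickly, but a question about `rep` needs either a redesign of the cube or access to the underlying records. Holding data at document level keeps every field available, at the cost of doing the aggregation at query time.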
What’s different between a search engine and in-memory analytics?
• A search engine does not pre-load data into memory.
• A search engine is not restricted by memory limits but by disk storage and IO performance.
• A search engine is easily distributed across multiple servers to spread load for large
  numbers of users and high volumes of data.
• A search engine will run on commodity hardware.
• Search engine data is held at document / transaction level and is not pre-aggregated.
• In a search engine all fields and values are available for searching, whereas in-memory
  analytics requires you to decide exactly what information is to be made available to the
  user by structuring in-memory aggregations, hierarchies and measures.

Advantages of Search BI
Here are some of the key advantages Search BI has over other technologies used to store and
retrieve data.
• Search BI is easier to use.
• The technology is inherently fast.
• The technology is relatively lightweight and is easy to implement.
• Search engines are designed to handle very large data volumes.
• End users do not require SQL skills.
• Search engines do not need to differentiate between structured and unstructured data.
• Search BI stores its data at document / transaction level and is not pre-aggregated.
• Search engines do not need to pre-load data into memory.
• The scalability of a search engine is not restricted by memory limits.
• Search BI provides an integrated storage repository and query tool.
• Search engines run on commodity hardware.

Understanding CXAIR
CXAIR is Search BI that quickly and inexpensively presents
actionable information to all of your business users without the
need for IT.
The product uses search technology to provide a simple, easy to
understand interface for querying and reporting on diverse
information collated from multiple, disparate data sources.
Combined with a natural language search capability, CXAIR provides
a highly visual front-end that allows business users to create and
view high quality charts, dashboards and Infographic style outputs.
Unlike traditional reporting tools, CXAIR can query across millions
of transactions at the speed of Google, providing near real-time
responses to information requests.
CXAIR is able to pull structured and unstructured content from
operational data stores, applications, spreadsheets, document
directories and all manner of different media streams, including
Twitter and RSS web feeds, and present that data back in a
consolidated format for consumption by all levels of the business.
200+ implementations of CXAIR across a variety of industries.
Source: Connexica

How does it work?
To best understand how CXAIR works, the first thing to understand is the various components
that make up the technology and how you would go about building a search engine, then
analysing the contents through the CXAIR analysis and reporting engine.
CXAIR consists of:
• A data gathering engine that continually mines information from multiple data sources and
  stores a copy of that data as encoded index files.
• A high performance search engine that allows data contained in the index files to be queried
  and analysed using natural language search terms.
• A visualisation engine for transforming search results into graphics.
• An analysis and reporting engine for transforming search results and visualisations into reports
  and dashboards.
• A configuration manager that maintains the metadata and configuration details relating to the
  CXAIR installation.
• A web user interface for end users who wish to search or run pre-created reports and analyses.
• A web interface for full CXAIR users who require access to the entire front-end search, analysis
  and report development capabilities.
• A web interface for administrators to configure and administer the CXAIR instance for access by
  authorised users and 3rd party applications.

Getting Data into CXAIR
In the context of CXAIR, a search engine is a series of indexes which have been logically grouped
together to form a single searchable source of information.
Indexes are stored as a series of segments which simply appear as a group of files held within a
sub-directory on a disk. These files are stored in binary form and are accessed whenever a user
queries a search engine that contains that index via CXAIR or a 3rd party application using a CXAIR
API call.
Index segments contain a series of documents that contain searchable text, dates, numbers and
images that have been extracted and analysed by the index build process and converted into a
proprietary format designed for fast data access.
To create an index, a series of wizards lets you select the type of index you wish to create
(complete refresh, incremental, continuous update, snapshot or archive), the data source you
wish to index (for example a database, file system, web URL, spreadsheet or email) and then
which filters to apply to restrict what data is returned to the indexing process.
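The difference between two of these index types, complete refresh and incremental, can be sketched as follows. This is an illustrative Python sketch only, not the CXAIR wizard or API: the function names, the `emergency_only` filter and the assumption that source rows carry a steadily increasing `id` are all hypothetical.

```python
def complete_refresh(source_rows, keep):
    """Rebuild the whole index from every source row that passes the filter."""
    return {row["id"]: row for row in source_rows if keep(row)}


def incremental_update(index, source_rows, keep, last_seen_id):
    """Index only rows added since last_seen_id; return the new high-water mark."""
    newest = last_seen_id
    for row in source_rows:
        if row["id"] > last_seen_id:
            newest = max(newest, row["id"])
            if keep(row):
                index[row["id"]] = row
    return newest


def emergency_only(row):
    """Example filter restricting what is returned to the indexing process."""
    return row["dept"] == "Emergency"


SOURCE = [
    {"id": 1, "dept": "Emergency"},
    {"id": 2, "dept": "Cardiology"},
]

index = complete_refresh(SOURCE, emergency_only)  # first build reads everything
mark = 2                                          # highest source id seen so far

SOURCE.append({"id": 3, "dept": "Emergency"})     # a new row arrives in the source
mark = incremental_update(index, SOURCE, emergency_only, mark)
print(sorted(index), mark)                        # prints [1, 3] 3
```

A complete refresh is simple but re-reads the whole source every time, whereas an incremental build only touches rows beyond the high-water mark, which is what makes frequent updates affordable.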

The data gathering engine then crawls the source system and transforms the data into a
searchable index.
This process can be repeated any number of times to create any number of searchable indexes.
Administrators can create multiple search engines that either share common indexes or have their
own indexes and restrict access to those indexes and search engines to specific groups of users.
Once an index has been built and added to a search engine, a user (subject to access permissions)
can search the index without any further configuration.
Users can perform free text searches, filter data by clicking on search results to narrow down and
refine the search or use range filters such as date pickers, sliders, check boxes and numeric range
controls to perform more sophisticated searching.
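The search, click-to-refine and range-filter workflow can be sketched as a chain of simple operations over the matching documents. The documents, field names and functions in this Python sketch are hypothetical and only illustrate the pattern; they are not the CXAIR interface.

```python
from collections import Counter
from datetime import date

# Hypothetical documents held in an index.
DOCS = [
    {"id": 1, "text": "hip replacement", "ward": "Orthopaedics", "admitted": date(2023, 7, 4)},
    {"id": 2, "text": "knee replacement", "ward": "Orthopaedics", "admitted": date(2023, 9, 18)},
    {"id": 3, "text": "valve replacement", "ward": "Cardiology", "admitted": date(2023, 8, 2)},
]


def free_text(docs, query):
    """Keep documents whose text contains every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].lower().split() for t in terms)]


def facet_counts(docs, field):
    """Count the values of a field in the current results, as a clickable filter would show."""
    return Counter(d[field] for d in docs)


def refine(docs, field, value):
    """Narrow the results to one value of a field (the "click")."""
    return [d for d in docs if d[field] == value]


def date_range(docs, field, start, end):
    """Keep documents whose date field falls between start and end inclusive."""
    return [d for d in docs if start <= d[field] <= end]


hits = free_text(DOCS, "replacement")
print(facet_counts(hits, "ward"))  # how many hits fall in each ward
hits = refine(hits, "ward", "Orthopaedics")
hits = date_range(hits, "admitted", date(2023, 9, 1), date(2023, 9, 30))
print([d["id"] for d in hits])     # prints [2]
```

Each step works on the output of the previous one, so the user can keep narrowing the result set without ever restating the whole query.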
Once the data has been filtered to the records (documents) you are interested in, you can then
transform the output into a table, chart, Venn diagram or dashboard without any coding or SQL.
Diagram showing how different data sources can be combined to create multiple search engines.

So why is this different?
Warehouse becomes an option, not a necessity
Warehouses are typically built to provide a unified view of the business within a single database.
Often data in critical OLTP based systems will get archived off into a warehouse due to the need
for the OLTP database to function as quickly as possible. In contrast, Warehouses are often used
to store historical data for periodic reporting and trend analysis.
Designing a Warehouse requires a combination of SQL expertise and business knowledge.
Designing the layout, structures, dimensions and measures for calculating totals and metrics
requires both technical skills and knowledge of the systems and their data, as well as the
reporting requirements and business processes of the organisation.
Critical to the warehouse and the ability to provide timely and accurate reports is “good data”. If
the data is not good, you can’t report against it as it won’t join together to allow you to produce
meaningful reports. Not having “good data” typically forces you to split the Warehouse into
multiple databases: a landing database, a staging database where you correct and augment the
data, and a production database or multiple production data marts which are used as the source
for management reporting.
CXAIR can sit on top of a Warehouse and provide the reporting layer or act as an alternative
reporting layer over the production databases and marts.

Alternatively CXAIR can extend the Data Warehouse by taking in data from other systems that are
not easily accessible to the Warehouse.
A fundamental difference between CXAIR and traditional data warehouse implementations is
that it does not need “good data”.
As CXAIR is powered by a search engine, it is inherently designed for both structured and
unstructured data. From a prototyping and data discovery perspective, fuzzy matching and
natural language search allow you to navigate around “bad data” and identify errors and
omissions in your operational systems.
Another alternative application of CXAIR is for it to be the Data Warehouse.
Where it differs from a data warehouse is in the way it stores the data: in indexes which are
joined together to create a unified search engine that spans all of your critical business data.
Search engines are inherently designed to store and retrieve huge data volumes, so holding
historic records which might otherwise need to be archived off is standard functionality. In
addition, as CXAIR is both the reporting / analysis engine and the storage mechanism, there is no
need to have separate reporting and analysis tools.
Reporting without SQL
A key differentiator between CXAIR and traditional reporting tools is that under the hood it is not
using SQL, OLAP or in-memory analytics. Behind every action is a “search” against an index which
returns matches in sub-second time even over data volumes of millions of records.
Because of the raw speed of search engine technology, the approach to querying and report
writing can be turned on its head.
In normal reporting the primary skill is knowing how to get at the underlying data. This would be
achieved via complex SQL for a relational database, MDX for an OLAP cube
or proprietary scripting as part of the load and aggregation process for in-memory analytics.
With CXAIR, the end user is able to get to the underlying data themselves by simply clicking and
selecting data values. What’s more, this can be done in real time and iteratively to follow the
user’s train of thought.

From there, transforming that data into a table or chart and iteratively refining the
layout is simple and fast: because the search engine returns data so quickly, the user can
continually review what the report looks like as they go along.
To highlight the benefits that the speed of a search engine brings to analysis and report building,
CXAIR has a built-in Venn diagram function that allows users to create interactive Venn
diagrams over indexes and search results.
The Venn functionality allows you to identify patterns, relationships and clusters in your data. In
this example we are looking at Health data where we can see the total number of patients that
were admitted electively, the total number of patients admitted in Q3 and all of the patients
admitted that were referred by Dr R. Jones.
Combining the 3 sets shows which of those patients, referred by Dr R. Jones, have also been
admitted electively in Q3.
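Once each search has returned its set of matching record identifiers, the Venn segments reduce to set arithmetic. The sketch below is a hypothetical Python illustration using invented patient ids; it shows the principle only and is not CXAIR's own Venn implementation.

```python
# Hypothetical patient ids returned by three separate searches.
elective = {101, 102, 103, 104}
admitted_q3 = {103, 104, 105, 106}
referred_by_jones = {102, 104, 106, 107}


def venn_segments(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


segments = venn_segments(elective, admitted_q3, referred_by_jones)
print(segments["a_b_c"])                               # prints 1
print(elective & admitted_q3 & referred_by_jones)      # prints {104}
```

The central segment is the group described above: patients referred by Dr R. Jones who were also admitted electively in Q3. The seven segment sizes always add up to the number of distinct patients across the three sets.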
This functionality is possible because of the speed of search and would not be practical over
traditional database technology, because of the vast number of complex joins that would need to be
done to calculate the various Venn segments over potentially huge data sets.

Summary
Search BI is a new way of analysing and reporting across today’s ever-increasing and diverse
information sources through natural language searching.
It was developed around the need to join up a network of systems and data and provide a way of
locating that information extremely quickly through the use of simple search terms.
Search BI and CXAIR are not the next generation of OLTP, or an aggregated layer on top of OLTP to
provide improved query responses, but a new application of a search engine.
Whilst CXAIR end user functionality converges with traditional business intelligence tools to
cater for what has become a standard set of requirements for standard reporting, CXAIR is also
able to offer something new.
Search BI is an evolution of the Internet search engine, not of OLTP. CXAIR is the only BI technology
available today that offers integrated storage and analysis over a search engine, capable of
coping with the diverse demands of modern-day information requirements.
