Data monetization with an internal platform

Prithwi Thakuria

DATA MONETIZATION WITH AN
INTERNAL PLATFORM

PERSPECTIVES TOWARDS A PRAGMATIC APPROACH
AND SOLUTION LEVERAGING THE LATEST
TECHNOLOGIES
Jan 2018


Contents

Executive Summary
Introduction
Defining Data Monetization
Data Monetization Landscape
A Conceptual Framework
The Technologies
Data Lake on Hadoop stack
Cloud Computing
Blockchain
Cognitive Computing
Business Rules Engine
The Solution
Conclusion
About the Author

List of Figures

Figure 1 - Data Monetization Landscape
Figure 2 - Data Monetization Framework
Figure 3 - The Modern Data Monetization Stack
Figure 4 - Hadoop 2.0 Stack [courtesy: admin-magazine.com]
Figure 5 - How Data Lakes are supposed to work
Figure 6 - Data Lake building blocks
Figure 7 - Cognitive Building Blocks
Figure 8 - Data Lake + Cognitive
Figure 9 - Data Monetization Architecture Topology
Figure 10 - Data Ingestion
Figure 11 - Stages in Data Lake development
Figure 12 - CoE Building Blocks
Figure 13 - Monetizable Assets
Figure 14 - Representative Auto-Pricing Flow


data monetization: referring to techniques to make money off your data
Disclaimer: This disclaimer informs readers that the views, thoughts, and opinions expressed in this
document belong solely to the author, and not necessarily to the author’s employer, organization,
committee or other group or individual.


Executive Summary

We all know that data is the new currency, the new so-called “wealth”. Converted into actionable information, it becomes knowledge, which is power. This power can be harvested to generate measurable economic benefits such as revenue or expense savings, market share, new business models, or the sale of packaged data assets.
Most businesses realize they have a wealth of data, but not all realize that data can be their primary reason for existence. Those that do, like Google, Amazon, Facebook and Uber, are the progressive disruptors of the economy, and armed with data they are upending established industries.
Until recently, only information service providers or aggregators like Nielsen and Thomson Reuters were in the business of deriving value from data. Today, however, the ability to use and monetize data is impacting almost every business sector, and businesses are looking at “Data Monetization” as part of their overall business strategy.
Realizing value from data monetization has not been easy, though, because of technological and cultural challenges.
The challenges that businesses usually face are:
1. Strategy: Defining a comprehensive strategy that accounts for business imperatives, business processes and the operating model.
2. Technology: In this rapidly changing technology environment, it is important to have a technology stack that is not only reliable and robust but also flexible enough to adopt new technologies and standards.
3. Solution: Though the core data monetization framework can cut across industries, solutions can vary from industry to industry. For example, a retail solution will differ from a telecom solution that relies on IoT devices.
4. Operating Model: Data monetization is a complex topic and will challenge existing operating models, which must ensure that the expected outcomes are properly monitored and managed.
5. Processes: Existing data and business processes might be candidates for change, and most businesses see this as a tall barrier to adoption.
Nevertheless, there is a huge opportunity to be had. There are three approaches available, and they differ in the capabilities and commitments they require:
1. Improving internal processes and
decisions
2. Making core products and services more
data driven
3. Selling information products and
offerings to markets.
This whitepaper addresses approach #3.

New technologies like Big Data, Cloud Computing, Blockchain and Cognitive are enabling businesses to monetize their data and outpace their competitors.


Introduction

Defining Data Monetization
According to Wikipedia, “data monetization, a
form of monetization, is generating revenue
from available data sources or real time
streamed data by instituting the discovery,
capture, storage, analysis, dissemination and use
of data”.

Data giants like Google, Amazon, Facebook and the Nielsens of the world have been monetizing data directly for some time now.
Data can be monetized in two ways: indirect and direct. In the indirect method, economic benefits are realized by using data effectively to improve efficiencies in business processes, decision making and partner relationships, among others. In the direct method, monetization takes the form of trading with information (discounts, loyalty, etc.), selling data through a broker (research reports, benchmarks, etc.) and internal platforms that sell data assets.
Data Monetization Landscape
According to Gartner, by 2020, 10% of
organizations will have a highly profitable
business unit specifically for monetizing
information. And by 2019, 75% of analytics
solutions will incorporate 10 or more exogenous
data sources from second-party partners or
third-party providers.
Interestingly, circa 2015, 10% of respondents had already started selling their data on their own, and this category has only grown over the years.

Figure 1 - Data Monetization Landscape

The most compelling reason to build an internal data monetization platform is to create a robust equity business with an ongoing supplemental revenue stream.


A Conceptual Framework

A conceptual framework is an abstract representation, connected to the goal of data monetization, that is useful for making conceptual distinctions and organizing ideas. It makes an approach concrete, in a way that is easy to remember and apply.

Figure 2 illustrates a five-step framework.

For a data monetization initiative to be successful, C-suite commitment is vital. Businesses usually set up a cross line-of-business working group and, in some cases, depending on the scale of the opportunity, establish an altogether new LOB responsible for driving the overall strategy and implementation plan.

The strategy should be clear about which data assets are candidates for monetization and should consider rationale, cost, risks, market relevance, technology, impact on “business as usual” and changes to governance and operating models.

From the technology standpoint, it is important to have a well-defined technical framework that drives subsequent architectures and technical blueprints for implementation. The technical approach must provide for scale, speed and the flexibility to adopt new techniques, tools and technologies.

One other aspect that cannot be overlooked is talent. The technologies that will be core to the solution (Big Data, Cloud Computing, Blockchain and Cognitive) are new, emerging and cutting-edge. Talent in these technologies is scarce in the market, and businesses have been successful by working with trusted partners who can provide talent along with thought leadership and capital.

Figure 2 - Data Monetization Framework
Execution of a successful data monetization initiative rests heavily on choosing a partner with a history of success, a toolkit for rapid development and deployment, and talent.


The Technologies

At the heart of the solution are four technologies that are at the forefront of the new technical revolution and fit perfectly with the outcomes we seek.

Data Lake on Hadoop stack

So what is Hadoop? There are many definitions, but a very simple one is that Hadoop is a free, Java-based data management framework from Apache that supports the processing and computation of large data sets in a distributed computing environment. It allows the capture, processing and sharing of data in any format and at any scale.

In the last few years, the Hadoop stack has been used to build a data ecosystem called a data lake. A data lake is a storage repository that holds a vast amount of raw data in its native format, in a flat architecture [usually linked to Hadoop-related object storage], until it is needed. A hierarchical data warehouse, by contrast, stores data in files or folders.

Each data element in a lake is assigned a unique
identifier and tagged with a set of extended
metadata tags. When a business question arises,
the data lake can be queried for relevant data,
and that smaller set of data can then be analyzed
to help answer the questions.
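The tagging-and-querying idea can be sketched in a few lines. The in-memory catalog below is a hypothetical stand-in for a data lake's metadata service (names such as `DataLakeCatalog` and the tag keys are illustrative, not a Hadoop API): each element is stored as-is, receives a UUID, and carries free-form tags that a query filters on.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataElement:
    payload: bytes          # raw data kept in its native format
    tags: dict              # extended metadata tags
    element_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier


class DataLakeCatalog:
    """Hypothetical in-memory stand-in for a data lake's metadata catalog."""

    def __init__(self):
        self._elements = {}

    def ingest(self, payload: bytes, **tags) -> str:
        # No schema is applied: the payload is stored as-is and only tagged.
        element = DataElement(payload=payload, tags=tags)
        self._elements[element.element_id] = element
        return element.element_id

    def query(self, **criteria) -> list:
        # Return the "smaller set of data" whose tags match every criterion.
        return [
            e for e in self._elements.values()
            if all(e.tags.get(key) == value for key, value in criteria.items())
        ]


lake = DataLakeCatalog()
lake.ingest(b'{"temp": 91}', source="thermostat", region="EU", fmt="json")
lake.ingest(b"id,amount\n1,20", source="pos", region="EU", fmt="csv")
lake.ingest(b'{"temp": 72}', source="thermostat", region="US", fmt="json")

eu_sensor_data = lake.query(source="thermostat", region="EU")
print(len(eu_sensor_data))  # 1
```

A real catalog would persist the tags separately from the stored objects, but the shape of the interaction (tag on ingest, filter on question) is the same.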

Figure 3 - The Modern Data Monetization Stack
Figure 4 - Hadoop 2.0 Stack [courtesy: admin-magazine.com]

In terms of processing, a data lake is schema-on-read: just load the data “as-is” and apply your own lens to the data when you read it back out. For decades, the database world has been oriented towards the schema-on-write approach, where schemas are defined up front, data is written into the defined schemas, and then read back from them. Hence the widespread use of the term “schema-less” for data lakes.
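The contrast between the two approaches can be shown with a minimal sketch, assuming JSON records of varying shape: a schema-on-write loader rejects anything that does not fit a predefined schema, while a schema-on-read “lens” keeps every record and interprets it only when it is read. The record layout and the Celsius-conversion lens are hypothetical examples.

```python
import json

# Raw records as they arrive: same source, inconsistent shapes and types.
RAW_RECORDS = [
    '{"device": "d1", "temp": "91", "unit": "F"}',
    '{"device": "d2", "temp": 22.5, "unit": "C", "humidity": 40}',
    '{"device": "d3"}',
    '{"device": "d4", "temp": 19.0, "unit": "C"}',
]

# Schema-on-write: the schema is defined up front and enforced at load time.
WRITE_SCHEMA = {"device": str, "temp": float, "unit": str}


def load_schema_on_write(lines):
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        fits = set(record) == set(WRITE_SCHEMA) and all(
            isinstance(record[col], typ) for col, typ in WRITE_SCHEMA.items()
        )
        if fits:
            table.append(record)
        else:
            rejected.append(line)  # never makes it into the table
    return table, rejected


# Schema-on-read: keep everything as-is; this lens decides meaning at read time.
def read_temps_celsius(lines):
    for line in lines:
        record = json.loads(line)
        if "temp" not in record:
            continue  # this particular lens skips records it cannot interpret
        temp = float(record["temp"])
        if record.get("unit") == "F":
            temp = (temp - 32) * 5 / 9
        yield record["device"], round(temp, 1)


table, rejected = load_schema_on_write(RAW_RECORDS)
print(len(table), len(rejected))             # 1 3
print(list(read_temps_celsius(RAW_RECORDS)))  # [('d1', 32.8), ('d2', 22.5), ('d4', 19.0)]
```

The same raw records could be read through a different lens later (for example, one that uses the humidity field) without reloading anything, which is the practical appeal of schema-on-read.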

Though the workings of a Data Lake and a Data
Warehouse have many similarities they are not
the same in terms of functions and capabilities.

Choosing a data warehouse or a data lake architecture leads to different approaches to data analysis and usage. Which one to use, and when, depends on many factors that are outside the scope of this paper. In short, though, it is best to use multiple data storage technologies, chosen based on the way data is used by individual applications or components of the solution, all coexisting synergistically.

One important thing to note is that the data stack will vary depending on the sector for which a data monetization solution is being built. For example, if the solution is for the industrial sector, where the predominant sources are IoT devices producing bi-temporal or spatio-temporal data, we will likely rely more on an HBase-driven solution.
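To make “HBase-driven” more concrete for IoT time-series data, the sketch below shows one common row-key design: a short hash prefix to spread writes across regions, then the device ID, then a reversed timestamp so that a device's newest readings sort first. The bucket count, separator and key layout are illustrative assumptions, not a prescribed schema, and bi-temporal versioning is not modeled.

```python
import hashlib

MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, as used in the usual reverse-timestamp idiom
N_BUCKETS = 16        # hypothetical number of pre-split regions


def row_key(device_id: str, epoch_millis: int) -> bytes:
    """Build an HBase-style row key: <bucket>|<device_id>|<reversed timestamp>."""
    # Deterministic hash prefix: one device always maps to the same bucket, while
    # different devices spread across buckets instead of hot-spotting one region.
    bucket = int(hashlib.md5(device_id.encode()).hexdigest(), 16) % N_BUCKETS
    # Reversing the timestamp makes newer readings sort before older ones.
    reversed_ts = MAX_LONG - epoch_millis
    # Zero-padding to 19 digits keeps byte order consistent with numeric order.
    return f"{bucket:02d}|{device_id}|{reversed_ts:019d}".encode()


keys = [row_key("sensor-42", ts) for ts in (1_515_000_000_000, 1_515_000_060_000, 1_515_000_120_000)]
newest_first = sorted(keys)  # HBase keeps rows sorted by key bytes
```

Because all of one device's rows share a prefix, a prefix scan returns its readings newest first, which suits “latest value” queries on sensor data.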

A data lake is the ideal solution for data monetization, primarily because it supports the following usage patterns:

1. Dirty Operational Data Store: Raw data of any size and form, of limited consistency and cleanliness, is accessed for operational use where “good enough” is acceptable. Constructs that support centralized data landing, processing, archival and other operational uses are core to a data lake.

2. Bulk Data operations & Extreme ETL: Batch
and real-time operations on data at massive
scale are conducted using parallel processing
techniques. Making operations faster and
cheaper with massive scale bulk data
movements and rationalized ETL/ELT is
important to a data monetization solution
due to the volume, different formats and
frequency of data.

3. Rapid Analytics Generation: Rapidly arriving and changing data can be processed in parallel using complex event processing or more sophisticated stream filtering and mining techniques. A data lake allows iterating over large data sets, looking for patterns and insights that reveal new ways to predict future trends, which are a true value-add to a data monetization portfolio. Additionally, it enables augmenting, re-engineering or re-purposing existing analytical capabilities with new ones.

Figure 5 - How Data Lakes are supposed to work

4. Coexistence with the “old”: Introducing Hadoop and ancillary technologies alongside existing information management (IM) investments bolsters capabilities, and the end solution is a composite, end-to-end rapid data and analytics fabric. It means picking and integrating the right tools for the right use cases.

5. One Logical System: Data monetization
solutions are local in flavor but global in
nature. Therefore, the solution needs to be
one logical ecosystem for the persistence,
processing, provenance and provisioning of
data and analytics, although partitioned physically for different geographies, markets or LOBs. Data lakes can be designed with the above principles as either an “on-premises” or a “cloud” solution.
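As a small illustration of the stream filtering mentioned under Rapid Analytics Generation, the sketch below slides a window over a stream of readings and emits an alert whenever the rolling average crosses a threshold. It is a toy, single-process example with a hypothetical window size and threshold; at data lake scale, the same logic would run on a distributed stream processor.

```python
from collections import deque


def sliding_window_alerts(readings, window_size=3, threshold=90.0):
    """Yield (index, rolling_mean) whenever the mean of the last
    `window_size` readings exceeds `threshold`."""
    window = deque(maxlen=window_size)  # the oldest reading drops out automatically
    for i, value in enumerate(readings):
        window.append(value)
        if len(window) == window_size:
            mean = sum(window) / window_size
            if mean > threshold:
                yield i, round(mean, 2)


readings = [85, 88, 92, 95, 97, 80, 70]  # hypothetical temperature stream
print(list(sliding_window_alerts(readings)))  # [(3, 91.67), (4, 94.67), (5, 90.67)]
```

Because it is a generator, the filter processes one reading at a time and never needs the whole stream in memory.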

This brings us to the next topic: cloud computing.

Note that a data lake does not necessarily have to be in the cloud. It can be on-premises, if that is the preference.

Cloud Computing

Why is it preferable to have a Data Lake in the
cloud?

1. Cloud Economics: For starters, data monetization is a subscription-based model, and so is cloud computing, so in a way it is a match made in heaven: you pay for what you use. Creating data assets incurs costs for raw data, compute and storage, all of which are native to the cloud. Some cost more and some less; some are in higher demand than others; and there are peak and off-peak times. A cloud computing model handles all these variations well, in contrast to the traditional capital-intensive model of enterprise computing, so it makes sense to leverage the economics of the cloud model.

2. Data Fabric: As mentioned earlier, a large enterprise data lake can be spread across multiple cloud providers, partitioned by geography, market and LOB. To create this “one logical” data lake, the cloud computing model provides the tools and technologies needed to build a data fabric that extends beyond the enterprise, with multiple cloud providers tied together by a management plane/layer that integrates all the individual physical data lakes. This also provides a seamless view of data assets and optimal use of multiple storage and compute options across private and public clouds.

Figure 6 - Data Lake building blocks

3. Hyperscale: Data monetization solutions need to hyperscale, meaning that (a) computing resources and configurability scale with the demand placed on them and (b) computing capability can be accessed from anywhere in the world with similar latency. Cloud computing is built around these principles. The global scale of the public cloud provides a number of new capabilities, such as geo-distribution of data, cross-region failover, system backup, network maintenance, patches and software upgrades, to name a few.

4. As-A-Service: Because of this global scale, computing can be provided “as-a-service”, meaning that the cloud offers a set of capabilities that enterprises can rent, use and expose on demand. This should be a fundamental design point, as it addresses a core requirement of a data monetization solution. Moreover, the technologies that enable “as-a-service”, such as APIs, microservices and containerization, are native to the cloud model.
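The pay-for-what-you-use economics described under Cloud Economics can be sketched as a simple cost model for producing a data asset. The unit prices below are hypothetical placeholders, not any provider's actual rates; the point is only that raw data, compute and storage are metered separately, and that shifting compute to off-peak hours changes the total.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    # Hypothetical pay-per-use prices (USD); real rates vary by provider and region.
    raw_data_per_gb: float = 0.02
    compute_per_hour_peak: float = 0.40
    compute_per_hour_offpeak: float = 0.25
    storage_per_gb_month: float = 0.023


def data_asset_cost(prices, raw_gb, peak_hours, offpeak_hours, stored_gb, months):
    """Itemized cost of producing one data asset under a metered model."""
    raw = raw_gb * prices.raw_data_per_gb
    compute = (peak_hours * prices.compute_per_hour_peak
               + offpeak_hours * prices.compute_per_hour_offpeak)
    storage = stored_gb * prices.storage_per_gb_month * months
    return {
        "raw_data": round(raw, 2),
        "compute": round(compute, 2),
        "storage": round(storage, 2),
        "total": round(raw + compute + storage, 2),
    }


prices = UnitPrices()
# The same 50 compute hours, scheduled mostly off-peak vs. entirely at peak.
mostly_offpeak = data_asset_cost(prices, raw_gb=500, peak_hours=10, offpeak_hours=40,
                                 stored_gb=200, months=3)
all_peak = data_asset_cost(prices, raw_gb=500, peak_hours=50, offpeak_hours=0,
                           stored_gb=200, months=3)
print(mostly_offpeak["total"], all_peak["total"])  # 37.8 43.8
```

An itemized breakdown like this is also what a pricing step would need later, since the cost of producing a data good feeds directly into its rate plan.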

Data monetization “As-a-Service” requires
controls and trust. Blockchain is a
technology that fits this bill.

Blockchain

Simply stated, blockchain is a distributed database technology for record-keeping. Some call it a distributed ledger, but a ledger is nothing but a database. It stores information about digital events in a way that makes them practically impossible to modify undetected, and it shares records peer-to-peer across all nodes within its network.
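The property that makes recorded events hard to modify can be illustrated with a hash chain, the core data structure behind a blockchain. This minimal sketch runs on a single node and deliberately omits peer-to-peer replication and consensus. Each block stores the SHA-256 hash of its predecessor, so altering any earlier event breaks verification. The event fields are hypothetical.

```python
import hashlib
import json
import time

GENESIS_PREV = "0" * 64


def block_hash(index, timestamp, event, prev_hash):
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    body = json.dumps(
        {"index": index, "timestamp": timestamp, "event": event, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode()).hexdigest()


class HashChainLedger:
    """Single-node, append-only hash chain (no peers, no consensus)."""

    def __init__(self):
        self.blocks = []
        self.append({"type": "genesis"})

    def append(self, event):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_PREV
        index, ts = len(self.blocks), time.time()
        event = dict(event)  # keep a copy, independent of the caller's dict
        self.blocks.append({
            "index": index, "timestamp": ts, "event": event, "prev_hash": prev_hash,
            "hash": block_hash(index, ts, event, prev_hash),
        })
        return self.blocks[-1]["hash"]

    def verify(self):
        """Recompute every hash and link; any edit to a past event breaks the chain."""
        for i, block in enumerate(self.blocks):
            expected_prev = self.blocks[i - 1]["hash"] if i else GENESIS_PREV
            recomputed = block_hash(block["index"], block["timestamp"],
                                    block["event"], block["prev_hash"])
            if block["prev_hash"] != expected_prev or block["hash"] != recomputed:
                return False
        return True


ledger = HashChainLedger()
ledger.append({"type": "register_device", "device": "sensor-42"})
ledger.append({"type": "publish", "device": "sensor-42", "bytes": 2048})
print(ledger.verify())  # True

ledger.blocks[1]["event"]["device"] = "sensor-99"  # tamper with a recorded event
print(ledger.verify())  # False
```

In an actual blockchain, many nodes hold copies of the chain and agree on new blocks through a consensus protocol, which is what prevents a single party from simply rewriting the whole chain.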

1. Control and Trust: Blockchain becomes interesting in a data monetization solution because it can instill controls and trust. Take the example of data monetization in the telecom industry: there could be millions of IoT devices sending out data every day that need to be registered, trusted and authenticated, and new devices could be added or replaced on an ongoing basis. This process must be automated, as managing it manually is not humanly possible. Another scenario is where trust needs to be established between the data monetization firm and counterparties without the need for a central authority or steward to arbitrate transactions. Blockchain is the perfect candidate, as these features are built in. In other words, it removes friction in three key areas of data monetization: control, trust and value.

2. Central Registry: A data monetization solution will need a central repository where all relevant information, such as registration, authentication, entitlement and audit records, is captured and stored. Additionally, metadata about data being published and subscribed to, cost of assets, compute and storage costs, billing, tax and other data points will also have to be stored to implement a comprehensive framework that addresses finance, accounting, regulatory and tax needs. Blockchain is a good solution in this regard.

3. Data Goods & Rate Plans: Creating data products involves crunching raw data and packaging it as raw feeds, analytics or insights with the help of subject matter experts such as business analysts, data scientists, developers and admins. To sell these goods profitably, with rate plans for different categories of subscribers, it is important to know the cost of producing them. It would also be valuable if “dynamic” rate plans were created automatically by the system, taking into account all the factors that contributed to the good(s) consumers want to buy. These can be implemented in blockchain using smart contracts.
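A “dynamic” rate plan of the kind described in point 3 could, in its simplest form, be a cost-plus formula. The sketch below is plain Python rather than smart-contract code, and the tier names, discounts, margin and demand factor are all hypothetical; it only shows how production cost, expected subscribers and subscriber category might combine into a price.

```python
# Hypothetical subscriber categories and their discounts.
TIER_DISCOUNTS = {"basic": 0.00, "professional": 0.10, "enterprise": 0.20}


def monthly_price(production_cost, expected_subscribers, margin, tier, demand_factor=1.0):
    """Cost-plus price per subscriber per month.

    production_cost:      monthly cost of producing the data good
    expected_subscribers: number of subscribers the cost is spread over
    margin:               target markup over unit cost, e.g. 0.5 for 50%
    demand_factor:        >1 raises the price of high-demand goods, <1 lowers it
    """
    if expected_subscribers <= 0:
        raise ValueError("expected_subscribers must be positive")
    unit_cost = production_cost / expected_subscribers
    list_price = unit_cost * (1 + margin) * demand_factor
    return round(list_price * (1 - TIER_DISCOUNTS[tier]), 2)


for tier in TIER_DISCOUNTS:
    print(tier, monthly_price(1200.0, 40, margin=0.5, tier=tier))
# basic 45.0
# professional 40.5
# enterprise 36.0
```

Whether this logic lives in a smart contract or, as discussed below, in a stored procedure, the inputs it needs (cost, demand, subscriber category) must be captured by the registry.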

I mentioned earlier that blockchain is a distributed database, so one might ask whether a relational database is an option. The answer is yes, but it depends.

While blockchain and relational databases are both useful tools for data monetization, each technology excels in different areas. Blockchain has a decisive advantage when it comes to providing a robust, fault-tolerant way to store critical data. Relational databases seem to have a decisive advantage when it comes to performance. It is not clear that the gains from disintermediation in blockchain, often cited as a key advantage, will ever be realized once the costs to support and maintain a blockchain-based application are considered. And “smart contracts” exist in the world of relational databases, where they are known as stored procedures. Broadly, anything that can be accomplished with one technology can also be accomplished with the other. The right question to ask is whether it is a fit for the business.

Although blockchain has a limited but important role in data monetization, other areas of the business can leverage it as well to build their own solutions.

Cognitive Computing

When Big Data technology and the changing economics of cloud computing merge with the need for business and industry to be smarter, we have the beginning of a new paradigm that some call machine learning, cognitive computing, artificial intelligence, knowledge management or learning machines. IBM’s Watson is a good example. A cognitive system has three fundamental principles:

Learn: A cognitive system learns by leveraging
data to make inferences about a domain, a topic,
a person, or an issue based on training and
observations from all varieties of data. This
internal store (universe) of data is called a corpus
and is used to manage codified knowledge. The
data required to establish the domain for the
system is included in the corpus. One important
thing to note here is that, because a data lake is a large repository of raw data, it can easily be extended or converted to support corpora.

Model: To learn, the system creates models or
representations of a domain (which includes
internal and potentially external data) and
assumptions that dictate what learning
algorithms are used. Understanding the context
of how the data fits into the model is key to a
cognitive system. The model refers to the corpus and the set of assumptions and algorithms that generate and score hypotheses to answer questions, solve problems and discover new insights.

Generate hypotheses: A cognitive system is
probabilistic and generates hypotheses with
associated confidence levels. A cognitive system
uses the data to train, test, or score a hypothesis.
A hypothesis is a candidate explanation for some
of the data already understood. It assumes that
there is not a single correct answer. The most
appropriate answer is based on the data itself.
Sometimes hypotheses are also referred to as insights.
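The idea of competing hypotheses with associated confidence levels can be illustrated by normalizing raw evidence scores with a softmax, one common choice among several. The hypotheses and scores below are hypothetical; in a real cognitive system they would come from trained models scoring evidence found in the corpus.

```python
import math


def rank_hypotheses(evidence_scores):
    """Convert raw evidence scores into confidences that sum to 1 (softmax), highest first."""
    top = max(evidence_scores.values())  # subtracting the max keeps exp() numerically stable
    weights = {h: math.exp(s - top) for h, s in evidence_scores.items()}
    total = sum(weights.values())
    return sorted(((h, w / total) for h, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical competing explanations for a spike in a retail sales feed.
scores = {"seasonal demand": 2.0, "competitor price change": 1.0, "data quality issue": 0.1}
for hypothesis, confidence in rank_hypotheses(scores):
    print(f"{hypothesis}: {confidence:.2f}")
```

No single answer is declared correct: every hypothesis keeps a non-zero confidence, and new evidence simply shifts the scores.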

In data monetization, cognitive techniques and algorithms can be used to identify patterns in large, complex data sets and create the next generation of monetizable assets. They can also be used in operational areas such as data quality, fraud management, operations, data and process workflow and choreography, and market analytics. In data monetization, the following capabilities are desirable, and cognitive capabilities support them well:

• Finding unknown patterns
• Generating and evaluating conflicting hypotheses
• Reporting on findings and conclusions
• Using a variety of predictive and prescriptive algorithms and statistical techniques
• Searching and exploring
• Learning continuously

In data monetization, it is important to support open-ended interaction paths to search, explore and discover insights without predefined structures and categories of data, and cognitive computing can enable this.

Also, in data discovery it is very difficult to ascertain up front all the intelligence and insights one could derive from the variety of new sources that keep cropping up in data monetization. The ability to navigate from a starting question or data point in different directions, in whatever ad-hoc way the analyst's train of thought demands, is essential for real data discovery, and cognitive computing can power it. Data scientists, analysts and researchers need these capabilities when designing data assets.

The traditional approach of manually curated data lakes, which provide a limited view of the data and are designed to answer only questions identified at design time, no longer makes sense for data discovery in today’s big data world. A cognitive approach is required to provide an unlimited window onto the data, so that anyone can run ad-hoc queries and perform cross-source navigation and analysis on the fly.

Successful data lake implementations enabled
and/or powered by cognitive will respond better
to queries in real-time and provide users an easy
and uniform access interface to the disparate
sources of data.

Data lakes can also benefit from cognitive smart agents that deliver the following:

• Continuous Machine Learning (CML) for metadata and corpus building
• CML for generating insights from analytics
• Automatic creation of data lakes using proprietary algorithms and machine learning techniques that identify and index entities and relations across disparate data sources
• A natural language question-answering interface to search, explore, discover, analyze and visualize data from these data lakes

Figure 7 - Cognitive Building Blocks

Hence, it makes perfect sense to marry data lakes and cognitive computing into a truly smart data monetization solution, because:

• All cognitive building blocks can be easily deployed and integrated in data lakes.
• Cognitive capabilities can be built into data lakes gradually.

In summary, a modern data monetization solution is a combination of four leading-edge core technologies that provide a great opportunity to leapfrog existing barriers.

Business Rules Engine

Another important technology is a business rules engine. In a data monetization solution, there will be many publishers and subscribers of data and, depending on their needs, rules will have to be implemented to work with the data. Examples:
• process data in real time when it falls within a specified threshold, e.g. temperature readings between 80 and 100 from a Nest device
• send data from a region to a specified target, e.g. a landing zone or table
• notify consumers about the availability of specific data
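The three example rules above can be expressed as condition/action pairs. The sketch below is a deliberately naive, hypothetical engine: it evaluates every rule against each event and returns an audit trail of the rules that fired, which covers the “what” and “why” of each decision. Production engines such as Drools use Rete-family matching algorithms rather than a linear scan, and the rule names and event fields here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # "what" triggers the rule
    action: Callable[[dict], str]      # "how" to handle it; returns a description


def run_rules(rules, event):
    """Evaluate every rule against the event; the returned trail records why each action ran."""
    return [(rule.name, rule.action(event)) for rule in rules if rule.condition(event)]


RULES = [
    Rule("realtime-nest-temperature",
         lambda e: e.get("source") == "nest" and 80 <= e.get("temperature", float("-inf")) <= 100,
         lambda e: "route to real-time stream"),
    Rule("eu-to-landing-zone",
         lambda e: e.get("region") == "EU",
         lambda e: "write to EU landing zone"),
    Rule("notify-subscribers",
         lambda e: e.get("source") == "nest",
         lambda e: f"notify subscribers of new {e['source']} data"),
]

print(run_rules(RULES, {"source": "nest", "region": "EU", "temperature": 85}))
print(run_rules(RULES, {"source": "nest", "region": "US", "temperature": 72}))
```

Keeping the rules in a list that is separate from the event data is a small-scale version of the separation between business logic and data described below.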

A business rules engine provides a smart way to capture “what to do”, “how to do it” and “why it was done” with incoming and outgoing data. This executable repository of knowledge becomes a single point of truth for business policy and provides separation between business logic and data, a core motivation in data monetization. This separation also lends itself to speed and scale, as algorithms such as Drools’ ReteOO and Leaps provide very efficient ways of matching rule patterns against domain object data sets. Other tools, from IDEs to audit and debugging tools, can also be integrated with the business rules engine for extended features.

Most Big Data solutions can easily be extended to adopt cognitive and blockchain capabilities and be deployed in the cloud. Therefore, it is a good idea to look at them holistically in data monetization.
Figure 8 - Data Lake + Cognitive


The Solution

Data monetization is a complex topic, yet a practical and necessary initiative. With the right foundation, design principles and patterns, this aspiration is achievable.

The right strategy, roadmap, implementation
plan and most importantly the rigor to stay on
course will provide the sought-after dividends.

With an agile strategy and roadmap aimed at delivering wins and benefits in short increments, the journey will take anywhere from 2 to 3 years to reach a fully evolved operational solution.

Publisher Registration:
Using blockchain as the underlying technology,
publishers (IoT devices, mobile phones, IP etc.)
will be automatically registered and
authenticated with the registry with proper
entitlements. It will also capture the kind and
format of data, data volume, data frequency,
data domain(s) etc. All this information will later
be used to automatically compute cost, revenue,
performance and other key metrics.
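The publisher registry can be sketched as a keyed store of registration records. In the proposed solution these records would live on a blockchain; the in-memory class below is a hypothetical stand-in (its names and fields are illustrative) that only shows what might be captured at registration and how that information feeds later metrics such as daily data volume.

```python
from dataclasses import dataclass, field


@dataclass
class PublisherRecord:
    publisher_id: str
    kind: str                 # e.g. "iot_device", "mobile_phone"
    data_format: str          # e.g. "json", "avro"
    messages_per_day: int
    avg_payload_kb: float
    domains: list = field(default_factory=list)
    entitlements: set = field(default_factory=set)


class PublisherRegistry:
    """In-memory stand-in for the blockchain-backed registry (hypothetical API)."""

    def __init__(self):
        self._records = {}

    def register(self, record: PublisherRecord) -> None:
        if record.publisher_id in self._records:
            raise ValueError(f"{record.publisher_id} is already registered")
        self._records[record.publisher_id] = record

    def is_entitled(self, publisher_id: str, action: str) -> bool:
        record = self._records.get(publisher_id)
        return record is not None and action in record.entitlements

    def daily_volume_mb(self, publisher_id: str) -> float:
        # Registration metadata later feeds cost, revenue and performance metrics.
        record = self._records[publisher_id]
        return record.messages_per_day * record.avg_payload_kb / 1024


registry = PublisherRegistry()
registry.register(PublisherRecord(
    "sensor-42", "iot_device", "json", messages_per_day=1440, avg_payload_kb=2.0,
    domains=["energy"], entitlements={"publish:telemetry"},
))
print(registry.is_entitled("sensor-42", "publish:telemetry"))  # True
print(registry.daily_volume_mb("sensor-42"))                  # 2.8125
```

Rejecting duplicate registrations is the minimum integrity check; an on-ledger version would additionally make each registration a tamper-evident, auditable event.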

Data Ingestion:
A. Incoming data from publishers should preferably flow through one or more fully managed gateways that:
   1. rely on a lightweight messaging protocol such as MQTT
   2. are secure, authenticating and authorizing incoming data (e.g. over TLS)
   3. are based on a pub/sub model, which provides:
      1. loose coupling, which gives the flexibility to modify the publisher and the structure of the published data
      2. topic-based or content-based delivery, so that subscribers, as well as data lake zones, data tables, Spark streams etc., receive only the subset of the total published messages they need
B. The gateway provides “control information”, a.k.a. metadata, that is useful in downstream activities such as access levels, pricing and audit. This metadata will be stored in a “registry” that runs on blockchain but could also be implemented as tables in Hive, HBase or any relational database.
C. Incoming data should preferably pass through a rules engine for proper filtering, routing and orchestration. The rules engine:
• will accept SQL-like commands for selection and filtering before data reaches its target
• will route data to the proper targets in the Big Data ecosystem for further processing
• will personalize, contextualize, prioritize and orchestrate data for subscribers of packaged information

Figure 9 - Data Monetization Architecture Topology
Data Lake on Cloud:
An open and flexible modern data architecture is
central to a scale-out data monetization
solution. A Data Lake will provide:

• A highly scalable and efficient
infrastructure that lowers costs and
easily keeps pace with growing data
volumes
• Powerful yet easy-to-use computing
platform and analytic tools to unlock the
business value of information that lives
in the data
• Enterprise class data protection to
maximize availability and robust security
options to meet business governance
requirements

A scale-out data lake will also enable businesses to lower costs by rationalizing existing systems and resolving design issues.

However, designing data lakes and integrating them with the broader data ecosystem can be time-consuming and complicated. The governance, supporting tools and products, talent and capabilities needed to deploy data lakes and realize significant business benefits add to the complexity.

Businesses should apply an agile approach to
their design and rollout of data lakes—piloting a
range of technologies and management
approaches and testing and refining them before
getting to optimal processes for data
monetization.

There are four stages of data lake development when building out the solution, and the build-out can follow an agile method.

Ingesting Raw Data: The data lake is built as a low-cost, scalable environment purely for capturing data, allowing raw data to be stored indefinitely before it is prepared for use in computing environments. To make the solution work and avoid a “data swamp”, strong governance, including rigorous tagging and classification of data, is required during this early phase. The data is first loaded into a transient loading zone, where basic data quality checks are performed, and then moves to the raw data zone. In the raw zone, sensitive and vulnerable data are masked. This zone is also used for initial exploration and discovery by analysts and scientists.
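The loading-zone and raw-zone steps can be sketched as a small pipeline: records that fail basic quality checks are quarantined, and sensitive fields are masked before data lands in the raw zone. The required fields, the sensitive-field list and the masking key are hypothetical. Masking here uses a keyed hash (HMAC-SHA256), so masked values stay consistent, and therefore joinable, without being readable.

```python
import hashlib
import hmac

# Hypothetical governance policies; a real deployment would load these from metadata.
REQUIRED_FIELDS = {"customer_id", "email", "amount"}
SENSITIVE_FIELDS = {"email"}
MASKING_KEY = b"replace-with-a-managed-secret"  # placeholder; never hard-code real keys


def passes_quality_checks(record: dict) -> bool:
    """Basic checks performed in the transient loading zone."""
    return REQUIRED_FIELDS.issubset(record) and isinstance(record["amount"], (int, float))


def mask(record: dict) -> dict:
    """Replace sensitive values with a keyed hash: consistent, but not readable."""
    masked = dict(record)
    for name in SENSITIVE_FIELDS.intersection(record):
        digest = hmac.new(MASKING_KEY, str(record[name]).encode(), hashlib.sha256)
        masked[name] = digest.hexdigest()[:16]
    return masked


def land(records):
    """Loading zone -> raw zone: quarantine failures, mask the rest."""
    raw_zone, quarantine = [], []
    for record in records:
        if passes_quality_checks(record):
            raw_zone.append(mask(record))
        else:
            quarantine.append(record)
    return raw_zone, quarantine


raw_zone, quarantine = land([
    {"customer_id": 1, "email": "ann@example.com", "amount": 20.0},
    {"customer_id": 2, "email": "bob@example.com"},  # missing amount
])
print(len(raw_zone), len(quarantine))  # 1 1
```

Quarantining rather than silently dropping failed records keeps them available for the governance review that helps prevent a data swamp.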

Data Readiness: At this next level, organizations may start to use the data lake more actively as a platform for experimentation, broadening their understanding of what it will take to create monetizable assets. The trusted zone contains master data and reference data that have been cleansed and validated and are kept up to date using change data capture (CDC) mechanisms. In this stage, data can be further refined according to line-of-business (LOB) and business-specific needs and placed in the refined zone. Data in this stage are standardized and conformed. While the raw zone in stage 1 is the source of truth with history, the trusted zone in stage 2 can serve as the single version of the truth and as an authoritative data source for one or more domains.

Figure 10- Data Ingestion
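The trusted zone is described as kept current through change data capture (CDC). As an illustration only, applying an ordered stream of CDC events to a keyed master-data table could look like this; the `(operation, key, row)` event shape and the operation names are assumptions, since each CDC tool defines its own event format.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change event: (operation, key, row), operation in {"insert", "update", "delete"}.
ChangeEvent = Tuple[str, str, dict]


def apply_changes(master: Dict[str, dict], events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Apply an ordered stream of CDC events to a keyed master-data table.

    Returns a new dict so the previous version of the trusted table is left untouched.
    """
    current = dict(master)
    for op, key, row in events:
        if op in ("insert", "update"):
            # Upsert: merge the changed columns over any existing row for this key.
            current[key] = {**current.get(key, {}), **row}
        elif op == "delete":
            current.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return current
```

Returning a new table rather than mutating in place keeps the prior version available, which fits the distinction above between the historical raw zone and the current trusted zone.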

Most importantly, underlying all of this must be
an integrated framework that manages,
monitors, and governs the metadata, the data
quality, the data catalog, and security.
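The rigorous tagging and classification called for in the ingestion stage, and the data catalog this governance framework manages, can be illustrated with a catalog that refuses untagged data sets. The required tags and classification levels below are hypothetical policy choices, not a standard.

```python
from dataclasses import dataclass, field

# Hypothetical governance policy: every data set must carry these metadata tags.
REQUIRED_TAGS = {"owner", "domain", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


@dataclass
class DataCatalog:
    entries: dict = field(default_factory=dict)

    def register(self, dataset: str, tags: dict) -> None:
        """Register a data set only if it is fully tagged and validly classified."""
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            raise ValueError(f"{dataset}: missing tags {sorted(missing)}")
        if tags["classification"] not in ALLOWED_CLASSIFICATIONS:
            raise ValueError(f"{dataset}: invalid classification {tags['classification']!r}")
        self.entries[dataset] = dict(tags)

    def find(self, **criteria) -> list:
        """Return the names of data sets whose tags match all given criteria."""
        return sorted(
            name for name, tags in self.entries.items()
            if all(tags.get(k) == v for k, v in criteria.items())
        )
```

Rejecting untagged data at registration time, rather than cleaning it up later, is one way to keep the lake from drifting into a data swamp.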

Assets’ Creation: From the trusted and refined zones, data moves into the sandbox zone, where analysts and data scientists have easy, rapid access to data and can focus on running experiments, analyzing data and working on models, analytics and insights. This zone is also used for data wrangling, discovery and exploratory analysis. A range of open-source and commercial tools is deployed in the sandbox zone so that analysts can work with unaltered data to build prototypes for analytics programs. Once assets have been collaborated on, tested and validated, they are moved to the production zone for general availability (GA).
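The move from the sandbox zone to the production zone is essentially a gate: an asset is promoted only after it has been collaborated on, tested and validated. A minimal sketch of that gate, in which the `Asset` class and the two-reviewer threshold are hypothetical:

```python
from enum import Enum


class Zone(Enum):
    SANDBOX = "sandbox"
    PRODUCTION = "production"


class Asset:
    """Hypothetical analytic asset tracked through the sandbox -> production lifecycle."""

    def __init__(self, name: str):
        self.name = name
        self.zone = Zone.SANDBOX
        self.reviewers: set = set()
        self.tests_passed = False
        self.validated = False

    def record_review(self, reviewer: str) -> None:
        self.reviewers.add(reviewer)

    def promote(self, min_reviewers: int = 2) -> None:
        """Move the asset to production (GA) only when every gate is satisfied."""
        gates = {
            "collaborated": len(self.reviewers) >= min_reviewers,
            "tested": self.tests_passed,
            "validated": self.validated,
        }
        failed = [name for name, ok in gates.items() if not ok]
        if failed:
            raise RuntimeError(f"{self.name} not ready for GA: {failed}")
        self.zone = Zone.PRODUCTION
```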

Consumption: Finally, the consumption zone exists for internal users as well as external subscribers. The consumption zone is loosely coupled with the blockchain part of the solution so that all user subscriptions are accounted for, as illustrated in Figure 9.

All metrics and KPIs from these four stages are captured in the monetization registry as data is ingested, processed, integrated, packaged and consumed. This information is critical to the data monetization process.
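Capturing stage metrics in the monetization registry can be pictured as an append-only stream of metric events keyed by asset and stage. In this sketch the registry is an in-memory list rather than a blockchain, and the stage names and metric names are assumptions chosen to mirror the four stages above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical labels for the four data lake stages.
STAGES = ("ingest", "readiness", "creation", "consumption")


@dataclass(frozen=True)
class MetricEvent:
    asset_id: str
    stage: str
    metric: str   # e.g. "raw_tb", "compute_hours", "sme_hours"
    value: float
    recorded_at: datetime


class MonetizationRegistry:
    """In-memory stand-in for the monetization registry's metric store."""

    def __init__(self):
        self._events: list = []

    def record(self, asset_id: str, stage: str, metric: str, value: float) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        self._events.append(
            MetricEvent(asset_id, stage, metric, value, datetime.now(timezone.utc))
        )

    def totals(self, asset_id: str) -> dict:
        """Sum each metric for one asset across all stages."""
        sums = defaultdict(float)
        for e in self._events:
            if e.asset_id == asset_id:
                sums[e.metric] += e.value
        return dict(sums)
```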

Data Monetization Process: With the core data plumbing in place, the next step is to put together an approach. The approach has four components:

1. Data and Analytics CoE
2. Portfolio of Assets
3. Rate Plans / Pricing Assets
4. Managing Subscriptions

Data and Analytics CoE: An initiative of this scale and importance is best driven by a Center of Excellence (CoE): an internal strategic team of experts (data scientists, business analysts and business leaders) who ideate and deliver monetizable ideas. The CoE should be supported by a methodology based on agile and design thinking.

A CoE improves the quality, efficiency and success rate of asset creation and builds a portfolio of assets across all lines of business. It also brings greater confidence and consistency to decision-making: what to create or not create, what will sell, what to sell, how to sell, etc.

It also provides a formal organizational structure, enabling the business to strike the right balance between agility and sound management while narrowing the gap between business and IT and improving time-to-market and responsiveness to change.

Figure 11- Stages in Data Lake development

Some of the core responsibilities are:
• Rationalize, create and manage the enterprise-wide portfolio of monetizable assets
• Build and maintain a roadmap that reflects priorities and key objectives
• Measure, benchmark and report on the value of the business model
• Create teams and corresponding role maps, including cross-LOB (xLOB) collaboration
• Review and revise processes and programs on an ongoing basis

Rolling out a CoE can be a monumental task, and many businesses prefer to outsource this function to a trusted partner in the initial phases and gradually bring it in-house.

Portfolio of Assets: It is good practice to have a comprehensive “Business Case Framework” (BCF) that initiates, vets and validates every proposed asset. A BCF can be viewed as a funnel with many sieves, each sieve representing a criterion, e.g., business rationale, funding, cost, risk, market alignment, buyer demographics, time-to-market, priority, shelf-life, etc.
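The funnel-of-sieves idea maps naturally onto an ordered list of predicates: a proposal advances only while it passes each criterion, and a rejected proposal records the sieve it failed. The criteria below come from the list above, but the field names and thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Each "sieve" is a named predicate over a proposal; thresholds are hypothetical.
Sieve = Tuple[str, Callable[[dict], bool]]

SIEVES: List[Sieve] = [
    ("business rationale", lambda p: bool(p.get("rationale"))),
    ("funding", lambda p: p.get("funding_approved", False)),
    ("risk", lambda p: p.get("risk_score", 1.0) <= 0.5),
    ("time-to-market", lambda p: p.get("months_to_market", 99) <= 6),
    ("shelf-life", lambda p: p.get("shelf_life_months", 0) >= 12),
]


def run_funnel(proposals: List[dict], sieves: List[Sieve] = SIEVES) -> Tuple[List[dict], Dict[str, str]]:
    """Pass proposals through each sieve in order; record where rejected ones dropped out."""
    accepted, rejected = [], {}
    for proposal in proposals:
        failed = next((name for name, check in sieves if not check(proposal)), None)
        if failed is None:
            accepted.append(proposal)
        else:
            rejected[proposal["name"]] = failed
    return accepted, rejected
```

Recording the failing sieve gives the audit trail that the BCF's controls and governance role calls for.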

The BCF approach inherently puts in place the controls and governance needed for regulatory, tax, finance and related purposes.

Figure 13 illustrates the typical assets that are monetizable.

Figure 13- Monetizable Assets

A Business Case Framework guarantees proper mechanisms, controls and oversight for a “monetizable” portfolio.

Figure 12- CoE Building Blocks

Rate Plan / Pricing Assets: One of the most critical parts of the solution is an automatic, dynamic way of creating rate plans. Creating, managing and monitoring rate plans manually is out of the question given (a) the ever-growing number of publishers and data sets, (b) constantly changing opex, (c) the need to support subscribers' custom requests, (d) the cost of data, compute, storage and SME time, (e) the continuous roll-out of new assets and (f) varying subscriber profiles.

This is one of the main barriers to implementing a scalable solution.

Figure 14- Representative Auto-Pricing Flow

As stated in the earlier section, the registry continuously captures key metrics and data points through all four stages (Figure 11). These include:

• Publisher and subscriber profiles
• Authentication and entitlements
• Data volumes, formats, frequency
• Value and cost of data per GB/TB
• Storage and compute cost
• Data processing/preparation
• SME hours in creating assets

All these data points can be used to create dynamic rate plans that self-adjust and self-correct when parameters change.

For example, if the registry shows that creating an analytic asset took 50 hours of SME time, 5 TB of raw data, 3 hours of compute, 7 TB of storage and 4 hours of data processing, along with the unit cost of each, the rate card (cost) of this asset can be generated automatically with simple arithmetic.
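The simple arithmetic in the example amounts to multiplying each quantity by its unit cost and summing. The sketch below uses the quantities from the example; the unit costs and the 25% `margin` are hypothetical placeholders for values the registry would supply.

```python
from typing import Dict

# Quantities for one asset, as in the example above.
usage = {
    "sme_hours": 50,
    "raw_data_tb": 5,
    "compute_hours": 3,
    "storage_tb": 7,
    "processing_hours": 4,
}

# Hypothetical unit costs (currency units per hour or per TB).
unit_costs = {
    "sme_hours": 120.0,
    "raw_data_tb": 200.0,
    "compute_hours": 15.0,
    "storage_tb": 25.0,
    "processing_hours": 10.0,
}


def rate_card(usage: Dict[str, float], unit_costs: Dict[str, float], margin: float = 0.25) -> Dict[str, float]:
    """Cost each line item, total it, and apply a hypothetical target margin."""
    missing = usage.keys() - unit_costs.keys()
    if missing:
        raise KeyError(f"no unit cost for {sorted(missing)}")
    lines = {item: qty * unit_costs[item] for item, qty in usage.items()}
    cost = sum(lines.values())
    return {**lines, "total_cost": cost, "price": round(cost * (1 + margin), 2)}
```

Because the rate card is a pure function of registry data, re-running it after any quantity or unit cost changes re-prices the asset, which is the self-adjusting behavior described above. With the placeholder costs shown, the total cost is 7,260 and the price after margin is 9,075.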

This approach provides speed, scale and flexibility in pricing the assets. Because the registry is on a blockchain, we can also be confident that it is tamper-proof.

Managing Assets and Subscriptions: The solution will allow a self-serve model for subscribers without compromising security.

Assets will be exposed and delivered through APIs and as microservices (Figure 9) that can adapt to deliver a holistic, uniform experience to the customer across all business channels. This decouples the business channels from the backend systems of record that serve them.

A monetized asset delivered as a microservice, encapsulating a core business capability and adhering to set design principles and goals, is a true digital asset for the subscriber: it brings value to the business because it can be adapted for use in multiple contexts (business processes, applications, transactions, digital channels, etc.).

Most assets will be specific to a domain, LOB or country, whereas subscribers might request assets that span domains, LOBs, regions or countries. Exposing assets as microservices addresses this limitation.

The assets will be registered in the blockchain-based registry with soft links to the data lake, where the actual assets reside.
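The registry is relied on to keep tamper-proof records. The core property can be illustrated with a hash chain, in which each entry commits to the hash of the previous one, so editing any past entry is detectable. This single-process sketch is tamper-evident only: it omits the replication and consensus that a real blockchain adds, and the `datalake://` soft-link format is hypothetical.

```python
import hashlib
import json
from typing import List


def _digest(entry: dict, prev_hash: str) -> str:
    # Hash the canonical JSON of the entry together with the previous block's hash.
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class HashChainedLedger:
    """Append-only log in which each record commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.blocks: List[dict] = []

    def append(self, entry: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        block = {"entry": dict(entry), "prev": prev, "hash": _digest(entry, prev)}
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash; any edited entry breaks the chain from that point on."""
        prev = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != _digest(block["entry"], prev):
                return False
            prev = block["hash"]
        return True
```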

Subscribers will invoke the microservice assets through an API gateway using HTTP REST calls. The interface is defined and published in RAML (RESTful API Modeling Language), which can describe every resource and operation exposed by the microservice that encapsulates the asset. This information is also important for governance, controls and pricing.
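The gateway's role of routing REST calls to asset microservices and enforcing entitlements can be sketched in-process. This is an illustration under stated assumptions: it does not parse RAML or speak HTTP, the `ApiGateway` class and its route table are hypothetical, and only the status codes (200, 403, 404) follow standard HTTP meanings.

```python
from typing import Callable, Dict, Set, Tuple

Handler = Callable[[dict], dict]


class ApiGateway:
    """Minimal stand-in for an API gateway in front of asset microservices."""

    def __init__(self):
        self.routes: Dict[Tuple[str, str], Tuple[str, Handler]] = {}
        self.entitlements: Dict[str, Set[str]] = {}   # subscriber -> asset ids
        self.usage_log: list = []

    def register(self, method: str, path: str, asset_id: str, handler: Handler) -> None:
        self.routes[(method.upper(), path)] = (asset_id, handler)

    def grant(self, subscriber: str, asset_id: str) -> None:
        self.entitlements.setdefault(subscriber, set()).add(asset_id)

    def handle(self, subscriber: str, method: str, path: str, params: dict) -> Tuple[int, dict]:
        route = self.routes.get((method.upper(), path))
        if route is None:
            return 404, {"error": "no such resource"}
        asset_id, handler = route
        if asset_id not in self.entitlements.get(subscriber, set()):
            return 403, {"error": "not entitled"}
        # Log each successful call so it can later be metered, billed and audited.
        self.usage_log.append((subscriber, asset_id, method.upper(), path))
        return 200, handler(params)
```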

There will be cases where one asset has to collaborate or be mashed up with other assets to satisfy a subscriber's custom request. In these instances, an event-based approach can be adopted, where the asset/microservice subscribes to business domain events that are published to a message broker through an API or protocol such as JMS or AMQP. Adding a message broker to the solution also adds reliability to the subscription/consumption model.
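For the event-based mash-up, an asset subscribes to domain events and combines them. The in-memory broker below only stands in for a real JMS or AMQP broker (it has no persistence, acknowledgements or retries, which is where a real broker's reliability comes from); the topic names and event fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Toy topic-based publish/subscribe broker standing in for a JMS/AMQP broker."""

    def __init__(self):
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: dict) -> None:
        for callback in self._subscribers[topic]:
            callback(event)


class MashupAsset:
    """Combines events from two domain topics into one enriched view for a subscriber."""

    def __init__(self, broker: InMemoryBroker):
        self.customers: Dict[str, dict] = {}
        self.results: List[dict] = []
        broker.subscribe("customer.updated", self.on_customer)
        broker.subscribe("order.placed", self.on_order)

    def on_customer(self, event: dict) -> None:
        self.customers[event["customer_id"]] = event

    def on_order(self, event: dict) -> None:
        # Enrich each order with the latest known customer segment.
        customer = self.customers.get(event["customer_id"], {})
        self.results.append({**event, "segment": customer.get("segment", "unknown")})
```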

Monetized assets will be delivered and
accessible through API invocation calls and
domain event subscriptions.

The microservice- and API-based approach supports self-service and is designed with enough abstraction to hide the underlying data and related processes (which can also be API/microservice based).

Subscribers and counterparties will be registered in the blockchain-based registry, where their profiles will be authenticated and entitlements granted accordingly.

Subscribers will be able to search a catalog of assets by domain, topic or interest and then try them before deciding to buy or subscribe.

Subscriptions will be logged and audited, and the registry will hold the intelligence needed for automatic billing, taxes, reporting, fraud detection and other consumer-facing needs.
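Automatic billing from the audited subscription log reduces to metering calls per subscriber and applying prices and tax. This sketch uses `Decimal` to avoid floating-point rounding in money; the per-call prices, the flat `tax_rate` and the log format are hypothetical, and real tax rules are considerably more involved.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List, Tuple

# (subscriber, asset_id) pairs, as an audited usage log might record them.
UsageLog = List[Tuple[str, str]]


def invoice(log: UsageLog, price_per_call: Dict[str, Decimal], tax_rate: Decimal) -> Dict[str, Dict[str, Decimal]]:
    """Aggregate metered calls per subscriber into subtotal, tax and total."""
    calls = Counter(log)
    bills: Dict[str, Dict[str, Decimal]] = {}
    for (subscriber, asset_id), count in calls.items():
        bill = bills.setdefault(subscriber, {"subtotal": Decimal("0")})
        bill["subtotal"] += price_per_call[asset_id] * count
    cent = Decimal("0.01")
    for bill in bills.values():
        bill["tax"] = (bill["subtotal"] * tax_rate).quantize(cent, rounding=ROUND_HALF_UP)
        bill["total"] = bill["subtotal"] + bill["tax"]
    return bills
```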

The registry will also serve the business's general ledger needs, e.g., invoices, accounts receivable, income statements, profit and loss.

Conclusion

Data monetization is a complex topic, yet a practical and necessary initiative. With the right foundation, design principles and patterns, this aspiration is achievable. The things to look out for are:

• Upper management buy-in
• A solution that can provide scale, speed and flexibility
• The need for automation to remove manual bottlenecks
• A poorly defined portfolio
• Trust in data, processes and assets
• The need for continuous innovation

The right strategy, roadmap, implementation
plan and most importantly the rigor to stay on
course will provide the sought-after dividends.

With an agile strategy and roadmap aimed at delivering wins and benefits in short increments, the journey to a fully operational, mature solution could take two to three years.

A properly executed plan will build an impressive brand for the business, differentiate it from the competition and create a new revenue model.

Finally, data monetization can only be achieved
when businesses have visionaries who recognize
opportunities to derive value from data, and
then effectively seize upon them.

Want to succeed in data monetization? Start with a vision and mission and don’t look back.

About the Author

Prithwi Thakuria is the Global Practice Leader of
IBM’s Big Data Services Practice that focuses on
Big Data, Analytics and Cognitive. He is an
innovative and “Hands-On” international IT
Leader specializing in everything Digital – Big
Data, Analytics, Enterprise Information
Management, Mobility, Cloud, Social Media,
Digital Marketing and Enterprise Architecture.
He conceived, championed, architected and deployed EIM ideas and solutions involving bleeding-edge technologies globally. Prior to joining IBM, Prithwi was with Tata America Intl, PwC, EMC Consulting and State Street Bank & Trust.

He has written several publications and speaks at premier industry events.

Prithwi received a Bachelor of Engineering in Electronics and Communications Engineering from the premier National Institute of Technology under Kashmir University in India.

A smart self-serve subscription solution
will provide the necessary safeguards
and automation.
