The document discusses the issue of "dark data", which is data that accumulates automatically or manually but remains invisible and unanalyzed. Dark data is a major problem as only 28% of stored data provides value, meaning 72% is non-essential. Unstructured data such as documents, presentations and videos constitutes 70-80% of organizational data on average and is a key driver of dark data growth. To address dark data, tools are needed that can search metadata and content, perform analytics and reporting, and facilitate archiving old data to cheaper storage platforms. This will increase visibility, extract insights, and automate retiring inactive data.
We’re in the difficult middle years of the
information age, where a nexus of factors
– cheap storage, rich HD media, ubiquitous
connectivity and ever more sophisticated SaaS
products – is generating more data than we
can affordably store or meaningfully process.
Why are we growing so much?
Data is flooding in from a multitude of sources
– some known and some invisible – which
organizations today have neither the time nor
the resources to effectively manage, let alone
benefit from.
The trouble is, whilst big data and analytics
remain in vogue, neither the volume of
data produced, nor the impulse to store it
all, will change. In the pursuit of business
intelligence, many organizations are hoarding
– often unconsciously – useless data with
the expectation that its potential value will
eventually offset the costs of a bloated and
unnavigable storage environment.
Dark Data
The main culprit behind this trend is something
Gartner has called “dark data” – data which
accumulates through automatic and manual
processes, but which remains invisible to the
business: idle, unanalyzed and without a clear
owner. Because dark data is invisible,
quantifying exactly how much of it
organizations are struggling with is difficult,
but that hasn’t stopped the major analysts
from trying.
First, a 2013 survey conducted by IDG
Research Services found that only 28% of
stored data presents any value to the day-
to-day operations of a business, suggesting a
massive 72% is non-essential.
Second, IDC’s “Top 10 predictions for CMOs in
2014” corroborates these figures, suggesting
that organizations will fail to realize any value
from 80% of the customer data they hold
because of “immature enterprise value chains”.
Just in case you don’t speak analyst, that
means current data management practices
aren’t capable of locating and extracting the
supposedly valuable information hidden
amongst terabytes of collected data.
It’s expensive to maintain that much unused
data, as Gartner rightly points out: “…
organizations that fail to optimize the way they
manage and retain their data will be forced
to deal with constant increases in storage
costs”. But financial cost is only a part of the
reason dark data is so damaging. Perhaps
more importantly, dark data has become so
ubiquitous that it obscures the useful stuff.
It’s not just that organizations don’t have an
adequate tool to sift through the data heap;
it’s that in worshipping at the altar of analytics
prematurely, we are actively hoarding
useless data in the hope of one day extracting
enormous value from it.
As IDC’s CMO of Advisory Services put it,
whilst big data analytics is a hot topic, most of
this collected data: “[is] garbage. IDC’s data
group researchers say that some 80% of data
collected has no meaning whatsoever.” Or at
least it won’t, until organizations are “smart
enough [to have] a tool be able to differentiate
between the signal and the noise.”
What does dark data look like?
Before we go on to look at what these tools
might look like, we should think about the scale
of the problem we expect them to fix. We must
categorize the types of dark data organizations
possess, and for each category, reconcile its
potential value against the cost of its storage.
For instance, server log files are individually
small and unobtrusive, and may contain
useful insights into customer behaviour when
processed together. Even if they’re dark, they
don’t represent a significant burden on the
storage environment.
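To make the point concrete, pooling many small log files into one aggregate view is straightforward. The sketch below assumes a common Apache/Nginx-style access-log line format; the field layout is an assumption, not something prescribed by any particular product.

```python
import re
from collections import Counter

# Minimal access-log parsing: each log file is trivial on its own, but
# aggregating many of them can reveal patterns in customer behaviour.
LOG_LINE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)')

def top_paths(log_lines, n=3):
    """Count the most-requested paths across a stream of log lines."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m:
            hits[m.group("path")] += 1
    return hits.most_common(n)
```

Run over every rotated log on every server, a rollup like this turns individually dark files into a single behavioural summary.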
Unstructured data, on the other hand, is
by far the single biggest driver of dark
data growth. It’s a broad category that
can include almost anything existing
outside of semantically tagged forms
and databases, and is estimated to
constitute around 70-80% of all data in an
average organization.
It’s often human-generated information in the
form of documents, presentations, reports,
graphics, videos and audio that all begin as
potentially valuable, but end up as half-finished
ideas, discarded early-drafts or simply assets
that serve their purpose and are no longer
useful.
Why is there so much of it?
The cause of the spiraling growth of
unstructured data is also the key to solving it:
data management practices (or rather, the
lack of them). We’ll go on to look at the way
tools can encourage better policy-based
management of the data lifecycle shortly, but it
is briefly worth reiterating that the solution to
dark data is not technology alone – it is
management.
There’s no single cause behind the volume and
variety of unstructured data organizations
produce. Some of it is just a symptom of
technological progress. We are using,
producing and sharing more stuff - whether
that is documents, presentations, emails, or
media – because both the tools (and therefore
output) have become more sophisticated and
the quality of connectivity between us is faster
and more reliable.
There is one common thread though:
standards of data management have not kept
up with the pace of data growth. Not by a long
shot.
One of the most common problems is poorly
maintained folder structures. In organizations
where users are free to create data and
folders within shared file stores, duplication
– of both content and the effort required to
create it – is incredibly common. Users become
less productive because they can’t find the
information they need, and the file stores
become a tangled mess of non-standardized
naming conventions, with massive amounts
of redundant data putting a great strain on
storage.
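Duplicated content is at least detectable mechanically: two files with different names in different folders still hash to the same value if their bytes are identical. A minimal sketch:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under `root` by content hash; any group with more
    than one path is a set of byte-identical duplicates, however the
    files are named or wherever they sit in the folder structure."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

A real tool would hash in chunks rather than reading whole files into memory, but the principle – content, not naming convention, identifies a duplicate – is the same.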
Another common problem is that old and
unused file data is not actively retired once
it has been superseded or become irrelevant.
In the Databarracks Data Health Check 2014,
49% of 401 respondents did not actively
distinguish between unused and recently
accessed file data, despite file data being the
largest cause of storage growth.
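Making that distinction need not be complicated. A minimal sketch, using last-modified time and an assumed six-month inactivity threshold (real policies would also weigh access time and compliance requirements):

```python
import time
from pathlib import Path

STALE_AFTER = 180 * 24 * 3600  # assumed policy: untouched for ~6 months

def split_by_activity(root, now=None, stale_after=STALE_AFTER):
    """Partition files into recently used data and candidates for
    retirement, based on last-modified time -- the distinction 49% of
    surveyed organizations were not making."""
    now = time.time() if now is None else now
    active, stale = [], []
    for path in Path(root).rglob("*"):
        if path.is_file():
            age = now - path.stat().st_mtime
            (stale if age > stale_after else active).append(path)
    return active, stale
```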
What do we do about it?
There is an appetite for tools able to shed some light on dark data. IDG’s report found that whilst
77% of enterprises expressed interest in a single platform solution that automatically manages
data, only 10% actually had a completely automated process in place.
Of course, organizations struggling with dark data (which, to be clear, is everyone) must first
identify what they hope to achieve in finding it. Is it that there may be hidden value in documents
long forgotten about, or that they hope to retire useless data to enable more cost-effective
storage?
In truth, this is a bit of a false dilemma – the answer is probably a combination of the two.
However, it remains a useful distinction to make, if only to make a more informed decision about
the capabilities they require from their chosen solution.
Prospective data analytics tools must offer three core capabilities to reveal the location and
condition of dark data, and minimize preventable growth in future.
Search
First, organizations need a strong search capability that scrapes
both metadata and the actual content of unstructured data.
This increases visibility into the dark areas of your storage
environment and connects users to the information they need
more quickly.
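The difference between metadata-only and content-aware search is worth making concrete. A minimal sketch, assuming plain-text files (a real indexer would also extract text from office documents and PDFs):

```python
from pathlib import Path

def build_index(root, text_suffixes=(".txt", ".md", ".csv")):
    """Index both metadata (name, size) and, for plain-text files, the
    content itself, so a search can reach inside documents rather than
    matching filenames alone."""
    index = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            entry = {"path": path, "name": path.name.lower(),
                     "size": path.stat().st_size, "content": ""}
            if path.suffix in text_suffixes:
                entry["content"] = path.read_text(errors="ignore").lower()
            index.append(entry)
    return index

def search(index, term):
    """Match against filename *or* content, so dark files surface even
    when their names say nothing about what they contain."""
    term = term.lower()
    return [e["path"] for e in index if term in e["name"] or term in e["content"]]
```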
Analyze
Secondly, organizations need powerful analytics and reporting
capabilities in order to extract actionable intelligence from large
volumes of dark data. This is a twofold challenge: half technical
and half design. The analytics must be accurate, responsive
and exhaustive, but they must also be beautifully visualized to
increase usability, comprehension and insight.
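The technical half of that challenge starts with simple rollups. A minimal sketch that aggregates storage consumption by file type – the kind of summary that, visualized well, shows at a glance where capacity is going:

```python
from collections import Counter
from pathlib import Path

def storage_report(root):
    """Aggregate bytes consumed per file extension under `root`,
    largest first -- a basic input for analytics and reporting."""
    usage = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            usage[path.suffix or "(none)"] += path.stat().st_size
    return usage.most_common()
```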
Archive
Finally, to address the problem of dark data in the long term,
data analytics tools must facilitate the transfer of old and unused
data to cheaper archive storage platforms. Cloud-based object
storage is a cheap and highly scalable alternative to costly
primary storage, and with the creation of management policies
based on usage-rates and compliance obligations, organizations
can automate the process of retiring inactive data.
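A usage-based archiving policy of that kind can be sketched in a few lines. For illustration the “cheaper tier” here is a local directory; in practice the destination would be cloud object storage, and the one-year threshold is an assumed policy, not a recommendation:

```python
import shutil
import time
from pathlib import Path

def archive_inactive(root, archive_dir, max_idle_days=365):
    """Move files untouched for longer than the policy threshold into
    an archive tier, preserving their relative folder structure."""
    cutoff = time.time() - max_idle_days * 24 * 3600
    moved = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            dest = Path(archive_dir) / path.relative_to(root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved
```

Scheduled regularly, a policy like this retires inactive data continuously instead of waiting for primary storage to fill up.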
To find out more visit www.kazoup.com.
Kazoup brings unstructured file data back under control in 3 steps: search, analyze and archive. Leveraging
beautiful data visualization, policy-based lifecycle management and cheap cloud object storage, Kazoup
helps you realize more value from your data whilst lowering the cost of storage.