SOLUTION BRIEF
The Virtual Data Steward
Data Management 3.0
Empower Your Data Stewards to do More With Less
Are You Serious About Data Governance?
Every company that is serious about data
governance needs data stewards. Data
stewards connect business information
requirements and processes with
information technology capabilities. This
function is essential to bridging data
management policies and standards to
day-to-day operational practices.
Data stewards improve the reusability,
accessibility, and quality of an
organization’s data. It is the data
steward’s responsibility to approve
business-naming standards, develop
consistent data definitions, document
business rules, monitor the quality of the
data in the data repository, and define
security requirements. A common and
seemingly simple example is looking at
two records and determining if they are
identical entities after a computer system
cannot confidently make that decision.
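The hand-off described above, where software decides the easy cases and a steward decides the rest, can be sketched as follows. This is only an illustration under stated assumptions: the two thresholds and the function names are hypothetical, and plain string similarity from Python's standard library stands in for whatever matching logic a real system uses.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; a real system would tune these on its own data.
AUTO_MATCH = 0.9      # at or above this score, treat the pair as the same entity
AUTO_DISTINCT = 0.5   # at or below this score, treat the pair as different entities


def similarity(a, b):
    """Return a 0..1 similarity score for two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def route_pair(name_a, name_b):
    """Decide automatically when the score is clear, otherwise send the pair to a person."""
    score = similarity(name_a, name_b)
    if score >= AUTO_MATCH:
        return "match"
    if score <= AUTO_DISTINCT:
        return "distinct"
    return "needs_steward_review"


for pair in [("Acme Corp", "acme corp"),
             ("Acme Corporation", "Acme Corp Inc"),
             ("Acme Corp", "Zyx")]:
    print(pair, "->", route_pair(*pair))
```

Only the pairs that fall between the two thresholds reach a data steward, which is the portion of the work the rest of this brief is concerned with.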
As critical as the role is, many companies
struggle with proper data stewardship, and
consequently with overall data governance.
Midsize companies often make do
without data stewards early in their
growth and pay the price later with data
completeness and consistency issues.
Large organizations, by their nature, have
large amounts of data accumulated over
many years. Data may be out of date
or inconsistent across applications or
company divisions. In each of these
instances, organizations that struggle
with proper data stewardship eventually
face the risk of basing
critical business decisions on bad or
incomplete data.
Large companies that grow via acquisition
face yet another problem. Aligning the
data between the acquiring company and
the acquired company can be a daunting
task; it may take several quarters, even
several years, to fix data problems, and
often requires temporary staff to be hired
and assigned.
Despite decades of software development,
differences in naming standards and
definitions inevitably cause problems that
only humans can resolve. And with the
amount and variety of data growing as
fast as ever – from adding social media
handles to geolocation data – what is
a company to do? Crowdsourcing is a
means to gain access to millions of
people willing to perform work for pay,
and is the basis for recruiting and training
thousands of data stewards to work for
your organization.
What is Crowdsourcing?
Coined by Wired reporter Jeff Howe,
“crowdsourcing” is the act of taking a job
traditionally performed by a designated
person (usually an employee) and
outsourcing it to an undefined, generally
large group of people in the form of an
open call. Some people use the term
crowdsourcing broadly to describe many
different models, such as crowd funding,
crowd design contests, and crowd
ideation platforms. For the purposes of
this article, we limit crowdsourcing to the
act of distributing small, simple tasks
– microtasks – among a large group of
people online.
Virtual Data Steward Advantages
• Easily scale throughput up and down
• Leverage local knowledge globally
• Increase efficiency and quality
Microtasking is the act of dividing a
large task into smaller and well-defined
microtasks. For example, dividing a
customer record to be verified into
discrete fields, such as company name,
street address, phone number, company
website URL, and LinkedIn profile, is
microtasking. The idea is that verifying 10
URLs is faster and simpler than verifying
10 complete customer records. Once a
person gets good at the URL verification
task, they can do it faster and with greater
accuracy. Other people in the crowd can
take care of verifying street addresses
and phone numbers. Yet other people
can set off on researching and verifying
LinkedIn profiles.
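As a rough illustration of the split described above, the sketch below turns customer records into one microtask per field and then batches the microtasks by field, so that one worker can verify many URLs while another verifies phone numbers. The record layout, field names, and function names are assumptions made for this example, not part of any particular crowdsourcing platform.

```python
from collections import defaultdict

# Field names mirror the example in the text; the record layout is hypothetical.
FIELDS = ["company_name", "street_address", "phone_number",
          "website_url", "linkedin_profile"]


def split_into_microtasks(records):
    """Create one verification microtask for each field present in each record."""
    tasks = []
    for record in records:
        for field in FIELDS:
            if field in record:
                tasks.append({"record_id": record["id"],
                              "field": field,
                              "value": record[field]})
    return tasks


def group_by_field(tasks):
    """Batch microtasks by field so each batch holds one kind of verification work."""
    batches = defaultdict(list)
    for task in tasks:
        batches[task["field"]].append(task)
    return dict(batches)


records = [
    {"id": 1, "company_name": "Acme Corp",
     "website_url": "http://acme.example", "phone_number": "555-0100"},
    {"id": 2, "company_name": "Globex",
     "website_url": "http://globex.example"},
]

batches = group_by_field(split_into_microtasks(records))
for field, items in batches.items():
    print(field, len(items))
```

Each batch can then be offered to workers who specialize in that one kind of check.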
Microtasks require human intelligence
and therefore are performed online by
a person, usually with some amount of
research, as opposed to being automated
algorithmically. The benefit of microtasking
is that a large volume of work can be
completed through the crowd with minimal
training.
Microtasking has many use cases, but
works best for low-complexity, high-volume
work. Some common uses are:
Data Collection and Enhancement
• Finding new business data, or updating
existing business data with current information
Data Categorization
• Organizing data into predefined
categories
Content Creation and Moderation
• Creating or reviewing short-form
content, such as product descriptions
Sentiment Analysis
• Collecting public sentiment on a
particular product or service, typically
from social media sources
So, how then can crowdsourcing help
data stewardship? The answer is using
the crowd to augment internal data
stewardship, in what I term the virtual
data steward.
Virtual Data Steward
The virtual data steward is a person or set
of people in the crowd who complete
microtasks assigned to them by an
internal data steward. Using virtual data
stewards has several advantages:
Scale Throughput Up and Down
An organization can quickly hire virtual
data stewards from the crowd to process
a large volume of data – backlogs
resulting from system migrations or high
transactional volumes, for example – and
scale back down once those tasks are
complete.
Leverage Local Knowledge
Crowd workers are located in more than
200 countries and have knowledge
of regional address conventions,
neighborhoods, phone number formats, and
other local knowledge that an outsourcer in
a single country cannot match.
Language Skills
Crowd workers can speak hundreds
of languages and many are capable of
translation or transliteration.
Increase Efficiency
A variable workforce is a less expensive
workforce, and is usually more
cost-effective than hiring employees or
outsourced consultants.
Increase Quality
An option with virtual data stewards is
plurality – multiple people completing and
verifying individual data elements, which
improves overall quality.
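One simple way to apply plurality is to accept a value only when enough workers independently agree on it, and to send everything else back for review. The sketch below assumes exactly that rule; the function name, the `min_agreement` parameter, and the normalization step are hypothetical choices for illustration rather than a description of any specific platform.

```python
from collections import Counter


def resolve_by_plurality(answers, min_agreement=2):
    """Return the answer most workers agree on, or None if agreement is too weak.

    Answers are compared after trimming spaces and lowercasing. A tie for
    first place, or fewer than `min_agreement` matching answers, returns None
    so the item can be escalated to an internal data steward.
    """
    if not answers:
        return None
    normalized = [answer.strip().lower() for answer in answers]
    top_two = Counter(normalized).most_common(2)
    top_value, top_count = top_two[0]
    tied = len(top_two) > 1 and top_two[1][1] == top_count
    if tied or top_count < min_agreement:
        return None
    return top_value


submitted = ["http://acme.example", "HTTP://ACME.EXAMPLE ", "http://acme.test"]
print(resolve_by_plurality(submitted))
```

Raising `min_agreement`, or assigning each data element to more workers, trades cost for additional confidence in the result.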
A Win for the Data Steward
Internal data stewards should welcome,
rather than fear, the emergence of the
virtual data steward. Mixing internal and
virtual data stewards means:
Increase Bandwidth
Many internal data stewards are
overwhelmed with data and can barely
keep up. Virtual data stewards free them
up to complete their work.
Focus on Higher Value Work
Virtual data stewards can take the lower
complexity, or country- or language-specific, work off the plates of internal
data stewards. This allows internal
staff to work on higher value and higher
complexity work, such as business rule
definition.
“Autodesk processes approximately 100,000 records a month via virtual data stewards.”