Virtual Data Steward: Data Management 3.0


Published in: Technology, Education
SOLUTION BRIEF

The Virtual Data Steward
Data Management 3.0

Empower Your Data Stewards to Do More With Less

Are You Serious About Data Governance?

Every company that is serious about data governance needs data stewards. Data stewards connect business information requirements and processes with information technology capabilities. This function is essential to bridging data management policies and standards to day-to-day operational practices.

Data stewards improve the reusability, accessibility, and quality of an organization’s data. It is the data steward’s responsibility to approve business-naming standards, develop consistent data definitions, document business rules, monitor the quality of the data in the data repository, and define security requirements. A common and seemingly simple example is looking at two records and determining whether they refer to the same entity after a computer system cannot confidently make that decision.

As critical as the role is, many companies struggle with proper data stewardship, and consequently with overall data governance. Midsize companies often make do without data stewards early in their growth and pay the price later with data completeness and consistency issues. Large organizations, by their nature, have large amounts of data accumulated over many years. Data may be out of date or inconsistent across applications or company divisions. In each of these instances, organizations that struggle with proper data stewardship eventually risk basing critical business decisions on bad or incomplete data.

Large companies that grow via acquisition face yet another problem. Aligning the data between the acquiring company and the acquired company can be a daunting task; it may take several quarters, even several years, to fix data problems, and often requires temporary staff to be hired and assigned.

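The record-matching example mentioned above (two records that a system cannot confidently call the same entity) can be pictured as a triage step. The sketch below is only an illustration, not a method described in this brief: the similarity measure (Python's standard `difflib.SequenceMatcher`), the field names `company` and `address`, and both thresholds are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; tune them against your own data.
AUTO_MATCH = 0.95     # at or above this score: treat as the same entity
AUTO_DISTINCT = 0.60  # below this score: treat as different entities


def similarity(a: dict, b: dict, fields=("company", "address")) -> float:
    """Average string similarity (0.0 to 1.0) across the compared fields."""
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def triage(a: dict, b: dict) -> str:
    """Return 'match', 'distinct', or 'needs_steward' for a pair of records."""
    score = similarity(a, b)
    if score >= AUTO_MATCH:
        return "match"
    if score < AUTO_DISTINCT:
        return "distinct"
    # Ambiguous pair: the system is not confident, so a human decides.
    return "needs_steward"
```

Only the pairs that fall between the two thresholds reach a person, which is the portion of the work the brief assigns to data stewards.
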
Despite decades of software development, differences in naming standards and definitions inevitably cause problems that only humans can resolve. And with the amount and variety of data growing as fast as ever – from adding social media handles to geolocation data – what is a company to do?

Crowdsourcing is a means to gain access to millions of people willing to perform work for pay, and is the basis for recruiting and training thousands of data stewards to work for your organization.

What is Crowdsourcing?

Coined by Wired reporter Jeff Howe, “crowdsourcing” is the act of taking a job traditionally performed by a designated person (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call. Some people use the term crowdsourcing broadly to describe many different models, such as crowd funding, crowd design contests, and crowd ideation platforms. For the purposes of this article, we limit crowdsourcing to the act of distributing small, simple tasks – microtasks – among a large group of people online.

Virtual Data Steward Advantages
• Easily scale throughput up and down
• Leverage local knowledge globally
• Increase efficiency and quality

Microtasking is the act of dividing a large task into smaller and well-defined microtasks. For example, dividing a customer record to be verified into discrete fields, such as company name, street address, phone number, company website URL, and LinkedIn profile, is microtasking. The idea is that verifying 10 URLs is faster and simpler than verifying 10 complete customer records. Once a person gets good at the URL verification task, they can do it faster and with greater accuracy. Other people in the crowd can take care of verifying street addresses and phone numbers. Yet other people can set off on researching and verifying LinkedIn profiles.

Microtasks require human intelligence and therefore are performed online by a person, usually with some amount of research, as opposed to being automated algorithmically. The benefit to microtasking is that a large volume of work can be completed through the crowd with minimal training.

Microtasking has many use cases, but works best for low-complexity, high-volume work. Some common uses are:

Data Collection and Enhancement
• Finding or appending existing business data with updated information

Data Categorization
• Organizing data into predefined categories

Content Creation and Moderation
• Creating or reviewing short-form content, such as product descriptions

Sentiment Analysis
• Collecting public sentiment on a particular product or service, typically from social media sources

So, how then can crowdsourcing help data stewardship? The answer is using the crowd to augment internal data stewardship, in what I term the virtual data steward.

Virtual Data Steward

The virtual data steward is a person or set of people in the crowd who completes microtasks assigned to them by an internal data steward.

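The customer-record example above, where one record is divided into per-field verification tasks, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the dictionary keys and both function names are hypothetical, and nothing here reflects a particular crowdsourcing platform's API.

```python
from collections import defaultdict

# The fields named in the text; the key spellings are hypothetical.
FIELDS = (
    "company_name",
    "street_address",
    "phone",
    "website_url",
    "linkedin_profile",
)


def split_into_microtasks(records):
    """Turn each record into one small verification task per populated field."""
    tasks = []
    for record in records:
        for field in FIELDS:
            value = record.get(field)
            if value:  # skip fields that are missing or empty
                tasks.append(
                    {"record_id": record["id"], "field": field, "value": value}
                )
    return tasks


def batch_by_field(tasks):
    """Group tasks by field so one worker can specialize in, say, URLs."""
    batches = defaultdict(list)
    for task in tasks:
        batches[task["field"]].append(task)
    return dict(batches)
```

Batching by field mirrors the point made in the text: a worker who sees only URL tasks gets faster and more accurate at that one task, while other workers handle addresses or phone numbers.
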
Using virtual data stewards has several advantages:

Scale Throughput Up and Down
An organization can quickly process a large volume of data – backlogs resulting from system migrations or high transactional volumes, for example – by hiring virtual data stewards from the crowd to process only those tasks. It can scale back down afterward.

Leverage Local Knowledge
Crowd workers are located in more than 200 countries and have knowledge of regional address conventions, neighborhoods, phone syntax and all kinds of local knowledge an outsourcer in a single country cannot match.

Language Skills
Crowd workers can speak hundreds of languages and many are capable of translation or transliteration.

Increase Efficiency
A variable workforce is a less expensive workforce, and is usually more cost effective than hiring employees or outsourced consultants.

Increase Quality
An option with virtual data stewards is plurality – multiple people completing and verifying individual data elements, which improves overall quality.

A Win for the Data Steward

Internal data stewards should welcome, rather than fear, the emergence of the virtual data steward. Mixing internal and virtual data stewards means:

Increase Bandwidth
Many internal data stewards are overwhelmed with data and can barely keep up. Virtual data stewards free them up to complete their work.

Focus on Higher Value Work
Virtual data stewards can take the lower complexity, or country- or language-specific, work off the plates of internal data stewards. This allows internal staff to work on higher value and higher complexity work, such as business rule definition.

“Autodesk processes approximately 100,000 records a month via virtual data stewards.”

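The plurality option described under Increase Quality above, where several people complete the same data element, needs some rule for combining their answers. The sketch below uses a simple majority vote; that rule, the function name, and the `min_agreement` threshold are assumptions made for illustration and are not described in this brief.

```python
from collections import Counter


def plurality_answer(judgments, min_agreement=0.5):
    """Combine several workers' answers for one data element.

    Returns (answer, agreement). The answer is None when the most common
    response does not exceed min_agreement, which signals that the item
    should go back to an internal data steward.
    """
    if not judgments:
        return None, 0.0
    # Normalize so trivial differences in case or spacing do not split votes.
    normalized = [j.strip().lower() for j in judgments]
    answer, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    if agreement <= min_agreement:
        return None, agreement
    return answer, agreement


# Example: three workers verify the same website URL field.
result = plurality_answer(["acme.example", "Acme.example ", "acme.test"])
```

Items without a clear majority are the ones worth escalating, which keeps internal stewards focused on the genuinely ambiguous cases.
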
Ultimately, having help from virtual data stewards makes the internal data steward’s day-to-day job more fulfilling by reducing some of the monotonous work. Perhaps most interestingly, virtual data stewards make possible the acquisition and validation of entirely new data – data valuable to the organization – such as social media handles or GPS location information.

Virtual Data Stewards at Autodesk: Enhancing Sales Leads

Autodesk is one of the world’s 25 largest software companies. The company provides design, engineering and entertainment software to customers in architecture, manufacturing, building, and media and entertainment.

In a move from selling individual products to end-to-end solutions, Autodesk needed a better way to identify its most promising sales leads to incentivize its sales team to pursue solution sales. To do this, Autodesk’s CRM system needed complete data for every lead: industry, company size, parent and child companies, website URL, executive team bios and contact information. This data historically came from multiple sources with varying quality. One source was Dun & Bradstreet, but it could provide enhancement for just 70 percent of Autodesk’s CRM database. Almost a third of Autodesk’s potential sales were not being used to incentivize its sales force.

To boost data quality, Autodesk turned to crowdsourcing. Autodesk has a small staff of internal data stewards, and uses the crowd as virtual data stewards. Autodesk funnels business records missing key data into CrowdFlower’s platform via a direct API connection. Virtual data stewards from the crowd first work on cleaning bad data, then enrich business records with company hierarchy and categorization information to provide critical support for targeting and solution selling.
They also cross-check and match entries with the different data sources, de-duplicate redundant information, and categorize by business industry code, allowing accurate reporting of customers and sales by industry. The results are automatically transmitted to Autodesk, where its internal data steward oversees results.

Autodesk is currently processing approximately 100,000 records a month via virtual data stewards. To date, Autodesk has improved its data completeness from 70 percent to 85 percent, at a cost that is 75 percent less than paying an outsourcing company. As a side benefit, instead of licensing from data providers as it did in the past, Autodesk retains the data it receives back from the crowd, avoiding annual data licensing fees.

Getting Started

Companies interested in learning how to leverage the crowd as virtual data stewards should speak to a CrowdFlower crowdsourcing specialist. A specialist can review data requirements and make a recommendation on the best approach to creating virtual data stewards with our crowd. CrowdFlower offers customers the choice of a managed service – with monthly quality and throughput SLAs – or a license to our technology platform to manage the process internally. Our system integration partners also offer a combination of crowdsourcing and data management expertise.

About CrowdFlower

CrowdFlower combines human intelligence with the scalability and efficiency of computer algorithms to offer quality-ensured processing of business information. CrowdFlower’s platform provides solutions to a wide variety of data needs for enterprises such as product catalog enhancement, content generation, image moderation, and business listing enrichment. With 5 million Crowd Contributors completing millions of judgments each month for over 500 customers, CrowdFlower is the leader in enterprise crowdsourcing.
The company has successfully worked with well-respected enterprises including Apple, AT&T, Autodesk, eBay, Ford, LinkedIn, Microsoft, Sears, Toshiba and Twitter.

For more information, visit or email

CrowdFlower, Inc. • 2111 Mission Street, Suite 302, San Francisco, CA 94110 • (415) 471-1920

Copyright © 2013 CrowdFlower. All rights reserved. CrowdFlower is a registered trademark in the U.S.A. and certain other countries. All other trademarks or registered trademarks, product names and company names or logos cited are the property of their respective owners.