Advanced redaction technology can help governments comply with privacy laws by automating the removal of sensitive information from records. Automated redaction software uses optical character recognition to locate personal data like social security numbers and then redacts it. This saves significant time and costs compared to manual redaction. The accuracy of the software impacts verification needs and costs, with higher accuracy software requiring less human review. As privacy laws continue to expand the types of protected data, automated redaction helps governments scale redaction efforts over growing volumes of records.
Advanced Redaction Technology Cuts Costs and Secures Data
Advanced Redaction Technology: How to Provide Secure Access,
Reduce Costs and Anticipate Future Legislative Requirements
More than 11 million adult Americans were victims of identity theft in 2009, a 10%
increase from 2008. The collective cost of these crimes was over $54 billion. Private
information, such as Social Security numbers, within online public records can be
vulnerable to cyber criminals. It is estimated that approximately 10% of identity theft
cases originate with personal data collected from government records.
A Growing Legislative Concern
Across the United States, strict legislation is being passed requiring State and County
governments to redact sensitive information, such as Social Security and credit card
numbers, from the official and public record. North Carolina, Pennsylvania, Iowa and
Wisconsin recently passed records modernization laws mandating the redaction of
Social Security numbers. In some states, forward-thinking local and state government
officials have independently determined that it is their responsibility to protect
constituents from identity theft.
Source information gathered from
National Conference of State Legislatures
Updated May 2010
Redaction Defined
Redaction, sometimes referred to as sanitization, is the permanent removal of personal
or sensitive information from hard copy or electronic documents. The traditional
technique of redacting confidential material from a paper document before its public
release involves crossing out portions of text with a wide black pen, followed by
photocopying the result. This manual processing of thousands or millions of document
pages is a time-consuming process that can strain staff resources. As public records
repositories shift from paper documents to electronic images, the challenges facing
state and local governments are also shifting. Complying with state legislation and
Federal “Sunshine” laws, such as the Freedom of Information Act and Openness in
Government Act, requires a records management system, strategy and workflow that
provide data security and accessibility. Automated redaction technology is a
powerful tool for securing personal data and maintaining public record access.
Automated Redaction Technology
As public records are scanned, the electronic images are processed through Optical
Character Recognition (OCR) software that converts them into machine-readable text. This
conversion allows the document to become “searchable” by a rules-based search
engine that locates sensitive data within the OCR results. The search engine is
powered by rules, clues, pattern recognition and algorithms designed to locate user-
defined sensitive data types. After locating a sensitive data type, the software assigns a
value that measures how well the data matches the pattern and clues. For example,
the search engine may find a Social Security number by finding the clue “SSN” followed
by a pattern of numbers such as 123-45-6789. This example falls into the “high
confidence” range of results where the clue and the number pattern found by the search
engine are an exact match. On another document, the search engine may find the clue
“SSN” followed by eight digits instead of the standard nine-digit Social Security
number. This result may be defined as “medium confidence.” These values or
“confidence” classifications are used to streamline verification workflow.
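
The clue-and-pattern matching described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of any particular product: the regular expressions, the 10-character gap allowed between clue and number, and the "high"/"medium"/"low" labels are all hypothetical choices made for the example.

```python
import re

# Hypothetical rule: the clue "SSN", then up to 10 non-digit characters.
CLUE = re.compile(r"\bSSN\b[^0-9]{0,10}", re.IGNORECASE)
# Standard nine-digit pattern written as 123-45-6789.
FULL_SSN = re.compile(r"\d{3}-\d{2}-\d{4}")
# Any run of digits and hyphens that follows the clue.
DIGITS = re.compile(r"[\d-]+")

def classify(text):
    """Return (matched_text, confidence) pairs for each SSN clue found in text."""
    results = []
    for clue in CLUE.finditer(text):
        candidate = DIGITS.match(text[clue.end():])
        if candidate is None:
            continue  # clue present, but no number follows it
        value = candidate.group()
        digit_count = sum(ch.isdigit() for ch in value)
        if FULL_SSN.fullmatch(value):
            confidence = "high"    # clue and number pattern match exactly
        elif digit_count == 8:
            confidence = "medium"  # clue present, one digit short
        else:
            confidence = "low"
        results.append((value, confidence))
    return results

print(classify("SSN 123-45-6789"))   # [('123-45-6789', 'high')]
print(classify("SSN: 123-45-678"))   # [('123-45-678', 'medium')]
```

A production rule set would likely weigh many more clues and patterns; the point here is only that each finding carries a confidence label that later workflow steps can act on.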
Accuracy
Accuracy is calculated by comparing the number of sensitive data fields the software
finds with the total number of sensitive data fields within the record. False positives
occur when software locates non-sensitive data and marks it for redaction. This type of
error is also counted in the overall accuracy rate.
Accuracy is arguably the most important feature of automated redaction. Because no
industry standard exists for calculating accuracy, evaluating and comparing the
accuracy rates among redaction providers can be challenging. To help facilitate the
evaluation process, vendors’ accuracy formulas must be transparent and
straightforward. If each sensitive data type undetected by the software represents a
failure to protect a citizen’s private information, it stands to reason that the software’s
accuracy rate should be downgraded for every occurrence of this type.
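
Because no standard formula exists, the following Python sketch shows only one hypothetical way to compute an accuracy rate that is downgraded by both missed items and false positives. The function name, its parameters and the formula itself are assumptions made for illustration; a vendor's actual calculation may differ.

```python
def accuracy_rate(total_sensitive, missed, false_positives):
    """One possible accuracy formula (an assumption, not an industry standard).

    total_sensitive: sensitive data fields actually present in the records
    missed:          sensitive fields the software failed to find
    false_positives: non-sensitive items the software marked for redaction
    """
    if total_sensitive <= 0:
        raise ValueError("total_sensitive must be positive")
    found = total_sensitive - missed
    # Every miss lowers the numerator; every false positive raises the denominator.
    return found / (total_sensitive + false_positives)

# Example: 1,000 sensitive fields, 40 missed, 10 false positives.
print(f"{accuracy_rate(1000, 40, 10):.1%}")
```

Whatever formula a vendor uses, stating it this explicitly makes rates from different providers easier to compare.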
Pre-verification accuracy calculates how well the software locates sensitive data
automatically, without human intervention for quality assurance (verification). Achieving
a high pre-verification accuracy rate is critical for two reasons. When redaction software
automatically finds virtually all sensitive data within records, the security of individuals’
personal data is increased. Additionally, high pre-verification accuracy dramatically
reduces labor costs.
Verification
An important part of any redaction workflow is verification or quality control. The two
most influential factors that affect verification are the quality of the paper records before
scanning and the targeted level of accuracy (higher accuracy requires more
verification). Verification workflow is based on the particular needs of each client, and
generally takes one of three forms:
1. Fully Automated Redaction: the software finds and redacts sensitive information
automatically.
2. Semi-Automated Redaction: an end user verifies each redaction.
3. Hybrid Redaction: user-defined “high confidence” redactions are processed
automatically, while lower confidence results are submitted for verification.
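
The three verification options can be viewed as one routing step with different settings. The Python sketch below assumes each finding arrives as a (value, confidence) pair labeled "high", "medium" or "low"; the function name `route` and the `auto_levels` parameter are hypothetical.

```python
def route(findings, auto_levels=frozenset({"high"})):
    """Split findings into those redacted automatically and those sent to review.

    findings:    iterable of (value, confidence) pairs
    auto_levels: confidence labels that may be redacted without human review
    """
    auto, review = [], []
    for value, confidence in findings:
        if confidence in auto_levels:
            auto.append(value)
        else:
            review.append(value)
    return auto, review

findings = [("123-45-6789", "high"), ("123-45-678", "medium")]

# Hybrid: only high-confidence findings skip verification.
print(route(findings))
# Fully automated: every confidence level is redacted automatically.
print(route(findings, auto_levels=frozenset({"high", "medium", "low"})))
# Semi-automated: every finding goes to a reviewer.
print(route(findings, auto_levels=frozenset()))
```

Under this framing, choosing a workflow amounts to deciding which confidence levels the office is willing to trust without review.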
Impact of Pre-Verification Accuracy on Labor Costs
To demonstrate the relationship between software accuracy and verification labor costs,
here is an example of a government office processing 40,000 image pages of records
per month utilizing the Hybrid Redaction workflow. Software #1 has a pre-verification
accuracy rate of 80% and Software #2 has an accuracy rate of 99%.
Verification Labor Costs:
40,000 Pages of Records/Month Using Hybrid Redaction Workflow
                                          Software #1       Software #2
                                          (80% Accuracy)    (99% Accuracy)
Pages Processed/Day                       1,905             1,905
Pages to be Verified/Day                  381               19
Verification Labor in Hours/Day           1.5               0.08
Verification Labor in Hours/Month         31.5              1.7
Verification Labor in Hours/Year          378               20
Annual Verification Labor Costs (approx.) $7,500            $400
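
The arithmetic behind these figures can be approximated with a short Python sketch. The source does not state its inputs, so three values here are assumptions inferred from the numbers: 21 working days per month (40,000 / 1,905), a reviewer throughput of 250 pages per hour, and a labor rate of $20 per hour.

```python
def verification_labor(pages_per_month, accuracy,
                       working_days=21,     # assumption: inferred from 40,000 / 1,905
                       pages_per_hour=250,  # assumption: reviewer throughput
                       hourly_rate=20.0):   # assumption: labor rate in dollars
    """Estimate verification workload when only missed-confidence pages are reviewed."""
    pages_per_day = pages_per_month / working_days
    verify_per_day = pages_per_day * (1.0 - accuracy)
    hours_per_day = verify_per_day / pages_per_hour
    hours_per_year = hours_per_day * working_days * 12
    return {
        "pages_per_day": pages_per_day,
        "verify_per_day": verify_per_day,
        "hours_per_day": hours_per_day,
        "hours_per_year": hours_per_year,
        "annual_cost": hours_per_year * hourly_rate,
    }

for label, acc in (("Software #1", 0.80), ("Software #2", 0.99)):
    r = verification_labor(40_000, acc)
    print(f"{label}: {r['verify_per_day']:.0f} pages/day to verify, "
          f"{r['hours_per_year']:.0f} hours/year, ${r['annual_cost']:,.0f}/year")
```

With these assumed inputs the sketch gives roughly $7,700 and $380 per year, close to the approximate $7,500 and $400 shown; the remaining gap is consistent with intermediate values having been rounded at each step.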
Selecting a Redaction Provider
The redaction vendor selection process should consider 1) experience, 2) accuracy and
3) overall technology.
1. Experienced redaction providers have completed installations with many different
types of records management software. The exposure to different systems helps
seasoned customer support teams anticipate problems before they happen. Verification
labor is often the highest cost within a redaction project. Working with a team
experienced in verification workflow maximizes accuracy, minimizes human intervention
and saves money.
2. The quality of paper records and the complexity of the data to be redacted have an
impact on the accuracy that can be achieved for each project. Under most circumstances,
high quality automated redaction can achieve a pre-verification accuracy rate of 95%.
3. Redaction is an evolving technology. Top vendors are constantly adding new
technology to improve accuracy and speed, and to meet the emerging needs of
governments.
Privacy and Information Security Regulations: What Does the Future Hold?
The threat of unauthorized access to sensitive information within public records is
unlikely to diminish in the near future. This persistent threat may pave the way for additional
federal and state data security measures. Government offices that are complying with
existing regulations to redact Social Security numbers may face additional legislative
mandates in the future that require the redaction of additional data types. In fact, this is
already happening.
In 2003, the Florida legislature mandated the redaction of Financial Account information
including bank, credit and debit card numbers. Similarly, the Nevada legislature issued
a revised statute in 2006 to mandate the redaction of Drivers’ License numbers,
Identification Card numbers and Financial Account information including bank, credit
and debit card numbers. Government agencies can successfully navigate the redaction
of additional data fields in the future by leveraging today’s technology. As records are
being processed, reports can be created that identify specific documents that contain
the additional data fields (credit card number, drivers’ license number) that may need to
be redacted in the future. This captured information can be used to create a budget for
the additional verification process and to isolate suspected images for automatic/manual
redaction processing. The passage of time presents some problems for this approach.
Documents change and data capture tools and techniques improve rapidly. Using rules
and capture technology from a previous project may decrease accuracy and/or increase
verification labor costs. To maintain accuracy and keep manual labor costs low, a
better solution may be to save the OCR output from the original project to avoid
incurring the cost of rescanning, and write new custom rules for subsequent mandates
as they arise.
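
Reusing saved OCR output for a later mandate can be sketched as follows. This is a hypothetical illustration: the in-memory `ocr_store` dictionary stands in for whatever storage holds the saved OCR text, and the credit card pattern is a simplified example rule, not a complete one.

```python
import re

def flag_documents(ocr_store, rules):
    """Return {doc_id: [names of rules that matched]} using saved OCR text only."""
    flagged = {}
    for doc_id, text in ocr_store.items():
        hits = [name for name, pattern in rules.items() if pattern.search(text)]
        if hits:
            flagged[doc_id] = hits
    return flagged

# Hypothetical rule written for a later mandate: 16 digits in groups of four.
NEW_RULES = {"credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")}

# Stand-in for OCR output saved during the original project.
ocr_store = {
    "doc1": "Card 4111 1111 1111 1111 on file",
    "doc2": "No account numbers in this record",
}

print(flag_documents(ocr_store, NEW_RULES))  # {'doc1': ['credit_card']}
```

The count of flagged documents gives a basis for budgeting the additional verification work, and no page needs to be rescanned to produce it.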
Conclusion
At a minimum, redaction software can help government agencies make public
information available in a secure manner. Advanced technology can be harnessed to
save labor costs and eliminate a significant percentage of tedious data entry tasks.
Government agencies can gain significant, ongoing benefits from selecting a software
partner with leading edge technology.
ID Shield Redaction Software
ID Shield is a proven, cost-effective redaction solution that permanently removes private
information within records and documents. ID Shield Redaction Software customers
have redacted over one billion images. Extract Systems offers server-based and
desktop redaction software.
About the Author
Mark Miller is Vice President of Sales for Extract Systems, a leading provider of best-in-
class data capture and redaction software. Extract’s products are built to adapt and
integrate into any type of environment, providing flexibility, scalability and efficiency. The
productivity gains achieved with Extract Systems’ data automation solutions save
organizations money, improve workflow and eliminate paper.
For more information, please contact:
Extract Systems, LLC
6418 Normandy Lane, Suite 200
Madison, WI 53719
Phone: (877) 778-2543 or (608) 216-7950
E-mail: mark_miller@extractsystems.com
www.extractsystems.com
Sources:
Javelin’s 2010 Consumer Identity Fraud Report
National Conference of State Legislatures