Words that Matter
Application of Text Analytics
Topics
 Business Questions
 Success Strategy
 Project Steps
 Technical Solution
 Analytic Requirements
 Results
 Business Application
 Lessons Learned
Business Questions
 How well has the Office of the Inspector General (OIG)
fulfilled its mission?
 How can the OIG prioritize final rule reviews?
• Did common terms in public comments appear in final rules?
• What sentiment did public comments express?
Success Strategy
 Sizing the Project
• Data – Available, Processable, Standardized
• Security Concerns – factor in information security governance
 Seeking an Executive Champion
• Do they support the value of the answer?
• To what extent will they fund the project (budgetary
considerations)?
 Repeating a Quick Win
• Is the project repeatable to gain support for subsequent
projects?
Project Steps
 Secured management buy-in for the business questions
 Assessed security concerns for public-facing data
 Contracted technical support and quantitative and
qualitative statistical expertise
 Used Amazon Web Services for infrastructure support
 Used Amazon Marketplace to select a text mining
tool
 Documented repeatable technical tasks
Technical Solution
Analytic Requirement #1
 MarkLogic – platform used to parse
unstructured text and calculate term frequencies
 Term Frequency Normalization – where N is equal to
the total number of terms within a document or set of
documents
$tf(t) = \frac{w_i f}{N}$
 Gap Concept – difference between the normalized
frequencies of baseline terms and those of corpus documents
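The term frequency normalization and gap concept can be illustrated with a minimal Python sketch. It is not the MarkLogic implementation used in the project. The tokenizer, the function names `normalized_tf` and `term_gaps`, the sample texts, and the sign convention (corpus frequency minus baseline frequency) are all assumptions made for illustration.

```python
import re
from collections import Counter


def normalized_tf(text):
    """Return {term: count / N}, where N is the total number of terms in text."""
    # Assumption: a term is a run of ASCII letters, compared case-insensitively.
    terms = re.findall(r"[a-z]+", text.lower())
    total = len(terms)
    if total == 0:
        return {}
    counts = Counter(terms)
    return {term: count / total for term, count in counts.items()}


def term_gaps(baseline_text, corpus_text, baseline_terms):
    """Gap per baseline term.

    Assumed sign convention: corpus normalized frequency minus
    baseline normalized frequency.
    """
    base_tf = normalized_tf(baseline_text)
    corpus_tf = normalized_tf(corpus_text)
    return {
        term: corpus_tf.get(term, 0.0) - base_tf.get(term, 0.0)
        for term in baseline_terms
    }


# Hypothetical toy texts, not project data.
baseline = "audit risk control risk"
corpus = "audit audit control report"
gaps = term_gaps(baseline, corpus, ["audit", "risk", "control"])
print(gaps)
```

Under this convention, a negative gap means the term is less prominent in the corpus than in the baseline.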
OIG Standards of Work
 Business Question: How well has the Office of the
Inspector General (OIG) fulfilled its mission?
 Answer: OIG could improve its standards of audit work.
[Figure: bar chart of Gap (y-axis, -0.04 to 0.1) by Baseline Terms (x-axis)]
OIG Mission Results
[Figure: Audit Mission Term Gap Analysis. Gap (y-axis, -0.015 to 0.01) by Baseline Terms (x-axis); the term "risk" is labeled]
 “Risk” stood out among the key mission terms. This suggests that the OIG
generally balances its workload to meet its mission.
 Since “risk” is typically associated with “control” work, the OIG
needs to emphasize either more internal control work or the impact of
its work.
Strategic Planning Application
 Use TeamMate software to standardize audit
planning and execution
 Emphasize internal control risks at project start
 Emphasize the impact associated with each business
question
Rule Review Results
 Business Question: Did common terms in public
comments appear in final rules?
 Answer: Yes, with varying degrees of intensity, which enables
differentiation.
[Figure: Gap Distribution. Percentage axis (0% to 100%) across rules 75FR55410, 76FR41398, 76FR43851, 76FR53172, 76FR71626, 76FR80674, 77FR20128, 77FR30596, 77FR42559, 81FR636; legend: Gap >= +1%, Gap <= -1%, -1% < Gap < 1%]
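The three gap bands in the chart legend can be reproduced with a short Python sketch. It assumes the chart shows, for each rule, the share of terms whose gap falls in each band; the source does not state this explicitly. The function names and the sample gap values are hypothetical.

```python
from collections import Counter


def gap_band(gap, threshold=0.01):
    """Assign a gap to one of the three legend bands (threshold of 1%)."""
    if gap >= threshold:
        return "gap >= +1%"
    if gap <= -threshold:
        return "gap <= -1%"
    return "-1% < gap < 1%"


def gap_distribution(gaps, threshold=0.01):
    """Return the share of gaps in each band, as fractions that sum to 1."""
    gaps = list(gaps)
    if not gaps:
        return {}
    bands = Counter(gap_band(g, threshold) for g in gaps)
    return {band: count / len(gaps) for band, count in bands.items()}


# Hypothetical term gaps for a single rule, not project data.
example_gaps = [0.02, -0.03, 0.004, 0.0]
dist = gap_distribution(example_gaps)
print(dist)
```

Running the same function once per rule would give one set of band shares per bar in the chart.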
Analytic Requirement #2
 IBM AlchemyAPI – Natural Language Processing
platform, learning algorithm
 Scoring Mechanism – Positive, Neutral, Negative
 Sentiment Attributes – Mixed Sentiment
 Limitations of Exercise
• Number of Available Comments for Each Rule
• Data Quality – Data Capture, PDFs, Noise
• Document Level vs. Entity Level
• False Positives
Rule Review Results
 Business Question: What sentiment did public comments express?
 Answer: The majority of public comments are positive towards
proposed rules.
[Figure: Sentiment Distribution by Dodd-Frank Rule. Percentage axis (0% to 100%) across rules 76FR41398, 76FR43851, 76FR53172, 76FR80674, 77FR20128, 77FR30596, 77FR42559, 81FR636, 76FR71626; legend: positive, negative, neutral]
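A per-rule sentiment distribution of this kind can be computed from document-level labels with a small Python sketch. It assumes each comment has already been scored as positive, negative, or neutral by some sentiment service; no AlchemyAPI call is shown. The function name, the rule identifiers `rule_A` and `rule_B`, and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "negative", "neutral")


def sentiment_distribution(labeled_comments):
    """Return {rule_id: {label: share}} from (rule_id, label) pairs."""
    per_rule = defaultdict(Counter)
    for rule_id, label in labeled_comments:
        if label not in LABELS:
            raise ValueError(f"unexpected sentiment label: {label!r}")
        per_rule[rule_id][label] += 1

    distribution = {}
    for rule_id, counts in per_rule.items():
        total = sum(counts.values())
        # Labels with no comments get a share of 0.
        distribution[rule_id] = {label: counts[label] / total for label in LABELS}
    return distribution


# Hypothetical labeled comments, not project data.
comments = [
    ("rule_A", "positive"),
    ("rule_A", "positive"),
    ("rule_A", "negative"),
    ("rule_B", "neutral"),
]
shares = sentiment_distribution(comments)
print(shares)
```

Rules with very few comments produce unstable shares, which is one reason the number of available comments per rule limits the exercise.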
Strategic Planning Application
 Text mining tools, with some limitations, are useful in prioritizing OIG reviews of final rules.
 Three rules in the negative quadrants should be considered for further study.
[Figure: quadrant chart of Sentiment (Negative, Positive) against Term Frequency Gap (Negative, Positive). Rule labels: 77FR20128, 76FR41398, 76FR43851, 76FR71626, 77FR30596, 77FR42559, 76FR53172, 75FR55410, 81FR636, 76FR80674]
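The quadrant placement can be sketched in Python as a simple sign test on two scores per rule. This is an illustration, not the method used in the project. It assumes each rule has one aggregate term frequency gap and one net sentiment score (for example, positive share minus negative share), and it treats any rule with a negative value on either axis as a candidate for further study. The function names, rule identifiers, and values are hypothetical.

```python
def quadrant(term_gap, net_sentiment):
    """Return (sentiment_side, gap_side), each 'positive' or 'negative'."""
    sentiment_side = "positive" if net_sentiment >= 0 else "negative"
    gap_side = "positive" if term_gap >= 0 else "negative"
    return sentiment_side, gap_side


def rules_for_further_study(scores):
    """Return rule ids with a negative sentiment side or a negative gap side.

    scores maps rule_id -> (term_gap, net_sentiment).
    """
    flagged = [
        rule_id
        for rule_id, (term_gap, net_sentiment) in scores.items()
        if "negative" in quadrant(term_gap, net_sentiment)
    ]
    return sorted(flagged)


# Hypothetical scores, not project data.
scores = {
    "rule_A": (0.02, 0.4),
    "rule_B": (-0.01, 0.1),
    "rule_C": (0.03, -0.2),
}
flagged = rules_for_further_study(scores)
print(flagged)
```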
Lessons Learned—Success Strategy
 Sizing the Project
• Data – Available, Processable, Standardized
• Security Concerns – factor in information security governance
 Seeking an Executive Champion
• Do they support the value of the answer?
• To what extent will they fund the project (budgetary
considerations)?
 Repeating a Quick Win
• Is the project repeatable to gain support for subsequent
projects?
