Auditing & Data Science
Narrowing the Gap
Traditional tools will continue to evolve
3/16/2020 2
EXCEL, ACL, IDEA, AND OTHERS WILL CONTINUE TO ADD FEATURES
THEIR ORIENTATION IS TOWARD “KNOWN UNKNOWNS” RATHER THAN “UNKNOWN UNKNOWNS”
Data science & Complexity
• Why sample?
• Are visualizations good enough?
• Do you have tools of discovery, such as forecasting, clustering, and anomaly detection?
Analytics tools advance daily but that’s not the most important thing …
• Profound changes in the philosophy of data science
• Underlying algorithms are moving from simply following instructions to “thinking” (at least a little).
This presentation does not include Machine Learning …
Machine learning routines require considerable expertise and smarts.
It also means the surrender of humans to the machines for a narrow section of thought/prediction.
Theoretically, from the 1930s (Alan Turing) to the early 2000s, you could look at the program code and follow its progression.
Machine learning is only a WRAPPER of code. Its real behavior comes from millions or billions of data points, so it cannot be understood just by reading the code.
Tools & Algorithms
• Clustering – Where’s Waldo on steroids
• Prediction – yes, this is way more than Excel
• Gaps & discontinuities
• Credit card fraud
• Sentiment analysis: Like my stuff? Think it stinks?
• Text analytics: Billy Bob (purchasing) next to Sally Mae (vendor) in 10K emails?
• Anomaly detection
• R has more than 12K packages, many of them vertical apps
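One way to make the anomaly detection bullet concrete is a robust z-score: measure how far each value sits from the median, in units of the median absolute deviation (MAD). The sketch below is only an illustration, not a method taken from the slides. The payment amounts are made up, and the 0.6745 scaling constant and 3.5 cutoff are a common rule of thumb, not something the presentation specifies.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 0.6745 makes the score comparable to an ordinary z-score for normal data.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the values whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


# Hypothetical payment amounts with one obvious outlier.
amounts = [102, 98, 101, 97, 100, 99, 103, 950]
print(flag_anomalies(amounts))
```

Because the median and MAD barely move when one extreme value is added, the outlier cannot hide itself by inflating the yardstick, which is the weakness of a mean and standard deviation approach.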
Many R packages are industry specific. Some are even understandable. Python has many as well.
What does Harvey talk about in his emails?
What is not said is as important as what is said. For example, you do not see the phrase “Bill, you ignorant pig ….”
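A rough sketch of how both questions could be asked of a mailbox: count the words that do appear, and check how often a phrase of interest appears (possibly never). The three email bodies below are invented for illustration, and the helper names are hypothetical; a real review would load the messages from an export and would likely strip common stop words first.

```python
import re
from collections import Counter


def word_counts(emails):
    """Count lowercase words across all email bodies."""
    counts = Counter()
    for body in emails:
        counts.update(re.findall(r"[a-z']+", body.lower()))
    return counts


def phrase_count(emails, phrase):
    """Count case-insensitive occurrences of a phrase across all bodies."""
    return sum(body.lower().count(phrase.lower()) for body in emails)


# Hypothetical email bodies, not real data.
emails = [
    "Please approve the invoice before Friday.",
    "The invoice total looks high. Call me.",
    "Lunch on Friday?",
]

counts = word_counts(emails)
print(counts["invoice"])                       # how often a topic comes up
print(phrase_count(emails, "ignorant pig"))    # a phrase that never appears
```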
Let’s run some code
Stuff to think about
• ANOMALIES
• BIAS
• COMMON SENSE
• SAMPLING
• CHERRY PICKING
• LONG-TERM CAPITAL MANAGEMENT HEDGE FUND. DUH! DEAR HERR DOCKTOR PH.DS: PLEASE BE AWARE THAT NOT ALL POPULATIONS ARE “NORMAL.” YOU GOT THE TAILS WRONG AND WE (TAXPAYERS) PAID FOR YOUR IDIOT MISTAKE
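To put a number on "getting the tails wrong," the sketch below compares the chance of a move larger than k standard deviations under a normal distribution with the same chance under a heavier-tailed alternative. The Laplace distribution is chosen here purely as an illustrative assumption because its tail has a simple closed form; it is not a claim about what any fund's returns actually looked like.

```python
from math import exp, sqrt
from statistics import NormalDist


def normal_two_sided_tail(k):
    """P(|X| > k standard deviations) when X is normally distributed."""
    return 2 * (1 - NormalDist().cdf(k))


def laplace_two_sided_tail(k):
    """P(|X| > k standard deviations) when X follows a Laplace distribution.

    A Laplace distribution with scale b has variance 2 * b**2, so unit
    variance means b = 1 / sqrt(2), and the two-sided tail is exp(-k / b).
    """
    return exp(-k * sqrt(2))


for k in (3, 5):
    normal_p = normal_two_sided_tail(k)
    laplace_p = laplace_two_sided_tail(k)
    print(f"{k} sigma: normal {normal_p:.2e}, Laplace {laplace_p:.2e}, "
          f"ratio {laplace_p / normal_p:.0f}x")
```

Both distributions have the same standard deviation, yet the heavier-tailed one makes a 5-sigma event over a thousand times more likely. A risk model that assumes normality will treat such events as practically impossible.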
Kudos to Microsoft/Excel. Enter dates and values. Highlight them. Click Data, then Forecast Sheet. You can specify the number of periods ahead you want. This is a TIME series projection (better than a moving average).
Data = New Zealand government core expenditures since 1972
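Why a time series projection can beat a moving average: a moving average only summarizes recent values, so on trending data it lags behind. The sketch below uses Holt's linear method (exponential smoothing with a trend term) as a simplified stand-in. Excel's Forecast Sheet is documented as using exponential smoothing (the FORECAST.ETS functions), but this code is not Excel's algorithm, the smoothing weights are arbitrary assumptions, and the series is made up rather than the New Zealand data.

```python
def holt_forecast(values, periods_ahead, alpha=0.5, beta=0.5):
    """Forecast with Holt's linear trend method (double exponential smoothing)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    level = values[1]
    trend = values[1] - values[0]
    for y in values[2:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, periods_ahead + 1)]


def moving_average_forecast(values, window=3):
    """Flat forecast equal to the mean of the last `window` observations."""
    recent = values[-window:]
    return sum(recent) / len(recent)


# Hypothetical, steadily growing series.
spend = [10, 12, 14, 16, 18]
print(holt_forecast(spend, 3))          # continues the upward trend
print(moving_average_forecast(spend))   # stuck below the latest value
```

On this steadily rising series the trend-aware forecast keeps climbing, while the three-period moving average predicts a value lower than the most recent observation.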
Data science thinking: How do you look at a large log of, for example, 20,000 time stamps to determine if the log shows someone turned off recording for a few minutes or hours?
One approach: create an artificial file (via a program or even Excel) with all dates/times for the period. For example, if the log collects data every 5 minutes, show 1/31/20 8:00am, 1/31/20 8:05am, and so on.
Perform an “anti-join” to show records in your artificial file with no match in your log file. This could be done in R with an actual anti-join function (for example, dplyr's anti_join) or in Excel using VLOOKUP.
Any other ways?
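The anti-join idea can be sketched in a few lines of Python, along with one possible "other way": sort the log and look for consecutive entries that are further apart than the recording interval. The log below is fabricated for illustration, and the sketch assumes the timestamps land exactly on the 5-minute grid; a real log would probably need rounding first.

```python
from datetime import datetime, timedelta


def expected_timestamps(start, end, step):
    """Build the artificial file: every timestamp the log should contain."""
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


def missing_timestamps(expected, logged):
    """Anti-join: expected timestamps that have no match in the log."""
    logged_set = set(logged)
    return [t for t in expected if t not in logged_set]


def gaps_longer_than(logged, step):
    """Alternative: pairs of consecutive log entries more than `step` apart."""
    ordered = sorted(logged)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]


step = timedelta(minutes=5)
start = datetime(2020, 1, 31, 8, 0)
end = datetime(2020, 1, 31, 9, 0)
expected = expected_timestamps(start, end, step)

# Hypothetical log in which recording was off from 8:20 through 8:30.
off_start = datetime(2020, 1, 31, 8, 20)
off_end = datetime(2020, 1, 31, 8, 30)
log = [t for t in expected if not (off_start <= t <= off_end)]

print(missing_timestamps(expected, log))
print(gaps_longer_than(log, step))
```

The anti-join lists each missing reading, while the gap check reports the last entry before and the first entry after each outage, which is often the more useful evidence for an auditor.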
GUIs are great for exploration. Excel PivotTables, with timelines and slicers, are cool.
Data science is as much about graphics as anything else. Complexity not required.
Questions – Discussion (half duplex, please)
This stuff is fun … call or email me with questions, ideas to solve audit problems, or suggestions for research. Thanks for your time today.
• Bill Yarberry, CPA
• ICCM Consulting
• 713.582.6275
• byarberry@iccmconsulting.net


Editor's Notes

  1. Discuss how audit has fallen behind simply because data science is evolving so quickly
  2. I’m not suggesting in any way that audit throw away the traditional tools; but they are not oriented to discovery
  3. I’m not suggesting in any way that audit throw away the traditional tools; but they are not oriented to discovery
  4. Early days of statistics: get the most information out of the least possible data. Now, what are the possibilities if I have millions, billions, or even trillions of data points?
  5. Not enough time, but it is an extremely important topic. Robots in factories, identification of rogue behavior, just about any narrow expertise can be gained. I ran an ML routine for determination of breast cell malignancy: 30 characteristics, 98% accuracy.
  6. How would you do it? One variable's distance from another: easy. Millions of transactions per second. Is this fraudulent? Age, income, $ history, sex, location, abrupt shift. Sentiment analysis: lexicon of good and bad words (various ones).
  7. Some tools are really easy to use.
  8. Diederik Stapel, Dutch psychologist, found to have falsified data in 30 papers. Extreme values are more probable than a normal distribution predicts. Russia defaulted in the late 1990s.
  9. Mathematical skepticism. Think rationally and dispassionately, not ape-to-ape social dominance. SSC is a legitimate trading algorithm
  10. Rattle, plot.ly
  11. Don’t make viewer work too hard.
  12. What is the sweet spot between too much time devoted to analytics and missing valuable information? What are some ways to improve data awareness & opportunities? Where to get objective advice? Nobody pushes tools they can’t make money on.