The Dirty Work – Why Data Must Be Reconciled

The Briefing Room
Welcome

Host & Analyst:
Eric Kavanagh

Guest:
Geoffrey Malafsky
Twitter Tag: #briefr



The Briefing Room with Eric Kavanagh and the PSI-KORS Institute
Live Webcast Nov. 12, 2013
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7727087&rKey=66b1fa7d82868199

Let's face it -- most enterprise information systems are a mess. That's often due to grunt work which was overlooked months or years ago and had nothing to do with you, except that you inherited it. Some mistakes can be swept under the rug for a while, but sooner or later, garbage in results in very expensive garbage out.

Register for this episode of the Briefing Room to hear Senior Analyst Eric Kavanagh outline a roadmap from the past into the possible futures of the information economy. He'll be briefed by Dr. Geoffrey Malafsky, Founder and Data Scientist for the PSI-KORS Institute, a new organization focused on data reconciliation. Malafsky will share his institute's methodology and explain how the process of doing the dirty work can yield tremendous benefits.

Visit InsideAnalysis.com for more information

Published in: Technology

Transcript of "The Dirty Work -- Why Data Must Be Reconciled"

  1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  2. The Dirty Work – Why Data Must Be Reconciled
     The Briefing Room
  3. Welcome
     Host & Analyst: Eric Kavanagh
     Guest: Geoffrey Malafsky
     Twitter Tag: #briefr
     The Briefing Room
  4. Mission
     !  Reveal the essential characteristics of enterprise software, good and bad
     !  Provide a forum for detailed analysis of today's innovative technologies
     !  Give vendors a chance to explain their product to savvy analysts
     !  Allow audience members to pose serious questions... and get answers!
  5. Data Reconciliation: Garbage In, Garbage Out (GIGO)
     GARBAGE DATA → PERFECT MODEL → GARBAGE RESULTS
     PERFECT DATA → GARBAGE MODEL → GARBAGE RESULTS
  6. § Current data is disjointed and of low quality
     § Variable use and meaning among systems even for “same” data elements
     § Undocumented definitions and data mgmt processes
     § Errors in data systems
     § Disagreement among data systems
     § Lack of existing descriptions for key readiness use cases
     § Legacy data systems have failed to overcome these problems despite several years of new marts/houses/brokers/IPTs/applications
  7. “Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy – it can be wrong, it can be duplicative, and it can be irrelevant – which means it requires handling, which is where the real expenses come in. ‘The cost of more data is the application and the computing power and the processes to reconcile all these things.’”
     “While there are a myriad of analytical tools that can be leveraged, a recent study indicated that more than 70% of CMOs feel they are underprepared to manage the explosion of data and ‘lack true insight.’”
     1. Wall Street Journal, CIO's Big Problem with Big Data, 2012-08-02
     2. Forbes, The CEO/CMO Dilemma: So Much Data, So Little Impact, 2012-07-18
  8. § Suffix in source A, prefix in B, neither in C for same (part number, title, …)?
     § Conflict syntactically (simplest case) and semantically (most difficult)
     § Other tools & methods never solve this because they deal with the obstacles independently or not at all: data values out-of-sync with metadata, data models
     Different Meanings (Legal and Business Activities)
     NKY HomeSeekers; Texas
     1. Create table – title aligned to business = Garage
     2. Create vocabulary: spaces.description, spaces.national, spaces.state, .
     3. Define ETL logic
     4. Merge in warehouse and process in virtualization layer
     5. Change as needed
     Copyright Phasic Systems Inc 2013
  9. § Data Rationalization is the process of building and managing a continuously adaptive data environment that fuels current and future business needs for decision making and system operations
     § It ensures data (i.e. not just metadata) is as accurate, meaningful, and useful as possible while continuously adjusting to improve and add capability
     § It provides collaborative management of data assets, the designs governing who, why, and how of data, and the where, when, how of data use in operational systems
     § It solves the great challenge of mapping all source values to each target along the entire complex paths of enterprise data use
     § Consolidated values when possible with continuous improvement
     § Simplified and adaptive mapping with Corporate NoSQL
  10. Design Rationalization Issues
     •  Multiple data models
     •  Conflicting definitions
     •  Similar, supposedly similar, operationally distinct values
     •  Unknown business logic
     •  Multiple ETL mappings
     Design Rationalization
     •  Consolidated, adaptive data models
     •  Standardized definitions
     •  Synchronized distinct operational values
     •  Managed business logic
     •  Coordinated ETL mappings
     System Rationalization Issues
     •  Multiple database systems
     •  Conflicting formats
     •  Redundant storage
     •  Unsynchronized values
     •  Multiple integration points
     System Rationalization
     •  Consolidated, adaptive systems
     •  Common, interoperable formats
     •  Common storage
     •  Synchronized interfaces
     •  Coordinated integration
  11. Rationalized Data = Meaningful Analysis, Decision Support, Enterprise Applications
  12. § Example from DARPA Evidence Extraction & Link Discovery
     § Today’s Situation: ~10k messages/day from multiple sources read by multiple analysts and analyzed in multiple manual non-integrated tools
     § Similar to Social Network Analysis
  13. Complicated Mixture of Commercial, Custom, Legacy, Services Applications, Data Stores
  14. [No slide text extracted]
  15. Costs
     Business Alignment: Goal, Capability, Architecture
     Data Assets: Systems, Owners, Use
  16. [No slide text extracted]
  17. The Ψ–KORS™ System Model
     Point-select data models, codes, entities
  18. Corporate NoSQL™
  19. § DOD CIO
        § Adaptively blend financial and program data from multiple sources with unclear, undocumented alignment and integration logic (i.e. this is an intelligence challenge) into BI tools (QlikView, Tableau, Pentaho, Excel Web Apps-SharePoint)
     § Export Development Canada
        § Rationalize core data distributed and undocumented to feed cross-enterprise governance and develop Enterprise Data Model with semantically adjudicated canonical entities
  20. § Challenge: Complicated environment with conflicting data values, standards, business use cases, and lack of documentation. Data owned by 4 major organizations, in multiple warehouses and data stores, with redundant non-reconciled sets of data
     § Requirement: Integrated, common, accurate data to enable a new integrated workforce planning, training, and management application (“Sailor of the Future”) for 1 million people
     § Prior Activities: 10+ years of system integration, data warehouse, and data governance efforts → no improvement, poor coordination across organizations and systems
  21. § Yet, there were problems with the most basic data fields, which for the Navy include things like
        § billet (effectively a job but also includes other characteristics),
        § rank (similar to seniority but with formal rules that change over time),
        § rating (similar to vocational ability but also with changing rules),
        § and even the primary identifier of a person, the Social Security Number (SSN).
  22. § Bridge Organizations, Processes, Technologies to Data Concepts
  23. Logical Models derive directly from conceptual and use business terms
  24. •  Promulgate key technologies to help field overcome major obstacles
     •  Identify cause and existence of semantic conflicts
     •  Determine options
     •  Promote enterprise decision making on solution
     •  Implement solution into operational data
     •  Visible direct line from governance to data modeling to integration to database engineering to analysis and back again
     •  Rapid cycle time: identify, assess, decide, execute continuously in natural organizational timeline (days/weeks)
     •  Community version DataStar for non-commercial use
     •  Collaborative community communication and design of common, semantically clear Corporate NoSQL models
  25. Twitter Tag: #briefr
     The Briefing Room
  26. Upcoming Topics
     November: DATA DISCOVERY & VISUALIZATION
     December: INNOVATORS
     2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
     www.insideanalysis.com
  27. Thank You for Your Attention
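
Slide 8 describes the simplest reconciliation case: the same part number carries a suffix in source A, a prefix in source B, and neither in source C. The Python sketch below is one way that syntactic step could look, followed by a check for the harder semantic case where sources agree on the identifier but not on the title. It is an illustration only, not the Ψ–KORS or Corporate NoSQL method. The decoration patterns (`-REVx`, `PN-`), the sample records, and every function name are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical decorations (assumed for illustration): source A appends a
# revision suffix, source B prepends a prefix, source C stores the bare number.
SOURCE_RULES = {
    "A": re.compile(r"^(?P<core>\d+)-REV[A-Z]$"),
    "B": re.compile(r"^PN-(?P<core>\d+)$"),
    "C": re.compile(r"^(?P<core>\d+)$"),
}


def canonical_part_number(source, raw):
    """Return the bare part number, or None if the value breaks the source's rule."""
    match = SOURCE_RULES[source].match(raw.strip().upper())
    return match.group("core") if match else None


def reconcile(records):
    """Group (source, raw_part_number, title) records by canonical part number.

    Returns (merged, rejects): merged maps each canonical number to the set of
    normalized titles seen for it; rejects holds records that could not be parsed.
    """
    merged = defaultdict(set)
    rejects = []
    for source, raw, title in records:
        core = canonical_part_number(source, raw)
        if core is None:
            rejects.append((source, raw, title))
        else:
            merged[core].add(title.strip().lower())
    return dict(merged), rejects


def semantic_conflicts(merged):
    """Canonical numbers whose sources disagree on the title; these need human review."""
    return {core: titles for core, titles in merged.items() if len(titles) > 1}


# Made-up sample records, not data from the presentation.
records = [
    ("A", "10442-REVB", "Garage"),
    ("B", "PN-10442", "garage"),
    ("C", "10442", "Carport"),
    ("B", "10999", "Shed"),  # lacks the prefix source B is assumed to use
]
merged, rejects = reconcile(records)
conflicts = semantic_conflicts(merged)
```

Here the three spellings of part 10442 collapse to one key, the malformed source B value is set aside rather than guessed at, and the Garage/Carport disagreement is surfaced for a person to adjudicate. Pattern matching can settle the syntactic conflict, but only a business decision can settle the semantic one.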