The Dirty Work -- Why Data Must Be Reconciled


The Briefing Room with Eric Kavanagh and the PSI-KORS Institute
Live Webcast Nov. 12, 2013
Watch the archive:

Let's face it -- most enterprise information systems are a mess. That's often due to grunt work that was overlooked months or years ago and had nothing to do with you, except that you inherited it. Some mistakes can be swept under the rug for a while, but sooner or later, garbage in results in very expensive garbage out.

Register for this episode of the Briefing Room to hear Senior Analyst Eric Kavanagh outline a roadmap from the past into the possible futures of the information economy. He'll be briefed by Dr. Geoffrey Malafsky, Founder and Data Scientist for the PSI-KORS Institute, a new organization focused on data reconciliation. Malafsky will share his institute's methodology and explain how the process of doing the dirty work can yield tremendous benefits.

Visit for more information

Published in: Technology


  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. The Dirty Work – Why Data Must Be Reconciled The Briefing Room
  • 3. Welcome Host & Analyst: Eric Kavanagh Guest: Geoffrey Malafsky Twitter Tag: #briefr The Briefing Room
  • 4. Mission
      • Reveal the essential characteristics of enterprise software, good and bad
      • Provide a forum for detailed analysis of today's innovative technologies
      • Give vendors a chance to explain their product to savvy analysts
      • Allow audience members to pose serious questions... and get answers!
  • 5. Data Reconciliation: Garbage In, Garbage Out (GIGO)
      • Garbage data + perfect model = garbage results
      • Perfect data + garbage model = garbage results
  • 6.
      § Current data is disjointed and of low quality
      § Variable use and meaning among systems even for “same” data elements
      § Undocumented definitions and data management processes
      § Errors in data systems
      § Disagreement among data systems
      § Lack of existing descriptions for key readiness use cases
      § Legacy data systems have failed to overcome these problems despite several years of new marts/houses/brokers/IPTs/applications
  • 7.
      “Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy – it can be wrong, it can be duplicative, and it can be irrelevant – which means it requires handling, which is where the real expenses come in. ‘The cost of more data is the application and the computing power and the processes to reconcile all these things.’”
      “While there are a myriad of analytical tools that can be leveraged, a recent study indicated that more than 70% of CMOs feel they are underprepared to manage the explosion of data and ‘lack true insight.’”
      Sources:
      1. Wall Street Journal, CIO’s Big Problem with Big Data, 2012-08-02
      2. Forbes, The CEO/CMO Dilemma: So Much Data, So Little Impact, 2012-07-18
  • 8.
      § Suffix in source A, prefix in B, neither in C for same (part number, title, …)?
      § Conflict syntactically (simplest case) and semantically (most difficult)
      § Other tools & methods never solve this because they deal with the obstacles independently or not at all: data values out-of-sync with metadata, data models
      Different Meanings (Legal and Business Activities): NKY HomeSeekers, Texas
      1. Create table – title aligned to business = Garage
      2. Create vocabulary: spaces.description,, spaces.state, .
      3. Define ETL logic
      4. Merge in warehouse and process in virtualization layer
      5. Change as needed
      Copyright Phasic Systems Inc 2013
  • 9.
      § Data Rationalization is the process of building and managing a continuously adaptive data environment that fuels current and future business needs for decision making and system operations
      § It ensures data (i.e. not just metadata) is as accurate, meaningful, and useful as possible while continuously adjusting to improve and add capability
      § It provides collaborative management of data assets, the designs governing who, why, and how of data, and the where, when, how of data use in operational systems
      § It solves the great challenge of mapping all source values to each target along the entire complex paths of enterprise data use
      § Consolidated values when possible with continuous improvement
      § Simplified and adaptive mapping with Corporate NoSQL
  • 10.
      Design Rationalization Issues
        • Multiple data models
        • Conflicting definitions
        • Similar, supposedly similar, operationally distinct values
        • Unknown business logic
        • Multiple ETL mappings
      Design Rationalization
        • Consolidated, adaptive data models
        • Standardized definitions
        • Synchronized distinct operational values
        • Managed business logic
        • Coordinated ETL mappings
      System Rationalization Issues
        • Multiple database systems
        • Conflicting formats
        • Redundant storage
        • Unsynchronized values
        • Multiple integration points
      System Rationalization
        • Consolidated, adaptive systems
        • Common, interoperable formats
        • Common storage
        • Synchronized interfaces
        • Coordinated integration
  • 11. Rationalized Data = Meaningful Analysis, Decision Support, Enterprise Applications
  • 12.
      § Example from DARPA Evidence Extraction & Link Discovery
      § Today’s Situation: ~10k messages/day from multiple sources read by multiple analysts and analyzed in multiple manual non-integrated tools
      § Similar to Social Network Analysis
  • 13. Complicated Mixture of Commercial, Custom, Legacy, Services Applications, Data Stores
  • 15. Costs; Business Alignment: Goal, Capability, Architecture; Data Assets: Systems, Owners, Use
  • 17. The Ψ–KORS™ System Model: Point-select data models, codes, entities
  • 18. Corporate NoSQL™
  • 19.
      § DOD CIO
          § Adaptively blend financial and program data from multiple sources with unclear, undocumented alignment and integration logic (i.e. this is an intelligence challenge) into BI tools (QlikView, Tableau, Pentaho, Excel Web Apps-SharePoint)
      § Export Development Canada
          § Rationalize core data distributed and undocumented to feed cross-enterprise governance and develop Enterprise Data Model with semantically adjudicated canonical entities
  • 20.
      § Challenge: Complicated environment with conflicting data values, standards, business use cases, and lack of documentation. Data owned by 4 major organizations, in multiple warehouses and data stores, with redundant non-reconciled sets of data
      § Requirement: Integrated, common, accurate data to enable new integrated workforce planning, training, management application (“Sailor of the Future”) for 1 million people
      § Prior Activities: 10+ years of system integration, data warehouse, data governance efforts → no improvement, poor coordination across organizations and systems
  • 21.
      § Yet, there were problems with the most basic data fields, which for the Navy include things like
          § billet (effectively a job but also includes other characteristics),
          § rank (similar to seniority but with formal rules that change over time),
          § rating (similar to vocational ability but also with changing rules),
          § and even the primary identifier of a person, the Social Security Number (SSN).
  • 22. Bridge Organizations, Processes, Technologies to Data Concepts
  • 23. Logical Models derive directly from conceptual and use business terms
  • 24.
      • Promulgate key technologies to help field overcome major obstacles
      • Identify cause and existence of semantic conflicts
      • Determine options
      • Promote enterprise decision making on solution
      • Implement solution into operational data
      • Visible direct line from governance to data modeling to integration to database engineering to analysis and back again
      • Rapid cycle time: identify, assess, decide, execute continuously in natural organizational timeline (days/weeks)
      • Community version DataStar for non-commercial use
      • Collaborative community communication and design of common, semantically clear Corporate NoSQL models
  • 26. Upcoming Topics: November: DATA DISCOVERY & VISUALIZATION; December: INNOVATORS; 2014 Editorial Calendar at
  • 27. Thank You for Your Attention
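
Slide 8 describes the simplest reconciliation case: the same part number carries a suffix in source A, a prefix in source B, and neither in source C. Slide 9 adds the goal of mapping all source values to each target. The following is a minimal Python sketch of that syntactic case only, not the PSI-KORS method. The affixes `-A` and `PN-`, the rule table, and the function names are all hypothetical and chosen purely for illustration. Semantic conflicts, which the slides call the most difficult case, cannot be resolved by string normalization like this.

```python
from collections import defaultdict

# Hypothetical per-source formatting rules (assumed for illustration only).
SOURCE_RULES = {
    "A": {"suffix": "-A"},   # source A appends a suffix
    "B": {"prefix": "PN-"},  # source B prepends a prefix
    "C": {},                 # source C stores the bare value
}


def canonical_part_number(source: str, raw: str) -> str:
    """Strip the source-specific affix and normalize case and whitespace."""
    value = raw.strip().upper()
    rule = SOURCE_RULES[source]
    prefix = rule.get("prefix", "")
    suffix = rule.get("suffix", "")
    if prefix and value.startswith(prefix):
        value = value[len(prefix):]
    if suffix and value.endswith(suffix):
        value = value[:-len(suffix)]
    return value


def reconcile(records):
    """Group (source, raw_value) pairs under their canonical value.

    The result keeps every original source value, so the mapping from
    each source to the consolidated target stays visible and auditable.
    """
    mapping = defaultdict(list)
    for source, raw in records:
        mapping[canonical_part_number(source, raw)].append((source, raw))
    return dict(mapping)


records = [("A", "1234-a"), ("B", "pn-1234"), ("C", " 1234 "), ("C", "9876")]
reconciled = reconcile(records)
for target, sources in reconciled.items():
    print(target, sources)
```

Keeping the original `(source, raw_value)` pairs next to each canonical value, rather than overwriting them, is one way to keep the source-to-target mapping adjustable as rules change.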