Executive Summary

Protochips, like many organizations, has a lot of data that it does not know how to utilize. Our team was fortunate to obtain all of the data we needed and to have enthusiastic sponsorship from the company. There were certainly challenges along the way, and our team had to maneuver around all kinds of obstacles, but in the end we were able to answer the questions that Protochips posed to us and provide a Business Intelligence solution that will supply the knowledge the company needs to make strategic and profitable decisions.

Our team has learned a great deal in the course of this project. We have become familiar with new tools, like Talend and Qlikview. We have confirmed (sometimes the hard way) that all of the steps mentioned in the lectures and textbooks are important to our understanding of the business as well as to our overall success with the project. We have also grown as individuals, accomplishing things that we were not sure we could accomplish. We are very proud of our final product and are excited to share the highs (and lows) of the journey that eventually led us to the finish line. We are so appreciative of the opportunity to carry out this relevant and meaningful project.
  
	
  
	
   	
  
  
	
  
Contents

Background
Our Approach
Decisions We Made and Why
List of Team Members and Responsibilities/Activities
    Stefanie Boros
    Shradha Salian
    Saniya Shukla
    Sarah Yousef
Changes from Original Proposal
Technical Architecture Diagram
Samples of Each Data Set
    Original Salesforce Sample Data
    Access - Yield Results Sample Data
    Access - Parts List Sample Data
    PDF - List Price Sample Data
Dimensional Models
    Conceptual Model
    Logical Model
    Physical Model
Sample Data from Dimensional and Fact Tables
Data Integration Mappings
Business Questions
    Production
    Sales
    Customer
Challenges
    Data
    Talend
    Visualizing the Solutions to the Business Questions
    Dimensional modelling
    Project logistics
Appendix
Interview with Angela and David
    Business Requirements
    Data Requirements
    Technical Requirements
    Salesforce Data
    Parts Assembly Data (Is this QuickBooks?)
    Production Data
    Yields Data
Proposal Approved by Protochips
Wireframes
Storyboards
Data Dictionaries
    Source
    Target
Reconciliation Document
Validation Rules
Suggestions for Protochips
  
	
  
	
  
	
   	
  
  
	
  
Background

Protochips is a small company based in Morrisville, NC that develops analytical tools for the scanning and transmission electron microscope. Protochips products are used by university, government and industry researchers to understand how nanoscale materials react under various stimuli, such as heat, electrical bias, and liquid and gas environments. The company was founded in 2002 and has since developed many innovative products that are revolutionizing this market space. They work with clients all over the world, and have products in over 25 countries.

A product that has taken second place to their main, durable systems is the consumable known as C-flat Holey Carbon Grids. This product was first sold in 2012, and its sales have soared unexpectedly since that time. With $250k in sales in the first year and now over $600k, with the expectation that they will hit over $1M soon, it is clear that this product is demanding attention! David, the CEO, decided that it was time to start figuring out what trends exist in the production and sales of this product in order to become more proactive with inventory and marketing, instead of just reacting to the orders coming in.

At this time, Protochips does not use any Business Intelligence (BI) tools. They have been in the growth stage and are now stabilizing, so they want to make more strategic, long-term decisions. They are not able to see trends easily with their current setup of Excel spreadsheets and Salesforce reports. They are looking for a BI solution that will allow them to view the data easily and from different angles so that they can learn buying patterns and production trends. Angela, the director of operations at Protochips, was hired recently to help with streamlining product manufacturing through data analytics, but she has focused on the other high-value product lines, leaving C-flat as a low priority.
  
	
  
	
   	
  
  
	
  
Our Approach

Our team pinned down some key questions that Protochips is looking to have answered by our solution, in three areas: Production, Sales, and Customer. For Production, the company needs to dive in to see what patterns are taking place in the manufacturing of C-flat. Is there a trend in the yields for a specific part? Are the yields affected by the time of year? With Sales, it's important for Protochips to understand trends in which C-flat part numbers are being sold so that they can have a better handle on keeping inventory at appropriate levels. They also want to see if certain parts are sold at a certain time of year or if there are patterns to sales by region of the world. When they understand how sales are occurring, they can plan their marketing efforts more strategically. Finally, the Customer element focuses on specific clients and their buying habits. Who are the top customers? What does a particular customer tend to buy, and at what time intervals? This helps Protochips learn what their clients' needs are and predict what and when they might buy, so that they do not have to wait for the customers to approach them.
  
	
  
Decisions We Made and Why

To begin, we decided to focus solely on C-flat and not include the other products, for the sake of simplicity and because of the time constraint of the project due date. We also had to make some strategic decisions about what data we wanted to use. This was definitely a painstaking process because we were still learning about the product (which was fairly technical) but needed to keep a steady pace. In the end, we focused on the data we did because we felt that it would allow us to answer the business questions in the most straightforward way. Some decisions we made with Protochips' validation:
  
• Dropped QuickBooks: Initially we were using this data to cross-check the values from Salesforce and make sure they were accurate, given that QuickBooks represented the company's financials. We decided to drop it because some major data was missing and the level of granularity did not match that of the Salesforce source (for example, multiple orders/rows in QuickBooks are represented as one row/order in Salesforce). There were also some variations in how the data was being entered, and we felt that reconciling to that extent was beyond the scope of our project.
  
	
  
• Missing product list prices assumed $1 value: In order to avoid null and empty values (and since the information was not available), some list prices for parts were assumed to be $1. We did not use 0, to avoid computational problems (such as infinity values when dividing by zero). By making this change, we were able to retain the sales data, although a discount cannot be determined for those parts (which was not part of the main business requirements from Protochips anyway).
  
	
  
• Auto-generated Part Name in the Parts table and added a new column ‘Category’: In order to avoid inconsistent names, we decided to auto-generate the part names according to a single standard, built from a combination of the different attributes present in the table. We felt that this would ensure that every part number conformed to the part name structure and would therefore not be exposed to entries that were typed incorrectly. We also added a new column, Category, with two values: General and Custom. A few parts were customized based on the customer's needs, which is why we needed the category.
  
	
  
• Created a new database in Access: Since the database we were provided with was not editable, we created a new database in Access to replicate it. Apart from the tables that were already present, we created a new table for the Grid prices. Since some sources of data defined the prices in terms of packs and some in terms of grids, we decided to create a table consisting of the different parts and their associated grid prices (for 25, 50 and 100 count packs, respectively). This in turn helped us generate an intermediate table consisting of the prices per grid as well as per pack for every part, so that we could present the data in any way the client wanted.
  
	
  
• Decided not to group by Opportunity Name: The client does not have any standard way of defining the Opportunity Names, and therefore no specific transformation rules could be applied. We allowed this level of granularity to be reached while drilling down in the BI tool, but we were not able to allow for sorting and presenting data solely by the Opportunity Name.
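
Two of the decisions above, the $1 placeholder list price and the per-grid versus per-pack prices, can be illustrated with a minimal Python sketch. This is not the code used in the project (the actual transformations were built in Talend and Access); the function and constant names are hypothetical, and the sketch assumes that a list price of exactly $1 always marks a placeholder, which would misclassify a part whose real list price is $1.

```python
from typing import Optional

# Pack sizes named in the text: 25, 50 and 100 count packs.
PACK_SIZES = (25, 50, 100)

# Placeholder used when the real list price was unavailable (hypothetical constant name).
PLACEHOLDER_LIST_PRICE = 1.0


def price_per_grid(pack_price: float, pack_size: int) -> float:
    """Convert a per-pack price into a per-grid price."""
    if pack_size not in PACK_SIZES:
        raise ValueError(f"unexpected pack size: {pack_size}")
    return pack_price / pack_size


def discount_pct(list_price: float, sales_price: float) -> Optional[float]:
    """Return the discount as a percentage of list price.

    Returns None when the list price is only the $1 placeholder,
    because no meaningful discount can be determined for that part.
    """
    if list_price == PLACEHOLDER_LIST_PRICE:
        return None
    return (list_price - sales_price) / list_price * 100.0
```

Because the placeholder is non-zero, the division in `discount_pct` can never fail for a placeholder row even if the check were skipped; returning `None` simply keeps the resulting meaningless percentage out of any report.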
  
List of Team Members and Responsibilities/Activities

Stefanie Boros
Stefanie's main role was Project Manager, but she was also the connection between the team and Protochips. She worked on collecting the data and contacted David and Angela several times throughout the process to understand the data and the process, and to verify that we were on the right track every step of the way. She collaborated with every member of the team to make sure that the direction of the various elements of the project was in line with what Protochips was looking for. She assisted in designing the data models, reconciliation, cleansing rules, and other activities. She worked with Saniya on Qlikview to create the BI tool that would be utilized by the organization. She was also in charge of the project deliverables throughout the quarter.
  
Shradha Salian
Shradha's main roles were Data Integration Specialist and Data Analyst. She worked on creating a database for the source tables and on learning Talend (along with Sarah) to understand its different functionalities and components, the data sources, the relationships among them, and how different operations could be carried out on them. She was also involved in writing some Java code in Talend. She worked on creating the data dictionary for the ‘Parts’ data source. She contributed to cleansing the QuickBooks data source as well. She worked on creating the validation rules for all of the data sources. She reviewed the different documents and made changes to them wherever required.
  
Saniya Shukla
Saniya was mainly involved on the BI side of the project, translating the data sets into visualizations such as charts, graphs and tables. This involved addressing the requirement questions from the client and delivering the charts in the most understandable and user-friendly way. She was also involved in connecting the database to Qlikview to pull the data and use it for the dashboards. She mainly worked on formulating the expressions and formulae used to derive charts, graphs and other objects for the dashboards according to Protochips' requirements. Before implementing the data in the BI tool, she also created the initial and final wireframes and storyboards, with Stefanie creating the sketches for the initial ones. She also worked on creating the data dictionary for the Salesforce data.
  	
  
	
  
Sarah Yousef
Sarah's main roles were Technical Architect and Data Integration Specialist. She worked on developing the data models for the target tables, creating the corresponding tables in MS SQL Server and creating their data dictionaries. Doing that required a thorough understanding of the data, its state and how the pieces connect to one another. She was also responsible for the source-to-target mappings, learning Talend (along with Shradha) and loading the data into the target tables. Given that, she also worked on the reconciliation document. She contributed to cleansing the source files by cleansing the Salesforce (SF) data and the data definitions for the yield result source table. She gave feedback and input on the BI dashboard and the different project deliverables. She also acted as Project Manager (secondary role) by preparing meeting agendas and action items, and by following up with everyone to make sure the project was on the right track.
  	
  
	
  
Changes from Original Proposal

Our project ended up following the path that we set out in our original proposal pretty closely, with just a few changes. We felt that, given the time constraints of this project, it made the most sense to stick with only looking at C-flat, as we had originally discussed with David, instead of trying to analyze all product lines. We also came to discover that QuickBooks was not really providing us with anything that we could not get from Salesforce or easily reconcile. We were hoping to use QuickBooks as a way to cross-check sales, but Protochips had inconsistent methods of entering sales into each database, so we decided to simplify by removing QuickBooks, without having to sacrifice any vital data. We did have to get additional data from Protochips to provide us with the List Prices of the parts, so that was another data source that was not stated in the proposal. Finally, we switched to Microsoft SQL Server Management Studio instead of using MySQL as we had originally planned, because of our familiarity with the tool.
  
Technical Architecture Diagram
  
	
  
Samples of Each Data Set

Original Salesforce Sample Data

• Manual cleansing included removing columns (see above) and rows (see below).

Original Source Number of Rows | Cleansed Source Number of Rows | Difference
1175 | 1021 | 154

Rows Removed (by row #) | Row Count | Reason
6, 103, 191, 215-217, 219, 226-232, 312, 397-398, 407-408, 447, 464-475, 493, 511-513, 518, 532, 686, 703-704, 714-715, 750, 773, 918, 922, 1056, 1068, 1074-1075, 1129, 1175 | 43 | Client request – removed Stage Closed Lost, Postponed, Target, Closed/Dead End, Imminent
5, 7-16, 18-25, 40, 46-47, 101-102, 236, 445, 463, 482-483, 498, 524-525, 581, 627, 669, 682, 720, 727, 738, 892 | 40 | Client request – remove Order Amount = 0
26-29, 48-69, 212, 224, 245-247, 270-282, 470, 537-538, 585-590, 725, 728, 747-748, 763-768, 840, 906-907, 939, 1031, 1064, 1100 | 70 | Client request – remove Created dates before 10/01/2012
17 | 1 | Error values
Total | 154 | ---

Rows Amended (by row #) | Row Count | Reason
13 | 1 | Error in Sales_Price - Change to 5.706 - cross referenced with QuickBooks
14 | 1 | Error in Sales_Price - Change to 5.391 - cross referenced with QuickBooks
194, 196 | 2 | Error in Sales_Price - Change to 6.462 - cross referenced with QuickBooks
195 | 1 | Error in Sales_Price - Change to 7.002 - cross referenced with QuickBooks
666, 667, 758, 887 | 4 | Error in Sales_Price - Change to 728.1 - cross referenced with QuickBooks
668, 756, 757, 888, 908, 916, 919 | 7 | Error in Sales_Price - Change to 584.1 - cross referenced with QuickBooks
All | 1021 | Removed the letters CF in all product names: =RIGHT(E2,LEN(E2)-3)
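
The three client-requested removal rules and the prefix-stripping formula listed above can be expressed as a short Python sketch. The cleansing in the project was done manually in the spreadsheet, so this is only an illustration; the field names (`stage`, `order_amount`, `created`, `product`) are hypothetical, and the cutoff 10/01/2012 is assumed to be in US month/day/year format (October 1, 2012).

```python
from datetime import date

# Stages the client asked to exclude, as listed in the table above.
EXCLUDED_STAGES = {"Closed Lost", "Postponed", "Target", "Closed/Dead End", "Imminent"}

# Assumption: 10/01/2012 is read as month/day/year.
CUTOFF = date(2012, 10, 1)


def keep_row(row: dict) -> bool:
    """Apply the three client-requested removal rules to one row."""
    if row["stage"] in EXCLUDED_STAGES:
        return False
    if row["order_amount"] == 0:
        return False
    if row["created"] < CUTOFF:
        return False
    return True


def strip_prefix(product_name: str) -> str:
    """Mirror =RIGHT(E2,LEN(E2)-3): drop the first three characters."""
    return product_name[3:]


def cleanse(rows: list) -> list:
    """Return the surviving rows with the product-name prefix removed."""
    cleaned = []
    for row in rows:
        if keep_row(row):
            cleaned.append({**row, "product": strip_prefix(row["product"])})
    return cleaned
```

The single "Error values" row and the Sales_Price amendments are not covered here, since the report corrected those by hand against QuickBooks.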
  
  
	
  
  
Access - Yield Results Sample Data

• Manual cleansing included removing columns (see above) and rows (see below).

Original Source Number of Rows | Cleansed Source Number of Rows | Difference
9533 | 9508 | 25

Rows Removed (by Wafer_ID) | Row Count | Reason
16488, 19340, 18398, 20215, 20838 | 5 | ‘Qty’ value greater than ‘Out of’ value
16085, 16882, 18676, 19150 | 4 | Empty ‘Out of’ values
22808, 22809, 22125, 22363, 22570, 22772, 22822, 25282 | 8 | Other empty values
17329, 21063, 21092, 17994, 18228, 18709, 19665, 20484 | 8 | Extremely high ‘Out of’ value – anomalies
Total | 25 | ---
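
The removal reasons in the table above amount to four checks per yield row, which the following Python sketch illustrates. It is not the project's actual validation logic; the field names (`qty`, `out_of`) are hypothetical stand-ins for the ‘Qty’ and ‘Out of’ columns, and the report does not state what counted as an "extremely high" value, so the threshold is left as a caller-supplied parameter.

```python
from typing import Optional


def yield_row_problem(row: dict, max_out_of: int) -> Optional[str]:
    """Return the reason a yield row would be removed, or None if it passes.

    max_out_of is a hypothetical threshold for flagging anomalously
    high 'Out of' values; the report gives no specific number.
    """
    if row.get("out_of") is None:
        return "empty 'Out of' value"
    if any(value is None for value in row.values()):
        return "other empty value"
    if row["qty"] > row["out_of"]:
        return "'Qty' greater than 'Out of'"
    if row["out_of"] > max_out_of:
        return "anomalously high 'Out of' value"
    return None
```

Checking for empty values first means the numeric comparisons only ever run on rows where both quantities are present.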
  
	
  
Access - Parts List Sample Data

• Manual Cleansing included

Original Source Number of Rows | Cleansed Source Number of Rows | Difference
74 | 79 | 5

PDF - List Price Sample Data

• Manual Cleansing included

Original Source Number of Rows | Cleansed Source Number of Rows | Difference
81 | 219 | 138
  
	
  
	
   	
  
  
	
  
Dimensional Models

Conceptual Model

Logical Model

Physical Model

Sample Data from Dimensional and Fact Tables

ACCOUNT_DIM sample data

OPPORTUNITY_DIM sample data

PARTS_DIM sample data

YIELD_TIME_DIM sample data

SALES_TIME_DIM sample data

YIELD_FACT sample data

SALES_FACT sample data

SALES_CONVERSION sample data (an intermediary table NOT included in modeling)

GRIDS_PRICE sample data (a table in the Access database, NOT included in modeling)
  
	
  
	
   	
  
24	
  |	
  P a g e 	
  
	
  
Data Integration Mappings

ACCOUNT_DIM
• Load Job
• tMap
• Source to Target Mapping

OPPORTUNITY_DIM
• Load Job
• tMap
• Source to Target Mapping

SALES_TIME_DIM
• Load Job
• tMap
• Source to Target Mapping

SALES_CONVERSION (Intermediate Table)
• Load Job
• tJavaRow
• tMap
• Source to Target Mapping

SALES_FACT
• Load Job
• tJavaRow
• tMap
• Source to Target Mapping

PARTS_DIM
• Load Job
• tMap
• Source to Target Mapping

YIELD_TIME_DIM
• Load Job
• tMap
• Source to Target Mapping

YIELD_FACT
• Load Job
• tJavaRow
• tMap
• Source to Target Mapping

PARTS INTERMEDIATE TABLE:
• Load Job
• tMap
• tJavaRow
	
  
	
  
	
  
Business Questions

Production
• What is the trend in yields for each part number?
• Is there any trend in the production by time of the year?

Sales
• What trends are there in the sales of parts?
• Is there a seasonality to the sales?
• Is there a trend in sales by region of the world?

Customer
• Who are the top customers?
• Does an individual customer tend to buy at a certain time or at a certain time interval?
• What does an individual customer tend to buy?
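As an illustration of how one of these questions maps onto the dimensional model, the sketch below ranks customers by total sales using Python's built-in `sqlite3` module. It is only a sketch under stated assumptions: the table names and the `Account_Name`, `Region`, and `Total_Sales_Dollar_Amount` columns come from this report, but the `Account_Key` join column and all of the sample rows are hypothetical, and the team's actual answers were built in Qlikview rather than in code like this.

```python
import sqlite3


def top_customers(conn: sqlite3.Connection, limit: int = 3) -> list:
    """Rank accounts by total sales dollars, highest first."""
    query = """
        SELECT a.Account_Name, SUM(f.Total_Sales_Dollar_Amount) AS total_sales
        FROM SALES_FACT AS f
        JOIN ACCOUNT_DIM AS a ON a.Account_Key = f.Account_Key  -- assumed key
        GROUP BY a.Account_Name
        ORDER BY total_sales DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def build_sales_demo() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ACCOUNT_DIM (
            Account_Key INTEGER PRIMARY KEY, Account_Name TEXT, Region TEXT);
        CREATE TABLE SALES_FACT (
            Account_Key INTEGER, Total_Sales_Dollar_Amount REAL);
        INSERT INTO ACCOUNT_DIM VALUES
            (1, 'Lab A', 'Americas'), (2, 'Lab B', 'EMEA'), (3, 'Lab C', 'Asia');
        INSERT INTO SALES_FACT VALUES
            (1, 500.0), (1, 250.0), (2, 900.0), (3, 100.0);
        """
    )
    return conn
```

Calling `top_customers(build_sales_demo(), 2)` returns the two accounts with the largest summed sales. The same pattern (join the fact table to a dimension, group, aggregate) applies to the regional and seasonal questions by grouping on `Region` or on a time-dimension attribute instead.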
  
  
Challenges

Data

As expected, the data proved to be a very time-consuming element. For starters, getting the data itself took a lot longer than we anticipated. Once we finally received it all, we had to understand what we were dealing with. We spent hours poring over each attribute, trying to see how everything tied together. We also had conversations with David, Angela, and Nicole (another Protochips employee who deals with the data). Once we felt comfortable with the data, we had to decide what was and was not appropriate to use in order to answer our questions. This was something that we did not even confirm until toward the end of the project!

Beyond just getting and understanding the data, we had to work on cleansing and integrating it. There were rows that were missing values, rows that had inconsistent values, and so on. Step by step, we had to make decisions (and get confirmation from Protochips) about how to handle each and every quirk in the data. For integration, we had to see the big picture and keep our minds on the overall goal of answering the business questions in order to choose the right path. Ultimately, we kept coming back to these business questions, and that really helped to guide us along and make successful choices.
  
	
  
Talend

As expected with any new tool, there was a learning curve to understanding and utilizing our integration tool, Talend. Initially, it was just about understanding what capabilities the tool had and how to get the results we wanted. This included learning which palette components could be useful for our scenarios, how to use them, and how to use other built-in functionality such as built-in expressions. There were technical issues with setting up and connecting to the database, as well as with learning what the various error messages meant and how to fix them. Loading the fact tables was a hurdle because we had to make sure we had all the necessary joins between the dimension tables so that rows were pulled correctly. We spent a great deal of time trying to reconcile our fact tables, thinking that the major problem was how we had joined the tables together. After further analysis of the rows that were not being pulled in, we came to realize that some of the Sales Price values were not correct and therefore were not being calculated correctly. We then had to cross-check with QuickBooks to get the right values, change them in the source data, make note of them, and finally reload the tables.
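The fact-load problem described above (rows silently missing because a lookup against a dimension fails) can be sketched in a few lines of Python. This is an illustration of the general technique, not the Talend job itself: the function name, the column names, and the sample lookups are all hypothetical. The point of the sketch is that collecting rejected rows together with the lookups that failed makes it much quicker to see whether the joins or the source values are at fault.

```python
def load_fact(source_rows, dimensions):
    """Resolve each source row's natural keys to surrogate keys.

    dimensions maps a source column name to a {natural_key: surrogate_key}
    lookup. Rows with any unmatched key are returned as rejects together
    with the columns that failed, instead of being silently dropped.
    """
    loaded, rejects = [], []
    for row in source_rows:
        keys, missing = {}, []
        for column, lookup in dimensions.items():
            value = row.get(column)
            if value in lookup:
                keys[column + "_key"] = lookup[value]
            else:
                missing.append(column)
        if missing:
            rejects.append((row, missing))
        else:
            loaded.append({**keys, "quantity": row["quantity"]})
    return loaded, rejects
```

For example, with `dimensions = {"account": {"Lab A": 1}, "part": {"P-1": 10}}`, a source row whose `part` value is misspelled ends up in `rejects` with `["part"]` as the failing lookup, so the count of loaded rows plus rejected rows always equals the source row count.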
  
	
  
Visualizing the Solutions to the Business Questions

Again, we first had to learn how to use Qlikview and what was possible with the tool. Once we did that, we had to plan carefully how best to serve the needs of our client with the data visualization. There were some technical errors with connecting to the database and with not being able to pull all of the data into the tool. While using the tool, we identified more areas where the data needed double-checking (invalid entries, anomalies, etc.) and worked little by little toward a useful and powerful tool that allows Protochips to view their data like they never have before. The driving force was always to respond to the questions that Protochips wanted answered and, by doing this, we were able to stay focused on what was important and not get caught up in things that were not.
  
	
  
Dimensional modelling

Dimensional modeling was a new concept to our team, so we had to spend a bit of time on it. One of the main challenges we faced was actually coming up with our dimensional model, because we had not been exposed to the concept until this class. Understanding the difference between it and a relational model was crucial. Knowing which attributes to include in the fact tables versus the dimensions, and why, was also challenging. We also had to decide whether some attributes were needed in the model or should be pushed to the BI tool; the aggregations we performed on the data are one example. One other issue we faced was working out how to deal with intermediate tables and how to connect them to the fact and dimension tables to pull the data, given that they are not present in the model itself.
	
  
	
  
Project logistics

In general, this project was an incredible experience for us, but it was also a big commitment. All of us had other classes to worry about, one of our teammates has a job, and two of us have young children at home. We all had to be extremely flexible in order to find meeting times that worked, and balancing the workload was a struggle at times. Not only that, but we had spent so much time and energy on completing this project that it was hard to find intrinsic motivation by the end of the quarter. We have all been able to take away invaluable experiences and skills from this project, but we would be remiss if we did not mention the dedication required to complete it as an obstacle.
	
  
Appendix

Interview with Angela and David

Business Requirements:
• Any regulatory or compliance considerations?
• How will users be accessing the BI tool?
• Who are the intended users?
• Is there anyone else we should interview?
• What are the key deliverables required?
  o Dashboard?
  o Charts/Graphs?
  o Reports?
• What is the problem?
• What is the expected solution to the problem with this BI tool/data?
• What is the priority of requirements?
  o List of NEEDS
  o List of WANTS

Data Requirements:
• Any other sources of data needed? (Do we have access to them?)
• Are there any modifications done to the data from the source to staging?
• Who is touching the data? (Entering, changing, deleting, processing, etc.)
• What timeframe are we looking at? How often (what level of granularity is needed)?
• Do you have any reporting needs in mind that you would like us to solve? E.g., if revenue data is available, then maybe what is the revenue earned per customer.
• Could we get some of your guidance in building the BI solution?
• Can any data be left off?

Technical Requirements:
• Is there any reason we need to provide role-based access in the BI tool?
• Are there any BI tools currently in place? Any data warehouses, marts, etc.?
• Are there any technical specifications with regard to hardware/software vendors?
• Do we need to provide web services or a cloud-based environment?
	
  
Salesforce Data:
• Describe the terms used in the columns: Stage, Sales Price, and List Price...
• Are all dollar amounts converted to USD? At what point?
• What do you mean by fraction numbers in quantity ordered?
• What is the difference between product date and close date?
• What do zero values in the sales price mean (if it is won, shouldn’t it have a value)? Or do these numbers refer to something else?
• What are the number values in the “Opportunity Name” column? How are they assigned?
• Do you have a data dictionary? (for all Excel sheets)

Parts Assembly Data (Is this QuickBooks?):
• Why do some rows have no QTY?
• Why do some rows have quantity but no sales price and amounts?
• Are prices for the same product varying based on customer? Quantity? etc.?
• Can you explain the formula in the Amount column?
• Does “Name” map to “Account” in the Salesforce data?

Production Data:
• What is the best worksheet to use?
• How does this data tie to the other spreadsheets?

Yields Data:
• What is the best worksheet to use?
• How does this data tie to the other spreadsheets?
• How is the Yield % computed - what data feeds into these values?
	
  
Proposal Approved by Protochips

Protochips C-flat Analysis

The purpose of this project is to learn more about the trends in both the manufacturing and sales of the C-flat product. We aim to answer the following questions:

• Yield:
  o What is the trend in yields for each part number?
  o Is there any trend in the production by time of the year?

• Sales:
  o What trends are there in the sales of parts?
  o Is there a seasonality to the sales?
  o Is there a trend in sales by region of the world?

• Customer:
  o Who are the top customers?
  o Does an individual customer tend to buy at a certain time or at a certain time interval?
  o What does an individual customer tend to buy?

We will use the Salesforce, QuickBooks, and Access data to answer these questions for the time period of 10/1/12 through 9/30/15 with the following conditions:

• We will leave out data involving sending samples to customers.
• We will reference the Account Name (as it appears in Salesforce) but provide the option to drill down to view Opportunity name (as it appears in Salesforce).
• We will use the Float Date to reference the yield data.
• We will use the Created Date in Salesforce as the reference date for sales data.
• We will only include accounts marked as Closed Won in Salesforce.
• We will show the quantity of products sold as listed by grid (as in QuickBooks).
• We will lump any part with extra wording (such as 2/1-2C-G200F2) into “custom”.
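Several of the conditions above amount to filter and normalization rules on the sales records. The sketch below shows one way they could be written down in Python. It is an assumption-laden illustration rather than the logic used in the Talend jobs: the dictionary keys `stage`, `created_date`, and `is_sample` are hypothetical field names, and `standard_parts` is a hypothetical set of recognized part numbers standing in for whatever rule the team used to detect "extra wording".

```python
from datetime import date

# Reporting period stated in the proposal: 10/1/12 through 9/30/15.
PERIOD_START = date(2012, 10, 1)
PERIOD_END = date(2015, 9, 30)


def in_scope(opportunity: dict) -> bool:
    """Keep only Closed Won, non-sample opportunities created in the period."""
    return (
        opportunity["stage"] == "Closed Won"
        and not opportunity.get("is_sample", False)  # hypothetical flag
        and PERIOD_START <= opportunity["created_date"] <= PERIOD_END
    )


def normalize_part(part: str, standard_parts: set) -> str:
    """Map any part number outside the recognized list to 'Custom'."""
    return part if part in standard_parts else "Custom"
```

Writing the conditions as small predicates like these keeps each proposal bullet traceable to one line of logic, which helps when the client later asks why a given record was left out.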
  
	
   	
  
	
  
Wireframes

Storyboards

1. Yield Data

2. Sales Data

3. Customer Data
	
  
Data Dictionaries

Source
• PARTS_LIST
• YIELD_DATA
• SALESFORCE
• QUICKBOOKS

Target
ACCOUNT_DIM
OPPORTUNITY_DIM
SALES_TIME_DIM
YIELD_TIME_DIM
PARTS_DIM
YIELD_FACT
SALES_FACT
	
  
Reconciliation Document

1. YIELD_FACT
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Rows | 9508 | SELECT COUNT(*) FROM [Yield Results]; | 9508 | SELECT COUNT(*) FROM YIELD_FACT
Sum of Qty (Source) vs Sum of Quantity_Produced (Target) | 226938 | SELECT SUM([Yield Results].[Qty]) FROM [Yield Results]; | 226938 | SELECT SUM(Quantity_Produced) FROM YIELD_FACT
Sum of Out_Of (Source) vs Total_Quantity (Target) | 535656 | SELECT SUM([Yield Results].[Out of]) FROM [Yield Results]; | 535656 | SELECT SUM(Total_Quantity) FROM YIELD_FACT
	
  
2. YIELD_TIME_DIM
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Rows | 1400 | COUNT(A2:A1401) | 1400 | SELECT COUNT(*) FROM YIELD_TIME_DIM
Max Date | 10/31/15 | MAX(A2:A1401) | 31/10/2015 | SELECT MAX(Yield_Date) FROM YIELD_TIME_DIM
Min Date | 01/01/12 | MIN(A2:A1401) | 01/10/2012 | SELECT MIN(Yield_Date) FROM YIELD_TIME_DIM
  
	
  
3. SALES_TIME_DIM
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Rows | 1126 | COUNT(A2:A1127) | 1126 | SELECT COUNT(*) FROM SALES_TIME_DIM
Max Date | 10/31/15 | MAX(A2:A1127) | 31/10/2015 | SELECT MAX(Sales_Date) FROM SALES_TIME_DIM
Min Date | 10/01/12 | MIN(A2:A1127) | 01/10/2012 | SELECT MIN(Sales_Date) FROM SALES_TIME_DIM
  
	
  
	
  
	
  
	
  
	
  
	
  
4. ACCOUNT_DIM
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Unique Rows (Distinct) | 89 | Remove Duplicates on column Account_Name | 89 | SELECT COUNT(*) FROM ACCOUNT_DIM
Total Number of Unique Rows with Region Americas | 58 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"Americas") | 58 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Americas'
Total Number of Unique Rows with Region Asia | 11 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"Asia") | 11 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Asia'
Total Number of Unique Rows with Region EMEA | 20 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"EMEA") | 20 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'EMEA'
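The source-side check in the ACCOUNT_DIM table (remove duplicates on the account name, then count per region) can also be expressed in a few lines of Python. This is a hedged sketch of an equivalent calculation, not something the team ran; the function name and the shape of the input (pairs of account name and region) are assumptions made for illustration.

```python
from collections import Counter


def unique_accounts_by_region(rows):
    """Count distinct account names per region.

    rows is an iterable of (account_name, region) pairs; only the first
    occurrence of each account name is kept, as with a remove-duplicates
    step, before the per-region counts are taken.
    """
    first_region = {}
    for account_name, region in rows:
        first_region.setdefault(account_name, region)
    return Counter(first_region.values())
```

The per-region counts should add up to the distinct total, which matches the figures in the table: 58 + 11 + 20 = 89.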
  
5. OPPORTUNITY_DIM
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Unique Rows | 717 | Remove Duplicates on column Opportunity_Name | 717 | SELECT COUNT(*) FROM OPPORTUNITY_DIM
  
6. PARTS_DIM
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Rows | 257 | SELECT COUNT(*) FROM Parts_output | 257 | SELECT COUNT(*) FROM PARTS_DIM
Total Number of Rows with Material Au | 94 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Material)="Au")); | 94 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'Au'
Total Number of Rows with Material C | 84 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Material)="C")); | 84 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'C'
Total Number of Rows with Material Ni | 78 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Material)="Ni")); | 78 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'Ni'
Total Number of Rows Custom – Material null | 1 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Part)="Custom")); | 1 | SELECT COUNT(*) FROM PARTS_DIM WHERE Part_Num = 'Custom'
  
7. SALES_CONVERSION
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_CONVERSION
  
	
  
8. SALES_FACT
Reconciliation Criteria | Source | Function | Target | Function
Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_FACT
Sum of Total_Price (Original Source) vs Sum of Total_Sales_Dollar_Amount (Target) | 1229631 | SUM(J2:J1022) | 1229631 | SELECT ROUND(SUM(Total_Sales_Dollar_Amount), 0) FROM SALES_FACT
Sum of Quantity_Pack (Intermediate) vs Sum of Quantity_Pack (Target) | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_CONVERSION | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_FACT
Sum of Pieces_Pack (Intermediate) vs Sum of Pieces_Pack (Target) | 249854 | SELECT SUM(Pieces_Pack) FROM SALES_CONVERSION | 249854 | SELECT SUM(Pieces_Pack) FROM SALES_FACT
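The reconciliation tables above all follow the same pattern: run one function against the source, one against the target, and compare the two values. A minimal Python sketch of that pattern, using the built-in `sqlite3` module, is shown below. It assumes, purely for illustration, that source and target tables sit in the same SQLite database; in the project the sources were Access and Excel, and the demo rows here are made up.

```python
import sqlite3


def reconcile(conn, checks):
    """Run paired source/target queries and report whether results match.

    checks is a list of (criterion, source_sql, target_sql) tuples; each
    query must return a single value. Returns a list of
    (criterion, source_value, target_value, matches) tuples.
    """
    report = []
    for criterion, source_sql, target_sql in checks:
        source_value = conn.execute(source_sql).fetchone()[0]
        target_value = conn.execute(target_sql).fetchone()[0]
        report.append(
            (criterion, source_value, target_value, source_value == target_value)
        )
    return report


def build_reconciliation_demo():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SALES_CONVERSION (Quantity_Pack INTEGER, Pieces_Pack INTEGER);
        CREATE TABLE SALES_FACT (Quantity_Pack INTEGER, Pieces_Pack INTEGER);
        INSERT INTO SALES_CONVERSION VALUES (2, 50), (3, 100);
        INSERT INTO SALES_FACT VALUES (2, 50), (3, 100);
        """
    )
    return conn
```

A check list might pair `SELECT COUNT(*) FROM SALES_CONVERSION` with `SELECT COUNT(*) FROM SALES_FACT`, and the two `SUM(Quantity_Pack)` queries with each other. Any row of the report whose last element is `False` points to a load that needs investigating.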
  
	
  
	
  
	
  
	
  
	
  
Validation Rules
• QUICKBOOKS
• SALESFORCE
• YIELD_DATA
• PARTS_LIST

Letter from David
	
  
	
  
Suggestions for Protochips
• Create a naming convention for Opportunity Name (in Salesforce)
• Address missing data in Salesforce and QuickBooks
• Standardize the naming of parts across all sources, such as Au vs AU, Cu for ‘Copper’ rather than C, etc.
• Standardize the units of measure across sources
• Unify the sales price across Salesforce and QuickBooks

Number of Hours the Team Worked on the Project
• Stefanie: 80 hours
• Sarah: 140 hours
• Shradha: 100 hours
• Saniya: 80 hours
• Team Total: 400 hours
  
	
  

46   Source  ....................................................................................................................................................  46   Target  .....................................................................................................................................................  48   Reconciliation  Document  ..........................................................................................................................  52   Validation  Rules  .........................................................................................................................................  55   Suggestions  for  Protochips  ........................................................................................................................  58          
Background

Protochips is a small company based in Morrisville, NC that develops analytical tools for scanning and transmission electron microscopes. Protochips products are used by university, government and industry researchers to understand how nanoscale materials react under various stimuli, such as heat and electrical bias, and in liquid and gas environments. The company was founded in 2002 and has since developed many innovative products that are revolutionizing this market space. They work with clients all over the world and have products in over 25 countries.

A product that has taken second place to their main, durable systems is the consumable known as C-flat Holey Carbon Grids. This product was first sold in 2012 and its sales have soared unexpectedly since then. With $250k in sales in the first year and now over $600k, with the expectation that they will hit over $1M soon, it is clear that this product is demanding attention! David, the CEO, decided that it was time to start figuring out what trends exist in the production and sales of this product in order to become more proactive with inventory and marketing, instead of just reacting to the orders coming in.

At this time, Protochips does not use any Business Intelligence (BI) tools. They have been in the growth stage and are now stabilizing, so they want to make more strategic, long-term decisions. They are not able to see trends easily with their current setup of Excel spreadsheets and Salesforce reports. They are looking for a BI solution that will allow them to view the data easily and from different angles so that they can learn buying patterns and production trends. Angela, the director of operations at Protochips, was hired recently to help with streamlining product manufacturing through data analytics, but she has focused on the other high-value product lines, leaving C-flat as a low priority.
Our Approach

Our team pinned down some key questions that Protochips is looking to have answered by our solution, in three areas: Production, Sales, and Customer. For Production, the company needs to dive in to see what patterns are taking place in the manufacturing of C-flat. Is there a trend in the yields for a specific part? Are the yields affected by what time of year it is? With Sales, it is important for Protochips to understand trends in which C-flat part numbers are being sold so that they can have a better handle on keeping inventory at appropriate levels. They also want to see if certain parts are sold at a certain time of year or if there are patterns to sales by region of the world. When they understand how sales are occurring, they can plan their marketing efforts more strategically. Finally, the Customer element focuses on specific clients and their buying habits. Who are the top customers? What does a particular customer tend to buy, and at what time intervals? This helps Protochips learn what their clients' needs are and predict what and when they might buy, so that they do not have to wait for the customers to approach them.

Decisions We Made and Why

To begin, we decided to focus solely on C-flat and not include the other products, both for the sake of simplicity and because of the time constraint of the project due date. We also had to make some strategic decisions about what data we wanted to use. This was definitely a painstaking process because we were still learning about the product (which was fairly technical) but we needed to keep a steady pace. In the end, we focused on the data we did because we felt that it would allow us to answer the business questions in the most straightforward way. Some decisions we made with Protochips' validation:

• Dropped QuickBooks: Initially we were using this data to cross-check the values from Salesforce and make sure they were accurate, given that QuickBooks represented the company's financials.
We decided to drop it because some major data was missing and the level of granularity did not match that of the Salesforce source (for example, multiple orders/rows in QuickBooks are represented as one row/order in Salesforce). There were also some variations in how the data was being entered, and we felt that reconciling to that extent was beyond the scope of our project.

• Missing product list prices assumed a $1 value: In order to avoid null and empty values (and since the information was not available), some list prices for parts were assumed to be $1. We did not use 0, to avoid computational problems (such as infinity values when dividing by zero). By making this change, we were able to keep the sales data; the only cost is that a discount cannot be determined for those parts (which was not part of the main business requirements from Protochips anyway).

• Auto-generated Part Name in the Parts table and added a new column 'Category': In order to avoid inconsistent names, we decided to auto-generate the part names in a standard format, built from a combination of the different attributes present in the table. We felt that this would ensure that every part number conformed to the part name structure and would therefore not be exposed to entries that were typed incorrectly. We also added a new column, Category, with two values: General and Custom. A few parts were customized to a customer's needs, which is what prompted the Category column.

• Created a new database in Access: Since the database we were provided with was not editable, we created a new database in Access to replicate it. Apart from the tables that were already present, we created a new table for the Grid prices. Since some sources of data defined the prices in terms of packs and some in terms of grids, we decided to create a table consisting of the different parts and their associated grid prices (25, 50 and 100 count packs, respectively). This in turn helped us generate an intermediate table containing the prices per grid as well as per pack for every part, so that we could present the data in any way the client wanted.

• Decided not to group by Opportunity Name: The client does not have any standard way of defining the Opportunity Names, and therefore no specific transformation rules could be applied. We allowed this level of granularity to be reached while drilling down in the BI tool, but we were not able to allow for sorting and presenting data solely by the Opportunity Name.

List of Team Members and Responsibilities/Activities

Stefanie Boros

The main role for Stefanie was as the Project Manager, but she was also the connection between the team and Protochips. She worked on collecting the data and contacted David and Angela several times throughout the process to understand the data and the process, and to verify that we were on the right track every step of the way. She collaborated with every member of the team to make sure that the direction of the various elements of the project was in line with what Protochips was looking for. She assisted in designing the data models, reconciliation, cleansing rules, and other activities. She worked with Saniya on Qlikview to create the BI tool that would be utilized by the organization. She was also in charge of the project deliverables throughout the quarter.
Shradha Salian

The main role for Shradha was that of a Data Integration Specialist and a Data Analyst. She worked on creating a database for the source tables and on learning Talend (along with Sarah) to understand its different functionalities and components, the data sources, the relationships among them and how different operations could be carried out on them. She was also involved in writing some Java code in Talend. She worked on creating the data dictionary for the 'Parts' data source. She contributed to cleansing the QuickBooks data source as well. She worked on creating the validation rules for all of the data sources. She reviewed the different documents and made changes to them wherever required.

Saniya Shukla

Saniya was mainly involved in the BI side of the project, translating the data sets into visualizations of charts, graphs and tables. This involved addressing the requirement questions from the client and delivering the charts in the most understandable and user-friendly way. She was also involved in connecting the database to Qlikview to pull in the data and use it for the dashboards. She mainly worked on formulating the expressions and formulae used to derive charts, graphs and other objects for the dashboards according to the requirements of Protochips. Before implementing the data in the BI tool, she also created the initial and final wireframes and storyboards, with Stefanie creating the sketches for the initial ones. She also worked on creating the data dictionary for the Salesforce data.

Sarah Yousef

The main roles for Sarah were as a Technical Architect and a Data Integration Specialist. She worked on developing the data models for the target tables, creating the corresponding tables in MS SQL Server and creating their data dictionaries. In order to do that, a thorough understanding of the data, its state and how its parts connect to one another was important. She was also responsible for the source to target mappings, learning Talend (along with Shradha) and loading the data to the target tables. Given that, she also worked on the reconciliation document.
She contributed to cleansing the source files by cleansing the Salesforce data and the data definitions for the yield results source table. She gave feedback and input on the BI dashboard and the different project deliverables. She also acted as Project Manager (secondary role) by preparing meeting agendas and action items, and by following up with everyone to make sure the project was on the right track.

Changes from Original Proposal
Our project ended up following the path that we set up in our original proposal pretty closely, with just a few changes. We felt that, given the time constraints of this project, it made the most sense to stick with only looking at C-flat, as we had originally discussed with David, instead of trying to analyze all product lines. We also came to discover that QuickBooks was not really providing us with anything that we could not get from Salesforce or easily reconcile. We were hoping to use QuickBooks as a way to cross-check sales, but Protochips had inconsistent methods of entering sales into each database, so we decided to simplify by removing QuickBooks, which did not require sacrificing any vital data. We did have to get additional data from Protochips to provide us with the List Prices of the parts, so that was another data source that was not stated in the proposal. Finally, we switched to Microsoft SQL Server instead of MySQL, as we had originally planned, because we were more familiar with the tool.

Technical Architecture Diagram
Samples of Each Data Set

Original Salesforce Sample Data

• Manual cleansing included removing columns (see above) and rows (see below).

| Original Source Number of Rows | Cleansed Source Number of Rows | Difference |
|---|---|---|
| 1175 | 1021 | 154 |

| Rows Removed (by row #) | Row Count | Reason |
|---|---|---|
| 6, 103, 191, 215-217, 219, 226-232, 312, 397-398, 407-408, 447, 464-475, 493, 511-513, 518, 532, 686, 703-704, 714-715, 750, 773, 918, 922, 1056, 1068, 1074-1075, 1129, 1175 | 43 | Client request: removed Stage Closed Lost, Postponed, Target, Closed/Dead End, Imminent |
| 5, 7-16, 18-25, 40, 46-47, 101-102, 236, 445, 463, 482-483, 498, 524-525, 581, 627, 669, 682, 720, 727, 738, 892 | 40 | Client request: remove Order Amount = 0 |
| 26-29, 48-69, 212, 224, 245-247, 270-282, 470, 537-538, 585-590, 725, 728, 747-748, 763-768, 840, 906-907, 939, 1031, 1064, 1100 | 70 | Client request: remove Created dates before 10/01/2012 |
| 17 | 1 | Error values |
| Total | 154 | --- |

| Rows Amended (by row #) | Row Count | Reason |
|---|---|---|
| 13 | 1 | Error in Sales_Price: changed to 5.706, cross-referenced with QuickBooks |
| 14 | 1 | Error in Sales_Price: changed to 5.391, cross-referenced with QuickBooks |
| 194, 196 | 2 | Error in Sales_Price: changed to 6.462, cross-referenced with QuickBooks |
| 195 | 1 | Error in Sales_Price: changed to 7.002, cross-referenced with QuickBooks |
| 666, 667, 758, 887 | 4 | Error in Sales_Price: changed to 728.1, cross-referenced with QuickBooks |
| 668, 756, 757, 888, 908, 916, 919 | 7 | Error in Sales_Price: changed to 584.1, cross-referenced with QuickBooks |
| All | 1021 | Removed the letters CF in all product names: =RIGHT(E2,LEN(E2)-3) |
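The removal and renaming rules above were applied by hand in the spreadsheet. Purely as an illustration, the same rules could be written as a short Python filter. The field names (`stage`, `order_amount`, `created`, `product`) and the sample product name are hypothetical and do not come from the Salesforce export; the slice `[3:]` mirrors the =RIGHT(E2,LEN(E2)-3) formula, which drops the first three characters of the name.

```python
from datetime import date

# Stages the client asked to have removed (from the "Rows Removed" table).
EXCLUDED_STAGES = {"Closed Lost", "Postponed", "Target", "Closed/Dead End", "Imminent"}
CUTOFF = date(2012, 10, 1)  # rows created before this date are removed


def keep_row(row):
    """Return True if a row survives the three client-requested removal rules."""
    if row["stage"] in EXCLUDED_STAGES:
        return False
    if row["order_amount"] == 0:
        return False
    if row["created"] < CUTOFF:
        return False
    return True


def strip_prefix(product_name):
    """Mirror =RIGHT(E2, LEN(E2) - 3): drop the first three characters."""
    return product_name[3:]


def cleanse(rows):
    """Filter the rows, then strip the product-name prefix on the survivors."""
    return [{**r, "product": strip_prefix(r["product"])} for r in rows if keep_row(r)]


if __name__ == "__main__":
    # Hypothetical row, for illustration only.
    sample = [{"stage": "Closed Won", "order_amount": 250.0,
               "created": date(2013, 3, 1), "product": "CF-2/1-2C"}]
    print(cleanse(sample))
```

Keeping each rule as its own condition makes it easy to count how many rows each rule removes, which is the kind of breakdown the table above reports.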
Access - Yield Results Sample Data

• Manual cleansing included removing columns (see above) and rows (see below).

| Original Source Number of Rows | Cleansed Source Number of Rows | Difference |
|---|---|---|
| 9533 | 9508 | 25 |

| Rows Removed (by Wafer_ID) | Row Count | Reason |
|---|---|---|
| 16488, 19340, 18398, 20215, 20838 | 5 | 'Qty' value greater than 'Out of' value |
| 16085, 16882, 18676, 19150 | 4 | Empty 'Out of' values |
| 22808, 22809, 22125, 22363, 22570, 22772, 22822, 25282 | 8 | Other empty values |
| 17329, 21063, 21092, 17994, 18228, 18709, 19665, 20484 | 8 | Extremely high 'Out of' value (anomalies) |
| Total | 25 | --- |

Access - Parts List Sample Data

• Manual Cleansing included

| Original Source Number of Rows | Cleansed Source Number of Rows | Difference |
|---|---|---|
| 74 | 79 | 5 |

PDF - List Price Sample Data

• Manual Cleansing included

| Original Source Number of Rows | Cleansed Source Number of Rows | Difference |
|---|---|---|
| 81 | 219 | 138 |
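The four reasons given earlier in this section for removing yield rows (a 'Qty' greater than 'Out of', an empty 'Out of', other empty values, and an anomalously high 'Out of') can be read as a small set of validation checks. The sketch below is one hypothetical way to express them in Python; the field names and the `max_out_of` threshold are assumptions, since the report does not state what counted as "extremely high".

```python
def yield_row_problem(row, max_out_of=1000):
    """Return the reason a yield row should be removed, or None to keep it.

    max_out_of is a hypothetical anomaly threshold, not a value from the report.
    """
    if row.get("out_of") is None:
        return "empty 'Out of' value"
    if any(value is None for value in row.values()):
        return "other empty value"
    if row["out_of"] > max_out_of:
        return "extremely high 'Out of' value"
    if row["qty"] > row["out_of"]:
        return "'Qty' greater than 'Out of'"
    return None


def split_rows(rows, max_out_of=1000):
    """Separate rows into kept rows and (row, reason) pairs for the removal log."""
    kept, removed = [], []
    for row in rows:
        reason = yield_row_problem(row, max_out_of)
        if reason is None:
            kept.append(row)
        else:
            removed.append((row, reason))
    return kept, removed
```

Returning the reason alongside each removed row gives the material for a removal log like the table above, grouped by Wafer_ID and reason.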
Dimensional Models

Conceptual Model

Logical Model

Physical Model
Sample Data from Dimensional and Fact Tables

ACCOUNT_DIM sample data

OPPORTUNITY_DIM sample data

PARTS_DIM data sample

YIELD_TIME_DIM data sample

SALES_TIME_DIM data sample

YIELD_FACT data sample

SALES_FACT sample data

SALES_CONVERSION sample data (an intermediary table NOT included in modeling)

GRIDS_PRICE sample data (a table in the Access database NOT included in modeling)
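The GRIDS_PRICE table holds prices for 25-, 50- and 100-count packs, and the intermediate table derives a price per grid from them. A minimal sketch of that conversion, together with the $1 placeholder rule for missing list prices, might look like the following. All names and dollar figures are hypothetical, and it is an assumption that the placeholder is applied at the pack level; the project itself did this work in Access and Talend, not in Python.

```python
PACK_SIZES = (25, 50, 100)   # grids per pack, as listed in the report
PLACEHOLDER_PRICE = 1.0      # stand-in for a missing list price (never 0)


def price_per_grid(pack_price, pack_size):
    """Convert a pack price into a price per single grid."""
    return pack_price / pack_size


def conversion_rows(part_number, pack_prices):
    """Build one row per pack size with both the pack and the per-grid price.

    pack_prices maps pack size -> list price, with None where it is unknown.
    """
    rows = []
    for size in PACK_SIZES:
        pack_price = pack_prices.get(size)
        if pack_price is None:
            pack_price = PLACEHOLDER_PRICE  # assumption: placeholder applies per pack
        rows.append({
            "part_number": part_number,
            "pack_size": size,
            "pack_price": pack_price,
            "grid_price": price_per_grid(pack_price, size),
        })
    return rows


def discount(sales_price, list_price):
    """Fractional discount on pack-level prices, or None for a placeholder list price."""
    if list_price == PLACEHOLDER_PRICE:
        return None  # no real list price, so no discount can be determined
    return 1 - sales_price / list_price
```

Using 1 rather than 0 as the placeholder keeps the division in `discount` defined, which is the computational problem the team wanted to avoid; returning None for placeholder prices reflects that no meaningful discount exists for those parts.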
Data Integration Mappings

ACCOUNT_DIM
• Load Job
• tMap
• Source to Target Mapping

OPPORTUNITY_DIM
• Load Job
• tMap
• Source to Target Mapping

SALES_TIME_DIM
• Load Job
• tMap
• Source to Target Mapping

SALES_CONVERSION (Intermediate Table)
• Load Job
• tJavaRow
• tMap
• Source to Target Mapping

SALES_FACT
• Load Job
• tJavaRow
• tMap
• Source to Target Mapping

PARTS_DIM
• Load Job
• tMap
• Source to Target Mapping

YIELD_TIME_DIM
• Load Job
• tMap
• Source to Target Mapping

YIELD_FACT
• Load Job
• tJavaRow
• tMap
• Source to Target Mapping

PARTS INTERMEDIATE TABLE:
• Load Job
• tMap
• tJavaRow
Business Questions

Production
• What is the trend in yields for each part number?
• Is there any trend in the production by time of the year?

Sales
• What trends are there in the sales of parts?
• Is there a seasonality to the sales?
• Is there a trend in sales by region of the world?

Customer
• Who are the top customers?
• Does an individual customer tend to buy at a certain time or at a certain time interval?
• What does an individual customer tend to buy?
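In the project these questions were answered with Qlikview dashboards. To make two of them concrete ("Who are the top customers?" and "Is there a seasonality to the sales?"), here is a rough Python sketch of the underlying aggregations. It is not the team's implementation, and the field names (`account`, `amount`, `created`) and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def top_customers(sales, n=3):
    """Total sales per account, largest first, limited to the top n accounts."""
    totals = defaultdict(float)
    for sale in sales:
        totals[sale["account"]] += sale["amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def sales_by_month(sales):
    """Total sales per calendar month (1-12), pooled across years."""
    totals = defaultdict(float)
    for sale in sales:
        totals[sale["created"].month] += sale["amount"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    sample = [
        {"account": "Lab A", "amount": 100.0, "created": date(2014, 1, 10)},
        {"account": "Lab B", "amount": 300.0, "created": date(2014, 1, 20)},
        {"account": "Lab A", "amount": 50.0, "created": date(2014, 6, 5)},
    ]
    print(top_customers(sample))
    print(sales_by_month(sample))
```

Pooling months across years is one simple way to look for seasonality; grouping by year and month instead would show the trend over time.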
Challenges

Data

As expected, the data proved to be quite the time-consuming element. For starters, getting the data itself took a lot longer than we anticipated. Once we finally did receive it all, we had to understand what we were dealing with. We spent hours poring over each attribute, trying to see how everything tied together. We also had conversations with David, Angela, and Nicole (another Protochips employee who deals with the data). Once we finally felt comfortable with the data, we had to decide what was and was not appropriate to use in order to answer our questions. This was an element that we did not even confirm until toward the end of the project!

Beyond just getting and understanding the data, we had to work on cleansing and integrating it. There were rows that were missing values, rows that had inconsistent values, etc. Step by step, we had to make decisions (and get confirmation from Protochips) about how to handle each and every quirk in the data. For integration, we had to try to see the big picture and keep our minds on the overall goal of answering the business questions in order to choose the right path. Ultimately, we kept coming back to these business questions, and that really helped to guide us along and make successful choices.

Talend

As expected with any new tool, there was a learning curve to understanding and utilizing our integration tool, Talend. Initially, it was just about understanding what capabilities the tool had and how to get the results we wanted.
This meant learning which palette components could be useful for our scenarios, how to use them, and how to use other built-in functionality such as built-in expressions. There were technical issues with setting up and connecting to the database, as well as learning what the various error messages meant and how to fix them. Loading the fact tables was a hurdle because we had to make sure we had all the necessary joins to the dimension tables so that rows were pulled correctly. We spent a great deal of time trying to reconcile our fact tables, thinking that the major problem was how we joined the tables together. After further analysis of the rows that were not being pulled in, we came to realize that some of the Sales Price values were not correct and therefore were not being calculated correctly. We then had to cross-check with QuickBooks to get those values, change them in the source data, make note of them and finally reload the tables.

Visualizing the Solutions to the Business Questions

Again, we first had to learn how to use Qlikview and what was possible with the tool, but once we did that, we had to plan quite a bit on how to better serve the needs of our client with the data visualization. There were some technical errors with trying to connect to the database and not being able to pull all of the data into the tool. While using the tool, we were able to identify some more areas where the data needed double-checking (invalid entries, anomalies, etc.) and worked little by little toward a useful and powerful tool that allows Protochips to view their data like they never have before. The driving force was always to respond to the questions that Protochips wanted answered and, by doing this, we were able to stay focused on what was important and not get caught up in things that were not.

Dimensional modelling

Dimensional modeling was a new concept to our team. One of the main challenges we faced was coming up with our dimensional model, because we had not been exposed to the technique until this class.
Understanding the difference between it and a relational model was crucial. Knowing which attributes to include in the fact tables versus the dimensions, and why, was also challenging. We also had to decide whether we needed some attributes in the model or whether we would push them to the BI tool. An example of that was any kind of aggregation that we were performing on the data. One other issue we faced was visualizing how to deal with intermediate tables and how to connect them to the fact and dimension tables to pull the data, given that they are not present in the model itself.
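The fact-table loading problem described in the Talend section (rows dropping out because a join to a dimension did not match) can be illustrated outside of Talend. The sketch below is plain Python, not Talend code, and every name in it is hypothetical: each source row is looked up in two dimension key maps, rows with a missing match are set aside instead of silently disappearing, and a simple count check supports reconciliation.

```python
def load_fact(source_rows, account_keys, part_keys):
    """Resolve dimension keys for each source row.

    account_keys maps account name -> surrogate key; part_keys maps part
    number -> surrogate key. Rows with no match in either map are returned
    separately so they can be investigated.
    """
    loaded, rejected = [], []
    for row in source_rows:
        account_key = account_keys.get(row["account"])
        part_key = part_keys.get(row["part_number"])
        if account_key is None or part_key is None:
            rejected.append(row)
            continue
        loaded.append({
            "account_key": account_key,
            "part_key": part_key,
            "quantity": row["quantity"],
            "sales_price": row["sales_price"],
        })
    return loaded, rejected


def counts_reconcile(source_rows, loaded, rejected):
    """True when every source row is accounted for as either loaded or rejected."""
    return len(source_rows) == len(loaded) + len(rejected)
```

Keeping the rejected rows visible is the design point: comparing the source row count with the loaded and rejected counts shows right away whether rows were lost in a join or were loaded with bad values, which are the two possibilities the team had to tell apart.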
Project logistics

In general, this project was an incredible experience for us, but it was also a big commitment. All of us had other classes to worry about, one of our teammates has a job, and two of us have young children at home. We all had to be extremely flexible in order to find meeting times that worked, and balancing the workload was a struggle at times. Not only that, but we spent so much time and energy on completing this project that it was hard to find intrinsic motivation by the end of the quarter. We have all been able to take away invaluable experiences and skills from this project, but we would be remiss if we did not mention the dedication required to complete it as an obstacle.
Appendix

Interview with Angela and David

Business Requirements:
• Any regulatory or compliance considerations?
• How will users be accessing the BI tool?
• Who are the intended users?
• Is there anyone else we should interview?
• What are the key deliverables required?
  o Dashboard?
  o Charts/Graphs?
  o Reports?
• What is the problem?
• What is the expected solution to the problem with this BI tool/data?
• What is the priority of requirements?
  o List of NEEDS
  o List of WANTS

Data Requirements:
• Any other sources of data needed? (Do we have access to them?)
• Are there any modifications done to the data from the source to staging?
• Who is touching the data? (Entering, changing, deleting, processing, etc.)
• What timeframe are we looking at? How often (what level of granularity is needed)?
• Do you have any reporting needs in mind that you would like us to solve? E.g., if revenue data is available, then maybe the revenue earned per customer.
• Could we get some of your guidance in building the BI solution?
• Can any data be left off?

Technical Requirements:
• Is there any reason we need to provide role-based access in the BI tool?
• Are there any BI tools currently in place? Any data warehouses, marts, etc.?
• Are there any technical specifications with regard to hardware/software vendors?
• Do we need to provide web services or a cloud-based environment?

Salesforce Data:
• Describe the terms used in the columns: Stage, Sales Price, and List Price...
• Are all dollar amounts converted to USD? At what point?
• What do the fractional numbers in quantity ordered mean?
• What is the difference between product date and close date?
• What do zero values in the sales price mean (if it is won, shouldn't it have a value)? Or do these numbers refer to something else?
• What are the number values in the "Opportunity Name" column? How are they assigned?
• Do you have a data dictionary? (for all Excel sheets)

Parts Assembly Data (Is this QuickBooks?):
• Why do some rows have no QTY?
• Why do some rows have a quantity but no sales price and amounts?
• Do prices for the same product vary based on customer? Quantity? etc.?
• Can you explain the formula in the amount column?
• Does "Name" map to "Account" in the Salesforce data?

Production Data:
• What is the best worksheet to use?
• How does this data tie to the other spreadsheets?

Yields Data:
• What is the best worksheet to use?
• How does this data tie to the other spreadsheets?
• How is the Yield % computed, and what data feeds into these values?
Proposal Approved by Protochips

Protochips C-flat Analysis

The purpose of this project is to learn more about the trends in both the manufacturing and sales of the C-flat product. We aim to answer the following questions:

• Yield:
   o What is the trend in yields for each part number?
   o Is there any trend in the production by time of the year?
• Sales:
   o What trends are there in the sales of parts?
   o Is there a seasonality to the sales?
   o Is there a trend in sales by region of the world?
• Customer:
   o Who are the top customers?
   o Does an individual customer tend to buy at a certain time or at a certain time interval?
   o What does an individual customer tend to buy?

We will use the Salesforce, QuickBooks, and Access data to answer these questions for the time period of 10/1/12 through 9/30/15 with the following conditions:

• We will leave out data involving sending samples to customers.
• We will reference the Account Name (as it appears in Salesforce) but provide the option to drill down to view Opportunity Name (as it appears in Salesforce).
• We will use the Float Date to reference the yield data.
• We will use the Created Date in Salesforce as the reference date for sales data.
• We will only include accounts marked as Closed Won in Salesforce.
• We will show the quantity of products sold as listed by grid (as in QuickBooks).
• We will lump any part with extra wording (such as 2/1-2C-G200F2) into “custom”.
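The sales-side conditions above can be read as a simple row filter. The following is only a minimal sketch under stated assumptions: the dictionary keys (`stage`, `created_date`, `is_sample`, `part`), the function name `apply_proposal_rules`, and the part names `PART-1` and `PART-2` are hypothetical, and treating "extra wording" as "not found in a list of standard part numbers" is our own simplification, not necessarily the rule the team implemented in Talend.

```python
from datetime import date

# Reporting window stated in the proposal: 10/1/12 through 9/30/15.
WINDOW_START = date(2012, 10, 1)
WINDOW_END = date(2015, 9, 30)


def apply_proposal_rules(rows, standard_parts):
    """Keep only rows that satisfy the proposal's sales conditions.

    `rows` is a list of dicts with hypothetical keys "stage",
    "created_date", "is_sample", and "part". Parts that are not in
    `standard_parts` are relabeled "Custom" (an assumed reading of
    "extra wording").
    """
    kept = []
    for row in rows:
        if row["stage"] != "Closed Won":
            continue  # only Closed Won opportunities are included
        if row["is_sample"]:
            continue  # samples sent to customers are left out
        if not (WINDOW_START <= row["created_date"] <= WINDOW_END):
            continue  # Created Date must fall inside the reporting window
        part = row["part"] if row["part"] in standard_parts else "Custom"
        kept.append({**row, "part": part})
    return kept


# Made-up rows for illustration only; none of this is Protochips data.
sample_rows = [
    {"stage": "Closed Won", "created_date": date(2013, 3, 15), "is_sample": False, "part": "PART-1"},
    {"stage": "Closed Won", "created_date": date(2014, 6, 1), "is_sample": False, "part": "2/1-2C-G200F2"},
    {"stage": "Closed Lost", "created_date": date(2014, 6, 2), "is_sample": False, "part": "PART-2"},
    {"stage": "Closed Won", "created_date": date(2015, 10, 15), "is_sample": False, "part": "PART-1"},
    {"stage": "Closed Won", "created_date": date(2013, 1, 10), "is_sample": True, "part": "PART-2"},
]

result = apply_proposal_rules(sample_rows, standard_parts={"PART-1", "PART-2"})
for row in result:
    print(row["created_date"], row["part"])
```

Keeping the rules in one function makes it easy to count how many rows each condition removes, which is useful when the filtered totals later have to be reconciled against the source.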
Wireframes

Storyboards

1. Yield Data
2. Sales Data
3. Customer Data
Data Dictionaries

Source
• PARTS_LIST
• YIELD_DATA
• SALESFORCE
• QUICKBOOKS

Target
• ACCOUNT_DIM
• OPPORTUNITY_DIM
• SALES_TIME_DIM
• YIELD_TIME_DIM
• PARTS_DIM
• YIELD_FACT
• SALES_FACT
Reconciliation Document

1. YIELD_FACT

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 9508 | SELECT COUNT(*) FROM [Yield Results]; | 9508 | SELECT COUNT(*) FROM YIELD_FACT |
| Sum of Qty (Source) vs Sum of Quantity_Produced (Target) | 226938 | SELECT SUM([Yield Results].[Qty]) FROM [Yield Results]; | 226938 | SELECT SUM(Quantity_Produced) FROM YIELD_FACT |
| Sum of Out_Of (Source) vs Total_Quantity (Target) | 535656 | SELECT SUM([Yield Results].[Out of]) FROM [Yield Results]; | 535656 | SELECT SUM(Total_Quantity) FROM YIELD_FACT |

2. YIELD_TIME_DIM

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1400 | COUNT(A2:A1401) | 1400 | SELECT COUNT(*) FROM YIELD_TIME_DIM |
| Max Date | 10/31/15 | MAX(A2:A1401) | 31/10/2015 | SELECT MAX(Yield_Date) FROM YIELD_TIME_DIM |
| Min Date | 01/01/12 | MIN(A2:A1401) | 01/10/2012 | SELECT MIN(Yield_Date) FROM YIELD_TIME_DIM |

3. SALES_TIME_DIM

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1126 | COUNT(A2:A1127) | 1126 | SELECT COUNT(*) FROM SALES_TIME_DIM |
| Max Date | 10/31/15 | MAX(A2:A1127) | 31/10/2015 | SELECT MAX(Sales_Date) FROM SALES_TIME_DIM |
| Min Date | 10/01/12 | MIN(A2:A1127) | 01/10/2012 | SELECT MIN(Sales_Date) FROM SALES_TIME_DIM |

4. ACCOUNT_DIM

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Unique Rows (Distinct) | 89 | Remove Duplicates on column Account_Name | 89 | SELECT COUNT(*) FROM ACCOUNT_DIM |
| Total Number of Unique Rows with Region Americas | 58 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"Americas") | 58 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Americas' |
| Total Number of Unique Rows with Region Asia | 11 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"Asia") | 11 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Asia' |
| Total Number of Unique Rows with Region EMEA | 20 | Remove Duplicates on column Account_Name; COUNTIF(C2:C90,"EMEA") | 20 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'EMEA' |

5. OPPORTUNITY_DIM

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Unique Rows | 717 | Remove Duplicates on column Opportunity_Name | 717 | SELECT COUNT(*) FROM OPPORTUNITY_DIM |

6. PARTS_DIM

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 257 | SELECT COUNT(*) FROM Parts_output | 257 | SELECT COUNT(*) FROM PARTS_DIM |
| Total Number of Rows with Material Au | 94 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Material)="Au")); | 94 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'Au' |
| Total Number of Rows with Material C | 84 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Material)="C")); | 84 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'C' |
| Total Number of Rows with Material Ni | 78 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Material)="Ni")); | 78 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'Ni' |
| Total Number of Rows Custom (Material null) | 1 | SELECT COUNT(*) FROM Parts_output WHERE (((Parts_output.Part)="Custom")); | 1 | SELECT COUNT(*) FROM PARTS_DIM WHERE Part_Num = 'Custom' |

7. SALES_CONVERSION

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_CONVERSION |

8. SALES_FACT

| Reconciliation Criteria | Source Value | Source Function | Target Value | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_FACT |
| Sum of Total_Price (Original Source) vs Sum of Total_Sales_Dollar_Amount (Target) | 1229631 | SUM(J2:J1022) | 1229631 | SELECT ROUND(SUM(Total_Sales_Dollar_Amount), 0) FROM SALES_FACT |
| Sum of Quantity_Pack (Intermediate) vs Sum of Quantity_Pack (Target) | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_CONVERSION | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_FACT |
| Sum of Pieces_Pack (Intermediate) vs Sum of Pieces_Pack (Target) | 249854 | SELECT SUM(Pieces_Pack) FROM SALES_CONVERSION | 249854 | SELECT SUM(Pieces_Pack) FROM SALES_FACT |

Validation Rules
• QUICKBOOKS
• SALESFORCE
• YIELD_DATA
• PARTS_LIST
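The source-versus-target checks in the Reconciliation Document all follow one pattern: run an aggregate on the source, run the matching aggregate on the target, and compare the two numbers. The sketch below illustrates that pattern with Python's built-in `sqlite3` module. It is an assumption-laden illustration, not the team's tooling: in the project the source lived in Access and Excel, whereas here a hypothetical `yield_results` table stands in for [Yield Results], both tables sit in one in-memory database, and the rows are made up.

```python
import sqlite3


def reconcile(conn, checks):
    """Run each (label, source_sql, target_sql) pair and compare the scalar results."""
    report = []
    for label, source_sql, target_sql in checks:
        source_value = conn.execute(source_sql).fetchone()[0]
        target_value = conn.execute(target_sql).fetchone()[0]
        report.append((label, source_value, target_value, source_value == target_value))
    return report


# Toy stand-ins for the source table and the warehouse fact table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yield_results (qty INTEGER, out_of INTEGER);
    CREATE TABLE YIELD_FACT (Quantity_Produced INTEGER, Total_Quantity INTEGER);
    INSERT INTO yield_results VALUES (40, 100), (25, 50);
    INSERT INTO YIELD_FACT VALUES (40, 100), (25, 50);
""")

# Same three criteria as the YIELD_FACT reconciliation table.
checks = [
    ("Total number of rows",
     "SELECT COUNT(*) FROM yield_results",
     "SELECT COUNT(*) FROM YIELD_FACT"),
    ("Sum of Qty vs Quantity_Produced",
     "SELECT SUM(qty) FROM yield_results",
     "SELECT SUM(Quantity_Produced) FROM YIELD_FACT"),
    ("Sum of Out_Of vs Total_Quantity",
     "SELECT SUM(out_of) FROM yield_results",
     "SELECT SUM(Total_Quantity) FROM YIELD_FACT"),
]

report = reconcile(conn, checks)
for label, source_value, target_value, matches in report:
    status = "OK" if matches else "MISMATCH"
    print(f"{label}: source={source_value} target={target_value} {status}")
```

Scripting the checks this way lets the same list be rerun after every load, so a dropped or duplicated row shows up as a mismatch immediately rather than later in a dashboard.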
Letter from David

Suggestions for Protochips
• Create a naming convention for Opportunity Name (in Salesforce)
• Address missing data in Salesforce and QuickBooks
• Standardize the naming of parts across all sources (for example, Au vs. AU, or Cu for ‘Copper’ rather than C)
• Standardize the units of measure across sources
• Unify the sales price across Salesforce and QuickBooks

Number of Hours the Team Worked on the Project
• Stefanie: 80 hours
• Sarah: 140 hours
• Shradha: 100 hours
• Saniya: 80 hours
• Team Total: 400 hours