This document provides an executive summary and details of a business intelligence project completed for Protochips. The team built a dimensional data model and integrated data from multiple sources to answer key business questions about production, sales, and customers. They overcame challenges with the data and Talend tools to deliver a solution that will help Protochips make more strategic decisions.
Executive Summary

Protochips, like many organizations, has a lot of data that they simply do not know how to utilize. Our team was lucky enough to get our hands on all of the data that we needed, and we had enthusiastic sponsorship from the company. There were definitely some challenges along the way, and our team had to maneuver around all kinds of obstacles, but in the end we were able to answer the questions that Protochips posed to us and provide them with a Business Intelligence solution that will supply them with the knowledge they need to make strategic and profitable decisions.

Our team has learned a great deal in the course of this project. We have become familiar with new tools, like Talend and Qlikview. We have confirmed (sometimes the hard way) that all of the steps covered in the lectures and textbooks are essential to our understanding of the business as well as to our overall success with the project. We have also grown as individuals, accomplishing things that we were not sure we could accomplish. We are very proud of our final product and are excited to share the highs (and lows) of the journey that eventually led us to the finish line. We are very appreciative of the opportunity to carry out this relevant and meaningful project.
Contents
Background
Our Approach
    Decisions We Made and Why
List of Team Members and Responsibilities/Activities
    Stefanie Boros
    Shradha Salian
    Saniya Shukla
    Sarah Yousef
Changes from Original Proposal
Technical Architecture Diagram
Samples of Each Data Set
    Original Salesforce Sample Data
    Access - Yield Results Sample Data
    Access - Parts List Sample Data
    PDF - List Price Sample Data
Dimensional Models
    Conceptual Model
    Logical Model
    Physical Model
Sample Data from Dimensional and Fact Tables
Data Integration Mappings
Business Questions
    Production
    Sales
    Customer
Challenges
    Data
    Talend
    Visualizing the Solutions to the Business Questions
    Dimensional modelling
    Project logistics
Appendix
    Interview with Angela and David
        Business Requirements
        Data Requirements
        Technical Requirements
        Salesforce Data
        Parts Assembly Data (Is this QuickBooks?)
        Production Data
        Yields Data
    Proposal Approved by Protochips
    Wireframes
    Storyboards
    Data Dictionaries
        Source
        Target
    Reconciliation Document
    Validation Rules
    Suggestions for Protochips
Background
Protochips is a small company based in Morrisville, NC, that develops analytical tools for scanning and transmission electron microscopes. Protochips products are used by university, government, and industry researchers to understand how nanoscale materials react under various stimuli, such as heat and electrical bias, and in liquid and gas environments. The company was founded in 2002 and has since developed many innovative products that are revolutionizing this market space. They work with clients all over the world and have products in over 25 countries.

A product that has taken second place to their main, durable systems is the consumable known as C-flat Holey Carbon Grids. This product was first sold in 2012, and its sales have soared unexpectedly since then. With $250k in sales in its first year, over $600k now, and the expectation that it will soon exceed $1M, it is clear that this product demands attention!

David, the CEO, decided that it was time to start figuring out what trends exist in the production and sales of this product in order to become more proactive with inventory and marketing, instead of just reacting to incoming orders.

At this time, Protochips does not use any Business Intelligence (BI) tools. They have been in the growth stage and are now stabilizing, so they want to make more strategic, long-term decisions. They cannot easily see trends with their current setup of Excel spreadsheets and Salesforce reports. They are looking for a BI solution that will allow them to view the data easily and from different angles so that they can learn buying patterns and production trends.

Angela, the director of operations at Protochips, was hired recently to help streamline product manufacturing through data analytics, but she has focused on the other high-value product lines, leaving C-flat as a low priority.
Our Approach
Our team pinned down some key questions that Protochips wants our solution to answer, in three areas: Production, Sales, and Customer.

For Production, the company needs to dive in to see what patterns are taking place in the manufacturing of C-flat. Is there a trend in the yields for a specific part? Are the yields affected by the time of year?

With Sales, it’s important for Protochips to understand trends in which C-flat part numbers are being sold so that they can keep inventory at appropriate levels. They also want to see whether certain parts sell at a certain time of year or whether there are patterns to sales by region of the world. When they understand how sales are occurring, they can plan their marketing efforts more strategically.

Finally, the Customer element focuses on specific clients and their buying habits. Who are the top customers? What does a particular customer tend to buy, and at what time intervals do they buy? This helps Protochips learn what their clients need and predict what they might buy and when, so that they no longer have to wait for customers to approach them.
Decisions We Made and Why
To begin, we decided to focus solely on C-flat and not include the other products, for the sake of simplicity and because of the time constraint imposed by the project due date. We also had to make some strategic decisions about which data to use. This was a painstaking process because we were still learning about the product (which is fairly technical), but we needed to keep a steady pace. In the end, we chose the data we did because we felt it would allow us to answer the business questions in the most straightforward way. Some decisions we made with Protochips’ validation:
• Dropped QuickBooks: Initially we were using this data to cross-check the values from Salesforce and make sure they were accurate, given that QuickBooks represented the company’s financials. We decided to drop it because some major data was missing and its level of granularity did not match that of the Salesforce source (for example, multiple orders/rows in QuickBooks are represented as one row/order in Salesforce). There were also some variations in how the data was being entered, and we felt that reconciling to that extent was beyond the scope of our project.
• Missing product list prices assumed to be $1: To avoid null and empty values (and since the information was not available), some list prices for parts were assumed to be $1. We did not use 0, to avoid computational problems (such as infinite values when dividing by zero). This change allowed us to keep the sales data; it only means that a discount cannot be determined for those parts (which was not part of Protochips’ main business requirements anyway).
• Auto-generated Part Name in the Parts table and added a new column, ‘Category’: To avoid inconsistent names, we decided to auto-generate the part names in a standard format built from a combination of the other attributes in the table. We felt that this would ensure that every part number conformed to the part name structure and would therefore not be exposed to entries that were typed incorrectly. We also added a new column, Category, with two values: General and Custom. A few parts were customized based on the customer’s needs, which is why we needed a category.
• Created a new database in Access: Since the database we were provided with was not editable, we created a new database in Access to replicate it. Apart from the tables that were already present, we created a new table for the grid prices. Because some data sources defined prices per pack and others per grid, we created a table containing the different parts and their associated grid prices (for 25, 50, and 100 count packs, respectively). This in turn helped us generate an intermediate table containing both the price per grid and the price per pack for every part, so that we could present the data in any way the client wanted.
• Decided not to group by Opportunity Name: The client does not have any standard way of defining Opportunity Names, and therefore no specific transformation rules could be applied. We allowed this level of granularity to be reached while drilling down in the BI tool, but we were not able to allow sorting and presenting data solely by Opportunity Name.
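The two pricing decisions above (a $1 placeholder instead of 0 or NULL for a missing list price, and per-grid prices derived from 25, 50, and 100 count packs) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Access tables or Talend logic the team built: it assumes list prices are quoted per pack in USD, and the `PartPrice` record, its field names, and the `list_price_assumed` flag are hypothetical. The flag is the key design choice: because a real part could also cost $1, the code marks placeholder prices explicitly instead of testing for the value 1, and reports no discount for them.

```python
from dataclasses import dataclass
from typing import Optional

PLACEHOLDER_LIST_PRICE = 1.0   # $1 stand-in for a missing list price (never 0)
PACK_SIZES = (25, 50, 100)     # grids per pack


@dataclass(frozen=True)
class PartPrice:               # hypothetical record, not the team's table layout
    part_number: str
    list_price: float          # assumed: list price per pack, in USD
    pack_size: int             # number of grids in the pack
    list_price_assumed: bool   # True when the $1 placeholder was used


def make_part_price(part_number: str, list_price: Optional[float], pack_size: int) -> PartPrice:
    """Build a price record, substituting the $1 placeholder for a missing list price."""
    if pack_size not in PACK_SIZES:
        raise ValueError(f"unexpected pack size: {pack_size}")
    if list_price is None:
        return PartPrice(part_number, PLACEHOLDER_LIST_PRICE, pack_size, True)
    return PartPrice(part_number, list_price, pack_size, False)


def price_per_grid(price: PartPrice) -> float:
    """Convert a per-pack list price into a per-grid price."""
    return price.list_price / price.pack_size


def discount(sales_price_per_pack: float, price: PartPrice) -> Optional[float]:
    """Fractional discount off list, or None when the list price is only a placeholder."""
    if price.list_price_assumed:
        return None
    return 1 - sales_price_per_pack / price.list_price
```

For example, a hypothetical 50-count pack listed at $500 works out to $10 per grid, while a part whose list price was missing gets `None` from `discount` instead of a meaningless figure.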
List of Team Members and Responsibilities/Activities
Stefanie Boros
Stefanie’s main role was Project Manager, but she was also the connection between the team and Protochips. She worked on collecting the data and contacted David and Angela several times throughout the process to understand the data and the process, and to verify that we were on the right track every step of the way. She collaborated with every member of the team to make sure that the various elements of the project were heading in the direction Protochips was looking for. She assisted in designing the data models, reconciliation, cleansing rules, and other activities. She worked with Saniya on Qlikview to create the BI tool that will be used by the organization. She was also in charge of the project deliverables throughout the quarter.
Shradha Salian
Shradha’s main role was that of a Data Integration Specialist and Data Analyst. She created a database for the source tables and learned Talend (along with Sarah), exploring its different functionalities and components. She also worked to understand the data sources, the relationships among them, and how different operations could be carried out on them. She was also involved in writing some Java code in Talend. She created the data dictionary for the ‘Parts’ data source and contributed to cleansing the QuickBooks data source as well. She created the validation rules for all of the data sources. She reviewed the different documents and made changes to them wherever required.
Saniya Shukla
Saniya was mainly involved in the BI side of the project, translating the data sets into visualizations such as charts, graphs, and tables. This involved addressing the client’s requirement questions and delivering the charts in the most understandable and user-friendly way. She also connected the database to Qlikview to pull the data and use it for the dashboards. She mainly worked on formulating the expressions and formulas used to derive the charts, graphs, and other dashboard objects according to Protochips’ requirements. Before implementing the data in the BI tool, she also created the initial and final wireframes and storyboards, with Stefanie creating the sketches for the initial ones. She also created the data dictionary for the Salesforce data.
Sarah Yousef
Sarah’s main roles were Technical Architect and Data Integration Specialist. She developed the data models for the target tables, created the corresponding tables in MS SQL Server, and wrote their data dictionaries. Doing so required a thorough understanding of the data, its state, and how the pieces connect to one another. She was also responsible for the source-to-target mappings, learning Talend (along with Shradha), and loading the data into the target tables. Given that, she also worked on the reconciliation document. She contributed to cleansing the source files by cleansing the Salesforce (SF) data and writing data definitions for the yield results source table. She gave feedback and input on the BI dashboard and the different project deliverables. She also acted as Project Manager (a secondary role) by preparing meeting agendas and action items, and by following up with everyone to make sure the project was on the right track.
Changes from Original Proposal
Our project ended up following the path we set in our original proposal pretty closely, with just a few changes. We felt that, given the time constraints of this project, it made the most sense to look only at C-flat, as we had originally discussed with David, instead of trying to analyze all product lines. We also discovered that QuickBooks was not really providing anything that we could not get from Salesforce or easily reconcile. We had hoped to use QuickBooks as a way to cross-check sales, but Protochips had inconsistent methods of entering sales into each database, so we decided to simplify by removing QuickBooks, without having to sacrifice any vital data. We did have to get additional data from Protochips to provide us with the List Prices of the parts, so that was another data source that was not stated in the proposal. Finally, we switched to Microsoft SQL Server (with SQL Server Management Studio) instead of MySQL, which we had originally planned to use, because of our familiarity with the tool.
Technical Architecture Diagram
Samples of Each Data Set
Original Salesforce Sample Data
• Manual cleansing included removing columns (see above) and rows (see below).

Original source number of rows | Cleansed source number of rows | Difference
1175 | 1021 | 154

Rows removed (by row #) | Row count | Reason
6, 103, 191, 215-217, 219, 226-232, 312, 397-398, 407-408, 447, 464-475, 493, 511-513, 518, 532, 686, 703-704, 714-715, 750, 773, 918, 922, 1056, 1068, 1074-1075, 1129, 1175 | 43 | Client request: removed Stage Closed Lost, Postponed, Target, Closed/Dead End, Imminent
5, 7-16, 18-25, 40, 46-47, 101-102, 236, 445, 463, 482-483, 498, 524-525, 581, 627, 669, 682, 720, 727, 738, 892 | 40 | Client request: remove Order Amount = 0
26-29, 48-69, 212, 224, 245-247, 270-282, 470, 537-538, 585-590, 725, 728, 747-748, 763-768, 840, 906-907, 939, 1031, 1064, 1100 | 70 | Client request: remove Created dates before 10/01/2012
17 | 1 | Error values
Total | 154 | ---

Rows amended (by row #) | Row count | Reason
13 | 1 | Error in Sales_Price: changed to 5.706 (cross-referenced with QuickBooks)
14 | 1 | Error in Sales_Price: changed to 5.391 (cross-referenced with QuickBooks)
194, 196 | 2 | Error in Sales_Price: changed to 6.462 (cross-referenced with QuickBooks)
195 | 1 | Error in Sales_Price: changed to 7.002 (cross-referenced with QuickBooks)
666, 667, 758, 887 | 4 | Error in Sales_Price: changed to 728.1 (cross-referenced with QuickBooks)
668, 756, 757, 888, 908, 916, 919 | 7 | Error in Sales_Price: changed to 584.1 (cross-referenced with QuickBooks)
All | 1021 | Removed the letters CF from all product names by dropping the first three characters: =RIGHT(E2,LEN(E2)-3)
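The client-requested row removals and the product-name formula listed above can be expressed as a small filter. This Python sketch is illustrative only, since the team applied these changes manually in the spreadsheet. The column names `Stage`, `Order_Amount`, `Created_Date`, and `Product_Name` are hypothetical stand-ins for the Salesforce export's columns, and the sample part name used in testing is only illustrative, written in the style of the proposal's 2/1-2C-G200F2 example. `strip_cf_prefix` mirrors `=RIGHT(E2,LEN(E2)-3)` by keeping everything after the first three characters.

```python
from datetime import date

# Stage values the client asked to exclude.
EXCLUDED_STAGES = {"Closed Lost", "Postponed", "Target", "Closed/Dead End", "Imminent"}
# Opportunities created before this date were removed.
CREATED_CUTOFF = date(2012, 10, 1)


def strip_cf_prefix(product_name: str) -> str:
    """Same result as =RIGHT(E2,LEN(E2)-3) for names of three or more characters."""
    return product_name[3:]


def keep_row(row: dict) -> bool:
    """Apply the three client-requested removal rules to one Salesforce row."""
    if row["Stage"] in EXCLUDED_STAGES:
        return False
    if row["Order_Amount"] == 0:
        return False
    if row["Created_Date"] < CREATED_CUTOFF:
        return False
    return True


def cleanse(rows: list) -> list:
    """Return cleansed copies of the kept rows; the input rows are not modified."""
    cleansed = []
    for row in rows:
        if keep_row(row):
            cleaned = dict(row)
            cleaned["Product_Name"] = strip_cf_prefix(row["Product_Name"])
            cleansed.append(cleaned)
    return cleansed
```

Expressing the rules this way makes them repeatable on the next export, whereas manual deletions by row number have to be redone by hand each time.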
Access - Yield Results Sample Data
• Manual cleansing included removing columns (see above) and rows (see below).

Original source number of rows | Cleansed source number of rows | Difference
9533 | 9508 | 25

Rows removed (by Wafer_ID) | Row count | Reason
16488, 19340, 18398, 20215, 20838 | 5 | ‘Qty’ value greater than ‘Out of’ value
16085, 16882, 18676, 19150 | 4 | Empty ‘Out of’ values
22808, 22809, 22125, 22363, 22570, 22772, 22822, 25282 | 8 | Other empty values
17329, 21063, 21092, 17994, 18228, 18709, 19665, 20484 | 8 | Extremely high ‘Out of’ value (anomalies)
Total | 25 | ---
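The four removal reasons in this table translate naturally into a validation function that tags each row with the first rule it breaks. This Python sketch is an illustration, not the team's validation rules: the field names other than `Wafer_ID`, `Qty`, and `Out of` are hypothetical, the threshold for an "extremely high" Out of value is a parameter because the document does not state one, and treating yield as Qty divided by Out of is an assumption. The order of the checks is also a choice; a row that breaks several rules is reported under the first one it fails.

```python
from typing import Optional

# Columns every usable row needs besides "Out of"; Part_Number and Float_Date are hypothetical names.
REQUIRED_FIELDS = ("Wafer_ID", "Part_Number", "Float_Date", "Qty")


def rejection_reason(row: dict, anomaly_threshold: int) -> Optional[str]:
    """Return why a yield row should be removed, or None if it passes every rule."""
    out_of = row.get("Out of")
    if out_of is None:
        return "Empty 'Out of' value"
    if any(row.get(field) is None for field in REQUIRED_FIELDS):
        return "Other empty value"
    if out_of > anomaly_threshold:
        return "Extremely high 'Out of' value (anomaly)"
    if row["Qty"] > out_of:
        return "'Qty' value greater than 'Out of' value"
    return None


def yield_percent(qty: int, out_of: int) -> Optional[float]:
    """Assumed yield definition: Qty as a percentage of Out of (None when Out of is 0)."""
    if out_of == 0:
        return None
    return 100.0 * qty / out_of
```

Returning a reason string rather than a plain True/False makes it easy to produce the "Reason" column of a removal log like the one above.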
Access - Parts List Sample Data
• Manual cleansing included

Original source number of rows | Cleansed source number of rows | Difference
74 | 79 | 5

PDF - List Price Sample Data
• Manual cleansing included

Original source number of rows | Cleansed source number of rows | Difference
81 | 219 | 138
Dimensional Models
Conceptual Model
Business Questions
Production
• What is the trend in yields for each part number?
• Is there any trend in production by time of year?
Sales
• What trends are there in the sales of parts?
• Is there seasonality to the sales?
• Is there a trend in sales by region of the world?
Customer
• Who are the top customers?
• Does an individual customer tend to buy at a certain time or at a certain time interval?
• What does an individual customer tend to buy?
Challenges
Data
As expected, the data proved to be very time-consuming. For starters, getting the data itself took a lot longer than we anticipated. Once we finally received it all, we had to understand what we were dealing with. We spent hours poring over each attribute, trying to see how everything tied together. We also had conversations with David, Angela, and Nicole (another Protochips employee who works with the data). Once we finally felt comfortable with the data, we had to decide what was and was not appropriate to use in order to answer our questions. This was something we did not even confirm until toward the end of the project!

Beyond just getting and understanding the data, we had to cleanse and integrate it. There were rows with missing values, rows with inconsistent values, and so on. Step by step, we had to make decisions (and get confirmation from Protochips) about how to handle each and every quirk in the data. For integration, we had to see the big picture and keep our minds on the overall goal of answering the business questions in order to choose the right path. Ultimately, we kept coming back to these business questions, and they really helped guide us and make successful choices.
Talend
As expected with any new tool, there was a learning curve to understanding and using our integration tool, Talend. Initially, it was just about understanding what capabilities the tool had and how to get the results we wanted. This included learning which palette components could be useful for our scenarios, how to use them, and then how to use other built-in functionality such as built-in expressions. There were technical issues with setting up and connecting to the database, as well as with learning what the various error messages meant and how to fix them.

Loading the fact tables was a hurdle because we had to make sure we had all the necessary joins to the dimension tables so that rows were pulled correctly. We spent a great deal of time trying to reconcile our fact tables, thinking that the major problem was how we joined the tables together. After further analysis of the data that was not being pulled in, we came to realize that some of the Sales Price values were incorrect and therefore were not being calculated correctly. We then had to cross-check them with QuickBooks, get the correct values, change them in the source data, make note of them, and finally reload the tables.
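A more systematic way to hunt for missing or miscalculated fact rows is to compare the source with the loaded fact table row by row. The Python sketch below is only an illustration, not the team's Talend job. It assumes each source row carries a unique key (the hypothetical `Opportunity_ID`) that is also written to the fact table, and that each row's recorded amount (the hypothetical `Line_Amount`) should equal `Quantity` times `Sales_Price`. `unloaded_rows` is an anti-join that lists source rows that never reached the fact table, and `price_mismatches` flags rows whose amount cannot be reproduced from quantity and sales price, the kind of check that points at bad Sales Price values rather than at the joins.

```python
from typing import Iterable, List


def unloaded_rows(source: Iterable[dict], fact: Iterable[dict],
                  key: str = "Opportunity_ID") -> List[dict]:
    """Anti-join: source rows whose key never appears in the loaded fact table."""
    loaded_keys = {row[key] for row in fact}
    return [row for row in source if row[key] not in loaded_keys]


def price_mismatches(source: Iterable[dict], tolerance: float = 0.01) -> List[dict]:
    """Rows where Quantity * Sales_Price does not reproduce the recorded amount.

    Line_Amount is a hypothetical column name; the tolerance absorbs rounding to cents.
    """
    return [
        row for row in source
        if abs(row["Quantity"] * row["Sales_Price"] - row["Line_Amount"]) > tolerance
    ]
```

Running both checks separates two different failure modes: rows dropped by a join (no matching dimension key) and rows that load but carry a wrong price.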
Visualizing the Solutions to the Business Questions
Again, we first had to learn how to use Qlikview and what was possible with the tool, but once we did that, we had to plan quite a bit on how to best serve our client’s needs with the data visualization. There were some technical errors when connecting to the database, and at first we were not able to pull all of the data into the tool. While using the tool, we were able to identify more areas where the data needed double-checking (invalid entries, anomalies, etc.), and we worked little by little toward a useful and powerful tool that allows Protochips to view their data like never before. The driving force was always to respond to the questions that Protochips wanted answered, and by doing this we were able to stay focused on what was important and not get caught up in things that were not.
Dimensional modelling
Dimensional modeling was a new concept to our team, since we were not exposed to it until this class, so we had to spend a bit of time on it. One of the main challenges we faced was actually coming up with our dimensional model. Understanding the difference between it and a relational model was crucial. Knowing which attributes to include in the fact table versus the dimensions, and why, was also challenging. We also had to decide whether some attributes belonged in the model or should be pushed to the BI tool; any aggregations we performed on the data are an example of this. One other issue we faced was visualizing how to deal with intermediate tables and how to connect them to the fact and dimension tables to pull the data, given that they are not present in the model itself.
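The question of whether an aggregate belongs in the model or in the BI tool matters most for ratios such as yield. The sketch below is an illustration, not the team's schema: it uses Python's built-in `sqlite3` as a stand-in for SQL Server and borrows the `YIELD_FACT` (`Quantity_Produced`, `Total_Quantity`) and `YIELD_TIME_DIM` (`Yield_Date`) names from the reconciliation document, while the `Time_Key` surrogate key and `Part_Number` column are hypothetical. Keeping only the additive quantities in the fact table lets a query (or the BI tool) compute yield for any grouping as a ratio of sums; averaging row-level percentages instead weights a small batch the same as a large one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE YIELD_TIME_DIM (
    Time_Key   INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    Yield_Date TEXT NOT NULL         -- ISO 8601 date, e.g. '2013-01-15'
);
CREATE TABLE YIELD_FACT (
    Time_Key          INTEGER NOT NULL REFERENCES YIELD_TIME_DIM (Time_Key),
    Part_Number       TEXT NOT NULL,  -- hypothetical part column
    Quantity_Produced INTEGER NOT NULL,
    Total_Quantity    INTEGER NOT NULL
);
"""

# Yield by calendar month (pooled across years) as a ratio of sums.
RATIO_OF_SUMS = """
SELECT strftime('%m', t.Yield_Date) AS month,
       100.0 * SUM(f.Quantity_Produced) / SUM(f.Total_Quantity) AS yield_pct
FROM YIELD_FACT AS f
JOIN YIELD_TIME_DIM AS t ON t.Time_Key = f.Time_Key
GROUP BY month
ORDER BY month
"""

# The tempting alternative: an unweighted average of per-row percentages.
AVERAGE_OF_RATIOS = """
SELECT strftime('%m', t.Yield_Date) AS month,
       AVG(100.0 * f.Quantity_Produced / f.Total_Quantity) AS yield_pct
FROM YIELD_FACT AS f
JOIN YIELD_TIME_DIM AS t ON t.Time_Key = f.Time_Key
GROUP BY month
ORDER BY month
"""


def build_demo(rows):
    """rows: (yield_date, part_number, quantity_produced, total_quantity) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for date_text in sorted({r[0] for r in rows}):
        conn.execute("INSERT INTO YIELD_TIME_DIM (Yield_Date) VALUES (?)", (date_text,))
    keys = dict(conn.execute("SELECT Yield_Date, Time_Key FROM YIELD_TIME_DIM"))
    conn.executemany(
        "INSERT INTO YIELD_FACT VALUES (?, ?, ?, ?)",
        [(keys[d], part, produced, total) for d, part, produced, total in rows],
    )
    return conn
```

With one row of 9 good out of 10 and another of 10 out of 100 in the same month, the ratio of sums gives about 17.3%, while the average of ratios gives 50%.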
Project logistics
In general, this project was an incredible experience for us, but it was also a big commitment. All of us had other classes to worry about, one of our teammates has a job, and two of us have young children at home. We all had to be extremely flexible to find meeting times that worked, and balancing the workload was a struggle at times. Not only that, but we had spent so much time and energy on this project that it was hard to find intrinsic motivation by the end of the quarter. We have all been able to take invaluable experiences and skills away from this project, but we would be remiss if we did not mention the dedication it required as one of its obstacles.
Appendix
Interview with Angela and David
Business Requirements:
• Any regulatory or compliance considerations?
• How will users be accessing the BI tool?
• Who are the intended users?
• Is there anyone else we should interview?
• What are the key deliverables required?
  o Dashboard?
  o Charts/Graphs?
  o Reports?
• What is the problem?
• What is the expected solution to the problem with this BI tool/data?
• What is the priority of requirements?
  o List of NEEDS
  o List of WANTS
Data Requirements:
• Any other sources of data needed? (Do we have access to them?)
• Are there any modifications done to the data from the source to staging?
• Who is touching the data? (Entering, changing, deleting, processing, etc.)
• What timeframe are we looking at? How often (what level of granularity is needed)?
• Do you have any reporting needs in mind that you would like us to solve? E.g., if revenue data is available, perhaps the revenue earned per customer.
• Could we get some of your guidance in building the BI solution?
• Can any data be left off?
Technical Requirements:
• Is there any reason we need to provide role-based access in the BI tool?
• Are there any BI tools currently in place? Any data warehouses, marts, etc.?
• Are there any technical specifications with regard to hardware/software vendors?
• Do we need to provide web services or a cloud-based environment?
Salesforce Data:
• Describe the terms used in the columns: Stage, Sales Price, and List Price...
• Are all dollar amounts converted to USD? At what point?
• What do you mean by fractional numbers in quantity ordered?
• What is the difference between product date and close date?
• What do zero values in the sales price mean? (If it is won, shouldn’t it have a value?) Or do these numbers refer to something else?
• What are the number values in the “Opportunity Name” column? How are they assigned?
• Do you have a data dictionary? (for all Excel sheets)
Parts Assembly Data (Is this QuickBooks?):
• Why do some rows have no QTY?
• Why do some rows have quantity but no sales price and amounts?
• Are prices for the same product varying based on customer? Quantity? etc.?
• Can you explain the formula in the Amount column?
• Does “Name” map to “Account” in the Salesforce data?
Production Data:
• What is the best worksheet to use?
• How does this data tie to the other spreadsheets?
Yields Data:
• What is the best worksheet to use?
• How does this data tie to the other spreadsheets?
• How is the Yield % computed? What data feeds into these values?
Proposal
Approved
by
Protochips
Protochips
C-‐flat
Analysis
The
purpose
of
this
project
is
to
learn
more
about
the
trends
in
both
the
manufacturing
and
sales
of
the
C-‐flat
product.
We
aim
to
answer
the
following
questions:
• Yield:
o What
is
the
trend
in
yields
for
each
part
number?
o Is
there
any
trend
in
the
production
by
time
of
the
year?
• Sales:
o What
trends
are
there
in
the
sales
of
parts?
o Is
there
a
seasonality
to
the
sales?
o Is
there
a
trend
in
sales
by
region
of
the
world?
• Customer:
o Who
are
the
top
customers?
o Does
an
individual
customer
tend
to
buy
at
a
certain
time
or
at
a
certain
time
interval?
o What
does
an
individual
customer
tend
to
buy?
We will use the Salesforce, QuickBooks, and Access data to answer these questions for the period 10/1/12 through 9/30/15, under the following conditions:
• We will exclude data on samples sent to customers.
• We will reference the Account Name (as it appears in Salesforce) but provide the option to drill down to the Opportunity Name (as it appears in Salesforce).
• We will use the Float Date as the reference date for yield data.
• We will use the Created Date in Salesforce as the reference date for sales data.
• We will only include accounts marked as Closed Won in Salesforce.
• We will show the quantity of products sold as listed by grid (as in QuickBooks).
• We will group any part number with extra wording (such as 2/1-2C-G200F2) into a “Custom” category.
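The scoping conditions above amount to a set of record filters plus one normalization rule for part numbers. The sketch below shows one way they could be expressed in Python. It is only an illustration: the record layout (keys `stage`, `is_sample`, `created_date`, `part`) and the `KNOWN_PARTS` list of standard part numbers are hypothetical stand-ins, since the actual export columns and the list of standard parts are not given in this document.

```python
from datetime import date

# Reporting window from the proposal: 10/1/12 through 9/30/15 (inclusive).
START = date(2012, 10, 1)
END = date(2015, 9, 30)

# Hypothetical list of standard part numbers; in practice this would come
# from a parts master list agreed with Protochips.
KNOWN_PARTS = {"STD-001", "STD-002"}


def normalize_part(part: str) -> str:
    """Keep standard part numbers; lump anything else into 'Custom'."""
    return part if part in KNOWN_PARTS else "Custom"


def in_scope(opp: dict) -> bool:
    """Apply the proposal's filters to one (hypothetical) opportunity record."""
    return (
        opp["stage"] == "Closed Won"              # only Closed Won
        and not opp["is_sample"]                  # leave out samples
        and START <= opp["created_date"] <= END   # Created Date drives the window
    )


def prepare(opps: list) -> list:
    """Filter the records and normalize part numbers without mutating inputs."""
    return [{**opp, "part": normalize_part(opp["part"])}
            for opp in opps if in_scope(opp)]


if __name__ == "__main__":
    sample = [
        {"stage": "Closed Won", "is_sample": False,
         "created_date": date(2013, 5, 2), "part": "2/1-2C-G200F2"},
        {"stage": "Closed Lost", "is_sample": False,
         "created_date": date(2013, 6, 1), "part": "STD-001"},
        {"stage": "Closed Won", "is_sample": True,
         "created_date": date(2014, 2, 3), "part": "STD-001"},
        {"stage": "Closed Won", "is_sample": False,
         "created_date": date(2015, 10, 5), "part": "STD-002"},
        {"stage": "Closed Won", "is_sample": False,
         "created_date": date(2014, 1, 1), "part": "STD-001"},
    ]
    print([opp["part"] for opp in prepare(sample)])  # ['Custom', 'STD-001']
```

Keeping the filter and the normalization in separate functions makes each rule easy to test and to change if Protochips revises the scope.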
Wireframes
Storyboards
1. Yield Data
Reconciliation Document
1. YIELD_FACT
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 9508 | SELECT COUNT(*) FROM [Yield Results]; | 9508 | SELECT COUNT(*) FROM YIELD_FACT |
| Sum of Qty (Source) vs. Sum of Quantity_Produced (Target) | 226938 | SELECT SUM([Yield Results].[Qty]) FROM [Yield Results]; | 226938 | SELECT SUM(Quantity_Produced) FROM YIELD_FACT |
| Sum of Out_Of (Source) vs. Sum of Total_Quantity (Target) | 535656 | SELECT SUM([Yield Results].[Out of]) FROM [Yield Results]; | 535656 | SELECT SUM(Total_Quantity) FROM YIELD_FACT |
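The YIELD_FACT checks above follow one pattern: run a query against the source, run the matching query against the target, and compare the two values. A minimal sketch of automating that pattern with Python's standard `sqlite3` module is shown below. It assumes SQLite stand-ins for both sides (the real source was an Access table), and the toy rows are made up for illustration.

```python
import sqlite3

# Each check pairs a source query with the target query expected to match it.
# Names mirror the YIELD_FACT table above; the SQLite source table is a
# stand-in for the Access [Yield Results] table (an assumption).
YIELD_FACT_CHECKS = [
    ("Total Number of Rows",
     'SELECT COUNT(*) FROM "Yield Results"',
     "SELECT COUNT(*) FROM YIELD_FACT"),
    ("Sum of Qty vs. Sum of Quantity_Produced",
     'SELECT SUM(Qty) FROM "Yield Results"',
     "SELECT SUM(Quantity_Produced) FROM YIELD_FACT"),
    ("Sum of Out_Of vs. Sum of Total_Quantity",
     'SELECT SUM("Out of") FROM "Yield Results"',
     "SELECT SUM(Total_Quantity) FROM YIELD_FACT"),
]


def reconcile(source_conn, target_conn, checks):
    """Run every check and return (criterion, source, target, matches) rows."""
    report = []
    for criterion, source_sql, target_sql in checks:
        source_value = source_conn.execute(source_sql).fetchone()[0]
        target_value = target_conn.execute(target_sql).fetchone()[0]
        report.append((criterion, source_value, target_value,
                       source_value == target_value))
    return report


if __name__ == "__main__":
    # Toy data only; real runs would connect to the source and the warehouse.
    rows = [(10, 20), (5, 8), (7, 7)]
    source = sqlite3.connect(":memory:")
    source.execute('CREATE TABLE "Yield Results" (Qty INTEGER, "Out of" INTEGER)')
    source.executemany('INSERT INTO "Yield Results" VALUES (?, ?)', rows)
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE YIELD_FACT "
                   "(Quantity_Produced INTEGER, Total_Quantity INTEGER)")
    target.executemany("INSERT INTO YIELD_FACT VALUES (?, ?)", rows)
    for line in reconcile(source, target, YIELD_FACT_CHECKS):
        print(line)
```

Storing the checks as data means the same `reconcile` function could serve every table in this document; only the list of query pairs changes.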
2. YIELD_TIME_DIM
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1400 | COUNT(A2:A1401) | 1400 | SELECT COUNT(*) FROM YIELD_TIME_DIM |
| Max Date | 10/31/15 | MAX(A2:A1401) | 31/10/2015 | SELECT MAX(Yield_Date) FROM YIELD_TIME_DIM |
| Min Date | 01/01/12 | MIN(A2:A1401) | 01/10/2012 | SELECT MIN(Yield_Date) FROM YIELD_TIME_DIM |
3. SALES_TIME_DIM
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1126 | COUNT(A2:A1127) | 1126 | SELECT COUNT(*) FROM SALES_TIME_DIM |
| Max Date | 10/31/15 | MAX(A2:A1127) | 31/10/2015 | SELECT MAX(Sales_Date) FROM SALES_TIME_DIM |
| Min Date | 10/01/12 | MIN(A2:A1127) | 01/10/2012 | SELECT MIN(Sales_Date) FROM SALES_TIME_DIM |
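For a time dimension with one row per calendar day, the row count should equal the number of days from the minimum to the maximum date, inclusive. Using the source minimum and maximum dates, 1/1/2012 through 10/31/2015 is 1400 days and 10/1/2012 through 10/31/2015 is 1126 days, matching the row counts reported for the two dimensions. The sketch below assumes the dimensions are gap-free daily calendars; the report does not state how they were generated.

```python
from datetime import date, timedelta


def date_dim(start: date, end: date) -> list:
    """Build a daily time dimension: one date per calendar day, both ends included."""
    return [start + timedelta(days=offset)
            for offset in range((end - start).days + 1)]


if __name__ == "__main__":
    # Min/max dates taken from the Source column of the two tables above.
    yield_dates = date_dim(date(2012, 1, 1), date(2015, 10, 31))
    sales_dates = date_dim(date(2012, 10, 1), date(2015, 10, 31))
    print(len(yield_dates), len(sales_dates))  # 1400 1126
```

Comparing `COUNT(*)` against the inclusive day span is a cheap way to detect missing or duplicated days, which the min/max checks alone would not catch.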
4. ACCOUNT_DIM
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Unique Rows (Distinct) | 89 | Remove Duplicates on column Account_Name | 89 | SELECT COUNT(*) FROM ACCOUNT_DIM |
| Total Number of Unique Rows with Region Americas | 58 | Remove Duplicates on column Account_Name, then COUNTIF(C2:C90,"Americas") | 58 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Americas' |
| Total Number of Unique Rows with Region Asia | 11 | Remove Duplicates on column Account_Name, then COUNTIF(C2:C90,"Asia") | 11 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'Asia' |
| Total Number of Unique Rows with Region EMEA | 20 | Remove Duplicates on column Account_Name, then COUNTIF(C2:C90,"EMEA") | 20 | SELECT COUNT(*) FROM ACCOUNT_DIM WHERE Region = 'EMEA' |
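The ACCOUNT_DIM checks deduplicate on Account_Name and then count accounts per region. They also support a cross-check: the three regional counts should add up to the distinct total (58 + 11 + 20 = 89), which confirms that every account has one of the expected regions. A minimal Python sketch of that logic follows; the account rows are hypothetical and the function names are illustrative.

```python
from collections import Counter

EXPECTED_REGIONS = ("Americas", "Asia", "EMEA")


def remove_duplicates(rows: list, key: str) -> list:
    """Keep the first row for each key value, as Excel's Remove Duplicates does."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row[key], row)
    return list(first_seen.values())


def region_counts(rows: list) -> Counter:
    """Count rows per Region, mirroring the COUNTIF / WHERE Region = ... checks."""
    return Counter(row["Region"] for row in rows)


def regions_cover_all(counts: Counter, total: int,
                      expected: tuple = EXPECTED_REGIONS) -> bool:
    """True if the expected regions account for every distinct account."""
    return sum(counts[region] for region in expected) == total


if __name__ == "__main__":
    # Hypothetical account rows; real names come from the Salesforce export.
    accounts = [
        {"Account_Name": "Lab A", "Region": "Americas"},
        {"Account_Name": "Lab B", "Region": "EMEA"},
        {"Account_Name": "Lab A", "Region": "Americas"},
        {"Account_Name": "Lab C", "Region": "Asia"},
    ]
    unique = remove_duplicates(accounts, "Account_Name")
    counts = region_counts(unique)
    print(len(unique), dict(counts), regions_cover_all(counts, len(unique)))
```

An account with a blank or misspelled region would make `regions_cover_all` return False even though the distinct total still matches.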
5. OPPORTUNITY_DIM
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Unique Rows | 717 | Remove Duplicates on column Opportunity_Name | 717 | SELECT COUNT(*) FROM OPPORTUNITY_DIM |
6. PARTS_DIM
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 257 | SELECT Count(*) FROM Parts_output | 257 | SELECT COUNT(*) FROM PARTS_DIM |
| Total Number of Rows with Material Au | 94 | SELECT Count(*) FROM Parts_output WHERE (((Parts_output.Material)="Au")); | 94 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'Au' |
| Total Number of Rows with Material C | 84 | SELECT Count(*) FROM Parts_output WHERE (((Parts_output.Material)="C")); | 84 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'C' |
| Total Number of Rows with Material Ni | 78 | SELECT Count(*) FROM Parts_output WHERE (((Parts_output.Material)="Ni")); | 78 | SELECT COUNT(*) FROM PARTS_DIM WHERE Material = 'Ni' |
| Total Number of Rows with Part Custom (Material null) | 1 | SELECT Count(*) FROM Parts_output WHERE (((Parts_output.Part)="Custom")); | 1 | SELECT COUNT(*) FROM PARTS_DIM WHERE Part_Num = 'Custom' |
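The PARTS_DIM checks run one query per material on each side. A single `GROUP BY Material` query per side returns all of those counts at once, including the NULL material of the Custom row, and any material value nobody thought to query for. The sketch below assumes SQLite stand-ins for both tables with made-up rows. SQLite compares text case-sensitively by default, so a stray "AU" would appear as its own group, which surfaces the Au vs. AU issue listed under Suggestions for Protochips.

```python
import sqlite3


def material_counts(conn: sqlite3.Connection, table: str) -> dict:
    """Row count per Material in one query; a NULL material is keyed as None.

    `table` is interpolated into the SQL, so pass only fixed, trusted names.
    """
    sql = f"SELECT Material, COUNT(*) FROM {table} GROUP BY Material"
    return dict(conn.execute(sql).fetchall())


if __name__ == "__main__":
    # Toy rows; column names follow the PARTS_DIM table above, values are made up.
    parts = [("P1", "Au"), ("P2", "Au"), ("P3", "C"), ("P4", "Ni"), ("Custom", None)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Parts_output (Part TEXT, Material TEXT)")
    conn.execute("CREATE TABLE PARTS_DIM (Part_Num TEXT, Material TEXT)")
    conn.executemany("INSERT INTO Parts_output VALUES (?, ?)", parts)
    conn.executemany("INSERT INTO PARTS_DIM VALUES (?, ?)", parts)
    source = material_counts(conn, "Parts_output")
    target = material_counts(conn, "PARTS_DIM")
    print(source == target, source)
```

Comparing the two dictionaries checks every material at once, so a new material code added later does not require a new reconciliation row.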
7. SALES_CONVERSION
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_CONVERSION |
8. SALES_FACT
| Reconciliation Criteria | Source | Source Function | Target | Target Function |
|---|---|---|---|---|
| Total Number of Rows | 1021 | COUNT(G2:G1022) | 1021 | SELECT COUNT(*) FROM SALES_FACT |
| Sum of Total_Price (Original Source) vs. Sum of Total_Sales_Dollar_Amount (Target) | 1229631 | SUM(J2:J1022) | 1229631 | SELECT ROUND(SUM(Total_Sales_Dollar_Amount), 0) FROM SALES_FACT |
| Sum of Quantity_Pack (Intermediate) vs. Sum of Quantity_Pack (Target) | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_CONVERSION | 4271 | SELECT SUM(Quantity_Pack) FROM SALES_FACT |
| Sum of Pieces_Pack (Intermediate) vs. Sum of Pieces_Pack (Target) | 249854 | SELECT SUM(Pieces_Pack) FROM SALES_CONVERSION | 249854 | SELECT SUM(Pieces_Pack) FROM SALES_FACT |
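The dollar-amount check in SALES_FACT rounds the target sum to whole dollars before comparing it with the spreadsheet total. If such a check were automated, one option (an assumption, not the team's documented method) is to sum exact decimal amounts and round both sides the same way, which avoids binary floating-point drift. The sketch uses Python's standard `decimal` module; the function names and sample amounts are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def whole_dollars(amounts) -> int:
    """Sum exact decimal amounts (given as strings) and round to whole dollars.

    ROUND_HALF_UP sends ties away from zero; Python's built-in round()
    would instead round ties to the nearest even number.
    """
    total = sum((Decimal(amount) for amount in amounts), Decimal("0"))
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def dollar_totals_match(source_amounts, target_amounts) -> bool:
    """Compare two sides of the dollar reconciliation at whole-dollar precision."""
    return whole_dollars(source_amounts) == whole_dollars(target_amounts)


if __name__ == "__main__":
    print(0.10 + 0.20 == 0.30)                                      # False
    print(dollar_totals_match(["0.10", "0.20"], ["0.30"]))          # True
    print(whole_dollars(["10.10", "20.20", "0.20"]), round(30.5))   # 31 30
```

Passing amounts as strings keeps them exact; converting them from floats first would reintroduce the rounding error the check is meant to avoid.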
Validation Rules
• QUICKBOOKS
• SALESFORCE
• YIELD_DATA
Suggestions for Protochips
• Create a naming convention for Opportunity Name (in Salesforce).
• Address missing data in Salesforce and QuickBooks.
• Standardize the naming of parts across all sources (for example, Au vs. AU, and Cu rather than C for copper).
• Standardize the units of measure across sources.
• Unify the sales price across Salesforce and QuickBooks.
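The part-naming suggestion can be applied as a small lookup step during ETL. The sketch below is hypothetical: the `CANONICAL_MATERIAL` table is an illustration built from the examples in the suggestion, and the C-to-Cu entry assumes that C currently denotes copper, which should be confirmed with Protochips before use.

```python
from typing import Optional

# Hypothetical canonical codes. "C" -> "Cu" assumes C currently means copper,
# per the suggestion to use Cu rather than C; confirm before relying on it.
CANONICAL_MATERIAL = {"AU": "Au", "CU": "Cu", "C": "Cu", "NI": "Ni"}


def standardize_material(code: Optional[str]) -> Optional[str]:
    """Map case and spelling variants of a material code to one canonical form."""
    if code is None:
        return None
    cleaned = code.strip()
    # Unknown codes come back trimmed but otherwise unchanged for manual review.
    return CANONICAL_MATERIAL.get(cleaned.upper(), cleaned)


if __name__ == "__main__":
    print([standardize_material(c) for c in ["Au", "AU", " au", "C", "Ni", None]])
    # ['Au', 'Au', 'Au', 'Cu', 'Ni', None]
```

Returning unknown codes unchanged, rather than raising an error, lets a load finish while still leaving unrecognized values visible in a GROUP BY on the cleaned column.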
Number of Hours the Team Worked on the Project
• Stefanie: 80 hours
• Sarah: 140 hours
• Shradha: 100 hours
• Saniya: 80 hours
• Team Total: 400 hours