RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
Big data beyond the hype may 2014
1. Big
Data
Beyond
the
Hype:
Trends,
Pi6alls
and
Building
Competency
May
2014
1
5/29/14
2. Agenda
• Why
Big
Data
• Technology
Capabili<es
• Organiza<onal
Evolu<on
• Case
Study
• Conclusions
5/29/14
2
3. Big Data
5/29/14
3
• Data
sets
so
large
and
complex
that
they
become
awkward
to
work
with
using
standard
tools
and
techniques
• Google,
Quantcast,
Yahoo,
LinkedIn…
customer
innova<on
in
open
source
4. Patterns from Delivering Big Data at Scale
eCommerce
2
of
Global
Top
5
Retail
2
of
Global
Top
5
Social
Networking
Global
#1
Banking
4
of
Global
Top
10
Credit
Issuer
2
of
Global
Top
5
Financial
Data
Services
2
of
Global
Top
5
Financial
Exchanges
Global
#2
Brokerage
&
Mutual
Funds
2
of
Global
Top
5
Asset
Management
Global
#1
Semiconductor
2
of
Global
Top
5
Data
Storage
Devices
3
of
Global
Top
5
Disk
Drive
Manufacturing
Global
#1
TelecommunicaJons
2
of
Global
Top
5
Media
&
AdverJsing
2
of
Global
Top
4
Internet
TransacJon
Security
Global
#1
5/29/14
4
5. Manufacturing Analytics
1
Improve
Yield
-‐
reduced
scrap,
faster
<me
to
market
2
Ease
data
access
–
quick
search
access
to
all
relevant
data
for
all
history
in
one
place,
not
samples,
summary
or
recent
data
3
Process
Efficiency
–
iden<fy
redundant
tests,
op<mize
throughput
4
Component
Analysis
–
proac<ve
discovery
of
component
problems,
compa<bility
detec<on
5
Service
Revenue
GeneraJon
-‐
Leverage
customer
support
to
generate
new
revenues
by
increasing
customer
reten<on,
support
renewals,
and
upgrading
customers
to
new
products
and
licenses.
5
5/29/14
6. Product Analytics
1
Issue
ResoluJon
-‐
Faster
resolu<on
of
issues
through
deep
analysis
of
integrated
device
log,
product,
and
configura<on
data
across
the
install
base.
2
Support
Metrics
-‐
Gain
new
insights
into
dimensions
impac<ng
service
metrics
by
combining
product
use
technical
data
with
business
data.
E.g.,
number
of
<ckets
resolved,
length
of
calls,
resource
alloca<on,
customer
sa<sfac<on.
3
ProacJve
Service
-‐
Iden<fy
and
address
poten<al
issues
rather
than
wai<ng
to
react
to
them
aXer
they
happen.
I.e.,
intelligence
for
condi<on-‐based
maintenance.
4
Customer
Self-‐Service
-‐
Drive
higher
rates
of
customer
self-‐service
by
providing
customers
access
to
tools
and
resources
to
answer
their
own
ques<ons
and
solve
their
own
issues
by
analyzing
past
customer
ac<vity
and
integra<ng
technical
details
from
customer
systems
into
self-‐service
portals.
5
Service
Revenue
GeneraJon
-‐
Leverage
customer
support
to
generate
new
revenues
by
increasing
customer
reten<on,
support
renewals,
and
upgrading
customers
to
new
products
and
licenses.
6
5/29/14
7. Clickstream Analytics
1
Timely
Processing
-‐
Data
volumes
are
increasing
clogging
data
warehouse,
slow
and
out
of
date.
2
Cost
Efficiency
&
Scale
-‐
Data
volumes
are
increasing
clogging
data
warehouse,
slow
and
out
of
date.
3
Flexible
Tagging
–
Analyze
site
without
having
to
push
data
to
tags,
classify
ac<vity
based
on
sec<ons,
URLs
more
flexibly
than
tags.
4
Converged
AnalyJcs
-‐
Understand
and
improve
web,
mobile,
cross-‐channel
customer
ac<vity.
Siloed
apps
costly
&
inflexible
–
e.g.,
tags
require
feeding
data
about
user
demographics,
segments.
Mash
up
interac<ons
with
data
like
geographic,
demographic,
social,
call
center,
purchases.
5
Deeper
ExploraJon
-‐
Analyze
sequences
of
events
in
customer
journeys.
Deep
drill
down
(how
do
users
with
the
latest
iPhone
and
iOS
compare
to
latest
Samsung
and
Android).
Cohort
analysis
-‐
those
who
buy
books
vs
spor<ng
goods.
6
PredicJve
Models
–
next
best
offer
recommenda<ons,
personaliza<on,
customer
value,
microsegmenta<on,
ad
aeribu<on…
7
5/29/14
8. Agenda
• Why
Big
Data
• Technology
CapabiliJes
• Organiza<onal
Evolu<on
• Case
Study
• Conclusions
5/29/14
8
10. • Community
Open
Source
• Over
$1
Billion
invested
in
startups,
all
major
sw
companies
embrace
• 3
replicas
for
HA,
50
PB+
scale
Apache Hadoop
5/29/14
10
Cluster
Slaves
Client Servers
✲ Hive, Pig, ...
✲ cron+bash, Azkaban, …
Sqoop, Scribe, …
Monitoring, Management
...
Secondary
Master Server
Secondary
Name Node
Primary
Master Server
✲ Job Tracker
Name Node
Slave Server
✲ Task Tracker
Data Node
DiskDiskDiskDiskDiskDiskDiskDisk
Slave Server
✲ Task Tracker
Data Node
DiskDiskDiskDiskDiskDiskDiskDisk
Slave Server
✲ Task Tracker
Data Node
DiskDiskDiskDiskDiskDiskDiskDisk
11. Copyright Think Big Analytics 2014. All rights reserved. 11
Ÿ Support for many processing engines including emerging Predictive
Analytics libraries
Ÿ Common cluster with local data access in HDFS
YARN - Yet Another Resource Negotiator
12. Copyright Think Big Analytics 2014. All rights reserved. 12
Ÿ Focus on technologies for prescriptive analytics
- Getting insights from your data
Ÿ To get there you need
- data ingestion (Flume, Kafka, Sqoop, ...)
- data transformation (Pig, MapReduce, Hive, Cascading, …)
Ÿ Beyond prescriptive, Machine Learned predictive analytics is
interesting
Narrowing In
13. Copyright Think Big Analytics 2014. All rights reserved. 13
Ÿ Scala-based more general distributed processing with long-lived
memory caching
Ÿ First released 2010, many contributors
Ÿ Scala, Java, Python APIs
Ÿ Largest production cluster to date is 80 nodes at Yahoo
Ÿ YARN Support
Ÿ Subprojects
- Shark – Hive on Spark
- MLib – machine learning
- Spark Streaming
- GraphX
Ÿ Pros: faster iterative analysis, growing distribution support
Ÿ Cons: less mature, Scala API foundation, smaller community
Apache Spark
14. Copyright Think Big Analytics 2014. All rights reserved. 14
Ÿ DAG Processing of never ending streams of data
- First released 2011
- Used at Twitter plus dozens of other companies
- Reliable - At Least Once semantics
- Think MapReduce for data streams
- Java / Clojure based
- Bolts in Java and ‘Shell Bolts’
- Not a queue, but usually reads from a queue.
- YARN integration
Ÿ Pros: community, HW support, low latency, high throughput
Ÿ Cons: static topologies & cluster sizing, Nimbus – SPOF, state
management
Apache Storm
15. Platform Trends
• Hadoop
will
eat
Big
Data
Analy<cs
• Hadoop
YARN
to
run
diverse
applica<ons
• Real<me
latency:
Storm,
Samza,
HBase,
Cassandra,
Elas<csearch…
• Management:
Ambari,
Pepperdata…
• Security:
Knox,
Sentry,
XA
Secure…
• Cloud:
AWS,
Qubole,
Altascale,
Info
Chimps…
15
5/29/14
17. Agenda
• Why
Big
Data
• Technology
Capabili<es
• OrganizaJonal
EvoluJon
• Case
Study
• Conclusions
5/29/14
17
18. How To Get the Right Answers
Data
Weak intersection between Data, Systems
and Business often means project failure
Current State Improved State
Collaborative integration of Analytics with
Big Data capabilities for success.
Data
Weak Area with
diminished influence
Cross-functional
collaboration
Data-driven
business alignment
5/29/14
18
19. Analytics and Data Science
Analy<cs
is
the
umbrella
term
for
discovery
and
communica<on
of
signal
in
data.
• Intelligence
and
ReporEng:
Governed
insights
for
general
audiences
by
providing
summaries
and
visualiza<ons
over
well
understood
data
• DescripEve
AnalyEcs:
Interpretable
data
stories
to
domain
consumers
(e.g.
business,
sales,
etc.)
using
exploratory
data
analysis
and
descrip<ve
models
(e.g.
clustering)
over
well
understood
data
• PredicEve
AnalyEcs:
Predicts
future
state
or
assesses
impact
of
changing
parameters
for
domain
consumers
(e.g.
business,
sales,
etc.)
using
sta<s<cal
modeling
over
well
understood
data
• Data
Science:
Uncovers
new
or
missing
signals
and
rela<onships
in
data
to
be
leveraged
by
other
analy<cs.
Supported
with
a
mix
of
exploratory
data
analysis
and
sta<s<cal
modeling
over
unexplored
data
or
new
combina<ons
of
known
data.
New
capability
driven
by
Big
Data,
currently
overhyped
and
overused.
End-‐goal:
effecEve
and
repeatable
integraEon
of
analyEcs
into
business
processes.
5/29/14
19
20. Basic
Repor<ng
Big
Data
Repor<ng
Data
Science
Hub
Business
Analy<cs
Data
Driven
Organiza<on
Analy<c
Integra<on
High
Low
Big
Data
Capability
Limited
Full
Analytics Capability Model
5/29/14
20
21. Data
Science
Hub
Data
Driven
Organiza<on
Analy<c
Integra<on
High
Low
Big
Data
Capability
Limited
Full
• Redundant
infrastructure
• Data
locked
down
w/in
product
teams
• Analy<c
innova<ons
not
shared
• Share
data
internally
• Support
joining
of
data
• Make
analy<cs
capabili<es
accessible
to
all
business
lines.
Client Example – Internet Infrastructure Firm
5/29/14
21
22. Agenda
• Why
Big
Data
• Technology
Capabili<es
• Organiza<onal
Evolu<on
• Case
Study
• Conclusions
5/29/14
22
23. • “Reduce
Data
Search
Par<es”
• Stop
playing
“Where’s
Waldo
with
your
Data”
•
“
I
know
I
have
that
data…..
somewhere?”
• Data
Aggrega<on
to
a
Common
Plaoorm
with
common
access
tools
• Improve
Yields
by
Accessing
More
Data
in
a
More
Timely
Manner
• End
to
end
visibility
for
every
test,
every
diagnos<c
and
all
info
from
all
components
of
a
product
(internal
and
external)
• Speed
up
yield
improvement
ramp
up
on
new
products
• Improve
steady
state
yield
on
exis<ng
products.
Use Cases
23
5/29/14
24. Big Data Platform
Devices
Components
Assemblies
…
Field Data
Supplier
All raw parametric,
logistic, vintage, data
ERP/DW’s
Data Sources
End-‐to-‐End
Integrated
Data
Consumers
Ad
hoc
Analysis
BI
Tools
Op<mize/Reduce
Tes<ng
Data
Marts
Batch
AnalyJcs
Parallelized
batch analytics
App-Specific
Views
New High-Value
Parameters
raw
extracts
Enriched
data
Proac<ve
Issue
Iden<fica<on
Failure
Screen
Tests
Customer
FA
via
Field
Data
Predic<ve
Analy<cs
Tools
…other ..
5/29/14
24
25. Highlights
• Enterprise-‐wide
data
access
for
<mely
analy<cs
and
insights
• Founda<on
for
large
scale
proac<ve
analy<cs
Legacy
BDP
Reten<on
3-‐6
months
scaeered
DBs
then
tape
archive
All
data
online
in
common
data
lake
for
3+
years
Coverage
Summaries,
samples
for
several
data
sets
All
parametric
data
captured
in
raw
and
integrated
form
Analysis
Reac<ve
resul<ng
in
missed
improvement
opportuni<es
In
daily
opera<ons
and
on
larger
data
sets
to
support
proac<ve
improvements
25
5/29/14
26. • Metrics:
• Collec<ng
>2M
manufacturing/tes<ng
binary
files
daily
• Collec<ng
from
~500
tables
across
6
databases
à
tens
of
millions
of
records
daily
• Over
140
users
to
date
in
early
pilo<ng
• Over
150
aeendees
par<cipated
in
BDP
training
• Highlights:
although
early
in
the
overall
journey,
the
BDP
is
already
demonstra<ng
early
benefits:
Key Metrics & Highlights
Development Engineer: demonstrated the joining of data sets for detailed
logistics tracking—analyses that is very difficult to conduct with current
systems
Ops Engineer: a recent production issue required detailed historical data. Current systems
did not have the required retention for this data. However, the team was able to pull the data
from the BDP in minutes, as opposed to 3+ weeks to pull the data from tape archive
Development Engineer: obtained technical data from the BDP in
hours as opposed to 3+ weeks to pull from tape archive
DATA SEARCH
PARTIES
YIELD
5/29/14
27. Governance – Project Prioritization
New
Data
Sets
Use
Cases
from
Strategy
New
Apps /
Reports
New
Analytics
Working
Steering
Committee
Executive
Steering
Committee
1. New Data Set 1
2. New Use Case 1
3. Use Case from
Strategy
Prioritized List
Business
Experts
Business impact vs. ease
of delivery, etc.
Nominated by…
Prioritized by…
Core
Platform
Updates
Sponsored by…
27
5/29/14
28. Release Cycles
July
Aug
Sept
Oct
Nov
Dec
Oct
Jan
Sept
Feb
Today
Mar
Apr
May
Big
Data
Plaoorm
Roadmap
Jun
July
Aug
Release
1b:
all
ac<ve
products
Release
1a:
base
plaoorm,
first
product
Release
2:
Field/RMA
Data
Quality Development Operations Core/Other
Release
3
Release
4
Release
5
Reduce
Scrap
New
Product
Data
Access
New
Data
Sets
Ongoing
Support
Faster
Response
to
Field
Requests
Correlated
Analysis
Improved
Failure
Analyses
Delivering new
capabilities
to the business every
45 – 90 days
28
5/29/14
29. Agenda
• Why
Big
Data
• Technology
CapabiliJes
• Organiza<onal
Evolu<on
• Case
Study
• Conclusions
5/29/14
29
30. Stages of Big Data Adoption
Contain
Costs &
Scale
Big Data
Reporting
Agile Analytics
& Insight
Innovation
360o view
Speed
Accuracy
Data Science
Hub
Business
Innovation
Optimization
Engagement
Data Science
Automation
Organizational
Transformation
New products
Customer analytics
Operational efficiency
New businesses
Data Driven
Organization
Sustained competitive advantage through Big Analytics
Traditional
B.I.
5/29/14
30
31. Pitfalls
• Data
Poli<cs
• Immature
Governance
• Siloed
Organiza<on
• Siloed
Applica<ons
• Buy-‐in
• Leaping
over
phases
/
Bypassing
low
hanging
fruit
• Disrup<ve
Change
• Insufficient
funding
• Big
Bang
Rollout
• Skills
Gap
• Tac<cal
use
cases/the
big
data
plateau
• Business
as
usual
5/29/14
31
32. Conclusions
• Big
Data
is
affec<ng
all
companies
especially
in
tech
• Start
with
low
hanging
fruit
• Establish
a
roadmap
for
incremental
organiza<onal
evolu<on
and
technical
capabili<es
5/29/14
32