SlideShare a Scribd company logo
1 of 32
Download to read offline
Big	
  Data	
  Beyond	
  the	
  Hype:	
  
Trends,	
  Pi6alls	
  and	
  Building	
  Competency	
  
May	
  2014	
  
1	
  5/29/14	
  
Agenda
•  Why	
  Big	
  Data	
  
•  Technology	
  Capabili<es	
  
•  Organiza<onal	
  Evolu<on	
  
•  Case	
  Study	
  
•  Conclusions	
  
5/29/14	
   2	
  
Big Data
5/29/14	
   3	
  
•  Data	
  sets	
  so	
  large	
  and	
  
complex	
  that	
  they	
  become	
  
awkward	
  to	
  work	
  with	
  using	
  
standard	
  tools	
  and	
  
techniques	
  
•  Google,	
  Quantcast,	
  Yahoo,	
  
LinkedIn…	
  customer	
  
innova<on	
  in	
  open	
  source	
  
Patterns from Delivering Big Data at Scale
eCommerce	
  	
  
2	
  of	
  Global	
  Top	
  5	
  
Retail	
  	
  
2	
  of	
  Global	
  Top	
  5	
  
Social	
  Networking	
  	
  
Global	
  #1	
  
Banking	
  	
  
4	
  of	
  Global	
  Top	
  10	
  
Credit	
  Issuer	
  	
  
2	
  of	
  Global	
  Top	
  5	
  
Financial	
  Data	
  Services	
  	
  
2	
  of	
  Global	
  Top	
  5	
  
Financial	
  Exchanges	
  	
  
Global	
  #2	
  
Brokerage	
  &	
  Mutual	
  Funds	
  	
  
2	
  of	
  Global	
  Top	
  5	
  
Asset	
  Management	
  	
  
Global	
  #1	
  
Semiconductor	
  	
  
2	
  of	
  Global	
  Top	
  5	
  
Data	
  Storage	
  Devices	
  	
  
3	
  of	
  Global	
  Top	
  5	
  
Disk	
  Drive	
  Manufacturing	
  	
  
Global	
  #1	
  
TelecommunicaJons	
  
2	
  of	
  Global	
  Top	
  5	
  
Media	
  &	
  AdverJsing	
  	
  
2	
  of	
  Global	
  Top	
  4	
  
Internet	
  TransacJon	
  Security	
  	
  
Global	
  #1	
  
5/29/14	
   4	
  
Manufacturing Analytics
1	
   Improve	
  Yield	
  -­‐	
  reduced	
  scrap,	
  faster	
  <me	
  to	
  market	
  
2	
   Ease	
  data	
  access	
  –	
  quick	
  search	
  access	
  to	
  all	
  relevant	
  data	
  for	
  all	
  history	
  in	
  one	
  
place,	
  not	
  samples,	
  summary	
  or	
  recent	
  data	
  	
  
3	
   Process	
  Efficiency	
  –	
  iden<fy	
  redundant	
  tests,	
  op<mize	
  throughput	
  
4	
   Component	
  Analysis	
  –	
  proac<ve	
  discovery	
  of	
  component	
  problems,	
  
compa<bility	
  detec<on	
  
5	
   Service	
  Revenue	
  GeneraJon	
  -­‐	
  Leverage	
  customer	
  support	
  to	
  generate	
  new	
  
revenues	
  by	
  increasing	
  customer	
  reten<on,	
  support	
  renewals,	
  and	
  upgrading	
  
customers	
  to	
  new	
  products	
  and	
  licenses.	
  
5	
  5/29/14	
  
Product Analytics
1	
   Issue	
  ResoluJon	
  -­‐	
  Faster	
  resolu<on	
  of	
  issues	
  through	
  deep	
  analysis	
  of	
  
integrated	
  device	
  log,	
  product,	
  and	
  configura<on	
  data	
  across	
  the	
  install	
  base.	
  
2	
   Support	
  Metrics	
  -­‐	
  Gain	
  new	
  insights	
  into	
  dimensions	
  impac<ng	
  service	
  metrics	
  
by	
  combining	
  product	
  use	
  technical	
  data	
  with	
  business	
  data.	
  E.g.,	
  number	
  of	
  
<ckets	
  resolved,	
  length	
  of	
  calls,	
  resource	
  alloca<on,	
  customer	
  sa<sfac<on.	
  	
  
3	
   ProacJve	
  Service	
  -­‐	
  Iden<fy	
  and	
  address	
  poten<al	
  issues	
  rather	
  than	
  wai<ng	
  to	
  
react	
  to	
  them	
  aXer	
  they	
  happen.	
  I.e.,	
  intelligence	
  for	
  condi<on-­‐based	
  
maintenance.	
  
4	
   Customer	
  Self-­‐Service	
  -­‐	
  Drive	
  higher	
  rates	
  of	
  customer	
  self-­‐service	
  by	
  providing	
  
customers	
  access	
  to	
  tools	
  and	
  resources	
  to	
  answer	
  their	
  own	
  ques<ons	
  and	
  
solve	
  their	
  own	
  issues	
  by	
  analyzing	
  past	
  customer	
  ac<vity	
  and	
  integra<ng	
  
technical	
  details	
  from	
  customer	
  systems	
  into	
  self-­‐service	
  portals.	
  
5	
   Service	
  Revenue	
  GeneraJon	
  -­‐	
  Leverage	
  customer	
  support	
  to	
  generate	
  new	
  
revenues	
  by	
  increasing	
  customer	
  reten<on,	
  support	
  renewals,	
  and	
  upgrading	
  
customers	
  to	
  new	
  products	
  and	
  licenses.	
  
6	
  5/29/14	
  
Clickstream Analytics
1	
   Timely	
  Processing	
  -­‐	
  Data	
  volumes	
  are	
  increasing	
  clogging	
  data	
  warehouse,	
  slow	
  
and	
  out	
  of	
  date.	
  
2	
   Cost	
  Efficiency	
  &	
  Scale	
  -­‐	
  Data	
  volumes	
  are	
  increasing	
  clogging	
  data	
  warehouse,	
  
slow	
  and	
  out	
  of	
  date.	
  
3	
   Flexible	
  Tagging	
  –	
  Analyze	
  site	
  without	
  having	
  to	
  push	
  data	
  to	
  tags,	
  classify	
  
ac<vity	
  based	
  on	
  sec<ons,	
  URLs	
  more	
  flexibly	
  than	
  tags.	
  
4	
   Converged	
  AnalyJcs	
  -­‐	
  Understand	
  and	
  improve	
  web,	
  mobile,	
  cross-­‐channel	
  
customer	
  ac<vity.	
  Siloed	
  apps	
  costly	
  &	
  inflexible	
  –	
  e.g.,	
  tags	
  require	
  feeding	
  
data	
  about	
  user	
  demographics,	
  segments.	
  Mash	
  up	
  interac<ons	
  with	
  data	
  like	
  
geographic,	
  demographic,	
  social,	
  call	
  center,	
  purchases.	
  
5	
   Deeper	
  ExploraJon	
  -­‐	
  Analyze	
  sequences	
  of	
  events	
  in	
  customer	
  journeys.	
  Deep	
  
drill	
  down	
  (how	
  do	
  users	
  with	
  the	
  latest	
  iPhone	
  and	
  iOS	
  compare	
  to	
  latest	
  
Samsung	
  and	
  Android).	
  Cohort	
  analysis	
  -­‐	
  those	
  who	
  buy	
  books	
  vs	
  spor<ng	
  
goods.	
  
6	
   PredicJve	
  Models	
  –	
  next	
  best	
  offer	
  recommenda<ons,	
  personaliza<on,	
  
customer	
  value,	
  microsegmenta<on,	
  ad	
  aeribu<on…	
  
7	
  5/29/14	
  
Agenda
•  Why	
  Big	
  Data	
  
•  Technology	
  CapabiliJes	
  
•  Organiza<onal	
  Evolu<on	
  
•  Case	
  Study	
  
•  Conclusions	
  
5/29/14	
   8	
  
Big Data Reference Architecture
5/29/14	
   9	
  
•  Community	
  Open	
  Source	
  	
  
•  Over	
  $1	
  Billion	
  invested	
  in	
  startups,	
  all	
  major	
  sw	
  companies	
  embrace	
  
•  3	
  replicas	
  for	
  HA,	
  50	
  PB+	
  scale	
  
Apache Hadoop
5/29/14	
   10	
  
Cluster
Slaves
Client Servers
✲ Hive, Pig, ...
✲ cron+bash, Azkaban, …
Sqoop, Scribe, …
Monitoring, Management
...
Secondary
Master Server
Secondary
Name Node
Primary
Master Server
✲ Job Tracker
Name Node
Slave Server
✲ Task Tracker
Data Node
DiskDiskDiskDiskDiskDiskDiskDisk
Slave Server
✲ Task Tracker
Data Node
DiskDiskDiskDiskDiskDiskDiskDisk
Slave Server
✲ Task Tracker
Data Node
DiskDiskDiskDiskDiskDiskDiskDisk
Copyright Think Big Analytics 2014. All rights reserved. 11
Ÿ  Support for many processing engines including emerging Predictive
Analytics libraries
Ÿ  Common cluster with local data access in HDFS
YARN - Yet Another Resource Negotiator
Copyright Think Big Analytics 2014. All rights reserved. 12
Ÿ  Focus on technologies for prescriptive analytics
-  Getting insights from your data
Ÿ  To get there you need
-  data ingestion (Flume, Kafka, Sqoop, ...)
-  data transformation (Pig, MapReduce, Hive, Cascading, …)
Ÿ  Beyond prescriptive, Machine Learned predictive analytics is
interesting
Narrowing In
Copyright Think Big Analytics 2014. All rights reserved. 13
Ÿ  Scala-based more general distributed processing with long-lived
memory caching
Ÿ  First released 2010, many contributors
Ÿ  Scala, Java, Python APIs
Ÿ  Largest production cluster to date is 80 nodes at Yahoo
Ÿ  YARN Support
Ÿ  Subprojects
-  Shark – Hive on Spark
-  MLib – machine learning
-  Spark Streaming
-  GraphX
Ÿ  Pros: faster iterative analysis, growing distribution support
Ÿ  Cons: less mature, Scala API foundation, smaller community
Apache Spark
Copyright Think Big Analytics 2014. All rights reserved. 14
Ÿ  DAG Processing of never ending streams of data
-  First released 2011
-  Used at Twitter plus dozens of other companies
-  Reliable - At Least Once semantics
-  Think MapReduce for data streams
-  Java / Clojure based
-  Bolts in Java and ‘Shell Bolts’
-  Not a queue, but usually reads from a queue.
-  YARN integration
Ÿ  Pros: community, HW support, low latency, high throughput
Ÿ  Cons: static topologies & cluster sizing, Nimbus – SPOF, state
management
Apache Storm
Platform Trends
•  Hadoop	
  will	
  eat	
  Big	
  Data	
  Analy<cs	
  
•  Hadoop	
  YARN	
  to	
  run	
  diverse	
  applica<ons	
  
•  Real<me	
  latency:	
  Storm,	
  Samza,	
  HBase,	
  Cassandra,	
  Elas<csearch…	
  
•  Management:	
  Ambari,	
  Pepperdata…	
  
•  Security:	
  Knox,	
  Sentry,	
  XA	
  Secure…	
  
•  Cloud:	
  AWS,	
  Qubole,	
  Altascale,	
  Info	
  Chimps…	
  
15	
  5/29/14	
  
Analytics Trends
•  Real<me	
  query	
  engines:	
  Hive/Tez,	
  Shark,	
  Impala,	
  Presto…	
  
•  Machine	
  Learning:	
  Spark,	
  MLib,	
  Mahout,	
  Orryx…	
  
•  Metadata:	
  HCatalog	
  
•  Analyst	
  Tools:	
  Tableau,	
  Alpine	
  Data	
  Labs,	
  Plaoora,	
  Zoomdata…	
  
•  Sensemaking:	
  Data	
  Tamer,	
  Trifacta,	
  Paxta…	
  
16	
  5/29/14	
  
Agenda
•  Why	
  Big	
  Data	
  
•  Technology	
  Capabili<es	
  
•  OrganizaJonal	
  EvoluJon	
  
•  Case	
  Study	
  
•  Conclusions	
  
5/29/14	
   17	
  
How To Get the Right Answers
Data
Weak intersection between Data, Systems
and Business often means project failure
Current State Improved State
Collaborative integration of Analytics with
Big Data capabilities for success.
Data
Weak Area with
diminished influence
Cross-functional
collaboration
Data-driven
business alignment
5/29/14	
   18	
  
Analytics and Data Science
Analy<cs	
  is	
  the	
  umbrella	
  term	
  for	
  discovery	
  and	
  communica<on	
  of	
  signal	
  in	
  data.	
  
•  Intelligence	
  and	
  ReporEng:	
  Governed	
  insights	
  for	
  general	
  audiences	
  by	
  providing	
  
summaries	
  and	
  visualiza<ons	
  over	
  well	
  understood	
  data	
  
•  DescripEve	
  AnalyEcs:	
  Interpretable	
  data	
  stories	
  to	
  domain	
  consumers	
  (e.g.	
  business,	
  
sales,	
  etc.)	
  using	
  exploratory	
  data	
  analysis	
  and	
  descrip<ve	
  models	
  (e.g.	
  clustering)	
  over	
  
well	
  understood	
  data	
  
•  PredicEve	
  AnalyEcs:	
  Predicts	
  future	
  state	
  or	
  assesses	
  impact	
  of	
  changing	
  parameters	
  for	
  
domain	
  consumers	
  (e.g.	
  business,	
  sales,	
  etc.)	
  using	
  sta<s<cal	
  modeling	
  over	
  well	
  
understood	
  data	
  
•  Data	
  Science:	
  Uncovers	
  new	
  or	
  missing	
  signals	
  and	
  rela<onships	
  in	
  data	
  to	
  be	
  leveraged	
  
by	
  other	
  analy<cs.	
  	
  Supported	
  with	
  a	
  mix	
  of	
  exploratory	
  data	
  analysis	
  and	
  sta<s<cal	
  
modeling	
  over	
  unexplored	
  data	
  or	
  new	
  combina<ons	
  of	
  known	
  data.	
  	
  
New	
  capability	
  driven	
  by	
  Big	
  Data,	
  currently	
  overhyped	
  and	
  overused.	
  
End-­‐goal:	
  effecEve	
  and	
  repeatable	
  integraEon	
  of	
  analyEcs	
  into	
  business	
  processes.	
  
5/29/14	
   19	
  
Basic	
  
Repor<ng	
  
Big	
  Data	
  
Repor<ng	
  
Data	
  
Science	
  
Hub	
  
Business	
  
Analy<cs	
  
Data	
  Driven	
  
Organiza<on	
  
Analy<c	
  
Integra<on	
  
High	
  
Low	
  
Big	
  Data	
  
Capability	
  
Limited	
   Full	
  
Analytics Capability Model
5/29/14	
   20	
  
Data	
  
Science	
  
Hub	
  
Data	
  Driven	
  
Organiza<on	
  
Analy<c	
  
Integra<on	
  
High	
  
Low	
  
Big	
  Data	
  
Capability	
  
Limited	
   Full	
  
•  Redundant	
  infrastructure	
  
•  Data	
  locked	
  down	
  w/in	
  product	
  teams	
  
•  Analy<c	
  innova<ons	
  not	
  shared	
  
•  Share	
  data	
  internally	
  
•  Support	
  joining	
  of	
  data	
  
•  Make	
  analy<cs	
  capabili<es	
  
accessible	
  to	
  all	
  business	
  lines.	
  	
  
Client Example – Internet Infrastructure Firm
5/29/14	
   21	
  
Agenda
•  Why	
  Big	
  Data	
  
•  Technology	
  Capabili<es	
  
•  Organiza<onal	
  Evolu<on	
  
•  Case	
  Study	
  
•  Conclusions	
  
5/29/14	
   22	
  
•  “Reduce	
  Data	
  Search	
  Par<es”	
  
•  Stop	
  playing	
  “Where’s	
  Waldo	
  with	
  your	
  Data”	
  
•  	
  	
  “	
  I	
  know	
  I	
  have	
  that	
  data…..	
  somewhere?”	
  
•  Data	
  Aggrega<on	
  to	
  a	
  Common	
  Plaoorm	
  with	
  
common	
  access	
  tools	
  	
  
	
  
•  Improve	
  Yields	
  by	
  Accessing	
  More	
  Data	
  in	
  a	
  More	
  
Timely	
  Manner	
  	
  
•  End	
  to	
  end	
  visibility	
  for	
  every	
  test,	
  every	
  diagnos<c	
  and	
  all	
  
info	
  from	
  all	
  components	
  of	
  a	
  product	
  (internal	
  and	
  external)	
  
•  Speed	
  up	
  yield	
  improvement	
  ramp	
  up	
  on	
  new	
  products	
  
•  Improve	
  steady	
  state	
  yield	
  on	
  exis<ng	
  products.	
  
Use Cases
23	
  5/29/14	
  
Big Data Platform
Devices
Components
Assemblies
…
Field Data
Supplier
All raw parametric,
logistic, vintage, data
ERP/DW’s
Data Sources
End-­‐to-­‐End	
  
Integrated	
  
Data	
  
Consumers	
  
Ad	
  hoc	
  Analysis	
  
BI	
  Tools	
  
Op<mize/Reduce	
  
Tes<ng	
  
Data	
  Marts	
  
Batch	
  AnalyJcs	
  
Parallelized
batch analytics
App-Specific
Views
New High-Value
Parameters
raw
extracts
Enriched
data
Proac<ve	
  Issue	
  
Iden<fica<on	
  
Failure	
  Screen	
  
Tests	
  
Customer	
  FA	
  via	
  
Field	
  Data	
  
Predic<ve	
  Analy<cs	
  	
  
Tools	
  
…other ..
5/29/14	
   24	
  
Highlights
•  Enterprise-­‐wide	
  data	
  access	
  for	
  <mely	
  analy<cs	
  and	
  insights	
  
•  Founda<on	
  for	
  large	
  scale	
  proac<ve	
  analy<cs	
  
Legacy	
   BDP	
  
Reten<on	
   3-­‐6	
  months	
  scaeered	
  DBs	
  
then	
  tape	
  archive	
  
All	
  data	
  online	
  in	
  common	
  data	
  
lake	
  for	
  3+	
  years	
  
Coverage	
   Summaries,	
  samples	
  for	
  several	
  
data	
  sets	
  
All	
  parametric	
  data	
  captured	
  in	
  
raw	
  and	
  integrated	
  form	
  
Analysis	
   Reac<ve	
  resul<ng	
  in	
  missed	
  
improvement	
  opportuni<es	
  
In	
  daily	
  opera<ons	
  and	
  on	
  
larger	
  data	
  sets	
  to	
  support	
  
proac<ve	
  improvements	
  
25	
  5/29/14	
  
•  Metrics:	
  
•  Collec<ng	
  >2M	
  manufacturing/tes<ng	
  binary	
  files	
  daily	
  
•  Collec<ng	
  from	
  ~500	
  tables	
  across	
  6	
  databases	
  à	
  tens	
  of	
  millions	
  of	
  records	
  daily	
  
•  Over	
  140	
  users	
  to	
  date	
  in	
  early	
  pilo<ng	
  
•  Over	
  150	
  aeendees	
  par<cipated	
  in	
  BDP	
  training	
  
•  Highlights:	
  although	
  early	
  in	
  the	
  overall	
  journey,	
  the	
  BDP	
  is	
  already	
  demonstra<ng	
  early	
  benefits:	
  
Key Metrics & Highlights
Development Engineer: demonstrated the joining of data sets for detailed
logistics tracking—analyses that is very difficult to conduct with current
systems
Ops Engineer: a recent production issue required detailed historical data. Current systems
did not have the required retention for this data. However, the team was able to pull the data
from the BDP in minutes, as opposed to 3+ weeks to pull the data from tape archive
Development Engineer: obtained technical data from the BDP in
hours as opposed to 3+ weeks to pull from tape archive
DATA SEARCH
PARTIES
YIELD
5/29/14	
  
Governance – Project Prioritization
New
Data
Sets
Use
Cases
from
Strategy
New
Apps /
Reports
New
Analytics
Working
Steering
Committee
Executive
Steering
Committee
1.  New Data Set 1
2.  New Use Case 1
3.  Use Case from
Strategy
Prioritized List
Business
Experts
Business impact vs. ease
of delivery, etc.
Nominated by…
Prioritized by…
Core
Platform
Updates
Sponsored by…
27	
  5/29/14	
  
Release Cycles
July	
   Aug	
   Sept	
   Oct	
   Nov	
   Dec	
   Oct	
  Jan	
   Sept	
  Feb	
  
Today
Mar	
   Apr	
   May	
  
Big	
  Data	
  Plaoorm	
  Roadmap	
  
Jun	
   July	
   Aug	
  
Release	
  1b:	
  all	
  
ac<ve	
  
products	
  
Release	
  1a:	
  base	
  
plaoorm,	
  first	
  
product	
  
Release	
  2:	
  
Field/RMA	
  
Data	
  
Quality Development Operations Core/Other
Release	
  3	
   Release	
  4	
   Release	
  5	
  
Reduce	
  Scrap	
  
New	
  Product	
  Data	
  Access	
  
New	
  Data	
  Sets	
  
Ongoing	
  
Support	
  
Faster	
  Response	
  to	
  Field	
  
Requests	
  
Correlated	
  
Analysis	
  
Improved	
  Failure	
  Analyses	
  
Delivering new
capabilities
to the business every
45 – 90 days
28	
  5/29/14	
  
Agenda
•  Why	
  Big	
  Data	
  
•  Technology	
  CapabiliJes	
  
•  Organiza<onal	
  Evolu<on	
  
•  Case	
  Study	
  
•  Conclusions	
  
5/29/14	
   29	
  
Stages of Big Data Adoption
Contain
Costs &
Scale
Big Data
Reporting
Agile Analytics
& Insight
Innovation
360o view
Speed
Accuracy
Data Science
Hub
Business
Innovation
Optimization
Engagement
Data Science
Automation
Organizational
Transformation
New products
Customer analytics
Operational efficiency
New businesses
Data Driven
Organization
Sustained competitive advantage through Big Analytics
Traditional
B.I.
5/29/14	
   30	
  
Pitfalls
•  Data	
  Poli<cs	
  
•  Immature	
  Governance	
  
•  Siloed	
  Organiza<on	
  
•  Siloed	
  Applica<ons	
  
•  Buy-­‐in	
  
•  Leaping	
  over	
  phases	
  /	
  Bypassing	
  low	
  hanging	
  fruit	
  
•  Disrup<ve	
  Change	
  
•  Insufficient	
  funding	
  
•  Big	
  Bang	
  Rollout	
  
•  Skills	
  Gap	
  
•  Tac<cal	
  use	
  cases/the	
  big	
  data	
  plateau	
  
•  Business	
  as	
  usual	
  
5/29/14	
   31	
  
Conclusions
•  Big	
  Data	
  is	
  affec<ng	
  all	
  companies	
  especially	
  in	
  tech	
  
•  Start	
  with	
  low	
  hanging	
  fruit	
  
•  Establish	
  a	
  roadmap	
  for	
  incremental	
  organiza<onal	
  evolu<on	
  and	
  
technical	
  capabili<es	
  
5/29/14	
   32	
  

More Related Content

What's hot

HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016Andrey Karpov
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationDATAVERSITY
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergencekvnnrao
 
Using hadoop to expand data warehousing
Using hadoop to expand data warehousingUsing hadoop to expand data warehousing
Using hadoop to expand data warehousingDataWorks Summit
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHortonworks
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HPMITEF México
 
Big Data Week 2016 - Worldpay - Deploying Secure Clusters
Big Data Week 2016  - Worldpay - Deploying Secure ClustersBig Data Week 2016  - Worldpay - Deploying Secure Clusters
Big Data Week 2016 - Worldpay - Deploying Secure ClustersDavid Walker
 
DAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swampDAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swampNVISIA
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksHortonworks
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
Perspectives on Ethical Big Data Governance
Perspectives on Ethical Big Data GovernancePerspectives on Ethical Big Data Governance
Perspectives on Ethical Big Data GovernanceCloudera, Inc.
 
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...emermell
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessInside Analysis
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035Neelam Rawat
 
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondThe Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondInside Analysis
 
IRJET- Survey of Big Data with Hadoop
IRJET-  	  Survey of Big Data with HadoopIRJET-  	  Survey of Big Data with Hadoop
IRJET- Survey of Big Data with HadoopIRJET Journal
 
Unlock Big Data's Potential in Financial Services with Hortonworks
Unlock Big Data's Potential in Financial Services with Hortonworks Unlock Big Data's Potential in Financial Services with Hortonworks
Unlock Big Data's Potential in Financial Services with Hortonworks Pactera_US
 

What's hot (20)

HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016HPE IDOL Technical Overview - july 2016
HPE IDOL Technical Overview - july 2016
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL Migration
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergence
 
Using hadoop to expand data warehousing
Using hadoop to expand data warehousingUsing hadoop to expand data warehousing
Using hadoop to expand data warehousing
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
 
4. Big data & analytics HP
4. Big data & analytics HP4. Big data & analytics HP
4. Big data & analytics HP
 
Big Data Week 2016 - Worldpay - Deploying Secure Clusters
Big Data Week 2016  - Worldpay - Deploying Secure ClustersBig Data Week 2016  - Worldpay - Deploying Secure Clusters
Big Data Week 2016 - Worldpay - Deploying Secure Clusters
 
DAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swampDAMA Chicago - Ensuring your data lake doesn’t become a data swamp
DAMA Chicago - Ensuring your data lake doesn’t become a data swamp
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data Processing
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
 
IDOL presentation
IDOL presentationIDOL presentation
IDOL presentation
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Perspectives on Ethical Big Data Governance
Perspectives on Ethical Big Data GovernancePerspectives on Ethical Big Data Governance
Perspectives on Ethical Big Data Governance
 
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
Making ‘Big Data’ Your Ally – Using data analytics to improve compliance, due...
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035
 
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondThe Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
 
IRJET- Survey of Big Data with Hadoop
IRJET-  	  Survey of Big Data with HadoopIRJET-  	  Survey of Big Data with Hadoop
IRJET- Survey of Big Data with Hadoop
 
Unlock Big Data's Potential in Financial Services with Hortonworks
Unlock Big Data's Potential in Financial Services with Hortonworks Unlock Big Data's Potential in Financial Services with Hortonworks
Unlock Big Data's Potential in Financial Services with Hortonworks
 

Viewers also liked

Wind & Waves
Wind & WavesWind & Waves
Wind & WavesHoma2000
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSbigdatagurus_meetup
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with StormMariusz Gil
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 

Viewers also liked (8)

Wind & Waves
Wind & WavesWind & Waves
Wind & Waves
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFS
 
Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with Storm
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 

Similar to Big data beyond the hype may 2014

Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...PwC
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Hadoop is Happening
Hadoop is HappeningHadoop is Happening
Hadoop is HappeningPrecisely
 
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTBig Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTKiththi Perera
 
Big data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardBig data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardKiththi Perera
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataIMC Institute
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksMapR Technologies
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data LakesKiran Kamreddy
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersRuhollah Farchtchi
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewSivashankar Ganapathy
 
Oracle Modern Information Management Platform - v1.0
Oracle Modern Information Management Platform - v1.0Oracle Modern Information Management Platform - v1.0
Oracle Modern Information Management Platform - v1.0Bratamay Majumder
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)Denodo
 

Similar to Big data beyond the hype may 2014 (20)

Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Hadoop is Happening
Hadoop is HappeningHadoop is Happening
Hadoop is Happening
 
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTBig Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
 
Big data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardBig data solutions on cloud – the way forward
Big data solutions on cloud – the way forward
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate RisksLearn How Financial Services Organizations Can Use Big Data to Mitigate Risks
Learn How Financial Services Organizations Can Use Big Data to Mitigate Risks
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Oracle Modern Information Management Platform - v1.0
Oracle Modern Information Management Platform - v1.0Oracle Modern Information Management Platform - v1.0
Oracle Modern Information Management Platform - v1.0
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
Data Virtualization enabled Data Fabric: Operationalize the Data Lake (APAC)
 

More from bigdatagurus_meetup

More from bigdatagurus_meetup (10)

Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Hypertable - massively scalable nosql database
Hypertable - massively scalable nosql databaseHypertable - massively scalable nosql database
Hypertable - massively scalable nosql database
 
What enterprises can learn from Real Time Bidding (RTB)
What enterprises can learn from Real Time Bidding (RTB)What enterprises can learn from Real Time Bidding (RTB)
What enterprises can learn from Real Time Bidding (RTB)
 
Scaling HBase at Pinterest
Scaling HBase at PinterestScaling HBase at Pinterest
Scaling HBase at Pinterest
 
Continuuity Weave
Continuuity WeaveContinuuity Weave
Continuuity Weave
 
Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Cloudera Developer Kit (CDK)
Cloudera Developer Kit (CDK)Cloudera Developer Kit (CDK)
Cloudera Developer Kit (CDK)
 
Lipstick On Pig
Lipstick On Pig Lipstick On Pig
Lipstick On Pig
 

Recently uploaded

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 

Recently uploaded (20)

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 

Big data beyond the hype may 2014

  • 1. Big  Data  Beyond  the  Hype:   Trends,  Pi6alls  and  Building  Competency   May  2014   1  5/29/14  
  • 2. Agenda •  Why  Big  Data   •  Technology  Capabili<es   •  Organiza<onal  Evolu<on   •  Case  Study   •  Conclusions   5/29/14   2  
  • 3. Big Data 5/29/14   3   •  Data  sets  so  large  and   complex  that  they  become   awkward  to  work  with  using   standard  tools  and   techniques   •  Google,  Quantcast,  Yahoo,   LinkedIn…  customer   innova<on  in  open  source  
  • 4. Patterns from Delivering Big Data at Scale eCommerce     2  of  Global  Top  5   Retail     2  of  Global  Top  5   Social  Networking     Global  #1   Banking     4  of  Global  Top  10   Credit  Issuer     2  of  Global  Top  5   Financial  Data  Services     2  of  Global  Top  5   Financial  Exchanges     Global  #2   Brokerage  &  Mutual  Funds     2  of  Global  Top  5   Asset  Management     Global  #1   Semiconductor     2  of  Global  Top  5   Data  Storage  Devices     3  of  Global  Top  5   Disk  Drive  Manufacturing     Global  #1   TelecommunicaJons   2  of  Global  Top  5   Media  &  AdverJsing     2  of  Global  Top  4   Internet  TransacJon  Security     Global  #1   5/29/14   4  
  • 5. Manufacturing Analytics 1   Improve  Yield  -­‐  reduced  scrap,  faster  <me  to  market   2   Ease  data  access  –  quick  search  access  to  all  relevant  data  for  all  history  in  one   place,  not  samples,  summary  or  recent  data     3   Process  Efficiency  –  iden<fy  redundant  tests,  op<mize  throughput   4   Component  Analysis  –  proac<ve  discovery  of  component  problems,   compa<bility  detec<on   5   Service  Revenue  GeneraJon  -­‐  Leverage  customer  support  to  generate  new   revenues  by  increasing  customer  reten<on,  support  renewals,  and  upgrading   customers  to  new  products  and  licenses.   5  5/29/14  
  • 6. Product Analytics 1   Issue  ResoluJon  -­‐  Faster  resolu<on  of  issues  through  deep  analysis  of   integrated  device  log,  product,  and  configura<on  data  across  the  install  base.   2   Support  Metrics  -­‐  Gain  new  insights  into  dimensions  impac<ng  service  metrics   by  combining  product  use  technical  data  with  business  data.  E.g.,  number  of   <ckets  resolved,  length  of  calls,  resource  alloca<on,  customer  sa<sfac<on.     3   ProacJve  Service  -­‐  Iden<fy  and  address  poten<al  issues  rather  than  wai<ng  to   react  to  them  aXer  they  happen.  I.e.,  intelligence  for  condi<on-­‐based   maintenance.   4   Customer  Self-­‐Service  -­‐  Drive  higher  rates  of  customer  self-­‐service  by  providing   customers  access  to  tools  and  resources  to  answer  their  own  ques<ons  and   solve  their  own  issues  by  analyzing  past  customer  ac<vity  and  integra<ng   technical  details  from  customer  systems  into  self-­‐service  portals.   5   Service  Revenue  GeneraJon  -­‐  Leverage  customer  support  to  generate  new   revenues  by  increasing  customer  reten<on,  support  renewals,  and  upgrading   customers  to  new  products  and  licenses.   6  5/29/14  
  • 7. Clickstream Analytics 1   Timely  Processing  -­‐  Data  volumes  are  increasing  clogging  data  warehouse,  slow   and  out  of  date.   2   Cost  Efficiency  &  Scale  -­‐  Data  volumes  are  increasing  clogging  data  warehouse,   slow  and  out  of  date.   3   Flexible  Tagging  –  Analyze  site  without  having  to  push  data  to  tags,  classify   ac<vity  based  on  sec<ons,  URLs  more  flexibly  than  tags.   4   Converged  AnalyJcs  -­‐  Understand  and  improve  web,  mobile,  cross-­‐channel   customer  ac<vity.  Siloed  apps  costly  &  inflexible  –  e.g.,  tags  require  feeding   data  about  user  demographics,  segments.  Mash  up  interac<ons  with  data  like   geographic,  demographic,  social,  call  center,  purchases.   5   Deeper  ExploraJon  -­‐  Analyze  sequences  of  events  in  customer  journeys.  Deep   drill  down  (how  do  users  with  the  latest  iPhone  and  iOS  compare  to  latest   Samsung  and  Android).  Cohort  analysis  -­‐  those  who  buy  books  vs  spor<ng   goods.   6   PredicJve  Models  –  next  best  offer  recommenda<ons,  personaliza<on,   customer  value,  microsegmenta<on,  ad  aeribu<on…   7  5/29/14  
  • 8. Agenda •  Why  Big  Data   •  Technology  CapabiliJes   •  Organiza<onal  Evolu<on   •  Case  Study   •  Conclusions   5/29/14   8  
  • 9. Big Data Reference Architecture 5/29/14   9  
  • 10. •  Community  Open  Source     •  Over  $1  Billion  invested  in  startups,  all  major  sw  companies  embrace   •  3  replicas  for  HA,  50  PB+  scale   Apache Hadoop 5/29/14   10   Cluster Slaves Client Servers ✲ Hive, Pig, ... ✲ cron+bash, Azkaban, … Sqoop, Scribe, … Monitoring, Management ... Secondary Master Server Secondary Name Node Primary Master Server ✲ Job Tracker Name Node Slave Server ✲ Task Tracker Data Node DiskDiskDiskDiskDiskDiskDiskDisk Slave Server ✲ Task Tracker Data Node DiskDiskDiskDiskDiskDiskDiskDisk Slave Server ✲ Task Tracker Data Node DiskDiskDiskDiskDiskDiskDiskDisk
  • 11. Copyright Think Big Analytics 2014. All rights reserved. 11 Ÿ  Support for many processing engines including emerging Predictive Analytics libraries Ÿ  Common cluster with local data access in HDFS YARN - Yet Another Resource Negotiator
  • 12. Copyright Think Big Analytics 2014. All rights reserved. 12 Ÿ  Focus on technologies for prescriptive analytics -  Getting insights from your data Ÿ  To get there you need -  data ingestion (Flume, Kafka, Sqoop, ...) -  data transformation (Pig, MapReduce, Hive, Cascading, …) Ÿ  Beyond prescriptive, Machine Learned predictive analytics is interesting Narrowing In
  • 13. Copyright Think Big Analytics 2014. All rights reserved. 13 Ÿ  Scala-based more general distributed processing with long-lived memory caching Ÿ  First released 2010, many contributors Ÿ  Scala, Java, Python APIs Ÿ  Largest production cluster to date is 80 nodes at Yahoo Ÿ  YARN Support Ÿ  Subprojects -  Shark – Hive on Spark -  MLib – machine learning -  Spark Streaming -  GraphX Ÿ  Pros: faster iterative analysis, growing distribution support Ÿ  Cons: less mature, Scala API foundation, smaller community Apache Spark
  • 14. Copyright Think Big Analytics 2014. All rights reserved. 14 Ÿ  DAG Processing of never ending streams of data -  First released 2011 -  Used at Twitter plus dozens of other companies -  Reliable - At Least Once semantics -  Think MapReduce for data streams -  Java / Clojure based -  Bolts in Java and ‘Shell Bolts’ -  Not a queue, but usually reads from a queue. -  YARN integration Ÿ  Pros: community, HW support, low latency, high throughput Ÿ  Cons: static topologies & cluster sizing, Nimbus – SPOF, state management Apache Storm
  • 15. Platform Trends •  Hadoop  will  eat  Big  Data  Analy<cs   •  Hadoop  YARN  to  run  diverse  applica<ons   •  Real<me  latency:  Storm,  Samza,  HBase,  Cassandra,  Elas<csearch…   •  Management:  Ambari,  Pepperdata…   •  Security:  Knox,  Sentry,  XA  Secure…   •  Cloud:  AWS,  Qubole,  Altascale,  Info  Chimps…   15  5/29/14  
  • 16. Analytics Trends •  Real<me  query  engines:  Hive/Tez,  Shark,  Impala,  Presto…   •  Machine  Learning:  Spark,  MLib,  Mahout,  Orryx…   •  Metadata:  HCatalog   •  Analyst  Tools:  Tableau,  Alpine  Data  Labs,  Plaoora,  Zoomdata…   •  Sensemaking:  Data  Tamer,  Trifacta,  Paxta…   16  5/29/14  
  • 17. Agenda •  Why  Big  Data   •  Technology  Capabili<es   •  OrganizaJonal  EvoluJon   •  Case  Study   •  Conclusions   5/29/14   17  
  • 18. How To Get the Right Answers Data Weak intersection between Data, Systems and Business often means project failure Current State Improved State Collaborative integration of Analytics with Big Data capabilities for success. Data Weak Area with diminished influence Cross-functional collaboration Data-driven business alignment 5/29/14   18  
  • 19. Analytics and Data Science Analy<cs  is  the  umbrella  term  for  discovery  and  communica<on  of  signal  in  data.   •  Intelligence  and  ReporEng:  Governed  insights  for  general  audiences  by  providing   summaries  and  visualiza<ons  over  well  understood  data   •  DescripEve  AnalyEcs:  Interpretable  data  stories  to  domain  consumers  (e.g.  business,   sales,  etc.)  using  exploratory  data  analysis  and  descrip<ve  models  (e.g.  clustering)  over   well  understood  data   •  PredicEve  AnalyEcs:  Predicts  future  state  or  assesses  impact  of  changing  parameters  for   domain  consumers  (e.g.  business,  sales,  etc.)  using  sta<s<cal  modeling  over  well   understood  data   •  Data  Science:  Uncovers  new  or  missing  signals  and  rela<onships  in  data  to  be  leveraged   by  other  analy<cs.    Supported  with  a  mix  of  exploratory  data  analysis  and  sta<s<cal   modeling  over  unexplored  data  or  new  combina<ons  of  known  data.     New  capability  driven  by  Big  Data,  currently  overhyped  and  overused.   End-­‐goal:  effecEve  and  repeatable  integraEon  of  analyEcs  into  business  processes.   5/29/14   19  
  • 20. Basic   Repor<ng   Big  Data   Repor<ng   Data   Science   Hub   Business   Analy<cs   Data  Driven   Organiza<on   Analy<c   Integra<on   High   Low   Big  Data   Capability   Limited   Full   Analytics Capability Model 5/29/14   20  
  • 21. Data   Science   Hub   Data  Driven   Organiza<on   Analy<c   Integra<on   High   Low   Big  Data   Capability   Limited   Full   •  Redundant  infrastructure   •  Data  locked  down  w/in  product  teams   •  Analy<c  innova<ons  not  shared   •  Share  data  internally   •  Support  joining  of  data   •  Make  analy<cs  capabili<es   accessible  to  all  business  lines.     Client Example – Internet Infrastructure Firm 5/29/14   21  
  • 22. Agenda •  Why  Big  Data   •  Technology  Capabili<es   •  Organiza<onal  Evolu<on   •  Case  Study   •  Conclusions   5/29/14   22  
  • 23. •  “Reduce  Data  Search  Par<es”   •  Stop  playing  “Where’s  Waldo  with  your  Data”   •     “  I  know  I  have  that  data…..  somewhere?”   •  Data  Aggrega<on  to  a  Common  Plaoorm  with   common  access  tools       •  Improve  Yields  by  Accessing  More  Data  in  a  More   Timely  Manner     •  End  to  end  visibility  for  every  test,  every  diagnos<c  and  all   info  from  all  components  of  a  product  (internal  and  external)   •  Speed  up  yield  improvement  ramp  up  on  new  products   •  Improve  steady  state  yield  on  exis<ng  products.   Use Cases 23  5/29/14  
  • 24. Big Data Platform Devices Components Assemblies … Field Data Supplier All raw parametric, logistic, vintage, data ERP/DW’s Data Sources End-­‐to-­‐End   Integrated   Data   Consumers   Ad  hoc  Analysis   BI  Tools   Op<mize/Reduce   Tes<ng   Data  Marts   Batch  AnalyJcs   Parallelized batch analytics App-Specific Views New High-Value Parameters raw extracts Enriched data Proac<ve  Issue   Iden<fica<on   Failure  Screen   Tests   Customer  FA  via   Field  Data   Predic<ve  Analy<cs     Tools   …other .. 5/29/14   24  
  • 25. Highlights •  Enterprise-­‐wide  data  access  for  <mely  analy<cs  and  insights   •  Founda<on  for  large  scale  proac<ve  analy<cs   Legacy   BDP   Reten<on   3-­‐6  months  scaeered  DBs   then  tape  archive   All  data  online  in  common  data   lake  for  3+  years   Coverage   Summaries,  samples  for  several   data  sets   All  parametric  data  captured  in   raw  and  integrated  form   Analysis   Reac<ve  resul<ng  in  missed   improvement  opportuni<es   In  daily  opera<ons  and  on   larger  data  sets  to  support   proac<ve  improvements   25  5/29/14  
  • 26. •  Metrics:   •  Collec<ng  >2M  manufacturing/tes<ng  binary  files  daily   •  Collec<ng  from  ~500  tables  across  6  databases  à  tens  of  millions  of  records  daily   •  Over  140  users  to  date  in  early  pilo<ng   •  Over  150  aeendees  par<cipated  in  BDP  training   •  Highlights:  although  early  in  the  overall  journey,  the  BDP  is  already  demonstra<ng  early  benefits:   Key Metrics & Highlights Development Engineer: demonstrated the joining of data sets for detailed logistics tracking—analyses that is very difficult to conduct with current systems Ops Engineer: a recent production issue required detailed historical data. Current systems did not have the required retention for this data. However, the team was able to pull the data from the BDP in minutes, as opposed to 3+ weeks to pull the data from tape archive Development Engineer: obtained technical data from the BDP in hours as opposed to 3+ weeks to pull from tape archive DATA SEARCH PARTIES YIELD 5/29/14  
  • 27. Governance – Project Prioritization New Data Sets Use Cases from Strategy New Apps / Reports New Analytics Working Steering Committee Executive Steering Committee 1.  New Data Set 1 2.  New Use Case 1 3.  Use Case from Strategy Prioritized List Business Experts Business impact vs. ease of delivery, etc. Nominated by… Prioritized by… Core Platform Updates Sponsored by… 27  5/29/14  
  • 28. Release Cycles July   Aug   Sept   Oct   Nov   Dec   Oct  Jan   Sept  Feb   Today Mar   Apr   May   Big  Data  Plaoorm  Roadmap   Jun   July   Aug   Release  1b:  all   ac<ve   products   Release  1a:  base   plaoorm,  first   product   Release  2:   Field/RMA   Data   Quality Development Operations Core/Other Release  3   Release  4   Release  5   Reduce  Scrap   New  Product  Data  Access   New  Data  Sets   Ongoing   Support   Faster  Response  to  Field   Requests   Correlated   Analysis   Improved  Failure  Analyses   Delivering new capabilities to the business every 45 – 90 days 28  5/29/14  
  • 29. Agenda •  Why  Big  Data   •  Technology  CapabiliJes   •  Organiza<onal  Evolu<on   •  Case  Study   •  Conclusions   5/29/14   29  
  • 30. Stages of Big Data Adoption Contain Costs & Scale Big Data Reporting Agile Analytics & Insight Innovation 360o view Speed Accuracy Data Science Hub Business Innovation Optimization Engagement Data Science Automation Organizational Transformation New products Customer analytics Operational efficiency New businesses Data Driven Organization Sustained competitive advantage through Big Analytics Traditional B.I. 5/29/14   30  
  • 31. Pitfalls •  Data  Poli<cs   •  Immature  Governance   •  Siloed  Organiza<on   •  Siloed  Applica<ons   •  Buy-­‐in   •  Leaping  over  phases  /  Bypassing  low  hanging  fruit   •  Disrup<ve  Change   •  Insufficient  funding   •  Big  Bang  Rollout   •  Skills  Gap   •  Tac<cal  use  cases/the  big  data  plateau   •  Business  as  usual   5/29/14   31  
  • 32. Conclusions •  Big  Data  is  affec<ng  all  companies  especially  in  tech   •  Start  with  low  hanging  fruit   •  Establish  a  roadmap  for  incremental  organiza<onal  evolu<on  and   technical  capabili<es   5/29/14   32