Transforming Data Architecture Complexity at Sears
Justin Sheppard
Sears Holdings Corporation
  
Where Did We Start?
•  Not meeting production schedules
•  Multiple copies of data, no single point of truth
•  ETL complexity, cost of software and cost to manage
•  Time to set up ETL data sources for projects
•  Latency in data (up to weeks in some cases)
•  Enterprise Data Warehouses unable to handle load
•  Mainframe workload over-consuming capacity
•  IT budgets not growing – BUT data volumes escalating
  
What Is Hadoop?

Hadoop Distributed File System (HDFS)
File Sharing & Data Protection Across Physical Servers

MapReduce
Fault Tolerant Distributed Computing Across Physical Servers

Flexibility
o  A single repository for storing, processing & analyzing any type of data (structured and complex)
o  Not bound by a single schema

Scalability
o  Scale-out architecture divides workloads across multiple nodes
o  Flexible file system eliminates ETL bottlenecks

Low Cost
o  Can be deployed on commodity hardware
o  Open source platform guards against vendor lock-in

Hadoop is a platform for data storage and processing that is…
o  Scalable
o  Fault tolerant
o  Open source
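
To make the MapReduce model named on this slide concrete, here is a minimal single-process sketch in Python of its three phases (map, shuffle, reduce). It is an illustration only and does not use the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job`, and the sample item/units records, are assumptions introduced for this example. On a real cluster each phase would run in parallel across nodes, with the input held in HDFS.

```python
from collections import defaultdict

# Hypothetical input: (item_id, units_sold) records, standing in for files in HDFS.
RECORDS = [("A100", 3), ("B200", 1), ("A100", 2), ("C300", 5), ("B200", 4)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for item_id, units in records:
        yield item_id, units


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records):
    """Chain the three phases in the order a MapReduce job applies them."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each key is reduced independently of the others, the reduce work can be split across machines, which is what lets the model scale out.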
  
Hadoop

IS
•  Store vast amounts of data
•  Run queries on huge data sets
•  Ask questions previously impossible
•  Archive data but still analyze it
•  Capture data streams at incredible speeds
•  Massively reduce data latency
•  Transform your thinking about ETL

Is Not
•  High-speed SQL database
•  Simple
•  Easily connected to legacy systems
•  A replacement for your current data warehouse
•  Going to be built or operated by your DBAs
•  Going to make any sense to your data architects
•  Going to be possible if you do not have Linux skills
  
Use The Right Tool For The Right Job

Databases: when to use?
•  Transactional, High Speed Analytics
•  Interactive Reporting (<1sec)
•  Multi-step Transactions
•  Numerous Inserts/Updates/Deletes

Hadoop: when to use?
•  Affordable Storage/Compute
•  High-performance queries on large data
•  Complex data
•  Resilient Auto Scalability

Can be combined
[Diagram: Hadoop and Database]

Data Hub
•  Underlying premise as Hadoop adoption continues – source data once, use many.
•  Over time, as more and more data is sourced, development times will fall, since each project needs significantly less data sourcing than is typical.
  
Some Examples
Use-cases at Sears Holdings
  
The First Usage in Production

Use Case
•  Interactive presentation layer was required to present item/price/sales data in a highly flexible user interface with rapid response time
•  Needed to deliver solution within a very short period of time.
•  Legacy architecture would have required a MicroStrategy solution utilizing 1,000s of cubes on many expensive servers

Approach
•  Rapid development project initiated to present item/price/sales data in a highly flexible user interface with rapid response time
•  Built system from the ground up
•  Migrated all required data to centralized HDFS repository from legacy databases
•  Developed MapReduce code to process daily data files into 4 primary data tables
•  Tables extracted to service layer (MySQL/Infobright) for presentation through the Pricing Portal

Results
•  File preparation completes in minutes each day and ensures portal data is ready very soon after daily sales processing completes (100K records daily)
•  This was the first production usage of MapReduce and associated technologies – the project was initiated in March and went live on May 9 (<10 weeks concept to realization)

Technologies Used
•  Hadoop, Hive, MapReduce, MySQL, Infobright, Linux, REST Web Service, DotNetNuke

Learning experience for all parties, successfully demonstrated platform abilities in production environment – but we would NOT do it this way again…
  
Mainframe Migration

[Diagram: Step 1 through Step 5, reading Source 1 through Source 4 and producing Output]
As our experience with Hadoop increased, hypotheses were formed that the technology could aid with SHC’s mainframe migration initiative.
Example above represents a simple mainframe process

[Diagram: the same process, with Step 4 and Step 5 marked X]
Migrated sections of mainframe processing, including data transfer to Hadoop and back, eliminating MIPS and IMPROVING overall cycle time
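
The offload pattern on this slide can be sketched in Python as a pipeline whose steps are each tagged with the platform they run on. This is a minimal illustration under stated assumptions, not SHC's implementation: the `Step` class, the `execute` function, and the toy transformations are hypothetical, and the choice of Step 4 and Step 5 as the migrated steps simply follows the diagram. The sketch counts one data transfer each time the platform changes, which shows the "to Hadoop and back" cost that the migrated sections still came out ahead of.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    platform: str                      # "mainframe" or "hadoop" (hypothetical labels)
    run: Callable[[list], list]


def execute(steps: List[Step], data: list) -> Tuple[list, int]:
    """Run steps in order and count transfers whenever the platform changes."""
    transfers = 0
    current = "mainframe"              # assumption: data starts and must end on the mainframe
    for step in steps:
        if step.platform != current:
            transfers += 1
            current = step.platform
        data = step.run(data)
    if current != "mainframe":
        transfers += 1                 # final transfer back to the mainframe
    return data, transfers


# Toy transformations standing in for the real batch steps.
PIPELINE = [
    Step("Step 1", "mainframe", lambda rows: [r for r in rows if r >= 0]),
    Step("Step 2", "mainframe", lambda rows: sorted(rows)),
    Step("Step 3", "mainframe", lambda rows: rows),
    Step("Step 4", "hadoop", lambda rows: [r * 2 for r in rows]),
    Step("Step 5", "hadoop", lambda rows: [sum(rows)]),
]

if __name__ == "__main__":
    print(execute(PIPELINE, [3, -1, 2]))
```

Keeping the migrated steps adjacent, as in the diagram, means the data crosses platforms only twice regardless of how many steps are offloaded.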
  
ETL Replacement
•  A major ongoing system effort in our Marketing department was heavily reliant on DataStage processing for ETL
   –  In the early stages of deployment the ETL platform performed within acceptable limits
   –  As volume increased the system began to have performance issues as the ETL platform degraded
   –  With full rollout imminent, the options were to heavily invest in additional hardware – or – re-work CPU-intensive portions in Hadoop
•  Experience with mainframe migration evolved to ETL replacement.
•  SHC successfully demonstrated reducing load on costly ETL software with Pig scripts (and data movement from / to ETL platform as an intermediate step).
•  AND with improved processing time…
  
The Journey
•  From Legacy (> 1000 lines) to Ruby / MapReduce (400 lines)
   –  Cryptic code, difficult to support, difficult to train
•  We tried Hive (~400 lines - SQL-like abstraction)
   –  Easy to use, easy to experiment and test with
   –  Poor performance, difficult to implement business logic
•  We evolved to Pig with Java UDF extensions
   –  Compressed, very efficient, easy to code / read (~200 lines)
   –  Demonstrated success in transforming mainframe developers to Pig developers in under 2 weeks
•  As we progressed, our business partners requested more and more data from the cluster – which required developer time
   –  We are now using Datameer as a business-user reporting and query front-end to the cluster
   –  Developed for Hadoop, runs efficiently, flexible spreadsheet interface with dashboards

We are in a much different place now than when we started our Hadoop journey.
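
The dataflow style the journey settled on (a short script of filter, group and aggregate steps, with business logic factored into a user-defined function) can be illustrated in Python. This is only a sketch of the shape of such a script, not Pig Latin and not SHC's code: the row layout, the `revenue_cents` function and the `sales_by_store` function are hypothetical names chosen for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (store_id, item_id, price_cents, units)
ROWS = [
    ("S1", "A100", 1999, 2),
    ("S1", "B200", 500, 0),
    ("S2", "A100", 1999, 1),
    ("S2", "C300", 250, 4),
]


def revenue_cents(price_cents, units):
    """Stand-in for a user-defined function that holds business logic."""
    return price_cents * units


def sales_by_store(rows):
    # Filter step: drop rows with no units sold.
    sold = [r for r in rows if r[3] > 0]
    # Group step: groupby only merges adjacent keys, so sort by store first.
    sold.sort(key=itemgetter(0))
    # Aggregate step: total revenue per store, computed through the UDF.
    return {
        store: sum(revenue_cents(price, units) for _, _, price, units in group)
        for store, group in groupby(sold, key=itemgetter(0))
    }


if __name__ == "__main__":
    print(sales_by_store(ROWS))
```

Keeping the business rule in one small function while the surrounding steps stay declarative is one plausible reason a dataflow script ends up shorter and easier to read than hand-written MapReduce.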
  
The Learning

HADOOP
✓  We can dramatically reduce batch processing times for mainframe and EDW
✓  We can retain and analyze data at a much more granular level, with longer history
✓  Hadoop must be part of an overall solution and eco-system

IMPLEMENTATION
✓  We can reliably meet our production deliverable time-windows by using Hadoop
✓  We can largely eliminate the use of traditional ETL tools
✓  New Tools allow improved user experience on very large data sets
✓  We developed tools and skills – The learning curve is not to be underestimated
✓  We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop with spectacular results

UNIQUE VALUE
Over three years of experience using Hadoop for enterprise legacy workload.
  	
  
Thank You!
For further information
email: contact@metascale.com
visit: www.metascale.com

More Related Content

What's hot

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
DataWorks Summit
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Hortonworks
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
DataWorks Summit
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
DataWorks Summit/Hadoop Summit
 
Operationalizing Data Science Using Cloud Foundry
Operationalizing Data Science Using Cloud FoundryOperationalizing Data Science Using Cloud Foundry
Operationalizing Data Science Using Cloud Foundry
VMware Tanzu
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
Hortonworks
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
DataWorks Summit/Hadoop Summit
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
DataWorks Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
DataWorks Summit
 
YARN Federation
YARN Federation YARN Federation
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016
Adam Doyle
 
Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
Bikas Saha
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big Data
DataWorks Summit
 

What's hot (20)

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Operationalizing Data Science Using Cloud Foundry
Operationalizing Data Science Using Cloud FoundryOperationalizing Data Science Using Cloud Foundry
Operationalizing Data Science Using Cloud Foundry
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016
 
Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Pig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big DataPig on Tez: Low Latency Data Processing with Big Data
Pig on Tez: Low Latency Data Processing with Big Data
 

Viewers also liked

Kmart
KmartKmart
Sears Holdings Corp.
Sears Holdings Corp.Sears Holdings Corp.
Sears Holdings Corp.
msg14
 
BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...
BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...
BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...
SocialMedia.org
 
Best practices in outsourcing : The case of Sears Holdings
Best practices in outsourcing : The case of Sears HoldingsBest practices in outsourcing : The case of Sears Holdings
Best practices in outsourcing : The case of Sears Holdings
Alok Kumar
 
The 3 T's - Using Hadoop to modernize with faster access to data and value
The 3 T's - Using Hadoop to modernize with faster access to data and valueThe 3 T's - Using Hadoop to modernize with faster access to data and value
The 3 T's - Using Hadoop to modernize with faster access to data and value
DataWorks Summit
 
Hadoop in the Enterprise: Legacy Rides the Elephant
Hadoop in the Enterprise: Legacy Rides the ElephantHadoop in the Enterprise: Legacy Rides the Elephant
Hadoop in the Enterprise: Legacy Rides the Elephant
DataWorks Summit
 
Sears Holdings Corporation
Sears Holdings CorporationSears Holdings Corporation
Sears Holdings Corporation
Sam Hudson
 
Kmart pp2
Kmart pp2Kmart pp2
Kmart pp2
justin howard
 
Organization And Management Kmart
Organization And Management KmartOrganization And Management Kmart
Organization And Management Kmart
guest634b8da
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
Hortonworks
 
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopBig Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
DataWorks Summit
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
Datameer
 
Strategy analysis Target vs. Kmart
Strategy analysis Target vs. KmartStrategy analysis Target vs. Kmart
Strategy analysis Target vs. Kmart
Dan Saguy
 
E-Business transformation-Sears Case Study
E-Business transformation-Sears Case StudyE-Business transformation-Sears Case Study
E-Business transformation-Sears Case Study
Danny D. Kosasih
 
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data EraBig Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era
DataWorks Summit
 
Sears Hometown Store Overview
Sears Hometown Store OverviewSears Hometown Store Overview
Sears Hometown Store Overview
ctodd001
 
Sears Final Project
Sears Final ProjectSears Final Project
Sears Final Project
Hillary Paige Thompson
 
Case Study: Sears
Case Study: SearsCase Study: Sears
Case Study: Sears
Sword Ciboodle
 
Strategy recommendation for Sears
Strategy recommendation for SearsStrategy recommendation for Sears
Strategy recommendation for Sears
Dev Anumolu
 
Big Data for the Retail Business I Swan Insights I Solvay Business School
Big Data for the Retail Business I Swan Insights I Solvay Business SchoolBig Data for the Retail Business I Swan Insights I Solvay Business School
Big Data for the Retail Business I Swan Insights I Solvay Business School
Laurent Kinet
 

Viewers also liked (20)

Kmart
KmartKmart
Kmart
 
Sears Holdings Corp.
Sears Holdings Corp.Sears Holdings Corp.
Sears Holdings Corp.
 
BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...
BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...
BlogWell San Francisco Case Study: Sears Holdings Corporation, presented by J...
 
Best practices in outsourcing : The case of Sears Holdings
Best practices in outsourcing : The case of Sears HoldingsBest practices in outsourcing : The case of Sears Holdings
Best practices in outsourcing : The case of Sears Holdings
 
The 3 T's - Using Hadoop to modernize with faster access to data and value
The 3 T's - Using Hadoop to modernize with faster access to data and valueThe 3 T's - Using Hadoop to modernize with faster access to data and value
The 3 T's - Using Hadoop to modernize with faster access to data and value
 
Hadoop in the Enterprise: Legacy Rides the Elephant
Hadoop in the Enterprise: Legacy Rides the ElephantHadoop in the Enterprise: Legacy Rides the Elephant
Hadoop in the Enterprise: Legacy Rides the Elephant
 
Sears Holdings Corporation
Sears Holdings CorporationSears Holdings Corporation
Sears Holdings Corporation
 
Kmart pp2
Kmart pp2Kmart pp2
Kmart pp2
 
Organization And Management Kmart
Organization And Management KmartOrganization And Management Kmart
Organization And Management Kmart
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopBig Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Strategy analysis Target vs. Kmart
Strategy analysis Target vs. KmartStrategy analysis Target vs. Kmart
Strategy analysis Target vs. Kmart
 
E-Business transformation-Sears Case Study
E-Business transformation-Sears Case StudyE-Business transformation-Sears Case Study
E-Business transformation-Sears Case Study
 
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data EraBig Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era
Big Data 2.0: Hadoop as part of a Near-Real-Time Integrated Data Era
 
Sears Hometown Store Overview
Sears Hometown Store OverviewSears Hometown Store Overview
Sears Hometown Store Overview
 
Sears Final Project
Sears Final ProjectSears Final Project
Sears Final Project
 
Case Study: Sears
Case Study: SearsCase Study: Sears
Case Study: Sears
 
Strategy recommendation for Sears
Strategy recommendation for SearsStrategy recommendation for Sears
Strategy recommendation for Sears
 
Big Data for the Retail Business I Swan Insights I Solvay Business School
Big Data for the Retail Business I Swan Insights I Solvay Business SchoolBig Data for the Retail Business I Swan Insights I Solvay Business School
Big Data for the Retail Business I Swan Insights I Solvay Business School
 

Similar to Transforming Data Architecture Complexity at Sears - StampedeCon 2013

50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
DataWorks Summit
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
Michael Hiskey
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
Zohar Elkayam
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
gluent.
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
Caserta
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
MLconf
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
Seeling Cheung
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
Alluxio, Inc.
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
Satish Mohan
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
Gwen (Chen) Shapira
 
Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
Ram Kedem
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
Kognitio
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
Alluxio, Inc.
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Avere Systems
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
chariorienit
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
markgrover
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
Tom Rogers
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 

Similar to Transforming Data Architecture Complexity at Sears - StampedeCon 2013 (20)

50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016Rapid Cluster Computing with Apache Spark 2016
Rapid Cluster Computing with Apache Spark 2016
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016





Transforming Data Architecture Complexity at Sears - StampedeCon 2013

  • 1. Transforming Data Architecture Complexity at Sears. Justin Sheppard, Sears Holdings Corporation
  • 2. Where Did We Start?
      •  Not meeting production schedules
      •  Multiple copies of data, no single point of truth
      •  ETL complexity, cost of software and cost to manage
      •  Time to set up ETL data sources for projects
      •  Latency in data (up to weeks in some cases)
      •  Enterprise Data Warehouses unable to handle load
      •  Mainframe workload over-consuming capacity
      •  IT budgets not growing – BUT data volumes escalating
  • 3. What Is Hadoop?
      Hadoop is a platform for data storage and processing that is:
      o  Scalable
      o  Fault tolerant
      o  Open source
      Hadoop Distributed File System (HDFS): file sharing & data protection across physical servers
      MapReduce: fault-tolerant distributed computing across physical servers
      Flexibility
      o  A single repository for storing, processing & analyzing any type of data (structured and complex)
      o  Not bound by a single schema
      Scalability
      o  Scale-out architecture divides workloads across multiple nodes
      o  Flexible file system eliminates ETL bottlenecks
      Low Cost
      o  Can be deployed on commodity hardware
      o  Open source platform guards against vendor lock-in
  • 4. Hadoop
      IS
      •  Store vast amounts of data
      •  Run queries on huge data sets
      •  Ask questions previously impossible
      •  Archive data but still analyze it
      •  Capture data streams at incredible speeds
      •  Massively reduce data latency
      •  Transform your thinking about ETL
      Is Not
      •  High-speed SQL database
      •  Simple
      •  Easily connected to legacy systems
      •  A replacement for your current data warehouse
      •  Going to be built or operated by your DBAs
      •  Going to make any sense to your data architects
      •  Going to be possible if you do not have Linux skills
  • 5. Use The Right Tool For The Right Job
      Hadoop: when to use?
      •  Affordable storage/compute
      •  High-performance queries on large data
      •  Complex data
      •  Resilient auto scalability
      Databases: when to use?
      •  Transactional, high-speed analytics
      •  Interactive reporting (<1 sec)
      •  Multi-step transactions
      •  Numerous inserts/updates/deletes
      Can be combined
  • 6. Use The Right Tool For The Right Job [Diagram: Hadoop and Database]
  • 7. Data Hub
      •  Underlying premise as Hadoop adoption continues: source data once, use many.
      •  Over time, as more and more data is sourced into the hub, development times will shrink, because each new project needs significantly less data sourcing than is typical.
  • 8. Some Examples: Use Cases at Sears Holdings
  • 9. The First Usage in Production
      Use Case
      •  Interactive presentation layer was required to present item/price/sales data in a highly flexible user interface with rapid response time
      •  Needed to deliver the solution within a very short period of time
      •  Legacy architecture would have required a MicroStrategy solution utilizing thousands of cubes on many expensive servers
      Approach
      •  Rapid development project initiated to present item/price/sales data in a highly flexible user interface with rapid response time
      •  Built system from the ground up
      •  Migrated all required data to a centralized HDFS repository from legacy databases
      •  Developed MapReduce code to process daily data files into 4 primary data tables
      •  Tables extracted to service layer (MySQL/Infobright) for presentation through the Pricing Portal
      Results
      •  File preparation completes in minutes each day and ensures portal data is ready very soon after daily sales processing completes (100K records daily)
      •  This was the first production usage of MapReduce and associated technologies – the project was initiated in March and was live on May 9 (<10 weeks concept to realization)
      Technologies Used
      •  Hadoop, Hive, MapReduce, MySQL, Infobright, Linux, REST Web Service, DotNetNuke
      Learning experience for all parties, successfully demonstrated platform abilities in production environment – but we would NOT do it this way again…
  • 10. Mainframe Migration
      As our experience with Hadoop increased, hypotheses were formed that the technology could aid with SHC's mainframe migration initiative.
      [Diagram: a simple mainframe process in which Sources 1 to 4 feed Steps 1 to 5 and produce an Output]
      [Diagram: the same process, in which Step 4 and Step 5 appear a second time alongside two X marks]
      Migrated sections of mainframe processing, including data transfer to Hadoop and back, eliminating MIPS and IMPROVING overall cycle time
  • 11. ETL Replacement
      •  A major ongoing system effort in our Marketing department was heavily reliant on DataStage processing for ETL
          –  In the early stages of deployment the ETL platform performed within acceptable limits
          –  As volume increased the system began to have performance issues as the ETL platform degraded
          –  With full rollout imminent, the options were to heavily invest in additional hardware – or – re-work CPU-intensive portions in Hadoop
      •  Experience with mainframe migration evolved to ETL replacement.
      •  SHC successfully demonstrated reducing load on costly ETL software with Pig scripts (and data movement from / to the ETL platform as an intermediate step).
      •  AND with improved processing time…
  • 12. The Journey
      •  From Legacy (> 1000 lines) to Ruby / MapReduce (400 lines)
          –  Cryptic code, difficult to support, difficult to train
      •  We tried Hive (~400 lines, SQL-like abstraction)
          –  Easy to use, easy to experiment and test with
          –  Poor performance, difficult to implement business logic
      •  We evolved to Pig with Java UDF extensions
          –  Compressed, very efficient, easy to code / read (~200 lines)
          –  Demonstrated success in transforming mainframe developers to Pig developers in under 2 weeks
      •  As we progressed, our business partners requested more and more data from the cluster – which required developer time
          –  We are now using Datameer as a business-user reporting and query front-end to the cluster
          –  Developed for Hadoop, runs efficiently, flexible spreadsheet interface with dashboards
      We are in a much different place now than when we started our Hadoop journey.
  • 13. The Learning
      HADOOP
      ✓  We can dramatically reduce batch processing times for mainframe and EDW
      ✓  We can retain and analyze data at a much more granular level, with longer history
      ✓  Hadoop must be part of an overall solution and eco-system
      IMPLEMENTATION
      ✓  We can reliably meet our production deliverable time-windows by using Hadoop
      ✓  We can largely eliminate the use of traditional ETL tools
      ✓  New tools allow improved user experience on very large data sets
      ✓  We developed tools and skills – the learning curve is not to be underestimated
      ✓  We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop with spectacular results
      UNIQUE VALUE
      Over three years of experience using Hadoop for enterprise legacy workload.
  • 14. Thank You! For further information, email: contact@metascale.com, visit: www.metascale.com
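
Slide 3 describes MapReduce as fault-tolerant distributed computing, and slide 9 mentions MapReduce code that turns daily data files into summary tables. The deck does not show that code, so the following is only a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It does not use the Hadoop API, and the record layout (item_id, units_sold, unit_price) and all function names are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (not from the slides): (item_id, units_sold, unit_price)
SalesRecord = Tuple[str, int, float]


def map_phase(record: SalesRecord) -> Iterator[Tuple[str, float]]:
    """Emit one (key, value) pair per input record: item_id -> revenue."""
    item_id, units_sold, unit_price = record
    yield item_id, units_sold * unit_price


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group all values by key. On a real cluster the framework does this
    step across machines; here it is a single in-memory dictionary."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    """Collapse all values for one key into a single summary row."""
    return key, sum(values)


def run_job(records: Iterable[SalesRecord]) -> Dict[str, float]:
    """Run map -> shuffle -> reduce over the records and return the table."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    daily_file = [("A100", 2, 5.0), ("B200", 1, 12.5), ("A100", 3, 5.0)]
    print(run_job(daily_file))  # {'A100': 25.0, 'B200': 12.5}
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, both phases can in principle be spread over many machines, which is the property the scale-out bullets on slide 3 rely on.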