Hadoop in the Enterprise: Legacy Rides the Elephant
Dr. Phil Shelley
CTO Sears Holdings
Founder and CEO MetaScale

Transcript

  • 1. Hadoop in the Enterprise: Legacy Rides the Elephant. Dr. Phil Shelley, CTO Sears Holdings; Founder and CEO MetaScale
  • 2. Hadoop has changed the enterprise big data game. Are you languishing in the past or adopting outdated trends? Legacy rides the elephant!
  • 3. Why Hadoop and Why Now?
    THE ADVANTAGES:
    •  Cost reduction
    •  Alleviate performance bottlenecks
    •  ETL too expensive and complex
    •  Mainframe and Data Warehouse processing → Hadoop
    THE CHALLENGE:
    •  Traditional enterprises' lack of awareness
    THE SOLUTION:
    •  Leverage the growing support system for Hadoop
    •  Make Hadoop the data hub in the Enterprise
    •  Use Hadoop for processing batch and analytic jobs
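    The "transform within Hadoop instead of a separate ETL tool" idea can be illustrated with a minimal Pig Latin sketch (the deck later names PIG as its batch language); all paths and field names below are hypothetical:

    ```pig
    -- Load source records at the finest granularity (hypothetical path and schema)
    sales = LOAD '/data/raw/sales' USING PigStorage('|')
            AS (store_id:int, sku:chararray, qty:int, amount:double);

    -- Transform and aggregate inside Hadoop, replacing an external ETL step
    by_store = GROUP sales BY store_id;
    daily    = FOREACH by_store GENERATE group AS store_id,
                                         SUM(sales.amount) AS total_sales;

    -- Write the aggregate back out for consumption by legacy systems
    STORE daily INTO '/data/out/store_daily_sales' USING PigStorage(',');
    ```

    The raw files stay in HDFS at full granularity for re-use, while only the small aggregate flows back to the legacy side, which is the hub-and-spoke pattern the slide describes.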
  • 4. The Classic Enterprise Challenge. The Data Challenge: growing data volumes, shortened processing windows, tight IT budgets, latency in ETL, escalating costs, hitting scalability ceilings, complexity, and demanding business requirements.
  • 5. The Sears Holdings Approach
    Key to our approach:
    1)  Allowing users to continue to use familiar consumption interfaces
    2)  Providing inherent HA
    3)  Enabling businesses to unlock previously unusable data
    The six steps:
    1.  Move enterprise batch processing to Hadoop
    2.  Implement a Hadoop-centric reference architecture
    3.  Retain, within Hadoop, source files at the finest granularity for re-use
    4.  Make Hadoop the single point of truth
    5.  Massively reduce ETL by transforming within Hadoop
    6.  Move results and aggregates back to legacy systems for consumption
  • 6. The Architecture
    •  Enterprise solutions using Hadoop must be an eco-system
    •  Large companies have a complex environment:
       –  Transactional systems
       –  Services
       –  EDW and Data marts
       –  Reporting tools and needs
    •  We needed to build an entire solution
  • 7. The Sears Holdings Architecture (architecture diagram)
  • 8. The Learning. Over two years of Hadoop experience using Hadoop for Enterprise legacy workload.
    HADOOP:
    ✓  We can dramatically reduce batch processing times for mainframe and EDW
    ✓  We can retain and analyze data at a much more granular level, with longer history
    ✓  Hadoop must be part of an overall solution and eco-system
    IMPLEMENTATION:
    ✓  We can reliably meet our production deliverable time-windows by using Hadoop
    ✓  We can largely eliminate the use of traditional ETL tools
    ✓  New tools allow improved user experience on very large data sets
    UNIQUE VALUE:
    ✓  We developed tools and skills – the learning curve is not to be underestimated
    ✓  We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop with spectacular results
  • 9. Some Examples: Use-Cases at Sears Holdings
  • 10. The Challenge – Use-Case #1
    Sales: 8.9B Line Items | Price Sync: Daily | Elasticity: 1.4B Parameters | Offers: 12.6B SKUs | Items: 11.3M SKUs | Stores: 3200 Sites | Timing: Weekly | Inventory: 1.8B rows
    •  Intensive computational and large storage requirements
    •  Needed to calculate item price elasticity based on 8 billion rows of sales data
    •  Could only be run quarterly and on a subset of data – needed more often
    •  Business need: react to market conditions and new product launches
  • 11. The Result – Use-Case #1
    Business problem:
    •  Intensive computational and large storage requirements
    •  Needed to calculate store-item price elasticity based on 8 billion rows of sales data
    •  Could only be run quarterly and on a subset of data
    •  Business missing the opportunity to react to changing market conditions and new product launches
    Results with Hadoop:
    •  Price elasticity calculated weekly
    •  New business capability enabled
    •  100% of data set and granularity
    •  Meets all SLAs
  • 12. The Challenge – Use-Case #2
    Mainframe: 100 MIPS | Data Sources: 30+ | Input Records: Billions | Scalability: Unable to scale 100-fold on 1% of data
    •  Mainframe batch business process would not scale
    •  Needed to process 100 times more detail to handle business-critical functionality
    •  Business need required processing billions of records from 30 input data sources
    •  Complex business logic and financial calculations
    •  SLA for this cyclic process was 2 hours per run
  • 13. The Result – Use-Case #2
    Business problem:
    •  Mainframe batch business process would not scale
    •  Needed to process 100 times more detail to handle rollout of high-value, business-critical functionality
    •  Time-sensitive business need required processing billions of records from 30 input data sources
    •  Complex business logic and financial calculations
    •  SLA for this cyclic process was 2 hours per run
    Results with Hadoop:
    •  Teradata & mainframe data on Hadoop
    •  Implemented PIG for processing
    •  Java UDFs for financial calculations
    •  Scalable solution in 8 weeks
    •  6000 lines reduced to 400 lines of PIG
    •  Processing met tighter SLA
    •  $600K annual savings
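    The deck does not publish the actual scripts, but the pattern it names (PIG for the data flow, Java UDFs for the financial calculations) looks roughly like this minimal sketch; the jar name, UDF class, paths, and schema are all hypothetical:

    ```pig
    -- Register a Java UDF jar for the financial calculation (hypothetical names)
    REGISTER finance-udfs.jar;
    DEFINE NetValue com.example.finance.NetValue();

    records = LOAD '/data/in/billing' USING PigStorage('\t')
              AS (acct:chararray, principal:double, rate:double, days:int);

    -- Apply the complex business calculation per record via the Java UDF
    valued  = FOREACH records GENERATE acct,
                                       NetValue(principal, rate, days) AS net:double;

    STORE valued INTO '/data/out/billing_valued';
    ```

    Keeping only the calculation in Java and the orchestration in Pig is one plausible way a few hundred lines of Pig could stand in for thousands of lines of legacy batch code, as the slide reports.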
  • 14. The Challenge – Use-Case #3
    Data Storage: Mainframe DB2 Tables | Price Data: 500M Records | Processing Window: 3.5 Hours | Mainframe Jobs: 64
    Mainframe unable to meet SLAs on growing data volume
  • 15. The Result – Use-Case #3
    Business problem: Mainframe unable to meet SLAs on growing data volume
    Results with Hadoop:
    •  Job runs 100% faster – now in 1.5 hours
    •  Over $100K in annual savings
    •  Maintenance improvement – <50 lines of PIG code
    •  Source data in Hadoop
  • 16. The Challenge – Use-Case #4
    Transformation: On Teradata via User Objects Batch Processing | Business User Experience: Unacceptable | History Retained: No | Report Output: .CSV Files | New Development: Slow
    •  Needed to enhance user experience and ability to perform analytics on granular data
    •  Restricted availability of data due to space constraints
    •  Needed to retain granular data
    •  Needed Excel-format interaction on data sources of hundreds of millions of records, with agility
  • 17. The Result – Use-Case #4
    Business problem:
    •  Needed to enhance user experience and ability to perform analytics on granular data
    •  Restricted availability of data due to space constraints
    •  Needed to retain granular data
    •  Needed Excel-format interaction on data sources of hundreds of millions of records, with agility
    Results with Hadoop:
    •  User experience expectations met
    •  Sourcing data directly to Hadoop
    •  Redundant storage eliminated
    •  Transformation moved to Hadoop
    •  Over 50 data sources retained in Hadoop
    •  Business's granular history retained
    •  Single source of truth
    •  Datameer for additional analytics
    •  PIG scripts to ease code maintenance
  • 18. Summary
    •  Hadoop can handle Enterprise workload
    •  Can reduce strain on legacy platforms
    •  Can reduce cost
    •  Can bring new business opportunities
    •  Must be an eco-system
    •  Must be part of an overall data strategy
    •  Not to be underestimated
  • 19. The Horizon – What do we need next?
    •  Automation tools and techniques that ease the Enterprise integration of Hadoop
    •  Educate traditional Enterprise IT organizations about the possibilities and reasons to deploy Hadoop
    •  Continue development of a reusable framework for legacy workload migration
  • 20. For more information, visit www.metascale.com. Follow us on Twitter @BigDataMadeEasy. Join us on LinkedIn: www.linkedin.com/company/metascale-llc