Hadoop in the Enterprise: Legacy Rides the Elephant

3,159 views
Dr. Phil Shelley, CTO Sears Holdings; Founder and CEO, MetaScale

Published in: Technology, Business

1 Comment · 7 Likes
  • http://www.slideshare.net/Hadoop_Summit/hadoop-in-the-enterprise-legacy-rides-the-elephant-13587064
    It turns out you need the 13587064 at the end of the link, which doesn't always load when you click through the Hadoop Summit page. Cut and paste the link above and the slides will load.


Transcript

  • 1. Hadoop in the Enterprise: Legacy Rides the Elephant. Dr. Phil Shelley, CTO Sears Holdings; Founder and CEO, MetaScale
  • 2. Hadoop has changed the enterprise big data game. Are you languishing in the past or adopting outdated trends? Legacy rides the elephant!
  • 3. Why Hadoop and Why Now?
    THE ADVANTAGES: cost reduction; alleviating performance bottlenecks; ETL is too expensive and complex; mainframe and data warehouse processing moves to Hadoop.
    THE CHALLENGE: traditional enterprises' lack of awareness.
    THE SOLUTION: leverage the growing support system for Hadoop; make Hadoop the data hub in the enterprise; use Hadoop for processing batch and analytic jobs.
  • 4. The Classic Enterprise Challenge: growing data volumes, shortened processing windows, tight IT budgets, latency in data, escalating costs, ETL complexity, scalability ceilings hit, and demanding business requirements.
  • 5. The Sears Holdings Approach. Key to our approach: 1) allowing users to continue to use familiar consumption interfaces; 2) providing inherent HA; 3) enabling businesses to unlock previously unusable data. Six steps: 1) implement a Hadoop-centric reference architecture; 2) move enterprise batch processing to Hadoop; 3) make Hadoop the single point of truth; 4) massively reduce ETL by transforming within Hadoop; 5) move results and aggregates back to legacy systems for consumption; 6) retain, within Hadoop, source files at the finest granularity for re-use.
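The "transform within Hadoop" step above is typically done with MapReduce (or Pig compiled to MapReduce). As a minimal sketch only, here is the map-and-reduce pattern in pure Python, with a tiny in-process shuffle standing in for the cluster; the `store,sku,amount` record layout and key format are hypothetical, not from the slides.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: parse raw 'store,sku,amount' lines and emit
    (store|sku, amount) pairs, replacing an external ETL projection step."""
    for line in lines:
        store, sku, amount = line.strip().split(",")
        yield f"{store}|{sku}", float(amount)

def reducer(pairs):
    """Reduce phase: total amounts per key, replacing an external
    ETL aggregation step. Sorting simulates the shuffle/sort barrier."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(value for _, value in group)

if __name__ == "__main__":
    raw = ["S1,A,10.0", "S1,A,5.0", "S2,B,7.5"]
    print(dict(reducer(mapper(raw))))
```

On a real cluster the same two functions would run as Hadoop Streaming tasks over HDFS files rather than an in-memory list.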
  • 6. The Architecture. Enterprise solutions using Hadoop must be an eco-system. Large companies have a complex environment: transactional systems, services, EDW and data marts, reporting tools and needs. We needed to build an entire solution.
  • 7. The Sears Holdings Architecture (architecture diagram)
  • 8. The Learning. Over two years of Hadoop experience using Hadoop for enterprise legacy workload.
    HADOOP: we can dramatically reduce batch processing times for mainframe and EDW; we can retain and analyze data at a much more granular level, with longer history; Hadoop must be part of an overall solution and eco-system.
    IMPLEMENTATION: we can reliably meet our production deliverable time-windows by using Hadoop; we can largely eliminate the use of traditional ETL tools; new tools allow improved user experience on very large data sets.
    UNIQUE VALUE: we developed tools and skills (the learning curve is not to be underestimated); we developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop, with spectacular results.
  • 9. Some Examples: Use-Cases at Sears Holdings
  • 10. The Challenge – Use-Case #1. Scale: sales: 8.9B line items; price sync: 12.6B SKUs daily; elasticity: 1.4B parameters; offers: daily; items: 11.3M SKUs; stores: 3,200 sites; inventory: 1.8B rows; timing: weekly. Intensive computational and large storage requirements. Needed to calculate item price elasticity based on 8 billion rows of sales data. Could only be run quarterly and on a subset of data; needed more often. Business need: react to market conditions and new product launches.
  • 11. The Result – Use-Case #1. Business problem: intensive computational and large storage requirements; needed to calculate store-item price elasticity based on 8 billion rows of sales data; could only be run quarterly and on a subset of data; the business was missing the opportunity to react to changing market conditions and new product launches. With Hadoop: price elasticity calculated weekly; 100% of data set and granularity; new business capability enabled; meets all SLAs.
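The slides do not say how elasticity was computed; assuming the standard midpoint (arc) definition of price elasticity of demand, the per-item calculation distributed across those 8 billion rows looks like this sketch (the example prices and quantities are made up):

```python
def arc_elasticity(q0, q1, p0, p1):
    """Midpoint (arc) price elasticity of demand: percent change in
    quantity sold divided by percent change in price, each measured
    against the average of the two observations."""
    dq = (q1 - q0) / ((q0 + q1) / 2)  # relative change in quantity
    dp = (p1 - p0) / ((p0 + p1) / 2)  # relative change in price
    return dq / dp

# A price cut from $10 to $9 that lifts weekly units from 100 to 115:
print(round(arc_elasticity(100, 115, 10.0, 9.0), 2))
```

A value below -1 (as here) marks the item as price-elastic: demand moved proportionally more than price.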
  • 12. The Challenge – Use-Case #2. Mainframe: 100 MIPS; data sources: 30+; input records: billions; scalability: unable to scale 100-fold on 1% of data. Mainframe batch business process would not scale. Needed to process 100 times more detail to handle business-critical functionality. Business need required processing billions of records from 30 input data sources. Complex business logic and financial calculations. SLA for this cyclic process was 2 hours per run.
  • 13. The Result – Use-Case #2. Business problem: mainframe batch business process would not scale; needed to process 100 times more detail to handle rollout of high-value, business-critical functionality; time-sensitive business need required processing billions of records from 30 input data sources; complex business logic and financial calculations; SLA for this cyclic process was 2 hours per run. With Hadoop: Teradata and mainframe data on Hadoop; implemented PIG processing; Java UDFs for financial calculations; scalable solution in 8 weeks; 6,000 lines reduced to 400 lines of PIG; processing met tighter SLA; $600K annual savings.
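Pig Latin with Java UDFs is the stack the slide names; since Pig itself cannot run here, this pure-Python sketch shows the shape of that pattern: a group-by with a per-record "UDF" for the money math. The account/discount schema and the `net_amount` function are hypothetical illustrations, not the actual Sears logic.

```python
from collections import defaultdict
from decimal import Decimal

def net_amount(gross, discount_rate):
    """Stand-in for a financial-calculation UDF (the slide used Java UDFs
    called from PIG); Decimal avoids binary-float rounding in money math."""
    return gross * (Decimal(1) - discount_rate)

def group_and_total(records):
    """The Pig pattern 'GROUP rows BY account; FOREACH ... GENERATE
    SUM(net_amount(...))' expressed in plain Python."""
    totals = defaultdict(Decimal)
    for account, gross, rate in records:
        totals[account] += net_amount(gross, rate)
    return dict(totals)

records = [
    ("acct-1", Decimal("100.00"), Decimal("0.10")),
    ("acct-1", Decimal("50.00"), Decimal("0.00")),
    ("acct-2", Decimal("20.00"), Decimal("0.50")),
]
print(group_and_total(records))
```

The line-count reduction the slide reports comes from exactly this: Pig's GROUP/FOREACH/SUM replaces hand-written batch plumbing, leaving only the domain calculation in UDF code.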
  • 14. The Challenge – Use-Case #3. Data storage: mainframe DB2 tables; price data: 500M records; processing window: 3.5 hours; mainframe jobs: 64. Mainframe unable to meet SLAs on growing data volume.
  • 15. The Result – Use-Case #3. Business problem: mainframe unable to meet SLAs on growing data volume. With Hadoop: source data in Hadoop; job runs 100% faster, now in 1.5 hours; over $100K in annual savings; maintenance improvement, under 50 lines of PIG code.
  • 16. The Challenge – Use-Case #4. Transformation: on Teradata via Teradata objects; business user experience: unacceptable; batch processing output: .CSV files; history retained: no; new report development: slow. Needed to enhance user experience and the ability to perform analytics on granular data. Restricted availability of data due to space constraints. Needed to retain granular data. Needed Excel-format interaction on data sources of 100 million records, with agility.
  • 17. The Result – Use-Case #4. Business problem: needed to enhance user experience and the ability to perform analytics on granular data; restricted availability of data due to space constraints; needed to retain granular data; needed Excel-format interaction on data sources of 100 million records, with agility. With Hadoop: sourcing data directly to Hadoop; user experience expectations met; redundant storage eliminated; transformation moved to Hadoop; over 50 data sources retained in Hadoop; business's granular history retained; single source of truth; Datameer for additional analytics; PIG scripts to ease code maintenance.
  • 18. Summary. Hadoop can handle enterprise workload. It can reduce strain on legacy platforms, reduce cost, and bring new business opportunities. It must be an eco-system and part of an overall data strategy. The effort is not to be underestimated.
  • 19. The Horizon – What do we need next? Automation tools and techniques that ease the enterprise integration of Hadoop. Educating traditional enterprise IT organizations about the possibilities and reasons to deploy Hadoop. Continued development of a reusable framework for legacy workload migration.
  • 20. For more information, visit www.metascale.com. Follow us on Twitter @BigDataMadeEasy. Join us on LinkedIn: www.linkedin.com/company/metascale-llc