Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt Estes, Disney

This is the story of why and how Hadoop was integrated into the Disney data infrastructure. Providing data infrastructure for Disney’s, ABC’s and ESPN’s Internet presences is challenging. Doing so requires cost-effective, performant, scalable and highly available solutions. Information requirements from the business add the need for these solutions to work together, providing consistent acquisition, storage and access to data. Burdened with a heavily laden commercial RDBMS infrastructure, Disney found in Hadoop an opportunity to solve some challenging use cases. The deployment of Hadoop helped Disney to address growing costs, scalability, and data availability. In addition, it provides our businesses with new data-driven business-to-consumer opportunities.

  • Introductions: Who am I? I am…
  • High quality, engaging interactive experiences across console, online, mobile, and social network platforms to entertain and inform audiences around the globe. No. 1-ranked community-family and parenting Web destinations. Playdom has 47 million users. 11 theme parks at five resorts in the United States, Europe and Asia; a top-rated family cruise line; a popular vacation-ownership program; and outstanding guided family tours to the world’s most exciting destinations. $38 billion in revenues company-wide. Avg 10-12 billion page views a month. Peak of 42 billion ad calls in a month. **(Private Information – numbers won’t be disclosed)** Peak Registered Users – Fantasy Football, NCAA Tournament Challenge, Dancing with the Stars.
  • 1993 - Starwave; 1995 - ESPN.com / ABCNews.com; 1998 - Disney / Infoseek; Go.com; DIG; WDIG; DCAT; DTSS.
  • DTSS provides services for the Disney-owned brands. Hosting for the Disney-owned brands. Core Applications: customer registration and authentication (login); terms of use and opt-ins; surveys & sweepstakes; newsletters; content management & publishing; ad serving; campaign management; broadcast email. Data Services: operational data stores; data warehouse; operational reporting platform; business intelligence platform.
  • The infrastructure looks like this….<talk to slide contents>
  • Would you believe... it was a really good hire? It was. But one person will not build a No-SQL platform alone. Partner with Cloudera: training; consultation on designs; operations support. Train staff: in-house classes. Evangelize and grow adoption.
  • Make a statement about each… In total, these began to paint a picture that non-RDBMS technologies were providing competitive advantage for very successful web companies. The question became: what can we learn from them? How can we apply that learning to Disney? Tokyo Cabinet – KV store, surpassed by Kyoto Cabinet.
  • What did we do about it? Developed an infrastructure strategy. Tested that strategy; specifically, tested the technologies that went into the strategy. Hired key positions, finding the talent and skills that we did not possess in house. Partnered with Cloudera for support, training, and consulting on cluster setup, map-reduce design, and end-user programs like training and evangelism. Launched our DCloud effort, of which Hadoop was a key technology component. Honed the data services tier of DCloud, wrapping that into the Data Management Platform. Before we get into each of these areas specifically, let’s talk about our infrastructure and why and where it fell short of our ideals.
  • The database-centric data architecture looks like this. This infrastructure served us well for many years. But it was beginning to show its weaknesses; it was becoming all too clear that we needed to move beyond it. Before we go there, let’s talk about the architecture itself. End user applications are built within the business units; we host but otherwise do not provide architecture over their databases. Databases under our purview start with the OLTP databases, supporting the core applications. An ODS platform provides multi-tenant infrastructure for operational data stores, for data moving in and out of the environment and for each application or each significant line of business. The data warehouse tier includes a multi-tenant database in the style of Kimball data warehouses. Data is integrated from each of the OLTP and ODS databases into this conformed dimension schema. BI, reporting and light analytical tools exist for DTSS and business unit staff to leverage.
  • This is how and where Hadoop came into the enterprise. (SAME) End user applications are built within the business units; we host but otherwise do not provide architecture over their databases. Databases under our purview start with the OLTP databases, supporting the core applications. This also includes a central logging service that applications can log messages or bulk-post files to. An ODS platform provides multi-tenant infrastructure for operational data stores, for data moving in and out of the environment and for each application or each significant line of business. This now includes Hadoop as an operational sink for data. Data may flow between the ODSs and Hadoop. The data warehouse tier includes a multi-tenant database in the style of Kimball data warehouses. Data is integrated from each of the OLTP and ODS databases into this conformed dimension schema. This also includes a Hadoop location for data to be written when it has been crafted, or modified, from other data. Data may flow between Hadoop and the data warehouse. BI, reporting and light analytical tools exist for DTSS and business unit staff to leverage.
  • This is the overall plan that Hadoop was a part of.
  • Previously Cost Prohibitive – Hadoop enables parallel processing of vast quantities of data. The reality is that we could architect, design and build solutions to do the same. However, cost effective and cost prohibitive are the key phrases. We could not afford, nor get funding for, storing the quantities of data and doing the types of processing on it that Hadoop enables.
  • Completed a financial estimate based on an item-by-item parts list. Worked with our Business Operations and Finance Departments to complete a Net Present Value analysis. NPV is a standard method for using the time value of money to appraise long-term projects.
  • We had developed a strong in-house database engineering and operations skill set. These people knew data. They were not skilled in the programming paradigms or languages of Hadoop and No-SQL. We had a good Java engineering organization. These people knew the right languages. All but a few did not know data. All but a few did not know the programming paradigms of Hadoop and No-SQL. We sought out one critical hire: Arun Jacob, a visionary architect with the programming skills to show how to do it, any of the “its” that are required.
  • Some looked at it as purchasing services from Cloudera. I looked at it as partnering with Cloudera. We purchased developer training, admin training and, while Josh was on site, training on some specific processing routines that we required. We engaged in design consulting: HDFS physical layout, directory structures for optimal processing, and consulting on early map-reduce designs. We inked a deal to receive operations support, 24x7x365 support, and bug-fix submission, for those “I can’t get this to work” times and the inevitable “remember when you told us not to… Well, we did it anyway.” Then there were the collaboration points: the product advisory councils, both technical and executive groups. And the occasional rant – hey, that’s not how I want that to work. I want it to work like this… Cloudera has been there to listen to how we need to do it. Like any other company, they have to balance the needs of their various stakeholders. But it is clear that they are listening and acting on the feedback we provide.
  • The key challenges in the org. Executive: data isolation, business units not sharing their data. Engineering: rethinking data, processing data with latency goals in mind. Relaxing the restrictions around transactional integrity allows for increased scalability and simplicity of the solution (CAP theorem). Community: honestly, the goal started with evangelizing technologies. I quickly realized that what we need to do is offer DaaS, because not everyone should have to learn the different technology stacks in order to get value from the data. The community should be focused on business value, not specific technologies.
  • Isolation of technology from capability — providing data to the general consumer, providing the capability for developers to create, process, and manage that data, without having to directly couple to underlying technologies. Best of breed technologies — don't couple to a specific implementation. Allow for evolution. Providing RESTful APIs, data as JSON, over HTTP allows bindings in multiple languages and can take advantage of standard edge caching architectures. Centralizing the operations — these technologies require operational care and feeding — focus that care and feeding in one place in the org. Onboarding — registering for DaaS should be a self service operation. 
  • Innovation doesn’t just happen – if you are busy operating and sustaining a product or set of products, there is little to no time available for innovating. New features may be discovered, but true creation of something different is difficult to achieve. Specific time and resources must be dedicated to evolving if you want to evolve. Leadership may change – we started with strong executive sponsorship. Then one day we found ourselves leaderless. Having strong VP / Director leaders, we stayed the course. Not long after, we found ourselves with a new executive, a CIO and not a CTO; not incapable of learning this space, just that we had to take a step back and educate. Technology is not the hard part – it’s usually the people that are the hard part, and change is the hardest part for people. Plan, and adjust your plan – you will miss something. Other things will change. The unexpected will take place. Fill the gaps – if your gap is staff, fill it. If your gap is operations, outsource. If your gap is experience, find someone like Cloudera. Whatever your gap is, address it. In business you put your best foot forward. Those who don’t, don’t get work. But when it comes to delivering information, the lifeblood of your company, take a hard look at what isn’t working, take a look at what is weak, and address it.
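
The notes above mention a Net Present Value analysis but give no figures. As a rough illustration of how such a comparison can be computed, here is a minimal sketch. The discount rate, the cash flows, and the names `net_present_value`, `option_a` and `option_b` are all hypothetical and do not come from the talk.

```python
def net_present_value(rate, cash_flows):
    """Discount each yearly cash flow back to year 0 and sum the results.

    cash_flows[0] is the up-front amount (negative for an investment);
    cash_flows[t] is the net cash flow at the end of year t.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical numbers for illustration only; none come from the talk.
DISCOUNT_RATE = 0.10
option_a = [-1000.0, 500.0, 500.0, 500.0]  # larger up-front spend
option_b = [-400.0, 250.0, 250.0, 250.0]   # smaller up-front spend

npv_a = net_present_value(DISCOUNT_RATE, option_a)
npv_b = net_present_value(DISCOUNT_RATE, option_b)
print(f"option A NPV: {npv_a:.2f}")
print(f"option B NPV: {npv_b:.2f}")
```

In a real estimate, each cost category (hardware, licensing, support, training, learning curve) would presumably feed into the yearly cash flows before discounting.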
  • Transcript

    • 1. Advancing Disney’s Internet Data Infrastructure with Hadoop: A Multi Year View of Hadoop at Disney. Matt Estes, Director Data Architecture, The Walt Disney Co.
    • 2. Matt Estes, Director Data Architecture, Disney Technology Solutions & Services. Background: • Music Performance, Theory & Composition • Management of Technology. Employment: • Washington State University • Campfire Boys & Girls • Disney • Database Operations • Platform Engineering • Data Architecture. Industry Participation: • Member of TDWI • Member of ODCA • Product Advisory Councils
    • 3. Motivation: Why Matt Estes is Here Talking to You. I believe… • Information can provide competitive advantage • Computing is undergoing dramatic change • Hadoop & related technologies can help propel us forward • We learn by telling our stories to each other
    • 4. The Walt Disney Company: Unparalleled Entertainment Experiences. Brands: ABC, ESPN, Disney. • Founded in 1923 • $38 billion total revenues 2010 • 11 theme parks at five resorts • Cruise Lines, Vacation Club & Adventures by Disney
    • 5. Evolution of an Internet Division: Disney Technology Solutions & Services. (1993) • Paul Allen funded Internet startup Starwave • ESPN.com & ABCNews.com joint venture with Disney • Disney purchased Starwave, traded to Infoseek • Purchased Infoseek, transformed to portal: Go.com • Consolidation to WDIG, added games - DIMG • Moving closer to the core, becomes Disney Connected and Advanced Technologies • Final move, integration into IT: Disney Technology Solutions and Services (2011). [Timeline labels: Disney, Infoseek, Go.com; DIG, WDIG; DCAT; DTSS]
    • 6. DTSS Services: Foundation for Disney’s Digital Experiences. [Diagram: ABC, ESPN and Disney on top of Data Services, Core Applications and Hosting]
    • 7. Existing Infrastructure: Understanding our Evolution Requires a Look at... Environment & Requirements: • Multi Tenant Shared Services • Shared & Segmented Services • Shared & Segmented Data. Infrastructure Stats: • 5200 Server Images • 61% of servers virtualized • 1600 Databases. [Diagram layers: BU Properties, Web, Core, Data Services]
    • 8. Disney’s Internet Business: Three Brands – Hundreds of Lines of Businesses. Brands: ABC, ESPN, Disney. • 10-12 billion page views per month • Peak: 42 billion ad calls in a month • Peak Registered Users Occur: Fantasy Football, NCAA Tournament Challenge, Dancing with the Stars
    • 9. What’s the problem with this kind of success? Lots of Data, Difficult to Manage & Monetize. “In any given year, we probably generate more data than the Walt Disney Co. did in its first 80 years of existence,” observes Bud Albers, executive vice president and CTO of the Disney Technology Shared Services Group. “The challenge becomes what do you do with it all?”
    • 10. Meeting the Challenge: What did we do about all this data? Looked for others with this same problem. Who is benefiting from their solution? What did we find?
    • 11. We Found What You Found: What can we learn from Google, Yahoo, et al.? • Google’s GFS and Big Table • 5000 node Hadoop Cluster at Facebook • Yahoo Search Webmap 10k node single cluster – Source: http://wiki.apache.org/hadoop/PoweredBy • HBase • Cassandra • Voldemort • Tokyo Cabinet • Etc…
    • 12. Our Plan of Action: A Roadmap to Success. 1. Strategy: • Design Next Gen Platform • Test & Evangelize. 2. Leadership & Growth (People): • Hire Key Positions • Grow Staff • Partner with Experience. 3. Execute (Execution): • Hadoop the Technology • Data Enabled Cloud • DaaS (DMP)
    • 13. RDBMS Enterprise Architecture: Starting Point – Served Us Well. [Diagram: OLTP, ODS, Data Warehouse and BI, with Services and Tools alongside; tiers labeled Transactional, Operational, Analytical, Access]
    • 14. Success and Limitations: Pros/Cons of our RDBMS-based Data Infrastructure. Success: • Scaled to large web events • Excellence at RDBMS’s • Strongly typed schemas • Known data • Cross system integrated data • Vendor support at a call. Limitations: • Scale up only, not out • Scalability ceiling looming • Lack of flexibility • Growing costs: Big Iron, Commercial DB Licensing • Limited to set-based • Substantial data movement • Network saturation
    • 15. Hadoop at Disney: 2009 – Hadoop as a Technology Component (stage 1). [Diagram: OLTP, ODS, Analytical and BI, with Services and Tools alongside, plus Hadoop labeled Ingest and Present; tiers labeled Transactional, Operational, Analytical, Access]
    • 16. Additional Context: Positioning our Infrastructure to the Market. 2008 Virtualization Strategy: • Aggressive • Built on YOY Success • Infrastructure Focused. 2009 Service Framework: • Java Framework • Logging Extensions • Hadoop as Technology. 2010 Cloud Platform: • Self Service Portal • Java and PHP PaaS • Hadoop Based Data Services
    • 17. Hadoop at Disney: 2010/2011 – Data Services to Enable Disney Cloud (stage 2). [Diagram: OLTP, ODS, Analytical and BI, with Services and Tools alongside, plus Disney Cloud Services Platform* and Hadoop Data Services; tiers labeled Transactional, Operational, Analytical, Access] * Hadoop not run on Disney Cloud Services
    • 18. Hadoop at Disney: 2011 – Data Management Platform (DaaS) (stage 3). [Diagram: OLTP, ODS, Analytical and BI, with Services and Tools alongside, plus Data Management Platform; tiers labeled Transactional, Operational, Analytical, Access]
    • 19. Enabling Business Value: Cost Effective Solution to Previously Cost Prohibitive. • iPhone Push Notifications • Ads Impression & Click Tracking • Audience Analysis & Segmentation • Recommendation Engine • Clickstream / Web Analytics • In-Park Traffic Flow Analysis • Park Traffic Flow Analysis & Optimization
    • 20. Financial Estimates & NPV Analysis: Is this open source software really cheaper? RDBMS Database Solutions: Hardware, Licensing, Support, Lost Opportunity. Standalone Hadoop: Hardware, Support, Training, Learning Curve. No-SQL Platform: Hardware, Support, Training, Learning Curve. [Question marks shown in place of figures]
    • 21. The Lifeblood of the Company - People: Hadoop and No-SQL Require a Different Way of Thinking. Existing Staff: • Know data / wrong language • Know languages / not data savvy • Lack of parallel data processing experience. Future Staff: • Know Data • Know languages • Know Open Source Stack • Parallel data processing experience
    • 22. Partner with Cloudera: Provide the Experience That We Had Yet to Build. Training: Developer, Administrator. Design Consulting: Central Logging, HDFS Directory, Map Reduce. Operations Support: 24x7 Support, Bugs / fixes. Collaboration: Product Advisory Councils (Technical & Executive)
    • 23. Disney Staff: Find Experience & Enable Existing Staff. Leadership - Arun Jacob: • Experience processing data at scale • Understands getting value from data • Vision plus practical delivery. Existing Disney Staff: • Busy supporting current solutions • Opportunities to engage in new thinking • Opportunities to bring their skills to the table
    • 24. Changing the Data Engine: Taking the Organization to a New Place. • Rethinking Data • Data Isolation • Strong Community
    • 25. Data Management Platform: Providing Big Data Capabilities. DMP: • Isolation - Technology / Capability • Best of Breed Technologies • RESTful APIs • Centralizing the Operations • Self Service
    • 26. Data Management Platform: Capabilities. • Ingestion • Transformation • Access • Storage • Management
    • 27. Take-aways: • Innovation Doesn’t Just Happen • Change Happens • Technology is Not Hardest Part • Meet People Half Way
    • 28. Interactive: Did this trigger any thoughts beyond “what’s for lunch?” Q&A
    • 29. THANK YOU! Please visit our websites: ABC.com, ABCNews.com, Disney.com, Family.com, ESPN.com, Go.com …and visit our resorts: Disneyland, Walt Disney World, Disneyland Paris, Disneyland Hong Kong, Aulani Resort, Shanghai Disney, Tokyo Disney
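
The Data Management Platform notes describe RESTful APIs that return data as JSON over HTTP, so that consumers are decoupled from the underlying technology and standard edge caches can be used. The talk does not show the actual API. The sketch below is only one way such a read endpoint could look, using Python's standard library. The `/datasets/<name>` path, the `DATASETS` store, the sample row, the cache lifetime, and the names `build_response`, `DataServiceHandler` and `serve` are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for whatever store sits behind the service
# (Hadoop, a key-value store, an RDBMS); callers never see which one it is.
DATASETS = {
    "page_views": [{"site": "example-site", "month": "2011-01", "views": 123}],
}


def build_response(path):
    """Map a request path like /datasets/<name> to (status, headers, body)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "datasets" and parts[1] in DATASETS:
        body = json.dumps({"dataset": parts[1], "rows": DATASETS[parts[1]]})
        headers = {
            "Content-Type": "application/json",
            # Assumed lifetime; lets a standard HTTP cache reuse the response.
            "Cache-Control": "public, max-age=60",
        }
        return 200, headers, body.encode("utf-8")
    body = json.dumps({"error": "not found"})
    return 404, {"Content-Type": "application/json"}, body.encode("utf-8")


class DataServiceHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all routing logic lives in build_response."""

    def do_GET(self):
        status, headers, body = build_response(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch locally; call serve() by hand to try it out."""
    HTTPServer(("127.0.0.1", port), DataServiceHandler).serve_forever()
```

Keeping the routing in a plain function means the store behind `DATASETS` can be swapped without touching callers, which is the isolation of technology from capability that the notes argue for.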
