This is the story of why and how Hadoop was integrated into the Disney data infrastructure. Providing data infrastructure for Disney’s, ABC’s and ESPN’s Internet presences is challenging. Doing so requires cost-effective, performant, scalable and highly available solutions. Information requirements from the business add the need for these solutions to work together, providing consistent acquisition, storage and access to data. For a team burdened with a heavily laden commercial RDBMS infrastructure, Hadoop provided an opportunity to solve some challenging use cases at Disney. The deployment of Hadoop helped Disney address growing costs, scalability, and data availability. In addition, it provides our businesses with new data-driven business-to-consumer opportunities.
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt Estes, Disney
1. Advancing Disney’s Internet Data
Infrastructure with Hadoop
A Multi Year View of Hadoop at Disney
Matt Estes
Director Data Architecture
The Walt Disney Co.
2. Matt Estes
Director Data Architecture
Disney Technology Solutions & Services
Background
• Music Performance, Theory & Composition
• Management of Technology
Employment
•Washington State University
• Campfire Boys & Girls
• Disney
• Database Operations
• Platform Engineering
• Data Architecture
Industry Participation
• Member of TDWI
• Member of ODCA
• Product Advisory Councils
3. Motivation
Why Matt Estes is Here Talking to You
I believe…
• Information can provide competitive advantage
• Computing is undergoing dramatic change
• Hadoop & related technologies can help propel us forward
• We learn by telling our stories to each other
4. The Walt Disney Company
Unparalleled Entertainment Experiences
• Founded in 1923
• $38 billion total revenues 2010
• 11 theme parks at five resorts
• Cruise Lines, Vacation Club & Adventures by Disney
Brands: ABC, ESPN, Disney
5. Evolution of an Internet Division
Disney Technology Solutions & Services
Starwave (1993)
• Paul Allen funded Internet startup
• ESPN.com & ABCNews.com joint venture with Disney
Disney, Infoseek, Go.com, DIG, WDIG
• Disney purchased Starwave, traded to Infoseek
• Purchased Infoseek, transformed to portal: Go.com
• Consolidation to WDIG, added games - DIMG
DCAT
• Moving closer to the core, becomes Disney Connected and Advanced Technologies
DTSS (2011)
• Final move, integration into IT: Disney Technology Solutions and Services
6. DTSS Services
Foundation for Disney’s Digital Experiences
Brands: ABC, ESPN, Disney
• Data Services
• Core Applications
• Hosting
7. Existing Infrastructure
Understanding our Evolution Requires a Look at...
Environment & Requirements
BU Properties
Web
• Multi Tenant Shared
Services • Shared & Segmented Services
• Shared & Segmented Data
Core
Infrastructure
Stats
• 5200 Server Images
Data • 61% of servers virtualized
Services
• 1600 Databases
8. Disney’s Internet Business
Three Brands – Hundreds of Lines of Businesses
• 10-12 billion page views per month
• Peak: 42 billion ad calls in a month
• Peak Registered Users Occur
• Fantasy Football
• NCAA Tournament Challenge
• Dancing with the Stars
Brands: ABC, ESPN, Disney
9. What’s the problem with this kind of
success?
Lots of Data
Difficult to Manage & Monetize
“In any given year, we probably generate more data
than the Walt Disney Co. did in its first 80 years of
existence,” observes Bud Albers, executive vice
president and CTO of the Disney Technology Shared
Services Group. “The challenge becomes what do
you do with it all?”
10. Meeting the Challenge
What did we do about all this data?
Looked for others with this same problem
Who is benefiting from their solution?
What did we find?
11. We Found What You Found
What can we learn from Google, Yahoo, et al.
• Google’s GFS and Big Table
• 5000 node Hadoop Cluster at Facebook
• Yahoo Search Webmap 10k node single cluster
– Source: http://wiki.apache.org/hadoop/PoweredBy
• HBASE
• Cassandra
• Voldemort
• Tokyo Cabinet
• Etc…
12. Our Plan of Action
A Roadmap to Success
1. Strategy
• Design Next Gen Platform
• Test & Evangelize
2. Leadership & Growth
• Hire Key Positions
• Grow Staff
• Partner with Experience
3. Execute
• Hadoop the Technology
• Data Enabled Cloud
• DaaS (DMP)
Roadmap graphic: 1 Strategy, 2 People, 3 Execution
13. RDBMS Enterprise Architecture
Starting Point – Served Us Well
[Architecture diagram. Elements: Services, OLTP, ODS, Data Warehouse, BI Tools. Tier labels: Transactional, Operational, Analytical, Access]
14. Success and Limitations
Pros/Cons of our RDBMS-based Data Infrastructure
Success
• Scaled to large web events
• Excellence at RDBMS’s
• Strongly typed schemas
• Known data
• Cross system integrated data
• Vendor support at a call
Limitations
• Scale up only, not out
• Scalability ceiling looming
• Lack of flexibility
• Growing costs:
  • Big Iron
  • Commercial DB Licensing
• Limited to set-based
• Substantial data movement
• Network saturation
15. Hadoop at Disney
2009 – Hadoop as a Technology Component
[Architecture diagram, roadmap step 1. Elements: Services, OLTP, ODS, Analytical, BI Tools, Hadoop, Ingest, Present. Tier labels: Transactional, Operational, Analytical, Access]
16. Additional Context
Positioning our Infrastructure to the Market
2008 Virtualization Strategy
• Aggressive
• Built on YOY Success
• Infrastructure Focused
2009 Service Framework
• Java Framework
• Logging Extensions
• Hadoop as Technology
2010 Cloud Platform
• Self Service Portal
• Java and PHP PaaS
• Hadoop Based Data Services
17. Hadoop at Disney
2010/2011 – Data Services to Enable Disney Cloud
[Architecture diagram, roadmap step 2. Elements: Services, OLTP, ODS, Analytical, BI Tools, Disney Cloud Services Platform *, Hadoop Data Services. Tier labels: Transactional, Operational, Analytical, Access]
* Hadoop not run on Disney Cloud Services
18. Hadoop at Disney
2011 – Data Management Platform (DaaS)
[Architecture diagram, roadmap step 3. Elements: Services, OLTP, ODS, Analytical, BI Tools, Data Management Platform. Tier labels: Transactional, Operational, Analytical, Access]
19. Enabling Business Value
Cost Effective Solution to Previously Cost Prohibitive
iPhone Push Notifications
Ads Impression & Click Tracking
Audience Analysis & Segmentation
Recommendation Engine
Clickstream / Web Analytics
In-Park Traffic Flow Analysis
Park Traffic Flow Analysis & Optimization
20. Financial Estimates & NPV Analysis
Is this open source software really cheaper?
RDBMS Solutions
• Hardware
• Database Licensing
• Support
• Lost Opportunity ? ? ? ? ? ? ?
Standalone Hadoop
• Hardware
• Support
• Training
• Learning Curve
No-SQL Platform
• Hardware
• Support
• Training
• Learning Curve
21. The Lifeblood of the Company - People
Hadoop and No-SQL Require a Different Way of Thinking
Existing Staff
•Know data / wrong language
•Know languages / not data savvy
•Lack of parallel data processing
experience
Future Staff
• Know Data
• Know languages
• Know Open Source Stack
• Parallel data processing experience
22. Partner with Cloudera
Provide the Experience That We Had Yet to Build
Training
• Developer
• Administrator
Design Consulting
• Central Logging
• HDFS Directory
• Map Reduce
Operations Support
• 24x7 Support
• Bugs / fixes
Collaboration
• Product Advisory Councils (Technical & Executive)
23. Disney Staff
Find Experience & Enable Existing Staff
Leadership - Arun Jacob
Experience processing data at scale
Understands getting value from data
Vision plus practical delivery
Existing Disney Staff
Busy supporting current solutions
Opportunities to engage in new thinking
Opportunities to bring their skills to the table
24. Changing the Data Engine
Taking the Organization to a New Place
• Rethinking Data
• Data Isolation
• Strong Community
25. Data Management Platform
Providing Big Data Capabilities
DMP
• Isolation - Technology / Capability
• Best of Breed Technologies
• Restful APIs
• Centralizing the Operations
• Self Service
29. THANK YOU!
Please visit our websites:
• ABC.com
• ABCNews.com
• Disney.com
• Family.com
• ESPN.com
• Go.com
…and visit our resorts:
• Disneyland
• Walt Disney World
• Disneyland Paris
• Disneyland Hong Kong
• Aulani Resort
• Shanghai Disney
• Tokyo Disney
Editor's Notes
Introductions: Who am I? I am…
• High quality, engaging interactive experiences across console, online, mobile, and social network platforms to entertain and inform audiences around the globe
• No. 1-ranked community-family and parenting Web destinations
• Playdom has 47 million users
• 11 theme parks at five resorts in the United States, Europe and Asia; a top-rated family cruise line; a popular vacation-ownership program; and outstanding guided family tours to the world’s most exciting destinations
• 38 billion in revenues company-wide
• Avg 10-12 billion page views a month
• Peak of 42 billion ad calls in a month
• (Private Information – numbers won’t be disclosed) Peak Registered Users – Fantasy Football, NCAA Tournament Challenge, Dancing with the Stars
DTSS provides services for the Disney-owned brands:
• Hosting for the Disney-owned brands
• Core Applications: Customer Registration and Authentication (login); Terms of Use and Opt-ins; Surveys & Sweepstakes; Newsletters; Content Management & Publishing; Ad Serving; Campaign Management; Broadcast Email
• Data Services: Operational Data Stores; Data Warehouse; Operational Reporting Platform; Business Intelligence Platform
The infrastructure looks like this….<talk to slide contents>
Would you believe... it was a really good hire? It was. But one person will not build a No-SQL Platform alone.
Partner with Cloudera
• Training
• Consultation on Designs
• Operations Support
Train Staff
• In House Classes
• Evangelize and Grow Adoption
Make a statement about each…
In total, these began to paint a picture that non-RDBMS technologies were providing competitive advantage for very successful web companies. The question became: what can we learn from them? How can we apply that learning to Disney?
Tokyo Cabinet – KV store, surpassed by Kyoto Cabinet
What did we do about it?
• Developed an infrastructure strategy.
• Tested that strategy; specifically, tested the technologies that went into the strategy.
• Hired Key Positions, finding the talent and skills that we did not possess in house.
• Partnered with Cloudera for support, training, and consulting on cluster setup, map-reduce design, and end user programs like training and evangelism.
• Launched our DCloud effort, of which Hadoop was a key technology component.
• Honed the data services tier of DCloud, wrapping that into the Data Management Platform.
Before we get into each of these areas specifically, let’s talk about our infrastructure and why and where it fell short of our ideals.
The database-centric data architecture looks like this. This infrastructure served us well for many years. But it was beginning to show its weaknesses, and it was becoming all too clear that we needed to move beyond it. Before we go there, let’s talk about the architecture itself.
• End user applications are built within the business units; we host but otherwise do not provide architecture over their databases.
• Databases under our purview start with the OLTP databases, supporting the core applications.
• An ODS Platform provides multi-tenant infrastructure for operational data stores, for data moving in and out of the environment and for each application or each significant line of business.
• The Data Warehouse tier includes a multi-tenant database in the style of Kimball data warehouses. Data is integrated from each of the OLTP and ODS databases into this conformed dimension schema.
• BI, Reporting and light analytical tools exist for DTSS and Business Unit staff to leverage.
This is how and where Hadoop came into the Enterprise.
• (SAME) End user applications are built within the business units; we host but otherwise do not provide architecture over their databases.
• Databases under our purview start with the OLTP databases, supporting the core applications. This also includes a central logging service that applications can log messages or bulk-post files to.
• An ODS Platform provides multi-tenant infrastructure for operational data stores, for data moving in and out of the environment and for each application or each significant line of business. This now includes Hadoop as an operational sink for data. Data may flow between the ODSs and Hadoop.
• The Data Warehouse tier includes a multi-tenant database in the style of Kimball data warehouses. Data is integrated from each of the OLTP and ODS databases into this conformed dimension schema. This also includes a Hadoop location for data to be written when it has been crafted, or modified, from other data. Data may flow between Hadoop and the data warehouse.
• BI, Reporting and light analytical tools exist for DTSS and Business Unit staff to leverage.
This is the overall plan that Hadoop was a part of.
Previously Cost Prohibitive – Hadoop enables parallel processing of vast quantities of data. The reality is that we could architect, design and build solutions to do the same. However, cost effective and cost prohibitive are the key phrases. We could not afford, nor get funding, to store the quantities of data and do the types of processing on it that Hadoop enables.
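The parallel processing described in this note follows the map/reduce pattern. As a rough illustration only, here is a single-process Python sketch of that pattern applied to page-view counting, in the spirit of the clickstream use case on the slide. The log line format and all function names are assumptions invented for this example; this is not Hadoop's API, and a real job would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (url, 1) for one log line.

    Assumed (hypothetical) line format: "<timestamp> <user_id> <url>".
    Malformed lines are skipped.
    """
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one URL."""
    return key, sum(values)


def count_page_views(lines):
    """Run map, shuffle and reduce in sequence over an iterable of log lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Because each log line is mapped independently and each key is reduced independently, both steps can be run in parallel on separate machines, which is what makes the approach scale out rather than up.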
Completed a financial estimate based on an item-by-item parts list. Worked with our Business Operations and Finance Departments to complete a Net Present Value analysis. NPV is a standard method for using the time value of money to appraise long-term projects.
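For readers unfamiliar with NPV, a minimal Python sketch of the calculation follows. The discount rate and cash flows are made-up placeholders, not figures from the Disney analysis; the function name is also just illustrative.

```python
def net_present_value(rate, cash_flows):
    """Discount each period's cash flow back to today and sum the results.

    cash_flows[0] occurs at time 0 (typically the negative up-front cost);
    cash_flows[t] occurs t periods later and is divided by (1 + rate) ** t.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical example: spend 1000 now, receive 500 per year for three years,
# discounted at 10% per year.
example_npv = net_present_value(0.10, [-1000.0, 500.0, 500.0, 500.0])
```

A positive result suggests the discounted returns outweigh the up-front cost; comparing the NPV of alternative platforms over the same horizon is one way to answer the slide's question of whether one option is really cheaper.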
We had developed a strong in-house database engineering and operations skill-set.
• These people knew data.
• They were not skilled in the programming paradigms or languages of Hadoop and No-SQL.
We had a good Java engineering organization.
• These people knew the right languages.
• In all but a few, they did not know data.
• In all but a few, they did not know the programming paradigms of Hadoop and No-SQL.
We sought out one critical hire – Arun Jacob.
• A visionary architect with the programming skills to show how to do it – any of the it’s that are required.
Some looked at it as purchasing services from Cloudera. I looked at it as partnering with Cloudera.
We purchased Developer training, admin training and, while Josh was on site, training on some specific processing routines that we required.
We engaged in design consulting: HDFS physical layout, directory structures for optimal processing, and consulting on early map reduce designs.
We inked a deal to receive operations support, 24x7x365 support, and for submitting bug fixes, for those “I can’t get this to work” times and the inevitable “remember when you told us not to….. Well, we did it anyway.”
Then there were the collaboration points: the product advisory councils, both technical and executive groups.
And the occasional rant – hey, that’s not how I want that to work. I want it to work like this… Cloudera has been there to listen to how we need to do it. Like any other company, they have to balance the needs of their various stakeholders. But it is clear that they are listening and acting on the feedback we provide.
The key challenges in the org:
• Executive: data isolation. Business units not sharing their data.
• Engineering: rethinking data. Processing data with latency goals in mind. Relaxing the restrictions around transactional integrity allows for increased scalability and simplicity of the solution. CAP theorem.
• Community: honestly, the goal started with evangelizing technologies. I quickly realized that what we need to do is offer DaaS, because not everyone should have to learn the different technology stacks in order to get value from the data. Community should be focused on business value, not specific technologies.
• Isolation of technology from capability: providing data to the general consumer, providing the capability for developers to create, process, and manage that data, without having to directly couple to underlying technologies.
• Best of breed technologies: don't couple to a specific implementation. Allow for evolution.
• Providing RESTful APIs, data as JSON, over HTTP allows bindings in multiple languages and can take advantage of standard edge caching architectures.
• Centralizing the operations: these technologies require operational care and feeding; focus that care and feeding in one place in the org.
• Onboarding: registering for DaaS should be a self service operation.
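To make the "data as JSON over HTTP" idea concrete, here is a small Python sketch of a read-only resource served with the standard library. Everything in it is hypothetical: the /segments/<name> path, the SEGMENTS data, the max-age value and the port are invented for illustration and say nothing about how Disney's Data Management Platform was actually built.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for whatever store sits behind the API.
SEGMENTS = {"sports-fans": {"size": 3}}


def render_segment(name):
    """Return (status, body_bytes) for a hypothetical /segments/<name> resource."""
    record = SEGMENTS.get(name)
    if record is None:
        return 404, json.dumps({"error": "not found"}).encode("utf-8")
    return 200, json.dumps({"segment": name, **record}).encode("utf-8")


class SegmentHandler(BaseHTTPRequestHandler):
    """Serve GET /segments/<name> as JSON; the caller never sees the backing store."""

    def do_GET(self):
        prefix = "/segments/"
        if self.path.startswith(prefix):
            status, body = render_segment(self.path[len(prefix):])
        else:
            status, body = 404, json.dumps({"error": "not found"}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        # A Cache-Control header lets standard HTTP caches hold the response.
        self.send_header("Cache-Control", "max-age=60")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the sketch server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), SegmentHandler).serve_forever()
```

Calling serve() and then requesting http://127.0.0.1:8080/segments/sports-fans from any HTTP client, in any language, returns the JSON document. The client depends only on the URL and the JSON shape, so the storage technology behind render_segment could be swapped without changing consumers.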
Innovation doesn’t just happen – if you are busy operating and sustaining a product or set of products, there is little to no time available for innovating. New features may be discovered, but true creation of something different is difficult to achieve. Specific time and resources must be dedicated to evolving if you want to evolve.
Leadership may change – we started with strong Executive Sponsorship. Then one day we found ourselves leaderless. Having strong VP / Director leaders, we stayed the course. Not long after, we found ourselves with a new executive, a CIO and not a CTO; not incapable of learning this space, just that we had to take a step back and educate.
Technology is not the Hard Part – it’s usually the people that are the hard part, and change is the hardest part for people.
Plan, and Adjust your plan – you will miss something. Other things will change. The unexpected will take place.
Fill the Gaps – if your gap is staff, fill it. If your gap is operations, outsource. If your gap is experience, find someone like Cloudera. Whatever your gap is, address it. In business you put your best foot forward. Those who don’t, don’t get work. But when it comes to delivering information, the lifeblood of your company, take a hard look at what isn’t working, take a look at what is weak, and address it.