eBay and Google operate some of the largest Internet sites on the planet, and each maintains its leadership through continuous innovation in infrastructure and products. While substantially different in their detailed approaches, both organizations sustain their feature velocity through a combination of organizational culture, process, and people. This session will explore how these large-scale sites do it, and will offer some concrete suggestions on how other organizations -- both large and small -- can do the same.
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE (Randy Shoup)
As a maker of real-time strategy games for web and mobile, KIXEYE's business depends on deep insights into how players play our games. By analyzing player behavior in a rich and flexible way, we are able to better target our efforts around user acquisition, game balance, player retention, and game monetization. By storing and analyzing data in standard ways, our data scientists are better able to take learnings from one game and apply them to another.
This presentation describes KIXEYE's newly-minted modern analytics infrastructure soup-to-nuts, from Kafka queues through Hadoop 2 to Hive and Redshift. It outlines our efforts around queryability, extensibility, scalability, standardization, and stability and outage recovery. It further shares our lessons learned in building, testing, operating, and enhancing this mission-critical piece of our infrastructure.
Service Architectures At Scale - QCon London 2015 (Randy Shoup)
Over time, almost all large, well-known web sites have evolved their architectures from an early monolithic application to a loosely-coupled ecosystem of polyglot microservices. While first-order goals are almost always driven by the needs of scalability and velocity, this evolution also produces second-order effects on the organization. This session will discuss modern service architectures at scale, using specific examples from both Google and eBay.
It covers some interesting -- and perhaps nonintuitive -- lessons learned in building and operating these sites. It concludes with a number of experience-based recommendations for other smaller organizations evolving to -- and sustaining -- an effective service ecosystem.
Why Enterprises Are Embracing the Cloud (Randy Shoup)
After being deeply involved in public cloud for the last several years, as both a provider and a consumer, I have been very pleasantly surprised at the rate at which large enterprises are moving to the cloud. For all the right reasons, even the most regulated and risk-averse of industries -- banking, for example -- are rapidly moving workloads out of their own data centers. Public cloud is not just for the "unicorns", but for the "horses" as well. This short vignette, presented at the GOTO Aarhus 2014 conference, tries to explain why this trend will continue and accelerate, and why we should be excited about it.
Concurrency at Scale: Evolution to Micro-Services (Randy Shoup)
Most large-scale web companies have evolved their system architecture from a monolithic application and monolithic database to a set of loosely coupled micro-services. Using examples from Google, eBay, and KIXEYE, this talk outlines the pros and cons of these different stages of evolution, and makes practical suggestions about when and how other organizations should consider migrating to micro-services. It concludes with some more advanced implications of a micro-services architecture, including SLAs, cost-allocation, and vendor-customer relationships within the organization.
DevOpsDays Silicon Valley 2014 - The Game of Operations (Randy Shoup)
Operating online games is fun and challenging. Games are some of the spikiest workloads around, and real-time really means *real-time*. Randy shares many of the DevOps techniques his team has put into practice at KIXEYE: Cloud infrastructure, Service teams, and DevOps Culture. He talks about elastic workloads, micro-services, configuration automation, and a common service "chassis". He further discusses the organizational and technical disciplines of team autonomy, internal vendor-customer relationships, and, of course, "you build it, you run it"!
What if we designed our organizations like we design our systems? Applying scalability principles that we know from building large-scale distributed systems, as well as practical lessons learned at eBay and Google, this session covers how we can design and evolve our engineering organizations to scale.
The Importance of Culture: Building and Sustaining Effective Engineering Org... (Randy Shoup)
Randy is a 25-year veteran of Silicon Valley, having led engineering organizations at eBay, Google, Oracle, and a number of other companies. Through the lens of his personal experience from hands-on engineer to architect to CTO, at organizations ranging from tiny startups to global giants, Randy will discuss several important aspects of engineering cultures, which both support and hinder the ability to innovate: hiring and retention, ownership and collaboration, quality and discipline, and learning and experimentation.
Randy will suggest some learnings about what has worked well -- and what has not -- in creating and sustaining an effective engineering culture. He will further offer some concrete suggestions on how other organizations -- both large and small -- can evolve their cultures as well.
From the Monolith to Microservices - CraftConf 2015 (Randy Shoup)
Most large-scale web companies have evolved their system architecture from a monolithic application and monolithic database to a set of loosely coupled microservices. Using examples from Google, eBay, and other large-scale sites, this talk outlines the pros and cons of these different stages of evolution, and makes practical suggestions about when and how other organizations should consider migrating to microservices. It continues with some more advanced implications of a microservices architecture, including SLAs, cost-allocation, and vendor-customer relationships within the organization. It concludes by exploring a set of common service anti-patterns.
Eric Ries at Startup Lessons Learned sllconf 2011 - Japanese Translation (Kenji Hiranabe)
Japanese translation of Eric Ries's keynote at Startup Lessons Learned (sllconf) 2011.
http://www.slideshare.net/startuplessonslearned/eric-ries-sllconf-keynote-state-of-the-lean-startup-movement
Translated by Yuki Sekiguchi and Kenji Hiranabe
Throw away the map and let's go with the help of your compass.
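Presentation material from Agile Tour Osaka 2012 ( http://bit.ly/Tm3MNc ). It introduces a journey of exploring the "why?" questions that came up while working on service development with young engineers.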
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity (Randy Shoup)
Building distributed systems that work is hard. And scaling those systems by multiple orders of magnitude is even harder. Using examples from internet-scale consumer properties like Google, Amazon, and eBay, this talk deep-dives into the counterintuitive idea that the key to success in large-scale architecture is simplicity.
We first discuss simple components like modular services, orthogonal domain logic, and service layering. Next we discuss simple interactions between components, leveraging event-driven models, immutable logs, and asynchronous dataflow. Then we explore techniques that simplify making changes to the system, including incremental changes, continuous testing, canary deployments, and feature flags.
In the final part of the talk, we show how all these ideas work together with specific architectural examples from Amazon, Netflix, and Walmart.
You will take away actionable insights you can immediately put into practice in your own systems.
Anatomy of Three Incidents -- Commonalities and Lessons (Randy Shoup)
The best response to a system outage is not "What did you do?", but "What did we learn?" This session will walk through three system-wide outages at Google, at Stitch Fix, and at WeWork—their incidents, aftermaths, and recoveries. In all cases, many things went right and a few went wrong; also in all cases, because of blameless cultures, we buckled down, learned a lot, and made substantial improvements in the systems for the future. Looking back with the perspective of 20-20 hindsight, all of these incidents were seminal events that changed the focus and trajectory of engineering at each organization. You will leave with a set of actionable suggestions in dealing with customers, engineering teams, and upper management. You will also enjoy a few war stories from the trenches.
One Terrible Day at Google, and How It Made Us Better (Randy Shoup)
In October 2012, Google App Engine had an 8-hour global outage. This session walks through the incident and the "Reliability Fixit" it inspired in its aftermath. Learn how the team came together, and over the next 6 months, reduced reliability issues by 10x. Also take away broader insights around engineering tradeoffs, managing an incident, and driving improvement.
Scaling Your Architecture for the Long Term (Randy Shoup)
This talk from the virtual 2020 CTO Summit (https://www.ctoconnection.com/summits) covers several architecture lessons to help you survive and thrive through the scaling phase of your company:
* Modular Architecture
* Event-Driven Communication
* Quality and Reliability
* Continuous Delivery
This presentation introduces the idea of a "Minimal Viable Architecture". As a company and product evolve, the architecture should evolve as well. We talk about the different phases of a product -- from the idea phase, to the starting phase, scaling phase, and optimizing phase. For each phase, we discuss the goals and constraints on the business, and we suggest an appropriate software architecture to match. Throughout the presentation, we use examples from eBay, Google, Stitch Fix, and others.
Machine learning has become an important tool in the modern software toolbox, and high-performing organizations are increasingly coming to rely on data science and machine learning as a core part of their business. eBay introduced machine learning to its commerce search ranking and drove double-digit increases in revenue. Stitch Fix built a multibillion-dollar clothing retail business in the US by combining the best of machines with the best of humans. And WeWork is bringing machine-learned approaches to the physical office environment all around the world. In all cases, algorithmic techniques started simple and slowly became more sophisticated over time. This talk will use these examples to derive an agile approach to machine learning, and will explore that approach across several different dimensions. We will set the stage by outlining the kinds of problems that are most amenable to machine-learned approaches as well as describing some important prerequisites, including investments in data quality, a robust data pipeline, and experimental discipline. Next, we will choose the right (algorithmic) tool for the right job, and suggest how to incrementally evolve the algorithmic approaches we bring to bear. Most fancy cutting-edge recommender systems in the real world, for example, started out with simple rules-based techniques or basic regression. Finally, we will integrate machine learning into the broader product development process, and see how it can help us to accelerate business results.
As the research in Accelerate and in the DevOps Handbook shows, high-performing organizations deliver more rapidly, more repeatably, and more reliably. And as an organization scales, it becomes more and more important to get the product development process right. Drawing on the speaker's experiences leading high-performing organizations at Google and eBay, this session discusses the upstream parts of that process, focusing on organization, problem definition, and prioritization. We will discuss forming small, cross-functional teams with clear areas of responsibility. Then we will discuss the importance of clearly defining the problem we are trying to solve as a team. Finally, we will cover focus and prioritization -- how we decide what to do when. You will take away actionable techniques you can apply in your own organization.
Breaking Codes, Designing Jets, and Building Teams (Randy Shoup)
Throughout engineering history, focused and empowered teams have consistently achieved the near-impossible. Alan Turing, Tommy Flowers, and their teams at Bletchley Park broke Nazi codes, saved their country, and brought down the Third Reich. Kelly Johnson and the Lockheed Skunk Works designed and built the XP-80 in 143 days, and later produced the U-2, the SR-71, and the F-22. Xerox PARC invented Smalltalk, graphical user interfaces, Ethernet, and the laser printer. What can this history teach us? Well, basically everything.
Effective teams have a purpose - a clearly defined problem which the entire team focuses on and owns end-to-end. Effective teams have an organizational culture that prioritizes collaboration and learning. And most importantly, effective teams are made up of people from diverse backgrounds and experiences.
If this sounds a lot like DevOps, or true little-a agile, that's no coincidence. But too few organizations actually practice these three-quarter-century-old ideas despite the overwhelming evidence that they work. So let's relearn those history lessons.
Scaling Your Architecture with Services and Events (Randy Shoup)
This session is a deep dive into the modern best practices around asynchronous decoupling, resilience, and scalability that allow us to implement a large-scale software system from the building blocks of events and services, based on the speaker's experiences implementing such systems at Google, eBay, and other high-performing technology organizations. We will outline the various options for handling event delivery and event ordering in a distributed system. We will cover data and persistence in an event-driven architecture. Finally, we will describe how to combine events, services, and so-called 'serverless' functions into a powerful overall architecture. You will leave with practical suggestions to help you accelerate your development velocity and drive business results.
Learning from Learnings: Anatomy of Three Incidents (Randy Shoup)
The best response to a system outage is not "What did you do?", but "What did we learn?" This session will walk through three system-wide outages at Google, at Stitch Fix, and at WeWork—their incidents, aftermaths, and recoveries. In all cases, many things went right and a few went wrong; also in all cases, because of blameless cultures, we buckled down, learned a lot, and made substantial improvements in the systems for the future. Looking back with the perspective of 20-20 hindsight, all of these incidents were seminal events that changed the focus and trajectory of engineering at each organization. You will leave with a set of actionable suggestions in dealing with customers, engineering teams, and upper management. You will also enjoy a few war stories from the trenches.
Minimum Viable Architecture - Good Enough is Good Enough (Randy Shoup)
The “right” architecture and organization depend on the size and scale of your company. The only constant is change, and what works for 5 engineers does not work for 5000. Based upon lessons from Google and eBay, learn how to evolve both technology and organization together successfully.
This presentation is based on many hard-won lessons by the speaker, who led large-scale engineering teams at Google and eBay, but also co-founded a tiny startup and tried (unsuccessfully) to apply the same techniques. This session hopes to help others avoid making the same mistakes by introducing the concept of “Minimal Viable Architecture”. It outlines the common architectural evolution of a company or project through the search, execution, and scaling phases, and discusses the appropriate technologies, disciplines, and organizational structures at each phase. You'll start with a monolith, and end up with microservices, and that's completely and entirely appropriate.
Managing Data at Scale - Microservices and Events (Randy Shoup)
An ambitious attempt at BuildStuff España 2018 to cover, in 50 minutes:
* Migrating to Microservices
* Challenges of Data in Microservices (including shared data, joins, and transactions)
* Challenges of Event-Driven Systems (including event duplication and event ordering)
How do effective large-scale service ecosystems work? Keynote Presentation at Istanbul Tech Talks 2018
How to Design Services
* Systems of record
* Interface specification
* Interface backward / forward compatibility
Service Ecosystems
* Layered services
* "Standardization" through encouragement
* Vendor-customer relationships between teams
Operating and Deploying Services
* Data Migration
* Automated Pipelines
* Incremental Deployment
* Feature Flags
Monoliths, Migrations, and Microservices (Randy Shoup)
This talk describes several common challenges of software systems at scale:
* How to break up a monolithic application or a monolithic database into microservices.
* How to approach shared data, joins, and transactions in a microservices ecosystem
Evolving Architecture and Organization - Lessons from Google and eBay (Randy Shoup)
Keynote at DevOpsDays Cuba
Successful Internet companies are built on a foundation of excellent culture, efficient organization, and solid technology. As a company needs to scale, all of these parts of the foundation need to grow and scale with it. This session covers modern best practices at innovative companies in Silicon Valley for scaling culture, organization, and technology. Driven primarily by the presenter's experience ranging from small Valley startups to Google and eBay, it discusses:
* Organizing small, fast-moving engineering teams
* Building a scalable system out of smaller microservices
* Maintaining a culture of ownership and collaboration
* Developing effective engineering processes of continuous integration and continuous delivery
Faster is Better. High-performing organizations deploy both substantially faster and substantially more reliably, and thus are 2.5x more likely to achieve business goals. This keynote covers how to move fast at large scale:
* Organizing for Speed
* What to Build and What NOT to Build
* When to Build
* How to Build
* Delivering and Operating
Keynote at Reversim Summit 2017 in Tel Aviv, Israel.
DevOps is far more about culture and organization than it is about technology and tooling. This talk will discuss the speaker's experiences leading high-performing engineering teams at Google, eBay, and Stitch Fix, and will offer suggestions for other organizations to level up their DevOps game.
https://www.meetup.com/SV-ELC/events/240087808/
Modern software-service models take advantage of the great benefits of having the same team both build the software and operate it in production -- "You Build It; You Run It" is the Amazon mantra. What does this mean in practice?
Organizationally, it means small teams with well-defined areas of responsibility, directly aligned with the business. The teams are cross-functional, meaning that each team has all the skill sets it requires to do its job, while at the same time relying on other teams for supporting services, tools, and libraries.
Process-wise, it means doubling down on practices like test-driven development and continuous delivery. Using continuous delivery practices, high-performing teams can and do release their applications and services multiple times a day. This enables them to iterate rapidly, experiment courageously, and fail more quickly.
Culturally, it means end-to-end ownership. Each team owns its software end-to-end, from design to development to deployment to retirement. The same engineers who are responsible for the features are responsible for quality, performance, operations, and maintenance. This ownership puts incentives in the right place to encourage building maintainable, observable, and operable systems from the start.
All these techniques and approaches are available to everyone, and practical examples in this talk will help other organizations on their journey.
From the DevOps Enterprise Summit 2015, this presentation covers hard-won lessons of transitioning an engineering organization to DevOps. See video at https://www.youtube.com/watch?v=6tREbJl8e_Y.
Lessons:
1. Reorganize around Ownership
2. Lose the Ticket Culture
3. Replace Approvals with Code
4. Enforce a Service Mentality
5. Charge for Usage
6. Prioritize Quality
7. Start Investing in Testing
8. Actively Manage Technical Debt
9. Share On-Call Duties
10. Make Post-Mortems Truly Blameless
DevOps is no longer just for Internet unicorns. Today many large enterprises are transitioning from the slow and siloed traditional IT approach to modern DevOps practices, and getting substantial improvements in agility, velocity, scalability, and efficiency. But this transition is not without its challenges and pitfalls, and those of us who have led this journey have the scar tissue to prove it.
A successful transition to DevOps practices ultimately involves changes to organization, to culture, and to architecture. Organizationally, we want to create multi-skilled teams with end-to-end ownership and shared on-call responsibilities. Culturally, we want to prioritize solving problems and improving the product over closing tickets. Architecturally, we want to move to an infrastructure with independently testable and deployable components.
The ten practical lessons outlined in this session synthesize the speaker’s experiences leading teams at eBay, Google, and KIXEYE, as well as from his former consulting practice.
From a talk at the SF CTO Summit 2017 (https://www.ctoconnection.com/summits/sf2017), these slides cover the speaker's experience at Stitch Fix with managing data in a microservices environment. Areas include:
* Breaking up a monolithic database into services
* Using events as a first-class part of your architecture
* Sharing data among microservices
* Handling "joins" among microservices
* Simulating "transactions" among microservices using the Saga pattern
Effective Microservices In a Data-centric World (Randy Shoup)
From a talk at GOTOChicago 2017, these slides discuss the speaker's experiences at Stitch Fix with:
* Organizational, Process, and Cultural prerequisites for being successful with Microservices: small teams, TDD / CD, DevOps
* How to handle shared data when your data is split among microservices
* How to handle "joins" across microservices
* How to simulate "transactions" across microservices
Slides link: https://gotochgo.com/3/sessions/79/slides
Video link: https://gotochgo.com/3/sessions/79/video
QCon Tokyo 2014 - Virtuous Cycles of Velocity: What I Learned About Going Fast at eBay and Google
1. Virtuous Cycles of Velocity
What I Learned About Going Fast at eBay and Google
Randy Shoup
@randyshoup
linkedin.com/in/randyshoup
2. Background
CTO at KIXEYE
• Real-time strategy games for web and mobile
Director of Engineering for Google App Engine
• World’s largest Platform-as-a-Service
Chief Engineer at eBay
• Multiple generations of eBay’s real-time search infrastructure
3. Why Are Organizations Slow?
Organizational Culture
Process
People
5. Organization: Quality over Quantity
Whole user / player experience
• Think holistically about the full end-to-end experience of the user
• UX, functionality, performance, bugs, etc.
Less is more
• Solve 100% of one problem rather than 50% of two
• Users prefer one great feature instead of two partially-completed features
6. Organization: Culture of Learning
組織:学習するカルチャー
Learn from mistakes and improve
失敗から学んで改善せよ
• What did you do -> What did you learn
何をしたか 何を学んだか
• Take emotion and personalization out of it
そこから感情や個人的な思いを掴まえろ
Encourage iteration and velocity
繰り返しとスピードを重視せよ
• “Failure is not falling down but refusing to get back up” – Theodore Roosevelt
「失敗とは倒れること自体ではなく、起き上がるのを拒むこと」 セオドア・ルーズベルト
7. Google Blame-Free Post-Mortems グーグルの責めない振り返り
Post-mortem After Every Incident
全インシデント毎に振り返りを
• Document exactly what happened
何が起こったのかを正確に文書化
• What went right
正しく進めたこと
• What went wrong
誤ってしまったこと
Open and Honest Discussion オープンで正直な議論
• What contributed to the incident?
何がそのインシデントの原因だったのか
• Engineers will compete to take responsibility (!)
エンジニアは先を争って責任を取ろうとする
8. Google Blame-Free Post-Mortems グーグルの責めない振り返り
Action Items アクション・アイテム
• How will we change process, technology, documentation, etc.
どのようにしてプロセス、技術、文書化 等を変えるのか
• How could we have automated the problems away?
どのように自動化して問題を取り除けるのか
• How could we have diagnosed more quickly?
どのようにしてもっと早く問題を見つけられるのか
• How could we have restored service more quickly?
どのようにしてもっと早くサービスを回復できるのか
Follow up (!) それらをきちんとフォローせよ
10. Organization: Service Teams
組織: サービスチーム
• Small, focused teams
目標の明確な小チーム
• Single service or set of related services
1つまたは関連する少数のサービス
• Minimal, well-defined “interface”
最小限の明確に定義された「インターフェース」
• Clear “contract” between teams
チーム間の明解な「契約」
• Functionality 機能性
• Service levels and performance サービスレベルと性能
11. Google Services グーグルサービス
• All engineering groups organized into “services”
すべてのエンジニアリングチームは「サービス群」の単位で組織化される
• Gmail, App Engine, Bigtable, etc.
• Self-sufficient and autonomous
自己充足的かつ自律的
• Layered on one another
階層化されている
Result: Very small teams achieve great things
結果 極小のチーム群が偉大な成果を成し遂げる
12. Organization: Ownership Culture
組織: 所有形態のカルチャー
• Give teams autonomy チームに自律性を
• Freedom to choose technology, methodology, working environment
技術、手法、ツール環境を選択する自由
• Responsibility for the results of those choices
これらの選択によって生じた結果に対する責任
• Hold them accountable for *results*
彼らに『結果』に関する説明責任を持たせなさい
• Give a team a goal, not a solution
各チームにはソリューションではなくゴールを与えなさい
• Let team own the best way to achieve the goal
そのゴールを達成するベストな方法は各チームに所有を任せなさい
13. KIXEYE Service Chassis
KIXEYEのサービスの胴体(シャーシ)
• Goal: Produce a “chassis” for building scalable game services
ゴール: スケールするゲームサービス構築の胴体の製造
• Minimal resources, minimal direction
最小限のリソース、最小限の指示
• 3 people x 1 month
• Consider building on open source projects
オープンソースプロジェクトとして構築することを検討
• Results
• Exceeded expectations: chassis, transport, service template, autoscaled deployment, etc.
期待を超える成果: シャーシ、トランスポート、サービス・テンプレート、自動スケールするディプロイ
• 15 minutes from no code to running service in AWS (!)
コードのない状態から15分でAWS上にサービスを運用開始できる!
• Plan to open-source several parts of this work
この成果物の一部のオープンソース化を計画中
15. Organization: Collaboration
組織: コラボレーション
• One team across engineering, product, operations, etc.
エンジニアリング、製品、運用、…を通して1チーム
• Solve problems instead of pointing fingers
問題を指摘するのではなく問題を解決する
16. Google Co-Location
グーグルのコロケーション
Multiple Organizations 複数組織
• Engineering エンジニアリング
• Product 製品
• Operations 運用
• Support サポート
• Different reporting structures to different VPs
異なる組織長への異なるリポートライン構造
Virtual Team with Single Goal 単一ゴールの仮想チーム
• All work to make Google App Engine successful
全員、Google App Engine 成功のために働く
• Coworkers are “Us”, not “Them”
コワーカーはみな「私たち」であって、「彼ら」ではない
• Never occurred to us that other organizations were not “our team”
他の組織は「我がチーム」ではない、といった問題は決して起こらない
18. Process: Experimentation
プロセス: 実験
*Engineer* successes エンジニア の成功
• Constant iteration 定常的な反復
• Launch is only the first step
ローンチは最初の1歩に過ぎない
• A|B Testing needs to be a core competence
ABテストを中核的に実施する必要性
Many small experiments sum to big wins
多くの小さな実験の積み重ねが大きな成功に繋がる
19. eBay Machine-Learned Ranking
eBayの機械学習によるランキング
Ranking function for search results サーチ結果のランキング機能
• Which item should appear 1st, 10th, 100th, 1000th
1st, 10th, 100th, 1000th番目にどのアイテムを表示すべきか
• Before: small number of hand-tuned factors
以前: 手作業でチューニングした数個の因子
• Goal: Thousands of factors
目標: 何千もの因子
Experimentation Process 実験プロセス
• Predictive models: query->view, view->purchase, etc.
予測モデル: クエリ->ビュー、ビュー->購入 など
• Hundreds of parallel A|B tests 何百もの並列ABテスト
• Full year of steady, incremental improvements
1年間の徹底した段階的改善プロセスによる安定化
Result: 2% increase in eBay revenue (!) eBay収入2%向上
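Running hundreds of A|B tests in parallel requires assigning each user to a variant consistently within an experiment and independently across experiments. A minimal sketch of one common technique, hash-based bucketing, is shown below. It is an assumption for illustration, not a description of eBay's system, and `assign_variant` and its parameters are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' for one experiment."""
    # Salting the hash with the experiment name keeps assignments in
    # different experiments independent of one another.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


# The same user always lands in the same variant of a given experiment.
variant = assign_variant("user-42", "ranking-model-v2")
```

Because the assignment is a pure function of the user and experiment identifiers, no assignment table needs to be stored or shared between servers.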
21. Process: Quality Discipline
プロセス: クオリティの原則
“Quality is a Priority-0 feature”
『クオリティは優先順位0の機能である』
Automated Tests help you go faster
自動テストは速く進むヘルプになる
• Tests have your back テストはあなたの背中を支えてくれる
• Confidence to break things, refactor mercilessly
既存物を壊して容赦なくリファクタリングする自信を与えてくれる
• Catch bugs earlier, fail faster
バグをより早くキャッチし、より速く失敗することを可能にする
Faster to run on solid ground than on quicksand
砂地よりしっかりとした地面の方が速く走れる
22. Process: Institutionalize Quality
プロセス: クオリティの制度化
Development Practices プラクティスの開発
• Code reviews コードレビュー
• Continuous Testing 継続的テスト
• Continuous Integration 継続的インテグレーション
Quality Automation クオリティのオートメーション
• Automated testing frameworks 自動テストフレームワーク
• Canary releases to production 製品の試験リリース
“Make it easy to do the right thing, and hard to do the wrong thing”
『正しいことが簡単にできるようにせよ、そして誤ったことをするのをむずかしくせよ』
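A canary release sends a small share of production traffic to the new version and compares its behavior with the current version before rolling out further. The sketch below illustrates one simple decision rule based on error rates; the function name, thresholds, and the three outcomes are assumptions, not taken from the slides.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,     # canary may have at most 1.5x the baseline error rate
    min_requests: int = 100,    # below this, there is too little data to decide
    floor_rate: float = 0.001,  # avoids a zero threshold when the baseline has no errors
) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary deployment."""
    if canary_requests < min_requests or baseline_requests < min_requests:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    allowed = max(baseline_rate, floor_rate) * max_ratio
    return "promote" if canary_rate <= allowed else "rollback"


decision = canary_decision(
    baseline_errors=10, baseline_requests=10_000,
    canary_errors=50, canary_requests=1_000,
)
```

In this usage the canary's error rate (5%) far exceeds the allowed 0.15%, so `decision` is `"rollback"`. A production rule would typically also look at latency and use a statistical test rather than a fixed ratio.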
23. Google Engineering Discipline
グーグルのエンジニアリング原則
Solid Development Practices
安定した開発プラクティス
• Code reviews before submission サブミット前にコードレビューを
• Automated tests for everything すべてに自動化テストを
• Single logical source repository 単一の論理的ソースリポジトリ
Result: Internal Open Source Model
結果: 内部的なオープンソースモデル
• Not “here is a bug report” 「はい、バグレポート」ではない
• Instead “here is the bug; here are the code changes; here is the test that verifies the changes”
そうではなく、「これはバグで、これはコード変更で、これはその変更を検証するテストです」というモデル
24. Virtuous Cycle of Quality
クオリティの好循環
Cycle: Engineering Discipline エンジニアリング原則 -> Solid Foundation 安定した基礎 -> Faster and Better より速くより良く -> Results 結果
25. Process: Manage Technical Debt プロセス: 技術的負債の管理
Make Explicit Tradeoffs 明示的トレードオフ
• Triangle: date vs. quality vs. features
トライアングル: 期限 vs 品質 vs 機能
• When you choose date and features, you implicitly choose a level of quality
期限と機能を選んだとき、品質レベルも暗黙に選んだことになる
Manage Your Debt あなたの負債の管理
• Plan for how and when you will pay it off
いつどうやって負債を返すかを計画せよ
• Maintain a sustainable level of debt
持続可能な負債のレベルを維持せよ
“Don’t have time to do it right” ? 正しくやってる時間がない?
• WRONG – Don’t have time to do it twice (!)
間違い ― 間違えることで2度もやる時間を取ってしまう方が問題!
26. Vicious Cycle of Technical Debt
技術的負債の悪循環
Cycle: Technical Debt 技術的負債 -> “No time to do it right” 『正しく行う時間ない』 -> Quick-and-dirty やっつけ仕事
27. Virtuous Cycle of Technical Investment 技術的投資の好循環
Cycle: Invest 投資 -> Solid Foundation 安定した基礎 -> Faster and Better より速くより良く -> Results 結果
29. People: Hire and Retain the Best ピープル:良き人材を雇用し続ける
Hire ‘A’ Players A人材を雇用する
• In creative disciplines, top performers are 10x more productive (!)
創造的分野の原則では、トップ人材の生産性は通常人の10倍を超える
Confidence 信頼の伝搬
• A players bring A players A人材はA人材をもたらす
• B players bring C players B人材はC人材をもたらす
30. Google Hiring グーグルの採用
Goal: Only hire top talent トップタレントのみ雇用
• False negatives are OK; false positives are not
偽陰性(良いモノの排除)はよしとする、偽陽性(悪いモノの受容)はダメ
Process プロセス
• Famously challenging interviews 有名な挑戦的インタビュー
• Very detailed interviewer feedback
非常に詳細なインタビュアーからのフィードバック
• Hiring committee decides whether to hire
採用委員会が雇用するか否かを判断
• Separately assign person to group
個別に個人をグループにアサインする
Results: Highly talented and engaged employees
結果: 才能を持った熱意のある従業員
31. People: Respect People
ピープル: 人を尊敬する
People are not interchangeable
人は交換可能でない
• Different skills, interests, capabilities
異なるスキル、関心、能力
• Create a Symphony, not a Factory
交響楽団を創るのであり、工場を作るわけではない
Most valuable and irreplaceable asset
もっとも価値があって置き換えできない資産
• Treat people with care and respect
人を気遣いと尊敬をもって遇すること
• If the company values its people, people will provide value to the company
もし会社がその人の価値を評価するなら、人はその会社に価値をもたらすだろう
32. eBay “Train Seats” 「列車席」
eBay’s development process (circa 2006) 開発プロセス
• Design and estimate project (“Train Seat” == 2 engineer-weeks) 設計と見積りプロジェクト (列車席==2エンジニア週)
• Assign engineers from common pool to implement tasks タスクの実装に共通プールからエンジニアをアサイン
• Designer does not implement; implementers do not design 設計者は実装せず、実装者は設計しない
Many Problems 多くの問題
• Engineers treated as interchangeable “cogs”
エンジニアは交換可能な歯車の歯として扱われている
• No regard for skill, interest, experience
スキル、関心、経験に対する敬意を払わない
• No pride of ownership in task implementation
タスクの実装におけるオーナーシップのプライドがない
• No long-term ownership of codebase
コードベースの長期的なオーナーシップがない
33. Vicious Cycle of People
人の問題の悪循環
Cycle: Hire ‘B’ / ‘C’ players BCクラスの人材の採用 -> Mediocre results 平凡な結果 -> People leave 人が離れていく -> Need to hire more もっと採用する必要
34. Virtuous Cycle of People
人の問題の好循環
Cycle: Hire ‘A’ Players Aクラス人材の採用 -> Treat Well 大切に扱う -> Keep and Retain ずっと残ってくれる -> Results よい成果