SlideShare a Scribd company logo
1 of 43
SEC302: Twitter's GCP
Architecture for Its Petabyte-
Scale Data Storage
in GCS and User Identity
Management
Vrushali Channapattan, Staff Engineer, Twitter
James Duke, Strategic Cloud Engineer, Google Cloud
● What is Partly Cloudy
● Architecture
● Project & Bucket Design
● User Identity Management
● DemiGod services
● Deployment
Outline
What is Partly Cloudy
A project to extend Data Processing at Twitter
from an on-premises only model
to a hybrid on-premises and Cloud model
Why Partly Cloudy
- A long term desire to have some cloud presence
- Right strategy will balance developer agility, capabilities, and
cost.
- Provides a convenient way to test Hadoop changes at scale
- A broader geographical footprint for locality and business continuity
- Access to other Google offerings such as BigQuery, CloudML, Cloud
DataFlow etc
Design principles
Authentication
Strong authentication
for all user and service
access to data
Authorization
Explicit authorization
for all user and service
access to data
Audit
Ability to easily
determine who
performed what
actions on the data
Before Partly Cloudy
Partly Cloudy
Partly Cloudy Resource Hierarchy
Organization and Project
structure
Partly Cloudy Resource Hierarchy
TWITTER Org
Partly Cloudy Resource Hierarchy
TWITTER Org
DATA INFRA
Folder
Partly Cloudy Resource Hierarchy
TWITTER Org
DATA INFRA
Folder
twitter-
product
twitter-revenue
twitter-infraeng GCP
Projects
What do these Projects contain?
twitter-project
Project
Dataset bucket
Cloud Storage
User Bucket
Cloud Storage
Google Cloud Storage
Connector for Hadoop
Google Cloud Storage
Connector for Hadoop
Nest
nest-compute@project-
name.iam.gserviceacc
ount.com
Name Nodes
nn-per-cluster-
compute@project-
name.iam.gserviceacc
ount.com
Worker Node(s)
wn-per-cluster-
compute@project-
name.iam.gserviceaccou
nt.com
Resource Manager
rm-per-cluster-
compute@project-
name.iam.gserviceacc
ount.com
Task
ViewFS filesystem layer
ViewFS filesystem layer
Shadow account based access
User account based access
User account based access
Scratch bucket
Cloud Storage
Scrubbed bucket
Cloud Storage
Partly Cloudy Resource Hierarchy
Storage in the Cloud
GCS
On-premises
path
/dc1/cluster1/user/
helen/some/path/par
t-001.lzo
Logical Cloud
path
/gcs/user/helen/
some/path/part-
001.lzo
GCS bucket path
gs://user.helen.dp.
twitter.domain/some
/path/part-001.lzo
Partly Cloudy Resource Hierarchy
User Management
Users & Accounts
Key Management
- A new key is generated every N days
- Each key is valid for 2N + N days
- Keys are distributed to compute nodes by Twitter’s key
distribution service
- The shadow account key is readable only by that user
- Key management & distribution is transparent to the user
Partly Cloudy Resource Hierarchy
DATA INFRA
twitter-[org]
twitter-
employee-users
What
do the
Data Processing
Users
at Twitter get
What
do the
Data Processing
Users
at Twitter get
❏ A shadow account to access GCS
❏ A GCS bucket for their data
❏ Access to a Twitter managed
Hadoop cluster in GCP
❏ Access to a Twitter managed
Presto cluster in GCP
❏ To work with us to leverage other
Google offerings (such as BigQuery,
Cloud DataProc & Cloud DataFlow)
Who configures
GCP for the
Data Processing Users
at Twitter
Who configures
GCP for the
Data Processing Users
at Twitter
DemiGod
Services
Partly Cloudy Resource Hierarchy
DemiGod Services
What are DemiGod services
Demigod is a group of service(s) that are responsible for
configuring GCP for Twitter’s Data Platform.
They run in GCP.
Salient features of DemiGods
- Run asynchronously of each other.
- Run with exactly-scoped, privileged google service accounts
- Idempotent runs
- Puppet-like functionality. Will override any manual changes
- Modular in design
- Each kept as simple as possible
Partly Cloudy
Types of DemiGods
Bucket Creation
❏ Creates buckets
❏ Twitter domain
❏ One DemiGod per pillar twitter
project
❏ Inputs
❏ Configurable prefixes
❏ LDAP input
❏ YAML input
Shadow Account
Management
❏ Creates shadow accounts
❏ Google service accounts
❏ One DemiGod
❏ Inputs
❏ Configurable pattern
❏ LDAP input
❏ YAML input
Policy
Management
❏ Creates IAM policies
❏ One DemiGod per pillar project
❏ Inputs
❏ LDAP input
❏ YAML input
❏ Google groups
❏ Ignore list
Key LifeCycle
Management
❏ Creates keys with expiration
❏ Manges lifecycle of every N
days
❏ Adds them to Key Store
❏ Inputs
❏ destination for keys
❏ LDAP input
❏ Shadow account
Partly Cloudy
Deployment of DemiGods
Deployment considerations
- Demigods will run on GCE with the VM running a demigod service
account
- Demigod service accounts will be created in partly-cloudy-admin
project that has limited ssh access
- Demigod processes will run as a kerberized headless twitter user
- Demigod Key Creation Service shall NOT write service account
keys to disk. It will store in memory until written to Secret Store.
Partly Cloudy
DemiGods execution flow
What happens when
a user joins
an ldap pillar group?
Demigod ⇔ twitter user interaction
What happens when
a user joins
an ldap pillar group?
Demigod ⇔ twitter user interaction
❏ A shadow account is created
❏ added to google group
❏ A GCS user bucket is created
❏ Scratch bucket
❏ Keys are generated
❏ Added to Secrets Store
❏ Keys are distributed thereby
enabling access to a Twitter
managed Hadoop cluster in GCP &
Presto cluster in GCP
What happens when
a new dataset is added?
Demigod ⇔ twitter dataset interaction
What happens when
a new dataset is added?
Demigod ⇔ twitter dataset interaction
❏ Dataset info is replicated to a YAML
file in a GCS config bucket
❏ A GCS dataset bucket is created
❏ Scratch, Scrubbed, Scratch-scrubbed
bucket also created
❏ Access privileges are granted
❏ Owner - read on orig dataset , r/w on
scratch & scrubbed
❏ Reader group: read on dataset, scrubbed
Thank you!
We are hiring
https://careers.twitter.com https://careers.google.com/cloud/
Your Feedback is Greatly Appreciated!
Complete the
session survey
in mobile app
1-5 star rating
system
Open field for
comments
Rate icon in
status bar
Appendix
Google Cloud Twitter
https://cloud.google.com/twitter/

More Related Content

What's hot

GCP Gaming 2016 Seoul, Korea Firebase
GCP Gaming 2016 Seoul, Korea FirebaseGCP Gaming 2016 Seoul, Korea Firebase
GCP Gaming 2016 Seoul, Korea FirebaseChris Jang
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...tdc-globalcode
 
Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Simon Su
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Ido Green
 
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic TrainingGCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic TrainingSimon Su
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Alluxio, Inc.
 
Google compute engine - overview
Google compute engine - overviewGoogle compute engine - overview
Google compute engine - overviewCharles Fan
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud PlatformOpsta
 
Google Compute Engine Starter Guide
Google Compute Engine Starter GuideGoogle Compute Engine Starter Guide
Google Compute Engine Starter GuideSimon Su
 
Introduction to Google's Cloud Technologies
Introduction to Google's Cloud TechnologiesIntroduction to Google's Cloud Technologies
Introduction to Google's Cloud TechnologiesChris Schalk
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineCsaba Toth
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolatsliwowicz
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data UnlimitedAudrey Huvet
 
Introduction to Google Cloud
Introduction to Google CloudIntroduction to Google Cloud
Introduction to Google CloudDSC IEM
 
Google Cloud Networking Deep Dive
Google Cloud Networking Deep DiveGoogle Cloud Networking Deep Dive
Google Cloud Networking Deep DiveMichelle Holley
 
Anthos Security: modernize your security posture for cloud native applications
Anthos Security: modernize your security posture for cloud native applicationsAnthos Security: modernize your security posture for cloud native applications
Anthos Security: modernize your security posture for cloud native applicationsGreg Castle
 
Shakr - Container CI/CD with Google Cloud Platform
Shakr - Container CI/CD with Google Cloud PlatformShakr - Container CI/CD with Google Cloud Platform
Shakr - Container CI/CD with Google Cloud PlatformMinku Lee
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platformrajdeep
 

What's hot (20)

Gdsc muk - innocent
Gdsc   muk - innocentGdsc   muk - innocent
Gdsc muk - innocent
 
GCP Gaming 2016 Seoul, Korea Firebase
GCP Gaming 2016 Seoul, Korea FirebaseGCP Gaming 2016 Seoul, Korea Firebase
GCP Gaming 2016 Seoul, Korea Firebase
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
 
Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
 
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic TrainingGCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Google compute engine - overview
Google compute engine - overviewGoogle compute engine - overview
Google compute engine - overview
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
Google Compute Engine Starter Guide
Google Compute Engine Starter GuideGoogle Compute Engine Starter Guide
Google Compute Engine Starter Guide
 
Introduction to Google's Cloud Technologies
Introduction to Google's Cloud TechnologiesIntroduction to Google's Cloud Technologies
Introduction to Google's Cloud Technologies
 
Google Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App EngineGoogle Cloud Platform, Compute Engine, and App Engine
Google Cloud Platform, Compute Engine, and App Engine
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboola
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited
 
L2 3.fa19
L2 3.fa19L2 3.fa19
L2 3.fa19
 
Introduction to Google Cloud
Introduction to Google CloudIntroduction to Google Cloud
Introduction to Google Cloud
 
Google Cloud Networking Deep Dive
Google Cloud Networking Deep DiveGoogle Cloud Networking Deep Dive
Google Cloud Networking Deep Dive
 
Anthos Security: modernize your security posture for cloud native applications
Anthos Security: modernize your security posture for cloud native applicationsAnthos Security: modernize your security posture for cloud native applications
Anthos Security: modernize your security posture for cloud native applications
 
Shakr - Container CI/CD with Google Cloud Platform
Shakr - Container CI/CD with Google Cloud PlatformShakr - Container CI/CD with Google Cloud Platform
Shakr - Container CI/CD with Google Cloud Platform
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platform
 

Similar to SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs and user identity management

Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud lohitvijayarenu
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Getting started with GCP ( Google Cloud Platform)
Getting started with GCP ( Google  Cloud Platform)Getting started with GCP ( Google  Cloud Platform)
Getting started with GCP ( Google Cloud Platform)bigdata trunk
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...HostedbyConfluent
 
Challenges In Modern Application
Challenges In Modern ApplicationChallenges In Modern Application
Challenges In Modern ApplicationRahul Kumar Gupta
 
Introduction to GCP
Introduction to GCPIntroduction to GCP
Introduction to GCPKnoldus Inc.
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...HostedbyConfluent
 
Session 4 GCCP.pptx
Session 4 GCCP.pptxSession 4 GCCP.pptx
Session 4 GCCP.pptxDSCIITPatna
 
Informatica big data relational topics and presentation
Informatica big data relational topics and presentationInformatica big data relational topics and presentation
Informatica big data relational topics and presentationJanardhan Reddy
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud PlatformPradeep Bhadani
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreDataStax Academy
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
Serverless and Design Patterns In GCP
Serverless and Design Patterns In GCPServerless and Design Patterns In GCP
Serverless and Design Patterns In GCPOliver Fierro
 
Google Cloud Platform
Google Cloud PlatformGoogle Cloud Platform
Google Cloud PlatformGeneXus
 
Getting more into GCP.pdf
Getting more into GCP.pdfGetting more into GCP.pdf
Getting more into GCP.pdfKnoldus Inc.
 
Developing Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/KubernetesDeveloping Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/KubernetesChakradhar Rao Jonagam
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyLeonid Nekhymchuk
 

Similar to SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs and user identity management (20)

Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
GCP-pde.pdf
GCP-pde.pdfGCP-pde.pdf
GCP-pde.pdf
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Getting started with GCP ( Google Cloud Platform)
Getting started with GCP ( Google  Cloud Platform)Getting started with GCP ( Google  Cloud Platform)
Getting started with GCP ( Google Cloud Platform)
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Challenges In Modern Application
Challenges In Modern ApplicationChallenges In Modern Application
Challenges In Modern Application
 
Introduction to GCP
Introduction to GCPIntroduction to GCP
Introduction to GCP
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...
 
Session 4 GCCP.pptx
Session 4 GCCP.pptxSession 4 GCCP.pptx
Session 4 GCCP.pptx
 
Informatica big data relational topics and presentation
Informatica big data relational topics and presentationInformatica big data relational topics and presentation
Informatica big data relational topics and presentation
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
Amazon cloud service
Amazon cloud serviceAmazon cloud service
Amazon cloud service
 
Serverless and Design Patterns In GCP
Serverless and Design Patterns In GCPServerless and Design Patterns In GCP
Serverless and Design Patterns In GCP
 
Google Cloud Platform
Google Cloud PlatformGoogle Cloud Platform
Google Cloud Platform
 
Getting more into GCP.pdf
Getting more into GCP.pdfGetting more into GCP.pdf
Getting more into GCP.pdf
 
Developing Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/KubernetesDeveloping Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/Kubernetes
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs and user identity management

  • 1. SEC302: Twitter's GCP Architecture for Its Petabyte- Scale Data Storage in GCS and User Identity Management Vrushali Channapattan, Staff Engineer, Twitter James Duke, Strategic Cloud Engineer, Google Cloud
  • 2. ● What is Partly Cloudy ● Architecture ● Project & Bucket Design ● User Identity Management ● DemiGod services ● Deployment Outline
  • 3. What is Partly Cloudy A project to extend Data Processing at Twitter from an on-premises only model to a hybrid on-premises and Cloud model
  • 4. Why Partly Cloudy - A long term desire to have some cloud presence - Right strategy will balance developer agility, capabilities, and cost. - Provides a convenient way to test Hadoop changes at scale - A broader geographical footprint for locality and business continuity - Access to other Google offerings such as BigQuery, CloudML, Cloud DataFlow etc
  • 5. Design principles Authentication Strong authentication for all user and service access to data Authorization Explicit authorization for all user and service access to data Audit Ability to easily determine who performed what actions on the data
  • 8. Partly Cloudy Resource Hierarchy Organization and Project structure
  • 9. Partly Cloudy Resource Hierarchy TWITTER Org
  • 10. Partly Cloudy Resource Hierarchy TWITTER Org DATA INFRA Folder
  • 11. Partly Cloudy Resource Hierarchy TWITTER Org DATA INFRA Folder twitter- product twitter-revenue twitter-infraeng GCP Projects
  • 12. What do these Projects contain? twitter-project
  • 13. Project Dataset bucket Cloud Storage User Bucket Cloud Storage Google Cloud Storage Connector for Hadoop Google Cloud Storage Connector for Hadoop Nest nest-compute@project- name.iam.gserviceacc ount.com Name Nodes nn-per-cluster- compute@project- name.iam.gserviceacc ount.com Worker Node(s) wn-per-cluster- compute@project- name.iam.gserviceaccou nt.com Resource Manager rm-per-cluster- compute@project- name.iam.gserviceacc ount.com Task ViewFS filesystem layer ViewFS filesystem layer Shadow account based access User account based access User account based access Scratch bucket Cloud Storage Scrubbed bucket Cloud Storage
  • 14. Partly Cloudy Resource Hierarchy Storage in the Cloud
  • 16. Partly Cloudy Resource Hierarchy User Management
  • 18. Key Management - A new key is generated every N days - Each key is valid for 2N + N days - Keys are distributed to compute nodes by Twitter’s key distribution service - The shadow account key is readable only by that user - Key management & distribution is transparent to the user
  • 19. Partly Cloudy Resource Hierarchy DATA INFRA twitter-[org] twitter- employee-users
  • 21. What do the Data Processing Users at Twitter get ❏ A shadow account to access GCS ❏ A GCS bucket for their data ❏ Access to a Twitter managed Hadoop cluster in GCP ❏ Access to a Twitter managed Presto cluster in GCP ❏ To work with us to leverage other Google offerings (such as BigQuery, Cloud DataProc & Cloud DataFlow)
  • 22. Who configures GCP for the Data Processing Users at Twitter
  • 23. Who configures GCP for the Data Processing Users at Twitter DemiGod Services
  • 24. Partly Cloudy Resource Hierarchy DemiGod Services
  • 25. What are DemiGod services Demigod is a group of service(s) that are responsible for configuring GCP for Twitter’s Data Platform. They run in GCP.
  • 26. Salient features of DemiGods - Run asynchronously of each other. - Run with exactly-scoped, privileged google service accounts - Idempotent runs - Puppet-like functionality. Will override any manual changes - Modular in design - Each kept as simple as possible
  • 28. Bucket Creation ❏ Creates buckets ❏ Twitter domain ❏ One DemiGod per pillar twitter project ❏ Inputs ❏ Configurable prefixes ❏ LDAP input ❏ YAML input
  • 29. Shadow Account Management ❏ Creates shadow accounts ❏ Google service accounts ❏ One DemiGod ❏ Inputs ❏ Configurable pattern ❏ LDAP input ❏ YAML input
  • 30. Policy Management ❏ Creates IAM policies ❏ One DemiGod per pillar project ❏ Inputs ❏ LDAP input ❏ YAML input ❏ Google groups ❏ Ignore list
  • 31. Key LifeCycle Management ❏ Creates keys with expiration ❏ Manges lifecycle of every N days ❏ Adds them to Key Store ❏ Inputs ❏ destination for keys ❏ LDAP input ❏ Shadow account
  • 33.
  • 34. Deployment considerations - Demigods will run on GCE with the VM running a demigod service account - Demigod service accounts will be created in partly-cloudy-admin project that has limited ssh access - Demigod processes will run as a kerberized headless twitter user - Demigod Key Creation Service shall NOT write service account keys to disk. It will store in memory until written to Secret Store.
  • 36. What happens when a user joins an ldap pillar group? Demigod ⇔ twitter user interaction
  • 37. What happens when a user joins an ldap pillar group? Demigod ⇔ twitter user interaction ❏ A shadow account is created ❏ added to google group ❏ A GCS user bucket is created ❏ Scratch bucket ❏ Keys are generated ❏ Added to Secrets Store ❏ Keys are distributed thereby enabling access to a Twitter managed Hadoop cluster in GCP & Presto cluster in GCP
  • 38. What happens when a new dataset is added? Demigod ⇔ twitter dataset interaction
  • 39. What happens when a new dataset is added? Demigod ⇔ twitter dataset interaction ❏ Dataset info is replicated to a YAML file in a GCS config bucket ❏ A GCS dataset bucket is created ❏ Scratch, Scrubbed, Scratch-scrubbed bucket also created ❏ Access privileges are granted ❏ Owner - read on orig dataset , r/w on scratch & scrubbed ❏ Reader group: read on dataset, scrubbed
  • 40.
  • 41. Thank you! We are hiring https://careers.twitter.com https://careers.google.com/cloud/
  • 42. Your Feedback is Greatly Appreciated! Complete the session survey in mobile app 1-5 star rating system Open field for comments Rate icon in status bar