SlideShare a Scribd company logo
1 of 200
Download to read offline
GCP
Google Cloud
Professional Data
Engineer
© ANKIT MISTRY – GOOGLE CLOUD
Google Certified
Professional Data Engineer
© ANKIT MISTRY – GOOGLE CLOUD
Professional Data Engineer
© ANKIT MISTRY – GOOGLE CLOUD
 Pay attention for 5 minutes, before we dive in.
 Challenging certification, and course is long so have patience.
 Advance professional certification, Expect basics Associate level GCP knowledge
 Learn by Doing
 So with every exam objective, There is hand-on Lab – 80+
© ANKIT MISTRY – GOOGLE CLOUD
GCP certifications
https://cloud.google.com/certification/guides/data-engineer
Cloud Cost for this course
© ANKIT MISTRY – GOOGLE CLOUD
 $0 – for GCP account
 GCP Free trial
 $300 for next 3 months https://cloud.google.com/free
 Length: Two hours
 Registration fee: $200 (plus tax where applicable)
 Languages: English, Japanese
 Exam format: Multiple choice and multiple select,
Data Engineering
Concepts
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Data engineering Overview
© ANKIT MISTRY – GOOGLE CLOUD
 Data Pipelines
 How Data Flows
Ingestion Storage
Process and
analyze
Explore and
visualize
© ANKIT MISTRY – GOOGLE CLOUD
Ingestion Storage
Process
and analyze
Explore and
visualize
Ingest
© ANKIT MISTRY – GOOGLE CLOUD
 Gather Data from multiple sources
 Data gather from App
 Event Log, Click stream Data, e-
commerce Transaction
 Streaming Ingest
 PubSub
 Batch Ingest
 Different Transfer services
 GCS - gsutil
Store
© ANKIT MISTRY – GOOGLE CLOUD
 Cost efficient & durable data storage
https://cloud.google.com/architecture/data-lifecycle-cloud-platform#store
Process & Analyze
© ANKIT MISTRY – GOOGLE CLOUD
 What kind of outcome you want
 What analysis you want to perform
 Convert Data into meaning
 Analyze Data with BigQuery
 Apply ML with
 BigQuery ML
 Spark ML with DataProc
 Vertex API
 Build ML Model with Auto ML/Custom Model
https://cloud.google.com/architecture/data-lifecycle-cloud-platform
Explore and visualize
© ANKIT MISTRY – GOOGLE CLOUD
 Google Data Studio – Easy to use BI Engine
 Dashboard & Visualization
 Datalab
 Interactive Jupyter Notebook
 Support for all Data Science Library
 ML Prebuilt API
 Vision API
 Speech API
Types of Data - Structure
© ANKIT MISTRY – GOOGLE CLOUD
Structured
Semi-
Structured
Unstructured
Structured Data
© ANKIT MISTRY – GOOGLE CLOUD
 Tabular Data
 Represented by Rows & Columns
 SQL can be used to interact with data
 Fixed Schema
 Each row has same number of columns
 Relational Database are structured
 MySQL, Oracle SQL, PostgreSQL , MSSQL
 In GCP, Cloud SQL, Cloud Spanner
Book_id Book_name Author_id
100 C 1
101 Java 1
102 Python 2
Author_id Author_name
1 John
2 Alice
Semi-Structured Data
© ANKIT MISTRY – GOOGLE CLOUD
 Each Record has variable number of Properties
 No Fixed Schema
 Flexible structure
 NoSQL kind of Data
 Store data as key-value pair
 JSON – Java Script object Notation are base way to
represent semi structure data
 MongoDB, Cassandra, Redis, Neo4j
 In GCP, BigTable, DataStore, memoryStore
Doc #1{
“studentID” : 100,
“name” : “john”,
“score” : 78,
“country” : “US”
},
Doc #2{
“studentID” : 101,
“name” : “Alice”,
“rank” : 7,
},
UnStructured Data
© ANKIT MISTRY – GOOGLE CLOUD
 No Pre define Structure in Data
 Image
 video data,
 natural Language are example of unstructured data
 Google Cloud Storage, File store inside GCP to store Unstructure data
Batch Data vs Streaming Data
© ANKIT MISTRY – GOOGLE CLOUD
 Batch Data Processing
 Defined Start & End of data – data size is known
 Processing High volume of data after certain periodic interval
 Long time to process data
 Payment processing
 Streaming Data
 Unbounded, No End defined
 Data is processed as it arrives
 Size is unknown
 No much heavy processing – take millisecond - seconds to process data
 Stock data processing
GCP Fundamental
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
GCP Regions & Zones
© ANKIT MISTRY – GOOGLE CLOUD
Why Zones & Regions
© ANKIT MISTRY – GOOGLE CLOUD
Singapore
Zone-a
Singapore
Zone-b
US West
Zone-a
US West
Zone-b
 Low latency
 Follow Government rules
 High availability
 Disaster recovery
GCP (Zones & Region)
 Zones – Independent data Center
 Region – Geographical area
 Multi-region : Collection of Geographical
 Global - Anywhere
© ANKIT MISTRY – GOOGLE CLOUD
Global Locations - Regions & Zones | Google Cloud
Fascinating Number: Google Is Now 40% Of The Internet (forbes.com)
Global
Multi-regions
Regions-1
Zone-a
Zone-b
Zone-c
Regions-2
Zone-a
Zone-b
Zone-c
Create GCP Free Tier
Account
© ANKIT MISTRY – GOOGLE CLOUD
GCP Services
© ANKIT MISTRY – GOOGLE CLOUD
Storage &
Database
• Cloud Storage, File Store
• SQL, Spanner, Big Query, Big table
Data
processing
• Data Proc, Data Flow
• Data Prep
• Data Catalog
• Big Query, PubSub
• Composer, Data Fusion
ML/AI
• Vertex AI
• Prebuilt API
• Custom Model
• Auto ML
Secondary
Services
• Compute Engine
• Kubernetes
• App Engine – Cloud Run
GCP Basic Services
© ANKIT MISTRY – GOOGLE CLOUD
Basic Infrastructure services
in GCP
© ANKIT MISTRY – GOOGLE CLOUD
IAM
Compute
Engine
Kubernetes App
Engine
Cloud
Function
VPC
Cloud
Run
IAM
© ANKIT MISTRY – GOOGLE CLOUD
 Identity & access management
 Who can do What on Which resources
 Who - Identity
 What - Action : Create, Update, Delete
 Which – Resources, Compute Engine, App Engine, Cloud Storage
 Roles : Collections of Permissions
 Built-in Roles
 Custom Role
 Service Account
IAM - Identity
© ANKIT MISTRY – GOOGLE CLOUD
Identity
Google Account Google Groups
Cloud identity
Account
Google
Workspace
Account
Service
Account
IAM – Roles & Permission
© ANKIT MISTRY – GOOGLE CLOUD
Roles
Primitive
Owner
Editor
Viewer
Pre-defined
Compute
Admin
Storage
Viewer
Custom
Combined
Collection of
Permission
 Roles are collection of permissions
 One can assign Role to identity, but Can
not assign permission directly.
Assign Roles to identity
© ANKIT MISTRY – GOOGLE CLOUD
Service Account
© ANKIT MISTRY – GOOGLE CLOUD
 For non human – like for Apps, services
 Service Account is identity for Compute engine
 Service account keys can be used for authentication
 Max 10 keys per Service Account
 Max 100 Service Account per project
 Let’s Explore Service Account
Provision Virtual machine
© ANKIT MISTRY – GOOGLE CLOUD
 Basic Building block of any Cloud
 Compute Engine
 IAAS – Full Control, more flexibility, more responsibility
 Important parameter :
 Zone
 Service Account
 Machine family – CPU, RAM
 Boot Disk
 Storage
 Virtual Private Cloud
App Engine Deployment
© ANKIT MISTRY – GOOGLE CLOUD
 PAAS Solution
 No Server management
 Deploy HTTP based application
 Focus on Code
 Standard & Flexible mode
 Support many runtime engine
 Python, java, Go, Node JS
 Let’s Deploy Hello World NodeJS App
GKE – kubernetes Engine
© ANKIT MISTRY – GOOGLE CLOUD
 Let’s say
 you want to create 100’s of container to scale your app
 need some automate approach which fully manage all container
lifecycle
 Kubernetes is the solution for it
 Open source
 Orchestration system for containerized application.
 Open Source – Developed by Google , Launched in 2014 –
Kubernetes
 In 2015, Google Launched cloud version - GKE
 Cloud Agnostics
 Written in Go language
 Let’s Deploy image to GKE
1. Create Cluster
2. Deploy Workload (Container image)
3. Expose Outside World
Google Kubernetes Engine
© ANKIT MISTRY – GOOGLE CLOUD
 Orchestration system for containerized application.
 Open Source – Developed by Google
 Launched in 2014 – Kubernetes
 In 2015, Google Launched cloud version - GKE
 Cloud Agnostics
 Written in Go language
 Let’s deploy App on Kubernetes
Deploy Containerized app on
Google Kubernetes Engine -
GKE (CAAS)
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud Run
© ANKIT MISTRY – GOOGLE CLOUD
 Next Generation App Engine
 Deploy Containerized Workload
 No Server management
 Auto-scaling as Traffic grows
 Let’s See in action
Google Cloud Function
© ANKIT MISTRY – GOOGLE CLOUD
 Server less
 Fully managed
 Build small micro service
 Auto scaling as traffic increase
 Event based trigger
 Http
 File upload etc.
 Message pushed to pub/sub
 Let’s See in action
Different storage product
(GCP)
© ANKIT MISTRY – GOOGLE CLOUD
© ANKIT MISTRY – GOOGLE CLOUD
Structured
Transactional
Cloud SQL
Cloud Spanner
Analytical
Big Query
Semi-
Structured
Big Table
(Row Key)
Data store
Fire store
Different storage product
© ANKIT MISTRY – GOOGLE CLOUD
Unstructured
Cloud Storage
Block Storage
Persistence Disk Local SSD
In-memory DB
Memory Store
Different storage product
Google Cloud Storage
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud Storage
© ANKIT MISTRY – GOOGLE CLOUD
 Object storage solution in GCP
 Unstructured Data storage
 Image
 Video
 Binary File, etc…
 Cloud storage can be used for long term archival storage
 Can be access object over http, Rest API
 No capacity planning required
 Scale to Exabyte
 Unlimited data can be stored
 By Default Data is encrypted at rest
 In transit also by default encryption.
Google Cloud Storage
© ANKIT MISTRY – GOOGLE CLOUD
 No minimum Object Size
 Max object size is 5 TB
 High Durability – 99.999999999% annual
 Object can be Globally access
 Single API to access across multiple storage class
 Data is geo - redundant
 Due to Multiregional
 Dual-Region storage
Object Organization
© ANKIT MISTRY – GOOGLE CLOUD
Cloud
Storage
Buckets
Folders
Object
 Global unique name for bucket
 Example access URL :
 https://storage.cloud.google.com/[bucket]/[objectname]
 Bucket name appear in URL
 So, be careful while naming bucket
 Does not store anything like file system
 Folder are virtual
 Bucket level lock with data retention policy
 Object are immutable
 Object can be versioned
Storage Location
© ANKIT MISTRY – GOOGLE CLOUD
Region
• Lowest latency within a
single region
• Replicated data across
multiple zone in single
region
Dual-region
• High availability and low
latency across 2 regions
(Paired region)
• Auto-failover
Multi-region
• Highest availability across
continent area – US, EU,
Asia
• Auto-failover
Storage Class
© ANKIT MISTRY – GOOGLE CLOUD
Standard
• Good for Hot data
• High frequency access
• Storage Costliest
• Access cost is very low
• Low latency
• SLA :
• 99.95% Multi/Dual
• 99.9% Regional
Near line
• Low Frequency access
Once in a 30 days
• Storage is Cheaper
than standard
• Access cost will
increase
• Back up
• SLA : 99.9% Multi/Dual
• 99.0% for Regional
Cold line
• Very low frequency to
access
• Once in 90 days
• Storage is Cheaper
than Near line
• SLA :
• 99.9% Multi/Dual
• 99.0% for Regional
Archive
• Offline data
• Backup
• Data access is once in
year
• Storage Cheapest
• Access cost very high
• No SLA
 How frequently access data
 How much amount of data
[Hands-on]
Google Cloud Storage
© ANKIT MISTRY – GOOGLE CLOUD
Object Lifecycle management
© ANKIT MISTRY – GOOGLE CLOUD
 Based on condition what action needs to perform on object.
 Condition
 Object age
 Object file type
 after some specific date
 Action
 Transition to different storage class for high performance
 Like – Standard to Nearline
 Coldline to Delete
Secure Data with Encryption
© ANKIT MISTRY – GOOGLE CLOUD
 Encryption
 Google managed Encryption keys
 No Configuration
 Fully managed
 Customer managed Encryption keys
 Create keyring in Cloud KMS
 key will be managed by customer. Like Key rotation
 Customer supplied Encryption keys
 We will generate Key with : openssl rand =base64 32
 gsutil – encrypt with CSEK
Object Versioning
© ANKIT MISTRY – GOOGLE CLOUD
 Help to prevent accidental deletion of object
 Enable/Disable versioning at bucket level
 Get access to older version with (object key + version number)
 If you don’t need earlier version, delete it & reduce storage cost
 If you don’t specify version number, always retrieve latest version
 Let’s see in action
Controlling access
© ANKIT MISTRY – GOOGLE CLOUD
 Who can do what on GCS at what level
 Permissions
 Apply at Bucket level
 Uniform level access
 No Object level permission
 Apply uniform at all object inside bucket
 Fine grained permission
 Access Control List – ACL For Each object Separately
 Apply Project level
 IAM
 Different Predefined Role
 Storage Admin
 Storage Object Admin
 Storage Object Creator
 Storage Object Viewer
 Create Custom Role
 Assign Bucket level Role
 Select bucket & assign role
 To user
 To other GCP services or product
Bucket Retention Policy
© ANKIT MISTRY – GOOGLE CLOUD
 Minimum duration for which bucket will be protected from
 Deletion
 modification
 Let’s see How to Configure it.
Signed URL
© ANKIT MISTRY – GOOGLE CLOUD
 Temporary access
 you can give access to user who doesn’t have Google Account.
 URL expired after time period defined.
 Max period for which URL is valid is 7 days.
 gsutil signurl -d 10m -u gs://<bucket>/<object>
GCS - Pricing
© ANKIT MISTRY – GOOGLE CLOUD
 Storage Pricing
 Data access Pricing
 Go to Cloud Console & create Bucket, observe pricing
Data Transfer Services
© ANKIT MISTRY – GOOGLE CLOUD
Data Transfer Service
© ANKIT MISTRY – GOOGLE CLOUD
 From On-premises to Google Cloud Storage (GCS)
 From One bucket to another bucket inside same GCP
 From Other public cloud Amazon S3, Azure Container to GCS
On-premises
On-premises to (GCS)
© ANKIT MISTRY – GOOGLE CLOUD
 gsutil – command line utility
 Online mode of transfer
 install locally Google Cloud SDK
 gsutil -m cp large_number_of_small_files (-m for parallel upload)
 Should we go for it or not?
 Good Network
 Follow chart in next slide
 Transfer Service for on-premises data
 This will quickly and securely move your data from private data centers into Google Cloud Storage
 Two step process
 installing an agent
 create a transfer job
Transfer Appliance
© ANKIT MISTRY – GOOGLE CLOUD
 Transfer Appliance
 Physical device which securely transfer large amounts of data to Google Cloud Platform
 When data that exceeds 20 TB or would take more than a week to upload.
Online vs Offline transfer
© ANKIT MISTRY – GOOGLE CLOUD
Transfer Service|cloud data
© ANKIT MISTRY – GOOGLE CLOUD
 This will quickly and securely transfer data into Google Cloud Storage
 From various sources
 Amazon S3
 Azure Blob Storage
 Move data between Cloud Storage buckets
 Create Transfer Job
 Onetime run or recurring
Google Block Storage
© ANKIT MISTRY – GOOGLE CLOUD
Google Block Storage
© ANKIT MISTRY – GOOGLE CLOUD
 Block storage – hard Disk storage
 Direct attached Storage
 Network attached Storage
Direct attached – Local SSD
© ANKIT MISTRY – GOOGLE CLOUD
 Local SSD
 Physically attached to VM
 Very High Performance – 10x to 100x of Persistence Disk
 Costlier than Persistence Disk
 You can not re attach to other VM
 Once VM destroy, Local SSD will be deleted
 Lower Availability
 Temporary/Ephemeral Storage
 No Snapshot
 Let’s see in action.
Local SSD with Compute
Engine
© ANKIT MISTRY – GOOGLE CLOUD
Network attached storage
© ANKIT MISTRY – GOOGLE CLOUD
 Network attached hard disk
 Persistent Disks
 Zonal, Regional
 Not attached directly to any VM
 Can be re-attached with other VM
 Very Flexible – resize easily
 Permanent storage
 Snapshot supported
 Cheaper than Local SSD
Persistence Disk with
Compute Engine
© ANKIT MISTRY – GOOGLE CLOUD
FileStore
© ANKIT MISTRY – GOOGLE CLOUD
Storage
Which storage to use when
© ANKIT MISTRY – GOOGLE CLOUD
 Cloud Storage
 Unstructured data storage
 Video stream, Image
 Staging environment
 Compliance
 Backup
 Data lake
 Persistent Disk
 Attach Disk with VM & Containers
 Share read-only disk with multiple VM
 Database storage
 Local Disk
 Temporary high performance attach Disk
 File Store
 Performance predictable
 Lift-shift millions of Files
Structured data Solution
in GCP
© ANKIT MISTRY – GOOGLE CLOUD
Structured data
© ANKIT MISTRY – GOOGLE CLOUD
Cloud SQL Cloud Spanner
Few Concept
© ANKIT MISTRY – GOOGLE CLOUD
 Relational data
 OLTP
 OLAP
 RTO vs RPO
 Vertical vs Horizontal Scaling
 Availability & Durability
OLTP & OLAP
© ANKIT MISTRY – GOOGLE CLOUD
OLTP
© ANKIT MISTRY – GOOGLE CLOUD
 OLTP – Online Transaction Processing
 Simple Query
 Large number of small transaction
 Traditional RDBMS
 Database modification
 Popular Database - MySQL, PostgreSQL, Oracle, MSSQL
 ERP, CRM, Banking application
 GCP - Cloud SQL, Cloud spanner
OLAP
© ANKIT MISTRY – GOOGLE CLOUD
OLAP – Online Analytical Processing
 Data warehousing
 Data is collected from multiple sources
Complex Query
 Data analysis
 Google Cloud Big Query – Petabyte Data warehouse
 Reporting Application, Web click analysis, BI Dashboard app
Vertical – Horizontal Scaling
© ANKIT MISTRY – GOOGLE CLOUD
Vertical
Scaling
8 VCPU - 16 GB
12 VCPU - 24 GB
Horizontal
Scaling
8 VCPU - 16 GB
RTO & RPO
© ANKIT MISTRY – GOOGLE CLOUD
 Data loss : 13 hours
 System Downtime : 7 Hours
5th March
4:00 PM
5th March
3:00 AM
5th March
11:00 PM
Last Backup
Disaster
happens
System
Recovered
Loss data
RTO
RPO
Downtime
RTO & RPO
© ANKIT MISTRY – GOOGLE CLOUD
 RTO – Recovery Time objective
 Maximum time for which system can be down
 RPO - Recovery Point objective
 Maximum time for which organization can tolerate Dataloss
RTO & RPO
© ANKIT MISTRY – GOOGLE CLOUD
 RTO – Recovery Time objective
 Maximum time for which system can be down
 RPO - Recovery Point objective
 Maximum time for which organization can tolerate Dataloss
Durability
© ANKIT MISTRY – GOOGLE CLOUD
 If you loose data means
 business is down
 No business afford to loose data
 How healthy & resilient your data is
 Object Storage provider measure durability in terms of number of 9’s
 Example : 99.9999999999999% - 11 9’s
 That means that even with one billion objects, you would likely go a hundred years
without losing a single one!
 https://cloud.google.com/blog/products/storage-data-transfer/understanding-
cloud-storage-11-9s-durability-target
Availability
© ANKIT MISTRY – GOOGLE CLOUD
 If region goes down where your data stored
 Replicate data across many region
 How much amount of time data is up/available to access.
 Data replicated across multiple regions, means higher Availability
 SLA – service level agreement
 SLA – 99.99% : four 9’s
 https://uptime.is
 https://cloud.google.com/terms/sla
GCP Database products
© ANKIT MISTRY – GOOGLE CLOUD
Structure data
• Cloud SQL
• Cloud Spanner
• BigQuery
Semi-
structure data
• BigTable
• Datastore
• Memorystore
Google Cloud SQL
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud SQL
© ANKIT MISTRY – GOOGLE CLOUD
 Fully managed Relational database services for MySQL, PostgreSQL & SQL Server
 Lift & shift above database
 Regional Database with 99.95% SLA
 Storage up to 30 TB
 Scale up to 96 core & 416 GB Memory
 No Horizontal Scaling
 Data is encrypted with Google managed key or CMEK
 Cloud SQL can be accessed from anywhere like – App Engine, Compute Engine…
 Used for storing Transactional database
 Ecommerce, CRM kind application backend.
Google Cloud SQL
© ANKIT MISTRY – GOOGLE CLOUD
 No maintenance & auto update
 Back-up Database
 On-demand Backup
 Schedule backup
 Database migration service (DMS)
 migrate data from different SQL system to Cloud SQL
 Point-in Time Recovery
 Scale with Read replicas – To transfer workload to other instance
 Export data
 gcloud utility or Cloud Console
 In SQL/CSV format
[Hands-on]
Create Google Cloud SQL
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on]
Connect Cloud SQL & IP
Whitelisting
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on]
Data migration to Cloud
SQL Demo
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on]
Cloud SQL failover demo
© ANKIT MISTRY – GOOGLE CLOUD
© ANKIT MISTRY – GOOGLE CLOUD
Google
Cloud SQL
Failover
https://cloud.google.com/sql/docs/mysql/high-availability
Cloud SQL Explore
© ANKIT MISTRY – GOOGLE CLOUD
Cloud SQL Export
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud Spanner
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud Spanner
© ANKIT MISTRY – GOOGLE CLOUD
 Distributed & scalable solution for RDBMS in GCP
 Fully managed, Mission critical application
 Horizontal Scalability
 use when Data volume > 2 TB
 Costlier than Cloud SQL
 Cloud SQL has just Read replicas,
 where as in cloud spanner horizontal read/write
across region
 Highly scalable, Petabyte scale
 Data is strongly typed.
 Must define schema database
 Datatype for each column of each table must be
defined.
 99.999% availability
 Cloud native solution – specific to GCP
 Lift & Shift not possible, Not recommended
 Spanner = Cloud SQL + Horizontal Scalable
 Scale to petabyte
 Regional/ Multi-region level instance can be
created
 Data export
 can not export with gcloud
 Cloud Console or Cloud Dataflow Job
Spanner vs RDBMS-SQL
© ANKIT MISTRY – GOOGLE CLOUD
Spanner Cloud SQL
Availability High During failover little downtime
Scalable Horizontal vertical
Price Costly Cheaper than spanner
SQL/Schema Support yes yes
Replication High Only Read Replica
[Hands-on] Cloud Spanner
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Spanner Demo
© ANKIT MISTRY – GOOGLE CLOUD
 Create Spanner Instance
 Create database edu_db
 Create 2 Table
 Author
 AuthorID
 AuthorName
 Book
 BookId
 Bookname
 AuthorId
After Job Done
make sure to delete
Spanner Instance
© ANKIT MISTRY – GOOGLE CLOUD
Which database to use when
© ANKIT MISTRY – GOOGLE CLOUD
 Cloud SQL
 Lift & Shift SQL based system
 CRM, Ecommerce App
 Max Data size is 30 TB
 Cloud Spanner
 Horizontal scalability
 Low latency
 High scalability in terms of storage + compute
 if Data Storage requirement is beyond TB
Semi-Structured data
Solution in GCP
© ANKIT MISTRY – GOOGLE CLOUD
NoSQL Introduction
© ANKIT MISTRY – GOOGLE CLOUD
 Not a SQL
 Flexible Schema
 Variable number of property
 Data model will be like
 Document
 key-value pair
 Graph based
{
“studentID” : 100,
“name” : “john”,
“score” : 78,
“country” : “US”
},
{
“studentID” : 101,
“name” : “Alice”,
“rank” : 7,
},
key value
100 std1
101 std2
Semi-structured data
© ANKIT MISTRY – GOOGLE CLOUD
BigTable Datastore Firestore
Google Cloud Datastore
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Datastore
© ANKIT MISTRY – GOOGLE CLOUD
 Highly scalable NoSQL database
 Serverless
 Document kind data storage – MongoDB
 App Engine + Datastore
 SQL Like Queries – GQL
 Support ACID Transaction
 Multiple indexes
 Data replication across different region
 Use case
 Session Info
 Product catlog
 Export data from gcloud utility only
Datastore RDBMS
Kind Table
Entity Row
Property Column
Key Primary Key
[Hands-on] Datastore
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud Firestore
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Firestore
© ANKIT MISTRY – GOOGLE CLOUD
 Firestore is the next generation of Datastore
 Highly scalable NoSQL database
 Collection & Document Model
 Two mode
 native Mode
 datastore mode
 Real-time updates
 Mobile and Web client libraries
 Let’s see in Action
Firestore RDBMS
Collection group Table
Document Row
Field Column
Document ID Primary Key
[Hands-on] Firestore
© ANKIT MISTRY – GOOGLE CLOUD
Datastore & Firestore
Pricing
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud Memorystore
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Memorystore
© ANKIT MISTRY – GOOGLE CLOUD
 Fully managed Inmemory database
 sub-millisecond data access
 Two engine supported
 Redis
 Memcached
 Only Internal IP
 Highly available with 99.9% SLA
 Import/Export data from Cloud Storage to memory store
 Let’s create memory store Instance
[Hands-on] Memorystore
© ANKIT MISTRY – GOOGLE CLOUD
Google Cloud BigTable
© ANKIT MISTRY – GOOGLE CLOUD
Cloud BigTable
© ANKIT MISTRY – GOOGLE CLOUD
 No Multi column index
 Only Row key based indexing
 Design Row Key is very important
 Design Row key by keeping in your mind
 which is your frequent query in application
 No Hot spotting
 Don’t use monotonically increasing key
 Seamless integration with
 Warehouse – BigQuery
 Machine Learning Product
 Used for
 Financial data
 Time series Data
 Fully managed
 Wide column NOSQL database
 Not serverless
 Scale horizontally with Multiple Node
 Scale to huge Volume of data
 Data stored at column wise
 Column are grouped into column family
 Milli second latency
 Handles millions of request per second
 How to access
 cbt – command line (part of cloud sdk)
 Hbase API
Cloud BigTable
© ANKIT MISTRY – GOOGLE CLOUD
Row Key
Personal_data_cf Professional_data_cf
name age salary designation company
1
2
3
Professional_data_cf:salary
[Hands-on]
Google Cloud BigTable
© ANKIT MISTRY – GOOGLE CLOUD
BigTable Pricing
© ANKIT MISTRY – GOOGLE CLOUD
GCP Data Processing Solution
© ANKIT MISTRY – GOOGLE CLOUD
 BigQuery
 Analytical Workload – Storage + Processing
 DataFlow – Apache Beam
 DataProc – Spark, Hadoop
 Data Fusion
 Cloud Composer
 Data Prep
 Cloud PubSub
Google Cloud BigQuery
© ANKIT MISTRY – GOOGLE CLOUD
Cloud BigQuery
© ANKIT MISTRY – GOOGLE CLOUD
 Data warehouse solution in GCP
 Like Relational database – SQL schema
 Serverless
 Built using BigTable + GCP Infrastructure
 BigQuery is Columnar storage
 This is for Analytical database
 not for Transactional purpose
 Exabyte scale
 Query using
 Standard SQL
 legacy SQL
 Big Query can query from external data source.
 Cloud storage, SQL, Big Table
 Biquery can load data from various sources.
 CSV, JSON, Avro, SQL and many more
 Query is very expensive
 $5 approx. for 1 TB of data scanned
 Before query execution do dry run.
 Alternative to OpenSource Apache Hive
 How to access BigQuery
 Cloud Console
 bq – command line tool
 Client library - written in C#, Go, Java, Node.js, PHP,
Python, and Ruby
BigQuery Data organization
© ANKIT MISTRY – GOOGLE CLOUD
Projects
Datasets
Tables
Jobs
 Projects are top level container in GCP
 Dataset hold multiple tables
 Each table must belong to dataset
 Assign Role at the organization, project, and dataset level
 Tables – contain data
 It has Schema
 Types of Tables
 Native tables, External tables, Views
 Jobs
 manage asynchronous tasks
 Types of Job
 Load, Query, Extract, Copy
When BigQuery should be used
© ANKIT MISTRY – GOOGLE CLOUD
 When workload is analytical
 When Data doesn’t change in database, as bigquery use built –in cache
 For complex query
 When query takes more execution time.
 off-load some workload from primary transaction DB
 When you large volume of data
 No Join is preferred.
 When you data is denormalized
[Hands-on]
Cloud BigQuery Explore +
Public dataset
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on]
Cloud BigQuery +
Local data
© ANKIT MISTRY – GOOGLE CLOUD
Cloud BigQuery Pricing
© ANKIT MISTRY – GOOGLE CLOUD
 On-demand
 Pay for what you use
 Flat rat Pricing
 Allocate compute & storage capacity
Google Cloud PubSub
© ANKIT MISTRY – GOOGLE CLOUD
PubSub
© ANKIT MISTRY – GOOGLE CLOUD
Application DB
Application
DB
Notification
Service
Notification
Service
How PubSub works
© ANKIT MISTRY – GOOGLE CLOUD
 Fully-managed asynchronous
messaging service
 Scale to billions of message per day
 Publisher – App send message to
Topic
 Push & Pull way to access messages
 Pull – Subscriber pull message
 Push – Message will be sent to
subscriber via webhook
 One topic – Multiple Subscriber
 One subscriber – Multiple Topic
https://cloud.google.com/pubsub/docs/overview
Cloud PubSub
© ANKIT MISTRY – GOOGLE CLOUD
 Fully-managed Pubsub system inside Google Cloud
 Serverless
 Auto-scaling and auto-provisioning with support from zero
to hundreds of GB/second
 Topic – Storage reference
 Publisher send message to topic at pubsub.googleapis.com
 Push – Pull way to access message
 Once subscriber receive message ack is sent.
 Cloud Pubsub act as staging environment for many GCP
services
Advantage PubSub
© ANKIT MISTRY – GOOGLE CLOUD
 Durability of data will increase
 Highly Scalable, Scalable
 Decoupling between both system (Publisher & Subscriber)
 Application don’t synchronously communicate with Notification service
 Application (Publisher) is not dependent on Notification service (Subscriber)
[Hands-on] Cloud PubSub
© ANKIT MISTRY – GOOGLE CLOUD
Cloud DataFlow
© ANKIT MISTRY – GOOGLE CLOUD
 Managed service for variety of data processing
 An advanced unified programming model to implement batch and streaming data processing jobs that run
on various execution engine/ runner
 Cloud version of Apache Beam = (Batch + Stream)
 Serverless, Fully managed
 Horizontal autoscaling of worker
 Jobs created with
 Pre-define template
 Notebook instance
 Write Data Pipeline job in Java, Python, SQL
 From Cloud Shell/Local Machine
How DataFlow Works
© ANKIT MISTRY – GOOGLE CLOUD
 Write Job in Java, Python Go
 Unified API for both batch + stream Processing
 No Need to separately handle Batch & streaming data
 Execution
 Direct Runner
 Scaling issue
 Apache Flink
 Apache Spark
 Cloud DataFlow
Apache Beam
© ANKIT MISTRY – GOOGLE CLOUD
 Pipeline
 A pipeline is a graph of transformations that a user constructs that defines the data processing they want to do.
 IO-Transform
 https://beam.apache.org/documentation/io/built-in/
 Pcollection
 Fundamental data type in Beam
 Ptransform
 The operations executed within a pipeline
 https://beam.apache.org/documentation/programming-guide/#transforms
 Runner - Execution engine
[Hands-on] Cloud DataFlow
PRE-DEFINE TEMPLATE
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on] Cloud DataFlow
NOTEBOOK INSTANCE
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on] Cloud DataFlow
EXECUTE JOB FROM SHELL WITH DATAFLOW
© ANKIT MISTRY – GOOGLE CLOUD
Cloud DataProc
© ANKIT MISTRY – GOOGLE CLOUD
 Managed Hadoop & Spark Services inside GCP
 Lift/Shift Existing Hadoop/Spark based Job
 Cluster type
 Standard (1 master, N workers)
 Single Node (1 master, 0 workers)
 High Availability (3 masters, N workers)
 Worker node regular VM or Preemptible VM (Cost reduction)
 Job Supported :
 Hadoop, SparkR, Spark, SparkSQL, Hive, Pig, PySpark
 Demo
 Spark, PySpark, Notebook Instance
[Hands-on]
Create Cloud DataProc
Cluster
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on] Cloud DataProc
SPARK JOB
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on] Cloud DataProc
SUBMIT PYSPARK JOB
© ANKIT MISTRY – GOOGLE CLOUD
[Hands-on] Cloud DataProc
NOTEBOOK INSTANCE
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Fusion
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Data Fusion
© ANKIT MISTRY – GOOGLE CLOUD
 Fully-managed, cloud native solution to quickly building data pipelines
 Code free, Drag-n-drop tool
 150+ preconfigured connectors & transformations
 Built with Open-source CDAP
 3 Edition are available
 Developer
 Basic
 Enterprise
 Pricing :
 https://cloud.google.com/data-fusion/pricing#cloud-data-fusion-pricing
 Let’s see in Action – create Cloud Fusion Instance
Cloud Data Fusion Demo
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Composer
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Composer
© ANKIT MISTRY – GOOGLE CLOUD
 Fully Managed Apache Airflow which in GCP
 Airflow is a workflow & orchestration engine
 With Airflow, one can programmatically schedule and monitor workflows
 Workflows are defined as directed acyclic graphs (DAGs)
 DAGs are written in Python 3.x
 Built-in integration for Other GCP services
 Google BigQuery,
 Cloud Dataflow & Dataproc,
 Cloud Datastore
 Cloud Storage,
 Cloud Pub/Sub, and Cloud ML Engine
[Hands-on]
Create Cloud Composer Instance
© ANKIT MISTRY – GOOGLE CLOUD
Write first DAG in
Cloud Composer
© ANKIT MISTRY – GOOGLE CLOUD
Data Loss Prevention
API
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Data Loss Prevention API
© ANKIT MISTRY – GOOGLE CLOUD
 Fully managed service designed to help you discover, classify, and protect your most sensitive data.
 PII data
 Person’s name, Credit Card Number, SSN
 Apply API on Cloud Storage, Big Query Data
 DLP work upon Free form Text, Structured & Unstructured data (image)
 What to do with this Data
 Identify sensitive data
 De-identify data
 Masking and Encryption
 re-identify (In case want to recover original data)
De-Identification of Data
© ANKIT MISTRY – GOOGLE CLOUD
 Redacation – remove sensitive data
 Replacement – replace with some tokens (Like Info_type)
 Masking – Replace one/more character with some other char
 Encryption – Encrypt Sensitive Data
TEMPLETES, INFOTYPES &
MATCH LIKELIHOOD
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
TEMPLATES
© ANKIT MISTRY – GOOGLE CLOUD
 Configuration which define for
 Inspection of Jobs
 De-identification of Jobs
 Once Template defined , can be reused for other Jobs
TEMPLATES
TYPES
Identification :
Find Sensitive Data
De–Identification :
Remove Sensitive Data
INFOTYPES
© ANKIT MISTRY – GOOGLE CLOUD
 What to Scan For
 Like Credit Card
 SSN
 Age
Types of
INFOTYPES
Built-IN
US_SOCIAL_SECURITY_NUMBER,
EMAIL_ADDRESS
Such types of 120 Built-in Infotype
Defined
STORED
Custom Infotype
Based on Fixed words, Regular
Expression, Custom Dictionary
MATCH LIKELIHOOD
© ANKIT MISTRY – GOOGLE CLOUD
LIKELIHOOD_UNSPECIFIED Default value; same as POSSIBLE.
VERY_UNLIKELY
It is very unlikely that the data matches the
given InfoType.
UNLIKELY
It is unlikely that the data matches the
given InfoType.
POSSIBLE
It is possible that the data matches the
given InfoType.
LIKELY
It is likely that the data matches the
given InfoType.
VERY_LIKELY
It is very likely that the data matches the
given InfoType.
DLP API Demo
https://cloud.google.com/dlp/demo/#!/
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Create INFO_TYPE
(Hands-on)
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Create TEMPLETES
(Hands-on)
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Create Job for Inspection
(Hands-on)
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Create Template for De-
identification
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Apply Some more rules to
template
BY ANKIT MISTRY
© ANKIT MISTRY – GOOGLE CLOUD
Data Catalog
© ANKIT MISTRY – GOOGLE CLOUD
Data Catalog
© ANKIT MISTRY – GOOGLE CLOUD
 Most organizations today are dealing with a large and growing number of data assets.
 Data stakeholders (consumers, producers, and administrators) face a number of challenges:
 Searching for insightful data
 Understanding data
 Making data useful
 Data Catalog
 A fully managed and highly scalable data discovery and metadata management service.
 Single place to discover all data, asset across all project
 Using Data catalog
 Search data assets
 tag data
How Data Catalog works
© ANKIT MISTRY – GOOGLE CLOUD
Metadata
© ANKIT MISTRY – GOOGLE CLOUD
 Technical Metadata
 For BigQuery, Pubsub these metadata resides inside individual products
 Technical meta data being registered by catalog automatically
 Business Metadata
 Attach Tag to existing data asset
 Define some Tag template and attach metadata
[Hands-on] Data Catalog
© ANKIT MISTRY – GOOGLE CLOUD
ML/AI Module Introduction
© ANKIT MISTRY – GOOGLE CLOUD
Machine Learning
© ANKIT MISTRY – GOOGLE CLOUD
Machine Learning - GCP
© ANKIT MISTRY – GOOGLE CLOUD
 Concept behind Machine Learning
 Types of ML System
 Pre trained Model
 Custom Model
 TPU – tensor processing unit
Machine Learning
© ANKIT MISTRY – GOOGLE CLOUD
 Design Spam email classification system
 How to design?
 What rules you will code inside system
 If message coming from some specified list of senders, spam it
 If message contain word like lottery, promotion, spam it
 But how many such rule you will define inside system.
 It is very difficult & cumbersome task to design such way.
 If spammer start sending spam which is not part of rule book.
 So, need some intelligent approach,
 Machine Learning is the solution behind it.
Machine Learning
© ANKIT MISTRY – GOOGLE CLOUD
 Rather than define such rule,
 In machine learning, system learn from data
 Training + Testing kind of system
Input – message
with labels
ML Algorithm Model
Test input + Model Model Predicted Output
Training Process
Testing Process
Types of ML System
© ANKIT MISTRY – GOOGLE CLOUD
 ML Types
 Supervised learning
 Label has been given
 Regression
 Classification
 Unsupervised learning
 No labels
 Find Structure within data
Regression
© ANKIT MISTRY – GOOGLE CLOUD
 Output prediction is continuous in nature
 Example
 House Price prediction
 Regression ML Algorithm :
 Linear Regression
 SVR
 Decision Tree Regressor
Area
No of
Bedroom
Price
5434 5 3536
2342 5 3564
243 1 4564
987 4 7675
Classification
© ANKIT MISTRY – GOOGLE CLOUD
 Output prediction is discrete in nature
 Example
 Sentiment analysis of review : +ve/-ve
 This product is very much helpful. +ve
 Is it Orange?
 Yes/No
 Classification Algorithm :
 Logistic Regression
 SVM
 KNN
 Decision Tree Classification
Unsupervised Learning
© ANKIT MISTRY – GOOGLE CLOUD
 No label Given
 Find Structure within data
 Clustering is type of Unsupervised Learning
 Some clustering Algorithm :
 K-Means
 hierarchical
Machine Learning workflow
© ANKIT MISTRY – GOOGLE CLOUD
Ingest Data
Data
Preparation
Split data
(Training +
Testing)
Apply ML
algorithm to
Training Data
Model
Testing Data Model Prediction
Training
Testing
Machine Learning + GCP
© ANKIT MISTRY – GOOGLE CLOUD
GCP ML Solution
Pre-Trained
Model
For generic use
cases
Vision API, Speech
API, Language API
Auto ML
Throw your data &
rest GCP will take
care
Custom Training
Tensorflow, PyTorch,
Scikit-learn
Spark ML,
BigQuery ML
Data Preparation with
DataPrep
© ANKIT MISTRY – GOOGLE CLOUD
DataPrep
© ANKIT MISTRY – GOOGLE CLOUD
 Intelligent Data Preparation tool
 Visually explore, clean, and prepare data for analysis and machine learning
 Build by Trifacta – Third Party tool, not cloud native one
 Play with this tool without any code, with just click
 Dataprep is serverless and works at any scale
 No infrastructure to deploy or manage
 Automatically detect schema, anomalies
 Do all time consuming task easily
 Concern – Need to share data with Trifacta
DataPrep ETL Pipeline
© ANKIT MISTRY – GOOGLE CLOUD
Let’s see DataPrep in
Action
© ANKIT MISTRY – GOOGLE CLOUD
Pre-trained Model
© ANKIT MISTRY – GOOGLE CLOUD
Pre-Trained Model
© ANKIT MISTRY – GOOGLE CLOUD
Vision API
Natural
Language API
Speech to
Text API
Text to
Speech API
 Google has huge amount of data
 Google has already trained ML/AI algorithm to build
model
 For generic use case like
 Object recognition/detection – Vision API
 OCR
 Speech to Text
 Language Translation
 NLP API – to get insight from natural language
 You can take advantage of pre-built model.
 No Training required from customer
 Use already built Rest API for above use cases
Vision API
© ANKIT MISTRY – GOOGLE CLOUD
 Derive insights from your images
 Detect printed and handwritten text
 Detect objects
 Identify popular places and product logos
 Moderate content
 Celebrity recognition
 How to use
 Web UI – Just for Testing
 https://cloud.google.com/vision#section-2
 gcloud – CLI
 Python SDK
Natural Language API
© ANKIT MISTRY – GOOGLE CLOUD
 Derive insights from unstructured text using Google machine learning
 Identify entities within documents
 Sentiment analysis
 Content classification
 How to use
 gcloud – CLI
 Python SDK
Speech Text API
© ANKIT MISTRY – GOOGLE CLOUD
 Speech to Text API
 Accurately convert speech into text using an API powered by Google’s AI technologies.
 125 languages support
 Streaming speech recognition
 Content filtering
 Automatic punctuation
 https://cloud.google.com/speech-to-text#section-2
 Text to Speech API
 Convert text into natural-sounding speech using an API powered by Google’s AI technologies.
 https://cloud.google.com/text-to-speech#section-2
ML API Pricing
© ANKIT MISTRY – GOOGLE CLOUD
Auto Machine Learning
© ANKIT MISTRY – GOOGLE CLOUD
Auto Machine Learning
© ANKIT MISTRY – GOOGLE CLOUD
 AutoML – Auto Machine Learning
 Your use case is not generic
 You have some custom requirement
 Vision API – recognize shoes
 Auto ML – Different types of shoes detection
 Adidas, Nike shoes
 Throw your data & GCP will create best model for you
 State-of art Transfer learning technology
 Throw your data & Google AI will create model
 2 use cases Demo :
 Flower species recognition
 Text Classification
Text Classification
© ANKIT MISTRY – GOOGLE CLOUD
 Dataset creation
Data Class
My eldest son who is 27 just got word he has a new job after
finishing his bachelors degree. This made me very happy!
achievement
I visited my best friend at her school on St. Patrick's day. bonding
My mom cooked some delicious rice for me with curd. affection
Today I make Eye contact with my crush. She Also look into my
Eyes For a Seconds or Two. I can still Memorize his Beautiful Eyes.
affection
 Train Model
 Analyze Model
 Deploy for Prediction
achievement
Affection
bonding
enjoy_the_moment
exercise
leisure
nature
Flower Classification
© ANKIT MISTRY – GOOGLE CLOUD
 Dataset creation
 Train Model
 Define Node hours
 Analyze Model
 Deploy for Prediction
daisy
dandelion
roses
sunflowers
tulips
TPU
© ANKIT MISTRY – GOOGLE CLOUD
 TPU – Tensor Processing Unit
 Machine Learning Training is one of the most time consuming process
 It may take hours to days to sometime week
 Training time depend upon Ml Algorithm + Amount of dataset
 Google introduce Tensorflow framework to do Machine Learning which powers their own ML Product
 Tensor are basic building block of this framework.
 So, To do training faster Google created ASIC based in-house dedicated computing for Tensor Processing
 Speed up training by 20x to 30x
 Work with VM, GKE, AI Platform
 Quickly experiment with number of ML Models creation
Train your own model
© ANKIT MISTRY – GOOGLE CLOUD
Custom Model
© ANKIT MISTRY – GOOGLE CLOUD
 You have your own dataset
 You want to train your own model
 You have team of data scientist
 They want to write own algorithm based on
 Scikit-learn, XGBoost
 Tensorflow
 PyTorch Framework
 Notebook instance
 Build Logistic Regression model for flower species recognition
 Save model in pickle file
 Deploy model endpoint
BigQuery ML
© ANKIT MISTRY – GOOGLE CLOUD
 Create ML Model in SQL
 No Python, No Java.
 No need to export data to other environment
 Model support
 Linear Regression
 Multiclass Logistic Regression
 K-Means
 XGBoost
 Tensorflow – Import
 use case demo
 Flower species recognition
 From BigQuery public dataset
Main BigQuery Function
© ANKIT MISTRY – GOOGLE CLOUD
 Create MODEL
 Model Type – Linear Reg, Logistic
 Label Column
 Learning Rate etc…
 Evaluate Model
 ML.Evaluate
 Provide Model & Test Data
 Determine how good model performance on Test data
 Prediction
 ML.Prediction
 Apply Live data to Model to get prediction
[Hands-on] BigQuery ML
© ANKIT MISTRY – GOOGLE CLOUD
Cloud Data Studio
© ANKIT MISTRY – GOOGLE CLOUD
Data Studio
© ANKIT MISTRY – GOOGLE CLOUD
 BI tool from Google
 Connect your data from spreadsheets, Analytics, Google Ads, Google BigQuery and many more connectors
 Drag & drop, no code
 Create reports & Dashboards
 Free
 Let’s see in Action
THANK YOU
© ANKIT MISTRY – GOOGLE CLOUD

More Related Content

What's hot

MicroServices with Containers, Kubernetes & ServiceMesh
MicroServices with Containers, Kubernetes & ServiceMeshMicroServices with Containers, Kubernetes & ServiceMesh
MicroServices with Containers, Kubernetes & ServiceMeshAkash Agrawal
 
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon Web Services Korea
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery GirdhareeSaran
 
Apache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel QuarkusApache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel QuarkusClaus Ibsen
 
Deep dive into Kubernetes Networking
Deep dive into Kubernetes NetworkingDeep dive into Kubernetes Networking
Deep dive into Kubernetes NetworkingSreenivas Makam
 
Balkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusionBalkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusionBalkan Misirli
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdKai Wähner
 
Kubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideKubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideBytemark
 
Event driven autoscaling with KEDA
Event driven autoscaling with KEDAEvent driven autoscaling with KEDA
Event driven autoscaling with KEDANilesh Gule
 
Cloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOpsCloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOpsWeaveworks
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowDatabricks
 
Introduction to Grafana Loki
Introduction to Grafana LokiIntroduction to Grafana Loki
Introduction to Grafana LokiJulien Pivotto
 
Prometheus
PrometheusPrometheus
Prometheuswyukawa
 

What's hot (20)

MicroServices with Containers, Kubernetes & ServiceMesh
MicroServices with Containers, Kubernetes & ServiceMeshMicroServices with Containers, Kubernetes & ServiceMesh
MicroServices with Containers, Kubernetes & ServiceMesh
 
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 배포 방법 소개::김대근, AI/ML 스페셜리스트 솔루션즈 아키텍트, AWS::AWS AIML 스페셜 웨비나
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
 
Apache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel QuarkusApache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel Quarkus
 
Deep dive into Kubernetes Networking
Deep dive into Kubernetes NetworkingDeep dive into Kubernetes Networking
Deep dive into Kubernetes Networking
 
Balkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusionBalkan - data eng meetup - data fusion
Balkan - data eng meetup - data fusion
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
 
Kubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideKubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory Guide
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes Networking
 
Event driven autoscaling with KEDA
Event driven autoscaling with KEDAEvent driven autoscaling with KEDA
Event driven autoscaling with KEDA
 
Azure storage
Azure storageAzure storage
Azure storage
 
Cloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOpsCloud Native Engineering with SRE and GitOps
Cloud Native Engineering with SRE and GitOps
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflow
 
Introduction to Grafana Loki
Introduction to Grafana LokiIntroduction to Grafana Loki
Introduction to Grafana Loki
 
Prometheus
PrometheusPrometheus
Prometheus
 

Similar to GCP Data Engineer Certification Guide

Cloud Computing
Cloud ComputingCloud Computing
Cloud ComputingOmar Fathy
 
Session 4 GCCP.pptx
Session 4 GCCP.pptxSession 4 GCCP.pptx
Session 4 GCCP.pptxDSCIITPatna
 
Google I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateGoogle I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateSimon Su
 
Accessing Google Cloud APIs
Accessing Google Cloud APIsAccessing Google Cloud APIs
Accessing Google Cloud APIswesley chun
 
A fresh look at Google’s Cloud by Mandy Waite
A fresh look at Google’s Cloud by Mandy Waite A fresh look at Google’s Cloud by Mandy Waite
A fresh look at Google’s Cloud by Mandy Waite Codemotion
 
How Google Cloud Platform can help in the classroom/lab
How Google Cloud Platform can help in the classroom/labHow Google Cloud Platform can help in the classroom/lab
How Google Cloud Platform can help in the classroom/labwesley chun
 
GDSC Study Jam Session 1
GDSC Study Jam Session 1GDSC Study Jam Session 1
GDSC Study Jam Session 1SahithiGurlinka
 
GCCP Session 2.pptx
GCCP Session 2.pptxGCCP Session 2.pptx
GCCP Session 2.pptxDSCIITPatna
 
Introduction to GCP
Introduction to GCPIntroduction to GCP
Introduction to GCPKnoldus Inc.
 
Building what's next with google cloud's powerful infrastructure
Building what's next with google cloud's powerful infrastructureBuilding what's next with google cloud's powerful infrastructure
Building what's next with google cloud's powerful infrastructureMediaAgility
 
Google Cloud Platform - Building a scalable mobile application
Google Cloud Platform - Building a scalable mobile applicationGoogle Cloud Platform - Building a scalable mobile application
Google Cloud Platform - Building a scalable mobile applicationLukas Masuch
 
Google Cloud Platform - Building a scalable Mobile Application
Google Cloud Platform - Building a scalable Mobile ApplicationGoogle Cloud Platform - Building a scalable Mobile Application
Google Cloud Platform - Building a scalable Mobile ApplicationBenjamin Raethlein
 
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送Google Cloud Platform - Japan
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Ido Green
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Pythonwesley chun
 
#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...
#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...
#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...e-Legion
 
SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs...
SEC302  Twitter's GCP Architecture for its petabyte scale data storage in gcs...SEC302  Twitter's GCP Architecture for its petabyte scale data storage in gcs...
SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs...Vrushali Channapattan
 
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...Openbar
 

Similar to GCP Data Engineer Certification Guide (20)

Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Session 4 GCCP.pptx
Session 4 GCCP.pptxSession 4 GCCP.pptx
Session 4 GCCP.pptx
 
Google I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateGoogle I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News Update
 
Accessing Google Cloud APIs
Accessing Google Cloud APIsAccessing Google Cloud APIs
Accessing Google Cloud APIs
 
A fresh look at Google’s Cloud by Mandy Waite
A fresh look at Google’s Cloud by Mandy Waite A fresh look at Google’s Cloud by Mandy Waite
A fresh look at Google’s Cloud by Mandy Waite
 
Gdsc muk - innocent
Gdsc   muk - innocentGdsc   muk - innocent
Gdsc muk - innocent
 
How Google Cloud Platform can help in the classroom/lab
How Google Cloud Platform can help in the classroom/labHow Google Cloud Platform can help in the classroom/lab
How Google Cloud Platform can help in the classroom/lab
 
GDSC Study Jam Session 1
GDSC Study Jam Session 1GDSC Study Jam Session 1
GDSC Study Jam Session 1
 
GCCP Session 2.pptx
GCCP Session 2.pptxGCCP Session 2.pptx
GCCP Session 2.pptx
 
Introduction to GCP
Introduction to GCPIntroduction to GCP
Introduction to GCP
 
Building what's next with google cloud's powerful infrastructure
Building what's next with google cloud's powerful infrastructureBuilding what's next with google cloud's powerful infrastructure
Building what's next with google cloud's powerful infrastructure
 
Google Cloud Platform - Building a scalable mobile application
Google Cloud Platform - Building a scalable mobile applicationGoogle Cloud Platform - Building a scalable mobile application
Google Cloud Platform - Building a scalable mobile application
 
Google Cloud Platform - Building a scalable Mobile Application
Google Cloud Platform - Building a scalable Mobile ApplicationGoogle Cloud Platform - Building a scalable Mobile Application
Google Cloud Platform - Building a scalable Mobile Application
 
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
 
#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...
#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...
#MBLTdev: Разработка backend для мобильного приложения с использованием Googl...
 
SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs...
SEC302  Twitter's GCP Architecture for its petabyte scale data storage in gcs...SEC302  Twitter's GCP Architecture for its petabyte scale data storage in gcs...
SEC302 Twitter's GCP Architecture for its petabyte scale data storage in gcs...
 
GCCP.pptx
GCCP.pptxGCCP.pptx
GCCP.pptx
 
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
 

Recently uploaded

cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 

Recently uploaded (20)

cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutions
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 

GCP Data Engineer Certification Guide

  • 1. GCP Google Cloud Professional Data Engineer © ANKIT MISTRY – GOOGLE CLOUD
  • 2. Google Certified Professional Data Engineer © ANKIT MISTRY – GOOGLE CLOUD
  • 3. Professional Data Engineer © ANKIT MISTRY – GOOGLE CLOUD  Pay attention for 5 minutes, before we dive in.  Challenging certification, and course is long so have patience.  Advance professional certification, Expect basics Associate level GCP knowledge  Learn by Doing  So with every exam objective, There is hand-on Lab – 80+
  • 4. © ANKIT MISTRY – GOOGLE CLOUD GCP certifications https://cloud.google.com/certification/guides/data-engineer
  • 5. Cloud Cost for this course © ANKIT MISTRY – GOOGLE CLOUD  $0 – for GCP account  GCP Free trial  $300 for next 3 months https://cloud.google.com/free  Length: Two hours  Registration fee: $200 (plus tax where applicable)  Languages: English, Japanese  Exam format: Multiple choice and multiple select,
  • 6. Data Engineering Concepts BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 7. Data engineering Overview © ANKIT MISTRY – GOOGLE CLOUD  Data Pipelines  How Data Flows Ingestion Storage Process and analyze Explore and visualize
  • 8. © ANKIT MISTRY – GOOGLE CLOUD Ingestion Storage Process and analyze Explore and visualize
  • 9. Ingest © ANKIT MISTRY – GOOGLE CLOUD  Gather Data from multiple sources  Data gather from App  Event Log, Click stream Data, e- commerce Transaction  Streaming Ingest  PubSub  Batch Ingest  Different Transfer services  GCS - gsutil
  • 10. Store © ANKIT MISTRY – GOOGLE CLOUD  Cost efficient & durable data storage https://cloud.google.com/architecture/data-lifecycle-cloud-platform#store
  • 11. Process & Analyze © ANKIT MISTRY – GOOGLE CLOUD  What kind of outcome you want  What analysis you want to perform  Convert Data into meaning  Analyze Data with BigQuery  Apply ML with  BigQuery ML  Spark ML with DataProc  Vertex API  Build ML Model with Auto ML/Custom Model https://cloud.google.com/architecture/data-lifecycle-cloud-platform
  • 12. Explore and visualize © ANKIT MISTRY – GOOGLE CLOUD  Google Data Studio – Easy to use BI Engine  Dashboard & Visualization  Datalab  Interactive Jupyter Notebook  Support for all Data Science Library  ML Prebuilt API  Vision API  Speech API
  • 13. Types of Data - Structure © ANKIT MISTRY – GOOGLE CLOUD Structured Semi- Structured Unstructured
  • 14. Structured Data © ANKIT MISTRY – GOOGLE CLOUD  Tabular Data  Represented by Rows & Columns  SQL can be used to interact with data  Fixed Schema  Each row has same number of columns  Relational Database are structured  MySQL, Oracle SQL, PostgreSQL , MSSQL  In GCP, Cloud SQL, Cloud Spanner Book_id Book_name Author_id 100 C 1 101 Java 1 102 Python 2 Author_id Author_name 1 John 2 Alice
  • 15. Semi-Structured Data © ANKIT MISTRY – GOOGLE CLOUD  Each Record has variable number of Properties  No Fixed Schema  Flexible structure  NoSQL kind of Data  Store data as key-value pair  JSON – Java Script object Notation are base way to represent semi structure data  MongoDB, Cassandra, Redis, Neo4j  In GCP, BigTable, DataStore, memoryStore Doc #1{ “studentID” : 100, “name” : “john”, “score” : 78, “country” : “US” }, Doc #2{ “studentID” : 101, “name” : “Alice”, “rank” : 7, },
  • 16. UnStructured Data © ANKIT MISTRY – GOOGLE CLOUD  No Pre define Structure in Data  Image  video data,  natural Language are example of unstructured data  Google Cloud Storage, File store inside GCP to store Unstructure data
  • 17. Batch Data vs Streaming Data © ANKIT MISTRY – GOOGLE CLOUD  Batch Data Processing  Defined Start & End of data – data size is known  Processing High volume of data after certain periodic interval  Long time to process data  Payment processing  Streaming Data  Unbounded, No End defined  Data is processed as it arrives  Size is unknown  No much heavy processing – take millisecond - seconds to process data  Stock data processing
  • 18. GCP Fundamental BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 19. GCP Regions & Zones © ANKIT MISTRY – GOOGLE CLOUD
  • 20. Why Zones & Regions © ANKIT MISTRY – GOOGLE CLOUD Singapore Zone-a Singapore Zone-b US West Zone-a US West Zone-b  Low latency  Follow Government rules  High availability  Disaster recovery
  • 21. GCP (Zones & Region)  Zones – Independent data Center  Region – Geographical area  Multi-region : Collection of Geographical  Global - Anywhere © ANKIT MISTRY – GOOGLE CLOUD Global Locations - Regions & Zones | Google Cloud Fascinating Number: Google Is Now 40% Of The Internet (forbes.com) Global Multi-regions Regions-1 Zone-a Zone-b Zone-c Regions-2 Zone-a Zone-b Zone-c
  • 22. Create GCP Free Tier Account © ANKIT MISTRY – GOOGLE CLOUD
  • 23. GCP Services © ANKIT MISTRY – GOOGLE CLOUD Storage & Database • Cloud Storage, File Store • SQL, Spanner, Big Query, Big table Data processing • Data Proc, Data Flow • Data Prep • Data Catalog • Big Query, PubSub • Composer, Data Fusion ML/AI • Vertex AI • Prebuilt API • Custom Model • Auto ML Secondary Services • Compute Engine • Kubernetes • App Engine – Cloud Run
  • 24. GCP Basic Services © ANKIT MISTRY – GOOGLE CLOUD
  • 25. Basic Infrastructure services in GCP © ANKIT MISTRY – GOOGLE CLOUD IAM Compute Engine Kubernetes App Engine Cloud Function VPC Cloud Run
  • 26. IAM © ANKIT MISTRY – GOOGLE CLOUD  Identity & access management  Who can do What on Which resources  Who - Identity  What - Action : Create, Update, Delete  Which – Resources, Compute Engine, App Engine, Cloud Storage  Roles : Collections of Permissions  Built-in Roles  Custom Role  Service Account
  • 27. IAM - Identity © ANKIT MISTRY – GOOGLE CLOUD Identity Google Account Google Groups Cloud identity Account Google Workspace Account Service Account
  • 28. IAM – Roles & Permission © ANKIT MISTRY – GOOGLE CLOUD Roles Primitive Owner Editor Viewer Pre-defined Compute Admin Storage Viewer Custom Combined Collection of Permission  Roles are collection of permissions  One can assign Role to identity, but Can not assign permission directly.
  • 29. Assign Roles to identity © ANKIT MISTRY – GOOGLE CLOUD
  • 30. Service Account © ANKIT MISTRY – GOOGLE CLOUD  For non human – like for Apps, services  Service Account is identity for Compute engine  Service account keys can be used for authentication  Max 10 keys per Service Account  Max 100 Service Account per project  Let’s Explore Service Account
  • 31. Provision Virtual machine © ANKIT MISTRY – GOOGLE CLOUD  Basic Building block of any Cloud  Compute Engine  IAAS – Full Control, more flexibility, more responsibility  Important parameter :  Zone  Service Account  Machine family – CPU, RAM  Boot Disk  Storage  Virtual Private Cloud
  • 32. App Engine Deployment © ANKIT MISTRY – GOOGLE CLOUD  PAAS Solution  No Server management  Deploy HTTP based application  Focus on Code  Standard & Flexible mode  Support many runtime engine  Python, java, Go, Node JS  Let’s Deploy Hello World NodeJS App
  • 33. GKE – kubernetes Engine © ANKIT MISTRY – GOOGLE CLOUD  Let’s say  you want to create 100’s of container to scale your app  need some automate approach which fully manage all container lifecycle  Kubernetes is the solution for it  Open source  Orchestration system for containerized application.  Open Source – Developed by Google , Launched in 2014 – Kubernetes  In 2015, Google Launched cloud version - GKE  Cloud Agnostics  Written in Go language  Let’s Deploy image to GKE 1. Create Cluster 2. Deploy Workload (Container image) 3. Expose Outside World
  • 34. Google Kubernetes Engine © ANKIT MISTRY – GOOGLE CLOUD  Orchestration system for containerized application.  Open Source – Developed by Google  Launched in 2014 – Kubernetes  In 2015, Google Launched cloud version - GKE  Cloud Agnostics  Written in Go language  Let’s deploy App on Kubernetes
  • 35. Deploy Containerized app on Google Kubernetes Engine - GKE (CAAS) © ANKIT MISTRY – GOOGLE CLOUD
  • 36. Google Cloud Run © ANKIT MISTRY – GOOGLE CLOUD  Next Generation App Engine  Deploy Containerized Workload  No Server management  Auto-scaling as Traffic grows  Let’s See in action
  • 37. Google Cloud Function © ANKIT MISTRY – GOOGLE CLOUD  Server less  Fully managed  Build small micro service  Auto scaling as traffic increase  Event based trigger  Http  File upload etc.  Message pushed to pub/sub  Let’s See in action
  • 38. Different storage product (GCP) © ANKIT MISTRY – GOOGLE CLOUD
  • 39. © ANKIT MISTRY – GOOGLE CLOUD Structured Transactional Cloud SQL Cloud Spanner Analytical Big Query Semi- Structured Big Table (Row Key) Data store Fire store Different storage product
  • 40. © ANKIT MISTRY – GOOGLE CLOUD Unstructured Cloud Storage Block Storage Persistence Disk Local SSD In-memory DB Memory Store Different storage product
  • 41. Google Cloud Storage © ANKIT MISTRY – GOOGLE CLOUD
  • 42. Google Cloud Storage © ANKIT MISTRY – GOOGLE CLOUD  Object storage solution in GCP  Unstructured Data storage  Image  Video  Binary File, etc…  Cloud storage can be used for long term archival storage  Can be access object over http, Rest API  No capacity planning required  Scale to Exabyte  Unlimited data can be stored  By Default Data is encrypted at rest  In transit also by default encryption.
  • 43. Google Cloud Storage © ANKIT MISTRY – GOOGLE CLOUD  No minimum Object Size  Max object size is 5 TB  High Durability – 99.999999999% annual  Object can be Globally access  Single API to access across multiple storage class  Data is geo - redundant  Due to Multiregional  Dual-Region storage
  • 44. Object Organization © ANKIT MISTRY – GOOGLE CLOUD Cloud Storage Buckets Folders Object  Global unique name for bucket  Example access URL :  https://storage.cloud.google.com/[bucket]/[objectname]  Bucket name appear in URL  So, be careful while naming bucket  Does not store anything like file system  Folder are virtual  Bucket level lock with data retention policy  Object are immutable  Object can be versioned
  • 45. Storage Location © ANKIT MISTRY – GOOGLE CLOUD Region • Lowest latency within a single region • Replicated data across multiple zone in single region Dual-region • High availability and low latency across 2 regions (Paired region) • Auto-failover Multi-region • Highest availability across continent area – US, EU, Asia • Auto-failover
  • 46. Storage Class © ANKIT MISTRY – GOOGLE CLOUD Standard • Good for Hot data • High frequency access • Storage Costliest • Access cost is very low • Low latency • SLA : • 99.95% Multi/Dual • 99.9% Regional Near line • Low Frequency access Once in a 30 days • Storage is Cheaper than standard • Access cost will increase • Back up • SLA : 99.9% Multi/Dual • 99.0% for Regional Cold line • Very low frequency to access • Once in 90 days • Storage is Cheaper than Near line • SLA : • 99.9% Multi/Dual • 99.0% for Regional Archive • Offline data • Backup • Data access is once in year • Storage Cheapest • Access cost very high • No SLA  How frequently access data  How much amount of data
  • 47. [Hands-on] Google Cloud Storage © ANKIT MISTRY – GOOGLE CLOUD
  • 48. Object Lifecycle management © ANKIT MISTRY – GOOGLE CLOUD  Based on condition what action needs to perform on object.  Condition  Object age  Object file type  after some specific date  Action  Transition to different storage class for high performance  Like – Standard to Nearline  Coldline to Delete
  • 49. Secure Data with Encryption © ANKIT MISTRY – GOOGLE CLOUD  Encryption  Google managed Encryption keys  No Configuration  Fully managed  Customer managed Encryption keys  Create keyring in Cloud KMS  key will be managed by customer. Like Key rotation  Customer supplied Encryption keys  We will generate Key with : openssl rand =base64 32  gsutil – encrypt with CSEK
  • 50. Object Versioning © ANKIT MISTRY – GOOGLE CLOUD  Help to prevent accidental deletion of object  Enable/Disable versioning at bucket level  Get access to older version with (object key + version number)  If you don’t need earlier version, delete it & reduce storage cost  If you don’t specify version number, always retrieve latest version  Let’s see in action
  • 51. Controlling access © ANKIT MISTRY – GOOGLE CLOUD  Who can do what on GCS at what level  Permissions  Apply at Bucket level  Uniform level access  No Object level permission  Apply uniform at all object inside bucket  Fine grained permission  Access Control List – ACL For Each object Separately  Apply Project level  IAM  Different Predefined Role  Storage Admin  Storage Object Admin  Storage Object Creator  Storage Object Viewer  Create Custom Role  Assign Bucket level Role  Select bucket & assign role  To user  To other GCP services or product
  • 52. Bucket Retention Policy © ANKIT MISTRY – GOOGLE CLOUD  Minimum duration for which bucket will be protected from  Deletion  modification  Let’s see How to Configure it.
  • 53. Signed URL © ANKIT MISTRY – GOOGLE CLOUD  Temporary access  you can give access to user who doesn’t have Google Account.  URL expired after time period defined.  Max period for which URL is valid is 7 days.  gsutil signurl -d 10m -u gs://<bucket>/<object>
  • 54. GCS - Pricing © ANKIT MISTRY – GOOGLE CLOUD  Storage Pricing  Data access Pricing  Go to Cloud Console & create Bucket, observe pricing
  • 55. Data Transfer Services © ANKIT MISTRY – GOOGLE CLOUD
  • 56. Data Transfer Service © ANKIT MISTRY – GOOGLE CLOUD  From On-premises to Google Cloud Storage (GCS)  From One bucket to another bucket inside same GCP  From Other public cloud Amazon S3, Azure Container to GCS On-premises
  • 57. On-premises to (GCS) © ANKIT MISTRY – GOOGLE CLOUD  gsutil – command line utility  Online mode of transfer  install locally Google Cloud SDK  gsutil -m cp large_number_of_small_files (-m for parallel upload)  Should we go for it or not?  Good Network  Follow chart in next slide  Transfer Service for on-premises data  This will quickly and securely move your data from private data centers into Google Cloud Storage  Two step process  installing an agent  create a transfer job
  • 58. Transfer Appliance © ANKIT MISTRY – GOOGLE CLOUD  Transfer Appliance  Physical device which securely transfer large amounts of data to Google Cloud Platform  When data that exceeds 20 TB or would take more than a week to upload.
  • 59. Online vs Offline transfer © ANKIT MISTRY – GOOGLE CLOUD
  • 60. Transfer Service|cloud data © ANKIT MISTRY – GOOGLE CLOUD  This will quickly and securely transfer data into Google Cloud Storage  From various sources  Amazon S3  Azure Blob Storage  Move data between Cloud Storage buckets  Create Transfer Job  Onetime run or recurring
  • 61. Google Block Storage © ANKIT MISTRY – GOOGLE CLOUD
  • 62. Google Block Storage © ANKIT MISTRY – GOOGLE CLOUD  Block storage – hard Disk storage  Direct attached Storage  Network attached Storage
  • 63. Direct attached – Local SSD © ANKIT MISTRY – GOOGLE CLOUD  Local SSD  Physically attached to VM  Very High Performance – 10x to 100x of Persistence Disk  Costlier than Persistence Disk  You can not re attach to other VM  Once VM destroy, Local SSD will be deleted  Lower Availability  Temporary/Ephemeral Storage  No Snapshot  Let’s see in action.
  • 64. Local SSD with Compute Engine © ANKIT MISTRY – GOOGLE CLOUD
  • 65. Network attached storage © ANKIT MISTRY – GOOGLE CLOUD  Network attached hard disk  Persistent Disks  Zonal, Regional  Not attached directly to any VM  Can be re-attached with other VM  Very Flexible – resize easily  Permanent storage  Snapshot supported  Cheaper than Local SSD
  • 66. Persistence Disk with Compute Engine © ANKIT MISTRY – GOOGLE CLOUD
  • 67. FileStore © ANKIT MISTRY – GOOGLE CLOUD
  • 68. Storage Which storage to use when © ANKIT MISTRY – GOOGLE CLOUD  Cloud Storage  Unstructured data storage  Video stream, Image  Staging environment  Compliance  Backup  Data lake  Persistent Disk  Attach Disk with VM & Containers  Share read-only disk with multiple VM  Database storage  Local Disk  Temporary high performance attach Disk  File Store  Performance predictable  Lift-shift millions of Files
  • 69. Structured data Solution in GCP © ANKIT MISTRY – GOOGLE CLOUD
  • 70. Structured data © ANKIT MISTRY – GOOGLE CLOUD Cloud SQL Cloud Spanner
  • 71. Few Concept © ANKIT MISTRY – GOOGLE CLOUD  Relational data  OLTP  OLAP  RTO vs RPO  Vertical vs Horizontal Scaling  Availability & Durability
  • 72. OLTP & OLAP © ANKIT MISTRY – GOOGLE CLOUD
  • 73. OLTP © ANKIT MISTRY – GOOGLE CLOUD  OLTP – Online Transaction Processing  Simple Query  Large number of small transaction  Traditional RDBMS  Database modification  Popular Database - MySQL, PostgreSQL, Oracle, MSSQL  ERP, CRM, Banking application  GCP - Cloud SQL, Cloud spanner
  • 74. OLAP © ANKIT MISTRY – GOOGLE CLOUD OLAP – Online Analytical Processing  Data warehousing  Data is collected from multiple sources Complex Query  Data analysis  Google Cloud Big Query – Petabyte Data warehouse  Reporting Application, Web click analysis, BI Dashboard app
  • 75. Vertical – Horizontal Scaling © ANKIT MISTRY – GOOGLE CLOUD Vertical Scaling 8 VCPU - 16 GB 12 VCPU - 24 GB Horizontal Scaling 8 VCPU - 16 GB
  • 76. RTO & RPO © ANKIT MISTRY – GOOGLE CLOUD  Data loss : 13 hours  System Downtime : 7 Hours 5th March 4:00 PM 5th March 3:00 AM 5th March 11:00 PM Last Backup Disaster happens System Recovered Loss data RTO RPO Downtime
  • 77. RTO & RPO © ANKIT MISTRY – GOOGLE CLOUD  RTO – Recovery Time objective  Maximum time for which system can be down  RPO - Recovery Point objective  Maximum time for which organization can tolerate Dataloss
  • 78. RTO & RPO © ANKIT MISTRY – GOOGLE CLOUD  RTO – Recovery Time objective  Maximum time for which system can be down  RPO - Recovery Point objective  Maximum time for which organization can tolerate Dataloss
  • 79. Durability © ANKIT MISTRY – GOOGLE CLOUD  If you loose data means  business is down  No business afford to loose data  How healthy & resilient your data is  Object Storage provider measure durability in terms of number of 9’s  Example : 99.9999999999999% - 11 9’s  That means that even with one billion objects, you would likely go a hundred years without losing a single one!  https://cloud.google.com/blog/products/storage-data-transfer/understanding- cloud-storage-11-9s-durability-target
  • 80. Availability © ANKIT MISTRY – GOOGLE CLOUD  If region goes down where your data stored  Replicate data across many region  How much amount of time data is up/available to access.  Data replicated across multiple regions, means higher Availability  SLA – service level agreement  SLA – 99.99% : four 9’s  https://uptime.is  https://cloud.google.com/terms/sla
  • 81. GCP Database products © ANKIT MISTRY – GOOGLE CLOUD Structure data • Cloud SQL • Cloud Spanner • BigQuery Semi- structure data • BigTable • Datastore • Memorystore
  • 82. Google Cloud SQL © ANKIT MISTRY – GOOGLE CLOUD
  • 83. Google Cloud SQL © ANKIT MISTRY – GOOGLE CLOUD  Fully managed Relational database services for MySQL, PostgreSQL & SQL Server  Lift & shift above database  Regional Database with 99.95% SLA  Storage up to 30 TB  Scale up to 96 core & 416 GB Memory  No Horizontal Scaling  Data is encrypted with Google managed key or CMEK  Cloud SQL can be accessed from anywhere like – App Engine, Compute Engine…  Used for storing Transactional database  Ecommerce, CRM kind application backend.
  • 84. Google Cloud SQL © ANKIT MISTRY – GOOGLE CLOUD  No maintenance & auto update  Back-up Database  On-demand Backup  Schedule backup  Database migration service (DMS)  migrate data from different SQL system to Cloud SQL  Point-in Time Recovery  Scale with Read replicas – To transfer workload to other instance  Export data  gcloud utility or Cloud Console  In SQL/CSV format
  • 85. [Hands-on] Create Google Cloud SQL © ANKIT MISTRY – GOOGLE CLOUD
  • 86. [Hands-on] Connect Cloud SQL & IP Whitelisting © ANKIT MISTRY – GOOGLE CLOUD
  • 87. [Hands-on] Data migration to Cloud SQL Demo © ANKIT MISTRY – GOOGLE CLOUD
  • 88. [Hands-on] Cloud SQL failover demo © ANKIT MISTRY – GOOGLE CLOUD
  • 89. © ANKIT MISTRY – GOOGLE CLOUD Google Cloud SQL Failover https://cloud.google.com/sql/docs/mysql/high-availability
  • 90. Cloud SQL Explore © ANKIT MISTRY – GOOGLE CLOUD
  • 91. Cloud SQL Export © ANKIT MISTRY – GOOGLE CLOUD
  • 92. Google Cloud Spanner © ANKIT MISTRY – GOOGLE CLOUD
  • 93. Google Cloud Spanner © ANKIT MISTRY – GOOGLE CLOUD  Distributed & scalable solution for RDBMS in GCP  Fully managed, Mission critical application  Horizontal Scalability  use when Data volume > 2 TB  Costlier than Cloud SQL  Cloud SQL has just Read replicas,  where as in cloud spanner horizontal read/write across region  Highly scalable, Petabyte scale  Data is strongly typed.  Must define schema database  Datatype for each column of each table must be defined.  99.999% availability  Cloud native solution – specific to GCP  Lift & Shift not possible, Not recommended  Spanner = Cloud SQL + Horizontal Scalable  Scale to petabyte  Regional/ Multi-region level instance can be created  Data export  can not export with gcloud  Cloud Console or Cloud Dataflow Job
  • 94. Spanner vs RDBMS-SQL © ANKIT MISTRY – GOOGLE CLOUD Spanner Cloud SQL Availability High During failover little downtime Scalable Horizontal vertical Price Costly Cheaper than spanner SQL/Schema Support yes yes Replication High Only Read Replica
  • 95. [Hands-on] Cloud Spanner © ANKIT MISTRY – GOOGLE CLOUD
  • 96. Cloud Spanner Demo © ANKIT MISTRY – GOOGLE CLOUD  Create Spanner Instance  Create database edu_db  Create 2 Table  Author  AuthorID  AuthorName  Book  BookId  Bookname  AuthorId
  • 97. After Job Done make sure to delete Spanner Instance © ANKIT MISTRY – GOOGLE CLOUD
  • 98. Which database to use when © ANKIT MISTRY – GOOGLE CLOUD  Cloud SQL  Lift & Shift SQL based system  CRM, Ecommerce App  Max Data size is 30 TB  Cloud Spanner  Horizontal scalability  Low latency  High scalability in terms of storage + compute  if Data Storage requirement is beyond TB
  • 99. Semi-Structured data Solution in GCP © ANKIT MISTRY – GOOGLE CLOUD
  • 100. NoSQL Introduction © ANKIT MISTRY – GOOGLE CLOUD  Not a SQL  Flexible Schema  Variable number of property  Data model will be like  Document  key-value pair  Graph based { “studentID” : 100, “name” : “john”, “score” : 78, “country” : “US” }, { “studentID” : 101, “name” : “Alice”, “rank” : 7, }, key value 100 std1 101 std2
  • 101. Semi-structured data © ANKIT MISTRY – GOOGLE CLOUD BigTable Datastore Firestore
  • 102. Google Cloud Datastore © ANKIT MISTRY – GOOGLE CLOUD
  • 103. Cloud Datastore © ANKIT MISTRY – GOOGLE CLOUD  Highly scalable NoSQL database  Serverless  Document kind data storage – MongoDB  App Engine + Datastore  SQL Like Queries – GQL  Support ACID Transaction  Multiple indexes  Data replication across different region  Use case  Session Info  Product catlog  Export data from gcloud utility only Datastore RDBMS Kind Table Entity Row Property Column Key Primary Key
  • 104. [Hands-on] Datastore © ANKIT MISTRY – GOOGLE CLOUD
  • 105. Google Cloud Firestore © ANKIT MISTRY – GOOGLE CLOUD
  • 106. Cloud Firestore © ANKIT MISTRY – GOOGLE CLOUD  Firestore is the next generation of Datastore  Highly scalable NoSQL database  Collection & Document Model  Two mode  native Mode  datastore mode  Real-time updates  Mobile and Web client libraries  Let’s see in Action Firestore RDBMS Collection group Table Document Row Field Column Document ID Primary Key
  • 107. [Hands-on] Firestore © ANKIT MISTRY – GOOGLE CLOUD
  • 108. Datastore & Firestore Pricing © ANKIT MISTRY – GOOGLE CLOUD
  • 109. Google Cloud Memorystore © ANKIT MISTRY – GOOGLE CLOUD
  • 110. Cloud Memorystore © ANKIT MISTRY – GOOGLE CLOUD  Fully managed Inmemory database  sub-millisecond data access  Two engine supported  Redis  Memcached  Only Internal IP  Highly available with 99.9% SLA  Import/Export data from Cloud Storage to memory store  Let’s create memory store Instance
  • 111. [Hands-on] Memorystore © ANKIT MISTRY – GOOGLE CLOUD
  • 112. Google Cloud BigTable © ANKIT MISTRY – GOOGLE CLOUD
  • 113. Cloud BigTable © ANKIT MISTRY – GOOGLE CLOUD  No Multi column index  Only Row key based indexing  Design Row Key is very important  Design Row key by keeping in your mind  which is your frequent query in application  No Hot spotting  Don’t use monotonically increasing key  Seamless integration with  Warehouse – BigQuery  Machine Learning Product  Used for  Financial data  Time series Data  Fully managed  Wide column NOSQL database  Not serverless  Scale horizontally with Multiple Node  Scale to huge Volume of data  Data stored at column wise  Column are grouped into column family  Milli second latency  Handles millions of request per second  How to access  cbt – command line (part of cloud sdk)  Hbase API
  • 114. Cloud BigTable © ANKIT MISTRY – GOOGLE CLOUD Row Key Personal_data_cf Professional_data_cf name age salary designation company 1 2 3 Professional_data_cf:salary
  • 115. [Hands-on] Google Cloud BigTable © ANKIT MISTRY – GOOGLE CLOUD
  • 116. BigTable Pricing © ANKIT MISTRY – GOOGLE CLOUD
  • 117. GCP Data Processing Solution © ANKIT MISTRY – GOOGLE CLOUD  BigQuery  Analytical Workload – Storage + Processing  DataFlow – Apache Beam  DataProc – Spark, Hadoop  Data Fusion  Cloud Composer  Data Prep  Cloud PubSub
  • 118. Google Cloud BigQuery © ANKIT MISTRY – GOOGLE CLOUD
  • 119. Cloud BigQuery © ANKIT MISTRY – GOOGLE CLOUD  Data warehouse solution in GCP  Like Relational database – SQL schema  Serverless  Built using BigTable + GCP Infrastructure  BigQuery is Columnar storage  This is for Analytical database  not for Transactional purpose  Exabyte scale  Query using  Standard SQL  legacy SQL  Big Query can query from external data source.  Cloud storage, SQL, Big Table  Biquery can load data from various sources.  CSV, JSON, Avro, SQL and many more  Query is very expensive  $5 approx. for 1 TB of data scanned  Before query execution do dry run.  Alternative to OpenSource Apache Hive  How to access BigQuery  Cloud Console  bq – command line tool  Client library - written in C#, Go, Java, Node.js, PHP, Python, and Ruby
  • 120. BigQuery Data organization © ANKIT MISTRY – GOOGLE CLOUD Projects Datasets Tables Jobs  Projects are top level container in GCP  Dataset hold multiple tables  Each table must belong to dataset  Assign Role at the organization, project, and dataset level  Tables – contain data  It has Schema  Types of Tables  Native tables, External tables, Views  Jobs  manage asynchronous tasks  Types of Job  Load, Query, Extract, Copy
  • 121. When BigQuery should be used © ANKIT MISTRY – GOOGLE CLOUD  When workload is analytical  When Data doesn’t change in database, as bigquery use built –in cache  For complex query  When query takes more execution time.  off-load some workload from primary transaction DB  When you large volume of data  No Join is preferred.  When you data is denormalized
  • 122. [Hands-on] Cloud BigQuery Explore + Public dataset © ANKIT MISTRY – GOOGLE CLOUD
  • 123. [Hands-on] Cloud BigQuery + Local data © ANKIT MISTRY – GOOGLE CLOUD
  • 124. Cloud BigQuery Pricing © ANKIT MISTRY – GOOGLE CLOUD  On-demand  Pay for what you use  Flat rat Pricing  Allocate compute & storage capacity
  • 125. Google Cloud PubSub © ANKIT MISTRY – GOOGLE CLOUD
  • 126. PubSub © ANKIT MISTRY – GOOGLE CLOUD Application DB Application DB Notification Service Notification Service
  • 127. How PubSub works © ANKIT MISTRY – GOOGLE CLOUD  Fully-managed asynchronous messaging service  Scale to billions of message per day  Publisher – App send message to Topic  Push & Pull way to access messages  Pull – Subscriber pull message  Push – Message will be sent to subscriber via webhook  One topic – Multiple Subscriber  One subscriber – Multiple Topic https://cloud.google.com/pubsub/docs/overview
  • 128. Cloud PubSub © ANKIT MISTRY – GOOGLE CLOUD  Fully-managed Pubsub system inside Google Cloud  Serverless  Auto-scaling and auto-provisioning with support from zero to hundreds of GB/second  Topic – Storage reference  Publisher send message to topic at pubsub.googleapis.com  Push – Pull way to access message  Once subscriber receive message ack is sent.  Cloud Pubsub act as staging environment for many GCP services
  • 129. Advantage PubSub © ANKIT MISTRY – GOOGLE CLOUD  Durability of data will increase  Highly Scalable, Scalable  Decoupling between both system (Publisher & Subscriber)  Application don’t synchronously communicate with Notification service  Application (Publisher) is not dependent on Notification service (Subscriber)
  • 130. [Hands-on] Cloud PubSub © ANKIT MISTRY – GOOGLE CLOUD
  • 131. Cloud DataFlow © ANKIT MISTRY – GOOGLE CLOUD  Managed service for variety of data processing  An advanced unified programming model to implement batch and streaming data processing jobs that run on various execution engine/ runner  Cloud version of Apache Beam = (Batch + Stream)  Serverless, Fully managed  Horizontal autoscaling of worker  Jobs created with  Pre-define template  Notebook instance  Write Data Pipeline job in Java, Python, SQL  From Cloud Shell/Local Machine
  • 132. How DataFlow Works © ANKIT MISTRY – GOOGLE CLOUD  Write Job in Java, Python Go  Unified API for both batch + stream Processing  No Need to separately handle Batch & streaming data  Execution  Direct Runner  Scaling issue  Apache Flink  Apache Spark  Cloud DataFlow
  • 133. Apache Beam © ANKIT MISTRY – GOOGLE CLOUD  Pipeline  A pipeline is a graph of transformations that a user constructs that defines the data processing they want to do.  IO-Transform  https://beam.apache.org/documentation/io/built-in/  Pcollection  Fundamental data type in Beam  Ptransform  The operations executed within a pipeline  https://beam.apache.org/documentation/programming-guide/#transforms  Runner - Execution engine
  • 134. [Hands-on] Cloud DataFlow PRE-DEFINE TEMPLATE © ANKIT MISTRY – GOOGLE CLOUD
  • 135. [Hands-on] Cloud DataFlow NOTEBOOK INSTANCE © ANKIT MISTRY – GOOGLE CLOUD
  • 136. [Hands-on] Cloud DataFlow EXECUTE JOB FROM SHELL WITH DATAFLOW © ANKIT MISTRY – GOOGLE CLOUD
  • 137. Cloud DataProc © ANKIT MISTRY – GOOGLE CLOUD  Managed Hadoop & Spark Services inside GCP  Lift/Shift Existing Hadoop/Spark based Job  Cluster type  Standard (1 master, N workers)  Single Node (1 master, 0 workers)  High Availability (3 masters, N workers)  Worker node regular VM or Preemptible VM (Cost reduction)  Job Supported :  Hadoop, SparkR, Spark, SparkSQL, Hive, Pig, PySpark  Demo  Spark, PySpark, Notebook Instance
  • 138. [Hands-on] Create Cloud DataProc Cluster © ANKIT MISTRY – GOOGLE CLOUD
  • 139. [Hands-on] Cloud DataProc SPARK JOB © ANKIT MISTRY – GOOGLE CLOUD
  • 140. [Hands-on] Cloud DataProc SUBMIT PYSPARK JOB © ANKIT MISTRY – GOOGLE CLOUD
  • 141. [Hands-on] Cloud DataProc NOTEBOOK INSTANCE © ANKIT MISTRY – GOOGLE CLOUD
  • 142. Cloud Fusion © ANKIT MISTRY – GOOGLE CLOUD
  • 143. Cloud Data Fusion © ANKIT MISTRY – GOOGLE CLOUD  Fully-managed, cloud native solution to quickly building data pipelines  Code free, Drag-n-drop tool  150+ preconfigured connectors & transformations  Built with Open-source CDAP  3 Edition are available  Developer  Basic  Enterprise  Pricing :  https://cloud.google.com/data-fusion/pricing#cloud-data-fusion-pricing  Let’s see in Action – create Cloud Fusion Instance
  • 144. Cloud Data Fusion Demo © ANKIT MISTRY – GOOGLE CLOUD
  • 145. Cloud Composer © ANKIT MISTRY – GOOGLE CLOUD
  • 146. Cloud Composer © ANKIT MISTRY – GOOGLE CLOUD  Fully Managed Apache Airflow which in GCP  Airflow is a workflow & orchestration engine  With Airflow, one can programmatically schedule and monitor workflows  Workflows are defined as directed acyclic graphs (DAGs)  DAGs are written in Python 3.x  Built-in integration for Other GCP services  Google BigQuery,  Cloud Dataflow & Dataproc,  Cloud Datastore  Cloud Storage,  Cloud Pub/Sub, and Cloud ML Engine
  • 147. [Hands-on] Create Cloud Composer Instance © ANKIT MISTRY – GOOGLE CLOUD
  • 148. Write first DAG in Cloud Composer © ANKIT MISTRY – GOOGLE CLOUD
  • 149. Data Loss Prevention API BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 150. Data Loss Prevention API © ANKIT MISTRY – GOOGLE CLOUD  Fully managed service designed to help you discover, classify, and protect your most sensitive data.  PII data  Person’s name, Credit Card Number, SSN  Apply API on Cloud Storage, Big Query Data  DLP work upon Free form Text, Structured & Unstructured data (image)  What to do with this Data  Identify sensitive data  De-identify data  Masking and Encryption  re-identify (In case want to recover original data)
  • 151. De-Identification of Data © ANKIT MISTRY – GOOGLE CLOUD  Redacation – remove sensitive data  Replacement – replace with some tokens (Like Info_type)  Masking – Replace one/more character with some other char  Encryption – Encrypt Sensitive Data
  • 152. TEMPLETES, INFOTYPES & MATCH LIKELIHOOD BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 153. TEMPLATES © ANKIT MISTRY – GOOGLE CLOUD  Configuration which define for  Inspection of Jobs  De-identification of Jobs  Once Template defined , can be reused for other Jobs TEMPLATES TYPES Identification : Find Sensitive Data De–Identification : Remove Sensitive Data
  • 154. INFOTYPES © ANKIT MISTRY – GOOGLE CLOUD  What to Scan For  Like Credit Card  SSN  Age Types of INFOTYPES Built-IN US_SOCIAL_SECURITY_NUMBER, EMAIL_ADDRESS Such types of 120 Built-in Infotype Defined STORED Custom Infotype Based on Fixed words, Regular Expression, Custom Dictionary
  • 155. MATCH LIKELIHOOD © ANKIT MISTRY – GOOGLE CLOUD LIKELIHOOD_UNSPECIFIED Default value; same as POSSIBLE. VERY_UNLIKELY It is very unlikely that the data matches the given InfoType. UNLIKELY It is unlikely that the data matches the given InfoType. POSSIBLE It is possible that the data matches the given InfoType. LIKELY It is likely that the data matches the given InfoType. VERY_LIKELY It is very likely that the data matches the given InfoType.
  • 156. DLP API Demo https://cloud.google.com/dlp/demo/#!/ BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 157. Create INFO_TYPE (Hands-on) BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 158. Create TEMPLETES (Hands-on) BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 159. Create Job for Inspection (Hands-on) BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 160. Create Template for De- identification BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 161. Apply Some more rules to template BY ANKIT MISTRY © ANKIT MISTRY – GOOGLE CLOUD
  • 162. Data Catalog © ANKIT MISTRY – GOOGLE CLOUD
  • 163. Data Catalog © ANKIT MISTRY – GOOGLE CLOUD  Most organizations today are dealing with a large and growing number of data assets.  Data stakeholders (consumers, producers, and administrators) face a number of challenges:  Searching for insightful data  Understanding data  Making data useful  Data Catalog  A fully managed and highly scalable data discovery and metadata management service.  Single place to discover all data, asset across all project  Using Data catalog  Search data assets  tag data
  • 164. How Data Catalog works © ANKIT MISTRY – GOOGLE CLOUD
  • 165. Metadata © ANKIT MISTRY – GOOGLE CLOUD  Technical Metadata  For BigQuery, Pubsub these metadata resides inside individual products  Technical meta data being registered by catalog automatically  Business Metadata  Attach Tag to existing data asset  Define some Tag template and attach metadata
  • 166. [Hands-on] Data Catalog © ANKIT MISTRY – GOOGLE CLOUD
  • 167. ML/AI Module Introduction © ANKIT MISTRY – GOOGLE CLOUD
  • 168. Machine Learning © ANKIT MISTRY – GOOGLE CLOUD
  • 169. Machine Learning - GCP © ANKIT MISTRY – GOOGLE CLOUD  Concept behind Machine Learning  Types of ML System  Pre trained Model  Custom Model  TPU – tensor processing unit
  • 170. Machine Learning © ANKIT MISTRY – GOOGLE CLOUD  Design Spam email classification system  How to design?  What rules you will code inside system  If message coming from some specified list of senders, spam it  If message contain word like lottery, promotion, spam it  But how many such rule you will define inside system.  It is very difficult & cumbersome task to design such way.  If spammer start sending spam which is not part of rule book.  So, need some intelligent approach,  Machine Learning is the solution behind it.
  • 171. Machine Learning © ANKIT MISTRY – GOOGLE CLOUD  Rather than define such rule,  In machine learning, system learn from data  Training + Testing kind of system Input – message with labels ML Algorithm Model Test input + Model Model Predicted Output Training Process Testing Process
  • 172. Types of ML System © ANKIT MISTRY – GOOGLE CLOUD  ML Types  Supervised learning  Label has been given  Regression  Classification  Unsupervised learning  No labels  Find Structure within data
  • 173. Regression © ANKIT MISTRY – GOOGLE CLOUD  Output prediction is continuous in nature  Example  House Price prediction  Regression ML Algorithm :  Linear Regression  SVR  Decision Tree Regressor Area No of Bedroom Price 5434 5 3536 2342 5 3564 243 1 4564 987 4 7675
  • 174. Classification © ANKIT MISTRY – GOOGLE CLOUD  Output prediction is discrete in nature  Example  Sentiment analysis of review : +ve/-ve  This product is very much helpful. +ve  Is it Orange?  Yes/No  Classification Algorithm :  Logistic Regression  SVM  KNN  Decision Tree Classification
  • 175. Unsupervised Learning © ANKIT MISTRY – GOOGLE CLOUD  No label Given  Find Structure within data  Clustering is type of Unsupervised Learning  Some clustering Algorithm :  K-Means  hierarchical
  • 176. Machine Learning workflow © ANKIT MISTRY – GOOGLE CLOUD Ingest Data Data Preparation Split data (Training + Testing) Apply ML algorithm to Training Data Model Testing Data Model Prediction Training Testing
  • 177. Machine Learning + GCP © ANKIT MISTRY – GOOGLE CLOUD GCP ML Solution Pre-Trained Model For generic use cases Vision API, Speech API, Language API Auto ML Throw your data & rest GCP will take care Custom Training Tensorflow, PyTorch, Scikit-learn Spark ML, BigQuery ML
  • 178. Data Preparation with DataPrep © ANKIT MISTRY – GOOGLE CLOUD
  • 179. DataPrep © ANKIT MISTRY – GOOGLE CLOUD  Intelligent Data Preparation tool  Visually explore, clean, and prepare data for analysis and machine learning  Build by Trifacta – Third Party tool, not cloud native one  Play with this tool without any code, with just click  Dataprep is serverless and works at any scale  No infrastructure to deploy or manage  Automatically detect schema, anomalies  Do all time consuming task easily  Concern – Need to share data with Trifacta
  • 180. DataPrep ETL Pipeline © ANKIT MISTRY – GOOGLE CLOUD
  • 181. Let’s see DataPrep in Action © ANKIT MISTRY – GOOGLE CLOUD
  • 182. Pre-trained Model © ANKIT MISTRY – GOOGLE CLOUD
  • 183. Pre-Trained Model © ANKIT MISTRY – GOOGLE CLOUD Vision API Natural Language API Speech to Text API Text to Speech API  Google has huge amount of data  Google has already trained ML/AI algorithm to build model  For generic use case like  Object recognition/detection – Vision API  OCR  Speech to Text  Language Translation  NLP API – to get insight from natural language  You can take advantage of pre-built model.  No Training required from customer  Use already built Rest API for above use cases
  • 184. Vision API © ANKIT MISTRY – GOOGLE CLOUD  Derive insights from your images  Detect printed and handwritten text  Detect objects  Identify popular places and product logos  Moderate content  Celebrity recognition  How to use  Web UI – Just for Testing  https://cloud.google.com/vision#section-2  gcloud – CLI  Python SDK
  • 185. Natural Language API © ANKIT MISTRY – GOOGLE CLOUD  Derive insights from unstructured text using Google machine learning  Identify entities within documents  Sentiment analysis  Content classification  How to use  gcloud – CLI  Python SDK
  • 186. Speech Text API © ANKIT MISTRY – GOOGLE CLOUD  Speech to Text API  Accurately convert speech into text using an API powered by Google’s AI technologies.  125 languages support  Streaming speech recognition  Content filtering  Automatic punctuation  https://cloud.google.com/speech-to-text#section-2  Text to Speech API  Convert text into natural-sounding speech using an API powered by Google’s AI technologies.  https://cloud.google.com/text-to-speech#section-2
  • 187. ML API Pricing © ANKIT MISTRY – GOOGLE CLOUD
  • 188. Auto Machine Learning © ANKIT MISTRY – GOOGLE CLOUD
  • 189. Auto Machine Learning © ANKIT MISTRY – GOOGLE CLOUD  AutoML – Auto Machine Learning  Your use case is not generic  You have some custom requirement  Vision API – recognize shoes  Auto ML – Different types of shoes detection  Adidas, Nike shoes  Throw your data & GCP will create best model for you  State-of art Transfer learning technology  Throw your data & Google AI will create model  2 use cases Demo :  Flower species recognition  Text Classification
  • 190. Text Classification © ANKIT MISTRY – GOOGLE CLOUD  Dataset creation Data Class My eldest son who is 27 just got word he has a new job after finishing his bachelors degree. This made me very happy! achievement I visited my best friend at her school on St. Patrick's day. bonding My mom cooked some delicious rice for me with curd. affection Today I make Eye contact with my crush. She Also look into my Eyes For a Seconds or Two. I can still Memorize his Beautiful Eyes. affection  Train Model  Analyze Model  Deploy for Prediction achievement Affection bonding enjoy_the_moment exercise leisure nature
  • 191. Flower Classification © ANKIT MISTRY – GOOGLE CLOUD  Dataset creation  Train Model  Define Node hours  Analyze Model  Deploy for Prediction daisy dandelion roses sunflowers tulips
  • 192. TPU © ANKIT MISTRY – GOOGLE CLOUD  TPU – Tensor Processing Unit  Machine Learning Training is one of the most time consuming process  It may take hours to days to sometime week  Training time depend upon Ml Algorithm + Amount of dataset  Google introduce Tensorflow framework to do Machine Learning which powers their own ML Product  Tensor are basic building block of this framework.  So, To do training faster Google created ASIC based in-house dedicated computing for Tensor Processing  Speed up training by 20x to 30x  Work with VM, GKE, AI Platform  Quickly experiment with number of ML Models creation
  • 193. Train your own model © ANKIT MISTRY – GOOGLE CLOUD
  • 194. Custom Model © ANKIT MISTRY – GOOGLE CLOUD  You have your own dataset  You want to train your own model  You have team of data scientist  They want to write own algorithm based on  Scikit-learn, XGBoost  Tensorflow  PyTorch Framework  Notebook instance  Build Logistic Regression model for flower species recognition  Save model in pickle file  Deploy model endpoint
  • 195. BigQuery ML © ANKIT MISTRY – GOOGLE CLOUD  Create ML Model in SQL  No Python, No Java.  No need to export data to other environment  Model support  Linear Regression  Multiclass Logistic Regression  K-Means  XGBoost  Tensorflow – Import  use case demo  Flower species recognition  From BigQuery public dataset
  • 196. Main BigQuery Function © ANKIT MISTRY – GOOGLE CLOUD  Create MODEL  Model Type – Linear Reg, Logistic  Label Column  Learning Rate etc…  Evaluate Model  ML.Evaluate  Provide Model & Test Data  Determine how good model performance on Test data  Prediction  ML.Prediction  Apply Live data to Model to get prediction
  • 197. [Hands-on] BigQuery ML © ANKIT MISTRY – GOOGLE CLOUD
  • 198. Cloud Data Studio © ANKIT MISTRY – GOOGLE CLOUD
  • 199. Data Studio © ANKIT MISTRY – GOOGLE CLOUD  BI tool from Google  Connect your data from spreadsheets, Analytics, Google Ads, Google BigQuery and many more connectors  Drag & drop, no code  Create reports & Dashboards  Free  Let’s see in Action
  • 200. THANK YOU © ANKIT MISTRY – GOOGLE CLOUD