Google Cloud Platform
Data Engineer
Module 3
Copyright © www.ine.com
Module 3
Data Services
Module Overview
Data Services (Dataproc, Dataflow & BigQuery)
Cloud Pub/Sub
Cloud Endpoints
Cloud Functions
Data Services
Use Google Cloud Dataproc, an Apache Hadoop, Apache Spark, Apache
Pig, and Apache Hive service, to easily process big datasets at low cost.
Control your costs by quickly creating managed clusters of any size and
turning them off when you're done.
Cloud Dataproc integrates across Google Cloud Platform products, giving
you a powerful and complete data processing platform.
Data Services
Dataflow is a unified programming model and a managed service for
developing and executing a wide range of data processing patterns,
including ETL, batch computation, and continuous computation.
Cloud Dataflow frees you from operational tasks like resource
management and performance optimization.
Data Services
These examples give you a sense of the processing capabilities of
Dataflow. In the simple pipeline model, data is read from a source
into a PCollection, transformed, and written to an output. The
pipeline is a Directed Acyclic Graph (DAG).
In the multiple-transform pipeline, data read from BigQuery is
filtered into two collections based on the initial character of
the name.
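The branching pipeline described above can be sketched in plain Python. This is only an illustration of the shape of the graph (one input collection feeding two independent filters); it does not use the Dataflow SDK, and the sample names, the letters "A" and "B", and every function name here are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for rows read from a source such as BigQuery.
SOURCE_ROWS = ["Alice", "Bob", "Anna", "Brian", "Carol"]


def read_source() -> List[str]:
    """Input step: load the source rows into an in-memory collection."""
    return list(SOURCE_ROWS)


def starts_with(letter: str) -> Callable[[str], bool]:
    """Build a predicate that keeps names beginning with `letter`."""
    return lambda name: name.startswith(letter)


def apply_filter(collection: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Transform step: return a new collection; the input is left unchanged."""
    return [item for item in collection if keep(item)]


def run_pipeline() -> Dict[str, List[str]]:
    names = read_source()
    # Two independent branches read the same input collection,
    # so the graph fans out but never loops back (a DAG).
    a_names = apply_filter(names, starts_with("A"))
    b_names = apply_filter(names, starts_with("B"))
    return {"A": a_names, "B": b_names}


if __name__ == "__main__":
    print(run_pipeline())
```

Because each transform returns a new collection rather than modifying its input, both branches can consume the same data safely, which is the property the DAG structure relies on.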
Data Services
BigQuery is Google's fully managed, petabyte-scale, low-cost
enterprise data warehouse for analytics. BigQuery is serverless.
There is no infrastructure to manage and you don't need a
database administrator, so you can focus on analyzing data to
find meaningful insights using familiar SQL.
BigQuery is a powerful Big Data analytics platform used by all
types of organizations, from startups to Fortune 500 companies.
Cloud Pub/Sub
Cloud Pub/Sub is a fully managed, real-time messaging
service that allows you to send and receive messages
between independent applications.
Decouples the sender and receiver
Push/Pull Data Service
Asynchronous communications
Many benefits over direct communication
https://cloud.google.com/pubsub/architecture
Benefits of Cloud Pub/Sub
Scales globally
Low latency
Dynamic rate limiting
Availability
Durability - replicated storage of messages
Reliability - end-to-end reliability via application ACKs
Security - encryption in motion and at rest
Maintenance
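To make the decoupling and application-ACK ideas concrete, here is a toy in-memory model. It is a sketch only, not the Cloud Pub/Sub client library: the `Topic` and `Subscription` classes, the `demo` function, and the message names are all hypothetical. A message stays pending on a subscription until the subscriber's handler acknowledges it, so an unacknowledged message is delivered again on a later pull.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class Subscription:
    """Holds messages until the subscriber acknowledges them."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()

    def pull(self, handler: Callable[[str], bool]) -> None:
        """Deliver each pending message once; requeue it if not acknowledged."""
        for _ in range(len(self.pending)):
            message = self.pending.popleft()
            if not handler(message):
                self.pending.append(message)  # redelivered on a later pull


class Topic:
    """Publishers write here and never reference subscribers directly."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Subscription] = {}

    def subscribe(self, name: str) -> Subscription:
        return self.subscriptions.setdefault(name, Subscription())

    def publish(self, message: str) -> None:
        for subscription in self.subscriptions.values():
            subscription.pending.append(message)


def demo() -> Tuple[List[str], int]:
    topic = Topic()
    worker = topic.subscribe("worker")
    topic.publish("order-1")
    topic.publish("order-2")

    received: List[str] = []

    def handler(message: str) -> bool:
        received.append(message)
        # Simulate a failure the first time "order-2" is processed.
        return message != "order-2" or received.count("order-2") > 1

    worker.pull(handler)  # order-1 is acked, order-2 is not
    worker.pull(handler)  # order-2 is redelivered and acked
    return received, len(worker.pending)


if __name__ == "__main__":
    print(demo())
```

The publisher only calls `publish` and the subscriber only calls `pull`, so either side can be changed or restarted without the other knowing.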
Cloud Pub/Sub
Let's walk through the workflow.
Cloud Pub/Sub Use Cases
Cloud Pub/Sub is flexible and covers a wide range of use cases:
Balancing workloads in network clusters
Implementing asynchronous workflows
Distributing event notifications
Refreshing distributed caches
Logging to multiple systems
Data streaming from various processes or devices
Reliability improvement
Cloud Endpoints
Exposes an API that lets front-end clients (mobile or web
applications) make use of cloud-based application services
Frees developers from writing wrappers to access App
Engine resources from a mobile or web client
Can you say SERVERLESS!!
Cloud Functions
Event-based microservices
● Fully managed, serverless, secure
● Triggers
○ Cloud Pub/Sub, HTTP, Cloud Storage
● Code
○ Deploy functions from a Cloud Storage bucket or a GitHub or Bitbucket repo
○ Written in JavaScript and run in Node.js
● Stackdriver integration
Compare Cloud Functions with Cloud Endpoints. Cloud Endpoints exposes an array
of endpoints, or API functions, whereas Cloud Functions exposes a single endpoint.
The Cloud Endpoints backend is an App Engine backend, so you have a long-running
programming environment with full access to complex data and storage services.
In Cloud Functions, you have a single piece of code that accepts a limited input,
executes rapidly, produces some output, and exits.
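The contrast can be sketched in a few lines of plain Python. This is a conceptual illustration only: it uses neither the Cloud Endpoints framework nor the Cloud Functions runtime, and `NotesApi`, its method names, `on_upload`, and the event fields are all hypothetical.

```python
from typing import Any, Callable, Dict


class NotesApi:
    """Endpoints-style backend: several API methods sharing long-lived state."""

    def __init__(self) -> None:
        self.notes: Dict[int, str] = {}  # state survives across requests
        self.routes: Dict[str, Callable[..., Any]] = {
            "notes.create": self.create,
            "notes.get": self.get,
            "notes.count": self.count,
        }

    def create(self, text: str) -> int:
        note_id = len(self.notes) + 1
        self.notes[note_id] = text
        return note_id

    def get(self, note_id: int) -> str:
        return self.notes[note_id]

    def count(self) -> int:
        return len(self.notes)

    def call(self, method: str, **params: Any) -> Any:
        """Dispatch a request to one of the exposed API methods."""
        return self.routes[method](**params)


def on_upload(event: Dict[str, str]) -> str:
    """Functions-style handler: one small input, one output, no retained state."""
    return f"processed {event['bucket']}/{event['name']}"


if __name__ == "__main__":
    api = NotesApi()
    note_id = api.call("notes.create", text="hello")
    print(api.call("notes.get", note_id=note_id), api.call("notes.count"))
    print(on_upload({"bucket": "my-bucket", "name": "photo.png"}))
```

The API object holds state and a routing table for many methods, while the handler is a single entry point that runs once per event and keeps nothing afterwards.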
Resources
Google Cloud Platform: https://cloud.google.com/
Console: https://console.cloud.google.com/
Documentation: https://cloud.google.com/docs/
Pricing: https://cloud.google.com/pricing/
Free Tier: https://cloud.google.com/free/

Intro to Google Cloud Platform Data Engineering.
