INTRODUCTION TO
GOOGLE CLOUD PLATFORM
FOR BIG DATA
FIRST THINGS FIRST...
● Who Am I?
● What Am I Going to Talk About?

● Brazilian Data Analyst
● Database Management Student
● Google Fan
● Mom of 1 / Pet Mom of 8
● Plant-Based Geek
● Crazy About Nature
WHAT AM I GOING TO TALK ABOUT?
■ Big Data Beyond the Hype
[ What Is | The 5 Vs ]
■ What is the Google Cloud Platform?
[ What Is | The Ecosystem ]
■ GCP Products for Big Data
[ Example of Big Data Lifecycle | Ingesting | Processing | Storing | Exploring | Analysing ]
■ GCP Big Data Solutions for IMWT's Portfolio
[ Challenges | Example | Steps to Success ]
Big Data
Beyond the Hype
WHAT IS BIG DATA?
High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: Gartner IT Glossary
THE 5 Vs OF BIG DATA
Source: Adapted from Michael Walker (2012)
● VOLUME (Data at Rest): terabytes to exabytes of existing data to process.
● VELOCITY (Data in Motion): data to process within milliseconds to seconds.
● VARIETY (Data in Many Forms): structured, unstructured, text, multimedia...
● VERACITY (Data in Doubt): uncertainty due to data inconsistency, incompleteness, ambiguities, model approximations...
● VALUE (Data into Money): business models can be associated with the data.
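Volume and Velocity are linked: a steady stream of data in motion quickly piles up as data at rest. The back-of-the-envelope sketch below assumes a hypothetical source emitting 10,000 events per second at about 1 kB each and uses decimal (SI) units; the numbers are illustrative, not measurements, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_bytes(events_per_second: int, bytes_per_event: int) -> int:
    """Velocity -> Volume: bytes accumulated in one day by a constant-rate stream."""
    return events_per_second * bytes_per_event * SECONDS_PER_DAY


def human_readable(num_bytes: float) -> str:
    """Format a byte count with decimal (SI) units, from B up to EB."""
    units = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(units) - 1:
        value /= 1000
        index += 1
    return f"{value:.2f} {units[index]}"


# Hypothetical source: 10,000 events per second, about 1 kB per event.
per_day = daily_volume_bytes(events_per_second=10_000, bytes_per_event=1_000)
print(human_readable(per_day))        # 864.00 GB
print(human_readable(per_day * 365))  # 315.36 TB
```

Even this modest stream reaches hundreds of terabytes in a year, which is why Velocity and Volume are usually planned for together.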
What Is
Google Cloud Platform?
WHAT IS GOOGLE CLOUD PLATFORM?
A suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products.
Source: GCP Website (2018)
GCP
ECOSYSTEM
Source: Google Cloud Platform (2018)
GCP
ECOSYSTEM
GCP Products for Big Data
EXAMPLE OF BIG DATA LIFECYCLE
Source: GCP Website (2018)
INGESTION
Source: GCP Website (2018)
● Serverless, fully managed, scalable and pay-for-use platform for apps and backends. Save money while focusing on code rather than infrastructure.
● Integrated, open and global real-time event stream ingestion, delivery and analysis platform. Fast reporting, targeting and optimization in advertising and media.
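An event stream ingestion platform decouples the systems that produce events from the systems that consume them. The toy, in-memory sketch below illustrates that publish/subscribe idea. It is not the GCP client library: the `Topic` class, its methods and the topic and subscription names are hypothetical, and a real service adds persistence, acknowledgements and redelivery.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Topic:
    """Toy topic: every published message is copied into each subscription's queue."""
    name: str
    subscriptions: Dict[str, Deque[dict]] = field(default_factory=dict)

    def subscribe(self, subscription_name: str) -> None:
        self.subscriptions.setdefault(subscription_name, deque())

    def publish(self, message: dict) -> int:
        """Fan the message out to all current subscriptions; return how many received it."""
        for queue in self.subscriptions.values():
            queue.append(message)
        return len(self.subscriptions)

    def pull(self, subscription_name: str, max_messages: int = 10) -> List[dict]:
        """Remove and return up to `max_messages` from one subscription (no ack/redelivery)."""
        queue = self.subscriptions[subscription_name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


# Hypothetical names: one producer, two independent consumers.
topic = Topic("page-views")
topic.subscribe("realtime-dashboard")
topic.subscribe("archive")
for i in range(3):
    topic.publish({"page": f"/item/{i}"})

print(len(topic.pull("realtime-dashboard", max_messages=2)))  # 2
print(len(topic.pull("archive")))                              # 3
```

Each subscription keeps its own queue, so a slow consumer such as an archive job never holds back a real-time dashboard reading the same events.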
PROCESSING
Source: GCP Website (2018)
● Simple, automated and reliable stream and batch data processing platform. Minimize latency and maximize utilization.
● Fast, easy-to-use and fully managed cloud service for running Apache Spark and Hadoop clusters. Low costs. Focus on the data, not on the cluster.
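A platform that handles both stream and batch data lets the same logic run over a bounded dataset or an unbounded stream. The sketch below shows the idea in plain Python with fixed (tumbling) one-minute windows: the same per-window count is computed once over a batch and once incrementally over a stream. It is not the Dataflow or Apache Beam API; the function names and sample events are hypothetical, and the streaming version assumes events arrive in time order, whereas real engines use watermarks to handle late data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (event time in seconds, key such as a page URL)


def window_start(timestamp: float, window_size: float) -> float:
    """Start of the fixed (tumbling) window that contains `timestamp`."""
    return timestamp - (timestamp % window_size)


def batch_counts(events: Iterable[Event], window_size: float = 60.0) -> Dict[Tuple[float, str], int]:
    """Batch: count events per (window, key) over a bounded collection."""
    counts: Counter = Counter()
    for timestamp, key in events:
        counts[(window_start(timestamp, window_size), key)] += 1
    return dict(counts)


def stream_counts(events: Iterable[Event], window_size: float = 60.0) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Stream: emit each window's counts as soon as an event from a later window arrives.

    Assumes events arrive in time order; real engines use watermarks for late data.
    """
    current = None
    counts: Counter = Counter()
    for timestamp, key in events:
        start = window_start(timestamp, window_size)
        if current is not None and start != current:
            yield current, dict(counts)  # window closed: emit and reset
            counts = Counter()
        current = start
        counts[key] += 1
    if current is not None:  # flush the last open window
        yield current, dict(counts)


events = [(5, "/home"), (30, "/blog"), (61, "/home"), (70, "/home")]
print(batch_counts(events))
print(list(stream_counts(events)))
```

Both functions produce the same per-window totals; the streaming one simply releases each window's result early instead of waiting for the whole dataset.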
STORAGE
Source: GCP Website (2018)
● In-memory, relational, non-relational, object and warehouse cloud storage solutions. Secure, cost-effective and easily accessible storage for every need.
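Choosing among these storage categories usually starts from the workload's main access pattern. The helper below is a hypothetical, deliberately simplified rule of thumb; the product names in the comments are examples of GCP services in each category, and real decisions also weigh consistency, scale and cost.

```python
from enum import Enum


class StorageCategory(Enum):
    IN_MEMORY = "in-memory"            # e.g. Cloud Memorystore
    RELATIONAL = "relational"          # e.g. Cloud SQL, Cloud Spanner
    NON_RELATIONAL = "non-relational"  # e.g. Cloud Bigtable, Cloud Datastore
    OBJECT = "object"                  # e.g. Cloud Storage
    WAREHOUSE = "warehouse"            # e.g. BigQuery


def suggest_storage(
    *,
    raw_files: bool = False,
    cache_with_sub_ms_reads: bool = False,
    analytical_scans: bool = False,
    transactions_and_joins: bool = False,
) -> StorageCategory:
    """Hypothetical rule of thumb: map a workload's main access pattern to a storage category.

    Rules are checked in order; anything left over is treated as high-throughput
    key-based reads and writes, which suits a non-relational (NoSQL) store.
    """
    if raw_files:                # images, logs, crawled HTML, backups
        return StorageCategory.OBJECT
    if cache_with_sub_ms_reads:  # session data, hot lookups
        return StorageCategory.IN_MEMORY
    if analytical_scans:         # aggregations over large historical tables
        return StorageCategory.WAREHOUSE
    if transactions_and_joins:   # orders, accounts, referential integrity
        return StorageCategory.RELATIONAL
    return StorageCategory.NON_RELATIONAL


print(suggest_storage(raw_files=True).value)         # object
print(suggest_storage(analytical_scans=True).value)  # warehouse
```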
EXPLORATION
Source: GCP Website (2018)
● Easy-to-use and interactive tool for data exploration, analysis, visualization and machine learning. Free, but may incur compute, storage and other cloud service charges.
● Fast, scalable, cost-effective and fully managed cloud data warehouse for analytics. Serverless, with built-in machine learning.
● Set of integrated data and marketing analysis products.
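Exploration in an interactive, notebook-style tool often starts with a quick profile of each column before any modelling, which also surfaces Veracity problems such as missing values. The sketch below is plain Python using the standard `statistics` module; `profile_column` and the sample values are hypothetical and not part of any GCP product.

```python
import statistics
from typing import Optional, Sequence


def profile_column(values: Sequence[Optional[float]]) -> dict:
    """First-look profile of one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    profile = {"count": len(values), "missing": len(values) - len(present)}
    if not present:  # nothing to summarise
        return profile
    profile.update(
        min=min(present),
        max=max(present),
        mean=statistics.mean(present),
        median=statistics.median(present),
    )
    return profile


# Hypothetical daily page-view counts with two missing days.
print(profile_column([10, None, 30, 20, None]))
```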
ANALYTICS
Source: GCP Website (2018)
● Fast, large-scale and easy-to-use AI products and services.
● Easy-to-use deep learning models for speech-to-text and image-to-JSON conversion and dynamic translation. Pre-trained models: no advanced ML skills required.
● Better training performance compared to other deep learning systems.
GCP Big Data Solutions for IMWT's Portfolio
CHALLENGES
Source: Adapted from Nasser T, Tariq RS (2015) Big Data Challenges. J Comput Eng Inf Technol 4:3
EXAMPLE
Web Crawler Solution: Simplified Architecture
Stages: INGESTION | PROCESSING | STORAGE | EXPLORATION | ANALYSIS
Products: App Engine | Dataflow | Dataproc | SQL | Dataprep | Datalab | Machine Learning | Data Studio
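The stages of the web crawler architecture can be sketched as a chain of small functions, one per stage. This is a local stand-in only: the page contents and function names are hypothetical, SQLite plays the role of the SQL storage layer, nothing here calls App Engine, Dataflow or any other GCP API, and the analysis stage (machine learning, dashboards) is left out.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical crawled pages standing in for the Ingestion layer's input.
PAGES = {
    "https://example.com/a": "<html><body>Big data on the cloud</body></html>",
    "https://example.com/b": "<html><body>Cloud data pipelines</body></html>",
}


def ingest(pages):
    """Ingestion: yield (url, raw_html) records one at a time."""
    yield from pages.items()


def process(records):
    """Processing: strip HTML tags, lowercase and count words per page."""
    for url, html in records:
        text = re.sub(r"<[^>]+>", " ", html)
        yield url, Counter(re.findall(r"[a-z]+", text.lower()))


def store(conn, processed):
    """Storage: persist per-page word counts in a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS word_counts (url TEXT, word TEXT, n INTEGER)")
    for url, counts in processed:
        conn.executemany(
            "INSERT INTO word_counts VALUES (?, ?, ?)",
            [(url, word, n) for word, n in counts.items()],
        )
    conn.commit()


def explore(conn, limit=3):
    """Exploration: most frequent words across all crawled pages."""
    return conn.execute(
        "SELECT word, SUM(n) AS total FROM word_counts "
        "GROUP BY word ORDER BY total DESC, word LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
store(conn, process(ingest(PAGES)))
print(explore(conn))  # [('cloud', 2), ('data', 2), ('big', 1)]
conn.close()
```

Because each stage only consumes the previous stage's output, any one of them can be swapped for a managed service without rewriting the others, which is the main appeal of this layered architecture.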
STEPS TO SUCCESS
Source: Adapted from IBM (2014)
● Identify high-value opportunities
● Establish the right architecture and funding model
● Prove value to the business through pilot programs
● Scale by expanding to additional use cases
● Transform to a data-driven culture
Thank You!

Introduction to Google Cloud Platform for Big Data - Trusted Conf
