The document provides an overview of Google Cloud Platform (GCP) and its compute and storage capabilities. It discusses how GCP offers on-demand CPUs, custom machine types, and automatic discounts for simplicity and agility. It also demonstrates creating a Compute Engine instance and SSHing into it. Additionally, it explains how Cloud Storage can be used for persistent storage and staging data for other GCP products and services.
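The "automatic discounts" mentioned above are Compute Engine's sustained use discounts. The sketch below assumes the tiered rates Google documented for N1 machine types: each successive quarter of the month is billed at 100%, 80%, 60% and 40% of the base rate, so a full month works out to a 30% discount. Other machine families use different tiers, so check current pricing before relying on these numbers. The helper name and the 730-hour month are illustrative assumptions.

```python
# Minimal sketch of Compute Engine sustained use discounts.
# ASSUMPTION: N1-style tiers, where each quarter of the month is billed at a
# lower fraction of the base hourly rate. Verify against current GCP pricing.
TIER_MULTIPLIERS = [1.0, 0.8, 0.6, 0.4]


def sustained_use_cost(hours_used: float, hourly_rate: float,
                       hours_in_month: float = 730.0) -> float:
    """Return the discounted monthly cost for one instance (hypothetical helper)."""
    if not 0 <= hours_used <= hours_in_month:
        raise ValueError("hours_used must be between 0 and hours_in_month")
    tier_size = hours_in_month / len(TIER_MULTIPLIERS)
    remaining = hours_used
    cost = 0.0
    for multiplier in TIER_MULTIPLIERS:
        hours_in_tier = min(remaining, tier_size)  # hours billed at this tier
        cost += hours_in_tier * hourly_rate * multiplier
        remaining -= hours_in_tier
        if remaining <= 0:
            break
    return cost


if __name__ == "__main__":
    list_price = 730 * 1.0
    full = sustained_use_cost(730, hourly_rate=1.0)
    print(f"Full month: ${full:.2f} vs ${list_price:.2f} list ({1 - full / list_price:.0%} off)")
```

Because the discount is applied per tier, an instance that runs only part of the month gets a smaller average discount than one that runs all month.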
Margriet Groenendijk - Open data is available from an incredible number of data sources that can be linked to your own datasets. This talk will present examples of how to visualise and combine data from very different sources such as weather and climate, and statistics collected by individual countries using Python notebooks in Analytics for Apache Spark.
Google Cloud Platform & rockPlace Big Data Event - Mar. 31, 2016 (Chris Jang)
This document discusses Google Cloud Platform and its data and analytics capabilities. It begins by explaining the evolution of cloud computing models from virtualized data centers to true on-demand cloud services. It then highlights some of Google Cloud Platform's key differentiators like true cloud economics, future-proof infrastructure, access to innovation, and Google-grade security. The document provides overviews of Google Cloud Platform's storage, database, big data, and machine learning offerings and common use cases for each. It also showcases some of Google's innovations in data analytics and machine learning technologies.
GCP Gaming 2016 Seoul, Korea: Gaming Analytics (Chris Jang)
The document discusses creating a gaming analytics platform using Google Cloud Platform. It describes collecting diverse data from sources like user acquisition campaigns, app stores, and custom game events. This data can then be analyzed using standard metrics, key game indicators, and custom questions. BigQuery is recommended for batch processing while Dataflow (Apache Beam) enables real-time streaming analytics. Dataflow provides autoscaling, fully managed processing, and allows batch and streaming in one framework. This speeds up development time compared to typical big data architectures.
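To make "batch and streaming in one framework" concrete, here is a plain-Python sketch (not the Apache Beam API) in which one fixed-window aggregation serves both modes: the streaming function emits each window's counts as soon as a later window begins, and the batch function simply sorts a bounded list and reuses it. The event schema and the 60-second window are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# A game event: (timestamp in seconds, event name). Hypothetical schema.
Event = Tuple[int, str]

WINDOW_SECONDS = 60  # assumed fixed window size


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Map a timestamp to the start of the fixed window that contains it."""
    return ts - ts % size


def stream_window_counts(events: Iterable[Event]) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Emit each window's per-event counts as soon as a later window begins.

    ASSUMPTION: events arrive in timestamp order (no late data); real
    streaming engines handle out-of-order data with watermarks instead.
    """
    current: Optional[int] = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, name in events:
        start = window_start(ts)
        if current is not None and start != current:
            yield current, dict(counts)  # window closed: emit its result
            counts = defaultdict(int)
        current = start
        counts[name] += 1
    if current is not None:
        yield current, dict(counts)  # flush the last open window


def batch_window_counts(events: Iterable[Event]) -> Dict[int, Dict[str, int]]:
    """Batch mode: sort the bounded input, then reuse the streaming logic."""
    return dict(stream_window_counts(sorted(events)))
```

Feeding `batch_window_counts` a stored list and `stream_window_counts` a live iterator yields the same per-window totals, which is the property a unified batch/streaming model is after.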
GeoPython - Mapping Data in Jupyter Notebooks with PixieDust (Margriet Groenendijk)
This document discusses using PixieDust to create data visualizations and maps in Jupyter notebooks. PixieDust allows users to easily create visualizations with a simple display() function call and integrate data from notebooks with cloud services. Features highlighted include the package manager, various visualization options, integration with Scala, custom visualization creation, and embedding apps in notebooks. The document concludes with references for further information.
Containerizing the Cloud with Kubernetes and Docker (James Chittenden)
See how containers and Google Cloud Platform make it easier to build, run and maintain distributed systems, by building on the same core container technologies that power all of Google. Get a tour of Kubernetes, the new open source container cluster management implementation that turns these concepts into reality. Come learn how containers and Google Cloud Platform make the technology and application architectures that power Google available to all developers across the world.
This document summarizes Google Cloud Platform (GCP). It discusses how GCP delivers the true economic benefits of the cloud through Google's future-oriented architecture and direct access to Google's software innovations. It highlights GCP's openness, which gives customers freedom of choice. The document then gives an overview of GCP's infrastructure, data services, application services and runtime services, which enable no-touch operations and breakthrough insights. It concludes by thanking the audience.
The document discusses Big Data challenges at Dyno: a multi-terabyte data warehouse that receives over 100 GB of new raw data daily from 65 online and an unlimited number of offline data sources, daily data quality problems, and the need to derive users' interests and intentions from user information, behavior and other data while keeping the system high-performing and cost-effective. It also advertises job openings at Dyno for frontend and backend developers.
Learn how recent innovation at Google allows you to produce intelligence from IoT data. We will look at some use cases and you will get an overview of the building blocks we use to build truly intelligent IoT solutions in the cloud and on the edge.
Google provides a suite of data and analytics services and tools for businesses to manage large datasets and gain insights from data. These include BigQuery for analytics, Cloud Dataproc for batch processing, Cloud Dataflow for streaming data, Cloud Pub/Sub for event delivery, and Cloud Bigtable for large-scale NoSQL databases. Google built these services on its own experience managing big data for more than 15 years.
Google Cloud Platform itself has been on a very rapid rise over the past few years, and it offers several advantages over AWS and Microsoft Azure. In this slideshow, you can learn more about these top advantages. For more details, you can also read this post https://kinsta.com/blog/google-cloud-hosting/
Google Cloud Networking provides a global, flexible, and secure networking foundation for applications and data. Key elements include:
- A global fiber network with over 100 points of presence and hundreds of thousands of miles of cable connecting Google's regions and zones.
- The Andromeda network virtualization stack, which powers VPC networking and provides scalable isolation, high performance, and distributed firewall capabilities.
- Global and regional load balancing options like HTTP(S) and TCP/UDP load balancing for optimizing application delivery worldwide.
- Hybrid connectivity options like Cloud Interconnect, VPN, and Direct Peering to build hybrid cloud architectures connecting on-premises to Google Cloud.
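Global load balancing can be pictured as sending each request to the closest backend region that is healthy and has spare capacity, and spilling over to the next-closest region otherwise. The sketch below is a simplified model of that idea, not Google's actual load-balancing algorithm; the regions, latencies and capacities are made-up examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Backend:
    region: str          # example GCP region name
    healthy: bool        # result of the most recent health check
    capacity: int        # max concurrent requests (hypothetical unit)
    in_flight: int = 0   # requests currently being served


def pick_backend(backends: List[Backend],
                 latency_ms: Dict[str, float]) -> Optional[Backend]:
    """Return the lowest-latency backend that is healthy and has spare capacity."""
    candidates = [b for b in backends if b.healthy and b.in_flight < b.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda b: latency_ms.get(b.region, float("inf")))


def route(backends: List[Backend], latency_ms: Dict[str, float]) -> Optional[str]:
    """Assign one request and return the region it was sent to (None if none fit)."""
    chosen = pick_backend(backends, latency_ms)
    if chosen is None:
        return None
    chosen.in_flight += 1
    return chosen.region
```

For a client whose nearest region is full or failing its health checks, requests overflow to the next-best region instead of failing, which is the behavior the list above attributes to global load balancing.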
This document discusses running Node.js applications on Google Compute Engine. It provides an overview of Compute Engine and how to set it up, install Node.js, and create a sample Node.js application. The document also mentions other Google Cloud Platform services like Cloud Storage, Cloud SQL, and App Engine.
Google's Infrastructure and Specific IoT Services (Intel® Software)
This document discusses Google Cloud Platform's Internet of Things (IoT) solutions. It describes IoT Core, which handles device management and communication, including the Device Manager for registering devices and MQTT Broker for bidirectional messaging. It explains how IoT Core collects analog sensor data from devices and transforms it into useful business insights and intelligence through data processing and analytics services like Cloud Dataflow, BigQuery, and Cloud ML.
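The sketch below shows how a device might package an analog sensor reading for IoT Core's MQTT bridge. It assumes the client-ID and telemetry-topic formats documented for Cloud IoT Core (telemetry is published to `/devices/{device-id}/events`); the project, registry and device names, the 10-bit ADC and the 3.3 V reference are hypothetical. It only builds the strings and payload and leaves out the TLS connection and JWT authentication. Google has since retired Cloud IoT Core (August 2023), so read this as an illustration of the pattern rather than a current API.

```python
import json
import time


def mqtt_client_id(project: str, region: str, registry: str, device: str) -> str:
    """Client ID format used by Cloud IoT Core's MQTT bridge."""
    return (f"projects/{project}/locations/{region}"
            f"/registries/{registry}/devices/{device}")


def telemetry_topic(device: str) -> str:
    """Topic a device publishes telemetry events to."""
    return f"/devices/{device}/events"


def adc_to_volts(raw: int, bits: int = 10, vref: float = 3.3) -> float:
    """Convert a raw ADC count to volts (assumed 10-bit ADC, 3.3 V reference)."""
    max_count = (1 << bits) - 1
    if not 0 <= raw <= max_count:
        raise ValueError("raw reading out of range")
    return raw * vref / max_count


def telemetry_payload(device: str, raw: int) -> bytes:
    """JSON payload carrying the converted reading and a Unix timestamp."""
    return json.dumps({
        "device": device,
        "volts": round(adc_to_volts(raw), 3),
        "ts": int(time.time()),
    }).encode("utf-8")
```

Converting the raw analog count to physical units on the device keeps downstream consumers (Dataflow, BigQuery) free of hardware-specific scaling.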
"Critical Breakthroughs and technical Challenges in Big Data Driven Innovation" discusses four key breakthroughs in Google Cloud Platform's approach to big data:
1. Batch and streaming data processing can be combined using Cloud Dataflow.
2. Real-time data ingestion at massive scale is enabled through technologies like Cloud Bigtable, which can process billions of events per hour.
3. Analytics can be done at the speed of thought through BigQuery, which allows complex queries on petabytes of data to return results in seconds.
4. Machine learning is made available to everyone through services that offer pre-trained models via APIs and allow custom modeling using TensorFlow on Google Cloud.
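To put "billions of events per hour" in perspective, a quick back-of-the-envelope conversion helps. The per-node figure depends entirely on an assumed cluster size (a hypothetical 30 nodes) and is not a published Bigtable benchmark.

```python
def events_per_second(events_per_hour: float) -> float:
    """Convert an hourly event rate to a per-second rate."""
    return events_per_hour / 3600


def per_node_rate(events_per_hour: float, nodes: int) -> float:
    """Per-second rate each node must sustain if load is spread evenly (assumption)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return events_per_second(events_per_hour) / nodes


if __name__ == "__main__":
    print(f"1 billion events/hour ~ {events_per_second(1e9):,.0f} events/second")
    print(f"~ {per_node_rate(1e9, 30):,.0f} events/second per node on 30 nodes")
```

One billion events per hour is roughly 278,000 events per second, which is why ingestion at this scale is usually designed around horizontal scaling rather than a single large machine.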
Solving enterprise challenges through scale out storage & big compute final (Avere Systems)
Google Cloud Platform, Avere Systems, and Cycle Computing experts will share best practices for advancing solutions to big challenges faced by enterprises with growing compute and storage needs. In this “best practices” webinar, you’ll hear how these companies are working to improve results that drive businesses forward through scalability, performance, and ease of management.
The slides are from a webinar presented on January 24, 2017. The audience learned:
- How enterprises are using Google Cloud Platform to gain compute and storage capacity on demand
- Best practices for efficient use of cloud compute and storage resources
- How to overcome the need for file systems within a hybrid cloud environment
- How to eliminate latency between cloud and data center architectures
- How best to manage simulation, analytics, and big data workloads in dynamic environments
- Which market dynamics will draw companies to new storage models over the next several years
The presenters laid out a foundation for building infrastructure that supports ongoing demand growth.
Battling the disrupting Energy Markets utilizing PURE PLAY Cloud Computing (Edwin Poot)
Disruption can be intimidating. You may even be losing business to one or more rising competitors and wondering how you could possibly compete. Rest assured, this disruption doesn’t mean you need to turn your business upside down. Instead, be smart about how you apply innovation to your business, without huge changes, high risks or large investments.
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise (MongoDB)
The document summarizes the evolution of cloud computing and Google Cloud Platform's offerings. It discusses how cloud infrastructure has moved from colocated data centers (1st wave) to virtualized infrastructure (2nd wave) to automated services and scalable data (3rd wave). It then provides an overview of Google Cloud Platform's compute, storage, database, analytics and machine learning services and how they make complex data analysis simpler. The document positions Google Cloud Platform as building on Google's expertise in infrastructure and data to provide customers an advantage.
This document provides an overview of Google Cloud Platform services for IoT and big data analytics, including fully managed ingestion, processing, and analysis of IoT data. It introduces Google Cloud Pub/Sub for messaging, Cloud Dataflow for stream and batch data processing, and BigQuery for petabyte-scale data warehousing and analysis. The presentation includes demos of building an event streaming pipeline using these services to ingest data from Pub/Sub, process it in Dataflow, and analyze results in BigQuery.
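The ingest, process and analyze flow in that demo can be modeled in a few lines. This plain-Python sketch (not the Pub/Sub, Dataflow or BigQuery client APIs) assumes at-least-once delivery, which is Pub/Sub's default guarantee, so the processing step deduplicates by message ID before appending rows to an in-memory list that stands in for a BigQuery table. The message fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Message:
    message_id: str       # unique ID assigned when the message is published
    device: str
    temperature_c: float


def process(messages: List[Message], seen: Set[str]) -> List[Dict[str, object]]:
    """Drop redelivered duplicates and convert each message into a table row."""
    rows: List[Dict[str, object]] = []
    for msg in messages:
        if msg.message_id in seen:
            continue  # duplicate caused by at-least-once delivery
        seen.add(msg.message_id)
        rows.append({
            "device": msg.device,
            "temperature_f": msg.temperature_c * 9 / 5 + 32,
        })
    return rows


def average_by_device(table: List[Dict[str, object]]) -> Dict[str, float]:
    """Analysis step, akin to SELECT device, AVG(temperature_f) ... GROUP BY device."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for row in table:
        device = str(row["device"])
        totals[device] = totals.get(device, 0.0) + float(row["temperature_f"])
        counts[device] = counts.get(device, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}
```

Keeping the `seen` set outside `process` mirrors how a streaming job must remember IDs across batches of deliveries; without it, a redelivered message would skew the averages.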
Google Cloud Dataflow is a next-generation managed big data service based on the Apache Beam programming model. It provides a unified model for batch and streaming data processing, with an optimized execution engine that automatically scales based on workload. Customers report being able to build complex data pipelines more quickly using Cloud Dataflow compared to other technologies like Spark, and with improved performance and reduced operational overhead.
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ... (Alluxio, Inc.)
Google Dataproc is Google Cloud's fully managed Apache Spark and Apache Hadoop service. Alluxio is an open source data orchestration platform that can be used with Dataproc to accelerate analytics workloads. With a single initialization action, Alluxio can be installed on a Dataproc cluster to cache data from Cloud Storage for faster queries. Alluxio also enables "zero-copy bursting" of workloads to the cloud by allowing frameworks to access data directly from remote HDFS without needing to copy it. This provides elastic compute capacity while avoiding high network latency and bandwidth costs of copying large datasets.
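The caching idea behind that setup is a read-through cache: the first read of an object comes from the slower remote store (Cloud Storage or remote HDFS), and repeated reads are served locally. The sketch below is a generic least-recently-used (LRU) read-through cache, not Alluxio's API or eviction policy; `fetch_remote` and the capacity are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """LRU read-through cache in front of a slow remote store (illustrative only)."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int = 2):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._fetch_remote = fetch_remote
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._entries:
            self.hits += 1
            self._entries.move_to_end(path)  # mark as most recently used
            return self._entries[path]
        self.misses += 1
        data = self._fetch_remote(path)  # slow path: go to remote storage
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data
```

Only cache misses pay the network cost, so workloads that scan the same hot datasets repeatedly benefit most from this kind of layer.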
Google Cloud Platform (GCP) is a cloud computing service from Google that offers infrastructure as a service, platform as a service and software as a service. It provides over 50 services, including computing, storage, databases, networking, machine learning and developer tools. GCP aims to provide reliable, scalable services through its global network and data centers, serving customers in over 200 countries and territories. It offers competitive pricing compared with other cloud providers such as AWS and Azure, with pay-as-you-go pricing and discounts for long-term commitments. Google Cloud revenue grew 52% in Q1 2020, amid increased demand during the COVID-19 pandemic.
Andrea Ulisse - How to build a scalable serverless IoT architecture on GCP - ... (Codemotion)
Andrea will show how to build a performant, reliable and scalable serverless IoT architecture powered by Google Cloud Platform (GCP) managed services: the same tools Google uses for its production services, now available to developers and companies. Such an architecture lets you focus on delivering business value rather than managing the underlying infrastructure. Google Cloud has believed in the vision of serverless since launching App Engine in 2008, and has since expanded its serverless offerings across application development, data analytics and machine learning.
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop (huguk)
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potentially unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, I love data, amateur radio, Disneyland, photography, running and Legos."
This webinar discusses understanding the real costs of public cloud infrastructure. It features presentations by Dave McKenzie of Dimension Data and Charlie Burns of Saugatuck Technology. The webinar will cover topics like important cost elements, the impact of infrastructure granularity, and Dimension Data's cloud offerings. Attendees will learn how to accurately assess total costs and get a $200 credit from Dimension Data.
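One way to read "the impact of infrastructure granularity" is billing granularity: a job that runs slightly over an hour costs noticeably more when billed by the full hour than when billed by the minute or second. The rate and increments below are hypothetical, and the sketch ignores any minimum charge a provider may apply.

```python
import math


def billed_cost(runtime_seconds: int, hourly_rate: float, increment_seconds: int) -> float:
    """Round runtime up to the billing increment, then price it at the hourly rate."""
    if increment_seconds <= 0:
        raise ValueError("increment_seconds must be positive")
    billed_seconds = math.ceil(runtime_seconds / increment_seconds) * increment_seconds
    return billed_seconds / 3600 * hourly_rate


if __name__ == "__main__":
    runtime = 61 * 60  # a job that runs for 61 minutes
    for label, increment in [("per hour", 3600), ("per minute", 60), ("per second", 1)]:
        print(f"{label:>10}: ${billed_cost(runtime, 1.0, increment):.4f}")
```

Under hourly billing the 61-minute job is charged as two full hours, nearly double its per-minute cost, which is the kind of hidden cost element an accurate total-cost assessment has to capture.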
"How overlay networks can make public clouds your global WAN" by Ryan Koop o... (Cohesive Networks)
The presentation "How overlay networks can make public clouds your global WAN" was given by Ryan Koop on Oct 24, 2013, at LASCON in Austin, TX.
Enterprises, organizations and governments are realizing the benefits of cloud flexibility, cost savings, scalability and connectivity. Yet the traditional approach focuses too much on the underlying infrastructure, instead of the applications.
So who is making solutions for the people who work at the application layer? Are software-defined things secure?
With a focus on application-layer integration, governance and security, overlay networks let developers, and the enterprise apps they work with, use the public clouds as a global WAN, not just extra storage.
Developers can build on top of overlay networking to extend traditional networks to the cloud with added security such as encryption, IPsec connections, VLANs and VPNs into the public cloud networks.
For example, projects that were previously cost-prohibitive can now use public clouds as global points of presence to create a cloud WAN to partners and customers.
The document discusses building machine learning solutions with Google Cloud. It describes Nexxworks as a team of data engineers, data scientists, and machine learning engineers who help close the gap between having lots of data and lacking insights by building robust and agile machine learning solutions through Google Cloud's scalable APIs. The document provides examples of use cases like predictive maintenance, logistics optimization, customer service chatbots, and medical image classification. It also discusses techniques like deep learning, word embeddings, convolutional neural networks, and reinforcement learning.
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta... (DataStax)
During this session Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE), which is based on Kubernetes and Docker.
About the Speakers
Ben Lackey Partner Architect, DataStax
I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal.
Ravi Madasu
Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on Google Cloud Platform with a developer-friendly deployment experience. He has 15+ years of experience in a variety of roles, including software engineer, project manager and product manager. Ravi received a master's degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.
Why You Need to Move Your Website to the Cloud (Ektron)
Ektron's Jonathan Wall, Director, Product Marketing, and Ben Schilens, Senior Vice President of Operations, discuss:
- Cloud trends
- The benefits of the Cloud
- Different Clouds and how to choose
- A Cloud story: What's going on today
- How the Cloud reduces TCO
- Who uses the Cloud for their Website
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Similar to Material de treinamento do Google Cloud 2018
"How overlay networks can make public clouds your global WAN" by Ryan Koop o...Cohesive Networks
The presentation "How overlay networks can make public clouds your global WAN" presented by Ryan Koop on Oct 24, 2013 at LASCON in Austin, TX.
Enterprises, organizations and governments are realizing the benefits of cloud flexibility, cost savings, scalability and connectivity. Yet the traditional approach focuses too much on the underlying infrastructure, instead of the applications.
So who is making solutions for the people who work at the application layer? Are software-defined things secure?
With a focus on application-layer integration, governance and security, overlay networks let developers, and the enterprise apps they work with, use the public clouds as a global WAN network, not just extra storage.
Developers can build on top of overlay networking to extend traditional networks to the cloud with added security such as encryption, IPsec connections, VLANs and VPNs into the public cloud networks.
Prime examples are the previously cost-prohibitive projects can now use public clouds as global points of presence to create cloud WAN to partners and customers.
The document discusses building machine learning solutions with Google Cloud. It describes Nexxworks as a team of data engineers, data scientists, and machine learning engineers who help close the gap between having lots of data and lacking insights by building robust and agile machine learning solutions through Google Cloud's scalable APIs. The document provides examples of use cases like predictive maintenance, logistics optimization, customer service chatbots, and medical image classification. It also discusses techniques like deep learning, word embeddings, convolutional neural networks, and reinforcement learning.
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...DataStax
During this session Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE) which is based on Kubernetes and Docker.
About the Speakers
Ben Lackey Partner Architect, DataStax
I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal.
Ravi Madasu
Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on the Google Cloud Platform providing a developer friendly deployment experience. He has 15+ years of experience, working in variety of roles such as software engineer, project manager and product manager. Ravi received a Masters degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.
Why You Need to Move Your Website to the CloudEktron
Ektron's Jonathan Wall, Director, Product Marketing and Ben Schilens, Senior Vice President of Operations discuss
- Cloud trends
- The benefits of the Cloud
- Different Clouds and how to choose
- A Cloud story: What's going on today
- How the Cloud reduces TCO
- Who uses the Cloud for their Website
Similar to Material de treinamento do Google Cloud 2018 (20)
Big Data & Machine Learning
Cloud OnBoard
Introducing Google Cloud Platform: Big Data and Machine Learning
Version #1.1
Agenda
1. What is Google Cloud Platform
2. Google Cloud Big Data products
Cloud computing is a continuation of a long-term shift in how computing resources are managed:
1980s, First Wave: Servers on-premises. You own everything. It is yours to manage.
2000, Second Wave: Data centers. You pay for the hardware but rent the space. Still yours to manage.
Now, First Generation Cloud: Virtualized data centers. You don’t rent hardware and space, but still control and configure virtual machines. Pay for what you provision.
Next, Third Wave: Managed services. Completely elastic storage, processing, and machine learning so that you can invest your energy in great apps. Pay for what you use.
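The difference between "pay for what you provision" and "pay for what you use" can be made concrete with a toy calculation. This is a sketch under stated assumptions: the hourly demand profile and the unit price below are invented for illustration (they are not GCP prices), and the provisioned model assumes capacity is sized for the peak hour and never resized.

```python
"""Toy comparison: pay for what you provision vs. pay for what you use."""

# Hypothetical demand: machine-hours needed in each hour of one day.
HOURLY_DEMAND = [2, 2, 2, 2, 3, 5, 8, 10, 10, 9, 8, 6,
                 6, 7, 9, 10, 10, 8, 6, 4, 3, 2, 2, 2]
# Hypothetical unit price; not a real GCP rate.
PRICE_PER_MACHINE_HOUR = 0.05


def provisioned_cost(demand, price):
    """Capacity is sized for the peak hour and billed for every hour, used or not."""
    return max(demand) * len(demand) * price


def usage_cost(demand, price):
    """Only the machine-hours actually consumed are billed."""
    return sum(demand) * price


if __name__ == "__main__":
    print(f"provisioned: ${provisioned_cost(HOURLY_DEMAND, PRICE_PER_MACHINE_HOUR):.2f}")
    print(f"usage-based: ${usage_cost(HOURLY_DEMAND, PRICE_PER_MACHINE_HOUR):.2f}")
```

With these made-up numbers, the provisioned model bills 240 machine-hours while only 136 are actually used; the gap is the idle capacity kept around for the peak.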
Google’s mission is to organize the world’s information and make it universally accessible and useful.
To organize the world’s information, Google has been building the most powerful infrastructure on the planet.
Network
In terms of hardware, Google Cloud has the largest cloud network, with over 100 points of presence and hundreds of thousands of miles of fiber-optic cable.
Edge points of presence: >100
Edge node locations: >1000
Network sea cable investments:
Unity (US, JP), 2010
SJC (JP, HK, SG), 2013
FASTER (US, JP, TW), 2016
Monet (US, BR), 2017
Tannat (BR, UY, AR), 2017
Junior (Rio, Santos), 2017
PLCN (HK, LA), 2019
Indigo (SG, ID, AU), 2019
Map: Google Cloud current and future regions, with the number of zones in each. The network connects 15 regions, with 3 more coming. Regions shown: S Carolina, N Virginia, Oregon, Iowa, Montreal, Los Angeles, São Paulo, Frankfurt, Belgium, London, Finland, Netherlands, Mumbai, Singapore, Taiwan, Hong Kong, Tokyo, Sydney.
In terms of software, organizing the world’s information has meant that Google needed to invent data processing methods.
Timeline (2002 to 2018): GFS, MapReduce, Bigtable, Dremel, Colossus, Flume, Megastore, Spanner, MillWheel, Pub/Sub, F1, TensorFlow, TPU
Research papers: http://research.google.com/pubs/papers.html
Google Cloud opens up that innovation and infrastructure to you.
Products shown on the timeline (2002 to 2018): Cloud Storage, Datastore, BigQuery, Dataflow, Pub/Sub, Bigtable, Dataproc, ML Engine, Cloud Spanner, AutoML
A suite of products that can be put together for data processing:
Foundation: Compute Engine, Cloud Storage
Databases: Cloud SQL, Cloud Bigtable, Cloud Spanner
Data-handling frameworks: Cloud Dataproc, Cloud Dataflow, Cloud Pub/Sub
Analytics and ML: BigQuery, Cloud Datalab, ML APIs, ...
Spotify illustrates the typical journey of companies that come to Google Cloud: from lower costs, to increased reliability, to business transformation.
Attributes highlighted along the journey: spend less (no-ops, pay for use, secure); flexible; complete; innovative; powerful.
A suite of products that can be put together for data processing:
1. Change where you compute
2. Improve scalability and reliability
3. Change how you compute
Change where you compute: Atomic Fiction lowered their costs with per-minute (now per-second) billing.
https://www.youtube.com/watch?v=mBY-RjE15WA
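The saving from finer billing granularity is easy to quantify. This minimal sketch assumes per-minute billing rounds every run up to the next whole minute, and that per-second billing applies a one-minute minimum charge (Compute Engine's policy as I understand it; check the current pricing page). The job runtimes are hypothetical, not Atomic Fiction's data.

```python
import math


def billed_seconds_per_minute(runtime_s: int) -> int:
    """Per-minute billing: each run is rounded up to a whole number of minutes."""
    return math.ceil(runtime_s / 60) * 60


def billed_seconds_per_second(runtime_s: int, minimum_s: int = 60) -> int:
    """Per-second billing with a minimum charge (assumed to be 60 s)."""
    return max(runtime_s, minimum_s)


# Hypothetical job runtimes in seconds, e.g. many short batch or render tasks.
JOBS = [61, 90, 119, 30, 3600]

per_minute_total = sum(billed_seconds_per_minute(j) for j in JOBS)
per_second_total = sum(billed_seconds_per_second(j) for j in JOBS)
print(f"per-minute: {per_minute_total} s billed, per-second: {per_second_total} s billed")
```

Here the same five runs are billed 4020 s versus 3930 s; the gap grows with the number of short jobs whose runtimes land just past a minute boundary.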
FIS was able to improve reliability and scalability on a massive data-processing challenge:
6 billion market events written per hour (1.7 GB per second, 6 TB per hour)
Bursts of 10 billion events written per hour (10 TB per hour)
The Consolidated Audit Trail (CAT) is a data repository of all equities and options orders, quotes, and events; FIS processed the CAT to organize 100 billion market events into an “order lifecycle” in a 4-hour window using Cloud Bigtable.
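A quick back-of-the-envelope check shows how the quoted FIS figures relate to each other. Assumptions: decimal units (1 GB = 10^9 bytes), and the 1.7 GB/s and 6 billion events/hour figures describe the same sustained workload. The roughly 1 KB average event size is derived here; it is not stated on the slide.

```python
# Back-of-the-envelope check of the FIS figures (decimal units assumed).
EVENTS_PER_HOUR = 6e9       # 6 billion market events written per hour
BYTES_PER_SECOND = 1.7e9    # 1.7 GB per second
SECONDS_PER_HOUR = 3600

events_per_second = EVENTS_PER_HOUR / SECONDS_PER_HOUR            # about 1.67 million/s
terabytes_per_hour = BYTES_PER_SECOND * SECONDS_PER_HOUR / 1e12   # about 6.1 TB/h
avg_event_bytes = BYTES_PER_SECOND / events_per_second            # about 1 KB per event

# 100 billion events organized within a 4-hour window.
window_rate = 100e9 / (4 * SECONDS_PER_HOUR)                      # about 6.9 million events/s

print(f"{events_per_second:,.0f} events/s, {terabytes_per_hour:.2f} TB/h, "
      f"{avg_event_bytes:,.0f} B/event, {window_rate:,.0f} events/s in the 4-hour window")
```

The 1.7 GB/s and 6 TB/hour figures are consistent with each other, and processing the full CAT in four hours implies a rate several times higher than the sustained write rate.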
Rooms to Go transformed its business with data and machine learning:
Collect data: customer demographics and past purchases from its CRM (Customer Relationship Management) system, plus landing pages and views from Google Analytics Premium.
Combine and analyze the data in BigQuery.
Result: completely designed room packages.
https://www.thinkwithgoogle.com/case-studies/rooms-to-go-improves-the-shopper-experience.html
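The "combine data" step amounts to joining CRM records with web-analytics page views on a shared customer key. This is a minimal in-memory sketch: the record layout and the `customer_id` key are hypothetical, and in the case study the combination was done in BigQuery, where the equivalent would be a SQL JOIN.

```python
from collections import defaultdict

# Hypothetical CRM extract: customer demographics and past purchases.
crm = [
    {"customer_id": 1, "age_band": "25-34", "past_purchases": ["sofa"]},
    {"customer_id": 2, "age_band": "35-44", "past_purchases": ["bed", "dresser"]},
]
# Hypothetical web-analytics extract: landing pages viewed per customer.
web_views = [
    {"customer_id": 1, "page": "/living-room/modern"},
    {"customer_id": 1, "page": "/living-room/rugs"},
    {"customer_id": 2, "page": "/bedroom/classic"},
    {"customer_id": 3, "page": "/dining"},  # no CRM record, so the inner join drops it
]


def combine(crm_rows, view_rows):
    """Inner-join CRM rows with page views on customer_id."""
    pages = defaultdict(list)
    for view in view_rows:
        pages[view["customer_id"]].append(view["page"])
    return [
        {**customer, "pages_viewed": pages[customer["customer_id"]]}
        for customer in crm_rows
        if customer["customer_id"] in pages
    ]


for row in combine(crm, web_views):
    print(row)
```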
In summary, Google Cloud offers you ways to:
- Become a truly data-driven company
- Apply machine learning broadly and easily
- Spend less on ops and administration
- Incorporate real-time data into apps and architectures
Module Review
Google Cloud Platform is: (select all of the correct options)
❏ Operated by Google on the same infrastructure that Google itself uses
❏ A set of modular services from which you can compose cloud-based applications
❏ Most cost-effective if you pre-purchase instances on a yearly basis
❏ A platform on which to host scalable and fast distributed applications
Resources
Google Cloud Platform https://cloud.google.com/
Datacenters https://www.google.com/about/datacenters/
Google IT security https://cloud.google.com/files/Google-CommonSecurity-WhitePaper-v1.4.pdf
Why Google Cloud Platform? https://cloud.google.com/why-google/
Pricing Philosophy https://cloud.google.com/pricing/philosophy/
Compute & Storage Fundamentals
Version #1.1
Agenda
1. CPUs on demand + Demo
2. A global filesystem + Demo
Google Cloud provides an earth-scale computer: data storage, compute power, and networking.
Demo: Create a Compute Engine Instance
In this demo, we will:
1. Create a Compute Engine instance
2. SSH into the instance
3. Install the software package git (for source code version control)
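The demo steps can also be scripted with the Cloud SDK. This sketch only assembles the `gcloud` invocations as argument lists and prints them; actually running them (with `dry_run=False`) requires an installed, authenticated Cloud SDK. The instance name, zone, and machine type are hypothetical choices, and the `apt-get` step assumes a Debian-based image (the default for new instances).

```python
import shlex
import subprocess

# Hypothetical values; choose your own instance name, zone, and machine type.
INSTANCE = "demo-vm"
ZONE = "us-central1-a"
MACHINE_TYPE = "n1-standard-1"


def demo_commands(instance=INSTANCE, zone=ZONE, machine_type=MACHINE_TYPE):
    """Return the gcloud invocations for the demo steps as argument lists."""
    return [
        # Step 1: create a Compute Engine instance.
        ["gcloud", "compute", "instances", "create", instance,
         f"--zone={zone}", f"--machine-type={machine_type}"],
        # Steps 2 and 3: SSH into the instance and install git (Debian-based image assumed).
        ["gcloud", "compute", "ssh", instance, f"--zone={zone}",
         "--command=sudo apt-get update && sudo apt-get install -y git"],
    ]


def run(dry_run=True):
    for cmd in demo_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs the Cloud SDK and an active project


if __name__ == "__main__":
    run(dry_run=True)
```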
Use Cloud Storage for persistent storage and as a staging ground for import to other Google Cloud products:
1. Ingest/Extract raw data (any format)
2. Store/Stage it in Cloud Storage
3. Transform
4. Load into Compute Engine + Disk, Cloud SQL, BigQuery, or Dataproc
Create a bucket and copy the data over using the Cloud SDK; blobs are referenced through a gs://.../ URL.
Copy: gsutil cp sales*.csv gs://acme-sales/data/
Diagram: a Google Cloud Platform project contains buckets; each bucket contains objects, and each object holds data and metadata.
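With the Cloud SDK, the two steps would be `gsutil mb gs://acme-sales` followed by the slide's `gsutil cp sales*.csv gs://acme-sales/data/`. The sketch below does not call Cloud Storage. It only illustrates how `gs://` URLs split into bucket and object name, and which object names a folder-style destination (one ending in `/`) would produce. Treat it as an approximation of gsutil's naming, not a replacement for it.

```python
from __future__ import annotations

import glob
import os


def parse_gs_url(url: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    if not url.startswith("gs://"):
        raise ValueError(f"not a Cloud Storage URL: {url!r}")
    bucket, _, obj = url[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, obj


def planned_copies(pattern: str, dest_url: str) -> list[tuple[str, str]]:
    """Approximate the object names `gsutil cp PATTERN gs://bucket/prefix/` would create.

    A destination ending in '/' is treated as a folder, so each matching local
    file keeps its base name under that prefix.
    """
    bucket, prefix = parse_gs_url(dest_url)
    if prefix and not prefix.endswith("/"):
        raise ValueError("this sketch only handles folder-style destinations ending in '/'")
    return [
        (path, f"gs://{bucket}/{prefix}{os.path.basename(path)}")
        for path in sorted(glob.glob(pattern))
    ]


if __name__ == "__main__":
    for src, dst in planned_copies("sales*.csv", "gs://acme-sales/data/"):
        print(f"{src} -> {dst}")
```

For example, a local `sales1.csv` would map to `gs://acme-sales/data/sales1.csv`.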
Cloud Storage gives you durability, reliability, and global reach.
Diagram stages: Ingest, Store (Cloud Storage), Import (Cloud SQL, Compute Engine), Publish.
Transfer Services are useful for ingest.
Use Cloud Storage as a staging area.
Control access at the project, bucket, and/or object level.
Control latency and availability with zones and regions
Example regions and zones: region North America includes zone us-central1-a; region Europe includes zone europe-west1-b; and so on.
Choose the closest zone/region to reduce latency.
Distribute your apps and data across zones to reduce service disruptions.
Distribute your apps and data across regions for global availability.
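Both placement rules can be sketched in a few lines: pick the zone with the lowest measured latency, and spread replicas round-robin across zones. The latency figures are hypothetical. The `<region>-<letter>` zone-naming convention matches the zone names used above.

```python
from __future__ import annotations

from itertools import cycle

# Hypothetical round-trip latencies (ms) measured from one client location.
LATENCY_MS = {
    "us-central1-a": 38.0,
    "us-central1-b": 39.5,
    "europe-west1-b": 112.0,
    "asia-east1-a": 180.0,
}


def closest_zone(latency_ms: dict[str, float]) -> str:
    """Pick the zone with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)


def region_of(zone: str) -> str:
    """Zone names follow '<region>-<letter>', e.g. 'us-central1-a' -> 'us-central1'."""
    return zone.rsplit("-", 1)[0]


def spread_across_zones(replicas: int, zones: list[str]) -> dict[str, int]:
    """Assign replicas round-robin; with two or more zones, losing one zone leaves other copies."""
    counts = {zone: 0 for zone in zones}
    for _, zone in zip(range(replicas), cycle(zones)):
        counts[zone] += 1
    return counts


if __name__ == "__main__":
    best = closest_zone(LATENCY_MS)
    print(best, region_of(best))
    print(spread_across_zones(5, ["us-central1-a", "us-central1-b", "us-central1-c"]))
```

Spreading across zones in one region protects against a zone outage. Protecting against a regional outage would mean applying the same round-robin idea to zones in different regions.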
Demo: Interact with Cloud Storage
In this demo, we carry out the steps of an ingest-transform-and-publish data pipeline manually:
1. Ingest data into a Compute Engine instance
2. Transform data on the Compute Engine instance
3. Store the transformed data on Cloud Storage
4. Publish Cloud Storage data to the web
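The transform and publish steps can be sketched locally. Assumptions: the inlined CSV, its `region`/`amount` columns, the filtering rule, and the `YOUR_BUCKET` name are all hypothetical stand-ins for whatever the demo actually ingests. The comments show gsutil commands that would perform steps 3 and 4 on a bucket that uses fine-grained ACLs.

```python
from __future__ import annotations

import csv
import io

# Step 1 (ingest): in the demo, raw data is fetched onto the Compute Engine instance.
# Here a small hypothetical CSV is inlined instead.
RAW_CSV = """region,amount
north,120.00
south,40.50
north,80.25
east,10.00
"""


def transform(raw_csv: str, min_amount: float) -> str:
    """Step 2 (transform): drop small rows, then total the amounts per region."""
    totals: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        amount = float(row["amount"])
        if amount >= min_amount:
            totals[row["region"]] = totals.get(row["region"], 0.0) + amount
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region in sorted(totals):
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()


def public_url(bucket: str, obj: str) -> str:
    """Step 4 (publish): a publicly readable object is served at this URL."""
    return f"https://storage.googleapis.com/{bucket}/{obj}"


# Steps 3 and 4 on the VM would use the Cloud SDK, for example:
#   gsutil cp summary.csv gs://YOUR_BUCKET/
#   gsutil acl ch -u AllUsers:R gs://YOUR_BUCKET/summary.csv
if __name__ == "__main__":
    print(transform(RAW_CSV, min_amount=20.0))
    print(public_url("YOUR_BUCKET", "summary.csv"))
```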
Ingest-Transform-Publish using core infrastructure
Diagram: Compute Engine, Cloud Storage, and Cloud SQL connected by four steps labeled Ingest/Extract, Store, Import, and Publish.
Cloud Shell gives you an easy command line: it comes pre-installed with the tools, libraries, and so on that you need to interact with Google Cloud Platform.
Do now: click the Cloud Shell icon to open it.
Module Review
Module review (1 of 2)
Compute nodes on GCP are: (select the correct option)
❏ Allocated on demand, and you pay for the time that they are up
❏ Expensive to create and tear down
❏ Pre-installed with all the software packages you might ever need
❏ One of ~50 choices in terms of CPU and memory
Module review answers (1 of 2)
Compute nodes on GCP are: (select the correct option)
➔ Allocated on demand, and you pay for the time that they are up
❏ Expensive to create and tear down
❏ Pre-installed with all the software packages you might ever need
❏ One of ~50 choices in terms of CPU and memory
Module review (2 of 2)
Google Cloud Storage is a good option for storing data that: (select all of the correct options)
❏ Is ingested in real time from sensors and other devices
❏ Will be frequently read/written from a compute node
❏ May be required to be read at some later time
❏ May be imported into a cluster for analysis
Module review answers (2 of 2)
Google Cloud Storage is a good option for storing data that: (select all of the correct options)
❏ Is ingested in real time from sensors and other devices
❏ Will be frequently read/written from a compute node
➔ May be required to be read at some later time
➔ May be imported into a cluster for analysis
Resources
Compute Engine https://cloud.google.com/compute/
Cloud Storage https://cloud.google.com/storage/
Pricing https://cloud.google.com/pricing/
Cloud Launcher https://cloud.google.com/launcher/
Pricing Philosophy https://cloud.google.com/pricing/philosophy/
Data Analysis on the Cloud
Version #1.1
Agenda
1. Stepping stones to transformation
2. Your SQL database in the cloud + Demo
3. Managed Hadoop in the cloud + Demo
Google Cloud Platform began in 2008 with App Engine, a serverless way to run web applications:
1. Develop your code
2. Upload it to App Engine
3. App Engine autoscales it and runs it reliably
http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html
http://googleappengine.blogspot.com/2013/05/the-google-app-engine-blog-is-moving.html
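To illustrate the develop, upload, autoscale loop, here is a minimal web app of the kind App Engine runs. This is a sketch under assumptions: it targets the App Engine standard environment with a Python 3 runtime, and my understanding is that when `app.yaml` names no entrypoint, the platform serves the `app` object from `main.py` (check the current runtime documentation). The runtime version is only an example. Uploading is done with `gcloud app deploy`.

```python
# main.py: a minimal WSGI app for the App Engine standard environment (sketch).
# Assumed app.yaml in the same directory, containing for example:
#   runtime: python312   (or another currently supported Python 3 runtime)
# Upload with: gcloud app deploy
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Plain WSGI callable that echoes the requested path."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Local testing only; on App Engine the platform runs and scales the app.
    with make_server("127.0.0.1", 8080, app) as server:
        server.serve_forever()
```

Nothing in the code deals with servers or scaling. That separation is the serverless idea the slide describes: you supply the code, and the platform handles capacity.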
Stepping stones: Compute Engine, Container Engine, App Engine Flex, App Engine
“There [was] something fundamentally wrong with what we were doing in 2008 … We didn't get the right stepping stones into the cloud …”
-- Eric Schmidt, Executive Chairman, Google
GCP now consists of a suite of products that together provide these stepping stones in a business’s transformative journey:
1. Change where you compute: cost-effective virtual machines, storage, Hadoop, and MySQL to migrate your current workloads to the public cloud.
2. Flexibility, scalability, and reliability: reliable, autoscaling messaging, data processing, and storage.
3. Change how you compute: fully managed products for data warehousing, data analysis, streaming, and machine learning.
“Machine learning. This is the next transformation … the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want.”
-- Eric Schmidt, Executive Chairman, Google
“If you want to teach a neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out.”
-- WIRED
Machine Learning is not new, but it is now mainstream. Examples include Search, “People who bought ...”, spam filtering, suggesting the next video, route planning, and Smart Reply.
What’s common to all of these use cases of Machine Learning?
There are three components in a recommendation system:
1. Rating: users rate a few houses, explicitly or implicitly.
2. Training: a machine learning model is created to predict a user’s rating of a house.
3. Recommending: for each user, the model is applied to every unrated house, and the top 5 houses for that user are saved.
What else is needed?
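The recommending step can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the `predict` function, the user and house IDs, and the toy scores are hypothetical stand-ins for a trained model.

```python
import heapq
from typing import Callable, Dict, List, Set


def recommend_top_k(
    users: List[str],
    houses: List[str],
    rated: Dict[str, Set[str]],
    predict: Callable[[str, str], float],
    k: int = 5,
) -> Dict[str, List[str]]:
    """For each user, score every house they have not rated and keep the k best."""
    recommendations: Dict[str, List[str]] = {}
    for user in users:
        already_rated = rated.get(user, set())
        candidates = [h for h in houses if h not in already_rated]
        # heapq.nlargest returns the k highest-scoring candidates, best first
        recommendations[user] = heapq.nlargest(
            k, candidates, key=lambda h: predict(user, h)
        )
    return recommendations


# Hypothetical model: the score is just the number in the house ID.
houses = ["h1", "h2", "h3", "h4", "h5", "h6", "h7"]
recs = recommend_top_k(["alice"], houses, {"alice": {"h1", "h2"}}, lambda u, h: int(h[1:]))
print(recs)
```

In a real system the saved lists would be written to a database so the front end can serve them without recomputing predictions on every request.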
The ML algorithm essentially clusters users and items:
● Who is like this user? Is this a good house?
● Is this house similar to houses that people similar to this user like?
● Predict the rating: predicted rating = user-preference * item-quality
How often do you need to compute the predicted ratings? Where would you save them?
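One common way to realize "predicted rating = user-preference * item-quality" is matrix factorization: each user and each house gets a short vector of latent factors, and the predicted rating is their dot product. The factor values below are made up for illustration; in practice they are learned from the ratings (for example, with alternating least squares).

```python
import numpy as np

# Hypothetical learned factors with 2 latent dimensions.
# Rows of user_factors are users; rows of house_factors are houses.
user_factors = np.array([
    [1.0, 0.0],   # user 0 weights only the first latent dimension
    [0.0, 2.0],   # user 1 weights the second dimension strongly
])
house_factors = np.array([
    [4.0, 1.0],   # house 0
    [1.0, 2.5],   # house 1
])


def predicted_ratings(users: np.ndarray, houses: np.ndarray) -> np.ndarray:
    """Predicted rating for every (user, house) pair: dot product of factor vectors."""
    return users @ houses.T


ratings = predicted_ratings(user_factors, house_factors)
print(ratings)
```

Because every prediction is a dot product, the whole user-by-house matrix can be computed in one batch, which is why recommendations are often precomputed and stored rather than computed per request.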
In addition to the ML algorithm, you also need sophisticated data management:
● Data Collection: a scalable front end to collect customer actions
● Data Analysis: data that is accessible and not siloed
● Machine Learning: (re-)training and experimentation
● Serving: a scalable, real-time system to serve recommendations
Choose your storage solution based on your access pattern
Product | Capacity | Access metaphor | Read | Write | Update granularity | Usage
Cloud Storage | Petabytes+ | Like files in a file system | Have to copy to local disk | One file | An object (a “file”) | Store blobs
Cloud SQL | Gigabytes | Relational database | SELECT rows | INSERT row | Field | No-ops SQL database on the cloud
Datastore | Terabytes | Persistent hashmap | Filter objects on property | put object | Attribute | Structured data from App Engine apps
Bigtable | Petabytes | Key-value(s), HBase API | scan rows | put row | Row | No-ops, high-throughput, scalable, flattened data
BigQuery | Petabytes | Relational | SELECT rows | Batch/stream | Field | Interactive SQL* querying of a fully managed warehouse
Cloud SQL is a fully managed database service:
● Google-managed MySQL or PostgreSQL
● Familiar
● Flexible pricing
● Managed backups
● Automatic replication
● Fast connection from GCE & GAE
● Connect from anywhere
● Google security
Demo: Set up rentals data in Cloud SQL
In this demo, we populate rentals data in Cloud SQL for the recommendation engine to use:
1. Create a Cloud SQL instance.
2. Create database tables by importing .sql files from Cloud Storage.
3. Populate the tables by importing .csv files from Cloud Storage.
4. Allow access to Cloud SQL.
5. Explore the rentals data using SQL statements from Cloud Shell.
[Diagram: .sql and .csv files are imported from Cloud Storage into Cloud SQL, which is then accessed from an external machine.]
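Cloud SQL runs MySQL, so the demo's statements are ordinary SQL. As a local, hypothetical stand-in for steps 2, 3, and 5, the sketch below uses Python's built-in sqlite3 module instead of Cloud SQL: it creates a table, bulk-loads CSV rows, and queries them. The table name, columns, and rows are assumptions for illustration, not the demo's actual schema or data.

```python
import csv
import io
import sqlite3

# Hypothetical schema standing in for the demo's .sql files.
SCHEMA = """
CREATE TABLE Accommodation (
    id     TEXT PRIMARY KEY,
    title  TEXT,
    price  INTEGER,
    rooms  INTEGER,
    kind   TEXT
);
"""

# Hypothetical rows standing in for the demo's .csv files.
CSV_DATA = """1,Lakeside Cabin,50,3,cottage
2,Forest Hut,65,2,cottage
3,City Loft,120,4,apartment
"""


def load_rentals(conn: sqlite3.Connection) -> None:
    """Create the table (step 2), then bulk-insert the CSV rows (step 3)."""
    conn.executescript(SCHEMA)
    rows = list(csv.reader(io.StringIO(CSV_DATA)))
    conn.executemany("INSERT INTO Accommodation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_rentals(conn)

# Step 5: explore the rentals data with SQL.
query = "SELECT title, price FROM Accommodation WHERE kind = ? ORDER BY price"
for title, price in conn.execute(query, ("cottage",)):
    print(title, price)
```

Against Cloud SQL you would run the same SELECT from a MySQL client in Cloud Shell once access has been allowed (step 4).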
There is a rich open-source ecosystem for big data:
● Apache Hadoop: http://hadoop.apache.org/
● Apache Pig: http://pig.apache.org/
● Apache Hive: http://hive.apache.org/
● Apache Spark: http://spark.apache.org/
Dataproc reduces the cost and complexity associated with Spark and Hadoop clusters:
● Google-managed Hadoop, Pig, Hive, and Spark
● Image versioning
● Familiar
● Resize in seconds
● Automated cluster management
● Integrates with Google Cloud
● Flexible VMs
● Google security
Demo: Recommendations ML with Dataproc
Demo: Recommendations ML with Cloud Dataproc
In this demo, we implement machine learning recommendations using Cloud Dataproc:
1. Launch Dataproc.
2. Train and apply an ML model written in PySpark to create product recommendations.
3. Explore the inserted rows in Cloud SQL.
[Diagram: Dataproc trains the model and writes recommendations to Cloud SQL, where they are queried to show the recommendations.]
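The demo's PySpark job is not reproduced here. Assuming it uses a matrix-factorization approach such as alternating least squares (ALS, which Spark MLlib provides), the NumPy sketch below shows the idea on a tiny ratings matrix: hold the item factors fixed and solve a small regularized least-squares problem for each user, then swap roles, using only observed ratings. The ratings, rank, and regularization value are made up.

```python
import numpy as np


def als(ratings, mask, rank=2, reg=0.01, iterations=20, seed=0):
    """Factor ratings ~= U @ V.T by alternating least squares on observed entries.

    ratings: (n_users, n_items) array; mask: boolean array, True where a rating exists.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iterations):
        # Hold item factors fixed; solve a small ridge regression for each user.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, mask[u]])
        # Hold user factors fixed; solve a small ridge regression for each item.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[mask[:, i], i])
    return U, V


# Toy ratings (hypothetical); 0 means "not rated".
ratings = np.array([[5., 3., 0., 1.],
                    [4., 0., 0., 1.],
                    [1., 1., 0., 5.],
                    [0., 1., 5., 4.]])
mask = ratings > 0
U, V = als(ratings, mask)
predicted = U @ V.T                                  # "apply" step: score every pair
unrated_scores = np.where(mask, -np.inf, predicted)  # ignore already-rated items
best_unrated = unrated_scores.argmax(axis=1)         # top unrated item per user
print(best_unrated)
```

Each per-user and per-item solve is independent, which is what lets Spark distribute ALS across a Dataproc cluster.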
Module Review
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
❏ Transactional updates on relatively small datasets
Module review (1 of 2)
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
✓ Transactional updates on relatively small datasets
Module review (1 of 2)
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and
Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It’s the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
❏ Fully-managed versions of the software offer no-ops
❏ Running it on Google infrastructure offers reliability and cost savings
Module review (2 of 2)
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and
Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It’s the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
✓ Fully-managed versions of the software offer no-ops
✓ Running it on Google infrastructure offers reliability and cost savings
Module review (2 of 2)
Agenda
Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo
Choosing where to store data on GCP
● Unstructured data: Cloud Storage (Firebase Storage if you need mobile SDKs)
● Structured data, transactional workload:
  ● SQL, one database is enough: Cloud SQL
  ● SQL, horizontal scalability needed: Cloud Spanner
  ● NoSQL: Cloud Datastore (Firebase Realtime DB if you need mobile SDKs)
● Structured data, analytics workload:
  ● Millisecond latency: Cloud Bigtable
  ● Latency in seconds: BigQuery
Use Cloud Spanner if you need globally consistent data or more than one Cloud SQL instance
Source: https://quizlet.com/blog/quizlet-cloud-spanner
Comparing storage options: technical details
Product | Type | Transactions | Complex queries | Capacity | Unit size
Cloud Datastore | NoSQL document | Yes | No | Terabytes+ | 1 MB/entity
Bigtable | NoSQL wide column | Single-row | No | Petabytes+ | ~10 MB/cell, ~100 MB/row
Cloud Storage | Blobstore | No | No | Petabytes+ | 5 TB/object
Cloud SQL | Relational SQL for OLTP | Yes | Yes | 500 GB | Determined by DB engine
Cloud Spanner | Relational SQL for OLTP | Yes | Yes | Petabytes | 10,240 MiB/row
BigQuery | Relational SQL for OLAP | No | Yes | Petabytes+ | 10 MB/row
Comparing storage options: use cases
Product | Type | Best for | Use cases
Cloud Datastore | NoSQL document | Getting started, App Engine applications | Getting started, App Engine applications
Bigtable | NoSQL wide column | “Flat” data, heavy read/write, events, analytical data | AdTech, financial and IoT data
Cloud Storage | Blobstore | Structured and unstructured binary or object data | Images, large media files, backups
Cloud SQL | Relational SQL for OLTP | Web frameworks, existing applications | User credentials, customer orders
Cloud Spanner | Relational SQL for OLTP | Large-scale database applications (> ~2 TB) | Whenever high I/O and global consistency are needed
BigQuery | Relational SQL for OLAP | Interactive querying, offline analytics | Data warehousing
Bigtable is meant for high-throughput data where access is primarily for a range of row key prefixes
Example row (further rows and columns elided):
Row key: NASDAQ#1426535612045
Column data: MD:SYMBOL: ZXZZT, MD:LASTSALE: 600.58, MD:LASTSIZE: 300, MD:TRADETIME: 1426535612045, MD:EXCHANGE: NASDAQ
● Tables should be tall and narrow.
● Store changes as new rows.
● Bigtable will automatically compact the table.
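To make the "range of row key prefixes" access pattern concrete, the toy class below keeps rows sorted by key, as Bigtable does, and answers a prefix scan with a binary search. It is an in-memory sketch only, not the Bigtable client API; the class name, row keys (following the EXCHANGE#SYMBOL#TIMESTAMP pattern used in the HBase example), and values are hypothetical. Note how a price change is stored as a new row rather than by updating an old one.

```python
import bisect
from typing import Dict, List, Tuple


class SortedRowStore:
    """Toy in-memory model of Bigtable's layout: rows kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        """Insert a row (or add columns to it), keeping keys in sorted order."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan_prefix(self, prefix: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return all rows whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so the matching range has ended
            result.append((key, self._rows[key]))
        return result


store = SortedRowStore()
store.put("NASDAQ#ZXZZT#1426535612045", {"MD:LASTSALE": "600.58"})
store.put("NASDAQ#ZXZZT#1426535612999", {"MD:LASTSALE": "601.10"})  # change -> new row
store.put("NASDAQ#GOOG#1426535612100", {"MD:LASTSALE": "742.03"})
zxzzt = store.scan_prefix("NASDAQ#ZXZZT#")
print([key for key, _ in zxzzt])
```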
Short, meaningful column names reduce storage and RPC overhead
Example row key: NASDAQ#1426535612045; columns in family MD: SYMBOL (ZXZZT), LASTSALE (600.58), LASTSIZE (300), TRADETIME (1426535612045), EXCHANGE (NASDAQ)
● Design the row key with the most common query in mind.
● Design the row key to minimize hotspots.
● Column families are a quick way to get some hierarchy.
● Use short column names.
● Bigtable is designed for sparse tables.
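One way to apply the key-design advice, sketched under assumptions: put the fields of the most common query (exchange and symbol) first, so that writes spread across many key prefixes instead of piling onto one monotonically increasing key, and append a reversed, zero-padded timestamp so the newest trade for a symbol sorts first in a prefix scan. The `MAX_TS` bound and the exact key layout are illustrative choices, not a Bigtable requirement.

```python
MAX_TS = 10 ** 13  # hypothetical bound, larger than any millisecond timestamp expected


def trade_row_key(exchange: str, symbol: str, ts_millis: int) -> str:
    """Build a row key of the form EXCHANGE#SYMBOL#REVERSED_TS.

    Leading with exchange and symbol (rather than the timestamp) avoids a write
    hotspot on the newest key range; the reversed, zero-padded timestamp makes
    the most recent trade for a symbol sort first.
    """
    if not 0 < ts_millis < MAX_TS:
        raise ValueError("timestamp out of range")
    return f"{exchange}#{symbol}#{MAX_TS - ts_millis:013d}"


older = trade_row_key("NASDAQ", "ZXZZT", 1426535612045)
newer = trade_row_key("NASDAQ", "ZXZZT", 1426535613045)
print(sorted([older, newer]))
```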
You can work with Bigtable using the HBase API:
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;

byte[] CF = Bytes.toBytes("MD"); // column family
// TABLE_NAME (a TableName) and the connection configuration are defined elsewhere
try (Connection connection = ConnectionFactory.createConnection(...);
     Table table = connection.getTable(TABLE_NAME)) {
    // Row key: exchange#symbol#timestamp
    Put p = new Put(Bytes.toBytes("NASDAQ#GOOG#1234561234561"));
    p.addColumn(CF, Bytes.toBytes("SYMBOL"), Bytes.toBytes("GOOG"));
    p.addColumn(CF, Bytes.toBytes("LASTSALE"), Bytes.toBytes(742.03d));
    ...
    table.put(p);
} // try-with-resources closes the table and then the connection
BigQuery is a fully managed data warehouse that lets you run ad hoc SQL queries on massive volumes of data
[Diagram: the BigQuery service holds projects (Project X, Project Y); each project contains datasets (Datasets A and B; Datasets C and D), and each dataset contains tables (Table 1, Table 2).]
A demo of BigQuery on a 10-billion-row dataset shows what it is and what it can do:
#standardSQL
SELECT
  language, SUM(views) AS views
FROM `bigquery-samples.wikipedia_benchmark.Wiki10B`
WHERE
  title LIKE "%google%"
GROUP BY language
ORDER BY views DESC
● Familiar SQL 2011 query language
● Interactive, ad hoc analysis of petabyte-scale databases
● No need to provision clusters
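To spell out what the query computes, the pure-Python sketch below applies the same filter, grouping, sum, and ordering to a few hypothetical rows of (language, title, views). It only illustrates the query's semantics; it says nothing about how BigQuery executes the query at scale.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def views_by_language(
    rows: Iterable[Tuple[str, str, int]], needle: str = "google"
) -> List[Tuple[str, int]]:
    """Mimic: SELECT language, SUM(views) AS views ... WHERE title LIKE '%google%'
    GROUP BY language ORDER BY views DESC."""
    totals: Dict[str, int] = defaultdict(int)
    for language, title, views in rows:
        if needle in title:              # substring match, like LIKE "%google%"
            totals[language] += views    # GROUP BY language, SUM(views)
    # ORDER BY views DESC
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


sample_rows = [  # hypothetical rows
    ("en", "google_search", 100),
    ("de", "google_maps", 40),
    ("en", "google_maps", 50),
    ("fr", "yahoo", 999),
]
print(views_by_language(sample_rows))
```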
Three ways of loading data into BigQuery:
1. Files on disk or in Cloud Storage (CSV, JSON, Avro)
2. Streaming data (POST requests, for example from a serverless ETL pipeline)
3. Federated data sources (for example, Google Sheets)
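For the file-based path, BigQuery's JSON loading expects newline-delimited JSON: one object per line rather than a single JSON array. The helper below is a minimal sketch that serializes records that way; the field names and values are hypothetical.

```python
import json
from typing import Dict, Iterable


def to_ndjson(records: Iterable[Dict]) -> str:
    """Serialize records as newline-delimited JSON: one compact object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


rows = [{"sensor": "s1", "speed": 55.0}, {"sensor": "s2", "speed": 60.5}]  # hypothetical
print(to_ndjson(rows), end="")
```

Saved to a file, the output could then be loaded with the bq tool, for example `bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable rows.json sensor:STRING,speed:FLOAT`, where the dataset, table, file, and schema names are placeholders.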
With federated data sources, you can directly query files in Cloud Storage without having to ingest them into BigQuery.
● Sources can also be Google Drive or Bigtable.
● Formats can also be JSON, Avro, or Google Sheets.
● You can also pass in a schema.
Increasingly, data analysis and machine learning are carried out in self-descriptive, shareable, executable notebooks
A typical notebook contains code, its output (such as charts), and markup explanations, and it can be shared.
Image source: Git logo from Wikipedia
Datalab is an open-source notebook built on Jupyter (IPython):
● Analyze data in BigQuery, Compute Engine, or Cloud Storage
● Use existing Python packages
● Datalab is free; just pay for Google Cloud resources
Datalab notebooks are developed in an iterative, collaborative process:
1. Write code in Python.
2. Run the cell (Shift+Enter).
3. Examine the output.
4. Write commentary in Markdown.
5. Share and collaborate.
Datalab supports BigQuery: you can run queries in a %%sql cell and convert the results to Pandas.
Demo: Create ML dataset with BigQuery
Demo: Create ML dataset with BigQuery
In this demo, we use BigQuery to create a dataset that we later use to build a taxi demand forecast system using machine learning.
● What kinds of things affect taxi demand?
● What are some ways to measure “demand”?
In this demo, we:
1. Use BigQuery and Datalab to explore and visualize data.
2. Build a Pandas dataframe that will be used as the training dataset for machine learning with TensorFlow.
Module Review
Module review
Match the use case on the left with the product on the right
Global consistency needed
High-throughput writes of wide-column data
Warehousing structured data
Develop Big Data algorithms interactively in Python
1. Datalab
2. Bigtable
3. BigQuery
4. Spanner
Module review
Match the use case on the left with the product on the right
Global consistency needed (4)
High-throughput writes of wide-column data (2)
Warehousing structured data (3)
Develop Big Data algorithms interactively in Python (1)
1. Datalab
2. Bigtable
3. BigQuery
4. Spanner
TensorFlow is an open source library that underlies many Google products
Demo: Playing with neural networks to learn what they are
http://playground.tensorflow.org/
Supervised machine learning requires features and labels
[Figure: input features feed a neural network that outputs a prediction; the prediction is compared with the label to compute a cost.]
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons
Machine Learning with TensorFlow involves four steps:
1. Gather data: gather training data (input features and labels).
2. Create: create the model.
3. Train: train the model based on input data.
4. Use: use the model on new data.
Step 1 (Gather data): Gather training data and select input features
[Figure: columns of the training data are marked as input features, the target, or columns to discard.]
Step 1 (Gather data): All input features need to be numeric
[Figure: numeric columns are used as-is; categorical columns are converted with one-hot encoding.]
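The rule "numeric columns as-is, categorical columns one-hot encoded" can be sketched in plain Python. The `weather` column and its categories are hypothetical; a real pipeline would usually rely on the framework's feature-column utilities instead.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[int]:
    """Encode a categorical value as a 0/1 vector with a single 1."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_row(row: Dict[str, object], categorical: Dict[str, Sequence[str]]) -> List[float]:
    """Numeric columns are used as-is; categorical columns are one-hot encoded."""
    features: List[float] = []
    for name, value in row.items():
        if name in categorical:
            features.extend(one_hot(str(value), categorical[name]))
        else:
            features.append(float(value))
    return features


weather_categories = {"weather": ["clear", "rain", "snow"]}  # hypothetical column
print(encode_row({"maxtemp": 80, "weather": "rain"}, weather_categories))
```

One-hot encoding avoids implying an order (for example, that "snow" is "greater than" "rain") that an arbitrary integer code would suggest to the network.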
Step 2 (Create): Create a neural network model, defining the number of feature columns and hidden units
[Figure: a network with npredictors inputs, nhidden hidden units, and noutputs outputs.]
estimator = DNNRegressor(hidden_units=[5], feature_columns=[...])
Step 3 (Train): Train the model on the collected data
estimator.fit(predictors, targets, steps=1000)
[Figure: the model maps the npredictors input features to a predicted value of taxicab demand; the cost compares the prediction with the true value of taxicab demand, and the model is updated based on the cost.]
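The fit call hides the loop shown in the figure. As a rough, framework-free sketch (not TensorFlow's implementation), the NumPy code below trains a regressor with one hidden layer of 5 units, mirroring hidden_units=[5], by repeating three things: predict, compute the mean-squared-error cost against the true values, and update the weights by gradient descent. The synthetic data, learning rate, and step count are assumptions.

```python
import numpy as np


def train_regressor(X, y, n_hidden=5, lr=0.05, steps=1000, seed=0):
    """Fit a one-hidden-layer ReLU network with full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(steps):
        # Forward pass: predicted value for every training example.
        h_pre = X @ W1 + b1
        h = np.maximum(h_pre, 0.0)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))  # the cost
        # Backward pass: gradient of the cost with respect to each parameter.
        d_pred = 2.0 * err / n_samples
        dW2 = h.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_h = (d_pred @ W2.T) * (h_pre > 0)
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Update the model based on the cost.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Predicted value for each row of X."""
    W1, b1, W2, b2 = params
    return (np.maximum(X @ W1 + b1, 0.0) @ W2 + b2).ravel()


rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 4))   # 4 hypothetical scaled predictors
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.0    # hypothetical "true demand"
params, losses = train_regressor(X, y)
print(f"cost: {losses[0]:.3f} -> {losses[-1]:.3f}")
```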
Step 4 (Use): Use the model on new data
[Figure: new input features such as rain and max temp go through the trained model to produce a predicted value of taxicab demand; during training, that prediction was compared with the true value (the cost) to update the model.]
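Step 4 (Use): Use the model on new data
import pandas as pd
from tensorflow.contrib.learn import DNNRegressor  # legacy TF 1.x API

new_data = pd.DataFrame.from_dict(data={
    'dayofweek': [4, 5, 6],
    'mintemp': [60, 15, 60],
    'maxtemp': [80, 80, 65],
    'rain': [0, 0.8, 0]})
# read trained model from /tmp/trained_model;
# feature_columns (elided) must match those used in training
estimator = DNNRegressor(model_dir='/tmp/trained_model',
                         hidden_units=[5], feature_columns=[...])
pred = estimator.predict(new_data.values)
print(list(pred))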
Demo 2, Part 2: Carry out ML with TensorFlow
Demo 2, Part 2: Carry out ML with TensorFlow
In this demo, we build a neural network to predict taxicab demand on a day-by-day basis using TensorFlow.
[Diagram: inputs feed a neural network that produces a prediction.]
Agenda
Machine learning with TensorFlow + Demo
Pre-built machine learning models + Demo
The accuracy of an ML model is driven largely by the size and quality of the dataset; this is why ML requires massive compute
[Figure: as the size of the dataset grows, both accuracy and the scale of the compute problem grow.]
https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
Cloud ML Engine simplifies the use of distributed TensorFlow
[Figure: training is spread across more machines as the size of the dataset grows.]
ML APIs are pre-trained ML models (trained on Google’s data) for common tasks; they are accessible through REST APIs
● Use your own data to train models: TensorFlow, Cloud Machine Learning Engine
● Machine Learning as an API: Cloud Vision API, Cloud Translation API, Cloud Natural Language API, Cloud Speech API, Cloud Video Intelligence
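Because the ML APIs are plain REST endpoints, calling one mostly means building a JSON request body. The sketch below builds a body for the Vision API v1 `images:annotate` method (one base64-encoded image, one feature type) without sending it; the placeholder image bytes and the `API_KEY` in the comment are hypothetical.

```python
import base64
import json

# Vision API v1 endpoint for batch image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes: bytes, feature: str = "LABEL_DETECTION",
                           max_results: int = 5) -> dict:
    """JSON body asking for one feature type on one base64-encoded image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature, "maxResults": max_results}],
            }
        ]
    }


body = build_annotate_request(b"placeholder image bytes", feature="WEB_DETECTION")
print(json.dumps(body, indent=2))
# To send it (not done here), POST the body to VISION_ENDPOINT with an API key or
# OAuth credentials, e.g. with the third-party `requests` library:
#   requests.post(VISION_ENDPOINT, params={"key": API_KEY}, json=body)
```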
Web annotations
[
  {"entityId": "/m/016ms7", "score": 1.44038, "description": "Ford Anglia"},
  {"entityId": "/m/0gff2yr", "score": 5.92256, "description": "ArtScience Museum"},
  {"entityId": "/m/0h898pd", "score": 7.4162, "description": "Harry Potter (Literary Series)"}
]
CC-BY 2.0 Rev Stan: https://www.flickr.com/photos/revstan/6865880240
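Web-entity scores are relative relevance values rather than probabilities, so a typical first step is simply to rank them. The sketch below parses the three entities shown above and keeps the highest-scoring descriptions; the `top_entities` helper is hypothetical.

```python
import json

web_entities_json = """[
  {"entityId": "/m/016ms7", "score": 1.44038, "description": "Ford Anglia"},
  {"entityId": "/m/0gff2yr", "score": 5.92256, "description": "ArtScience Museum"},
  {"entityId": "/m/0h898pd", "score": 7.4162, "description": "Harry Potter (Literary Series)"}
]"""


def top_entities(entities, n=2):
    """Sort web entities by score (higher = more relevant) and keep the top n descriptions."""
    ranked = sorted(entities, key=lambda e: e["score"], reverse=True)
    return [e["description"] for e in ranked[:n]]


print(top_entities(json.loads(web_entities_json)))
```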
Try it in the browser with your own images
cloud.google.com/vision
The Translation API supports 100+ languages
https://cloud.google.com/translate/
Wootric uses the Cloud Natural Language API (entity and sentiment analysis) to make sense of qualitative customer feedback
Extracted entities are tied into a knowledge graph
Input text: Joanne "Jo" Rowling, pen names J. K. Rowling and Robert Galbraith, is a British novelist, screenwriter and film producer best known as the author of the Harry Potter fantasy series
[
  {
    "name": "Joanne 'Jo' Rowling",
    "type": "PERSON",
    "metadata": {
      "mid": "/m/042xh",
      "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
    }
  },
  {
    "name": "British",
    "type": "LOCATION",
    "metadata": {
      "mid": "/m/07ssc",
      "wikipedia_url": "http://en.wikipedia.org/wiki/United_Kingdom"
    }
  },
  {
    "name": "Harry Potter",
    "type": "PERSON",
    "metadata": {
      "mid": "/m/078ffw",
      "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
    }
  }
]
When you analyze sentiment, you get a score (positive/negative) as well as a magnitude (how intense?)
Input: The food was excellent, I would definitely go back!
{
  "documentSentiment": {
    "score": 0.8,
    "magnitude": 0.8
  }
}
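The sketch below pairs a request body for the Natural Language API v1 `documents:analyzeSentiment` method with a small helper that turns the response into a label. The score runs from -1.0 (negative) to 1.0 (positive) and the magnitude is a non-negative measure of overall emotional strength; the thresholds in `describe_sentiment` are arbitrary assumptions, not values defined by the API.

```python
import json

# Natural Language API v1 method for document-level sentiment (not called here).
SENTIMENT_ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text: str) -> dict:
    """JSON body for analyzeSentiment on a plain-text document."""
    return {"document": {"type": "PLAIN_TEXT", "content": text}, "encodingType": "UTF8"}


def describe_sentiment(response: dict, band: float = 0.25, strong: float = 1.0) -> str:
    """Map (score, magnitude) to a rough label; thresholds are illustrative only."""
    sentiment = response["documentSentiment"]
    score, magnitude = sentiment["score"], sentiment["magnitude"]
    if score >= band:
        return "positive"
    if score <= -band:
        return "negative"
    # Near-zero score: strong magnitude suggests mixed feelings, weak suggests neutral.
    return "mixed" if magnitude >= strong else "neutral"


request = build_sentiment_request("The food was excellent, I would definitely go back!")
response = json.loads('{"documentSentiment": {"score": 0.8, "magnitude": 0.8}}')
print(describe_sentiment(response))
```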
The Cloud Speech API can be used to transcribe audio to text
http://cloud.google.com/speech
Like the Vision API, the Video Intelligence API can identify labels in a video, along with a timestamp
{
"description": "Bird's-eye view",
"language_code": "en-us",
"locations": {
"segment": {
"start_time_offset": 71905212,
"end_time_offset": 73740392
},
"confidence": 0.96653205
}
}
https://cloud.google.com/video-intelligence/
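To turn the segment offsets above into something readable, the sketch converts them to seconds. It assumes the integer offsets are microseconds, as in the early (v1beta1) API that produced this response format; later API versions report offsets as durations, so check the version you call.

```python
def offset_to_seconds(offset_micros: int) -> float:
    """Convert a Video Intelligence time offset to seconds, assuming microseconds."""
    return offset_micros / 1_000_000


segment = {"start_time_offset": 71905212, "end_time_offset": 73740392}
start_s = offset_to_seconds(segment["start_time_offset"])
end_s = offset_to_seconds(segment["end_time_offset"])
print(f"'Bird's-eye view' from {start_s:.2f}s to {end_s:.2f}s")
```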
Demo 2, Part 3: Machine Learning APIs
In this demo, we use several of the Machine Learning APIs (Vision, Translate, Natural Language Processing, Speech) from Python.
“How much is this car worth?”
Email: “Hi Ocado, I love your website. I have children so it’s easier for me to do the shopping online. Many thanks for saving my time! Regards”
Classified as: Feedback; customer is happy
Improves natural language processing of customer service claims
“Thanks to the Google Cloud Platform, Ocado was able to use the power of cloud computing and train our models in parallel.”
50% of enterprises will be spending more per annum on bots and chatbot creation than on traditional mobile app development by 2021 (Gartner)
● Use Dialogflow to create a new shopping experience
● Custom image model to price cars
● Build off the NLP API to route customer emails
● Use the Vision API as-is to find text in memes
Introducing Cloud AutoML: a technology that can automatically create a machine learning model
[Diagram: the ML workflow of data preprocessing, ML model design, tuning ML model parameters, evaluation, deployment, and updates.]
Cloud AutoML Vision
1. Upload and label images (for example: handbag, shoe, hat).
2. Train your model in minutes, or up to one day.
3. Evaluate.
The model is now trained and ready to make predictions. It can scale as needed to adapt to customer demands.
Module Review
Module review
Match the use case on the left with the product on the right
Create, test new machine learning methods
No-ops, custom machine learning applications at scale
Automatically reject inappropriate image content
Build application to monitor Spanish Twitter feed
Transcribe customer support calls
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API
Module review
Match the use case on the left with the product on the right
Create, test new machine learning methods (2)
No-ops, custom machine learning applications at scale (4)
Automatically reject inappropriate image content (1)
Build application to monitor Spanish Twitter feed (5)
Transcribe customer support calls (3)
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API
Resources (1 of 2)
Cloud Spanner https://cloud.google.com/spanner/
Cloud Bigtable https://cloud.google.com/bigtable/
Google BigQuery https://cloud.google.com/bigquery/
Cloud Datalab https://cloud.google.com/datalab/
TensorFlow https://www.tensorflow.org/
Resources (2 of 2)
Cloud Machine Learning https://cloud.google.com/ml/
Vision API https://cloud.google.com/vision/
Translation API https://cloud.google.com/translate/
Speech API https://cloud.google.com/speech/
Video Intelligence API https://cloud.google.com/video-intelligence
Data Processing Architecture
Agenda
Message-oriented architectures
Serverless data pipelines
GCP Reference Architecture
Asynchronous processing is useful for long-lived tasks, or for loosely coupling two systems.
Producers (P1, P2, P3) → Message Queue → Consumers (C1, C2, C3)
Potential use cases:
1. Send an SMS
2. Train ML model
3. Process data from multiple sources
4. Weekly reports …
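The producer/queue/consumer picture above can be sketched with Python's standard library. This is a minimal, single-process illustration only: `run_async_demo` and the job names are hypothetical, and an in-memory `queue.Queue` stands in for a real message queue. Producers return as soon as their jobs are enqueued, and consumers drain the backlog at their own pace, which is the loose coupling the slide describes.

```python
import queue
import threading


def run_async_demo(num_producers=3, num_consumers=3, jobs_per_producer=4):
    """Hypothetical demo: producers P1..Pn enqueue jobs, consumers C1..Cm process them."""
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def producer(pid):
        for i in range(jobs_per_producer):
            # A job could be "send an SMS" or "train an ML model".
            q.put(f"P{pid}-job{i}")

    def consumer(cid):
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this consumer
                q.task_done()
                return
            with results_lock:
                results.append((f"C{cid}", job))
            q.task_done()

    consumers = [threading.Thread(target=consumer, args=(c,))
                 for c in range(1, num_consumers + 1)]
    producers = [threading.Thread(target=producer, args=(p,))
                 for p in range(1, num_producers + 1)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()      # every job is now in the queue
    for _ in consumers:
        q.put(None)   # one sentinel per consumer, queued after all jobs
    for t in consumers:
        t.join()
    return results


if __name__ == "__main__":
    done = run_async_demo()
    print(len(done), "jobs processed")
```

Because the queue is FIFO and the sentinels are enqueued only after every producer has finished, each consumer exits only once all real jobs have been taken.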
For robust asynchronous processing, you need:
1. A global, highly available queue
2. Scaling without over-provisioning
3. An interoperable queue
4. Reliable delivery of messages
Producers (P1, P2, P3) → Queue → Consumers (C1, C2, C3)
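Requirement 4, reliable delivery, is commonly met with acknowledgements: a pulled message is leased rather than deleted, and it is redelivered if the consumer does not ack it before a deadline (at-least-once delivery). The toy class below sketches that idea under stated assumptions: `AckQueue`, its methods, and the integer logical clock `now` are hypothetical and are not the API of any GCP product.

```python
from collections import deque


class AckQueue:
    """Hypothetical toy queue with at-least-once delivery.

    A pulled message is leased, not deleted. It is removed only when the
    consumer acks it; if the lease expires first, the message is redelivered.
    """

    def __init__(self, ack_deadline=10):
        self.ack_deadline = ack_deadline
        self.ready = deque()  # (msg_id, message) pairs waiting for delivery
        self.leased = {}      # msg_id -> (message, lease expiry time)
        self.next_id = 0

    def publish(self, message):
        self.ready.append((self.next_id, message))
        self.next_id += 1

    def pull(self, now):
        """Deliver one message at logical time `now`, or return None."""
        self._expire_leases(now)
        if not self.ready:
            return None
        msg_id, message = self.ready.popleft()
        self.leased[msg_id] = (message, now + self.ack_deadline)
        return msg_id, message

    def ack(self, msg_id):
        # Acked messages are forgotten and never delivered again.
        self.leased.pop(msg_id, None)

    def _expire_leases(self, now):
        for msg_id, (message, expiry) in list(self.leased.items()):
            if now >= expiry:  # consumer crashed or was too slow
                del self.leased[msg_id]
                self.ready.append((msg_id, message))


if __name__ == "__main__":
    q = AckQueue(ack_deadline=10)
    q.publish("weekly-report")
    first = q.pull(now=0)    # delivered, but the consumer never acks
    second = q.pull(now=10)  # lease expired, so the message comes back
    q.ack(second[0])
    print(first == second, q.pull(now=30))
```

The trade-off of this design is that a slow consumer may see the same message twice, so consumers should be idempotent.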
Pub/Sub provides a no-ops, serverless global message queue
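Pub/Sub is a publish/subscribe service: producers publish to a topic, and every subscription attached to that topic receives its own copy of each message. The in-memory `Topic` class below is a hypothetical model of that fan-out behaviour only. It does not call the real service, which is reached through the `google-cloud-pubsub` client library (`PublisherClient` / `SubscriberClient`).

```python
from collections import deque


class Topic:
    """Hypothetical in-memory model of publish/subscribe fan-out (not the Pub/Sub API)."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscription name -> backlog of messages

    def subscribe(self, sub_name):
        # A subscription only sees messages published after it exists.
        self.subscriptions.setdefault(sub_name, deque())

    def publish(self, data):
        for backlog in self.subscriptions.values():
            backlog.append(data)  # fan-out: one copy per subscription

    def pull(self, sub_name, max_messages=10):
        backlog = self.subscriptions[sub_name]
        batch = []
        while backlog and len(batch) < max_messages:
            batch.append(backlog.popleft())
        return batch


if __name__ == "__main__":
    orders = Topic("orders")
    orders.subscribe("billing")
    orders.subscribe("analytics")
    orders.publish(b"order-1")
    orders.publish(b"order-2")
    print(orders.pull("billing"), orders.pull("analytics", max_messages=1))
```

Here the "billing" and "analytics" consumers never need to know about each other or about the producer, which is what lets large organizations decouple systems.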
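Dataflow offers NoOps data pipelines in Java and Python

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # parse_data, get_speedbysensor and avg_speed are user-defined
    # functions that are not shown on the slide.
    options = PipelineOptions()
    p = beam.Pipeline(options=options)
    lines = p | beam.io.ReadFromText('gs://…')                 # Read
    traffic = (lines
               | beam.Map(parse_data).with_output_types(str)  # Transform 1
               | beam.Map(get_speedbysensor)  # (sensor, speed)      Transform 2
               | beam.GroupByKey()            # (sensor, [speed])    Group
               | beam.Map(avg_speed)          # (sensor, avgspeed)   Transform 3
               | beam.Map(lambda tup: '%s: %d' % tup))         # Transform 4
    output = traffic | beam.io.WriteToText('gs://...')         # Write
    p.run()

Pipeline: Input → Read → Transform 1 → Transform 2 → Group → Transform 3 → Transform 4 → Write → Output (Map, then Group-By, then Reduce)
Each of these steps is run in parallel and autoscaled by the execution framework.
The open-source API (Apache Beam) can also be executed on Flink, Spark, and other runners.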
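To see what the Map, Group-By, Reduce stages of the traffic pipeline compute, here is a plain-Python local simulation with no Beam dependency. The helper bodies and the input format (one `sensor_id,speed` record per line) are assumptions made for illustration, since the slide does not show `parse_data`, `get_speedbysensor` or `avg_speed`; `group_by_key` and `run_local` are hypothetical names. Dataflow would run the same stages in parallel; this version runs them sequentially.

```python
from collections import defaultdict


def parse_data(line):
    # Assumed input format: "sensor_id,speed"
    sensor, speed = line.split(",")
    return sensor.strip(), float(speed)


def get_speedbysensor(record):
    sensor, speed = record
    return sensor, speed  # (sensor, speed)


def group_by_key(pairs):
    # Local stand-in for beam.GroupByKey()
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()  # (sensor, [speed])


def avg_speed(kv):
    sensor, speeds = kv
    return sensor, sum(speeds) / len(speeds)  # (sensor, avgspeed)


def run_local(lines):
    parsed = map(parse_data, lines)           # Transform 1 (Map)
    keyed = map(get_speedbysensor, parsed)    # Transform 2 (Map)
    grouped = group_by_key(keyed)             # Group-By
    averaged = map(avg_speed, grouped)        # Transform 3 (Reduce)
    return ['%s: %d' % tup for tup in averaged]  # Transform 4 (format)


if __name__ == "__main__":
    print(run_local(["s1,60", "s2,50", "s1,70"]))
```

As in the slide's lambda, `%d` truncates the average to an integer when formatting.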
Choosing where to store data on GCP
• Unstructured data: Cloud Storage
• Structured data, transactional workload, SQL: Cloud SQL if one database is enough; Cloud Spanner if you need horizontal scalability
• Structured data, transactional workload, NoSQL: Cloud Datastore
• Structured data, data analytics workload: Cloud Bigtable for millisecond latency; BigQuery for latency in seconds
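The storage decision chart can be written as a small function, which makes the branch order explicit. This is a sketch of one reading of the chart: `choose_storage` and its boolean parameters are hypothetical, and real choices also weigh cost, data size and query patterns.

```python
def choose_storage(structured, analytics=False, sql=False,
                   needs_horizontal_scale=False, millisecond_latency=False):
    """Hypothetical walk through the 'choosing where to store data on GCP' chart."""
    if not structured:
        return "Cloud Storage"
    if analytics:
        # Data analytics workload: pick by the query latency you need.
        return "Cloud Bigtable" if millisecond_latency else "BigQuery"
    # Transactional workload.
    if sql:
        # One database is enough -> Cloud SQL; horizontal scalability -> Cloud Spanner.
        return "Cloud Spanner" if needs_horizontal_scale else "Cloud SQL"
    return "Cloud Datastore"


if __name__ == "__main__":
    print(choose_storage(structured=True, analytics=True))
    print(choose_storage(structured=True, sql=True, needs_horizontal_scale=True))
```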
Run Spark/Hadoop jobs on Cloud Dataproc
Clients reach applications on the Cloud Dataproc cluster through direct access or the API. Input and output connectors link the cluster to its data sources: Cloud Storage, Cloud Bigtable, and BigQuery.
On GCP, you can use the same data processing pipeline for both batch and stream processing:
• Stream input: events, metrics, and so on, ingested through Cloud Pub/Sub
• Batch input: raw logs, files, assets, Google Analytics data, and so on, stored in Cloud Storage
• Processing: Cloud Dataflow
• Storage and analysis: BigQuery, Bigtable
• Consumption: Cloud ML Engine, Cloud Datalab, Data Studio
• Outputs: applications and reports, dashboards/BI, co-workers
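The idea of one pipeline for batch and stream can be illustrated without any GCP services: write the business logic once and feed it either a bounded collection (batch) or an unbounded iterator (stream). In GCP this would be a single Dataflow (Apache Beam) pipeline reading from Cloud Storage or Cloud Pub/Sub; the stdlib sketch below only mimics that. `transform`, `window_counts`, the `(timestamp, page)` event shape and the 60-second fixed windows are assumptions, and the sketch further assumes events arrive in timestamp order, a simplification of the watermarks a real stream processor uses.

```python
from collections import Counter
from itertools import groupby


def transform(event):
    """Shared business logic: the same function serves batch and stream."""
    timestamp, page = event
    return timestamp, page.lower()


def window_counts(events, window_seconds=60):
    """Count pages per fixed window over a list (batch) or any iterator (stream).

    Assumes events arrive in timestamp order, so each window's result can be
    emitted as soon as the next window starts.
    """
    keyed = (transform(e) for e in events)
    for window_start, group in groupby(
            keyed, key=lambda e: e[0] // window_seconds * window_seconds):
        yield window_start, Counter(page for _, page in group)


if __name__ == "__main__":
    events = [(0, "Home"), (30, "home"), (65, "Cart")]
    print(list(window_counts(events)))        # batch: whole dataset at once
    stream = window_counts(iter(events))      # stream: results arrive lazily
    print(next(stream))
```

Because `window_counts` is a generator, the stream case produces each window's result incrementally instead of waiting for the input to end.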
Demo:
Module Review
Match the use case on the left with the product on the right
Module review
A. Decoupling producers and consumers of data in large organizations and complex systems
B. Scalable, fault-tolerant multi-step processing of data
1. Cloud Dataflow
2. Cloud Pub/Sub
Match the use case on the left with the product on the right
Module review
A. Decoupling producers and consumers of data in large organizations and complex systems (2)
B. Scalable, fault-tolerant multi-step processing of data (1)
1. Cloud Dataflow
2. Cloud Pub/Sub
Resources (1 of 2)
Cloud Pub/Sub https://cloud.google.com/pubsub/
Cloud Dataflow https://cloud.google.com/dataflow/
Processing media using Cloud Pub/Sub and Compute Engine https://cloud.google.com/solutions/media-processing-pub-sub-compute-engine
Resources (2 of 2)
Reverse Geocoding of Geolocation Telemetry in the Cloud Using the Maps API https://cloud.google.com/solutions/reverse-geocoding-geolocation-telemetry-cloud-maps-ap
Using Cloud Pub/Sub for Long-running Tasks https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks
Summary
Version #1.1
An Evolving Cloud
1st Wave: Your kit, someone else’s building. Yours to manage.
2nd Wave: Standard virtual kit, for rent. Still yours to manage.
3rd Wave: Invest your energy in great apps.
Google Cloud provides a way to take advantage of Google’s investments in infrastructure and data processing innovation.
(Timeline, 2002 to 2018: Cloud Storage, Dataproc, Bigtable, BigQuery, Dataflow, Datastore, Pub/Sub, ML Engine, AutoML, Cloud Spanner)
In summary, GCP offers you ways to get the most out of data and secure competitive advantage:
• Create citizen data scientists. Transform your organization into a truly data-driven company by putting tools into the hands of domain experts.
• Apply machine learning broadly and easily. We make it simple and practical to incorporate machine learning models within custom applications.
• Spend less on ops and administration. We’ve “automated out” the complexity of building and maintaining data and analytics systems.
• Incorporate real-time data into apps and architectures.
Next Steps on your Google Cloud learning journey
1. Today: Google Cloud Platform Fundamentals: Big Data and Machine Learning
2. Tomorrow: Complete hands-on labs: Baseline: Data, ML, AI quest (google.qwiklabs.com)
3. Future: Find more training online (cloud.google.com/training)
Complete 10 hands-on labs free on Qwiklabs by 30 April, and receive $200 in GCP credits
[Only for Cloud OnBoard Attendees]
1. Receive a follow-up email after the event.
2. Create a Qwiklabs account (username and password) with the email you used to register for Cloud OnBoard.
3. Open your email and confirm your account.
4. Return to Qwiklabs and log in.
5. Enroll in the Baseline: Data, ML, AI quest and take your first lab!
6. Complete all 10 labs, and we will send you an email after 30 April with instructions to redeem the $200 credits. Make sure you opt in to receive emails from Qwiklabs.
To help you get started
Activate your voucher now for a free course worth $99!
1. Go to https://www.coursera.org/voucher/CloudOnBoardML
2. Activate the voucher and sign up for a free account
3. Enroll in Serverless Data Analysis with Google BigQuery and Cloud Dataflow for free (limited-period offer!)
Explore other courses at Coursera.org/Googlecloud
Make Google Cloud certification your goal!
cloud.google.com/certification
Find study guides, tips, practice exams, and testing sites
Confidential & Proprietary
A special offer for Cloud OnBoard Singapore attendees:
Visit https://goo.gl/bmJwwk before May 18th to enroll, and eligible startups* receive $3,000 in Google Cloud and Firebase credits.
Google Cloud Startup Program: $3,000 in credits
g.co/cloudstartups
cloud.startups@google.com
Google Cloud is a perfect fit for launching and scaling your early-stage startup.
*What’s an eligible startup? One that:
• Has raised no more than a Series A
• Is less than 5 years old
• Is located in one of our approved countries
• Has not participated in the Google Cloud Startup program before
Be part of the GCP User Group SG Community! (bit.ly/gcpusergroupsg)
Learn: Learn from leads, users, and tech experts.
Connect: Network, share, and learn all about Google Cloud.
Access: Gain access to the Google Cloud team and the latest capabilities.
Resources
Big data and machine learning blog https://cloud.google.com/blog/big-data/
Google Cloud Platform blog https://cloudplatform.googleblog.com/
Google Cloud Platform curated articles https://medium.com/google-cloud