The document discusses creating a gaming analytics platform using Google Cloud Platform. It describes collecting diverse data from sources like user acquisition campaigns, app stores, and custom game events. The data is analyzed using standard metrics like DAU, MAU, and retention as well as custom metrics specific to each game. It recommends using BigQuery for batch processing and Cloud Dataflow for real-time stream processing. Cloud Dataflow allows processing data from batch and streaming sources together and offers features like autoscaling and liquid sharding. The document provides examples of using Cloud Dataflow to calculate real-time user scores and team scores from game data streamed through Pub/Sub.
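The real-time user-score and team-score computation described above can be sketched in plain Python — a conceptual stand-in for the Beam/Dataflow pipeline reading game events from Pub/Sub (the event shape and function names here are illustrative assumptions, not from the talk):

```python
from collections import defaultdict

# Each event as parsed from a Pub/Sub message: (user, team, score).
events = [
    ("alice", "red", 10),
    ("bob", "blue", 5),
    ("alice", "red", 7),
    ("carol", "blue", 3),
]

def user_scores(events):
    """Sum scores per user, as the per-user branch of the pipeline would."""
    totals = defaultdict(int)
    for user, _team, score in events:
        totals[user] += score
    return dict(totals)

def team_scores(events):
    """Sum scores per team, as the per-team branch would."""
    totals = defaultdict(int)
    for _user, team, score in events:
        totals[team] += score
    return dict(totals)

print(user_scores(events))  # {'alice': 17, 'bob': 5, 'carol': 3}
print(team_scores(events))  # {'red': 17, 'blue': 8}
```

In the actual Dataflow version, these two aggregations would be windowed `GroupByKey`/`Sum` transforms fed by an unbounded Pub/Sub source rather than an in-memory list.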
SEC302: Twitter's GCP Architecture for its petabyte-scale data storage in GCS (Vrushali Channapattan)
Twitter collects petabytes of data every day and empowers its engineers and data scientists to process it at scale with a hybrid on-premises and cloud model. In this talk, we will look at its GCP architecture and resource hierarchy. We will deep dive into the storage design that uses Google Cloud Storage to organize petabytes of data replicated from on-premises HDFS clusters. We will take a look at how the user-management tooling has been designed to create and manage access for thousands of accounts (human and service accounts) at Twitter. We will talk about how the design deals with security measures for accounts and tooling systems running in GCP and the complexities of dataset permissions. We will share the challenges we faced as we tried to design our system at scale, along with our learnings and solutions.
Twitter's Data Platform is built from multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, and various tools and libraries that help users with both batch and realtime analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we scaled our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's data platform to the cloud was a complex task, which we deep dive into in this presentation.
Google Cloud Dataflow: Two Worlds Become a Much Better One (DataWorks Summit)
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model and SDKs in Java and Python to process data across Google Cloud Platform services like Pub/Sub, BigQuery, and Cloud Storage. The Cloud Dataflow service automatically optimizes and runs data pipelines at scale in a reliable, cost-effective manner without requiring operational management by the user.
Serverless Data Architecture at scale on Google Cloud Platform - Lorenzo Ridi (Codemotion)
This document discusses processing tweets about Black Friday using serverless data architecture on Google Cloud Platform. It describes:
1) Using Google Cloud Pub/Sub to ingest tweets in real-time and guarantee delivery at scale.
2) Running a Python application that filters tweets and publishes them to a Pub/Sub topic using containers and Kubernetes for scalability.
3) Building a Cloud Dataflow pipeline that reads from Pub/Sub, formats tweets, analyzes sentiment with Natural Language API, and writes results to BigQuery for querying and visualization.
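The three steps above can be mocked end-to-end in plain Python as a conceptual sketch — the keyword filter and the toy sentiment scorer are assumptions standing in for the real Pub/Sub filter and the Natural Language API:

```python
tweets = [
    {"text": "Black Friday deals are amazing", "user": "a"},
    {"text": "I love my cat", "user": "b"},
    {"text": "black friday queues are terrible", "user": "c"},
]

def filter_tweets(tweets, keyword="black friday"):
    # Steps 1-2: keep only tweets mentioning the keyword before publishing.
    return [t for t in tweets if keyword in t["text"].lower()]

def toy_sentiment(text):
    # Stand-in for the Natural Language API: +1 / -1 on tiny word lists.
    positive, negative = {"amazing", "love"}, {"terrible"}
    words = set(text.lower().split())
    return (1 if words & positive else 0) - (1 if words & negative else 0)

def to_bigquery_rows(tweets):
    # Step 3: shape each tweet into a flat row for a BigQuery table.
    return [{"user": t["user"], "text": t["text"],
             "sentiment": toy_sentiment(t["text"])}
            for t in tweets]

rows = to_bigquery_rows(filter_tweets(tweets))
print(rows)  # two rows, sentiments 1 and -1
```

In the real pipeline each of these functions would be a `DoFn` in a Dataflow job, with `ReadFromPubSub` upstream and `WriteToBigQuery` downstream.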
OSMC 2018 | Why we recommend PMM to our clients by Matthias Crauwels (NETWAYS)
As service providers, one of our responsibilities is helping clients understand which causes contributed to a production downtime incident, and how to prevent them (as much as possible) from happening again. We do this with Incident Reports, and one common recommendation we make is to have a historical monitoring system in place. All our clients have point-in-time monitoring solutions in place, solutions that can alert them when a system is down or behaving in unacceptable ways. But historical monitoring is still not common, and we believe a lot of companies can benefit from deploying one of these systems. In most cases, we have recommended Percona Monitoring and Management (PMM) as a good, open source solution for this problem. In this session, we will talk about the reasons why we recommend PMM as a way to prevent incidents, and also to investigate their possible causes when one has happened.
The document discusses using interactive event graphs and Spark to scale security investigations. It describes how Graphistry uses event graphs visualized through GPUs to provide scalable views of relationships and patterns across billions of events. An example is given of using this approach for incident response by constructing an event graph to analyze the spread of a botnet outbreak.
OSMC 2018 | Logging is coming to Grafana by David Kaltschmidt (NETWAYS)
Grafana is an OSS dashboarding platform with a focus on visualising time-series data as beautiful graphs. Now we’re adding support to show your logs inside Grafana as well. Adding support for log aggregation makes Grafana an even better tool for incident response: first, the metric graphs help in visually zoning in on the issue. Then you can seamlessly switch over to view and search related log files, allowing you to better understand what your software was doing while the issue was occurring. The main part of this talk shows how to deploy the necessary parts for this integrated experience. In addition I’ll show the latest features of Grafana, both for creating dashboards and maintaining their configuration. The last 10-15 minutes will be reserved for a Q&A.
There is growing interest in running Apache Spark natively on Kubernetes. Ilan Filonenko explains the design idioms, architecture, and internal mechanics of Spark orchestrations over Kubernetes. Since data for Spark analytics is often stored in HDFS, Ilan will also explain how to make Spark on Kubernetes work seamlessly with HDFS by addressing challenges such as data locality and security through the use of Kubernetes constructs such as secrets and RBAC rules.
GCP Gaming 2016 Seoul, Korea: Gaming Analytics (Chris Jang)
The document discusses creating a gaming analytics platform using Google Cloud Platform. It describes collecting diverse data from sources like user acquisition campaigns, app stores, and custom game events. This data can then be analyzed using standard metrics, key game indicators, and custom questions. BigQuery is recommended for batch processing while Dataflow (Apache Beam) enables real-time streaming analytics. Dataflow provides autoscaling, fully managed processing, and allows batch and streaming in one framework. This speeds up development time compared to typical big data architectures.
GCP Big Data Summit LA - 10-20-2015 (Raj Babu)
The Big Data Summit agenda included presentations on Google Cloud Platform (GCP) products and services for big data. Rohit Khare from Google was scheduled to give a presentation on GCP for big data from 2:30-3:30pm, followed by customer story presentations from BlueCava and Pixalate. There would also be a panel discussion and partner presentation, followed by a reception from 5-6pm. Logistics details were provided for parking, badges, facilities, and wireless access.
This document discusses using Kubernetes as a data platform. It describes using use case driven development to build the initial platform, focusing on simple use cases that provide value. It also covers onboarding new data sources, an overview of the data platform architecture including data lakes and batch/online services, deployment approaches both on-premise and cloud native, and addressing challenges like GDPR compliance and autoscaling. Lessons learned include selecting cloud infrastructure based on data locations and using Kubernetes for its support and to avoid maintaining separate clusters.
DevOpsDays Riga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data platform (DevOpsDays Riga)
This document discusses using Kubernetes as a data platform. It describes using use case driven development to build the initial platform, focusing on simple use cases that provide value. The platform is designed to facilitate collaboration and democratize data access. Pipelines are used to process and transform data in the data lake. The platform supports features like continuous deployment, autoscaling, and compliance with GDPR for data retention and deletion. Lessons learned include selecting cloud infrastructure based on data locations and using Kubernetes for its support and to avoid managing separate clusters.
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataStax)
During this session Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE) which is based on Kubernetes and Docker.
About the Speakers
Ben Lackey Partner Architect, DataStax
I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal.
Ravi Madasu
Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on the Google Cloud Platform, providing a developer-friendly deployment experience. He has 15+ years of experience, working in a variety of roles such as software engineer, project manager, and product manager. Ravi received a Master's degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery (Márton Kodok)
Teaser: provide developers a new way of understanding advanced analytics and choosing the right cloud architecture
The new buzzword is #serverless, as there are many great services that help us abstract away the complexity associated with managing servers. In this session we will see how serverless helps with large data analytics backends.
We will see how to architect for the cloud and add to an existing project the components that take us into a #serverless architecture — one that ingests our streaming data and runs advanced analytics on petabytes of data using BigQuery on Google Cloud Platform, all this next to an existing stack, without being forced to reengineer our app.
BigQuery enables super-fast SQL/JavaScript queries against petabytes of data using the processing power of Google’s infrastructure. We will cover its core features, the SQL:2011 standard, working with streaming inserts, User Defined Functions written in JavaScript, referencing external JS libraries, and several use cases for the everyday backend developer: funnel analytics, email heatmaps, custom data processing, building dashboards, extracting data using JS functions, and emitting rows based on business logic.
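A BigQuery JavaScript UDF looks roughly like the sketch below, here just assembled as a standard-SQL query string in Python (the table name and `urlDomain` function are invented for illustration; running it would require a configured `google-cloud-bigquery` client):

```python
# A temporary JavaScript UDF plus the query that uses it.
udf_query = """
CREATE TEMP FUNCTION urlDomain(url STRING)
RETURNS STRING
LANGUAGE js AS '''
  return url.split("/")[2] || "";
''';

SELECT urlDomain(page_url) AS domain, COUNT(*) AS hits
FROM `my_project.analytics.events`  -- hypothetical table
GROUP BY domain
ORDER BY hits DESC;
"""

# With credentials in place this would run as, roughly:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   for row in client.query(udf_query).result():
#       print(row.domain, row.hits)
print(udf_query.strip().splitlines()[0])
```

The `CREATE TEMP FUNCTION ... LANGUAGE js` form scopes the UDF to a single query, which is the usual pattern for the kind of custom per-row processing the abstract mentions.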
This document provides an overview of various Google Cloud Platform services including Compute Engine, Networking, Load Balancing, Cloud Launcher, Cloud Storage, Cloud SQL, Cloud Monitoring, Cloud DNS, and Deployment Manager. It includes descriptions of the basic concepts and functionality for each service. It also outlines several hands-on labs demonstrating how to use specific GCP services like backing up instances to Cloud Storage snapshots, exporting Cloud SQL databases to Cloud Storage, enabling Cloud Logging, and deploying a VM instance using Deployment Manager.
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise (MongoDB)
The document summarizes the evolution of cloud computing and Google Cloud Platform's offerings. It discusses how cloud infrastructure has moved from colocated data centers (1st wave) to virtualized infrastructure (2nd wave) to automated services and scalable data (3rd wave). It then provides an overview of Google Cloud Platform's compute, storage, database, analytics and machine learning services and how they make complex data analysis simpler. The document positions Google Cloud Platform as building on Google's expertise in infrastructure and data to provide customers an advantage.
You may know Google for search, YouTube, Android, Chrome, and Gmail, but that's only as an end-user of OUR apps. Did you know you can also integrate Google technologies into YOUR apps? We have many APIs and open source libraries that help you do that! If you have tried and found it challenging, didn't find enough examples, ran into roadblocks, got confused, or are just curious about what Google APIs can offer, join us to resolve any blockers. Code samples will be in Python and/or Node.js/JavaScript. This session focuses on showing you how to access Google Cloud APIs from one of Google Cloud's compute platforms, whether serverless or otherwise.
Gimel at Dataworks Summit San Jose 2018 (Romit Mehta)
Gimel is PayPal's data platform that provides a unified interface for accessing and analyzing data across different data stores and processing engines. The presentation provides an overview of Gimel, including PayPal's analytics ecosystem, the challenges Gimel addresses around data access and application lifecycle, and a demo of how Gimel simplifies a flights cancelled use case. It also discusses Gimel's open source journey and integration with ecosystems like Spark and Jupyter notebooks.
Gimel Data Platform is an analytics platform developed by PayPal that aims to simplify data access and analysis. The presentation provides an overview of Gimel, including PayPal's analytics ecosystem, the challenges Gimel addresses in data access and application lifecycle management, a demo of a sample flights cancelled use case using Gimel, and PayPal's plans to open source Gimel.
Pivotal Greenplum provides fast, secure cloud deployments of its data warehouse platform with the same experience across AWS, Azure, and GCP. Deployments are optimized for speed through performance tuning of virtual machines, disks, and networks. Key goals include leveraging cloud features like on-demand provisioning, node replacement, disk snapshots, upgrades, and optional installations through a web interface. Deployments are similar across clouds with comparable parameters, tools, and software versions. Security is ensured through vendor-reviewed templates, password encryption, and network isolation.
"Critical Breakthroughs and Technical Challenges in Big Data Driven Innovation" discusses four key breakthroughs in Google Cloud Platform's approach to big data:
1. Batch and streaming data processing can be combined using Cloud Dataflow.
2. Real-time data ingestion at massive scales is enabled through technologies like Cloud Bigtable which can process billions of events per hour.
3. Analytics can be done at the speed of thought through BigQuery which allows complex queries on petabytes of data to return results in seconds.
4. Machine learning is made available to everyone through services that offer pre-trained models via APIs and allow custom modeling using TensorFlow on Google Cloud.
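The "one framework for batch and streaming" idea in point 1 can be illustrated in plain Python: the same transform runs unchanged over a bounded list and a generator standing in for an unbounded stream (a conceptual sketch, not the Dataflow API):

```python
def count_by_key(records):
    """One transform, reused for both batch and streaming input."""
    counts = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
    return counts

# Batch source: a bounded collection (e.g. files in Cloud Storage).
batch = ["login", "click", "login"]

# Streaming source: a generator standing in for a Pub/Sub subscription.
def stream():
    for event in ["click", "click", "buy"]:
        yield event

batch_counts = count_by_key(batch)      # {'login': 2, 'click': 1}
stream_counts = count_by_key(stream())  # {'click': 2, 'buy': 1}
```

In Cloud Dataflow the same pipeline code similarly targets bounded and unbounded `PCollection`s, with windowing handling the fact that a stream never "finishes".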
A 30-45-minute tech talk given at user groups or technical conferences introducing developers to integrating with Google APIs from Python.
ABSTRACT
Want to integrate Google technologies into the web+mobile apps that you build? Google has various open source libraries & developer tools that help you do exactly that. Users who have run into roadblocks like authentication or found our APIs confusing/challenging, are welcome to come and make these non-issues moving forward. Learn how to leverage the power of Google technologies in the next apps you build!!
[Study Guide] Google Professional Cloud Architect (GCP-PCA) Certification (Amaaira Johns)
Start here: https://bit.ly/3bGEd9l - Get complete detail on the GCP-PCA exam guide to crack the Professional Cloud Architect certification. You can collect all information on the GCP-PCA tutorial, practice tests, books, study material, exam questions, and syllabus. Firm up your knowledge of the Professional Cloud Architect role and get ready to crack the GCP-PCA certification. Explore all information on the GCP-PCA exam, including the number of questions, passing percentage, and time allotted to complete the test.
Image archive, analysis & report generation with Google Cloud (Wesley Chun)
Google Cloud provides a diverse array of services to realize the ambition of solving real business problems, like constrained resources. An image archive & analysis plus report generation use-case can be realized with just Google Workspace & GCP APIs. The principle of mixing-and-matching Google technologies is applicable to many other challenges faced by you, your organization, or your customers. These slides are from a half- to 1-hour presentation about this case study.
Session 8 - Creating Data Processing Services | Train the Trainers Program (FIWARE)
This technical session for Local Experts in Data Sharing (LEBDs) will explain how to create data processing services that are key to i4Trust.
This document discusses Google Cloud Platform's Internet of Things (IoT) architecture and services. It describes how IoT data can be captured using protocols and streaming into Google Cloud Pub/Sub. Machine learning algorithms can then detect patterns in real-time streams. Data is also archived in Cloud Storage. Google Cloud Dataflow is highlighted for processing both batch and stream workloads, with features like autoscaling, intuitive programming model, and unified processing of data.
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop (huguk)
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potentially unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, I love amateur radio, Disneyland, photography, running, and Legos."
This document describes a serverless data architecture for processing tweets about Black Friday and performing sentiment analysis using Google Cloud Platform services. It involves collecting tweets from Twitter using Pub/Sub, running containers on Kubernetes, processing the data with Dataflow pipelines that write to BigQuery tables, and using the Natural Language API for sentiment analysis. The full pipeline is demonstrated in a live demo.
GCP Gaming 2016 Seoul, Korea Gaming AnalyticsChris Jang
The document discusses creating a gaming analytics platform using Google Cloud Platform. It describes collecting diverse data from sources like user acquisition campaigns, app stores, and custom game events. This data can then be analyzed using standard metrics, key game indicators, and custom questions. BigQuery is recommended for batch processing while Dataflow (Apache Beam) enables real-time streaming analytics. Dataflow provides autoscaling, fully managed processing, and allows batch and streaming in one framework. This speeds up development time compared to typical big data architectures.
Google cloud big data summit master gcp big data summit la - 10-20-2015Raj Babu
The Big Data Summit agenda included presentations on Google Cloud Platform (GCP) products and services for big data. Rohit Khare from Google was scheduled to give a presentation on GCP for big data from 2:30-3:30pm, followed by customer story presentations from BlueCava and Pixalate. There would also be a panel discussion and partner presentation, followed by a reception from 5-6pm. Logistics details were provided for parking, badges, facilities, and wireless access.
This document discusses using Kubernetes as a data platform. It describes using use case driven development to build the initial platform, focusing on simple use cases that provide value. It also covers onboarding new data sources, an overview of the data platform architecture including data lakes and batch/online services, deployment approaches both on-premise and cloud native, and addressing challenges like GDPR compliance and autoscaling. Lessons learned include selecting cloud infrastructure based on data locations and using Kubernetes for its support and to avoid maintaining separate clusters.
DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data plat...DevOpsDays Riga
This document discusses using Kubernetes as a data platform. It describes using use case driven development to build the initial platform, focusing on simple use cases that provide value. The platform is designed to facilitate collaboration and democratize data access. Pipelines are used to process and transform data in the data lake. The platform supports features like continuous deployment, autoscaling, and compliance with GDPR for data retention and deletion. Lessons learned include selecting cloud infrastructure based on data locations and using Kubernetes for its support and to avoid managing separate clusters.
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...DataStax
During this session Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE) which is based on Kubernetes and Docker.
About the Speakers
Ben Lackey Partner Architect, DataStax
I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal.
Ravi Madasu
Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on the Google Cloud Platform providing a developer friendly deployment experience. He has 15+ years of experience, working in variety of roles such as software engineer, project manager and product manager. Ravi received a Masters degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryMárton Kodok
Teaser: provide developers a new way of understanding advanced analytics and choosing the right cloud architecture
The new buzzword is #serverless, as there are many great services that helps us abstract away the complexity associated with managing servers. In this session we will see how serverless helps on large data analytics backends.
We will see how to architect for Cloud and implement into an existing project components that will take us into the #serverless architecture that will ingest our streaming data, run advanced analytics on petabytes of data using BigQuery on Google Cloud Platform - all this next to an existing stack, without being forced to reengineer our app.
BigQuery enables super-fast, SQL/Javascript queries against petabytes of data using the processing power of Google’s infrastructure. We will cover its core features, SQL 2011 standard, working with streaming inserts, User Defined Functions written in Javascript, reference external JS libraries, and several use cases for everyday backend developer: funnel analytics, email heatmap, custom data processing, building dashboards, extracting data using JS functions, emitting rows based on business logic.
This document provides an overview of various Google Cloud Platform services including Compute Engine, Networking, Load Balancing, Cloud Launcher, Cloud Storage, Cloud SQL, Cloud Monitoring, Cloud DNS, and Deployment Manager. It includes descriptions of the basic concepts and functionality for each service. It also outlines several hands-on labs demonstrating how to use specific GCP services like backing up instances to Cloud Storage snapshots, exporting Cloud SQL databases to Cloud Storage, enabling Cloud Logging, and deploying a VM instance using Deployment Manager.
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB
The document summarizes the evolution of cloud computing and Google Cloud Platform's offerings. It discusses how cloud infrastructure has moved from colocated data centers (1st wave) to virtualized infrastructure (2nd wave) to automated services and scalable data (3rd wave). It then provides an overview of Google Cloud Platform's compute, storage, database, analytics and machine learning services and how they make complex data analysis simpler. The document positions Google Cloud Platform as building on Google's expertise in infrastructure and data to provide customers an advantage.
You may know Google for search, YouTube, Android, Chrome, and Gmail, but that's only as an end-user of OUR apps. Did you know you can also integrate Google technologies into YOUR apps? We have many APIs and open source libraries that help you do that! If you have tried and found it challenging, didn't find not enough examples, run into roadblocks, got confused, or just curious about what Google APIs can offer, join us to resolve any blockers. Code samples will be in Python and/or Node.js/JavaScript. This session focuses on showing you how to access Google Cloud APIs from one of Google Cloud's compute platforms, whether serverless or otherwise.
Gimel at Dataworks Summit San Jose 2018Romit Mehta
Gimel is PayPal's data platform that provides a unified interface for accessing and analyzing data across different data stores and processing engines. The presentation provides an overview of Gimel, including PayPal's analytics ecosystem, the challenges Gimel addresses around data access and application lifecycle, and a demo of how Gimel simplifies a flights cancelled use case. It also discusses Gimel's open source journey and integration with ecosystems like Spark and Jupyter notebooks.
Gimel Data Platform is an analytics platform developed by PayPal that aims to simplify data access and analysis. The presentation provides an overview of Gimel, including PayPal's analytics ecosystem, the challenges Gimel addresses in data access and application lifecycle management, a demo of a sample flights cancelled use case using Gimel, and PayPal's plans to open source Gimel.
Pivotal Greenplum provides fast, secure cloud deployments of its data warehouse platform with the same experience across AWS, Azure, and GCP. Deployments are optimized for speed through performance tuning of virtual machines, disks, and networks. Key goals include leveraging cloud features like on-demand provisioning, node replacement, disk snapshots, upgrades, and optional installations through a web interface. Deployments are similar across clouds with comparable parameters, tools, and software versions. Security is ensured through vendor-reviewed templates, password encryption, and network isolation.
"Critical Breakthroughs and Technical Challenges in Big Data Driven Innovation" discusses four key breakthroughs in Google Cloud Platform's approach to big data:
1. Batch and streaming data processing can be combined using Cloud Dataflow.
2. Real-time data ingestion at massive scale is enabled through technologies like Cloud Bigtable, which can process billions of events per hour.
3. Analytics can be done at the speed of thought through BigQuery, which allows complex queries on petabytes of data to return results in seconds.
4. Machine learning is made available to everyone through services that offer pre-trained models via APIs and allow custom modeling using TensorFlow on Google Cloud.
A 30-45 minute tech talk given at user groups and technical conferences, introducing developers to integrating with Google APIs from Python.
ABSTRACT
Want to integrate Google technologies into the web and mobile apps that you build? Google has various open source libraries and developer tools that help you do exactly that. Users who have run into roadblocks like authentication, or found our APIs confusing or challenging, are welcome to come and turn these into non-issues moving forward. Learn how to leverage the power of Google technologies in the next apps you build!
[Study Guide] Google Professional Cloud Architect (GCP-PCA) Certification (Amaaira Johns)
Start here: https://bit.ly/3bGEd9l. Get complete details on the GCP-PCA exam guide to crack the Professional Cloud Architect certification. You can collect all information on the GCP-PCA tutorial, practice tests, books, study material, exam questions, and syllabus. Firm up your knowledge of Professional Cloud Architect and get ready to crack the GCP-PCA certification. Explore all information on the GCP-PCA exam, including the number of questions, passing percentage, and time allowed to complete the test.
Image archive, analysis & report generation with Google Cloud (wesley chun)
Google Cloud provides a diverse array of services for solving real business problems, such as working within constrained resources. An image archive and analysis plus report generation use case can be realized with just Google Workspace and GCP APIs. The principle of mixing and matching Google technologies is applicable to many other challenges faced by you, your organization, or your customers. These slides are from a half- to one-hour presentation about this case study.
Session 8 - Creating Data Processing Services | Train the Trainers Program (FIWARE)
This technical session for Local Experts in Data Sharing (LEBDs) explains how to create data processing services that are key to i4Trust.
This document discusses Google Cloud Platform's Internet of Things (IoT) architecture and services. It describes how IoT data can be captured using standard protocols and streamed into Google Cloud Pub/Sub. Machine learning algorithms can then detect patterns in real-time streams. Data is also archived in Cloud Storage. Google Cloud Dataflow is highlighted for processing both batch and stream workloads, with features like autoscaling, an intuitive programming model, and unified processing of data.
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop (huguk)
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potentially unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, I love data, amateur radio, Disneyland, photography, running, and Legos."
This document describes a serverless data architecture for processing tweets about Black Friday and performing sentiment analysis using Google Cloud Platform services. It involves collecting tweets from Twitter using Pub/Sub, running containers on Kubernetes, processing the data with Dataflow pipelines that write to BigQuery tables, and using the Natural Language API for sentiment analysis. The full pipeline is demonstrated in a live demo.
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with "Financial Odyssey," our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
5. Confidential & Proprietary | Google Cloud Platform

Diverse Data Sources
● Data from user acquisition campaigns
● Data from Google Play and the App Store
● Turnkey gaming metrics (e.g. player churn and spend predictions from Play Games Services)
7. Diverse Data Sources
● Custom game events
● Custom logs
● Custom player telemetry specific to your games
8. Continuum of Gaming Analytics

Turnkey (standard metrics):
● DAU, MAU, ARPPU
● Player Progression
● Feature Engagement
● Spend
● Retention / Churn
● Daily revenue targets
● Fraud and cheating

Custom (key indicators specific to your game):
● Activity in communities, joining guilds, number of friends in-game
● Reached a meaningful milestone or achievement
● Time to first meaningful transaction
● Player response to specific A/B tests
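Turnkey metrics like DAU and day-1 retention reduce to simple aggregations over login events. A minimal sketch in plain Python, assuming an illustrative `(player_id, login_date)` event shape (not from the deck; in production this would be a query over exported event tables):

```python
from datetime import date

# Illustrative login events: (player_id, login_date)
events = [
    ("p1", date(2016, 3, 1)), ("p2", date(2016, 3, 1)),
    ("p1", date(2016, 3, 2)), ("p3", date(2016, 3, 2)),
]

def dau(events, day):
    """Daily active users: distinct players seen on `day`."""
    return len({pid for pid, d in events if d == day})

def d1_retention(events, day):
    """Share of players active on `day` who return the next day."""
    cohort = {pid for pid, d in events if d == day}
    next_day = {pid for pid, d in events if (d - day).days == 1}
    return len(cohort & next_day) / len(cohort) if cohort else 0.0
```

The same shape extends to MAU (distinct players in a 30-day range) and D7/D30 retention by changing the day offset.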
9. Ask Custom Questions
● How many players made it to stage 12?
● What path did they take through the stage?
● Health and other key stats at this point in time?
● Of the players who took the same route where a certain condition was true, how many made an in-app purchase?
● What are the characteristics of the player segment who didn't make the purchase vs. those who did?
● Why was this custom event so successful in driving in-app purchases compared to others?
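Questions like "of the players who reached stage 12, how many made an in-app purchase?" are funnel queries over custom events. A hedged sketch in plain Python over an in-memory event log; the event shape and field names are assumptions for illustration, and at scale this would run as a BigQuery query instead:

```python
# Illustrative custom events: (player_id, event_type, payload)
events = [
    ("p1", "stage_complete", {"stage": 12}),
    ("p2", "stage_complete", {"stage": 11}),
    ("p3", "stage_complete", {"stage": 12}),
    ("p1", "iap_purchase", {"sku": "gems_100"}),
]

def players_with(events, event_type, pred=lambda payload: True):
    """Distinct players who emitted `event_type` matching `pred`."""
    return {pid for pid, et, payload in events
            if et == event_type and pred(payload)}

reached_12 = players_with(events, "stage_complete", lambda p: p["stage"] >= 12)
purchased = players_with(events, "iap_purchase")

converted = reached_12 & purchased   # reached stage 12 AND purchased
holdouts = reached_12 - purchased    # the segment to compare against
```

Comparing attributes of `converted` against `holdouts` is the starting point for the segment-characteristics question above.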
10. 3 Things to Remember
1. Speed up Batch Processing
2. Speed up from Batch to Real-Time
3. Speed up Development Time
16. Some of DeNA's Hadoop+Hive woes:
● Many bottlenecks and failure points
● 3-hour data ingestion lag
● Too many analysts at peak time
● Slow queries
● ...
46. Leaderboard Example

Reads game data published in near real-time and uses that data to perform two separate processing tasks:
● Calculates the total score for every unique user and publishes speculative results for every ten minutes of processing time.
● Calculates the team scores for each hour that the pipeline runs, using fixed-time windowing.
● In addition, the team score calculation uses Dataflow's trigger mechanisms to provide speculative results for each hour (updated every five minutes until the hour is up), and to capture any late data and add it to the specific hour-long window to which it belongs.
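The hourly team-score logic above (fixed windows, with late data routed back into the window it belongs to) can be sketched without the Dataflow SDK. This is only the windowing arithmetic, with an assumed `(team, score, event_time)` event shape; triggers and speculative firings are left out:

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # fixed one-hour windows

def window_start(event_time):
    """Assign an event timestamp (epoch seconds) to its hour-long window."""
    return event_time - (event_time % WINDOW_SECONDS)

def team_scores(events):
    """Sum scores per (team, window). A late event lands in its own
    event-time window, not whichever window is open when it arrives."""
    totals = defaultdict(int)
    for team, score, event_time in events:
        totals[(team, window_start(event_time))] += score
    return dict(totals)

events = [
    ("red", 5, 3600), ("blue", 3, 3700),
    ("red", 2, 7300),   # falls in the next hour's window
    ("red", 4, 3900),   # "late" arrival, still belongs to hour 3600-7199
]
```

In the real pipeline, Dataflow's fixed windows and triggers do this assignment automatically over the unbounded Pub/Sub stream.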
48. Sample Code on GitHub

Cloud Dataflow and Spark examples: http://goo.gl/vz1Cj5
● UserScore: Basic Score Processing in Batch
● HourlyTeamScore: Advanced Processing in Batch with Windowing
● LeaderBoard: Streaming Processing with Real-Time Game Data
● GameStats: Abuse Detection and Usage Analysis
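The core step of the UserScore example (sum events per user in batch) is simple enough to sketch in plain Python. The `user,team,score,timestamp` line format is an assumption here, loosely modeled on the samples' CSV-style records:

```python
def parse_event(line):
    """Parse a 'user,team,score,timestamp' record into (user, score).
    Returns None for malformed lines instead of failing the batch."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    try:
        return parts[0], int(parts[2])
    except ValueError:
        return None

def user_scores(lines):
    """Total score per user, skipping unparseable records."""
    totals = {}
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        user, score = parsed
        totals[user] = totals.get(user, 0) + score
    return totals
```

In the published samples the same parse-then-sum-per-key shape is expressed as pipeline transforms so it can run in parallel over large inputs.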
49. US Mobile Game Company goes Real-time Streaming

Streaming pipeline:
1. Real-time events from iOS clients
2. Cloud Pub/Sub: asynchronous messaging
3. Cloud Dataflow: parallel data processing
4. BigQuery: analytics engine
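The first hop of that pipeline, a client publishing real-time events, boils down to serializing an event and handing it to Pub/Sub. A sketch of just the serialization step using only the standard library; the field names are illustrative, and the actual publish call would use the Cloud Pub/Sub client library:

```python
import json
import time

def make_event_message(player_id, event_type, attributes=None):
    """Build the JSON payload a game client might publish to a
    Pub/Sub topic; downstream Dataflow parses the same shape."""
    return json.dumps({
        "player_id": player_id,
        "event_type": event_type,
        "event_time": int(time.time()),
        "attributes": attributes or {},
    }, sort_keys=True)

msg = make_event_message("p1", "level_up", {"level": 3})
decoded = json.loads(msg)
```

Keeping the event-time inside the payload is what lets the streaming pipeline window by when events happened rather than when they arrived.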
51. Building what's next

Time to Understanding
Typical big data processing involves:
● Programming
● Resource provisioning
● Performance tuning
● Monitoring
● Reliability
● Deployment & configuration
● Handling growing scale
● Utilization improvements
52. Time to Understanding

Big Data with Google: focus on insight, not infrastructure.
● Programming
54. TabTale on Google Cloud Platform
● Speed: provisions new services in seconds instead of days
● 10B logs: Google App Engine syncs with BigQuery to automatically store tens of billions of application logs, so TabTale can analyze issues on a moment's notice
● TBs of info: runs queries on terabytes of information in a few seconds
● 10x faster: can now deliver new backend features 10 times faster without dealing with infrastructure maintenance

"Our ability to provision new services in seconds saves us a lot of time, since it used to take days. The gaming industry is characterized by short-term projects, so it's important for us to have a backend that is flexible and works fast."
59. Machine Learning
● TensorFlow: open source manifestation of Google's ML capability
● Vision API: label/entity detection, facial detection, OCR, logo detection, Safe Search
● Cloud Dataproc: managed Hadoop, Hive, and Spark; about 90 seconds to start a cluster
60. Like you, Google is committed to gaming. Use Google's latest technologies to build, distribute, and monetize your games.