
Bridge to Cloud: Using Apache Kafka to Migrate to GCP




Most companies start their cloud journey with a new use case or a new application. Sometimes these applications can run independently in the cloud, but often they need data from the on-premises data center. Existing applications will migrate slowly, and they need a strategy and technology that can support a multi-year migration.

In this session, we will share how companies around the world are using Confluent Cloud, a fully managed Apache Kafka® service, to migrate to Google Cloud Platform. By implementing a central-pipeline architecture using Apache Kafka to sync on-prem and cloud deployments, companies can accelerate migration times and reduce costs.

Register now to learn:
- How to take the first step in migrating to GCP
- How to reliably sync your on-premises applications using a persistent bridge to cloud
- How Confluent Cloud can make this daunting task simple, reliable, and performant
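
As a rough illustration of what a "persistent bridge" can look like in practice, the sketch below builds a Kafka Connect connector definition for Confluent Replicator, which the slides name as the bridge. The property names are the ones I recall from the Replicator documentation and should be checked against your Replicator version. The connector name, host names, topic names, and task count are hypothetical, and the security settings that a Confluent Cloud destination would need are deliberately left out.

```python
import json


def replicator_config(name, src_bootstrap, dest_bootstrap, topics, tasks_max=4):
    """Build a Kafka Connect connector definition for Confluent Replicator.

    All argument values are supplied by the caller; nothing here contacts a cluster.
    """
    # Replicator copies raw bytes, so keys and values pass through unchanged.
    byte_converter = "io.confluent.connect.replicator.util.ByteArrayConverter"
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
            "src.kafka.bootstrap.servers": src_bootstrap,    # on-premises cluster
            "dest.kafka.bootstrap.servers": dest_bootstrap,  # cluster running on GCP
            "topic.whitelist": ",".join(topics),             # topics to replicate
            "key.converter": byte_converter,
            "value.converter": byte_converter,
            "tasks.max": str(tasks_max),
        },
    }


# Hypothetical names and addresses, for illustration only.
bridge = replicator_config(
    name="onprem-to-gcp-bridge",
    src_bootstrap="kafka.onprem.example.internal:9092",
    dest_bootstrap="kafka.gcp.example.internal:9092",
    topics=["orders", "inventory"],
)
print(json.dumps(bridge, indent=2))
```

A definition like this would normally be submitted to a Kafka Connect worker through its REST API (a POST to `/connectors`); that step is omitted here so the example runs without a cluster.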

Published in: Technology

Bridge to Cloud: Using Apache Kafka to Migrate to GCP

  1. Building the Bridge to Cloud: Using Apache Kafka to Migrate to GCP
  2. Speakers: Priya Shivakumar, Director of Product, Confluent; Ryan Lippert, Product Marketing, Google Cloud
  3. Agenda
  4. [Diagram labels: App, Service, Developer APIs, Real-time Event Streaming Platform, Event Streaming]
  5. The Great “Cloud Shift”
  6. Cloud Migration: A one-time thing?
  7. In reality, we keep running
  8. We don’t want to just move. We want to build for the cloud.
  9. [Diagram labels: Cloud Bigtable, Cloud SQL, BigQuery, Cloud Storage]
  10. [Diagram labels: Cloud Bigtable, Cloud Storage]
  11. A New Paradigm: In Our Customer’s Words. “Event Streaming has gotten our organization to think differently about how we deliver solutions. It is now a foundational part of our technology strategy.” -- Chris Roberts, VP of Enterprise Architecture, Alight
  12. Adoption of Event Streaming: 60% of Fortune 100 companies use Apache Kafka
  13. [CONFIDENTIAL] Apache Kafka, the de facto OSS standard for event streaming. Real-time | Scalable | Persistent | Reliable | 2 trillion messages | 500 billion events
  14. What is our bridge? Confluent Replicator
  15. Replicator | Reliable, Scalable, Simple. Feature list (Replicator vs. MirrorMaker):
      Reliable
      - Auto creation of topics: Replicator ✔, MirrorMaker partial
      - New partition addition; configuration replication: Replicator ✔, MirrorMaker X
      - Single message transformations: Replicator ✔, MirrorMaker X
      - Active-active replication: Replicator ✔, MirrorMaker X
      Scalable
      - Aggregate cluster (single management point for multiple clusters): Replicator ✔, MirrorMaker X
      - Auto scale (scale replication processes as Kafka traffic increases with a single configuration): Replicator ✔, MirrorMaker X
      Simple
      - Control Center integration (manage and monitor replication via the Control Center UI): Replicator ✔, MirrorMaker X
      Disaster Recovery support
      - Active-active replication (redirect events to avoid infinite replication loops in active-active configurations): Replicator ✔, MirrorMaker X
  16. Disaster Recovery and Bridge to Cloud: Confluent Replicator
  17. Establish Your Foundation, step 1: Deploy Confluent on-premises and on GCP. [Diagram labels: Confluent Replicator, Cloud Bigtable, Cloud Storage]
  18. Establish Your Foundation, step 2: Create your pipeline and replicate your topics to your GCP cluster. [Diagram labels: Confluent Replicator, Cloud Bigtable, Cloud Storage]
  19. Let the Traffic Flow, step 3: Migrate app by app, database by database. [Diagram labels: Cloud Bigtable, Cloud Storage]
  20. [Diagram labels: Cloud Bigtable, Cloud SQL, BigQuery, Cloud Storage]
  21. [Diagram labels: Cloud Bigtable, Cloud SQL, BigQuery, Cloud Storage]
  22. Confluent Cloud manages Kafka for you: Mission-Critical Reliability | Complete Streaming Service | Freedom of Choice
  23. Confluent Cloud: Battle tested for massive scale, mission-critical pipelines
  24. Confluent Cloud: Kafka re-engineered for cloud. [Diagram labels: Database changes, Log events, IoT events, Web events, Transformations, Custom apps, Analytics, Monitoring, Hadoop, Database, Data warehouse, CRM, Data Integration, Real-Time Apps]
  25. Confluent Cloud: Industry’s only hybrid Kafka service. Private Cloud | Hybrid Cloud | Public Cloud
  26. Confluent | Singular Kafka focus and innovation. Confluent Vision for Kafka: automated disaster recovery; global applications with geo-awareness; efficient and infinite data with tiered storage; unlimited horizontal scalability for single clusters; faster elastic scaling for brokers and partitions; easy Kubernetes-based orchestration and management with Confluent Operator; faster elastic scaling when adding brokers and partitions
  27. Confluent Cloud + GCP Ecosystem: Cloud Dataproc, Cloud Dataflow, BigQuery, Cloud Storage, Cloud Bigtable, Cloud Machine Learning Engine
  28. Key Considerations for Cloud Analytics Platforms
  29. Big data is in our DNA: 8 products with > 1 billion users
  30. Our approach to data analytics: Focus on analytics, not infrastructure | Develop comprehensive solutions | End-to-end ML lifecycle | Innovation and proven results
  31. Serverless data analytics: From infrastructure to platform for insights. The traditional data analytics platform: performance tuning, monitoring, reliability, deployment & configuration, utilization improvements, resource provisioning, handling growing scale, analysis and insights. The serverless data analytics model: analysis and insights.
  32. Complete foundation for data lifecycle: Data ingestion at any scale | Reliable streaming data pipeline | Advanced analytics | Data warehousing and data lake. [Diagram labels: Google Sheets, Apache Beam, Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, BigQuery, Cloud Storage, Cloud AI, Google Data Studio, Data Transfer Service, TensorFlow, Cloud Composer, Cloud IoT Core, Cloud Dataprep]
  33. Serverless analytics for complete ML lifecycle. ML activity: Ingest, Explore, Prepare, Preprocess, Train, Hypertune, Test, Predict (Online), Predict (Batch). GCP services: Apache Kafka (Confluent), Transfer Service, GCS, Pub/Sub, BigQuery, Dataprep, Dataflow, Dataproc, BigQuery, Dataprep, Dataflow, Dataproc, BigQuery, Data, Machine Learning Engine, Apps
  34. Fifteen years of tackling big data problems. [Timeline, 2002 to 2018, of Google papers, open source, and Google Cloud products. Labels: BigQuery, Pub/Sub, Dataflow, Bigtable, ML, Spanner, GFS, MapReduce, FlumeJava, Bigtable, Dremel, Spanner, MillWheel, TensorFlow, Dataflow, Composer]
  35. Fifteen years of tackling big data problems. [Timeline, 2002 to 2018, of Google papers, open source, and Google Cloud products. Labels: BigQuery, Pub/Sub, Dataflow, Bigtable, ML, Spanner, GFS, MapReduce, FlumeJava, Bigtable, Dremel, Spanner, MillWheel, TensorFlow, Dataflow, Composer]
  36. Modernize your data warehouse foundation: get all your business data in one place for faster and comprehensive analysis. Analyze streaming data in real time: gain real-time business insights and make your business more responsive. Process big data with Hadoop/Spark: simplify complex tasks with pre-learned machine learning engines.
  37. BigQuery: modernize your data warehouse. Get all your business data in one place for faster and comprehensive analysis.
  38. Data warehousing for AI-driven business. 90’s, Data warehouses: From 1st-gen EDWs, increased data collection and analysis have helped build more data-driven businesses. 00’s, BI foundations: Data warehousing formed the foundation of reporting and business intelligence. Now, Cloud data warehousing: BigQuery represents a fundamentally different approach to cloud data warehousing. Next, AI foundations: We’re working to make BigQuery the foundation for organizations that will leverage machine intelligence in their businesses.
  39. What is BigQuery? Google Cloud Platform’s enterprise data warehouse for analytics. Convenience of standard SQL | Fully managed and serverless | Encrypted, durable and highly available | Petabyte-scale storage and queries | Real-time analytics on streaming data
  40. BigQuery: architecture. Serverless. Decoupled storage and compute for maximum flexibility. [Diagram labels: SQL:2011 compliant; Petabit network; Highly available cluster compute (Dremel); Streaming ingest; Free bulk loading; Replicated, distributed storage (99.9999999999% durability); REST API; Client libraries in 7 languages; Web UI, CLI; Distributed memory shuffle tier]
  41. Introducing BigQuery ML. BigQuery ML empowers data analysts and data scientists: execute ML initiatives without moving data from BigQuery; iterate on models in SQL in BigQuery to increase development speed; automate model selection and hypertuning.
  42. Unlock big data for all users with BigQuery & Sheets. “For analysts spread across the globe, this is a blessing. They can now collaborate easily with a streamlined flow for sharing their insights.” -- Nikhil Mishra @ Yahoo
  43. Modern data warehouse on Google Cloud Platform. [Diagram labels: Real-time events; Confluent managed Apache Kafka; Streaming pipeline; Batch pipeline; Batch load; Cloud Storage (raw log storage); Cloud Dataflow (parallel data processing); Cloud BigQuery (analytics engine); Cloud Dataprep (visual data preparation); Google Data Studio (visual analytics & dashboarding); Google Sheets; Partner BI Tools; Co-workers]
  44. You can use BigQuery to build a modern marketing data warehouse. [Diagram labels: Firebase Export; GA360 Export; Google BQ-Data Transfer Service; Partners; Salesforce, Marketo, Facebook, Twitter, CRM data, etc.; Extract, Transform, Load; Visualization; ML; BigQuery; Dataprep; Dataflow; Data Studio; DataLab]
  45. Analyze streaming data in real time. Gain real-time business insights and make your business more responsive.
  46. Real time is real value. E-Commerce: clickstream analysis and dynamic user segmentation. Retail: process point-of-sale transactions for real-time inventory positions. Mobile gaming: find the best Poké Ball collectors. Manufacturing: IoT data analysis for improving operational efficiency.
  47. Stream data analytics on Google Cloud Platform. Ingest (ingest and distribute data reliably): Cloud Pub/Sub, Confluent Cloud (managed Apache Kafka). Transform (fast, correct computations quickly and simply): Cloud Dataflow. Analyze (machine learning & data warehouse): BigQuery, Cloud Machine Learning, Cloud Natural Language API, Cloud Translation API, Cloud Vision API.
  48. Cloud Dataflow: The fully managed data processing service that simplifies development and management of stream and batch pipelines. Accelerate development for streaming & batch: fast, simplified data pipeline development via expressive Java and Python APIs in the Apache Beam SDK. Simplified management and operations: remove operational overhead by letting Cloud Dataflow auto-manage performance, scaling, availability, security and compliance. Build on a foundation for machine learning: add TensorFlow-based Cloud Machine Learning models and APIs to your data processing pipelines for real-time predictions.
  49. With Google Cloud Platform you get unified, open and fully-managed architecture for stream analytics you can use. [Diagram labels: Endpoint clients; User & device data; Web, IoT, Mobile; Ingest, Transform, Analyze (data warehouse); PubSub, Apache Kafka, Apache Beam, Dataflow, Apache Spark, BigQuery, ML, BigTable; Data consumers; Data Studio, 3rd-party BI Tools]
  50. Faster and easier Spark & Hadoop jobs with Cloud Dataproc
  51. Cloud Dataproc: It is the simpler, more cost-efficient way to make your Apache Spark & Hadoop deployments a success. It’s flexible: create and resize managed Hadoop and Spark clusters in less than 90 seconds. It’s easy: lift and shift existing projects or ETL pipelines, no redevelopment necessary. It’s cost effective: easily process large datasets at low cost, pay only for the resources you use (by the minute). It’s open: leverage tools, libraries, and documentation from the Spark and Hadoop ecosystem.
  52. From DIY to fully managed. Columns: On premises, On Compute Engine, Cloud Dataproc. Each column lists the same tasks: Custom code, Monitoring/Health, Dev integration, Scaling, Job submission, GCP connectivity, Deployment, Creation. Legend: Self-managed, Google managed.
  53. Data lake for analytics. Data storage: store massive volume of structured & unstructured data (such as videos, images, text files, etc.) economically. Ad-hoc analysis: perform ad-hoc analysis on the unstructured data. Data processing: process unstructured data and load structured data into the data warehouse for reporting and analysis. Advanced analytics: create machine-learning models based on unstructured data and predict outcomes (image recognition & classification, voice translation, handwriting recognition, video stream analysis, etc.).
  54. Artificial intelligence and machine learning
  55. [Confidential & Proprietary] Google is an AI company. Used across products. [Chart label: Google 3 directories containing Brain Model]
  56. Make AI easy, fast and useful for enterprises and developers
  57. Why partner with Google on AI? Four points: Scale, Speed, Quality, Customization. Best performance for AI workloads with customized hardware and Cloud TPUs. Instant access to thousands of machines with Google Cloud. Pre-trained AI building blocks solve business needs, with the highest quality. Cloud AutoML and ML Engine to customize models, and advanced solutions lab for deeper needs.
  58. Comprehensive set of AI Building Blocks (three items are marked “New”). Conversation: Cloud Speech-to-Text, Dialogflow Enterprise Edition, Cloud Text-to-Speech. Sight: Cloud Vision, Cloud Video Intelligence, AutoML Vision. Language: Cloud Translation, Cloud Natural Language, AutoML Translation, AutoML Natural Language.
  59. Cloud ML Engine: For large scale deep learning. Simple API (train, batch predict, online predict, manage model) | Managed TensorFlow: regression, trees, SVMs, NN, RNN, CNN, etc. | Accelerators everywhere (CPU, GPU, TPU) at scale | Jupyter notebooks for data exploration and visualization
  60. Accelerate ML workloads and speed up time to market with Cloud TPU: Fully managed in the cloud | Deeply integrates with TensorFlow | Created by Google to train and execute deep neural networks
  61. Confluent | Complete portfolio of products and services built around Kafka: Kafka Training | Confluent Platform | Professional Services | Fully Managed Kafka
  62. Q&A
  63. Next Steps
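
Slide 15 mentions avoiding infinite replication loops in active-active configurations. The sketch below is a conceptual simulation of one way that can work: each record carries a provenance header listing the clusters it has already visited, and a replicator skips any record that has already been through its destination. The `Record`, `Cluster`, and `OneWayReplicator` classes and the cluster IDs are hypothetical names invented for this example. This is not Replicator's actual implementation, only an illustration of the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    # Hypothetical provenance header: IDs of the clusters this record has visited.
    provenance: tuple = ()


@dataclass
class Cluster:
    cluster_id: str
    log: list = field(default_factory=list)

    def produce(self, key, value):
        # A locally produced record starts with only its own cluster in the header.
        self.log.append(Record(key, value, (self.cluster_id,)))


class OneWayReplicator:
    """Copies records from src to dest, remembering how far it has read."""

    def __init__(self, src, dest):
        self.src = src
        self.dest = dest
        self.offset = 0  # index of the next record to read from src.log

    def run_once(self):
        copied = 0
        while self.offset < len(self.src.log):
            record = self.src.log[self.offset]
            self.offset += 1
            # Skip records that already passed through the destination;
            # copying them again is what would create an infinite loop.
            if self.dest.cluster_id in record.provenance:
                continue
            self.dest.log.append(
                Record(record.key, record.value,
                       record.provenance + (self.dest.cluster_id,))
            )
            copied += 1
        return copied


onprem, gcp = Cluster("onprem"), Cluster("gcp")
onprem.produce("order-1", "created")
gcp.produce("order-2", "created")

forward = OneWayReplicator(onprem, gcp)   # on-premises to GCP
backward = OneWayReplicator(gcp, onprem)  # GCP to on-premises

# Run both directions for a few rounds and count records copied per round.
copied_per_round = [forward.run_once() + backward.run_once() for _ in range(3)]
print(copied_per_round)
print([r.key for r in onprem.log], [r.key for r in gcp.log])
```

Both clusters end up holding both records, and after the first round nothing more is copied, because every record's header already names the cluster it would be sent to.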