SlideShare a Scribd company logo
1 of 26
Download to read offline
How a distributed graph
analytics platform uses
Apache Kafka for data
ingestion in real time
Rayees Pasha & Duc Le
Kafka Summit US - Sep 2021
1
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Agenda
● Overview of Graph analytics and TigerGraph
● Overview of Data ingestion into TigerGraph
● Use of Kafka Connect Framework and Benefits
● TigerGraph Data Ingestion Deep dive
● Demo - Data Ingestion using Kafka on TG Cloud
2
© 2020. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Rayees Pasha
Product Lead,
TigerGraph
● Responsible for
TigerGraph Database
Engine, Language and
Platform areas of the
product.
● Prior Lead PM and ENG
positions at Workday,
Hitachi and HP
● Expertise in Database
Management and Big
Data Technologies
Session Presenters
3
Duc Le
Engineering Manager,
TigerGraph
● Lead Developer for
TigerGraph Cloud
● Master in Management
Information Systems from
Carnegie Mellon University
● Areas of specialty:
Full-stack Development,
Cloud, Containers and
Connectors
Overview of Graph
Analytics and TigerGraph
4
© 2020. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Why Graph; Why Now?
Businesses want to ask business logic
questions of their data
Blending data from multiple sources,
multiple business units, and
increasingly external data
Larger and more varied datasets mean
more variables to analyze and
connections to explore and test
Importance of Graph in Today’s World
5
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM |
6
6
Who is TigerGraph?
We provide advanced analytics and machine learning on connected data
○ The only scalable graph database for the enterprise: 40-300x faster than
competition
○ Foundational for AI and ML solutions
○ Designed for efficient concurrent OLTP and OLAP workloads
○ SQL-like query language (GSQL) accelerates time to solution
○ Available on-premise & on: Google GCP, Microsoft Azure,
Our customers include:
○ The largest companies in financial services, healthcare, telecom, media, utilities
and innovative startups in cybersecurity, ecommerce and retail
Founded in 2012, HQ in Redwood City, California
Corporate Overview Video
© 2020. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Advanced Analytics and Machine Learning on Connected Data
Advanced
Analytics
LEARN FROM CONNECTED DATA
AI-based Customer 360 for entity resolution,
recommendation engine, fraud detection
In-Database
Machine Learning
Distributed
Graph DB
Friction-free scale up from GB to TB to
Petabyte with lowest cost of ownership
.
CONNECT ALL DATASETS
AND PIPELINES
Customer 360 connecting 200+
datasets and pipelines
Item 360 for eCommerce across 100+
datasets
Fortune 50 Retailer
7 out of top 10 global banks
Real-time fraud detection and credit risk
assessment
10-100X faster than current solutions
ANALYZE CONNECTED DATA
Automotive Manufacturer
Supply chain planning accelerated
from 3 weeks to 45 minutes
Leading Healthcare Provider
7
Leading FinTech Company
Overview of Data Ingestion
into TigerGraph
8
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
TigerGraph Architecture
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Modes of Data Ingestion supported
Bulk Data
• Bulk data loads
using native File
loader
File Loader
Low-latency
● JDBC Type 4 driver for
Java, Python
● Spark can be used for
parallel loads
Real-time
● Streaming Data
Applications
● High-frequency Data
Apps
Bulk Data
Bulk data loads
using
•Native File loader,
•Kafka loader
Low-latency
● JDBC Type 4
driver for Java,
Python
● Spark can be
used for parallel
loads
Real-time
● Streaming Data
Apps
● High-frequency
Data Apps
Native File
Loader
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Data Ingestion Into TigerGraph Using Kafka loader
11
Step 3
Each GPE consumes the
partial data updates,
processes it and puts it on
disk.
Loading Jobs and POST use
UPSERT semantics:
● If vertex/edge doesn't
yet exist, create it.
● If vertex/edge already
exists, update it.
● Idempotent
Step 1
Loaders take in user source
data.
● Bulk load of data files or
a Kafka stream in CSV or
JSON format
● HTTP POSTs via REST
services (JSON)
● GSQL Insert commands
Step 2
Dispatcher takes in the data
ingestion requests in the form of
updates to the database.
1. Query IDS to get internal
IDs
2. Convert data to internal
format
3. Send data to one or more
corresponding GPEs
Use of Kafka Connect
Framework and Benefits
12
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Data
Source 1
Data
Source 2
Data
Source 3
TigerGraph Connector Framework Using Kafka Connect
TigerGraph
Cluster
Kafka Connect
Kafka (Can be customer-hosted)
Loader
(Available 2021Q4)
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
TigerGraph Connector Framework - Benefits
● Full control of data ingestion process
○ Throttle intake based on capacity
○ Pause as needed
○ Resume and restart data ingestion jobs as needed.
● Flexibility of system deployment
○ Works with natively deployed Kafka in the TigerGraph cluster
○ Allows customers to leverage existing TigerGraph with drop-in
integration with external Kafka cluster
● Push down ETL capabilities
○ Users can use data transformation with loader support for UDF
functions
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Kafka Loader
Easy integration of data sources
Kafka Connect
+
Data source
connector
Current Data Ingestion
Architecture Deep Dive
16
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Current Use of TigerGraph Connector Framework
AWS S3
TigerGraph
Cluster
Kafka Connect
Kafka
User Input
Language
Server
GraphStudio
(browser)
Kafka
Stream
GSQL CLI
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Define the Data Source:
● CREATE DATA_SOURCE S3 s = "/path/to/s3.config"
● s3.config
S3 Loading Job through GSQL
{
"file.reader.settings.fs.s3a.access.key": "AKIAJ****4YGHQ",
"file.reader.settings.fs.s3a.secret.key": "R8bli****p+dT4"
}
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Create a Loading Job
● loading_job.gsql
● files.config
S3 Loading Job through GSQL
{
"file.uris": "s3://my-bucket/data.csv"
}
CREATE LOADING JOB job1 FOR GRAPH my_graph {
DEFINE FILENAME f = "$s:/path/to/files.config";
LOAD f TO VERTEX v1 VALUES ($0, $1, $2);
}
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Run the Loading Job
● RUN LOADING JOB job1
S3 Loading Job through GSQL
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Define the Data Source:
S3 Loading Job through GraphStudio
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Map Data Files to Vertex type or Edge type
S3 Loading Job through GraphStudio
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Map Data columns to Vertex or Edge attributes
S3 Loading Job through GraphStudio
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Run the Loading Job
S3 Loading Job through GraphStudio
Demo using TigerGraph
GraphStudio Application
25
Thanks
26

More Related Content

What's hot

Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud PlatformOpsta
 
네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나
네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나
네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나NAVER CLOUD PLATFORMㅣ네이버 클라우드 플랫폼
 
[Apache Kafka® Meetup by Confluent] Graph-based stream processing
[Apache Kafka® Meetup by Confluent] Graph-based stream processing[Apache Kafka® Meetup by Confluent] Graph-based stream processing
[Apache Kafka® Meetup by Confluent] Graph-based stream processingconfluent
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and SpringThe Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and SpringTimothy Spann
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksNeo4j
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyftTao Feng
 
Exploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesExploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesRed Hat Developers
 
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixWhoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixDataWorks Summit
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Sid Anand
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryNeo4j
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfNeo4j
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Neo4j
 
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 ClassificationUsing Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 ClassificationTigerGraph
 
Kubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentKubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentCloudOps2005
 
OpenShift 4 installation
OpenShift 4 installationOpenShift 4 installation
OpenShift 4 installationRobert Bohne
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 

What's hot (20)

Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나
네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나
네이버클라우드플랫폼이 제안하는 멀티클라우드(박기은 CTO) - IBM 스토리지 세미나
 
[Apache Kafka® Meetup by Confluent] Graph-based stream processing
[Apache Kafka® Meetup by Confluent] Graph-based stream processing[Apache Kafka® Meetup by Confluent] Graph-based stream processing
[Apache Kafka® Meetup by Confluent] Graph-based stream processing
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and SpringThe Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and Spring
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & Tricks
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Exploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesExploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on Kubernetes
 
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixWhoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
 
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 ClassificationUsing Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
 
Kubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentKubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy Agent
 
OpenShift 4 installation
OpenShift 4 installationOpenShift 4 installation
OpenShift 4 installation
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 

Similar to How a distributed graph analytics platform uses Apache Kafka for data ingestion in real time | Duc Le and Rayees Pasha, TigerGraph

Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...TigerGraph
 
Oracle GoldenGate Cloud Service Overview
Oracle GoldenGate Cloud Service OverviewOracle GoldenGate Cloud Service Overview
Oracle GoldenGate Cloud Service OverviewJinyu Wang
 
Hybrid data lake on google cloud with alluxio and dataproc
Hybrid data lake on google cloud  with alluxio and dataprocHybrid data lake on google cloud  with alluxio and dataproc
Hybrid data lake on google cloud with alluxio and dataprocAlluxio, Inc.
 
Cloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsCloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsVMware Tanzu
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionDaniel Zivkovic
 
Oracle GoldenGate Roadmap Oracle OpenWorld 2020
Oracle GoldenGate Roadmap Oracle OpenWorld 2020 Oracle GoldenGate Roadmap Oracle OpenWorld 2020
Oracle GoldenGate Roadmap Oracle OpenWorld 2020 Oracle
 
Portworx 201 Customer Deck.pptx
Portworx 201 Customer Deck.pptxPortworx 201 Customer Deck.pptx
Portworx 201 Customer Deck.pptxssuser1490e8
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoRomit Mehta
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHostedbyConfluent
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHostedbyConfluent
 
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptxNeo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptxNeo4j
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapNeo4j
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not YearsReplatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not YearsVMware Tanzu
 
Challenges In Modern Application
Challenges In Modern ApplicationChallenges In Modern Application
Challenges In Modern ApplicationRahul Kumar Gupta
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryMárton Kodok
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfData & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfChris Bingham
 
The Current And Future State Of Service Mesh
The Current And Future State Of Service MeshThe Current And Future State Of Service Mesh
The Current And Future State Of Service MeshRam Vennam
 

Similar to How a distributed graph analytics platform uses Apache Kafka for data ingestion in real time | Duc Le and Rayees Pasha, TigerGraph (20)

Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...Better Together: How Graph database enables easy data integration with Spark ...
Better Together: How Graph database enables easy data integration with Spark ...
 
Oracle GoldenGate Cloud Service Overview
Oracle GoldenGate Cloud Service OverviewOracle GoldenGate Cloud Service Overview
Oracle GoldenGate Cloud Service Overview
 
Hybrid data lake on google cloud with alluxio and dataproc
Hybrid data lake on google cloud  with alluxio and dataprocHybrid data lake on google cloud  with alluxio and dataproc
Hybrid data lake on google cloud with alluxio and dataproc
 
Cloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive ApplicationsCloud-Native Patterns for Data-Intensive Applications
Cloud-Native Patterns for Data-Intensive Applications
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data edition
 
Oracle GoldenGate Roadmap Oracle OpenWorld 2020
Oracle GoldenGate Roadmap Oracle OpenWorld 2020 Oracle GoldenGate Roadmap Oracle OpenWorld 2020
Oracle GoldenGate Roadmap Oracle OpenWorld 2020
 
Portworx 201 Customer Deck.pptx
Portworx 201 Customer Deck.pptxPortworx 201 Customer Deck.pptx
Portworx 201 Customer Deck.pptx
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
 
Data Platform on GCP
Data Platform on GCPData Platform on GCP
Data Platform on GCP
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
 
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, GoogleHybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
Hybrid Streaming Analytics for Apache Kafka Users | Firat Tekiner, Google
 
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptxNeo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
Neo4j & AWS Bedrock workshop at GraphSummit London 14 Nov 2023.pptx
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not YearsReplatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
Replatform your Teradata to a Next-Gen Cloud Data Platform in Weeks, Not Years
 
Challenges In Modern Application
Challenges In Modern ApplicationChallenges In Modern Application
Challenges In Modern Application
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdfData & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
Data & Analytics ReInvent Recap [AWS Basel Meetup - Jan 2023].pdf
 
The Current And Future State Of Service Mesh
The Current And Future State Of Service MeshThe Current And Future State Of Service Mesh
The Current And Future State Of Service Mesh
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

How a distributed graph analytics platform uses Apache Kafka for data ingestion in real time | Duc Le and Rayees Pasha, TigerGraph

  • 1. How a distributed graph analytics platform uses Apache Kafka for data ingestion in real time Rayees Pasha & Duc Le Kafka Summit US - Sep 2021 1
  • 2. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Agenda ● Overview of Graph analytics and TigerGraph ● Overview of Data ingestion into TigerGraph ● Use of Kafka Connect Framework and Benefits ● TigerGraph Data Ingestion Deep dive ● Demo - Data Ingestion using Kafka on TG Cloud 2
  • 3. © 2020. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Rayees Pasha Product Lead, TigerGraph ● Responsible for TigerGraph Database Engine, Language and Platform areas of the product. ● Prior Lead PM and ENG positions at Workday, Hitachi and HP ● Expertise in Database Management and Big Data Technologies Session Presenters 3 Duc Le Engineering Manager, TigerGraph ● Lead Developer for TigerGraph Cloud ● Master in Management Information Systems from Carnegie Mellon University ● Areas of specialty: Full-stack Development, Cloud, Containers and Connectors
  • 4. Overview of Graph Analytics and TigerGraph 4
  • 5. © 2020. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Why Graph; Why Now? Businesses want to ask business logic questions of their data Blending data from multiple sources, multiple business units, and increasingly external data Larger and more varied datasets mean more variables to analyze and connections to explore and test Importance of Graph in Today’s World 5
  • 6. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | 6 6 Who is TigerGraph? We provide advanced analytics and machine learning on connected data ○ The only scalable graph database for the enterprise: 40-300x faster than competition ○ Foundational for AI and ML solutions ○ Designed for efficient concurrent OLTP and OLAP workloads ○ SQL-like query language (GSQL) accelerates time to solution ○ Available on-premise & on: Google GCP, Microsoft Azure, Our customers include: ○ The largest companies in financial services, healthcare, telecom, media, utilities and innovative startups in cybersecurity, ecommerce and retail Founded in 2012, HQ in Redwood City, California Corporate Overview Video
  • 7. © 2020. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Advanced Analytics and Machine Learning on Connected Data Advanced Analytics LEARN FROM CONNECTED DATA AI-based Customer 360 for entity resolution, recommendation engine, fraud detection In-Database Machine Learning Distributed Graph DB Friction-free scale up from GB to TB to Petabyte with lowest cost of ownership . CONNECT ALL DATASETS AND PIPELINES Customer 360 connecting 200+ datasets and pipelines Item 360 for eCommerce across 100+ datasets Fortune 50 Retailer 7 out of top 10 global banks Real-time fraud detection and credit risk assessment 10-100X faster than current solutions ANALYZE CONNECTED DATA Automotive Manufacturer Supply chain planning accelerated from 3 weeks to 45 minutes Leading Healthcare Provider 7 Leading FinTech Company
  • 8. Overview of Data Ingestion into TigerGraph 8
  • 9. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | TigerGraph Architecture
  • 10. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Modes of Data Ingestion supported Bulk Data • Bulk data loads using native File loader File Loader Low-latency ● JDBC Type 4 driver for Java, Python ● Spark can be used for parallel loads Real-time ● Streaming Data Applications ● High-frequency Data Apps Bulk Data Bulk data loads using •Native File loader, •Kafka loader Low-latency ● JDBC Type 4 driver for Java, Python ● Spark can be used for parallel loads Real-time ● Streaming Data Apps ● High-frequency Data Apps Native File Loader
  • 11. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Data Ingestion Into TigerGraph Using Kafka loader 11 Step 3 Each GPE consumes the partial data updates, processes it and puts it on disk. Loading Jobs and POST use UPSERT semantics: ● If vertex/edge doesn't yet exist, create it. ● If vertex/edge already exists, update it. ● Idempotent Step 1 Loaders take in user source data. ● Bulk load of data files or a Kafka stream in CSV or JSON format ● HTTP POSTs via REST services (JSON) ● GSQL Insert commands Step 2 Dispatcher takes in the data ingestion requests in the form of updates to the database. 1. Query IDS to get internal IDs 2. Convert data to internal format 3. Send data to one or more corresponding GPEs
  • 12. Use of Kafka Connect Framework and Benefits 12
  • 13. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Data Source 1 Data Source 2 Data Source 3 TigerGraph Connector Framework Using Kafka Connect TigerGraph Cluster Kafka Connect Kafka (Can be customer-hosted) Loader (Available 2021Q4)
  • 14. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | TigerGraph Connector Framework - Benefits ● Full control of data ingestion process ○ Throttle intake based on capacity ○ Pause as needed ○ Resume and restart data ingestion jobs as needed. ● Flexibility of system deployment ○ Works with natively deployed Kafka in the TigerGraph cluster ○ Allows customers to leverage existing TigerGraph with drop-in integration with external Kafka cluster ● Push down ETL capabilities ○ Users can use data transformation with loader support for UDF functions
  • 15. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Kafka Loader Easy integration of data sources Kafka Connect + Data source connector
  • 17. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Current Use of TigerGraph Connector Framework AWS S3 TigerGraph Cluster Kafka Connect Kafka User Input Language Server GraphStudio (browser) Kafka Stream GSQL CLI
  • 18. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Define the Data Source: ● CREATE DATA_SOURCE S3 s = "/path/to/s3.config" ● s3.config S3 Loading Job through GSQL { "file.reader.settings.fs.s3a.access.key": "AKIAJ****4YGHQ", "file.reader.settings.fs.s3a.secret.key": "R8bli****p+dT4" }
  • 19. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Create a Loading Job ● loading_job.gsql ● files.config S3 Loading Job through GSQL { "file.uris": "s3://my-bucket/data.csv" } CREATE LOADING JOB job1 FOR GRAPH my_graph { DEFINE FILENAME f = "$s:/path/to/files.config"; LOAD f TO VERTEX v1 VALUES ($0, $1, $2); }
  • 20. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Run the Loading Job ● RUN LOADING JOB job1 S3 Loading Job through GSQL
  • 21. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Define the Data Source: S3 Loading Job through GraphStudio
  • 22. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Map Data Files to Vertex type or Edge type S3 Loading Job through GraphStudio
  • 23. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Map Data columns to Vertex or Edge attributes S3 Loading Job through GraphStudio
  • 24. © 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION | Run the Loading Job S3 Loading Job through GraphStudio