Continuously computing and indexing a derived graph using Apache Fluo
Keith Turner
Peterson Technologies
Percolator: Google’s Use Case
● Terabytes of new data arrive each day.
● To build the index, terabytes of new data must be joined with petabytes of existing data.
● Joining new data with existing data via MapReduce took multiple days.
● With Percolator, index update time dropped from days to minutes.
Fluo Features
● Layer on top of Accumulo
● Snapshot isolation: transactions only see committed data
● Cross-row/cross-node transactions
○ Read and write data on multiple nodes
○ If two concurrent transactions modify the same cell, one fails (a collision)
○ Remain correct when faults occur on multiple nodes
● Observers
○ User code that executes in a transaction
○ Triggered by persistent notifications
○ Observers can trigger other observers
○ Run in parallel on many nodes
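The snapshot isolation and collision rules above can be illustrated with a toy, single-process model. This is a minimal sketch, not Fluo's implementation: ToyStore, ToyTransaction, and CommitConflict are hypothetical names, a counter stands in for the Fluo Oracle's timestamps, and the collision check is a simplified first-committer-wins rule rather than Percolator's lock-based two-phase commit.

```python
import itertools


class CommitConflict(Exception):
    """Raised when a cell we wrote was committed by another transaction after we started."""


class ToyStore:
    """Hypothetical multi-version store; a counter stands in for the Fluo Oracle."""

    def __init__(self):
        self._clock = itertools.count(1)  # strictly increasing timestamps
        self.versions = {}                # (row, col) -> [(commit_ts, value), ...]

    def next_ts(self):
        return next(self._clock)

    def begin(self):
        return ToyTransaction(self, self.next_ts())


class ToyTransaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def get(self, row, col):
        # A transaction sees its own buffered writes first.
        if (row, col) in self.writes:
            return self.writes[(row, col)]
        # Snapshot isolation: newest value committed before this transaction started.
        visible = [v for ts, v in self.store.versions.get((row, col), []) if ts < self.start_ts]
        return visible[-1] if visible else None

    def set(self, row, col, value):
        self.writes[(row, col)] = value

    def commit(self):
        # Collision: another transaction committed to one of our cells after we started.
        for cell in self.writes:
            if any(ts > self.start_ts for ts, _ in self.store.versions.get(cell, [])):
                raise CommitConflict(cell)
        commit_ts = self.store.next_ts()
        for cell, value in self.writes.items():
            self.store.versions.setdefault(cell, []).append((commit_ts, value))
        return commit_ts
```

For example, if two transactions that started from the same snapshot both set T1's alias, the first to commit wins and the second raises CommitConflict, while a transaction that started earlier keeps reading the older value.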
Fluo 101 - Architecture
[Figure: Fluo runs on Accumulo, HDFS, and ZooKeeper, with applications deployed on YARN (Kubernetes and Mesos soon). Fluo Application 1 runs a Fluo Oracle and two Fluo Workers, each hosting Observer1 and Observer2; Fluo Application 2 runs a Fluo Oracle and one Fluo Worker hosting ObserverA. A client cluster runs two Fluo Clients for App 1 and one for App 2. The applications' data lives in Accumulo tables Table1 and Table2.]
Derived graph overview
Graphs from multiple social networks
[Figure: raw graphs from Twitter (nodes T1 to T4), GitHub (nodes G1 to G4), and Facebook (nodes F1, F2, F3, F5)]
Analytics determine aliases
[Figure: the same raw graphs, with raw nodes grouped under aliases A1 to A4]
Create derived graph
[Figure: a derived graph whose nodes are the aliases A1 to A4, plus F5, which appears under its own ID]
Add an edge
[Figure: a new edge is added to one of the raw graphs]
Add edge in derived graph
[Figure: the corresponding edge is added between derived nodes]
Add attributes
[Figure: raw nodes gain the attributes Location: 4 Privet Dr and Timezone: GMT]
Add attributes in derived graph
[Figure: both attributes, Location: 4 Privet Dr and Timezone: GMT, are carried over to the derived graph]
Putting it all together
[Figure: components Raw Graph Data Changes, Alias analytics, Attribute analytics, the Fluo Derived Graph Application, a Query System, and an Analytic System]
Distribution of data on the cluster
[Figure: data spread across Servers 1 to 6: input graph 1 (e.g. Twitter data), input graph 2 (e.g. GitHub data), input graph 3 (e.g. Facebook data), aliases, attributes, and the derived graph]
Using MapReduce to create the derived graph
● Three to four joins (MapReduce jobs)
● Analyzing or indexing the derived graph requires additional jobs
● When input data changes, all data must be reprocessed
Derived edges MapReduce job #1
Input
Aliases
A1   F1, T1
A2   T2
A3   F3, T3
Edges
T1 -> T3
T3 -> T1
T1 -> T2
F1 -> F3
Output
Derived edge   Original edge
A1 -> T3       T1 -> T3
A3 -> T1       T3 -> T1
A1 -> T2       T1 -> T2
A1 -> F3       F1 -> F3
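The join in job #1 can be sketched in memory as below, with a dictionary lookup standing in for the reduce-side join. The function names are hypothetical, and the sketch assumes that a raw node without an alias keeps its own ID.

```python
def alias_index(aliases):
    """Invert {alias: [raw nodes]} into {raw node: alias}."""
    return {node: alias for alias, nodes in aliases.items() for node in nodes}


def derive_sources(edges, aliases):
    """Job #1: replace each edge's source with its alias, keeping the original edge.

    Assumption: a raw node with no alias keeps its own ID.
    """
    node_to_alias = alias_index(aliases)
    return [((node_to_alias.get(src, src), dst), (src, dst)) for src, dst in edges]


ALIASES = {"A1": ["F1", "T1"], "A2": ["T2"], "A3": ["F3", "T3"]}
EDGES = [("T1", "T3"), ("T3", "T1"), ("T1", "T2"), ("F1", "F3")]

for derived, original in derive_sources(EDGES, ALIASES):
    print(derived, original)
```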
Derived edges MapReduce job #2
Input
Aliases
A1   F1, T1
A2   T2
A3   F3, T3
Derived edges (output of job #1)
Derived edge   Original edge
A1 -> T3       T1 -> T3
A3 -> T1       T3 -> T1
A1 -> T2       T1 -> T2
A1 -> F3       F1 -> F3
Output
Derived edge   Original edge
A1 -> A3       T1 -> T3
A3 -> A1       T3 -> T1
A1 -> A2       T1 -> T2
A1 -> A3       F1 -> F3
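Job #2 repeats the join on the destination node. An in-memory sketch under the same assumptions (hypothetical function name, unaliased nodes keep their own ID):

```python
def derive_destinations(half_derived, aliases):
    """Job #2: replace each edge's destination with its alias.

    half_derived holds ((source alias, raw destination), original edge) pairs from job #1.
    Assumption: a raw node with no alias keeps its own ID.
    """
    node_to_alias = {node: alias for alias, nodes in aliases.items() for node in nodes}
    return [
        ((src_alias, node_to_alias.get(dst, dst)), original)
        for (src_alias, dst), original in half_derived
    ]


ALIASES = {"A1": ["F1", "T1"], "A2": ["T2"], "A3": ["F3", "T3"]}
JOB1_OUTPUT = [
    (("A1", "T3"), ("T1", "T3")),
    (("A3", "T1"), ("T3", "T1")),
    (("A1", "T2"), ("T1", "T2")),
    (("A1", "F3"), ("F1", "F3")),
]

for derived, original in derive_destinations(JOB1_OUTPUT, ALIASES):
    print(derived, original)
```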
Unique edges MapReduce job (optional)
Input (output of job #2)
Derived edge   Original edge
A1 -> A3       T1 -> T3
A3 -> A1       T3 -> T1
A1 -> A2       T1 -> T2
A1 -> A3       F1 -> F3
Output
Derived edge   Original edges
A1 -> A3       {T1->T3, F1->F3}
A1 -> A2       {T1->T2}
A3 -> A1       {T3->T1}
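The optional unique-edges job is a group-by on the derived edge. A minimal in-memory sketch, with a hypothetical function name:

```python
from collections import defaultdict


def unique_edges(derived_edges):
    """Group original edges by derived edge so each derived edge appears once."""
    grouped = defaultdict(list)
    for derived, original in derived_edges:
        grouped[derived].append(original)
    return dict(grouped)


JOB2_OUTPUT = [
    (("A1", "A3"), ("T1", "T3")),
    (("A3", "A1"), ("T3", "T1")),
    (("A1", "A2"), ("T1", "T2")),
    (("A1", "A3"), ("F1", "F3")),
]

for derived, originals in unique_edges(JOB2_OUTPUT).items():
    print(derived, originals)
```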
Derived attributes MapReduce job
Input
Aliases
A1   F1, T1
A2   T2
A3   F3, T3
Attributes
T1   {K1=V1}
F1   {K2=V2}
Output
Derived attributes
A1   {T1.K1=V1, F1.K2=V2}
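The derived attributes job joins attributes with aliases and prefixes each key with its raw node, so values from different networks do not overwrite each other. A hypothetical in-memory sketch, assuming unaliased nodes keep their own ID:

```python
def derive_attributes(attributes, aliases):
    """Attach each raw node's attributes to its alias as '<raw node>.<key>'.

    Assumption: a raw node with no alias keeps its own ID.
    """
    node_to_alias = {node: alias for alias, nodes in aliases.items() for node in nodes}
    derived = {}
    for node, attrs in attributes.items():
        target = node_to_alias.get(node, node)
        for key, value in attrs.items():
            derived.setdefault(target, {})[f"{node}.{key}"] = value
    return derived


ALIASES = {"A1": ["F1", "T1"], "A2": ["T2"], "A3": ["F3", "T3"]}
ATTRIBUTES = {"T1": {"K1": "V1"}, "F1": {"K2": "V2"}}

print(derive_attributes(ATTRIBUTES, ALIASES))
```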
Analysis/Indexing MapReduce jobs ...
Input
Derived edges
A1 -> A3   {T1->T3, F1->F3}
A1 -> A2   {T1->T2}
A3 -> A1   {T3->T1}
Derived attributes
A1   {T1.K1=V1, F1.K2=V2}
Output
????
Using Fluo to create the derived graph
● Inputs
○ Raw edges
○ Raw node attributes
○ Aliases
● Supports adding and removing inputs
○ Does not require reprocessing all data
● Outputs changes to the derived graph
Adding a new edge
Fluo data (stored in an Accumulo table) before the transaction:
Twitter data: T1 alias A1, T3 alias A3, T5 alias A5
GitHub data: G1 alias A1, G2 alias A2, G7 alias A7
Derived graph: (empty)
New Edge Transaction: T1->T5
● Read aliases: T1 alias A1, T5 alias A5
● Write edges
○ Twitter data: T1 -> T5 A1:A5 and T5 <- T1 A5:A1
○ Derived graph: A1 -> A5 T1:T5 (new) and A5 <- A1 T5:T1 (new)
● Notify nodes: set notifications on derived nodes A1 and A5
● Commit
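The four steps of the new-edge transaction can be modeled as below. This is a single-threaded sketch, not the Fluo API: GraphState and add_edge are hypothetical names, plain dictionaries stand in for the Accumulo table, and atomicity comes from buffering writes and applying them together rather than from Fluo's commit protocol. A raw node without an alias is assumed to stand for itself.

```python
from dataclasses import dataclass, field


@dataclass
class GraphState:
    """Hypothetical stand-in for the Fluo table used in the slides."""
    aliases: dict = field(default_factory=dict)        # raw node -> alias
    raw_edges: dict = field(default_factory=dict)      # (node, arrow, other) -> "alias:alias"
    derived_edges: dict = field(default_factory=dict)  # (alias, arrow, alias, "raw:raw") -> status
    notifications: set = field(default_factory=set)    # derived nodes to process later


def add_edge(state, src, dst):
    """Sketch of the 'New Edge Transaction' for a raw edge src -> dst."""
    # 1. Read aliases (a raw node without an alias is assumed to stand for itself).
    a_src = state.aliases.get(src, src)
    a_dst = state.aliases.get(dst, dst)

    # 2. Write edges: buffer forward and reverse entries in raw and derived data.
    raw = {
        (src, "->", dst): f"{a_src}:{a_dst}",
        (dst, "<-", src): f"{a_dst}:{a_src}",
    }
    derived = {
        (a_src, "->", a_dst, f"{src}:{dst}"): "new",
        (a_dst, "<-", a_src, f"{dst}:{src}"): "new",
    }

    # 3. Notify both derived nodes so their observers run later.
    notify = {a_src, a_dst}

    # 4. Commit: apply all buffered changes together.
    state.raw_edges.update(raw)
    state.derived_edges.update(derived)
    state.notifications |= notify


state = GraphState(aliases={"T1": "A1", "T3": "A3", "T5": "A5",
                            "G1": "A1", "G2": "A2", "G7": "A7"})
add_edge(state, "T1", "T5")
print(state.derived_edges)
```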
Processing Changes to a Derived Node
Fluo data after the new-edge transaction:
Twitter data: T1 alias A1, T1 -> T5 A1:A5, T3 alias A3, T5 alias A5, T5 <- T1 A5:A1
GitHub data: G1 alias A1, G2 alias A2, G7 alias A7
Derived graph: A1 -> A5 T1:T5 (new), A5 <- A1 T5:T1 (new)
Export queue: (empty)
Derived Node Transaction: A1
● Read changed edges: A1 -> A5 T1:T5 (new)
● Mark edge processed: A1 -> A5 T1:T5 is no longer marked new
● Queue for export: + A1->A5 Followers:0 Following:1
● Commit
After the same steps run for derived node A5, both derived edges are processed and the export queue holds:
+ A1->A5 Followers:0 Following:1
+ A5<-A1 Followers:1 Following:0
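A sketch of the derived node transaction, again using plain dictionaries instead of the Fluo API. The function name, the (node, arrow, neighbor, raw pair) key layout, and the rule for computing Followers and Following (counting this node's incoming and outgoing derived edges per neighbor) are assumptions chosen to reproduce the export entries shown above; exporting removals is not modeled.

```python
from collections import Counter


def process_derived_node(derived_edges, node, export_queue):
    """Sketch of the 'Derived Node Transaction' for one derived node.

    derived_edges maps (node, arrow, neighbor, raw pair) -> "new", "processed", or "deleted".
    """
    # 1. Read changed edges for this node.
    changed = [key for key, status in derived_edges.items()
               if key[0] == node and status == "new"]

    # 2. Mark them processed.
    for key in changed:
        derived_edges[key] = "processed"

    # Per-neighbor counts over this node's live edges (assumed counting rule).
    live = [key for key, status in derived_edges.items()
            if key[0] == node and status != "deleted"]
    following = Counter(neighbor for _, arrow, neighbor, _ in live if arrow == "->")
    followers = Counter(neighbor for _, arrow, neighbor, _ in live if arrow == "<-")

    # 3. Queue one export entry per changed edge.
    for _, arrow, neighbor, _ in changed:
        export_queue.append(
            f"+ {node}{arrow}{neighbor} "
            f"Followers:{followers[neighbor]} Following:{following[neighbor]}"
        )


edges = {("A1", "->", "A5", "T1:T5"): "new", ("A5", "<-", "A1", "T5:T1"): "new"}
queue = []
process_derived_node(edges, "A1", queue)
process_derived_node(edges, "A5", queue)
print(queue)
```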
Processing an Alias Change
Fluo data before the transaction (T5's alias has changed from A5 to A7):
Twitter data: T1 alias A1, T1 -> T5 A1:A5, T3 alias A3, T5 alias A7, T5 <- T1 A5:A1
GitHub data: G1 alias A1, G1 -> G7 A1:A7, G2 alias A2, G7 alias A7, G7 <- G1 A7:A1
Derived graph: A1 -> A5 T1:T5, A1 -> A7 G1:G7, A5 <- A1 T5:T1, A7 <- A1 G7:G1
Alias Change Transaction: T5
● Read edges and alias: T5 <- T1 A5:A1 and T5 alias A7
● Delete edges: A1 -> A5 T1:T5 and A5 <- A1 T5:T1 are marked deleted
● Insert edges: A1 -> A7 T1:T5 (new) and A7 <- A1 T5:T1 (new); the raw edges become T1 -> T5 A1:A7 and T5 <- T1 A7:A1
● Set notifications
● Commit
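The alias change transaction can be sketched the same way. As on the slide, the new alias is assumed to be stored already, and the transaction reads it along with the node's raw edges. GraphState and process_alias_change are hypothetical names, and notifying every derived node whose edges changed is an assumption about what "Set notifications" covers.

```python
from dataclasses import dataclass, field


@dataclass
class GraphState:
    """Hypothetical stand-in for the Fluo table used in the slides."""
    aliases: dict = field(default_factory=dict)        # raw node -> alias
    raw_edges: dict = field(default_factory=dict)      # (node, arrow, other) -> "alias:alias"
    derived_edges: dict = field(default_factory=dict)  # (alias, arrow, alias, "raw:raw") -> status
    notifications: set = field(default_factory=set)


MIRROR = {"->": "<-", "<-": "->"}


def process_alias_change(state, node):
    """Sketch of the 'Alias Change Transaction' for a raw node whose alias was updated."""
    # Read the (already updated) alias and this node's raw edges.
    new_alias = state.aliases.get(node, node)
    for (row, arrow, other), pair in list(state.raw_edges.items()):
        if row != node:
            continue
        old_self, old_other = pair.split(":")
        other_alias = state.aliases.get(other, other)
        if (old_self, old_other) == (new_alias, other_alias):
            continue  # edge already reflects the current aliases
        back = MIRROR[arrow]
        # Delete the derived edges built from the old aliases.
        state.derived_edges[(old_self, arrow, old_other, f"{node}:{other}")] = "deleted"
        state.derived_edges[(old_other, back, old_self, f"{other}:{node}")] = "deleted"
        # Insert derived edges for the new aliases.
        state.derived_edges[(new_alias, arrow, other_alias, f"{node}:{other}")] = "new"
        state.derived_edges[(other_alias, back, new_alias, f"{other}:{node}")] = "new"
        # Record the aliases now used, in both raw rows.
        state.raw_edges[(node, arrow, other)] = f"{new_alias}:{other_alias}"
        state.raw_edges[(other, back, node)] = f"{other_alias}:{new_alias}"
        # Set notifications on every derived node whose edges changed (assumption).
        state.notifications |= {old_self, old_other, new_alias, other_alias}


state = GraphState(
    aliases={"T1": "A1", "T3": "A3", "T5": "A7", "G1": "A1", "G2": "A2", "G7": "A7"},
    raw_edges={("T1", "->", "T5"): "A1:A5", ("T5", "<-", "T1"): "A5:A1",
               ("G1", "->", "G7"): "A1:A7", ("G7", "<-", "G1"): "A7:A1"},
    derived_edges={("A1", "->", "A5", "T1:T5"): "processed",
                   ("A1", "->", "A7", "G1:G7"): "processed",
                   ("A5", "<-", "A1", "T5:T1"): "processed",
                   ("A7", "<-", "A1", "G7:G1"): "processed"},
)
process_alias_change(state, "T5")
print(state.derived_edges)
```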
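Concurrent Aliases Change
● Aliases for T1 and T5 both change.
● This starts two transactions.
● Collision: one fails, one succeeds.
Twitter data (time 0): T1 alias A1, T1 -> T5 A1:A5, T3 alias A3, T5 alias A5, T5 <- T1 A5:A1
Twitter data (time 1), after both aliases change: T1 alias A9, T1 -> T5 A1:A5, T3 alias A3, T5 alias A7, T5 <- T1 A5:A1
Twitter data (time 2), edge values written by Transaction 1 and Transaction 2: T1 alias A9, T1 -> T5 A1:A7, T1 -> T5 A9:A5, T3 alias A3, T5 alias A7, T5 <- T1 A5:A9, T5 <- T1 A7:A1
Each transaction read the other node's old alias, so neither A1:A7 nor A9:A5 is the correct pair A9:A7. Both transactions write the T1 -> T5 and T5 <- T1 edge entries, so Fluo detects a collision: one transaction fails and the other succeeds.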
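The collision can be reproduced with a tiny multi-version store. This is a hypothetical sketch: Store, Collision, and change_alias_tx are not Fluo classes, the write-write check is a simplified first-committer-wins rule, and the retry of the losing transaction is explicit here.

```python
import itertools


class Collision(Exception):
    """Another transaction committed to a cell we wrote after our snapshot."""


class Store:
    """Hypothetical multi-version store with first-committer-wins collision detection."""

    def __init__(self, initial):
        self._clock = itertools.count(1)
        ts = next(self._clock)
        self.cells = {key: [(ts, value)] for key, value in initial.items()}

    def begin(self):
        return next(self._clock)  # start timestamp of a new snapshot

    def read(self, key, start_ts):
        values = [v for ts, v in self.cells.get(key, []) if ts < start_ts]
        return values[-1] if values else None

    def latest(self, key):
        return self.cells[key][-1][1]

    def commit(self, start_ts, writes):
        for key in writes:
            if any(ts > start_ts for ts, _ in self.cells.get(key, [])):
                raise Collision(key)
        commit_ts = next(self._clock)
        for key, value in writes.items():
            self.cells.setdefault(key, []).append((commit_ts, value))


def change_alias_tx(store, start_ts, node, new_alias, src, dst):
    """Set node's alias and rewrite the alias pair stored on the raw edge src -> dst."""
    alias = {n: store.read((n, "alias"), start_ts) for n in (src, dst)}
    alias[node] = new_alias  # the transaction sees its own write
    store.commit(start_ts, {
        (node, "alias"): new_alias,
        (src, "->", dst): f"{alias[src]}:{alias[dst]}",
        (dst, "<-", src): f"{alias[dst]}:{alias[src]}",
    })


store = Store({("T1", "alias"): "A1", ("T5", "alias"): "A5",
               ("T1", "->", "T5"): "A1:A5", ("T5", "<-", "T1"): "A5:A1"})
ts1, ts2 = store.begin(), store.begin()  # both start from the time-0 snapshot
change_alias_tx(store, ts1, "T1", "A9", "T1", "T5")  # commits T1 -> T5 A9:A5
try:
    change_alias_tx(store, ts2, "T5", "A7", "T1", "T5")  # would write A1:A7
except Collision:
    # Retry against a fresh snapshot, which now sees T1 alias A9.
    change_alias_tx(store, store.begin(), "T5", "A7", "T1", "T5")

print(store.latest(("T1", "->", "T5")), store.latest(("T5", "<-", "T1")))
```

Without the collision check, the second commit would overwrite the edge with A1:A7; with it, the retried transaction computes the correct pair A9:A7.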
(Adding attributes is not covered.)
Mixer prototype
● Supports adding and removing edges, aliases, and attributes.
● Exports changes to an external query table.
○ Uses invert on export.
● Can look up nodes in the external query table.
● Available soon on GitHub.
● Easy to run using MiniFluo and MiniAccumulo:
○ git clone the repository
○ ./mixer.sh mini &> mini.log &
○ ./mixer.sh shell fluo.properties
Derived graph in Fluo
[Figure: derived node bob, made up of raw nodes tw:bob99, g+:bobE, and gh:bob799; other raw nodes tw:alice95 (loc=TX), g+:joe8, gh:jojo, gh:jeb, fb:joe9, gh:eAdam, and gh:alice++ (tz=CST)]
Bob's rows in the query table, as updated by Fluo:
bob -> g+:joe8 followers=1,following=0,rawEdges=1
bob -> gh:jojo followers=1,following=1,rawEdges=1
bob -> tw:alice95 followers=1,following=0,loc=TX,rawEdges=1
Status up to here
Derived graph in Fluo
[Figure: the same graph, plus a new derived node alice that includes tw:alice95 (loc=TX)]
Bob's rows in the query table, as updated by Fluo:
bob -> alice followers=1,following=0,loc=TX,tz=CST,rawEdges=1
bob -> g+:joe8 followers=1,following=0,rawEdges=1
bob -> gh:jojo followers=1,following=1,rawEdges=1
bob -> tw:alice95 followers=1,following=0,loc=TX,rawEdges=1
Status up to here
Derived graph in Fluo
[Figure: the same graph, with derived nodes alice and a new derived node joe]
Bob's rows in the query table, as updated by Fluo:
bob -> alice followers=1,following=0,loc=TX,tz=CST,rawEdges=1
bob -> joe followers=1,following=2,rawEdges=2
bob -> g+:joe8 followers=1,following=0,rawEdges=1
bob -> gh:jojo followers=1,following=1,rawEdges=1
bob -> tw:alice95 followers=1,following=0,loc=TX,rawEdges=1
Status up to here
Getting started with Fluo
● Fluo Tour
● Documentation on website
● Mailing list and IRC
https://fluo.apache.org
