Graph Recommendations
Harry Powell & Raffael Strassnig
Advanced Data Analytics - Barclays
Customer values and recommendation engines
Understanding customer values is a similar problem to recommendation
– How is Tesco similar to Asda?
– How is Tesco in Bristol similar to Asda in Bath?
– How is Tesco in Bristol similar to Bristol Angling Centre?
Assume that the similarity of two businesses can be inferred by the extent
to which they share customers
The data: Customer transactions
Timestamp   Customer      Business                 Amount (£)
…           Bob Smith     Tesco, Bristol           …
…           Mary Jones    Tesco, Bristol           …
…           Bob Smith     Asda, Bath               …
…           John Taylor   Bristol Angling Centre   …
Transactional data can be seen as a bipartite graph
[Figure: bipartite graph of customers and the merchants Tesco, Asda, BP and Boots]
Timestamp   CustomerID   MerchantName   Amount (£)
…           1            Tesco          …
…           1            Asda           …
…           2            Boots          …
…           2            BP             …
…           3            Tesco          …
…           3            Boots          …
…           3            BP             …
…           4            Asda           …
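As a minimal sketch of the bipartite view, the example rows above can be turned into a customer-by-merchant incidence matrix. The variable names and the use of NumPy are assumptions for illustration, not taken from the slides; timestamps and amounts are dropped.

```python
import numpy as np

# (customer_id, merchant) pairs copied from the example table above.
transactions = [
    (1, "Tesco"), (1, "Asda"),
    (2, "Boots"), (2, "BP"),
    (3, "Tesco"), (3, "Boots"), (3, "BP"),
    (4, "Asda"),
]

customers = sorted({c for c, _ in transactions})
merchants = sorted({m for _, m in transactions})
c_idx = {c: i for i, c in enumerate(customers)}
m_idx = {m: j for j, m in enumerate(merchants)}

# incidence[i, j] == 1 when customer i has at least one transaction at merchant j.
incidence = np.zeros((len(customers), len(merchants)), dtype=int)
for c, m in transactions:
    incidence[c_idx[c], m_idx[m]] = 1

# Number of customers shared by each pair of merchants
# (the diagonal holds the number of customers per merchant).
shared = incidence.T @ incidence
print(merchants)
print(shared)
```

Each row of `incidence` is one side of the bipartite graph (a customer) and each column the other (a merchant); `shared` is the simplest "extent to which they share customers" measure.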
Transactions imply customer preferences over business values
[Figure: a customer linked to businesses, each annotated with the values it implies]
Tesco: Price, Quality
Boots: Health (Price? Quality?)
Asda: Price
Problems with conventional similarity metrics
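Conventional recommenders (say, cosine similarity) break down in sparsely connected networks: two businesses with no customers in common score zero, however closely related they are
$\text{Asda} = \mathbf{A} = (1,1,1,0,1,1,0,0)$
$\text{Tesco} = \mathbf{B} = (1,0,1,0,1,1,0,1)$
$\mathbf{A} \cdot \mathbf{B} = \lVert\mathbf{A}\rVert \, \lVert\mathbf{B}\rVert \cos\theta$
Customer        Tesco   Asda
Bob Smith       Yes     Yes
Mary Jones      No      Yes
John Taylor     Yes     Yes
Jane Williams   No      No
Gary Brown      Yes     Yes
Liz Davis       Yes     Yes
David Evans     No      No
Helen Wilson    Yes     No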
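The cosine similarity for the Asda and Tesco vectors above can be checked with a few lines of NumPy. This is an illustrative sketch; the second pair of vectors is a made-up example (not from the slides) showing why the metric is uninformative when two businesses share no customers.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors from the slide: one entry per customer, 1 = shops there.
asda = np.array([1, 1, 1, 0, 1, 1, 0, 0])
tesco = np.array([1, 0, 1, 0, 1, 1, 0, 1])
similarity = cosine_similarity(asda, tesco)  # 4 / (sqrt(5) * sqrt(5)) = 0.8

# Hypothetical sparse case: no customer in common, so the score is exactly 0
# even if the two shops are linked through other shops' customers.
shop_x = np.array([1, 1, 0, 0, 0, 0, 0, 0])
shop_y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
sparse_similarity = cosine_similarity(shop_x, shop_y)

print(similarity, sparse_similarity)
```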
Problems with conventional graph metrics
Conventional graph metrics (say, minimum distance) are uninformative in networks with significant hubs, because a hub puts most pairs of businesses within a few hops of each other.
Inferring latent preferences from n-degree separation
[Figure: chain Tesco, Boots, Asda, Lloyds Pharmacy, alongside the graph of Tesco, Asda, BP and Boots]
Expected Degrees of Separation
Can factorise out to homogeneous graph
[Figure: the bipartite customer-business graph collapsed into a business-to-business graph over Tesco, Asda, BP and Boots]
Edge weight: $p(A \mid B)$
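One way to collapse the bipartite graph into a business-to-business graph is a two-hop random walk: business to customer to business. The sketch below assumes each hop picks uniformly among neighbours, which the slides do not state, and reuses the four-customer example with merchants ordered Asda, BP, Boots, Tesco.

```python
import numpy as np

merchants = ["Asda", "BP", "Boots", "Tesco"]

# Rows: customers 1..4, columns: merchants in the order above.
incidence = np.array([
    [1, 0, 0, 1],   # customer 1: Asda, Tesco
    [0, 1, 1, 0],   # customer 2: BP, Boots
    [0, 1, 1, 1],   # customer 3: BP, Boots, Tesco
    [1, 0, 0, 0],   # customer 4: Asda
], dtype=float)

# Hop 1: from a merchant to one of its customers (uniform choice, an assumption).
merchant_to_customer = incidence.T / incidence.T.sum(axis=1, keepdims=True)
# Hop 2: from that customer to one of the merchants they use (uniform choice).
customer_to_merchant = incidence / incidence.sum(axis=1, keepdims=True)

# Row-stochastic transition matrix between merchants.
# Self-loops are kept: the walk may return to the merchant it started from.
P = merchant_to_customer @ customer_to_merchant
print(np.round(P, 3))
```

In this toy graph Asda never reaches Boots or BP in a single two-hop step, but it can via Tesco, which is the kind of indirect link the n-degree separation idea is after.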
Large number of paths
Each node has a few thousand outgoing edges
[Figure: tree of paths through nodes a, b, c, d leading to the destination business]
Markov factorisation
[Figure: three successive "Factorise" steps]
Factorising graphs
Scalability problems with trees
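$$\underbrace{n}_{\text{source nodes}} \times \underbrace{(n-1)}_{\text{destination nodes}} \times \underbrace{k}_{\text{depth}} \times \underbrace{(n-1)}_{\text{nodes at each step}} \;\Rightarrow\; O(n^3 k)$$
Where $n = 1{,}000{,}000$
Actually $n$ starts at a few thousand for low $k$, but quickly escalates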
Elimination of insignificant paths
Aggregate all nodes/paths below a threshold probability
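A rough sketch of the pruning idea: expand the path tree level by level and move any branch whose probability drops below a threshold into a single aggregate bucket. The function name, the toy transition matrix and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def expand_paths(P, source, depth, threshold):
    """Enumerate walks of `depth` steps from `source`.

    Branches with probability below `threshold` are not expanded further;
    their probability is added to `pruned_mass` instead.
    Returns (surviving paths -> probability, pruned_mass).
    """
    frontier = {(source,): 1.0}
    pruned_mass = 0.0
    for _ in range(depth):
        next_frontier = {}
        for path, prob in frontier.items():
            last = path[-1]
            for nxt in np.flatnonzero(P[last]):
                p = prob * P[last, nxt]
                if p < threshold:
                    pruned_mass += p
                else:
                    next_frontier[path + (int(nxt),)] = p
        frontier = next_frontier
    return frontier, pruned_mass

# Toy business-to-business transition matrix (order: Asda, BP, Boots, Tesco).
P = np.array([
    [0.75, 0.0,  0.0,  0.25],
    [0.0,  5/12, 5/12, 1/6],
    [0.0,  5/12, 5/12, 1/6],
    [0.25, 1/6,  1/6,  5/12],
])

all_paths, none_pruned = expand_paths(P, source=0, depth=2, threshold=0.0)
kept_paths, pruned = expand_paths(P, source=0, depth=2, threshold=0.1)
print(len(all_paths), len(kept_paths), round(pruned, 4))
```

Kept and pruned probability always add up to one, so the aggregate bucket shows how much of the walk has been approximated away.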
Comparing PageRank and Asymptotic EDS
PageRank: stationary distribution of a non-absorbing chain (an eigenvector)
EDS: we would like to know how far we need to go (on average) to hit B starting from A
Comment on ergodicity and teleporting
– Teleporting is solely a mechanism to get around non-ergodic graphs
– We assume a connected graph (eliminating businesses whose customers shop solely with them), which is natural for retail but not for websites.
Absorbing transition matrix
Discrete phase type distribution
– Computes EDS for all sources to single destination
– Gives exact results
$$P = \begin{pmatrix} T & t \\ 0 & 1 \end{pmatrix}, \qquad \mathrm{EDS} = (I - T)^{-1}$$
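A small NumPy sketch of the absorbing-chain calculation, under stated assumptions: the destination is made absorbing, $T$ is the transition matrix restricted to the remaining nodes, and (by standard absorbing Markov chain theory) the expected number of steps to absorption is the row sum of $(I - T)^{-1}$. The toy matrix and function name are illustrative.

```python
import numpy as np

def exact_eds(P, destination):
    """Expected number of steps to first reach `destination` from every other node."""
    n = P.shape[0]
    transient = [i for i in range(n) if i != destination]
    T = P[np.ix_(transient, transient)]
    # Row sums of (I - T)^-1, obtained by solving (I - T) x = 1
    # instead of forming the inverse explicitly.
    steps = np.linalg.solve(np.eye(n - 1) - T, np.ones(n - 1))
    return {node: float(s) for node, s in zip(transient, steps)}

# Toy business-to-business transition matrix (order: Asda, BP, Boots, Tesco).
P = np.array([
    [0.75, 0.0,  0.0,  0.25],
    [0.0,  5/12, 5/12, 1/6],
    [0.0,  5/12, 5/12, 1/6],
    [0.25, 1/6,  1/6,  5/12],
])

eds_to_tesco = exact_eds(P, destination=3)
print(eds_to_tesco)  # approximately {0: 4.0, 1: 6.0, 2: 6.0}
```

One solve gives the EDS from all sources to a single destination, so covering every destination means repeating it once per node.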
Spark implementation of Absorbing Transition Matrix
[Spark code]
Has unacceptably high complexity ($O(n^4)$) due to inverting a large matrix
Estimating EDS using path sampling
An alternative is a sampling approach which is fully distributable
– Complexity = 𝑂(𝑛 × 𝑘 × 𝑙)
– Converges to analytical solution
Cautions
– Shorter path length can have high variance (𝑙 < 4)
– Signal dilution (applies to exact solution too)
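The sampling approach can be sketched on a single machine as plain Monte Carlo: run many random walks from each source and average the number of steps taken to hit the destination. Reading $k$ as the number of walks per source and $l$ as the maximum path length is an assumption, as are the function name and the toy matrix; this is not the authors' Spark code.

```python
import numpy as np

def sampled_eds(P, destination, samples, max_length, seed=0):
    """Monte Carlo estimate of expected steps to reach `destination`.

    Cost is O(n * samples * max_length). Walks that have not arrived after
    `max_length` steps are truncated, which biases the estimate downward.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    estimates = {}
    for source in range(n):
        if source == destination:
            continue
        lengths = []
        for _ in range(samples):
            node, steps = source, 0
            while node != destination and steps < max_length:
                node = rng.choice(n, p=P[node])
                steps += 1
            lengths.append(steps)
        estimates[source] = float(np.mean(lengths))
    return estimates

# Toy business-to-business transition matrix (order: Asda, BP, Boots, Tesco).
P = np.array([
    [0.75, 0.0,  0.0,  0.25],
    [0.0,  5/12, 5/12, 1/6],
    [0.0,  5/12, 5/12, 1/6],
    [0.25, 1/6,  1/6,  5/12],
])

estimates = sampled_eds(P, destination=3, samples=3000, max_length=200)
print(estimates)
```

For this toy chain the exact values are 4 steps from Asda and 6 from BP or Boots, so the estimates should land close to those. Each walk is independent, which is what makes the method easy to distribute.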
Spark implementation of sampling methodology
Convergence fails when paths too long
Results
MCDONALDS                        PRIMARK STORES LTD
PRIMARK STORES LTD     90.8      MCDONALDS              53.7
LIDL                  105.3      LIDL                  105.0
BOOTS                 121.2      BOOTS                 120.1
ALDI                  123.2      ALDI                  122.8
MARKS&SPENCER PLC     126.9      MARKS&SPENCER PLC     126.2
POST OFFICE COUNTER   128.2      POST OFFICE COUNTER   128.2
GREGGS PLC            154.1      GREGGS PLC            154.2
Results vs PageRank
PageRank                   EDS
SEVERN RIVER CROSSIN PLC   MCDONALDS
MCDONALDS                  PRIMARK STORES LTD
POST OFFICE COUNTER        LIDL
LIDL                       BOOTS
PRIMARK STORES             ALDI
ALDI                       MARKS&SPENCER PLC
MARTIN MCCOLL              POST OFFICE COUNTER
MARKS&SPENCER PLC          GREGGS PLC
Results
Results aren’t much use because of low differentiation between shops
Once you have escaped the vicinity of a shop, the walk reverts to a random walk over the whole graph.
Certain nodes connect everything (MCDONALDS, IKEA)
Theoretically interesting metric – but not insightful
Localised EDS
$$P = \begin{pmatrix} T & t \\ 0 & 1 \end{pmatrix}$$
We use $P^k$, which yields the results for the k-neighbourhood
Set k according to the problem at hand (e.g. 5, 10, 20, …)
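The slides do not spell out how the localised score is read off the k-th matrix power, so the sketch below shows only the part that follows directly from the absorbing construction: with the destination made absorbing, the destination column of the k-th power is the probability of having reached it within k steps. The function name and toy matrix are assumptions for illustration.

```python
import numpy as np

def k_step_reach(P, destination, k):
    """Probability of reaching `destination` within k steps, for every start node."""
    A = P.astype(float)                 # astype returns a copy, so P is untouched
    A[destination] = 0.0
    A[destination, destination] = 1.0   # make the destination absorbing
    Ak = np.linalg.matrix_power(A, k)
    return Ak[:, destination]

# Toy business-to-business transition matrix (order: Asda, BP, Boots, Tesco).
P = np.array([
    [0.75, 0.0,  0.0,  0.25],
    [0.0,  5/12, 5/12, 1/6],
    [0.0,  5/12, 5/12, 1/6],
    [0.25, 1/6,  1/6,  5/12],
])

reach_5 = k_step_reach(P, destination=3, k=5)
print(np.round(reach_5, 4))
```

Because only k steps are taken, nodes outside the k-neighbourhood of the destination contribute nothing, which is what keeps the result local.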
Spark implementation of Localised EDS
Brute force GPU implementation of Localised EDS
Brute force matrix multiplication using GPUs
Parallelisation can be achieved by destination node
Sort-of scales ($O(k \cdot n^3)$) as long as the matrix fits into memory
Can be faster than Spark
Compare results from EDS and Localised EDS
STARBUCKS BS16
Starbucks BS 35            1.51     McDonalds          54.0
KFC                        2.04     Primark Stores     90.9
Krispy Kreme               3.11     Lidl              105.9
The Old Fish Market Pub    4.8      Boots             120.2
IKEA                       6.74     Marks & Spencer   126.3
Summary
• We used a probabilistic graph similarity metric to derive a richer characterisation of customer behaviour
• We implemented a number of approaches to estimate it
• All had complexity/scalability challenges
• Use it yourselves and share what you find!
Further work
• Derive the statistical properties of the EDS
• Explore methods for approximation of matrix multiplication
Data Science Section
….coming soon!

Editor's Notes

  1. Introduction to Barclays Advanced Data Analytics (ADA). ADA is a data science team which innovates, designs and builds applications that deliver, direct to customers, relevant analytical content that will help them make smart decisions to improve their lives. We aim to make applications that will revolutionise the way Barclays relates to its customers. The long-term vision is to give each of our customers the same level of engagement and support in planning their finances and their lives as they would have if they were billionaires.