
From econophysics to networks to data science: Estonian network of payments

Stephanie Rendón de la Torre
Ph.D. TALLINN UNIVERSITY OF TECHNOLOGY / Sr. Data Scientist- Swedbank
January 16th, 2020

Published in: Economy & Finance


  1. 1. Stephanie Rendón de la Torre Ph.D. TALLINN UNIVERSITY OF TECHNOLOGY / Sr. Data Scientist- Swedbank January 16th, 2020 From econophysics to networks to data science: Estonian network of payments
  2. 2. © Swedbank Program • Introduction to econophysics • Introduction to complex networks • Objectives • Data • Results • Future and current work
  3. 3. Prisoner’s dilemma For starters…
  4. 4. Econophysics • Econophysics is an interdisciplinary research field that applies the methods of physics to economic problems, e.g. describing and understanding market behaviour. • Application of physics to the fields of economics and finance. • Basic tools: statistical physics. • The term “econophysics” was coined by H. Eugene Stanley in 1995 in Kolkata. • Before the term was coined, many people from different branches of science had already applied their knowledge to economics, leading to the evolution of econophysics: the 90’s boom! • Mutual attraction between physics and economics ○ The number of physicists working on economic problems has increased dramatically in the last 15-20 years.
  5. 5. • Much better economic data now / genuine interest in complex systems. • Key year: 1973 → currencies traded in financial markets. Daily: 5.3 trillion USD; 100 days of NYSE trading = 1 day in FOREX. • Black-Scholes model – Nobel Prize. • 80s = electronic trading = data! Lots of data! • Today: Big Data, where data is the new oil.
  6. 6. Money is a gas!
  7. 7. Distribution of money in a country • Imagine money was a gas… • A few people have a lot of money and a lot of people have little money. • The Boltzmann-Gibbs distribution fits most of the data. • Income inequality is high. • Build models to explain why this is happening … “There is a great temptation to consider the exchanges of money which occur in economic interaction as analogous to the exchanges of energy which occur in physical shocks between gas molecules.” …
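The "money as a gas" picture can be illustrated with a toy kinetic exchange model. This is only a sketch under stated assumptions, not the model used in the talk: agents start with equal money, and at each step two randomly chosen agents pool their money and split the pool at a random point. The function name `simulate_money_exchange` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_money_exchange(n_agents=1000, start_money=100.0,
                            n_steps=200_000, seed=0):
    """Toy model: random pairwise exchanges that conserve total money."""
    rng = random.Random(seed)
    money = [start_money] * n_agents
    for _ in range(n_steps):
        i = rng.randrange(n_agents)
        j = rng.randrange(n_agents)
        if i == j:
            continue  # an agent cannot trade with itself
        # Pool both balances and split the pool at a uniformly random point,
        # like two gas molecules sharing their energy in a collision.
        pool = money[i] + money[j]
        money[i] = rng.random() * pool
        money[j] = pool - money[i]
    return money


money = simulate_money_exchange()
mean = statistics.mean(money)
median = statistics.median(money)
below_mean = sum(m < mean for m in money) / len(money)
print(f"mean={mean:.1f}  median={median:.1f}  share below mean={below_mean:.2f}")
```

Although every agent starts with the same amount, the balances drift towards a skewed, roughly exponential (Boltzmann-Gibbs-like) distribution: the median falls below the mean and most agents end up with less than the average.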
  8. 8. Complex networks - But before… Complex problem or complicated problem?
  9. 9. Complexity • Quantity: the amount of information needed to specify a system. • Quality: what makes the system complex; it relates to the ability to understand a system and refers to the existence of emergent properties, which appear as a consequence of the interactions of the components of the system. • Complicated ≠ Complex
  10. 10. What is a system? • A set of entities that form a unified whole through their interactions. A system is defined in terms of its boundary, which determines the components that are or are not part of the system. • Complex system: composed of many elements which may interact with each other. These interactions give rise to collective behaviours.
  11. 11. Social networks: Milgram’s experiment • “Six degrees of separation” • Small-world network • “I really do know everybody on the planet!” • References: Milgram, Psych. Today 2, 60 (1967); Dodds et al., Science 301, 827 (2003)
  12. 12. Why model networks? • Simpler representation of possibly very complex structures • Can gain insight into how networks form and how they grow • May allow mathematical derivation of certain properties • Can serve to “explain” certain properties observed in real networks • Can predict new properties or outcomes for networks that do not even exist • Can serve as benchmarks for evaluating real networks
  13. 13. Why is network analysis relevant? • Customers may influence other customers (their contacts) with their behaviour. • Which customers to focus on and which to avoid? • What can we give to customers (from the bank's perspective)? • What can we get from them? • Often ignored: the importance of the interactions a customer might have with other customers, and the inevitable influence their contacts have on one another.
  14. 14. Some applications for industry • Reduce churn → increase sales • Identify key customers → pinpoint influential customers • Additional marketing opportunities • Customer behaviours and pattern identification • Analyzing the spread of contagion (marketing “buzz” effect) • Targeted offering → offer a product to specific customers who are influential in terms of their importance in the network of connections. • Offering products and services: if A uses product/service X, how likely is it to be taken up by B? • Anti-money laundering procedures, fraud
  15. 15. Network science • Network science research in finance and economics: huge potential. • The big data era changes everything! • Network science is an active interdisciplinary research field that originated from a branch of mathematics, graph theory, and has extended in many directions. • Complex networks can be biological, technological, economic, social… • With complex networks it is possible to describe the structure of systems that are suitable to be represented as graphs. • Networks play an important role in a wide range of economic and social phenomena. The use of techniques and methods from graph theory has allowed studies of economic networks to expand knowledge of, and give insights into, financial, social and economic phenomena.
  16. 16. Objectives • To study the structure (characteristics and dynamics) of the Estonian network of payments through the analysis of different experiments that involve: – Global and local topology – Community detection – Fractal and multifractal properties • This research work presents an extensive study that contributes to the field of complex networks by adding empirical evidence from a new, unique and very interesting case study. • The first study of the economic development of a country from a complex network approach, through payments data. • Explore local, global and mesoscale structures by using known methodologies.
  17. 17. Data • Obtained from Swedbank’s databases. • The data set is unique in its kind and very interesting: ~80% of Estonian bank transactions are executed through Swedbank's system of payments; hence, this data set reasonably reproduces the structure of the Estonian economy and can be used as a proxy for it. • Domestic payments (company-to-company electronic transactions) of 2014. • 16,613 nodes, 2,617,478 payment transactions, and 43,375 links. • Nodes = Estonian companies. • Links = payments made between the companies. Two nodes are connected if there was a money transaction between them.
     Network's characteristics
     Total companies analysed (N)                     16,613
     Total number of payments analysed                2,617,478
     Total value of transactions                      3,803,462,026 *
     Average value of transaction per customer        87,600 *
     Maximum value of a transaction                   121,533 *
     Minimum value of a transaction (aggregated)      1,000 *
     Average volume of transaction per company        60
     Maximum volume of transaction per company        24,859
     Minimum volume of transaction per company        20
     * All money quantities are expressed in monetary units and not in real currencies in order to protect the confidentiality of the data set. The purpose of showing monetary units is to provide a notion of the proportions of quantities and not to show exact amounts of money.
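As a rough illustration of how payment records turn into a network (nodes = companies, links = payments between them), the sketch below aggregates individual payments into directed links. The record layout `(payer, payee, amount)`, the sample values and the helper name `build_payment_network` are all hypothetical; the actual Swedbank data and pipeline are not shown in the slides.

```python
from collections import defaultdict

# Hypothetical payment records: (payer, payee, amount in monetary units).
payments = [
    ("A", "B", 120.0),
    ("A", "B", 80.0),
    ("B", "C", 50.0),
    ("C", "A", 10.0),
    ("A", "C", 30.0),
]


def build_payment_network(records):
    """Aggregate payments into directed links.

    Returns the node set and a dict mapping (payer, payee) to
    [number of payments, total value], i.e. two possible link weights.
    """
    links = defaultdict(lambda: [0, 0.0])
    nodes = set()
    for payer, payee, amount in records:
        nodes.update((payer, payee))
        links[(payer, payee)][0] += 1       # volume: number of payments
        links[(payer, payee)][1] += amount  # value: money transferred
    return nodes, dict(links)


nodes, links = build_payment_network(payments)
print(len(nodes), "nodes,", len(links), "links")
```

Many payments between the same pair of companies collapse into a single link, which is why the number of links in a payment network can be far smaller than the number of transactions.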
  19. 19. Complex networks • Links can have weights attached to them and can be directed or undirected. • The simplest quantity observed in a network is the degree. It measures how important a node is with respect to its nearest neighbours. The degree of a node is the number of neighbours of that node and is defined as $k_i = \sum_{j \in \zeta(i)} a_{ij}$, where the sum runs over the set $\zeta(i)$ of neighbours of $i$. For example: $\zeta(i) = \{ j \mid a_{ij} = 1 \}$. • In a directed network a node has two characteristics: the number of links that end at the node and the number of links that start from it. These quantities are known as the in-degree $k^{d}$ and the out-degree $k^{o}$ of a node, respectively, and are defined as $k^{d} = \sum_{j \in \zeta(i)} a_{ij}^{d}$, $k^{o} = \sum_{j \in \zeta(i)} a_{ij}^{o}$. • Degree distribution (DD): the simplest statistical characteristic of a network, $P(k, N) = \frac{1}{N} \sum_{s=1}^{N} p(k, s, N)$, where $p(k, s, N)$ is the probability that node $s$ has degree $k$ in a network of $N$ nodes, with mean degree $\bar{k} = \sum_{k} k P(k)$. It characterizes only local properties of the network, yet even this information is sufficient to determine basic properties.
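The degree definitions can be checked on a small example. The sketch below assumes the convention that `a[i][j] = 1` means a link from node `i` to node `j`; the 4-node matrix and the helper names are made up for illustration.

```python
from collections import Counter

# Hypothetical directed network: a[i][j] = 1 if there is a link i -> j.
a = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]


def degrees(adjacency):
    """Return (in-degrees, out-degrees) of every node."""
    n = len(adjacency)
    out_deg = [sum(row) for row in adjacency]  # links starting at i
    in_deg = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]  # links ending at j
    return in_deg, out_deg


def degree_distribution(degs):
    """Empirical P(k): fraction of nodes that have degree k."""
    counts = Counter(degs)
    return {k: c / len(degs) for k, c in sorted(counts.items())}


in_deg, out_deg = degrees(a)
p_out = degree_distribution(out_deg)
mean_out = sum(k * p for k, p in p_out.items())  # mean degree = sum_k k P(k)
print(in_deg, out_deg, p_out, mean_out)
```

The mean out-degree computed from $P(k)$ equals the number of links divided by the number of nodes (here 5 / 4), which is a quick sanity check on any degree distribution.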
  20. 20. Complex networks • It is possible to categorize networks by the tails of their degree distributions. • Degree distributions of real-world networks differ from those of random networks (RN). • RN commonly show a Poisson degree distribution (PDD), while real networks might have long tails in the right part of the distribution, with values far above the mean. • The tail of the degree distribution can be measured by plotting the distribution function: $P(k) \sim k^{-\gamma}$. • This type of distribution is called scale-free: it has no natural scale.
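Besides reading the exponent off a log-log plot, $\gamma$ can be estimated directly from the data. The sketch below uses the standard continuous maximum-likelihood estimator $\hat{\gamma} = 1 + n / \sum_i \ln(x_i / x_{\min})$ on synthetic samples; this is an illustrative choice, not necessarily the fitting method used in the study, and the sample data are generated, not taken from the payment network.

```python
import math
import random


def sample_power_law(gamma, x_min, n, seed=0):
    """Draw n continuous samples with density proportional to x**(-gamma), x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]


def estimate_exponent(values, x_min):
    """Maximum-likelihood estimate of gamma for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


samples = sample_power_law(gamma=2.4, x_min=1.0, n=20_000)
gamma_hat = estimate_exponent(samples, x_min=1.0)
print(f"estimated gamma = {gamma_hat:.2f}")
```

For real degree data the values are integers and the lower cut-off has to be chosen carefully, so this continuous version should be read as a first approximation.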
  21. 21. Adjacency matrix • Square-matrix representation of the network. • For a simple network with node set $N$, the adjacency matrix $A$ is a square $|N| \times |N|$ matrix such that its element $a_{ij}$ is one when there is a link from node $i$ to node $j$, and zero when there is no link. • Networks can be undirected or directed, weighted or unweighted: for a directed network $A$ is asymmetric, for an undirected network it is symmetric. • Translate everything into the MATRIX!
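A minimal sketch of "translating everything into the matrix": the helper below (a hypothetical name, with a made-up three-node link list) builds the adjacency matrix from a list of links and shows that the directed version is asymmetric while the undirected one is symmetric.

```python
def adjacency_matrix(nodes, links, directed=True):
    """Build a 0/1 adjacency matrix; element [i][j] is 1 for a link i -> j."""
    index = {node: i for i, node in enumerate(sorted(nodes))}
    n = len(index)
    matrix = [[0] * n for _ in range(n)]
    for source, target in links:
        matrix[index[source]][index[target]] = 1
        if not directed:
            # An undirected link is stored in both directions.
            matrix[index[target]][index[source]] = 1
    return matrix


def is_symmetric(matrix):
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Hypothetical payment links: A pays B, B pays C, C pays A.
links = [("A", "B"), ("B", "C"), ("C", "A")]
directed_a = adjacency_matrix("ABC", links, directed=True)
undirected_a = adjacency_matrix("ABC", links, directed=False)
print(is_symmetric(directed_a), is_symmetric(undirected_a))
```

A dense list-of-lists matrix is fine for a toy example; for a network with thousands of nodes and comparatively few links, a sparse representation would be the more practical choice.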
  22. 22. Results • This network has scale-free (SF) properties (degree distribution, and statistical distributions of the community structures: size, overlap and membership distributions). • The power-law tail signals that the probability of finding companies paying out very large quantities of money is small. Moreover, while companies have absolute freedom in choosing how much money to pay to the counterparties with whom they interact, the overall system obeys a scaling law, which is a particular property of critical phenomena in highly interactive self-organized systems. • Small-world (7 degrees of separation). Low clustering coefficient (0.18). • Disassortative (resiliency: high-degree nodes connect with low-degree nodes). • Figure captions: (a) Degree distribution for the connectivity network of the Estonian network of payments; the x-axis is the degree $k$ and the y-axis is $P(k)$; $P(k) \sim k^{-2.4}$. (b) Out-degree distribution of the network, $P(k) \sim k^{-2.39}$. (c) In-degree distribution, $P(k) \sim k^{-2.49}$. Node out-degree distribution by strength, $P(k) \sim k^{-2.3}$. (d) Link weight distribution (volume: number of payments transacted). Axis label: probability that a company has $k$ outgoing links.
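The two small-world indicators quoted above (clustering coefficient and degrees of separation) can be computed as follows. This is a sketch on a made-up four-node undirected graph, using the average local clustering coefficient and breadth-first search for shortest paths; the study's exact definitions may differ.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: a triangle A-B-C with a pendant node D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def average_clustering(g):
    """Mean over nodes of (links among neighbours) / (possible such links)."""
    total = 0.0
    for neighbours in g.values():
        k = len(neighbours)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        linked = sum(1 for u, v in combinations(neighbours, 2) if v in g[u])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(g)


def average_shortest_path(g):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in g:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in g[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


avg_clustering = average_clustering(graph)
avg_path = average_shortest_path(graph)
print(f"clustering = {avg_clustering:.3f}, average path length = {avg_path:.3f}")
```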
  23. 23. Results • Robustness tests: centralities and collective influencer nodes (Morone & Makse) – I found the nodes that prevent the network from breaking into disconnected components (percolation threshold = 8%). – The most influential companies in the network are not necessarily those with the most economic activity. Only a small number of companies maintain the unity of the network. – Scale-free network (SFN): the tests showed the robustness of the network against random attacks but also revealed its vulnerability to targeted attacks. – Financial systems show this pattern: the system collapses when a few nodes collapse. • Figure: plots of the effect of the targeted and random removal of nodes from the network of payments. (a) The average shortest-path length $\langle l \rangle$ in the giant connected component (GCC) plotted against the percentage of removed nodes. (b) The GCC plotted against the percentage of removed nodes. Continuous lines display the effect of the targeted removal of nodes and dashed lines display the effect of the random removal of nodes. $P_c$ are the percolation thresholds for each case.
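The contrast between targeted and random removal can be sketched on a toy hub-dominated graph. Note the assumptions: nodes are targeted here simply by degree, which is a stand-in for the centrality and Collective Influence rankings mentioned above, and the two-level tree below is a made-up example, not the payment network.

```python
import random


def two_level_tree(branches=5, leaves=5):
    """Hypothetical hub-dominated graph: root -> sub-hubs -> leaves."""
    graph = {0: set()}
    next_id = 1
    for _ in range(branches):
        hub = next_id
        next_id += 1
        graph[hub] = {0}
        graph[0].add(hub)
        for _ in range(leaves):
            leaf = next_id
            next_id += 1
            graph[leaf] = {hub}
            graph[hub].add(leaf)
    return graph


def giant_component_size(graph, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


graph = two_level_tree()
n_remove = 6  # roughly 20% of the 31 nodes

# Targeted attack: remove the highest-degree nodes first.
targeted_nodes = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:n_remove]
# Random failure: remove the same number of nodes chosen at random.
random_nodes = random.Random(0).sample(sorted(graph), n_remove)

gcc_targeted = giant_component_size(graph, targeted_nodes)
gcc_random = giant_component_size(graph, random_nodes)
print("targeted:", gcc_targeted, "random:", gcc_random)
```

Removing the six best-connected nodes shatters this toy graph into isolated nodes, while random removal mostly hits leaves and typically leaves a much larger connected piece, which mirrors the robust-yet-fragile pattern described for scale-free networks.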
  24. 24. Results • First multifractal analysis (MFA) of a complex network of payments in which specific fractal and multifractal (MF) properties were studied. • One can study a simplified version of the network (the skeleton network, SkN) and still capture the general structure of the original network. • The SkN had a slightly smaller fractal dimension (FD) than the original network, and the two were very similar: the SkN preserves the structure while reducing complexity. • Fractal scaling analysis by estimating the FD of the network and of its skeleton; then a study of MF behaviour using a sandbox algorithm to calculate the spectra of $D(q)$ and $\tau(q)$. • $N_B(r_B) \sim r_B^{-d_B}$, where $N_B$ is the minimum number of boxes needed to tile the network, $r_B$ is the lateral size of the box (radius) and $d_B$ is the FD, i.e. the absolute value of the slope of the linear fit. • Figures: graph representation of the skeleton of the Estonian network of payments; fractal scaling representation of the network, showing the original network (o) and the skeleton network (●). The straight line is included for guidance and has a slope of 2.3. $d_B = 2.32$ (skeleton) and $d_B = 2.39$ (original network).
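The statement that the fractal dimension is the absolute value of the slope of the linear fit can be made concrete with a least-squares fit in log-log space. The box counts below are synthetic values generated from an exact power law with exponent 2.3; they are not the measured data, and the box-covering step that would produce real $N_B(r_B)$ values is not shown.

```python
import math


def fractal_dimension(box_sizes, box_counts):
    """Least-squares slope of log N_B versus log r_B; returns |slope| = d_B."""
    xs = [math.log(r) for r in box_sizes]
    ys = [math.log(n) for n in box_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return abs(slope)


# Synthetic box counts obeying N_B = 16000 * r_B**(-2.3) exactly (assumed data).
box_sizes = [1, 2, 3, 4, 5, 6]
box_counts = [16000 * r ** -2.3 for r in box_sizes]

d_b = fractal_dimension(box_sizes, box_counts)
print(f"d_B = {d_b:.2f}")
```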
  25. 25. Results • Study of the general multifractal (MF) structure and exploration of statistical measures: sandbox algorithm. • The fixed-size box-counting algorithm, adapted for networks, is one of the most efficient and best-known algorithms for multifractal analysis (MFA). • I calculated the spectra of the mass exponents (ME) and generalized fractal dimensions (GFD). Result: the Estonian economy is MF. • The $D(q)$ spectra take large values, which means that the distribution of links in the network is quite irregular, suggesting that there are hubs contrasting with other nodes holding few links: a few companies have the role of hubs, while the rest are just small participants. This structure could be relevant when specific critical events occur in the economy that could threaten the whole network. • The multifractality of a complex network can be determined from the shape of the $\tau(q)$ or $D(q)$ curves. If $\tau(q)$ is a straight line or $D(q)$ is a constant, then the network is monofractal; if $D(q)$ or $\tau(q)$ have convex shapes, then the network is MF. • $D(q)$ decreases sharply after $q = -4$: high densities around the hubs (an interesting feature). • Figure: (a) plot of the mass exponents $\tau(q)$ as a function of $q$; (b) plot of the generalized fractal dimensions $D(q)$ as a function of $q$. Curves indicated by circles represent numerical estimations of the mass exponents and generalized fractal dimensions, respectively.
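For reference, the two curves are linked by a standard relation, which explains why a constant $D(q)$ and a straight-line $\tau(q)$ are equivalent signs of monofractality. The sandbox estimator is written here in its commonly used form for networks; the symbols $M(r)$ (number of nodes inside a sandbox of radius $r$), $M(0)$ (total number of nodes) and $d$ (network diameter) are assumptions taken from the general literature, not from the slides.

```latex
% Sandbox estimate of the generalized fractal dimensions (assumed notation)
\[
  D(q) = \lim_{r \to 0}
         \frac{\ln \left\langle \left[ M(r)/M(0) \right]^{q-1} \right\rangle}
              {(q-1)\,\ln (r/d)},
  \qquad q \neq 1 ,
\]
% Mass exponents follow from the generalized dimensions
\[
  \tau(q) = (q-1)\,D(q) .
\]
% Monofractal case: D(q) = D_0 for all q, so tau(q) is linear in q
\[
  D(q) \equiv D_0 \;\Longrightarrow\; \tau(q) = (q-1)\,D_0 .
\]
```

In practice the limit is replaced by a linear regression of $\ln \langle [M(r)/M(0)]^{q-1} \rangle$ against $(q-1)\ln(r/d)$ over a range of radii.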
  26. 26. Results • Communities: networks have sections in which the nodes are more densely connected to each other than to the rest of the nodes in the network. Graph partitioning process. • Locating communities makes the network easier to study and understand, and provides insights by revealing relevant groups of nodes, creating meaningful classifications, discovering similarities, etc. • I studied the overlapping community structures by using the clique percolation method (CPM). Output: features for predictive analytics, targeted campaigns or segmentation models. • Figure: visual representation of a section of the overlapping network of communities (Estonian network of payments). The circles (nodes) represent communities and the black lines between them represent nodes shared between communities.
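A minimal sketch of the clique percolation idea for clique size k = 3: triangles that share two nodes are merged into the same community, so a node sitting in two unconnected groups of triangles ends up in two overlapping communities. The brute-force triangle search and the eight-link example graph are illustrative assumptions; the study's actual CPM settings and software are not stated in the slides.

```python
from itertools import combinations


def clique_percolation_k3(graph):
    """Communities as unions of triangles chained through shared links (k = 3)."""
    nodes = sorted(graph)
    # Brute-force triangle search; fine for a toy graph only.
    triangles = [frozenset(t) for t in combinations(nodes, 3)
                 if t[1] in graph[t[0]] and t[2] in graph[t[0]]
                 and t[2] in graph[t[1]]]

    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two triangles are adjacent when they share k - 1 = 2 nodes.
    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)

    communities = {}
    for i, triangle in enumerate(triangles):
        communities.setdefault(find(i), set()).update(triangle)
    return list(communities.values())


# Hypothetical undirected graph given as a list of links.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

communities = clique_percolation_k3(graph)
membership_of_d = sum("D" in c for c in communities)
print(sorted(sorted(c) for c in communities), membership_of_d)
```

Node D belongs to both communities here, which is exactly the kind of overlap that hard partitioning methods cannot represent.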
  27. 27. Results • Figure panels: cumulative distribution of community degrees $d$; cumulative distribution function of the membership number $m_i$; cumulative distribution function of the overlap size $s_o$; cumulative community size distribution at different times $t$. • Membership number: $m_i$ is the number of communities to which node $i$ belongs. Max $m$ = 10 (a company can belong to a maximum of 10 different communities). The number of nodes that belong to many different communities is quite small, while the number of nodes belonging to at least one is high. • Community size: probability of a community having a size greater than or equal to $s$. The scaling tail is higher as $t$ increases. Many small communities coexist with few large communities. • Community degree: in the network of overlapping communities, links = overlaps and nodes = communities; the community degree is the number of links. Community degrees at the end of the tail: biggest customers. The central part decays faster. Observable curvature in the log-log plot. No approximation method fitted the distribution. $k_{\max} = 63$. • Overlap size: the range in which communities overlap with each other. The overlap size is the number of nodes that two communities share. $P(s_o)$ is the proportion of overlaps larger than $s_o$. The largest overlap size is 22; at $s_o \geq 9$ the number of overlapping nodes becomes small. • GDP growth!
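The cumulative distributions in these panels all answer the same question: what fraction of observations is at least as large as a given value? A short sketch with made-up community sizes (not the measured ones) shows the computation.

```python
def cumulative_distribution(values):
    """P(X >= x) for every distinct observed value x."""
    n = len(values)
    return {x: sum(1 for v in values if v >= x) / n
            for x in sorted(set(values))}


# Hypothetical community sizes: many small communities, one large one.
community_sizes = [3, 3, 4, 5, 5, 5, 12]
ccdf = cumulative_distribution(community_sizes)
for size, probability in ccdf.items():
    print(f"P(size >= {size}) = {probability:.3f}")
```

Plotting these values on log-log axes is what reveals a scaling tail (a straight line) or a curvature such as the one noted for the community degrees.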
  28. 28. Further work • Community structure: investigate whether the similarities in communities’ features amongst different complex networks arise randomly or whether there are unknown properties shared by all of them. • Predicting changes in a payment network through community detection analysis. Further applications: strengthening relationships between companies of the same community to improve the performance of the whole network, targeted marketing, identification of patterns between companies and tracking of suspicious activities. • Multifractality: potential factors that drive the strength of the multifractal spectrum. Some applications: studying the origin of such factors. • Patterns and changes of the multifractal spectrum during financial crisis periods, for risk pattern recognition purposes, using different probability measures. • Building network models to forecast country money flows or potential industry growth trends based on transactions.
  29. 29. Current work in progress at the bank: data science topics, AI, machine learning… • Propensity models: machine learning • Community networks: identifying customers with similar needs (clustering) • Chatbot: NLP, text analysis • Payment classifiers • Recommender systems • Cash flow predictions: TensorFlow • Tools: open-source software: Scala, Python, PySpark, etc.
  30. 30. List of publications • Rendón de la Torre S., Kalda J., Kitt R., Engelbrecht J. (2016). On the topologic structure of economic complex networks: Empirical evidence from large scale payment network of Estonia. Chaos, Solitons & Fractals, 90, 18−27. DOI: 10.1016/j.chaos.2016.01.018. • Rendón de la Torre S., Kalda J., Kitt R., Engelbrecht J. (2017). Fractal and multifractal analysis of complex networks: Estonian network of payments. The European Physical Journal B, 90. DOI: 10.1140/epjb/e2017-80214-5. • Rendón de la Torre S., Kalda J. (2018). Review of structures and dynamics of economic complex networks: Large scale payment network of Estonia. In: Chen Z., Dehmer M., Emmert-Streib F., Shi Y. (eds.), Modern and interdisciplinary problems in network science. Taylor & Francis CRC Group, USA, 193-226. https://www.crcpress.com/Modern-and-Interdisciplinary-Problems-in-Network-Science-A-Translational/Chen-Dehmer-Emmert-Streib-Shi/p/book/9780815376583. • Rendón de la Torre S., Kalda J., Kitt R., Engelbrecht J. (2019). Detecting overlapping community structure: Estonian network of payments. Proceedings of the Estonian Academy of Sciences, 68(1), 79-88. DOI: 10.3176/proc.2019.1.08. • Rendón de la Torre S., Kalda J., Kitt R. (2019). Specific statistical properties of the strength of links and nodes of the Estonian network of payments. Proceedings of the Estonian Academy of Sciences. Manuscript (in press).
  31. 31. Thank you!
