TDA presentation

Some notes on using Topological Data Analysis in general and for finance.

Published in: Technology

1. TOPOLOGICAL DATA ANALYSIS • HJ van Veen · Data Science · Nubank Brasil
2. TOPOLOGY I • "When a truth is necessary, the reason for it can be found by analysis, that is, by resolving it into simpler ideas and truths until the primary ones are reached." - Leibniz
3. TOPOLOGY II • Topology is the mathematical study of topological spaces. • Topology is interested in shapes. • More specifically: the concept of 'connectedness'.
4. TOPOLOGY III • A topologist is someone who does not see the difference between a coffee mug and a donut.
5. HISTORY I • “Nothing at all takes place in the universe in which some rule of maximum or minimum does not appear.” - Euler • Seven Bridges of Königsberg: devise a walk through the city that crosses each bridge once and only once.
6. HISTORY II
7. HISTORY III • Euler's big insights: • It doesn’t matter where you start walking, only which bridges you cross. • A similar solution should be found regardless of where you start your walk. • Only the connectedness of the bridges matters. • A solution should also apply to all other bridges that are connected in a similar fashion, no matter the distances between them.
8. HISTORY IV • We now call these graph walks ‘Eulerian walks’ in Euler’s honor. • Euler's first proven graph theory theorem: • In a connected graph, an Eulerian walk exists exactly when zero or two nodes have an odd number of edges.
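The odd-degree rule on the slide above can be checked directly on the Königsberg bridges. The following is a minimal sketch, not part of the original deck: the land-mass labels A to D and the helper names are illustrative assumptions, and the graph is stored as a plain list of edges so that parallel bridges are kept.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels A-D are illustrative; A stands for the central island.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def odd_degree_nodes(edges):
    """Return the nodes that touch an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def is_connected(edges):
    """Check that every node can be reached from any other node."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(neighbours)

def has_eulerian_walk(edges):
    """Apply the rule: connected, and zero or two odd-degree nodes."""
    return is_connected(edges) and len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(BRIDGES))   # all four land masses have odd degree
print(has_eulerian_walk(BRIDGES))  # so no walk crosses each bridge exactly once
```

Adding one hypothetical extra bridge between B and C leaves only two odd-degree nodes, and the same function then reports that a walk exists.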
9. TDA I • TDA marries 300-year-old maths with modern data analysis. • Captures the shape of data. • Is invariant to size, position, and pose. • Compresses large datasets. • Functions well in the presence of noise and missing variables.
10. TDA II • Capturing the shape of data. • Traditional techniques like clustering or dimensionality reduction have trouble capturing this shape.
11. TDA III • Invariance. • Euler showed that only connectedness matters. The size, position, or pose of an object doesn't change that object.
12. TDA IV • Compression. • Compressed representations use the order in data. • Only order can be compressed. • Random noise or slight variations are ignored. • Lossy compression retains the most important features. • "Now where there are no parts, there neither extension, nor shape, nor divisibility is possible. And these monads are the true atoms of nature and, in a word, the elements of things." - Leibniz
13. MAPPER I • Mapper was created by Ayasdi co-founder Gurjeet Singh during his PhD under Gunnar Carlsson. • Based on the idea of partial clustering of the data, guided by a set of functions defined on the data.
14. MAPPER II • Mapper was inspired by the Reeb graph.
15. MAPPER III • Map the data with a function and cover the function's range with overlapping intervals. • Cluster the points inside each interval. • When clusters share data points, draw an edge between them. • Color nodes by function value.
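The four Mapper steps above can be sketched in a few lines of plain Python. This is an illustrative toy, not the Python Mapper or Ayasdi implementation: the function names, the parameters `n_intervals`, `overlap` and `eps`, and the simple distance-threshold clustering (standing in for DBSCAN or K-Means) are all assumptions made for the example.

```python
import math
from itertools import combinations

def cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals over [lo, hi]; neighbours overlap by a fraction of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]

def threshold_clusters(points, ids, eps):
    """Connected components of the graph linking points at distance <= eps."""
    clusters, unseen = [], set(ids)
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, lens, n_intervals=4, overlap=0.5, eps=0.3):
    values = [lens(p) for p in points]              # step 1: map the data
    nodes = []
    for a, b in cover(min(values), max(values), n_intervals, overlap):
        ids = [i for i, v in enumerate(values) if a - 1e-9 <= v <= b + 1e-9]
        nodes.extend(threshold_clusters(points, ids, eps))   # step 2: cluster per interval
    edges = [(s, t) for s, t in combinations(range(len(nodes)), 2)
             if nodes[s] & nodes[t]]                # step 3: shared points give an edge
    colors = [sum(values[i] for i in node) / len(node) for node in nodes]  # step 4
    return nodes, edges, colors

# 24 points on a unit circle, with the x-coordinate as the function.
circle = [(math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
          for k in range(24)]
nodes, edges, colors = mapper(circle, lens=lambda p: p[0])
print(len(nodes), len(edges))
```

On the circle the resulting graph is itself a loop: every node has exactly two neighbours, which is the "shape" that the slides say Mapper is meant to capture.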
16. MAPPER IV
17. MAPPER V
    Distance_to_median(row)    x       y       z
    1.5                        1.5     1.5     1.5
    1.5                        -0.5    -0.5    -0.5
    0                          1       1       1
    0                          1       0.9     1.1
    3                          2       2       2
    3                          2.1     1.9     2
18. MAPPER VI • In conclusion:
19. FUNCTIONS • Raw features or point-cloud axes / coordinates • Statistics: mean, max, skewness, etc. • Mathematics: L2-norm, Fourier transform, etc. • Machine Learning: t-SNE, PCA, out-of-fold predictions • Deep Learning: layer activations, embeddings
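A function in this sense is anything that turns one row of data into a number. As a rough illustration of the first three categories above, here is a small sketch using only the standard library; the dictionary name `LENSES` and the three sample rows are made up for the example.

```python
import math
import statistics

def l2_norm(row):
    """Euclidean length of a row, one of the 'Mathematics' functions."""
    return math.sqrt(sum(x * x for x in row))

# Each entry maps a row (a tuple of floats) to a single number.
LENSES = {
    "first coordinate": lambda row: row[0],   # raw feature
    "mean": statistics.mean,                  # statistic
    "max": max,                               # statistic
    "L2-norm": l2_norm,                       # mathematics
}

# Hypothetical sample rows.
rows = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 2.0)]

lens_values = {name: [f(row) for row in rows] for name, f in LENSES.items()}
for name, values in lens_values.items():
    print(name, values)
```

Model-based functions such as PCA components, t-SNE coordinates, or out-of-fold predictions plug in the same way: each one only has to supply one value per row.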
20. CLUSTER ALGOS • DBSCAN / HDBSCAN: • Handles noise well. • No need to set the number of clusters. • K-Means: • Creates visually nice simplicial complexes / graphs.
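The difference between the two families shows up on a tiny example. The sketch below assumes scikit-learn and NumPy are installed; the toy data (two tight groups plus one stray point) and the parameter values `eps=0.5`, `min_samples=3` and `n_clusters=2` are illustrative choices, not settings from the talk.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two tight groups of four points each, plus one far-away stray point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, -20.0],
])

# DBSCAN: no cluster count is given; points without enough close
# neighbours are labelled -1 (noise).
db_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
n_db_clusters = len(set(db_labels) - {-1})

# K-Means: the number of clusters must be chosen up front, and every
# point, including the stray one, is assigned to some cluster.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print("DBSCAN labels:", db_labels, "clusters found:", n_db_clusters)
print("K-Means labels:", km_labels)
```

Inside Mapper, a noise label means the stray point simply does not become part of any node in that interval, whereas K-Means always produces the requested number of nodes.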
21. SOME GENERAL USE CASES • Computer Vision • Model and feature inspection • Computational Biology / Healthcare • Persistent Homology
22. COMPUTER VISION • Demo
23. MODEL AND FEATURE INSPECTION • Demo
24. COMPUTATIONAL BIOLOGY • Example
25. PERSISTENT HOMOLOGY • Example
26. SOME FINANCE USE CASES • Customer Segmentation • Transactional Fraud • Accurate Interpretable Models • Exploration / Analysis
27. CUSTOMER SEGMENTATION • Demo
28. TRANSACTIONAL FRAUD • Example of spousal fraud
29. ACCURATE INTERPRETABLE MODELS • Create: global linear model • Function: L2-norm • Color: heatmap by ground truth, animated to out-of-fold model predictions • Identify: low-accuracy subgraphs • Select: features that are most important for those subgraphs • Create: local linear models on the subgraphs • Stack: DecisionTree • Compare: Divide-and-Conquer and LIME • DEMO
30. EXPLORATION / ANALYSIS • Demo
31. QUESTIONS?
32. FURTHER READING • Google terms: • Ayasdi, Topological Data Analysis, Robert Ghrist, Gurjeet Singh, Gunnar Carlsson, Anthony Bak, Allison Gilmore, Simplicial Complex, Python Mapper. • Videos: • https://www.youtube.com/watch?v=4RNpuZydlKY • https://www.youtube.com/watch?v=x3Hl85OBuc0 • https://www.youtube.com/watch?v=cJ8W0ASsnp0 • https://www.youtube.com/watch?v=kctyag2Xi8o