The first part of my lectures will be devoted to the design of practical algorithms for very large graphs. The second part will be devoted to algorithms resilient to memory errors. Modern memory devices may suffer from faults, in which some bits arbitrarily flip and corrupt the values of the affected memory cells. Such faults may seriously compromise the correctness and performance of computations, and the larger the memory usage, the higher the probability of incurring memory errors. In recent years, many algorithms for computing in the presence of memory faults have been introduced in the literature. In particular, an algorithm or a data structure is called resilient if it is able to work correctly on the set of uncorrupted values. This part will cover recent work on resilient algorithms and data structures.
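
One simple way to tolerate bit flips, offered here only as an illustration and not necessarily the technique covered in the lectures, is to keep several copies of a value and read it back by majority vote. The sketch below assumes that at most `delta` copies can be corrupted at any time; the class name `ReliableCell` and the parameter `delta` are hypothetical.

```python
from collections import Counter


class ReliableCell:
    """Stores 2*delta + 1 copies of a value; tolerates up to delta corrupted copies."""

    def __init__(self, value, delta):
        self.delta = delta
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        # Overwrite every copy so that all replicas agree again.
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self):
        # With at most delta corrupted copies, the true value is held by
        # at least delta + 1 copies, so it is the strict majority.
        value, count = Counter(self.copies).most_common(1)[0]
        if count <= self.delta:
            raise ValueError("more than delta copies appear corrupted")
        return value


cell = ReliableCell(42, delta=2)
cell.copies[0] = 7      # simulate two memory faults
cell.copies[3] = -1
print(cell.read())
```

The price of this kind of replication is a factor of `2*delta + 1` in space and in read/write time for every protected value, which is why it is usually reserved for a small number of critical variables.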


- 1. Algorithms for BIG DATA: Graphs and Memory Errors Giuseppe F. Italiano Università di Roma “Tor Vergata” italiano@disp.uniroma2.it ALMADA, July-August 2013
- 2. Some advertising first: School on Graph Theory, Algorithms & Applications, Erice, Italy, September 8-16, 2014. Consider applying to the School!
- 3. BIG data NYT, Feb 11, 2012: The Age of Big Data • What is Big Data? A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. … A lot of people are talking about big data, but most people are just creating it. The real value is in the analysis.
- 4. Why BIG data?
- 5. Why BIG data? In God we trust.
- 6. Why BIG data? In God we trust. All others must bring data. (attributed to W Edwards Deming)
- 7. And all others (we) are indeed bringing data! What happens in an Internet minute! Every two days now we create as much information as we did from the dawn of civilization up until 2003 (Eric Schmidt, Google CEO)
- 8. Latest News…
- 9. How do we view BIG data?
- 10. BIG data: Is it the size or the network? Big Data is notable not because of its size, but because of its relationality to other data. Due to efforts to mine and aggregate data, Big Data is fundamentally networked (threaded with connections). Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself.
- 11. Networked BIG data? Not only social networks or Web graphs Recommendation systems: generate better song or movie suggestions (Pandora or Netflix), Data analytics, e.g., monitoring trending topics on Twitter Etc…
- 12. • Networks represent interaction among units. • In the case of social and economic networks, these units (nodes) are individuals or organizations. • At some broad level, the study of networks can encompass the study of all kinds of interactions. • Transportation • Communication • Social • Friendship / Trust / Trade / Credit and financial flows. • Information transmission • Web links / Information exchange / Diffusion of ideas and innovation. • Spread of epidemics. Networks
- 13. From Demetrescu et al., McGraw Hill 2004. The network of loans among financial institutions can be used to analyze the roles that different participants play in the financial system, and how the interactions among these roles affect the health of individual participants and the system as a whole. (Bech and Atalay 2008)
- 14. The bow-tie graph structure of the Web (Broder et al 2000)
- 15. Theory of network structure and behavior addresses simultaneous challenges deriving from • Economics: theories for strategic interaction among small numbers of parties, as well as for cumulative behavior of large, homogeneous populations. • Sociology: some fundamental insights into structure of social networks, but network methodology refined only in domains and scales where data-collection traditionally possible (well-defined groups with tens to hundreds of people). • Computer Science: with rise of Web and social media, dealt with design constraints on large computing systems which are not only technological but also human (complex feedback that human audiences create when humans collectively use Web for communication, self-expression, and creation of knowledge). Networks
- 16. Ability to work with massive network datasets enriched the picture: study networks with billions of interacting items at a level of resolution where each connection is recorded (this is exactly what an Internet search engine is doing!) Ongoing and challenging scientific problem to bridge these vastly different levels of scale, so that predictions and principles from one level can be reconciled with those of others. Networks
- 17. Need for data analytics Just one example (although important in many applications): “node centrality”: Degree of influence or importance of a node within the social domain under consideration One expects such importance to be reflected in the structure of the social network
- 18. Need for data analytics Just one example (although important in many applications): “node centrality”: Degree of influence or importance of a node within the social domain under consideration One expects such importance to be reflected in the structure of the social network How do we measure node centrality?
- 20. 15th Century Florentine Marriages Data
- 22. The social network of friendships within a 34-person karate club provides clues to the fault lines that eventually split the club apart (Zachary, 1977)
- 24. Routing in transportation networks. Road networks, point-to-point shortest paths: seconds (Dijkstra) → microseconds. A. V. Goldberg. The hub labeling algorithm. SEA 2013.
- 25. Internet and the WWW • The world-wide web can be represented as a directed graph • Web search and crawl: traversal • Link analysis, ranking: PageRank and HITS • Document classification and clustering • Internet topologies (router networks) are naturally modeled as graphs
- 26. Scientific Computing • Reorderings for sparse solvers • Fill reducing orderings • Partitioning, eigenvectors • Heavy diagonal to reduce pivoting (matching) • Data structures for efficient exploitation of sparsity • Derivative computations for optimization • Matroids, graph colorings, spanning trees • Preconditioning • Incomplete Factorizations • Partitioning for domain decomposition • Graph techniques in algebraic multigrid • Independent sets, matchings, etc. • Support Theory • Spanning trees & graph embedding techniques B. Hendrickson, Graphs and HPC: Lessons for Future Architectures, http://www.er.doe.gov/ascr/ascac/Meetings/Oct08/Hendrickson%20ASCAC.pdf Image source: Yifan Hu, A gallery of large graphs. Image source: Tim Davis, UF Sparse Matrix Collection.
- 27. Large-scale data analysis • Graph abstractions are very useful to analyze complex data sets. • Sources of data: petascale simulations, experimental devices, the Internet, sensor networks • Challenges: data size, heterogeneity, uncertainty, data quality. Astrophysics: massive datasets, temporal variations. Bioinformatics: data quality, heterogeneity. Social Informatics: new analytics challenges, data uncertainty. Image sources: (1) http://physics.nmt.edu/images/astro/hst_starfield.jpg (2,3) www.visualComplexity.com
- 28. Data Analysis and Graph Algorithms in Systems Biology • Study of the interactions between various components in a biological system • Graph-theoretic formulations are pervasive: • Predicting new interactions: modeling • Functional annotation of novel proteins: matching, clustering • Identifying metabolic pathways: paths, clustering • Identifying new protein complexes: clustering, centrality Image Source: Giot et al., A Protein Interaction Map of Drosophila melanogaster, Science 302, 1722-1736, 2003.
- 29. Graph-theoretic problems in social networks – Community identification: clustering – Targeted advertising: centrality – Information spreading: modeling Image Source: Nexus (Facebook application)
- 30. Network Analysis for Intelligence and Surveillance • [Krebs 04] Post 9/11 Terrorist Network Analysis from public domain information • Plot masterminds correctly identified from interaction patterns: centrality • A global view of entities is often more insightful • Detect anomalous activities by exact/approximate subgraph isomorphism. Image Source: http://www.orgnet.com/hijackers.html Image Source: T. Coffman, S. Greenblatt, S. Marcus, Graph-based technologies for intelligence analysis, CACM, 47 (3, March 2004): pp 45-47
- 31. n Old (1990) British TV series was still popular n Films featuring Kevin Spacey had always done well n Movies directed by David Fincher, (“The Social Network”) had a healthy share Big Hits are now being informed by Big Data? Power of Recommendation Systems
- 32. Power of Recommendation Systems
- 33. We Need: 1. “Bigger machines” Two main ideas behind Google’s computing platform: • Google File System (GFS), a way of distributing data across hundreds or thousands of inexpensive computers • MapReduce, which breaks a given job into smaller pieces, sends those tasks out to the different computers, then gathers the answers in one central node. Hadoop is an open source implementation. Is this enough? MapReduce was not designed to analyze data sets threaded with connections… Google’s Pregel system was developed to work with graph structures, since MapReduce had fallen short.
- 34. Need for “Bigger machines” In God we trust. All others must bring data. (attributed to W Edwards Deming)
- 35. We Need: 2. Smarter algorithms Need more algorithms capable of turning “meaningless” numbers into actionable insights. Collecting large amounts of statistics and numbers brings little benefit if there is no layer of added algorithmic intelligence. Detecting signals from large amounts of real, live data is much like rapidly fishing for needles in a haystack. It is like finding needles the moment they are dropped into the haystack… NSA knows about it!
- 36. E.g., Anomaly detection
- 37. We Need: 3. Faster algorithms “Progress in Algorithms Beats Moore’s Law” (from The White House advisory report 2010) Or, you cannot just throw HW at problems!: • Linear Programming: in 20 years, speed-ups quite evenly divided between algorithms and hardware improvements. • Sparse linear systems: in 25 years, 10^4 hardware, 10^6 algorithms. • The N-Body Problem: in 30 years, 10^7 hardware, 10^10 algorithms. Need staggering algorithmic advances for "big data"
- 38. Google or Bing Maps
- 39. Routing in Road Networks Typical road networks are huge: tens of millions of nodes and arcs. Getting directions with classical shortest paths algorithms (Dijkstra) will require seconds. That’s too slow! The algorithms have to run in milliseconds!
- 40. Routing in Road Networks Typical road networks are huge: tens of millions of nodes and arcs. Getting directions with classical shortest paths algorithms (Dijkstra) will require seconds. That’s too slow! The algorithms have to run in milliseconds! A. V. Goldberg. The hub labeling algorithm. SEA 2013.
- 41. We’ll focus on 3. Faster algorithms For graphs with m edges and n nodes, this means that the algorithms should run in linear time and space [O(m+n)] with low constant factors. Quadratic time and space is too much. Constants do matter.
- 42. A Methodological Break
- 43. In theory, theory and practice are the same. Theory
- 44. In practice, theory and practice are different... The real world out there…
- 45. Bridging the Gap between Theory and Practice? Theory is when you know something, but it doesn't work. Practice is when something works, but you don't know why. Wish to combine theory and practice… i.e., nothing works and you don't know why.
- 46. Disclaimer
- 47. Disclaimer BIG data is like teenage sex
- 48. Disclaimer BIG data is like teenage sex Everyone talks about it
- 49. Disclaimer BIG data is like teenage sex Everyone talks about it Nobody really knows how to do it
- 50. Disclaimer BIG data is like teenage sex Everyone talks about it Nobody really knows how to do it Everyone thinks everyone else is doing it
- 51. Disclaimer BIG data is like teenage sex Everyone talks about it Nobody really knows how to do it Everyone thinks everyone else is doing it So everyone claims they are doing it…
- 52. Disclaimer BIG data is like teenage sex Everyone talks about it Nobody really knows how to do it Everyone thinks everyone else is doing it So everyone claims they are doing it… And like sex, the ones getting the most are smart enough not to talk about it!
- 53. Outline of Lectures 1. Algorithms for BIG graphs • The centrality of centrality • How to store BIG Graphs (WebGraph Framework) • Four Degrees of Separation • Diameter and Radius 2. Big Data and Memory Errors
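
Slide 18 asks how node centrality can be measured. As a minimal sketch, here are two common textbook measures, degree centrality and closeness centrality, computed on a small undirected graph. The toy graph and the function names are assumptions made for illustration; the slides do not say which measures the lectures adopt.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}


def closeness_centrality(adj):
    """(Number of reachable nodes) / (sum of BFS distances to them)."""
    result = {}
    for source in adj:
        # Plain breadth-first search from `source` on the unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical toy graph: a star centered at "a" plus the edge d-e.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a", "e"],
    "e": ["d"],
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(max(degree, key=degree.get), max(closeness, key=closeness.get))
```

Closeness as written runs one BFS per node, so it costs O(n(m+n)) overall. That is already too slow for graphs with billions of edges, which is one reason approximation becomes attractive at that scale.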
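
The MapReduce pattern described on slide 33 (split the job, process the pieces independently, gather the answers) can be imitated on a single machine. The sketch below counts node degrees from an edge list; the helper names `map_phase`, `shuffle` and `reduce_phase` are hypothetical and do not correspond to the API of Hadoop or of any other framework.

```python
from collections import defaultdict


def map_phase(edge_chunk):
    """Emit (node, 1) for each endpoint of each edge in one chunk."""
    for u, v in edge_chunk:
        yield u, 1
        yield v, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single answer."""
    return key, sum(values)


edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
chunks = [edges[:2], edges[2:]]  # pretend each chunk lives on a different machine
emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
degrees = dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())
print(degrees)
```

Degree counting fits this model because every edge can be processed on its own. Computations that follow paths through the graph typically need many such rounds, one per hop, which is one way to read the slide's remark that MapReduce falls short on data threaded with connections.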
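
Slides 39 to 41 use Dijkstra's algorithm as the classical baseline for routing. For reference, a minimal point-to-point version with a binary heap is sketched below. This is only the baseline, not the hub labeling algorithm cited on slide 40, and the toy road network is invented for illustration. The adjacency-list representation takes O(m+n) space, in line with slide 41.

```python
import heapq


def shortest_path_length(adj, source, target):
    """Dijkstra with a binary heap; stops as soon as `target` is settled."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was found already
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None  # target is not reachable from source


# Hypothetical toy road network: node -> list of (neighbor, travel time).
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 3)],
    "D": [],
}
print(shortest_path_length(roads, "A", "D"))
```

With a binary heap the running time is O((m+n) log n) per query. On a network with tens of millions of nodes, that per-query cost is what the slides describe as taking seconds.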
