Algorithms for Big Data: Graphs and Memory Errors 1 (Lecture by Giuseppe Italiano)
 


The first part of my lectures will be devoted to the design of practical algorithms for very large graphs. The second part will be devoted to algorithms resilient to memory errors. Modern memory devices may suffer from faults, where some bits may arbitrarily flip and corrupt the values of the affected memory cells. Such faults may seriously compromise the correctness and performance of computations, and the larger the memory usage, the higher the probability of incurring memory errors. In recent years, many algorithms for computing in the presence of memory faults have been introduced in the literature: in particular, an algorithm or a data structure is called resilient if it is able to work correctly on the set of uncorrupted values. This part will cover recent work on resilient algorithms and data structures.


    Presentation Transcript

    • Algorithms for BIG DATA: Graphs and Memory Errors Giuseppe F. Italiano Università di Roma “Tor Vergata” italiano@disp.uniroma2.it ALMADA, July-August 2013
    • Some advertising first: School on Graph Theory, Algorithms & Applications, Erice, Italy, September 8-16, 2014. Consider applying to the School!
    • BIG data. NYT, Feb 11, 2012: The Age of Big Data •  What is Big Data? A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. … A lot of people are talking about big data, but most people are just creating it. The real value is in the analysis.
    • Why BIG data? In God we trust. All others must bring data. (attributed to W Edwards Deming)
    • And all others (we) are indeed bringing data! What happens in an Internet minute! Every two days now we create as much information as we did from the dawn of civilization up until 2003 (Eric Schmidt, Google CEO)
    • Latest News…
    • How do we view BIG data?
    • BIG data: Is it the size or the network? Big Data is notable not because of its size, but because of its relationality to other data. Due to efforts to mine and aggregate data, Big Data is fundamentally networked (threaded with connections). Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself.
    • Networked BIG data? Not only social networks or Web graphs Recommendation systems: generate better song or movie suggestions (Pandora or Netflix), Data analytics, e.g., monitoring trending topics on Twitter Etc…
    • Networks •  Networks represent interaction among units. •  In the case of social and economic networks, these units (nodes) are individuals or organizations. •  At some broad level, the study of networks can encompass the study of all kinds of interactions. •  Transportation •  Communication •  Social •  Friendship / Trust / Trade / Credit and financial flows. •  Information transmission •  Web links / Information exchange / Diffusion of ideas and innovation. •  Spread of epidemics.
    • From Demetrescu et al., McGraw Hill 2004. The network of loans among financial institutions can be used to analyze the roles that different participants play in the financial system, and how the interactions among these roles affect the health of individual participants and the system as a whole. (Bech and Atalay 2008)
    • The bow-tie graph structure of the Web (Broder et al. 2000)
    • Networks. Theory of network structure and behavior addresses simultaneous challenges deriving from •  Economics: theories for strategic interaction among small numbers of parties, as well as for cumulative behavior of large, homogeneous populations. •  Sociology: some fundamental insights into the structure of social networks, but network methodology refined only in domains and scales where data collection was traditionally possible (well-defined groups with tens to hundreds of people). •  Computer Science: with the rise of the Web and social media, dealt with design constraints on large computing systems which are not only technological but also human (the complex feedback that human audiences create when humans collectively use the Web for communication, self-expression, and creation of knowledge).
    • Networks. Ability to work with massive network datasets enriched the picture: study networks with billions of interacting items at a level of resolution where each connection is recorded (this is exactly what an Internet search engine is doing!). Ongoing and challenging scientific problem to bridge these vastly different levels of scale, so that predictions and principles from one level can be reconciled with those of others.
    • Need for data analytics. Just one example (although important in many applications): “node centrality”: degree of influence or importance of a node within the social domain under consideration. One expects such importance to be reflected in the structure of the social network. How do we measure node centrality?
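The slide above asks how node centrality can be measured but does not fix a definition at this point. As a rough illustration only, here is a minimal sketch of two textbook measures, degree centrality and closeness centrality, on an undirected graph stored as a dictionary of neighbor sets. The function names and the toy star graph are assumptions made for this example; they are not taken from the lecture.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj, source):
    """Reciprocal of the average BFS distance from source to the nodes it reaches."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical toy network: a star with hub "a" and three leaves.
toy = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
print(degree_centrality(toy))
print(closeness_centrality(toy, "a"), closeness_centrality(toy, "b"))
```

Both measures rank the hub highest here. On real networks they can disagree, which is one reason several notions of centrality coexist.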
    • 15th Century Florentine Marriages Data
    • The social network of friendships within a 34-person karate club provides clues to the fault lines that eventually split the club apart (Zachary, 1977)
    • Routing in transportation networks. Road networks, point-to-point shortest paths: seconds (Dijkstra) → microseconds. A. V. Goldberg. The hub labeling algorithm. SEA 2013.
    • Internet and the WWW •  The world-wide web can be represented as a directed graph •  Web search and crawl: traversal •  Link analysis, ranking: PageRank and HITS •  Document classification and clustering •  Internet topologies (router networks) are naturally modeled as graphs
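The slide names PageRank as a link-analysis method on the directed Web graph without giving details. The following is a minimal sketch of the standard power-iteration formulation, assuming a non-empty graph given as a dictionary from each page to the list of pages it links to, with every link target also present as a key. The damping factor 0.85, the iteration count, and the toy graph are illustrative assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        for v in nodes:
            new_rank[v] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical four-page web: a -> b -> c -> a, plus d -> a.
toy_web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(toy_web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Each iteration touches every node and every link once, so the cost per iteration is linear in the size of the graph.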
    • Scientific Computing •  Reorderings for sparse solvers •  Fill-reducing orderings •  Partitioning, eigenvectors •  Heavy diagonal to reduce pivoting (matching) •  Data structures for efficient exploitation of sparsity •  Derivative computations for optimization •  Matroids, graph colorings, spanning trees •  Preconditioning •  Incomplete factorizations •  Partitioning for domain decomposition •  Graph techniques in algebraic multigrid •  Independent sets, matchings, etc. •  Support theory •  Spanning trees & graph embedding techniques. B. Hendrickson, Graphs and HPC: Lessons for Future Architectures, http://www.er.doe.gov/ascr/ascac/Meetings/Oct08/Hendrickson%20ASCAC.pdf Image source: Yifan Hu, A gallery of large graphs. Image source: Tim Davis, UF Sparse Matrix Collection.
    • Large-scale data analysis •  Graph abstractions are very useful to analyze complex data sets. •  Sources of data: petascale simulations, experimental devices, the Internet, sensor networks •  Challenges: data size, heterogeneity, uncertainty, data quality. Astrophysics: massive datasets, temporal variations. Bioinformatics: data quality, heterogeneity. Social Informatics: new analytics challenges, data uncertainty. Image sources: (1) http://physics.nmt.edu/images/astro/hst_starfield.jpg (2,3) www.visualComplexity.com
    • Data Analysis and Graph Algorithms in Systems Biology •  Study of the interactions between various components in a biological system •  Graph-theoretic formulations are pervasive: •  Predicting new interactions: modeling •  Functional annotation of novel proteins: matching, clustering •  Identifying metabolic pathways: paths, clustering •  Identifying new protein complexes: clustering, centrality. Image Source: Giot et al., A Protein Interaction Map of Drosophila melanogaster, Science 302, 1722-1736, 2003.
    • Graph-theoretic problems in social networks –  Community identification: clustering –  Targeted advertising: centrality –  Information spreading: modeling. Image Source: Nexus (Facebook application)
    • Network Analysis for Intelligence and Surveillance •  [Krebs 04] Post 9/11 Terrorist Network Analysis from public domain information •  Plot masterminds correctly identified from interaction patterns: centrality •  A global view of entities is often more insightful •  Detect anomalous activities by exact/approximate subgraph isomorphism. Image Source: http://www.orgnet.com/hijackers.html Image Source: T. Coffman, S. Greenblatt, S. Marcus, Graph-based technologies for intelligence analysis, CACM, 47 (3, March 2004): pp 45-47
    • Power of Recommendation Systems. Big Hits are now being informed by Big Data? •  Old (1990) British TV series was still popular •  Films featuring Kevin Spacey had always done well •  Movies directed by David Fincher (“The Social Network”) had a healthy share
    • Power of Recommendation Systems
    • We Need: 1. “Bigger machines” Two main ideas behind Google’s computing platform: •  Google File System (GFS), a way of distributing data across hundreds or thousands of inexpensive computers •  MapReduce, which breaks a given job into smaller pieces, sends those tasks out to the different computers, then gathers the answers in one central node. Hadoop is an open source implementation. Is this enough? MapReduce was not designed to analyze data sets threaded with connections… Google’s Pregel system was developed to work with graph structures, since MapReduce had fallen short.
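To make the MapReduce description above concrete, here is a small single-process simulation of the pattern: split the job into pieces, map each piece independently, group the intermediate results by key, then reduce. It computes node degrees from an edge list. This is only a sketch of the programming model; it does not use Hadoop or any Google API, and all function names and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: each edge (u, v) contributes one unit to the degree of u and of v."""
    for u, v in chunk:
        yield (u, 1)
        yield (v, 1)


def shuffle(pairs):
    """Group intermediate (key, value) pairs by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the contributions collected for one node."""
    return key, sum(values)


def map_reduce_degrees(chunks):
    intermediate = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different machine
        intermediate.extend(map_phase(chunk))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


# Hypothetical edge list, split into two chunks.
chunks = [[("a", "b"), ("a", "c")], [("b", "c"), ("c", "d")]]
degrees = map_reduce_degrees(chunks)
print(degrees)
```

Degree counting fits the model because each edge can be processed in isolation. Computations that must follow paths through the graph need many such rounds, which is the kind of mismatch the slide alludes to.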
    • Need for “Bigger machines” In God we trust. All others must bring data. (attributed to W Edwards Deming)
    • We Need: 2. Smarter algorithms. Need more algorithms capable of turning “meaningless” numbers into actionable insights. Collecting large amounts of statistics and numbers brings little benefit if there is no layer of added algorithmic intelligence. Detecting signals from large amounts of real, live data is much like rapidly fishing for needles in a haystack. It is like finding needles the moment they are dropped into the haystack… NSA knows about it!
    • E.g., Anomaly detection
    • We Need: 3. Faster algorithms “Progress in Algorithms Beats Moore’s Law” (from The White House advisory report 2010) Or, you cannot just throw HW at problems!: •  Linear Programming: in 20 years, speed-ups quite evenly divided between algorithms and hardware improvements. •  Sparse linear systems: in 25 years, 10^4 hardware, 10^6 algorithms. •  The N-Body Problem: in 30 years, 10^7 hardware, 10^10 algorithms. Need staggering algorithmic advances for "big data"
    • Google or Bing Maps
    • Routing in Road Networks. Typical road networks are huge: tens of millions of nodes and arcs. Getting directions with classical shortest-path algorithms (Dijkstra) will require seconds. That’s too slow! The algorithms have to run in milliseconds! A. V. Goldberg. The hub labeling algorithm. SEA 2013.
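For reference, the classical baseline mentioned on this slide can be sketched as follows: Dijkstra's algorithm with a binary heap for a single source-target query. This is the slow baseline, not the hub labeling algorithm cited above. The graph representation (a dictionary from node to a list of `(neighbor, weight)` pairs with non-negative weights) and the toy road network are assumptions made for this example.

```python
import heapq


def dijkstra(graph, source, target):
    """Return the shortest-path distance from source to target, or None if unreachable."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d  # first time the target is popped, its distance is final
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None


# Hypothetical road network with travel times as arc weights.
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 3)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}
print(dijkstra(roads, "A", "D"))
```

Each query may explore a large part of the network before reaching the target, which is why this approach takes seconds on networks with tens of millions of nodes.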
    • We’ll focus on 3. Faster algorithms. For graphs with m edges and n nodes, this means that the algorithms should run in linear time and space [O(m+n)] with a small constant factor. Quadratic time and space is too much. Constants do matter.
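As an illustration of what linear time and space means here, the sketch below counts connected components with breadth-first search over adjacency lists. It uses O(m+n) memory and looks at each node and each edge a constant number of times, whereas an adjacency matrix alone would already need n² cells. The function names and the small example are assumptions made for this illustration.

```python
from collections import deque


def build_adjacency(n, edges):
    """Adjacency lists for an undirected graph on nodes 0..n-1: O(m + n) space."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def count_components(n, edges):
    """Count connected components with BFS in O(m + n) time."""
    adj = build_adjacency(n, edges)
    seen = [False] * n
    components = 0
    for s in range(n):
        if seen[s]:
            continue
        components += 1
        seen[s] = True
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
    return components


# Hypothetical graph: a path 0-1-2, an edge 3-4, and an isolated node 5.
print(count_components(6, [(0, 1), (1, 2), (3, 4)]))
```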
    • A Methodological Break
    • In theory, theory and practice are the same. Theory
    • In practice, theory and practice are different... The real world out there…
    • Bridging the Gap between Theory and Practice? Theory is when you know something, but it doesn't work. Practice is when something works, but you don't know why. Wish to combine theory and practice… i.e., nothing works and you don't know why.
    • Disclaimer: BIG data is like teenage sex. Everyone talks about it. Nobody really knows how to do it. Everyone thinks everyone else is doing it. So everyone claims they are doing it… And like sex, the ones getting the most are smart enough not to talk about it!
    • Outline of Lectures 1.  Algorithms for BIG graphs •  The centrality of centrality •  How to store BIG Graphs (WebGraph Framework) •  Four Degrees of Separation •  Diameter and Radius 2.  Big Data and Memory Errors