IC05 Lecture 4

Measurement(s) of dynamic phenomena on the web: theory(ies), model(s), experiment(s), interfaces


  1. IC 05 / spring semester 2008. Franck Ghitalla, TSH Department, President of WebAtlas [email_address]. Measurement(s) of dynamic phenomena on the web: theory(ies), model(s), experiment(s), interfaces
  2. Temporal patterns, Topic Detection and Tracking, network and human dynamics… 1) Some bibliographic landmarks
  3. A.-L. Barabási, Nature, 2005.
  4. A.-L. Barabási, Physics, 2005.
  5. Kumar, Raghavan, Novak, Tomkins, WWW3 conference, 2003.
  6. “Beyond serving as online diaries, weblogs have evolved into a complex social structure, one which is in many ways ideal for the study of the propagation of information. As weblog authors discover and republish information, we are able to use the existing link structure of blogspace to track its flow. Where the path by which it spreads is ambiguous, we utilize a novel inference scheme that takes advantage of data describing historical, repeating patterns of ‘infection.’ Our paper describes this technique as well as a visualization system that allows for the graphical tracking of information flow.” E. Adar, Lada A. Adamic, WebIntelligence Conference, 2005.
  7. Abstract: “A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a ‘burst of activity,’ with certain features rising sharply in frequency as the topic emerges. The goal of the present work is to develop a formal approach for modeling such ‘bursts,’ in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.” J. Kleinberg, 8th ACM SIGKDD international conference on Knowledge discovery and data mining, 2002.
  8. Temporal patterns, Topic Detection and Tracking, network and human dynamics… 2) Modeling temporal phenomena on the web
  9. Four components (1 to 4): articulating the TYPES of temporality (information ON and IN the net); Topic Detection and Tracking (TDT); dynamics of networks (temporal patterns); articulating the LEVELS of temporality (global / local dynamics). Operational model. Design of the measurement system(s). Producing/verifying hypotheses. Optimizing/profiling the capture and processing systems. Semiological question(s) of visualization and the challenge of spatializing temporal phenomena.
  10. 2-1) Articulating the TYPES of temporality (information ON and IN the net). A contemporary concern: telephony, cryptography, the IPv6 standard and ad hoc networks… and now the web, at different scales. Extracting meaningful structures from information streams: the field of TDT (Topic Detection and Tracking). A topic in a stream of documents: activity around the topic builds up, then falls off. Time as order (an ordering principle). BUT a distinction must be made between “structural events” (network dynamics) and the propagation model (epidemiological and/or viral) of diffusion or flows. Information IN and ON the Net. IN and hypertext topology: “Any local change in the network topology can be obtained through a combination of four elementary processes: addition and removal of a node and addition or removal of an edge.” / growth, preferential attachment as dynamic rules. ON and information propagation: models of viral circulation / the network topology as vector. Epidemiology, rumor, diffusion of innovation.
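
A minimal Python sketch of the “growth, preferential attachment” rules named on slide 10. It is an illustration under assumptions that are not from the course: the function name `preferential_attachment`, the parameter `m` (links created by each new node) and the two-node seed graph are all hypothetical choices. Each step adds one node (growth) and links it to existing nodes chosen with probability proportional to their current degree.

```python
import random


def preferential_attachment(n_nodes, m=1, seed=0):
    """Grow a graph node by node; return its list of (new, old) edges.

    Assumption: the graph starts from a single edge between nodes 0 and 1.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects a node proportionally to its degree.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        chosen = set()
        while len(chosen) < min(m, new):
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            edges.append((new, old))      # elementary process: add an edge
            endpoints.extend((new, old))
    return edges


def degrees(edges):
    """Count how many edges touch each node."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


if __name__ == "__main__":
    deg = degrees(preferential_attachment(200, m=2))
    print(sorted(deg.values(), reverse=True)[:5])  # a few high-degree hubs
```

Only two of the four elementary processes (adding a node, adding an edge) are used here; removals are left out of this sketch.
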
  11. 2-2) Articulating the LEVELS of temporality (global / local dynamics). Theoretical and technical obstacles: the intrinsic temporality of network objects / the temporality of the phenomenon under study (weak-signal detection, “background” movement, organization of actors…) / the temporality of the measurements / theoretical models of History. Example: when (and what) should be probed? How regularly, and for what results? Methodological property: mapping = making the dynamic static; measuring dynamic phenomena = introducing time into the static / the back-and-forth between static and dynamic.
  12. 2-3) Topic Detection and Tracking (TDT). TOPIC DETECTION AND TRACKING: “time series” / queueing theory. Data elements are a function of time: D = {(t_1, y_1), (t_2, y_2), …, (t_n, y_n)}. Signal theory (frequency / amplitude or intensity) applied to text mining. Two-state measure (at its simplest) relative to a threshold. Multi-state measure: choice of indicator type, definition of scales. TEMPORAL PATTERNS: equal / non-equal time steps; linear (cycles) / non-linear patterns (but non-chaotic).
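
As an illustration of the simplest case on slide 12, the sketch below treats a series D as a list of (t, y) pairs and applies a two-state measure relative to a threshold. The function names and the sample values are hypothetical, chosen only for the example.

```python
def two_state_signal(series, threshold):
    """Map each (t, y) to (t, 1) when y is above the threshold, else (t, 0)."""
    return [(t, 1 if y > threshold else 0) for t, y in series]


def active_intervals(series, threshold):
    """Return (start, end) times of maximal runs above the threshold."""
    intervals = []
    start = None
    last_t = None
    for t, y in series:
        if y > threshold and start is None:
            start = t                          # a run begins
        elif y <= threshold and start is not None:
            intervals.append((start, last_t))  # the run ended at the previous step
            start = None
        last_t = t
    if start is not None:
        intervals.append((start, last_t))      # run still open at the end
    return intervals


# Hypothetical counts of documents mentioning a topic at each time step.
D = [(1, 2), (2, 5), (3, 7), (4, 1), (5, 6)]
print(two_state_signal(D, threshold=4))  # [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1)]
print(active_intervals(D, threshold=4))  # [(2, 3), (5, 5)]
```

A multi-state measure would replace the single threshold with several cut points, which is where the choice of indicators and scales mentioned on the slide comes in.
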
  13. 2-3) Topic Detection and Tracking (TDT). Hierarchical Structure and E-mail Streams (excerpt, J. Kleinberg, 2002): “all the mail I sent and received during this period, unfiltered by content but excluding long files. It contains 34344 messages in UNIX mailbox format, totaling 41.7 megabytes of ascii text, excluding message headers. Subsets of the collection can be chosen by selecting all messages that contain a particular string or set of strings; this can be viewed as an analogue of a ‘folder’ of related messages, although messages in the present case are related not because they were manually filed together but because they are the response set to a particular query. To give a qualitative sense for the kind of structure one obtains, Figures 2 and 3 show the results of computing bursts for two different queries using the automaton A2. Figure 2 shows an analysis of the stream of all messages containing the word ‘ITR,’ which is prominent in my e-mail because it is the name of a large National Science Foundation program for which my colleagues and I wrote two proposals in 1999-2000.”
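
The excerpt on slide 13 mentions bursts computed with a two-state automaton. As a rough illustration only, and not the paper's implementation, the sketch below runs a Viterbi-style search over two states on a list of inter-arrival gaps. It assumes exponentially distributed gaps, a base state with rate n/T, a burst state with rate s·n/T, and a cost of gamma·ln(n) for entering the burst state; the names `s` and `gamma` and their default values are assumptions made for this example.

```python
import math


def two_state_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap with 0 (base state) or 1 (burst state)."""
    n = len(gaps)
    total = sum(gaps)
    rates = (n / total, s * n / total)  # assumed base and burst arrival rates
    up_cost = gamma * math.log(n)       # assumed price of entering the burst state

    def emit(state, gap):
        # Negative log-density of an exponential gap with the state's rate.
        return rates[state] * gap - math.log(rates[state])

    cost = [0.0, math.inf]              # the sequence starts in the base state
    back = []
    for gap in gaps:
        new_cost, pointers = [], []
        for j in (0, 1):
            # Moving up (0 -> 1) is charged; staying or moving down is free.
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + emit(j, gap))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


# Hypothetical stream: sparse messages, a dense episode, then sparse again.
gaps = [10] * 5 + [1] * 10 + [10] * 5
print(two_state_bursts(gaps))
```

Raising `gamma` makes the search more reluctant to open a burst, so only longer or denser episodes are flagged.
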
  14. 2-3) Topic Detection and Tracking (TDT). Text Mining
  15. 2-4) Dynamics of networks (temporal patterns). How time is inscribed in systems: the “invisible and continuous” time of the system / the temporality of noteworthy events. Emergence: the “first event”. “The sudden jump in network property occurs at a ‘critical state’. In random network theory, this state is <K>=1. From a mostly disconnected state, the system evolves suddenly to a single connected component.” Topology evolution (universal rules?): growth; preferential attachment.
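
The “critical state” at <K>=1 quoted on slide 15 can be observed numerically. The sketch below is an illustration under stated assumptions: it uses an Erdős–Rényi style random graph in which every pair of nodes is linked with probability <K>/(n-1), and the function name, graph size and seed are hypothetical. It reports the share of nodes in the largest connected component for a few mean degrees.

```python
import random


def largest_component_fraction(n, mean_degree, seed=0):
    """Share of nodes in the largest component of a random graph."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)   # link probability giving the requested <K>
    parent = list(range(n))     # union-find forest over the nodes
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru        # merge the smaller tree into the larger
                    size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


if __name__ == "__main__":
    for k in (0.5, 1.0, 1.5, 2.0):
        print(k, round(largest_component_fraction(1000, k), 3))
```

Below <K>=1 the largest component should stay a small fraction of the graph; above it, a single component should absorb a large share of the nodes.
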
  16. 2-4) Dynamics of networks (temporal patterns). Critical states / phase transition (internal factor?): equilibrium? Feature of spontaneous order? Weak signal and predictability. Library of cases and methods for spotting rising/emerging curves. Memory and networks (potential reactivation of topologies/critical states). Robustness/vulnerability (external factor?): error and attack tolerance / planned organisation and development? Ordered / random (crystal/liquid). Connected / fragmented (percolation). Synchronized / random-phased (laser/light). What types/degrees of correlation between external factors and phase transitions? Systemic mutations.
  17. Temporal patterns, Topic Detection and Tracking, network and human dynamics… 3) Systems, interfaces, cases
  18. Temporal patterns, Topic Detection and Tracking, network and human dynamics… OBJECTIVES OF TIME SERIES VISUALIZATION(S) OR NETWORK EVOLUTION: detect and validate properties of an unknown function f; temporal behavior of data elements (When was something greatest/least? Is there a pattern? Are two series similar? Do any of the series match a pattern?); provide simpler, faster access to the series.
  19. Model the (static) topological properties of the domain (mapping). Distribute the measurement systems, process the data, provide visualization of the patterns. Have predictive models or evolution scenarios available (which presupposes having tested them on several cases) in their articulation with the mapping. Theoretical and technical obstacle: a library of cases. Example: “avian flu” as a strategic informational phenomenon. Operational model: global/local (topology, content), layer level (high/aggregates), dynamic/static phenomena. An example from strategic monitoring: “avian flu”. Context: who is talking about H5N1 on the web? In what terms? Can the topic be located on the web? Through which channels and/or opinion relays does the information spread? Can indicators be provided for a) the location, b) the density, c) the propagation of the information associated with the topic?
  20. Quantitative measurement of “noise” (Tendançologue-style). Quantitative and qualitative thematic analysis (textual content). SYNTHESIS: global/local (topology, content), layer level (high/aggregates), dynamic/static phenomena.
  21. 3 examples of systems: ThemeRiver: Visualizing Thematic Changes in Large Document Collections (Susan Havre, Elizabeth Hetzler, Paul Whitney, Lucy Nowell); Interactive Visualization of Serial Periodic Data (John Carlis, Joseph Konstan); Visual Queries for Finding Patterns in Time Series Data (Harry Hochheiser, Ben Shneiderman).
  22. ThemeRiver: Visualizing Thematic Changes in Large Document Collections. River metaphor: each attribute is mapped to a “current” in the “river”, flowing along the timeline. Current width ~= strength of theme. River width ~= global strength. Color mapping (similar themes, same color family). Comparing two rivers.
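
The width rules on slide 22 can be sketched as a stacking computation. This is a simplified illustration, not the ThemeRiver implementation: it assumes that the river is centred on the time axis and it ignores the smoothing between time steps. The function name `river_layout` and the sample strengths are hypothetical.

```python
def river_layout(strengths):
    """Compute the vertical band of each theme at each time step.

    strengths: dict mapping a theme to its list of strengths per time step.
    Returns a dict mapping each theme to a list of (lower, upper) bounds.
    """
    themes = list(strengths)
    n_steps = len(next(iter(strengths.values())))
    layout = {theme: [] for theme in themes}
    for step in range(n_steps):
        total = sum(strengths[theme][step] for theme in themes)  # river width
        lower = -total / 2.0          # assumption: river centred on the axis
        for theme in themes:
            upper = lower + strengths[theme][step]  # current width = strength
            layout[theme].append((lower, upper))
            lower = upper
    return layout


# Hypothetical theme strengths over two time steps.
bands = river_layout({"flu": [1, 2], "vaccine": [3, 2]})
print(bands["flu"])      # [(-2.0, -1.0), (-2.0, 0.0)]
print(bands["vaccine"])  # [(-1.0, 2.0), (0.0, 2.0)]
```

Each band's height is the theme's strength, and the distance between the lowest and highest bounds at a step is the global strength.
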
  23. ThemeRiver: Visualizing Thematic Changes in Large Document Collections
  24. Interactive Visualization of Serial Periodic Data. Spiral axis = serial attributes. Radii = periodic attributes. Period = 360°. Focus on pure serial periodic data (equal durations of cycles). Simultaneous display of serial and periodic attributes (e.g. seasonality). Traditional layouts exaggerate distance across period boundaries. Focus+Context / Zoom unsuitable. Example: chimpanzees, monthly food consumption 1980-1988.
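
The mapping on slide 24 (one period = 360°, serial position along the spiral) can be written in a few lines. The sketch assumes an Archimedean spiral, and the parameter names `ring_gap` and `inner_radius` are illustrative choices, not taken from the paper.

```python
import math


def spiral_position(index, period, ring_gap=1.0, inner_radius=1.0):
    """Place the observation number `index` on a spiral.

    One full turn (360 degrees) corresponds to one period, so observations
    sharing the same phase (for example the same month) fall on the same spoke.
    """
    turns = index / period
    angle = 2 * math.pi * turns                # periodic attribute
    radius = inner_radius + ring_gap * turns   # serial attribute
    return radius * math.cos(angle), radius * math.sin(angle)


if __name__ == "__main__":
    # Monthly data: January of year 1 and January of year 2 share a spoke.
    print(spiral_position(0, period=12))
    print(spiral_position(12, period=12))
```

Because neighbouring months stay adjacent across the year boundary, December and the following January are drawn next to each other, which is the point made on the slide about traditional layouts.
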
  25. Interactive Visualization of Serial Periodic Data. 12 common food types. Consistent ordering. Boundary lines. Helpful? 112 food types. Multiple linked spirals: 2 chimpanzees, group avg size / max size. One data set at a time. One spoke at a time / animation. Dynamic query (Movie database).
  26. Visual Queries for Finding Patterns in Time Series Data. Visualization alone is not enough (when dealing with multiple entities, e.g. stocks/genes). Identifying patterns and trends. Algorithmic/statistical methods. Intuitive tools for dynamic queries (e.g. QuerySketch). Visual query operator for time series (e.g. 1500 stocks). Rectangular region drawn on the timeline display. X-axis of the box = time period. Y-axis of the box = constraint on the values. Multiple timeboxes = conjunctive queries.
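
The timebox operator described on slide 26 amounts to a filter over the series. The sketch below is an illustration with hypothetical names and data, not TimeSearcher code: a box is a tuple (t_start, t_end, y_min, y_max), and several boxes are combined as a conjunction.

```python
def matches_timebox(series, box):
    """True if every value inside the box's time period lies in its value range.

    Assumption: a series with no point in the period matches vacuously.
    """
    t_start, t_end, y_min, y_max = box
    return all(y_min <= y <= y_max
               for t, y in series
               if t_start <= t <= t_end)


def timebox_query(entities, boxes):
    """Names of the entities whose series satisfy all boxes (conjunctive query)."""
    return [name for name, series in entities.items()
            if all(matches_timebox(series, box) for box in boxes)]


# Hypothetical price series given as (time, value) pairs.
stocks = {
    "A": [(0, 10), (1, 12), (2, 20)],
    "B": [(0, 10), (1, 30), (2, 20)],
}
print(timebox_query(stocks, [(0, 1, 5, 15)]))                   # ['A']
print(timebox_query(stocks, [(0, 1, 5, 15), (2, 2, 18, 25)]))   # ['A']
print(timebox_query(stocks, [(1, 1, 25, 35)]))                  # ['B']
```

Dragging or resizing a box in the interface corresponds to re-running the query with new bounds.
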
  27. Visual Queries for Finding Patterns in Time Series Data. Entity display window. Query space. Controlling multiple boxes together. Query by example. Linked updates between views. http://www.cs.umd.edu/hcil/timesearcher/
  28. http://cdc25.biol.vt.edu/Pubs/TysonNR.pdf
