Roberto Trasarti PhD Thesis


  1. University of Pisa<br />Mastering the Spatio-Temporal Knowledge Discovery Process<br />PhD Candidate: Roberto Trasarti<br />PhD Thesis discussion<br />
  2. Spatio-Temporal context<br />Research on moving-object data analysis has recently been fostered by the widespread diffusion of new techniques and systems for monitoring, collecting and storing location-aware data, generated by a wealth of technological infrastructures such as:<br />Global Positioning System (GPS)<br />Global System for Mobile Communications (GSM)<br />Sensor networks<br />
  3. Knowledge Discovery Process<br />Knowledge discovery is a multi-step process that involves data preprocessing, pattern mining and pattern post-processing.<br />
  4. Motivations<br />Lack of a unifying framework in which mining tools are specific components of the knowledge discovery process.<br />[Figure: Data, Models, "?"]<br />Having elements from different worlds causes an impedance mismatch.<br />
  5. Related Works<br />The literature offers no proposals that address the problem of a uniform framework.<br />Moving-object database systems such as Secondo and Hermes provide some primitives.<br />The thesis was inspired by well-known work on the inductive database vision.<br />
  6. The proposed Framework<br />A conceptual framework, the Two-Worlds model, lays the basis for the proposed data mining query language and the developed system.<br />This thesis proposes:<br /><ul><li>A uniform way to represent the entities of the two worlds: data and models</li><li>A set of operators between the two worlds</li></ul>
  7. The object-relational database paradigm<br />Database: D = {S1...Sn}<br />Schema: Sj = {T1...Tm}<br />Table: Ti = <a1...ah><br />Attribute: ar ∈ A<br />Attribute types: A = {Numerical, Categorical, Descriptive, Object}<br />Numerical: types that describe a number with its precision.<br />Categorical: types that represent a value in a pre-defined set and format.<br />Descriptive: any string of characters.<br />Object: a complex type that can contain other attributes, lists and methods.<br />
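
The nesting of database, schema, table and attribute can be sketched with plain Python dataclasses. This is only an illustration of the definitions on the slide, not the GeoPKDD implementation; the class names and the example table are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AttributeType(Enum):
    """The four attribute types listed on the slide: A = {Numerical, ...}."""
    NUMERICAL = "Numerical"
    CATEGORICAL = "Categorical"
    DESCRIPTIVE = "Descriptive"
    OBJECT = "Object"


@dataclass(frozen=True)
class Attribute:
    name: str
    type: AttributeType


@dataclass
class Table:
    # Ti = <a1...ah>: an ordered list of attributes
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Schema:
    # Sj = {T1...Tm}: a collection of tables
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class Database:
    # D = {S1...Sn}: a collection of schemas
    schemas: List[Schema] = field(default_factory=list)


# Hypothetical example: a trajectory table with a numeric id and an object column.
trajectories = Table(
    "Trajectories",
    [Attribute("id", AttributeType.NUMERICAL),
     Attribute("trajobj", AttributeType.OBJECT)],
)
db = Database([Schema("mobility", [trajectories])])
```
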
  8. Object representation of Data and Models<br />Using the object-relational paradigm, we represent data and models as objects.<br />The set of attribute types A can be partitioned into three subsets: As, Ad and Am.<br />Ad (data types, Data World): Spatial object, Temporal object, Moving object<br />Am (model types, Model World): T-Pattern object, Cluster object, Flock object<br />[Figure label: Object Type]<br />
  9. Data Types<br />Spatial object is an object that has a geometric shape and a position in space.<br />Temporal object is an object that has an absolute temporal reference and a duration.<br />Moving object is an object that changes in time and space.<br />[Figures: objects drawn on x, y and t axes]<br />
  10. Data-World<br />The D-World represents the entities to be analyzed, as well as their properties and mutual relationships.<br />Intuitively, the D-World is the set of entities which describe the trajectory dataset and/or a set of regions and/or a partition of the day.<br />The D-World is a set of tables defined only by attributes in Ad and As.<br />
  11. Model Types<br />T-Pattern is a concise description of frequent behaviors, in terms of both space and time.<br />Cluster is the spatio-temporal affinity between a set of moving objects w.r.t. a distance function.<br />Flock is the spatio-temporal coincidence between a set of moving objects that move together.<br />[Figure: Region A, Region B and Region C, with time labels 10 min and 5 min]<br />
  12. Model-World<br />The M-World contains all the movement patterns extracted from the data, with their properties and relationships.<br />The M-World contains the collection of models unveiled at the different stages of the knowledge discovery process.<br />The M-World is a set of tables defined only by attributes in Am and As.<br />
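
The two membership rules (D-World tables use only attributes in Ad and As, M-World tables only attributes in Am and As) can be checked mechanically. The sketch below is a hedged illustration: the type names are invented labels, and the content of As is an assumption, since the slides do not spell it out.

```python
# Assumed content of As; the slides do not list it explicitly.
A_S = frozenset({"Numerical", "Categorical", "Descriptive"})
# Ad and Am follow the data and model types named on the slides.
A_D = frozenset({"SpatialObject", "TemporalObject", "MovingObject"})
A_M = frozenset({"TPattern", "Cluster", "Flock"})


def world_of(attribute_types):
    """Classify a table by the attribute types of its columns."""
    types = set(attribute_types)
    unknown = types - (A_S | A_D | A_M)
    if unknown:
        raise ValueError(f"unknown attribute types: {sorted(unknown)}")
    has_data = bool(types & A_D)
    has_model = bool(types & A_M)
    if has_data and has_model:
        return "neither"   # mixes Ad and Am, so it fits neither definition
    if has_data:
        return "D-World"
    if has_model:
        return "M-World"
    return "both"          # only As attributes: satisfies both definitions
```

A table holding both a moving object and a T-Pattern column falls outside both worlds under these rules.
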
  13. Two-Worlds Operators<br />Operators can be intra-world or inter-world, and for each type different classes of operators have been defined.<br />
  14. Data Constructor Operators<br />The aim of this class of operators is to build objects in the D-World starting from the raw data.<br />It realizes the data acquisition step of the knowledge discovery process.<br />The generic Data Constructor operator is defined as OPconstructor(T, p) → Td<br />
  15. Model Constructor Operators<br />This kind of operator realizes the extraction of models from the D-World through data mining algorithms.<br />The generic Model Constructor operator is defined as OPmining(Td, p) → Tm<br />
  16. Transformation Operators<br />Transformation operators are intra-world tasks aimed at manipulating data and models.<br />These operations are the means for expressing data pre-processing and post-processing tasks.<br />The generic D-Transformation operator is defined as OPD-Transf(Td, p) → T'd<br />The generic M-Transformation operator is defined as OPM-Transf(Tm, p) → T'm<br />
  17. Relation Operators<br />Relation operators include both intra-world and inter-world operations and have the objective of creating relations between data, models, and the combination of the two.<br />The generic DD-Relation operator is defined as OPDD-Relation(Tdd, f) → TRdd<br />The generic MM-Relation operator is defined as OPMM-Relation(Tmm, f) → TRmm<br />The generic DM-Relation operator is defined as OPDM-Relation(Tdm, f) → TRdm<br />
  18. Predicates of relation operators<br />The predicate f can take a large variety of forms. However, the semantics of each predicate depends on the type of the data (resp. model) objects to which it is applied.<br />[Figure: predicates for the DD, DM and MM relations]<br />
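
A relation operator of the form OP(T, f) → TR can be read as: keep the pairs of objects for which the predicate f holds. The sketch below illustrates that reading for a DM relation. The row layout and the toy predicate (a pattern is matched when its regions appear in order along the trajectory) are assumptions made for the example, not the predicates defined in the thesis.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# One input row pairs a data object with a model object:
# (data_id, data_obj, model_id, model_obj)
PairRow = Tuple[int, Sequence[str], int, Sequence[str]]


def dm_relation(pairs: Iterable[PairRow],
                f: Callable[[Sequence[str], Sequence[str]], bool]
                ) -> List[Tuple[int, int]]:
    """Return the (data_id, model_id) pairs that satisfy predicate f."""
    return [(d_id, m_id) for d_id, d_obj, m_id, m_obj in pairs if f(d_obj, m_obj)]


def visits_in_order(trajectory: Sequence[str], pattern: Sequence[str]) -> bool:
    """Toy predicate: the pattern's regions occur in order along the trajectory."""
    remaining = iter(trajectory)
    # 'region in remaining' consumes the iterator up to the first match,
    # so later regions must be found after earlier ones.
    return all(region in remaining for region in pattern)


pairs = [
    (1, ["A", "B", "C"], 10, ["A", "C"]),
    (2, ["C", "B", "A"], 10, ["A", "C"]),
]
relation = dm_relation(pairs, visits_in_order)
```

The same function shape covers DD and MM relations; only the types of the paired objects and the predicate change.
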
  19. Data Mining Query Language<br />We defined a data mining query language to support the user during knowledge discovery tasks.<br />Three advantages:<br /><ul><li>The compositionality of the operators</li><li>The iterative querying</li><li>The repeatability of the process</li></ul>
  21. DMQL Grammar<br />DMQL := DataConstructionOperator | ModelConstructionOperator | TransformationOperator | RelationOperator | SQLStandard<br />TransformationOperator :=<br /> 'CREATE TRANSFORMATION' TableName 'USING' TransformationName<br /> 'FROM (' SqlCall ')'<br /> ['SET' Parameters]<br />RelationOperator :=<br /> 'CREATE RELATION' TableName 'USING' RelationPredicate<br /> 'FROM (' SqlCall ')'<br />DataConstructionOperator :=<br /> 'CREATE DATA' TableName<br /> 'BUILDING' DataConstructorName<br /> 'FROM (' SqlCall ')'<br /> ['SET' Parameters]<br />ModelConstructionOperator :=<br /> 'CREATE MODELS' TableName 'USING' ModelConstructorName<br /> 'FROM (' SqlCall ')'<br /> ['SET' Parameters]<br />
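
The four operator productions share one shape, so a small recognizer can be sketched with a single regular expression. This is a minimal illustration in Python, not the parser used in GeoPKDD. It assumes that parameters are separated by AND, as in the example queries on the slides, and that the inner SQL ends at the last closing parenthesis before the SET clause.

```python
import re

# One pattern for all four CREATE statements of the grammar.
_STATEMENT = re.compile(
    r"""^\s*CREATE\s+(?P<kind>DATA|MODELS|TRANSFORMATION|RELATION)\s+
        (?P<table>\S+)\s+
        (?P<keyword>BUILDING|USING)\s+(?P<name>\S+)\s*
        FROM\s*\((?P<sql>.*)\)\s*
        (?:SET\s+(?P<params>.*?))?\s*$""",
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def parse_dmql(text):
    """Parse one DMQL statement into a dict; anything else is treated as SQL."""
    m = _STATEMENT.match(text)
    if m is None:
        return {"kind": "SQL", "sql": text.strip()}  # the SQLStandard alternative

    kind = m.group("kind").upper()
    expected = "BUILDING" if kind == "DATA" else "USING"
    if m.group("keyword").upper() != expected:
        raise ValueError(f"CREATE {kind} expects {expected}")

    params = {}
    raw = m.group("params")
    if raw:
        if kind == "RELATION":
            raise ValueError("RelationOperator has no SET clause in the grammar")
        for assignment in re.split(r"\s+AND\s+", raw.strip(), flags=re.IGNORECASE):
            key, _, value = assignment.partition("=")
            params[key.strip()] = value.strip()

    return {
        "kind": kind,
        "table": m.group("table"),
        "name": m.group("name"),
        "sql": m.group("sql").strip(),
        "params": params,
    }
```

Parameter values are kept as raw strings, since the grammar on the slide does not define their syntax.
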
  22. The Design of the GeoPKDD system<br />The GeoPKDD system is an implementation of the Two-Worlds model and the Data Mining Query Language.<br />
  23. Object-Relational Database and Database Manager<br />As described above, the object-relational database contains the representation of both data and models and grants the power of SQL.<br />The database manager realizes a middle layer and, using the translation libraries, detaches the system from the database technologies.<br />
  24. Language Parser and Controller<br />The parser identifies the type of each query and builds its execution plan as a sequence of actions for the controller.<br />Example:<br />CREATE MODELS ClusteringTable USING OPTICS FROM (Select t.id, t.trajobj from Trajectories t) SET OPTICS.distance_method = Route Similarity AND OPTICS.eps = 50 AND OPTICS.min_size = 100<br />Plan:<br />Retrieve[ Select t.id, t.trajobj from Trajectories t ]<br />Translate[ Data type: Moving point ]<br />Execute[ Mining algorithm: Optics algorithm, Parameters: ... ]<br />Translate[ Model type: Cluster ]<br />Store[ Table Name: ClusteringTable ]<br />
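
The five-step plan shown for the OPTICS query can be sketched as a function that turns the parts of a model construction query into an ordered list of actions. This is an illustration only: the `Step` type, the `ALGORITHMS` catalogue and its single entry are hypothetical, chosen to mirror the example on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    action: str   # Retrieve, Translate, Execute or Store
    detail: str


# Hypothetical catalogue: algorithm name -> (input data type, output model type).
ALGORITHMS = {"OPTICS": ("Moving point", "Cluster")}


def build_model_plan(table: str, algorithm: str, sql: str,
                     params: Dict[str, str]) -> List[Step]:
    """Build the action sequence for a CREATE MODELS query."""
    data_type, model_type = ALGORITHMS[algorithm]
    rendered = ", ".join(f"{key}={value}" for key, value in params.items())
    return [
        Step("Retrieve", sql),                               # run the inner SQL
        Step("Translate", f"Data type: {data_type}"),        # rows -> data objects
        Step("Execute", f"Mining algorithm: {algorithm}, Parameters: {rendered}"),
        Step("Translate", f"Model type: {model_type}"),      # results -> model objects
        Step("Store", f"Table Name: {table}"),               # write the model table
    ]


plan = build_model_plan(
    "ClusteringTable",
    "OPTICS",
    "Select t.id, t.trajobj from Trajectories t",
    {"OPTICS.eps": "50", "OPTICS.min_size": "100"},
)
```
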
  25. Algorithms Manager<br />This component is a plug-in module capable of managing different sets of libraries.<br />Each library realizes a different set of operators according to the proposed Two-Worlds framework.<br />
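
One way to picture a plug-in manager of this kind is a registry that maps a library name and an operator name to a callable. The sketch below is an assumption about the general shape, not the GeoPKDD component; the class, its methods and the toy filtering operator are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Table = List[Dict[str, Any]]
Operator = Callable[[Table, Dict[str, Any]], Table]


class AlgorithmsManager:
    """Registry of operators, grouped by library."""

    def __init__(self) -> None:
        self._libraries: Dict[str, Dict[str, Operator]] = {}

    def register(self, library: str, name: str, operator: Operator) -> None:
        # Adding an operator to an unknown library creates that library.
        self._libraries.setdefault(library, {})[name] = operator

    def available(self, library: str) -> List[str]:
        return sorted(self._libraries.get(library, {}))

    def run(self, library: str, name: str, table: Table,
            params: Dict[str, Any]) -> Table:
        try:
            operator = self._libraries[library][name]
        except KeyError:
            raise LookupError(f"no operator {name!r} in library {library!r}") from None
        return operator(table, params)


def object_filtering(table: Table, params: Dict[str, Any]) -> Table:
    """Toy transformation: keep the rows accepted by params['keep']."""
    keep = params["keep"]
    return [row for row in table if keep(row)]


manager = AlgorithmsManager()
manager.register("transformation", "OBJECT_FILTERING", object_filtering)
filtered = manager.run(
    "transformation", "OBJECT_FILTERING",
    [{"id": 1, "length": 5}, {"id": 2, "length": 50}],
    {"keep": lambda row: row["length"] > 10},
)
```

New algorithms plug in through `register` without changes to the code that calls `run`.
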
  26. Algorithms Libraries<br />Data construction library:<br />Moving object Reconstruction algorithm<br />Spatial object Builder algorithm<br />Temporal object Builder algorithm<br />Model construction library:<br />T-Pattern algorithm<br />Optics algorithm<br />T-Flock algorithm<br />Transformation library:<br />Resampling algorithm<br />Intersection algorithm<br />Object filtering<br />T-Anonymity algorithms<br />Relation library:<br />All the predicates<br />CREATE DATA MobilityData BUILDING MOVING_POINTS FROM (SELECT userid, lon, lat, datetime FROM MobilityRawData ORDER BY userid, datetime) SET MOVING_POINT.MAX_SPACE_GAP = 2000m AND MOVING_POINT.MAX_TIME_GAP = 1800 sec<br /> CREATE MODELS Patterns USING T-PATTERN FROM (Select t.id, t.trajobj from Trajectories t) SET T-PATTERN.support = .02 AND T-PATTERN.time = 120 sec<br /> CREATE TRANSFORMATION AnonimizedData USING NWA<br /> FROM (SELECT t.id, t.trajobj FROM Trajectories t)<br /> SET ANONYMIZATION.K = 10 AND<br /> ANONYMIZATION.TIME_SLOT = 600 sec<br /> CREATE RELATION EntailmentTable USING ENTIAL<br /> FROM (SELECT t.id, t.trajobj, p.id, p.obj FROM Trajectories t, Patterns p)<br />
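
The CREATE DATA query suggests how raw points become moving objects: the rows arrive ordered by user and time, and two gap thresholds bound what counts as one trajectory. The sketch below assumes that a new trajectory starts whenever the time gap or the spatial gap between consecutive points exceeds its threshold; the slides do not state the exact semantics of MAX_SPACE_GAP and MAX_TIME_GAP, so treat this as one plausible reading. Function names and the row layout are hypothetical.

```python
import math
from itertools import groupby


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def build_moving_points(rows, max_space_gap_m=2000.0, max_time_gap_s=1800.0):
    """Split (userid, lon, lat, t_seconds) rows into trajectories.

    Rows must already be ordered by userid and time, as the ORDER BY
    clause of the example query guarantees.
    """
    trajectories = []
    for userid, points in groupby(rows, key=lambda row: row[0]):
        current = []
        for _, lon, lat, t in points:
            if current:
                prev_lon, prev_lat, prev_t = current[-1]
                too_late = t - prev_t > max_time_gap_s
                too_far = haversine_m(prev_lon, prev_lat, lon, lat) > max_space_gap_m
                if too_late or too_far:
                    trajectories.append((userid, current))  # close the trajectory
                    current = []
            current.append((lon, lat, t))
        if current:
            trajectories.append((userid, current))
    return trajectories
```
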
  27. Extending the system<br />The GeoPKDD system can be extended in various ways:<br /><ul><li>Architecture level: new components</li><li>Algorithm level: new algorithms</li><li>Types level: new data types or model types</li></ul>
  29. Add-ons: Reasoning component<br />This component exploits application domain knowledge encoded in an ontology to infer a semantic interpretation of discovered patterns.<br />SELECT id, trajobj<br />FROM Trajectories t<br />WHERE SEM_CONCEPT(trajobj) = 'TouristTrajectory'<br />
  30. Add-ons: Location Prediction<br />The goal is to construct a predictive model using the set of T-patterns extracted from a set of trajectories.<br />Given a new trajectory, the predictive model can be used to predict its next location.<br />[Figure: Trajectory dataset, Local patterns, Prediction Tree]<br />CREATE TRANSFORMATION TPatternTree USING TPATTERN_TREE<br />FROM (Select p.id, p.TpatternObj FROM PatternTable p)<br />
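
The idea of a prediction tree built from local patterns can be illustrated with a prefix tree over region sequences. This is a deliberately simplified sketch: it ignores the transition times that T-patterns carry, and the matching and scoring rules (longest matching suffix, highest support wins) are assumptions for the example rather than the method of the thesis.

```python
class PatternTree:
    """Prefix tree over region sequences, with a support value per node."""

    def __init__(self):
        self.children = {}   # region name -> PatternTree
        self.support = 0.0

    def insert(self, regions, support):
        """Add one pattern, given as a sequence of regions and its support."""
        node = self
        for region in regions:
            node = node.children.setdefault(region, PatternTree())
            node.support = max(node.support, support)

    def predict_next(self, visited):
        """Predict the next region for a trajectory's visited regions.

        Tries the longest suffix of `visited` that matches a path from the
        root, then returns the child with the highest support, or None.
        """
        for start in range(len(visited)):
            node = self
            for region in visited[start:]:
                node = node.children.get(region)
                if node is None:
                    break
            else:
                # The whole suffix matched a path in the tree.
                if node.children:
                    best = max(node.children.items(), key=lambda kv: kv[1].support)
                    return best[0]
        return None


tree = PatternTree()
tree.insert(["A", "B", "C"], 0.05)
tree.insert(["A", "B", "D"], 0.02)
tree.insert(["B", "E"], 0.10)
```
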
  31. Add-ons: K-Best Map Matching<br />A new way to perform map matching.<br />In real cases the shortest-path assumption can be violated when external factors play a role (e.g. traffic congestion).<br />CREATE DATA K-MobilityData BUILDING K-MOVING_POINTS<br />FROM (SELECT userid, lon, lat, datetime FROM MobilityRawData ORDER BY userid, datetime)<br />SET K-MOVING_POINTS.K = 5 AND<br /> K-MOVING_POINTS.MAP = StreetMapFile.wkt<br />
  32. A Case Study in an Urban Mobility Scenario<br />A set of experiments on a real-world case study demonstrates the capabilities of the GeoPKDD system and how it can be exploited to extract useful knowledge from raw mobility data.<br /><ul><li>GPS traces</li><li>17K private cars</li><li>One week of ordinary mobility</li><li>200K trips (trajectories)</li><li>Milan, Italy</li></ul>Data donated by<br />
  37. Demo<br />GeoPKDD system<br />Equipped with a very simple GUI that lets the user write DMQL queries and visualize the results.<br />M-Atlas<br />The new generation of the GUI, where DMQL scripts are used to build complex analyses.<br />
  38. Contributions<br />The contributions of the thesis are:<br /><ul><li>the creation of a theoretical framework to manage the complex knowledge discovery process on mobility data</li><li>the definition of a DMQL which realizes the operators of the framework</li><li>the implementation of a real system capable of handling large amounts of data</li><li>three extensions of the system: reasoning component, k-best map matching and location prediction algorithms</li><li>an extensive study and analysis of a real case study</li></ul>
  42. Achievements<br />The GeoPKDD system was one of the two project demonstrators and was successfully presented in the final review of the GeoPKDD project.<br />Presented at the European Parliament as one of the selected projects in the Future and Emerging Technologies (FET) program.<br />Published at several conferences, including KDD, ICDM, EDBT and AGILE.<br />It is used in collaboration with the Milan Mobility Agency for mobility understanding.<br />It is currently used in collaboration with Orange Telecom for the “Big Paris” project.<br />
  43. Publications<br />
  44. Thank you<br />Questions?<br />
