Finding Association Rules in Linked Data
My M.Sc. thesis at Isfahan University of Technology, Fall 2012


  3. HTML
  4. Apriori
  5. Apriori
     Database D:
       TID   Items
       100   1 3 4
       200   2 3 5
       300   1 2 3 5
       400   2 5
     C1 (scan D):  {1}: 2   {2}: 3   {3}: 3   {4}: 1   {5}: 3
     L1:           {1}: 2   {2}: 3   {3}: 3   {5}: 3
     C2:           {1 2}   {1 3}   {1 5}   {2 3}   {2 5}   {3 5}
     C2 (scan D):  {1 2}: 1   {1 3}: 2   {1 5}: 1   {2 3}: 2   {2 5}: 3   {3 5}: 2
     L2:           {1 3}: 2   {2 3}: 2   {2 5}: 3   {3 5}: 2
     C3:           {2 3 5}
     L3 (scan D):  {2 3 5}: 2
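The level-wise passes on this slide can be reproduced with a short Python sketch. It assumes a minimum support count of 2, which the slide does not state but which is consistent with {4} and {1 2} being dropped. The function name `apriori` and its parameters are illustrative, not taken from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return a list of dicts, one per level k, mapping large k-itemsets to counts."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    levels = []
    k = 1
    while candidates:
        # Scan the database once to count every candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        if not large:
            break
        levels.append(large)
        k += 1
        # Join step: unions of two large itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(list(large), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = [
            c for c in joined
            if all(frozenset(s) in large for s in combinations(c, k - 1))
        ]
    return levels


# Database D from the slide; min_count=2 is an assumption.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
levels = apriori(D.values(), min_count=2)
for k, large in enumerate(levels, start=1):
    print(f"L{k}:", {tuple(sorted(s)): n for s, n in large.items()})
```

The prune step is what keeps C3 down to the single candidate {2 3 5}: {1 2 3} and {1 3 5} are discarded because {1 2} and {1 5} are not in L2.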
  6. SPARQL
     RDF graph labels: pd:cygri, Richard Cyganiak, dbpedia:Berlin, foaf:name, foaf:based_near, foaf:Person, rdf:type
     pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
     dbpedia:Berlin = http://dbpedia.org/resource/Berlin
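A minimal Python sketch of how the labels on this slide could be arranged as (subject, predicate, object) triples and queried with a single triple pattern. The slide's figure did not survive extraction, so the arrangement into three triples is an assumption; the helpers `expand` and `match` are hypothetical names for illustration.

```python
# Prefix expansions, derived from the two mappings given on the slide.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
}

# Assumed arrangement of the slide's labels into triples.
TRIPLES = [
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ("pd:cygri", "rdf:type", "foaf:Person"),
]


def expand(term):
    """Expand a prefixed name to a full URI when its prefix is known."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def match(triples, s=None, p=None, o=None):
    """Return triples matching one pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(expand("pd:cygri"))
print(match(TRIPLES, p="foaf:based_near"))
```

Passing `None` for a position behaves like a SPARQL variable such as `?Subject`; a real query engine would additionally join several patterns on shared variables.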
  9. Victoria Nebot, Rafael Berlanga: "Finding association rules in semantic web data"
     Q = (Target Concept, Context Concept, Features)
       Target Concept
       Context Concept (TID)
       Features
  11. Victoria Nebot, Rafael Berlanga
      CREATE MINING MODEL <Dataset Path>
      {
        ?patient RESOURCE TARGET
        ?drug RESOURCE
        ?jadi LITERAL
        ?disease RESOURCE PREDICT
        ?report RESOURCE CONTEXT
      }
      WHERE
      {
        ?patient rdf:type Patient.
        ?drug rdf:type Drug.
        ?disease rdf:type Disease.
        ?report rdf:type Report.
        ?report damageIndex ?jadi.
      }
  12. Victoria Nebot, Rafael Berlanga

      TID  Subject   Aggregation Path  Object           Feature
      1    PTN_XY21  (VISIT1,RHEX1)    Malformation     Disease
      1    PTN_XY21  (VISIT1,RHEX1)    RHEX1
      2    PTN_XY21  (VISIT1,TREAT1)   Methotrexate     Drug
      3    PTN_XY21  (VISIT2,RHEX2)    Malformation     Disease
      3    PTN_XY21  (VISIT2,RHEX2)    RHEX2
      3    PTN_XY21  (VISIT2,RHEX2)    Bad rotation     Disease
      4    PTN_XY21  (VISIT2,TREAT2)   Methotrexate     Drug
      4    PTN_XY21  (VISIT2,TREAT2)   Corticosteroids  Drug

      TID  Object
      1    {Malformation, RHEX1}
      2    {Methotrexate}
      3    {Malformation, RHEX2, Bad rotation}
      4    {Methotrexate, Corticosteroids}

      Subject   Aggregation Path       Object           Feature
      PTN_XY21  (VISIT1,RHEX1,ULTRA1)  Malformation     Disease
      PTN_XY21  (VISIT1,RHEX1)         RHEX1
      PTN_XY21  (VISIT1,TREAT1,DT1)    Methotrexate     Drug
      PTN_XY21  (VISIT2,RHEX2,ULTRA2)  Malformation     Disease
      PTN_XY21  (VISIT2,RHEX2)         RHEX2
      PTN_XY21  (VISIT2,RHEX2,ULTRA3)  Bad rotation     Disease
      PTN_XY21  (VISIT2,TREAT2,DT2)    Methotrexate     Drug
      PTN_XY21  (VISIT2,TREAT2,DT3)    Corticosteroids  Drug
      …         …                      …                …
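The step from the TID-labelled rows to the TID/Object transaction table on this slide amounts to grouping objects by TID. A minimal Python sketch follows; the row tuples copy the slide's first table, `None` stands for the empty Feature cells, and the function name `to_transactions` is hypothetical.

```python
from collections import defaultdict

# Rows of the first table: (TID, Subject, Aggregation Path, Object, Feature).
ROWS = [
    (1, "PTN_XY21", ("VISIT1", "RHEX1"), "Malformation", "Disease"),
    (1, "PTN_XY21", ("VISIT1", "RHEX1"), "RHEX1", None),
    (2, "PTN_XY21", ("VISIT1", "TREAT1"), "Methotrexate", "Drug"),
    (3, "PTN_XY21", ("VISIT2", "RHEX2"), "Malformation", "Disease"),
    (3, "PTN_XY21", ("VISIT2", "RHEX2"), "RHEX2", None),
    (3, "PTN_XY21", ("VISIT2", "RHEX2"), "Bad rotation", "Disease"),
    (4, "PTN_XY21", ("VISIT2", "TREAT2"), "Methotrexate", "Drug"),
    (4, "PTN_XY21", ("VISIT2", "TREAT2"), "Corticosteroids", "Drug"),
]


def to_transactions(rows):
    """Group the Object column by TID, yielding one item set per transaction."""
    transactions = defaultdict(set)
    for tid, _subject, _path, obj, _feature in rows:
        transactions[tid].add(obj)
    return dict(transactions)


transactions = to_transactions(ROWS)
for tid in sorted(transactions):
    print(tid, sorted(transactions[tid]))
```

The resulting sets can be handed to any itemset miner that expects classic market-basket transactions.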
  13. Ziawasch Abedjan, Felix Naumann: "Context and Target Configurations for Mining RDF Data"
  14. Venkata Narasimha et al.: "LiDDM: A Data Mining System for Linked Data"
      SPARQL
  16. Tid   List of Items
      T001  A,B,E
      T002  B,D
      T003  B,C
      T004  A,B,D
      T005  A,C
      T006  B,C
      T007  A,B
      T008  A,B,C,E
      T009  A,B,C
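For orientation, the two thresholds the deck refers to elsewhere (MinSup and MinConf) can be illustrated on this transaction table. The sketch below uses the standard definitions of support and confidence; the rule A -> B is chosen only as an example and does not come from the slides.

```python
# Transaction table from the slide.
TRANSACTIONS = {
    "T001": {"A", "B", "E"},
    "T002": {"B", "D"},
    "T003": {"B", "C"},
    "T004": {"A", "B", "D"},
    "T005": {"A", "C"},
    "T006": {"B", "C"},
    "T007": {"A", "B"},
    "T008": {"A", "B", "C", "E"},
    "T009": {"A", "B", "C"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


sup_ab = support({"A", "B"}, TRANSACTIONS)
conf_a_b = confidence({"A"}, {"B"}, TRANSACTIONS)
print(f"support(A,B) = {sup_ab:.3f}, confidence(A -> B) = {conf_a_b:.3f}")
```

Here {A, B} appears in 5 of the 9 transactions and A appears in 6, so the rule A -> B has support 5/9 and confidence 5/6.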
  17. Tid   List of Items
      T001  a,d,e
      T002  b,a,f,g,h
      T003  b,a,d,f
      T004  b,a,c
      T005  a,d,g,k
      T006  b,d,g,c,i
      T007  b,d,g,r,j
  19. SPARQL; Linked Data; MinConf, MinSup
  21. Data structure diagram (labels only):
      NodeInfo: Entity ID; InputEntities; Relations; Relation ID, Relation ID, ……; Source Entities List; Is Large
      Item: Relation ID, Node ID
      Itemset: Item 1, Item 2, …, Item n; Support
      Rule: Item 1, Item 2, …, Item n; Item; Confidence
  22. SWApriori: NodeInfo; Source Entities List
  24. 2-Itemset
  25. 2-Itemset; 3-Itemset; Association Rules
  26. SWApriori
      Algorithm 1. Mining association rules from semantic web data
      SWApriori(DS, MinSup, MinConf)
      Input:
        DS: Dataset consisting of triples (Subject, Predicate, Object)
        MinSup: Minimum support
        MinConf: Minimum confidence
      Output:
        AllFIs: Large itemsets
        Rules: Association rules
      Variables:
        FIs, Candidates: List of Itemsets
        IS, IS1, IS2, IS3: Itemset (multiple items)
        NodeInfoList: List of NodeInfo
      Begin
        Traverse triples and discretize objects
        Delete triples whose subject, predicate, or object has frequency less than MinSup
        Convert the input dataset's data to numerical values
        Store converted data into NodeInfo instances
        NodeInfoList = NodeInfo instances
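The preprocessing steps listed at the start of Algorithm 1 can be sketched in Python as follows. This is only an illustration under several assumptions that the slides do not confirm: MinSup is treated as an absolute count, frequencies are counted separately per position (subject, predicate, object), discretization of literal objects is omitted, and `NodeInfo` is modelled as one record per object entity holding, for each relation, the list of source entities. The toy triples are invented for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    """Assumed layout: an entity ID plus relation ID -> list of source entity IDs."""
    entity_id: int
    relations: dict = field(default_factory=lambda: defaultdict(list))


def preprocess(triples, min_sup):
    # Count how often each term occurs in each position.
    s_count = Counter(s for s, _, _ in triples)
    p_count = Counter(p for _, p, _ in triples)
    o_count = Counter(o for _, _, o in triples)

    # Delete triples whose subject, predicate, or object is too infrequent.
    kept = [
        (s, p, o) for s, p, o in triples
        if s_count[s] >= min_sup and p_count[p] >= min_sup and o_count[o] >= min_sup
    ]

    # Convert terms to numerical IDs in order of first appearance.
    ids = {}

    def to_id(term):
        return ids.setdefault(term, len(ids))

    # Store the converted triples in NodeInfo instances, keyed by object ID.
    nodes = {}
    for s, p, o in kept:
        sid, pid, oid = to_id(s), to_id(p), to_id(o)
        if oid not in nodes:
            nodes[oid] = NodeInfo(oid)
        nodes[oid].relations[pid].append(sid)
    return kept, ids, nodes


# Invented toy data, not taken from the thesis datasets.
triples = [
    ("Iran", "border", "Iraq"),
    ("Iran", "border", "Turkey"),
    ("Turkey", "border", "Iraq"),
    ("Turkey", "capital", "Ankara"),
]
kept, ids, nodes = preprocess(triples, min_sup=2)
print(kept)
print(dict(nodes[ids["Iraq"]].relations))
```

With this layout, the entities that share a relation to the same object are available in a single list, which is the kind of grouping a later 2-itemset generation step would need.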
  27. SWApriori
  28. Generate2LargeItemset
  29. GenerateRules
  31. Three datasets (DS1, DS2, DS3), each a set of (S, P, O) triples
      DS1/Iran  owl:Population  xsd:int 75,000,000
      DS1/Iran  ont:Border      DS1/Afghanistan
      DS1/Iran  ont:West        DS2/Iraq
      DS1/Iran  owl:sameAs      DS2/Iran
      DS1/Iran  owl:sameAs      DS3/Xr.36O77z
      ont:West
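The owl:sameAs links on this slide connect identifiers of the same entity across DS1, DS2, and DS3. One way to exploit such links before mining is to rewrite every identifier to a single representative, sketched below in Python with a small union-find. This is an assumption about how the links might be used, not the thesis's documented procedure; the function name `canonicalize` and the extra `ont:Capital` triple are hypothetical.

```python
def canonicalize(triples, same_as="owl:sameAs"):
    """Merge identifiers linked by `same_as` and rewrite the remaining triples."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller ID as representative (deterministic).
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        if p == same_as:
            union(s, o)

    # Drop the sameAs links themselves and rewrite subjects and objects.
    return [(find(s), p, find(o)) for s, p, o in triples if p != same_as]


triples = [
    ("DS1/Iran", "owl:Population", "75000000"),
    ("DS1/Iran", "ont:Border", "DS1/Afghanistan"),
    ("DS1/Iran", "ont:West", "DS2/Iraq"),
    ("DS1/Iran", "owl:sameAs", "DS2/Iran"),
    ("DS1/Iran", "owl:sameAs", "DS3/Xr.36O77z"),
    ("DS2/Iran", "ont:Capital", "DS2/Tehran"),  # hypothetical triple for illustration
]
merged = canonicalize(triples)
for t in merged:
    print(t)
```

After rewriting, facts stated about DS2/Iran or DS3/Xr.36O77z are attached to DS1/Iran, so frequency counts are no longer split across aliases of the same entity.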
  33. SPARQL; HTML; owl:sameAs
  34. DBPedia; Wikipedia; Factbook; Freebase
  35. DBPedia
  36. Factbook
  37. Freebase
  38. DBPedia (SPARQL); Factbook; Freebase
      SELECT * {
        ?Subject rdf:type <http://dbpedia.org/ontology/Country> .
        ?Subject ?Predicate ?Object
      } ORDER BY ?Subject
      http://dbpedia.org/resource/[CountryName]
      SELECT ?Subject ?Predicate ?Object
      WHERE { ?Subject ?Predicate ?Object }
      ORDER BY ?Subject
      http://rdf.freebase.com/ns/m.03shp
      http://rdf.freebase.com/rdf/en/[CountryName]
  39. Factbook; DBPedia; Freebase
  40. Factbook: 2-Large Itemset
  41. Factbook
  42. 2-Large Itemset
  43. Freebase, FactBook, DBPedia
  44. DBPedia; Factbook; Freebase (chart labels: Freebase, FactBook, DBPedia)
  48. [1] T. C. Corporation, Introduction to Data Mining and Knowledge Discovery.
      [2] T. I. R.Agrawal, A.N.Swami, "Mining association rules between sets of items in large databases," SIGMOD, pp. 207-216, 1993.
      [3] R. B. V.Nebot, "Finding association rules in semantic web data," Knowledge-Based Systems, pp. 51-62, 2012.
      [4] J. W. Seifert, Data Mining: An Overview, December 2004.
      [5] D. J. Hand, Data Mining: Statistics and More?, December 2002.
      [6] S. L. Eamonn Keogh, Chotirat Ann Ratanamahatana, Towards Parameter-Free Data Mining, September 2005.
      [7] R. S. R.Agrawal, "Fast algorithms for mining association rules," presented at the In Proceeding of 20th international conference in large databases, 1994.
      [8] A. Ale-Ahmad. (2006). Introduction to Semantic Web.
      [9] F. V. H. Grigoris Antoniou, A Semantic Web Primer, 2004.
      [10] T. Gruber, "Toward principles for the design of ontologies used for knowledge sharing," Human–Computer Studies, pp. 907-928, 1995.
      [11] W. K. N. Zehua Liu, Ee-Peng Lim, Feifei Li, "Towards Building Logical Views of Websites," Data & Knowledge Engineering, vol. 49, pp. 197-222, 2004.
      [12] K. H. Veltman, "Challenges for a Semantic Web," presented at the Proceedings of the International Workshop on the Semantic Web 2002, 2002.
      [13] T. M. Haibo Yu, Makoto Amamiya, "An architecture for personal semantic web information retrieval system," presented at the WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web, 2005.
  49. [14] F. V. H. D.Fensel, I.Horrocks, D.L.McGuinness, P.F.Patel-Schneider, "OIL: An Ontology Infrastructure for the Semantic Web," IEEE Intelligent Systems, vol. 18, 2001.
      [15] W3C. (2009-10-27). OWL 2 Web Ontology Language Document Overview, http://www.w3.org/TR/owl2-overview/.
      [16] J. Rapoza. (2006). SPARQL Will Make the Web Shine, http://www.eweek.com/c/a/Application-Development/SPARQL-Will-Make-the-Web-Shine.
      [17] J. L. C.Bizer, G.Kobilarov, S.Auer, C.Becker, R.Cyganiak, S.Hellmann, "DBpedia - A crystallization point for the Web of Data," Web Semantics, pp. 154-165, 2009.
      [18] T. H. C.Bizer, T.Berners-Lee, "Linked data - the story so far," International Journal on Semantic Web and Information Systems, pp. 1-22, 2009.
      [19] Linked Open Data Project, http://linkeddata.org/.
      [20] N. G.-P. J.M.Benitez, F.Herrera, "Special issue on "New Trends in Data Mining" NTDM," Knowledge-Based Systems, pp. 1-2, 2012.
      [21] H. W. J.Zhang, Y.Sun, "Discovering Associations among Semantic Links.IEEE," presented at the International Conference on Web Information Systems and Mining, 2009.
      [22] Y. S. S.Bloehdorn, "Kernel methods for mining instance data in ontologies," ISWC/ASWC, LNCS, pp. 58-71, 2007.
      [23] C. d. A. N.Fanizzi, F.Esposito, "Metric-based stochastic conceptual," Information Systems, pp. 792-806, 2009.
      [24] L.Getoor, "Link mining: a new data mining challenge," presented at the SIGKDD Explorations, 2003.
      [25] A. H. G.Stumme, B.Berendt, "Semantic web mining: state of the art and future directions," Sci. Services Agents World Wide Web 4, pp. 124-143, 2006.
  50. [26] N.Lavrač, "Using Ontologies in Semantic Data Mining with SEGS and g-SEGS," presented at the Slovenian Ministry of Higher Education, Science and Technology.
      [27] L. D. R. S.Muggleton, "Inductive logic programming: theory and methods," J.Log. Program, pp. 629-679, 1994.
      [28] K. Z. X.Liu, W.Pedrycz, "An improved association rules mining method," Expert Systems, pp. 1362-1374, 2012.
      [29] J. M. V. V.Pachón Álvarez, "An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization," Expert Systems with Applications, pp. 585-593, 2012.
      [30] e. a. D. Kontokostas, "Internationalization of Linked Data: The case of the Greek DBpedia edition," Web Semantics: Sci.Serv. Agents World Wide Web, 2012.
      [31] F. N. Z.Abedjan, "Context and Target Configurations for Mining RDF Data," presented at the ACM, 2011.
      [32] M. J. K.J.Kochut, "SPARQLeR: Extended Sparql for Semantic Association Discovery," presented at the ESWC2007, 2007.
      [33] R. I. V.Narasimha, O.P.Vyas, "LiDDM: A Data Mining System for Linked Data," presented at the LDOW2011, Hyderabad, India, 2009.
      [34] G. K. M.Kuramochi, "Frequent Subgraph Discovery," presented at the International Conference on Data Mining (ICDM), 2001.
      [35] V. T. Vivek Tiwari, S.Gupta, R.Tiwari, "Association Rule Mining: A Graph Based Approach for Mining Frequent Itemsets," presented at the International Conference on Networking and Information Technology, 2010.
      [36] S. N. Y.Chi, R.R. Muntz, J.N.Kok, "Frequent Subtree Mining - An Overview," Fundamenta Informaticae, 2001.
      [37] A. H. T. T.Jiang, "Mining RDF Metadata for Generalized Association Rules: Knowledge Discovery in the Semantic Web Era," presented at the WWW2006, 2006.
  51. [38] J. h. G.Gosta, "Fast Algorithms for Frequent Itemset Mining Using FP-Trees," presented at the IEEE Transactions on Knowledge and Data Engineering, 2005.
      [39] M. S. Yannis Kalfoglou, "Ontology mapping: the state of the art," The Knowledge Engineering Review, 2003.
      [40] I.-Y. S. Namyoun Choi, Hyoil Han, "A Survey on Ontology Mapping," presented at the ACM SIGMOD, 2006.
      [41] DBPedia. Community. (2012). http://dbpedia.org.
      [42] H. B. Raymond Kosala, "Web mining research: a survey," presented at the ACM SIGKDD, 2000.
      [43] K. S. Reddy, "Understanding the scope of web usage mining & applications of web data usage patterns," presented at the Computing, Communication and Applications (ICCCA), 2012.
      [44] B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), 2011.
      [45] K. W. B.C.M.Fung, M.Ester, "Hierarchical document clustering using frequent itemsets," presented at the 2003, Proceedings of the Third SIAM International Conference on Data Mining, SIAM, 2003.
  52. R. Ramezani, M. H. Saraee, M. A. Nematbakhsh. "A New approach to mining Association Rules from Semantic Web data". Submitted to International Journal of Semantic Web and Information Systems (IJSWIS).