What happened?

Presentation of the final project for the Semantic Web Technology course.

Published in: Business

What happened? (Martin Majlis)

Outline
  • Introduction
  • Architecture
  • Back-end
      • Downloading
      • Extraction
  • Front-end
      • Web application
      • iGoogle Gadget

Introduction
  • Answers questions such as:
      • what happened on 3 January
      • what happened on 3 January 1865
      • what happened in January 1825
      • what happened from January until July 1985
      • what happened during the 16th century
      • what started in January 1930
      • what ended in 1990

Architecture
  • Back-end
      • Downloading
      • Structure conversion
      • Parsing
  • Front-end
      • Web application
      • iGoogle Gadget

Build process
  • Fully automated
  • A target for each phase
  • Less error-prone
  • GNU Make

Data Source
  • Czech Wikipedia
      • Documented format
      • Dumps generated regularly
      • Cleaner than general texts

Downloading / Conversion
  • Downloading
      • Script from DBpedia
      • Added traffic shaping
  • Data Conversion
      • Recognizing pages/categories
      • Building category “hierarchy”
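
The slide only says that traffic shaping was added to the download script. As a rough illustration of the idea, the sketch below spaces requests out by a minimum interval. The `Throttle` class, `download_all`, and the injected `fetch` callable are hypothetical names, and simple request spacing is a stand-in for whatever mechanism the project actually used.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self._last = None       # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()


def download_all(urls, fetch, throttle):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any callable that takes a URL; it is passed in so the sketch
    does not assume a particular HTTP library.
    """
    results = {}
    for url in urls:
        throttle.wait()
        results[url] = fetch(url)
    return results
```

Calling `download_all(urls, fetch, Throttle(2.0))` would issue at most one request every two seconds.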

Categories
  • Confusing structure
  • Netherlands – 229
      • Physics, Planets, Illusions, Psychology, Literature, Organ, Neuroscience, etc.
  • Maximum depth: 5
  • Median: 31
  • Mean: 33.87
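
The slide does not say how the category counts were obtained. One plausible reading is that they count every category reachable from a page by following parent links up to the stated maximum depth of 5. The sketch below works under that assumption: the `parents` mapping, the function name, and the toy category names are all hypothetical.

```python
from collections import deque


def reachable_categories(start, parents, max_depth=5):
    """Return {category: depth} for all categories reachable from `start`.

    `start` lists the categories a page belongs to directly (depth 1).
    `parents` maps a category to its parent categories. Breadth-first
    search records the smallest depth and tolerates cycles in the graph.
    """
    seen = {}
    queue = deque((category, 1) for category in start)
    while queue:
        category, depth = queue.popleft()
        if category in seen or depth > max_depth:
            continue
        seen[category] = depth
        for parent in parents.get(category, ()):
            queue.append((parent, depth + 1))
    return seen
```

Because the category graph is not a tree, a single page can reach unrelated topics this way, which would match the "confusing structure" noted on the slide.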

Date Extraction – Regular Expressions
  • Regular expressions aren't for parsing
      • Day = (\d+)\.; Month = (Jan|Feb|...); Year = (\d+)
      • Date = (Day Month Year | Day Month | Month Year | Year)
      • Extract = (“from” Date “until” Date | Date “-” Date | “between” Date “and” Date | “from” Date)
  • The day number can appear in 14 positions
  • In practice, more than 1000 slots
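
The rules above can be composed into a single pattern, as in the sketch below. The English month abbreviations are an illustrative stand-in for the real month names, and `find_interval` is a hypothetical helper. Once `Date` is inlined into `Extract` it appears seven times, and `Day` occurs twice in each copy, which gives the 14 possible positions for the day number.

```python
import re

# Building blocks named after the slide. English months are an assumption
# made for readability; the project worked on Czech Wikipedia.
DAY = r"\d{1,2}\.?"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
YEAR = r"\d{1,4}"

# Date = Day Month Year | Day Month | Month Year | Year
DATE = rf"(?:{DAY} {MONTH} {YEAR}|{DAY} {MONTH}|{MONTH} {YEAR}|{YEAR})"

# Extract = "from" Date "until" Date | Date "-" Date
#         | "between" Date "and" Date | "from" Date
EXTRACT = re.compile(
    rf"from ({DATE}) until ({DATE})"
    rf"|({DATE}) ?- ?({DATE})"
    rf"|between ({DATE}) and ({DATE})"
    rf"|from ({DATE})",
    re.IGNORECASE,
)


def find_interval(text):
    """Return (start, end) date strings for the first match, or None.

    `end` is None for the open-ended "from Date" form.
    """
    match = EXTRACT.search(text)
    if match is None:
        return None
    # Only the groups of the alternative that matched are non-None.
    found = [group for group in match.groups() if group is not None]
    start = found[0]
    end = found[1] if len(found) > 1 else None
    return start, end
```

Every matched date still has to be split into day, month, and year by locating the right capture group, and this bookkeeping is what grows quickly as more phrasings are added.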

Date Extraction - Tools
  • Standard way:
      • GNU Flex / GNU Bison
      • Ragel
  • Problem with UTF-8 support
      • Unicode – almost 100,000 characters
      • Big transition tables (100,000 vs 127)

Date Extraction - Mixed
  • Lexical Analysis
      • Regular expressions
      • Filling a table
  • Syntactic Analysis
      • In theory, a CFG
      • In practice, regular expressions again

Date Extraction - Example
  • Lexical Analysis
      • “From 23 January 1956 until 2 February 1960”
      • “From {{DATE_1}} until {{DATE_2}}”
  • Syntactic Analysis
      • Interval = “From” DATE “to” DATE
      • Interval = “Between” DATE “and” DATE
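
The two phases can be sketched as follows. The lexical pass replaces each date with a `{{DATE_n}}` placeholder and records its parsed value in a table. The syntactic pass then matches short interval rules against the templated text. All function and variable names are hypothetical, English month names are assumed, and only full day-month-year dates are handled. The first rule uses “until” rather than the slide's “to” so that it matches the example sentence.

```python
import re

MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Lexical rule: "<day> <month name> <year>".
DATE_RE = re.compile(
    r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{1,4})", re.IGNORECASE
)


def lexical_pass(text):
    """Replace each date with {{DATE_n}} and fill a table of parsed values."""
    table = {}

    def substitute(match):
        key = f"DATE_{len(table) + 1}"
        day, month, year = match.groups()
        table[key] = (int(day), MONTHS[month.lower()], int(year))
        return "{{" + key + "}}"

    return DATE_RE.sub(substitute, text), table


# Syntactic rules over placeholders instead of raw date text.
INTERVAL_RULES = [
    re.compile(r"from \{\{(DATE_\d+)\}\} until \{\{(DATE_\d+)\}\}", re.IGNORECASE),
    re.compile(r"between \{\{(DATE_\d+)\}\} and \{\{(DATE_\d+)\}\}", re.IGNORECASE),
]


def syntactic_pass(templated, table):
    """Return the (start, end) tuples of the first interval rule that matches."""
    for rule in INTERVAL_RULES:
        match = rule.search(templated)
        if match:
            return table[match.group(1)], table[match.group(2)]
    return None
```

Because the date details are resolved once in the lexical pass, each interval rule stays a single short pattern.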

Date Representation
  • Dates from 10,000 BC to 2500 AD
  • Not exact: 13th century, June 1689
  • Year zero
      • 2 January - 5 days = 28 December
      • 2 January 1 AD - 5 days = 28 December 1 BC
  • Simple tuples
      • (“I”, 23, 1, 1956, 20, 2, 2, 1960, 20)
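
The year-zero example can be reproduced by mapping BC/AD years to astronomical year numbering (1 BC becomes year 0), converting to a day count, and mapping back. The sketch below uses the well-known days-from-civil arithmetic for the proleptic Gregorian calendar. Both the choice of that calendar and the function names are assumptions, since the slides do not say how the project computed date differences.

```python
def to_astronomical(year, era):
    """Map 1 BC -> 0, 2 BC -> -1, and so on; AD years are unchanged."""
    return year if era == "AD" else 1 - year


def from_astronomical(year):
    """Inverse of to_astronomical: return (year, era)."""
    return (year, "AD") if year >= 1 else (1 - year, "BC")


def days_from_civil(y, m, d):
    """Days since 1970-01-01 in the proleptic Gregorian calendar."""
    y -= 1 if m <= 2 else 0              # treat Jan/Feb as end of previous year
    era = y // 400                       # floor division also works for y < 0
    yoe = y - era * 400                  # year of era, 0..399
    doy = (153 * (m - 3 if m > 2 else m + 9) + 2) // 5 + d - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468


def civil_from_days(z):
    """Inverse of days_from_civil: return (astronomical year, month, day)."""
    z += 719468
    era = z // 146097
    doe = z - era * 146097
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153
    d = doy - (153 * mp + 2) // 5 + 1
    m = mp + 3 if mp < 10 else mp - 9
    y = yoe + era * 400 + (1 if m <= 2 else 0)
    return y, m, d


def shift(day, month, year, era, delta_days):
    """Add delta_days to a date and return (day, month, year, era)."""
    n = days_from_civil(to_astronomical(year, era), month, day) + delta_days
    y, m, d = civil_from_days(n)
    year, era = from_astronomical(y)
    return d, m, year, era
```

With this mapping, `shift(2, 1, 1, "AD", -5)` lands on 28 December 1 BC, as on the slide, because no year is skipped or duplicated at the era boundary.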

Web application
  • PHP5 + MySQL
  • Nette Framework + Dibi
  • http://css.majlis.cz/
      • GT: http://jdem.cz/dspw9
  • HTML, JSON, XML output

iGoogle Gadget
  • iGoogle = Google personalized homepage
  • URL: http://jdem.cz/dspx7
  • Using JSON
  • Tricky development

Future Work
  • Improve performance
      • 20th century events – 28 s – 406,980 (one OR)
      • 20th century events – 0.0007 s – 392,573 (no OR)
  • Improve parser architecture

Questions?

Thank You!
