SlideShare a Scribd company logo
Common failures to be avoided
 when we analyze Wikipedia
        public data


           Felipe Ortega
           GsyC/Libresoft

   Wikimania 2010, Gdańsk, Poland.
#1 Beware of special types of pages
●   Official page count includes articles with just one link.
    ●   You must consider if you need to filter out
        disambiguation pages.
●   Pay attention to redirects.
    ●   Sometimes people wonder how the number of
        pages in main namespace in the dump is so high.
●   Break down evolution trends by namespace.
    ●   Articles are very different from other pages.
    ●   Explore % of already existing talk pages.
    ●   Connections from user pages.
#2 Plan your hardware carefully
●   There are some general rules.
    ●   Parallelize as much as possible.
    ●   Buy more memory before buying more disk...
    ●   But take a look at your disk requirements.
        – It's very different when you can work on
          decompressed data, on the fly.
    ●   Hardware RAID is not always the best solution.
        –   RAID 10 in Linux can perform decently in many
            average studies.
3# Know your engines (DBs)
●   Correct configuration of DB engine is crucial.
    ●   You'll always fall short with standard configs.
    ●   Fine tune parameters according to your hardware.
    ●   Exploit memory as much as possible.
        – E.g. MEMORY engine in MySQL.
    ●   Avoid unnecessary backup...
    ●   But be sure that you have copies of relevant info
        elseware!
    ●   Think about your process:
        –   Read only vs. read-write.
4# Organize your code
●   Using a SCM is a must.
    ●   SVN, GIT.
●   Upload your code to public repository.
    ●   BerliOS, SourceForge, GitHub...
●   Document your code...
    ●   ...if you ever aspire to get interest from other
        developers.
●   Use consistent version numbers.
●   Test, test, test...
    ●   Include sanity checks and “toy tests”.
#5 Use the right “spell”
●   Target data is well defined:
    ●   XML
    ●   Big portions of plain text
    ●   Inter-wiki links and outlinks.
●   Some alternatives
    ●   CelementTree (high-speed parsing)
    ●   Python (modules/short scripts) or Java (big
        projects).
    ●   Perl (regexps).
    ●   Sed & awk
#6 Avoid reinventing the wheel
●   Consider to develop only if:
    ●   No available solution fits your needs.
        – Or you can only find proprietary/evaluation
          sofwtare.
    ●   Performance of other solutions is really bad
●   Example: pywikipediabot
    ●   Simple library to query Wikipedia API.
    ●   Solves many simple needs of
        researchers/programmers.
#7 Automate everything

●   Huge data repositories.
●   Even small samples are excessively time
    consuming if processed by hand.
●   You will start to concat individual processes.
●   You will save time for later executions.
●   Your study will be reproducible.
    ●   Updating results after several months
        becomes no-brainer solution.
#8 Extreme case of Murphy's Law

●   Always expect the worst possible case.
    ●   Many caveats in each implementation.
    ●   Countless particular cases.
    ●   It's not OK with just the “average
        solution”.
         – Standard algorithms may take much
           more than expected to finish the job.
#9 Not many graphical interfaces

●   Some good reasons for that
    ●  Difficult to automate
     ● Hard to display dynamic results in real-time.


     ● Almost impossible to compute all results in a

       reasonable time frame for huge data
       collections (e.g. English Wikipedia).
●   To the best of my knowledge, there are very
    few tools with graphical interfaces out there.
●   Is there a real need for that??
#10 Communication channels

●   Wikimedia-research-l
    ●   Mailing list about research on Wikimedia
        projects.
    ●   http://meta.wikimedia.org/wiki/Research
    ●   http://meta.wikimedia.org/wiki/Wikimedia_Research_Network
    ●
        http://acawiki.org/Home
●   Final comments
    ●   Need for consolidated info point, once for all

More Related Content

Similar to Caveats

Become a Better Developer with Debugging Techniques for Drupal (and more!)
Become a Better Developer with Debugging Techniques for Drupal (and more!)Become a Better Developer with Debugging Techniques for Drupal (and more!)
Become a Better Developer with Debugging Techniques for Drupal (and more!)
Acquia
 
Ceph Day SF 2015 - Keynote
Ceph Day SF 2015 - Keynote Ceph Day SF 2015 - Keynote
Ceph Day SF 2015 - Keynote
Ceph Community
 
Ceph: A decade in the making and still going strong
Ceph: A decade in the making and still going strongCeph: A decade in the making and still going strong
Ceph: A decade in the making and still going strong
Patrick McGarry
 
Go at Skroutz
Go at SkroutzGo at Skroutz
Go at Skroutz
AgisAnastasopoulos
 
LCE12: Intro Training: Upstreaming 101
LCE12: Intro Training: Upstreaming 101LCE12: Intro Training: Upstreaming 101
LCE12: Intro Training: Upstreaming 101
Linaro
 
OpenCms Days 2014 - Introducing the 9.5 OpenCms documentation
OpenCms Days 2014 - Introducing the 9.5 OpenCms documentationOpenCms Days 2014 - Introducing the 9.5 OpenCms documentation
OpenCms Days 2014 - Introducing the 9.5 OpenCms documentation
Alkacon Software GmbH & Co. KG
 
Path dependent-development (PyCon India)
Path dependent-development (PyCon India)Path dependent-development (PyCon India)
Path dependent-development (PyCon India)
ncoghlan_dev
 
2019 PHP Serbia - Boosting your performance with Blackfire
2019 PHP Serbia - Boosting your performance with Blackfire2019 PHP Serbia - Boosting your performance with Blackfire
2019 PHP Serbia - Boosting your performance with Blackfire
Marko Mitranić
 
Create your library
Create your libraryCreate your library
Create your library
Laurence Chen
 
The Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with androidThe Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with android
Stanojko Markovik
 
Ceph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade inCeph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade in
Ceph Community
 
Liferay portals in real projects
Liferay portals  in real projectsLiferay portals  in real projects
Liferay portals in real projects
IBACZ
 
C4ainaction-Introduction to the Pyramid Web Framework
C4ainaction-Introduction to the Pyramid Web FrameworkC4ainaction-Introduction to the Pyramid Web Framework
C4ainaction-Introduction to the Pyramid Web Framework
Francis Addai
 
An overview of data and web-application development with Python
An overview of data and web-application development with PythonAn overview of data and web-application development with Python
An overview of data and web-application development with Python
Sivaranjan Goswami
 
The Professional Programmer
The Professional ProgrammerThe Professional Programmer
The Professional Programmer
Dave Cross
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java code
Attila Balazs
 
Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)
ncoghlan_dev
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Community
 
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's CephCeph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Community
 

Similar to Caveats (20)

Become a Better Developer with Debugging Techniques for Drupal (and more!)
Become a Better Developer with Debugging Techniques for Drupal (and more!)Become a Better Developer with Debugging Techniques for Drupal (and more!)
Become a Better Developer with Debugging Techniques for Drupal (and more!)
 
Ceph Day SF 2015 - Keynote
Ceph Day SF 2015 - Keynote Ceph Day SF 2015 - Keynote
Ceph Day SF 2015 - Keynote
 
engage 2014 - JavaBlast
engage 2014 - JavaBlastengage 2014 - JavaBlast
engage 2014 - JavaBlast
 
Ceph: A decade in the making and still going strong
Ceph: A decade in the making and still going strongCeph: A decade in the making and still going strong
Ceph: A decade in the making and still going strong
 
Go at Skroutz
Go at SkroutzGo at Skroutz
Go at Skroutz
 
LCE12: Intro Training: Upstreaming 101
LCE12: Intro Training: Upstreaming 101LCE12: Intro Training: Upstreaming 101
LCE12: Intro Training: Upstreaming 101
 
OpenCms Days 2014 - Introducing the 9.5 OpenCms documentation
OpenCms Days 2014 - Introducing the 9.5 OpenCms documentationOpenCms Days 2014 - Introducing the 9.5 OpenCms documentation
OpenCms Days 2014 - Introducing the 9.5 OpenCms documentation
 
Path dependent-development (PyCon India)
Path dependent-development (PyCon India)Path dependent-development (PyCon India)
Path dependent-development (PyCon India)
 
2019 PHP Serbia - Boosting your performance with Blackfire
2019 PHP Serbia - Boosting your performance with Blackfire2019 PHP Serbia - Boosting your performance with Blackfire
2019 PHP Serbia - Boosting your performance with Blackfire
 
Create your library
Create your libraryCreate your library
Create your library
 
The Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with androidThe Good, the Bad and the Ugly things to do with android
The Good, the Bad and the Ugly things to do with android
 
Ceph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade inCeph Day New York: Ceph: one decade in
Ceph Day New York: Ceph: one decade in
 
Liferay portals in real projects
Liferay portals  in real projectsLiferay portals  in real projects
Liferay portals in real projects
 
C4ainaction-Introduction to the Pyramid Web Framework
C4ainaction-Introduction to the Pyramid Web FrameworkC4ainaction-Introduction to the Pyramid Web Framework
C4ainaction-Introduction to the Pyramid Web Framework
 
An overview of data and web-application development with Python
An overview of data and web-application development with PythonAn overview of data and web-application development with Python
An overview of data and web-application development with Python
 
The Professional Programmer
The Professional ProgrammerThe Professional Programmer
The Professional Programmer
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java code
 
Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)Path Dependent Development (PyCon AU)
Path Dependent Development (PyCon AU)
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
 
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's CephCeph Day NYC: Building Tomorrow's Ceph
Ceph Day NYC: Building Tomorrow's Ceph
 

Recently uploaded

special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 

Recently uploaded (20)

special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 

Caveats

  • 1. Common failures to be avoided when we analyze Wikipedia public data Felipe Ortega GsyC/Libresoft Wikimania 2010, Gdańsk, Poland.
  • 2. #1 Beware of special types of pages ● Official page count includes articles with just one link. ● You must consider if you need to filter out disambiguation pages. ● Pay attention to redirects. ● Sometimes people wonder how the number of pages in main namespace in the dump is so high. ● Break down evolution trends by namespace. ● Articles are very different from other pages. ● Explore % of already existing talk pages. ● Connections from user pages.
  • 3. #2 Plan your hardware carefully ● There are some general rules. ● Parallelize as much as possible. ● Buy more memory before buying more disk... ● But take a look at your disk requirements. – It's very different when you can work on decompressed data, on the fly. ● Hardware RAID is not always the best solution. – RAID 10 in Linux can perform decently in many average studies.
  • 4. 3# Know your engines (DBs) ● Correct configuration of DB engine is crucial. ● You'll always fall short with standard configs. ● Fine tune parameters according to your hardware. ● Exploit memory as much as possible. – E.g. MEMORY engine in MySQL. ● Avoid unnecessary backup... ● But be sure that you have copies of relevant info elseware! ● Think about your process: – Read only vs. read-write.
  • 5. 4# Organize your code ● Using a SCM is a must. ● SVN, GIT. ● Upload your code to public repository. ● BerliOS, SourceForge, GitHub... ● Document your code... ● ...if you ever aspire to get interest from other developers. ● Use consistent version numbers. ● Test, test, test... ● Include sanity checks and “toy tests”.
  • 6. #5 Use the right “spell” ● Target data is well defined: ● XML ● Big portions of plain text ● Inter-wiki links and outlinks. ● Some alternatives ● CelementTree (high-speed parsing) ● Python (modules/short scripts) or Java (big projects). ● Perl (regexps). ● Sed & awk
  • 7. #6 Avoid reinventing the wheel ● Consider to develop only if: ● No available solution fits your needs. – Or you can only find proprietary/evaluation sofwtare. ● Performance of other solutions is really bad ● Example: pywikipediabot ● Simple library to query Wikipedia API. ● Solves many simple needs of researchers/programmers.
  • 8. #7 Automate everything ● Huge data repositories. ● Even small samples are excessively time consuming if processed by hand. ● You will start to concat individual processes. ● You will save time for later executions. ● Your study will be reproducible. ● Updating results after several months becomes no-brainer solution.
  • 9. #8 Extreme case of Murphy's Law ● Always expect the worst possible case. ● Many caveats in each implementation. ● Countless particular cases. ● It's not OK with just the “average solution”. – Standard algorithms may take much more than expected to finish the job.
  • 10. #9 Not many graphical interfaces ● Some good reasons for that ● Difficult to automate ● Hard to display dynamic results in real-time. ● Almost impossible to compute all results in a reasonable time frame for huge data collections (e.g. English Wikipedia). ● To the best of my knowledge, there are very few tools with graphical interfaces out there. ● Is there a real need for that??
  • 11. #10 Communication channels ● Wikimedia-research-l ● Mailing list about research on Wikimedia projects. ● http://meta.wikimedia.org/wiki/Research ● http://meta.wikimedia.org/wiki/Wikimedia_Research_Network ● http://acawiki.org/Home ● Final comments ● Need for consolidated info point, once for all