1st meeting of PG PUSHPIN

Slides from the first meeting of the project group PUSHPIN at the University of Paderborn. I cover the general focus of the project group and the topics for the seminar phase.

Published in: Education, Technology

  1. Project group PUSHPIN
     Supporting Scholarly Awareness in Publications and Social Networks
     University of Paderborn, Computer Science Education Group
     Wolfgang Reinhardt
  2. CLASSIC RESEARCH + WEB 2.0 / SEMANTIC WEB / SOCIAL NETWORKS + NEW METHODS AND METHODOLOGIES = RESEARCH 2.0 & PG PUSHPIN
     Wolfgang Reinhardt - wolle@upb.de - Universität Paderborn
  3. GOALS OF THE PROJECT GROUP
     • Data mining in scientific publications
     • Who’s writing about what? Who’s writing with whom?
     • Clustering & similarity measures, recommendations, experts
     • Connections to social networking sites (ginkgo)
     • Visual analytics, visualizations
     • Extension of the knowAAN architecture & analysis of large data sets
  4. RESULTS OF PG KNOWAAN
     • Java-based backend that allows automatic analysis of publications (metadata extraction, text analysis, relations between publications, and more)
     • Clustering and similarity detection
       • currently first tests with Hadoop & Mahout
     • Rails-, JavaScript-, CSS-based frontend for navigation
     • Examples:
  5. CO-AUTHOR NETWORKS
  6. LOCATION OF AUTHORS
  7. WORD CLOUDS
  8. BIBLIOMETRIC NETWORKS
  9. 20 OCTOBER 2011, 4:45PM
  10. GINKGO
     • conference management tool + social network
     • Goal:
       • check submitted publications for plagiarized content, topical and social connections
       • recommendations (users, events, publications)
     http://ginkgosem.com
  11. GENERAL FRAMEWORK
  12. PEOPLE
     • Prof. Johannes Magenheim
     • Wolfgang Reinhardt
     • Tobias Varlemann
  13. GOALS OF A PROJECT GROUP
     • self-organization to the greatest possible extent
     • systematic assignment of roles and responsibilities
     • finding and fostering special talents
     • process-oriented personnel placement, as in industry
     • regular presentations of work progress
     • creation of interim and final reports
     • working on the edge of science
  14. TIMEFRAME
     • 18.10.2011 - 31.10.2012 (54 weeks)
     • 30 ECTS = 900 hours of work (approx. 17h / week)
     • Seminar phase until January 2012
     • Creativity workshops in January
     • Core implementation phase from February 2012 onwards
       • agile development (4 milestones, 4 iterations per milestone)
  15. REQUIREMENTS
     • active participation
       • check UPB mails at least daily
       • good communication skills
       • team work
     • creativity in design and implementation
     • testing ;)
  16. TOOLS
     • SVN and Trac
     • Blog
     • Twitter (if you like)
     • Mendeley for exchange of research papers
     • Delicious for social bookmarks
     Tag: #pgpushpin
  17. SEMINAR PHASE
  18. SEMINAR PHASE
     • each one of you works on one topic
       • theoretical framework, applications, prototypes
     • regular meetings with supervisors
     • regular blogging at http://pgpushpin.wordpress.com
     • presentation in mid-January 2012 (25 minutes plus discussion)
     • article due at the end of January 2012 (approx. 16-24 pages)
  19. TOPICS FOR THE SEMINAR PHASE
  20. Topic list
     1. HTML5 and Javascript Frameworks
     2. Visual Analytics
     3. Agile Software Development in Small Teams
     4. Trend detection and visualization
     5. Text processing
     6. Metadata extraction from research papers
     7. Text similarities
     8. Distributed computing with Hadoop 1
     9. Distributed computing with Hadoop 2
     10. Developing Multitouch Table Applications
     11. Clustering of text documents
     12. Plagiarism detection
     13. Social Network Analysis
     14. Faceted Search User Interfaces
     15. Browser-based visualization of large networks
     16. Scientific recommender systems
  21. ALL TOPICS ARE FOCUSED ON SCHOLARLY OUTPUT, E.G. SCIENTIFIC PAPERS, RESEARCHER COLLABORATION
  22. HTML5 AND JAVASCRIPT FRAMEWORKS
     • development of sustainable web applications (responsive design)
     • current and coming standards
       • web workers, local storage, WebGL, server-side JS, web sockets
     • visualizations, word clouds, developments over time
     • JavaScript frameworks for visualizations, graphs etc.
  23. VISUAL ANALYTICS
     • information / scientific visualization that allows reasoning
     • visual analytics and their application to research
       • cartography / geovisualization
       • flow visualization
       • diagrammatic reasoning
       • state of the art and mockups for new developments
       • tools/frameworks for realization (browser-based)
  24. AGILE SOFTWARE DEVELOPMENT IN SMALL TEAMS
     • agile software development and project management in small teams
     • application to the project group (roles and requirements)
     • TDD, BDD, FDD
     • Scrum, eXtreme Programming, Kanban
     • Pair Programming
  25. TREND DETECTION AND VISUALIZATION & SEARCH
     • trend spotting and visualization & forecasting
     • which topics are gaining ground and which are on the decline
     • which networks are expanding, which are saturated
     • ThemeRiver - StreamGraph visualizations
     • Custom Search Applications (Solr and its extensions)
       • semantic search, linked data approaches
  26. TEXT PROCESSING
     • PDF text extraction (get rid of headers and footers)
     • Part-of-speech detection, lemmatizing text, stemming
     • classification, topic extraction and knowledge discovery (untrained)
     • LDA from Mahout
     • usage of Apache OpenNLP & Apache Mahout for prototypes
  27. METADATA EXTRACTION FROM RESEARCH PAPERS
     • How to best extract metadata from research papers?
     • ParsCit and others (?)
     • Conditional Random Fields (CRF++)
     • Support Vector Machines
     • only selected information is relevant
     • extract geo locations from papers
     Note: good mathematical knowledge needed
  28. TEXT SIMILARITIES
     • Vector Space Model & Term Document Matrix
     • LSA / LSI with SVD
     • methods for calculating text-based similarities
       • possibility for live calculations
       • temporary files
     • usage of Apache Mahout for prototypes
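To make the Vector Space Model bullet concrete, here is a minimal sketch in plain Python. It is not the Mahout-based approach the slide names, and the function names `term_vector` and `cosine_similarity` are hypothetical. Each document becomes a term-frequency vector (one column of a term-document matrix), and two documents are compared by the cosine of the angle between their vectors.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for one document (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Illustrative sentences only, not taken from any real paper.
    paper_a = term_vector("clustering of scientific publications")
    paper_b = term_vector("clustering of text documents")
    print(cosine_similarity(paper_a, paper_b))
```

A real pipeline would typically add stop-word removal, stemming and TF-IDF weighting before comparing vectors; this sketch leaves them out to keep the core idea visible.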
  29. DISTRIBUTED COMPUTING WITH HADOOP 1
     • MapReduce
     • Hadoop
     • HBase
     • HDFS
     • usage of Apache Mahout for prototypes
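The MapReduce programming model behind Hadoop can be illustrated without a cluster. The following is a single-process sketch in plain Python, not the Hadoop API. The names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and only mirror the three conceptual stages of a word-count job.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, 1) pair per word occurrence in a document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a dict of doc_id -> text."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Illustrative input only.
    print(word_count({"p1": "social network analysis", "p2": "network visualization"}))
```

In an actual Hadoop job the framework performs the shuffle and distributes the map and reduce calls across machines; only the two user-defined functions remain the programmer's responsibility.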
  30. DISTRIBUTED COMPUTING WITH HADOOP 2
     • MapReduce
     • Hadoop
     • Hive Data Warehousing
     • Job Orchestration (e.g. with Zookeeper)
     • Pig Data Flow
     • usage of Apache Mahout for prototypes
  31. DEVELOPING MULTITOUCH TABLE APPLICATIONS
     • http://www.youtube.com/watch?v=f1X5ffRrde8
     • C# and .Net 4.0, Visual Studio 2010
     • WPF and Surface SDK
     • Fiducials
     • build simulation, mockups of possible applications, state-of-the-art presentation
     • http://www.microsoft.com/silverlight/pivotviewer/
  32. CLUSTERING OF TEXT DOCUMENTS
     • Methods for analyzing large collections of texts
       • k-means, single-link, full-link, canopy
       • visualization opportunities
     • how to add documents to a large clustering
     • usage of Apache Mahout for prototypes
  33. PLAGIARISM DETECTION
     • How to detect potentially plagiarized content?
     • Ethical discussion on (self-)plagiarism
     • text breakdown in elements (sections, paragraphs, sentences)
     • n-grams
     • internal and external plagiarism detection
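One common way to use n-grams for external plagiarism detection is to measure how many word n-grams of a suspect text also occur in a candidate source. The sketch below assumes that approach. The function names and the default of n = 3 are illustrative choices, not something the slides prescribe.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams (as tuples) of a text, using a naive whitespace tokenizer."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(suspect, source, n=3):
    """Fraction of the suspect's n-grams that also appear in the source (containment)."""
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0  # text shorter than n words: nothing to compare
    return len(suspect_grams & word_ngrams(source, n)) / len(suspect_grams)


if __name__ == "__main__":
    # Illustrative sentences only.
    source = "we analyze large collections of scientific publications"
    suspect = "they analyze large collections of scientific papers"
    print(ngram_overlap(suspect, source))
```

A high overlap only flags a passage for closer inspection. Quotations, shared boilerplate and self-plagiarism need a human judgment, which is where the ethical discussion on the slide comes in.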
  34. SOCIAL NETWORK ANALYSIS
     • Social Network Theory
     • measures from SNA
       • existing examples of research applications
     • bibliometrics and scientometrics
     • take real conference series as example
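As an example of an SNA measure applied to bibliometric data, the following sketch builds a co-author network from author lists and computes normalized degree centrality (the share of all other authors a person has written with). The function names and the sample data are hypothetical. A real analysis would take the author lists from a conference series, as the slide suggests.

```python
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected co-author graph: author -> set of co-authors."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())  # keep single authors as isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Normalized degree centrality: number of neighbours divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


if __name__ == "__main__":
    # Hypothetical author lists, one per paper.
    papers = [["Ada", "Ben"], ["Ben", "Cem"], ["Dora"]]
    print(degree_centrality(coauthor_graph(papers)))
```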
  35. FACETED SEARCH & INTERFACE EVALUATION
     • Best practices and design recommendations
     • frameworks for development
     • encapsulation / APIs
     • only work on JSON data & no direct DB access
     • Java / ASP.NET / SEAM ....
     • own prototype
  36. BROWSER-BASED VISUALIZATION OF LARGE NETWORKS
     • level of detail
     • WebGL, web workers
     • Gephi
     • visualize properties
     • allow faceted search
     • should work on tablets
  37. SCIENTIFIC RECOMMENDER SYSTEMS
     • state of the art
       • item-based and collaborative filtering / hybrid recommenders
     • algorithms, visualizations
     • existing applications in research
     • usage of Apache Mahout for prototypes
  38. NEXT STEPS
  39. NEXT STEPS
     • vote for three topics by Wednesday, 8pm
       • mail with favorite topic, 2nd and 3rd place
       • decision on Friday
     • create WordPress, Delicious and Mendeley accounts
     • final presentation of PG knowAAN this Thursday, 4.45pm in F0.231
     • first meetings with supervisors in the next two weeks
  40. Wolfgang Reinhardt, University of Paderborn
     Keywords: social media, SNA, Twitter, recommendations, awareness, research networks, bibliometrics, artefact-actor-networks, ginkgo, research 2.0
     www.isitjustme.de, www.ginkgosem.com
     @wollepb, @wolfgang.reinhardt
