1st meeting of PG PUSHPIN


Slides from the first meeting of the project group PUSHPIN at the University of Paderborn. They cover the general focus of the project group and the topics for the seminar phase.

Published in: Education, Technology

  1. 1. Project group PUSHPIN
        Supporting Scholarly Awareness in Publications and Social Networks
        University of Paderborn, Computer Science Education Group
        Wolfgang Reinhardt
  2. 2. CLASSIC RESEARCH + WEB 2.0 / SEMANTIC WEB / SOCIAL NETWORKS + NEW METHODS AND METHODOLOGIES = RESEARCH 2.0 & PG PUSHPIN
        Wolfgang Reinhardt - wolle@upb.de - Universität Paderborn
  3. 3. GOALS OF THE PROJECT GROUP
        • Data mining in scientific publications
        • Who's writing about what? Who's writing with whom?
        • Clustering & similarity measures, recommendations, experts
        • Connections to social networking sites (ginkgo)
        • Visual analytics, visualizations
        • Extension of the knowAAN architecture & analysis of large data sets
  4. 4. RESULTS OF PG KNOWAAN
        • Java-based backend that allows automatic analysis of publications (metadata extraction, text analysis, relations between publications, and much more)
        • Clustering and similarity detection
            • currently: first tests with Hadoop & Mahout
        • Rails-, JavaScript-, CSS-based frontend for navigation
        • Examples:
  5. 5. CO-AUTHOR NETWORKS
  6. 6. LOCATION OF AUTHORS
  7. 7. WORD CLOUDS
  8. 8. BIBLIOMETRIC NETWORKS
  9. 9. 20. OCTOBER 2011 4:45PM
  10. 10. GINKGO
        • conference management tool + social network
        • Goal:
            • check submitted publications for plagiarized content, topical and social connections
            • Recommendations (users, events, publications)
        • http://ginkgosem.com
  12. 12. PEOPLE
        • Prof. Johannes Magenheim
        • Wolfgang Reinhardt
        • Tobias Varlemann
  13. 13. GOALS OF A PROJECT GROUP
        • self-organization to the greatest possible extent
        • systematic assignment of roles and responsibilities
        • finding and fostering special talents
        • process-oriented personnel placement, as in industry
        • regular presentations of work progress
        • creation of interim and final reports
        • working at the edge of science
  14. 14. TIMEFRAME
        • 18.10.2011 - 31.10.2012 (54 weeks)
        • 30 ECTS = 900 hours of work (approx. 17 h / week)
        • Seminar phase until January 2012
        • Creativity workshops in January
        • Core implementation phase from February 2012 onwards
            • agile development (4 milestones, 4 iterations per milestone)
  15. 15. REQUIREMENTS
        • active participation
            • check UPB mails at least daily
            • good communication skills
            • team work
        • creativity in design and implementation
        • testing ;)
  16. 16. TOOLS
        • SVN and Trac #pgpushpin
        • Blog
        • Twitter (if you like)
        • Mendeley for exchange of research papers
        • Delicious for social bookmarks
  18. 18. SEMINAR PHASE
        • each one of you works on one topic
            • theoretical framework, applications, prototypes
        • regular meetings with supervisors
        • regular blogging at http://pgpushpin.wordpress.com
        • presentation in mid-January 2012 (25 minutes plus discussion)
        • article due at the end of January 2012 (approx. 16-24 pages)
  20. 20. 
        1. HTML5 and JavaScript Frameworks
        2. Visual Analytics
        3. Agile Software Development in Small Teams
        4. Trend detection and visualization
        5. Text processing
        6. Metadata extraction from research papers
        7. Text similarities
        8. Distributed computing with Hadoop 1
        9. Distributed computing with Hadoop 2
        10. Developing Multitouch Table Applications
        11. Clustering of text documents
        12. Plagiarism detection
        13. Social Network Analysis
        14. Faceted Search User Interfaces
        15. Browser-based visualization of large networks
        16. Scientific recommender systems
  22. 22. HTML5 AND JAVASCRIPT FRAMEWORKS
        • development of sustainable web applications (responsive design)
        • current and coming standards
            • web workers, local storage, WebGL, server-side JS, web sockets
        • Visualizations, word clouds, developments over time
        • JavaScript frameworks for visualizations, graphs etc.
  23. 23. VISUAL ANALYTICS
        • information / scientific visualizations that allow reasoning
        • visual analytics and their application to research
            • cartography / geovisualization
            • flow visualization
            • diagrammatic reasoning
            • state of the art and mockups for new developments
            • tools/frameworks for realization (browser-based)
  24. 24. AGILE SOFTWARE DEVELOP. IN SMALL TEAMS
        • agile software development and project management in small teams
        • application to the project group (roles and requirements)
        • TDD, BDD, FDD
        • Scrum, Extreme Programming, Kanban
        • Pair Programming
  25. 25. TREND DETECTION AND VISUALIZATION & SEARCH
        • trend spotting and visualization & forecasting
        • which topics are gaining ground and which are on the decline
        • which networks are expanding, which are saturated
        • ThemeRiver - StreamGraph visualizations
        • Custom search applications (Solr and its extensions)
            • semantic search, linked data approaches
  26. 26. TEXT PROCESSING
        • PDF text extraction (get rid of headers and footers)
        • Part-of-speech detection, lemmatizing text, stemming
        • classification, topic extraction and knowledge discovery (untrained)
        • LDA from Mahout
        • usage of Apache OpenNLP & Apache Mahout for prototypes
  27. 27. METADATA EXTRACTION FROM RESEARCH PAPERS
        • How to best extract metadata from research papers?
        • ParsCit and others (?)
        • Conditional Random Fields (CRF++)
        • Support Vector Machines
        • good mathematical knowledge needed
        • only selected information is relevant
        • extract geo locations from papers
  28. 28. TEXT SIMILARITIES
        • Vector Space Model & Term Document Matrix
        • LSA / LSI with SVD
        • methods for calculating text-based similarities
            • possibility for live calculations
            • temporary files
        • usage of Apache Mahout for prototypes
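The vector space model named on this slide can be illustrated with a minimal sketch: each document becomes a term-frequency vector, and two documents are compared by the cosine of the angle between their vectors. This is plain Python for illustration only, not Apache Mahout's API; the function names and the three sample documents are assumptions made up for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for one document (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents; together the vectors form a tiny term-document matrix.
docs = {
    "d1": "clustering of scientific publications",
    "d2": "clustering of text documents",
    "d3": "multitouch table applications",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

if __name__ == "__main__":
    for left, right in [("d1", "d2"), ("d1", "d3")]:
        print(left, right, cosine_similarity(vectors[left], vectors[right]))
```

A real prototype would likely weight terms (for example with TF-IDF) and reduce dimensionality with LSA before comparing; raw counts are used here only to keep the sketch short.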
  29. 29. DISTRIBUTED COMPUTING WITH HADOOP 1
        • MapReduce
        • Hadoop
        • HBase
        • HDFS
        • usage of Apache Mahout for prototypes
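As a rough illustration of the MapReduce programming model listed above, the following single-process sketch counts words in three phases: map emits key/value pairs, shuffle groups them by key, and reduce aggregates each group. It does not use Hadoop; on a cluster the framework performs the shuffle and distributes the work. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, 1) pair per token of a document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over a dict of doc_id -> text."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count({"a": "hadoop pig hive", "b": "hadoop hbase"}))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can run in parallel on separate machines, which is the property the model relies on.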
  30. 30. DISTRIBUTED COMPUTING WITH HADOOP 2
        • MapReduce
        • Hadoop
        • Hive Data Warehousing
        • Job Orchestration (e.g. with ZooKeeper)
        • Pig Data Flow
        • usage of Apache Mahout for prototypes
  31. 31. DEVELOPING MULTITOUCH TABLE APPLICATIONS
        • http://www.youtube.com/watch?v=f1X5ffRrde8
        • C# and .NET 4.0, Visual Studio 2010
        • WPF and Surface SDK
        • Fiducials
        • build simulation, mockups of possible applications, state-of-the-art presentation
        • http://www.microsoft.com/silverlight/pivotviewer/
  32. 32. CLUSTERING OF TEXT DOCUMENTS
        • Methods for analyzing large collections of texts
            • k-means, single-link, full-link, canopy
            • visualization opportunities
        • how to add documents to a large clustering
        • usage of Apache Mahout for prototypes
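Of the methods named on this slide, k-means is the easiest to sketch. The version below is a minimal illustration under simplifying assumptions: points are small numeric tuples (for text they would be document vectors), the initial centroids are supplied by the caller rather than chosen at random, and distance is squared Euclidean. It is not Mahout's implementation, and all names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        clusters[distances.index(min(distances))].append(p)
    return clusters


def mean(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))


def kmeans(points, centroids, max_iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else old for c, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

Adding a new document to an existing clustering, as the slide asks, can in the simplest case reuse `assign` with the final centroids instead of re-running the whole loop.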
  33. 33. PLAGIARISM DETECTION
        • How to detect potentially plagiarized content?
        • Ethical discussion on (self-)plagiarism
        • breaking text down into elements (sections, paragraphs, sentences)
        • n-grams
        • internal and external plagiarism detection
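One simple way to use the n-grams mentioned above for external plagiarism detection is to compare the sets of word n-grams of a suspicious passage and a candidate source. The sketch below measures their Jaccard overlap; the function names, the choice of word trigrams, and the example sentences are assumptions for illustration, and a real detector would add normalization and a tuned threshold.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams of a text (empty if the text has fewer than n words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(suspicious, source, n=3):
    """Jaccard overlap of the two n-gram sets: shared n-grams / all distinct n-grams."""
    a = word_ngrams(suspicious, n)
    b = word_ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    suspicious = "a quick brown fox jumps over a sleeping cat"
    print(ngram_overlap(suspicious, source))
```

Applied per sentence or paragraph, following the text breakdown listed on the slide, a high overlap flags passages for manual inspection rather than proving plagiarism.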
  34. 34. SOCIAL NETWORK ANALYSIS
        • Social Network Theory
        • measures from SNA
            • existing examples of research applications
        • bibliometrics and scientometrics
        • take real conference series as example
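As one example of an SNA measure applied to bibliometric data, the sketch below builds an undirected co-author network from author lists and computes normalized degree centrality (number of distinct co-authors divided by the number of other authors in the network). The author names "A" to "D" and the function names are placeholders, not data from the project.

```python
from itertools import combinations


def coauthor_graph(papers):
    """Undirected co-author network: author -> set of co-authors."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())  # single authors become isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


if __name__ == "__main__":
    # Each inner list is the author list of one hypothetical paper.
    papers = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"]]
    print(degree_centrality(coauthor_graph(papers)))
```

Other common measures, such as betweenness or closeness centrality, need shortest-path computations and are better taken from a graph library than written by hand.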
  35. 35. FACETED SEARCH & INTERFACE EVAL
        • Best practices and design recommendations
        • frameworks for development
        • encapsulation / APIs
        • only work on JSON data & no direct DB access
        • Java / ASP.NET / Seam ...
        • own prototype
  36. 36. BROWSER-BASED VISUALIZ. OF LARGE NETWORKS
        • level of detail
        • WebGL, web workers
        • Gephi
        • visualize properties
        • allow faceted search
        • should work on tablets
  37. 37. SCIENTIFIC RECOMMENDER SYSTEMS
        • state of the art
            • item-based and collaborative filtering / hybrid recommenders
        • algorithms, visualizations
        • existing applications in research
        • usage of Apache Mahout for prototypes
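Item-based collaborative filtering, named on this slide, can be sketched as follows: two papers are similar if the same users interacted with them, and a user is recommended the unseen papers most similar to those already seen. The sketch assumes implicit feedback (1 means the user read or bookmarked the paper); user and paper identifiers are hypothetical, and this is plain Python rather than Mahout's recommender API.

```python
import math


def item_vectors(ratings):
    """Invert user -> {item: score} into item -> {user: score}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, score in user_ratings.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=3):
    """Rank unseen items by similarity-weighted sum over the user's seen items."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vector, items[s]) * score for s, score in seen.items()
        )
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


if __name__ == "__main__":
    ratings = {
        "u1": {"p1": 1, "p2": 1},
        "u2": {"p1": 1, "p2": 1, "p3": 1},
        "u3": {"p4": 1},
        "u4": {"p1": 1},
    }
    print(recommend(ratings, "u4", top_n=2))
```

A hybrid recommender, also mentioned on the slide, would combine such interaction-based scores with content-based similarity between the papers themselves.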
  38. 38. NEXT STEPS
  39. 39. NEXT STEPS
        • vote for three topics by Wednesday, 8 pm
            • mail with favorite topic, 2nd and 3rd place
            • decision on Friday
        • create WordPress, Delicious and Mendeley accounts
        • final presentation of PG knowAAN this Thursday, 4:45 pm in F0.231
        • first meetings with supervisors in the next two weeks
  40. 40. wolfgang reinhardt, university of paderborn
        social media, sna, twitter, recommendations, awareness, research networks, bibliometrics, artefact-actor-networks, ginkgo, research 2.0
        www.isitjustme.de, www.ginkgosem.com
        @wollepb, @wolfgang.reinhardt