1st meeting of PG PUSHPIN

Slides from the first meeting of the project group PUSHPIN at the University of Paderborn. I cover the overall aims of the project group and the topics for the seminar phase.

Transcript

  • 1. Project group PUSHPIN: Supporting Scholarly Awareness in Publications and Social Networks. University of Paderborn, Computer Science Education Group, Wolfgang Reinhardt
  • 2. CLASSIC RESEARCH + WEB 2.0 / SEMANTIC WEB / SOCIAL NETWORKS + NEW METHODS AND METHODOLOGIES = RESEARCH 2.0 & PG PUSHPIN Wolfgang Reinhardt - wolle@upb.de - Universität Paderborn
  • 3. GOALS OF THE PROJECT GROUP • Data mining in scientific publications • Who’s writing about what? Who’s writing with whom? • Clustering & similarity measures, recommendations, experts • Connections to social networking sites (ginkgo) • Visual analytics, visualizations • Extension of the knowAAN architecture & analysis of large data sets
  • 4. RESULTS OF PG KNOWAAN • Java-based backend that allows automatic analysis of publications (metadata extraction, text analysis, relations between publications, and many more) • Clustering and similarity detection • currently first tests with Hadoop & Mahout • Rails-, JavaScript-, CSS-based frontend for navigation • Examples:
  • 5. CO-AUTHOR NETWORKS
  • 6. LOCATION OF AUTHORS
  • 7. WORD CLOUDS
  • 8. BIBLIOMETRIC NETWORKS
  • 9. 20. OCTOBER 2011 4:45PM
  • 10. GINKGO • conference management tool + social network • Goal: • check submitted publications for plagiarized content, topical and social connections • Recommendations (users, events, publications) http://ginkgosem.com
  • 11. GENERAL FRAMEWORK
  • 12. PEOPLE • Prof. Johannes Magenheim • Wolfgang Reinhardt • Tobias Varlemann
  • 13. GOALS OF A PROJECT GROUP • self-organization to the greatest possible extent • systematic assignment of roles and responsibilities • finding and fostering special talents • process-oriented personnel placement as in industry • regular presentations of work progress • creation of interim and final reports • working at the edge of science
  • 14. TIMEFRAME • 18.10.2011 - 31.10.2012 (54 weeks) • 30 ECTS = 900 hours of work (approx. 17h / week) • Seminar phase until January 2012 • Creativity workshops in January • Core implementation phase from February 2012 onwards • agile development (4 milestones, 4 iterations per milestone)
  • 15. REQUIREMENTS • active participation • check UPB mails at least daily • good communication skills • team work • creativity in design and implementation • testing ;)
  • 16. TOOLS • SVN and Trac #pgpushpin • Blog • Twitter (if you like) • Mendeley for exchange of research papers • Delicious for social bookmarks
  • 17. SEMINAR PHASE
  • 18. SEMINAR PHASE • each one of you works on one topic • theoretical framework, applications, prototypes • regular meetings with supervisors • regular blogging at http://pgpushpin.wordpress.com • presentation in mid-January 2012 (25 minutes plus discussion) • article due at the end of January 2012 (approx. 16-24 pages)
  • 19. TOPICS FOR THE SEMINAR PHASE
  • 20. 1. HTML5 and Javascript Frameworks 2. Visual Analytics 3. Agile Software Development in Small Teams 4. Trend detection and visualization 5. Text processing 6. Metadata extraction from research papers 7. Text similarities 8. Distributed computing with Hadoop 1 9. Distributed computing with Hadoop 2 10. Developing Multitouch Table Applications 11. Clustering of text documents 12. Plagiarism detection 13. Social Network Analysis 14. Faceted Search User Interfaces 15. Browser-based visualization of large networks 16. Scientific recommender systems
  • 21. ALL TOPICS ARE FOCUSED ON SCHOLARLY OUTPUT, E.G. SCIENTIFIC PAPERS, RESEARCHER COLLABORATION
  • 22. HTML5 AND JAVASCRIPT FRAMEWORKS • development of sustainable web applications (responsive design) • current and coming standards • web workers, local storage, WebGL, server-side JS, web sockets • Visualizations, word clouds, developments over time • Javascript frameworks for visualizations, graphs etc.
  • 23. VISUAL ANALYTICS • information / scientific visualizations that allow reasoning • visual analytics and their application to research • cartography / geovisualization • flow visualization • diagrammatic reasoning • state of the art and mockups for new developments • tools/frameworks for realization (browser-based)
  • 24. AGILE SOFTWARE DEVELOP. IN SMALL TEAMS • agile software development and project management in small teams • application to the project group (roles and requirements) • TDD, BDD, FDD • Scrum, eXtreme programming, Kanban • Pair Programming
  • 25. TREND DETECTION AND VISUALIZATION & SEARCH • trend spotting and visualization & forecasting • which topics are gaining ground and which are on the decline • which networks are expanding, which are saturated • ThemeRiver - StreamGraph visualizations • Custom Search Applications (Solr and its extensions) • semantic search, linked data approaches
  • 26. TEXT PROCESSING • PDF text extraction (get rid of headers and footers) • Part-of-speech detection, lemmatizing text, stemming • classification, topic extraction and knowledge discovery (untrained) • LDA from Mahout • usage of Apache OpenNLP & Apache Mahout for prototypes
  • 27. METADATA EXTRACTION FROM RESEARCH PAPERS • How to best extract metadata from research papers? • Parscit and others (?) • Conditional Random Fields -- CRF++ • Support Vector Machines • good mathematical knowledge needed • Only selected information is relevant • extract geo locations from papers
  • 28. TEXT SIMILARITIES • Vector Space Model & Term Document Matrix • LSA / LSI with SVD • methods for calculating text-based similarities • possibility for live calculations • temporary files • usage of Apache Mahout for prototypes
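The vector space model and term-document matrix named on this slide can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the project's Mahout-based approach; the function names, the whitespace tokenization, and the use of raw term counts (rather than TF-IDF weights) are all assumptions made for the example.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    # Assumption: naive lowercase/whitespace tokenization, raw counts as weights.
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def document_vector(matrix, j):
    """Column j of the term-document matrix, i.e. the vector of document j."""
    return [row[j] for row in matrix]


def cosine_similarity(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["social network analysis", "social network theory", "hadoop cluster"]
vocab, matrix = term_document_matrix(docs)
similarity = cosine_similarity(document_vector(matrix, 0), document_vector(matrix, 1))
```

Documents that share no terms get a similarity of 0, which is one motivation for LSA / LSI: the SVD step maps related terms to shared dimensions.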
  • 29. DISTRIBUTED COMPUTING WITH HADOOP 1 • MapReduce • Hadoop • HBase • HDFS • usage of Apache Mahout for prototypes
  • 30. DISTRIBUTED COMPUTING WITH HADOOP 2 • MapReduce • Hadoop • Hive Data Warehousing • Job Orchestration (e.g. with Zookeeper) • Pig Data Flow • usage of Apache Mahout for prototypes
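The MapReduce model that both Hadoop topics build on can be illustrated with a single-process word count. This sketch does not use the Hadoop API; it only simulates the map, shuffle, and reduce phases in plain Python, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over {doc_id: text}."""
    pairs = (pair
             for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count({"paper1": "hadoop mahout hadoop", "paper2": "mahout"})
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the division of work between the phases stays the same.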
  • 31. DEVELOPING MULTITOUCH TABLE APPLICATIONS • http://www.youtube.com/watch?v=f1X5ffRrde8 • C# and .Net 4.0, Visual Studio 2010 • WPF and Surface SDK • Fiducials • build simulation, mockups of possible applications, state-of-the-art presentation • http://www.microsoft.com/silverlight/pivotviewer/
  • 32. CLUSTERING OF TEXT DOCUMENTS • Methods for analyzing large collections of texts • k-means, single-link, full-link, canopy • visualization opportunities • how to add documents to a large clustering • usage of Apache Mahout for prototypes
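Of the methods listed, k-means is the easiest to sketch. The following is a minimal plain-Python version, not Mahout's implementation; it assumes documents have already been turned into numeric vectors (for example term counts), that initial centroids are passed in explicitly, and that a fixed number of iterations is run. All names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```

One simple answer to "how to add documents to a large clustering" is to assign a new vector to its nearest existing centroid without re-running the whole procedure, at the cost of centroids drifting out of date.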
  • 33. PLAGIARISM DETECTION • How to detect potentially plagiarized content? • Ethical discussion on (self-)plagiarism • text breakdown in elements (sections, paragraphs, sentences) • n-grams • internal and external plagiarism detection
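A common way to use n-grams for external plagiarism detection is to compare the sets of word n-grams of a suspicious text and a candidate source. The sketch below assumes word trigrams, whitespace tokenization, and Jaccard overlap as the score; none of these choices come from the slides, and the function names are hypothetical.

```python
def word_ngrams(text, n=3):
    """Set of all word n-grams in the text (empty if the text is too short)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(suspicious, source, n=3):
    """Jaccard overlap of the two n-gram sets, between 0.0 and 1.0."""
    a, b = word_ngrams(suspicious, n), word_ngrams(source, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


score = ngram_overlap("we analyse large social networks",
                      "they analyse large social networks")
```

Applied to the text breakdown from the slide, the same function could be run per sentence or paragraph, so that a high score points at the specific passage rather than the whole paper.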
  • 34. SOCIAL NETWORK ANALYSIS • Social Network Theory • measures from SNA • existing examples of research applications • bibliometrics and scientometrics • take real conference series as example
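As one example of a measure from SNA applied to scholarly data, degree centrality can be computed on a co-author network built from author lists. This is a sketch in plain Python under the assumption that each paper is given as a list of author names; the data and function names are made up for illustration.

```python
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected graph {author: set of co-authors} from author lists."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())  # solo authors still become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


papers = [["A", "B"], ["B", "C"], ["D"]]  # hypothetical author lists
centrality = degree_centrality(coauthor_graph(papers))
```

Here author B, who has written with both A and C, gets the highest score, while the solo author D gets 0.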
  • 35. FACETED SEARCH & INTERFACE EVAL • Best practices and design recommendations • frameworks for development • enclosure / APIs • only work on JSON data & no direct DB access • Java / ASP.NET / SEAM .... • own prototype
  • 36. BROWSER-BASED VISUALIZ. OF LARGE NETWORKS • level of detail • WebGL, web workers • Gephi • visualize properties • allow faceted search • should be working on tablets
  • 37. SCIENTIFIC RECOMMENDER SYSTEMS • state of the art • item-based and collaborative filtering / hybrid recommenders • algorithms, visualizations • existing applications in research • usage of Apache Mahout for prototypes
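Item-based collaborative filtering can be sketched as follows: two items are similar if the same users rated them, and a user is recommended unseen items that are similar to items they already rated. This is a plain-Python illustration, not Mahout's recommender API; the rating data, the cosine similarity choice, and all names are assumptions.

```python
import math


def item_vectors(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items.setdefault(item, {})[user] = rating
    return items


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=3):
    """Rank unseen items by their rating-weighted similarity to seen items."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        score = sum(cosine(vector, items[s]) * r for s, r in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: which researcher bookmarked which paper (1 = bookmarked).
ratings = {
    "ann": {"p1": 1, "p2": 1},
    "bob": {"p1": 1, "p2": 1, "p3": 1},
    "cy": {"p3": 1, "p4": 1},
}
suggestions = recommend(ratings, "ann")
```

In a scholarly setting the "ratings" could be bookmarks, citations, or reads; which signal works best is exactly the kind of question the seminar topic leaves open.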
  • 38. NEXT STEPS
  • 39. NEXT STEPS • vote for three topics by Wednesday, 8pm • mail with favorite topic, 2nd and 3rd place • decision on Friday • create Wordpress, Delicious and Mendeley accounts • final presentation of PG knowAAN this Thursday, 4.45pm in F0.231 • first meetings with supervisors within the next two weeks
  • 40. Wolfgang Reinhardt, University of Paderborn • social media, SNA, Twitter, recommendations, awareness, research networks, bibliometrics, artefact-actor-networks, ginkgo, research 2.0 • www.isitjustme.de • www.ginkgosem.com • @wollepb • @wolfgang.reinhardt