Social Web: (Big) Data Mining | summer 2014/2015 course syllabus


Social Web: (Big) Data Mining | ISS FSV UK | Charles University in Prague | Faculty of Social Sciences | Institute of Sociological Studies | bachelor’s course | JSB454 | summer semester 2014/2015

Course Syllabus (version 1.1)
Introduction to Data Mining & Data Analysis | Data Science | Digital Humanities
Big Data | Types of Data | Data Formats | Information Retrieval | Business Intelligence | Law & Ethics of Data Mining
Introduction to Web Technologies for Non-Tech Students | Database Systems | Web Programming | Semantic Web | APIs
Graph Theory | Social Network Analysis | Statistical Procedures, Apps & Tools
Pseudocoding | Introduction to Programming in Python & data mining alternatives comparison | Data Exploration & Preprocessing
Web Scraping | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Social Media Mining | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Text Mining | Natural Language Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Data Visualization | Data Storytelling | Electronic Publishing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Student Webinars Week | Introducing Various Free & Open Source Data Mining Software & Apps
Machine Learning, Recommender Systems & Other More Advanced Topics | Large-Scale Data Sets | MapReduce, Hadoop, NoSQL
Course Review | Semestral Projects Consultation & Adjustments | The Remaining 99% of Data Science | Data Science Buzzwords


Transcript

  • 1. JAKUB RŮŽIČKA | jameslittlerose@gmail.com | cz.linkedin.com/in/littlerose | summer semester 2014/2015 | SOCIAL WEB: (BIG) DATA MINING | bachelor's course | ISS FSV UK | JSB454 | course syllabus [version 1.1]
  • 2. Outline
      • General Information
      • Intended Learning Outcomes
      • Syllabus
      • Types of Instruction
      • Requirements, Examination & Assignments
      • Course Literature & Documentation
  • 3. General Information Social Web: (Big) Data Mining
  • 4. Social Web: (Big) Data Mining
      The course gives a professional and academic introduction to web & social media data mining. Emphasis is put on the intersection of data science, humanities & ICT.
      • guarantors: PhDr. Mgr. Ing. Petr Soukup, Jakub Růžička
      • lecturers: Jakub Růžička, Petr Soukup
      • credits: 7 ECTS, elective course
      • lectures: 1 lecture (80 min) & 1 tutorial/seminar (80 min) per week
  • 5. Intended Learning Outcomes in which way the course should make your life better & improve your skills
  • 6. Upon completion of the course, the students will be able to
      • understand the intersection of data science, humanities & ICT within the realm of web & social media (big) data mining
      • ask meaningful questions, perform basic analytical operations on both structured & unstructured web / social media data, and draw conclusions for decision making
      • understand basic concepts and conduct subsequent data preprocessing, analysis & visualization related to social network analysis, web mining, social media mining & text mining
      • take a positive approach towards data science & computer programming, gain confidence in basic operations, and use or modify third-party (open) source code or an analytical procedure/tool
      • describe advanced data mining methods & applications for further self-education (or subsequent institutional education) or professional/academic specialization
  • 7. Syllabus course outline | topics covered
  • 8. Course Overview
      Lectures are followed by tutorials in order to put knowledge into practice. The exact dates & content of the lectures may be subject to change based on the pace & requirements of the course group.
      • Lecture #1: Introduction to Data Mining & Data Analysis | Data Science | Digital Humanities
      • Lecture #2: Big Data | Types of Data | Data Formats | Information Retrieval | Business Intelligence | Law & Ethics of Data Mining
      • Lecture #3: Introduction to Web Technologies for Non-Tech Students | Database Systems | Web Programming | Semantic Web | APIs
      • Lecture #4: Graph Theory | Social Network Analysis | Statistical Procedures, Apps & Tools
      • Lecture #5: Pseudocoding | Introduction to Programming in Python & data mining alternatives comparison | Data Exploration & Preprocessing
      • Lecture #6: Web Scraping | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
      • Lecture #7: Social Media Mining | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
      • Lecture #8: Text Mining | Natural Language Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
      • Lecture #9: Data Visualization | Data Storytelling | Electronic Publishing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
      • Lecture #10: Student Webinars Week | Introducing Various Free & Open Source Data Mining Software & Apps
      • Lecture #11: Machine Learning, Recommender Systems & Other More Advanced Topics | Large-Scale Data Sets | MapReduce, Hadoop, NoSQL
      • Lecture #12: Course Review | Semestral Projects Consultation & Adjustments | The Remaining 99% of Data Science | Data Science Buzzwords
  • 9. Types of Instruction & workload
  • 10. Types of Instruction & Workload
      the course consists of
      • lectures
      • tutorials/seminars
      • guest lectures (possibly webinars)
      • student webinars
      background, how-to, support & inspiration during lectures & tutorials/seminars and online course materials for self-directed students
      workload | 150 hours
      • lectures 16h
      • tutorials/seminars 16h
      • assignments
          • team project 70h
          • webinar 20h
      • self-study 28h
  • 11. Teaching Method & Related Information
      storytelling
      • the course topics will be tied together via obtaining real-time (& real-life) data for decision making of a fictional political party
      • teams of 2-3 students will be formed in response to the need to study a more specific area of the political campaign | teams will be differentiated based on a specific topic/area of interest rather than types of analyses
      collaboration
      • teamwork & knowledge sharing will be strongly encouraged & facilitated | collaboration has its downsides as well, but since there are too many 'individual work' courses & too few 'team work' courses, let's try to work together for a change
      BYOD (Bring Your Own Device)
      • several software packages requiring installation & personalization will be used within the course
      • BYOD is therefore recommended
      (quite =) beginner friendly
      • although the course might be challenging for students with no analytical or computing background (introductory-level courses or professional experience), most of the time you won't be required to write your own computer code 'from scratch' (that would require another course); you'll be provided with working code (explained in pseudocode) that you'll customize
      • user-level knowledge of social media is assumed
  • 12. Requirements, Examination & Assignments
      • (I.) 30% Webinar | collaborative, teams of 2-3
      • (II.) 70% Project/Research | collaborative, teams of 2-4
      * the percentage stands for the significance of the assignment regarding the final grade
  • 13. Grading
      • the grade is calculated from the WEBINAR (30%) and the PROJECT/RESEARCH defence (70%)
      • the course is graded A (>=85%), B (>=70%), C (>=60%), D (>=50%), or E (<50%)
      • A, B or C is needed to pass the course
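
The grading rule on slide 13 can be sketched in a few lines of Python. This is only an illustration of the stated weights and thresholds, not an official calculator: the function names are hypothetical, and it assumes both parts are scored on a 0-100 scale and combined as a plain weighted average, which the slides do not spell out.

```python
# Illustrative sketch of the grading rule from slide 13.
# Assumption: webinar and project are each scored 0-100 and combined
# as a weighted average (30% webinar, 70% project/research defence).

def final_percentage(webinar_pct, project_pct):
    """Combine the two assignment scores using the 30/70 weights."""
    return (30 * webinar_pct + 70 * project_pct) / 100


def letter_grade(pct):
    """Map a final percentage to the letter scale given in the syllabus."""
    if pct >= 85:
        return "A"
    if pct >= 70:
        return "B"
    if pct >= 60:
        return "C"
    if pct >= 50:
        return "D"
    return "E"


def passes(grade):
    """The syllabus states that A, B or C is needed to pass."""
    return grade in {"A", "B", "C"}


if __name__ == "__main__":
    pct = final_percentage(webinar_pct=80, project_pct=90)
    grade = letter_grade(pct)
    print(pct, grade, passes(grade))
```

Because the project carries 70% of the weight, a weak webinar can be offset by a strong project defence far more easily than the other way round.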
  • 14. (I.) Webinar | 30% | collaborative, teams of 2-3 students
      assignment
      • 1) familiarize yourself (in brief) with an assigned data mining tool or application (you might also choose your own if approved by the lecturer) and introduce it
      • 2) replicate an analysis (cite your source) using the tool and explain the procedure & background information
      • 3) prepare a short (5-15 min) live webinar for your classmates & answer their questions (questions regarding your particular analysis only)
      • 4) let them do a peer assessment of your work
      motivation
      • the volume of free & open source data science procedures, tools & applications grows rapidly, so you definitely won't 'be done' after passing this course
      • the volume of open educational resources (text, video, interactive etc.) is huge; the tools are usually well documented & include sample analyses provided by their creators or community
      • you'll learn most by a hands-on approach and you'll get feedback from your peers
      assessment
      • 20% | brief description of the tool: what it is for, how one can use it, where one can get it & learn it
      • 60% | replication of an analysis: background information, clarity of the procedure
      • 20% | question responses: only questions related to the particular analysis count (one doesn't become an expert on a tool by replicating one analysis =))
  • 15. (II.) Project/Research | 70% | collaborative, teams of 2-4 students
      assignment
      • 1) mine/scrape, analyze & visualize available structured & unstructured web & social media data related to your team's area of specialization within the fictional political party campaign planning
      • 2) prepare an executive summary in the form of a storyline highlighting the most important findings for decision making
      • 3) defend your project/research (examination)
      motivation
      • preparation for conducting commercial or academic research including web & social media data mining & related analyses
      • an opportunity to try everything out 'under supervision' & get feedback on your work
      • practicing teamwork skills, organization & division of labour within a larger work group / institution
      assessment
      • 30% | executive summary, clarity & coherence of the data story, and meeting all requirements on analyses used (see the next slide)
      • 40% | appropriateness & correctness of mining procedures & analyses used and of your data interpretation, consideration of limitations of your outcomes (critical context)
      • 30% | answers to questions regarding procedures, analyses & other 'technical' details of your project/research
  • 16. Discussed within a project defence & included in a project executive summary
      • the story of your data (for decision making within your specialization)
      • visualizations, descriptions, theoretical background, interpretations & highlights
      • social network analysis
      • web scraping
      • social media mining
      • text mining & natural language processing
      • critical review of the project & limitations of the generalizability of your research
      • analytical appendix with a hyperlink to source tables & datasets
      • 'technical' appendix: computations, programming code, requests, queries etc.
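
The component weights listed on slides 14 and 15 (20/60/20 for the webinar, 30/40/30 for the project) can be written down as a small rubric calculation. The sketch below is a hedged illustration only: the dictionary keys and the function name are made up for this example, and it assumes each component is scored 0-100 and combined as a weighted average.

```python
# Illustrative sketch of the component weights from slides 14 and 15.
# Assumption: each component is scored 0-100; key names are hypothetical.

WEBINAR_WEIGHTS = {
    "tool_description": 20,      # brief description of the tool
    "analysis_replication": 60,  # replication, background, clarity
    "question_responses": 20,    # answers about the particular analysis
}

PROJECT_WEIGHTS = {
    "executive_summary": 30,               # data story & required analyses
    "procedures_and_interpretation": 40,   # correctness & limitations
    "defence_answers": 30,                 # 'technical' questions
}


def weighted_score(weights, scores):
    """Return the weighted average (0-100) of component scores."""
    if sum(weights.values()) != 100:
        raise ValueError("component weights must sum to 100")
    return sum(weights[name] * scores[name] for name in weights) / 100


if __name__ == "__main__":
    webinar = weighted_score(
        WEBINAR_WEIGHTS,
        {"tool_description": 50, "analysis_replication": 100, "question_responses": 0},
    )
    print(webinar)
```

The replication of an analysis dominates the webinar score, so a clear, well-sourced procedure matters more than polish in the tool overview.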
  • 17. Course literature & Documentations • you are not required to read any of the following, but you might find it handy when looking for inspiration, reference, sample analyses, sample code or when some part of the course takes your interest so that you want to follow up with more in-depth self-directed study • further online/paperback study resources, tutorials, libraries, applications & tools will be introduced within specific topics of the course
  • 18. Books
      • GOLBECK, Jennifer. ANALYZING THE SOCIAL WEB. Amsterdam: Morgan Kaufmann, 2013. ISBN 01-240-5531-1.
      • TSVETOVAT, Maksim and Alexander KOUZNETSOV. SOCIAL NETWORK ANALYSIS FOR STARTUPS. O'Reilly, 2011. ISBN 978-144-9306-465.
      • HANSEN, Derek, Ben SHNEIDERMAN and Marc SMITH. ANALYZING SOCIAL MEDIA NETWORKS WITH NODEXL: INSIGHTS FROM A CONNECTED WORLD. Burlington, MA: Morgan Kaufmann, 2011. ISBN 01-238-2229-7.
      • MURRAY, Scott. INTERACTIVE DATA VISUALIZATION FOR THE WEB. Sebastopol, CA: O'Reilly Media, 2013. ISBN 14-493-6108-0.
      • STEELE, Julie and Noah ILIINSKY. BEAUTIFUL VISUALIZATION. Sebastopol, CA: O'Reilly, 2010. ISBN 14-493-7986-9.
      • FRY, Ben. VISUALIZING DATA. Sebastopol, CA: O'Reilly, 2007. ISBN 05-965-1455-7.
  • 19. Books
      • MCKINNEY, Wes. PYTHON FOR DATA ANALYSIS: DATA WRANGLING WITH PANDAS, NUMPY, AND IPYTHON. Beijing: O'Reilly Media. ISBN 978-1449319793.
      • RUSSELL, Matthew A. MINING THE SOCIAL WEB: DATA MINING FACEBOOK, TWITTER, LINKEDIN, GOOGLE+, GITHUB, AND MORE. 2nd ed. Sebastopol: O'Reilly, 2014. ISBN 978-1-449-36761-9.
      • JANERT, Philipp K. DATA ANALYSIS WITH OPEN SOURCE TOOLS. Sebastopol, CA: O'Reilly. ISBN 05-968-0235-8.
      • LUTZ, Mark. LEARNING PYTHON. 5th ed. Beijing: O'Reilly Media, 2013. ISBN 978-1449355739.
      • BIRD, Steven, Ewan KLEIN and Edward LOPER. NATURAL LANGUAGE PROCESSING WITH PYTHON. Beijing: O'Reilly, 2009. ISBN 978-0596516499.
      • PERKINS, Jacob. PYTHON TEXT PROCESSING WITH NLTK 2.0 COOKBOOK. Birmingham, UK: Packt Publishing, 2010. ISBN 978-1849513609.
  • 20. Books
      • O'NEIL, Cathy and Rachel SCHUTT. DOING DATA SCIENCE. Sebastopol, CA: O'Reilly, 2013. ISBN 14-493-5865-9.
      • RAJARAMAN, Anand and Jeffrey ULLMAN. MINING OF MASSIVE DATASETS. Cambridge: Cambridge University Press, 2012. ISBN 11-070-1535-9.
      • NORTH, Matthew. DATA MINING FOR THE MASSES. Global Text Project, 2012. ISBN 06-156-8437-8.
      • PROVOST, Foster. DATA SCIENCE FOR BUSINESS: WHAT YOU NEED TO KNOW ABOUT DATA MINING AND DATA-ANALYTIC THINKING. Sebastopol, CA: O'Reilly. ISBN 978-1-449-36132-7.
      • MINELLI, Michael, Michele CHAMBERS and Ambiga DHIRAJ. BIG DATA BIG ANALYTICS: EMERGING BUSINESS INTELLIGENCE AND ANALYTIC TRENDS FOR TODAY'S BUSINESSES. Wiley, 2013. ISBN 111814760X.
      • BOSLAUGH, Sarah. STATISTICS IN A NUTSHELL. 2nd ed. Farnham, Surrey, England: O'Reilly, 2012. ISBN 14-493-1682-4.
  • 21. Documentation
      • https://www.python.org/doc/
      • http://www.w3schools.com/
      • https://github.com/
      • http://stackexchange.com/sites#
      • http://stackoverflow.com/
      • https://developers.facebook.com/docs/
      • https://dev.twitter.com/docs
      • https://developer.linkedin.com/apis
      • http://instagram.com/developer/
      • https://developers.google.com/+/
      • https://developers.pinterest.com/
      • https://developer.foursquare.com/
      • http://flowingdata.com/
      • http://www.informationisbeautiful.net/
      • http://www.reddit.com/
      • https://www.statsoft.com/textbook
      • http://learnpythonthehardway.org/book/
      • http://www.programmableweb.com/
      • http://www.pythonapi.com/
  • 22. Self-directed learners, those who prefer distance/blended learning, those who want to know more, or those who don't want to rely on one source of information only might want to complement/substitute different parts of the course on
      • Coursera
      • MIT OpenCourseWare
      • Stanford ONLINE
      • edX
      • KhanAcademy
      • Codecademy
      • and many other 'Google it & learn it' resources, or 'YouTube it & watch it' =)
  • 23. JAKUB RŮŽIČKA | jameslittlerose@gmail.com | cz.linkedin.com/in/littlerose | summer semester 2014/2015 | SOCIAL WEB: (BIG) DATA MINING | bachelor's course | ISS FSV UK | JSB454 | course proposal [version 1.1]