Macadamia: Master’s Programme in
Machine Learning and Data Mining
May 6, 2008
Tapani Raiko, Kai Puolamäki, Juha Karhunen, Jaakko
Hollmén, Antti Honkela, Samuel Kaski, Heikki
Mannila, Erkki Oja, and Olli Simula
Teaching Machine Learning: Workshop on open
problems and new directions. Saint-Étienne, France
Macadamia = Machine learning and Data mining
Macadamia is a Master's programme in Machine learning and
Data mining at Helsinki University of Technology, Finland.
The programme is given by the Department of Information and
Computer Science, known for its pioneering research and
education in this field.
The two-year Master of Science degree obtained in this programme
enables the graduates to enter the IT industry in Finland or
worldwide. The degree also offers a seamless continuation to
doctoral studies for those interested in deeper research and
development in machine learning and data mining.
Machine learning and Data mining
• Machine learning researchers often use
probabilistic methods
• Data mining research is algorithmic (also in
combination with probabilistic methods!)
• Active research in both topics
• Interaction useful: interesting things happen
at the intersection
Department of Information and Computer
Science
T3060, http://www.ics.tkk.fi/
Finnish: Tietojenkäsittelytieteen laitos
Constituent laboratories (pre-2008):
Laboratory of Computer and Information Science
Laboratory for Theoretical Computer Science
AB HELSINKI UNIVERSITY OF TECHNOLOGY
Department of Information and Computer Science – 1/6
Resources (averages 2004–06)
Professors: 9
Other personnel: 108 py/yr
budget funding 52 py/yr
external funding 56 py/yr
Expenditures: 5.1 M€/yr
budget funding 2.7 M€/yr (incl. overhead transfers from external
funding)
external funding 2.4 M€/yr
Degrees and teaching
Numbers (averages 2004–06)
M.Sc. (Tech.): 29/yr
Dr.Sc. (Tech.): 9/yr
ocr: 6800/yr
Four majors: computer and information science, theoretical computer
science, computational and cognitive biosciences, language
technology
Three international Master’s Programmes: Bioinformatics (MBI),
Foundations of Advanced Computing (FAdCo), Machine Learning
and Data Mining (Macadamia)
Graduate schools (positions in 2006)
Helsinki GS in Computer Science and Engineering (8)
GS in Comput. Biology, Bioinformatics, and Biometry (1)
GS of Language Technology in Finland (1)
GS in Comput. Methods of Information Technology (3)
Research areas
Algorithms and methods for adaptive informatics
Multimodal interfaces
Bioinformatics and neuroinformatics
Computational cognitive systems
Adaptive informatics applications
Computational logic
Combinatorial algorithms and computational complexity
Cryptographic techniques and secure protocols
Computer-aided software quality control (verification)
People
Teaching and supervision for Macadamia students is given by an enthusiastic and
experienced group headed by world leaders in this research field. They belong to
two national Centres of Excellence, the Adaptive Informatics Research Centre and
the From Data to Knowledge Research Centre. The host laboratory is a partner in
several Finnish graduate schools.
The professors responsible for Macadamia are:
…s given in the programme. The sizes of the courses are given in credit points (ECTS).
…programme is 120 ECTS. Note that the Special Courses have a varying topic (5–6 topics per …
…included in the curriculum.
Obligatory courses ECTS
IT-Services at TKK 2
English language tests / course 3
Machine Learning: Basic Principles 5
Machine Learning and Neural Networks 5
Machine Learning: Advanced Probabilistic Methods 5
Algorithmic methods of data mining 5
Information Visualization 5
Research Project in Computer and Information Science 5–10
Master’s thesis 30
Relevant courses ECTS
Computer Vision 5
Statistical Natural Language Processing 5
High-Throughput Bioinformatics 5
Signal Processing in Neuroinformatics 5
Image Analysis in Neuroinformatics 5
Special Course in Computer and Information Science I–VI 3–7
Introduction to Bayesian Modelling 5
Combinatorial Models and Stochastic Algorithms 6
Search problems and algorithms 4
Parallel and distributed systems 4
Cryptography and data security 4
Computational Complexity Theory 5
Finnish 1A 2
Finnish 1B 2
Finnish 2A 2
Finnish 2B 2
Topics of Special Courses during 2006–2008 ECTS
Gaussian Processes for Machine Learning 6
Popular Algorithms in Data Mining and Machine Learning 5
Reinforcement Learning — Theory and Applications 6
Multimedia Retrieval 5
Introductory Elements of Functional Data Analysis 7
Independent Component Analysis 6
Information Networks 6
Variable Selection for Regression 6
Nonlinear Dimensionality Reduction 6
Modeling and Simulating Social Web 4
Decision support with data analysis 5
Data analysis and environmental informatics 5
Machine Learning Course Reform
Old course (before Autumn 2007)               New course
T-61.3030 Principles of Neural Computing      T-61.3050 Machine Learning: Basic Principles
T-61.5030 Advanced Course in Neural Computing T-61.5130 Machine Learning and Neural Networks
T-61.5040 Learning Models and Methods         T-61.5140 Machine Learning: Advanced Probabilistic Methods
Table: Correspondences in degree requirements.
T-61.3050 Machine Learning: Basic Principles
T-61.5040 Learning Models and Methods
T-61.5140 Machine Learning: Advanced Probabilistic Methods
T-61.3030 Principles of Neural Computing
T-61.5130 Machine Learning and Neural Networks
T-61.5030 Advanced Course in Neural Computing
Table: Approximate topical correspondences.
See http://www.cis.hut.fi/Opinnot/T-61.3050/oldcourses
• Three courses were completely reformed last autumn, increasing
the weight of machine learning at the cost of neural computing
• All of these courses are lectured every year
T-61.3050 Machine Learning: Basic Principles
Introduction
Kai Puolamäki
Laboratory of Computer and Information Science (CIS)
Department of Computer Science and Engineering
Helsinki University of Technology (TKK)
Autumn 2007
General Information
Course Bureaucracy
Relation to Old Courses
Chapter 1: Introduction
Contents of the Course
How to Pass the Course
You will get 5 cr for passing this course.
Requirements for passing the course:
Pass the exercise work. The exercise work should be submitted
by 2 January 2008. More instructions will appear in a few
weeks’ time.
Pass the examination. You can participate in the examination
after passing the exercise work (exception: you can participate
in the December examination before passing the exercise work;
you’ll then pass the course once you pass the exercise work).
Optional, but useful:
Lectures.
Problem sessions.
Reading the book and other material.
Literature
The course follows a subset of the book: Alpaydin, 2004.
Introduction to Machine Learning. The MIT Press.
In addition, there will be a PDF chapter on algorithmics
(complexity of problems, local minima etc.) to be distributed
from the course web site.
The lecture slides are available for download from the course
web site. I have also given Edita permission to print them
on request.
You might also find the material at Alpaydin’s web site,
especially the errata and slides, useful (see the link at the
course web site).
Very Preliminary Plan of the Topics
Supervised learning, Bayesian decision theory, probability
distributions and parametric methods, multivariate methods,
clustering (mostly Alpaydin’s chapters 1–7 and appendix A)
Algorithmic issues in machine learning, such as hardness of
problems, approximation techniques and their features (such
as local minima), time and memory complexity in data
analysis (separate PDF chapter to be distributed from the
course web site)
Nonparametric methods (Alpaydin 8.1–8.2), linear
discrimination (Alpaydin 10.1–10.8), assessing and comparing
classification algorithms (Alpaydin’s chapter 14)
I’ll try to keep Alpaydin’s ordering of topics and
emphasize principles rather than go through all possible
algorithms and methods.
Prof. J. Karhunen T-61.5130 Machine Learning and Neural Networks
T-61.5130 Machine Learning and
Neural Networks (5 cr)
General information on the course
Autumn 2007
Prof. Juha Karhunen
http://www.cis.hut.fi/Opinnot/T-61.5130/
Helsinki University of Technology, Espoo, Finland 1
Course materials
• All the course materials will be in English.
• There is no satisfactory single book suitable for this course.
• However, a large portion of the course is based on the book:
• F. Ham and I. Kostanic, Principles of Neurocomputing for Science
and Engineering, McGraw-Hill 2001.
• This book will be complemented by some material from the book S.
Haykin, “Neural Networks: A Comprehensive Foundation”, 2nd ed.,
Prentice-Hall, 1998.
• That previously used book is too extensive for our course.
• Furthermore, independent component analysis is covered from a
separate review article.
• Ham and Kostanic’s book is quite expensive (some 160 USD).
• And we shall cover only parts of Chapters 1-5 from it.
lecture(s), too.
Planned contents of the course
The following matters will be discussed in this course according to
current plans:
• Introduction to neural networks.
• Models and learning algorithms for a single neuron.
• Data preprocessing, Hebbian learning, and principal component
analysis.
• Multilayer perceptron networks and their learning algorithms.
• Model assessment and selection: generalization, validation, and
regularization.
• Radial-basis function networks.
• Support vector machines.
• Independent component analysis.
• Self-organizing maps and learning vector quantization.
• Processing of temporal information using feedforward and recurrent
networks.
T-61.5140 Machine Learning:
Advanced Probabilistic Methods
Jaakko Hollmén
Department of Information and Computer Science
Helsinki University of Technology, Finland
e-mail: Jaakko.Hollmen@tkk.fi
Web: http://www.cis.hut.fi/Opinnot/T-61.5140/
January 17, 2008
Course Material
Lecture slides and lectures
Lecture notes (aid the presentation on the lectures)
Lecture notes (contain extra material)
Course book
Christopher M. Bishop: Pattern Recognition and
Machine Learning, Springer, 2006
Chapters 8, 9, 10, 11, and 13 are covered during the course
Problem sessions
Problems and solutions
Demonstrations
Passing the Course (5 ECTS credit points)
Attend the lectures and the exercise sessions for best
learning experience :-)
Browse the material before attending the lectures and
complete the exercises
Complete the term project, which requires solving a
machine learning problem by programming
Pass the examination, next exam scheduled:
Thursday, 15th of May, morning
Requirements: a passed exam and an acceptable term
project; bonus (+1) for active participation and an excellent
term project
Note: Jaakko Hollmén will give a presentation
on the term project tomorrow
Topics covered on the course
Central topics
Random variables
Independence and conditional independence
Bayes’s rule
Naive Bayes classifier, finite mixture models,
k-means clustering
Expectation Maximization algorithm for inference
and learning
Computational algorithms for exact inference
Computational algorithms for approximate inference
Sampling techniques
Bayesian modeling
CLUSTER Dual Degrees
• Macadamia has agreements for a dual
degree currently with three other Master’s
programmes in the CLUSTER network
• The students will spend 1 year in both
• Universitat Politècnica de Catalunya (UPC)
• Universidade Técnica de Lisboa, Instituto
Superior Técnico (IST)
Feedback from the First Students
Credit points: ?, 20, 25, 27 out of 30
“All goes well, the courses are all very
interesting”
“in general, everything is OK”
“interest to work at the lab”
“the layer between theory and running matlab
toolboxes is missing” (Nikolaj’s course!)
“some courses have more maths that I can
handle at the moment, but this isn’t a bad thing”
“some courses had overlapping schedules”
More Information
http://www.cis.hut.fi/macadamia/
Coordinator Tapani Raiko
See you in Helsinki!
• Mining and Learning with Graphs (MLG) workshop,
July 4-5, 2008
• International Conference on Machine Learning
(ICML), July 5-9, 2008
• Uncertainty in Artificial Intelligence (UAI),
July 9-12, 2008