The document outlines various thesis topics proposed by Marco Brambilla in areas such as explainable AI, data science, gamification, and using ChatGPT for software engineering tasks. Specific proposals include using gamification for data collection and evaluation of AI explanations, applying data science to societal challenges, and exploring the use of ChatGPT for generating UML and BPMN models.
Thesis Topics and Proposals @ Polimi Data Science Lab - 2023 - prof. Brambilla Marco
1. SCUOLA DI INGEGNERIA INDUSTRIALE E DELL’INFORMAZIONE
Computer Science and Engineering Programme
THESIS TOPICS AND PROPOSALS
Marco Brambilla
marco.brambilla@polimi.it
2. Generalities
1. General topics and macro-areas
2. Any topic is open for work at the level of thesis or tesina. A tentative type is agreed upon at the beginning; the actual type is assessed towards the end
3. Suggested timing for thesis: last semester of studies, possibly in parallel with (very few) exams
4. Duration: in the range of 6 months, depending heavily on effort and results
4. Explainable AI: context
The final aim of the Explainable Artificial Intelligence (XAI) research field can be summarised as
“Developing inherently explainable systems and explainability techniques that faithfully explicit the behaviour of complex machine learning models tailoring their explanation in an understandable way for humans.”
5. Global Explanations of Image Classification Tasks by Means of Local Explanations
Thesis by Antonio De Santis and Matteo Bianchi
6. Gamified Data Collection for NLP Explainability Tasks
Development of a gamified platform to collect structured human knowledge for multiple, different Natural Language Processing tasks.
[Diagram: NLP Task Selection; Gamified Activity Task #1 … Gamified Activity Task #N; Data Structuring; Data Storing]
7. Global Explanations of NLP Tasks by Means of Local Explanations
Development of a pipeline to collect human knowledge to explain the behaviour of a neural network performing NLP tasks (e.g., sentiment analysis).
[Diagram: neural network (Input Layer, Layer 1, …, Layer N, Output Layer); Gamified Explainability Task; Human Interpreter; Layer-wise Explanation; Global Explanation]
8. Gamified Approaches to Evaluate the Understandability of Explanations
Development of gamified approaches to evaluate, through crowdsourcing, the understandability of the model explanations produced by explainability approaches.
[Diagram: Explanations from Explainability Algorithms; Gamified Understandability Task; Human Interpreter; Understandability Comparison]
14. News and News Sharing
► Understanding how and when people share pieces of news on social networks
► Profiling users against possible risks (fake news, superficial behaviour)
18. Approach
City-scale: mobile telephone and (coarse-grained geo-located) social media data
Street/square: people counting & profiling IoT sensors
Point of Interest: people counting sensor, WiFi log analysis, beacons and (fine-grained geo-located) social media
Descriptive, predictive, privacy-preserving and, when needed, real-time analysis of a variety of (fused) data sources
19. Dashboards…
People counting and profiling via Mobile Data
[Dashboard figure: 24.512 people present; splits for tourists/citizens, female/male and private/business (41%/59%, 71%/29% and 63%/37% appear on the chart); age axis from 10 to 70; alert: More people than usual]
25. 7 Areas
1. Città murata (walled city)
2. Lake, Viale Geno shore
3. Lake
4. Lake, Villa Olmo shore
5. Industrial zone
6. Brunate
7. Business and university
Phone data
28. Where do the foreign visitors come from?
Provenance
• Insight: Switzerland, Germany, Netherlands, Spain and United Kingdom are the top origins
• Insight: Outside Europe, there is relevant evidence from Brazil, Japan, the Republic of Korea and the US
29. Which locations do people visit from where?
Statistics about nationality
31. User Cluster Analysis
• Apply clustering algorithms over Topic Probabilities Matrix to cluster users
• Multiple data slices
• Multiple algorithms
o K-means
o Hierarchical
o DBSCAN
[Figure: user clusters labelled Topic 1, Topic 2, Topic 3]
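As a rough illustration of clustering users over a topic probabilities matrix, the sketch below runs a plain K-means (Lloyd's algorithm) in pure Python. The user ids, the probability values, the choice of k = 3 and the farthest-point seeding are all assumptions made for the example; the slides do not say how the matrix is built or how the algorithms are configured, and hierarchical clustering and DBSCAN are not shown.

```python
import math

# Hypothetical topic probabilities matrix: one row per user, one column
# per topic (Topic 1, Topic 2, Topic 3). Values are invented.
TOPIC_PROBS = {
    "u1": [0.8, 0.1, 0.1],
    "u2": [0.7, 0.2, 0.1],
    "u3": [0.1, 0.8, 0.1],
    "u4": [0.2, 0.7, 0.1],
    "u5": [0.1, 0.1, 0.8],
    "u6": [0.1, 0.2, 0.7],
}


def farthest_point_init(rows, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first row, then repeatedly add the row farthest from the chosen seeds."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iters=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


user_ids = list(TOPIC_PROBS)
labels, centroids = kmeans(list(TOPIC_PROBS.values()), k=3)

# Group user ids by cluster label.
clusters = {}
for uid, lab in zip(user_ids, labels):
    clusters.setdefault(lab, []).append(uid)
print(clusters)
```

On this toy matrix each cluster groups the users whose dominant topic is the same. Comparing several algorithms and data slices, as the slide suggests, would mean running the same matrix through each method and comparing the resulting groupings.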
34. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 874724
Studying the exposome for a healthier future for all children
● Exposome is defined as all factors that can affect the quality of human health
● The Equal-Life project comprises 22 partners from 11 European countries.
● Specialists in children's health from different fields (scientific, educational, social) were involved from outside the project
● One of the ways in which they have been asked to participate is through targeted interviews.
● The problem of this new field (the exposome) lies mainly in the definition of the terms used to describe the factors, which are often inconsistent across disciplines and do not allow an objective classification of the answers.
● The purpose of this thesis is to find a method that is able to:
o classify the definitions provided by the domain experts
o expand the definitions where possible
o extract the categories obtained from the answers in the interviews.
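To make the classification goal concrete, here is a minimal sketch that assigns a free-text definition to the category whose description shares the most words with it (Jaccard similarity over word sets). This is only an illustrative baseline, not the method of the thesis: the category names, their descriptions and the example definition are invented, and a real approach would need to handle the terminological inconsistencies mentioned above.

```python
import re

# Hypothetical category descriptions, invented for illustration only.
CATEGORIES = {
    "physical": "noise air pollution green space housing built environment",
    "social": "family school friends neighbourhood social support",
}


def tokens(text):
    """Lowercase word set of a free-text definition."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(definition, categories):
    """Return (best_category, score) for one expert definition."""
    words = tokens(definition)
    scores = {name: jaccard(words, tokens(desc)) for name, desc in categories.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


label, score = classify(
    "exposure to traffic noise and air pollution at home", CATEGORIES
)
print(label, round(score, 3))
```

A low best score could be used as a signal that a definition fits none of the existing categories and may point to a new one, which relates to the category-extraction goal in the list above.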
www.Equal-Life.eu https://www.humanexposome.eu/
40. Exploration of ChatGPT use for software design
► Analyze the use, configuration, and features of ChatGPT
► Understand its strengths and weaknesses when generating answers to software questions
► Implement and analyze ChatGPT as a generator of UML / IFML or BPMN models
41. References
► https://marco-brambilla.com/blog/
► Big data and data science
► https://marco-brambilla.com/2022/11/04/exploring-the-bi-verse-a-trip-across-the-digital-and-physical-ecospheres/
► Explainability
► https://marco-brambilla.com/2022/07/11/the-role-of-human-knowledge-in-explainable-ai/
► https://marco-brambilla.com/2022/06/01/exp-crowd-gamified-crowdsourcing-for-ai-explainability/