Unsupervised Learning
2nd year of course - First semester
Attendance: not mandatory
- 6 CFU
- 48 hours
- English
- Trieste
- Optional
- Standard teaching
- Oral Exam
- SSD FIS/07
- Advanced concepts and skills
This course provides a solid foundation for understanding Unsupervised
Machine Learning techniques. The aim is to learn from the structure of
the data itself rather than from attached labels.
Knowledge and understanding: You will gain mastery of the most
common unsupervised machine learning algorithms and learn when they
can be appropriately employed.
Applying knowledge and understanding: Given a data analysis problem,
you will be able to propose exploratory methods and further in-depth
analyses that exploit the structure of the data in order to obtain valuable
knowledge.
Communication skills: You will be able to present the results of your
analysis together with an explanation of their practical meaning.
Learning skills: You will be able to understand the main kinds of unsupervised
data analysis and navigate the existing literature to complement and
improve your data analysis methods in order to keep up with this fast-changing
field.
Basic knowledge of Python and scientific Python. Basic knowledge of
matrix operations and calculus.
1. Basic notions about Unsupervised Machine Learning.
2. Dimensionality Reduction methods: General theory, classical methods
and advanced techniques.
3. Intrinsic dimension estimation methods.
4. Density Estimation methods: Histograms, kernel density estimation
and k-NN. Advanced methods.
5. Clustering: General theory. Classification of clustering methods.
Classical algorithms. Overview of recent developments and new
methods. Clustering validation.
Class notes. Research papers of interest will be provided.
1. General introduction to Unsupervised Machine Learning. Connection between supervised and unsupervised learning. Geometry of the data; definition of a manifold.
2. Dimensionality Reduction methods: Review of PCA, Multidimensional Scaling, ISOMAP, kernel-PCA, Autoencoders, t-SNE/UMAP.
3. Intrinsic dimension estimation methods: Local and global ID. Fractal dimension, DANCo, TwoNN.
4. Density Estimation: Review of classical density estimation methods (histograms), the curse of dimensionality. Kernel density estimation, k-nearest neighbors, advanced methods.
5. Clustering: General considerations about clustering. Feature selection. Similarities and distances. Classical algorithms: k-means and k-medoids. Fuzzy c-means. Hierarchical methods. Overview of modern clustering algorithms: kernel k-means, Spectral Clustering, Affinity Propagation, Expectation-Maximization clustering. Density-based methods (DBSCAN, Density Peaks, Mean-Shift). Clustering validation.
Frontal lectures and hands-on sessions, both individual and in groups.
The balance will be roughly 60% frontal lectures and 40% hands-on
sessions. Ideally, each lecture will have a part of frontal teaching and a
part of hands-on training. The latter may range from getting used to new
libraries and tools to analyzing complex datasets in groups.
Bring your own laptop.
The exam will have two parts:
1. Each student will choose one paper from a selection offered by the teacher (students may propose alternative papers, subject to the teacher’s approval) and present a contextual comparison between the method presented in the paper and those seen during the lectures. In general, a formal presentation is not needed; a programming notebook with some graphs is enough.
2. An interview in which a few questions will be asked to assess the student’s preparation on the topics of the course.
The final mark is the sum of the scores for the first part (max 12) and the second part (max 18), for a maximum of 30. In the oral part, 3 questions will be asked: one simple (8 points), one of average difficulty (6 points) and one complex (4 points). If a question is not answered correctly, the next one will remain at the same level of difficulty as the failed one, but with the points decreasing accordingly (8-6-4). Honors (laude) may be awarded for an exceptional exam.
The course introduces the student to modern techniques in machine learning and knowledge representation. It is widely held that data science and artificial intelligence are among the backbones of sustainable development, and all techniques learned in this course can be applied in this respect.