Mod. B: Advanced Data Analytics
2nd year of course - First semester
Attendance: not mandatory
- 3 CFU
- 24 hours
- English
- Trieste
- Optional
- Oral Exam
- SSD SECS-S/01
The course focuses on some advanced statistical models suited for the analysis of complex data and underlines some connections with statistical and machine learning techniques.
Knowledge and understanding: The student will be able to use some complex statistical models and to select among alternative models using the relevant inferential approach.
Applied knowledge and understanding: The student will be able to use R to analyze (possibly big) datasets, to draw coherent inference on the model that could have generated the data, and to use that model for decisions and predictions.
Making judgements: Students must show that they know how to choose the most suitable analysis strategy, including when analyzing a (possibly big and complex) real dataset.
Communication skills: Students will be able to communicate the results of data analysis effectively by using appropriate tools.
Learning skills: At the end of the course, students will be able to independently consult scientific papers, theoretical or applied, that involve the use of advanced statistical models.
Students should have a basic knowledge of probability theory and of basic statistical models such as linear and generalized linear models. Some knowledge of the R software is also required.
1. Some extensions of linear and generalized linear models. Nonlinear models and semiparametric regression: smoothing and regression splines, Generalized Additive Models, decision trees, multivariate adaptive regression splines (MARS).
2. Connections with statistical learning: ensemble methods and remedies for data imbalance in classification problems.
3. Multilevel and Hierarchical models.
4. Computer labs: application of the methods using the R software.
- James G., Witten D., Hastie T., Tibshirani R. - An Introduction to Statistical Learning, Second Edition. Springer, 2021. (main text)
Freely downloadable from https://www.statlearning.com/
- Hastie T., Tibshirani R., Friedman J. - The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, 2009. (for reference)
Freely downloadable from https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12.pdf
- Efron B., Hastie T. - Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press, 2016. (supplementary text)
Additional material and information will be available on the course web page.
The course will be delivered through traditional lectures and practical computer sessions.
Students will be encouraged to take part in discussions on selected topics during the lectures.
In the practical sessions, the R software will be used to illustrate some of the main ideas and techniques by analysing real datasets.
The course will make use of teaching tools available on the Moodle2, MS Teams and Wooclap platforms. In addition, all students are expected to use the R software, so they need to own or have access to a computer.
The evaluation takes place at different times and in several ways:
- For attending students:
1. During the course, homework may be assigned, to be submitted by the set deadlines;
2. An intermediate test will be held during the course;
3. The student must submit a report presenting the results of a project assigned at the end of the course.
The final evaluation will take place by averaging the marks obtained in the 3 parts (with weights respectively equal to 0.1, 0.5, 0.4).
Together, the three parts of the exam make it possible to assess whether the learning objectives set out above have been achieved.
- For non-attending students:
Students will take an oral exam in which they will also be asked to carry out some analyses using R.
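For attending students, the weighted average described above can be written out as follows. This is only an illustrative sketch: the symbols H (homework), T (intermediate test), P (project report) and V (final mark) are introduced here and are not part of the official course description. The marks in the worked example are hypothetical, and the marking scale is not stated on this page.

```latex
% Final mark as a weighted average of the three parts
% (weights 0.1, 0.5, 0.4 as stated above; they sum to 1)
\[
  V = 0.1\,H + 0.5\,T + 0.4\,P
\]

% Hypothetical example with H = 28, T = 26, P = 30
\[
  V = 0.1 \cdot 28 + 0.5 \cdot 26 + 0.4 \cdot 30 = 2.8 + 13 + 12 = 27.8
\]
```

With these weights the intermediate test and the project together account for 90% of the final mark, so the homework has only a small effect on the result.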
This course covers topics related to one or more goals of the United Nations 2030 Agenda for Sustainable Development.