Advanced Statistical Methods
2nd year of course - Second semester
Attendance: not mandatory
- 6 CFU
- 48 hours
- English
- Trieste
- Optional
- Standard teaching
- Oral Exam
- SSD SECS-S/01
- Knowledge and understanding: students will have to show that they have understood the essential ideas that motivate the use of advanced statistical techniques, as well as the factors that limit their use.
- Applied knowledge and understanding: the student will have to show that they can apply the techniques learned to the analysis of real data, including through appropriate software tools. In particular, the student is required to have a very good command of the Stan software and of the 'rstan' library available in R.
- Making judgements: the student must be able to find their way in the analysis of real data, always paying primary and careful attention to the sampling scheme of the data, to their possible hierarchical/multilevel structure, and to their granularity.
- Communication skills: the student will be able to communicate the results of data analysis effectively using appropriate tools (including modern techniques for producing dynamic documents, such as RMarkdown). The student is also required to make a visualization effort, producing graphics that summarize complex trends (for example, using R libraries such as 'ggplot2').
- Ability to learn: at the end of the course the student will be able to consult theoretical and applied scientific works that use advanced statistical techniques, critically analyze the application of the models and algorithms explained in class, and illustrate case studies through the use of probabilistic scientific programming.
- Basic knowledge of statistics (equivalent to two courses in a three-year bachelor's degree, or in any case to having passed the Statistical Methods for Data Science exam).
- Ability to use and program in R.
1. Hierarchical/multilevel statistical models, using the Stan software (through the R library 'rstan').
a) Definition of multilevel data and general structure.
b) Linear and generalized linear multilevel models with varying slopes and intercepts, estimated with frequentist and Bayesian methods.
c) Extensions of canonical models: models for grouped data; non-nested models; models for repeated measurements.
d) Brief overview of causal inference in a multilevel setting.
2. Semi-parametric/non-parametric regression:
a) Introduction to local regression methods
b) Spline functions
c) Penalized likelihood: classical estimation and Bayesian estimation
d) Splines and hierarchical models.
3. Mixed-membership models, with a case study on text data
a) Understanding multiple membership data structures
b) Definition of a general mixed membership framework and other interpretations (mixture model for grouped data, generative model)
c) Examples in different types of analysis (text, social network, survey, population genetics, ecology, marketing and clustering analyses)
d) Latent Dirichlet Allocation model and extensions for topic modelling in text data analysis
e) Bayesian and variational inference
f) Lab with R packages (lda, stm, ldatuning, LDAvis, rlda)
- Gelman, A., and Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.
- Gelman, A., Hill, J., and Vehtari, A. Regression and Other Stories. Cambridge University Press, 2020.
- Wood, S. N. Generalized Additive Models: An Introduction with R. CRC Press, 2017.
- Ruppert, D., Wand, M. P., and Carroll, R. J. Semiparametric Regression. Cambridge University Press, 2010.
- Blei, D. M., et al. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
- Griffiths, T. L., et al. Finding scientific topics. Proceedings of the National Academy of Sciences, 2004.
- Airoldi, E. M., Blei, D., Erosheva, E. A., and Fienberg, S. E. (eds.). Handbook of Mixed Membership Models and Their Applications, 2015.
- Traditional lectures and guided exercises using some of the R libraries mentioned above ('rstan', 'ggplot2').
- Possibly group work, with a joint discussion at the end of the exercises.
The course will make use of teaching tools available on the Moodle2, MS Teams and Wooclap platforms. In addition, all students are expected to use the R software, so they must own or have access to a computer.
Oral exam on the course contents, in which the student presents a project agreed upon with the teachers. The student must send the project presentation two to three days before the exam date. The student will also be asked some questions on the theoretical part of the course.
The maximum grade is 30. Exam evaluation criteria:
- clarity and completeness of the exposition of the project;
- degree of understanding of the theoretical and practical aspects of the subject, as emerging from the project and the oral exam;
- clarity, brevity, and precision of the exposition in the oral exam.
This teaching explores topics related to one or more objectives of the 2030 Agenda for Sustainable Development of the United Nations.