Statistical Methods
2nd year - First semester
Attendance: not mandatory
- 9 CFU
- 72 hours
- English
- Trieste
- Optional
- Oral Exam
- SSD SECS-S/01
Structured into the following modules:
- Mod. A: Statistical Inference and Modelling
- Mod. B: Advanced Data Analytics
(Mod. A: Statistical Inference and Modelling) The course covers the fundamental elements of statistical inference, together with some principles and statistical techniques useful for data analysis.
Knowledge and understanding: students will be able to use appropriate statistical models and to select among alternative models using the relevant inferential approach.
Applied knowledge and understanding: students will be able to use R to analyze datasets, to draw coherent inferences about the model that could have generated the data, and to use that model for decisions and predictions.
Making judgements: students must show that they can choose the most suitable analysis strategy, including when analyzing a real dataset.
Communication skills: students will be able to communicate the results of a data analysis effectively using appropriate tools.
Learning skills: by the end of the course, students will be able to independently consult theoretical or applied scientific papers that use basic and advanced statistical techniques.
(Mod. B: Advanced Data Analytics) The course covers some advanced statistical models suited to the analysis of complex data and highlights some connections with statistical and machine learning techniques.
Knowledge and understanding: students will be able to use some complex statistical models and to select among alternative models using the relevant inferential approach.
Applied knowledge and understanding: students will be able to use R to analyze (possibly big) datasets, to draw coherent inferences about the model that could have generated the data, and to use that model for decisions and predictions.
Making judgements: students must show that they can choose the most suitable analysis strategy, including when analyzing a (possibly big and complex) real dataset.
Communication skills: students will be able to communicate the results of a data analysis effectively using appropriate tools.
Learning skills: by the end of the course, students will be able to independently consult theoretical or applied scientific papers that use advanced statistical models.
(Mod. A: Statistical Inference and Modelling) Students should have a basic knowledge of probability theory and elementary statistical methods. Some basic knowledge of the R software is also required.
(Mod. B: Advanced Data Analytics) Students should have a basic knowledge of probability theory and of basic statistical models such as linear and generalized linear models. Some knowledge of the R software is also required.
(Mod. A: Statistical Inference and Modelling)
1. Random variables: review of some basic concepts of probability; the multivariate normal distribution; the central limit theorem and the law of large numbers; statistics and their properties. Statistical models and inference: examples of statistical models; the problems of statistical inference; basic tools for estimation and for testing statistical hypotheses; approaches to statistical inference and design issues (16 hours).
2. Theory of maximum likelihood estimation: the likelihood function; maximum likelihood estimation; large-sample properties; AIC and model selection; numerical aspects (12 hours).
3. Elements of Bayesian inference (12 hours).
4. Linear and generalized linear models: the theory of linear models; regression diagnostics and model selection; generalized linear models; prediction; cross-validation (16 hours).
5. Computer labs: application of the methods using the R software (12 hours).
(Mod. B: Advanced Data Analytics)
1. Some extensions of linear and generalized linear models. Nonlinear models and semiparametric regression: smoothing and regression splines, generalized additive models (GAMs), decision trees, multivariate adaptive regression splines (MARS).
2. Connections with statistical learning: ensemble methods and remedies for data imbalance in classification problems.
3. Multilevel and hierarchical models.
4. Computer labs: application of the methods using the R software.
(Mod. A: Statistical Inference and Modelling)
- Agresti A., Kateri M.: Foundations of Statistics for Data Scientists: With R and Python, Chapman & Hall, 2021 (Main text)
- Wood S.N.: Core Statistics, Cambridge University Press, 2016 (Supplementary text)
- Maindonald J., Braun W.J.: Data Analysis and Graphics Using R: An Example-Based Approach (Third Edition), Cambridge University Press, 2010 (Supplementary text)
Additional material and information will be available on the course web page.
(Mod. B: Advanced Data Analytics)
- James G., Witten D., Hastie T., Tibshirani R.: An Introduction to Statistical Learning (Second Edition), Springer, 2021 (Main text). Freely downloadable from https://www.statlearning.com/
- Hastie T., Tibshirani R., Friedman J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition), Springer, 2009 (Reference text). Freely downloadable from https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12.pdf
- Efron B., Hastie T.: Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, Cambridge University Press, 2016 (Supplementary text)
Additional material and information will be available on the course web page.
(Mod. A: Statistical Inference and Modelling) The course will be delivered through traditional lectures and practical computer sessions. Students will be encouraged to take part in discussions on selected topics during the lectures. In the practical sessions, the R software will be used to illustrate some of the main ideas and techniques through the analysis of real datasets.
(Mod. B: Advanced Data Analytics) The course will be delivered through traditional lectures and practical computer sessions. Students will be encouraged to take part in discussions on selected topics during the lectures. In the practical sessions, the R software will be used to illustrate some of the main ideas and techniques through the analysis of real datasets.
The course will make use of teaching tools available on the moodle2, MS Teams and Wooclap platforms. In addition, all students are expected to use the R software, so they need to own or have access to a computer.
(Mod. A: Statistical Inference and Modelling) Assessment takes place at different times and in several ways.
- For attending students:
  1. Homework may be assigned during the course, to be submitted by the set deadlines;
  2. Some (2 or 3) intermediate tests will be held during the course;
  3. Each student must submit a report presenting the results of a project assigned at the end of the course.
  The final mark is the average of the marks obtained in the three parts, weighted 0.1, 0.5 and 0.4 respectively. Together, the three parts make it possible to assess whether the learning objectives set out above have been achieved.
- For non-attending students: students take an oral exam in which they are also asked to carry out analyses using R.
(Mod. B: Advanced Data Analytics) Assessment takes place at different times and in several ways.
- For attending students:
  1. Homework may be assigned during the course, to be submitted by the set deadlines;
  2. An intermediate test will be held during the course;
  3. Each student must submit a report presenting the results of a project assigned at the end of the course.
  The final mark is the average of the marks obtained in the three parts, weighted 0.1, 0.5 and 0.4 respectively. Together, the three parts make it possible to assess whether the learning objectives set out above have been achieved.
- For non-attending students: students take an oral exam in which they are also asked to carry out some analyses using R.
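As an illustration of the weighted average described above, the following minimal sketch computes a final mark from the three component marks. It assumes that all marks are on the same scale (for example, out of 30) and that the weights 0.1, 0.5 and 0.4 apply to homework, intermediate test(s) and project report in the order the parts are listed. How several intermediate tests are combined into one mark, and any rounding rule, are not specified by the course description. The helper name `final_mark` is hypothetical.

```python
# Minimal sketch of the weighted final mark for attending students.
# Assumptions (not stated in the course description): all component marks are on
# the same scale (e.g. out of 30), and the weights follow the order in which the
# three parts are listed: homework, intermediate test(s), project report.

WEIGHTS = {"homework": 0.1, "intermediate_tests": 0.5, "project": 0.4}


def final_mark(homework: float, intermediate_tests: float, project: float) -> float:
    """Return the weighted average of the three component marks (hypothetical helper)."""
    marks = {
        "homework": homework,
        "intermediate_tests": intermediate_tests,
        "project": project,
    }
    # The weights sum to 1, so the weighted sum is already a weighted average.
    return sum(WEIGHTS[part] * mark for part, mark in marks.items())


if __name__ == "__main__":
    # Example: 28 for homework, 26 for the intermediate tests, 30 for the project.
    print(final_mark(28, 26, 30))
```

With marks of 28, 26 and 30, the weighted average is 0.1 × 28 + 0.5 × 26 + 0.4 × 30 = 27.8, before any rounding the examiners may apply.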