SCIENTIFIC RESEARCH METHODOLOGY
First semester
Attendance Mandatory
- 4 CFU
- 48 hours
- Italian
- Trieste
- Compulsory
- Oral Exam
- SSD INF/01, MED/01
Structured into the following modules:
The main objective of this course is to understand the basic concepts of descriptive and inferential statistics. By the end of the course, students are able to collect and interpret data and to make simple inferences from a sample to a population. A basic introduction to the free statistical software R is also provided.
Furthermore, elementary computer skills will be provided, with particular attention to data management, elementary programming notions and an introduction to the basic ideas of artificial intelligence.
Knowledge and understanding: elements of descriptive statistics, applications of probability, inferential statistics, and the tools necessary for a medical doctor. Understand and manage data storage and retrieval issues. Understand the promises, challenges and limitations of new artificial intelligence techniques.
Ability to apply knowledge and understanding: being able to read and apply elements of statistics to experiments and research in the field of medicine. Ability to independently store and manipulate data collections to extract relevant scientific or clinical information.
Making judgements: being able to critically evaluate the results of experiments and scientific articles that involve data collection and analysis.
Communication skills: being able to express oneself appropriately on the basic topics of statistics in biomedical and health applications, in particular in view of the degree thesis project. Ability to transfer and acquire data in the correct exchange formats.
Learning skills: being able to grasp the salient elements of new topics, in particular on research methodology and data processing in the biomedical field.
Basic mathematics and statistics.
Review of descriptive statistics with examples and applications.
Review of probability theory and its use in the biomedical field.
The Relative Risk and the Odds Ratio. Examples.
The Gaussian distribution and other main probability distributions, and their properties.
STATISTICAL INFERENCE
1. Confidence intervals for means and proportions.
2. Significance tests.
3. Inferences on means.
4. Inferences on means and proportions: comparison between two populations.
For the computer science part: Introduction to the Basic Principles of Medical Informatics. Organization of a computer. The concept of Algorithm. Elements of Artificial Intelligence and Machine Learning for Medicine. Learning vs Fitting. Supervised and Unsupervised Learning. What is Training. Neural Networks and Deep Learning. The concept of generalization. Applications in Medicine of supervised and unsupervised learning. What is Generative AI. The concept of attention. Large Language Models: C. Human and machine learning. Applications in medical research and medical practice. Data storage and manipulation: Spreadsheets versus databases. Introduction to the Basic Principles of Relational Databases. The relational data model. Design, implementation, population, querying of a relational database. Examples in MS Access DB: creation of a database; “Querying by Example”, with particular reference to data aggregation queries for decision support. Other data models: Document DBs and Graph DBs. Anonymization of medical data, and methods of “hacking” databases.
Marc M. Triola, Mario F. Triola, Fondamenti di statistica. Per le discipline biomediche, Pearson, 2017. W.W. Daniel, C.L. Cross, Biostatistica - Concetti di base per l'analisi statistica delle scienze dell'area medico-sanitaria, EdiSES, 2019. M. Bland, Statistica Medica, Seconda Edizione, APOGEO. P. Armitage, G. Berry, Statistical Methods in Medical Research, Third Edition, Blackwell Science.
For AI: R. Borhani, S. Borhani, A.K. Katsaggelos, Fundamentals of Machine Learning and Deep Learning in Medicine, Springer, 2022
For databases: L. Alluri, U. Nanni, Fondamenti di basi di dati, Hoepli 20
Teachers' slides.
INTRODUCTION
Statistical methods in clinical and epidemiological studies.
DESCRIPTIVE STATISTICS
1. Tabulation and data processing: numerical and graphical summaries of data. Frequency distributions, absolute, relative and cumulative frequencies.
2. Measures of location: mean, median, mode, and comparison between them.
3. Measures of dispersion: range, variance, standard deviation, coefficient of variation, percentiles. Boxplots.
4. Relationship between two variables: contingency tables, scatter plot, correlation.
PROBABILITY
1. Calculation rules and basic theorems.
2. Probability distributions for discrete and continuous random variables. Binomial and Gaussian distribution.
STATISTICAL INFERENCE
1. Sampling and sampling distributions. The Central Limit Theorem.
2. Confidence intervals for means and proportions.
3. Significance tests: null and alternative hypotheses, type I and type II errors, significance level, definition and interpretation of the p-value.
4. Inferences on means.
5. Inferences on means and proportions: comparison between two populations.
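As an illustration of item 2 above, here is a minimal sketch of a confidence interval for a proportion. It assumes the normal-approximation (Wald) interval, which the syllabus does not name explicitly. It is written in Python for illustration only, although the practical sessions use R. The function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, level: float = 0.95):
    """Wald (normal-approximation) confidence interval for a proportion.

    Assumption: the sample is large enough for the normal approximation
    to be reasonable; the syllabus does not specify which interval is taught.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                       # sample proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # z times the standard error
    return p_hat - margin, p_hat + margin


# Hypothetical data: 40 patients out of 100 respond to a treatment.
low, high = proportion_confidence_interval(40, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

For small samples or proportions close to 0 or 1, this approximation can be poor and other intervals are usually preferred.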
Introduction to the Basic Principles of Medical Informatics. Organization of a computer. The concept of Algorithm. What is a program. Compilers and interpreters. Elements of Artificial Intelligence and Machine Learning for Medicine. Learning vs Fitting. Classical Statistics vs Machine Learning. Supervised and Unsupervised Learning. What is Training. Loss and Loss Minimization. The concept of overfitting. The concept of Garbage In Garbage Out (GIGO). The artificial neuron. Neural Networks and Deep Learning. The concept of generalization. Convolutional Neural Networks. Applications in Medicine of supervised and unsupervised learning. What is Generative AI. The concept of attention. Large Language Models: ChatGPT (and/or other available tools) and its applications in medicine. Understanding the impact that AI is having and will increasingly have in medicine and in the way patients experience the relationship with doctors. Human learning vs machine learning and other ethical-philosophical issues. Critical commentary on some milestone articles on Machine Learning applications in Medicine.
Data management part: Information systems in medicine. Introduction to the Basic Principles of Relational Databases. The relational model of data. Design of a relational database. Creation of tables and definition of domain, key and referential integrity constraints. Population of the database (manual data entry, from file, from query). Examples in MS Access DB: “Querying by Example”, with particular reference to data aggregation queries for decision support. Data export to CSV and Excel formats. Exchange formats: CSV, JSON, XML. Alternative data models: document model, graph model.
Lectures for the theoretical part will be accompanied by a series of examples. Student participation will be stimulated through practical examples drawn from clinical papers and the interpretation of statistical results.
Practical sessions in a computer classroom or with personal laptops will be organized to illustrate practical examples of data analysis using the free statistical software R.
The exam of the integrated course consists of two tests. The final mark will be given by the arithmetic mean of the marks in the two tests.
Statistics part: written exam in the form of a quiz, taken in the classroom on the Moodle platform. The quiz focuses on the theoretical and practical topics covered during the course. It consists of 16 questions, and the student has 60 minutes to answer. Each correct answer is worth 2 points; to pass (18/30) you need to answer 9 questions correctly. To get 30/30 you need to answer 15 questions correctly. To get 30 cum laude you need to answer all 16 questions correctly.
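The scoring rule for the 16-question quiz can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that marks between the stated thresholds are simply 2 points per correct answer, which the text implies but does not state for every value.

```python
def statistics_quiz_mark(correct_answers: int) -> str:
    """Map the number of correct answers (0 to 16) to a mark out of 30."""
    if not 0 <= correct_answers <= 16:
        raise ValueError("the quiz has 16 questions")
    points = 2 * correct_answers  # each correct answer is worth 2 points
    if points < 18:               # fewer than 9 correct answers: below 18/30
        return "fail"
    if points > 30:               # all 16 correct: 32 points
        return "30 cum laude"
    return f"{points}/30"


for correct in (8, 9, 15, 16):
    print(correct, "->", statistics_quiz_mark(correct))
```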
Computer science part: written test to be taken in the classroom. The computer science part is structured as a quiz composed of 12 questions, some multiple-choice and some free-answer. The points associated with the questions (declared together with the questions) sum to 32, which corresponds to a grade of 30 cum laude. Free-answer questions measure the ability to solve elementary problems. Multiple-choice questions test knowledge of the technical language and elementary concepts of the discipline. The grade is the sum of the scores obtained on the individual questions. In multiple-choice questions, the question's score is awarded when the answer is correct. In free-answer questions, the full score is awarded when the answer is error-free. If the answer contains errors, the score is reduced according to their severity. Grade 28-30 cum laude: the student has an IN-DEPTH knowledge of the subject and knows how to solve elementary problems. Grade 24-27: the student has a GOOD knowledge of the subject and a fair ability to solve elementary problems. Grade 18-23: the student has a FAIR knowledge of the subject and a sufficient ability to solve elementary problems. Exam failed: the student is unable to solve elementary problems and has a deficient knowledge of the subject.
The overall exam grade is given by the arithmetic mean of the marks obtained in the statistics test and the computer science test.
This course explores topics closely related to one or more of the goals of the United Nations 2030 Agenda for Sustainable Development.