Contact : Marques Joao joao.marques@universite-paris-saclay.fr
Category: Methods, tools, techniques and concepts applicable to doctoral research work
Theme: Research training
Language of instruction: English
Number of hours: 30
Maximum participants: 20
Number registered: 16
Places available: 4
Priority audience: 1st, 2nd and 3rd year
Target audience: Doctoral students
Proposed by: Astronomie et Astrophysique d'Ile de France
| Location: IAP, salle de TP 35-37, ground floor
Start of training: 31 March 2025
End of training: 9 April 2025
Registration opens:
Registration closes: 17 March 2025
Registration procedure: ADUM
Objectives: At some stage, most researchers need to carry out some form of data analysis, ranging from basic line-fitting and parameter estimation to complex, computationally intensive sampling for model selection on large datasets. Anecdotal evidence indicates that many doctoral researchers are not well equipped for such tasks, often applying correct approaches improperly or selecting unsuitable statistical tools. The purpose of this course is to build a solid understanding of principled data analysis and to provide practical experience in applying appropriate methods to data.
The training objectives are to provide participants with a comprehensive understanding and practical experience in key areas of data science and information theory. They will develop a solid understanding of probability theory and apply Bayesian methods for signal processing, inference, prediction, model comparison, decision-making, and experimental design. Participants will gain the ability to build and implement Bayesian models effectively, including Bayesian hierarchical models. They will receive both theoretical and hands-on experience with numerical techniques such as Markov Chain Monte Carlo and simulation-based inference for data analysis with implicit likelihoods. Through the exploration of information theory, participants will understand the principles of data transmission and data compression, and leverage information measures for designing experiments. Additionally, they will acquire the foundations of statistical learning theory, enabling them to grasp the fundamental concepts underlying many popular machine learning algorithms.
Programme: This will be a 6-day course/workshop on data science methods and information-theoretic tools for data analysis, aimed principally at first- and second-year PhD students. Sessions will be 09:45–12:30 and 14:00–17:45 on 31 March, 1, 2, 7, 8 and 9 April 2025. All sessions will be held in the computer room (salle de TP 35-37, ground floor) of the IAP.
A preliminary schedule is as follows:
Day 1: Probability theory and signal processing
Historical and conceptual introduction
Probability theory: inference, prediction, priors, maximum entropy principle
Bayesian signal processing: Gaussian random fields, Wiener filtering, signal de-noising and de-blending
Day 2: Monte Carlo techniques
Monte Carlo integration
Markov Chain Monte Carlo, the Metropolis-Hastings algorithm
Slice sampling, Gibbs sampling, Hamiltonian Monte Carlo
Days 3 and 4: Advanced Bayesian topics
Model comparison, model averaging
Bayesian decision theory and Bayesian experimental design
Bayesian hierarchical models
Fisher information and forecasts
Simulation-based inference/implicit likelihood inference
Caveats!
Day 5: Information Theory
The noisy binary symmetric channel
Shannon's theorem
Measures of information and information-theoretic experimental design
Thermodynamics and inference
Day 6: Machine Learning Theory
History of Machine Learning
Statistical learning theory
Learning via optimisation
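To give a flavour of the Day 5 material, here is a minimal sketch of the noisy binary symmetric channel. It computes the standard capacity C = 1 - H2(f) for flip probability f, and simulates a three-fold repetition code with majority-vote decoding. The function names, the flip probability f = 0.1 and the choice of repetition code are illustrative assumptions, not taken from the course material.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(f):
    """Capacity (bits per channel use) of a binary symmetric channel."""
    return 1.0 - binary_entropy(f)


def transmit(bits, f, rng):
    """Send bits through the channel: each bit flips with probability f."""
    return [b ^ (rng.random() < f) for b in bits]


def repetition_error_rate(n_bits, f, rng):
    """Simulated bit-error rate of a 3-fold repetition code (majority vote)."""
    errors = 0
    for _ in range(n_bits):
        source = rng.randint(0, 1)
        received = transmit([source] * 3, f, rng)
        decoded = 1 if sum(received) >= 2 else 0
        errors += decoded != source
    return errors / n_bits


# Illustrative values (assumptions for this sketch only).
f = 0.1
rng = random.Random(0)
capacity = bsc_capacity(f)
simulated = repetition_error_rate(20000, f, rng)
# Majority vote fails when two or three of the three copies are flipped.
predicted = 3 * f**2 * (1 - f) + f**3

print(f"capacity at f={f}: {capacity:.3f} bits per use")
print(f"repetition code error rate: simulated {simulated:.4f}, predicted {predicted:.4f}")
```

The repetition code lowers the error rate at the cost of a rate of 1/3, well below the channel capacity; Shannon's theorem states that better codes can achieve vanishing error at any rate below capacity.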
Prerequisites: Basic mathematics and experience with at least one programming language.
We expect all participants to bring their own laptop and to complete a simple computational exercise in advance (in any suitable language, preferably Python) to ensure they have appropriate software in place before the workshop starts. The preliminary exercise can be found at https://cloud.aquila-consortium.org/s/Leclercq_Supernovae_Exercise. Participants are asked to read sections I–III there and code the answers to section IV. Interested readers can continue with sections V–VIII, which use notions that will be introduced during the course.
Teaching team: Florent Leclercq is a CNRS researcher at the Institut d'Astrophysique de Paris. As an interdisciplinary specialist in cosmology, data analysis and machine learning, his current work focuses on the development of principled statistical methods for numerical cosmology.
Teaching method: The course combines lectures with hands-on computational work. It concentrates on laying firm foundations of principled Bayesian data analysis, with a substantial share of hands-on classes in which participants learn how to apply the ideas in practice. Two slots for discussion or Q&A sessions are also scheduled.
Skills acquired by the end of the training: At the end of the course, participants should be able to (non-exhaustive list):
Express stochastic problems in terms of fundamental probability and Bayes' theorem,
Formulate data analysis problems in a principled statistical framework, and be capable of executing some methods of solution,
Understand the concepts of probability theory, inference, priors, posteriors, marginalisation, parameter inference, model selection, sampling, and apply them to real data,
Code and apply a simple Markov Chain Monte Carlo sampler to physical data,
Apply principles of information theory to understand data compression and forecast experimental results,
Understand the basics of statistical learning theory, the link between learning and optimisation problems, and the concept of a loss function.
By the end of this course, participants will be equipped with a strong foundational and practical understanding of these crucial aspects, empowering them to perform principled data analysis and make informed methodological choices. They will be prepared to apply these methods confidently to real-world data challenges.
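As an illustration of one skill listed above, coding a simple Markov Chain Monte Carlo sampler, here is a minimal random-walk Metropolis sketch. It infers the mean of Gaussian data with known noise level under a flat prior. The synthetic data, the function names, the step size and the burn-in length are all assumptions made for this example; it is not the course's own exercise or implementation.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    """Log-posterior (up to a constant) for the mean of Gaussian data
    with known noise sigma and a flat prior on mu."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma**2


def metropolis(log_prob, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    chain = [start]
    current_lp = log_prob(start)
    n_accepted = 0
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            chain.append(proposal)
            current_lp = proposal_lp
            n_accepted += 1
        else:
            chain.append(chain[-1])
    return chain, n_accepted / n_steps


# Synthetic data (an assumption for this sketch): 200 draws around mu = 3.
rng = random.Random(42)
data = [rng.gauss(3.0, 1.0) for _ in range(200)]

chain, acceptance = metropolis(
    lambda mu: log_posterior(mu, data),
    start=0.0, n_steps=5000, step_size=0.2, rng=rng,
)

samples = chain[1000:]  # discard the first 1000 steps as burn-in
posterior_mean = sum(samples) / len(samples)
sample_mean = sum(data) / len(data)

print(f"acceptance rate: {acceptance:.2f}")
print(f"posterior mean: {posterior_mean:.3f}, sample mean: {sample_mean:.3f}")
```

For this model the posterior is Gaussian and centred on the sample mean, so comparing the two printed values gives a quick sanity check of the sampler.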
The training contributes to the following objective: to be directly useful for carrying out personal research work
Schedule:
Session 1 Date: 31-03-2025 Time: 09:45 to 17:45 Course title: Probability theory and signal processing
Historical and conceptual introduction
Probability theory: inference, prediction, priors, maximum entropy principle
Bayesian signal processing: Gaussian random fields, Wiener filtering, signal de-noising and de-blending
Session 2 Date: 01-04-2025 Time: 09:45 to 17:45 Course title: Monte Carlo techniques
Monte Carlo integration
Markov Chain Monte Carlo, the Metropolis-Hastings algorithm
Slice sampling, Gibbs sampling, Hamiltonian Monte Carlo
Session 3 Date: 02-04-2025 Time: 09:45 to 17:45 Course title: Advanced Bayesian topics
Model comparison, model averaging
Bayesian decision theory and Bayesian experimental design
Bayesian hierarchical models
Fisher information and forecasts
Simulation-based inference/implicit likelihood inference
Caveats!
Session 4 Date: 07-04-2025 Time: 09:45 to 17:45 Course title: Advanced Bayesian topics
Model comparison, model averaging
Bayesian decision theory and Bayesian experimental design
Bayesian hierarchical models
Fisher information and forecasts
Simulation-based inference/implicit likelihood inference
Caveats!
Session 5 Date: 08-04-2025 Time: 09:45 to 17:45 Course title: Information Theory
The noisy binary symmetric channel
Shannon's theorem
Measures of information and information-theoretic experimental design
Thermodynamics and inference
Session 6 Date: 09-04-2025 Time: 09:45 to 17:45 Course title: Machine Learning Theory
History of Machine Learning
Statistical learning theory
Learning via optimisation
|