Contact: LARRAUFIE Pierre pierre.larraufie@agroparistech.fr
Category: Training in open science and research data
Theme: Research training
Language of instruction: English
Number of hours: 20
Credits/Points: 4
Maximum participants: 500
Number registered: 16
Places available: 484
Priority audience: None
Target audience: Doctoral students
Offered by: Agriculture, Alimentation, Biologie, Environnement et Santé
| Location: online - FUN platform
Remarks: This online course is available for free on the FUN platform. All MOOC content is accessible from the start, so you can follow the MOOC at your own pace and according to your needs. An Open Badge is issued on request to learners who obtain an overall score of 50% correct answers across all the quizzes and learning activities.
Keywords: Reproducible Research, research data, software environments, workflows, FITS, HDF5, Zenodo, Software Heritage, git-annex, docker, singularity, guix, make, snakemake
Training start date: 20 March 2025
Training end date: 31 December 2025
Registration opens: 10 January 2025
Registration closes: 11 December 2025
Registration procedure: You must register yourself online on the FUN platform: https://www.fun-mooc.fr/en/courses/reproducible-research-ii-practices-and-tools-for-managing-comput/
Objectives: Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue on the same theme, dealing more specifically with massive data and the complex computations associated with it. The two MOOCs complement each other and together offer a coherent training program on the subject.
This MOOC was produced with the support of the French National Fund for Open Science (Fonds national de la science ouverte).
Program: In this second MOOC, we show you how to improve your practices for managing large data and complex computations in controlled software environments:
you will learn how to use formats like JSON, FITS, and HDF5, platforms like Zenodo and Software Heritage, tools like git-annex, docker, singularity, guix, make, and snakemake;
we will show you how to integrate them in a real-life use case: a sunspot detection study. You will see for yourself that our methods and tools allow you to work in a reliable and reproducible way.
The strength of this new MOOC lies in a general and systematic presentation of the major concepts and of how they translate into practical solutions through numerous hands-on sessions with state-of-the-art open-source tools.
Course Outline:
Last registration: September 3rd, 2025 for session 1 (other sessions will follow)
Module 1: Managing data
1.1 Archiving
1.2 File formats
1.3 Project Organization
1.4 Git Annex
Module 2: Managing software
2.1 On the Importance of Software Environment
2.2 Package Management Principles
2.3 Isolation and Containers
2.4 Using Containers
2.5 Building and Sharing Containers
2.6 Functional Package Managers (Guix, Docker, Singularity, ...)
Module 3: Managing computations
3.1 Why do we need workflows?
3.2 From notebooks to shell scripts
3.3 Workflows with make
3.4 Workflows with snakemake
3.5 Workflows and environments
Prerequisites: This course is for everyone who relies on a computer to perform data analysis. You should have some experience with running commands in a terminal, and a basic knowledge of git (at the level of the first MOOC) and scientific Python.
Teaching team:
- Arnaud Legrand is a CNRS researcher at the Laboratoire d’Informatique in Grenoble. His research interest is the performance evaluation of large computing infrastructures. Both for performing experiments and for analyzing the outcomes, it is essential to capture the process rigorously.
- Christophe Pouzat is a CNRS researcher at IRMA (Institute for Advanced Mathematical Research, University of Strasbourg). He is actually a neurophysiologist, working on the analysis of experimental data. Reproducible research enables him to communicate explicitly with experimentalists, avoiding many mistakes.
- Konrad Hinsen is a CNRS researcher at the Centre de Biophysique Moléculaire in Orléans and at the Synchrotron SOLEIL in Saint Aubin. He explores the structure and dynamics of proteins by computational methods, which he tries to make reproducible.
- Matthieu Simonin is a research engineer at the Inria Centre at Rennes University. He works closely with teams studying distributed systems and provides support for experimental campaigns that combine hardware and software constraints with data manipulation. Matthieu has recently joined labos1point5, a group of members from the French academic world, helping to develop tools for quantifying the carbon footprint of research activities, the calculations for which must, of course, be reproducible!
- Ludovic Courtès is a research software engineer at Inria in Bordeaux, France. He contributes to Guix, a free software tool to deploy software environments in a reproducible fashion, with an eye towards making it a tool of choice for reproducible research.
- Kim Tâm Huynh is currently a research engineer at Inria Paris. She supports research teams on methodologies and tools for software development.
Teaching method: This course is offered in the form of a MOOC. It consists of three modules that combine video lectures, practical sessions, written course materials, and many exercises for getting hands-on experience with the tools and methods presented.
Most of the exercises can be carried out in a JupyterLab environment made available to each MOOC learner. Some exercises require a Linux computer and the possibility to install system software on it.
Skills and abilities targeted by the end of the training (RNCP records)
Order of 22 February 2019 defining the competencies of doctoral graduates and registering the doctorate in the national directory of professional certifications (RNCP). https://www.legifrance.gouv.fr/loda/id/JORFTEXT000038200990/
Block 2: Implementing a research and development, study, and foresight approach - Guarantee the validity of the work, as well as its ethics and confidentiality, by implementing appropriate control mechanisms
Block 3: Promoting and transferring the results of an R&D, study, and foresight approach - Use "open data" communication techniques to promote approaches and results
The training contributes to the following objective: training in open science
Documents to provide: At the end of the training, you will have to upload the certificate of achievement to your personal ADUM space so that we can take these training hours into account.
Web site:
https://www.fun-mooc.fr/en/courses/reproducible-research-methodological-principles-transparent-scie/
|