
[MOOC] Reproducible Research Part II: Practices and tools for managing computations and data [Participation: Remote]

Contact: WATELLO Caroline
caroline.watello@ip-paris.fr
Tel.: 0175319259

Category: Open science

Language of instruction: English

Number of hours: 35

Credits/Points: 5

Number of enrolled participants: 8

Priority audience: None

Target audience:
Doctoral students

Offered by: Institut Polytechnique de Paris


Location: REMOTE - FUN platform
Notes: Flexible dates, REMOTE on the FUN platform.
Keywords: science ouverte, open science
Course start: 5 May 2025
Course end: 10 September 2025
Registration opens: 18 April 2025
Registration closes: 27 June 2025
Registration procedure: Registration is required via ADUM AND the FUN platform. The certificate of completion must be uploaded to your ADUM personal space, in the dedicated menu.
Website: https://www.fun-mooc.fr/fr/

Objectives:
Manage research data
Use tools and techniques for controlling the software environment
Automate long or complex computations using workflows

Program:
Training delivered by: FUN-MOOC

Following the success of the MOOC "Reproducible research: methodological principles for transparent science", the authors continue on the same theme, dealing more specifically with the issues of massive data and the complex calculations associated with them. These two MOOCs complement each other and offer a coherent training program on the subject.

In this second MOOC, we will show you how to improve your practices for managing large data and complex computations in controlled software environments:

- you will learn how to use formats like JSON, FITS, and HDF5, platforms like Zenodo and Software Heritage, tools like git-annex, docker, singularity, guix, make, and snakemake;
- we will show you how to integrate them in a real-life use case: a sunspot detection study. You will see for yourself that our methods and tools allow you to work in a reliable and reproducible way.

The strength of this new MOOC lies in a general and systematic presentation of the major concepts and of how they translate into practical solutions through numerous hands-on sessions with state-of-the-art open-source tools.

Prerequisites:
This course is for everyone who relies on a computer to perform data analysis. You should have some experience with running commands in a terminal, and have a basic knowledge of git (at the level of the first MOOC) and scientific Python.

Teaching method:
This MOOC consists of three independent modules that combine video lectures, quizzes, practical sessions, written course materials, and many exercises for getting hands-on experience with the tools and methods that are presented.

Most of the exercises can be carried out in a JupyterLab environment made available to each MOOC learner. Some exercises require a Linux computer and the possibility to install system software on it.

Skills acquired by the end of the course:
By the end of this course, you will be able to:

Manage research data:
- understand the challenges posed by large volumes of data
- archive code and data on well-known archives such as Software Heritage and Zenodo
- integrate data into version control (git-annex)
- use structured binary data formats (FITS, HDF5)

Use tools and techniques for controlling the software environment:
- understand how software packages are built and managed
- deploy software environments as containers (e.g., Docker)
- manage software environments using a functional package manager (e.g., Guix)
- work in controlled software environments on a daily basis

Automate long or complex computations using workflows:
- understand the challenges of scaling up: long calculations, distributed calculations
- choose a workflow tool adapted to your needs
- automate a data analysis using make and snakemake
- control the software environments of a workflow

Skills and abilities targeted by the end of the course (RNCP sheets)

Order of 22 February 2019 defining the skills of doctoral graduates and registering the doctorate in the French national directory of professional certifications (RNCP). https://www.legifrance.gouv.fr/loda/id/JORFTEXT000038200990/

Block 4: Scientific and technological monitoring at the international level

- Have an understanding of, perspective on, and critical view of all the state-of-the-art information available

Social skills

- Adaptability; Perseverance; Resilience; Managing change and failure; Commitment

- Self-knowledge and control of oneself and one's behavior = Ability to self-assess and question oneself; Awareness of one's own limits; Balance between rigor and flexibility


This course contributes to the following objective: training in open science

