Areas of Training
  • Big Data & HPC
  • AI
  • Data Science
  • Quantum Computing
  • IoT & Edge Computing

Skills block 1: Basics for Data Science & AI

Module 1: Python For Data Science



Objective:

In this course, we will learn how to use pandas DataFrames, NumPy multi-dimensional arrays, and SciPy libraries to work with various datasets. We will introduce the pandas library, which is used to load, manipulate, analyze, and visualize datasets.

Prerequisite:

Programming skills and Linux.

Topics:

Installing an IDE and Jupyter Notebook for Python 3

Objects, Variables and data types

Control flow and loops

Data Formatting and Data Normalization

Data Aggregation and Grouping

Data Cleaning and Handling Missing Values

Descriptive Statistics

Reporting and Data visualisation

Transforming a Jupyter notebook into a standalone, interactive web application accessible via Voila and Binder.

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

1 final exam

1 project on a use case: insurance, banking, health, climate, etc.

References:

https://www.python.org



Module 2: Numerical Computing For Data Science



Objective:

This course in linear algebra and matrix calculus is essential: it gives you the basics needed to take up studies in data science & artificial intelligence. It will also allow you to do exploratory data analysis.

Prerequisite:

L2 or L3 level in linear algebra and functional analysis, and Python or R programming.

Topics:

Scalars, Vectors, Matrices and Tensors

Multiplying Matrices and Vectors

Identity and Inverse Matrices

Eigendecomposition

The Moore-Penrose Pseudoinverse

The Trace Operator

Singular Value Decomposition

Principal Component Analysis (PCA)

Linear Discriminant Analysis (LDA)

Matrix Methods in Signal Processing
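To give a flavour of the eigendecomposition topic listed above, here is a minimal sketch of power iteration, one simple way to estimate the dominant eigenvalue of a matrix. It is an illustration only and not taken from the course material: the function names, the 2x2 example matrix and the all-ones starting vector are assumptions chosen for this example.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_iteration(matrix, num_iters=200):
    """Estimate the dominant eigenvalue and a unit-norm eigenvector.

    Assumes the matrix has a single eigenvalue of largest magnitude and
    that the starting vector is not orthogonal to its eigenvector.
    """
    vector = [1.0] * len(matrix)  # illustrative starting vector
    for _ in range(num_iters):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient v^T A v (the vector already has unit norm).
    eigenvalue = sum(v * w for v, w in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


# Hypothetical symmetric example; its eigenvalues are (5 +/- sqrt(5)) / 2.
A = [[2.0, 1.0], [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(A)
print(f"dominant eigenvalue: {eigenvalue:.6f}")
```

In practice a library routine such as those provided by NumPy or SciPy would normally be used; the hand-written loop is only meant to show the idea.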

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

1 final exam

1 project on a use case: insurance, banking, health, climate, etc.

References:

G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.

https://www.python.org



Module 3: Probability, Statistics and Modelling



Objective:

This course in probability and statistical modelling gives you the foundation needed to undertake studies in data science & artificial intelligence. In machine learning and deep learning, we manipulate inferential statistical models in the form of large vectors.

Prerequisite:

L2 or L3 level in linear algebra, probability, statistics and Python or R programming.

Topics:

Random Variables

Probability Distributions

Marginal Probability

Conditional Probability

Expectation, Variance and Covariance

Estimating the Correlation

Linear and Logistic Regression

Least Squares and Maximum Likelihood

Multiple Regression

Model Selection

Multivariate statistics
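As a small illustration of the least-squares topic listed above, the sketch below fits a simple linear regression using the closed-form estimates (slope = covariance / variance). The function name and the toy data are assumptions made for this example. Under the usual assumption of independent Gaussian noise, the same slope and intercept are also the maximum likelihood estimates.

```python
def fit_simple_linear_regression(xs, ys):
    """Return the least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance of x and y, and variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noiseless data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"slope = {slope}, intercept = {intercept}")
```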

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

1 final exam

References:

G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.



Module 4: Optimization For Data Science



Objective:

This course describes the mathematical tools needed to optimize statistical learning models. It will give the mathematical foundations of convex optimization, and describe the different approaches used for the construction of efficient convex optimization algorithms.

Prerequisite:

Course code: NCD1.2, and functional analysis.

Topics:

Convexity

Gradient Methods

Proximal algorithms

Coordinate Descent Methods

Subgradient Methods

Primal-Dual context and certificates

Lagrange and Fenchel Duality

Second-Order Methods

Quasi-Newton Methods

Gradient-Free and Zeroth-Order Optimization

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

1 final exam

References:

S. Boyd and L. Vandenberghe. Convex Optimization. CUP.

Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.

Skills block 2: Applied Machine and Deep Learning

Module 5: Computational Optimisation for Data Science



Objective:

This course will help you understand and implement convex optimisation algorithms that are widely used in industry. Convex optimisation is essential in machine learning and deep learning.

Prerequisite:

Convex optimisation, or course code ODS1.4.

Topics:

Conjugate gradient for linear systems

Conjugate gradient for general functions

Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm

Davidon-Fletcher-Powell algorithm

Batch Gradient Descent

Stochastic Gradient Descent

Mini-batch Gradient Descent
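The three gradient descent variants listed above differ only in how many samples are used for each update. The sketch below illustrates this on a toy least-squares problem. It is a minimal example under stated assumptions: the function name, the learning rate, the number of epochs and the data are all hypothetical choices for this illustration.

```python
import random


def gradient_descent(xs, ys, batch_size, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by minimising the mean squared error.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    and any value in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Hypothetical noiseless samples of y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

for name, size in [("batch", len(xs)), ("stochastic", 1), ("mini-batch", 2)]:
    w, b = gradient_descent(xs, ys, batch_size=size)
    print(f"{name:>10}: w = {w:.3f}, b = {b:.3f}")
```

On noisy data the stochastic and mini-batch variants would fluctuate around the minimiser unless the learning rate is decreased over time; the noiseless data here keeps the comparison simple.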

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

1 final exam

1 project on a use case: insurance, banking, health, climate, etc.

References:

S. Boyd and L. Vandenberghe. Convex Optimization. CUP.

Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.



Module 6: Machine Learning I: Supervised Machine Learning



Objective:

This course is hands-on and will enable you to process text, image, sound and time series data (discrete, continuous, qualitative and quantitative variables) in different use cases.

Prerequisite:

Probability and statistics, linear algebra, optimisation, and basic Python programming skills, or course codes NCD1.2 / PSM1.3 / CODS2.1.

Topics:

Feature Extraction and Labelling

Split data for training, testing and validation

Model setting and training / Model evaluation

Overfitting / Underfitting / Training Data / Stepping Back

Hyperparameter Tuning and Model Selection

Linear, Polynomial, Ridge, Lasso, Logistic, Softmax Regression

Regularized Linear and Elastic Net

Training and Cost Function

Decision Boundaries

Training a Binary Classifier

Measuring Accuracy Using Cross-Validation / Confusion Matrix

Precision/Recall Trade-off / ROC Curve

Multiclass Classification / SVM Classification

Adding Similarity Features
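To make the confusion matrix and the precision/recall trade-off from the topic list concrete, here is a minimal sketch in plain Python. The labels, scores and thresholds are invented for illustration, and the helper names are assumptions of this example rather than part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, scores, threshold):
    """Classify as positive when score >= threshold, then compute both metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Hypothetical ground truth and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1]

for threshold in (0.5, 0.75):
    precision, recall = precision_recall(y_true, scores, threshold)
    print(f"threshold {threshold}: precision = {precision:.2f}, recall = {recall:.2f}")
```

Raising the threshold in this toy example removes the false positive but also misses one more true positive, which is the trade-off in miniature.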

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

1 final exam

1 project on a use case: insurance, banking, health, climate, etc.

References:

G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.

C. M. Bishop. Pattern Recognition and Machine Learning. New York: Springer, 2006.



Module 7: Deep Learning



Objective:

This course is hands-on and will enable you to process text, image, sound and time series data (discrete, continuous, qualitative and quantitative variables) in various use cases such as vision and NLP. This course uses TensorFlow as the main programming tool.

Prerequisite:

Machine learning, optimisation, and Python programming.

Topics:

Deep Learning Concepts

Neural Network (NN)

Convolutional neural network (CNN)

Time Series Forecasting

Recurrent Neural Network (RNN)

Long Short-Term Memory (LSTM)

Deep Scattering Transform Network (DSN)

Hybrid Recurrent Scattering Neural Network

DeepDream and style transfer

Word embedding, Machine Translation and Seq2Seq (NLP)

kNN, SVM, SoftMax, two-layer network

PyTorch / TensorFlow

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

Project on a use case: insurance, banking, health, climate, robotics, etc.

1 final exam

References:

G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.



Module 8: Deep & Reinforcement Learning



Objective:

This course tackles the problems of learning and decision making under uncertainty and focuses on reinforcement learning and the multi-armed bandit.

Prerequisite:

Probability, stochastic calculus, deep learning, and Python programming.

Topics:

Markov decision processes and dynamic programming

Stochastic and adversarial multi-armed bandits

Tabular reinforcement learning

Deep learning for reinforcement learning

Deep Q-Network (DQN)
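As an illustration of the first topic, Markov decision processes and dynamic programming, the following sketch runs value iteration on a tiny MDP. The two-state MDP, its rewards, the discount factor and the function names are all hypothetical and chosen so that the result can be checked by hand.

```python
def action_value(outcomes, values, gamma):
    """Expected return of an action given a list of (probability, next_state, reward)."""
    return sum(prob * (reward + gamma * values[next_state])
               for prob, next_state, reward in outcomes)


def value_iteration(transitions, gamma, tolerance=1e-10):
    """Return the optimal state values and a greedy policy.

    transitions[state][action] is a list of (probability, next_state, reward).
    Values are updated in place, state by state, until the largest change
    in one sweep falls below the tolerance.
    """
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            best = max(action_value(outcomes, values, gamma)
                       for outcomes in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tolerance:
            break
    policy = {}
    for state, actions in transitions.items():
        policy[state] = max(
            actions, key=lambda a: action_value(actions[a], values, gamma)
        )
    return values, policy


# Hypothetical deterministic MDP with two states and two actions.
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}
values, policy = value_iteration(transitions, gamma=0.5)
print(values, policy)
```

With a discount factor of 0.5, staying in state 1 forever is worth 2 / (1 - 0.5) = 4, and moving there from state 0 is worth 1 + 0.5 * 4 = 3, which is what the iteration converges to.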

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

Project on a use case: insurance, banking, health, climate, robotics, etc.

1 final exam

References:

R. Sutton and A. Barto. Reinforcement Learning: An Introduction.

O. Sigaud and O. Buffet (eds.). Processus décisionnels de Markov et intelligence artificielle. 2008.

Cs. Szepesvári. Algorithms for Reinforcement Learning. 2009.

Skills block 3: Advanced Statistical Learning

Module 9: High-Dimensional Statistics



Objective:

The theory of high-dimensional statistics will help you better understand machine learning and deep learning. In this course we will focus on non-asymptotic statistical problems where the number of variables can be much greater than the sample size (p >> n). This phenomenon is known as the "curse of dimensionality" because, in contrast to asymptotic statistics, it leads to problems of numerical bias, inference and estimation.

Prerequisite:

Knowledge of linear algebra, matrix calculus, graph theory, optimization, probability and statistics. R or Python programming language. Course codes: NCFD1.2 / PSM1.3 / OAMDLL2.1.

Topics:

Sub-Gaussian Random Variables

Linear Regression Model

Misspecified Linear Models

Minimax Lower Bounds

Matrix Estimation

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

Project on a use case: insurance, banking, health, climate, robotics, etc.

1 final exam

References:

P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. DOI 10.1007/978-3-642-20192-9.

S. Boyd and L. Vandenberghe. Convex Optimization. CUP.

Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.



Module 10: Probabilistic Graphical Models



Objective:

The objective of this course is to familiarize you with the statistical modeling of complex multivariate data via probabilistic graphical models such as Bayesian networks. Applications in signal processing, computer vision and AI illustrate these models.

Prerequisite:

Knowledge of linear algebra, matrix calculus, graph theory, optimization, probability and statistics. R or Python programming language. Course codes: HDS3.1 or NCFD1.2 / PSM1.3 / OAMDLL2.1.

Topics:

Directed and undirected graphical models

Maximum likelihood

Linear regression

Logistic regression

Gaussian Mixture Models and clustering

Exponential family distributions

Sum-product algorithm and exact inference

Hidden Markov models

Approximate inference

Bayesian methods

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

Project on a use case: insurance, banking, health, climate, robotics, etc.

1 final exam

References:

P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. DOI 10.1007/978-3-642-20192-9.

S. Boyd and L. Vandenberghe. Convex Optimization. CUP.

Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.



Module 11: Distributed High-Dimensional Statistics



Objective:

As part of our R&D and teaching activities, these workshops keep us up to date. In this workshop, we review research papers on distributed deep learning, with the aim of making deep learning workloads efficient and scalable on distributed and parallel systems.

Prerequisite:

Course codes: HDS3.1 / PGM3.2 / DL2.4

Topics:

High-dimensional statistics and complexity

Federated Learning

Deep Learning on HPC systems

Deep Learning on heterogeneous infrastructure (GPU, CPU)

Quantum Computing for Deep & Machine Learning

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

3 homework assignments implementing algorithms in Python

Project on a use case: insurance, banking, health, climate, robotics, etc.

1 final exam

References:

P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. DOI 10.1007/978-3-642-20192-9.

S. Boyd and L. Vandenberghe. Convex Optimization. CUP.

Y. Nesterov. Introductory Lectures on Convex Optimization. Springer

Skills block 4: Artificial Intelligence Industrialization

Module 12: Hybrid Cloud and Edge Computing



Objective:

This course will enable you to set up a big data architecture, and to implement your data projects and put them into production.

Prerequisite:

Knowledge of the Linux operating system (shell) and Python programming.

Topics:

Overview of Cloud Technologies

IaaS Programming Interfaces

The REST and Python programming interfaces of AWS

OpenStack and CloudStack

Containerization and Kubernetes

Hybrid Cloud Infrastructures

Edge Computing and IoT

Organization:

8 lectures and tutorials of 3 hours each (total: 24 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

2 homework assignments implementing algorithms in Python

1 project

References:

https://kubecampus.io/kubernetes/courses/

https://www.balena.io/etcher

https://www.ubuntu-fr.org



Module 13: Machine Learning Engineering For Production (MLOps)



Objective:

This course will enable you to better manage the life cycle of deep learning models (versioning and storage of the data and the model, history, traceability, monitoring, etc.). You will also learn to scale up trained models, deploy web applications for clients, and orchestrate the various data processing operations linked to the models.

Prerequisite:

Knowledge of the Linux operating system (shell), Python programming, machine learning, deep learning, and Kubernetes.

Topics:

MLOps challenges

MLflow, model storage with S3

Creating an API with TensorFlow Serving, Flask, Celery, FastAPI and Redis

Use of widgets to add interactivity to web applications (Streamlit, Dash and Panel)

Task orchestration with Airflow

Introduction to the features and architecture of Apache Airflow

Navigating the user interface and using the CLI

Organization:

10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

1 project

References:

https://mlflow.org

https://airflow.apache.org

https://fastapi.tiangolo.com



Module 14: Data Storage and Parallel Computing



Objective:

As part of our R&D and teaching activities, these workshops keep us up to date. In this workshop, we review research papers on distributed deep learning, with the aim of making deep learning workloads efficient and scalable on distributed and parallel systems.

Prerequisite:

Course code: This course will enable you to set up a big data architecture, and to implement your AI algorithms and put them into production in distributed mode.

Topics:

Distributed big data architecture and data lake with Hadoop

NoSQL distributed data storage

Parallel machine learning with PySpark MLlib

Data migration and partitioning with Dataiku

Dynamic dashboards with Azure ML Service, Power BI, NLP, ML

Organization:

12 lectures and tutorials of 3 hours each (total: 36 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework, and exams can be done in French or English.

Validation:

1 project

References:

https://spark.apache.org

https://www.elastic.co/fr/what-is/elk-stack

https://www.tensorflow.org/guide/distributed_training?hl=fr