Getting Started with High-Performance Data Analytics (HPDA)

  • Start date: Upcoming in 2022
  • Duration: 12 hours
  • Venue: Maison du Savoir
    2, avenue de l’Université
    L-4365 Esch-sur-Alzette
  • Language: English
  • Price (excl. VAT): 1360.00

Course context

In support of Luxembourg’s data-driven innovation strategy, LuxProvide and the Competence Centre are collaborating on an exclusive training catalogue covering High Performance Computing and MeluXina, Luxembourg’s brand-new supercomputer.

Bringing Data Analytics to the Next Level with MeluXina Supercomputer

This course provides a working knowledge of the core Python libraries used for proof-of-concept work and prototyping, and explains how a data science project is structured. Participants also gain hands-on experience with the TensorFlow library for machine learning and deep learning, and with statistical visualisation using Seaborn. Finally, the course introduces distributed computing and Big Data concepts and their implementation with tools such as Dask and Horovod.


At the end of this course, the successful attendee will

  • have knowledge about
    • Python notebooks and how the computation is mapped onto hardware infrastructure
    • effective data-science project workflows
    • common data analytics Python libraries and their strengths and weaknesses
    • tackling Big Data problems with distributed computing
  • and be able to
    • work with Python notebooks on MeluXina
    • read in a data set from file or object storage for analysis
    • perform statistical analysis on data in a NumPy array or a pandas DataFrame
    • create visualisations of data using modern libraries
    • define, train and evaluate simple machine learning models with TensorFlow
    • choose a suitable data analytics library for the task at hand
  • in order to
    • independently analyse and visualise data sets of any size on MeluXina

Course programme

The training is divided into the following modules and takes place on 8 and 9 February 2022.

  • Introduction to Data Analytics
    • Intro to Jupyter Notebooks
    • Load data with Pandas
    • Clean data and automate web download
    • Split datasets into training, validation and test sets
    • Visualize variables
    • Run linear regression on data
    • Visualize and interpret results
  • Machine Learning
    • Intro to ML regression algorithms (linear, SVM, regularization, random forest)
    • Perform PCA to reduce dimensionality
    • Run SVM regression on data
    • Run the model in inference mode
    • Create Python scripts and run them from the terminal
  • Distributed Computing
    • Intro to distributed computing
    • Delayed computations and computation graphs
    • Set up a Dask client
    • Distributed loading of a large dataset
    • Principal Component Analysis for dimensionality reduction
  • Accelerated Machine Learning
    • Read data with cuDF and compare with pandas
    • Run ML algorithms with cuML and compare with scikit-learn
    • Display HTML output with Plotly
  • Deep Learning
    • Load and preprocess dataset
    • Build a DL model using the Sequential API in TensorFlow 2
    • Compile the model
    • Define early-stopping and checkpoint callbacks
    • Train model
    • Evaluate model
  • GPU-Accelerated Deep Learning
    • Intro to distributed DL (sharing gradients)
    • Initialize Horovod
    • Pin each process to an available GPU
    • Use the distributed optimizer and broadcasting
    • Launch the script using the “horovod” MPI wrapper
    • Deploy the TF model with TensorRT
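
The "sharing gradients" idea behind distributed deep learning can be illustrated without GPUs, MPI or Horovod itself. The sketch below is a single-process simulation in plain Python (standard library only), built on assumptions not taken from the course: a hypothetical linear model y = 3x + 2 is fitted to synthetic data split across four simulated workers; each worker computes a gradient on its own shard, the gradients are averaged (the role Horovod's allreduce plays), and every worker applies the same update. All function names are hypothetical; this is neither the course material nor Horovod's API.

```python
"""Stdlib-only sketch of data-parallel gradient sharing, simulated in one process.

Assumption: synthetic data for a hypothetical model y = 3x + 2; function names
are illustrative and do not correspond to Horovod's API.
"""

import random


def make_data(n, seed=0):
    # Synthetic samples of y = 3x + 2 plus small Gaussian noise (hypothetical data).
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.05) for x in xs]
    return list(zip(xs, ys))


def shard(data, num_workers):
    # Give each simulated worker an interleaved slice of the data set.
    return [data[rank::num_workers] for rank in range(num_workers)]


def local_gradient(w, b, samples):
    # Gradient of the loss mean(0.5 * (w*x + b - y)**2) over this worker's samples.
    gw = gb = 0.0
    for x, y in samples:
        err = w * x + b - y
        gw += err * x
        gb += err
    n = len(samples)
    return gw / n, gb / n


def allreduce_mean(grads):
    # Average the per-worker gradients; Horovod performs this step with an allreduce.
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train(data, num_workers=4, lr=0.5, steps=200):
    shards = shard(data, num_workers)
    # Every worker starts from the same parameters (the role of the initial broadcast).
    w = b = 0.0
    for _ in range(steps):
        grads = [local_gradient(w, b, s) for s in shards]
        gw, gb = allreduce_mean(grads)
        # All workers apply the identical averaged update, so the replicas stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    w, b = train(make_data(400))
    print(f"w={w:.3f}, b={b:.3f}")
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the simulated workers converge to the same fit as single-process training; distributed training applies the same principle across processes that each run on their own GPU.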


Trainers from LuxProvide:

  • Dr Alban ROUSSET
  • Dr Farouk MANSOURI
  • Dr Luis VELA
  • Dr Matthieu LEFEBVRE



If you have any questions about the training, feel free to contact:
Pierre De La Celle +352 26 15 92 43
