Getting Started with High-Performance Data Analytics (HPDA)

  • Starting date 25 October 2021
  • Duration 12 hours
  • Location Maison du Savoir
    2, avenue de l’Université
    L-4365 Esch-sur-Alzette
  • Language English
  • Price (excluding VAT) 1360.00 

Training Context

Underlining Luxembourg’s data-driven innovation strategy, LuxProvide and the Competence Centre collaborate on an exclusive training catalogue related to High Performance Computing and MeluXina, Luxembourg’s brand-new supercomputer.

Bringing Data Analytics to the Next Level with MeluXina Supercomputer

This course gives attendees a working knowledge of the core Python libraries used for proof-of-concept work and prototyping, and an understanding of how a data science project is structured. It also provides hands-on experience with the TensorFlow library for machine learning and deep learning, and with statistical visualisation using Seaborn. Finally, attendees become familiar with distributed computing and Big Data concepts and their implementation using Horovod.

Goals

At the end of this course, the successful attendee will

  • have knowledge about
    • Python notebooks and how the computation is mapped onto hardware infrastructure
    • effective data-science project workflows
    • common data analytics Python libraries and their strengths and weaknesses
    • approaching Big Data problems using distributed computing
  • and be able to
    • work with Python notebooks on MeluXina
    • read in a data set from file or object storage for analysis
    • perform statistical analysis on data in a NumPy array or a pandas DataFrame
    • make visualisations of data using modern libraries
    • define, train and evaluate simple machine learning models with TensorFlow
    • choose a suitable data analytics library for the job to be done
  • in order to
    • independently analyse and visualise data sets of any size on MeluXina

Training Content

This training is divided into several modules, held on 25 and 26 October 2021.

  • Introduction to Data Analytics
    • Intro to Jupyter Notebooks
    • Load data with Pandas
    • Clean data and automate web download
    • Separate datasets into train-validation-test
    • Visualize variables
    • Run linear regression on data
    • Visualize and interpret results
  • Machine Learning
    • Intro to ML regression algorithms (linear, SVM, regularization, random forest)
    • Perform PCA to reduce dimensionality
    • Run SVM regression on data
    • Run the model in inference-mode
    • Create Python scripts and execute them from the terminal
  • Distributed Computing
    • Intro to distributed computing
    • Delayed computations and computing graphs
    • Setting up Dask client
    • Distributed load of large dataset
    • Principal Component Analysis for dimensionality reduction
  • Accelerated Machine Learning
    • Read data with cuDF and compare with pandas
    • ML algorithms with cuML and comparison with scikit-learn
    • Display HTML with Plotly
  • Deep Learning
    • Load and preprocess dataset
    • Construct DL model using the Sequential API in TF2
    • Compile model
    • Define EarlyStopping and ModelCheckpoint callbacks
    • Train model
    • Evaluate model
  • GPU-Accelerated Deep Learning
    • Intro to distributed DL (Sharing gradients)
    • Initialize Horovod
    • Pin processes to (available) GPU
    • Use distributed optimizer and broadcasting
    • Call script using the “horovod” MPI-wrapper
    • Deploy the TF model with TensorRT
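
To give a flavour of the first module, the train-validation-test split and the linear regression step can be sketched as follows. This is only an illustrative sketch, not course material: it uses the Python standard library in place of Pandas and the libraries used in the course, and the helper name split_dataset, the 70/15/15 split and the synthetic data are assumptions made for the example.

```python
import random
from statistics import mean


def split_dataset(rows, train=0.7, validation=0.15, seed=0):
    """Shuffle rows and split them into train, validation and test parts.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Synthetic, noise-free data following y = 2x + 1 (made up for this sketch).
rows = [(float(x), 2.0 * x + 1.0) for x in range(100)]
train_rows, val_rows, test_rows = split_dataset(rows)

# Fit a straight line on the training part only (ordinary least squares).
xs = [x for x, _ in train_rows]
ys = [y for _, y in train_rows]
mean_x, mean_y = mean(xs), mean(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in train_rows) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Evaluate on the held-out test part with the mean squared error.
mse = sum((slope * x + intercept - y) ** 2 for x, y in test_rows) / len(test_rows)
print(f"slope={slope:.3f} intercept={intercept:.3f} test MSE={mse:.3e}")
```

The validation part is not used here; in a fuller workflow it would typically serve to compare models or tune settings before the final check on the test part.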

Teachers

From LuxProvide:

  • Dr Alban ROUSSET
  • Dr Farouk MANSOURI
  • Dr Luis VELA
  • Dr Matthieu LEFEBVRE
  • Dr Wahid MAINASSARA

Contact

If you have any questions regarding the training, feel free to contact:
Pierre De La Celle
pierre.delacelle@competence.lu / +352 26 15 92 43
