Contribution to High Performance Computing and Big Data Infrastructure Convergence

Michael Mercier

Thesis, Year: 2019

Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle

Michael Mercier

Role: Author

Laboratoire d'Informatique de Grenoble

Data Aware Large Scale Computing

Abstract

The amount of data produced, both in the scientific community and in the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are designed for intensive parallel computations. The HPC community is also facing more and more data because of new high-definition sensors and large physics apparatus. The convergence of the two fields is currently under way. In fact, the HPC community is already using Big Data tools, but they are not integrated correctly, especially at the level of the file system and the Resources and Job Management System (RJMS). In order to understand how HPC clusters can be leveraged for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence. We have surveyed software provisioning methods, with a focus on data-intensive applications. We also propose a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code, whereas similar solutions use at least 1000x more. We evaluate this mechanism under real conditions and in simulation with our simulator Batsim.

The amount of data produced in the scientific world as well as in the commercial world is constantly growing. The field of large-scale data processing, called "Big Data", was created to process data on large distributed computing infrastructures. However, integrating Big Data systems on high-performance computing machines raises many problems. Indeed, the resource managers and file systems of supercomputers are not designed for this type of work. The subject of this thesis is to find the best approach to make these two resource managers interact, and to address the various problems raised by data movements and their scheduling.

Keywords

High performance computing, Big Data, Resource management, IT infrastructure, Convergence, Simulation

Supercomputer, Data management, Resource management, IT infrastructure, Convergence, Simulation

Domains

Symbolic Computation [cs.SC]

Main file

MERCIER_2019_archivage.pdf (2.48 MB)

Origin: Version validated by the jury (STAR)

ABES STAR: Contact

https://hal.science/tel-02284996

Submitted on: Thursday, October 31, 2019, 11:15:18

Last modified on: Thursday, April 4, 2024, 21:10:43

Dates and versions

tel-02284996, version 1 (12-09-2019)

tel-02284996, version 2 (31-10-2019)

Identifiers

HAL Id: tel-02284996, version 2

Cite

Michael Mercier. Contribution to High Performance Computing and Big Data Infrastructure Convergence. Symbolic Computation [cs.SC]. Université Grenoble Alpes, 2019. English. ⟨NNT: 2019GREAM031⟩. ⟨tel-02284996v2⟩

Collections

UGA CNRS INRIA LIG STAR GRID5000 LIG_SRCPR INRIA2 LIG-SRCPR-DATAMOVE SILECS LIG_SIDCH

509 views

1205 downloads
