## SCAI Computing Cluster Usage Policy

This document outlines the guidelines for using the Sorbonne Cluster for AI (SCAI) computing cluster.

**1. Infrastructure Overview:**

The SCAI cluster comprises 92 GPUs under SLURM management, providing researchers with substantial computational power for a wide range of AI workloads. The infrastructure includes:

* **8 A100 GPUs**: Hosted by SCAI (DGX)
* **84 GPUs**: Hosted by the MLIA team of the ISIR laboratory, encompassing a variety of models including the RTX series, Titan series, and A6000/A5000.

Access to the cluster is provided through the Hacienda entry point, a central hub for managing jobs and resources. Storage options include high-availability NFS (/home), shared storage (/data), and local space on compute nodes; user quotas are approximately 400 GB.
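
Since the cluster is managed by SLURM, jobs are typically submitted from the entry point as batch scripts. The sketch below shows a minimal single-GPU job; the resource values, script name, and log path are illustrative assumptions, not SCAI defaults — check the cluster documentation for the actual partition and GPU names.

```shell
#!/bin/bash
# Minimal SLURM batch script (sketch) -- resource values below are
# illustrative assumptions, not SCAI-mandated settings.
#SBATCH --job-name=my-experiment
#SBATCH --gres=gpu:1            # request one GPU
#SBATCH --cpus-per-task=4       # CPU cores, e.g. for data loading
#SBATCH --mem=32G               # host RAM
#SBATCH --time=04:00:00         # wall-clock limit
#SBATCH --output=%x-%j.out      # log file named from job name and job id

srun python train.py            # train.py is a placeholder for your workload
```

Such a script is submitted with `sbatch job.sh`, and `squeue -u $USER` shows its status in the queue.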

**2. Project Acceptance Policy:**

The SCAI cluster aims to support a wide range of AI research projects while ensuring fair access and preventing bottlenecks. The following guidelines will be applied when evaluating project requests:

* **Small Projects**: Projects involving computer vision, classical machine learning, or small-scale language modeling are generally well suited to the smaller GPUs, or even to running directly on Hacienda's CPUs.
* **Projects with Limited VRAM Requirements**: If a project requires specific GPUs but has limited VRAM needs, access will be granted when the appropriate GPU models are available.
* **LLM Inference**: Projects focused on Large Language Model (LLM) inference requiring substantial VRAM can be accommodated if a suitable GPU with ample memory is available.
* **Very Large Projects**: Due to resource constraints, projects demanding a significant number of GPUs and large amounts of VRAM cannot currently be accepted on the SCAI cluster. In such cases, researchers are encouraged to explore alternative resources such as the Jean Zay supercomputing center (GENCI), which offers more extensive infrastructure.

**3. Project Submission Procedure:**

Researchers interested in using the SCAI cluster should submit a brief project proposal outlining:

* Project goals and objectives
* Required computational resources (GPU type, VRAM, CPU cores)
* Estimated runtime and duration
* Number of user accounts needed
* Amount of storage needed
* Data sensitivity: whether the data processed on the cluster are sensitive
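
When filling in the VRAM figure for a proposal, a common back-of-the-envelope rule for LLM inference is about 2 bytes per parameter in fp16, plus headroom for activations and the KV cache. A rough sketch (the ~20% overhead factor is an assumption, not an SCAI rule):

```shell
# Back-of-the-envelope VRAM estimate for fp16 LLM inference.
# Assumptions: 2 bytes per parameter, ~20% overhead for activations/KV cache.
params_billions=7
vram_gb=$(( params_billions * 2 * 12 / 10 ))   # 2 B/param * 1.2 overhead, integer GB
echo "A ${params_billions}B-parameter model needs roughly ${vram_gb} GB of VRAM"
```

For a 7B-parameter model this gives roughly 16 GB, which helps decide whether a mid-range GPU suffices or a larger card must be requested.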

The SCAI team will review proposals and make decisions based on the guidelines outlined above.

**4. Fair Usage Policy:**

To ensure equitable access for all users, a fair usage policy will be implemented, including:

* **Job Priority**: Higher priority may be given to projects with shorter runtimes or those deemed critical to research progress.
* **Resource Limits**: Quotas on GPU usage and storage space may be imposed to prevent monopolization of resources.
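
Users can keep an eye on their own footprint before running into these limits. The sketch below uses only standard tools (`du`) plus SLURM's `squeue`; the exact quota-reporting command on SCAI is not specified here, so treat this as an assumption-laden example.

```shell
# Check current home-directory usage against the ~400 GB quota.
usage=$(du -sh "$HOME" 2>/dev/null | cut -f1)
echo "Home usage: ${usage:-unknown} (quota is approximately 400 GB)"

# List your running and pending jobs (only where SLURM is installed).
command -v squeue >/dev/null && squeue -u "$USER" || true
```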

**5. Support and Community:**

The SCAI team is committed to supporting users through documentation, forums, and occasional workshops. We encourage collaboration and knowledge sharing within the user community to maximize the impact of the cluster.

This policy document will evolve based on user feedback and the changing needs of the AI research community at Sorbonne.

Next step: [usage documentation](https://gitlabsu.sorbonne-universite.fr/baptiste.gregorutti/test/-/wikis/GPU-Cluster-documentation)