What would you like to be added:
Customers serving very large language models (>500 GB) need to load the model into GPU memory as quickly as possible. With the launch of EFA support for FSx for Lustre, a throughput of 1200 Gbps can be supported. If I am not mistaken, this requires including the GPUDirect Storage (GDS) driver in the AMI.
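To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the best-case transfer time. It assumes the full 1200 Gbps is sustained end to end and ignores file system, driver, and framework overhead, so real load times will be longer. The function name `load_time_seconds` and the 10 Gbps comparison figure are illustrative assumptions, not measurements.

```python
def load_time_seconds(model_size_gb: float, throughput_gbps: float) -> float:
    """Ideal transfer time for a model of `model_size_gb` gigabytes
    over a link of `throughput_gbps` gigabits per second."""
    throughput_gb_per_s = throughput_gbps / 8  # 8 bits per byte
    return model_size_gb / throughput_gb_per_s


# 500 GB model at the 1200 Gbps quoted for FSx for Lustre with EFA
print(f"{load_time_seconds(500, 1200):.1f} s")  # 3.3 s

# Same model over a hypothetical 10 Gbps path, for comparison only
print(f"{load_time_seconds(500, 10):.0f} s")  # 400 s
```

At the quoted peak rate, the transfer itself would take only a few seconds, which is why making that throughput reachable from GPU memory matters.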
Why is this needed:
Loading large language models into GPU memory is very time-consuming, and downloading the model every time is not feasible. S3-, EFS-, and EBS-based options all carry considerable performance penalties (e.g. extra copies through a bounce buffer in the CPU's memory). GPUDirect Storage enables a direct data path between FSx for Lustre (FSxL) storage and GPU memory.