CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). First released on June 23, 2007, it lets developers dramatically speed up computing applications by harnessing the power of GPUs.
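As a minimal sketch of the CUDA programming model driven from Python (using Numba's CUDA bindings, which are not part of the CUDA toolkit itself; the kernel and array size are purely illustrative):

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(out, a, factor):
    # Each GPU thread handles one element of the array.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] * factor

n = 1_000_000
a = np.random.rand(n).astype(np.float32)

d_a = cuda.to_device(a)               # copy input to GPU memory
d_out = cuda.device_array_like(d_a)   # allocate output on the GPU

threads = 256
blocks = (n + threads - 1) // threads
scale[blocks, threads](d_out, d_a, np.float32(2.0))  # launch the kernel

result = d_out.copy_to_host()         # copy the result back to the CPU
```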
A high-throughput and memory-efficient inference and serving engine for LLMs
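A minimal offline-inference sketch with vLLM's Python API (the model name and sampling settings are placeholders, not a recommendation):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM manages KV-cache memory for high-throughput serving.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain what a GPU does in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```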
SGLang is a fast serving framework for large language models and vision language models.
Containers for machine learning
A flexible framework of neural networks for deep learning
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A PyTorch Library for Accelerating 3D Deep Learning Research
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance and lower memory utilization in both training and inference.
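A hedged sketch of how FP8 execution is typically enabled through Transformer Engine's PyTorch API (layer sizes and inputs are arbitrary; FP8 itself requires Hopper- or Ada-generation hardware):

```python
import torch
import transformer_engine.pytorch as te

# Drop-in replacement for a plain nn.Linear layer.
layer = te.Linear(1024, 1024, bias=True).cuda()

x = torch.randn(8, 1024, device="cuda")

# Inside this context, supported layers run their matrix multiplies in FP8,
# with scaling factors tracked by the default recipe.
with te.fp8_autocast(enabled=True):
    y = layer(x)

y.sum().backward()
```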
Implementations of label smoothing, AM-Softmax, partial FC, focal loss, triplet loss, and Lovász-Softmax; may be useful.
PyTorch domain library for recommendation systems
CUDA integration for Python, plus shiny features
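A small PyCUDA sketch (the kernel and sizes are illustrative), showing a raw CUDA C kernel compiled and launched from Python:

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context on import
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void add_one(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}
""")
add_one = mod.get_function("add_one")

n = 4096
a = np.random.randn(n).astype(np.float32)
out = np.empty_like(a)

# drv.In / drv.Out handle the host<->device copies around the launch.
add_one(drv.Out(out), drv.In(a), np.int32(n),
        block=(256, 1, 1), grid=((n + 255) // 256, 1))
```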
Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
PyTorch native quantization and sparsity for training and inference
🤖 A Python library for learning and evaluating knowledge graph embeddings
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
A deep learning package for many-body potential energy representation and molecular dynamics