Public
impls
16 items
File                              Size        Modified        Type   Last modified by
epilogue.hpp                      255 B       Jan 23, 2026    HPP    WebDev
runtime_utils.hpp                 11.5 KB     Jan 23, 2026    HPP    WebDev
sm100_bf16_gemm.hpp               20.69 KB    Jan 23, 2026    HPP    WebDev
sm100_bmk_bnk_mn.hpp              5 KB        Jan 23, 2026    HPP    WebDev
sm100_fp8_gemm_1d1d.hpp           22.67 KB    Jan 23, 2026    HPP    WebDev
sm100_tf32_hc_prenorm_gemm.hpp    6.02 KB     Jan 23, 2026    HPP    WebDev
sm90_bf16_gemm.hpp                20.16 KB    Jan 23, 2026    HPP    WebDev
sm90_bmk_bnk_mn.hpp               4.5 KB      Jan 23, 2026    HPP    WebDev
sm90_fp8_gemm_1d1d.hpp            10.24 KB    Jan 23, 2026    HPP    WebDev
sm90_fp8_gemm_1d2d.hpp            17.14 KB    Jan 23, 2026    HPP    WebDev
sm90_tf32_hc_prenorm_gemm.hpp     6.01 KB     Jan 23, 2026    HPP    WebDev
smxx_clean_logits.hpp             2.53 KB     Jan 23, 2026    HPP    WebDev
smxx_cublaslt.hpp                 8.71 KB     Jan 23, 2026    HPP    WebDev
smxx_fp8_mqa_logits.hpp           6.63 KB     Jan 23, 2026    HPP    WebDev
smxx_fp8_paged_mqa_logits.hpp     11.16 KB    Jan 23, 2026    HPP    WebDev
smxx_layout.hpp                   10.21 KB    Jan 23, 2026    HPP    WebDev
About
DeepGEMM is a low-level, high-performance library for matrix multiplication on NVIDIA GPUs, with a particular focus on optimizing large AI models such as DeepSeek's.
101 files
23 folders
799.33 KB total size
Updated Jan 23, 2026
Languages
C++        64.8%
Python     31.2%
YAML       3.2%
Text       0.3%
Shell      0.3%
LICENSE    0.2%