Public
kernel
8 items
fmha_causal_tile_scheduler.hpp                       6.8 KB    Jan 23, 2026
fmha_kernel_bwd_convert.hpp                          6.55 KB   Jan 23, 2026
fmha_kernel_bwd_sum_OdO.hpp                          6.8 KB    Jan 23, 2026
fmha_options.hpp                                     2.79 KB   Jan 23, 2026
fmha_tile_scheduler.hpp                              5.41 KB   Jan 23, 2026
sm100_fmha_bwd_kernel_tma_warpspecialized.hpp        75.87 KB  Jan 23, 2026
sm100_fmha_bwd_mla_kernel_tma_warpspecialized.hpp    76.27 KB  Jan 23, 2026
sm100_fmha_fwd_kernel_tma_warpspecialized.hpp        25.52 KB  Jan 23, 2026

All files are HPP headers, last modified by WebDev.
About
FlashMLA is a collection of highly optimized attention kernels (core code modules) developed by DeepSeek-AI. It is not a user-facing application but a foundational library that powers their large language models, such as DeepSeek-V3 and DeepSeek-V3.2-Exp.
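For context on what the FMHA kernel headers above compute: fused multi-head attention evaluates scaled dot-product attention, and the "causal" variants mask out future positions. Below is a minimal, unoptimized single-head C++ reference of causal scaled-dot-product attention. This is purely illustrative and is not FlashMLA's actual implementation; the function name and the flat row-major layout are assumptions for the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative reference only (NOT FlashMLA's code): single-head causal
// scaled-dot-product attention over row-major [seq_len x head_dim] buffers.
// O[i] = softmax(Q[i] . K[0..i]^T / sqrt(head_dim)) * V[0..i]
std::vector<float> causal_attention(const std::vector<float>& Q,
                                    const std::vector<float>& K,
                                    const std::vector<float>& V,
                                    int seq_len, int head_dim) {
    std::vector<float> O(seq_len * head_dim, 0.0f);
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
    for (int i = 0; i < seq_len; ++i) {
        // Scores for query row i against keys j <= i (the causal mask).
        std::vector<float> s(i + 1);
        float max_s = -INFINITY;
        for (int j = 0; j <= i; ++j) {
            float dot = 0.0f;
            for (int d = 0; d < head_dim; ++d)
                dot += Q[i * head_dim + d] * K[j * head_dim + d];
            s[j] = dot * scale;
            max_s = std::max(max_s, s[j]);
        }
        // Numerically stable softmax over the unmasked scores.
        float denom = 0.0f;
        for (int j = 0; j <= i; ++j) {
            s[j] = std::exp(s[j] - max_s);
            denom += s[j];
        }
        // Weighted sum of value rows.
        for (int j = 0; j <= i; ++j) {
            const float p = s[j] / denom;
            for (int d = 0; d < head_dim; ++d)
                O[i * head_dim + d] += p * V[j * head_dim + d];
        }
    }
    return O;
}
```

The production kernels in this directory go far beyond this sketch: judging by the filenames, the `sm100_*_tma_warpspecialized` headers fuse these steps into tiled, warp-specialized CUDA kernels that stage data with TMA, and the tile-scheduler headers decide which (query, key) tiles each block processes.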
130 files · 53 folders · 1.13 MB total size
Updated Jan 23, 2026
Languages: C++ 60.1%, C 20.3%, Python 19.5%, LICENSE 0.2%