collective/ (public directory, 7 items)
fmha_common.hpp                                        4.99 KB
fmha_fusion.hpp                                       12.74 KB
sm100_fmha_fwd_epilogue_tma_warpspecialized.hpp        8.34 KB
sm100_fmha_fwd_mainloop_tma_warpspecialized.hpp       44.56 KB
sm100_fmha_load_tma_warpspecialized.hpp               11.30 KB
sm100_fmha_mla_fwd_mainloop_tma_warpspecialized.hpp   45.53 KB
sm100_fmha_mla_load_tma_warpspecialized.hpp           12.57 KB

All files last modified Jan 23, 2026 by WebDev.
About
FlashMLA is a collection of highly optimized attention kernels (core code modules) developed by DeepSeek-AI. It is not a user-facing application but a foundational library that powers their large language models, such as DeepSeek-V3 and DeepSeek-V3.2-Exp.
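For orientation, the headers listed above implement hardware-optimized (TMA, warp-specialized) variants of scaled dot-product attention. The following is a minimal pure-Python reference sketch of that underlying computation, softmax(q·Kᵀ / √d)·V for a single query; the function name and structure are illustrative only and are not FlashMLA's API.

```python
import math

def attention(q, ks, vs, scale=None):
    """Reference scaled dot-product attention for one query.

    q:  query vector of length d (list of floats)
    ks: list of key vectors, each of length d
    vs: list of value vectors, each of length d_v
    Returns the attention output, a list of length d_v.
    """
    d = len(q)
    scale = scale if scale is not None else 1.0 / math.sqrt(d)
    # Scaled dot products between the query and every key.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in ks]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Output is the probability-weighted sum of the value vectors.
    d_v = len(vs[0])
    return [sum(p * v[j] for p, v in zip(probs, vs)) for j in range(d_v)]
```

The fused kernels in this directory compute the same quantity tiled across thread blocks, overlapping TMA loads of K/V with matrix-multiply work, whereas this sketch is purely for checking semantics.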
Repository: 130 files, 53 folders, 1.13 MB total size. Updated Jan 23, 2026.
Languages: C++ 60.1%, C 20.3%, Python 19.5%, LICENSE 0.2%.