Public
kernel
8 items
fmha_causal_tile_scheduler.hpp                       6.8 KB    Jan 23, 2026
fmha_kernel_bwd_convert.hpp                          6.55 KB   Jan 23, 2026
fmha_kernel_bwd_sum_OdO.hpp                          6.8 KB    Jan 23, 2026
fmha_options.hpp                                     2.79 KB   Jan 23, 2026
fmha_tile_scheduler.hpp                              5.41 KB   Jan 23, 2026
sm100_fmha_bwd_kernel_tma_warpspecialized.hpp        75.87 KB  Jan 23, 2026
sm100_fmha_bwd_mla_kernel_tma_warpspecialized.hpp    76.27 KB  Jan 23, 2026
sm100_fmha_fwd_kernel_tma_warpspecialized.hpp        25.52 KB  Jan 23, 2026

All files are HPP headers, last modified by WebDev.
About
FlashMLA is a collection of highly optimized attention kernels (core code modules) developed by DeepSeek-AI. It is not a user-facing application but a foundational library that powers their large language models, such as DeepSeek-V3 and DeepSeek-V3.2-Exp.
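For context on what the FMHA kernel headers above compute: fused multi-head attention evaluates scaled dot-product attention, and the "causal" variants mask out future positions. Below is a minimal, unoptimized single-head C++ reference of causal scaled-dot-product attention. This is purely illustrative and is not FlashMLA's actual implementation; the function name and the flat row-major layout are assumptions for the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative reference only (NOT FlashMLA's code): single-head causal
// scaled-dot-product attention over row-major [seq_len x head_dim] buffers.
// O[i] = softmax(Q[i] . K[0..i]^T / sqrt(head_dim)) * V[0..i]
std::vector<float> causal_attention(const std::vector<float>& Q,
                                    const std::vector<float>& K,
                                    const std::vector<float>& V,
                                    int seq_len, int head_dim) {
    std::vector<float> O(seq_len * head_dim, 0.0f);
    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));
    for (int i = 0; i < seq_len; ++i) {
        // Scores for query row i against keys j <= i (the causal mask).
        std::vector<float> s(i + 1);
        float max_s = -INFINITY;
        for (int j = 0; j <= i; ++j) {
            float dot = 0.0f;
            for (int d = 0; d < head_dim; ++d)
                dot += Q[i * head_dim + d] * K[j * head_dim + d];
            s[j] = dot * scale;
            max_s = std::max(max_s, s[j]);
        }
        // Numerically stable softmax over the unmasked scores.
        float denom = 0.0f;
        for (int j = 0; j <= i; ++j) {
            s[j] = std::exp(s[j] - max_s);
            denom += s[j];
        }
        // Weighted sum of value rows.
        for (int j = 0; j <= i; ++j) {
            const float p = s[j] / denom;
            for (int d = 0; d < head_dim; ++d)
                O[i * head_dim + d] += p * V[j * head_dim + d];
        }
    }
    return O;
}
```

The production kernels in this directory go far beyond this sketch: judging by the filenames, the `sm100_*_tma_warpspecialized` headers fuse these steps into tiled, warp-specialized CUDA kernels that stage data with TMA, and the tile-scheduler headers decide which (query, key) tiles each block processes.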
130 files · 53 folders · 1.13 MB total size
Updated Jan 23, 2026
Languages: C++ 60.1%, C 20.3%, Python 19.5%, LICENSE 0.2%