
DualPipe

DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It achieves full overlap of the forward and backward computation-communication phases while also reducing pipeline bubbles. For detailed information on computation-communication overlap, please refer to the profile data.

Schedules

[Figure: DualPipe schedule]

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so their batch IDs are omitted to keep the illustration simple. Two cells enclosed by a shared black border have mutually overlapped computation and communication.
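To make the "two directions" in the caption concrete, here is a minimal sketch of how 20 micro-batches might be divided between the two ends of an 8-rank pipeline. The helper `split_directions`, the even split, and the ID numbering (first half forward, second half reverse) are assumptions made for illustration; they are not taken from the library's code.

```python
from typing import List, NamedTuple, Tuple


class Direction(NamedTuple):
    entry_rank: int              # rank where this direction's micro-batches enter
    micro_batch_ids: List[int]   # hypothetical ID numbering, for illustration only


def split_directions(num_ranks: int, num_micro_batches: int) -> Tuple[Direction, Direction]:
    """Split micro-batches evenly between the forward and reverse directions.

    Assumption: both counts are even, and each direction gets half of the
    micro-batches so that the two directions are symmetric.
    """
    if num_ranks % 2 != 0 or num_micro_batches % 2 != 0:
        raise ValueError("this sketch assumes even rank and micro-batch counts")
    half = num_micro_batches // 2
    forward = Direction(entry_rank=0, micro_batch_ids=list(range(half)))
    reverse = Direction(
        entry_rank=num_ranks - 1,
        micro_batch_ids=list(range(half, num_micro_batches)),
    )
    return forward, reverse


forward, reverse = split_directions(num_ranks=8, num_micro_batches=20)
print("forward enters at rank", forward.entry_rank, "with", forward.micro_batch_ids)
print("reverse enters at rank", reverse.entry_rank, "with", reverse.micro_batch_ids)
```

Under these assumptions, ten micro-batches flow from rank 0 toward rank 7 and ten flow the opposite way, which is why the figure only needs to label one direction.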

DualPipeV

DualPipeV is a concise V-shape schedule derived from DualPipe using a "cut-in-half" procedure, which Sea AI Lab introduced in their blog post. Thanks to them for this efficient schedule!

Schedules

[Figure: DualPipeV schedule]

Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.
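The relationship between 4 PP ranks and 8 PP stages can be sketched as a V-shaped placement: stages run down the ranks and then back up, so each rank holds two stages. The function name `dualpipev_stage_placement` is hypothetical, and the exact layout is an assumption inferred from the "V-shape" description rather than quoted from the library.

```python
from typing import List, Tuple


def dualpipev_stage_placement(num_ranks: int) -> List[Tuple[int, int]]:
    """Return the (descending-leg, ascending-leg) stage indices held by each rank.

    Assumption: with 2 * num_ranks stages arranged in a V, rank r holds
    stage r on the way down and stage (2 * num_ranks - 1 - r) on the way up.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    num_stages = 2 * num_ranks
    return [(rank, num_stages - 1 - rank) for rank in range(num_ranks)]


for rank, stages in enumerate(dualpipev_stage_placement(4)):
    print(f"rank {rank}: stages {stages}")
```

With this placement the first and last stages both sit on rank 0, and the turn of the V (stages 3 and 4) sits on the last rank.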

Pipeline Bubbles and Memory Usage Comparison (based on the same number of PP stages)

| Method    | Bubble                  | Parameter Per Device | Activation Per Device | #Devices |
|-----------|-------------------------|----------------------|-----------------------|----------|
| 1F1B      | (PP-1)(𝐹+𝐵)             | 1×                   | PP                    | PP       |
| ZB1P      | (PP-1)(𝐹+𝐵-2𝑊)          | 1×                   | PP                    | PP       |
| DualPipe  | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊)      | 2×                   | PP+1                  | PP       |
| DualPipeV | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊)      | 2×                   | PP+1                  | PP/2     |

PP denotes the number of PP stages (even). 𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵 denotes the execution time of two mutually overlapped forward and backward chunks.
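The bubble formulas in the table can be evaluated directly to compare methods for a given configuration. The sketch below is only a calculator for those expressions; the `ChunkTimes` container, the function name, and the sample timings are hypothetical and are not measurements.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChunkTimes:
    forward: float          # F: forward chunk
    backward: float         # B: full backward chunk
    weight_backward: float  # W: "backward for weights" chunk
    overlapped_fb: float    # F&B: mutually overlapped forward and backward chunks


def bubble_times(pp: int, t: ChunkTimes) -> Dict[str, float]:
    """Evaluate the bubble expressions from the table for `pp` PP stages."""
    if pp < 2 or pp % 2 != 0:
        raise ValueError("PP must be an even number of stages, at least 2")
    f, b, w, fb = t.forward, t.backward, t.weight_backward, t.overlapped_fb
    dual = (pp // 2 - 1) * (fb + b - 3 * w)
    return {
        "1F1B": (pp - 1) * (f + b),
        "ZB1P": (pp - 1) * (f + b - 2 * w),
        "DualPipe": dual,
        "DualPipeV": dual,  # same bubble expression as DualPipe
    }


# Hypothetical timings in arbitrary units, chosen only to exercise the formulas.
times = ChunkTimes(forward=1.0, backward=2.0, weight_backward=1.0, overlapped_fb=3.0)
for method, bubble in bubble_times(pp=8, t=times).items():
    print(f"{method:10s} bubble = {bubble}")
```

Because DualPipe and DualPipeV share the same bubble expression for the same number of PP stages, the differences between them in the table are in the number of devices, not in bubble time.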

Quick Start

The usage is shown in the following example:

    python examples/example_dualpipe.py
    python examples/example_dualpipev.py

Note: For real-world applications, you will need to implement a custom `overlapped_forward_backward` method tailored to your specific module.

Requirements

  • PyTorch 2.0 and above

Developers

DualPipe was created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.

Citation

    @misc{deepseekai2025deepseekv3technicalreport,
        title={DeepSeek-V3 Technical Report},
        author={DeepSeek-AI},
        year={2025},
        eprint={2412.19437},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2412.19437},
    }
