# All Benchmarks Aggregated Report
## Layer Norm
| Implementation | Description |
|---|---|
| HF Kernels Layer Norm | HuggingFace kernels implementation |
| PyTorch Layer Norm | PyTorch native implementation |
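Both implementations compute the same operation and should agree numerically; only the kernel differs. As a reference point, here is a minimal pure-Python sketch of the layer-norm math (normalize over the last dimension, then scale and shift) — an illustration of what is benchmarked, not either listed implementation:

```python
import math

def layer_norm(x, weight, bias, eps=1e-5):
    """Reference layer norm: subtract the mean, divide by the (eps-stabilized)
    standard deviation, then apply the elementwise affine weight and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) * w + b
            for v, w, b in zip(x, weight, bias)]

out = layer_norm([1.0, 2.0, 3.0, 4.0], weight=[1.0] * 4, bias=[0.0] * 4)
```

With unit weight and zero bias the output has (approximately) zero mean and unit variance, which is a quick sanity check when comparing kernel outputs.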
## Rotary Position Embeddings
| Implementation | Description |
|---|---|
| HF Kernels Rotary | HuggingFace kernels implementation |
| PyTorch Rotary | PyTorch native implementation |
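For reference, rotary position embeddings (RoPE) rotate consecutive feature pairs by a position-dependent angle. A minimal pure-Python sketch of that math — illustrative only, not either listed implementation:

```python
import math

def rotary_embed(x, pos, theta=10000.0):
    """Apply RoPE to one vector: rotate each (even, odd) feature pair by an
    angle pos * theta^(-i/d), so frequency decreases with feature index."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[i], x[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

q = rotary_embed([1.0, 0.0, 1.0, 0.0], pos=1)
```

Because each pair is a pure rotation, the vector norm is preserved and position 0 is the identity — two useful invariants when validating a rotary kernel.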
## Flash Attention
| Implementation | Description |
|---|---|
| Flash Attention | Flash Attention implementation |
| HF Kernels Flash Attention | HuggingFace kernels Flash Attention |
| HF Kernels Flash Attention 3 | HuggingFace kernels Flash Attention 3 |
| Memory Efficient Attention | Memory efficient attention implementation |
| Sage Attention | Sage attention implementation |
| xFormers | xFormers attention implementation |
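All of the implementations above compute the same scaled dot-product attention, `softmax(QKᵀ/√d)·V`; they differ in tiling and memory strategy (e.g. Flash Attention never materializes the full score matrix). A minimal single-query pure-Python sketch of the shared math, for reference only:

```python
import math

def attention(q, K, V):
    """Scaled dot-product attention for one query vector q against keys K and
    values V: softmax(q·K^T / sqrt(d)) · V, with max-subtraction for stability."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)  # stabilizer; flash-style kernels track this running max online
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    probs = [w / z for w in weights]
    return [sum(p * v[j] for p, v in zip(probs, V)) for j in range(len(V[0]))]

out = attention([1.0, 0.0], K=[[1.0, 0.0], [0.0, 1.0]], V=[[1.0], [0.0]])
```

The query aligns with the first key, so the output leans toward the first value row; a fused kernel should reproduce this result to within floating-point tolerance.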
## Causal Conv1D
| Implementation | Description |
|---|---|
| HF Kernels Causal Conv1D | HuggingFace kernels implementation |
| PyTorch Causal Conv1D | PyTorch native implementation |
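For reference, a causal 1-D convolution left-pads the input so that each output step depends only on current and past inputs. A minimal pure-Python sketch of that math — illustrative, not either listed implementation:

```python
def causal_conv1d(x, weight):
    """1-D convolution over a sequence x where output[t] depends only on
    x[t-k+1..t] (zero left-padding of k-1), i.e. no lookahead."""
    k = len(weight)
    padded = [0.0] * (k - 1) + x
    return [sum(weight[j] * padded[t + j] for j in range(k))
            for t in range(len(x))]

y = causal_conv1d([1.0, 2.0, 3.0], [0.5, 0.5])  # → [0.5, 1.5, 2.5]
```

Each output is the average of the current and previous input, and the output length equals the input length — the defining property of the causal variant.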
## Activation
| Implementation | Description |
|---|---|
| HF Kernels SwiGLU | HuggingFace kernels SwiGLU implementation |
| PyTorch SwiGLU | PyTorch native SwiGLU implementation |
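For reference, SwiGLU gates one half of a projection with the SiLU (swish) activation of the other half. A minimal pure-Python sketch of that math — illustrative only, not either listed implementation:

```python
import math

def swiglu(gate, up):
    """SwiGLU activation: SiLU(gate) * up, elementwise, where
    SiLU(v) = v * sigmoid(v)."""
    silu = lambda v: v / (1.0 + math.exp(-v))
    return [silu(g) * u for g, u in zip(gate, up)]

out = swiglu(gate=[0.0, 1.0], up=[2.0, 2.0])
```

A zero gate fully suppresses the corresponding `up` element (SiLU(0) = 0), which is an easy spot check against a fused kernel's output.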
## ReLU
| Implementation | Description |
|---|---|
| HF Kernels ReLU | HuggingFace kernels ReLU implementation |
| PyTorch ReLU | PyTorch native ReLU implementation |
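For completeness, the operation under test here is elementwise ReLU, which clamps negatives to zero. A one-line pure-Python reference:

```python
def relu(x):
    """ReLU: max(0, v) applied elementwise."""
    return [max(0.0, v) for v in x]

out = relu([-1.0, 0.0, 2.5])  # → [0.0, 0.0, 2.5]
```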