SwiGLU Activation Benchmarks - Aggregated Results

This document combines benchmark results from two SwiGLU activation implementations: a fused HF Kernels version (hf_kernels_swiglu) and a PyTorch eager baseline (torch_eager).
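For context, SwiGLU splits the channel dimension in half, gates one half with SiLU, and multiplies it by the other half; the torch_eager baseline presumably computes this form directly. A minimal NumPy sketch of that standard gated formulation (function names and the split convention are illustrative assumptions, not the benchmarked code):

```python
import numpy as np

def silu(x):
    # SiLU / swish: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x):
    # Split the last dimension into a gate half and a value half.
    a, b = np.split(x, 2, axis=-1)
    return silu(a) * b

x = np.array([[1.0, -1.0, 2.0, 0.5]])
out = swiglu(x)  # shape (1, 2): half the input's last dimension
```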

Combined Summary and Visualization

[Figure: latency.svg — Latency P50 (ms) per workload (cuda_T{128,256,512}_D{768,1024,2048}) for hf_kernels_swiglu vs. torch_eager; generated 2025-10-31 with Matplotlib v3.10.7.]
======================================================================
LOADING BENCHMARK DATA
======================================================================
✓ HF Kernels SwiGLU             : /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/2775e6386f1caf1fda935a997130c06dcaf7641efb0db21560c35301fdabfd9b
✓ PyTorch SwiGLU                : /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/661ca38adec8893d7c284140e922da661f0afcea4aaff6a3bf48a6494ce7c6eb

  ✓ Found HF Kernels SwiGLU
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/2775e6386f1caf1fda935a997130c06dcaf7641efb0db21560c35301fdabfd9b/activation.jsonl
  ✓ Found PyTorch SwiGLU
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/activation/impls/.uvnote/cache/661ca38adec8893d7c284140e922da661f0afcea4aaff6a3bf48a6494ce7c6eb/activation.jsonl

======================================================================
Summary: 2 found, 0 skipped, 0 missing
======================================================================

COMBINED BENCHMARK SUMMARY

impl                     workload            p50(ms)  ok
hf_kernels_swiglu        cuda_T128_D1024        0.03  True
hf_kernels_swiglu        cuda_T128_D2048        0.03  True
hf_kernels_swiglu        cuda_T128_D768         0.02  True
hf_kernels_swiglu        cuda_T256_D1024        0.03  True
hf_kernels_swiglu        cuda_T256_D2048        0.03  True
hf_kernels_swiglu        cuda_T256_D768         0.03  True
hf_kernels_swiglu        cuda_T512_D1024        0.03  True
hf_kernels_swiglu        cuda_T512_D2048        0.03  True
hf_kernels_swiglu        cuda_T512_D768         0.03  True
torch_eager              cuda_T128_D1024        0.05  True
torch_eager              cuda_T128_D2048        0.05  True
torch_eager              cuda_T128_D768         0.04  True
torch_eager              cuda_T256_D1024        0.05  True
torch_eager              cuda_T256_D2048        0.05  True
torch_eager              cuda_T256_D768         0.05  True
torch_eager              cuda_T512_D1024        0.05  True
torch_eager              cuda_T512_D2048        0.05  True
torch_eager              cuda_T512_D768         0.05  True
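The combined summary above is built by reading one activation.jsonl file per implementation and sorting the records by implementation and workload. A small aggregation sketch under that assumption (the field names impl, wl, p50_ms, and ok are guesses at the record schema, not taken from the harness):

```python
import json

def summarize(jsonl_paths):
    """Collect per-workload p50 latencies from activation.jsonl files.

    Field names (impl, wl, p50_ms, ok) are assumed, not confirmed
    against the actual benchmark harness.
    """
    rows = []
    for path in jsonl_paths:
        with open(path) as f:
            for line in f:  # JSON Lines: one record per line
                rec = json.loads(line)
                rows.append((rec["impl"], rec["wl"], rec["p50_ms"], rec["ok"]))
    # Sort by implementation, then workload name, matching the table above.
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows
```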

GENERATING COMBINED VISUALIZATION

Loaded 18 records
✓ Visualization saved as latency.svg
Saved latency.png
✓ SVG visualization ready!
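The latency.svg and latency.png artifacts can be reproduced from the summarized rows with a small Matplotlib sketch (the row layout and file names here are assumptions; the actual plotting cell is not shown):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so files can be written without a display
import matplotlib.pyplot as plt

def plot_latency(rows, out_path="latency.svg"):
    # rows: (impl, workload, p50_ms) triples; one line per implementation.
    impls = sorted({r[0] for r in rows})
    workloads = sorted({r[1] for r in rows})
    fig, ax = plt.subplots(figsize=(10, 4))
    for impl in impls:
        p50 = [next(r[2] for r in rows if r[0] == impl and r[1] == wl)
               for wl in workloads]
        ax.plot(workloads, p50, marker="o", label=impl)
    ax.set_xlabel("Workload")
    ax.set_ylabel("Latency P50 (ms)")
    ax.legend()
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
    fig.tight_layout()
    fig.savefig(out_path)  # the file extension selects the output format (.svg/.png)
    plt.close(fig)
```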

ANALYSIS COMPLETE
Total implementations analyzed: 2

Implementations included:
  ✓ HF Kernels SwiGLU
  ✓ PyTorch SwiGLU

Artifacts:

latency.svg