Causal Conv1D Benchmarks - Aggregated Results

This document combines benchmark results from multiple Causal Conv1D implementations.

Combined Summary and Visualization

[Figure: latency.svg. Title: "Attention Implementation Latency". X axis: Workload (24 CUDA workloads, cuda_B2_D64_S128_W2 through cuda_B4_D2048_S2048_W4). Y axis: Latency P50 (ms), ticks from 0.1 to 0.5. Series: hf_kernels_causal_conv1d, torch_eager. Generated 2025-10-31T20:14:05 with Matplotlib v3.10.7, https://matplotlib.org/]
Cell: combine | 4.43s
======================================================================
LOADING BENCHMARK DATA
======================================================================
✓ HF Kernels Causal Conv1D      : /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/7a691bd653e23c412c5d29fbc92ea1454823ea437864cf9473fc561b116ef3d9
✓ PyTorch Causal Conv1D         : /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/70757e27f2df1dfde4905a24527bb4ca6f0f8df7dac2e2ecaa0ddc359c7d5e64

  ✓ Found HF Kernels Causal Conv1D
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/7a691bd653e23c412c5d29fbc92ea1454823ea437864cf9473fc561b116ef3d9/causal_conv1d.jsonl
  ✓ Found PyTorch Causal Conv1D
     Path: /__w/kernels-benchmarks/kernels-benchmarks/benches/causal_conv1d/impls/.uvnote/cache/70757e27f2df1dfde4905a24527bb4ca6f0f8df7dac2e2ecaa0ddc359c7d5e64/causal_conv1d.jsonl

======================================================================
Summary: 2 found, 0 skipped, 0 missing
======================================================================

COMBINED BENCHMARK SUMMARY

impl                     wl                      p50(ms)  ok
hf_kernels_causal_conv1d cuda_B2_D2048_S128_W2      0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S128_W4      0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S2048_W2     0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S2048_W4     0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S512_W2      0.05  True
hf_kernels_causal_conv1d cuda_B2_D2048_S512_W4      0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S128_W2        0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S128_W4        0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S2048_W2       0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S2048_W4       0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S512_W2        0.05  True
hf_kernels_causal_conv1d cuda_B2_D64_S512_W4        0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S128_W2      0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S128_W4      0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S2048_W2     0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S2048_W4     0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S512_W2      0.05  True
hf_kernels_causal_conv1d cuda_B4_D2048_S512_W4      0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S128_W2        0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S128_W4        0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S2048_W2       0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S2048_W4       0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S512_W2        0.05  True
hf_kernels_causal_conv1d cuda_B4_D64_S512_W4        0.05  True
torch_eager              cuda_B2_D2048_S128_W2      0.09  True
torch_eager              cuda_B2_D2048_S128_W4      0.08  True
torch_eager              cuda_B2_D2048_S2048_W2     0.15  True
torch_eager              cuda_B2_D2048_S2048_W4     0.16  True
torch_eager              cuda_B2_D2048_S512_W2      0.09  True
torch_eager              cuda_B2_D2048_S512_W4      0.09  True
torch_eager              cuda_B2_D64_S128_W2        0.07  True
torch_eager              cuda_B2_D64_S128_W4        0.09  True
torch_eager              cuda_B2_D64_S2048_W2       0.09  True
torch_eager              cuda_B2_D64_S2048_W4       0.09  True
torch_eager              cuda_B2_D64_S512_W2        0.09  True
torch_eager              cuda_B2_D64_S512_W4        0.09  True
torch_eager              cuda_B4_D2048_S128_W2      0.09  True
torch_eager              cuda_B4_D2048_S128_W4      0.09  True
torch_eager              cuda_B4_D2048_S2048_W2     0.49  True
torch_eager              cuda_B4_D2048_S2048_W4     0.50  True
torch_eager              cuda_B4_D2048_S512_W2      0.10  True
torch_eager              cuda_B4_D2048_S512_W4      0.10  True
torch_eager              cuda_B4_D64_S128_W2        0.09  True
torch_eager              cuda_B4_D64_S128_W4        0.08  True
torch_eager              cuda_B4_D64_S2048_W2       0.09  True
torch_eager              cuda_B4_D64_S2048_W4       0.09  True
torch_eager              cuda_B4_D64_S512_W2        0.09  True
torch_eager              cuda_B4_D64_S512_W4        0.09  True
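
The summary table lists raw p50 latencies only. As a rough reading aid, the sketch below computes the ratio of torch_eager latency to hf_kernels_causal_conv1d latency for a few workloads. The numbers are copied from the table above. The helper names (`speedup`, `speedup_table`) are hypothetical and not part of the benchmark harness. Because the table rounds latencies to two decimals, the ratios are approximate.

```python
# p50 latencies in ms, copied from the summary table (subset of workloads).
P50_MS = {
    "cuda_B2_D64_S128_W2": {"hf_kernels_causal_conv1d": 0.05, "torch_eager": 0.07},
    "cuda_B2_D2048_S2048_W4": {"hf_kernels_causal_conv1d": 0.05, "torch_eager": 0.16},
    "cuda_B4_D2048_S2048_W2": {"hf_kernels_causal_conv1d": 0.05, "torch_eager": 0.49},
    "cuda_B4_D2048_S2048_W4": {"hf_kernels_causal_conv1d": 0.05, "torch_eager": 0.50},
}


def speedup(row, baseline="torch_eager", candidate="hf_kernels_causal_conv1d"):
    """Return baseline latency divided by candidate latency (hypothetical helper)."""
    return row[baseline] / row[candidate]


def speedup_table(p50_ms):
    """Map each workload name to its speedup ratio, rounded to one decimal."""
    return {wl: round(speedup(row), 1) for wl, row in p50_ms.items()}


if __name__ == "__main__":
    for wl, ratio in speedup_table(P50_MS).items():
        print(f"{wl:<24} {ratio:>5.1f}x")
```

On these rounded values the gap is smallest for the small workloads and largest for the B4, D2048, S2048 workloads, where torch_eager reaches about 0.5 ms while the kernel stays at 0.05 ms.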

GENERATING COMBINED VISUALIZATION

Loaded 48 records
✓ Visualization saved as latency.svg
Saved latency.png
✓ Visualization saved as latency.svg
✓ SVG visualization ready!

ANALYSIS COMPLETE
Total implementations analyzed: 2

Implementations included:
  ✓ HF Kernels Causal Conv1D
  ✓ PyTorch Causal Conv1D
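
The workload names in the table sort lexicographically (S2048 before S512), while the plot lists them in numeric order. The sketch below parses a workload name into its fields so the two orderings can be reconciled. The field meanings (B = batch size, D = channel dimension, S = sequence length, W = kernel width) are an assumption based on the naming pattern; the report does not define them. `Workload` and `parse_workload` are hypothetical names.

```python
import re
from typing import NamedTuple


class Workload(NamedTuple):
    device: str
    batch: int   # assumed meaning of "B"
    dim: int     # assumed meaning of "D"
    seqlen: int  # assumed meaning of "S"
    width: int   # assumed meaning of "W"


# Matches names such as "cuda_B2_D64_S128_W2".
_PATTERN = re.compile(
    r"^(?P<device>[a-z]+)_B(?P<B>\d+)_D(?P<D>\d+)_S(?P<S>\d+)_W(?P<W>\d+)$"
)


def parse_workload(name: str) -> Workload:
    """Split a workload name into its device and numeric fields."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized workload name: {name!r}")
    return Workload(m["device"], int(m["B"]), int(m["D"]), int(m["S"]), int(m["W"]))


if __name__ == "__main__":
    names = ["cuda_B2_D64_S2048_W2", "cuda_B2_D64_S512_W2", "cuda_B2_D64_S128_W4"]
    # Sorting by the parsed tuple orders the numeric fields numerically.
    for name in sorted(names, key=parse_workload):
        print(name, parse_workload(name))
```

Sorting by the parsed tuple compares the numeric fields as integers, which matches the workload order on the plot's x axis.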

Artifacts:

latency.svg