Bench

The Bench module provides statistical benchmarking capabilities for measuring code performance. It supports warmup iterations, multiple measured runs, and provides comprehensive statistics including mean, median, percentiles, and standard deviation.

Running Benchmarks

Bench.run
String -> Int -> Int -> (() -> a) -> BenchResult
Runs a statistical benchmark on the provided zero-argument function (thunk). Bench.run first executes the thunk for the given number of warmup iterations, allowing JIT optimization and cache warming, then performs the measured iterations and collects timing statistics.

Parameters

  • name - A descriptive name for the benchmark
  • warmup - Number of warmup iterations (not measured)
  • iterations - Number of measured iterations
  • thunk - A zero-argument function to benchmark

# Basic benchmark with 10 warmup and 100 measured iterations
result = Bench.run "fibonacci" 10 100 (fn => fib 30)

println "Mean: ${result.mean_ns} ns"
println "Median: ${result.median_ns} ns"
println "Ops/sec: ${result.ops_per_sec}"

Result Fields

Bench.run returns a BenchResult record with the following fields:

Field          Type    Description
name           String  The benchmark name passed to Bench.run
iterations     Int     Number of measured iterations performed
mean_ns        Int     Mean (average) execution time in nanoseconds
median_ns      Int     Median execution time in nanoseconds
min_ns         Int     Minimum execution time in nanoseconds
max_ns         Int     Maximum execution time in nanoseconds
p99_ns         Int     99th percentile execution time in nanoseconds
std_dev        Float   Standard deviation of execution times
ops_per_sec    Float   Operations per second based on mean time
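
Fields are read with ordinary record access. A minimal sketch, using the same pipeline style as the examples below:

result = Bench.run "square-sum" 10 100 (fn =>
  [1, 2, 3] |> map (fn(x) => x * x) |> fold (+) 0
)

println "${result.name}: median ${result.median_ns} ns over ${result.iterations} iterations"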

Examples

Comparing Algorithms

# Define two sorting implementations
bubble-sort = fn(list) =>
  # ... bubble sort implementation
  list

quick-sort = fn(list) =>
  # ... quick sort implementation
  list

# Create test data
test-data = [5, 2, 8, 1, 9, 3, 7, 4, 6]

# Benchmark both implementations
bubble-result = Bench.run "bubble-sort" 5 50 (fn => bubble-sort test-data)
quick-result = Bench.run "quick-sort" 5 50 (fn => quick-sort test-data)

println "Bubble Sort: ${bubble-result.mean_ns} ns"
println "Quick Sort: ${quick-result.mean_ns} ns"

Benchmarking with Different Input Sizes

benchmark-size = fn(n) =>
  data = range 1 n
  Bench.run "sum-${n}" 10 100 (fn => fold (+) 0 data)

# Test with increasing sizes
results = [100, 1000, 10000] |> map benchmark-size

each results (fn(r) =>
  println "${r.name}: ${r.mean_ns} ns (${r.ops_per_sec} ops/sec)"
)

Detailed Statistics Report

result = Bench.run "complex-operation" 20 200 (fn =>
  # Some complex computation
  [1, 2, 3, 4, 5]
    |> map (fn(x) => x * x)
    |> filter (fn(x) => x > 5)
    |> fold (+) 0
)

println "Benchmark: ${result.name}"
println "Iterations: ${result.iterations}"
println "---"
println "Mean:    ${result.mean_ns} ns"
println "Median:  ${result.median_ns} ns"
println "Min:     ${result.min_ns} ns"
println "Max:     ${result.max_ns} ns"
println "P99:     ${result.p99_ns} ns"
println "Std Dev: ${result.std_dev}"
println "---"
println "Throughput: ${result.ops_per_sec} ops/sec"

Best Practices

Warmup Iterations

Use warmup iterations to ensure the code being benchmarked has been optimized and caches are warm. A typical starting point is 5-20 warmup iterations depending on code complexity.
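
For example (a sketch reusing fib from the earlier example), a trivial thunk needs only a few warmup iterations, while a heavier computation benefits from more:

# Light operation: a handful of warmup iterations is usually enough
light = Bench.run "small-sum" 5 100 (fn => fold (+) 0 [1, 2, 3, 4, 5])

# Heavier operation: give the runtime more time to warm up before measuring
heavy = Bench.run "fib-25" 20 100 (fn => fib 25)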

Sufficient Iterations

More iterations provide more accurate statistics. For stable results, use at least 50-100 measured iterations. For micro-benchmarks of very fast operations, consider 1000+ iterations.
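
As a sketch, scale the iteration count to the cost of the thunk:

# Micro-benchmark of a very fast operation: use a large iteration count
micro = Bench.run "square" 50 1000 (fn => 7 * 7)

# Moderately expensive operation: 100 measured iterations is usually enough
sums = Bench.run "sum-10k" 10 100 (fn => fold (+) 0 (range 1 10000))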

Avoid Side Effects

The benchmarked function should be pure when possible. Side effects like I/O operations can add significant variance to timing measurements.
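
For example, keep printing out of the thunk and report results after the run. A sketch, using the same block style as the examples above:

# Avoid: I/O inside the thunk is measured along with the computation
noisy = Bench.run "noisy-sum" 10 100 (fn =>
  println "iteration running"
  fold (+) 0 (range 1 1000)
)

# Prefer: benchmark the pure computation, print only the final statistics
clean = Bench.run "sum" 10 100 (fn => fold (+) 0 (range 1 1000))
println "mean: ${clean.mean_ns} ns"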

Interpreting Results

Use median_ns as a robust measure of central tendency; it is less affected by outliers than the mean. Compare p99_ns to understand worst-case performance. A std_dev that is high relative to the mean indicates variable performance that may need investigation.
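
A sketch of checking these numbers programmatically (assuming ordinary numeric division with /):

result = Bench.run "pipeline" 10 200 (fn =>
  [1, 2, 3, 4, 5] |> map (fn(x) => x * x) |> fold (+) 0
)

# Relative spread: std_dev as a fraction of the mean; a large value means noisy timings
relative-spread = result.std_dev / result.mean_ns
println "median: ${result.median_ns} ns, p99: ${result.p99_ns} ns"
println "relative std dev: ${relative-spread}"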