Bench

The Bench module provides statistical benchmarking capabilities for measuring code performance. It supports warmup iterations, multiple measured runs, and provides comprehensive statistics including mean, median, percentiles, and standard deviation.

Running Benchmarks

Bench.run
String -> Int -> Int -> (() -> a) -> BenchResult
Runs a statistical benchmark on the provided zero-argument function (thunk). Bench.run first executes the thunk for the given number of warmup iterations, allowing JIT optimization and cache warming, then performs the measured iterations and collects timing statistics.

Parameters

  • name - A descriptive name for the benchmark
  • warmup - Number of warmup iterations (not measured)
  • iterations - Number of measured iterations
  • thunk - A zero-argument function to benchmark

# Basic benchmark with 10 warmup and 100 measured iterations
result = Bench.run "fibonacci" 10 100 (fn => fib 30)

println "Mean: ${result.mean_ns} ns"
println "Median: ${result.median_ns} ns"
println "Ops/sec: ${result.ops_per_sec}"

Result Fields

Bench.run returns a BenchResult record with the following fields:

Field          Type    Description
name           String  The benchmark name passed to Bench.run
iterations     Int     Number of measured iterations performed
mean_ns        Int     Mean (average) execution time in nanoseconds
median_ns      Int     Median execution time in nanoseconds
min_ns         Int     Minimum execution time in nanoseconds
max_ns         Int     Maximum execution time in nanoseconds
p99_ns         Int     99th percentile execution time in nanoseconds
std_dev        Float   Standard deviation of execution times
ops_per_sec    Float   Operations per second based on mean time
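
Fields are read with ordinary record access. A minimal sketch, using the same pipeline style as the examples below:

result = Bench.run "square-sum" 10 100 (fn =>
  [1, 2, 3] |> map (fn(x) => x * x) |> fold (+) 0
)

println "${result.name}: median ${result.median_ns} ns over ${result.iterations} iterations"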

Examples

Comparing Algorithms

# Define two sorting implementations
bubble-sort = fn(list) =>
  # ... bubble sort implementation
  list

quick-sort = fn(list) =>
  # ... quick sort implementation
  list

# Create test data
test-data = [5, 2, 8, 1, 9, 3, 7, 4, 6]

# Benchmark both implementations
bubble-result = Bench.run "bubble-sort" 5 50 (fn => bubble-sort test-data)
quick-result = Bench.run "quick-sort" 5 50 (fn => quick-sort test-data)

println "Bubble Sort: ${bubble-result.mean_ns} ns"
println "Quick Sort: ${quick-result.mean_ns} ns"

Benchmarking with Different Input Sizes

benchmark-size = fn(n) =>
  data = range 1 n
  Bench.run "sum-${n}" 10 100 (fn => fold (+) 0 data)

# Test with increasing sizes
results = [100, 1000, 10000] |> map benchmark-size

each results (fn(r) =>
  println "${r.name}: ${r.mean_ns} ns (${r.ops_per_sec} ops/sec)"
)

Detailed Statistics Report

result = Bench.run "complex-operation" 20 200 (fn =>
  # Some complex computation
  [1, 2, 3, 4, 5]
    |> map (fn(x) => x * x)
    |> filter (fn(x) => x > 5)
    |> fold (+) 0
)

println "Benchmark: ${result.name}"
println "Iterations: ${result.iterations}"
println "---"
println "Mean:    ${result.mean_ns} ns"
println "Median:  ${result.median_ns} ns"
println "Min:     ${result.min_ns} ns"
println "Max:     ${result.max_ns} ns"
println "P99:     ${result.p99_ns} ns"
println "Std Dev: ${result.std_dev}"
println "---"
println "Throughput: ${result.ops_per_sec} ops/sec"

Best Practices

Warmup Iterations

Use warmup iterations to ensure the code being benchmarked has been optimized and caches are warm. A typical starting point is 5-20 warmup iterations depending on code complexity.
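
For example (a sketch reusing fib from the earlier example), a trivial thunk needs only a few warmup iterations, while a heavier computation benefits from more:

# Light operation: a handful of warmup iterations is usually enough
light = Bench.run "small-sum" 5 100 (fn => fold (+) 0 [1, 2, 3, 4, 5])

# Heavier operation: give the runtime more time to warm up before measuring
heavy = Bench.run "fib-25" 20 100 (fn => fib 25)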

Sufficient Iterations

More iterations provide more accurate statistics. For stable results, use at least 50-100 measured iterations. For micro-benchmarks of very fast operations, consider 1000+ iterations.
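
As a sketch, scale the iteration count to the cost of the thunk:

# Micro-benchmark of a very fast operation: use a large iteration count
micro = Bench.run "square" 50 1000 (fn => 7 * 7)

# Moderately expensive operation: 100 measured iterations is usually enough
sums = Bench.run "sum-10k" 10 100 (fn => fold (+) 0 (range 1 10000))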

Avoid Side Effects

The benchmarked function should be pure when possible. Side effects like I/O operations can add significant variance to timing measurements.
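
For example, keep printing out of the thunk and report results after the run. A sketch, using the same block style as the examples above:

# Avoid: I/O inside the thunk is measured along with the computation
noisy = Bench.run "noisy-sum" 10 100 (fn =>
  println "iteration running"
  fold (+) 0 (range 1 1000)
)

# Prefer: benchmark the pure computation, print only the final statistics
clean = Bench.run "sum" 10 100 (fn => fold (+) 0 (range 1 1000))
println "mean: ${clean.mean_ns} ns"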

Interpreting Results

Use median_ns as a robust measure of central tendency; it is less affected by outliers than the mean. Compare p99_ns to understand worst-case performance. A std_dev that is high relative to the mean indicates variable performance that may need investigation.
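
A sketch of checking these numbers programmatically (assuming ordinary numeric division with /):

result = Bench.run "pipeline" 10 200 (fn =>
  [1, 2, 3, 4, 5] |> map (fn(x) => x * x) |> fold (+) 0
)

# Relative spread: std_dev as a fraction of the mean; a large value means noisy timings
relative-spread = result.std_dev / result.mean_ns
println "median: ${result.median_ns} ns, p99: ${result.p99_ns} ns"
println "relative std dev: ${relative-spread}"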