dataframe

High-performance columnar DataFrame for Kit with SIMD acceleration

Features

  • Core DataFrame Operations: Select, filter, sort, group by, join, aggregate
  • Statistical Functions: Variance, standard deviation, quantiles, correlation, skewness, kurtosis
  • Reshaping: Pivot, melt, crosstab, stack, unstack, transpose
  • Window Functions: Row number, rank, lead/lag, cumulative sums, rolling aggregations
  • Column Expressions: Polars-style column transformations and comparisons
  • Parallel Operations: Partition-aware aggregations using Kit's concurrency primitives
  • Extended I/O: Integration with Parquet, Arrow IPC, and SQLite
  • DateTime Support: Parse, extract, format, and compare timestamp columns
  • Lazy Evaluation: Query optimization with predicate pushdown and operation fusion

Files

File                      Description
.editorconfig             Editor formatting configuration
.gitignore                Git ignore rules for build artifacts and dependencies
.tool-versions            asdf tool versions (Zig, Kit)
LICENSE                   MIT license file
README.md                 This file
dev/gapminder.kit         REPL preload script for the gapminder dataset
dev/iris.kit              REPL preload script for the iris dataset
dev/mtcars.kit            REPL preload script for the mtcars dataset
dev/penguins.kit          REPL preload script for the penguins dataset
dev/tips.kit              REPL preload script for the tips dataset
dev/titanic.kit           REPL preload script for the titanic dataset
examples/dataframe.kit    Example usage of the dataframe package
kit.toml                  Package manifest with metadata and dependencies
src/col.kit               Column expressions (transforms, comparisons, categorize)
src/dataframe.kit         Core DataFrame operations and typed error handling
src/datetime.kit          DateTime column operations
src/duration.kit          Duration column operations
src/eval.kit              Lazy expression evaluation (eval, collect, memoization)
src/expr.kit              Lazy expression tree construction and join types
src/io.kit                Extended I/O (Parquet, Arrow IPC, SQLite, chunked CSV)
src/optimize.kit          Expression tree optimizer
src/parallel.kit          Parallel and partitioned aggregations
src/reshape.kit           Reshaping (pivot, melt, crosstab)
src/rolling.kit           Rolling window aggregations
src/stats.kit             Statistical functions (variance, quantiles, correlation)
src/str.kit               String column operations
src/window.kit            Window functions (row numbers, rank, lead/lag)
tests/col.test.kit        Tests for col
tests/dataframe.test.kit  Tests for dataframe
tests/io.test.kit         Tests for io
tests/parallel.test.kit   Tests for parallel
tests/reshape.test.kit    Tests for reshape
tests/stats.test.kit      Tests for stats
zig/dataframe.zig         Zig FFI module for dataframe
zig/kit_ffi.zig           Zig FFI module for kit ffi
zig/reshape.zig           Zig FFI module for reshape
zig/stats.zig             Zig FFI module for stats
zig/string_ops.zig        Zig FFI module for string ops
zig/window_ops.zig        Zig FFI module for window ops

Dependencies

  • CSV - CSV parsing for from-csv / to-csv
  • kit-arrow - Apache Arrow in-memory format (read-arrow / write-arrow)
  • kit-parquet - Apache Parquet columnar storage (read-parquet / write-parquet)
  • kit-sqlite - SQLite database access (read-sql / to-sql)

Installation

kit add gitlab.com/kit-lang/packages/kit-dataframe.git

Usage

Basic Operations

import Kit.Dataframe as DataFrame

# Create from records
df = DataFrame.from-records [
  {name: "Alice", age: 30, salary: 75000}, 
  {name: "Bob", age: 25, salary: 55000}, 
  {name: "Carol", age: 35, salary: 85000}
]

# Basic operations
filtered = DataFrame.filter (fn(row) => row.age > 28) df
sorted = DataFrame.sort "salary" df
selected = DataFrame.select ["name", "salary"] df

# Aggregations
total = DataFrame.sum df "salary"
avg = DataFrame.mean df "age"

Statistical Functions

import DataFrame.Stats as Stats

# Variance and standard deviation
variance = Stats.var df "returns"
std-dev = Stats.std-sample df "returns"

# Quantiles and percentiles
median = Stats.quantile df "price" 0.5
q1 = Stats.percentile df "price" 25.0

# Correlation and covariance
corr = Stats.corr df "x" "y"
cov = Stats.cov df "x" "y"

Reshaping

import DataFrame.Reshape as Reshape

# Pivot table
pivoted = Reshape.pivot df {
  index: ["date"], 
  columns: "product", 
  values: "sales", 
  aggfunc: :sum
}

# Melt (unpivot)
melted = Reshape.melt df {
  id-vars: ["date"], 
  value-vars: ["q1", "q2", "q3"], 
  var-name: "quarter", 
  value-name: "sales"
}

# Crosstab
cross = Reshape.crosstab df "category" "status"

Column Expressions

import DataFrame.Col as Col

# Scale and offset
df2 = df
  |> Col.scale-col "price" 1.1 "marked_up"
  |> Col.offset-col "score" 10 "adjusted"

# Comparisons
df3 = df
  |> Col.gt-col "salary" 60000.0 "is_high_earner"
  |> Col.eq-col "status" "active" "is_active"

# Categorize
df4 = Col.categorize "age" "age_group" [
  {max: 18, label: "child"}, 
  {max: 65, label: "adult"}, 
  {max: Float.infinity, label: "senior"}
] df

Parallel Operations

import DataFrame.Parallel as Par

# Parallel aggregations
total = Par.par-sum df "amount"
avg = Par.par-mean df "score"

# Partitioned operations (map-reduce pattern)
sum = Par.partitioned-sum df "value" 4  # 4 partitions

DateTime Operations

import DataFrame.DateTimeCol as DT

# Parse datetime strings
df2 = df
  |> DT.parse-iso-col "timestamp" "parsed_ts"

# Extract components
df3 = df
  |> DT.year-col "parsed_ts" "year"
  |> DT.month-col "parsed_ts" "month"
  |> DT.weekday-col "parsed_ts" "day_of_week"

# Format timestamps
df4 = DT.format-col "parsed_ts" "%Y-%m-%d" "date_string" df

Extended I/O

import DataFrame.IO as IO

# Parquet (requires kit-parquet)
df = IO.read-parquet "data.parquet" |> Result.unwrap
IO.write-parquet df "output.parquet"

# Arrow IPC
df = IO.read-arrow "data.arrow" |> Result.unwrap
IO.write-arrow df "output.arrow"

# SQLite (requires kit-sqlite)
db = SQLite.connect "data.db"
df = IO.read-sql db "SELECT * FROM users" |> Result.unwrap
IO.to-sql df db "users" :replace

Interactive REPL

kit-dataframe ships with preloaded REPL sessions for exploring classic datasets interactively. Each preload creates a ready-to-use DataFrame with pre-built subsets and helper functions.

Available Datasets

Dataset            Module     Rows  Description
dev/iris.kit       Iris       150   Fisher's Iris flower measurements (sepal/petal dimensions by species)
dev/mtcars.kit     Mtcars     32    Motor Trend 1974 car road tests (mpg, hp, weight, etc.)
dev/titanic.kit    Titanic    100   Titanic passenger survival data (class, sex, age, fare)
dev/penguins.kit   Penguins   150   Palmer Penguins morphometrics (bill, flipper, mass by species)
dev/tips.kit       Tips       50    Restaurant tipping data (total bill, tip, day, time)
dev/gapminder.kit  Gapminder  66    Global development indicators (life expectancy, GDP, population)

Running a REPL Session

From the kit-dataframe package directory:

kit repl --preload dev/iris.kit

The REPL prompt shows the module name (e.g., Iris≫) and prints available variables and helpers on startup:

Iris≫ preview iris
   sepal-length  sepal-width  petal-length  petal-width  species
0          5.1          3.5           1.4          0.2    setosa
1          4.9          3.0           1.4          0.2    setosa
2          4.7          3.2           1.3          0.2    setosa
3          4.6          3.1           1.5          0.2    setosa
4          5.0          3.6           1.4          0.2    setosa

[150 rows x 5 columns]

Each preload provides:

  • Pre-built subsets — filtered views by category (e.g., setosa, auto, survived)
  • `preview df` — show first 5 rows as a formatted table
  • `info df` — shape, columns, and summary statistics
  • `col-stats col df` — mean, std, min, max, median for a column
  • `compare-by-* col` — compare a measurement across groups
  • `corr col1 col2` — Pearson correlation between two columns
  • `top n col` / `bottom n col` — top/bottom n rows by a column
  • `sorted col` — sort by any column

Tests

Run the test suite:

cd packages/kit-dataframe
kit dev

License

MIT License - see LICENSE for details.

Exported Functions & Types

parse-col

Parse string column to timestamp using Kit's Time.parse. Creates a new integer column with Unix timestamps (milliseconds).

NonEmptyString -> String -> NonEmptyString -> DataFrame -> DataFrame
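A usage sketch, assuming the module is imported as `DT` (as in the Usage section); the column names are illustrative, and the format string is an assumption about `Time.parse`'s strftime-style conventions:

```kit
# Parse a "DD/MM/YYYY" string column into Unix-millisecond timestamps
df2 = DT.parse-col "order_date" "%d/%m/%Y" "order_ts" df
```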

parse-iso-col

Parse ISO 8601 datetime string column. Format: "2024-01-15T10:30:00Z" or "2024-01-15 10:30:00"

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

parse-date-col

Parse date-only string column (no time component). Format: "2024-01-15"

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

year-col

Extract year from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

month-col

Extract month (1-12) from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

day-col

Extract day of month (1-31) from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

hour-col

Extract hour (0-23) from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

minute-col

Extract minute (0-59) from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

second-col

Extract second (0-59) from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

weekday-col

Extract day of week (0=Sunday, 6=Saturday) from timestamp column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

add-days-col

Add days to timestamp column.

NonEmptyString -> Int -> DataFrame -> DataFrame

add-months-col

Add months to timestamp column.

NonEmptyString -> Int -> DataFrame -> DataFrame

add-years-col

Add years to timestamp column.

NonEmptyString -> Int -> DataFrame -> DataFrame

diff-col

Calculate difference between two timestamp columns in milliseconds.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

diff-days-col

Calculate difference in days between two timestamp columns.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame
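A sketch computing the day gap between two parsed timestamp columns (the column names are illustrative; `DT` is the alias from the Usage section):

```kit
# Days between order and delivery, stored in a new column
df2 = DT.diff-days-col "ordered_ts" "delivered_ts" "days_to_deliver" df
```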

diff-hours-col

Calculate difference in hours between two timestamp columns.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

format-col

Format timestamp column to string using strftime format.

NonEmptyString -> String -> NonEmptyString -> DataFrame -> DataFrame

format-iso-col

Format timestamp column to ISO 8601 string.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

format-date-col

Format timestamp column to date string (YYYY-MM-DD).

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

format-time-col

Format timestamp column to time string (HH:MM:SS).

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

is-before-col

Check if timestamp column values are before a reference timestamp.

NonEmptyString -> Int -> NonEmptyString -> DataFrame -> DataFrame

is-after-col

Check if timestamp column values are after a reference timestamp.

NonEmptyString -> Int -> NonEmptyString -> DataFrame -> DataFrame

is-between-col

Check if timestamp column values are between two timestamps.

NonEmptyString -> Int -> Int -> NonEmptyString -> DataFrame -> DataFrame

now-col

Add current timestamp column to DataFrame.

NonEmptyString -> DataFrame -> DataFrame

components-col

Convert timestamp to components record column. Returns a column where each value is {year, month, day, hour, minute, second}.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

EvalError

Evaluation errors that can occur during expression evaluation

Variants

EvalDataFrameError {message}
EvalColumnNotFound {column}
EvalTypeMismatch {expected, got}
EvalInvalidOperation {operation, reason}

eval

Evaluate a DataFrame expression tree. Recursively traverses the tree, executing operations bottom-up.

DFExpr -> Result a EvalError

eval-optimized

Evaluate with optimization. Applies the optimizer before evaluation for better performance.

DFExpr -> Result a EvalError

expr-hash

Compute a hash key for an expression. Used for memoization to identify repeated subexpressions.

DFExpr -> String

eval-memoized

Evaluate with memoization. Caches results of subexpressions to avoid redundant computation.

DFExpr -> Result a EvalError

eval-memoized-cache-size

Get the cache size after memoized evaluation.

DFExpr -> Int

eval!

Evaluate and unwrap, panicking on error. Use for scripts where errors should halt execution.

DFExpr -> a

eval-optimized!

Evaluate optimized and unwrap, panicking on error.

DFExpr -> a

collect

Synonym for eval-optimized; named to match Polars/Spark terminology.

DFExpr -> Result a EvalError

collect!

Collect and unwrap, panicking on error.

DFExpr -> a
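Putting the lazy API together, a sketch of a build-then-collect pipeline; the import paths and `Expr`/`Eval` aliases are assumptions modeled on the other module imports in this README:

```kit
import DataFrame.Expr as Expr
import DataFrame.Eval as Eval

# Build a lazy plan; nothing executes until collect!
result = df
  |> Expr.of
  |> Expr.filter (fn(row) => row.age > 30)
  |> Expr.select ["name", "salary"]
  |> Expr.head 10
  |> Eval.collect!   # optimize, evaluate, and unwrap
```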

read-parquet

Read a Parquet file into a DataFrame. Uses kit-parquet to read the file, then converts to DataFrame via records.

NonEmptyString -> Result DataFrame String

match IO.read-parquet "data.parquet"
  | Ok df -> DataFrame.print df
  | Err e -> print "Error: ${e}"

write-parquet

Write a DataFrame to a Parquet file. Converts DataFrame to records, then writes via kit-parquet.

DataFrame -> NonEmptyString -> Result Unit String

IO.write-parquet df "output.parquet"

write-parquet-compressed

Write a DataFrame to Parquet with compression options.

Compression options: :snappy, :gzip, :lz4, :zstd, :uncompressed

DataFrame -> NonEmptyString -> Symbol -> Result Unit String

IO.write-parquet-compressed df "output.parquet" :zstd

read-arrow

Read an Arrow IPC file into a DataFrame. The Arrow IPC (Feather) format is efficient for temporary storage and inter-process communication.

NonEmptyString -> Result DataFrame String

match IO.read-arrow "data.arrow"
  | Ok df -> DataFrame.print df
  | Err e -> print "Error: ${e}"

write-arrow

Write a DataFrame to an Arrow IPC file.

DataFrame -> NonEmptyString -> Result Unit String

IO.write-arrow df "data.arrow"

read-sql

Execute a SQL query and return results as a DataFrame. The query results are converted to DataFrame records.

{query: String -> Result List a, ..} -> String -> Result DataFrame String

db = SQLite.connect "data.db"
match IO.read-sql db "SELECT id, name, age FROM users"
  | Ok df -> DataFrame.print df
  | Err e -> print "Error: ${e}"

to-sql

Write a DataFrame to a SQLite table. Creates the table if it doesn't exist, or inserts into existing table.

Options for if-exists:

  • :replace - Drop and recreate the table
  • :append - Insert rows into the existing table
  • :fail - Return an error if the table exists (default)

DataFrame -> {execute: String -> Result Int a, query: String -> Result List b, ..} -> String -> Symbol -> Result Int String

db = SQLite.connect "data.db"
IO.to-sql df db "users" :replace

read-csv-chunked

Read a CSV file in chunks, applying a processor to each chunk. Useful for processing files larger than available memory.

NonEmptyString -> PositiveInt -> (DataFrame -> a) -> Result Unit String

IO.read-csv-chunked "huge.csv" 10000 (fn(chunk) =>
  chunk
    |> DataFrame.filter (fn(row) => row.valid?)
    |> process-and-save
)

mean

Rolling mean with specified window size. Returns None for first (window-1) rows.

String -> Int -> String -> DataFrame -> DataFrame
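A sketch of a moving average, assuming the module is imported as `Rolling` (the alias and import path are assumptions):

```kit
import DataFrame.Rolling as Rolling

# 7-row moving average of "price", stored in "price_ma7"
df2 = Rolling.mean "price" 7 "price_ma7" df
```

As noted above, the first window-1 rows of the new column are None.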

sum

Rolling sum with specified window size.

String -> Int -> String -> DataFrame -> DataFrame

std

Rolling standard deviation with specified window size (sample std, ddof=1).

String -> Int -> String -> DataFrame -> DataFrame

min

Rolling minimum with specified window size.

String -> Int -> String -> DataFrame -> DataFrame

max

Rolling maximum with specified window size.

String -> Int -> String -> DataFrame -> DataFrame

JoinKind

Join type enumeration

Variants

Inner
LeftOuter
RightOuter
FullOuter

DFExpr

Lazy DataFrame expression tree. Constructing an expression does not perform any computation - it builds a tree structure that can be optimized and then evaluated.

Type parameters:

  • a - The DataFrame value type (opaque)
  • b - Predicate/mapper function type (opaque)

Variants

Lit {a}
Select {DFExpr, _1}
Drop {DFExpr, _1}
Filter {DFExpr, a}
MapCol {DFExpr, String, a}
Sort {DFExpr, String, Bool}
Slice {DFExpr, Int, Int}
Head {DFExpr, Int}
Tail {DFExpr, Int}
GroupBy {DFExpr, _1}
Agg {DFExpr, a}
GroupByAgg {DFExpr, _1, a}
Join {DFExpr, DFExpr, String, JoinKind}
Concat {DFExpr, DFExpr}
WithColumn {DFExpr, String, a}
Rename {DFExpr, a}
Unique {DFExpr, _1}
Sample {DFExpr, Int}
FillNone {DFExpr, String, a}
DropNone {DFExpr, String}
SortDesc {DFExpr, String}
TopN {DFExpr, String, Int}

of

Create a literal expression from a DataFrame. This is the entry point for building lazy expressions.

a -> DFExpr

select

Select specific columns from the DataFrame.

[String] -> DFExpr -> DFExpr

drop

Drop specific columns from the DataFrame.

[String] -> DFExpr -> DFExpr

filter

Filter rows using a predicate function.

a -> DFExpr -> DFExpr

map-column

Transform a column using a function.

NonEmptyString -> a -> DFExpr -> DFExpr

sort

Sort by column in ascending order.

NonEmptyString -> DFExpr -> DFExpr

sort-desc

Sort by column in descending order.

NonEmptyString -> DFExpr -> DFExpr

sort-by

Sort by column with explicit direction.

NonEmptyString -> Bool -> DFExpr -> DFExpr

slice

Slice rows from start to end (exclusive).

NonNegativeInt -> NonNegativeInt -> DFExpr -> DFExpr

head

Take first n rows.

PositiveInt -> DFExpr -> DFExpr

tail

Take last n rows.

PositiveInt -> DFExpr -> DFExpr

group-by

Group by specified columns. Must be followed by an aggregate operation.

[String] -> DFExpr -> DFExpr

aggregate

Apply aggregations to grouped DataFrame.

a -> DFExpr -> DFExpr

group-by-agg

Combined group-by and aggregate in one operation.

[String] -> a -> DFExpr -> DFExpr

inner-join

Inner join with another DataFrame expression.

DFExpr -> NonEmptyString -> DFExpr -> DFExpr

left-join

Left outer join with another DataFrame expression.

DFExpr -> NonEmptyString -> DFExpr -> DFExpr

right-join

Right outer join with another DataFrame expression.

DFExpr -> NonEmptyString -> DFExpr -> DFExpr

outer-join

Full outer join with another DataFrame expression.

DFExpr -> NonEmptyString -> DFExpr -> DFExpr

concat

Concatenate two DataFrame expressions vertically.

DFExpr -> DFExpr -> DFExpr

with-column

Add or replace a column with values.

NonEmptyString -> a -> DFExpr -> DFExpr

rename

Rename columns using a mapping record.

a -> DFExpr -> DFExpr

unique

Get unique rows based on specified columns.

[String] -> DFExpr -> DFExpr

sample

Take a random sample of n rows.

PositiveInt -> DFExpr -> DFExpr

fill-none

Fill missing values in a column.

NonEmptyString -> a -> DFExpr -> DFExpr

drop-none

Drop rows with missing values in a column.

NonEmptyString -> DFExpr -> DFExpr

top-n

Optimized top-N operation (head of sorted data). More efficient than sort followed by head.

NonEmptyString -> PositiveInt -> DFExpr -> DFExpr
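A sketch of the optimized path, assuming `Expr`/`Eval` aliases for the expression and evaluation modules (the import paths are assumptions):

```kit
import DataFrame.Expr as Expr
import DataFrame.Eval as Eval

# Ten highest-revenue rows without fully sorting the frame
top10 = df
  |> Expr.of
  |> Expr.top-n "revenue" 10
  |> Eval.collect!
```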

is-literal?

Check if expression is a literal (base case).

DFExpr -> Bool

depth

Get the depth of the expression tree.

DFExpr -> Int

node-count

Count the number of nodes in the expression tree.

DFExpr -> Int
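These introspection helpers are useful for checking what the optimizer did to a plan; a sketch, assuming the module is imported as `Expr` (alias and import path are assumptions):

```kit
import DataFrame.Expr as Expr

expr = df
  |> Expr.of
  |> Expr.filter (fn(row) => row.active?)
  |> Expr.select ["id"]

d = Expr.depth expr       # height of the Select -> Filter -> Lit chain
n = Expr.node-count expr  # total nodes in the tree
```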

with-fn

Add a column computed from row values using a function. The function receives each row as a record.

NonEmptyString -> (Record -> a) -> DataFrame -> Result DataFrame String

df |> Col.with-fn "total" (fn(row) => row.price * row.qty)

with-many

Add multiple computed columns sequentially. Each spec is a record with {name: String, fn: fn(row) => value}.

[{name: String, fn: Record -> a}] -> DataFrame -> Result DataFrame String

df |> Col.with-many [
  {name: "total", fn: fn(row) => row.price * row.qty},
  {name: "tax", fn: fn(row) => row.price * 0.08}
]

scale-col

Scale a column by a factor, storing in new column.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

offset-col

Offset a column by an amount, storing in new column.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

gt-col

Add boolean column for values > threshold.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

lt-col

Add boolean column for values < threshold.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

ge-col

Add boolean column for values >= threshold.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

le-col

Add boolean column for values <= threshold.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

eq-col

Add boolean column for values equal to target.

NonEmptyString -> a -> NonEmptyString -> DataFrame -> Result DataFrame String

ne-col

Add boolean column for values not equal to target.

NonEmptyString -> a -> NonEmptyString -> DataFrame -> Result DataFrame String

trim

Trim whitespace from column values.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

str-len

Get string length for each value.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

contains

Check if values contain a substring.

NonEmptyString -> String -> NonEmptyString -> DataFrame -> Result DataFrame String

starts-with

Check if values start with a prefix.

NonEmptyString -> String -> NonEmptyString -> DataFrame -> Result DataFrame String

ends-with

Check if values end with a suffix.

NonEmptyString -> String -> NonEmptyString -> DataFrame -> Result DataFrame String
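A sketch flagging rows by suffix, assuming the module is imported as `Str` (the alias, import path, and column names are assumptions):

```kit
import DataFrame.Str as Str

# Boolean column marking emails under a given domain
df2 = Str.ends-with "email" "@example.com" "is_example" df |> Result.unwrap
```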

fill-empty

Fill empty strings with a default value.

NonEmptyString -> String -> NonEmptyString -> DataFrame -> Result DataFrame String

is-empty

Create boolean column for empty values.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

is-not-empty

Create boolean column for non-empty values.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

categorize

Categorize numeric values into bins. bins is a list of {max: Float, label: String} in ascending order.

NonEmptyString -> NonEmptyString -> [{max: Float, label: String}] -> DataFrame -> Result DataFrame String

df |> Col.categorize "age" "age_group" [
  {max: 18, label: "child"},
  {max: 65, label: "adult"},
  {max: Float.infinity, label: "senior"}
]

indicator

Create indicator column (1 for true, 0 for false).

NonEmptyString -> a -> NonEmptyString -> DataFrame -> Result DataFrame String

indicator-gt

Create indicator from comparison.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String

abs-col

Apply absolute value to column.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

round-col

Round column values.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

floor-col

Floor column values.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

ceil-col

Ceiling column values.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

log-col

Apply natural log to column.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

sqrt-col

Apply square root to column.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

exp-col

Apply exponential to column.

NonEmptyString -> NonEmptyString -> DataFrame -> Result DataFrame String

pow-col

Apply power to column values.

NonEmptyString -> Float -> NonEmptyString -> DataFrame -> Result DataFrame String
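A sketch chaining two math transforms with the `Col` alias from the Usage section; since each call returns a Result, each step is unwrapped before the next (column names are illustrative):

```kit
# Log-transform a skewed column, then take a square root elsewhere
df2 = Col.log-col "income" "log_income" df |> Result.unwrap
df3 = Col.sqrt-col "variance" "std_estimate" df2 |> Result.unwrap
```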

DataFrameError

DataFrame error type for typed error handling. Variants distinguish between different failure modes.

Variants

DataFrameParseError {message}
DataFrameColumnError {message}
DataFrameRowError {message}
DataFrameIOError {message}
DataFrameConversionError {message}

parse-csv

Parse a CSV string into a DataFrame. The first line is treated as column headers. Returns Result with Ok(DataFrame) or Err(message).

String -> Result a b

read-csv

Read a CSV file into a DataFrame. Returns Result with Ok(DataFrame) or Err(message).

String -> Result a IOError

optimize

Recursively optimize an expression tree. First optimizes all sub-expressions bottom-up, then applies rewrite rules.

DFExpr -> DFExpr

count-rewrites

Count the number of optimizations applied. Useful for debugging and profiling the optimizer.

DFExpr -> DFExpr -> Int

stats

Get optimization statistics as a record.

DFExpr -> DFExpr -> {original_nodes: Int, optimized_nodes: Int, reduction: Int}
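A sketch comparing a plan before and after optimization, assuming the module is imported as `Optimize` (alias and import path are assumptions):

```kit
import DataFrame.Optimize as Optimize

# expr is any lazy expression built with the DFExpr constructors
optimized = Optimize.optimize expr
report = Optimize.stats expr optimized   # {original_nodes, optimized_nodes, reduction}
```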

from-millis-col

Create duration from milliseconds column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

from-seconds-col

Create duration from seconds column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

from-minutes-col

Create duration from minutes column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

from-hours-col

Create duration from hours column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

from-days-col

Create duration from days column.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

parse-col

Parse duration string column (e.g., "2h30m").

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

to-millis-col

Convert duration to milliseconds.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

to-seconds-col

Convert duration to seconds.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

to-minutes-col

Convert duration to minutes.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

to-hours-col

Convert duration to hours.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

to-days-col

Convert duration to days.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

add-col

Add two duration columns.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

subtract-col

Subtract second duration column from first.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

multiply-col

Multiply duration column by a scalar.

NonEmptyString -> Int -> DataFrame -> DataFrame

divide-col

Divide duration column by a scalar.

NonEmptyString -> Int -> DataFrame -> DataFrame

negate-col

Negate duration values in column.

NonEmptyString -> DataFrame -> DataFrame

abs-col

Get absolute value of duration column.

NonEmptyString -> DataFrame -> DataFrame

format-col

Format duration as human-readable string (e.g., "2h 30m").

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

format-long-col

Format duration with full unit names (e.g., "2 hours 30 minutes").

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

format-abbrev-col

Format duration with abbreviated units (e.g., "2h30m").

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

is-zero-col

Check if duration values are zero.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

is-negative-col

Check if duration values are negative.

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

is-positive-col

Check if duration values are positive (not zero or negative).

NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

add-to-time-col

Add duration column to timestamp column.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

subtract-from-time-col

Subtract duration column from timestamp column.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame

between-times-col

Calculate duration between two timestamp columns.

NonEmptyString -> NonEmptyString -> NonEmptyString -> DataFrame -> DataFrame
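A sketch combining the timestamp bridge with formatting; the `DataFrame.DurationCol` import path and `Dur` alias are assumptions (modeled on `DataFrame.DateTimeCol`), and the column names are illustrative:

```kit
import DataFrame.DurationCol as Dur

# Elapsed time between two timestamp columns, formatted for display
df2 = df
  |> Dur.between-times-col "started_ts" "finished_ts" "elapsed"
  |> Dur.format-col "elapsed" "elapsed_str"
```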

par-map-column

Apply a function to each value in a column.

NonEmptyString -> (a -> b) -> DataFrame -> Result DataFrame String

Par.par-map-column "price" (fn(x) => x * 1.1) df

par-filter

Filter rows using a predicate function.

(Record -> Bool) -> DataFrame -> Result DataFrame String

Par.par-filter (fn(row) => row.age > 30) df

par-sum

Calculate sum of a column.

DataFrame -> NonEmptyString -> Result Float String

Par.par-sum df "amount"

par-mean

Calculate mean of a column.

DataFrame -> NonEmptyString -> Result Float String

Par.par-mean df "score"

par-min

Find minimum value in a column.

DataFrame -> NonEmptyString -> Result Float String

Par.par-min df "temperature"

par-max

Find maximum value in a column.

DataFrame -> NonEmptyString -> Result Float String

Par.par-max df "revenue"

par-count

Count rows matching a predicate.

(Record -> Bool) -> DataFrame -> Result Int String

Par.par-count (fn(row) => row.status == "active") df

partitioned-sum

Partition-aware sum: splits data into chunks and aggregates. This pattern is ready for parallel execution.

DataFrame -> NonEmptyString -> PositiveInt -> Result Float String

partitioned-mean

Partition-aware mean: partitioned sum divided by count.

DataFrame -> NonEmptyString -> PositiveInt -> Result Float String

partitioned-min

Partition-aware min: finds minimum in each partition then combines.

DataFrame -> NonEmptyString -> PositiveInt -> Result Float String

partitioned-max

Partition-aware max: finds maximum in each partition then combines.

DataFrame -> NonEmptyString -> PositiveInt -> Result Float String