arrow

Kind	ffi-c
Capabilities	ffi file
Categories	data-format analytics ffi
Keywords	arrow columnar data analytics dataframe

Apache Arrow columnar data format for Kit

Files

File	Description
`.editorconfig`	Editor formatting configuration
`.gitignore`	Git ignore rules for build artifacts and dependencies
`.tool-versions`	asdf tool versions (Zig, Kit)
`LICENSE`	MIT license file
`README.md`	This file
`c/kit_arrow.cpp`	C++ FFI wrapper for Apache Arrow
`c/kit_arrow.h`	C header for FFI wrapper
`examples/analytics.kit`	Example: aggregation and sorting
`examples/basic.kit`	Basic usage example
`examples/file-io.kit`	Example: reading and writing Arrow IPC files
`examples/tables.kit`	Example: creating and querying tables
`kit.toml`	Package manifest with metadata and dependencies
`src/arrow.kit`	kit-arrow: Apache Arrow Columnar Data Format for Kit
`tests/arrow.test.kit`	Tests for arrow
`tests/types.test.kit`	Tests for data types

Dependencies

kit-parquet - Apache Parquet columnar storage

Native Dependencies

Apache Arrow C++ library is required:

Platform	Install Command
macOS	`brew install apache-arrow`
Ubuntu	`sudo apt install libarrow-dev`
Fedora	`sudo dnf install libarrow-devel`

Installation

kit add gitlab.com/kit-lang/packages/kit-arrow.git

Usage

Table Builder

The easiest way to create Arrow tables is with the fluent builder:

import Kit.Arrow as Arrow

create-users = fn =>
  Arrow.build-table
    |> Arrow.add-column "id" Arrow.int64 [1, 2, 3]
    |> Arrow.add-column "name" Arrow.string-type ["Alice", "Bob", "Carol"]
    |> Arrow.add-column "score" Arrow.float64 [95.5, 87.3, 91.0]
    |> Arrow.finalize

main = fn =>
  match create-users
    | Ok tbl ->
      println "Rows: ${show (Arrow.table-num-rows tbl)}"
      println "Columns: ${show (Arrow.table-num-columns tbl)}"

      # DataFrame operations
      match Arrow.table-head 2 tbl
        | Ok top2 ->
          println "First 2 rows: ${show (Arrow.table-num-rows top2)}"
          Arrow.free-table top2
        | Err e -> println "Error: ${show e}"

      Arrow.free-table tbl
    | Err e -> println "Error: ${show e}"

main

The builder supports all primitive Arrow types (int8 through uint64, float32, float64, bool-type, string-type, binary-type, date-type, timestamp-type). Schema is derived automatically from the add-column calls.

Reading Files

import Kit.Arrow as Arrow

main = fn =>
  # Read a Parquet file
  match Arrow.read-parquet "data.parquet"
    | Ok tbl ->
      println "Loaded ${show (Arrow.table-num-rows tbl)} rows"

      total = Arrow.sum "id" tbl
      avg = Arrow.mean "id" tbl
      println "Sum: ${show total}, Mean: ${show avg}"

      Arrow.free-table tbl
    | Err e -> println "Error: ${show e}"

  # Also supports Arrow IPC and CSV
  # Arrow.read-ipc "data.arrow"
  # Arrow.read-csv "data.csv"

main

Development

Running Examples

Run examples with the interpreter:

kit run examples/basic.kit

Compile examples to a native binary:

kit build examples/basic.kit && ./basic

Running Tests

Run the test suite:

kit test

Run the test suite with coverage:

kit test --coverage

Running kit dev

Run the standard development workflow (format, check, test):

kit dev

This will:

Format and check source files in src/
Run tests in tests/ with coverage

Generating Documentation

Generate API documentation from doc comments:

kit doc

Note: Kit sources with doc comments (##) will generate HTML documents in docs/*.html

Cleaning Build Artifacts

Remove generated files, caches, and build artifacts:

kit task clean

Note: Defined in kit.toml.

Local Installation

To install this package locally for development:

kit install

This installs the package to ~/.kit/packages/@kit/arrow/, making it available for import as Kit.Arrow in other projects.

License

This package is released under the MIT License - see LICENSE for details.

Apache Arrow is an open-source columnar memory format maintained by the Apache Software Foundation under the Apache License 2.0.

Exported Functions & Types

`ArrowError`

Arrow error type for typed error handling.

Variants

ArrowCreateError {message}

ArrowIOError {message}

ArrowAccessError {message}

`DataType`

Arrow data types for columnar arrays.

These types correspond to Apache Arrow's type system and determine how data is stored in memory. Arrow uses fixed-width types for efficient SIMD operations and variable-width types for strings and binary data.

Primitive Types: - Int8Type, Int16Type, Int32Type, Int64Type: Signed integers - UInt8Type, UInt16Type, UInt32Type, UInt64Type: Unsigned integers - Float32Type, Float64Type: IEEE 754 floating-point numbers - BoolType: Boolean values (stored as bit-packed array) - StringType: UTF-8 encoded variable-length strings - BinaryType: Variable-length binary data

Temporal Types: - DateType: Date as days since Unix epoch (1970-01-01) - TimestampType: Timestamp as microseconds since Unix epoch

Nested Types: - ListType DataType: Variable-length list of elements - StructType List Field: Record with named fields

Variants

Int8Type

Int16Type

Int32Type

Int64Type

UInt8Type

UInt16Type

UInt32Type

UInt64Type

Float32Type

Float64Type

BoolType

StringType

BinaryType

DateType

TimestampType

ListType {DataType}

StructType {List, Field}

`int8`

8-bit signed integer type. Range: -128 to 127

`int16`

16-bit signed integer type. Range: -32,768 to 32,767

`int32`

32-bit signed integer type. Range: -2,147,483,648 to 2,147,483,647

`int64`

64-bit signed integer type. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

`uint8`

8-bit unsigned integer type. Range: 0 to 255

`uint16`

16-bit unsigned integer type. Range: 0 to 65,535

`uint32`

32-bit unsigned integer type. Range: 0 to 4,294,967,295

`uint64`

64-bit unsigned integer type. Range: 0 to 18,446,744,073,709,551,615

`float32`

32-bit IEEE 754 single-precision floating-point type. Precision: ~7 decimal digits

`float64`

64-bit IEEE 754 double-precision floating-point type. Precision: ~15-17 decimal digits

`bool-type`

Boolean type stored as bit-packed array. Values: true or false

`string-type`

UTF-8 encoded variable-length string type. Stored as offsets + data buffer for efficient slicing.

`binary-type`

Variable-length binary data type. Similar to string but without UTF-8 encoding requirement.

`date-type`

Date type representing days since Unix epoch (1970-01-01). Stored as 32-bit signed integer.

`timestamp-type`

Timestamp type representing microseconds since Unix epoch. Stored as 64-bit signed integer.

`list-type`

Creates a list type containing elements of the given type.

DataType -> DataType

# List of integers
int-list-type = Arrow.list-type Arrow.int64
# Nested list of strings
nested-type = Arrow.list-type (Arrow.list-type Arrow.string-type)

`struct-type`

Creates a struct type with the given fields.

List Field -> DataType

point-type = Arrow.struct-type [
  Arrow.field "x" Arrow.float64,
  Arrow.field "y" Arrow.float64
]

`field`

Creates a nullable field with the given name and data type. Nullable fields can contain None/null values in addition to values of the specified type.

NonEmptyString -> DataType -> Field

age-field = Arrow.field "age" Arrow.int32  # Can be None

`field-not-null`

Creates a non-nullable field with the given name and data type. Non-nullable fields must always contain a value; null is not permitted. Use for required columns that should never be missing.

NonEmptyString -> DataType -> Field

id-field = Arrow.field-not-null "id" Arrow.int64  # Required

`int64-array`

Creates an Int64 array from a list of integers. Allocates a new Arrow array in columnar format with 64-bit signed integers.

List Int -> Result Array ArrowError

match Arrow.int64-array [1, 2, 3, 4, 5]
  | Ok arr ->
    print "Created array with ${Arrow.array-length arr} elements"
    Arrow.free-array arr
  | Err msg -> print "Error: ${msg}"

`float64-array`

Creates a Float64 array from a list of floats. Allocates a new Arrow array in columnar format with 64-bit floating-point numbers.

List Float -> Result Array ArrowError

`string-array`

Creates a String array from a list of strings. Allocates a new Arrow array for UTF-8 encoded variable-length strings.

List String -> Result Array ArrowError

`bool-array`

Creates a Bool array from a list of booleans. Allocates a new Arrow array for boolean values stored as bit-packed data.

List Bool -> Result Array ArrowError

`array-length`

Returns the length of an array.

Array -> Int

`get`

Gets the value at the given index, returning None if out of bounds or null.

Array -> Int -> Option a

`is-null?`

Checks if the value at the given index is null.

Parameters:

- `arr - Array` - The Arrow array
- `index - Int` - Zero-based index to check

Returns:

Array -> Int -> Bool

if Arrow.is-null? arr 5 then
  print "Element 5 is null"

`null-count`

Returns the count of null values in the array.

Parameters:

- `arr - Array` - The Arrow array

Returns:

Array -> Int

nulls = Arrow.null-count arr
print "Array has ${nulls} null values"

`to-list`

Converts an array to a list of Option values.

Creates a Kit list from an Arrow array, preserving null information. Note: This copies all data from columnar to row format.

Parameters:

- `arr - Array` - The Arrow array to convert

Returns:

Array -> List (Option a)

values = Arrow.to-list arr
List.iter (fn(v) => match v | Some x -> print x | None -> print "null") values

`free-array`

Frees the memory used by an array.

Releases the native memory allocated for the Arrow array. Must be called when the array is no longer needed to prevent memory leaks.

Parameters:

- `arr - Array` - The Arrow array to free

Returns:

Warning: Do not use the array after calling this function.

Array -> Unit

`schema`

Creates a schema from a list of fields.

A schema defines the structure of a table or record batch by specifying the names, types, and nullability of columns.

Parameters:

- `fields - List Field` - Ordered list of field definitions

Returns: - `Ok Schema` on success- `Err String` if creation fails

List Field -> Result Schema ArrowError

match Arrow.schema [
  Arrow.field "id" Arrow.int64,
  Arrow.field "name" Arrow.string-type,
  Arrow.field "price" Arrow.float64
]
  | Ok s -> # use schema
  | Err msg -> print "Error: ${msg}"

`free-schema`

Frees the memory used by a schema.

Parameters:

- `s - Schema` - The schema to free

Returns:

Schema -> Unit

`record-batch`

Creates a record batch from a schema and list of column arrays.

A record batch is a contiguous chunk of columnar data. All arrays must have the same length and match the schema field types.

Parameters:

- `schema - Schema` - Schema defining column structure
- `columns - List Array` - Arrays for each column (must match schema)

Returns:

Errors: - Returns Err if arrays have mismatched lengths - Returns Err if column count doesn't match schema

Schema -> List Array -> Result RecordBatch ArrowError

`num-rows`

Returns the number of rows in a record batch.

Parameters:

- `batch - RecordBatch` - The record batch

Returns:

RecordBatch -> Int

`num-columns`

Returns the number of columns in a record batch.

Parameters:

- `batch - RecordBatch` - The record batch

Returns:

RecordBatch -> Int

`column`

Gets a column from a record batch by index.

Parameters:

- `batch - RecordBatch` - The record batch
- `index - Int` - Zero-based column index

Returns: - `Ok Array` with the column data- `Err String` if index out of bounds

RecordBatch -> Int -> Result Array ArrowError

`free-batch`

Frees the memory used by a record batch.

Parameters:

- `batch - RecordBatch` - The batch to free

Returns:

RecordBatch -> Unit

`table`

Creates a table from a schema and list of column arrays.

Tables are the primary data structure for analytical operations. Unlike RecordBatch, tables can be backed by multiple chunks.

Parameters:

- `schema - Schema` - Schema defining column structure
- `columns - List Array` - Arrays for each column

Returns:

Schema -> List Array -> Result Table ArrowError

match Arrow.table schema [ids, names, prices]
  | Ok tbl ->
    print "Created table with ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Error: ${msg}"

`table-column`

Gets a table column by index.

Parameters:

- `tbl - Table` - The table
- `index - Int` - Zero-based column index

Returns:

Table -> Int -> Result Array ArrowError

`table-column-by-name`

Gets a table column by name.

Parameters:

- `tbl - Table` - The table
- `name - String` - Column name

Returns: - `Err` if column name not found

Table -> NonEmptyString -> Result Array ArrowError

`table-num-rows`

Returns the number of rows in a table.

Parameters:

- `tbl - Table` - The table

Returns:

Table -> Int

`table-num-columns`

Returns the number of columns in a table.

Parameters:

- `tbl - Table` - The table

Returns:

Table -> Int

`free-table`

Frees the memory used by a table.

Parameters:

- `tbl - Table` - The table to free

Returns:

Table -> Unit

`build-table`

Creates an empty table builder.

Use with |> and add-column to incrementally build a table.

TableBuilder

Arrow.build-table
  |> Arrow.add-column "x" Arrow.int64 [1, 2]
  |> Arrow.finalize

`add-column`

Adds a column to the table builder.

Eagerly creates the Arrow array from the given values. If a previous column creation failed, this is a no-op (the error is preserved).

Supported types: all primitive types (int8 through uint64, float32, float64, bool-type, string-type, binary-type, date-type, timestamp-type). Nested types (list-type, struct-type) are not supported; create those arrays manually and use Arrow.table directly.

Parameters:

- `builder - TableBuilder` - The builder (piped with `|>`)
- `col-name - String` - Column name
- `dtype - DataType` - Arrow data type for this column
- `values - List a` - Column values

Returns:

TableBuilder -> String -> DataType -> List a -> TableBuilder

builder |> Arrow.add-column "score" Arrow.float64 [95.5, 87.3]

`finalize`

Finalizes the builder, creating a table from all accumulated columns.

If any add-column call failed, returns that error. Otherwise creates the schema and table from the accumulated fields and arrays.

Parameters:

- `builder - TableBuilder` - The completed builder

Returns:

TableBuilder -> Result Table ArrowError

Arrow.build-table
  |> Arrow.add-column "x" Arrow.int64 [1, 2, 3]
  |> Arrow.finalize

`table-head`

Returns the first n rows of a table.

Parameters:

- `n - Int` - Number of rows to return
- `tbl - Table` - Source table

Returns:

Int -> Table -> Result Table ArrowError

match Arrow.table-head 10 tbl
  | Ok top10 -> # process first 10 rows

`table-tail`

Returns the last n rows of a table.

Parameters:

- `n - Int` - Number of rows to return
- `tbl - Table` - Source table

Returns:

Int -> Table -> Result Table ArrowError

`slice`

Slices a table from start index with given length.

Parameters:

- `start - Int` - Starting row index (0-based)
- `length - Int` - Number of rows to include
- `tbl - Table` - Source table

Returns:

Int -> Int -> Table -> Result Table ArrowError

# Get rows 100-199
match Arrow.slice 100 100 tbl
  | Ok page -> # process page

`sum`

Calculates the sum of values in a numeric column.

Computes: $\sum_{i=0}^{n-1} x_i$

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Float

`mean`

Calculates the arithmetic mean of values in a numeric column.

Computes: $\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i$

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Float

`min`

Returns the minimum value in a numeric column.

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Float

`max`

Returns the maximum value in a numeric column.

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Float

`count`

Counts non-null values in a column.

Parameters:

- `col-name - String` - Name of the column
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Int

`table-sort`

Sorts a table by column in ascending order.

Parameters:

- `col-name - String` - Column to sort by
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Result Table ArrowError

match Arrow.table-sort "price" products
  | Ok sorted -> # cheapest first

`table-sort-desc`

Sorts a table by column in descending order.

Parameters:

- `col-name - String` - Column to sort by
- `tbl - Table` - Source table

Returns:

NonEmptyString -> Table -> Result Table ArrowError

`table-concat`

Concatenates two tables vertically (row-wise union).

Tables must have compatible schemas.

Parameters:

- `tbl1 - Table` - First table
- `tbl2 - Table` - Second table

Returns:

Table -> Table -> Result Table ArrowError

`rename-column`

Renames a column in a table.

Parameters:

- `old-name - String` - Current column name
- `new-name - String` - New column name
- `tbl - Table` - Source table

Returns:

NonEmptyString -> NonEmptyString -> Table -> Result Table ArrowError

`column-name`

Gets the column name at a given index.

Parameters:

- `index - Int` - Zero-based column index
- `tbl - Table` - The table

Returns:

Int -> Table -> String

`write-ipc`

Writes a table to an Arrow IPC (Feather) file.

Arrow IPC is a binary format for efficient storage and transfer of Arrow data. IPC files preserve the exact schema and support memory-mapping for zero-copy reads.

Parameters:

- `tbl - Table` - Table to write
- `path - String` - Output file path (typically .arrow or .feather extension)

Returns:

Table -> NonEmptyString -> Result Unit ArrowError

match Arrow.write-ipc tbl "data.arrow"
  | Ok -> print "Saved successfully"
  | Err msg -> print "Failed: ${msg}"

`read-ipc`

Reads a table from an Arrow IPC (Feather) file.

Parameters:

- `path - String` - Path to the IPC file

Returns:

NonEmptyString -> Result Table ArrowError

match Arrow.read-ipc "data.arrow"
  | Ok tbl ->
    print "Loaded ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"

`write-parquet`

Writes a table to a Parquet file.

Parquet is a columnar storage format optimized for analytics workloads. It provides efficient compression and encoding for large datasets.

Parameters:

- `tbl - Table` - Table to write
- `path - String` - Output file path (typically .parquet extension)

Returns:

Table -> NonEmptyString -> Result Unit ArrowError

match Arrow.write-parquet tbl "data.parquet"
  | Ok -> print "Saved to Parquet"
  | Err msg -> print "Failed: ${msg}"

`read-parquet`

Reads a table from a Parquet file.

Parameters:

- `path - String` - Path to the Parquet file

Returns:

NonEmptyString -> Result Table ArrowError

match Arrow.read-parquet "data.parquet"
  | Ok tbl -> # process table
  | Err msg -> print "Failed: ${msg}"

`read-csv`

Reads a CSV file into a table.

Automatically infers column types from the data. For explicit schema control, read the CSV and then cast columns as needed.

Parameters:

- `path - String` - Path to the CSV file

Returns:

NonEmptyString -> Result Table ArrowError

match Arrow.read-csv "data.csv"
  | Ok tbl ->
    print "Loaded CSV with ${Arrow.table-num-columns tbl} columns"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"