arrow

Apache Arrow columnar data format for Kit

Files

File                     Description
kit.toml                 Package manifest with metadata and dependencies
src/arrow.kit            Columnar arrays, schemas, tables, and file I/O
tests/arrow.test.kit     Tests for error types and data type constants
examples/analytics.kit   Sales analysis with aggregations
examples/basic.kit       Array creation and value access
examples/file-io.kit     CSV, Parquet, and IPC file operations
examples/tables.kit      Multi-column tables with schema
LICENSE                  MIT license file

Dependencies

  • parquet

Installation

kit add gitlab.com/kit-lang/packages/kit-arrow.git

Usage

import Kit.Arrow

License

MIT License - see LICENSE for details.

Exported Functions & Types

ArrowError

Arrow error type for typed error handling.

Variants

ArrowCreateError {message}
ArrowIOError {message}
ArrowAccessError {message}

DataType

Arrow data types for columnar arrays.

These types correspond to Apache Arrow's type system and determine how data is stored in memory. Arrow uses fixed-width types for efficient SIMD operations and variable-width types for strings and binary data.

Primitive Types:

  • Int8Type, Int16Type, Int32Type, Int64Type: Signed integers
  • UInt8Type, UInt16Type, UInt32Type, UInt64Type: Unsigned integers
  • Float32Type, Float64Type: IEEE 754 floating-point numbers
  • BoolType: Boolean values (stored as bit-packed array)
  • StringType: UTF-8 encoded variable-length strings
  • BinaryType: Variable-length binary data

Temporal Types:

  • DateType: Date as days since Unix epoch (1970-01-01)
  • TimestampType: Timestamp as microseconds since Unix epoch

Nested Types:

  • ListType DataType: Variable-length list of elements
  • StructType (List Field): Record with named fields

Variants

Int8Type
Int16Type
Int32Type
Int64Type
UInt8Type
UInt16Type
UInt32Type
UInt64Type
Float32Type
Float64Type
BoolType
StringType
BinaryType
DateType
TimestampType
ListType {DataType}
StructType {List Field}

int8

8-bit signed integer type. Range: -128 to 127

int16

16-bit signed integer type. Range: -32,768 to 32,767

int32

32-bit signed integer type. Range: -2,147,483,648 to 2,147,483,647

int64

64-bit signed integer type. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

uint8

8-bit unsigned integer type. Range: 0 to 255

uint16

16-bit unsigned integer type. Range: 0 to 65,535

uint32

32-bit unsigned integer type. Range: 0 to 4,294,967,295

uint64

64-bit unsigned integer type. Range: 0 to 18,446,744,073,709,551,615

float32

32-bit IEEE 754 single-precision floating-point type. Precision: ~7 decimal digits

float64

64-bit IEEE 754 double-precision floating-point type. Precision: ~15-17 decimal digits

bool-type

Boolean type stored as bit-packed array. Values: true or false

string-type

UTF-8 encoded variable-length string type. Stored as offsets + data buffer for efficient slicing.

binary-type

Variable-length binary data type. Similar to string but without UTF-8 encoding requirement.

date-type

Date type representing days since Unix epoch (1970-01-01). Stored as 32-bit signed integer.

timestamp-type

Timestamp type representing microseconds since Unix epoch. Stored as 64-bit signed integer.

list-type

Creates a list type containing elements of the given type.

DataType -> DataType

# List of integers
int-list-type = Arrow.list-type Arrow.int64
# Nested list of strings
nested-type = Arrow.list-type (Arrow.list-type Arrow.string-type)

struct-type

Creates a struct type with the given fields.

List Field -> DataType

point-type = Arrow.struct-type [
  Arrow.field "x" Arrow.float64,
  Arrow.field "y" Arrow.float64
]

field

Creates a nullable field with the given name and data type. Nullable fields can contain None/null values in addition to values of the specified type.

String -> DataType -> Field

age-field = Arrow.field "age" Arrow.int32  # Can be None

field-not-null

Creates a non-nullable field with the given name and data type. Non-nullable fields must always contain a value; null is not permitted. Use for required columns that should never be missing.

String -> DataType -> Field

id-field = Arrow.field-not-null "id" Arrow.int64  # Required

int64-array

Creates an Int64 array from a list of integers. Allocates a new Arrow array in columnar format with 64-bit signed integers.

List Int -> Result Array ArrowError

match Arrow.int64-array [1, 2, 3, 4, 5]
  | Ok arr ->
    print "Created array with ${Arrow.array-length arr} elements"
    Arrow.free-array arr
  | Err msg -> print "Error: ${msg}"

float64-array

Creates a Float64 array from a list of floats. Allocates a new Arrow array in columnar format with 64-bit floating-point numbers.

List Float -> Result Array ArrowError

string-array

Creates a String array from a list of strings. Allocates a new Arrow array for UTF-8 encoded variable-length strings.

List String -> Result Array ArrowError

bool-array

Creates a Bool array from a list of booleans. Allocates a new Arrow array for boolean values stored as bit-packed data.

List Bool -> Result Array ArrowError
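The float64-array, string-array, and bool-array constructors have the same shape as int64-array, so the same match-and-free pattern should apply. The following is a minimal sketch under that assumption; the variable names and sample values are illustrative only.

```kit
import Kit.Arrow

# Each constructor returns a Result, so both Ok and Err are handled.
match Arrow.float64-array [1.5, 2.25, 3.0]
  | Ok prices ->
    print "Float64 array length: ${Arrow.array-length prices}"
    Arrow.free-array prices
  | Err msg -> print "Error: ${msg}"

match Arrow.string-array ["apple", "banana"]
  | Ok names ->
    print "String array length: ${Arrow.array-length names}"
    Arrow.free-array names
  | Err msg -> print "Error: ${msg}"

match Arrow.bool-array [true, false, true]
  | Ok flags ->
    print "Bool array length: ${Arrow.array-length flags}"
    Arrow.free-array flags
  | Err msg -> print "Error: ${msg}"
```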

array-length

Returns the length of an array.

Array -> Int

get

Gets the value at the given index, returning None if out of bounds or null.

Array -> Int -> Option a
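Because get returns an Option, a caller cannot tell an out-of-bounds index from a null entry by the result alone. A short sketch of reading one element, assuming an array named `arr` was created earlier (for example with Arrow.int64-array):

```kit
import Kit.Arrow

# Assumes `arr` is an existing Arrow array; the index 1 is illustrative.
match Arrow.get arr 1
  | Some v -> print "Value at index 1: ${v}"
  | None -> print "Index 1 is out of bounds or null"
```

To distinguish the two None cases, compare the index against Arrow.array-length or check Arrow.is-null? first.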

is-null?

Checks if the value at the given index is null.

Parameters:

  • `arr` (Array): The Arrow array
  • `index` (Int): Zero-based index to check

Returns: true if the value at the index is null, false otherwise.

Array -> Int -> Bool

if Arrow.is-null? arr 5 then
  print "Element 5 is null"

null-count

Returns the count of null values in the array.

Parameters:

  • `arr` (Array): The Arrow array

Returns: The number of null values in the array.

Array -> Int

nulls = Arrow.null-count arr
print "Array has ${nulls} null values"

to-list

Converts an array to a list of Option values.

Creates a Kit list from an Arrow array, preserving null information. Note: This copies all data from columnar to row format.

Parameters:

  • `arr` (Array): The Arrow array to convert

Returns: A list with `Some value` for each non-null element and `None` for each null element.

Array -> List (Option a)

values = Arrow.to-list arr
List.iter (fn(v) => match v | Some x -> print x | None -> print "null") values

free-array

Frees the memory used by an array.

Releases the native memory allocated for the Arrow array. Must be called when the array is no longer needed to prevent memory leaks.

Parameters:

  • `arr` (Array): The Arrow array to free

Returns: Unit.

Warning: Do not use the array after calling this function.

Array -> Unit

schema

Creates a schema from a list of fields.

A schema defines the structure of a table or record batch by specifying the names, types, and nullability of columns.

Parameters:

  • `fields` (List Field): Ordered list of field definitions

Returns:

  • `Ok Schema` on success
  • `Err ArrowError` if creation fails

List Field -> Result Schema ArrowError

match Arrow.schema [
  Arrow.field "id" Arrow.int64,
  Arrow.field "name" Arrow.string-type,
  Arrow.field "price" Arrow.float64
]
  | Ok s ->
    print "Schema created"
    Arrow.free-schema s
  | Err msg -> print "Error: ${msg}"

free-schema

Frees the memory used by a schema.

Parameters:

  • `s` (Schema): The schema to free

Returns: Unit.

Schema -> Unit

record-batch

Creates a record batch from a schema and list of column arrays.

A record batch is a contiguous chunk of columnar data. All arrays must have the same length and match the schema field types.

Parameters:

  • `schema` (Schema): Schema defining column structure
  • `columns` (List Array): Arrays for each column (must match schema)

Returns: `Ok RecordBatch` on success, or `Err ArrowError` on failure.

Errors:

  • Returns Err if arrays have mismatched lengths
  • Returns Err if column count doesn't match schema

Schema -> List Array -> Result RecordBatch ArrowError
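A sketch of building a record batch and inspecting its shape. It assumes `schema`, `ids`, and `names` already exist: a two-field schema and two arrays of equal length whose types match it. Those names are hypothetical.

```kit
import Kit.Arrow

# Assumes `schema` has two fields and `ids` / `names` are matching arrays.
match Arrow.record-batch schema [ids, names]
  | Ok batch ->
    print "Batch: ${Arrow.num-rows batch} rows, ${Arrow.num-columns batch} columns"
    Arrow.free-batch batch
  | Err msg -> print "Error: ${msg}"
```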

num-rows

Returns the number of rows in a record batch.

Parameters:

  • `batch` (RecordBatch): The record batch

Returns: The number of rows in the record batch.

RecordBatch -> Int

num-columns

Returns the number of columns in a record batch.

Parameters:

  • `batch` (RecordBatch): The record batch

Returns: The number of columns in the record batch.

RecordBatch -> Int

column

Gets a column from a record batch by index.

Parameters:

  • `batch` (RecordBatch): The record batch
  • `index` (Int): Zero-based column index

Returns:

  • `Ok Array` with the column data
  • `Err ArrowError` if the index is out of bounds

RecordBatch -> Int -> Result Array ArrowError
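A brief sketch of fetching a column by position, assuming a record batch named `batch` with at least one column already exists. The documentation does not say whether the returned array must be freed separately from the batch, so the sketch leaves that out.

```kit
import Kit.Arrow

# Assumes `batch` is an existing RecordBatch; index 0 is the first column.
match Arrow.column batch 0
  | Ok col -> print "Column 0 has ${Arrow.array-length col} values"
  | Err msg -> print "Error: ${msg}"
```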

free-batch

Frees the memory used by a record batch.

Parameters:

  • `batch` (RecordBatch): The batch to free

Returns: Unit.

RecordBatch -> Unit

table

Creates a table from a schema and list of column arrays.

Tables are the primary data structure for analytical operations. Unlike RecordBatch, tables can be backed by multiple chunks.

Parameters:

  • `schema` (Schema): Schema defining column structure
  • `columns` (List Array): Arrays for each column

Returns: `Ok Table` on success, or `Err ArrowError` on failure.

Schema -> List Array -> Result Table ArrowError

match Arrow.table schema [ids, names, prices]
  | Ok tbl ->
    print "Created table with ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Error: ${msg}"

table-column

Gets a table column by index.

Parameters:

  • `tbl` (Table): The table
  • `index` (Int): Zero-based column index

Returns: `Ok Array` with the column data, or `Err ArrowError` on failure.

Table -> Int -> Result Array ArrowError

table-column-by-name

Gets a table column by name.

Parameters:

  • `tbl` (Table): The table
  • `name` (String): Column name

Returns:

  • `Ok Array` with the column data
  • `Err ArrowError` if the column name is not found

Table -> String -> Result Array ArrowError
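Note that the table comes first and the name second here, unlike table-head and the aggregation functions. A minimal sketch, assuming a table `tbl` with a column called "price" (a hypothetical name):

```kit
import Kit.Arrow

# Assumes `tbl` is an existing Table; "price" is an illustrative column name.
match Arrow.table-column-by-name tbl "price"
  | Ok col -> print "price: ${Arrow.array-length col} values, ${Arrow.null-count col} nulls"
  | Err msg -> print "Error: ${msg}"
```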

table-num-rows

Returns the number of rows in a table.

Parameters:

  • `tbl` (Table): The table

Returns: The number of rows in the table.

Table -> Int

table-num-columns

Returns the number of columns in a table.

Parameters:

  • `tbl` (Table): The table

Returns: The number of columns in the table.

Table -> Int

free-table

Frees the memory used by a table.

Parameters:

  • `tbl` (Table): The table to free

Returns: Unit.

Table -> Unit

table-head

Returns the first n rows of a table.

Parameters:

  • `n` (Int): Number of rows to return
  • `tbl` (Table): Source table

Returns: `Ok Table` containing the first n rows, or `Err ArrowError` on failure.

Int -> Table -> Result Table ArrowError

match Arrow.table-head 10 tbl
  | Ok top10 -> print "Got ${Arrow.table-num-rows top10} rows"
  | Err msg -> print "Error: ${msg}"

table-tail

Returns the last n rows of a table.

Parameters:

  • `n` (Int): Number of rows to return
  • `tbl` (Table): Source table

Returns: `Ok Table` containing the last n rows, or `Err ArrowError` on failure.

Int -> Table -> Result Table ArrowError
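table-tail takes its arguments in the same order as table-head: row count first, table second. A short sketch, assuming a table named `tbl` already exists:

```kit
import Kit.Arrow

# Assumes `tbl` is an existing Table; 5 is an illustrative row count.
match Arrow.table-tail 5 tbl
  | Ok last5 -> print "Got ${Arrow.table-num-rows last5} rows"
  | Err msg -> print "Error: ${msg}"
```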

slice

Slices a table from start index with given length.

Parameters:

  • `start` (Int): Starting row index (0-based)
  • `length` (Int): Number of rows to include
  • `tbl` (Table): Source table

Returns: `Ok Table` containing the selected rows, or `Err ArrowError` on failure.

Int -> Int -> Table -> Result Table ArrowError

# Get rows 100-199
match Arrow.slice 100 100 tbl
  | Ok page -> print "Page has ${Arrow.table-num-rows page} rows"
  | Err msg -> print "Error: ${msg}"

sum

Calculates the sum of values in a numeric column.

Computes: $\sum_{i=0}^{n-1} x_i$

Parameters:

  • `col-name` (String): Name of the numeric column
  • `tbl` (Table): Source table

Returns: The sum as a Float.

String -> Table -> Float

mean

Calculates the arithmetic mean of values in a numeric column.

Computes: $\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i$

Parameters:

  • `col-name` (String): Name of the numeric column
  • `tbl` (Table): Source table

Returns: The mean as a Float.

String -> Table -> Float

min

Returns the minimum value in a numeric column.

Parameters:

  • `col-name` (String): Name of the numeric column
  • `tbl` (Table): Source table

Returns: The minimum value as a Float.

String -> Table -> Float

max

Returns the maximum value in a numeric column.

Parameters:

  • `col-name` (String): Name of the numeric column
  • `tbl` (Table): Source table

Returns: The maximum value as a Float.

String -> Table -> Float

count

Counts non-null values in a column.

Parameters:

  • `col-name` (String): Name of the column
  • `tbl` (Table): Source table

Returns: The number of non-null values as an Int.

String -> Table -> Int
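The aggregation functions sum, mean, min, max, and count all take the column name first and the table second, and return plain values rather than a Result. A sketch of a small summary, assuming a table `tbl` with a numeric column called "price" (a hypothetical name); how these functions behave for a missing or non-numeric column is not documented here.

```kit
import Kit.Arrow

# Assumes `tbl` is an existing Table with a numeric "price" column.
total = Arrow.sum "price" tbl
average = Arrow.mean "price" tbl
lowest = Arrow.min "price" tbl
highest = Arrow.max "price" tbl
n = Arrow.count "price" tbl

print "price: n=${n} total=${total} mean=${average} min=${lowest} max=${highest}"
```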

table-sort

Sorts a table by column in ascending order.

Parameters:

  • `col-name` (String): Column to sort by
  • `tbl` (Table): Source table

Returns: `Ok Table` sorted in ascending order by the column, or `Err ArrowError` on failure.

String -> Table -> Result Table ArrowError

match Arrow.table-sort "price" products
  | Ok sorted -> print "Sorted ${Arrow.table-num-rows sorted} rows, cheapest first"
  | Err msg -> print "Error: ${msg}"

table-sort-desc

Sorts a table by column in descending order.

Parameters:

  • `col-name` (String): Column to sort by
  • `tbl` (Table): Source table

Returns: `Ok Table` sorted in descending order by the column, or `Err ArrowError` on failure.

String -> Table -> Result Table ArrowError
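A descending sort mirrors the ascending case. A minimal sketch, assuming a table named `products` with a "price" column (both hypothetical):

```kit
import Kit.Arrow

# Assumes `products` is an existing Table with a "price" column.
match Arrow.table-sort-desc "price" products
  | Ok sorted -> print "Sorted ${Arrow.table-num-rows sorted} rows, most expensive first"
  | Err msg -> print "Error: ${msg}"
```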

table-concat

Concatenates two tables vertically (row-wise union).

Tables must have compatible schemas.

Parameters:

  • `tbl1` (Table): First table
  • `tbl2` (Table): Second table

Returns: `Ok Table` containing the rows of both tables, or `Err ArrowError` on failure.

Table -> Table -> Result Table ArrowError
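A sketch of combining two tables, assuming `jan` and `feb` are existing tables with compatible schemas (hypothetical names). If concatenation succeeds, the row count of the result should equal the sum of the two inputs.

```kit
import Kit.Arrow

# Assumes `jan` and `feb` are existing Tables with compatible schemas.
match Arrow.table-concat jan feb
  | Ok combined -> print "Combined table has ${Arrow.table-num-rows combined} rows"
  | Err msg -> print "Error: ${msg}"
```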

rename-column

Renames a column in a table.

Parameters:

  • `old-name` (String): Current column name
  • `new-name` (String): New column name
  • `tbl` (Table): Source table

Returns: `Ok Table` with the column renamed, or `Err ArrowError` on failure.

String -> String -> Table -> Result Table ArrowError
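A minimal sketch of renaming a column, assuming a table `tbl` whose first column is named "price"; both column names are illustrative. It reads the name back with column-name to confirm the change.

```kit
import Kit.Arrow

# Assumes `tbl` is an existing Table whose column at index 0 is "price".
match Arrow.rename-column "price" "unit_price" tbl
  | Ok renamed -> print "Column 0 is now ${Arrow.column-name 0 renamed}"
  | Err msg -> print "Error: ${msg}"
```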

column-name

Gets the column name at a given index.

Parameters:

  • `index` (Int): Zero-based column index
  • `tbl` (Table): The table

Returns: The column name as a String.

Int -> Table -> String
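Unlike the column accessors, column-name returns a plain String rather than a Result, and its behavior for an out-of-range index is not documented here. A short sketch, assuming a table `tbl` with at least one column:

```kit
import Kit.Arrow

# Assumes `tbl` is an existing Table with at least one column.
first = Arrow.column-name 0 tbl
print "First of ${Arrow.table-num-columns tbl} columns: ${first}"
```

Checking the index against Arrow.table-num-columns before calling is a reasonable precaution.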

write-ipc

Writes a table to an Arrow IPC (Feather) file.

Arrow IPC is a binary format for efficient storage and transfer of Arrow data. IPC files preserve the exact schema and support memory-mapping for zero-copy reads.

Parameters:

  • `tbl` (Table): Table to write
  • `path` (String): Output file path (typically .arrow or .feather extension)

Returns: `Ok Unit` on success, or `Err ArrowError` on failure.

Table -> String -> Result Unit ArrowError

match Arrow.write-ipc tbl "data.arrow"
  | Ok _ -> print "Saved successfully"
  | Err msg -> print "Failed: ${msg}"

read-ipc

Reads a table from an Arrow IPC (Feather) file.

Parameters:

  • `path` (String): Path to the IPC file

Returns: `Ok Table` with the file contents, or `Err ArrowError` on failure.

String -> Result Table ArrowError

match Arrow.read-ipc "data.arrow"
  | Ok tbl ->
    print "Loaded ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"

write-parquet

Writes a table to a Parquet file.

Parquet is a columnar storage format optimized for analytics workloads. It provides efficient compression and encoding for large datasets.

Parameters:

  • `tbl` (Table): Table to write
  • `path` (String): Output file path (typically .parquet extension)

Returns: `Ok Unit` on success, or `Err ArrowError` on failure.

Table -> String -> Result Unit ArrowError

match Arrow.write-parquet tbl "data.parquet"
  | Ok _ -> print "Saved to Parquet"
  | Err msg -> print "Failed: ${msg}"

read-parquet

Reads a table from a Parquet file.

Parameters:

  • `path` (String): Path to the Parquet file

Returns: `Ok Table` with the file contents, or `Err ArrowError` on failure.

String -> Result Table ArrowError

match Arrow.read-parquet "data.parquet"
  | Ok tbl ->
    print "Loaded ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"

read-csv

Reads a CSV file into a table.

Automatically infers column types from the data. For explicit schema control, read the CSV and then cast columns as needed.

Parameters:

  • `path` (String): Path to the CSV file

Returns: `Ok Table` with inferred column types, or `Err ArrowError` on failure.

String -> Result Table ArrowError

match Arrow.read-csv "data.csv"
  | Ok tbl ->
    print "Loaded CSV with ${Arrow.table-num-columns tbl} columns"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"