arrow

Kind	ffi-c
Categories	data-format analytics ffi
Keywords	arrow columnar data analytics dataframe

Apache Arrow columnar data format for Kit

Files

File	Description
`kit.toml`	Package manifest with metadata and dependencies
`src/arrow.kit`	Columnar arrays, schemas, tables, and file I/O
`tests/arrow.test.kit`	Tests for error types and data type constants
`examples/analytics.kit`	Sales analysis with aggregations
`examples/basic.kit`	Array creation and value access
`examples/file-io.kit`	CSV, Parquet, and IPC file operations
`examples/tables.kit`	Multi-column tables with schema
`LICENSE`	MIT license file

Dependencies

parquet

Installation

kit add gitlab.com/kit-lang/packages/kit-arrow.git

Usage

import Kit.Arrow

License

MIT License - see LICENSE for details.

Exported Functions & Types

`ArrowError`

Arrow error type for typed error handling.

Variants

ArrowCreateError {message}

ArrowIOError {message}

ArrowAccessError {message}

`DataType`

Arrow data types for columnar arrays.

These types correspond to Apache Arrow's type system and determine how data is stored in memory. Arrow uses fixed-width types for efficient SIMD operations and variable-width types for strings and binary data.

Primitive Types:

- Int8Type, Int16Type, Int32Type, Int64Type: Signed integers
- UInt8Type, UInt16Type, UInt32Type, UInt64Type: Unsigned integers
- Float32Type, Float64Type: IEEE 754 floating-point numbers
- BoolType: Boolean values (stored as bit-packed array)
- StringType: UTF-8 encoded variable-length strings
- BinaryType: Variable-length binary data

Temporal Types:

- DateType: Date as days since Unix epoch (1970-01-01)
- TimestampType: Timestamp as microseconds since Unix epoch

Nested Types:

- ListType DataType: Variable-length list of elements
- StructType List Field: Record with named fields

Variants

Int8Type

Int16Type

Int32Type

Int64Type

UInt8Type

UInt16Type

UInt32Type

UInt64Type

Float32Type

Float64Type

BoolType

StringType

BinaryType

DateType

TimestampType

ListType {DataType}

StructType {List Field}

`int8`

8-bit signed integer type. Range: -128 to 127

`int16`

16-bit signed integer type. Range: -32,768 to 32,767

`int32`

32-bit signed integer type. Range: -2,147,483,648 to 2,147,483,647

`int64`

64-bit signed integer type. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

`uint8`

8-bit unsigned integer type. Range: 0 to 255

`uint16`

16-bit unsigned integer type. Range: 0 to 65,535

`uint32`

32-bit unsigned integer type. Range: 0 to 4,294,967,295

`uint64`

64-bit unsigned integer type. Range: 0 to 18,446,744,073,709,551,615

`float32`

32-bit IEEE 754 single-precision floating-point type. Precision: ~7 decimal digits

`float64`

64-bit IEEE 754 double-precision floating-point type. Precision: ~15-17 decimal digits

`bool-type`

Boolean type stored as bit-packed array. Values: true or false

`string-type`

UTF-8 encoded variable-length string type. Stored as offsets + data buffer for efficient slicing.

`binary-type`

Variable-length binary data type. Similar to string but without UTF-8 encoding requirement.

`date-type`

Date type representing days since Unix epoch (1970-01-01). Stored as 32-bit signed integer.

`timestamp-type`

Timestamp type representing microseconds since Unix epoch. Stored as 64-bit signed integer.

`list-type`

Creates a list type containing elements of the given type.

DataType -> DataType

# List of integers
int-list-type = Arrow.list-type Arrow.int64
# Nested list of strings
nested-type = Arrow.list-type (Arrow.list-type Arrow.string-type)

`struct-type`

Creates a struct type with the given fields.

List Field -> DataType

point-type = Arrow.struct-type [
  Arrow.field "x" Arrow.float64,
  Arrow.field "y" Arrow.float64
]

`field`

Creates a nullable field with the given name and data type. Nullable fields can contain None/null values in addition to values of the specified type.

String -> DataType -> Field

age-field = Arrow.field "age" Arrow.int32  # Can be None

`field-not-null`

Creates a non-nullable field with the given name and data type. Non-nullable fields must always contain a value; null is not permitted. Use for required columns that should never be missing.

String -> DataType -> Field

id-field = Arrow.field-not-null "id" Arrow.int64  # Required

`int64-array`

Creates an Int64 array from a list of integers. Allocates a new Arrow array in columnar format with 64-bit signed integers.

List Int -> Result Array ArrowError

match Arrow.int64-array [1, 2, 3, 4, 5]
  | Ok arr ->
    print "Created array with ${Arrow.array-length arr} elements"
    Arrow.free-array arr
  | Err msg -> print "Error: ${msg}"

`float64-array`

Creates a Float64 array from a list of floats. Allocates a new Arrow array in columnar format with 64-bit floating-point numbers.

List Float -> Result Array ArrowError
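A minimal usage sketch that mirrors the `int64-array` example above. The sample values are illustrative only.

```kit
import Kit.Arrow

# Build a Float64 array, report its length, then release the native memory
match Arrow.float64-array [1.5, 2.25, 3.0]
  | Ok arr ->
    print "Created array with ${Arrow.array-length arr} elements"
    Arrow.free-array arr
  | Err msg -> print "Error: ${msg}"
```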

`string-array`

Creates a String array from a list of strings. Allocates a new Arrow array for UTF-8 encoded variable-length strings.

List String -> Result Array ArrowError

`bool-array`

Creates a Bool array from a list of booleans. Allocates a new Arrow array for boolean values stored as bit-packed data.

List Bool -> Result Array ArrowError

`array-length`

Returns the length of an array.

Array -> Int

`get`

Gets the value at the given index, returning None if out of bounds or null.

Array -> Int -> Option a
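Because `get` returns an `Option`, callers handle the null or out-of-bounds case explicitly. The sketch below assumes `arr` is an Array created earlier (for example with `Arrow.string-array`); that binding is not defined here.

```kit
import Kit.Arrow

# `arr` is assumed to exist already, e.g. from Arrow.string-array
match Arrow.get arr 1
  | Some v -> print "Element 1: ${v}"
  | None -> print "Element 1 is null or out of bounds"
```

`None` alone does not say which of the two cases occurred; `Arrow.is-null?` can tell them apart for an in-range index.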

`is-null?`

Checks if the value at the given index is null.

Parameters:

- `arr - Array` - The Arrow array
- `index - Int` - Zero-based index to check

Returns: `true` if the value at `index` is null, `false` otherwise.

Array -> Int -> Bool

if Arrow.is-null? arr 5 then
  print "Element 5 is null"

`null-count`

Returns the count of null values in the array.

Parameters:

- `arr - Array` - The Arrow array

Returns: The number of null values in the array.

Array -> Int

nulls = Arrow.null-count arr
print "Array has ${nulls} null values"

`to-list`

Converts an array to a list of Option values.

Creates a Kit list from an Arrow array, preserving null information. Note: This copies all data from columnar to row format.

Parameters:

- `arr - Array` - The Arrow array to convert

Returns: A list with one `Option` per element, where null entries become `None`.

Array -> List (Option a)

values = Arrow.to-list arr
List.iter (fn(v) => match v | Some x -> print x | None -> print "null") values

`free-array`

Frees the memory used by an array.

Releases the native memory allocated for the Arrow array. Must be called when the array is no longer needed to prevent memory leaks.

Parameters:

- `arr - Array` - The Arrow array to free

Returns: `Unit`.

Warning: Do not use the array after calling this function.

Array -> Unit

`schema`

Creates a schema from a list of fields.

A schema defines the structure of a table or record batch by specifying the names, types, and nullability of columns.

Parameters:

- `fields - List Field` - Ordered list of field definitions

Returns:

- `Ok Schema` on success
- `Err ArrowError` if creation fails

List Field -> Result Schema ArrowError

match Arrow.schema [
  Arrow.field "id" Arrow.int64,
  Arrow.field "name" Arrow.string-type,
  Arrow.field "price" Arrow.float64
]
  | Ok s ->
    # Use the schema here, then release it
    Arrow.free-schema s
  | Err msg -> print "Error: ${msg}"

`free-schema`

Frees the memory used by a schema.

Parameters:

- `s - Schema` - The schema to free

Returns: `Unit`.

Schema -> Unit

`record-batch`

Creates a record batch from a schema and list of column arrays.

A record batch is a contiguous chunk of columnar data. All arrays must have the same length and match the schema field types.

Parameters:

- `schema - Schema` - Schema defining column structure
- `columns - List Array` - Arrays for each column (must match schema)

Returns: `Ok RecordBatch` on success.

Errors:

- Returns Err if arrays have mismatched lengths
- Returns Err if column count doesn't match schema

Schema -> List Array -> Result RecordBatch ArrowError
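A short sketch of building a batch and inspecting its shape. The bindings `s`, `ids`, and `names` are assumptions: a Schema with two fields and two equal-length Arrays matching those fields, created earlier with `Arrow.schema`, `Arrow.int64-array`, and `Arrow.string-array`.

```kit
import Kit.Arrow

# `s`, `ids`, and `names` are assumed to exist already (see lead-in)
match Arrow.record-batch s [ids, names]
  | Ok batch ->
    print "Batch has ${Arrow.num-rows batch} rows and ${Arrow.num-columns batch} columns"
    Arrow.free-batch batch
  | Err msg -> print "Error: ${msg}"
```

Cleanup of the schema and the column arrays is left out because this reference does not say whether the batch takes ownership of them.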

`num-rows`

Returns the number of rows in a record batch.

Parameters:

- `batch - RecordBatch` - The record batch

Returns: The number of rows in the batch.

RecordBatch -> Int

`num-columns`

Returns the number of columns in a record batch.

Parameters:

- `batch - RecordBatch` - The record batch

Returns: The number of columns in the batch.

RecordBatch -> Int

`column`

Gets a column from a record batch by index.

Parameters:

- `batch - RecordBatch` - The record batch
- `index - Int` - Zero-based column index

Returns:

- `Ok Array` with the column data
- `Err ArrowError` if index out of bounds

RecordBatch -> Int -> Result Array ArrowError
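For illustration, a column can be pulled out of a batch and inspected with the array functions. `batch` is assumed to be a RecordBatch created earlier with `Arrow.record-batch`.

```kit
import Kit.Arrow

# `batch` is assumed to exist already
match Arrow.column batch 0
  | Ok col -> print "Column 0 has ${Arrow.array-length col} values"
  | Err msg -> print "Error: ${msg}"
```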

`free-batch`

Frees the memory used by a record batch.

Parameters:

- `batch - RecordBatch` - The batch to free

Returns: `Unit`.

RecordBatch -> Unit

`table`

Creates a table from a schema and list of column arrays.

Tables are the primary data structure for analytical operations. Unlike RecordBatch, tables can be backed by multiple chunks.

Parameters:

- `schema - Schema` - Schema defining column structure
- `columns - List Array` - Arrays for each column

Returns:

- `Ok Table` on success
- `Err ArrowError` if creation fails

Schema -> List Array -> Result Table ArrowError

match Arrow.table schema [ids, names, prices]
  | Ok tbl ->
    print "Created table with ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Error: ${msg}"

`table-column`

Gets a table column by index.

Parameters:

- `tbl - Table` - The table
- `index - Int` - Zero-based column index

Returns:

- `Ok Array` with the column data
- `Err ArrowError` on failure

Table -> Int -> Result Array ArrowError

`table-column-by-name`

Gets a table column by name.

Parameters:

- `tbl - Table` - The table
- `name - String` - Column name

Returns:

- `Ok Array` with the column data
- `Err` if column name not found

Table -> String -> Result Array ArrowError
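A sketch of looking up a column by name and summarizing it. Both `tbl` (a Table obtained earlier, for example from `Arrow.read-csv`) and the column name `"price"` are assumptions for illustration.

```kit
import Kit.Arrow

# `tbl` is assumed to exist already; "price" is a hypothetical column name
match Arrow.table-column-by-name tbl "price"
  | Ok col -> print "price: ${Arrow.array-length col} values, ${Arrow.null-count col} nulls"
  | Err msg -> print "Lookup failed: ${msg}"
```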

`table-num-rows`

Returns the number of rows in a table.

Parameters:

- `tbl - Table` - The table

Returns: The number of rows in the table.

Table -> Int

`table-num-columns`

Returns the number of columns in a table.

Parameters:

- `tbl - Table` - The table

Returns: The number of columns in the table.

Table -> Int

`free-table`

Frees the memory used by a table.

Parameters:

- `tbl - Table` - The table to free

Returns: `Unit`.

Table -> Unit

`table-head`

Returns the first n rows of a table.

Parameters:

- `n - Int` - Number of rows to return
- `tbl - Table` - Source table

Returns:

- `Ok Table` containing the first `n` rows
- `Err ArrowError` on failure

Int -> Table -> Result Table ArrowError

match Arrow.table-head 10 tbl
  | Ok top10 -> print "Got ${Arrow.table-num-rows top10} rows"  # process first 10 rows
  | Err msg -> print "Error: ${msg}"

`table-tail`

Returns the last n rows of a table.

Parameters:

- `n - Int` - Number of rows to return
- `tbl - Table` - Source table

Returns:

- `Ok Table` containing the last `n` rows
- `Err ArrowError` on failure

Int -> Table -> Result Table ArrowError
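Usage follows the same shape as `table-head`. In this sketch `tbl` is assumed to be a Table obtained earlier.

```kit
import Kit.Arrow

# `tbl` is assumed to exist already
match Arrow.table-tail 5 tbl
  | Ok last5 -> print "Tail has ${Arrow.table-num-rows last5} rows"
  | Err msg -> print "Error: ${msg}"
```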

`slice`

Slices a table from start index with given length.

Parameters:

- `start - Int` - Starting row index (0-based)
- `length - Int` - Number of rows to include
- `tbl - Table` - Source table

Returns:

- `Ok Table` containing the requested rows
- `Err ArrowError` on failure

Int -> Int -> Table -> Result Table ArrowError

# Get rows 100-199
match Arrow.slice 100 100 tbl
  | Ok page -> print "Page has ${Arrow.table-num-rows page} rows"  # process page
  | Err msg -> print "Error: ${msg}"

`sum`

Calculates the sum of values in a numeric column.

Computes: $\sum_{i=0}^{n-1} x_i$

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns: The sum as a `Float`.

String -> Table -> Float

`mean`

Calculates the arithmetic mean of values in a numeric column.

Computes: $\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i$

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns: The mean as a `Float`.

String -> Table -> Float
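As a worked example with made-up values, a column holding 2, 4, and 9 (and no nulls) gives the following sum and mean:

```latex
\sum_{i=0}^{2} x_i = 2 + 4 + 9 = 15, \qquad \bar{x} = \frac{1}{3}\sum_{i=0}^{2} x_i = \frac{15}{3} = 5
```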

`min`

Returns the minimum value in a numeric column.

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns: The minimum value as a `Float`.

String -> Table -> Float

`max`

Returns the maximum value in a numeric column.

Parameters:

- `col-name - String` - Name of the numeric column
- `tbl - Table` - Source table

Returns: The maximum value as a `Float`.

String -> Table -> Float

`count`

Counts non-null values in a column.

Parameters:

- `col-name - String` - Name of the column
- `tbl - Table` - Source table

Returns: The number of non-null values in the column.

String -> Table -> Int
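The aggregate functions `sum`, `mean`, `min`, `max`, and `count` take the column name first and the table second, so they can be applied side by side to one column. A minimal sketch, assuming a hypothetical file `sales.csv` with a numeric `amount` column:

```kit
import Kit.Arrow

# "sales.csv" and the "amount" column are hypothetical
match Arrow.read-csv "sales.csv"
  | Ok tbl ->
    total = Arrow.sum "amount" tbl
    average = Arrow.mean "amount" tbl
    lowest = Arrow.min "amount" tbl
    highest = Arrow.max "amount" tbl
    n = Arrow.count "amount" tbl
    print "n=${n} total=${total} mean=${average}"
    print "min=${lowest} max=${highest}"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"
```

These functions return plain values rather than a `Result`, so this reference gives no indication of what happens when the column is missing or non-numeric.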

`table-sort`

Sorts a table by column in ascending order.

Parameters:

- `col-name - String` - Column to sort by
- `tbl - Table` - Source table

Returns:

- `Ok Table` sorted by `col-name` in ascending order
- `Err ArrowError` on failure

String -> Table -> Result Table ArrowError

match Arrow.table-sort "price" products
  | Ok sorted -> print "Sorted ${Arrow.table-num-rows sorted} rows"  # cheapest first
  | Err msg -> print "Error: ${msg}"

`table-sort-desc`

Sorts a table by column in descending order.

Parameters:

- `col-name - String` - Column to sort by
- `tbl - Table` - Source table

Returns:

- `Ok Table` sorted by `col-name` in descending order
- `Err ArrowError` on failure

String -> Table -> Result Table ArrowError
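A sketch that mirrors the `table-sort` example. `products` is assumed to be a Table with a `price` column, as in that example.

```kit
import Kit.Arrow

# `products` is assumed to exist already and to contain a "price" column
match Arrow.table-sort-desc "price" products
  | Ok sorted -> print "Sorted ${Arrow.table-num-rows sorted} rows"  # most expensive first
  | Err msg -> print "Error: ${msg}"
```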

`table-concat`

Concatenates two tables vertically (row-wise union).

Tables must have compatible schemas.

Parameters:

- `tbl1 - Table` - First table
- `tbl2 - Table` - Second table

Returns:

- `Ok Table` containing the rows of both tables
- `Err ArrowError` on failure

Table -> Table -> Result Table ArrowError
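A minimal sketch of combining two tables. The bindings `jan` and `feb` are hypothetical Tables with compatible schemas, obtained earlier.

```kit
import Kit.Arrow

# `jan` and `feb` are assumed to exist already with compatible schemas
match Arrow.table-concat jan feb
  | Ok combined -> print "Combined table has ${Arrow.table-num-rows combined} rows"
  | Err msg -> print "Error: ${msg}"
```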

`rename-column`

Renames a column in a table.

Parameters:

- `old-name - String` - Current column name
- `new-name - String` - New column name
- `tbl - Table` - Source table

Returns:

- `Ok Table` with the column renamed
- `Err ArrowError` on failure

String -> String -> Table -> Result Table ArrowError
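Note the argument order: old name, new name, then the table. In this sketch `tbl` and both column names are assumptions for illustration.

```kit
import Kit.Arrow

# `tbl` is assumed to exist already; "price" and "unit-price" are hypothetical names
match Arrow.rename-column "price" "unit-price" tbl
  | Ok renamed -> print "Renamed; table has ${Arrow.table-num-columns renamed} columns"
  | Err msg -> print "Error: ${msg}"
```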

`column-name`

Gets the column name at a given index.

Parameters:

- `index - Int` - Zero-based column index
- `tbl - Table` - The table

Returns: The column name as a `String`.

Int -> Table -> String
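A short sketch, assuming a hypothetical file `data.csv` that yields a table with at least one column:

```kit
import Kit.Arrow

# "data.csv" is a hypothetical input file with at least one column
match Arrow.read-csv "data.csv"
  | Ok tbl ->
    print "First column: ${Arrow.column-name 0 tbl}"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"
```

Unlike `table-column`, this function takes the index first and returns a plain `String`, so an out-of-range index cannot be reported through a `Result`.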

`write-ipc`

Writes a table to an Arrow IPC (Feather) file.

Arrow IPC is a binary format for efficient storage and transfer of Arrow data. IPC files preserve the exact schema and support memory-mapping for zero-copy reads.

Parameters:

- `tbl - Table` - Table to write
- `path - String` - Output file path (typically .arrow or .feather extension)

Returns:

- `Ok` on success
- `Err ArrowError` if the write fails

Table -> String -> Result Unit ArrowError

match Arrow.write-ipc tbl "data.arrow"
  | Ok -> print "Saved successfully"
  | Err msg -> print "Failed: ${msg}"

`read-ipc`

Reads a table from an Arrow IPC (Feather) file.

Parameters:

- `path - String` - Path to the IPC file

Returns:

- `Ok Table` with the file contents
- `Err ArrowError` if the read fails

String -> Result Table ArrowError

match Arrow.read-ipc "data.arrow"
  | Ok tbl ->
    print "Loaded ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"

`write-parquet`

Writes a table to a Parquet file.

Parquet is a columnar storage format optimized for analytics workloads. It provides efficient compression and encoding for large datasets.

Parameters:

- `tbl - Table` - Table to write
- `path - String` - Output file path (typically .parquet extension)

Returns:

- `Ok` on success
- `Err ArrowError` if the write fails

Table -> String -> Result Unit ArrowError

match Arrow.write-parquet tbl "data.parquet"
  | Ok -> print "Saved to Parquet"
  | Err msg -> print "Failed: ${msg}"

`read-parquet`

Reads a table from a Parquet file.

Parameters:

- `path - String` - Path to the Parquet file

Returns:

- `Ok Table` with the file contents
- `Err ArrowError` if the read fails

String -> Result Table ArrowError

match Arrow.read-parquet "data.parquet"
  | Ok tbl ->
    print "Loaded ${Arrow.table-num-rows tbl} rows"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"

`read-csv`

Reads a CSV file into a table.

Automatically infers column types from the data. For explicit schema control, read the CSV and then cast columns as needed.

Parameters:

- `path - String` - Path to the CSV file

Returns:

- `Ok Table` with inferred column types
- `Err ArrowError` if the read fails

String -> Result Table ArrowError

match Arrow.read-csv "data.csv"
  | Ok tbl ->
    print "Loaded CSV with ${Arrow.table-num-columns tbl} columns"
    Arrow.free-table tbl
  | Err msg -> print "Failed: ${msg}"