arrow
| Kind | ffi-c |
|---|---|
| Categories | data-format analytics ffi |
| Keywords | arrow columnar data analytics dataframe |
Apache Arrow columnar data format for Kit
Files
| File | Description |
|---|---|
kit.toml | Package manifest with metadata and dependencies |
src/arrow.kit | Columnar arrays, schemas, tables, and file I/O |
tests/arrow.test.kit | Tests for error types and data type constants |
examples/analytics.kit | Sales analysis with aggregations |
examples/basic.kit | Array creation and value access |
examples/file-io.kit | CSV, Parquet, and IPC file operations |
examples/tables.kit | Multi-column tables with schema |
LICENSE | MIT license file |
Dependencies
parquet
Installation
kit add gitlab.com/kit-lang/packages/kit-arrow.gitUsage
import Kit.ArrowLicense
MIT License - see LICENSE for details.
Exported Functions & Types
ArrowError
Arrow error type for typed error handling.
Variants
ArrowCreateError {message}ArrowIOError {message}ArrowAccessError {message}DataType
Arrow data types for columnar arrays.
These types correspond to Apache Arrow's type system and determine how data is stored in memory. Arrow uses fixed-width types for efficient SIMD operations and variable-width types for strings and binary data.
Primitive Types: - Int8Type, Int16Type, Int32Type, Int64Type: Signed integers - UInt8Type, UInt16Type, UInt32Type, UInt64Type: Unsigned integers - Float32Type, Float64Type: IEEE 754 floating-point numbers - BoolType: Boolean values (stored as bit-packed array) - StringType: UTF-8 encoded variable-length strings - BinaryType: Variable-length binary data
Temporal Types: - DateType: Date as days since Unix epoch (1970-01-01) - TimestampType: Timestamp as microseconds since Unix epoch
Nested Types: - ListType DataType: Variable-length list of elements - StructType List Field: Record with named fields
Variants
Int8TypeInt16TypeInt32TypeInt64TypeUInt8TypeUInt16TypeUInt32TypeUInt64TypeFloat32TypeFloat64TypeBoolTypeStringTypeBinaryTypeDateTypeTimestampTypeListType {DataType}StructType {List, Field}int8
8-bit signed integer type. Range: -128 to 127
int16
16-bit signed integer type. Range: -32,768 to 32,767
int32
32-bit signed integer type. Range: -2,147,483,648 to 2,147,483,647
int64
64-bit signed integer type. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
uint8
8-bit unsigned integer type. Range: 0 to 255
uint16
16-bit unsigned integer type. Range: 0 to 65,535
uint32
32-bit unsigned integer type. Range: 0 to 4,294,967,295
uint64
64-bit unsigned integer type. Range: 0 to 18,446,744,073,709,551,615
float32
32-bit IEEE 754 single-precision floating-point type. Precision: ~7 decimal digits
float64
64-bit IEEE 754 double-precision floating-point type. Precision: ~15-17 decimal digits
bool-type
Boolean type stored as bit-packed array. Values: true or false
string-type
UTF-8 encoded variable-length string type. Stored as offsets + data buffer for efficient slicing.
binary-type
Variable-length binary data type. Similar to string but without UTF-8 encoding requirement.
date-type
Date type representing days since Unix epoch (1970-01-01). Stored as 32-bit signed integer.
timestamp-type
Timestamp type representing microseconds since Unix epoch. Stored as 64-bit signed integer.
list-type
Creates a list type containing elements of the given type.
DataType -> DataType
# List of integers
int-list-type = Arrow.list-type Arrow.int64
# Nested list of strings
nested-type = Arrow.list-type (Arrow.list-type Arrow.string-type)struct-type
Creates a struct type with the given fields.
List Field -> DataType
point-type = Arrow.struct-type [
Arrow.field "x" Arrow.float64,
Arrow.field "y" Arrow.float64
]field
Creates a nullable field with the given name and data type. Nullable fields can contain None/null values in addition to values of the specified type.
String -> DataType -> Field
age-field = Arrow.field "age" Arrow.int32 # Can be Nonefield-not-null
Creates a non-nullable field with the given name and data type. Non-nullable fields must always contain a value; null is not permitted. Use for required columns that should never be missing.
String -> DataType -> Field
id-field = Arrow.field-not-null "id" Arrow.int64 # Requiredint64-array
Creates an Int64 array from a list of integers. Allocates a new Arrow array in columnar format with 64-bit signed integers.
List Int -> Result Array ArrowError
match Arrow.int64-array [1, 2, 3, 4, 5]
| Ok arr ->
print "Created array with ${Arrow.array-length arr} elements"
Arrow.free-array arr
| Err msg -> print "Error: ${msg}"float64-array
Creates a Float64 array from a list of floats. Allocates a new Arrow array in columnar format with 64-bit floating-point numbers.
List Float -> Result Array ArrowError
string-array
Creates a String array from a list of strings. Allocates a new Arrow array for UTF-8 encoded variable-length strings.
List String -> Result Array ArrowError
bool-array
Creates a Bool array from a list of booleans. Allocates a new Arrow array for boolean values stored as bit-packed data.
List Bool -> Result Array ArrowError
array-length
Returns the length of an array.
Array -> Int
get
Gets the value at the given index, returning None if out of bounds or null.
Array -> Int -> Option a
is-null?
Checks if the value at the given index is null.
Parameters:
- `arr- Array` - The Arrow array- `index- Int` - Zero-based index to check
Returns:
Array -> Int -> Bool
if Arrow.is-null? arr 5 then
print "Element 5 is null"null-count
Returns the count of null values in the array.
Parameters:
- `arr- Array` - The Arrow array
Returns:
Array -> Int
nulls = Arrow.null-count arr
print "Array has ${nulls} null values"to-list
Converts an array to a list of Option values.
Creates a Kit list from an Arrow array, preserving null information. Note: This copies all data from columnar to row format.
Parameters:
- `arr- Array` - The Arrow array to convert
Returns:
Array -> List (Option a)
values = Arrow.to-list arr
List.iter (fn(v) => match v | Some x -> print x | None -> print "null") valuesfree-array
Frees the memory used by an array.
Releases the native memory allocated for the Arrow array. Must be called when the array is no longer needed to prevent memory leaks.
Parameters:
- `arr- Array` - The Arrow array to free
Returns:
Warning: Do not use the array after calling this function.
Array -> Unit
schema
Creates a schema from a list of fields.
A schema defines the structure of a table or record batch by specifying the names, types, and nullability of columns.
Parameters:
- `fields- List Field` - Ordered list of field definitions
Returns: - `Ok Schema` on success- `Err String` if creation fails
List Field -> Result Schema ArrowError
match Arrow.schema [
Arrow.field "id" Arrow.int64,
Arrow.field "name" Arrow.string-type,
Arrow.field "price" Arrow.float64
]
| Ok s -> # use schema
| Err msg -> print "Error: ${msg}"free-schema
Frees the memory used by a schema.
Parameters:
- `s- Schema` - The schema to free
Returns:
Schema -> Unit
record-batch
Creates a record batch from a schema and list of column arrays.
A record batch is a contiguous chunk of columnar data. All arrays must have the same length and match the schema field types.
Parameters:
- `schema- Schema` - Schema defining column structure- `columns- List Array` - Arrays for each column (must match schema)
Returns:
Errors: - Returns Err if arrays have mismatched lengths - Returns Err if column count doesn't match schema
Schema -> List Array -> Result RecordBatch ArrowError
num-rows
Returns the number of rows in a record batch.
Parameters:
- `batch- RecordBatch` - The record batch
Returns:
RecordBatch -> Int
num-columns
Returns the number of columns in a record batch.
Parameters:
- `batch- RecordBatch` - The record batch
Returns:
RecordBatch -> Int
column
Gets a column from a record batch by index.
Parameters:
- `batch- RecordBatch` - The record batch- `index- Int` - Zero-based column index
Returns: - `Ok Array` with the column data- `Err String` if index out of bounds
RecordBatch -> Int -> Result Array ArrowError
free-batch
Frees the memory used by a record batch.
Parameters:
- `batch- RecordBatch` - The batch to free
Returns:
RecordBatch -> Unit
table
Creates a table from a schema and list of column arrays.
Tables are the primary data structure for analytical operations. Unlike RecordBatch, tables can be backed by multiple chunks.
Parameters:
- `schema- Schema` - Schema defining column structure- `columns- List Array` - Arrays for each column
Returns:
Schema -> List Array -> Result Table ArrowError
match Arrow.table schema [ids, names, prices]
| Ok tbl ->
print "Created table with ${Arrow.table-num-rows tbl} rows"
Arrow.free-table tbl
| Err msg -> print "Error: ${msg}"table-column
Gets a table column by index.
Parameters:
- `tbl- Table` - The table- `index- Int` - Zero-based column index
Returns:
Table -> Int -> Result Array ArrowError
table-column-by-name
Gets a table column by name.
Parameters:
- `tbl- Table` - The table- `name- String` - Column name
Returns: - `Err` if column name not found
Table -> String -> Result Array ArrowError
table-num-rows
Returns the number of rows in a table.
Parameters:
- `tbl- Table` - The table
Returns:
Table -> Int
table-num-columns
Returns the number of columns in a table.
Parameters:
- `tbl- Table` - The table
Returns:
Table -> Int
free-table
Frees the memory used by a table.
Parameters:
- `tbl- Table` - The table to free
Returns:
Table -> Unit
table-head
Returns the first n rows of a table.
Parameters:
- `n- Int` - Number of rows to return- `tbl- Table` - Source table
Returns:
Int -> Table -> Result Table ArrowError
match Arrow.table-head 10 tbl
| Ok top10 -> # process first 10 rowstable-tail
Returns the last n rows of a table.
Parameters:
- `n- Int` - Number of rows to return- `tbl- Table` - Source table
Returns:
Int -> Table -> Result Table ArrowError
slice
Slices a table from start index with given length.
Parameters:
- `start- Int` - Starting row index (0-based)- `length- Int` - Number of rows to include- `tbl- Table` - Source table
Returns:
Int -> Int -> Table -> Result Table ArrowError
# Get rows 100-199
match Arrow.slice 100 100 tbl
| Ok page -> # process pagesum
Calculates the sum of values in a numeric column.
Computes: $\sum_{i=0}^{n-1} x_i$
Parameters:
- `col-name- String` - Name of the numeric column- `tbl- Table` - Source table
Returns:
String -> Table -> Float
mean
Calculates the arithmetic mean of values in a numeric column.
Computes: $\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i$
Parameters:
- `col-name- String` - Name of the numeric column- `tbl- Table` - Source table
Returns:
String -> Table -> Float
min
Returns the minimum value in a numeric column.
Parameters:
- `col-name- String` - Name of the numeric column- `tbl- Table` - Source table
Returns:
String -> Table -> Float
max
Returns the maximum value in a numeric column.
Parameters:
- `col-name- String` - Name of the numeric column- `tbl- Table` - Source table
Returns:
String -> Table -> Float
count
Counts non-null values in a column.
Parameters:
- `col-name- String` - Name of the column- `tbl- Table` - Source table
Returns:
String -> Table -> Int
table-sort
Sorts a table by column in ascending order.
Parameters:
- `col-name- String` - Column to sort by- `tbl- Table` - Source table
Returns:
String -> Table -> Result Table ArrowError
match Arrow.table-sort "price" products
| Ok sorted -> # cheapest firsttable-sort-desc
Sorts a table by column in descending order.
Parameters:
- `col-name- String` - Column to sort by- `tbl- Table` - Source table
Returns:
String -> Table -> Result Table ArrowError
table-concat
Concatenates two tables vertically (row-wise union).
Tables must have compatible schemas.
Parameters:
- `tbl1- Table` - First table- `tbl2- Table` - Second table
Returns:
Table -> Table -> Result Table ArrowError
rename-column
Renames a column in a table.
Parameters:
- `old-name- String` - Current column name- `new-name- String` - New column name- `tbl- Table` - Source table
Returns:
String -> String -> Table -> Result Table ArrowError
column-name
Gets the column name at a given index.
Parameters:
- `index- Int` - Zero-based column index- `tbl- Table` - The table
Returns:
Int -> Table -> String
write-ipc
Writes a table to an Arrow IPC (Feather) file.
Arrow IPC is a binary format for efficient storage and transfer of Arrow data. IPC files preserve the exact schema and support memory-mapping for zero-copy reads.
Parameters:
- `tbl- Table` - Table to write- `path- String` - Output file path (typically .arrow or .feather extension)
Returns:
Table -> String -> Result Unit ArrowError
match Arrow.write-ipc tbl "data.arrow"
| Ok -> print "Saved successfully"
| Err msg -> print "Failed: ${msg}"read-ipc
Reads a table from an Arrow IPC (Feather) file.
Parameters:
- `path- String` - Path to the IPC file
Returns:
String -> Result Table ArrowError
match Arrow.read-ipc "data.arrow"
| Ok tbl ->
print "Loaded ${Arrow.table-num-rows tbl} rows"
Arrow.free-table tbl
| Err msg -> print "Failed: ${msg}"write-parquet
Writes a table to a Parquet file.
Parquet is a columnar storage format optimized for analytics workloads. It provides efficient compression and encoding for large datasets.
Parameters:
- `tbl- Table` - Table to write- `path- String` - Output file path (typically .parquet extension)
Returns:
Table -> String -> Result Unit ArrowError
match Arrow.write-parquet tbl "data.parquet"
| Ok -> print "Saved to Parquet"
| Err msg -> print "Failed: ${msg}"read-parquet
Reads a table from a Parquet file.
Parameters:
- `path- String` - Path to the Parquet file
Returns:
String -> Result Table ArrowError
match Arrow.read-parquet "data.parquet"
| Ok tbl -> # process table
| Err msg -> print "Failed: ${msg}"read-csv
Reads a CSV file into a table.
Automatically infers column types from the data. For explicit schema control, read the CSV and then cast columns as needed.
Parameters:
- `path- String` - Path to the CSV file
Returns:
String -> Result Table ArrowError
match Arrow.read-csv "data.csv"
| Ok tbl ->
print "Loaded CSV with ${Arrow.table-num-columns tbl} columns"
Arrow.free-table tbl
| Err msg -> print "Failed: ${msg}"