parquet

Apache Parquet columnar storage format for Kit

Files

File                     Description
.editorconfig            Editor formatting configuration
.gitignore               Git ignore rules for build artifacts and dependencies
.tool-versions           asdf tool versions (Zig, Kit)
LICENSE                  MIT license file
README.md                This file
c/kit_parquet.cpp        C++ FFI wrapper for Apache Parquet
c/kit_parquet.h          C header for FFI wrapper
examples/analytics.kit   Example: analytics with Arrow tables
examples/basic.kit       Basic usage example
examples/read.kit        Example: reading Parquet files
examples/sample.parquet  Sample Parquet file for examples
examples/write.kit       Example: writing Parquet files
kit.toml                 Package manifest with metadata and dependencies
src/parquet.kit          kit-parquet: Apache Parquet columnar storage for Kit
tests/parquet.test.kit   Tests for parquet

Dependencies

No Kit package dependencies.

Native Dependencies

Apache Arrow C++ library with Parquet support is required:

Platform  Install Command
macOS     brew install apache-arrow
Ubuntu    sudo apt install libarrow-dev libparquet-dev
Fedora    sudo dnf install libarrow-devel parquet-libs-devel

Installation

kit add gitlab.com/kit-lang/packages/kit-parquet.git

Usage

import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  # Check if a file is valid Parquet
  if Parquet.is-parquet? "data.parquet" then
    println "Valid Parquet file"
    Parquet.summary "data.parquet"
  else
    println "Not a valid Parquet file"

  # Open and inspect a Parquet file
  match Parquet.open "data.parquet"
    | Ok reader ->
      defer Parquet.close reader

      meta = Parquet.metadata reader
      println "Rows: ${show meta.num-rows}"
      println "Columns: ${show meta.num-columns}"
      println "Row Groups: ${show meta.num-row-groups}"

      # Read column names
      names = Parquet.column-names reader
      println "Column names: ${show names}"

      # Read entire file as Arrow table
      match Parquet.read-table reader
        | Ok table-ptr -> println "Read table successfully"
        | Err e -> println "Read error: ${show e}"

    | Err e ->
      println "Could not open file: ${show e}"

main

Development

Running Examples

Run examples with the interpreter:

kit run examples/basic.kit

Compile examples to a native binary:

kit build examples/basic.kit && ./basic

Running Tests

Run the test suite:

kit test

Run the test suite with coverage:

kit test --coverage

Running kit dev

Run the standard development workflow (format, check, test):

kit dev

This will:

  1. Format and check source files in src/
  2. Run tests in tests/ with coverage

Generating Documentation

Generate API documentation from doc comments:

kit doc

Note: Doc comments (##) in Kit sources are rendered to HTML files under docs/.

Cleaning Build Artifacts

Remove generated files, caches, and build artifacts:

kit task clean

Note: The clean task is defined in kit.toml.

Local Installation

To install this package locally for development:

kit install

This installs the package to ~/.kit/packages/@kit/parquet/, making it available for import as Kit.Parquet in other projects.

License

This package is released under the MIT License - see LICENSE for details.

Apache Parquet is an open-source columnar storage format maintained by the Apache Software Foundation under the Apache License 2.0.

Exported Functions & Types

ParquetError

Parquet error type with specific variants for different failure modes.

Variants

ParquetReadError {message}
ParquetWriteError {message}
ParquetAccessError {message}

uncompressed

Compression

snappy

Compression

gzip

Compression

lz4

Compression

zstd

Compression

brotli

Compression

default-options

WriteOptions

options

WriteOptions

with-compression

WriteOptions -> Compression -> WriteOptions

with-row-group-size

WriteOptions -> PositiveInt -> WriteOptions

with-metadata

WriteOptions -> NonEmptyString -> String -> WriteOptions
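
Each of the option builders above takes a WriteOptions value and returns an updated one, so options can be layered through intermediate bindings. The following is a minimal sketch, not taken from the package's examples. It assumes that plain integer and string literals are accepted where PositiveInt and NonEmptyString are expected, and that the two string arguments of with-metadata are key then value (the signature does not name them). The row group size and the metadata entry are illustrative values only.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  # Start from the package defaults
  base = Parquet.default-options

  # Each builder returns a new WriteOptions value
  compressed = Parquet.with-compression base Parquet.zstd
  sized = Parquet.with-row-group-size compressed 65536

  # Assumption: arguments are key, then value
  opts = Parquet.with-metadata sized "created-by" "kit-parquet example"

  println "Write options built"

main
```

The resulting opts value has the type expected by create-writer-with-options and write-with-options.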

open

Open a Parquet file for reading

String -> Result Reader ParquetError

close

Close a Parquet reader

Reader -> Void

read-table

Read entire file as Arrow table

Reader -> Result Ptr ParquetError

read-row-group

Read a specific row group as Arrow record batch

Reader -> NonNegativeInt -> Result Ptr ParquetError

read-column

Read a specific column

Reader -> Int -> Result Ptr ParquetError

read

Read entire Parquet file into Arrow table (convenience function)

NonEmptyString -> Result Ptr ParquetError

read-rows

Read with row selection

NonEmptyString -> Int -> NonNegativeInt -> Result Ptr ParquetError

read-columns

Read specific columns only

NonEmptyString -> List String -> Result Ptr ParquetError
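
As an illustration of read-columns, the sketch below requests two columns by name. The column names "id" and "name" are hypothetical, and the bracketed list literal is assumed to be valid Kit list syntax; in practice the names would come from Parquet.column-names.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  # Hypothetical column names; replace with names present in the file
  match Parquet.read-columns "data.parquet" ["id", "name"]
    | Ok table-ptr -> println "Read selected columns"
    | Err e -> println "Read error: ${show e}"

main
```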

create-writer

Create a Parquet writer with Arrow schema

NonEmptyString -> Ptr -> Result Writer ParquetError

create-writer-with-options

Create writer with options

NonEmptyString -> Ptr -> WriteOptions -> Result Writer ParquetError

write-table

Write Arrow table to Parquet

Writer -> Ptr -> Result () ParquetError

write-batch

Write Arrow record batch to Parquet

Writer -> Ptr -> Result () ParquetError

close-writer

Close writer and finalize file

Writer -> Result () ParquetError
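
The writer functions above can be combined with the reader functions to copy a file: read the schema and table from one Parquet file, then write them to another. This is a sketch under assumptions rather than the package's own example: the output path copy.parquet is hypothetical, and it assumes that show works on Result values. It uses only the signatures listed in this section.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  match Parquet.open "examples/sample.parquet"
    | Ok reader ->
      defer Parquet.close reader

      match Parquet.schema reader
        | Ok schema-ptr ->
          match Parquet.read-table reader
            | Ok table-ptr ->
              # Hypothetical output path
              match Parquet.create-writer "copy.parquet" schema-ptr
                | Ok writer ->
                  # Write the table, then close the writer to finalize the file
                  write-result = Parquet.write-table writer table-ptr
                  close-result = Parquet.close-writer writer
                  println "Write: ${show write-result}"
                  println "Close: ${show close-result}"
                | Err e -> println "Could not create writer: ${show e}"

            | Err e -> println "Read error: ${show e}"

        | Err e -> println "Schema error: ${show e}"

    | Err e ->
      println "Could not open file: ${show e}"

main
```

The writer is closed even if write-table returns an error, since close-writer is what finalizes the file.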

write

Write Arrow table to Parquet file (convenience function)

NonEmptyString -> Ptr -> Ptr -> Result () ParquetError

write-with-options

Write with options

NonEmptyString -> Ptr -> Ptr -> WriteOptions -> Result () ParquetError

metadata

Get file metadata

Reader -> FileMetadata

num-rows

Get number of rows

Reader -> Int

num-row-groups

Get number of row groups

Reader -> Int

num-columns

Get number of columns

Reader -> Int

column-name

Get column name by index

Reader -> NonNegativeInt -> Option String

column-names

Get all column names

Reader -> List String

get-metadata

Get custom metadata value

Reader -> NonEmptyString -> Option String

row-group-metadata

Get row group metadata

Reader -> NonNegativeInt -> Result RowGroupMetadata ParquetError

column-stats

Get column statistics for a row group

Reader -> Int -> Int -> Result {min: Int, max: Int, null-count: Int, distinct-count: Int} ParquetError
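
A short sketch of reading the statistics record returned by column-stats. The signature does not say which Int is the row group index and which is the column index, so this example passes 0 for both (first row group, first column), where the order does not matter. The field names come from the record type in the signature.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  match Parquet.open "data.parquet"
    | Ok reader ->
      defer Parquet.close reader

      # Row group 0, column 0
      match Parquet.column-stats reader 0 0
        | Ok stats ->
          println "Min: ${show stats.min}"
          println "Max: ${show stats.max}"
          println "Nulls: ${show stats.null-count}"
          println "Distinct: ${show stats.distinct-count}"
        | Err e -> println "Stats error: ${show e}"

    | Err e ->
      println "Could not open file: ${show e}"

main
```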

schema

Get Arrow schema from Parquet file

Reader -> Result Ptr ParquetError

column-descriptor

Get column descriptor

Reader -> NonNegativeInt -> Result ColumnDescriptor ParquetError

is-parquet?

Check if file is a valid Parquet file

NonEmptyString -> Bool

file-info

Get summary info for a Parquet file: path, row count, row group count, column count, and column names

NonEmptyString -> Result {path: String, num-rows: Int, num-row-groups: Int, num-columns: Int, columns: List String} ParquetError

summary

Print file summary

NonEmptyString -> Void