parquet
| Kind | ffi-c |
|---|---|
| Capabilities | ffi file |
| Categories | data-format analytics ffi |
| Keywords | parquet columnar storage analytics arrow |
Apache Parquet columnar storage format for Kit
Files
| File | Description |
|---|---|
| .editorconfig | Editor formatting configuration |
| .gitignore | Git ignore rules for build artifacts and dependencies |
| .tool-versions | asdf tool versions (Zig, Kit) |
| LICENSE | MIT license file |
| README.md | This file |
| c/kit_parquet.cpp | C++ FFI wrapper for Apache Parquet |
| c/kit_parquet.h | C header for FFI wrapper |
| examples/analytics.kit | Example: analytics with Arrow tables |
| examples/basic.kit | Basic usage example |
| examples/read.kit | Example: reading Parquet files |
| examples/sample.parquet | Sample Parquet file for examples |
| examples/write.kit | Example: writing Parquet files |
| kit.toml | Package manifest with metadata and dependencies |
| src/parquet.kit | kit-parquet: Apache Parquet columnar storage for Kit |
| tests/parquet.test.kit | Tests for parquet |
Dependencies
No Kit package dependencies.
Native Dependencies
Apache Arrow C++ library with Parquet support is required:
| Platform | Install Command |
|---|---|
| macOS | brew install apache-arrow |
| Ubuntu | sudo apt install libarrow-dev libparquet-dev |
| Fedora | sudo dnf install libarrow-devel parquet-libs-devel |
Installation

    kit add gitlab.com/kit-lang/packages/kit-parquet.git

Usage

    import Kit.Parquet as Parquet

    main = fn(-env: Env) =>
      # Check if a file is valid Parquet
      if Parquet.is-parquet? "data.parquet" then
        println "Valid Parquet file"
        Parquet.summary "data.parquet"
      else
        println "Not a valid Parquet file"

      # Open and inspect a Parquet file
      match Parquet.open "data.parquet"
        | Ok reader ->
          defer Parquet.close reader
          meta = Parquet.metadata reader
          println "Rows: ${show meta.num-rows}"
          println "Columns: ${show meta.num-columns}"
          println "Row Groups: ${show meta.num-row-groups}"

          # Read column names
          names = Parquet.column-names reader
          println "Column names: ${show names}"

          # Read entire file as Arrow table
          match Parquet.read-table reader
            | Ok table-ptr -> println "Read table successfully"
            | Err e -> println "Read error: ${show e}"
        | Err e ->
          println "Could not open file: ${show e}"

    main

Development
Running Examples
Run examples with the interpreter:

    kit run examples/basic.kit

Compile examples to a native binary:

    kit build examples/basic.kit && ./basic

Running Tests
Run the test suite:

    kit test

Run the test suite with coverage:

    kit test --coverage

Running kit dev
Run the standard development workflow (format, check, test):

    kit dev

This will:
- Format and check source files in src/
- Run tests in tests/ with coverage
Generating Documentation
Generate API documentation from doc comments:

    kit doc

Note: Kit sources with doc comments (##) will generate HTML documents in docs/*.html
Cleaning Build Artifacts
Remove generated files, caches, and build artifacts:

    kit task clean

Note: Defined in kit.toml.
Local Installation
To install this package locally for development:

    kit install

This installs the package to ~/.kit/packages/@kit/parquet/, making it available for import as Kit.Parquet in other projects.
License
This package is released under the MIT License - see LICENSE for details.
Apache Parquet is an open-source columnar storage format maintained by the Apache Software Foundation under the Apache License 2.0.
Exported Functions & Types
ParquetError
Parquet error type with specific variants for different failure modes.
Variants
- ParquetReadError {message}
- ParquetWriteError {message}
- ParquetAccessError {message}
uncompressed
Compression
snappy
Compression
gzip
Compression
lz4
Compression
zstd
Compression
brotli
Compression
default-options
WriteOptions
options
WriteOptions
with-compression
WriteOptions -> Compression -> WriteOptions
with-row-group-size
WriteOptions -> PositiveInt -> WriteOptions
with-metadata
WriteOptions -> NonEmptyString -> String -> WriteOptions
open
Open a Parquet file for reading
String -> Result Reader ParquetError
close
Close a Parquet reader
Reader -> Void
read-table
Read entire file as Arrow table
Reader -> Result Ptr ParquetError
read-row-group
Read a specific row group as Arrow record batch
Reader -> NonNegativeInt -> Result Ptr ParquetError
read-column
Read a specific column
Reader -> Int -> Result Ptr ParquetError
read
Read entire Parquet file into Arrow table (convenience function)
NonEmptyString -> Result Ptr ParquetError
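A minimal sketch of the convenience form, following the style of the Usage example. It assumes a file named data.parquet exists in the working directory, and the indentation of the match arms is a guess at Kit's layout. Because `read` takes a path, there is no reader to open or close.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  # Read the whole file in one call (path is an assumed example file)
  match Parquet.read "data.parquet"
    | Ok table-ptr -> println "Read table successfully"
    | Err e -> println "Read error: ${show e}"

main
```

The `Ok` value is an Arrow table pointer (`Ptr`), the same kind of value `read-table` returns for an open reader.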
read-rows
Read with row selection
NonEmptyString -> Int -> NonNegativeInt -> Result Ptr ParquetError
read-columns
Read specific columns only
NonEmptyString -> List String -> Result Ptr ParquetError
create-writer
Create a Parquet writer with Arrow schema
NonEmptyString -> Ptr -> Result Writer ParquetError
create-writer-with-options
Create writer with options
NonEmptyString -> Ptr -> WriteOptions -> Result Writer ParquetError
write-table
Write Arrow table to Parquet
Writer -> Ptr -> Result () ParquetError
write-batch
Write Arrow record batch to Parquet
Writer -> Ptr -> Result () ParquetError
close-writer
Close writer and finalize file
Writer -> Result () ParquetError
write
Write Arrow table to Parquet file (convenience function)
NonEmptyString -> Ptr -> Ptr -> Result () ParquetError
write-with-options
Write with options
NonEmptyString -> Ptr -> Ptr -> WriteOptions -> Result () ParquetError
metadata
Get file metadata
Reader -> FileMetadata
num-rows
Get number of rows
Reader -> Int
num-row-groups
Get number of row groups
Reader -> Int
num-columns
Get number of columns
Reader -> Int
column-name
Get column name by index
Reader -> NonNegativeInt -> Option String
column-names
Get all column names
Reader -> List String
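The accessors above can be combined on a single open reader. This is a sketch only, modeled on the Usage example; data.parquet is an assumed file name and the layout of the match arms is a guess.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  match Parquet.open "data.parquet"
    | Ok reader ->
      # Release the reader when this branch finishes
      defer Parquet.close reader

      # Each accessor takes the reader and returns a plain value
      rows = Parquet.num-rows reader
      groups = Parquet.num-row-groups reader
      cols = Parquet.num-columns reader
      names = Parquet.column-names reader

      println "Rows: ${show rows}"
      println "Row Groups: ${show groups}"
      println "Columns: ${show cols}"
      println "Column names: ${show names}"
    | Err e ->
      println "Could not open file: ${show e}"

main
```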
get-metadata
Get custom metadata value
Reader -> NonEmptyString -> Option String
row-group-metadata
Get row group metadata
Reader -> NonNegativeInt -> Result RowGroupMetadata ParquetError
column-stats
Get column statistics for a row group
Reader -> Int -> Int -> Result {min: Int, max: Int, null-count: Int, distinct-count: Int} ParquetError
schema
Get Arrow schema from Parquet file
Reader -> Result Ptr ParquetError
column-descriptor
Get column descriptor
Reader -> NonNegativeInt -> Result ColumnDescriptor ParquetError
is-parquet?
Check if file is a valid Parquet file
NonEmptyString -> Bool
file-info
Get Parquet file size info
NonEmptyString -> Result {path: String, num-rows: Int, num-row-groups: Int, num-columns: Int, columns: List String} ParquetError
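As a rough illustration, `file-info` can be matched like the other `Result`-returning functions. The field names come from the signature above; the file name and the code layout are assumptions based on the Usage example.

```kit
import Kit.Parquet as Parquet

main = fn(-env: Env) =>
  match Parquet.file-info "data.parquet"
    | Ok info ->
      # Fields as listed in the record type of the signature
      println "Path: ${show info.path}"
      println "Rows: ${show info.num-rows}"
      println "Row Groups: ${show info.num-row-groups}"
      println "Columns: ${show info.num-columns}"
      println "Column names: ${show info.columns}"
    | Err e ->
      println "Could not read file info: ${show e}"

main
```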
summary
Print file summary
NonEmptyString -> Void