kreuzberg

Kreuzberg document extraction bindings for Kit

Files

File                          Description
.editorconfig                 Editor formatting configuration
.gitignore                    Git ignore rules for build artifacts and dependencies
.tool-versions                asdf tool versions (Zig, Kit)
LICENSE                       MIT license file
README.md                     This file
examples/basic.kit            Basic usage example
examples/batch.kit            Example: batch
examples/info.kit             Example: info
examples/ocr.kit              Example: ocr
examples/validation.kit       Example: validation
include/kreuzberg.h           Kreuzberg C FFI header
kit.toml                      Package manifest with metadata and dependencies
src/kreuzberg.kit             Kit wrapper API for Kreuzberg document extraction
tests/all-functions.test.kit  Tests for API surface coverage
tests/error-types.test.kit    Tests for Kreuzberg error variants
tests/kreuzberg.test.kit      Tests for package constants and helpers
tests/real-file.test.kit      Tests for real-file helper behavior
zig/kit_ffi.zig               Shared Kit Zig FFI helpers
zig/kreuzberg.zig             Zig FFI module for Kreuzberg

Dependencies

No Kit package dependencies.

This package requires the native libkreuzberg_ffi library at runtime. Install or build the Kreuzberg FFI library and make sure it can be found in one of the configured native library paths, such as /usr/local/lib or /opt/homebrew/lib.
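To confirm the native library is in place before running anything, a small shell check like the one below may help. It is a sketch, not part of this package: the function name find_kreuzberg_ffi is hypothetical, and it assumes the library file is named libkreuzberg_ffi.dylib (macOS) or libkreuzberg_ffi.so (Linux). A versioned or differently named build would need the suffix list adjusted.

```shell
# POSIX sh sketch (hypothetical helper, not shipped with the package).
# Assumption: the library uses the platform's usual shared-library suffix,
# .dylib on macOS or .so on Linux; adjust the list if your build differs.
find_kreuzberg_ffi() {
  # With no arguments, search the two example paths from this README.
  if [ "$#" -eq 0 ]; then
    set -- /usr/local/lib /opt/homebrew/lib
  fi
  for dir in "$@"; do
    for ext in dylib so; do
      candidate="$dir/libkreuzberg_ffi.$ext"
      if [ -f "$candidate" ]; then
        # Print the first match and report success.
        echo "$candidate"
        return 0
      fi
    done
  done
  # Nothing found: explain on stderr and return a non-zero status.
  echo "libkreuzberg_ffi not found in: $*" >&2
  return 1
}
```

Calling find_kreuzberg_ffi with no arguments searches /usr/local/lib and /opt/homebrew/lib. Pass other directories to search those instead. The function prints the first match and returns a non-zero status if nothing is found.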

Installation

kit add gitlab.com/kit-lang/packages/kit-kreuzberg.git

Usage

import Kit.Kreuzberg as Kreuzberg

main = fn =>
  # Show the native Kreuzberg library version, if available.
  match Kreuzberg.version
    | Some version -> println "Kreuzberg version: ${version}"
    | None -> println "Kreuzberg version: not available"

  # Extract plain text from a document path.
  match Kreuzberg.extract-file "document.pdf"
    | Ok content ->
        println "Content length: ${String.length content} characters"
    | Err e ->
        println "Extraction failed: ${e}"

  # Extract full document metadata.
  match Kreuzberg.extract-file-full "document.pdf"
    | Ok doc ->
        println "MIME type: ${doc.mime-type}"
        println "Content length: ${String.length doc.content} characters"
    | Err e ->
        println "Full extraction failed: ${e}"

  # Detect MIME types from file paths or byte prefixes.
  match Kreuzberg.detect-mime "document.pdf"
    | Some mime -> println "document.pdf: ${mime}"
    | None -> println "document.pdf: unknown MIME type"

  match Kreuzberg.detect-mime-bytes "%PDF-1.4"
    | Some mime -> println "PDF bytes: ${mime}"
    | None -> println "Could not detect MIME type"

  # Validate MIME types before extracting.
  match Kreuzberg.validate-mime-type "application/pdf"
    | Some canonical -> println "Canonical MIME type: ${canonical}"
    | None -> println "Invalid MIME type"

  # Extract from raw bytes with an explicit MIME type.
  sample = "Hello from Kit and Kreuzberg."
  match Kreuzberg.extract-bytes sample "text/plain"
    | Ok content -> println "Extracted from bytes: ${String.length content} chars"
    | Err e -> println "Bytes extraction failed: ${e}"

main

Development

Running Examples

Run examples with the interpreter:

kit run examples/basic.kit --allow=ffi --allow=file
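The same command can be repeated for every file in examples/. The loop below is a hypothetical convenience wrapper, not a task defined by this package. It only reuses the kit run invocation and permission flags from the command above, and it keeps going after a failure so that one broken example does not hide the others.

```shell
# POSIX sh sketch (hypothetical wrapper, not a task defined in kit.toml).
# Runs each examples/*.kit file under the interpreter with the same
# --allow flags as the single-example command, then reports overall status.
run_all_examples() {
  dir="${1:-examples}"
  status=0
  for example in "$dir"/*.kit; do
    # If there are no .kit files, the glob stays literal; skip it.
    [ -e "$example" ] || continue
    echo "==> $example"
    if ! kit run "$example" --allow=ffi --allow=file; then
      echo "FAILED: $example" >&2
      status=1
    fi
  done
  # Non-zero if any example failed.
  return "$status"
}
```

Run it from the package root as run_all_examples, or pass a different directory as the first argument.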

Compile examples to a native binary:

kit build examples/basic.kit --allow=ffi --allow=file && ./basic

Running Tests

Run the test suite:

kit test --allow=ffi --allow=file

Run the test suite with coverage:

kit test --coverage --allow=ffi --allow=file

Running kit dev

Run the standard development workflow (format, check, test):

kit dev

This will:

  1. Format and check source files in src/
  2. Run tests in tests/ with coverage

Running Parity Checks

Check that the examples behave the same under the interpreter and the compiler:

kit parity --no-spinner --failures-only

Generating Documentation

Generate API documentation from doc comments:

kit doc

Note: Kit sources with doc comments (##) generate HTML documentation in docs/*.html.

Cleaning Build Artifacts

Remove generated files, caches, and build artifacts:

kit task clean

Note: The clean task is defined in kit.toml.

Local Installation

To install this package locally for development:

kit install

This installs the package to ~/.kit/packages/@kit/kreuzberg/, making it available for import as Kit.Kreuzberg in other projects.

License

This package is released under the MIT License. See LICENSE for details.

Exported Functions & Types

KreuzbergError

Error type for Kreuzberg document extraction operations. Variants distinguish between extraction, configuration, and unsupported format errors.

Variants

KreuzbergExtractError {message}
KreuzbergConfigError {message}
KreuzbergUnsupportedError {message}