opencl

Cross-platform GPU computing via OpenCL (works on NVIDIA, AMD, Intel, and Apple GPUs)

Works with OpenCL implementations from Apple, Intel, AMD, NVIDIA, Mesa, and ROCm.

Files

| File | Description |
| --- | --- |
| .editorconfig | Editor formatting configuration |
| .gitignore | Git ignore rules for build artifacts and generated files |
| .tool-versions | asdf tool versions (Zig, Kit) |
| LICENSE | MIT license file |
| README.md | This file |
| examples/basic.kit | Basic OpenCL platform discovery example |
| kit.toml | Package manifest, native linker settings, and development tasks |
| src/main.kit | Public Kit API for OpenCL platform, device, memory, program, and kernel operations |
| tests/opencl.test.kit | Kit-side API and type tests |
| zig/kit_ffi.zig | Local Kit FFI helper definitions used by the Zig bindings |
| zig/opencl.zig | Zig FFI implementation that calls the OpenCL C API |
| zig/opencl_fixed.zig | Alternate union-based Zig FFI helper implementation kept for compatibility reference |

Dependencies

No Kit package dependencies.

This package requires Kit's FFI capability and a native OpenCL runtime.

Requirements

| Platform | Requirement |
| --- | --- |
| macOS | OpenCL.framework, included with macOS |
| Linux | OpenCL ICD loader, OpenCL headers, and a vendor driver/runtime |

Linux package examples:

| Distribution | Command |
| --- | --- |
| Ubuntu/Debian | sudo apt install ocl-icd-opencl-dev |
| Fedora | sudo dnf install ocl-icd-devel |
| Arch | sudo pacman -S ocl-icd |

Install one driver/runtime for your hardware:

| Vendor | Package/runtime |
| --- | --- |
| NVIDIA | nvidia-opencl-icd |
| AMD | mesa-opencl-icd or rocm-opencl-runtime |
| Intel | intel-opencl-icd |

Installation

kit add gitlab.com/kit-lang/packages/kit-opencl.git

Usage

import Opencl as CL

main = fn =>
  match CL.get-platforms 0
    | Err e ->
      println "Failed to enumerate OpenCL platforms: ${show e}"
    | Ok platforms ->
      println "Found ${length platforms} OpenCL platform(s)"
      show-platforms platforms 1

show-platforms = fn(platforms, idx) =>
  match platforms
    | [] -> no-op
    | [platform | rest] ->
      match CL.platform-info platform
        | Ok info ->
          println "Platform ${idx}: ${info.name}"
          println "  Vendor: ${info.vendor}"
          println "  Version: ${info.version}"
        | Err e ->
          println "Platform ${idx}: ${show e}"

      show-platforms rest (idx + 1)

main

get-platforms currently takes a placeholder argument of 0. The placeholder keeps the API identical in the interpreter and the compiled backend until Kit's auto-call path for zero-arity imported extern-zig functions is settled.

To run kernels, discover a device with get-devices, create a context and queue, create buffers, build an OpenCL C program, create a kernel, set kernel arguments, enqueue work, finish the queue, and read results back:

# Given a selected device:
ctx = CL.create-context [device] |> Result.unwrap
defer CL.release-context ctx

queue = CL.create-queue ctx device |> Result.unwrap
defer CL.release-queue queue

source = "__kernel void scale(__global float* data) { int i = get_global_id(0); data[i] = data[i] * 2.0f; }"
program = CL.create-program ctx source |> Result.unwrap
defer CL.release-program program

CL.build-program program [device] "" |> Result.unwrap

kernel = CL.create-kernel program "scale" |> Result.unwrap
defer CL.release-kernel kernel
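
The remaining steps (buffers, kernel arguments, enqueue, finish, read back) might look like the sketch below. It continues from the ctx, queue, and kernel bindings above and uses only functions listed under API Overview. Three things are assumptions rather than facts from this README: that create-buffer takes its size in bytes (as the OpenCL C API does), that the MemFlags constructor is reachable as CL.ReadWrite, and that unwrapping every Result is acceptable for a short example.

```kit
# Sketch only: continues from ctx, queue, and kernel above.
input = [1.0, 2.0, 3.0, 4.0]
count = length input

# Assumption: create-buffer takes a size in bytes (4 bytes per f32 element).
# Assumption: the ReadWrite constructor is exported under the CL alias.
buffer = CL.create-buffer ctx (count * 4) CL.ReadWrite |> Result.unwrap
defer CL.release-buffer buffer

CL.write-buffer-f32 queue buffer input |> Result.unwrap

# Argument 0 of scale is the __global float* data pointer.
CL.set-arg-buffer kernel 0 buffer |> Result.unwrap

# One work-item per element; [] leaves the local size to the runtime.
CL.enqueue-kernel queue kernel [count] [] |> Result.unwrap
CL.finish queue |> Result.unwrap

result = CL.read-buffer-f32 queue buffer count |> Result.unwrap
println "Scaled: ${show result}"
```

The kernel in this snippet has no bounds check, so the global size should match the number of elements in the buffer exactly.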

API Overview

Platform and Device Discovery

| Function | Description |
| --- | --- |
| get-platforms 0 | Get all OpenCL platforms |
| get-devices platform device-type | Get devices of a type from a platform |
| platform-info platform | Get platform details: name, vendor, version, profile |
| device-info device | Get device details: name, type, compute units, memory, clock |
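
As a rough illustration of how these discovery functions chain together, the sketch below looks for GPU devices on the first platform. It reuses the match and list-pattern style from the Usage example. The function name find-gpus is hypothetical, and the sketch assumes the DeviceType constructor is reachable as CL.DeviceGPU.

```kit
import Opencl as CL

# Sketch: report how many GPU devices the first platform exposes.
# find-gpus is a hypothetical name, not part of the package.
find-gpus = fn =>
  match CL.get-platforms 0
    | Err e ->
      println "Failed to enumerate OpenCL platforms: ${show e}"
    | Ok platforms ->
      match platforms
        | [] -> println "No OpenCL platforms installed"
        | [platform | rest] ->
          # Assumption: DeviceGPU is exported under the CL alias.
          match CL.get-devices platform CL.DeviceGPU
            | Err e ->
              println "No GPU devices: ${show e}"
            | Ok devices ->
              println "Found ${length devices} GPU device(s)"

find-gpus
```

Swapping DeviceGPU for DeviceAll would list every device type, which is a safer default on hosts where only a CPU runtime is installed.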

Context and Queue

| Function | Description |
| --- | --- |
| create-context devices | Create a context for one or more devices |
| release-context ctx | Release context resources |
| create-queue ctx device | Create a command queue for a device |
| release-queue queue | Release queue resources |
| flush queue | Flush queued commands without blocking |
| finish queue | Block until queued commands complete |

Memory

| Function | Description |
| --- | --- |
| create-buffer ctx size flags | Create a device buffer |
| release-buffer buffer | Release a buffer |
| write-buffer-f32 queue buffer data | Write List Float data |
| read-buffer-f32 queue buffer count | Read float data |
| write-buffer-i32 queue buffer data | Write List Int data |
| read-buffer-i32 queue buffer count | Read int data |

Program and Kernel

| Function | Description |
| --- | --- |
| create-program ctx source | Create a program from OpenCL C source |
| build-program program devices options | Compile a program for devices |
| release-program program | Release program resources |
| create-kernel program name | Create a kernel from a built program |
| release-kernel kernel | Release kernel resources |
| set-arg-buffer kernel index buffer | Set a buffer argument |
| set-arg-int kernel index value | Set an integer argument |
| set-arg-float kernel index value | Set a float argument |
| enqueue-kernel queue kernel global-size local-size | Enqueue kernel execution |

Types

type DeviceType =
  | DeviceCPU
  | DeviceGPU
  | DeviceAccelerator
  | DeviceAll

type MemFlags =
  | ReadWrite
  | ReadOnly
  | WriteOnly

type Platform = Platform {id: Int}
type Device = Device {id: Int}
type Context = Context {handle: Int}
type CommandQueue = CommandQueue {handle: Int}
type Buffer = Buffer {handle: Int, size: Int}
type Program = Program {handle: Int}
type Kernel = Kernel {handle: Int}

Most operations return Result value OpenCLError:

type OpenCLError =
  | OpenCLError {code: Int, message: String}
  | NoPlatforms {message: String}
  | NoDevices {message: String}
  | BuildError {message: String, log: String}
  | InvalidArgument {message: String}
  | OutOfMemory {message: String}
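
Because most operations return a Result, a caller can match on the outcome instead of unwrapping it. The sketch below does this for build-program, where failures are most likely to come from the kernel source. It assumes program and device are already bound, and that `_` is accepted as a wildcard pattern; the README does not show one.

```kit
# Sketch: handle a build failure without aborting.
# Assumes program and device were created as in the Usage section.
match CL.build-program program [device] ""
  | Ok _ ->
    println "Build succeeded"
  | Err e ->
    # For a BuildError, showing the value should include the compiler log.
    println "Build failed: ${show e}"
```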

OpenCL Kernel Notes

OpenCL kernels are written in OpenCL C:

__kernel void scale(__global float* data, const int n) {
    int i = get_global_id(0);
    if (i < n) {
        data[i] = data[i] * 2.0f;
    }
}
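
A host-side sketch for launching this two-argument kernel follows. It assumes queue, kernel, and a buffer holding at least n floats already exist, and that the device accepts a local size of 64; check max-work-group-size from device-info before relying on that.

```kit
# Sketch: bind both arguments of scale(data, n), then launch.
n = 1000

# Argument 0 is the __global float* data pointer.
CL.set-arg-buffer kernel 0 buffer |> Result.unwrap
# Argument 1 is the const int n element count.
CL.set-arg-int kernel 1 n |> Result.unwrap

# 1024 is n rounded up to a multiple of the local size 64.
# The if (i < n) guard makes the 24 extra work-items do nothing.
CL.enqueue-kernel queue kernel [1024] [64] |> Result.unwrap
CL.finish queue |> Result.unwrap
```

Passing n explicitly is what allows the global size to exceed the element count safely.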

Common OpenCL C functions:

| Function | Description |
| --- | --- |
| get_global_id(dim) | Global work-item ID in a dimension |
| get_local_id(dim) | Local work-item ID within a work-group |
| get_group_id(dim) | Work-group ID |
| get_global_size(dim) | Total number of work-items |
| get_local_size(dim) | Work-items per work-group |
| barrier(CLK_LOCAL_MEM_FENCE) | Synchronize work-items in a work-group |
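
The ID functions are related to one another. Assuming a zero global offset and uniform work-groups, the global ID in dimension d can be reconstructed from the group and local IDs:

```latex
\[
\mathtt{get\_global\_id}(d)
  = \mathtt{get\_group\_id}(d) \cdot \mathtt{get\_local\_size}(d)
  + \mathtt{get\_local\_id}(d)
\]
% Worked example: group 2, local size 64, local ID 5
\[
2 \cdot 64 + 5 = 133
\]
```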

Development

Running Examples

Run examples with the interpreter:

kit run examples/basic.kit

Compile examples to a native binary:

kit build examples/basic.kit && ./basic

Running Tests

Run the test suite:

kit test

Run the test suite with coverage:

kit test --coverage

Running kit dev

Run the standard development workflow (format, check, test):

kit dev

This will:

  1. Format and check source files in src/
  2. Check examples in examples/
  3. Run tests in tests/ with coverage

Checking Interpreter/Compiler Parity

Run the package parity check:

kit parity --no-spinner --failures-only

This builds and runs each example through both the interpreter and compiler, then compares output. Keep examples deterministic; OpenCL device enumeration can differ by backend, driver, and host configuration.

Generating Documentation

Generate API documentation from doc comments:

kit doc

Note: kit doc generates HTML files in docs/ from Kit sources that contain doc comments (##).

Cleaning Build Artifacts

Remove generated files, caches, and build artifacts:

kit task clean

Note: The clean task is defined in kit.toml.

Local Installation

To install this package locally for development:

kit install

This installs the package to ~/.kit/packages/@kit/opencl/, making it available for import as Opencl in other projects.

Troubleshooting

No platforms found

Make sure the OpenCL ICD loader and at least one vendor runtime are installed. On Linux, clinfo is useful for checking whether the system can see OpenCL platforms and devices.

On macOS, kit.toml links OpenCL.framework through the [native] section. On Linux, make sure the OpenCL headers and ICD loader development package are installed.

Different devices between interpreter and compiled binary

OpenCL platform and device visibility can depend on process environment, driver behavior, and backend-specific native linking. Prefer deterministic examples and tests that do not assume a specific GPU is present.

License

This package is released under the MIT License - see LICENSE for details.

Exported Functions & Types

OpenCLError

Error types for OpenCL operations.

Variants

OpenCLError {code, message}
OpenCL API call failed
NoPlatforms {message}
No platforms found
NoDevices {message}
No devices found
BuildError {message, log}
Program build failed
InvalidArgument {message}
Invalid argument
OutOfMemory {message}
Out of memory

Platform

An OpenCL platform (a vendor implementation such as Intel or NVIDIA)

Variants

Platform {id}

Device

An OpenCL device (GPU, CPU, or accelerator)

Variants

Device {id}

DeviceType

Device type enumeration

Variants

DeviceCPU
DeviceGPU
DeviceAccelerator
DeviceAll

PlatformInfo

Information about an OpenCL platform.

Variants

PlatformInfo {name, vendor, version, profile}

DeviceInfo

Information about an OpenCL device.

Variants

DeviceInfo {name, vendor, version, device-type, max-compute-units, max-work-group-size, global-mem-size, local-mem-size, max-clock-freq}

Context

An OpenCL context (execution environment)

Variants

Context {handle}

CommandQueue

An OpenCL command queue

Variants

CommandQueue {handle}

Buffer

An OpenCL memory buffer

Variants

Buffer {handle, size}

Program

An OpenCL program (compiled kernel code)

Variants

Program {handle}

Kernel

An OpenCL kernel (function to execute)

Variants

Kernel {handle}

Event

An OpenCL event (for synchronization)

Variants

Event {handle}

get-platforms

Returns all available OpenCL platforms. Pass 0 as the placeholder argument.

Int -> Result [Platform] OpenCLError

get-devices

Returns devices of a given type for a platform.

Platform -> DeviceType -> Result [Device] OpenCLError

platform-info

Gets information about a platform.

Platform -> Result PlatformInfo OpenCLError

device-info

Gets information about a device.

Device -> Result DeviceInfo OpenCLError

create-context

Creates a context for the given devices.

[Device] -> Result Context OpenCLError

release-context

Releases a context.

Context -> Result () OpenCLError

create-queue

Creates a command queue for a device in a context.

Context -> Device -> Result CommandQueue OpenCLError

release-queue

Releases a command queue.

CommandQueue -> Result () OpenCLError

flush

Flushes commands in a queue (non-blocking).

CommandQueue -> Result () OpenCLError

finish

Waits for all commands in a queue to complete.

CommandQueue -> Result () OpenCLError

MemFlags

Memory flags for buffer creation

Variants

ReadWrite
ReadOnly
WriteOnly

create-buffer

Creates a buffer in device memory.

Context -> Int -> MemFlags -> Result Buffer OpenCLError

release-buffer

Releases a buffer.

Buffer -> Result () OpenCLError

write-buffer-f32

Writes float data to a buffer.

CommandQueue -> Buffer -> [Float] -> Result () OpenCLError

read-buffer-f32

Reads float data from a buffer.

CommandQueue -> Buffer -> Int -> Result [Float] OpenCLError

write-buffer-i32

Writes int data to a buffer.

CommandQueue -> Buffer -> [Int] -> Result () OpenCLError

read-buffer-i32

Reads int data from a buffer.

CommandQueue -> Buffer -> Int -> Result [Int] OpenCLError

create-program

Creates a program from OpenCL C source code.

Context -> String -> Result Program OpenCLError

build-program

Builds a program for the specified devices.

Program -> [Device] -> String -> Result () OpenCLError

release-program

Releases a program.

Program -> Result () OpenCLError

create-kernel

Creates a kernel from a built program.

Program -> String -> Result Kernel OpenCLError

release-kernel

Releases a kernel.

Kernel -> Result () OpenCLError

set-arg-buffer

Sets a buffer argument for a kernel.

Kernel -> Int -> Buffer -> Result () OpenCLError

set-arg-int

Sets an integer argument for a kernel.

Kernel -> Int -> Int -> Result () OpenCLError

set-arg-float

Sets a float argument for a kernel.

Kernel -> Int -> Float -> Result () OpenCLError

enqueue-kernel

Enqueues a kernel for execution.

global-size: Total number of work items
local-size: Work items per work group (use [] for auto)

CommandQueue -> Kernel -> [Int] -> [Int] -> Result () OpenCLError
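
When an explicit local-size is given, OpenCL 1.x requires each global-size entry to be a multiple of the matching local-size entry. A common approach, sketched here rather than taken from this package, is to round the element count n up to the next multiple of the local size L and guard the kernel body with a bounds check:

```latex
\[
G = \left\lceil \frac{n}{L} \right\rceil \cdot L
\]
% Worked example: n = 1000 elements, L = 64 work-items per group
\[
G = \left\lceil \frac{1000}{64} \right\rceil \cdot 64 = 16 \cdot 64 = 1024
\]
```

Passing [] for local-size avoids the calculation, because the runtime then picks the work-group size itself.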