opencl
| Kind | ffi-zig |
|---|---|
| Capabilities | ffi |
| Categories | gpu parallel numeric ffi |
| Keywords | opencl gpu parallel compute cross-platform zig-ffi |
Cross-platform GPU computing via OpenCL (works on NVIDIA, AMD, Intel, and Apple GPUs)
Files
| File | Description |
|---|---|
| .editorconfig | Editor formatting configuration |
| .gitignore | Git ignore rules for build artifacts and dependencies |
| .tool-versions | asdf tool versions (Zig, Kit) |
| LICENSE | MIT license file |
| README.md | This file |
| examples/basic.kit | Basic usage example |
| kit.toml | Package manifest with metadata and dependencies |
| src/main.kit | Kit OpenCL Package |
| tests/opencl.test.kit | Tests for opencl |
| zig/kit_ffi.zig | Zig FFI module for kit ffi |
| zig/opencl.zig | Zig FFI module for opencl |
Dependencies
No Kit package dependencies.
Requirements
- Kit compiler
- macOS: Built-in OpenCL.framework (no installation needed)
- Linux: OpenCL ICD loader and GPU driver (see install commands below)
Linux OpenCL Installation
| Distribution | Command |
|---|---|
| Ubuntu/Debian | sudo apt install ocl-icd-opencl-dev |
| Fedora | sudo dnf install ocl-icd-devel |
| Arch | sudo pacman -S ocl-icd |
Additionally, install a driver for your GPU:
- NVIDIA: nvidia-opencl-icd
- AMD: mesa-opencl-icd or rocm-opencl-runtime
- Intel: intel-opencl-icd
Installation
In your project directory, add the package as a dependency:
kit add gitlab.com/kit-lang/packages/kit-opencl.git
Then import it in your Kit code:
import Opencl as CL
Quick Start
import Opencl as CL
main = fn =>
  # Discover platforms and devices
  platforms = CL.get-platforms() |> Result.unwrap
  println "Found ${length platforms} platform(s)"
  # Get first GPU device
  match platforms
    | [p | _] ->
      match CL.get-devices p CL.DeviceGPU
        | Ok [device | _] ->
          info = CL.device-info device |> Result.unwrap
          println "Using: ${info.name}"
          # Create context and queue
          ctx = CL.create-context [device] |> Result.unwrap
          queue = CL.create-queue ctx device |> Result.unwrap
          # Vector addition kernel
          source = "__kernel void add(__global float* a, __global float* b, __global float* c) { int i = get_global_id(0); c[i] = a[i] + b[i]; }"
          # Build program and kernel
          program = CL.create-program ctx source |> Result.unwrap
          CL.build-program program [device] "" |> Result.unwrap
          kernel = CL.create-kernel program "add" |> Result.unwrap
          # Create buffers
          a = [1.0, 2.0, 3.0, 4.0]
          b = [5.0, 6.0, 7.0, 8.0]
          buf-a = CL.create-buffer ctx 16 CL.ReadOnly |> Result.unwrap
          buf-b = CL.create-buffer ctx 16 CL.ReadOnly |> Result.unwrap
          buf-c = CL.create-buffer ctx 16 CL.WriteOnly |> Result.unwrap
          # Write data, execute, read result
          CL.write-buffer-f32 queue buf-a a |> Result.unwrap
          CL.write-buffer-f32 queue buf-b b |> Result.unwrap
          CL.set-arg-buffer kernel 0 buf-a |> Result.unwrap
          CL.set-arg-buffer kernel 1 buf-b |> Result.unwrap
          CL.set-arg-buffer kernel 2 buf-c |> Result.unwrap
          CL.enqueue-kernel queue kernel [4] [] |> Result.unwrap
          CL.finish queue |> Result.unwrap
          result = CL.read-buffer-f32 queue buf-c 4 |> Result.unwrap
          println "Result: ${show result}" # [6.0, 8.0, 10.0, 12.0]
          # Cleanup
          CL.release-kernel kernel
          CL.release-program program
          CL.release-buffer buf-a
          CL.release-buffer buf-b
          CL.release-buffer buf-c
          CL.release-queue queue
          CL.release-context ctx
        | _ -> println "No GPU found"
    | [] -> println "No platforms found"
main
API Reference
Platform and Device Discovery
| Function | Description |
|---|---|
| get-platforms() | Get all OpenCL platforms |
| get-devices(platform, type) | Get devices of a type from a platform |
| platform-info(platform) | Get platform details (name, vendor, version) |
| device-info(device) | Get device details (name, compute units, memory) |
Context and Queue
| Function | Description |
|---|---|
| create-context(devices) | Create context for a list of devices |
| release-context(ctx) | Release context resources |
| create-queue(ctx, device) | Create command queue for a device |
| release-queue(queue) | Release queue resources |
| flush(queue) | Flush pending commands |
| finish(queue) | Wait for all commands to complete |
Memory Management
| Function | Description |
|---|---|
| create-buffer(ctx, size, flags) | Create a buffer on the device |
| release-buffer(buffer) | Release buffer resources |
| write-buffer-f32(queue, buffer, data) | Write float data to buffer |
| read-buffer-f32(queue, buffer, count) | Read floats from buffer |
| write-buffer-i32(queue, buffer, data) | Write int data to buffer |
| read-buffer-i32(queue, buffer, count) | Read ints from buffer |
Program and Kernel
| Function | Description |
|---|---|
| create-program(ctx, source) | Create program from OpenCL C source |
| build-program(program, devices, options) | Compile program for devices |
| release-program(program) | Release program resources |
| create-kernel(program, name) | Create kernel from compiled program |
| release-kernel(kernel) | Release kernel resources |
Kernel Execution
| Function | Description |
|---|---|
| set-arg-buffer(kernel, index, buffer) | Set buffer argument |
| set-arg-int(kernel, index, value) | Set int argument |
| set-arg-float(kernel, index, value) | Set float argument |
| enqueue-kernel(queue, kernel, global, local) | Execute kernel |
Types
Error Types
type OpenCLError =
  | OpenCLError {code: Int, message: String}
  | NoPlatforms {message: String}
  | NoDevices {message: String}
  | BuildError {message: String, log: String}
  | InvalidArgument {message: String}
  | OutOfMemory {message: String}
Device and Memory Types
type DeviceType = DeviceCPU | DeviceGPU | DeviceAccelerator | DeviceAll
type MemFlags = ReadWrite | ReadOnly | WriteOnly
Handle Types
type Platform = Platform {id: Int}
type Device = Device {id: Int}
type Context = Context {handle: Int}
type CommandQueue = CommandQueue {handle: Int}
type Buffer = Buffer {handle: Int, size: Int}
type Program = Program {handle: Int}
type Kernel = Kernel {handle: Int}
Info Types
type PlatformInfo = PlatformInfo {
  name: String,
  vendor: String,
  version: String,
  profile: String
}
type DeviceInfo = DeviceInfo {
  name: String,
  vendor: String,
  version: String,
  device-type: String,
  max-compute-units: Int,
  max-work-group-size: Int,
  global-mem-size: Int,
  local-mem-size: Int,
  max-clock-freq: Int
}
OpenCL C Kernel Basics
OpenCL kernels are written in OpenCL C (based on C99, with some restrictions and extensions for parallel computing):
__kernel void my_kernel(__global float* input,
                        __global float* output,
                        const int n) {
    int i = get_global_id(0); // Global work-item ID
    if (i < n) {
        output[i] = input[i] * 2.0f;
    }
}
Key OpenCL C Functions
| Function | Description |
|---|---|
| get_global_id(dim) | Global work-item ID in dimension |
| get_local_id(dim) | Local work-item ID within work-group |
| get_group_id(dim) | Work-group ID |
| get_global_size(dim) | Total number of work-items |
| get_local_size(dim) | Work-items per work-group |
| barrier(CLK_LOCAL_MEM_FENCE) | Synchronize work-group |
Examples
Running Examples
cd kit-opencl
# Platform discovery and vector addition
kit run examples/basic.kit
License
MIT License - see LICENSE for details.
Exported Functions & Types
OpenCLError
Error types for OpenCL operations.
Variants
- OpenCLError {code, message}
- NoPlatforms {message}
- NoDevices {message}
- BuildError {message, log}
- InvalidArgument {message}
- OutOfMemory {message}
Platform
An OpenCL platform (implementation like Intel, NVIDIA, etc.)
Variants
- Platform {id}
Device
An OpenCL device (GPU, CPU, or accelerator)
Variants
- Device {id}
DeviceType
Device type enumeration
Variants
- DeviceCPU
- DeviceGPU
- DeviceAccelerator
- DeviceAll
PlatformInfo
Information about an OpenCL platform.
Variants
- PlatformInfo {name, vendor, version, profile}
DeviceInfo
Information about an OpenCL device.
Variants
- DeviceInfo {name, vendor, version, device-type, max-compute-units, max-work-group-size, global-mem-size, local-mem-size, max-clock-freq}
Context
An OpenCL context (execution environment)
Variants
- Context {handle}
CommandQueue
An OpenCL command queue
Variants
- CommandQueue {handle}
Buffer
An OpenCL memory buffer
Variants
- Buffer {handle, size}
Program
An OpenCL program (compiled kernel code)
Variants
- Program {handle}
Kernel
An OpenCL kernel (function to execute)
Variants
- Kernel {handle}
Event
An OpenCL event (for synchronization)
Variants
- Event {handle}
get-platforms
Returns all available OpenCL platforms.
() -> Result [Platform] OpenCLError
get-devices
Returns devices of a given type for a platform.
Platform -> DeviceType -> Result [Device] OpenCLError
platform-info
Gets information about a platform.
Platform -> Result PlatformInfo OpenCLError
device-info
Gets information about a device.
Device -> Result DeviceInfo OpenCLError
create-context
Creates a context for the given devices.
[Device] -> Result Context OpenCLError
release-context
Releases a context.
Context -> Result () OpenCLError
create-queue
Creates a command queue for a device in a context.
Context -> Device -> Result CommandQueue OpenCLError
release-queue
Releases a command queue.
CommandQueue -> Result () OpenCLError
flush
Flushes commands in a queue (non-blocking).
CommandQueue -> Result () OpenCLError
finish
Waits for all commands in a queue to complete.
CommandQueue -> Result () OpenCLError
MemFlags
Memory flags for buffer creation
Variants
- ReadWrite
- ReadOnly
- WriteOnly
create-buffer
Creates a buffer in device memory.
Context -> Int -> MemFlags -> Result Buffer OpenCLError
release-buffer
Releases a buffer.
Buffer -> Result () OpenCLError
write-buffer-f32
Writes float data to a buffer.
CommandQueue -> Buffer -> [Float] -> Result () OpenCLError
read-buffer-f32
Reads float data from a buffer.
CommandQueue -> Buffer -> Int -> Result [Float] OpenCLError
write-buffer-i32
Writes int data to a buffer.
CommandQueue -> Buffer -> [Int] -> Result () OpenCLError
read-buffer-i32
Reads int data from a buffer.
CommandQueue -> Buffer -> Int -> Result [Int] OpenCLError
create-program
Creates a program from OpenCL C source code.
Context -> String -> Result Program OpenCLError
build-program
Builds a program for the specified devices.
Program -> [Device] -> String -> Result () OpenCLError
release-program
Releases a program.
Program -> Result () OpenCLError
create-kernel
Creates a kernel from a built program.
Program -> String -> Result Kernel OpenCLError
release-kernel
Releases a kernel.
Kernel -> Result () OpenCLError
set-arg-buffer
Sets a buffer argument for a kernel.
Kernel -> Int -> Buffer -> Result () OpenCLError
set-arg-int
Sets an integer argument for a kernel.
Kernel -> Int -> Int -> Result () OpenCLError
set-arg-float
Sets a float argument for a kernel.
Kernel -> Int -> Float -> Result () OpenCLError
enqueue-kernel
Enqueues a kernel for execution.
- global-size: total number of work-items
- local-size: work-items per work-group (use [] for auto)
CommandQueue -> Kernel -> [Int] -> [Int] -> Result () OpenCLError