opencl

Cross-platform GPU computing for Kit via OpenCL. Works on NVIDIA, AMD, Intel, and Apple GPUs.

Files

File                    Description
.editorconfig           Editor formatting configuration
.gitignore              Git ignore rules for build artifacts and dependencies
.tool-versions          asdf tool versions (Zig, Kit)
LICENSE                 MIT license file
README.md               This file
examples/basic.kit      Basic usage example
kit.toml                Package manifest with metadata and dependencies
src/main.kit            Kit OpenCL package
tests/opencl.test.kit   Tests for opencl
zig/kit_ffi.zig         Zig FFI module for kit ffi
zig/opencl.zig          Zig FFI module for opencl

Dependencies

No Kit package dependencies.

Requirements

  • Kit compiler
  • macOS: Built-in OpenCL.framework (no installation needed)
  • Linux: OpenCL ICD loader and GPU driver (see install commands below)

Linux OpenCL Installation

Distribution    Command
Ubuntu/Debian   sudo apt install ocl-icd-opencl-dev
Fedora          sudo dnf install ocl-icd-devel
Arch            sudo pacman -S ocl-icd

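Additionally, install the OpenCL driver (ICD) for your GPU. The package names below follow Debian/Ubuntu naming and may differ on other distributions: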

  • NVIDIA: nvidia-opencl-icd
  • AMD: mesa-opencl-icd or rocm-opencl-runtime
  • Intel: intel-opencl-icd
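
Once the ICD loader and a driver are installed, a short Kit script can confirm that OpenCL sees at least one platform. The following is a minimal sketch that uses only functions documented in this README; the printed messages are illustrative and not part of the package.

```kit
import Opencl as CL

main = fn =>
  match CL.get-platforms()
    | Ok [p | _] ->
      # Describe the first platform the ICD loader reports
      info = CL.platform-info p |> Result.unwrap
      println "Platform: ${info.name} (${info.vendor}, ${info.version})"

      # Count devices of any type on that platform
      match CL.get-devices p CL.DeviceAll
        | Ok devices -> println "Devices: ${length devices}"
        | _ -> println "No devices reported"
    | _ -> println "No OpenCL platform found; check the ICD loader and driver"

main
```

If the last message appears on Linux, the ICD loader or the vendor driver is the likely culprit.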

Installation

In your project directory, add the package as a dependency:

kit add gitlab.com/kit-lang/packages/kit-opencl.git

Then import it in your Kit code:

import Opencl as CL

Quick Start

import Opencl as CL

main = fn =>
  # Discover platforms and devices
  platforms = CL.get-platforms() |> Result.unwrap
  println "Found ${length platforms} platform(s)"

  # Get first GPU device
  match platforms
    | [p | _] ->
      match CL.get-devices p CL.DeviceGPU
        | Ok [device | _] ->
          info = CL.device-info device |> Result.unwrap
          println "Using: ${info.name}"

          # Create context and queue
          ctx = CL.create-context [device] |> Result.unwrap
          queue = CL.create-queue ctx device |> Result.unwrap

          # Vector addition kernel
          source = "__kernel void add(__global float* a, __global float* b, __global float* c) { int i = get_global_id(0); c[i] = a[i] + b[i]; }"

          # Build program and kernel
          program = CL.create-program ctx source |> Result.unwrap
          CL.build-program program [device] "" |> Result.unwrap
          kernel = CL.create-kernel program "add" |> Result.unwrap

          # Create buffers
          a = [1.0, 2.0, 3.0, 4.0]
          b = [5.0, 6.0, 7.0, 8.0]

          buf-a = CL.create-buffer ctx 16 CL.ReadOnly |> Result.unwrap
          buf-b = CL.create-buffer ctx 16 CL.ReadOnly |> Result.unwrap
          buf-c = CL.create-buffer ctx 16 CL.WriteOnly |> Result.unwrap

          # Write data, execute, read result
          CL.write-buffer-f32 queue buf-a a |> Result.unwrap
          CL.write-buffer-f32 queue buf-b b |> Result.unwrap

          CL.set-arg-buffer kernel 0 buf-a |> Result.unwrap
          CL.set-arg-buffer kernel 1 buf-b |> Result.unwrap
          CL.set-arg-buffer kernel 2 buf-c |> Result.unwrap

          CL.enqueue-kernel queue kernel [4] [] |> Result.unwrap
          CL.finish queue |> Result.unwrap

          result = CL.read-buffer-f32 queue buf-c 4 |> Result.unwrap
          println "Result: ${show result}"  # [6.0, 8.0, 10.0, 12.0]

          # Cleanup
          CL.release-kernel kernel
          CL.release-program program
          CL.release-buffer buf-a
          CL.release-buffer buf-b
          CL.release-buffer buf-c
          CL.release-queue queue
          CL.release-context ctx
        | _ -> println "No GPU found"
    | [] -> println "No platforms found"

main

API Reference

Platform and Device Discovery

Function                      Description
get-platforms()               Get all OpenCL platforms
get-devices(platform, type)   Get devices of a type from a platform
platform-info(platform)       Get platform details (name, vendor, version)
device-info(device)           Get device details (name, compute units, memory)

Context and Queue

Function                    Description
create-context(devices)     Create context for a list of devices
release-context(ctx)        Release context resources
create-queue(ctx, device)   Create command queue for a device
release-queue(queue)        Release queue resources
flush(queue)                Flush pending commands
finish(queue)               Wait for all commands to complete

Memory Management

Function                                Description
create-buffer(ctx, size, flags)         Create a buffer on the device
release-buffer(buffer)                  Release buffer resources
write-buffer-f32(queue, buffer, data)   Write float data to buffer
read-buffer-f32(queue, buffer, count)   Read floats from buffer
write-buffer-i32(queue, buffer, data)   Write int data to buffer
read-buffer-i32(queue, buffer, count)   Read ints from buffer
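
As an illustration of the buffer functions without any kernel involved, here is a sketch of an integer round trip through device memory. It assumes that the size passed to create-buffer is in bytes, which matches the Quick Start (size 16 for four f32 values) but is not stated explicitly in this README.

```kit
import Opencl as CL

main = fn =>
  match CL.get-platforms()
    | Ok [p | _] ->
      match CL.get-devices p CL.DeviceGPU
        | Ok [device | _] ->
          ctx = CL.create-context [device] |> Result.unwrap
          queue = CL.create-queue ctx device |> Result.unwrap

          # Assumption: size is in bytes, so four i32 values need 16
          buf = CL.create-buffer ctx 16 CL.ReadWrite |> Result.unwrap

          # Copy host data to the device and wait for the copy to finish
          CL.write-buffer-i32 queue buf [10, 20, 30, 40] |> Result.unwrap
          CL.finish queue |> Result.unwrap

          # Read the same four values back from the device
          values = CL.read-buffer-i32 queue buf 4 |> Result.unwrap
          println "Round trip: ${show values}"

          CL.release-buffer buf
          CL.release-queue queue
          CL.release-context ctx
        | _ -> println "No GPU found"
    | _ -> println "No platforms found"

main
```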

Program and Kernel

Function                                   Description
create-program(ctx, source)                Create program from OpenCL C source
build-program(program, devices, options)   Compile program for devices
release-program(program)                   Release program resources
create-kernel(program, name)               Create kernel from compiled program
release-kernel(kernel)                     Release kernel resources

Kernel Execution

Function                                       Description
set-arg-buffer(kernel, index, buffer)          Set buffer argument
set-arg-int(kernel, index, value)              Set int argument
set-arg-float(kernel, index, value)            Set float argument
enqueue-kernel(queue, kernel, global, local)   Execute kernel
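
The Quick Start only passes buffers to its kernel. The sketch below shows how scalar arguments might be combined with a buffer argument. The scale kernel is a hypothetical example written for this illustration, and the sketch assumes that set-arg-float and set-arg-int map to the OpenCL C float and int parameter types.

```kit
import Opencl as CL

main = fn =>
  match CL.get-platforms()
    | Ok [p | _] ->
      match CL.get-devices p CL.DeviceGPU
        | Ok [device | _] ->
          ctx = CL.create-context [device] |> Result.unwrap
          queue = CL.create-queue ctx device |> Result.unwrap

          # Hypothetical kernel: multiply n floats in place by a factor
          source = "__kernel void scale(__global float* data, const float factor, const int n) { int i = get_global_id(0); if (i < n) { data[i] = data[i] * factor; } }"
          program = CL.create-program ctx source |> Result.unwrap
          CL.build-program program [device] "" |> Result.unwrap
          kernel = CL.create-kernel program "scale" |> Result.unwrap

          # Four f32 values, read and written by the kernel
          buf = CL.create-buffer ctx 16 CL.ReadWrite |> Result.unwrap
          CL.write-buffer-f32 queue buf [1.0, 2.0, 3.0, 4.0] |> Result.unwrap

          # Argument indices follow the kernel's parameter order
          CL.set-arg-buffer kernel 0 buf |> Result.unwrap
          CL.set-arg-float kernel 1 2.0 |> Result.unwrap
          CL.set-arg-int kernel 2 4 |> Result.unwrap

          CL.enqueue-kernel queue kernel [4] [] |> Result.unwrap
          CL.finish queue |> Result.unwrap

          scaled = CL.read-buffer-f32 queue buf 4 |> Result.unwrap
          println "Scaled: ${show scaled}"

          CL.release-kernel kernel
          CL.release-program program
          CL.release-buffer buf
          CL.release-queue queue
          CL.release-context ctx
        | _ -> println "No GPU found"
    | _ -> println "No platforms found"

main
```

With a factor of 2.0, the buffer should hold each input value doubled after the kernel runs.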

Types

Error Types

type OpenCLError =
  | OpenCLError {code: Int, message: String}
  | NoPlatforms {message: String}
  | NoDevices {message: String}
  | BuildError {message: String, log: String}
  | InvalidArgument {message: String}
  | OutOfMemory {message: String}
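
The Quick Start unwraps every result, which aborts on the first failure. As a hedged sketch of handling an error instead, the example below builds a deliberately invalid kernel and matches on the result. The Err constructor and the use of show on an error value are assumptions about Kit's Result type and are not confirmed by this README.

```kit
import Opencl as CL

main = fn =>
  match CL.get-platforms()
    | Ok [p | _] ->
      match CL.get-devices p CL.DeviceGPU
        | Ok [device | _] ->
          ctx = CL.create-context [device] |> Result.unwrap

          # Deliberately invalid OpenCL C: the assignment lacks a semicolon
          source = "__kernel void broken(__global float* a) { a[0] = 1.0f }"
          program = CL.create-program ctx source |> Result.unwrap

          # Keep the build result as a value instead of unwrapping it
          built = CL.build-program program [device] ""

          CL.release-program program
          CL.release-context ctx

          match built
            | Ok _ -> println "Build succeeded"
            # Assumption: Err carries the OpenCLError, here likely a BuildError
            | Err e -> println "Build failed: ${show e}"
        | _ -> println "No GPU found"
    | _ -> println "No platforms found"

main
```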

Device and Memory Types

type DeviceType = DeviceCPU | DeviceGPU | DeviceAccelerator | DeviceAll
type MemFlags = ReadWrite | ReadOnly | WriteOnly

Handle Types

type Platform = Platform {id: Int}
type Device = Device {id: Int}
type Context = Context {handle: Int}
type CommandQueue = CommandQueue {handle: Int}
type Buffer = Buffer {handle: Int, size: Int}
type Program = Program {handle: Int}
type Kernel = Kernel {handle: Int}

Info Types

type PlatformInfo = PlatformInfo {
  name: String,
  vendor: String,
  version: String,
  profile: String
}

type DeviceInfo = DeviceInfo {
  name: String,
  vendor: String,
  version: String,
  device-type: String,
  max-compute-units: Int,
  max-work-group-size: Int,
  global-mem-size: Int,
  local-mem-size: Int,
  max-clock-freq: Int
}
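
To show how the DeviceInfo fields might be read, here is a small sketch that prints a few of them for the first device of any type. The labels are illustrative, and the units of the memory fields are not stated in this README, so the values are printed as returned.

```kit
import Opencl as CL

main = fn =>
  match CL.get-platforms()
    | Ok [p | _] ->
      match CL.get-devices p CL.DeviceAll
        | Ok [device | _] ->
          info = CL.device-info device |> Result.unwrap
          println "Name:           ${info.name}"
          println "Type:           ${info.device-type}"
          println "Compute units:  ${info.max-compute-units}"
          println "Max group size: ${info.max-work-group-size}"
          println "Global memory:  ${info.global-mem-size}"
          println "Local memory:   ${info.local-mem-size}"
        | _ -> println "No devices found"
    | _ -> println "No platforms found"

main
```

The max-work-group-size value is the upper bound to keep in mind when passing an explicit local size to enqueue-kernel.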

OpenCL C Kernel Basics

OpenCL kernels are written in OpenCL C, a language based on C99 with some restrictions and extensions for parallel execution:

__kernel void my_kernel(__global float* input,
                        __global float* output,
                        const int n) {
    int i = get_global_id(0);  // Global work-item ID
    if (i < n) {
        output[i] = input[i] * 2.0f;
    }
}

Key OpenCL C Functions

Function                       Description
get_global_id(dim)             Global work-item ID in dimension
get_local_id(dim)              Local work-item ID within work-group
get_group_id(dim)              Work-group ID
get_global_size(dim)           Total number of work-items
get_local_size(dim)            Work-items per work-group
barrier(CLK_LOCAL_MEM_FENCE)   Synchronize work-group
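
To make the relationship between these IDs concrete, the following sketch launches 8 work-items in work-groups of 4 and records each work-item's group ID and local ID. The ids kernel is a hypothetical example written for this illustration. The sketch assumes the device accepts a work-group size of 4 and that buffer sizes are in bytes.

```kit
import Opencl as CL

main = fn =>
  match CL.get-platforms()
    | Ok [p | _] ->
      match CL.get-devices p CL.DeviceGPU
        | Ok [device | _] ->
          ctx = CL.create-context [device] |> Result.unwrap
          queue = CL.create-queue ctx device |> Result.unwrap

          # Hypothetical kernel: each work-item stores its group and local ID
          source = "__kernel void ids(__global int* groups, __global int* locals) { int i = get_global_id(0); groups[i] = get_group_id(0); locals[i] = get_local_id(0); }"
          program = CL.create-program ctx source |> Result.unwrap
          CL.build-program program [device] "" |> Result.unwrap
          kernel = CL.create-kernel program "ids" |> Result.unwrap

          # Eight i32 values per buffer (assuming 4 bytes each, 32 bytes)
          buf-g = CL.create-buffer ctx 32 CL.WriteOnly |> Result.unwrap
          buf-l = CL.create-buffer ctx 32 CL.WriteOnly |> Result.unwrap
          CL.set-arg-buffer kernel 0 buf-g |> Result.unwrap
          CL.set-arg-buffer kernel 1 buf-l |> Result.unwrap

          # Global size 8 with local size 4 gives two work-groups
          CL.enqueue-kernel queue kernel [8] [4] |> Result.unwrap
          CL.finish queue |> Result.unwrap

          groups = CL.read-buffer-i32 queue buf-g 8 |> Result.unwrap
          locals = CL.read-buffer-i32 queue buf-l 8 |> Result.unwrap
          println "Group IDs: ${show groups}"
          println "Local IDs: ${show locals}"

          CL.release-kernel kernel
          CL.release-program program
          CL.release-buffer buf-g
          CL.release-buffer buf-l
          CL.release-queue queue
          CL.release-context ctx
        | _ -> println "No GPU found"
    | _ -> println "No platforms found"

main
```

With these sizes, the first four work-items belong to group 0 and the last four to group 1, and local IDs run from 0 to 3 within each group. In one dimension with no global offset, the global ID equals the group ID times the local size plus the local ID.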

Examples

Running Examples

cd kit-opencl

# Platform discovery and vector addition
kit run examples/basic.kit

License

MIT License - see LICENSE for details.

Exported Functions & Types

OpenCLError

Error types for OpenCL operations.

Variants

OpenCLError {code, message}
OpenCL API call failed
NoPlatforms {message}
No platforms found
NoDevices {message}
No devices found
BuildError {message, log}
Program build failed
InvalidArgument {message}
Invalid argument
OutOfMemory {message}
Out of memory

Platform

An OpenCL platform (a vendor implementation such as Intel or NVIDIA).

Variants

Platform {id}

Device

An OpenCL device (GPU, CPU, or accelerator)

Variants

Device {id}

DeviceType

Device type enumeration

Variants

DeviceCPU
DeviceGPU
DeviceAccelerator
DeviceAll

PlatformInfo

Information about an OpenCL platform.

Variants

PlatformInfo {name, vendor, version, profile}

DeviceInfo

Information about an OpenCL device.

Variants

DeviceInfo {name, vendor, version, device-type, max-compute-units, max-work-group-size, global-mem-size, local-mem-size, max-clock-freq}

Context

An OpenCL context (execution environment)

Variants

Context {handle}

CommandQueue

An OpenCL command queue

Variants

CommandQueue {handle}

Buffer

An OpenCL memory buffer

Variants

Buffer {handle, size}

Program

An OpenCL program (compiled kernel code)

Variants

Program {handle}

Kernel

An OpenCL kernel (function to execute)

Variants

Kernel {handle}

Event

An OpenCL event (for synchronization)

Variants

Event {handle}

get-platforms

Returns all available OpenCL platforms.

() -> Result [Platform] OpenCLError

get-devices

Returns devices of a given type for a platform.

Platform -> DeviceType -> Result [Device] OpenCLError

platform-info

Gets information about a platform.

Platform -> Result PlatformInfo OpenCLError

device-info

Gets information about a device.

Device -> Result DeviceInfo OpenCLError

create-context

Creates a context for the given devices.

[Device] -> Result Context OpenCLError

release-context

Releases a context.

Context -> Result () OpenCLError

create-queue

Creates a command queue for a device in a context.

Context -> Device -> Result CommandQueue OpenCLError

release-queue

Releases a command queue.

CommandQueue -> Result () OpenCLError

flush

Submits all queued commands to the device without waiting for them to complete (non-blocking).

CommandQueue -> Result () OpenCLError

finish

Waits for all commands in a queue to complete.

CommandQueue -> Result () OpenCLError

MemFlags

Memory flags for buffer creation

Variants

ReadWrite
ReadOnly
WriteOnly

create-buffer

Creates a buffer in device memory.

Context -> Int -> MemFlags -> Result Buffer OpenCLError

release-buffer

Releases a buffer.

Buffer -> Result () OpenCLError

write-buffer-f32

Writes float data to a buffer.

CommandQueue -> Buffer -> [Float] -> Result () OpenCLError

read-buffer-f32

Reads float data from a buffer.

CommandQueue -> Buffer -> Int -> Result [Float] OpenCLError

write-buffer-i32

Writes int data to a buffer.

CommandQueue -> Buffer -> [Int] -> Result () OpenCLError

read-buffer-i32

Reads int data from a buffer.

CommandQueue -> Buffer -> Int -> Result [Int] OpenCLError

create-program

Creates a program from OpenCL C source code.

Context -> String -> Result Program OpenCLError

build-program

Builds a program for the specified devices.

Program -> [Device] -> String -> Result () OpenCLError

release-program

Releases a program.

Program -> Result () OpenCLError

create-kernel

Creates a kernel from a built program.

Program -> String -> Result Kernel OpenCLError

release-kernel

Releases a kernel.

Kernel -> Result () OpenCLError

set-arg-buffer

Sets a buffer argument for a kernel.

Kernel -> Int -> Buffer -> Result () OpenCLError

set-arg-int

Sets an integer argument for a kernel.

Kernel -> Int -> Int -> Result () OpenCLError

set-arg-float

Sets a float argument for a kernel.

Kernel -> Int -> Float -> Result () OpenCLError

enqueue-kernel

Enqueues a kernel for execution.

  • global-size: Total number of work items
  • local-size: Work items per work group (use [] for auto)

CommandQueue -> Kernel -> [Int] -> [Int] -> Result () OpenCLError