embeddings

Lightweight vector embeddings store for RAG applications

Files

File                            Description
.editorconfig                   Editor formatting configuration
.gitignore                      Git ignore rules for build artifacts and dependencies
.tool-versions                  asdf tool versions (Zig, Kit)
LICENSE                         MIT license file
README.md                       This file
examples/postgres-store.kit     Example: PostgreSQL store
examples/simple-rag.kit         Example: simple RAG
examples/sqlite-store.kit       Example: SQLite store
examples/test-sqlite.kit        Example: SQLite test
examples/vector-ops.kit         Example: vector operations
kit.toml                        Package manifest with metadata and dependencies
src/main.kit                    Main module
src/postgres.kit                PostgreSQL backend module
src/sqlite.kit                  SQLite backend module
tests/embeddings.test.kit       Tests for embeddings
tests/types.test.kit            Tests for types

Architecture

RAG Pipeline

flowchart LR
    A[Document] --> B[Chunk]
    B --> C[Embed]
    C --> D[Store]
    E[Query] --> F[Embed]
    F --> G[Search]
    D --> G
    G --> H[Top K Results]

Backend Abstraction

graph TD
    A[Embeddings API] --> B{Backend}
    B -->|PostgreSQL| C[postgres.kit]
    B -->|SQLite| D[sqlite.kit]
    C --> E[pgvector Extension]
    D --> F[sqlite-vec Extension]

Dependencies

  • json
  • postgres
  • sqlite

Installation

kit add gitlab.com/kit-lang/packages/kit-embeddings.git

Usage

import Kit.Embeddings

License

MIT License - see LICENSE for details.

Exported Functions & Types

dot

Dot product of two vectors. Uses Kit's internal SIMD-accelerated implementation.

List Float -> List Float -> Float

Embeddings.dot [1.0, 2.0, 3.0] [4.0, 5.0, 6.0]  # => 32.0

magnitude

Magnitude (L2 norm) of a vector. Returns sqrt(sum of squares).

List Float -> Float

Embeddings.magnitude [3.0, 4.0]  # => 5.0

scale

Scale a vector by a scalar value. Returns a new vector with each element multiplied by the scalar.

List Float -> Float -> List Float

Embeddings.scale [1.0, 2.0, 3.0] 2.0  # => [2.0, 4.0, 6.0]

normalize

Normalize a vector to unit length. Returns the zero vector if the input has zero magnitude.

List Float -> List Float

Embeddings.normalize [3.0, 4.0]  # => [0.6, 0.8]
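
The zero-magnitude rule above can be illustrated with a short reference sketch. It is written in Python purely as a language-neutral illustration of the described behavior, not as Kit's implementation; the function name `normalize` here is a stand-in.

```python
import math

def normalize(v):
    """Scale v to unit length; return a zero vector when the magnitude is zero."""
    mag = math.sqrt(sum(x * x for x in v))
    if mag == 0.0:
        # Avoid dividing by zero: the documented fallback is the zero vector.
        return [0.0] * len(v)
    return [x / mag for x in v]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
print(normalize([0.0, 0.0]))  # [0.0, 0.0]
```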

add

Element-wise addition of two vectors.

List Float -> List Float -> List Float

sub

Element-wise subtraction of two vectors.

List Float -> List Float -> List Float

cosine-similarity

Cosine similarity between two vectors. Returns a value in [-1, 1], where 1 = identical direction, 0 = orthogonal, -1 = opposite direction.

Formula: dot(a,b) / (|a| * |b|)

List Float -> List Float -> Float

Embeddings.cosine-similarity [1.0, 0.0] [1.0, 0.0]  # => 1.0
Embeddings.cosine-similarity [1.0, 0.0] [0.0, 1.0]  # => 0.0
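
As a reference for the formula above, here is a minimal Python sketch of dot(a,b) / (|a| * |b|). It only illustrates the math; it is not the package's SIMD-backed implementation, and how the package treats zero-magnitude vectors is not documented, so the sketch leaves that case unhandled.

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    # Note: raises ZeroDivisionError if either vector has zero magnitude.
    return dot / (mag_a * mag_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
```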

euclidean-distance

Euclidean distance between two vectors. Returns the L2 distance (straight-line distance).

Formula: sqrt(sum((a[i] - b[i])^2))

List Float -> List Float -> Float

Embeddings.euclidean-distance [0.0, 0.0] [3.0, 4.0]  # => 5.0
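
The formula above, written out as a small Python reference sketch (an illustration of the math only, not the Kit source):

```python
import math

def euclidean_distance(a, b):
    """sqrt(sum((a[i] - b[i])^2)) for two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```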

angular-distance

Angular distance between two vectors. Returns a value in [0, 1], where 0 = identical direction, 0.5 = orthogonal, 1 = opposite direction.

Formula: arccos(cosine-similarity) / pi

List Float -> List Float -> Float

Embeddings.angular-distance [1.0, 0.0] [1.0, 0.0]  # => 0.0
Embeddings.angular-distance [1.0, 0.0] [0.0, 1.0]  # => 0.5
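
A minimal Python sketch of arccos(cosine-similarity) / pi follows. The clamping step is an assumption added for numerical safety (rounding can push the cosine slightly outside [-1, 1]); whether the package clamps is not stated.

```python
import math

def angular_distance(a, b):
    """arccos(cosine similarity) / pi, giving a value in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Note: raises ZeroDivisionError for zero-magnitude vectors.
    cos = dot / norm
    # Assumption: clamp so that rounding error never takes acos out of its domain.
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos) / math.pi

print(angular_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(angular_distance([1.0, 0.0], [0.0, 1.0]))  # 0.5
```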

to-bytes

Serialize a list of floats to bytes (IEEE 754 f64). Each float is stored as 8 bytes in little-endian format. Useful for storing embeddings as BLOBs in databases.

List Float -> Bytes

bytes = Embeddings.to-bytes [1.0, 2.0, 3.0]  # 24 bytes

from-bytes

Deserialize bytes to a list of floats. Expects IEEE 754 f64 format (8 bytes per float).

Bytes -> List Float

floats = Embeddings.from-bytes bytes  # [1.0, 2.0, 3.0]
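
For readers who need to read or write these BLOBs from another language, the layout described above (little-endian IEEE 754 f64, 8 bytes per value) can be sketched in Python with the standard `struct` module. The function names mirror the Kit ones but are only stand-ins, and the length check is an assumption about how malformed input should be treated.

```python
import struct

def to_bytes(values):
    """Pack floats as consecutive little-endian f64 values (8 bytes each)."""
    return struct.pack(f"<{len(values)}d", *values)

def from_bytes(data):
    """Unpack a buffer of little-endian f64 values back into a list of floats."""
    if len(data) % 8 != 0:
        # Assumption: reject buffers that are not a whole number of f64 values.
        raise ValueError("byte length must be a multiple of 8")
    return list(struct.unpack(f"<{len(data) // 8}d", data))

blob = to_bytes([1.0, 2.0, 3.0])
print(len(blob))         # 24
print(from_bytes(blob))  # [1.0, 2.0, 3.0]
```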

similarity-fn

Get similarity function by metric name. Returns a function (a, b) -> Float.

Supported metrics:

  • :cosine - Cosine similarity (higher = more similar)
  • :euclidean - Negative Euclidean distance (higher = more similar)
  • :dot - Dot product (higher = more similar)

Symbol -> (List Float -> List Float -> Float)

sim-fn = Embeddings.similarity-fn :cosine
score = sim-fn vec-a vec-b
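
The lookup described above can be pictured as a small dispatch table. This Python sketch uses strings in place of Kit symbols and shows why Euclidean distance is negated: every returned function then follows the same "higher = more similar" convention. Raising on an unknown metric is an assumption; the package's behavior for unsupported names is not documented.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_fn(metric):
    """Return a scoring function where a higher result means more similar."""
    table = {
        "cosine": cosine_similarity,
        # Negated so that closer vectors receive a higher score.
        "euclidean": lambda a, b: -euclidean_distance(a, b),
        "dot": dot,
    }
    if metric not in table:
        # Assumption: unknown metric names are rejected.
        raise ValueError(f"unsupported metric: {metric}")
    return table[metric]

sim = similarity_fn("euclidean")
print(sim([0.0, 0.0], [3.0, 4.0]))  # -5.0
```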

distance-fn

Get distance function by metric name. Returns a function (a, b) -> Float where lower = more similar.

Supported metrics:

  • :cosine - Cosine distance (1 - cosine similarity)
  • :euclidean - Euclidean distance
  • :angular - Angular distance in [0, 1]

Symbol -> (List Float -> List Float -> Float)

dist-fn = Embeddings.distance-fn :euclidean
distance = dist-fn vec-a vec-b
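
The distance variants listed above can be sketched the same way. This Python illustration uses strings in place of Kit symbols; the clamp inside the angular metric and the error on unknown names are assumptions, not documented behavior.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angular_distance(a, b):
    # Assumption: clamp to acos's domain to absorb rounding error.
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(cos) / math.pi

def distance_fn(metric):
    """Return a function where a lower result means more similar."""
    table = {
        "cosine": lambda a, b: 1.0 - cosine_similarity(a, b),
        "euclidean": euclidean_distance,
        "angular": angular_distance,
    }
    if metric not in table:
        raise ValueError(f"unsupported metric: {metric}")
    return table[metric]

print(distance_fn("cosine")([1.0, 0.0], [0.0, 1.0]))     # 1.0
print(distance_fn("euclidean")([0.0, 0.0], [3.0, 4.0]))  # 5.0
```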

sqlite-create

Create or open an embedding store with SQLite backend.

String -> Int -> Symbol -> Result EmbeddingStore String

sqlite-open

Open an existing embedding store with SQLite backend.

String -> Int -> Symbol -> Result EmbeddingStore String

sqlite-close

Close the embedding store.

EmbeddingStore -> Unit

sqlite-upsert

Insert or update an embedding.

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

sqlite-insert

Insert an embedding (alias for upsert).

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

sqlite-get

Get an embedding by ID.

EmbeddingStore -> String -> Option Record

sqlite-delete

Delete an embedding by ID.

EmbeddingStore -> String -> Result Unit String

sqlite-count

Count total embeddings in the store.

EmbeddingStore -> Int

sqlite-exists?

Check if an embedding exists by ID.

EmbeddingStore -> String -> Bool

sqlite-search

Search for top-k similar embeddings.

EmbeddingStore -> List Float -> Int -> Result (List Record) String

sqlite-search-threshold

Search with a minimum score threshold.

EmbeddingStore -> List Float -> Float -> Result (List Record) String

sqlite-search-filter

Search with metadata filtering.

EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String

sqlite-upsert-batch

Insert multiple embeddings at once.

EmbeddingStore -> List Record -> Result Int String

sqlite-get-all

Get all embeddings (for small stores).

EmbeddingStore -> List Record

sqlite-clear

Clear all embeddings from the store.

EmbeddingStore -> Result Unit String

postgres-create

Create or open an embedding store with PostgreSQL backend.

String -> Int -> Symbol -> Result PgEmbeddingStore String

postgres-search

Search for top-k similar embeddings using pgvector.

PgEmbeddingStore -> List Float -> Int -> Result (List Record) String

postgres-upsert

Insert or update an embedding.

PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

postgres-delete

Delete an embedding by ID.

PgEmbeddingStore -> String -> Result Unit String

postgres-count

Count total embeddings in the store.

PgEmbeddingStore -> Int

postgres-create-index

Create HNSW index for fast approximate nearest neighbor search.

PgEmbeddingStore -> Int -> Int -> Result Unit String

Embeddings.Postgres

create

Create or open an embedding store. Creates the embeddings table with pgvector column if it doesn't exist.

String -> Int -> Symbol -> Result PgEmbeddingStore String

store = Embeddings.Postgres.create "postgresql://localhost/mydb" 1536 :cosine

create-with-table

Create an embedding store with a custom table name.

String -> Int -> Symbol -> NonEmptyString -> Result PgEmbeddingStore String

store = Embeddings.Postgres.create-with-table conn-string 1536 :cosine "my_vectors"

create-index

Create HNSW index for fast approximate nearest neighbor search. Call this after creating the store for better search performance.

PgEmbeddingStore -> Int -> Int -> Result Unit String

Embeddings.Postgres.create-index store 16 64

create-index-default

Create HNSW index with default parameters.

PgEmbeddingStore -> Result Unit String

open

Open an existing embedding store. Does not create table if it doesn't exist.

String -> Int -> Symbol -> Result PgEmbeddingStore String

store = Embeddings.Postgres.open "postgresql://localhost/mydb" 1536 :cosine

open-with-table

Open an existing embedding store with a custom table name.

String -> Int -> Symbol -> NonEmptyString -> Result PgEmbeddingStore String

close

Close the embedding store.

PgEmbeddingStore -> Unit

upsert

Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.

PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

Embeddings.Postgres.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\": \"api\"}"

insert

Insert an embedding (alias for upsert).

PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

get

Get an embedding by ID. Returns the embedding record or None if not found.

PgEmbeddingStore -> String -> Option Record

match Embeddings.Postgres.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"

delete

Delete an embedding by ID.

PgEmbeddingStore -> String -> Result Unit String

Embeddings.Postgres.delete store "doc1"

count

Count total embeddings in the store.

PgEmbeddingStore -> Int

n = Embeddings.Postgres.count store  # => 1234

exists?

Check if an embedding exists by ID.

PgEmbeddingStore -> String -> Bool

search

Search for top-k similar embeddings using pgvector's native operators. Uses the configured metric for similarity comparison.

PgEmbeddingStore -> List Float -> Int -> Result (List Record) String

match Embeddings.Postgres.search store query-vec 10
  | Ok results ->
      results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed: ${e}"

search-threshold

Search with a minimum score threshold. Only returns results with score >= threshold.

PgEmbeddingStore -> List Float -> Float -> Result (List Record) String

search-where

Search with metadata filtering using JSONB operators. Filter is a SQL WHERE clause fragment for the metadata column.

PgEmbeddingStore -> List Float -> Int -> String -> Result (List Record) String

# Find documents from a specific source
Embeddings.Postgres.search-where store query-vec 10 "metadata->>'source' = 'api'"

set-ef-search

Set the ef_search parameter for the HNSW index. Higher values give more accurate results but slower queries. Default is 40. Typical range: 10-200.

PgEmbeddingStore -> Int -> Result Unit String

Embeddings.Postgres.set-ef-search store 100

upsert-batch

Insert multiple embeddings at once using a transaction. More efficient than individual inserts.

PgEmbeddingStore -> List Record -> Result Int String

get-all

Get all embeddings (for small stores). Warning: May be slow for large stores. Consider using search instead.

PgEmbeddingStore -> List Record

clear

Clear all embeddings from the store.

PgEmbeddingStore -> Result Unit String

truncate

Truncate the table (faster than delete for large tables).

PgEmbeddingStore -> Result Unit String

vacuum

Vacuum the table to reclaim space and update statistics.

PgEmbeddingStore -> Result Unit String

reindex

Reindex the HNSW index (useful after large batch inserts).

PgEmbeddingStore -> Result Unit String

drop-index

Drop the HNSW index.

PgEmbeddingStore -> Result Unit String

Embeddings.SQLite

create

Create or open an embedding store. Creates the embeddings table if it doesn't exist.

NonEmptyString -> Int -> Symbol -> Result EmbeddingStore String

store = Embeddings.SQLite.create "knowledge.db" 1536 :cosine

open

Open an existing embedding store. Does not create table if it doesn't exist.

NonEmptyString -> Int -> Symbol -> Result EmbeddingStore String

store = Embeddings.SQLite.open "knowledge.db" 1536 :cosine

close

Close the embedding store.

EmbeddingStore -> Unit

upsert

Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

Embeddings.SQLite.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\": \"api\"}"

insert

Insert an embedding (alias for upsert).

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

get

Get an embedding by ID. Returns the embedding record or None if not found.

EmbeddingStore -> String -> Option Record

match Embeddings.SQLite.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"

delete

Delete an embedding by ID.

EmbeddingStore -> String -> Result Unit String

Embeddings.SQLite.delete store "doc1"

count

Count total embeddings in the store.

EmbeddingStore -> Int

n = Embeddings.SQLite.count store  # => 1234

exists?

Check if an embedding exists by ID.

EmbeddingStore -> String -> Bool

search

Search for top-k similar embeddings. Performs brute-force similarity comparison against all stored embeddings.

EmbeddingStore -> List Float -> Int -> Result (List Record) String

match Embeddings.SQLite.search store query-vec 10
  | Ok results ->
      results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed: ${e}"
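
To make "brute-force" concrete, here is a rough Python sketch of the idea: score every stored record against the query, then keep the k highest scores. It is an illustration under assumptions, not the package's code. The `score` and `content` fields follow the example above, while `id`, `embedding`, the in-memory list, and the zero-vector fallback are hypothetical.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Assumption: score zero-magnitude vectors as 0.0 instead of failing.
    return dot / norm if norm else 0.0

def brute_force_search(records, query, k):
    """Score every record against the query and keep the k best."""
    scored = (
        {"id": r["id"], "content": r["content"],
         "score": cosine_similarity(r["embedding"], query)}
        for r in records
    )
    # nlargest returns results ordered from highest to lowest score.
    return heapq.nlargest(k, scored, key=lambda r: r["score"])

records = [
    {"id": "doc1", "content": "cats", "embedding": [1.0, 0.0]},
    {"id": "doc2", "content": "dogs", "embedding": [0.7, 0.7]},
    {"id": "doc3", "content": "cars", "embedding": [0.0, 1.0]},
]
for r in brute_force_search(records, [1.0, 0.1], 2):
    print(f"{r['score']:.3f}: {r['content']}")
```

Because every record is scored, the cost grows linearly with the number of stored embeddings, which is why this approach suits small to medium stores.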

search-threshold

Search with a minimum score threshold. Only returns results with score >= threshold.

EmbeddingStore -> List Float -> Float -> Result (List Record) String
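
The threshold rule (keep only results with score >= threshold) can be sketched in a few lines of Python. The already-scored records and their field names are hypothetical, and returning the survivors sorted by descending score is an assumption.

```python
def search_threshold(scored, threshold):
    """Keep records whose score is at least the threshold, best first."""
    kept = [r for r in scored if r["score"] >= threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

scored = [
    {"id": "doc1", "score": 0.42},
    {"id": "doc2", "score": 0.91},
    {"id": "doc3", "score": 0.80},
]
print([r["id"] for r in search_threshold(scored, 0.8)])  # ['doc2', 'doc3']
```

Unlike a top-k search, the number of results is not fixed: a strict threshold may return nothing, and a loose one may return the whole store.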

search-filter

Search with metadata filtering. Only searches embeddings where filter function returns true.

EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String
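
A small Python sketch of the filter-then-rank behavior described above: the predicate runs on each record's parsed metadata first, and only the survivors are scored. The record layout, the JSON-string metadata field, and the dot-product scoring are assumptions chosen to keep the example short.

```python
import json

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_filter(records, query, k, keep):
    """Rank only the records whose parsed metadata satisfies the predicate."""
    candidates = [r for r in records if keep(json.loads(r["metadata"]))]
    ranked = sorted(candidates, key=lambda r: dot(r["embedding"], query), reverse=True)
    return ranked[:k]

records = [
    {"id": "doc1", "embedding": [1.0, 0.0], "metadata": '{"source": "api"}'},
    {"id": "doc2", "embedding": [0.9, 0.1], "metadata": '{"source": "web"}'},
    {"id": "doc3", "embedding": [0.2, 0.8], "metadata": '{"source": "api"}'},
]
hits = search_filter(records, [1.0, 0.0], 2, lambda m: m.get("source") == "api")
print([r["id"] for r in hits])  # ['doc1', 'doc3']
```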

upsert-batch

Insert multiple embeddings at once. More efficient than individual inserts.

EmbeddingStore -> List Record -> Result Int String

get-all

Get all embeddings (for small stores). Warning: May be slow for large stores.

EmbeddingStore -> List Record

clear

Clear all embeddings from the store.

EmbeddingStore -> Result Unit String