embeddings

Lightweight vector embedding store for retrieval-augmented generation (RAG) applications

Files

File                           Description
kit.toml                       Package manifest with metadata and dependencies
src/main.kit                   Vector ops, similarity metrics, and backend re-exports
src/postgres.kit               PostgreSQL backend with pgvector and HNSW indexing
src/sqlite.kit                 SQLite backend with JSON vector serialization
tests/embeddings.test.kit      Tests for vector ops and similarity metrics
examples/postgres-store.kit    pgvector storage with HNSW index and metadata filter
examples/simple-rag.kit        Basic retrieval-augmented generation workflow
examples/sqlite-store.kit      SQLite persistent storage with cosine search
examples/test-sqlite.kit       SQLite backend integration test
examples/vector-ops.kit        SIMD vector operations and byte serialization
LICENSE                        MIT license file

Architecture

RAG Pipeline

flowchart LR
    A[Document] --> B[Chunk]
    B --> C[Embed]
    C --> D[Store]
    E[Query] --> F[Embed]
    F --> G[Search]
    D --> G
    G --> H[Top K Results]
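The pipeline above can be sketched end to end in a few lines. The Python below is illustrative only and does not use this package: `chunk`, `make_embedder`, `search`, and the in-memory `store` list are hypothetical stand-ins, and the bag-of-words embedder merely takes the place of a real embedding model.

```python
import math


def chunk(document, size):
    """Split a document into chunks of at most `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def make_embedder(corpus):
    """Build a toy bag-of-words embedder: one dimension per distinct word."""
    vocab = {w: i for i, w in enumerate(sorted(set(corpus.lower().split())))}

    def embed(text):
        vec = [0.0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words outside the vocabulary are ignored
                vec[vocab[word]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    return embed


def search(store, query_vec, k):
    """Score every stored chunk by dot product and keep the k best."""
    scored = [(sum(a * b for a, b in zip(query_vec, vec)), text)
              for text, vec in store]
    return sorted(scored, reverse=True)[:k]


document = "Kit is a programming language. Embeddings map text to vectors for search."
embed = make_embedder(document)

# Document -> Chunk -> Embed -> Store
store = [(c, embed(c)) for c in chunk(document, size=5)]

# Query -> Embed -> Search -> Top K Results
top = search(store, embed("text to vectors"), k=1)
print(top[0][1])  # Embeddings map text to vectors
```

Because the toy vectors are unit length, the dot product here equals cosine similarity; a real deployment would swap in a model-backed embedder and one of the persistent backends.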

Backend Abstraction

graph TD
    A[Embeddings API] --> B{Backend}
    B -->|PostgreSQL| C[postgres.kit]
    B -->|SQLite| D[sqlite.kit]
    C --> E[pgvector Extension]
    D --> F[sqlite-vec Extension]

Dependencies

  • json
  • postgres
  • sqlite

Installation

kit add gitlab.com/kit-lang/packages/kit-embeddings.git

Usage

import Kit.Embeddings

License

MIT License - see LICENSE for details.

Exported Functions & Types

dot

Dot product of two vectors. Uses Kit's internal SIMD-accelerated implementation.

List Float -> List Float -> Float

Embeddings.dot [1.0, 2.0, 3.0] [4.0, 5.0, 6.0]  # => 32.0

magnitude

Magnitude (L2 norm) of a vector. Returns sqrt(sum of squares).

List Float -> Float

Embeddings.magnitude [3.0, 4.0]  # => 5.0

scale

Scale a vector by a scalar value. Returns a new vector with each element multiplied by the scalar.

List Float -> Float -> List Float

Embeddings.scale [1.0, 2.0, 3.0] 2.0  # => [2.0, 4.0, 6.0]

normalize

Normalize a vector to unit length. Returns the zero vector if the input has zero magnitude.

List Float -> List Float

Embeddings.normalize [3.0, 4.0]  # => [0.6, 0.8]
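As a language-neutral cross-check of the behavior described for normalize, here is a small Python sketch. It is not the package's SIMD implementation; the function name and the exact zero test are assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale vec to unit length; a zero-magnitude input comes back as zeros."""
    mag = math.sqrt(sum(x * x for x in vec))
    if mag == 0.0:
        # Dividing by zero is undefined, so return a zero vector of equal length.
        return [0.0] * len(vec)
    return [x / mag for x in vec]


unit = normalize([3.0, 4.0])   # magnitude 5.0, so each element is divided by 5
zero = normalize([0.0, 0.0])   # stays the zero vector
```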

add

Element-wise addition of two vectors.

List Float -> List Float -> List Float

sub

Element-wise subtraction of two vectors.

List Float -> List Float -> List Float

cosine-similarity

Cosine similarity between two vectors. Returns value in [-1, 1], where 1 = identical direction, 0 = orthogonal, -1 = opposite direction.

Formula: dot(a,b) / (|a| * |b|)

List Float -> List Float -> Float

Embeddings.cosine-similarity [1.0, 0.0] [1.0, 0.0]  # => 1.0
Embeddings.cosine-similarity [1.0, 0.0] [0.0, 1.0]  # => 0.0
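The formula can be checked by hand with a short Python sketch. This is only an illustration of dot(a,b) / (|a| * |b|), not the package's code; in particular, returning 0.0 for a zero vector is an assumption, since the documentation above does not say how that case is handled.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as orthogonal to everything.
        return 0.0
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 0.0], [1.0, 0.0])       # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
scaled = cosine_similarity([1.0, 2.0], [2.0, 4.0])     # length does not matter
```

The last call illustrates why cosine similarity is a common choice for embeddings: only direction contributes to the score, not vector length.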

euclidean-distance

Euclidean distance between two vectors. Returns the L2 distance (straight-line distance).

Formula: sqrt(sum((a[i] - b[i])^2))

List Float -> List Float -> Float

Embeddings.euclidean-distance [0.0, 0.0] [3.0, 4.0]  # => 5.0

angular-distance

Angular distance between two vectors. Returns value in [0, 1], where 0 = identical direction, 0.5 = orthogonal, 1 = opposite direction.

Formula: arccos(cosine-similarity) / pi

List Float -> List Float -> Float

Embeddings.angular-distance [1.0, 0.0] [1.0, 0.0]  # => 0.0
Embeddings.angular-distance [1.0, 0.0] [0.0, 1.0]  # => 0.5
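One practical detail when computing arccos(cosine-similarity) / pi is that floating-point rounding can push the cosine slightly outside [-1, 1], where arccos is undefined. The Python sketch below clamps the value first. Whether the package does the same is not stated here, so treat the clamp as an assumption; the sketch also assumes both inputs are non-zero vectors.

```python
import math


def angular_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.hypot(*a) * math.hypot(*b))
    # Guard against rounding error such as 1.0000000000000002.
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos) / math.pi


identical = angular_distance([1.0, 0.0], [1.0, 0.0])
orthogonal = angular_distance([1.0, 0.0], [0.0, 1.0])
opposite = angular_distance([1.0, 0.0], [-1.0, 0.0])
self_distance = angular_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
```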

to-bytes

Serialize a list of floats to bytes (IEEE 754 f64). Each float is stored as 8 bytes in little-endian format. Useful for storing embeddings as BLOBs in databases.

List Float -> Bytes

bytes = Embeddings.to-bytes [1.0, 2.0, 3.0]  # 24 bytes

from-bytes

Deserialize bytes to a list of floats. Expects IEEE 754 f64 format (8 bytes per float).

Bytes -> List Float

floats = Embeddings.from-bytes bytes  # [1.0, 2.0, 3.0]
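The byte layout described above (8 bytes per value, IEEE 754 f64, little-endian) can be reproduced with Python's `struct` module, which is handy when another program needs to read the stored BLOBs. This is a sketch of the layout only; raising `ValueError` on a malformed length is this sketch's choice, not documented package behavior.

```python
import struct


def to_bytes(floats):
    """Pack floats as consecutive little-endian IEEE 754 f64 values."""
    return struct.pack(f"<{len(floats)}d", *floats)


def from_bytes(data):
    """Unpack little-endian f64 values; the length must be a multiple of 8."""
    if len(data) % 8 != 0:
        raise ValueError("byte length must be a multiple of 8")
    return list(struct.unpack(f"<{len(data) // 8}d", data))


blob = to_bytes([1.0, 2.0, 3.0])
restored = from_bytes(blob)
print(len(blob))       # 24
print(blob[:8].hex())  # 000000000000f03f (1.0 with the low byte first)
```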

similarity-fn

Get similarity function by metric name. Returns a function (a, b) -> Float.

Supported metrics:

  • :cosine - Cosine similarity (higher = more similar)
  • :euclidean - Negative Euclidean distance (higher = more similar)
  • :dot - Dot product (higher = more similar)

Symbol -> (List Float -> List Float -> Float)

sim-fn = Embeddings.similarity-fn :cosine
score = sim-fn vec-a vec-b

distance-fn

Get distance function by metric name. Returns a function (a, b) -> Float where lower = more similar.

Supported metrics:

  • :cosine - 1 - cosine similarity
  • :euclidean - Euclidean distance
  • :angular - Angular distance [0, 1]

Symbol -> (List Float -> List Float -> Float)

dist-fn = Embeddings.distance-fn :euclidean
distance = dist-fn vec-a vec-b
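The two lookups differ only in orientation: similarity-fn returns functions where higher means closer, and distance-fn returns functions where lower means closer. The Python sketch below mirrors the documented metric tables with plain dictionaries. The string keys stand in for Kit symbols, and raising `KeyError` for an unknown metric is an artifact of the sketch, not documented behavior.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.hypot(*a) * math.hypot(*b))


def similarity_fn(metric):
    """Higher result = more similar."""
    return {
        "cosine": cosine,
        "euclidean": lambda a, b: -math.dist(a, b),  # negated distance
        "dot": dot,
    }[metric]


def distance_fn(metric):
    """Lower result = more similar."""
    return {
        "cosine": lambda a, b: 1.0 - cosine(a, b),
        "euclidean": math.dist,
        "angular": lambda a, b: math.acos(max(-1.0, min(1.0, cosine(a, b)))) / math.pi,
    }[metric]


query, near, far = [0.0, 0.0], [1.0, 0.0], [3.0, 4.0]
sim = similarity_fn("euclidean")
dist = distance_fn("euclidean")

# Both orderings put the nearer vector first.
by_similarity = sorted([far, near], key=lambda v: sim(query, v), reverse=True)
by_distance = sorted([far, near], key=lambda v: dist(query, v))
```

Negating the Euclidean distance lets a single "sort descending by score" code path serve every similarity metric.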

sqlite-create

Create or open an embedding store with SQLite backend.

String -> Int -> Symbol -> Result EmbeddingStore String

sqlite-open

Open an existing embedding store with SQLite backend.

String -> Int -> Symbol -> Result EmbeddingStore String

sqlite-close

Close the embedding store.

EmbeddingStore -> Unit

sqlite-upsert

Insert or update an embedding.

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

sqlite-insert

Insert an embedding (alias for upsert).

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

sqlite-get

Get an embedding by ID.

EmbeddingStore -> String -> Option Record

sqlite-delete

Delete an embedding by ID.

EmbeddingStore -> String -> Result Unit String

sqlite-count

Count total embeddings in the store.

EmbeddingStore -> Int

sqlite-exists?

Check if an embedding exists by ID.

EmbeddingStore -> String -> Bool

sqlite-search

Search for top-k similar embeddings.

EmbeddingStore -> List Float -> Int -> Result (List Record) String

sqlite-search-threshold

Search with a minimum score threshold.

EmbeddingStore -> List Float -> Float -> Result (List Record) String

sqlite-search-filter

Search with metadata filtering.

EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String

sqlite-upsert-batch

Insert multiple embeddings at once.

EmbeddingStore -> List Record -> Result Int String

sqlite-get-all

Get all embeddings (for small stores).

EmbeddingStore -> List Record

sqlite-clear

Clear all embeddings from the store.

EmbeddingStore -> Result Unit String

postgres-create

Create or open an embedding store with PostgreSQL backend.

String -> Int -> Symbol -> Result PgEmbeddingStore String

postgres-search

Search for top-k similar embeddings using pgvector.

PgEmbeddingStore -> List Float -> Int -> Result (List Record) String

postgres-upsert

Insert or update an embedding.

PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

postgres-delete

Delete an embedding by ID.

PgEmbeddingStore -> String -> Result Unit String

postgres-count

Count total embeddings in the store.

PgEmbeddingStore -> Int

postgres-create-index

Create HNSW index for fast approximate nearest neighbor search.

PgEmbeddingStore -> Int -> Int -> Result Unit String

PgEmbeddingStore

Embedding store backed by PostgreSQL with pgvector. Contains database connection, vector dimension, and similarity metric.

create

Create or open an embedding store. Creates the embeddings table with pgvector column if it doesn't exist.

String -> Int -> Symbol -> Result PgEmbeddingStore String

store = Embeddings.Postgres.create "postgresql://localhost/mydb" 1536 :cosine

create-with-table

Create an embedding store with a custom table name.

String -> Int -> Symbol -> String -> Result PgEmbeddingStore String

store = Embeddings.Postgres.create-with-table conn-string 1536 :cosine "my_vectors"

create-index

Create HNSW index for fast approximate nearest neighbor search. Call this after creating the store for better search performance.

PgEmbeddingStore -> Int -> Int -> Result Unit String

Embeddings.Postgres.create-index store 16 64
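For orientation, pgvector builds HNSW indexes with a `CREATE INDEX ... USING hnsw` statement that takes `m` and `ef_construction` options. The Python sketch below shows what such a statement could look like. It assumes the two integer arguments of create-index correspond to `m` and `ef_construction` (16 and 64 are pgvector's defaults), and the table name `embeddings`, column name `embedding`, and index name are hypothetical; none of this is confirmed by the documentation above.

```python
# pgvector operator classes for each metric (metric names as plain strings).
OPCLASS = {
    "cosine": "vector_cosine_ops",
    "euclidean": "vector_l2_ops",
    "dot": "vector_ip_ops",
}


def hnsw_index_sql(table, metric, m, ef_construction):
    """Build a pgvector HNSW index statement (identifiers are not escaped)."""
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw "
        f"ON {table} USING hnsw (embedding {OPCLASS[metric]}) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )


sql = hnsw_index_sql("embeddings", "cosine", 16, 64)
```

The operator class has to match the metric used at query time; an index built with `vector_cosine_ops` will not speed up an L2-distance search.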

create-index-default

Create HNSW index with default parameters.

PgEmbeddingStore -> Result Unit String

open

Open an existing embedding store. Does not create the table if it doesn't exist.

String -> Int -> Symbol -> Result PgEmbeddingStore String

store = Embeddings.Postgres.open "postgresql://localhost/mydb" 1536 :cosine

open-with-table

Open an existing embedding store with a custom table name.

String -> Int -> Symbol -> String -> Result PgEmbeddingStore String

close

Close the embedding store.

PgEmbeddingStore -> Unit

upsert

Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.

PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

Embeddings.Postgres.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\": \"api\"}"

insert

Insert an embedding (alias for upsert).

PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

get

Get an embedding by ID. Returns the embedding record or None if not found.

PgEmbeddingStore -> String -> Option Record

match Embeddings.Postgres.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"

delete

Delete an embedding by ID.

PgEmbeddingStore -> String -> Result Unit String

Embeddings.Postgres.delete store "doc1"

count

Count total embeddings in the store.

PgEmbeddingStore -> Int

n = Embeddings.Postgres.count store  # => 1234

exists?

Check if an embedding exists by ID.

PgEmbeddingStore -> String -> Bool

search

Search for top-k similar embeddings using pgvector's native operators. Uses the configured metric for similarity comparison.

PgEmbeddingStore -> List Float -> Int -> Result (List Record) String

match Embeddings.Postgres.search store query-vec 10
  | Ok results ->
      results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed"

search-threshold

Search with a minimum score threshold. Only returns results with score >= threshold.

PgEmbeddingStore -> List Float -> Float -> Result (List Record) String

search-where

Search with metadata filtering using JSONB operators. Filter is a SQL WHERE clause fragment for the metadata column.

PgEmbeddingStore -> List Float -> Int -> String -> Result (List Record) String

# Find documents from a specific source
Embeddings.Postgres.search-where store query-vec 10 "metadata->>'source' = 'api'"
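To make the "WHERE clause fragment" idea concrete, here is a Python sketch of how a filter string might be combined with pgvector's distance operators (`<=>` cosine distance, `<->` L2 distance, `<#>` negative inner product). The column names, the `%s` placeholders, and the query shape are assumptions for illustration; the actual SQL issued by the package is not shown in this document.

```python
# pgvector distance operators for each metric (metric names as plain strings).
DISTANCE_OP = {
    "cosine": "<=>",     # cosine distance
    "euclidean": "<->",  # L2 distance
    "dot": "<#>",        # negative inner product
}


def search_where_sql(table, metric, where_fragment=None):
    """Build a top-k query; the filter fragment is spliced in verbatim."""
    where = f"WHERE {where_fragment} " if where_fragment else ""
    return (
        f"SELECT id, content, metadata FROM {table} {where}"
        f"ORDER BY embedding {DISTANCE_OP[metric]} %s::vector LIMIT %s"
    )


filtered = search_where_sql("embeddings", "cosine", "metadata->>'source' = 'api'")
unfiltered = search_where_sql("embeddings", "euclidean")
```

If the fragment is spliced in as text, as in this sketch, it should never be assembled from untrusted user input, because nothing stops it from containing arbitrary SQL.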

set-ef-search

Set the ef_search parameter for the HNSW index. Higher values give more accurate results but slower queries. Default is 40. Typical range: 10-200.

PgEmbeddingStore -> Int -> Result Unit String

Embeddings.Postgres.set-ef-search store 100

upsert-batch

Insert multiple embeddings at once using a transaction. More efficient than individual inserts.

PgEmbeddingStore -> List Record -> Result Int String

get-all

Get all embeddings (for small stores). Warning: May be slow for large stores. Consider using search instead.

PgEmbeddingStore -> List Record

clear

Clear all embeddings from the store.

PgEmbeddingStore -> Result Unit String

truncate

Truncate the table (faster than delete for large tables).

PgEmbeddingStore -> Result Unit String

vacuum

Vacuum the table to reclaim space and update statistics.

PgEmbeddingStore -> Result Unit String

reindex

Reindex the HNSW index (useful after large batch inserts).

PgEmbeddingStore -> Result Unit String

drop-index

Drop the HNSW index.

PgEmbeddingStore -> Result Unit String

EmbeddingStore

Embedding store backed by SQLite. Contains database connection, vector dimension, and similarity metric.

create

Create or open an embedding store. Creates the embeddings table if it doesn't exist.

String -> Int -> Symbol -> Result EmbeddingStore String

store = Embeddings.SQLite.create "knowledge.db" 1536 :cosine

open

Open an existing embedding store. Does not create the table if it doesn't exist.

String -> Int -> Symbol -> Result EmbeddingStore String

store = Embeddings.SQLite.open "knowledge.db" 1536 :cosine

close

Close the embedding store.

EmbeddingStore -> Unit

upsert

Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

Embeddings.SQLite.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\": \"api\"}"

insert

Insert an embedding (alias for upsert).

EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String

get

Get an embedding by ID. Returns the embedding record or None if not found.

EmbeddingStore -> String -> Option Record

match Embeddings.SQLite.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"

delete

Delete an embedding by ID.

EmbeddingStore -> String -> Result Unit String

Embeddings.SQLite.delete store "doc1"

count

Count total embeddings in the store.

EmbeddingStore -> Int

n = Embeddings.SQLite.count store  # => 1234

exists?

Check if an embedding exists by ID.

EmbeddingStore -> String -> Bool

search

Search for top-k similar embeddings. Performs brute-force similarity comparison against all stored embeddings.

EmbeddingStore -> List Float -> Int -> Result (List Record) String

match Embeddings.SQLite.search store query-vec 10
  | Ok results ->
      results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed"
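Brute-force search means every stored vector is scored against the query, so cost grows linearly with the number of rows. The Python sketch below shows the idea over an in-memory list of rows with JSON-encoded vectors. The row layout and the `id` field are assumptions, while `score` and `content` follow the record fields used in the example above.

```python
import heapq
import json
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def brute_force_search(rows, query, k):
    """rows: iterable of (id, content, vector_json). Scores every row."""
    scored = (
        {"id": rid, "content": content, "score": cosine(query, json.loads(vec_json))}
        for rid, content, vec_json in rows
    )
    # nlargest keeps only the k best results while scanning.
    return heapq.nlargest(k, scored, key=lambda r: r["score"])


rows = [
    ("a", "north", "[0.0, 1.0]"),
    ("b", "east", "[1.0, 0.0]"),
    ("c", "north-east", "[1.0, 1.0]"),
]
results = brute_force_search(rows, [1.0, 0.1], k=2)
print([r["content"] for r in results])  # ['east', 'north-east']
```

This approach is simple and exact, which suits small stores; for large collections an approximate index such as the HNSW index on the PostgreSQL backend avoids the full scan.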

search-threshold

Search with a minimum score threshold. Only returns results with score >= threshold.

EmbeddingStore -> List Float -> Float -> Result (List Record) String

search-filter

Search with metadata filtering. Only searches embeddings where filter function returns true.

EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String
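The signature takes a predicate over the metadata, and only rows that pass it are scored. A Python sketch of that flow follows. The row layout, the dot-product scoring, and all names are hypothetical; it shows the filter-then-rank order described above, not the package's implementation.

```python
import json


def search_filter(rows, query, k, keep):
    """rows: (id, vector, metadata_json). keep: predicate on parsed metadata."""
    # Apply the metadata predicate first so rejected rows are never scored.
    candidates = [(rid, vec) for rid, vec, meta in rows if keep(json.loads(meta))]
    scored = [(sum(q * v for q, v in zip(query, vec)), rid) for rid, vec in candidates]
    scored.sort(reverse=True)
    return [rid for _, rid in scored[:k]]


rows = [
    ("doc1", [1.0, 0.0], '{"source": "api"}'),
    ("doc2", [0.9, 0.1], '{"source": "web"}'),
    ("doc3", [0.5, 0.5], '{"source": "api"}'),
]
hits = search_filter(rows, [1.0, 0.0], 2, lambda meta: meta.get("source") == "api")
print(hits)  # ['doc1', 'doc3']
```

Note that doc2 is excluded even though it would outscore doc3, because the filter is applied before ranking rather than after taking the top k.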

upsert-batch

Insert multiple embeddings at once. More efficient than individual inserts.

EmbeddingStore -> List Record -> Result Int String

get-all

Get all embeddings (for small stores). Warning: May be slow for large stores.

EmbeddingStore -> List Record

clear

Clear all embeddings from the store.

EmbeddingStore -> Result Unit String