embeddings
| Kind | kit |
|---|---|
| Capabilities | net |
| Categories | machine-learning database |
| Keywords | embeddings vector rag similarity machine-learning |
Lightweight vector embeddings store for RAG applications
Files
| File | Description |
|---|---|
| .editorconfig | Editor formatting configuration |
| .gitignore | Git ignore rules for build artifacts and dependencies |
| .tool-versions | asdf tool versions (Zig, Kit) |
| LICENSE | MIT license file |
| README.md | This file |
| examples/postgres-store.kit | PostgreSQL pgvector store example |
| examples/simple-rag.kit | Simple retrieval-augmented generation example |
| examples/sqlite-store.kit | SQLite store example |
| examples/test-sqlite.kit | SQLite backend smoke example |
| examples/vector-ops.kit | Vector operations example |
| kit.toml | Package manifest with metadata and dependencies |
| src/main.kit | Main package module and top-level re-exports |
| src/postgres.kit | PostgreSQL pgvector backend |
| src/sqlite.kit | SQLite backend |
| src/vector.kit | Pure vector operations and similarity metrics |
| tests/embeddings.test.kit | Tests for vector operations |
| tests/types.test.kit | Tests for embedding records, metrics, and result shapes |
Architecture
RAG Pipeline
Backend Abstraction
Dependencies
- postgres
- sqlite
Installation
kit add gitlab.com/kit-lang/packages/kit-embeddings.git
Usage
import Kit.Embeddings.Vector as Embeddings
import Kit.Embeddings.Sqlite as Store
main = fn =>
  a = [1.0, 2.0, 3.0, 4.0]
  b = [4.0, 3.0, 2.0, 1.0]
  similarity = Embeddings.cosine-similarity a b
  println "similarity = ${similarity}"
  match Store.create "/tmp/embeddings.db" 4 :cosine
    | Err e -> println "Failed to create store: ${e}"
    | Ok store ->
      defer Store.close store
      Store.upsert store "doc1" "The quick brown fox" [0.1, 0.2, 0.3, 0.4] "{\"source\":\"web\"}"
      Store.upsert store "doc2" "Machine learning intro" [0.8, 0.7, 0.6, 0.5] "{\"source\":\"api\"}"
      match Store.search store [0.12, 0.22, 0.32, 0.42] 2
        | Ok results ->
          results |> List.for-each (fn(r) =>
            println "${r.score}: ${r.content}"
          )
        | Err e -> println "Search failed: ${e}"
main
For PostgreSQL with pgvector:
import Kit.Embeddings.Postgres as Store
main = fn =>
  match Store.create "postgresql://localhost/embeddings_demo" 4 :cosine
    | Err e -> println "Failed to create store: ${e}"
    | Ok store ->
      defer Store.close store
      Store.create-index-default store
      Store.upsert store "doc1" "Machine learning intro" [0.8, 0.7, 0.6, 0.5] "{\"source\":\"paper\"}"
main
Development
Running Examples
Run examples with the interpreter:
kit run examples/vector-ops.kit
Compile examples to a native binary:
kit build examples/vector-ops.kit && ./vector-ops
Running Tests
Run the test suite:
kit test
Run the test suite with coverage:
kit test --coverage
Running kit dev
Run the standard development workflow (format, check, test):
kit dev
This will:
- Format and check source files in src/
- Type check examples in examples/
- Run tests in tests/ with coverage
Running Parity
Run interpreter/compiler parity checks for examples:
kit parity --no-spinner --failures-only
Generating Documentation
Generate API documentation from doc comments:
kit doc
Note: Kit sources with doc comments (##) will generate HTML documents in docs/*.html
Cleaning Build Artifacts
Remove generated files, caches, and build artifacts:
kit task clean
Note: Defined in kit.toml.
Local Installation
To install this package locally for development:
kit install
This installs the package to ~/.kit/packages/@kit/embeddings/, making it available for import as Kit.Embeddings in other projects.
License
This package is released under the MIT License - see LICENSE for details.
Exported Functions & Types
dot
Dot product of two vectors. Uses Kit's internal SIMD-accelerated implementation.
List Float -> List Float -> Float
Embeddings.dot [1.0, 2.0, 3.0] [4.0, 5.0, 6.0] # => 32.0
magnitude
Magnitude (L2 norm) of a vector. Returns sqrt(sum of squares).
List Float -> Float
Embeddings.magnitude [3.0, 4.0] # => 5.0
scale
Scale a vector by a scalar value. Returns a new vector with each element multiplied by the scalar.
List Float -> Float -> List Float
Embeddings.scale [1.0, 2.0, 3.0] 2.0 # => [2.0, 4.0, 6.0]
normalize
Normalize a vector to unit length. Returns zero vector if input has zero magnitude.
List Float -> List Float
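The zero-magnitude rule can be illustrated with a minimal Python sketch. This is written in Python rather than Kit, it is not the package's implementation, and the function name `normalize` here is only a stand-in:

```python
import math

def normalize(v):
    """Scale v to unit length; a zero-magnitude input yields a zero vector."""
    mag = math.sqrt(sum(x * x for x in v))
    if mag == 0.0:
        # Avoid dividing by zero: return a zero vector of the same length.
        return [0.0] * len(v)
    return [x / mag for x in v]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
print(normalize([0.0, 0.0]))  # [0.0, 0.0]
```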
Embeddings.normalize [3.0, 4.0] # => [0.6, 0.8]
add
Element-wise addition of two vectors.
List Float -> List Float -> List Float
sub
Element-wise subtraction of two vectors.
List Float -> List Float -> List Float
cosine-similarity
Cosine similarity between two vectors. Returns value in [-1, 1], where 1 = identical direction, 0 = orthogonal, -1 = opposite direction.
Formula: dot(a,b) / (|a| * |b|)
List Float -> List Float -> Float
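The formula can be worked through in a few lines of Python. This is an illustrative sketch, not the Kit implementation; in particular, returning 0.0 for a zero vector is an assumption made here, since the text above does not say how that case is handled:

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|) for two equal-length lists of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector has no direction, so report 0.0.
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```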
Embeddings.cosine-similarity [1.0, 0.0] [1.0, 0.0] # => 1.0
Embeddings.cosine-similarity [1.0, 0.0] [0.0, 1.0] # => 0.0
euclidean-distance
Euclidean distance between two vectors. Returns the L2 distance (straight-line distance).
Formula: sqrt(sum((a[i] - b[i])^2))
List Float -> List Float -> Float
Embeddings.euclidean-distance [0.0, 0.0] [3.0, 4.0] # => 5.0
angular-distance
Angular distance between two vectors. Returns value in [0, 1], where 0 = identical direction, 0.5 = orthogonal, 1 = opposite direction.
Formula: arccos(cosine-similarity) / pi
List Float -> List Float -> Float
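A Python sketch of the same formula follows. It is an illustration, not the package's code. The clamp to [-1, 1] and the error on zero vectors are choices made for this sketch: floating-point rounding can push the cosine slightly past 1, which would make `math.acos` fail.

```python
import math

def angular_distance(a, b):
    """arccos(cosine similarity) / pi, giving a value in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        # Assumption: the angle to a zero vector is treated as undefined.
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp so rounding error cannot take the cosine outside acos's domain.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos) / math.pi

print(angular_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0
print(angular_distance([1.0, 0.0], [0.0, 1.0]))   # 0.5
print(angular_distance([1.0, 0.0], [-1.0, 0.0]))  # 1.0
```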
Embeddings.angular-distance [1.0, 0.0] [1.0, 0.0] # => 0.0
Embeddings.angular-distance [1.0, 0.0] [0.0, 1.0] # => 0.5
to-bytes
Serialize a list of floats to bytes (IEEE 754 f64). Each float is stored as 8 bytes in little-endian format. Useful for storing embeddings as BLOBs in databases.
List Float -> Bytes
bytes = Embeddings.to-bytes [1.0, 2.0, 3.0] # 24 bytes
from-bytes
Deserialize bytes to a list of floats. Expects IEEE 754 f64 format (8 bytes per float).
Bytes -> List Float
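The byte layout described for to-bytes and from-bytes (little-endian IEEE 754 f64, 8 bytes per value) can be reproduced with Python's `struct` module. This sketch only illustrates the format; the helper names are hypothetical, and rejecting lengths that are not a multiple of 8 is an assumption made here:

```python
import struct

def to_bytes(values):
    """Pack floats as consecutive little-endian f64 values."""
    return struct.pack(f"<{len(values)}d", *values)

def from_bytes(data):
    """Unpack little-endian f64 values; the length must be a multiple of 8."""
    if len(data) % 8 != 0:
        raise ValueError("byte length must be a multiple of 8")
    return list(struct.unpack(f"<{len(data) // 8}d", data))

blob = to_bytes([1.0, 2.0, 3.0])
print(len(blob))         # 24
print(from_bytes(blob))  # [1.0, 2.0, 3.0]
```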
floats = Embeddings.from-bytes bytes # [1.0, 2.0, 3.0]
similarity-fn
Get similarity function by metric name. Returns a function (a, b) -> Float.
Supported metrics:
- :cosine - Cosine similarity (higher = more similar)
- :euclidean - Negative Euclidean distance (higher = more similar)
- :dot - Dot product (higher = more similar)
Keyword -> (List Float -> List Float -> Float)
sim-fn = Embeddings.similarity-fn :cosine
score = sim-fn vec-a vec-b
distance-fn
Get distance function by metric name. Returns a function (a, b) -> Float where lower = more similar.
Supported metrics:
- :cosine - 1 - cosine similarity
- :euclidean - Euclidean distance
- :angular - Angular distance [0, 1]
Keyword -> (List Float -> List Float -> Float)
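The two lookups, similarity-fn and distance-fn, can be pictured as tables from metric name to function. The Python sketch below is an illustration under assumptions: the string keys stand in for Kit's `:cosine`-style names, and raising on an unknown metric is a choice made here, not documented behavior.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _cosine(a, b):
    norm = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / norm if norm else 0.0

def _angular(a, b):
    return math.acos(max(-1.0, min(1.0, _cosine(a, b)))) / math.pi

# Similarity: higher = more similar, so Euclidean distance is negated.
SIMILARITY = {
    "cosine": _cosine,
    "euclidean": lambda a, b: -_euclidean(a, b),
    "dot": _dot,
}

# Distance: lower = more similar.
DISTANCE = {
    "cosine": lambda a, b: 1.0 - _cosine(a, b),
    "euclidean": _euclidean,
    "angular": _angular,
}

def similarity_fn(metric):
    if metric not in SIMILARITY:
        raise ValueError(f"unsupported similarity metric: {metric}")
    return SIMILARITY[metric]

def distance_fn(metric):
    if metric not in DISTANCE:
        raise ValueError(f"unsupported distance metric: {metric}")
    return DISTANCE[metric]

print(similarity_fn("euclidean")([0.0, 0.0], [3.0, 4.0]))  # -5.0
print(distance_fn("euclidean")([0.0, 0.0], [3.0, 4.0]))    # 5.0
```

Negating the Euclidean distance lets a caller sort every similarity metric in the same direction.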
dist-fn = Embeddings.distance-fn :euclidean
distance = dist-fn vec-a vec-b
sqlite-create
Create or open an embedding store with SQLite backend.
String -> Int -> Symbol -> Result EmbeddingStore String
sqlite-open
Open an existing embedding store with SQLite backend.
String -> Int -> Symbol -> Result EmbeddingStore String
sqlite-close
Close the embedding store.
EmbeddingStore -> Unit
sqlite-upsert
Insert or update an embedding.
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
sqlite-insert
Insert an embedding (alias for upsert).
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
sqlite-get
Get an embedding by ID.
EmbeddingStore -> String -> Option Record
sqlite-delete
Delete an embedding by ID.
EmbeddingStore -> String -> Result Unit String
sqlite-count
Count total embeddings in the store.
EmbeddingStore -> Int
sqlite-exists?
Check if an embedding exists by ID.
EmbeddingStore -> String -> Bool
sqlite-search
Search for top-k similar embeddings.
EmbeddingStore -> List Float -> Int -> Result (List Record) String
sqlite-search-threshold
Search with a minimum score threshold.
EmbeddingStore -> List Float -> Float -> Result (List Record) String
sqlite-search-filter
Search with metadata filtering.
EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String
sqlite-upsert-batch
Insert multiple embeddings at once.
EmbeddingStore -> List Record -> Result Int String
sqlite-get-all
Get all embeddings (for small stores).
EmbeddingStore -> List Record
sqlite-clear
Clear all embeddings from the store.
EmbeddingStore -> Result Unit String
postgres-create
Create or open an embedding store with PostgreSQL backend.
String -> Int -> Symbol -> Result PgEmbeddingStore String
postgres-search
Search for top-k similar embeddings using pgvector.
PgEmbeddingStore -> List Float -> Int -> Result (List Record) String
postgres-upsert
Insert or update an embedding.
PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
postgres-delete
Delete an embedding by ID.
PgEmbeddingStore -> String -> Result Unit String
postgres-count
Count total embeddings in the store.
PgEmbeddingStore -> Int
postgres-create-index
Create HNSW index for fast approximate nearest neighbor search.
PgEmbeddingStore -> Int -> Int -> Result Unit String
dot
Dot product of two vectors. Uses Kit's internal SIMD-accelerated implementation.
List Float -> List Float -> Float
Embeddings.Vector.dot [1.0, 2.0, 3.0] [4.0, 5.0, 6.0] # => 32.0
magnitude
Magnitude (L2 norm) of a vector. Returns sqrt(sum of squares).
List Float -> Float
Embeddings.Vector.magnitude [3.0, 4.0] # => 5.0
scale
Scale a vector by a scalar value. Returns a new vector with each element multiplied by the scalar.
List Float -> Float -> List Float
Embeddings.Vector.scale [1.0, 2.0, 3.0] 2.0 # => [2.0, 4.0, 6.0]
normalize
Normalize a vector to unit length. Returns zero vector if input has zero magnitude.
List Float -> List Float
Embeddings.Vector.normalize [3.0, 4.0] # => [0.6, 0.8]
add
Element-wise addition of two vectors.
List Float -> List Float -> List Float
sub
Element-wise subtraction of two vectors.
List Float -> List Float -> List Float
cosine-similarity
Cosine similarity between two vectors. Returns value in [-1, 1], where 1 = identical direction, 0 = orthogonal, -1 = opposite direction.
Formula: dot(a,b) / (|a| * |b|)
List Float -> List Float -> Float
Embeddings.Vector.cosine-similarity [1.0, 0.0] [1.0, 0.0] # => 1.0
Embeddings.Vector.cosine-similarity [1.0, 0.0] [0.0, 1.0] # => 0.0
euclidean-distance
Euclidean distance between two vectors. Returns the L2 distance (straight-line distance).
Formula: sqrt(sum((a[i] - b[i])^2))
List Float -> List Float -> Float
Embeddings.Vector.euclidean-distance [0.0, 0.0] [3.0, 4.0] # => 5.0
angular-distance
Angular distance between two vectors. Returns value in [0, 1], where 0 = identical direction, 0.5 = orthogonal, 1 = opposite direction.
Formula: arccos(cosine-similarity) / pi
List Float -> List Float -> Float
Embeddings.Vector.angular-distance [1.0, 0.0] [1.0, 0.0] # => 0.0
Embeddings.Vector.angular-distance [1.0, 0.0] [0.0, 1.0] # => 0.5
to-bytes
Serialize a list of floats to bytes (IEEE 754 f64). Each float is stored as 8 bytes in little-endian format. Useful for storing embeddings as BLOBs in databases.
List Float -> Bytes
bytes = Embeddings.Vector.to-bytes [1.0, 2.0, 3.0] # 24 bytes
from-bytes
Deserialize bytes to a list of floats. Expects IEEE 754 f64 format (8 bytes per float).
Bytes -> List Float
floats = Embeddings.Vector.from-bytes bytes # [1.0, 2.0, 3.0]
similarity-fn
Get similarity function by metric name. Returns a function (a, b) -> Float.
Supported metrics:
- :cosine - Cosine similarity (higher = more similar)
- :euclidean - Negative Euclidean distance (higher = more similar)
- :dot - Dot product (higher = more similar)
Keyword -> (List Float -> List Float -> Float)
sim-fn = Embeddings.Vector.similarity-fn :cosine
score = sim-fn vec-a vec-b
distance-fn
Get distance function by metric name. Returns a function (a, b) -> Float where lower = more similar.
Supported metrics:
- :cosine - 1 - cosine similarity
- :euclidean - Euclidean distance
- :angular - Angular distance [0, 1]
Keyword -> (List Float -> List Float -> Float)
dist-fn = Embeddings.Vector.distance-fn :euclidean
distance = dist-fn vec-a vec-b
PgEmbeddingStore
Embedding store backed by PostgreSQL with pgvector. Contains database connection, vector dimension, and similarity metric.
Variants
PgEmbeddingStore {db, dimension, metric, table-name}
create
Create or open an embedding store. Creates the embeddings table with pgvector column if it doesn't exist.
Parameters:
String -> Int -> Symbol -> Result PgEmbeddingStore String
store = Embeddings.Postgres.create "postgresql://localhost/mydb" 1536 :cosine
create-with-table
Create an embedding store with a custom table name.
String -> Int -> Symbol -> NonEmptyString -> Result PgEmbeddingStore String
store = Embeddings.Postgres.create-with-table conn-string 1536 :cosine "my_vectors"
create-index
Create HNSW index for fast approximate nearest neighbor search. Call this after creating the store for better search performance.
Parameters:
PgEmbeddingStore -> Int -> Int -> Result Unit String
Embeddings.Postgres.create-index store 16 64
create-index-default
Create HNSW index with default parameters.
PgEmbeddingStore -> Result Unit String
open
Open an existing embedding store. Does not create table if it doesn't exist.
String -> Int -> Symbol -> Result PgEmbeddingStore String
store = Embeddings.Postgres.open "postgresql://localhost/mydb" 1536 :cosine
open-with-table
Open an existing embedding store with a custom table name.
String -> Int -> Symbol -> NonEmptyString -> Result PgEmbeddingStore String
close
Close the embedding store.
PgEmbeddingStore -> Unit
upsert
Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.
Parameters:
PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
Embeddings.Postgres.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\":\"api\"}"
insert
Insert an embedding (alias for upsert).
PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
get
Get an embedding by ID. Returns the embedding record or None if not found.
PgEmbeddingStore -> String -> Option Record
match Embeddings.Postgres.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"
delete
Delete an embedding by ID.
PgEmbeddingStore -> String -> Result Unit String
Embeddings.Postgres.delete store "doc1"
count
Count total embeddings in the store.
PgEmbeddingStore -> Int
n = Embeddings.Postgres.count store # => 1234
exists?
Check if an embedding exists by ID.
PgEmbeddingStore -> String -> Bool
search
Search for top-k similar embeddings using pgvector's native operators. Uses the configured metric for similarity comparison.
Parameters:
Returns:
PgEmbeddingStore -> List Float -> Int -> Result (List Record) String
match Embeddings.Postgres.search store query-vec 10
  | Ok results ->
    results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed"
search-threshold
Search with a minimum score threshold. Only returns results with score >= threshold.
PgEmbeddingStore -> List Float -> Float -> Result (List Record) String
search-where
Search with metadata filtering using JSONB operators. Filter is a SQL WHERE clause fragment for the metadata column.
PgEmbeddingStore -> List Float -> Int -> String -> Result (List Record) String
# Find documents from a specific source
Embeddings.Postgres.search-where store query-vec 10 "metadata->>'source' = 'api'"
set-ef-search
Set the ef_search parameter for HNSW index. Higher values give more accurate results but slower queries. Default is 40. Typical range: 10-200.
PgEmbeddingStore -> Int -> Result Unit String
Embeddings.Postgres.set-ef-search store 100
upsert-batch
Insert multiple embeddings at once using a transaction. More efficient than individual inserts.
PgEmbeddingStore -> List Record -> Result Int String
get-all
Get all embeddings (for small stores). Warning: May be slow for large stores. Consider using search instead.
PgEmbeddingStore -> List Record
clear
Clear all embeddings from the store.
PgEmbeddingStore -> Result Unit String
truncate
Truncate the table (faster than delete for large tables).
PgEmbeddingStore -> Result Unit String
vacuum
Vacuum the table to reclaim space and update statistics.
PgEmbeddingStore -> Result Unit String
reindex
Reindex the HNSW index (useful after large batch inserts).
PgEmbeddingStore -> Result Unit String
drop-index
Drop the HNSW index.
PgEmbeddingStore -> Result Unit String
EmbeddingStore
Embedding store backed by SQLite. Contains database connection, vector dimension, and similarity metric.
Variants
EmbeddingStore {db, dimension, metric}
create
Create or open an embedding store. Creates the embeddings table if it doesn't exist.
Parameters:
NonEmptyString -> Int -> Symbol -> Result EmbeddingStore String
store = Embeddings.Sqlite.create "knowledge.db" 1536 :cosine
open
Open an existing embedding store. Does not create table if it doesn't exist.
NonEmptyString -> Int -> Symbol -> Result EmbeddingStore String
store = Embeddings.Sqlite.open "knowledge.db" 1536 :cosine
close
Close the embedding store.
EmbeddingStore -> Unit
upsert
Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.
Parameters:
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
Embeddings.Sqlite.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\":\"api\"}"
insert
Insert an embedding (alias for upsert).
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
get
Get an embedding by ID. Returns the embedding record or None if not found.
EmbeddingStore -> String -> Option Record
match Embeddings.Sqlite.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"
delete
Delete an embedding by ID.
EmbeddingStore -> String -> Result Unit String
Embeddings.Sqlite.delete store "doc1"
count
Count total embeddings in the store.
EmbeddingStore -> Int
n = Embeddings.Sqlite.count store # => 1234
exists?
Check if an embedding exists by ID.
EmbeddingStore -> String -> Bool
search
Search for top-k similar embeddings. Performs brute-force similarity comparison against all stored embeddings.
Parameters:
Returns:
EmbeddingStore -> List Float -> Int -> Result (List Record) String
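Brute-force top-k search, as described above, scores every stored embedding against the query and keeps the k highest scores. The Python sketch below illustrates the idea with in-memory records; the record fields (`id`, `content`, `embedding`, `score`) and the use of cosine similarity are assumptions for the example, not the package's internals.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(records, query, k):
    """Score every record against the query and keep the k best, best first."""
    scored = ((cosine(query, r["embedding"]), r) for r in records)
    top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [{"id": r["id"], "content": r["content"], "score": s} for s, r in top]

records = [
    {"id": "doc1", "content": "The quick brown fox", "embedding": [0.1, 0.2, 0.3, 0.4]},
    {"id": "doc2", "content": "Machine learning intro", "embedding": [0.8, 0.7, 0.6, 0.5]},
]
for hit in search(records, [0.12, 0.22, 0.32, 0.42], 2):
    print(f"{hit['score']:.4f}: {hit['content']}")
```

The cost grows linearly with the number of stored embeddings, since every one is compared with the query.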
match Embeddings.Sqlite.search store query-vec 10
  | Ok results ->
    results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed"
search-threshold
Search with a minimum score threshold. Only returns results with score >= threshold.
EmbeddingStore -> List Float -> Float -> Result (List Record) String
search-filter
Search with metadata filtering. Only searches embeddings where filter function returns true.
EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String
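The two search variants above, search-threshold and search-filter, differ only in where the cut happens: the threshold variant drops low scores after scoring, while the filter variant drops records before scoring. The Python sketch below shows both on in-memory records. The record shape, the cosine metric, the best-first ordering, and metadata stored as a JSON string are all assumptions for the example.

```python
import json
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def _rank(records, query):
    """Return (score, record) pairs sorted best first."""
    pairs = [(cosine(query, r["embedding"]), r) for r in records]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

def search_threshold(records, query, threshold):
    # Keep only results whose score is >= threshold.
    return [(s, r["id"]) for s, r in _rank(records, query) if s >= threshold]

def search_filter(records, query, k, keep):
    # Apply the metadata predicate first, then score only the survivors.
    candidates = [r for r in records if keep(json.loads(r["metadata"]))]
    return [(s, r["id"]) for s, r in _rank(candidates, query)[:k]]

records = [
    {"id": "doc1", "embedding": [0.1, 0.2, 0.3, 0.4], "metadata": '{"source":"web"}'},
    {"id": "doc2", "embedding": [0.8, 0.7, 0.6, 0.5], "metadata": '{"source":"api"}'},
]
query = [0.12, 0.22, 0.32, 0.42]
print(search_threshold(records, query, 0.9))
print(search_filter(records, query, 2, lambda meta: meta.get("source") == "api"))
```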
upsert-batch
Insert multiple embeddings at once. More efficient than individual inserts.
EmbeddingStore -> List Record -> Result Int String
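One common reason a batch insert beats individual inserts is that all rows go through a single transaction. The Python `sqlite3` sketch below illustrates that pattern. The table schema, column names, and BLOB encoding are hypothetical and chosen for the example; they are not taken from the package's source.

```python
import sqlite3
import struct

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS embeddings (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL,
    metadata TEXT NOT NULL
)
"""

def upsert_batch(conn, records):
    """Insert or replace all records in one transaction; return the row count."""
    rows = [
        (
            r["id"],
            r["content"],
            # Store the vector as little-endian f64 values in a BLOB.
            struct.pack(f"<{len(r['embedding'])}d", *r["embedding"]),
            r["metadata"],
        )
        for r in records
    ]
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT OR REPLACE INTO embeddings (id, content, embedding, metadata) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
n = upsert_batch(conn, [
    {"id": "doc1", "content": "The quick brown fox", "embedding": [0.1, 0.2], "metadata": "{}"},
    {"id": "doc2", "content": "Machine learning intro", "embedding": [0.8, 0.7], "metadata": "{}"},
])
print(n)  # 2
```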
get-all
Get all embeddings (for small stores). Warning: May be slow for large stores.
EmbeddingStore -> List Record
clear
Clear all embeddings from the store.
EmbeddingStore -> Result Unit String