embeddings
| Kind | kit |
|---|---|
| Categories | machine-learning, database |
| Keywords | embeddings, vector, rag, similarity, machine-learning |
Lightweight vector embeddings store for RAG applications
Files
| File | Description |
|---|---|
| kit.toml | Package manifest with metadata and dependencies |
| src/main.kit | Vector ops, similarity metrics, and backend re-exports |
| src/postgres.kit | PostgreSQL backend with pgvector and HNSW indexing |
| src/sqlite.kit | SQLite backend with JSON vector serialization |
| tests/embeddings.test.kit | Tests for vector ops and similarity metrics |
| examples/postgres-store.kit | pgvector storage with HNSW index and metadata filter |
| examples/simple-rag.kit | Basic retrieval-augmented generation workflow |
| examples/sqlite-store.kit | SQLite persistent storage with cosine search |
| examples/test-sqlite.kit | SQLite backend integration test |
| examples/vector-ops.kit | SIMD vector operations and byte serialization |
| LICENSE | MIT license file |
Architecture
RAG Pipeline
Backend Abstraction
Dependencies
- json
- postgres
- sqlite
Installation
kit add gitlab.com/kit-lang/packages/kit-embeddings.git
Usage
import Kit.Embeddings
License
MIT License - see LICENSE for details.
Exported Functions & Types
dot
Dot product of two vectors. Uses Kit's internal SIMD-accelerated implementation.
List Float -> List Float -> Float
Embeddings.dot [1.0, 2.0, 3.0] [4.0, 5.0, 6.0] # => 32.0
magnitude
Magnitude (L2 norm) of a vector. Returns sqrt(sum of squares).
List Float -> Float
Embeddings.magnitude [3.0, 4.0] # => 5.0
scale
Scale a vector by a scalar value. Returns a new vector with each element multiplied by the scalar.
List Float -> Float -> List Float
Embeddings.scale [1.0, 2.0, 3.0] 2.0 # => [2.0, 4.0, 6.0]
normalize
Normalize a vector to unit length. Returns zero vector if input has zero magnitude.
List Float -> List Float
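As a rough reference for the behaviour described above, the following Python sketch scales a vector to unit length and falls back to a zero vector when the magnitude is zero. It is an illustration only, not the package's implementation; the Python function name `normalize` is a stand-in chosen for this example.

```python
import math

def normalize(vec):
    """Scale vec to unit length; a zero-magnitude input yields a zero vector."""
    mag = math.sqrt(sum(x * x for x in vec))
    if mag == 0.0:
        # Dividing by zero is avoided by returning a zero vector of the same length.
        return [0.0] * len(vec)
    return [x / mag for x in vec]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
print(normalize([0.0, 0.0]))  # [0.0, 0.0]
```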
Embeddings.normalize [3.0, 4.0] # => [0.6, 0.8]
add
Element-wise addition of two vectors.
List Float -> List Float -> List Float
sub
Element-wise subtraction of two vectors.
List Float -> List Float -> List Float
cosine-similarity
Cosine similarity between two vectors. Returns value in [-1, 1], where 1 = identical direction, 0 = orthogonal, -1 = opposite direction.
Formula: dot(a,b) / (|a| * |b|)
List Float -> List Float -> Float
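The formula above can be written out in a few lines of Python. This is a reference sketch, not the package's code. The source does not say how zero-magnitude vectors are handled, so returning 0.0 in that case is an assumption made here to avoid a division by zero.

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector has no direction, so report 0.0.
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0
```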
Embeddings.cosine-similarity [1.0, 0.0] [1.0, 0.0] # => 1.0
Embeddings.cosine-similarity [1.0, 0.0] [0.0, 1.0] # => 0.0
euclidean-distance
Euclidean distance between two vectors. Returns the L2 distance (straight-line distance).
Formula: sqrt(sum((a[i] - b[i])^2))
List Float -> List Float -> Float
Embeddings.euclidean-distance [0.0, 0.0] [3.0, 4.0] # => 5.0
angular-distance
Angular distance between two vectors. Returns value in [0, 1], where 0 = identical direction, 0.5 = orthogonal, 1 = opposite direction.
Formula: arccos(cosine-similarity) / pi
List Float -> List Float -> Float
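A Python sketch of the arccos formula above follows. It assumes both vectors have non-zero magnitude, and it clamps the cosine to [-1, 1] before calling `acos`, because floating-point rounding can push the ratio slightly outside that range. Whether the package clamps in the same way is not stated in the source.

```python
import math

def angular_distance(a, b):
    """arccos(cosine_similarity(a, b)) / pi, giving a value in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = dot / (norm_a * norm_b)   # assumes non-zero magnitudes
    cos = max(-1.0, min(1.0, cos))  # guard against rounding error
    return math.acos(cos) / math.pi

print(angular_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0
print(angular_distance([1.0, 0.0], [-1.0, 0.0]))  # 1.0
```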
Embeddings.angular-distance [1.0, 0.0] [1.0, 0.0] # => 0.0
Embeddings.angular-distance [1.0, 0.0] [0.0, 1.0] # => 0.5
to-bytes
Serialize a list of floats to bytes (IEEE 754 f64). Each float is stored as 8 bytes in little-endian format. Useful for storing embeddings as BLOBs in databases.
List Float -> Bytes
bytes = Embeddings.to-bytes [1.0, 2.0, 3.0] # 24 bytes
from-bytes
Deserialize bytes to a list of floats. Expects IEEE 754 f64 format (8 bytes per float).
Bytes -> List Float
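The byte layout described for to-bytes and from-bytes (IEEE 754 f64, 8 bytes per value, little-endian) can be reproduced with Python's standard `struct` module. The sketch below is a reference for the format only; the function names are Python stand-ins, and the length check in `from_bytes` is an assumption about how malformed input might be rejected.

```python
import struct

def to_bytes(floats):
    """Pack floats as consecutive little-endian IEEE 754 doubles."""
    return struct.pack(f"<{len(floats)}d", *floats)

def from_bytes(data):
    """Unpack little-endian doubles; the input length must be a multiple of 8."""
    if len(data) % 8 != 0:
        raise ValueError("byte length must be a multiple of 8")
    return list(struct.unpack(f"<{len(data) // 8}d", data))

blob = to_bytes([1.0, 2.0, 3.0])
print(len(blob))         # 24
print(from_bytes(blob))  # [1.0, 2.0, 3.0]
```

A blob in this layout can be read back by any tool that understands little-endian doubles, which is what makes it suitable for BLOB columns.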
floats = Embeddings.from-bytes bytes # [1.0, 2.0, 3.0]
similarity-fn
Get similarity function by metric name. Returns a function (a, b) -> Float.
Supported metrics:
- :cosine - Cosine similarity (higher = more similar)
- :euclidean - Negative Euclidean distance (higher = more similar)
- :dot - Dot product (higher = more similar)
Symbol -> (List Float -> List Float -> Float)
sim-fn = Embeddings.similarity-fn :cosine
score = sim-fn vec-a vec-b
distance-fn
Get distance function by metric name. Returns a function (a, b) -> Float where lower = more similar.
Supported metrics:
- :cosine - 1 - cosine similarity
- :euclidean - Euclidean distance
- :angular - Angular distance [0, 1]
Symbol -> (List Float -> List Float -> Float)
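The two lookup functions use opposite conventions: similarity-fn returns scores where higher means more similar, while distance-fn returns values where lower means more similar. The Python sketch below mirrors the metric tables listed for both. It uses plain string keys in place of Kit symbols and assumes that an unknown metric raises an error, which the source does not specify.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Assumes neither vector has zero magnitude.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def angular_distance(a, b):
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(cos) / math.pi

# Higher score = more similar.
SIMILARITY = {
    "cosine": cosine_similarity,
    "euclidean": lambda a, b: -math.dist(a, b),  # negated distance
    "dot": dot,
}

# Lower value = more similar.
DISTANCE = {
    "cosine": lambda a, b: 1.0 - cosine_similarity(a, b),
    "euclidean": math.dist,
    "angular": angular_distance,
}

def similarity_fn(metric):
    if metric not in SIMILARITY:
        raise ValueError(f"unsupported similarity metric: {metric}")
    return SIMILARITY[metric]

def distance_fn(metric):
    if metric not in DISTANCE:
        raise ValueError(f"unsupported distance metric: {metric}")
    return DISTANCE[metric]

print(similarity_fn("euclidean")([0.0, 0.0], [3.0, 4.0]))  # -5.0
print(distance_fn("euclidean")([0.0, 0.0], [3.0, 4.0]))    # 5.0
```

Negating the Euclidean distance lets a single "sort descending by score" code path serve every similarity metric.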
dist-fn = Embeddings.distance-fn :euclidean
distance = dist-fn vec-a vec-b
sqlite-create
Create or open an embedding store with SQLite backend.
String -> Int -> Symbol -> Result EmbeddingStore String
sqlite-open
Open an existing embedding store with SQLite backend.
String -> Int -> Symbol -> Result EmbeddingStore String
sqlite-close
Close the embedding store.
EmbeddingStore -> Unit
sqlite-upsert
Insert or update an embedding.
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
sqlite-insert
Insert an embedding (alias for upsert).
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
sqlite-get
Get an embedding by ID.
EmbeddingStore -> String -> Option Record
sqlite-delete
Delete an embedding by ID.
EmbeddingStore -> String -> Result Unit String
sqlite-count
Count total embeddings in the store.
EmbeddingStore -> Int
sqlite-exists?
Check if an embedding exists by ID.
EmbeddingStore -> String -> Bool
sqlite-search
Search for top-k similar embeddings.
EmbeddingStore -> List Float -> Int -> Result (List Record) String
sqlite-search-threshold
Search with a minimum score threshold.
EmbeddingStore -> List Float -> Float -> Result (List Record) String
sqlite-search-filter
Search with metadata filtering.
EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String
sqlite-upsert-batch
Insert multiple embeddings at once.
EmbeddingStore -> List Record -> Result Int String
sqlite-get-all
Get all embeddings (for small stores).
EmbeddingStore -> List Record
sqlite-clear
Clear all embeddings from the store.
EmbeddingStore -> Result Unit String
postgres-create
Create or open an embedding store with PostgreSQL backend.
String -> Int -> Symbol -> Result PgEmbeddingStore String
postgres-search
Search for top-k similar embeddings using pgvector.
PgEmbeddingStore -> List Float -> Int -> Result (List Record) String
postgres-upsert
Insert or update an embedding.
PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
postgres-delete
Delete an embedding by ID.
PgEmbeddingStore -> String -> Result Unit String
postgres-count
Count total embeddings in the store.
PgEmbeddingStore -> Int
postgres-create-index
Create HNSW index for fast approximate nearest neighbor search.
PgEmbeddingStore -> Int -> Int -> Result Unit String
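PgEmbeddingStore is the embedding store type backed by PostgreSQL with pgvector. It holds the database connection, vector dimension, and similarity metric. The functions in this section are called with the Embeddings.Postgres prefix, for example Embeddings.Postgres.create.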
create
Create or open an embedding store. Creates the embeddings table with pgvector column if it doesn't exist.
Parameters:
String -> Int -> Symbol -> Result PgEmbeddingStore String
store = Embeddings.Postgres.create "postgresql://localhost/mydb" 1536 :cosine
create-with-table
Create an embedding store with a custom table name.
String -> Int -> Symbol -> String -> Result PgEmbeddingStore String
store = Embeddings.Postgres.create-with-table conn-string 1536 :cosine "my_vectors"
create-index
Create HNSW index for fast approximate nearest neighbor search. Call this after creating the store for better search performance.
Parameters:
PgEmbeddingStore -> Int -> Int -> Result Unit String
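For orientation, pgvector builds an HNSW index with a `CREATE INDEX ... USING hnsw` statement that takes two build options, `m` and `ef_construction`, and an operator class matching the distance metric. The Python sketch below assembles such a statement. It is a guess at the kind of DDL involved, not the package's code: mapping the two Int arguments to `m` and `ef_construction` (in that order), the column name `embedding`, and the index name are all assumptions.

```python
# pgvector operator classes for each metric name used in this document.
OPCLASS = {
    "cosine": "vector_cosine_ops",
    "euclidean": "vector_l2_ops",
    "dot": "vector_ip_ops",
}

def hnsw_index_ddl(table, metric, m, ef_construction):
    """Build a pgvector HNSW index statement (hypothetical column/index names)."""
    opclass = OPCLASS[metric]
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw_idx "
        f"ON {table} USING hnsw (embedding {opclass}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)})"
    )

print(hnsw_index_ddl("embeddings", "cosine", 16, 64))
```

The values 16 and 64 are pgvector's own defaults for `m` and `ef_construction`; larger values generally improve recall at the cost of build time and index size.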
Embeddings.Postgres.create-index store 16 64
create-index-default
Create HNSW index with default parameters.
PgEmbeddingStore -> Result Unit String
open
Open an existing embedding store. Does not create table if it doesn't exist.
String -> Int -> Symbol -> Result PgEmbeddingStore String
store = Embeddings.Postgres.open "postgresql://localhost/mydb" 1536 :cosine
open-with-table
Open an existing embedding store with a custom table name.
String -> Int -> Symbol -> String -> Result PgEmbeddingStore String
close
Close the embedding store.
PgEmbeddingStore -> Unit
upsert
Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.
Parameters:
PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
Embeddings.Postgres.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\": \"api\"}"
insert
Insert an embedding (alias for upsert).
PgEmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
get
Get an embedding by ID. Returns the embedding record or None if not found.
PgEmbeddingStore -> String -> Option Record
match Embeddings.Postgres.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"
delete
Delete an embedding by ID.
PgEmbeddingStore -> String -> Result Unit String
Embeddings.Postgres.delete store "doc1"
count
Count total embeddings in the store.
PgEmbeddingStore -> Int
n = Embeddings.Postgres.count store # => 1234
exists?
Check if an embedding exists by ID.
PgEmbeddingStore -> String -> Bool
search
Search for top-k similar embeddings using pgvector's native operators. Uses the configured metric for similarity comparison.
Parameters:
Returns:
PgEmbeddingStore -> List Float -> Int -> Result (List Record) String
match Embeddings.Postgres.search store query-vec 10
  | Ok results ->
    results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed"
search-threshold
Search with a minimum score threshold. Only returns results with score >= threshold.
PgEmbeddingStore -> List Float -> Float -> Result (List Record) String
search-where
Search with metadata filtering using JSONB operators. Filter is a SQL WHERE clause fragment for the metadata column.
PgEmbeddingStore -> List Float -> Int -> String -> Result (List Record) String
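To make the "WHERE clause fragment" idea concrete, the Python sketch below builds the kind of pgvector query such a call might translate to. The operators are pgvector's own: `<=>` is cosine distance, `<->` is Euclidean distance, and `<#>` is negative inner product, so ascending order puts the closest match first in all three cases. Everything else is hypothetical: the column names, the `%(query)s` and `%(k)s` placeholders (psycopg-style named parameters), and the overall query shape are assumptions, not the package's actual SQL.

```python
# pgvector distance operators, keyed by the metric names used in this document.
OPERATOR = {
    "cosine": "<=>",
    "euclidean": "<->",
    "dot": "<#>",
}

def search_where_sql(table, metric, where_fragment):
    """Build a filtered nearest-neighbour query (hypothetical schema)."""
    op = OPERATOR[metric]
    return (
        f"SELECT id, content, metadata FROM {table} "
        f"WHERE {where_fragment} "
        f"ORDER BY embedding {op} %(query)s::vector "
        f"LIMIT %(k)s"
    )

print(search_where_sql("embeddings", "cosine", "metadata->>'source' = 'api'"))
```

Because the fragment is spliced into the statement as text, a filter built this way should only be assembled from trusted input.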
# Find documents from a specific source
Embeddings.Postgres.search-where store query-vec 10 "metadata->>'source' = 'api'"
set-ef-search
Set the ef_search parameter for HNSW index. Higher values give more accurate results but slower queries. Default is 40. Typical range: 10-200.
PgEmbeddingStore -> Int -> Result Unit String
Embeddings.Postgres.set-ef-search store 100
upsert-batch
Insert multiple embeddings at once using a transaction. More efficient than individual inserts.
PgEmbeddingStore -> List Record -> Result Int String
get-all
Get all embeddings (for small stores). Warning: May be slow for large stores. Consider using search instead.
PgEmbeddingStore -> List Record
clear
Clear all embeddings from the store.
PgEmbeddingStore -> Result Unit String
truncate
Truncate the table (faster than delete for large tables).
PgEmbeddingStore -> Result Unit String
vacuum
Vacuum the table to reclaim space and update statistics.
PgEmbeddingStore -> Result Unit String
reindex
Reindex the HNSW index (useful after large batch inserts).
PgEmbeddingStore -> Result Unit String
drop-index
Drop the HNSW index.
PgEmbeddingStore -> Result Unit String
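EmbeddingStore is the embedding store type backed by SQLite. It holds the database connection, vector dimension, and similarity metric. The functions in this section are called with the Embeddings.SQLite prefix, for example Embeddings.SQLite.create.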
create
Create or open an embedding store. Creates the embeddings table if it doesn't exist.
Parameters:
String -> Int -> Symbol -> Result EmbeddingStore String
store = Embeddings.SQLite.create "knowledge.db" 1536 :cosine
open
Open an existing embedding store. Does not create table if it doesn't exist.
String -> Int -> Symbol -> Result EmbeddingStore String
store = Embeddings.SQLite.open "knowledge.db" 1536 :cosine
close
Close the embedding store.
EmbeddingStore -> Unit
upsert
Insert or update an embedding. If an embedding with the same ID exists, it will be replaced.
Parameters:
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
Embeddings.SQLite.upsert store "doc1" "Hello world" [0.1, 0.2, ...] "{\"source\": \"api\"}"
insert
Insert an embedding (alias for upsert).
EmbeddingStore -> String -> String -> List Float -> String -> Result Unit String
get
Get an embedding by ID. Returns the embedding record or None if not found.
EmbeddingStore -> String -> Option Record
match Embeddings.SQLite.get store "doc1"
  | Some result -> println result.content
  | None -> println "Not found"
delete
Delete an embedding by ID.
EmbeddingStore -> String -> Result Unit String
Embeddings.SQLite.delete store "doc1"
count
Count total embeddings in the store.
EmbeddingStore -> Int
n = Embeddings.SQLite.count store # => 1234
exists?
Check if an embedding exists by ID.
EmbeddingStore -> String -> Bool
search
Search for top-k similar embeddings. Performs brute-force similarity comparison against all stored embeddings.
Parameters:
Returns:
EmbeddingStore -> List Float -> Int -> Result (List Record) String
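Brute-force search means scoring the query against every stored vector and keeping the k best. The Python sketch below shows that idea with cosine similarity over an in-memory list. The record fields (`id`, `content`, `vector`, `score`) are assumptions modelled on the `r.score` and `r.content` accesses in this document, and the real backend reads its rows from SQLite instead of a list.

```python
import heapq
import math

def cosine_similarity(a, b):
    # Assumes neither vector has zero magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(records, query, k):
    """Score every record against query and return the k highest scores."""
    scored = (
        {"id": r["id"], "content": r["content"],
         "score": cosine_similarity(query, r["vector"])}
        for r in records
    )
    # nlargest keeps only k items in memory while scanning all records.
    return heapq.nlargest(k, scored, key=lambda r: r["score"])

records = [
    {"id": "a", "content": "east", "vector": [1.0, 0.0]},
    {"id": "b", "content": "north", "vector": [0.0, 1.0]},
    {"id": "c", "content": "north-east", "vector": [1.0, 1.0]},
]
for hit in search(records, [1.0, 0.0], 2):
    print(hit["id"], round(hit["score"], 3))
```

The cost grows linearly with the number of stored embeddings, which is why this approach suits small stores.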
match Embeddings.SQLite.search store query-vec 10
  | Ok results ->
    results |> each fn(r) => println "${r.score}: ${r.content}"
  | Err e -> println "Search failed"
search-threshold
Search with a minimum score threshold. Only returns results with score >= threshold.
EmbeddingStore -> List Float -> Float -> Result (List Record) String
search-filter
Search with metadata filtering. Only searches embeddings where filter function returns true.
EmbeddingStore -> List Float -> Int -> (Json -> Bool) -> Result (List Record) String
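The filter argument is a predicate over each record's metadata, applied before ranking so that rejected records are never scored. A minimal Python sketch of that ordering follows. It assumes metadata is stored as a JSON string (as in the upsert examples) and uses a dot-product score for brevity; the record fields and function names are hypothetical.

```python
import json

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_filter(records, query, k, keep):
    """Drop records whose parsed metadata fails keep(), then rank the rest."""
    candidates = [r for r in records if keep(json.loads(r["metadata"]))]
    ranked = sorted(candidates, key=lambda r: dot(query, r["vector"]), reverse=True)
    return ranked[:k]

records = [
    {"id": "doc1", "vector": [0.9, 0.1], "metadata": '{"source": "api"}'},
    {"id": "doc2", "vector": [1.0, 0.0], "metadata": '{"source": "web"}'},
    {"id": "doc3", "vector": [0.2, 0.8], "metadata": '{"source": "api"}'},
]
hits = search_filter(records, [1.0, 0.0], 5, lambda meta: meta.get("source") == "api")
print([r["id"] for r in hits])  # ['doc1', 'doc3']
```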
upsert-batch
Insert multiple embeddings at once. More efficient than individual inserts.
EmbeddingStore -> List Record -> Result Int String
get-all
Get all embeddings (for small stores). Warning: May be slow for large stores.
EmbeddingStore -> List Record
clear
Clear all embeddings from the store.
EmbeddingStore -> Result Unit String