pg-parser

PostgreSQL SQL parser - parse, validate, normalize, and fingerprint SQL queries using libpg_query

Files

FileDescription
.editorconfigEditor formatting configuration
.gitignoreGit ignore rules for build artifacts and dependencies
.tool-versionsasdf tool versions (Zig, Kit)
LICENSEMIT license file
README.mdThis file
examples/demo.kitExample: demo
kit.tomlPackage manifest with metadata and dependencies
src/pg-parser.kitKit PostgreSQL Parser - Parse, validate, normalize, and fingerprint SQL queries
tests/error-types.test.kitTests for error-types
tests/parser.test.kitTests for parser
zig/MakefileBuild automation rules
zig/kit_ffi.zigZig FFI module for kit ffi
zig/pg_query_shim.cC shim wrapping libpg_query
zig/pg_query_shim.hC header for libpg_query shim

Dependencies

No Kit package dependencies.

Installation

kit add gitlab.com/kit-lang/packages/kit-pg-parser.git

Usage

import Kit.PgParser as Sql

main = fn() =>
  # Parse SQL to JSON AST
  match Sql.parse "SELECT id, name FROM users WHERE active = true"
    | Ok ast -> println "Parse OK: ${String.slice ast 0 100}..."
    | Err e -> println "Parse error: ${SqlParseError.show e}"

  # Check if SQL is valid
  valid = Sql.valid? "SELECT * FROM users"
  println "Valid: ${Bool.show valid}"

  # Normalize SQL (replace literals with $1, $2, etc.)
  match Sql.normalize "SELECT * FROM users WHERE id = 42 AND name = 'Alice'"
    | Ok normalized -> println "Normalized: ${normalized}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Fingerprint for query grouping
  match Sql.fingerprint "SELECT * FROM users WHERE id = 1"
    | Ok fp -> println "Fingerprint: ${fp}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Extract tables referenced in a query
  match Sql.tables "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
    | Ok tables -> println "Tables: ${List.show tables}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Extract column references
  match Sql.columns "SELECT id, name FROM users WHERE active = true"
    | Ok cols -> println "Columns: ${List.show cols}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Detect statement type
  match Sql.statement-type "CREATE TABLE users (id INT)"
    | Ok stmt-type -> println "Type: ${stmt-type}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Split multi-statement SQL
  match Sql.split "SELECT 1; SELECT 2; SELECT 3"
    | Ok stmts -> println "Statements: ${Int.to-string (List.length stmts)}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Deparse (normalize SQL formatting)
  match Sql.deparse "select ID,NAME from USERS where ID=1"
    | Ok deparsed -> println "Deparsed: ${deparsed}"
    | Err e -> println "Error: ${SqlParseError.show e}"

  # Cleanup allocated memory
  Sql.cleanup()

main

API Reference

Core Functions

FunctionSignatureDescription
parseString -> Result String SqlParseErrorParse SQL to JSON AST
valid?String -> BoolCheck if SQL is syntactically valid
normalizeString -> Result String SqlParseErrorReplace literals with $1, $2, etc.
fingerprintString -> Result String SqlParseErrorGenerate a hash for query grouping
deparseString -> Result String SqlParseErrorConvert SQL to canonical format
pg-version() -> StringGet the PostgreSQL parser version
cleanup() -> UnitFree internal parser memory
parse-with-positionString -> Result String {message: String, position: Int}Parse with detailed error position

Multi-Statement Functions

FunctionSignatureDescription
splitString -> Result (List String) SqlParseErrorSplit multi-statement SQL
statement-countString -> Result Int SqlParseErrorCount statements in SQL
statement-typeString -> Result String SqlParseErrorDetect statement type (SELECT, INSERT, etc.)
statement-typesString -> Result (List String) SqlParseErrorDetect types of multiple statements

Extraction Functions

FunctionSignatureDescription
tablesString -> Result (List String) SqlParseErrorExtract table names
columnsString -> Result (List String) SqlParseErrorExtract column references
functionsString -> Result (List String) SqlParseErrorExtract function calls
ctesString -> Result (List String) SqlParseErrorExtract CTE names
joinsString -> Result (List String) SqlParseErrorExtract join types
schemasString -> Result (List String) SqlParseErrorExtract schema names
table-aliasesString -> Result (List String) SqlParseErrorExtract table aliases (alias:table format)
column-aliasesString -> Result (List String) SqlParseErrorExtract column aliases
paramsString -> Result (List Int) SqlParseErrorExtract positional parameter numbers

Detection Functions

FunctionSignatureDescription
utility?String -> Result Bool SqlParseErrorCheck if statement is DDL/utility
classify-statementsString -> Result (List Bool) SqlParseErrorClassify multiple statements as utility/DML
has-cte?String -> Result Bool SqlParseErrorCheck if SQL contains CTEs
has-subquery?String -> Result Bool SqlParseErrorCheck if SQL contains subqueries
has-join?String -> Result Bool SqlParseErrorCheck if SQL contains joins
has-alias?String -> Result Bool SqlParseErrorCheck if SQL contains aliases
has-schema?String -> Result Bool SqlParseErrorCheck if SQL has schema-qualified identifiers
has-params?String -> Result Bool SqlParseErrorCheck if SQL contains $N parameters

Count Functions

FunctionSignatureDescription
join-countString -> Result Int SqlParseErrorCount total joins
subquery-countString -> Result Int SqlParseErrorCount subqueries
param-countString -> Result Int SqlParseErrorCount unique parameters
param-maxString -> Result Int SqlParseErrorGet maximum parameter number

Complexity Analysis

FunctionSignatureDescription
complexityString -> Result QueryComplexity SqlParseErrorDetailed complexity breakdown
complexity-scoreString -> Result Int SqlParseErrorWeighted complexity score

Error Types

type SqlParseError =
  | SqlSyntaxError {message: String, position: Int}
  | SqlInternalError {message: String}

SqlParseError implements Show and Error traits.

Development

Running Examples

Run examples with the interpreter:

kit run examples/demo.kit

Compile examples to a native binary:

kit build examples/demo.kit && ./demo

Running Tests

Run the test suite:

kit test

Run the test suite with coverage:

kit test --coverage

Running kit dev

Run the standard development workflow (format, check, test):

kit dev

This will:

  1. Format and check source files in src/
  2. Run tests in tests/ with coverage

Generating Documentation

Generate API documentation from doc comments:

kit doc

Note: Kit sources with doc comments (##) will generate HTML documents in docs/*.html

Cleaning Build Artifacts

Remove generated files, caches, and build artifacts:

kit task clean

Note: Defined in kit.toml.

Local Installation

To install this package locally for development:

kit install

This installs the package to ~/.kit/packages/@kit/pg-parser/, making it available for import as Kit.PgParser in other projects.

License

This package is released under the MIT License - see LICENSE for details.

libpg_query is released under the BSD 3-Clause License.

Exported Functions & Types

SqlParseError

SQL parse error type with position information.

Variants

SqlSyntaxError {message, position}
SQL has a syntax error at a specific position
SqlInternalError {message}
Internal parser error (rare)

parse

Parse SQL and return the AST as a JSON string.

Parameters:

Returns:

The JSON AST follows PostgreSQL's internal parse tree structure. See: https://github.com/pganalyze/libpg_query for documentation.

NonEmptyString -> Result String SqlParseError

match PgParser.parse "SELECT id, name FROM users WHERE active = true"
  | Ok ast -> print ast
  | Err e -> print "Error: ${SqlParseError.message e}"

valid?

Check if SQL is syntactically valid.

Parameters:

Returns:

NonEmptyString -> Bool

if PgParser.valid? "SELECT * FROM users" then
  print "Valid SQL"
else
  print "Invalid SQL"

normalize

Normalize SQL by replacing literal values with positional parameters.

This is useful for: - Comparing queries that differ only in literal values - Preparing queries for caching/analysis - Redacting sensitive data from SQL logs

Parameters:

Returns:

NonEmptyString -> Result String SqlParseError

PgParser.normalize "SELECT * FROM users WHERE id = 42 AND name = 'Alice'"
# => Ok "SELECT * FROM users WHERE id = $1 AND name = $2"

fingerprint

Generate a fingerprint hash for query grouping.

Fingerprints allow grouping similar queries that differ only in literal values. This is useful for: - Query analysis and statistics - Identifying slow query patterns - Deduplicating query logs

Parameters:

Returns:

NonEmptyString -> Result String SqlParseError

PgParser.fingerprint "SELECT * FROM users WHERE id = 1"
# => Ok "a0ead580058af585"
PgParser.fingerprint "SELECT * FROM users WHERE id = 999"
# => Ok "a0ead580058af585"  # Same fingerprint!

pg-version

Get the PostgreSQL version that the parser was built with.

Returns:

String

print "Parser uses PostgreSQL ${PgParser.pg-version}"

parse-with-position

Try to parse SQL and return detailed error information.

Unlike parse, this returns both the error message and position in a structured format, useful for editor integrations.

Parameters:

Returns:

NonEmptyString -> Result String {message: String, position: Int}

match PgParser.parse-with-position "SELEC * FROM users"
  | Ok ast -> print "Valid"
  | Err e -> print "Error at ${Int.to-string e.position}: ${e.message}"

cleanup

Clean up internal memory used by the parser.

Call this when you're done using the parser to free allocated memory. This is optional but recommended for long-running applications.

Unit

split

Split SQL containing multiple statements into individual statements.

This handles complex cases like: - Comments containing semicolons - Strings containing semicolons - CREATE RULE statements with embedded queries

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.split "SELECT 1; SELECT 2; SELECT 3"
# => Ok ["SELECT 1", "SELECT 2", "SELECT 3"]

PgParser.split "SELECT 'a;b'; SELECT 1"
# => Ok ["SELECT 'a;b'", "SELECT 1"]  -- semicolon in string handled

statement-count

Count the number of statements in SQL.

Parameters:

Returns:

NonEmptyString -> Result Int SqlParseError

PgParser.statement-count "SELECT 1; SELECT 2"
# => Ok 2

utility?

Check if a single SQL statement is a utility statement.

Utility statements are DDL and administrative commands like: - CREATE, ALTER, DROP - GRANT, REVOKE - VACUUM, ANALYZE, REINDEX - SET, SHOW, RESET - COPY, TRUNCATE - LISTEN, NOTIFY, UNLISTEN

Non-utility (DML) statements: - SELECT, INSERT, UPDATE, DELETE, MERGE

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.utility? "CREATE TABLE users (id INT)"
# => Ok true

PgParser.utility? "SELECT * FROM users"
# => Ok false

classify-statements

Classify multiple statements as utility or DML.

Returns a list of booleans, one per statement, indicating whether each statement is a utility statement.

Parameters:

Returns:

NonEmptyString -> Result (List Bool) SqlParseError

PgParser.classify-statements "CREATE TABLE t (id INT); SELECT * FROM t"
# => Ok [true, false]

tables

Extract all table names referenced in SQL.

Returns unique table names from the query, including: - Tables in FROM clauses - Tables in JOIN clauses - Tables in subqueries - Tables in INSERT/UPDATE/DELETE targets

Schema-qualified names are returned as "schema.table".

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.tables "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["users", "orders"]

PgParser.tables "SELECT * FROM public.users"
# => Ok ["public.users"]

columns

Extract all column references from SQL.

Returns unique column names from the query, including: - Columns in SELECT list - Columns in WHERE clause - Columns in ORDER BY, GROUP BY, HAVING - Columns in JOIN conditions

Qualified column names are returned as "table.column" or "schema.table.column". Star (*) columns are excluded.

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.columns "SELECT id, name FROM users WHERE active = true"
# => Ok ["id", "name", "active"]

PgParser.columns "SELECT u.id, u.name FROM users u"
# => Ok ["u.id", "u.name"]

statement-type

Detect the type of a single SQL statement.

Returns a human-readable statement type like: - "SELECT", "INSERT", "UPDATE", "DELETE", "MERGE" (DML) - "CREATE TABLE", "CREATE INDEX", "CREATE VIEW" (DDL) - "ALTER TABLE", "DROP", "TRUNCATE" (DDL) - "GRANT", "REVOKE", "CREATE ROLE" (DCL) - "BEGIN", "COMMIT", "ROLLBACK" (TCL) - "EXPLAIN", "VACUUM", "ANALYZE" (Utility)

Parameters:

Returns:

NonEmptyString -> Result String SqlParseError

PgParser.statement-type "SELECT * FROM users"
# => Ok "SELECT"

PgParser.statement-type "CREATE TABLE users (id INT)"
# => Ok "CREATE TABLE"

statement-types

Detect the types of multiple SQL statements.

Returns a list of statement types, one per statement.

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.statement-types "SELECT 1; INSERT INTO t VALUES (1); DROP TABLE t"
# => Ok ["SELECT", "INSERT", "DROP"]

deparse

Convert SQL to its canonical form by parsing and deparsing.

This round-trips the SQL through PostgreSQL's parser: 1. Parse the SQL to an internal AST (protobuf format) 2. Deparse the AST back to SQL

The resulting SQL is in PostgreSQL's canonical format: - Consistent formatting and whitespace - Explicit parentheses where needed - Normalized identifier casing

Use cases: - Normalizing SQL formatting - Validating SQL can be round-tripped - Comparing SQL queries for semantic equality - Pretty-printing SQL

Parameters:

Returns:

NonEmptyString -> Result String SqlParseError

PgParser.deparse "select ID,NAME from USERS where ID=1"
# => Ok "SELECT id, name FROM users WHERE (id = 1)"

functions

Extract all function calls from SQL.

Returns unique function names from the query, including: - Aggregate functions (COUNT, SUM, AVG, MIN, MAX) - Scalar functions (COALESCE, NULLIF, GREATEST, LEAST) - Date/time functions (NOW, CURRENT_TIMESTAMP, DATE_TRUNC) - String functions (UPPER, LOWER, CONCAT, SUBSTRING) - User-defined functions

Schema-qualified function names are returned as "schema.function_name".

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.functions "SELECT COUNT(*), MAX(price) FROM orders WHERE created_at > NOW()"
# => Ok ["count", "max", "now"]

PgParser.functions "SELECT public.my_func(id) FROM users"
# => Ok ["public.my_func"]

ctes

Extract all CTE names from SQL.

CTEs (Common Table Expressions) are defined with the WITH clause. This function extracts the names of all CTEs in a query.

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.ctes "WITH active AS (SELECT * FROM users WHERE active) SELECT * FROM active"
# => Ok ["active"]

PgParser.ctes "WITH a AS (SELECT 1), b AS (SELECT 2) SELECT * FROM a, b"
# => Ok ["a", "b"]

has-cte?

Check if SQL contains any CTEs (WITH clause).

A convenience function that returns a boolean instead of the full list.

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.has-cte? "WITH x AS (SELECT 1) SELECT * FROM x"
# => Ok true

PgParser.has-cte? "SELECT * FROM users"
# => Ok false

subquery-count

Count the number of subqueries in SQL.

Subqueries are nested SELECT statements that appear in: - WHERE clause (IN, EXISTS, scalar subqueries) - FROM clause (derived tables) - SELECT list (scalar subqueries)

Parameters:

Returns:

NonEmptyString -> Result Int SqlParseError

PgParser.subquery-count "SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"
# => Ok 1

PgParser.subquery-count "SELECT * FROM (SELECT * FROM a) x WHERE x.id IN (SELECT id FROM b)"
# => Ok 2

has-subquery?

Check if SQL contains any subqueries.

A convenience function that returns a boolean instead of the count.

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.has-subquery? "SELECT * FROM users WHERE id IN (SELECT 1)"
# => Ok true

PgParser.has-subquery? "SELECT * FROM users"
# => Ok false

joins

Extract join types from SQL.

Returns a list of join types found in the query: - "INNER" - INNER JOIN or just JOIN - "LEFT" - LEFT JOIN or LEFT OUTER JOIN - "RIGHT" - RIGHT JOIN or RIGHT OUTER JOIN - "FULL" - FULL JOIN or FULL OUTER JOIN - "CROSS" - CROSS JOIN

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.joins "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["INNER"]

PgParser.joins "SELECT * FROM a LEFT JOIN b ON a.id = b.a_id RIGHT JOIN c ON b.id = c.b_id"
# => Ok ["RIGHT", "LEFT"]

join-count

Count the number of joins in SQL.

Parameters:

Returns:

NonEmptyString -> Result Int SqlParseError

PgParser.join-count "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id"
# => Ok 2

has-join?

Check if SQL contains any joins.

A convenience function that returns a boolean instead of the full list.

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.has-join? "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok true

PgParser.has-join? "SELECT * FROM users"
# => Ok false

table-aliases

Extract table aliases from SQL.

Table aliases are defined with the AS keyword or implicitly: - FROM users AS u or FROM users u - JOIN orders AS o or JOIN orders o

Returns aliases in the format "alias:table" (e.g., "u:users").

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.table-aliases "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["u:users", "o:orders"]

column-aliases

Extract column aliases from SQL.

Column aliases are defined with the AS keyword: - SELECT id AS user_id - SELECT name AS full_name

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.column-aliases "SELECT id AS user_id, name AS full_name FROM users"
# => Ok ["user_id", "full_name"]

has-alias?

Check if SQL contains any aliases.

A convenience function that returns a boolean indicating whether the query uses any table or column aliases.

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.has-alias? "SELECT * FROM users u"
# => Ok true

PgParser.has-alias? "SELECT * FROM users"
# => Ok false

schemas

Extract schema names from SQL.

Schemas are found in schema-qualified identifiers: - public.users → "public" - sales.orders → "sales"

Returns unique schema names.

Parameters:

Returns:

NonEmptyString -> Result (List String) SqlParseError

PgParser.schemas "SELECT * FROM public.users JOIN sales.orders ON ..."
# => Ok ["public", "sales"]

has-schema?

Check if SQL contains any schema-qualified identifiers.

A convenience function that returns a boolean indicating whether the query uses any schema-qualified table or function names.

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.has-schema? "SELECT * FROM public.users"
# => Ok true

PgParser.has-schema? "SELECT * FROM users"
# => Ok false

params

Extract all positional parameter numbers from SQL.

PostgreSQL uses $1, $2, $3, etc. for positional parameters. This function extracts all unique parameter numbers from a query, sorted in ascending order.

Parameters:

Returns:

NonEmptyString -> Result (List Int) SqlParseError

PgParser.params "SELECT * FROM users WHERE id = $1 AND status = $2"
# => Ok [1, 2]

PgParser.params "SELECT * FROM users WHERE id = $3 AND name = $1"
# => Ok [1, 3]  -- sorted

param-count

Count the number of unique positional parameters in SQL.

Parameters:

Returns:

NonEmptyString -> Result Int SqlParseError

PgParser.param-count "SELECT * FROM users WHERE id = $1 AND status = $2"
# => Ok 2

param-max

Get the maximum parameter number in SQL.

This is useful for knowing how many parameters need to be provided. For example, if a query uses $1 AND $3 (skipping $2), this returns 3.

Parameters:

Returns:

NonEmptyString -> Result Int SqlParseError

PgParser.param-max "SELECT * FROM users WHERE id = $1 AND name = $5"
# => Ok 5  -- max is 5, even though $2, $3, $4 are missing

has-params?

Check if SQL contains any positional parameters.

A convenience function that returns a boolean indicating whether the query uses $N style positional parameters.

Parameters:

Returns:

NonEmptyString -> Result Bool SqlParseError

PgParser.has-params? "SELECT * FROM users WHERE id = $1"
# => Ok true

PgParser.has-params? "SELECT * FROM users WHERE id = 1"
# => Ok false

QueryComplexity

Query complexity breakdown with individual metrics and total score.

Fields: - tables: Int - Number of tables referenced - joins: Int - Number of JOIN operations - subqueries: Int - Number of subqueries - ctes: Int - Number of CTEs (WITH clauses) - functions: Int - Number of function calls - score: Int - Weighted complexity score

Weights used for score calculation: - tables: 1 (base unit) - joins: 2 (multiplies row combinations) - subqueries: 3 (nested execution, most expensive) - ctes: 2 (similar to subqueries but often optimized) - functions: 1 (adds computation)

Variants

QueryComplexity {tables, joins, subqueries, ctes, functions, score}

complexity

Calculate query complexity with detailed breakdown.

Returns a QueryComplexity record containing individual metrics and a weighted total score. Higher scores indicate more complex queries.

Parameters:

Returns:

NonEmptyString -> Result QueryComplexity SqlParseError

PgParser.complexity "SELECT * FROM users"
# => Ok (QueryComplexity {tables: 1, joins: 0, subqueries: 0, ctes: 0, functions: 0, score: 1})

PgParser.complexity "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id"
# => Ok (QueryComplexity {tables: 3, joins: 2, subqueries: 0, ctes: 0, functions: 0, score: 7})

PgParser.complexity "SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"
# => Ok (QueryComplexity {tables: 2, joins: 0, subqueries: 1, ctes: 0, functions: 0, score: 5})

complexity-score

Calculate query complexity score (simple version).

Returns just the weighted complexity score as an integer. Use complexity for a detailed breakdown of individual metrics.

Parameters:

Returns:

NonEmptyString -> Result Int SqlParseError

PgParser.complexity-score "SELECT * FROM users"
# => Ok 1

PgParser.complexity-score "SELECT * FROM a JOIN b JOIN c"
# => Ok 7