pg-parser

PostgreSQL SQL parser - parse, validate, normalize, and fingerprint SQL queries using libpg_query

Files

FileDescription
.editorconfigEditor formatting configuration
.gitignoreGit ignore rules for build artifacts and dependencies
.tool-versionsasdf tool versions (Zig, Kit)
LICENSEMIT license file
README.mdThis file
c/pg_query_shim.cC FFI wrapper around libpg_query
c/pg_query_shim.hC header for the FFI wrapper
examples/demo.kitEnd-to-end parser feature demo
kit.tomlPackage manifest with native build metadata
src/pg-parser.kitKit API for parsing, analysis, and error handling
tests/error-types.test.kitTests for SQL error and complexity types
tests/parser.test.kitParser smoke tests using the native shim

Dependencies

No Kit package dependencies.

Native dependency:

brew install libpg_query

The package builds a local shim library from c/pg_query_shim.c and links it with libpg_query.

  • kit-postgres - PostgreSQL client bindings for Kit using libpq

Installation

kit add gitlab.com/kit-lang/packages/kit-pg-parser.git

Usage

import Kit.PgParser as SQL

main = fn =>
  # Parse SQL to PostgreSQL's JSON AST
  match SQL.parse "SELECT id, name FROM users WHERE active = true"
    | Ok ast -> println "Parse OK: ${String.slice ast 0 100}..."
    | Err e -> println "Parse error: ${SQLParseError.show e}"

  # Validate syntax
  valid? = SQL.valid? "SELECT * FROM users"
  println "Valid: ${Bool.show valid?}"

  # Normalize literals for query grouping
  match SQL.normalize "SELECT * FROM users WHERE id = 42 AND name = 'Alice'"
    | Ok normalized -> println "Normalized: ${normalized}"
    | Err e -> println "Normalize error: ${SQLParseError.show e}"

  # Fingerprint structurally similar queries
  match SQL.fingerprint "SELECT * FROM users WHERE id = 1"
    | Ok fp -> println "Fingerprint: ${fp}"
    | Err e -> println "Fingerprint error: ${SQLParseError.show e}"

  # Extract schema information
  match SQL.tables "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
    | Ok tables -> println "Tables: ${List.show tables}"
    | Err e -> println "Tables error: ${SQLParseError.show e}"

  # Get detailed syntax-error position information
  match SQL.parse-with-position "SELEC * FROM users"
    | Ok _ -> println "Unexpected success"
    | Err e -> println "Syntax error at position ${Int.to-string e.position}: ${e.message}"

  SQL.cleanup

main

API Summary

Core

FunctionDescription
parseParse SQL into a PostgreSQL JSON AST
valid?Return whether SQL parses successfully
normalizeReplace literals with $1, $2, etc.
fingerprintGenerate a stable query fingerprint
deparseConvert SQL to canonical PostgreSQL formatting
parse-with-positionParse with {message, position} error details
pg-versionReturn the PostgreSQL parser version
cleanupFree internal parser memory

Analysis

FunctionDescription
split / statement-countSplit and count multi-statement SQL
statement-type / statement-typesDetect statement types such as SELECT, INSERT, and CREATE TABLE
tables, columns, schemasExtract referenced identifiers
functions, ctes, joinsExtract query structure details
table-aliases, column-aliasesExtract aliases
params, param-count, param-maxExtract positional $N parameters
utility?, classify-statementsClassify utility/DDL statements
has-cte?, has-subquery?, has-join?, has-alias?, has-schema?, has-params?Boolean structure checks
subquery-count, join-countCount structural elements
complexity, complexity-scoreCompute query complexity metrics

Error Types

type SQLParseError =
  | SQLSyntaxError {message: String, position: Int}
  | SQLInternalError {message: String}

SQLParseError implements Show and Error.

Most public SQL functions take NonEmptyString and return Result ... SQLParseError when parsing can fail.

Development

Running Examples

Run examples with the interpreter:

kit run examples/demo.kit

Compile examples to a native binary:

kit build examples/demo.kit && ./demo

Running Tests

Run the test suite:

kit test

Run the test suite with coverage:

kit test --coverage

Running kit dev

Run the standard development workflow (format, check, test):

kit dev

This will:

  1. Format and check source files in src/
  2. Run tests in tests/ with coverage

Running Parity

Check interpreter/compiler parity for examples:

kit parity --no-spinner --failures-only

Generating Documentation

Generate API documentation from doc comments:

kit doc

Note: Kit sources with doc comments (##) will generate HTML documents in docs/*.html

Cleaning Build Artifacts

Remove generated files, caches, and build artifacts:

kit task clean

Note: Defined in kit.toml.

Local Installation

To install this package locally for development:

kit install

This installs the package to ~/.kit/packages/@kit/pg-parser/, making it available for import as Kit.PgParser in other projects.

License

This package is released under the MIT License - see LICENSE for details.

libpg_query is released under the BSD 3-Clause License.

Exported Functions & Types

SQLParseError

SQL parse error type with position information.

Variants

SQLSyntaxError {message, position}
SQL has a syntax error at a specific position
SQLInternalError {message}
Internal parser error (rare)

parse

Parse SQL and return the AST as a JSON string.

Parameters:

Returns:

The JSON AST follows PostgreSQL's internal parse tree structure. See: https://github.com/pganalyze/libpg_query for documentation.

NonEmptyString -> Result String SQLParseError

match PgParser.parse "SELECT id, name FROM users WHERE active = true"
  | Ok ast -> print ast
  | Err e -> print "Error: ${SQLParseError.message e}"

valid?

Check if SQL is syntactically valid.

Parameters:

Returns:

NonEmptyString -> Bool

if PgParser.valid? "SELECT * FROM users" then
  print "Valid SQL"
else
  print "Invalid SQL"

normalize

Normalize SQL by replacing literal values with positional parameters.

This is useful for: - Comparing queries that differ only in literal values - Preparing queries for caching/analysis - Redacting sensitive data from SQL logs

Parameters:

Returns:

NonEmptyString -> Result String SQLParseError

PgParser.normalize "SELECT * FROM users WHERE id = 42 AND name = 'Alice'"
# => Ok "SELECT * FROM users WHERE id = $1 AND name = $2"

fingerprint

Generate a fingerprint hash for query grouping.

Fingerprints allow grouping similar queries that differ only in literal values. This is useful for: - Query analysis and statistics - Identifying slow query patterns - Deduplicating query logs

Parameters:

Returns:

NonEmptyString -> Result String SQLParseError

PgParser.fingerprint "SELECT * FROM users WHERE id = 1"
# => Ok "a0ead580058af585"
PgParser.fingerprint "SELECT * FROM users WHERE id = 999"
# => Ok "a0ead580058af585"  # Same fingerprint!

pg-version

Get the PostgreSQL version that the parser was built with.

Returns:

String

print "Parser uses PostgreSQL ${PgParser.pg-version}"

parse-with-position

Try to parse SQL and return detailed error information.

Unlike parse, this returns both the error message and position in a structured format, useful for editor integrations.

Parameters:

Returns:

NonEmptyString -> Result String {message: String, position: Int}

match PgParser.parse-with-position "SELEC * FROM users"
  | Ok ast -> print "Valid"
  | Err e -> print "Error at ${Int.to-string e.position}: ${e.message}"

cleanup

Clean up internal memory used by the parser.

Call this when you're done using the parser to free allocated memory. This is optional but recommended for long-running applications.

Unit

split

Split SQL containing multiple statements into individual statements.

This handles complex cases like: - Comments containing semicolons - Strings containing semicolons - CREATE RULE statements with embedded queries

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.split "SELECT 1; SELECT 2; SELECT 3"
# => Ok ["SELECT 1", "SELECT 2", "SELECT 3"]

PgParser.split "SELECT 'a;b'; SELECT 1"
# => Ok ["SELECT 'a;b'", "SELECT 1"]  -- semicolon in string handled

statement-count

Count the number of statements in SQL.

Parameters:

Returns:

NonEmptyString -> Result Int SQLParseError

PgParser.statement-count "SELECT 1; SELECT 2"
# => Ok 2

utility?

Check if a single SQL statement is a utility statement.

Utility statements are DDL and administrative commands like: - CREATE, ALTER, DROP - GRANT, REVOKE - VACUUM, ANALYZE, REINDEX - SET, SHOW, RESET - COPY, TRUNCATE - LISTEN, NOTIFY, UNLISTEN

Non-utility (DML) statements: - SELECT, INSERT, UPDATE, DELETE, MERGE

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.utility? "CREATE TABLE users (id INT)"
# => Ok true

PgParser.utility? "SELECT * FROM users"
# => Ok false

classify-statements

Classify multiple statements as utility or DML.

Returns a list of booleans, one per statement, indicating whether each statement is a utility statement.

Parameters:

Returns:

NonEmptyString -> Result (List Bool) SQLParseError

PgParser.classify-statements "CREATE TABLE t (id INT); SELECT * FROM t"
# => Ok [true, false]

tables

Extract all table names referenced in SQL.

Returns unique table names from the query, including: - Tables in FROM clauses - Tables in JOIN clauses - Tables in subqueries - Tables in INSERT/UPDATE/DELETE targets

Schema-qualified names are returned as "schema.table".

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.tables "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["users", "orders"]

PgParser.tables "SELECT * FROM public.users"
# => Ok ["public.users"]

columns

Extract all column references from SQL.

Returns unique column names from the query, including: - Columns in SELECT list - Columns in WHERE clause - Columns in ORDER BY, GROUP BY, HAVING - Columns in JOIN conditions

Qualified column names are returned as "table.column" or "schema.table.column". Star (*) columns are excluded.

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.columns "SELECT id, name FROM users WHERE active = true"
# => Ok ["id", "name", "active"]

PgParser.columns "SELECT u.id, u.name FROM users u"
# => Ok ["u.id", "u.name"]

statement-type

Detect the type of a single SQL statement.

Returns a human-readable statement type like: - "SELECT", "INSERT", "UPDATE", "DELETE", "MERGE" (DML) - "CREATE TABLE", "CREATE INDEX", "CREATE VIEW" (DDL) - "ALTER TABLE", "DROP", "TRUNCATE" (DDL) - "GRANT", "REVOKE", "CREATE ROLE" (DCL) - "BEGIN", "COMMIT", "ROLLBACK" (TCL) - "EXPLAIN", "VACUUM", "ANALYZE" (Utility)

Parameters:

Returns:

NonEmptyString -> Result String SQLParseError

PgParser.statement-type "SELECT * FROM users"
# => Ok "SELECT"

PgParser.statement-type "CREATE TABLE users (id INT)"
# => Ok "CREATE TABLE"

statement-types

Detect the types of multiple SQL statements.

Returns a list of statement types, one per statement.

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.statement-types "SELECT 1; INSERT INTO t VALUES (1); DROP TABLE t"
# => Ok ["SELECT", "INSERT", "DROP"]

deparse

Convert SQL to its canonical form by parsing and deparsing.

This round-trips the SQL through PostgreSQL's parser: 1. Parse the SQL to an internal AST (protobuf format) 2. Deparse the AST back to SQL

The resulting SQL is in PostgreSQL's canonical format: - Consistent formatting and whitespace - Explicit parentheses where needed - Normalized identifier casing

Use cases: - Normalizing SQL formatting - Validating SQL can be round-tripped - Comparing SQL queries for semantic equality - Pretty-printing SQL

Parameters:

Returns:

NonEmptyString -> Result String SQLParseError

PgParser.deparse "select ID,NAME from USERS where ID=1"
# => Ok "SELECT id, name FROM users WHERE (id = 1)"

functions

Extract all function calls from SQL.

Returns unique function names from the query, including: - Aggregate functions (COUNT, SUM, AVG, MIN, MAX) - Scalar functions (COALESCE, NULLIF, GREATEST, LEAST) - Date/time functions (NOW, CURRENT_TIMESTAMP, DATE_TRUNC) - String functions (UPPER, LOWER, CONCAT, SUBSTRING) - User-defined functions

Schema-qualified function names are returned as "schema.function_name".

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.functions "SELECT COUNT(*), MAX(price) FROM orders WHERE created_at > NOW()"
# => Ok ["count", "max", "now"]

PgParser.functions "SELECT public.my_func(id) FROM users"
# => Ok ["public.my_func"]

ctes

Extract all CTE names from SQL.

CTEs (Common Table Expressions) are defined with the WITH clause. This function extracts the names of all CTEs in a query.

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.ctes "WITH active AS (SELECT * FROM users WHERE active) SELECT * FROM active"
# => Ok ["active"]

PgParser.ctes "WITH a AS (SELECT 1), b AS (SELECT 2) SELECT * FROM a, b"
# => Ok ["a", "b"]

has-cte?

Check if SQL contains any CTEs (WITH clause).

A convenience function that returns a boolean instead of the full list.

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.has-cte? "WITH x AS (SELECT 1) SELECT * FROM x"
# => Ok true

PgParser.has-cte? "SELECT * FROM users"
# => Ok false

subquery-count

Count the number of subqueries in SQL.

Subqueries are nested SELECT statements that appear in: - WHERE clause (IN, EXISTS, scalar subqueries) - FROM clause (derived tables) - SELECT list (scalar subqueries)

Parameters:

Returns:

NonEmptyString -> Result Int SQLParseError

PgParser.subquery-count "SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"
# => Ok 1

PgParser.subquery-count "SELECT * FROM (SELECT * FROM a) x WHERE x.id IN (SELECT id FROM b)"
# => Ok 2

has-subquery?

Check if SQL contains any subqueries.

A convenience function that returns a boolean instead of the count.

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.has-subquery? "SELECT * FROM users WHERE id IN (SELECT 1)"
# => Ok true

PgParser.has-subquery? "SELECT * FROM users"
# => Ok false

joins

Extract join types from SQL.

Returns a list of join types found in the query: - "INNER" - INNER JOIN or just JOIN - "LEFT" - LEFT JOIN or LEFT OUTER JOIN - "RIGHT" - RIGHT JOIN or RIGHT OUTER JOIN - "FULL" - FULL JOIN or FULL OUTER JOIN - "CROSS" - CROSS JOIN

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.joins "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["INNER"]

PgParser.joins "SELECT * FROM a LEFT JOIN b ON a.id = b.a_id RIGHT JOIN c ON b.id = c.b_id"
# => Ok ["RIGHT", "LEFT"]

join-count

Count the number of joins in SQL.

Parameters:

Returns:

NonEmptyString -> Result Int SQLParseError

PgParser.join-count "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id"
# => Ok 2

has-join?

Check if SQL contains any joins.

A convenience function that returns a boolean instead of the full list.

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.has-join? "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok true

PgParser.has-join? "SELECT * FROM users"
# => Ok false

table-aliases

Extract table aliases from SQL.

Table aliases are defined with the AS keyword or implicitly: - FROM users AS u or FROM users u - JOIN orders AS o or JOIN orders o

Returns aliases in the format "alias:table" (e.g., "u:users").

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.table-aliases "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["u:users", "o:orders"]

column-aliases

Extract column aliases from SQL.

Column aliases are defined with the AS keyword: - SELECT id AS user_id - SELECT name AS full_name

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.column-aliases "SELECT id AS user_id, name AS full_name FROM users"
# => Ok ["user_id", "full_name"]

has-alias?

Check if SQL contains any aliases.

A convenience function that returns a boolean indicating whether the query uses any table or column aliases.

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.has-alias? "SELECT * FROM users u"
# => Ok true

PgParser.has-alias? "SELECT * FROM users"
# => Ok false

schemas

Extract schema names from SQL.

Schemas are found in schema-qualified identifiers: - public.users → "public" - sales.orders → "sales"

Returns unique schema names.

Parameters:

Returns:

NonEmptyString -> Result (List String) SQLParseError

PgParser.schemas "SELECT * FROM public.users JOIN sales.orders ON ..."
# => Ok ["public", "sales"]

has-schema?

Check if SQL contains any schema-qualified identifiers.

A convenience function that returns a boolean indicating whether the query uses any schema-qualified table or function names.

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.has-schema? "SELECT * FROM public.users"
# => Ok true

PgParser.has-schema? "SELECT * FROM users"
# => Ok false

params

Extract all positional parameter numbers from SQL.

PostgreSQL uses $1, $2, $3, etc. for positional parameters. This function extracts all unique parameter numbers from a query, sorted in ascending order.

Parameters:

Returns:

NonEmptyString -> Result (List Int) SQLParseError

PgParser.params "SELECT * FROM users WHERE id = $1 AND status = $2"
# => Ok [1, 2]

PgParser.params "SELECT * FROM users WHERE id = $3 AND name = $1"
# => Ok [1, 3]  -- sorted

param-count

Count the number of unique positional parameters in SQL.

Parameters:

Returns:

NonEmptyString -> Result Int SQLParseError

PgParser.param-count "SELECT * FROM users WHERE id = $1 AND status = $2"
# => Ok 2

param-max

Get the maximum parameter number in SQL.

This is useful for knowing how many parameters need to be provided. For example, if a query uses $1 AND $3 (skipping $2), this returns 3.

Parameters:

Returns:

NonEmptyString -> Result Int SQLParseError

PgParser.param-max "SELECT * FROM users WHERE id = $1 AND name = $5"
# => Ok 5  -- max is 5, even though $2, $3, $4 are missing

has-params?

Check if SQL contains any positional parameters.

A convenience function that returns a boolean indicating whether the query uses $N style positional parameters.

Parameters:

Returns:

NonEmptyString -> Result Bool SQLParseError

PgParser.has-params? "SELECT * FROM users WHERE id = $1"
# => Ok true

PgParser.has-params? "SELECT * FROM users WHERE id = 1"
# => Ok false

QueryComplexity

Query complexity breakdown with individual metrics and total score.

Fields: - tables: Int - Number of tables referenced - joins: Int - Number of JOIN operations - subqueries: Int - Number of subqueries - ctes: Int - Number of CTEs (WITH clauses) - functions: Int - Number of function calls - score: Int - Weighted complexity score

Weights used for score calculation: - tables: 1 (base unit) - joins: 2 (multiplies row combinations) - subqueries: 3 (nested execution, most expensive) - ctes: 2 (similar to subqueries but often optimized) - functions: 1 (adds computation)

Variants

QueryComplexity {tables, joins, subqueries, ctes, functions, score}

complexity

Calculate query complexity with detailed breakdown.

Returns a QueryComplexity record containing individual metrics and a weighted total score. Higher scores indicate more complex queries.

Parameters:

Returns:

NonEmptyString -> Result QueryComplexity SQLParseError

PgParser.complexity "SELECT * FROM users"
# => Ok (QueryComplexity {tables: 1, joins: 0, subqueries: 0, ctes: 0, functions: 0, score: 1})

PgParser.complexity "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id"
# => Ok (QueryComplexity {tables: 3, joins: 2, subqueries: 0, ctes: 0, functions: 0, score: 7})

PgParser.complexity "SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"
# => Ok (QueryComplexity {tables: 2, joins: 0, subqueries: 1, ctes: 0, functions: 0, score: 5})

complexity-score

Calculate query complexity score (simple version).

Returns just the weighted complexity score as an integer. Use complexity for a detailed breakdown of individual metrics.

Parameters:

Returns:

NonEmptyString -> Result Int SQLParseError

PgParser.complexity-score "SELECT * FROM users"
# => Ok 1

PgParser.complexity-score "SELECT * FROM a JOIN b JOIN c"
# => Ok 7