pg-parser

PostgreSQL SQL parser - parse, validate, normalize, and fingerprint SQL queries using libpg_query

Files

FileDescription
kit.tomlPackage manifest with metadata and dependencies
src/main.kitSQL parsing, validation, and analysis API
tests/test-parser.kitSQL validation for SELECT, INSERT, UPDATE, DELETE
examples/demo.kitFull API demo: parse, normalize, fingerprint, analyze
LICENSEMIT license file

Dependencies

No Kit package dependencies.

Installation

kit add gitlab.com/kit-lang/packages/kit-pg-parser.git

Usage

import Kit.PgParser

License

MIT License - see LICENSE for details.

Exported Functions & Types

SqlParseError

SQL parse error type with position information.

Variants

SqlSyntaxError {message, position}
SQL has a syntax error at a specific position
SqlInternalError {message}
Internal parser error (rare)

parse

Parse SQL and return the AST as a JSON string.

Parameters:

Returns:

The JSON AST follows PostgreSQL's internal parse tree structure. See: https://github.com/pganalyze/libpg_query for documentation.

String -> Result String SqlParseError

match PgParser.parse "SELECT id, name FROM users WHERE active = true"
  | Ok ast -> print ast
  | Err e -> print "Error: ${SqlParseError.message e}"

valid?

Check if SQL is syntactically valid.

Parameters:

Returns:

String -> Bool

if PgParser.valid? "SELECT * FROM users" then
  print "Valid SQL"
else
  print "Invalid SQL"

normalize

Normalize SQL by replacing literal values with positional parameters.

This is useful for: - Comparing queries that differ only in literal values - Preparing queries for caching/analysis - Redacting sensitive data from SQL logs

Parameters:

Returns:

String -> Result String SqlParseError

PgParser.normalize "SELECT * FROM users WHERE id = 42 AND name = 'Alice'"
# => Ok "SELECT * FROM users WHERE id = $1 AND name = $2"

fingerprint

Generate a fingerprint hash for query grouping.

Fingerprints allow grouping similar queries that differ only in literal values. This is useful for: - Query analysis and statistics - Identifying slow query patterns - Deduplicating query logs

Parameters:

Returns:

String -> Result String SqlParseError

PgParser.fingerprint "SELECT * FROM users WHERE id = 1"
# => Ok "a0ead580058af585"
PgParser.fingerprint "SELECT * FROM users WHERE id = 999"
# => Ok "a0ead580058af585"  # Same fingerprint!

pg-version

Get the PostgreSQL version that the parser was built with.

Returns:

String

print "Parser uses PostgreSQL ${PgParser.pg-version}"

parse-with-position

Try to parse SQL and return detailed error information.

Unlike parse, this returns both the error message and position in a structured format, useful for editor integrations.

Parameters:

Returns:

String -> Result String {message: String, position: Int}

match PgParser.parse-with-position "SELEC * FROM users"
  | Ok ast -> print "Valid"
  | Err e -> print "Error at ${Int.to-string e.position}: ${e.message}"

cleanup

Clean up internal memory used by the parser.

Call this when you're done using the parser to free allocated memory. This is optional but recommended for long-running applications.

Unit

split

Split SQL containing multiple statements into individual statements.

This handles complex cases like: - Comments containing semicolons - Strings containing semicolons - CREATE RULE statements with embedded queries

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.split "SELECT 1; SELECT 2; SELECT 3"
# => Ok ["SELECT 1", "SELECT 2", "SELECT 3"]

PgParser.split "SELECT 'a;b'; SELECT 1"
# => Ok ["SELECT 'a;b'", "SELECT 1"]  -- semicolon in string handled

statement-count

Count the number of statements in SQL.

Parameters:

Returns:

String -> Result Int SqlParseError

PgParser.statement-count "SELECT 1; SELECT 2"
# => Ok 2

utility?

Check if a single SQL statement is a utility statement.

Utility statements are DDL and administrative commands like: - CREATE, ALTER, DROP - GRANT, REVOKE - VACUUM, ANALYZE, REINDEX - SET, SHOW, RESET - COPY, TRUNCATE - LISTEN, NOTIFY, UNLISTEN

Non-utility (DML) statements: - SELECT, INSERT, UPDATE, DELETE, MERGE

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.utility? "CREATE TABLE users (id INT)"
# => Ok true

PgParser.utility? "SELECT * FROM users"
# => Ok false

classify-statements

Classify multiple statements as utility or DML.

Returns a list of booleans, one per statement, indicating whether each statement is a utility statement.

Parameters:

Returns:

String -> Result (List Bool) SqlParseError

PgParser.classify-statements "CREATE TABLE t (id INT); SELECT * FROM t"
# => Ok [true, false]

tables

Extract all table names referenced in SQL.

Returns unique table names from the query, including: - Tables in FROM clauses - Tables in JOIN clauses - Tables in subqueries - Tables in INSERT/UPDATE/DELETE targets

Schema-qualified names are returned as "schema.table".

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.tables "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["users", "orders"]

PgParser.tables "SELECT * FROM public.users"
# => Ok ["public.users"]

columns

Extract all column references from SQL.

Returns unique column names from the query, including: - Columns in SELECT list - Columns in WHERE clause - Columns in ORDER BY, GROUP BY, HAVING - Columns in JOIN conditions

Qualified column names are returned as "table.column" or "schema.table.column". Star (*) columns are excluded.

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.columns "SELECT id, name FROM users WHERE active = true"
# => Ok ["id", "name", "active"]

PgParser.columns "SELECT u.id, u.name FROM users u"
# => Ok ["u.id", "u.name"]

statement-type

Detect the type of a single SQL statement.

Returns a human-readable statement type like: - "SELECT", "INSERT", "UPDATE", "DELETE", "MERGE" (DML) - "CREATE TABLE", "CREATE INDEX", "CREATE VIEW" (DDL) - "ALTER TABLE", "DROP", "TRUNCATE" (DDL) - "GRANT", "REVOKE", "CREATE ROLE" (DCL) - "BEGIN", "COMMIT", "ROLLBACK" (TCL) - "EXPLAIN", "VACUUM", "ANALYZE" (Utility)

Parameters:

Returns:

String -> Result String SqlParseError

PgParser.statement-type "SELECT * FROM users"
# => Ok "SELECT"

PgParser.statement-type "CREATE TABLE users (id INT)"
# => Ok "CREATE TABLE"

statement-types

Detect the types of multiple SQL statements.

Returns a list of statement types, one per statement.

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.statement-types "SELECT 1; INSERT INTO t VALUES (1); DROP TABLE t"
# => Ok ["SELECT", "INSERT", "DROP"]

deparse

Convert SQL to its canonical form by parsing and deparsing.

This round-trips the SQL through PostgreSQL's parser: 1. Parse the SQL to an internal AST (protobuf format) 2. Deparse the AST back to SQL

The resulting SQL is in PostgreSQL's canonical format: - Consistent formatting and whitespace - Explicit parentheses where needed - Normalized identifier casing

Use cases: - Normalizing SQL formatting - Validating SQL can be round-tripped - Comparing SQL queries for semantic equality - Pretty-printing SQL

Parameters:

Returns:

String -> Result String SqlParseError

PgParser.deparse "select ID,NAME from USERS where ID=1"
# => Ok "SELECT id, name FROM users WHERE (id = 1)"

functions

Extract all function calls from SQL.

Returns unique function names from the query, including: - Aggregate functions (COUNT, SUM, AVG, MIN, MAX) - Scalar functions (COALESCE, NULLIF, GREATEST, LEAST) - Date/time functions (NOW, CURRENT_TIMESTAMP, DATE_TRUNC) - String functions (UPPER, LOWER, CONCAT, SUBSTRING) - User-defined functions

Schema-qualified function names are returned as "schema.function_name".

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.functions "SELECT COUNT(*), MAX(price) FROM orders WHERE created_at > NOW()"
# => Ok ["count", "max", "now"]

PgParser.functions "SELECT public.my_func(id) FROM users"
# => Ok ["public.my_func"]

ctes

Extract all CTE names from SQL.

CTEs (Common Table Expressions) are defined with the WITH clause. This function extracts the names of all CTEs in a query.

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.ctes "WITH active AS (SELECT * FROM users WHERE active) SELECT * FROM active"
# => Ok ["active"]

PgParser.ctes "WITH a AS (SELECT 1), b AS (SELECT 2) SELECT * FROM a, b"
# => Ok ["a", "b"]

has-cte?

Check if SQL contains any CTEs (WITH clause).

A convenience function that returns a boolean instead of the full list.

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.has-cte? "WITH x AS (SELECT 1) SELECT * FROM x"
# => Ok true

PgParser.has-cte? "SELECT * FROM users"
# => Ok false

subquery-count

Count the number of subqueries in SQL.

Subqueries are nested SELECT statements that appear in: - WHERE clause (IN, EXISTS, scalar subqueries) - FROM clause (derived tables) - SELECT list (scalar subqueries)

Parameters:

Returns:

String -> Result Int SqlParseError

PgParser.subquery-count "SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"
# => Ok 1

PgParser.subquery-count "SELECT * FROM (SELECT * FROM a) x WHERE x.id IN (SELECT id FROM b)"
# => Ok 2

has-subquery?

Check if SQL contains any subqueries.

A convenience function that returns a boolean instead of the count.

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.has-subquery? "SELECT * FROM users WHERE id IN (SELECT 1)"
# => Ok true

PgParser.has-subquery? "SELECT * FROM users"
# => Ok false

joins

Extract join types from SQL.

Returns a list of join types found in the query: - "INNER" - INNER JOIN or just JOIN - "LEFT" - LEFT JOIN or LEFT OUTER JOIN - "RIGHT" - RIGHT JOIN or RIGHT OUTER JOIN - "FULL" - FULL JOIN or FULL OUTER JOIN - "CROSS" - CROSS JOIN

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.joins "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["INNER"]

PgParser.joins "SELECT * FROM a LEFT JOIN b ON a.id = b.a_id RIGHT JOIN c ON b.id = c.b_id"
# => Ok ["RIGHT", "LEFT"]

join-count

Count the number of joins in SQL.

Parameters:

Returns:

String -> Result Int SqlParseError

PgParser.join-count "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id"
# => Ok 2

has-join?

Check if SQL contains any joins.

A convenience function that returns a boolean instead of the full list.

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.has-join? "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok true

PgParser.has-join? "SELECT * FROM users"
# => Ok false

table-aliases

Extract table aliases from SQL.

Table aliases are defined with the AS keyword or implicitly: - FROM users AS u or FROM users u - JOIN orders AS o or JOIN orders o

Returns aliases in the format "alias:table" (e.g., "u:users").

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.table-aliases "SELECT * FROM users u JOIN orders o ON u.id = o.user_id"
# => Ok ["u:users", "o:orders"]

column-aliases

Extract column aliases from SQL.

Column aliases are defined with the AS keyword: - SELECT id AS user_id - SELECT name AS full_name

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.column-aliases "SELECT id AS user_id, name AS full_name FROM users"
# => Ok ["user_id", "full_name"]

has-alias?

Check if SQL contains any aliases.

A convenience function that returns a boolean indicating whether the query uses any table or column aliases.

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.has-alias? "SELECT * FROM users u"
# => Ok true

PgParser.has-alias? "SELECT * FROM users"
# => Ok false

schemas

Extract schema names from SQL.

Schemas are found in schema-qualified identifiers: - public.users → "public" - sales.orders → "sales"

Returns unique schema names.

Parameters:

Returns:

String -> Result (List String) SqlParseError

PgParser.schemas "SELECT * FROM public.users JOIN sales.orders ON ..."
# => Ok ["public", "sales"]

has-schema?

Check if SQL contains any schema-qualified identifiers.

A convenience function that returns a boolean indicating whether the query uses any schema-qualified table or function names.

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.has-schema? "SELECT * FROM public.users"
# => Ok true

PgParser.has-schema? "SELECT * FROM users"
# => Ok false

params

Extract all positional parameter numbers from SQL.

PostgreSQL uses $1, $2, $3, etc. for positional parameters. This function extracts all unique parameter numbers from a query, sorted in ascending order.

Parameters:

Returns:

String -> Result (List Int) SqlParseError

PgParser.params "SELECT * FROM users WHERE id = $1 AND status = $2"
# => Ok [1, 2]

PgParser.params "SELECT * FROM users WHERE id = $3 AND name = $1"
# => Ok [1, 3]  -- sorted

param-count

Count the number of unique positional parameters in SQL.

Parameters:

Returns:

String -> Result Int SqlParseError

PgParser.param-count "SELECT * FROM users WHERE id = $1 AND status = $2"
# => Ok 2

param-max

Get the maximum parameter number in SQL.

This is useful for knowing how many parameters need to be provided. For example, if a query uses $1 AND $3 (skipping $2), this returns 3.

Parameters:

Returns:

String -> Result Int SqlParseError

PgParser.param-max "SELECT * FROM users WHERE id = $1 AND name = $5"
# => Ok 5  -- max is 5, even though $2, $3, $4 are missing

has-params?

Check if SQL contains any positional parameters.

A convenience function that returns a boolean indicating whether the query uses $N style positional parameters.

Parameters:

Returns:

String -> Result Bool SqlParseError

PgParser.has-params? "SELECT * FROM users WHERE id = $1"
# => Ok true

PgParser.has-params? "SELECT * FROM users WHERE id = 1"
# => Ok false

QueryComplexity

Query complexity breakdown with individual metrics and total score.

Fields: - tables: Int - Number of tables referenced - joins: Int - Number of JOIN operations - subqueries: Int - Number of subqueries - ctes: Int - Number of CTEs (WITH clauses) - functions: Int - Number of function calls - score: Int - Weighted complexity score

Weights used for score calculation: - tables: 1 (base unit) - joins: 2 (multiplies row combinations) - subqueries: 3 (nested execution, most expensive) - ctes: 2 (similar to subqueries but often optimized) - functions: 1 (adds computation)

Variants

QueryComplexity {tables, joins, subqueries, ctes, functions, score}

complexity

Calculate query complexity with detailed breakdown.

Returns a QueryComplexity record containing individual metrics and a weighted total score. Higher scores indicate more complex queries.

Parameters:

Returns:

String -> Result QueryComplexity SqlParseError

PgParser.complexity "SELECT * FROM users"
# => Ok (QueryComplexity {tables: 1, joins: 0, subqueries: 0, ctes: 0, functions: 0, score: 1})

PgParser.complexity "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id"
# => Ok (QueryComplexity {tables: 3, joins: 2, subqueries: 0, ctes: 0, functions: 0, score: 7})

PgParser.complexity "SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)"
# => Ok (QueryComplexity {tables: 2, joins: 0, subqueries: 1, ctes: 0, functions: 0, score: 5})

complexity-score

Calculate query complexity score (simple version).

Returns just the weighted complexity score as an integer. Use complexity for a detailed breakdown of individual metrics.

Parameters:

Returns:

String -> Result Int SqlParseError

PgParser.complexity-score "SELECT * FROM users"
# => Ok 1

PgParser.complexity-score "SELECT * FROM a JOIN b JOIN c"
# => Ok 7