SQLdep joins Collibra to build the future of Data Lineage Learn more

SQL Parser

About SQLdep and SQL parsing

SQLdep is a very powerful SQL Parser with a unique proposal—we guarantee to fully support your SQL syntax.

The SQL standard has been here for decades and yet reliable SQL parsers and analysers are almost non-existent. Why? Well, developing a functional parser/analyser is a tough nut to crack as SQL syntax varies in many ways.

We mitigate the risk of unsupported syntax by a guarantee to implement any syntax enhancement required. To educate our tech-savvy users a quick introduction to SQL parsing follows.

Reverse engineering SQL code

Two main steps are needed to reverse engineer SQL code:

  1. syntactic analysis which breaks input SQL statement into tokens according to SQL grammar
  2. semantic analysis which assigns meanings to tokens such as "this token is a column belonging to this table"

Step 1: Syntactic analysis explained

Syntactic analysis actually consists from two sub-steps, that is a lexical analysis and parsing itself.

Lexical analysis

Consider the following simple SQL query:


Lexer's job is to take the input data and create tokens from it. Input is read character by character and when empty space is encountered the token is created.

1st token SELECT
2nd token ACC_ID
3rd token FROM
4th token ACCOUNTS

Lexer enriches every token with additional information like its position in the input data. It also verifies whether token is a keyword or constant (literal or number).

Parsing phase

Parser's job is to provide more detailed information about each token. In order to parse SQL syntax a SQL grammar needs to be embedded into the parser. Each database vendor defines its own grammar and thus the nuances in SQL syntax exists across different database dialects.

SQL parser processes the input and the result is stored in a tree structure. To be precise, the structure is called Abstract Syntax Tree which is afterwards crawled by analyser in a standalone step.

Step by step Syntactic analysis on a example


Consider following grammar while only 2 commands are allowed.

Lexer prepares tokens

1st token DROP
2nd token TABLE
3rd token IF
4th token EXISTS
5th token CUSTOMERS

Parser (top-down)

  1. takes token DROP and identifies that both commands are eligible
  2. takes token TABLE and thus discards DROP TRIGGER and selects DROP TABLE
  3. takes token IF and mark it as keyword only
  4. takes token EXISTS and mark it as keyword only
  5. takes token CUSTOMERS and mark it is the table identifier

In other words, every token in SQL query was assigned its basic meaning. For example the first token DROP helped us to identify type of the statement. Each token from the SQL query gets processed and is stored as a tree node.

Additional properties are set for each node such as the position, its type like an identifier or a keyword, etc. The syntactic analysis is finished once all tokens are processed and the Abstract Syntax Tree is constructed.

Step 2: Semantic analysis explained

Analyser is responsible for crawling the Abstract Syntax Tree and thus uncovering relations between the nodes.

Let's consider the following example:


To illustrate the analyser's job let's assume it is currently processing a tree node which represents the column ACC_ID. Semantic rules are applied in the following manner:

  1. it start seeking whether table alias exists for column ACC_ID
  2. alias A is discovered and needs to be resolved. Thus the analyser jumps to another part of tree containing a FROM clause
  3. it locates alias A within the FROM clause and discovers the dependency to table ACCOUNTS
  4. another look up is done to resolve table owner (schema) which is DWH

With the complex queries containing nested sub-selects, analytical functions and especially when procedural code is used the complexity grows rapidly. Not mentioning the need to capture differences between different database dialects and its versions.

At SQLdep we are committed to provide you with the most advanced and reliable SQL parser regardless the complexity and the caveats of reverse engineering.

Do you have a question?

Our tool has already analysed millions of lines of code for customers all around the globe. We will be happy to answer any of your questions or share our expertise with you.