SQL code parsing

15 August, 2016 10:13:15 AM · Martin

A slight technical insight about how we reverse engineer the SQL code. It is to give you an overview of what is behind SQL code parsing and analysis. Those two parts are often called syntactic and semantic analysis.


Syntactic analysis

First you need the input for the analysis. In our case this means SQL standard representing DML or DDL queries. To successfully process it you need to have a parser. Parser basically breaks down your SQL code into tokens (“words”) and assigns a basic meaning to it.

The parser needs to understand the SQL language. This is done by implementing a set of rules which represents the SQL grammar. Although there is ANSII SQL, almost every database deviates from this standard a bit. This means you have to have as many grammars for as many databases you wish to support.

Result of syntactic analysis is a static structure called Abstract Syntax Tree. And it is what you traverse in the next part.


Semantic analysis

From an engineering standpoint this is where all the fun begins. Here you interpret the SQL code itself. Are you interested to know what columns are used in join conditions? Semantic analysis gives you power to analyse every SQL query and find all the metadata hidden in it.

And the beauty is you can analyse a single query in milliseconds. Suddenly you have the option to process thousands of queries at once and obtain a complete data lineage in your database. Very useful for example for undocumented data warehouses and their ETL.

One particular use case we like is highlighting all relevant lines in SQL query. It greatly speeds ups understanding of what query does. And you are less prone to make mistakes when coding your SQL.

Feel free to parse your own SQL query in our SQL parser -- https://sqldep.com

Read a case study

SQLdep saved GE Money twelve months of work. GE Money was searching for ways to make the development and administration of its 20TB data warehouse faster and more efficient.