Data lineage based on SQL

27 April, 2016 14:03:19 PM · Martin

Data lineage is essential for understanding data flows in your BI ecosystem (ETL jobs, SQL based reporting, ). A proper data lineage helps your developers to quickly identify impact of the change in SQL while keeping the audit trail behind. Finally BI end users are bound to have an access to data lineage as well so they can successfully leverage the value of data itself.

Many Data Warehousing teams have ongoing problems to properly maintain the data lineage without overburdening the developers and slowing down the development cycle. Especially when the ETL jobs are based on hand coded SQL or stored procedures. Attempts to maintain the lineage on a column level are failing in 99% of the cases and the lineage gets broken and ends up in unusable state. This is due to the lack of proper tooling and automation.

Major vendors like Oracle and Informatica are trying to tackle the data lineage problem with providing ETL frameworks. Two issues however prevail: (a) you still have to invest time to model the data lineage before the implementation, (b) to speed things up developers often resort to hand coded Views or SQL thus breaking the data lineage anyhow.

At SQLdep we solve the pain with SQL based data lineage in a very unique way. We fully automate the process of documenting and maintaining the data lineage. It takes 5x more time to document an ETL job manually then to develop it. Through reverse engineering of SQL code we generate the data lineage and provide neatly looking and interactive documentation.

Proper data lineage is a key step in controlling data governance in your BI ecosystem.

Read a case study

SQLdep saved GE Money twelve months of work. GE Money was searching for ways to make the development and administration of its 20TB data warehouse faster and more efficient.