The BI team in Makro faced a challenge with keeping track of the data lineage on their Teradata DWH. Just on the Czech market the company's data grew to 1.5TB in size and the data spread over 10,000 tables and views. It proved more and more difficult to perform impact analysis over multiple data layers, i.e. from the staging layer up to the reporting layer. The senior developer had to manually trace the lineage across the ETL job.
The ETL itself is based on hand-coded SQL transformations which are scattered in materialised tables, views and Teradata macros. The BI team set up the following requirements:
- data lineage has to be tracked across all layers, i.e. from staging to reporting
- analysis needs to be fully automated for any valid Teradata SQL query (including macros)
- unused columns or tables have to be detected to help with DWH cleansing
- the complete data lineage analysis has to be computed in less than 15 minutes
To evaluate the mission we have asked Jakub Chlupis, the BI team lead in Makro, for a quick interview:
How long have you been using SQLdep?
Close to 6 months now.
What was your motivation to use a service for documenting your ETL/DWH?
To decrease the number of incidents and untie the hands of our developers. With increasing Data Warehouse complexity it became unbearable how much time our developers had to spend on manual SQL code analysis.
The other key expectation was to considerably decrease the number of incidents stemming from changing a higher level database view. It was really easy to overlook nested data lineage.
Can you share what impact you have noticed on the DWH incidents?
We have noticed about a 20% decrease in incidents.
One of your requests was very specific. The complete data lineage analysis has to finish in 15 minutes. Can you share why?
This is a request from our dev team to support work-in-progress phases. Fast analysis of our ETL jobs gives us flexibility to run checks during the development process.
What was your evaluation process before deploying SQLdep?
Before committing we asked for a Proof of Concept and verified the service on samples of our ETL / SQL code.
How difficult was the integration of SQLdep to your IT environment?
It took us about 4 hours to automate the uploads and we haven’t touched anything since. The service runs in the cloud and all we need to do now is to open a browser window and see the lineage ready for us.