It’s early in March 2014, and I’m in Prague, Czech Republic - one of the global capitals for delicious beer, beautiful architecture, and a growing technology community. I’m speaking with Martin “Masi” Masarik, and Miroslav “Madhouse” Semora about SQL development, data analysis, and their views on the SQL space, including their reasoning for devoting the last year of their lives to building SQLdep, a visualization and analysis tool for SQL developers and business intelligence analysts.
My name is David Booth, and I’m an advisor/mentor for startup companies. Previously, I was the CEO of ZeroTurnaround (which builds JRebel and LiveRebel for Java teams), and a marketing guy at Jetbrains (known for many developer tools). I'm really into the developer tools space, and I'm excited to start asking questions, so here we go!
How much experience did you have with SQL before building SQLdep?
Masi - My work background is tied to SQL coding very closely. I worked for GE Money Bank for 6 years on a Data-Warehousing team. During that period I worked as Data Analyst, DB developer and IT project manager which gave me a great opportunity to fully explore Business Intelligence from both an IT perspective (how to build & maintain a data warehouse) and as well the business side - experiencing the value that data brings to the company.
Madhouse - I’ve been a software developer for 20 years in C, Java, Perl, Groovy/Grails and Scala. My first commercial software, which I wrote when i was in college, was an SQL query that computed a stock portfolio value from actual rates. Since then, nearly every project I worked on, stored data in relational databases - from desktop database applications to core telco infrastructure.
What do you consider as a challenge when working with SQL?
Using SQL is pretty straight forward, but many DBAs would agree that a lot of complex queries are written pretty poorly. I mean in terms of performance and optimization. Usage of explain plan, knowledge of different join algorithms like hash join and nested loop joins should be more common. Although query optimizers are doing a better job than a few years back it still takes strong skills to perform well on very large datasets.
What inspired you to build SQLdep?
Masi - Laziness mostly. As a former data analyst I was tired of manually analyzing SQL over and over and over again. Although there are a bunch of tools to help you create SQL, surprisingly enough, there are close to none when you need to “reverse-engineer” your code. Or when you inherit somebody else’s code. I wanted to make my life easier.
Madhouse - Masi asked me to :)
Was there a “Eureka” moment when you realized that this is what you had to do?
Masi - We came up with the idea in winter time, so unlike Archimedes, we opted not to go streaking. Summer time might have been a different story.
On a serious note though, until that moment we just didn’t get why there is no mainstream tool like SQLdep. Such a tool should have existed a few years back. Feedback from the SQL community (and paying customers) pretty much proves that. They keep our focus on solving real-life problems, especially automating impact analysis, documenting data-lineage, etc.
In under 140 characters, can you describe SQLdep?
Masi - Must-have tool for anybody who works daily with SQL. We visualize what’s happening inside your code and can save you 30 minutes every day.
Later we tested this, and with spaces, he actually hit 138 characters. Nice!
That sounds marketingy. What does SQLdep really do?
Masi - We do syntactic and semantic analysis of SQL code. The database does that to understand how to manipulate the data, but we do it in our tool to find and store dependencies of that SQL. The more complex a query (or series of queries) the more time you spend when making changes. Because we instantly provide data-lineage and impact analysis, we save you time. For example - let’s say you have a hand-coded ETL process.. with SQLdep, you can have up-to-date and reliable documentation every day. Or maybe you want to migrate that ETL into some other workflow tool -- with our metadata we can save you a lot of a headache. Our biggest customer has a batch of 5000 SQLs he runs everyday and the value we bring to the table is through the roof.
What is your biggest accomplishment working with SQL?
Masi - Not dropping database tablespace by accident ;)
Last Question - What are your thoughts on SQL vs NoSQL vs NewSQL?
Masi - Firstly, I’d suggest not to get too fancy with titles like SQL vs NoSQL vs NewSQL - the focus should be on getting the job done. Since it’s a complex topic, I’ll just scratch the surface.
Let’s break it down into relational databases vs other types (key-value store, document, graph, etc). Now, if you start considering the different scenarios in which you need to store and access data, relational databases would fit well in the vast majority of cases. The more specialized use-case you come up with, the bigger chance you have of utilizing a non-relational db.
For instance, we’ve considered Neo4j (graphDB) to store our metadata after we parse and analyze an SQL query. We wanted to store *any* dependency we come across. Then in real-time we would be able to answer crazy questions like (information overkill follows!): “What queries read columns from table customers, then use any of those columns in group by clause while count(*) is nested within a function?”. So yeah, I believe we could have done it in graphdb. In reality we’ve greatly simplified our logical model and ended up with.. wait for it -- SQLite. It’s an awesome, fast and tiny relational database (binary is about 500kB).