I also agree that relying on database schema is a show stopper. First counterpoi...

JoelJacobson · on Dec 31, 2021

> First counterpoint: As long as data type information isn't completely messed up, I can dump excel spreadsheets in a database, or dump database CSVs in a data lake, and start querying them right away using SQL with complex joins using auto-completion from the dataset alone.

In your example, all you have is data and no foreign keys; is that the show stopper? Does that mean you keep all the relationships in your head, and that's how you can write complex joins right away? Sure, if that's the case, you can't use foreign keys, since you don't have any. I don't see how this is a counterpoint, though. Nothing forces you to use JOIN FOREIGN; you could just do what you describe. But I'm sure you are aware that many databases have foreign keys for all relationships to enforce referential integrity. I should have mentioned in the proposal that the scope is limited to such databases.
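To make that scope concrete, here is a minimal sketch using Python's built-in `sqlite3` module with hypothetical `customers`/`orders` tables. It is not the proposed JOIN FOREIGN syntax, which SQLite does not implement. It only shows two things a declared foreign key gives you: the database rejects rows that would break referential integrity, and the join condition can be read back from the catalog instead of from someone's head. SQLite's `PRAGMA foreign_key_list` does not report constraint names, so the hypothetical helper `fk_join_condition` identifies the foreign key by its referenced table and refuses to guess when there is none or more than one.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);
INSERT INTO customers VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1);
""")

# Referential integrity: an order pointing at a missing customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True


def fk_join_condition(conn, child, parent):
    """Hypothetical helper: derive an ON clause from the FK declared child -> parent.

    Table names are interpolated into SQL, so use it only with trusted names.
    """
    # Each PRAGMA row is (id, seq, table, from, to, on_update, on_delete, match).
    rows = [r for r in conn.execute(f"PRAGMA foreign_key_list({child})")
            if r[2] == parent]
    if len({r[0] for r in rows}) != 1:
        # Zero or several FKs to the same parent table: refuse to guess.
        raise LookupError(f"need exactly one foreign key from {child} to {parent}")
    # Multi-column keys are joined column by column, in declaration order (seq).
    return " AND ".join(f"{child}.{r[3]} = {parent}.{r[4]}"
                        for r in sorted(rows, key=lambda row: row[1]))


cond = fk_join_condition(conn, "orders", "customers")
result = conn.execute(
    f"SELECT orders.order_id, customers.name FROM orders JOIN customers ON {cond}"
).fetchall()
print(cond)    # orders.customer_id = customers.customer_id
print(result)  # [(10, 'Alice')]
```

The refusal to guess when two foreign keys point at the same table is exactly where a name for the relationship becomes necessary.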

I enjoyed your example, though, and I want to share a similar one. At least a few times, I've had to deal with data shipped as multiple CSV files without any schema at all. What I tend to do then is quickly write a very loose data model, mostly text columns, so that any value is accepted. Once the CSVs are loaded, I clean up the data step by step, inspecting the tables and converting the text columns to proper data types.

Next, when the contents of two tables suggest that some column(s) in one table reference some column(s) in the other, I try to add a FOREIGN KEY with a suitable name between them. If that succeeds, we know there is referential integrity between the columns, and we also have a name describing the relationship. Win-win! If the foreign key cannot be created, I investigate which rows in the referencing table have no match in the referenced table, using a NOT EXISTS (...) query. If those extra rows can safely be deleted or corrected, e.g. when empty strings were not handled as NULL values, I then try to create the foreign key again.
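The workflow above can be sketched with Python's built-in `sqlite3` and `csv` modules. This is a minimal sketch under assumptions: the CSV contents, the table names (`raw_customers`, `raw_orders`, `customers`, `orders`), the constraint name `orders_customer_fk` and the helper functions are all hypothetical. In PostgreSQL the "add a FOREIGN KEY" step would be an `ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY` statement; SQLite cannot add a foreign key to an existing table, so the sketch instead declares the foreign key on a new, properly typed table and copies the loosely typed rows into it, which fails in the same way when a row references a missing parent. The cleanup applied here is converting empty strings to NULL.

```python
import csv
import io
import sqlite3

# Hypothetical CSV files shipped without any schema.
CUSTOMERS_CSV = "customer_id,name\n1,Alice\n2,Bob\n"
ORDERS_CSV = "order_id,customer_id,amount\n10,1,9.50\n11,2,20.00\n12,,5.25\n"


def load_csv_as_text(conn, table, text):
    """Load CSV text into a new table whose columns are all TEXT (a loose model)."""
    header, *data = csv.reader(io.StringIO(text))
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def orphans(conn):
    """Non-NULL references in raw_orders with no matching raw_customers row."""
    return conn.execute("""
        SELECT o.order_id, o.customer_id
        FROM raw_orders AS o
        WHERE o.customer_id IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM raw_customers AS c
                          WHERE c.customer_id = o.customer_id)
    """).fetchall()


def try_load_orders(conn):
    """Copy raw_orders into the typed table; the foreign key rejects orphan rows."""
    try:
        conn.execute("""
            INSERT INTO orders (order_id, customer_id, amount)
            SELECT CAST(order_id AS INTEGER), CAST(customer_id AS INTEGER),
                   CAST(amount AS REAL)
            FROM raw_orders""")
        return True
    except sqlite3.IntegrityError:
        return False  # SQLite reverts the whole failed statement


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")  # FKs are only enforced when enabled
load_csv_as_text(conn, "raw_customers", CUSTOMERS_CSV)
load_csv_as_text(conn, "raw_orders", ORDERS_CSV)

# Properly typed tables; the named foreign key stands in for ADD CONSTRAINT.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount REAL,
    CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
INSERT INTO customers
SELECT CAST(customer_id AS INTEGER), name FROM raw_customers;
""")

first_attempt = try_load_orders(conn)  # fails: '' is cast to 0, no such customer
first_orphans = orphans(conn)          # the NOT EXISTS check finds order 12
conn.execute("UPDATE raw_orders SET customer_id = NULL WHERE customer_id = ''")
second_attempt = try_load_orders(conn)  # succeeds after the cleanup
print(first_attempt, first_orphans, second_attempt, orphans(conn))
# prints: False [('12', '')] True []
```

The orphan query skips NULL references because a NULL in a single-column foreign key never violates the constraint; without the `IS NOT NULL` filter, the row fixed by the cleanup would still be reported. SQLite casts `''` to `0`, which is why the first attempt fails rather than silently storing an empty string.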