Start a new DB instance and restore all production data from the latest backup (you are doing regular backups, riiight?). This serves two purposes: it lets you debug against production data without touching actual production, and it verifies that your production backups/restores actually work in practice, not just in theory.
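Roughly something like this (untested sketch; the backup directory, hostnames and database names are made up, and it assumes your dumps were taken with `pg_dump -Fc`):

```python
# Minimal sketch: restore the newest custom-format dump into a fresh clone DB.
import glob
import os
import subprocess

BACKUP_DIR = "/var/backups/postgres"  # assumption: where your dumps land
CLONE_HOST = "db-clone.internal"      # assumption: the new throwaway instance
CLONE_DB = "prod_clone"

# Pick the most recent backup file.
latest = max(glob.glob(os.path.join(BACKUP_DIR, "*.dump")), key=os.path.getmtime)

# createdb / pg_restore are the standard PostgreSQL client tools.
subprocess.run(["createdb", "-h", CLONE_HOST, "-U", "postgres", CLONE_DB], check=True)
subprocess.run(
    ["pg_restore", "-h", CLONE_HOST, "-U", "postgres", "-d", CLONE_DB,
     "--no-owner", "--jobs", "4", latest],
    check=True,
)
print(f"Restored {latest} into {CLONE_DB}")
```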
If the database is too big, just use the last N rows from the big tables instead of a full dump.
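For the last-N-rows variant, something along these lines works (sketch only; the `events` table, the `id` ordering column and the connection strings are assumptions about your schema):

```python
# Minimal sketch: pull only the newest N rows of a big table into the clone.
import psycopg2
from psycopg2 import extras

N = 1_000_000
prod = psycopg2.connect("host=db-prod.internal dbname=prod user=readonly")
clone = psycopg2.connect("host=db-clone.internal dbname=prod_clone user=postgres")

with prod.cursor(name="tail_cursor") as src, clone.cursor() as dst:
    # Named (server-side) cursor streams rows instead of buffering all N in memory.
    src.execute("SELECT * FROM events ORDER BY id DESC LIMIT %s", (N,))
    while True:
        batch = src.fetchmany(10_000)
        if not batch:
            break
        extras.execute_values(dst, "INSERT INTO events VALUES %s", batch)

clone.commit()
prod.close()
clone.close()
```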
Bonus points if you log the actual SQL queries that run in production to a log file, so you can then re-run the last X queries against your new test-production database and record the total run time. After doing a bunch of optimizations, you can re-run your little benchmark and see how it compares.
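The replay part is basically this (sketch; assumes you've already extracted the captured statements into a one-SELECT-per-line file, e.g. from the Postgres log with `log_min_duration_statement = 0`, and the file path and connection string are made up):

```python
# Minimal sketch: replay captured queries against the clone and time the batch.
import time
import psycopg2

QUERY_LOG = "captured_queries.sql"  # hypothetical file, one SELECT per line

conn = psycopg2.connect("host=db-clone.internal dbname=prod_clone user=postgres")
conn.autocommit = True

with open(QUERY_LOG) as f:
    queries = [line.strip() for line in f if line.strip()]

start = time.perf_counter()
with conn.cursor() as cur:
    for q in queries:
        cur.execute(q)
        cur.fetchall()  # force the result set to be fully read
elapsed = time.perf_counter() - start

print(f"Replayed {len(queries)} queries in {elapsed:.2f}s")
```

Run it once before and once after your optimizations and compare the totals.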
Be careful with the last N rows approach: you need to understand how the query scales to do this effectively (i.e. your N rows have to be at a scale that's representative of what you're optimising). I assume that if you're at a small enough scale to be asking this question, restoring the whole backup is the best option.
What do you guys use for large PostgreSQL backups? pg_dump? Setting up replicas? Coming from MS SQL, backups seem to be a pain in the PostgreSQL world, especially on a live database that is receiving updates.
Yeah, if your database is under 1TB, it's relatively easy and practical to just clone the full thing. Above that, it starts becoming a hassle and you might need to strip away data you don't care about (and that won't affect the query plans), otherwise it can take a long time to back up and restore into the new instance.
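For the stripping part, `pg_dump --exclude-table-data` keeps the table definitions but skips their rows, e.g. something like this (sketch; the excluded table names and connection details are hypothetical):

```python
# Minimal sketch: dump everything except the data of a few huge tables that
# don't matter for the queries being optimised.
import subprocess

EXCLUDED_DATA = ["audit_log", "raw_events_archive"]  # assumption: huge, irrelevant tables

cmd = ["pg_dump", "-h", "db-prod.internal", "-U", "readonly", "-Fc",
       "-f", "prod_trimmed.dump", "prod"]
for table in EXCLUDED_DATA:
    # --exclude-table-data keeps the table's schema but omits its rows.
    cmd += ["--exclude-table-data", table]

subprocess.run(cmd, check=True)
```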