Hacker News

Start a new DB instance and restore all production data from latest backup (you are doing regular backups riiight?). Serves two purposes: enabling debugging of production data without actually using production + verifying that your production backups/restores actually work in practice, not just in theory.
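A minimal sketch of that loop with the stock Postgres tools, assuming plain pg_dump-style backups (the host and database names are placeholders, and the guard keeps it from touching anything real by accident):

```shell
# Sketch of the dump/restore round trip; prod-host, scratch-host and mydb
# are placeholders. Set REALLY_RUN=1 to execute against real servers.
if [ "${REALLY_RUN:-0}" = "1" ]; then
    # Custom-format dump (-Fc): compressed and restorable in parallel
    pg_dump -h prod-host -Fc -f /tmp/prod.dump mydb
    # Restoring into a fresh database is what proves the backup actually works
    createdb -h scratch-host mydb_scratch
    pg_restore -h scratch-host -d mydb_scratch --no-owner -j 4 /tmp/prod.dump
else
    echo "dry run only" > /tmp/restore_sketch.status
fi
```

The restore step is the important half: a pg_dump that has never been fed back through pg_restore is an untested backup.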

If the database is too big, just use the last N rows instead of a full dump.

Bonus points if you log the actual SQL queries that happen in production to a log file, so you can then re-run the last X queries against your new test-production database and log the total run time. After doing a bunch of optimizations, you can re-run your little benchmark and see how it compares.
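The replay part can be a one-liner if statements are in the Postgres log. A sketch, assuming `log_statement = 'all'` (or `log_min_duration_statement = 0`) and Postgres's default `LOG:  statement:` line format; the sample log lines here are made up:

```shell
# Fake sample of a Postgres log with statement logging enabled
cat > /tmp/pg.log <<'EOF'
2024-01-01 10:00:00 UTC [123] LOG:  statement: SELECT id FROM users WHERE email = 'a@example.com';
2024-01-01 10:00:01 UTC [123] LOG:  statement: SELECT count(*) FROM orders;
EOF

# Pull the raw SQL back out of the log
sed -n 's/.*LOG:  statement: //p' /tmp/pg.log > /tmp/replay.sql

# Replay against the restored clone and time it (needs a live clone):
# time psql -h scratch-host -d mydb_scratch -f /tmp/replay.sql > /dev/null
```

Run it before and after your optimizations and compare the two timings.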



This is how it's done.

Be careful with the last N rows approach: you have to understand how the query scales to do this effectively (i.e. your N rows are at an appropriate scale to optimise against). If you're at a small enough scale to be asking this question, restoring the whole backup is probably the best option.


What do you guys use for large PostgreSQL backups? pg_dump? Setting up replicas? Coming from MS SQL, backups seem to be a pain in the PostgreSQL world, especially on a live database that is receiving updates.


Our hosted Postgres instance (GCP) gets nightly backups automatically. Restoring to a new instance is a few button clicks in the UI.

This is infrastructure level stuff that someone else should run.


pgBackRest and/or wal-e/wal-g, btw. It's important to take WAL archive backups if you have a big database. https://www.postgresql.org/docs/14/continuous-archiving.html

So most of the time, either of those works well.

there is also:

- Barman

- pg_probackup

(P.S.: with WAL archiving you can even run a "delayed" downstream cluster.)
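For reference, the continuous-archiving setup those tools build on is just a couple of postgresql.conf settings; the stanza name and delay below are placeholder values:

```
# postgresql.conf on the primary
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'  # stanza name is a placeholder

# postgresql.conf on a standby, to get the "delayed" downstream cluster
recovery_min_apply_delay = '1h'
```

The delayed standby gives you a window to recover from mistakes (e.g. an accidental DELETE) before they are applied downstream.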


Not from Heap, but I use https://github.com/wal-g/wal-g


> If the database is too big, just use last N amount of rows from the database instead of a full dump.

For SQL optimization purposes, you can't do that: there's a high chance you'll get different query plans (for example, the planner may pick a sequential scan on a small sample where production would use an index).


Yeah, if your database is under 1TB, it's relatively easy and practical to just clone the full thing. Above that it starts becoming a hassle, and you may need to strip out some data you don't care about (and that won't affect the query plans), otherwise it can take a long time to back up/restore into a new instance.
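One way to do that stripping with pg_dump is --exclude-table-data, which keeps the schema (so queries against those tables still parse and plan) but skips the rows; only safe for tables your benchmark queries don't actually touch. A sketch that just writes the command to a file for review; host, database and table names are placeholders:

```shell
# Sketch: writes the command out so it can be reviewed before running.
# Host, database and table names are placeholders.
cat > /tmp/slim_dump.sh <<'EOF'
pg_dump -h prod-host -Fc \
  --exclude-table-data='audit_log' \
  --exclude-table-data='*_archive' \
  -f /tmp/prod_slim.dump mydb
EOF
# sh /tmp/slim_dump.sh   # once pointed at a real host
```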


But in the general case, such clones are not suitable for query optimization tasks.

And as I already mentioned in my other comments here, thin cloning is a great way to solve the speed (and money) problem for larger DBs.



