Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It does seem like it should be faster, but this isn't a vanilla export/import. He is going from unstructured to (I assume) structured data. So he needs to deduplicate and possibly do other operations on the data before importing


What about importing the unstructured data into one table in Postgres and querying that table to populate a structured schema on the same Postgres server? It shouldn't be as IO bound, as its copying data from Postgres to another table rather than from Mongo. Can anyone tell me if that would likely bring a performance improvement?


Probably, when dealing with IO bound problems it's important to think of IO as a really long pipe extremely fat pipe. Think oil pipeline.

The key is keeping the pipe full, requesting a piece of data based on what's in the pipe will mean that it's going to be hard to keep the pipe full. You want to request the data long before you're ever going to need it.

If your SQL queries are done in such a way that they don't depend on data in the pipe they'll work fast, if they aren't you'll be not much better off than whatever was currently in place.


That's usually part of the ETL pipeline. If you look at SSIS and spring integration for example it has the ability to do this in the pipeline.


You mean import the raw data into a json column and then SELECT DISTINCT INTO?

There are only 50 million rows... this sounds like a 5 minute query.


I was assuming he was moving the json into a structured foreign key-> data type of relationship


It was actually a combination of several queries in the first database followed by several inserts into postgresql.


Ratio of query time vs insert time?


i'm curious as well - even if the data was only somewhat structured on the mongo side, an import into a single column table with JSON should be fairly quick with \COPY, then with a couple of indexes applied on your disparate data in the JSON data you should have been able to query that into a new table structure quite easily.

just wondering what the tradeoffs were in OP's eyes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: