Are there obvious reasons why there is so much more activity in GB and RU than in other parts of Europe, other than what one might assume (Brexit, sanctions)?
I would love to read about changing patterns caused by real-life events. (Like the rumour that American water plants have problems during Super Bowl commercials because of toilet flushing, but on a larger scale.)
This is great. I am very curious about the architectural decisions you've taken here. Is there a blog post / article about them? 80 yrs of historical data -- are you storing that somewhere in PG and the APIs are just fetching it? If so, what indices have you set up to make APIs fetch faster etc. I just fetched 1960 to 2022 in about 12 secs.
Traditional database systems struggle to handle gridded data efficiently. Using PG with time-based indices is memory- and storage-intensive. It works well for a limited number of locations, but global weather models at 9-12 km resolution have 4 to 6 million grid cells.
I am exploiting the homogeneity of gridded data. In a 2D field, calculating the data position for a geographical coordinate is straightforward. Once you add time as a third dimension, you can pick any timestamp at any point on Earth. To optimize read speed, all time steps are stored sequentially on disk in a rotated/transposed OLAP cube.
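To make the layout concrete, here is a minimal sketch (my own illustration, not Open-Meteo's actual code; dimensions are hypothetical) of how a time-major layout turns a (location, time) lookup into a single flat offset:

```python
# Hypothetical dimensions: one million grid cells, one year of hourly steps.
N_LOCATIONS = 1_000_000
N_TIMESTEPS = 8760

def flat_offset(location_idx: int, time_idx: int) -> int:
    """Offset into a transposed cube where all time steps for one
    location sit contiguously on disk."""
    return location_idx * N_TIMESTEPS + time_idx

# Reading a full time series for one location is then a single
# sequential range of N_TIMESTEPS floats -- no index lookups needed.
start = flat_offset(42, 0)
end = flat_offset(42, N_TIMESTEPS - 1) + 1
```

Because the grid is regular, no per-value timestamp or coordinate needs to be stored; the position in the file encodes both.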
Although the data now consists of millions of floating-point values without accompanying attributes like timestamps or geographical coordinates, the storage requirements are still high. Open-Meteo chunks data into small portions, each covering 10 locations and 2 weeks of data. Each block is individually compressed using an optimized compression scheme.
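As an illustration of the chunking (sizes from the comment above, assuming hourly data; zlib is a stand-in here, since the actual compression scheme is a custom one):

```python
import struct
import zlib

CHUNK_LOCATIONS = 10        # 10 locations per chunk
CHUNK_TIMESTEPS = 14 * 24   # 2 weeks of hourly data

def chunk_id(location_idx: int, time_idx: int) -> tuple[int, int]:
    """Which chunk a (location, time) cell falls into."""
    return (location_idx // CHUNK_LOCATIONS, time_idx // CHUNK_TIMESTEPS)

def compress_chunk(values: list[float]) -> bytes:
    """Stand-in compressor: pack float32 values and deflate them.
    (Open-Meteo uses its own optimized scheme, not zlib.)"""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)
```

A read for one location and a few days of data then only needs to decompress one or two small blocks instead of the whole field.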
While this approach isn't groundbreaking and is supported by file formats like NetCDF, Zarr, or HDF5, the challenge lies in efficiently working with multiple weather models and updating data with each new model run every few hours.
This is great but doesn't work for applications that don't have the luxury of the internet. What are some decent map alternatives for air-gapped (no internet access) applications?
On Android (and iPhone I guess) OsmAnd is a navigation app that works offline (with map data from OpenStreetMap) and exposes an API as well as examples on how to use its core in other apps [1].
Honest question: why is SQLite needed for local? Why would you not have PG at the edge that replicates data with a central PG? That way the SQL dialect problem you mentioned wouldn't exist.
That is a much safer way to go for most use cases. Well, actually, most use cases don't need edge compute at all, but for those that do, this setup is indeed common and fine for most apps:
- Say we do edge compute in San Francisco, Montreal, London and Singapore
- Set up a PG master in one place (like San Francisco), and read replicas in every place (San Francisco, Montreal, London and Singapore)
- Have your app query the read replica when possible, only going to the master for writes
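A minimal sketch of that read/write split (illustrative only; the DSNs and the SELECT-based routing heuristic are assumptions, and a real app would also pin reads after its own writes):

```python
# Route statements to the nearest read replica, or to the single primary.
PRIMARY = "postgres://sf-primary/app"        # hypothetical DSNs
REPLICAS = {
    "sf": "postgres://sf-replica/app",
    "montreal": "postgres://montreal-replica/app",
    "london": "postgres://london-replica/app",
    "singapore": "postgres://singapore-replica/app",
}

def pick_dsn(sql: str, region: str) -> str:
    """Send reads to the local replica, everything else to the primary."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return REPLICAS[region] if is_read else PRIMARY
```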
In rare cases, where any network latency is unacceptable and you really need an embedded DB for ultimate read performance, this is pretty interesting. But a backend server truly needing an embedded DB is certainly a rare case. I would imagine this approach comes with some major downsides, like having to replicate the entire DB to each app instance, as well as the inherent complexity/sketchiness of the setup, when you generally want your DB layer to be rock solid.
This is probably upvoted so high on HN because it's pretty cool/wild, and HN loves SQLite, vs. it being something many ppl should use.
SQLite is much smaller and more self-contained than Postgres. It's written in ANSI C, and by including one file you have access to a database (which is stored in another single file). It's popular in embedded systems and, I imagine, edge devices.
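For instance, with Python's stdlib sqlite3 module the whole database really is a single file (or no file at all with ":memory:"):

```python
import sqlite3

# An in-memory database needs no file at all; replacing ":memory:"
# with "app.db" would put the entire database in that one file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('hello', 'world')")
value = conn.execute("SELECT v FROM kv WHERE k = 'hello'").fetchone()[0]
conn.close()
```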
Fair enough; I didn't realize that ":memory:" SQLite accesses did zero syscalls overall, I had assumed that they required shared memory writes that entailed syscalls for barriers and the like.
I'm happy you helped me learn that's not the case! That'll make several things I regularly need to do much, much easier.
The overhead of IPC isn't significant here, unless there's some special use case I'm not thinking of. SQLite might still be faster for small queries for other reasons.
Raft (https://goraft.tech/) | Sr. Engineers (Multiple Roles) | Full Time | Remote (US Based)
Raft partners with the DoD to work on three core things:
- Distributed Data Systems (Kafka, Flink, Airflow, Kubeflow)
- Platforms at Scale (K8s, Hardware in the loop, QEMU emulation, C++/Rust)
- Reactive App Dev (Scala, ReactJS, Python/Django)
We have multiple roles across three disciplines. The company has grown 150% since last year. We offer meaningful equity, great work-life balance (no on-call), benefits, and hard problems to solve with wide impact.
> Couch seems like a fantastic way to go for use cases involving syncing data between devices with an offline mode and syncing between clients. It was built from the ground up with replication in mind.
Have you come across any simple examples that show offline mode and syncing between clients with replication?
Syncing between clients requires a network between the clients, and normally clients only have connections to servers. But if you are in a situation where clients can open TCP connections to other clients, CouchDB can sync over that.
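For example, CouchDB drives replication through its HTTP `_replicate` endpoint (or persistent docs in the `_replicator` database); a sketch of the request body, with placeholder URLs for a peer that is reachable over TCP:

```python
import json

def replication_request(source: str, target: str, continuous: bool = False) -> str:
    """JSON body for POST /_replicate on a CouchDB node."""
    return json.dumps({
        "source": source,
        "target": target,
        "continuous": continuous,
    })

# One-shot pull of the "notes" database from a directly reachable peer
# (hypothetical hostnames):
body = replication_request(
    "http://peer-client:5984/notes",
    "http://localhost:5984/notes",
)
```

Setting `"continuous": true` keeps the replication running, so changes flow between the two databases as they happen.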