Very cool approach! We built something super similar, also going for content-addressed storage and compare-and-swap as fundamental primitives.
Also commit-DAG based, but we also wrote this whole knowledge graph / triple-store CRDT data format on top.[1]
We also have p2p syncing of the history so you can use it to track your local work but also to have your agents coordinate within your team.
We had our agents build their own tools on top of that substrate; that way we're vendor independent. This stuff works everywhere from Claude web to self-hosted openclaw, you only need to tell your agent to use the faculties.
Because the substrate takes care of everything, every new faculty you write on top of that inherits all of the same properties.
Working on exactly that!
We're local-first, but do distributed sync with Iroh.
Written in rust and fully open source.
Imho, having a graph database that is really easy to use and to write new CLI applications on top of works much better. You don't need strong schema validation as long as you can gracefully ignore whatever your schema doesn't expect, by viewing queries as type/schema declarations.
Triblespace, while also a data exchange standard like RDF, is closer to Datascript or Datomic. It's a Rust library, and great care has been taken to give it extremely nice DX.
In-memory datasets are cheaply clonable and support efficient set operations. There are macros that integrate fully into the type system to perform data generation and queries.
// The entity! macro returns a rooted fragment; merge its facts into
// a TribleSet via `+=`.
let herbert = ufoid();
let dune = ufoid();

let mut library = TribleSet::new();
library += entity! { &herbert @
    literature::firstname: "Frank",
    literature::lastname: "Herbert",
};
library += entity! { &dune @
    literature::title: "Dune",
    literature::author: &herbert,
    literature::quote: ws.put(
        "I must not fear. Fear is the mind-killer."
    ),
};
ws.commit(library, "import dune");

// `checkout(..)` returns a Checkout: a TribleSet paired with the
// commits that produced it, usable for incremental delta queries.
let catalog = ws.checkout(..)?;

let title = "Dune";
// Multi-entity join: find quotes by authors of a given title.
// `_?author` is a pattern-local variable that joins without projecting.
for (f, l, quote) in find!(
    (first: String, last: String, quote),
    pattern!(&catalog, [
        { _?author @
            literature::firstname: ?first,
            literature::lastname: ?last
        },
        { _?book @
            literature::title: title,
            literature::author: _?author,
            literature::quote: ?quote
        }
    ])
) {
    let quote: View<str> = ws.get(quote)?;
    let quote = quote.as_ref();
    println!("'{quote}'\n - from {title} by {f} {l}.");
}
Data has a fully tracked history like in Terminus, but we are overall more CRDT-like, with multiple scopes of transactionality.
You can store stuff in either S3 or a single local file (for the local file you can union two databases by concatenating them with `cat`).
We also have just recently added sync through Iroh.
The core idea, and the main difference from RDF, is that RDF is text-based and weakly typed, while we are binary and strongly typed.
We split everything into two basic structures (a rough layout sketch follows below):
- the tribles (a pun on "binary triple"): 64-byte units split into [16-byte entity id | 16-byte attribute id | 32-byte value], where the first two are basically high-entropy identifiers like UUIDs, and the last is either a Blake3 hash or an inlined <32-byte value, with the type being disambiguated by metadata on the attribute id (itself represented as more tribles)
- blobs: content-addressed, arbitrary length
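To make that layout concrete, here's a rough sketch of a single trible as plain bytes. The struct and field names are hypothetical, not the actual triblespace-rs types; it only illustrates the 16/16/32 split described above.

// Hypothetical illustration of the 64-byte trible layout, not the real types.
#[repr(C)]
struct Trible {
    entity: [u8; 16],    // high-entropy random id, UUID-like
    attribute: [u8; 16], // high-entropy random id; its metadata carries the value type
    value: [u8; 32],     // either a small inlined value or a Blake3 hash of a blob
}
// 16 + 16 + 32 = 64 bytes, so a trible fits exactly in a cache line
// and sorts lexicographically as one flat byte string.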
It's easy to see why canonical representations are trivial for us: we just take all of the tribles, sort them lexicographically, dedup them, and store the resulting array in a blob. Done.
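As a minimal sketch of that canonicalisation step, over plain 64-byte arrays rather than the real triblespace types:

// Sketch only: canonical form = sorted, deduplicated concatenation of tribles.
fn canonical_blob(mut tribles: Vec<[u8; 64]>) -> Vec<u8> {
    tribles.sort_unstable(); // lexicographic order over the raw bytes
    tribles.dedup();         // a set carries no duplicates
    tribles.concat()         // the resulting byte blob is the canonical representation
}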
Everything else is built up from that. Oh, and we also have succinct data structures, but because those are dense but slower, and immutable, we have a custom 256-ary radix trie to do all of the immutable set operations.
The query engine is also custom: there is no query planner, which gives us 0.5-2.5 microseconds of latency per query depending on the number of joins, and the engine is fully extensible via traits in Rust.
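To illustrate what "extensible via traits" could mean here, a purely hypothetical shape (not the actual triblespace-rs API): each data source implements a small constraint interface and the join algorithm drives it, so there is nothing left for a planner to decide.

// Hypothetical sketch, not the real API: any data source that can
// estimate, propose and confirm bindings for variables can join a query.
trait Constraint {
    /// The query variables this constraint ranges over.
    fn variables(&self) -> Vec<usize>;
    /// How many candidates this constraint has for `variable` under the
    /// current partial binding (used to pick the next variable, no planner).
    fn estimate(&self, variable: usize, binding: &[Option<[u8; 32]>]) -> usize;
    /// Propose candidate values for `variable`.
    fn propose(&self, variable: usize, binding: &[Option<[u8; 32]>]) -> Vec<[u8; 32]>;
    /// Check a concrete candidate against this constraint.
    fn confirm(&self, variable: usize, value: &[u8; 32], binding: &[Option<[u8; 32]>]) -> bool;
}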
Fair, it actually started out in JS, moved to Deno, then Zig, and ended up in Rust.
If I ever find the time I'd like to backport what I have now up the chain.
It is supposed to be an RDF replacement, so it will eventually have to happen, but it's hard work to make everything integrate idiomatically into the host language.
I built a bunch of tools[1] that give the agents a means to communicate and coordinate with each other, with me mostly just deciding what goes into the backlog and checking their summaries whenever they finish a task.
It's surprising to me how well these go together: the transient concept in Clojure is essentially a &mut; couple that with a reference-count check and you get fast transient mutations with cheap persistent clones.
All of Rust's persistent immutable data structure libraries, like im, make use of this for drastically more efficient operations without loss of capability or programming style.
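The same trick is visible in the standard library itself; a tiny sketch of the principle using Rc rather than any particular persistent collection:

use std::rc::Rc;

fn main() {
    // Cloning is O(1): just a reference-count bump.
    let original = Rc::new(vec![1, 2, 3]);
    let mut working = original.clone();
    // make_mut checks the count: the data is shared, so it copies once and
    // then mutates in place; with a unique owner it would not copy at all.
    Rc::make_mut(&mut working).push(4);
    assert_eq!(*original, vec![1, 2, 3]); // the old version stays untouched
    assert_eq!(*working, vec![1, 2, 3, 4]);
}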
I used the same principle for my Rust re-imagination of Datascript[1], and the borrow checker together with some Merkle-DAG magic allows for some really interesting optimisations, like set operations with almost no additional overhead.
This allows me to do things like not having insert be the primary way you get data into a database; you simply create database fragments and union them:
let herbert = ufoid();
let dune = ufoid();

let mut library = TribleSet::new();
library += entity! { &herbert @
    literature::firstname: "Frank",
    literature::lastname: "Herbert",
};
library += entity! { &dune @
    literature::title: "Dune",
    literature::author: &herbert,
    literature::quote: ws.put(
        "I must not fear. Fear is the mind-killer."
    ),
};
ws.commit(library, "import dune");
The entity! macro itself just creates a TribleSet and += is just union.
In Clojure this would have been too expensive, because you would have to copy the tries every time and couldn't reuse the trie nodes in place, since there is no borrow checker to prove their unique ownership.
The /clear nudge isn't a solution though. Compacting or clearing just means rebuilding context until Claude is actually productive again. The cost comes either way.
I get that 1M context windows cost more than the flat per-token price reflects, because attention scales with context length, but the answer to that is honest pricing or not offering it. Not annoying UX nudges.
What’s actually indefensible is that Claude is already pushing users to shrink context via, I presume, system prompt. At maybe 25% fill:
“This seems like a good opportunity to wrap it up and continue in a fresh context window.”
“Want to continue in a fresh context window? We got a lot of work done and this next step seems to deserve a fresh start!”
If there’s a cost problem, fix the pricing or the architecture. But please stop the model and UI from badgering users into smaller context windows at every opportunity. That is not a solution, it’s service degradation dressed as a tooltip.
The cost issues they're seeing (at least from what they've stated) are from users, not internal. Basically, it takes either $5 or $6.25 (depending on 5m or 1h TTL) to re-ingest a 1M-token conversation into cache for Opus 4.6; that's obviously a very high cost, and users are unhappy with it.
I think 400k as a default seems about right from my experience, but just having the ability to control it would be nice. For the record, even just making a tool call at 1M tokens costs 50 cents (which could be amortized if multiple calls are made in a round), so imo costs are just too high at long context lengths for them to be the default.
In my experience a wiki can actually drastically reduce the amount of dead context.
I've handed my local agents a bunch of integrated command-line tools (kinda like an office suite for LLMs), including a wiki (https://github.com/triblespace/playground/blob/main/facultie...), and linkage really helps drastically reduce context bloat because they can pull things in fragment by fragment, incrementally.
Was also thinking that to disambiguate contexts where you wish to express a token's function (e.g., top) as distinct from other uses, one could use a unique character prefix (e.g., ∆top) to avoid pollution between the English word and the Linux binary.
You'd then alias these disambiguated terms; they'd still trigger the correct token autocomplete but would reduce the overlap which causes misdirection.
Yeah you can sink a lot of time into a system like that[0].
I spent years simplifying the custom graph database underneath it all and only recently started building it into tools that an agent can actually call[2]. But so far all the groundwork has actually paid off; the rooster basically paints itself.
I found a wiki to be a surprisingly powerful tool for an agent to have.
And building a bunch of CLI tools that all interconnect on the same knowledge graph substrate has also had a nice compounding effect. (The agent turns themselves are actually stored in the same system, but I haven't gotten around to using that for cool self-referential meta-reasoning capabilities.)
Hasn't HN traditionally been a place where makers share the experiences they had building things?
Especially when you have someone working on autonomous research agents it doesn't seem that off to lament how much time you can sink into the underlying substrate. In my particular case the work started long before LLMs to make actual research easier, the fact that it can also be used by agents for research is just a happy accident.
Then you seem to be somewhat blinded by your aversion to AI-assisted engineering, because if https://github.com/triblespace/triblespace-rs is a "shitty vibecoded project", then I don't know what a good project actually looks like to you. That codebase has years of human blood, sweat and tears in it, implements novel data structures, has its own worst-case-optimal (WCO) join algorithm, cutting-edge succinct data structures that are hand-rolled to supplement the former, new ideas on graph-based RDF-like CRDTs, efficient graph canonicalisation, content addressing and metadata management, implements row types in Rust, has really polished typed queries that seamlessly integrate into Rust's type system, lockless left-right data structures, a single-file database format where concatenation is database union, and is orders of magnitude faster than similar databases like Oxigraph... does it also have to cure cancer and suck you off to meet your bar?
I got 4 more GitHub stars and someone dropping into the tiny, tiny Discord just from mentioning it; why do you think that is?
When was the last time you created something and put it out into the world? Your only big post on here is a lament about your wife not giving you children, as if she were some expired carton of milk that owes you (that's something you discuss with your partner if you respect them, not with strangers on the internet, and 39 is a completely fine age for a woman to have children - https://www.youtube.com/watch?v=6YIz9jZPzvo).
Even your critique isn't an act of creation; it's neither creative nor substantial, and doesn't go beyond an egotistical "I don't like it when people post their project and share their experiences when AI is involved" on _social_ media.
Is there even something you're proud of enough to share and present, or is all this bitterness the result of envy for those that have?
“In many ways, the work of a critic is easy. We risk very little, yet enjoy a position over those who offer up their work and their selves to our judgment. We thrive on negative criticism, which is fun to write and to read. But the bitter truth we critics must face is that, in the grand scheme of things, the average piece of junk is probably more meaningful than our criticism designating it so. But there are times when a critic truly risks something, and that is in the discovery and defense of the new. The world is often unkind to new talent, new creations. The new needs friends. Last night, I experienced something new, an extraordinary meal from a singularly unexpected source. To say that both the meal and its maker have challenged my preconceptions about fine cooking is a gross understatement. They have rocked me to my core. In the past, I have made no secret of my disdain for Chef Gusteau's famous motto: "Anyone can cook." But I realize, only now do I truly understand what he meant. Not everyone can become a great artist, but a great artist can come from anywhere. It is difficult to imagine more humble origins than those of the genius now cooking at Gusteau's, who is, in this critic's opinion, nothing less than the finest chef in France. I will be returning to Gusteau's soon, hungry for more.”
I built a whole database around the idea of using the smallest plausible random identifiers, because that seems to be the only "golden disk" we have for universal communication, except for maybe some convergence property of latent spaces in large enough embodied foundation models.
It's weird that they are really underappreciated in the scientific data management and library science communities; many issues that currently require large organisations could just have been solved with better identifiers.
To me the ship of Theseus question is about extrinsic (random / named) identifiers vs. intrinsic (hash / embedding) identifiers.
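A tiny illustration of that distinction (assuming the rand and blake3 crates; the names and content are just for the example):

fn main() {
    // Extrinsic: a random name that stays stable while the content it points to changes.
    let extrinsic_id: [u8; 16] = rand::random();
    // Intrinsic: derived from the content itself, so any edit yields a new identity.
    let intrinsic_id = blake3::hash(b"planks of the ship, version 1");
    println!("extrinsic: {:x?}\nintrinsic: {}", extrinsic_id, intrinsic_id);
}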
1: https://github.com/triblespace/triblespace-rs
2: https://github.com/triblespace/faculties