With "intelligence" (or whatever you want to call it) and speed both seeming to ramp up quickly with local models, I wonder what the growth rate and ceiling(?) might be in this space. Will this kind of IQ and performance work with just, e.g., 16GB RAM in a couple of years? Is there a new kind of Moore's law to be defined here?
Squeezing a model like this, complete with 'big model smell', into 16GB... Honestly, it isn't feasible today.
It'll require some kind of:
- breakthrough in architecture or
- breakthrough in hardware or
- some breakthrough quantization technique
The problem is that all the parameters need to be in memory, even the ones that aren't active (say, for Mixture of Experts models), because swapping parameters in and out of RAM is far too slow.
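As a rough back-of-envelope sketch of why total parameter count is the constraint: weight memory is roughly total parameters times bits per weight, divided by 8. The function names, the 2 GiB overhead allowance (KV cache, activations, runtime), and the example model sizes below are my own illustrative assumptions, not figures from any particular model.

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Uses total parameters, not active ones: for an MoE model every expert
    has to be resident even if only a few are used per token.
    """
    return total_params * bits_per_weight / 8 / 2**30


def fits(total_params: float, bits_per_weight: float, ram_gib: float,
         overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in ram_gib.

    overhead_gib is a hypothetical allowance, not a measured value.
    """
    return weight_memory_gib(total_params, bits_per_weight) + overhead_gib <= ram_gib


if __name__ == "__main__":
    # Illustrative total-parameter counts and bit widths only.
    for params in (14e9, 30e9, 120e9):
        for bits in (16, 8, 4):
            gib = weight_memory_gib(params, bits)
            verdict = "fits" if fits(params, bits, ram_gib=16) else "does not fit"
            print(f"{params / 1e9:.0f}B params @ {bits}-bit: "
                  f"{gib:.1f} GiB of weights, {verdict} in 16 GiB")
```

This ignores context length entirely, so treat it as a lower bound on what you'd actually need.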
"We show that EMO – a 1B-active, 14B-total-parameter (8-expert active, 128-expert total) MoE trained on 1 trillion tokens – supports selective expert use: for a given task or domain, we can use only a small subset of experts (just 12.5% of total experts) while retaining near full-model performance."
The people working at the leading edge of this stuff seem to believe that there is a need for parallel models that solve different problems.
A crow exhibits some degree of intelligence in what is a very small brain compared to humans. There is overlap in the problem solving skills of the dumbest humans and the smartest crows.
So the question is: what is that? Yann LeCun seems to think it’s what we now call world models. World models predict behaviour as opposed to predicting structured data (like language.)
If your model can predict how some world works (how you define world largely depends on the size of your training data), then in theory it is able to reason about cause and effect.
If you can combine cause and effect reasoning with language, you might get something truly intelligent.
That’s where things seem to be going. Once we have a prototype of that system, there will be many questions about how much data you really need. We’ve seen how even shrinking LLMs with 1-bit quantization can lead to models that exhibit a fairly strong understanding of language.
I don’t think it’s unreasonable to expect to see some very intelligent, (relatively) low-memory AI systems in the next couple of years.
byroot sets a great example sharing his code optimization expertise. His blog has many great improvements like this. A 7x improvement in Dir.join and similar calls?! Thank you, byroot!
"Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning."
>12. Restrictions apply. Some data is not transmitted through VPN.... See https://g.co/pixel/vpn for details.
Does anyone know what data doesn't go through the vpn?
On the positive side, it lists a 24+ hour battery life!! This is huge for me!! ...but it has a footnote as well:
> 6. Battery life depends upon many factors and usage of certain features will decrease battery life. Actual battery life may be lower. Over time, Pixel software will manage battery performance to help maintain battery health as your battery ages. See https://g.co/pixel/battery-tests and https://g.co/pixel/batteryhealth for details.
I like my phone more, but battery life on hers is way better, to the point that I regret buying mine. It barely lasted a day out on vacation, and I'm not a super heavy phone user: I'd look for restaurants, open maps, take pictures, ask Gemini stuff, and I'd be at 50% by the time she was at 75%.
> Does anyone know what data doesn't go through the vpn?
I can't speak to exactly what data doesn't go through their VPN, but I know carrier apps tend not to play nice with VPNs, especially the Google Fi app (it relies on its connection and what IP it's on to coordinate switching between their various carrier contracts, and that seems to break under a VPN).
Also, Wi-Fi calling has seemingly been problematic over VPN for as long as I can remember, so that's usually a safe bet for exclusion.