Interesting comment about the minimum size of flash storage being 18nm here: htt...

reitzensteinm · on June 30, 2012

It's different this time (and I cringe saying that), because we're rapidly approaching fundamental limits to scaling down silicon.

The concerns in the past have been mainly to do with lithography; eg, when the feature size of the silicon went below the wavelength of the light we were using, we had to make masks that utilized difference patterns. This is a mere manufacturing problem.

But now we're getting to fundamental limits. Even if we had the ability to place the atoms however we wanted them, there's an intrinsic limit. You can't make a transistor out of half an atom.

We already hit a wall with frequency; for the longest time, it looked like speeds would go up and up. It's not an apples to apples comparison, because the Pentium 4 had a long pipeline, but a 3.8 GHz Prescott was released in 2005, which is exactly the maximum turbo frequency of the 2011 Sandy Bridge 2600K I'm typing this on now. Ivy has it beaten by just 100 MHz.

Now, that's not to say that computation will stop progressing. But it's not going to look like last year's CPU just smaller for much longer; some pretty fundamental changes are going to have to be made. Dynamically reconfiguring memristor circuits are what excites me, but it's just as likely to be something else instead.

As far as flash memory in particular goes, I'm no expert, but cell durability is falling substantially with each shrink (on average; Intel bucked the trend with their 25nm flash), and so the usable limit to feature size may come more quickly than with standard transistors.

But the industry has managed to push through walls that seemed just as intrinsic before, so I wouldn't bet my life savings on it.

davidb_ · on July 1, 2012

> We already hit a wall with frequency

This is true, but the way you wrote it ignores the fundamental problem: it's not that we can't make transistors switch any quicker, it's that doing so causes such an increase in temperature that we risk damaging the device. That's why you can read about overclockers using things like liquid nitrogen to run chips at 8 GHz.

Cooling mechanisms like microchannel cold plates and, as we continue with 3D-ICs, interlayer cooling, can allow for higher frequencies.

reitzensteinm · on July 1, 2012

I don't think better heat extraction would really change that much for today's CPUs (certainly when we head towards 3D chips it will become critical).

Gate delays are smaller at low temperatures; those LN2 overclocking runs aren't just fast because of efficient heat dissipation from the CPU, they're fast because the chip is being actively cooled to below room temperature.

So while heat dissipation is a factor, we're also close to the electrical limits as well. Otherwise water cooling (replacing the stock heat spreader) would get closer to LN2 runs. ALUs run at higher frequencies than the rest of the chip, but they're designed to do so (you'd have to shorten the gate pathways like a P4 to do that to the entire chip).

But ultimately, performance per watt is almost universally optimised for these days. It's critical in servers, laptops and mobile phones. The demand for 6 GHz, 300 W CPUs would be limited to workstation chips, even though we could probably engineer them to be reliable.

Power consumption is always going to increase superlinearly with respect to frequency, probably as a fundamental property of any method of computation we use.
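To put rough numbers on that, here is a minimal sketch using the usual CMOS dynamic power approximation, P ≈ C·V²·f. The assumption that supply voltage has to rise in proportion to frequency is mine (it gives P ∝ f³), and the 95 W at 3.8 GHz baseline is only an illustrative figure, not a measurement. Leakage is ignored entirely.

```python
def dynamic_power(freq_ghz, base_freq_ghz=3.8, base_power_w=95.0,
                  voltage_exponent=1.0):
    """Scale a baseline power figure with P ~ C * V^2 * f.

    Assumes V scales as f ** voltage_exponent (hypothetical knob):
    1.0 gives the cubic case, 0.0 gives fixed voltage (linear in f).
    """
    ratio = freq_ghz / base_freq_ghz
    # f contributes one power of the ratio, V^2 contributes 2 * exponent.
    return base_power_w * ratio ** (1 + 2 * voltage_exponent)


if __name__ == "__main__":
    for f in (3.8, 5.0, 6.0):
        print(f"{f:.1f} GHz -> {dynamic_power(f):.0f} W (under the cubic assumption)")
```

Under those assumptions a 6 GHz part lands well above 300 W, which is at least in the same ballpark as the figure above. With a fixed voltage the same function scales only linearly, so the exponent is where all the pain comes from.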

Leynos · on July 1, 2012

A few years ago, there was talk of producing chips with three dimensional stacks of NAND cells (for example, http://www.semi.org/en/node/38361?id=sgurow0811 ). Has there been any movement on this front? Every article I can find on the subject is a year old or more. This strikes me as the ideal way around the limits of individual NAND cell size, with the obvious proviso that drastically new manufacturing techniques would need to be perfected before this limit is hit.

reitzensteinm · on July 1, 2012

If it can be commercially produced, that would definitely set us up for a while for flash (CPUs would be thermally limited); though interestingly, not really that long!

If each layer was 50 nm high, and you built the chip up to an unrealistic 1 centimeter high (eg, a 1 cm^3 chip instead of 1 cm^2), chosen because that would pretty easily fill a 2.5" drive, that would give you:

(1 centimeter) / (50 nanometers) = 200,000 times today's capacity.

Which is only 18 doublings, or 36 years more of Moore's Law (assuming the pessimistic 24 month end), or roughly the gap between a Commodore 64 and a decent laptop today. Some people still working in the industry have gone through a larger increase. I've gone through a 1000-fold increase myself, and I'm only 25.
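The arithmetic above can be checked with a short sketch. The function name and parameters are just illustrative; the inputs are the 1 cm stack height, 50 nm layer height and 24 month doubling period from this comment.

```python
import math


def stacking_headroom(stack_height_m=1e-2, layer_height_m=50e-9,
                      months_per_doubling=24):
    """Return (layers, doublings, years) for a vertically stacked chip."""
    layers = stack_height_m / layer_height_m       # capacity multiplier
    doublings = math.log2(layers)                  # how many 2x steps that is
    # Round up to whole doublings before converting to calendar time.
    years = math.ceil(doublings) * months_per_doubling / 12
    return layers, doublings, years


if __name__ == "__main__":
    layers, doublings, years = stacking_headroom()
    print(f"{layers:,.0f}x capacity, {doublings:.1f} doublings, {years:.0f} years")
```

The exact figure is about 17.6 doublings, so "18 doublings, 36 years" is the rounded-up version.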

There is sure to be a whole bunch more we can do to get more capacity, but it's pretty mind blowing to think that the theoretical limits to storage are within our lifetimes on an exponential scale. So as much as Moore's law hasn't failed us yet, it certainly will at some point (probably in the form of the doublings themselves exponentially taking longer and longer).

ak217 · on July 1, 2012

They're only within our lifetimes if you require the devices to be the size of modern day silicon chips. There's nothing preventing us from building bigger devices - I mean, my laptop and smartphone's SSDs already essentially act as caches for much larger remote storage and compute hardware.

And given that nature has managed to cram this amazing sentient device into a space the size of our skull, using a pretty inefficient design process, I'd say the problem will be not the quantity of the building blocks, but how they're organized :)

wtallis · on July 1, 2012

If we were willing to accept SSDs being as unreliable as human memory, we could increase capacities by an order of magnitude with current technology. In fact, if SSD controller design weren't so tricky, someone would have taken advantage of this already to build a pretty decent enterprise-scale caching system.

nickolai · on July 1, 2012

>being as unreliable as human memory,

That's actually considered to be a feature. http://en.wikipedia.org/wiki/Hyperthymesia

cbsmith · on July 1, 2012

> But now we're getting to fundamental limits. Even if we had the ability to place the atoms however we wanted them, there's an intrinsic limit. You can't make a transistor out of half an atom.

I believe though that limit is well below 18 nm. Last I heard, they'd done a transistor with 1.5nm, and they weren't saying that was the limit. I'm not sure what the magic is with 18nm, but I'd sure like to know.

adsr · on July 1, 2012

I've been under the impression that NAND will someday be replaced with something like MRAM. Not sure how far off that is or how it's progressing; I'd be interested to hear from anyone who knows more about this.

jrabone · on June 30, 2012

I read an article somewhere (sorry, don't have reference to hand) that suggested it was diminishing returns from error correction vs number of levels per cell rather than lithography limits per se.

jpxxx · on July 1, 2012

I think you're both talking about different sides of the same problem.

Visualize a single SSD cell as a bucket of electrons. If it's full of electrons, it stands for 1. If it's empty, it stands for 0. Reading this is fast and unambiguous - there's either lots or none. (Single Level Cell type)

Now visualize a bucket that can be filled with no electrons, a third full, two-thirds full, or full of electrons. Now you've got 4 possibilities, so you can encode 2 bits! (Multiple Level Cell type)

Most consumer SSDs use the latter kind, MLC. It's slower to read and the results have to be amplified with better error correction, but you get so much more storage per chip that it's usually worth doing it this way.

The problem is that once you've cranked each of these cells down to 18nm or whatever, you're talking about holding and measuring (at most) 100 electrons per cell. What's that, 0, 33, 66, and 100 electrons? Crank it down even further and you can hold even fewer.

I think I pulled the numbers out of my ass, but the idea is essentially correct. We're close to the point where it's too difficult and error-prone to get a good read on a cell, requiring too much error correction to make engineering sense and rendering the MLC technique infeasible.
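A small sketch of the bucket picture, using the same made-up figure of 100 electrons per cell. The function name and the even spacing of levels are assumptions for illustration; real cells are read as threshold voltages, not electron counts.

```python
def cell_levels(bits_per_cell, max_electrons=100):
    """Evenly spaced charge levels for a cell storing bits_per_cell bits.

    max_electrons is the illustrative (not measured) capacity of the bucket.
    """
    levels = 2 ** bits_per_cell
    # Integer arithmetic keeps the results tidy: 0, 33, 66, 100 for 2 bits.
    return [i * max_electrons // (levels - 1) for i in range(levels)]


if __name__ == "__main__":
    for bits in (1, 2, 3):
        marks = cell_levels(bits)
        print(f"{bits} bit(s): {len(marks)} levels, about "
              f"{marks[1] - marks[0]} electrons apart")
```

Each extra bit per cell roughly halves the gap between neighbouring levels, and shrinking the cell lowers max_electrons on top of that, which is the squeeze described above.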

Also going below 18nm is going to be a pain in the ass for other reasons. So it's sort of a mutual dead end.

bodyfour · on July 1, 2012

All of the vendors are still using power-of-two MLC (i.e. 4 or 8 levels) right? I wonder when we'll start seeing 3 level (1.58 bits/cell) or 5 level (2.32 bits/cell) MLC start appearing? It's more complex to visualize, but the controller already abstracts the details of the storage quite a bit so it's not hard to imagine it spreading your 512 bytes across 2585 instead of 2048 cells. Some implementations are already doing compression so you could even have the algorithm's output be a tristate stream instead of a binary one if that's marginally more efficient for the silicon.
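For what it's worth, the cell counts work out like this. It's a sketch of the information-theoretic ideal only; the function name is made up, and a real controller would pack symbols in groups and add ECC overhead on top.

```python
import math


def cells_needed(payload_bytes, levels):
    """Minimum number of cells to hold payload_bytes at log2(levels) bits/cell."""
    bits = payload_bytes * 8
    return math.ceil(bits / math.log2(levels))


if __name__ == "__main__":
    for levels in (3, 4, 5, 8):
        print(f"{levels} levels: {math.log2(levels):.2f} bits/cell, "
              f"{cells_needed(512, levels)} cells per 512 bytes")
```

That reproduces the 2048 cells for 4-level MLC and 2585 cells for a hypothetical 3-level scheme.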

Another thing I wonder is how much neighboring cells interfere with each other. If they do, the most efficient data packing might even rely on bit encodings for multi-cell groups that avoid the combinations most likely to cause interference (similar to the 64b/66b encoding in 10 Gigabit Ethernet, for example). Again, the controller has to do this type of thing already to implement ECC, so it seems like a straightforward extension.

Note that I'm not saying that either of these techniques avoids the "dead end" you're talking about -- they're both just ways of squeezing a tiny bit more out of the density/reliability curve at the margins. I'm just imagining how complicated flash controllers might get as they try to capture the last bits of life in the technology.

wmf · on July 1, 2012

Edit: Ignore this; I wasn't paying attention.

http://www.anandtech.com/show/5067/understanding-tlc-nand/

http://forums.theregister.co.uk/forum/1/2012/06/25/Chris_Mel...

bodyfour · on July 1, 2012

TLC is (confusingly) 8-level flash, not 3-level, so it's still an integer number of bits per cell.

gpvos · on July 1, 2012

What worried me most is that SSDs with the apparently current 22nm size have only 2k/3k write cycles, where older ones had 10k. Seems a bit low to me.

philjohn · on July 1, 2012

It is low, but controller technology has come on as well. cxtreme systems have a 2xnm endurance test going on at the moment and a 256 GB Samsung 830 has got to 2.4 PB written so far and is still going strong (at an average rate of over 200 MB/s).
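As a rough sanity check on those figures, here is a sketch that converts them into drive fills and elapsed time. It assumes decimal units (1 PB = 10^15 bytes) and ignores write amplification and over-provisioning, so the result counts host writes and is not directly comparable to rated per-cell write cycles.

```python
def endurance_summary(bytes_written=2.4e15, capacity_bytes=256e9,
                      write_rate_bytes_per_s=200e6):
    """Return (full_drive_writes, days_of_writing) for an endurance run."""
    full_drive_writes = bytes_written / capacity_bytes
    days = bytes_written / write_rate_bytes_per_s / 86_400  # seconds per day
    return full_drive_writes, days


if __name__ == "__main__":
    fills, days = endurance_summary()
    print(f"about {fills:,.0f} drive fills over about {days:.0f} days")
```

So 2.4 PB on a 256 GB drive is on the order of nine thousand drive fills, which takes a bit over four months of continuous writing at 200 MB/s.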