That reminds me of .kkrieger, a 96KB FPS developed by the demoscene group Farbrausch, which was aggressively subsetted in the same fashion at the very last minute [1]. This approach worked very well... until it didn't. It is interesting to see history repeating itself---with the same caveat!
I'd actually be interested in a Chromium browser that instead of removing HTML5 functionality, replaces it with simulated fake data.
For example, allow access to the geolocation API, but give it a fake location. Allow access to third-party cookies, but shove garbage back at the server. Allow access to the camera and microphone but play a specified pre-recorded video back for both.
This helps get past websites that evilly require access to these permissions to function.
It would also help in the case of, e.g., store websites that want your location permission to show you the closest store. I'm okay with giving them city-level accuracy (I'd be entering the city name anyway in a manual search), but I don't want my exact coordinates going to them.
Actually, I spoof the location to say you're at Area 51 in my remote isolated browser. It's done with the Chrome DevTools Protocol, which is very simple to use, via the method `Emulation.setGeolocationOverride` [1].
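For anyone curious, the override itself is just one CDP message. A minimal Python sketch that builds that message (the coordinates and the websocket endpoint are illustrative; actually delivering it requires a browser started with `--remote-debugging-port` and a websocket client, which isn't shown here):

```python
import json

def geolocation_override(msg_id, latitude, longitude, accuracy=100.0):
    """Build the CDP message that spoofs the browser's geolocation."""
    return json.dumps({
        "id": msg_id,
        "method": "Emulation.setGeolocationOverride",
        "params": {
            "latitude": latitude,
            "longitude": longitude,
            "accuracy": accuracy,
        },
    })

# Roughly Area 51; send this over the page's DevTools websocket
# (e.g. ws://localhost:9222/devtools/page/<id>) to apply the override.
msg = geolocation_override(1, 37.235, -115.811)
```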
This is doable right now with some effort, which I kind of like because it keeps the cat-and-mouse game paused for a while.
The moment someone makes a one-click browser that lets the average person spoof all these things, then sites are going to start building captcha-style verification mechanisms like having to "wink three times" in your video to get it to proceed.
But someone would still have to program that SnapCam filter specifically for each captcha, and that process is not widespread and easy yet, so it will pause for a few more years hopefully.
Hmm, maybe I could allow third-party cookies in Chromium but use injected JavaScript to randomly flip some bits in all third-party cookies, and in first-party cookies for sites I haven't explicitly whitelisted.
I also wonder how many database backends this will trip up due to invalid IDs, though that's not really my intention.
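The flipping itself is trivial; the injection plumbing is the hard part. A hedged Python sketch of just the bit-flip transform (the real thing would live in injected JS, and `flip_cookie_bits` is a name I made up):

```python
import random

def flip_cookie_bits(value, n_bits=1, seed=None):
    """Return a copy of a cookie value with n random bits flipped.

    Length and overall shape are preserved, so the result still looks
    like a plausible cookie to the server.
    """
    rng = random.Random(seed)
    data = bytearray(value, "latin-1")  # 1:1 byte mapping, no encoding surprises
    for _ in range(n_bits):
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)
    return data.decode("latin-1")

original = "session=abc123deadbeef"
corrupted = flip_cookie_bits(original, seed=42)
```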
I'm not sure, but for a while WeChat (the app) would not let you log in if you did not give it location permissions. I think they've changed that, but they still unscrupulously scan your Wi-Fi networks from time to time.
I haven't yet seen a website do that, but they conceivably could.
Lots of media presences that either clearly didn't get the memo on "how to implement GDPR correctly" or are still determined to sell out their users, for one.
Technically speaking, at least under German jurisdiction, the only way to be completely in compliance with GDPR is to not load a single library or other asset from any third-party server without the user's explicit consent (the point being that the third party receives the user's visit timestamp, IP address, and referrer). Anything else opens you up to issues.
I don't personally believe anyone needs to actually implement GDPR unless they have a company registered in the EU.
I do believe in privacy and the general intent of GDPR, but EU laws shouldn't take effect outside their jurisdiction any more than Iranian laws or Chinese laws.
> Meanwhile, Selenium’s standalone Chromium image is 348.1 MB compressed.
Selenium implies you have quite a bit of a developer stack installed too: runtimes, SDKs, and so on. It would be unfair to attribute all that usage to Chrome alone.
I'm not aware of the Docker image in question, but a Docker image starts empty (0 bytes). A full operating system base (Ubuntu 20.04) is 75 MB.
You can probably slim that down a bit if you need to, and installing your software on top might only take a few MB. So cutting Chrome down from 60 MB could well be a good goal.
Unfortunately our image is ~450MB right now because we have to install both Python and JS, Chrome, fonts, video codecs, and a bunch of other software for all the extractors.
Most of it is from this killer 266MB line, but unfortunately we need all these things:
Yeah, the fonts take a lot. Why do you need to slim the images down? Do you really need the fonts present to archive the content? I thought you were saving the HTML; wouldn't the fonts only be needed to render it?
I've only just had time to glance at this paper and I've already found it to be very informative. There are types of web operations that I've not much need for, so I've long advocated that it would be good if there were ways of either rendering them inoperative or deprecating certain aspects of their operation, for the purposes of improving user privacy/security or the browser's performance.
In particular, the JavaScript engine could be modified in ways that would allow it to be tailored to the user's requirements, such as improving the speed of page rendering by having it ignore the huge amounts of dross/bloat that's now found in a large percentage of web pages. The same would apply to user privacy: JS could be tweaked to either ignore certain requests for privacy-sapping info or, where necessary, spoof them with quasi-random junk ('quasi-random' here meaning data that's inaccurate but whose format and structure still satisfy any validity checks the server may perform). The same philosophy applies to other aspects of browser operation that I won't bother to cover here.
To do this clearly requires a detailed understanding of the browser's complex operations and that's not easy for someone—even a good programmer—who only approaches a browser's internal functions as a means to an end to quickly change some facet of its operation.
It seems from what I've read so far that this paper could initiate new, more detailed work and/or lead to the development of tools that would make browser tailoring much easier and simpler.
They replace it with some illegal instruction, so the process crashes. With a lot of logic running in per-origin processes, this may only bring down one tab instead of the whole browser.
Edit: Or maybe the error handling can avoid killing a process? This is what the paper says, but I feel like a child process would almost certainly be killed:
> Code elimination is trivial because we nullify unused code with illegal instructions based on known binary function boundaries. Once the instructions triggers a Chromium’s error handling routine that catches an exception, an error page shows an “Aw, Snap!” message by default instead of crashing a whole Chromium process. (section 5, p467)
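As a toy illustration of the nullification step (my own sketch, not the paper's code): on x86-64, `ud2` (bytes `0F 0B`) is a guaranteed-illegal instruction, so filling an unused function's byte range with it makes any stray call into it fault immediately:

```python
UD2 = b"\x0f\x0b"  # x86-64 "undefined instruction": always raises #UD

def nullify(binary, start, end):
    """Overwrite the byte range [start, end) with repeated ud2 opcodes."""
    length = end - start
    binary[start:end] = (UD2 * (length // 2 + 1))[:length]

# Pretend bytes 4..10 of this buffer are an unused function's body.
code = bytearray(b"\x90" * 16)  # NOPs standing in for real machine code
nullify(code, 4, 10)
```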
For some, sure, if an API implements "new foo(), new foo(string), new foo(string, int), and new foo(string, int, int)" but the code that uses the library only uses "new foo()" and sets the other properties after creating the object then it would eliminate the extra constructors. Figuring out what stuff gets called is really hard in something as big as Chromium, where there are so many components that in some spots have their own domain-specific language to link stuff together.
For others, no, they do allow the minifier to break some sites: they analyze the Alexa top 1000 and check which functions get called and how often by those sites.
English is not nearly a consistent enough language to make the argument that “slimium” can’t be spelled as-is and still be pronounced like “slim” — and personally I think “slimmium” just looks weird.
Other than reducing the exploitable surface, what is the point of debloating Chrome?
Code only gets pulled off disk into memory, and from memory into the cache, if it actually gets called, yes? So the initialisation code for something like WebUSB would get called as the browser wakes up but never again, and the majority of it would never even make it off the disk. Are there really gains to be made here?
I understand that you're referring to the username, not the expletive... but who is still browsing HN on a machine where it's a problem when you click a link and it's unexpectedly a PDF? That mattered several years ago, but I expect most users probably have embedded PDF viewers in their browsers today.
Honestly? Because I'm set up to download PDFs automatically, and the indicator that this has happened is too inconspicuous. So I often click a link twice or even three times before realizing it's a PDF, and then I have to clean up two or three files.
Every time I clean my downloads folder I find a few "(1)" strings appended to several duplicates I've downloaded by mistake. Mostly due to the UX issue you mentioned but I'm also mildly ADHD.
Kinda makes you wonder why browsers don't have a built in (content based) deduping feature. I'm sure some users actually desire this behavior now that they're used to it.
Can anyone recommend a Firefox extension that can dedupe and clean your downloads for you?
Why not just use a general deduping utility and unleash it on your Downloads folder? That will cover files that have been removed from the browser's downloads list, too.
> your phone starts downloading a large PDF instead (4MB, what the what?).
I agree about the parent point of marking PDF as such but still feel compelled to point out that a 4 MB PDF is not significantly bigger than a lot of regular pages that are posted to HN.
At this very moment, the highest-ranked link on the HN front page, which is just a regular web page, has a total network payload of 3,937 KiB according to Google PageSpeed.
I sympathize with what you are saying though in general, and I think it is sad that so much mobile bandwidth is needed in order to read stuff.
The way I personally get around this is that when I am on mobile I mostly restrict myself to reading HN comments instead of clicking through on any of the featured links themselves. But depending on how you like to use your phone, I realize that habit switch might not be feasible.
Except a lot of regular pages are just pages, so it is trivial to read them at 20 kB, because you obviously run your browser with extensions that block the loading of the vast majority of resources on the page until you whitelist them. A 4 MB PDF is a single 4 MB file. A 4 MB webpage is more like 100 separate files, most of which can be forbidden from loading at all, and many of which can be set to load only once tapped on (like images).
> you obviously run your browser with extensions that block the loading of the vast majority of resources on the page until you whitelist them
I think most people do not do this on mobile, even among the HN crowd. For example, on the iPhone that I use, Safari does not support extensions. So for most people on mobile the site will download everything.
The issue for me isn't viewing the pdf but getting rid of it afterwards.
Almost all of my phone's storage, other than media, is taken up by PDFs, but I can't split them up like I would on a PC; e.g. I want to keep the textbooks and datasheets, but I don't need every paper I've ever read.
What debloating tools are there for the Android NDK? For any code that ends up dexed, R8 takes care of it, so it's not terrible if I add a library dependency with a lot of code I don't need. I haven't found anything similar for native libraries.
One thing I've done in a project before was statically linking all of my native library's dependencies into the library itself. Everything would be compiled with `-ffunction-sections -fdata-sections` and the final shared library linked with `-Wl,--gc-sections`. I'd call `strip` on the `.so` as well.
This worked well because the JVM code only ever called the native API I exposed and did not need to access the dependencies' APIs.
There is a tremendous need for a minimalistic Chromium project that would coexist well with the Linux tiled window manager ecosystem.
It would hand downloads off outside the browser, store favorites in some sort of JSON, and support native WM tabbing.
What about qutebrowser? It's based on QtWebEngine (a sort of ungoogled Chromium), has a minimal UI, is extremely customisable, and can be extended with userscripts (nothing to do with JS). You can do things like editing textareas in your editor by spawning a subprocess, opening a video with mpv, or piping something to dmenu.
[1] https://fgiesen.wordpress.com/2012/04/08/metaprogramming-for...