Slimium: Debloating the Chromium Browser with Feature Subsetting (acm.org)
129 points by feross on Nov 9, 2020 | 65 comments


That reminds me of .kkrieger, a 96KB FPS developed by the demoscene group Farbrausch, which was aggressively subsetted in the same fashion at the very last minute [1]. This approach worked very well... until it didn't. It is interesting to see history repeating itself---with the same caveat!

[1] https://fgiesen.wordpress.com/2012/04/08/metaprogramming-for...


I'd actually be interested in a Chromium browser that, instead of removing HTML5 functionality, replaces it with simulated fake data.

For example, allow access to the geolocation API, but give it a fake location. Allow access to third-party cookies, but shove garbage back at the server. Allow access to the camera and microphone but play a specified pre-recorded video back for both.

This helps get past websites that evilly require access to these permissions to function.

It would also help in the case of, e.g., store websites that want your location permission to point you to the closest store, where I am okay with giving them city-level accuracy (because I'd be entering the city name in a manual search anyway) but don't want my exact coordinates going to them.


Actually, I spoof the location to say you're at Area 51 in my remote isolated browser [0]. It's done using the Chrome DevTools Protocol, which is very simple to use, via the method `Emulation.setGeolocationOverride` [1].

You can try the demo https://demo.browsergap.dosyago.com/

then go to the "Browser Geolocation" link at https://mylocation.org/

[0]: https://github.com/c9fe/ViewFinder

[1]: https://chromedevtools.github.io/devtools-protocol/tot/Emula...
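A minimal sketch of the message that goes over the wire (Node-flavored; the `{id, method, params}` envelope is the standard CDP framing, `latitude`/`longitude`/`accuracy` are the documented parameters of `Emulation.setGeolocationOverride`, and the coordinates are just an example):

```javascript
// Build the Chrome DevTools Protocol command that spoofs geolocation.
function geolocationOverride(msgId, latitude, longitude, accuracy = 100) {
  return JSON.stringify({
    id: msgId, // CDP correlates responses to requests by this id
    method: "Emulation.setGeolocationOverride",
    params: { latitude, longitude, accuracy },
  });
}

// Roughly Area 51:
const msg = geolocationOverride(1, 37.235, -115.811);
```

The resulting string is what gets sent over the browser's DevTools WebSocket (the endpoint listed at http://localhost:9222/json when Chromium runs with `--remote-debugging-port=9222`).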


This is doable right now with some effort, which I kind of like because it keeps the cat-and-mouse game paused for a while.

The moment someone makes a one-click browser that lets the average person spoof all these things, then sites are going to start building captcha-style verification mechanisms like having to "wink three times" in your video to get it to proceed.


> mechanisms like having to "wink three times" in your video to get it to proceed

Deepfakes are already here, we have that cat (or mouse) taken care of.


But someone would still have to program that SnapCam filter specifically for each captcha, and that process is not widespread and easy yet, so it will pause for a few more years hopefully.


I’d love for this idea to be applied to tracking cookies.

Just keep a pool of a hundred million of them, and send some at random with each request.


Hmm, maybe I could allow third-party cookies in Chromium but use injected JavaScript to randomly flip some bits in all third-party cookies. And in first-party cookies for sites that I haven't explicitly whitelisted.

I also wonder how many database backends this will trip up due to invalid IDs, though that's not really my intention.
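The core of that injected script could be a tiny pure function like the sketch below; in a real userscript you would run it over each value parsed out of `document.cookie` for non-whitelisted origins (the function name is hypothetical):

```javascript
// Flip one random bit in a cookie value, corrupting it while keeping
// its length and overall shape intact.
function flipRandomBit(value) {
  if (value.length === 0) return value;
  const bytes = Buffer.from(value, "utf8");
  const byteIndex = Math.floor(Math.random() * bytes.length);
  bytes[byteIndex] ^= 1 << Math.floor(Math.random() * 8); // flip a random bit
  return bytes.toString("utf8");
}
```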


> This helps get past websites that evilly require access to these permissions to function.

Wow, what website is that evil?


I'm not sure, but for a while WeChat (the app) would not let you log in if you did not give it location permissions. I think they've changed that, but they still unscrupulously scan your Wi-Fi networks from time to time.

I haven't yet seen a website do that, but they conceivably could.


Lots of media presences that either clearly didn't get the memo on "how to implement GDPR correctly" or are still determined to sell out their users, for one.

Technically speaking, at least under German jurisdiction, the only way to be fully in compliance with the GDPR is to not load a single library or other asset from any third-party server without the user's explicit consent (the point being that the third party otherwise receives the datum of the user's visit timestamp, their IP address, and the referrer). Anything else opens you up to issues.


I don't personally believe anyone needs to actually implement GDPR unless they have a company registered in the EU.

I do believe in privacy and the general intent of GDPR, but EU laws shouldn't take effect outside their jurisdiction any more than Iranian laws or Chinese laws.


Would love to use this for ArchiveBox so we can get smaller Docker image sizes while still including Chromium headless.

Is there a docker POC anywhere I can check out?

https://github.com/cxreet/chromium-debloating seems empty at the moment.


Isn't the image size mostly independent of Chrome? Docker images are gigabytes, right? But Chrome is only ~60 MB.


Images can be quite small. Debian slim image is 25.9 MB compressed. Alpine image is 2.7 MB compressed. Meanwhile, Selenium’s standalone Chromium image is 348.1 MB compressed.


> Meanwhile, Selenium’s standalone Chromium image is 348.1 MB compressed.

Selenium implies you have quite a bit of a developer stack, runtimes and SDKs installed too. It would be unfair to attribute all that usage to Chrome alone.


Not aware of the Docker image in question, but a Docker image starts empty (0 bytes). A full operating system base (Ubuntu 20.04) is 75 MB.

You can probably slim that down a bit if needed, and installing your software on top might only take a few MB. So cutting Chrome down from 60 MB could well be a good goal.


Unfortunately our image is ~450 MB right now because we have to install both Python and JS, Chrome, fonts, video codecs, and a bunch of other software for all the extractors.

Most of it is from this killer 266MB line, but unfortunately we need all these things:

    apt-get update -qq && \
    apt-get install -qq -y --no-install-recommends \
        wget curl chromium git ffmpeg youtube-dl \
        fontconfig fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-symbola fonts-noto fonts-freefont-ttf && \  
    rm -rf /var/lib/apt/lists/*

https://hub.docker.com/layers/nikisweeting/archivebox/latest...

Any 10MB here or there will help, whether it's from Chrome core or something else. If you have any suggestions I'm all ears!


Yeah, the fonts take a lot. Why do you need to slim the images down? Do you really need the fonts present to archive the content? I thought you were saving the HTML; wouldn't these fonts just be needed to render?


The fonts are for PDF and screenshot rendering.


Have you considered doing the install at usage time? If you're trying to save on bandwidth of downloaded images....


Doesn't help, they're used almost immediately.



I've only just had time to glance at this paper and I've already found it very informative. There are types of web operations that I don't have much need for, so I've long advocated that it would be good if there were ways of either rendering them inoperative or disabling certain aspects of their operation, for the purposes of either improving user privacy/security or improving the browser's performance.

In particular, the JavaScript engine could be modified in ways that allow it to be tailored to the user's requirements, such as improving the speed of page rendering by having it ignore the huge amounts of dross/bloat now found in a large percentage of web pages. The same would apply to user privacy: JS could be tweaked to either ignore certain requests for privacy-sapping info or, where necessary, spoof them with quasi-random junk ('quasi-random' here refers to data that's inaccurate but whose format and structure still satisfies any validity checks the server may perform). The same philosophy applies to other aspects of browser operation that I won't bother to cover here.

To do this clearly requires a detailed understanding of the browser's complex operations and that's not easy for someone—even a good programmer—who only approaches a browser's internal functions as a means to an end to quickly change some facet of its operation.

It seems, from what I've read so far, that this paper could initiate new, more detailed work and/or lead to the development of tools that would make browser tailoring much easier and simpler.


So what happens if a removed function is called? Or can you guarantee that won't happen?


They replace it with some illegal instruction, so the process crashes. With a lot of logic running in per-origin processes, this may only bring down one tab instead of the whole browser.

Edit: Or maybe the error handling can avoid killing a process? This is what the paper says, but I feel like a child process would almost certainly be killed:

> Code elimination is trivial because we nullify unused code with illegal instructions based on known binary function boundaries. Once the instructions triggers a Chromium’s error handling routine that catches an exception, an error page shows an “Aw, Snap!” message by default instead of crashing a whole Chromium process. (section 5, p467)


Chromium displays an "Aw Snap!" message when a renderer process dies.


For some, sure: if an API implements "new foo(), new foo(string), new foo(string, int), and new foo(string, int, int)" but the code that uses the library only uses "new foo()" and sets the other properties after creating the object, then it would eliminate the extra constructors. Figuring out what stuff gets called is really hard in something as big as Chromium, where there are so many components that in some spots have their own domain-specific language to link stuff together.

For others, no, they do allow the minifier to break some sites: they analyze the Alexa top 1000 and check which functions get called and how often by those sites.


Were they deliberately going for a connotation of "slimy"? If not they might want to add an 'm' to make it "Slimmium".


English is not nearly a consistent enough language to make the argument that “slimium” can’t be spelled as-is and still be pronounced like “slim” — and personally I think “slimmium” just looks weird.


The two-consonants-means-preceding-vowel-is-short rule is pretty fundamental though.


I've uploaded this PDF to archive.org [0] because I find it preposterous that acm.org _requires_ cookies despite them being blocked client-side.

[0] https://archive.org/details/3372297.3417866


Aside from reducing the exploitable surface, what is the point of debloating Chrome?

Code only gets pulled off disk into memory, and out of memory into the cache, if it actually gets called, yes? So the initialisation code for something like WebUSB would get called as the browser wakes up, but then never called again, and the majority of it would never even make it off the disk. Are there really gains to be made here?


Probably disk space. Saving WebUSB once isn't a big deal, but if you have 30 copies of Chromium on your computer it could add up.


Even SSDs have appreciable latency. That means you typically read far more code into RAM than you need to run.

Having said that, the executable size in RAM is tiny compared to the runtime ram usage.


Needs a "(PDF)", dang.


I understand that you're referring to the username, not the expletive... but who is still browsing HN on a machine where it's a problem when you click a link and it's unexpectedly a PDF? That mattered several years ago, but I expect most users probably have embedded PDF viewers in their browsers today.


Honestly? Because I'm set up to download PDFs automatically, and the indicator that this has happened is too inconspicuous. So I often click a link twice or even three times before realizing it's a PDF, and then I have to clean up two or three files.


Every time I clean my downloads folder I find a few "(1)" strings appended to several duplicates I've downloaded by mistake. Mostly due to the UX issue you mentioned but I'm also mildly ADHD.

Kinda makes you wonder why browsers don't have a built-in (content-based) deduping feature. I'm sure some users actually desire this behavior now that they're used to it.

Can anyone recommend a Firefox extension that can dedupe and clean your downloads for you?


Why not just use a general deduping utility and unleash it on your Downloads folder? That will cover files that have been removed from the browser's downloads list, too.


It's still not an especially pleasant experience on mobile.


Every mobile phone? You click the link expecting an article, and your phone starts downloading a large PDF instead (4 MB, what the what?).

This needs a (pdf) in the title, because it's a PDF file, not a web page.


> your phone starts downloading a large PDF instead (4MB, what the what?).

I agree about the parent point of marking PDF as such but still feel compelled to point out that a 4 MB PDF is not significantly bigger than a lot of regular pages that are posted to HN.

At this very moment the highest-ranked link on the HN front page, which is just a regular web page, has a total network payload of 3,937 KiB according to Google PageSpeed.

I sympathize with what you are saying though in general, and I think it is sad that so much mobile bandwidth is needed in order to read stuff.

The way I personally get around this is that when I am on mobile I mostly restrict myself to reading HN comments instead of clicking through to any of the featured links themselves. But depending on how you like to use your phone, I realize this might not be a feasible habit to switch to.


Except a lot of regular pages are just pages, so it is trivial to read them at ~20 kB, because you obviously run your browser with extensions that block the loading of the vast majority of resources on the page until you whitelist them. A 4 MB PDF is a single 4 MB file. A 4 MB web page is more like 100 separate files, most of which can be entirely forbidden from loading at all, and many of which (like images) can be set to load only once tapped on.


> you obviously run your browser with extensions that block the loading of the vast majority of resources on the page until you whitelist them

I think most people do not do this on mobile, even among the HN crowd. For example, on the iPhone that I use, Safari does not support extensions. So for most people on mobile, the site will download everything.


iPhones handle PDFs just fine?


Ok!


Materialistic is the best Android HN app I'm aware of, and it doesn't support PDFs unless I open the article outside the app.


The issue for me isn't viewing the pdf but getting rid of it afterwards.

Almost all of the storage used on my phone, other than media, is PDFs, but I can't split them up like I would on a PC, e.g. I want to keep the textbooks and datasheets but I don't need every paper I've ever read.


What debloating tools are there for the Android NDK? For any code that ends up dexed, R8 takes care of it, so it's not terrible if I add a library dependency with a lot of code I don't need. I haven't found anything similar for native libraries.


One thing I've done in a project before was statically linking all of my native library's dependencies into the library itself. Everything would be compiled with `-ffunction-sections -fdata-sections` and the final shared library linked with `-Wl,--gc-sections`. I'd call `strip` on the `.so` as well.

This worked well because the JVM code only ever called the native API I exposed and did not need to access the dependencies' APIs.


I'm specifically looking for what R8 calls "tree-shaking."


There is a tremendous need for a minimalistic Chromium project that would coexist well with the Linux tiled window manager ecosystem, with downloads passed outside the browser, favorites stored in some sort of JSON, and support for native WM tabbing.


"tremendous" might be overstating the market of all 8 suckless surf users


There's only 8 of them but they need it real bad.


What about qutebrowser? It's based on QtWebEngine (a sort of ungoogled Chromium), has a minimal UI, is extremely customisable, and can be extended with userscripts (nothing to do with JS). You can do stuff like editing textareas in your editor by spawning a subprocess, opening a video with mpv, or piping something to dmenu.


It's called qutebrowser. It is awesome.


Does it still have google's tracking after this debloating ?


Is this UnGoogled or not ?


No. They try to reduce the features in Chromium (less code, fewer bugs), while ungoogled-chromium just patches out everything related to Google.


I interpreted the question as, "is this version also ungoogled or does it still have Chrome's tracking features?"


This is definitely (pdf)


Please add [PDF] to the title of this post.


We need to stop browser development besides security patches. No new features and start taking features off.


and world peace

and a pony



