That reminds me of .kkrieger, a 96KB FPS developed by the demoscene group Farbrausch, which was aggressively subsetted in the same fashion at the very last minute [1]. This approach worked very well... until it didn't. It is interesting to see history repeating itself---with the same caveat!
I'd actually be interested in a Chromium browser that instead of removing HTML5 functionality, replaces it with simulated fake data.
For example, allow access to the geolocation API, but give it a fake location. Allow access to third-party cookies, but shove garbage back at the server. Allow access to the camera and microphone but play a specified pre-recorded video back for both.
This helps get past websites that evilly require access to these permissions to function.
It would also help in the case of, e.g., store websites that want your location permission to show you the closest store. I'm okay with giving them city-level accuracy (I'd be entering the city name anyway in a manual search), but I don't want my exact coordinates going to them.
Actually, I spoof the location to say you're at Area 51 in my remote isolated browser. It's done with the Chrome DevTools Protocol, which is very simple to use, via the method `Emulation.setGeolocationOverride` [1].
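For anyone curious, the override itself is just one CDP message. A minimal Python sketch that builds that message (the coordinates and the websocket endpoint are illustrative; actually delivering it requires a browser started with `--remote-debugging-port` and a websocket client, which isn't shown here):

```python
import json

def geolocation_override(msg_id, latitude, longitude, accuracy=100.0):
    """Build the CDP message that spoofs the browser's geolocation."""
    return json.dumps({
        "id": msg_id,
        "method": "Emulation.setGeolocationOverride",
        "params": {
            "latitude": latitude,
            "longitude": longitude,
            "accuracy": accuracy,
        },
    })

# Roughly Area 51; send this over the page's DevTools websocket
# (e.g. ws://localhost:9222/devtools/page/<id>) to apply the override.
msg = geolocation_override(1, 37.235, -115.811)
```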
This is doable right now with some effort, which I kind of like because it keeps the cat-and-mouse game paused for a while.
The moment someone makes a one-click browser that lets the average person spoof all these things, then sites are going to start building captcha-style verification mechanisms like having to "wink three times" in your video to get it to proceed.
But someone would still have to program that SnapCam filter specifically for each captcha, and that process is not widespread and easy yet, so it will pause for a few more years hopefully.
Hmm, maybe I could allow third-party cookies in Chromium but use injected JavaScript to randomly flip some bits in all third-party cookies, and in first-party cookies for sites I haven't explicitly whitelisted.
I also wonder how many database backends this will trip up due to invalid IDs, though that's not really my intention.
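The flipping itself is trivial; the injection plumbing is the hard part. A hedged Python sketch of just the bit-flip transform (the real thing would live in injected JS, and `flip_cookie_bits` is a name I made up):

```python
import random

def flip_cookie_bits(value, n_bits=1, seed=None):
    """Return a copy of a cookie value with n random bits flipped.

    Length and overall shape are preserved, so the result still looks
    like a plausible cookie to the server.
    """
    rng = random.Random(seed)
    data = bytearray(value, "latin-1")  # 1:1 byte mapping, no encoding surprises
    for _ in range(n_bits):
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)
    return data.decode("latin-1")

original = "session=abc123deadbeef"
corrupted = flip_cookie_bits(original, seed=42)
```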
I'm not sure, but for a while WeChat (the app) would not let you log in if you did not give it location permissions. I think they've changed that, but they still unscrupulously scan your Wi-Fi networks from time to time.
I haven't yet seen a website do that, but they conceivably could.
Lots of media presences that either clearly didn't get the memo on "how to implement GDPR correctly" or are still determined to sell out their users, for one.
Technically speaking, at least under German jurisdiction, the only way to be completely in compliance with GDPR is to not load a single library or other asset from any third-party server without the user's explicit consent (the point being that the third party receives the user's visit timestamp, IP address, and referrer). Anything else opens you up to issues.
I don't personally believe anyone needs to actually implement GDPR unless they have a company registered in the EU.
I do believe in privacy and the general intent of GDPR, but EU laws shouldn't take effect outside their jurisdiction any more than Iranian laws or Chinese laws.
> Meanwhile, Selenium’s standalone Chromium image is 348.1 MB compressed.
Selenium implies you have quite a bit of a developer stack installed too: runtimes, SDKs, and so on. It would be unfair to attribute all that usage to Chrome alone.
I'm not aware of the Docker image in question, but a Docker image starts empty (0 bytes). A full operating system base (Ubuntu 20.04) is 75 MB.
You can probably slim that down a bit if you need to, and installing your software on top might only take a few MB. So cutting Chrome down from 60 MB could well be a good goal.
Unfortunately our image is ~450MB right now because we have to install both Python and JS, Chrome, fonts, video codecs, and a bunch of other software for all the extractors.
Most of it is from this killer 266MB line, but unfortunately we need all these things:
Yeah, the fonts take a lot. Why do you need to slim the images down? Do you really need the fonts present to archive the content? I thought you were saving the HTML; wouldn't the fonts only be needed to render it?
I've only just had time to glance at this paper and I've already found it to be very informative. There are types of web operations that I've not much need for, so I've long advocated that it would be good if there were ways of either rendering them inoperative or deprecating certain aspects of their operation, for the purposes of improving user privacy/security or the browser's performance.
In particular, the JavaScript engine could be modified in ways that would allow it to be tailored to the user's requirements, such as improving the speed of page rendering by having it ignore the huge amounts of dross/bloat that's now found in a large percentage of web pages. The same would apply to user privacy: JS could be tweaked to either ignore certain requests for privacy-sapping info or, where necessary, spoof them with quasi-random junk ('quasi-random' here meaning data that's inaccurate but whose format and structure still satisfy any validity checks the server may perform). The same philosophy applies to other aspects of browser operation that I won't bother to cover here.
To do this clearly requires a detailed understanding of the browser's complex operations and that's not easy for someone—even a good programmer—who only approaches a browser's internal functions as a means to an end to quickly change some facet of its operation.
It seems from what I've read so far that this paper could initiate new, more detailed work and/or lead to the development of tools that would make browser tailoring much easier and simpler.
They replace it with some illegal instruction, so the process crashes. With a lot of logic running in per-origin processes, this may only bring down one tab instead of the whole browser.
Edit: Or maybe the error handling can avoid killing a process? This is what the paper says, but I feel like a child process would almost certainly be killed:
> Code elimination is trivial because we nullify unused code with illegal instructions based on known binary function boundaries. Once the instructions triggers a Chromium’s error handling routine that catches an exception, an error page shows an “Aw, Snap!” message by default instead of crashing a whole Chromium process. (section 5, p467)
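As a toy illustration of the nullification step (my own sketch, not the paper's code): on x86-64, `ud2` (bytes `0F 0B`) is a guaranteed-illegal instruction, so filling an unused function's byte range with it makes any stray call into it fault immediately:

```python
UD2 = b"\x0f\x0b"  # x86-64 "undefined instruction": always raises #UD

def nullify(binary, start, end):
    """Overwrite the byte range [start, end) with repeated ud2 opcodes."""
    length = end - start
    binary[start:end] = (UD2 * (length // 2 + 1))[:length]

# Pretend bytes 4..10 of this buffer are an unused function's body.
code = bytearray(b"\x90" * 16)  # NOPs standing in for real machine code
nullify(code, 4, 10)
```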
For some, sure, if an API implements "new foo(), new foo(string), new foo(string, int), and new foo(string, int, int)" but the code that uses the library only uses "new foo()" and sets the other properties after creating the object then it would eliminate the extra constructors. Figuring out what stuff gets called is really hard in something as big as Chromium, where there are so many components that in some spots have their own domain-specific language to link stuff together.
For others, no, they do allow the minifier to break some sites: they analyze the Alexa top 1000 and check which functions get called and how often by those sites.
English is not nearly a consistent enough language to make the argument that “slimium” can’t be spelled as-is and still be pronounced like “slim” — and personally I think “slimmium” just looks weird.
Other than reducing the exploitable surface, what is the point of debloating Chrome?
Code only gets pulled off disk into memory, and from memory into the cache, if it actually gets called, yes? So the initialisation code for something like WebUSB would get called as the browser wakes up but never again, and the majority of it would never even make it off the disk. Are there really gains to be made here?
I understand that you're referring to the username, not the expletive... but who is still browsing HN on a machine where it's a problem when you click a link and it's unexpectedly a PDF? That mattered several years ago, but I expect most users probably have embedded PDF viewers in their browsers today.
Honestly? Because I'm set up to download PDFs automatically, and the indicator that this has happened is too inconspicuous. So I often click a link twice or even three times before realizing it's a PDF, and then I have to clean up two or three files.
Every time I clean my downloads folder I find a few "(1)" strings appended to several duplicates I've downloaded by mistake. Mostly due to the UX issue you mentioned but I'm also mildly ADHD.
Kinda makes you wonder why browsers don't have a built in (content based) deduping feature. I'm sure some users actually desire this behavior now that they're used to it.
Can anyone recommend a Firefox extension that can dedupe and clean your downloads for you?
Why not just use a general deduping utility and unleash it on your Downloads folder? That will cover files that have been removed from the browser's downloads list, too.
> your phone starts downloading a large PDF instead (4MB, what the what?).
I agree about the parent point of marking PDF as such but still feel compelled to point out that a 4 MB PDF is not significantly bigger than a lot of regular pages that are posted to HN.
At this very moment, the highest-ranked link on the HN front page, which is just a regular web page, has a total network payload of 3,937 KiB according to Google PageSpeed.
I sympathize with what you are saying though in general, and I think it is sad that so much mobile bandwidth is needed in order to read stuff.
The way I personally get around this is that when I am on mobile I mostly restrict myself to reading HN comments instead of clicking through on any of the featured links themselves. But depending on how you like to use your phone, I realize that habit switch might not be feasible.
Except a lot of regular pages are just pages, so it is trivial to read them at 20 kB, because you obviously run your browser with extensions that block the loading of the vast majority of resources on the page until you whitelist them. A 4 MB PDF is a single 4 MB file. A 4 MB webpage is more like 100 separate files, most of which can be forbidden from loading at all, and many of which can be set to load only once tapped on (like images).
> you obviously run your browser with extensions that block the loading of the vast majority of resources on the page until you whitelist them
I think most people do not do this on mobile, even among the HN crowd. For example, on the iPhone that I use, Safari does not support extensions. So for most people on mobile the site will download everything.
The issue for me isn't viewing the pdf but getting rid of it afterwards.
Almost all of my phone's storage, other than media, is taken up by PDFs, but I can't split them up like I would on a PC; e.g. I want to keep the textbooks and datasheets, but I don't need every paper I've ever read.
What debloating tools are there for the Android NDK? For any code that ends up dexed, R8 takes care of it, so it's not terrible if I add a library dependency with a lot of code I don't need. I haven't found anything similar for native libraries.
One thing I've done in a project before was statically linking all of my native library's dependencies into the library itself. Everything would be compiled with `-ffunction-sections -fdata-sections` and the final shared library linked with `-Wl,--gc-sections`. I'd call `strip` on the `.so` as well.
This worked well because the JVM code only ever called the native API I exposed and did not need to access the dependencies' APIs.
There is a tremendous need for a minimalistic Chromium project that would coexist well with the Linux tiled window manager ecosystem.
It would hand downloads off outside the browser, store favorites in some sort of JSON, and support native WM tabbing.
What about qutebrowser? It's based on QtWebEngine (a sort of ungoogled Chromium), has a minimal UI, is extremely customisable, and can be extended with userscripts (nothing to do with JS). You can do things like editing textareas in your editor by spawning a subprocess, opening a video with mpv, or piping something to dmenu.
[1] https://fgiesen.wordpress.com/2012/04/08/metaprogramming-for...