
While I get that they're sometimes useful to trigger debate, I don't really subscribe to very bold statements.

We are drowning in data; it's all around us. Information overload is real. Data enables most of our daily digital experiences, from operational data to insights in the form of user-facing analytics. Data systems are the backbone of digital life.

It's an ocean, and it's all about the vessel you pick to navigate it. I don't believe the vessel should dictate the size of the ocean; it's simply constrained by its capabilities. The trick is to pick the right vessel for the job, whether you want to go fast, go far, or fish for insights (ok, I need to stop pushing this metaphor).

This visionary paper from Michael Stonebraker (2005) predicted it quite accurately, and I think it's still relevant: https://cs.brown.edu/~ugur/fits_all.pdf

Databases come in various flavours, and the "trends" are simply a reflection of what the current era needs.

Disclaimer: I work at ClickHouse



100% agree. One of the biggest assets we had at <driver and rider marketplace app> was the data we collected. We built models on it that would determine how markets were run and whether drivers and passengers were safe. These were key features that enabled us to bring a quality service to customers (over ye ol' taxi). The same applied to the autonomous cars, bikes, and scooters. We used data to improve placement of vehicles to help us anticipate and meet demand. It was insane how much data we used to build these models.

To say big data is dead sounds to me like someone desperate for eyeballs.

I do think there is a huge opportunity for DuckDB - running analytics on 'not quite big data' is a market that has always existed and is arguably growing. I've seen way too many people trying to use Postgres for analyzing 10-billion-row tables, and people booting up an EMR cluster to hit the same 10 billion rows. There is a huge sweet spot for DuckDB here, where you can grab a slice of the data you are interested in, take it home, and slice and dice it as you please on your local computer. I did this just this weekend on DuckDB _and_ ClickHouse!

Disclaimer: I work at a company that is entirely based on ClickHouse.


Didn't know that Posthog is based on CH these days. Interesting!


Check the list of companies using ClickHouse: https://clickhouse.com/docs/en/introduction/adopters/


Really neat that you scour job postings to learn useful intelligence about companies using your product. I do this too :)

I'm curious how you have this set up. Is it currently a manual process or you use social monitoring tools to help you find mentions of ClickHouse in the wild?



Thanks for the reply :-) but your link is only for tracking mentions on the HN website.

I was asking about how they are able to track mentions, across the web, of companies using ClickHouse. This type of info is usually listed in the tech stack section of job descriptions (and these links tend to expire once the position is filled).


I guess the article title is a "bold statement", but maybe the biggest insight in there is that people don't think hard enough about throwing old data away, and it hurts them. This is a life raft for drowning in data, and it's more "bold" organizationally: it actually takes a certain kind of courage to realize you should just throw stuff away instead of succumbing to the false comfort of "hey, you never know when you might need it".

Weirdly, there's a similar thing that can happen to codebases, specifically unit tests and test fixtures that outlive all of their original programmers: nobody understands what's actually being tested, and before each release the team loses days or weeks hammering away to "fix the tests". The only solution is to throw them away, but good luck getting most teams to ever do that, because of the false comfort they get -- even though that fixture is now just testing itself and not protecting you from any actual bugs.

I mean, how often does Netflix need to look at viewing habits from 2015? Summarize and throw it away.
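"Summarize and throw it away" could be as simple as rolling raw per-event rows up into a compact aggregate table and then deleting the raw rows outside the retention window. The sketch below uses the standard library's sqlite3 for portability; all table and column names are invented for illustration.

```python
# Hypothetical sketch of "summarize, then throw it away".
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE views (user_id INT, title TEXT, year INT)")
con.executemany(
    "INSERT INTO views VALUES (?, ?, ?)",
    [(1, "ShowA", 2015), (2, "ShowA", 2015), (1, "ShowB", 2023)],
)

# 1. Summarize: keep only per-title counts for the old period.
con.execute("""
    CREATE TABLE views_summary AS
    SELECT title, year, COUNT(*) AS views
    FROM views
    WHERE year < 2020
    GROUP BY title, year
""")

# 2. Throw away: delete the raw rows the summary now replaces.
con.execute("DELETE FROM views WHERE year < 2020")

summary = con.execute("SELECT title, year, views FROM views_summary").fetchall()
remaining = con.execute("SELECT COUNT(*) FROM views").fetchone()[0]
print(summary, remaining)
```

The trade-off the downthread reply points out is real, though: the GROUP BY bakes in how the data will be queried later, and anything outside that shape is gone for good.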


I am baffled by this comment.

Throwing out unit tests? If you make a change and it fails a test, then you fix the bug or fix the test. I can't even imagine in what universe it's a good idea to throw away a test if it covers code in use. In what universe are unit tests "false comfort"? And if "nobody understands what's actually being tested" then you've got huge problems with your development practices.

Similarly, viewing habits from 2015 are tremendously important. There may be a show they're releasing soon that is most similar to a title released in 2015, and those stats will provide the best model. "Summarize" requires knowing how data will be used in the future, but will likely throw away what you need. Not to mention how useful and profitable vast quantities of data are for ML training.

Storing data is incredibly cheap. I'm actually curious where this desire to throw away old data comes from? I've literally never encountered it before, and it flies in the face of everything I've ever learned. The only context I know it from is data retention policies, but that's solely to limit legal liability.


Unit tests are only potentially of value if the code is changing. And 90% of code never changes, and 99% of unit tests never fail. Almost all of the value of unit tests comes at the time of writing (a tiny percentage of) them.

After that, they become a liability that slows down builds, makes changes brittle, and leaves codebases sclerotic.

A few good unit tests are a lot better than a bunch of bad ones. And even from your statement we can see a much more pernicious risk: the false beliefs that code coverage measures whether code is tested, and that a code coverage percentage is a mark of quality or safety in its own right.


The unit test story is indeed bizarre. Done right, unit tests test the unit, and you'll never hit these problems.

The villains here were monstrous test fixtures instead of mocks: "testing the fixture" instead of testing the code. Both were agency trading systems, "platforms" of a sort, that needed significant refactoring to mock properly, so instead the tests had to inject essentially fake concrete services.

Somehow I joined teams twice in my career that were trapped under this as their only coverage (and both indeed had "huge problems with their development practices"). The only way out is to write all-new unit tests.


Mocks are the personification of bad data. The only meaningful measurement derived from tests with mocks is how bad the architecture is.


I don't know what you're criticizing here. I was contrasting "mocks" and "fixtures" in the context of unit tests as ways to instrument services depended on by the code under test.

A "mock" in this paradigm is some kind of testing technology that allows you to directly instrument return values for function calls on the dependent service, whereas a "fixture" is some concrete test-only thing you coded up to use in your tests.

If a fixture just acts as a dummy return-value provider, no problem (but you probably should have used a mocking solution). The problem that arises is fixture code that simulates some or all of the production service code, and/or (even worse) allowing modification of production code to allow use as a test fixture. This is the way to madness.
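The mock-vs-fixture distinction described above can be sketched with the standard library's unittest.mock. `quote()` and `last_price()` are invented names for illustration; the point is that the mock directly instruments the dependency's return value, so no concrete fake service ever has to be written or maintained.

```python
# Minimal sketch: instrumenting a dependency with a mock instead of
# coding up a concrete test fixture. Names are hypothetical.
from unittest.mock import Mock

def quote(pricing_service, symbol):
    # Code under test: depends on an external pricing service.
    price = pricing_service.last_price(symbol)
    return round(price * 1.01, 2)  # add a 1% markup

# The mock stands in for the service: set the return value directly.
service = Mock()
service.last_price.return_value = 100.0

assert quote(service, "ACME") == 101.0
service.last_price.assert_called_once_with("ACME")
```

A hand-written fixture class doing the same job would have to be kept in sync with the real service's interface forever, which is exactly how the "testing the fixture" trap described above gets started.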


The heading is definitely “clickbait-ey” but the quality of the content was worth it. I probably would have missed the article without the headline. And I am already applying the insights gained.



