TIL: Sep 2025

ruX

6 months ago

Finally got an invitation for Perplexity Comet but

Comet doesn't fly on Linux, apparently. No matter how much I pay.

Big Brother is watching you in UK

Things are getting more ridiculous:

I've seen what happens next. While a significant part of the population seems to like this change "because it's meant to protect children" (yeah, some people really do enjoy having fewer parental responsibilities), I'm genuinely happy that lots of people are opposing this and correctly comparing the "Online Safety Act" to a move towards totalitarianism - even if it comes from the good intention of protecting children.

As a father and technologist, I do see one solution that has been around in the crypto world for some time - a solution that actually helps protect children without exposing privacy: zero-knowledge-based identity. The exact reason I oppose this law is that zk-based age checks weren't suggested as the preferred implementation, let alone made a requirement. It just screams how little politicians understand technology risks - and how little they care about identity theft (~$27B loss in 2024 alone).
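To make the idea concrete, here is a toy sketch of an age check that reveals "18 or older" without revealing the age itself. It is not a real zero-knowledge proof system (no SNARKs, no issuer signature, and the proof is replayable and linkable); it is a simple hash-chain commitment that I'm using only to illustrate the selective-disclosure principle. All function names (`issue_credential`, `make_proof`, `verify`) are hypothetical, and I'm assuming a trusted issuer that knows the real age and publishes the commitment.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def hash_n(value: bytes, n: int) -> bytes:
    """Apply SHA-256 to `value` n times (n = 0 returns it unchanged)."""
    for _ in range(n):
        value = hashlib.sha256(value).digest()
    return value


def issue_credential(age: int) -> Tuple[bytes, bytes]:
    """Hypothetical issuer step: returns (secret seed, public commitment).

    The commitment is the seed hashed `age` times. In a real system the
    issuer would also sign the commitment; that part is omitted here.
    """
    seed = secrets.token_bytes(32)
    return seed, hash_n(seed, age)


def make_proof(seed: bytes, age: int, threshold: int) -> Optional[bytes]:
    """Holder step: produce a proof that age >= threshold, or None."""
    if age < threshold:
        return None  # no valid proof exists without inverting SHA-256
    return hash_n(seed, age - threshold)


def verify(proof: bytes, commitment: bytes, threshold: int) -> bool:
    """Verifier step: hashing the proof `threshold` times must hit the commitment."""
    return hash_n(proof, threshold) == commitment


if __name__ == "__main__":
    seed, commitment = issue_credential(age=34)
    proof = make_proof(seed, age=34, threshold=18)
    print(verify(proof, commitment, threshold=18))
```

The verifier only sees a random-looking hash and the commitment, so it learns that the threshold is met but not the actual age. A production scheme would need unlinkability and replay protection, which is exactly what proper zk-based identity systems provide.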

Opus 4.1 is still being "optimised for costs"

That's a new level of success theatre. I saw something similar with Llama 3.1 8B running locally - but hey, it's not a $100/mo subscription with tight limits.

Really fed up with these quality inconsistencies.

Getting targeted for a crypto phishing scam

They pretend to be Gitcoin, and of course the links point to a GitHub-like website.

Numerai meetup in Vienna

Great to see people in the industry working at the intersection of AI & Crypto. Thanks Jo for organising a great event in wonderful Vienna!

GPT-5 & OpenAI Codex

At first, the release of GPT-5 was a disaster — a lot of memes emerged comparing it to 4o.
However, over time the quality (reasoning, logic, structured thinking, self-doubt/"critical thinking") has improved and now matches, and may even exceed, o3. But in my experience it's still susceptible to self-convincing: once it "believes" in something, it's difficult to steer it away.

In contrast, the quality of Sonnet 4.0 and Opus 4.1 has degraded. Apparently I'm not the only one who noticed the decline, and in my experience, two months after launch Opus is performing somewhere around Sonnet 4.0 level. GPT-5 matches it overall and certainly beats it in logical/analytical/math thinking.

So I cancelled my Claude subscription and gave Codex a whirl. Pretty impressive: an omni model that works everywhere, whether CLI, IDE, or web (browser PR creator). Codex Web deserves attention specifically. I was very sceptical about the PR + no edits (only suggestions) workflow, but it works exceptionally well.

1) Having a single task/PR reduces the scope of work and focuses the model on the specific feature, not "create the whole system in one go"
2) The "human" mental model of task creation implies clear goal definition, often covered by tests as good hygiene
3) Codex Web produces multiple versions, and seeing things done differently helps you choose the best solution. It's what you'd probably do in Cursor by iterating on the same prompt until the model gets it right
4) (minor benefit) for $200 it seems to be unlimited, or nearly unlimited, for now