l1k 10 hours ago [-]
Fun fact (or not so fun if you're a subscriber):
Somebody is spamming kernel mailing lists under the name Marian Corcodel with a 26 MByte message multiple times per day containing a collection of nonsensical patches. Looks AI-generated, perhaps with the intention to poison LLMs. This has been going on for a few days now.
I'd warn HN users not to click on that link simply because it will load a 26 MB message, which will likely put quite a strain on kernel.org's servers if everyone here does it.
sillysaurusx 8 hours ago [-]
I was curious how much of an impact HN could have. Napkin math:
HN gets 24M views a day. Assume those views are evenly distributed across the front page (they aren’t), and that’s about 1M views for each front page post, assuming each user clicks on one post.
By the rule of 10s (also not exact), there are 10x fewer views on comment threads. So assume around 100k views on a comment thread as a theoretical average.
If everyone in this thread clicked on the link, that would be 2.6 TB of transfer across the day. But by the rule of 10’s we have to assume 10x fewer people will interact (upvote, click, anything) than view. So we’re down to 260GB transfer over the course of a day.
I wonder how close that is. It seems plausible that a link in the top comment of a thread could garner 10,000 clicks.
That’s still about one click every 8 seconds, which at 10Mbit/s would indeed overwhelm the server by a factor of about 2.5x. But I clicked through and it loaded in just a few seconds, so presumably the pipe is faster than 10Mbit/s.
Another caveat is that many websites are already several megabytes, so it seems strange that 26 MB would be the breaking point for a reasonable web host.
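For anyone who wants to check the napkin math above, here is a minimal Python sketch. Every input is the commenter's guess rather than measured data, and FRONT_PAGE_POSTS = 24 is an assumption implied by the "about 1M views per post" figure, not a real HN statistic.

```python
# Napkin math from the comment above. All inputs are guesses, not measurements.
DAILY_VIEWS = 24_000_000   # commenter's figure for HN views per day
FRONT_PAGE_POSTS = 24      # assumption implied by "about 1M views per post"
MESSAGE_MB = 26            # size of the spam message
LINK_MBIT_PER_S = 10       # hypothetical server uplink used in the comment
SECONDS_PER_DAY = 86_400

views_per_post = DAILY_VIEWS / FRONT_PAGE_POSTS   # views of one front-page post
thread_views = views_per_post / 10                # rule of 10s: comment thread views
clicks = thread_views / 10                        # rule of 10s again: people who click

transfer_gb = clicks * MESSAGE_MB / 1000          # total transfer per day, in GB
seconds_between_clicks = SECONDS_PER_DAY / clicks
required_mbit_per_s = clicks * MESSAGE_MB * 8 / SECONDS_PER_DAY
overload_factor = required_mbit_per_s / LINK_MBIT_PER_S

print(f"clicks per day:        {clicks:,.0f}")
print(f"transfer per day:      {transfer_gb:,.0f} GB")
print(f"seconds between clicks: {seconds_between_clicks:.2f}")
print(f"sustained bandwidth:   {required_mbit_per_s:.1f} Mbit/s")
print(f"overload vs 10 Mbit/s: {overload_factor:.1f}x")
```

With these inputs the sustained rate comes out a little above 24 Mbit/s, which is where the "about 2.5x" overload figure for a 10 Mbit/s link comes from.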
devsda 6 hours ago [-]
Don't forget scrapers. Scrapers can be biased towards top posts and comments.
> There's no stats page but last I checked it was around 5M monthly unique users (depending on how you count them), perhaps 10M page views a day (including a guess at API traffic), and something like 1300 submissions (stories) and 13k comments a day.
> The most interesting number is the 1300 submissions because that hasn't grown since 2011 - it just fluctuates. Everything else has been growing more or less linearly for a long time, which is how we like it.
kraftman 8 hours ago [-]
Plenty of people deliberately posting to HN have their servers overwhelmed.
jedberg 3 hours ago [-]
It's mirrored by Akamai, which is designed to repeatedly serve the same thing over and over. It won't really hurt anyone.
jmalicki 9 hours ago [-]
Does a 26MB message actually cause noticeable strain on the server much beyond loading the page? I would think serving a contiguous 26MB chunk would be roughly comparable to serving, say, 20 normal-sized messages.
mort96 7 hours ago [-]
Way off. I went to an arbitrary message on lore.kernel.org. Firefox's network inspector says 7.37kB was transferred, including stylesheets. 26MB is roughly 3500x 7.37kB.
jmalicki 6 hours ago [-]
Data transferred is not what generates load. sendfile() is about the lowest-overhead thing a web server does.
I don't think needlessly straining the Internet Archive's servers is any better.
embedding-shape 8 hours ago [-]
IA's infra is slightly better for big loads though, they tend to just have higher latency rather than aborted/timed out requests, for better or worse. It can be a bit slow, but as long as you're ready to wait, you'll eventually get the response. Usually hosts just cut you off with a hardcoded timeout instead, which for people on high latency/low bandwidth connections can be super fun.
grosswait 9 hours ago [-]
Will clicking on this link download a 26MB message putting extra load on archive.org's servers?
neksn 6 hours ago [-]
The page is gzipped in transit - only 5 MB of traffic are generated.
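As a rough illustration of why a large, repetitive text message shrinks so much in transit, here is a small Python sketch using the standard gzip module. The sample payload is made up (a repeated diff-like snippet), so the ratio it prints says nothing about the real message; the thread's own figures (26 MB down to about 5 MB) would correspond to a ratio of roughly 0.19.

```python
import gzip


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Hypothetical stand-in for a message full of near-identical patches.
sample = (
    b"--- a/drivers/example.c\n"
    b"+++ b/drivers/example.c\n"
    b"@@ -1,3 +1,3 @@\n"
    b"-old line\n"
    b"+new line\n"
) * 10_000

print(f"original bytes: {len(sample)}")
print(f"gzip ratio:     {gzip_ratio(sample):.4f}")
```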
shevy-java 9 hours ago [-]
Thank you for the warning. I rarely click on links these days though; the only exception I make is for HN links to main articles.
embedding-shape 8 hours ago [-]
How do you navigate the web, everything is CTRL+L then manually type the address, or you have some fancier solution?
kelsey98765431 8 hours ago [-]
the web is useless outside of hn
embedding-shape 6 hours ago [-]
90% of it yeah, but the 10% is still worth it, like HN.
Phelinofist 7 hours ago [-]
> perhaps with the intention to poison LLMs
How does that work?
stefan_ 7 hours ago [-]
This is just nonsensical changes and slurs, but particularly degenerate input data can cause big issues in training:
Actual context: Linux 7.1-rc4 release, Linus remarked on a specific documentation change.
The Register somehow turned this into an "article" that says a lot less with roughly the same number of words, and provides "context" by linking to a number of unrelated articles.
The Register has always been a... weird 'news' source, but they've gotten significantly worse over the last year or two.
m463 7 minutes ago [-]
I think people are used to bias in mainstream media, but theregister seems to show it to computer/tech folks. I don't mind, I like opinionated stuff and can make up my own mind, but I wonder what folks with less context/experience think when they read some of their articles.
(and at least weird isn't like uncanny-valley-ai-written-weird)
Sweepi 11 hours ago [-]
"Torvalds' remarks contrast with recent comments from fellow kernel maintainer Greg Kroah-Hartman, who recently told The Register that AI has become an increasingly useful tool for the FOSS community."
Does it? Both points can be true at the same time.
m463 1 minutes ago [-]
yeah reading the original post linus seems to be pretty constructive in his words and open to AI in a common-sense way.
ses1984 11 hours ago [-]
Linus also said
“AI tools are great, but only if they actually help, rather than cause unnecessary pain and pointless make-believe work,” he wrote. “Feel free to use them, but use them in a way that is productive and makes for a better experience.”
So I think the closing remark from the register isn’t really appropriate given the context from the quotes they pulled.
dathinab 8 hours ago [-]
the problem here is that many of the submissions are not "make-believe work" but actual existing security issues
it's just that in the past people mostly didn't find the same security vulnerabilities independently of each other, en masse, without knowing about the others
worse, it's non-trivial to dedup on the submitter side or on the receiver side (as long as we stay with a classical mailing list format)
and while this might be fixable with an AI auto-grouping duplicates etc., getting that right is _hard_, especially if we consider that there can be a lot to gain for an adversary in using prompt injection and similar to effectively "hide" "useful" security issues (e.g. by wrongly causing them to be labeled as duplicates).
In addition to all the technical problems this causes some other problems: 1.) additional cost that someone can intentionally (maliciously) increase 2.) dependence on some LLM provider 3.) a trust problem wrt. the LLM provider used. Some of this can be avoided by running open models on sponsored, owned hardware, but at the cost of often outdated LLM tech, higher cost, now needing to maintain additional hardware etc.
pessimizer 7 hours ago [-]
> the problem here is that many of the submissions are not "make-believe work" but actual existing security issues
Not exactly, the submissions are reports about actual existing security issues. They are make-believe work because everybody has access to AI, and anybody could have done it. Deduping is not productive work, it's a search for productive work.
Instead of spamming bug reports generated by AI, people should spam cash or token credit of some sort so the project can generate these themselves. The real unnecessary part of the entire process is the submitter. There's no need for an AI middleman.
If somebody comes up with some witty trick that gets an AI to find a bug that it wouldn't have found on its own, submit the prompt.
mock-possum 3 hours ago [-]
So if a thing is good then it is good, but if a thing is bad then it is bad? Got it!
renegade-otter 9 hours ago [-]
I will argue that ON AVERAGE, humans are lazy, and will use LLMs to generate walls of text and code. We like the easy way out - just pop a pill. Here we have a technology that can finally help us manage the crippling firehose of data, and instead, we are going to make it much worse. As expected.
A few of us will actually use these tools to reduce toil and achieve something useful.
j16sdiz 9 hours ago [-]
Torvalds didn't say AI isn't useful. He is saying everybody uses AI to file the same duplicate bug report, causing extra churn.
orthoxerox 9 hours ago [-]
AI can amplify your intelligence just as easily as it can amplify your stupidity. All while telling you how smart and brilliant you are.
Incipient 8 hours ago [-]
[dead]
happytoexplain 11 hours ago [-]
I mean, they are two (of many) contrasting results of AI. The writer didn't say "contradict". But I agree they probably could have chosen better wording.
moezd 9 hours ago [-]
I think it's time for report-only intake to stop. If a reporter can't reproduce at least one use case or can't summarise it in two sentences, it should be classified as spam. LLMs write beautiful reports, it's just that sometimes they don't bear any resemblance to the truth.
nashadelic 9 hours ago [-]
couldn't an llm be used for verification like we're seeing some OSS projects do? Some projects are moving so fast, it's almost certain there's little human involvement.
Tempest1981 8 hours ago [-]
At my job, multiple people have vibe-coded bug-triage utilities. They're great for grouping duplicates.
But now we need an AI tool to consolidate the triage utilities.
trelbutate 9 hours ago [-]
Will never understand why some people prefer mailing lists to do development, it always feels like the most convoluted way to hold a discussion, especially if there are multiple topics at the same time.
It probably doesn't really change that much in this scenario but with a forum or any other topics-based platform you can at least just close and ignore these things without it affecting everyone else.
PurpleRamen 9 hours ago [-]
A good mailclient allows a skilled user a much more efficient communication than most forums.
> It probably doesn't really change that much in this scenario but with a forum or any other topics-based platform you can at least just close and ignore these things without it affecting everyone else.
True, external moderation is a benefit of centralized platforms, but a mailclient allows personalized moderation, which, with a well-organized list, lets you filter out anything you are not interested in. Usenet had the benefit of both: a centralized platform with moderation, and powerful clients for further personalization. Too bad it died for most usages.
SoKamil 8 hours ago [-]
Is there a demo of such communication on YouTube, or at least some article with screenshots?
"Will never understand why some people prefer mailing lists to do development ..."
The people who have this preference are processing the mailing lists with a highly specialized mailtool, not a web browser.
If you have only ever accessed email with a web browser it is not surprising that you find the mailing list format weird.
pixl97 9 hours ago [-]
Because it is an open and widely distributed system that is difficult to take down or otherwise have an extended outage.
toast0 6 hours ago [-]
Usenet is probably better, but to a rough approximation, nobody has access to a usenet feed anymore.
Mailing lists allow people to use threads if they want (assuming nobody trashes the thread headers by using terrible email software from Microsoft/Google), and also allow people to read from the firehose if they want. And there's plenty of threaded web views of mailing lists available for lurkers.
bhaak 7 hours ago [-]
Show me a forum or topics-based platform that handles threads as well as proper mail clients. Don’t mistake the poor HTML view for what managing threads with thousands of replies looks like.
Local filtering is the key to ignoring threads you are not interested in. Depending on the client with 2 or 3 keystrokes you are ignoring the whole thread or this particular sub branch of it and automatically jumping to the next interesting, unread message.
fhn 8 hours ago [-]
old people like the old tools that they grew up using
foresto 3 hours ago [-]
This could be read as reductive, presumptuous, ignorant, and insulting.
At the same time, it's often technically true, but for a good reason that you neglected to mention:
Those old tools tend to be very capable email clients, not web apps with their awkward attempts to simplify complex conversation structure. A good email client can handle large, high-traffic, frequently branching, long-lived threads with ease. All the web forums I've ever used fail miserably here.
The people who are tasked with participating in large scale discussion groups (like the LKML) know this through experience. They prefer email because it works better. It makes their lives easier. It helps them to be more efficient, which is absolutely necessary given the sheer volume of messaging that they handle.
Yes, a specialized tool is required to get these benefits, just as a specialized tool is required to make web server output easily readable. Thankfully, these tools have existed for decades.
vitally3643 7 hours ago [-]
This is the reason behind essentially every reply I've ever seen to this question.
"I like it this way because it's always been this way and once you change your entire email workflow and customize your email client, it's almost as good as PHPbb"
Forums are built for threads and are immediately visible and accessible for everyone, not just people who want to spend their limited time dicking with email clients.
Mailing lists are the proto-discord: knowledge locked away from the public behind a special frontend and elitist attitudes. It's only better because the list is technically visible, but only in the worst, most low-effort way possible. You dump a raw txt copy of the entire thread unstructured onto the user and make it their problem to figure out. After all, your email client makes it easy to read, so why should you care about what anyone else needs?
47282847 6 hours ago [-]
I understand it is out of fashion, but technologically more advanced are systems that use well defined interfaces and allow pieces to be exchanged easily. After all, we’re communicating over a number of open protocols here. A forum merges all elements into one silo. I prefer web over AOL/Compuserve. If you want a forum-like interface, there’s no technical reason why this couldn’t be done on top of a mailing list. In fact, Discourse and others attempted it.
This discussion has been happening since forever. And also the idea that it serves anyone to complain how others are obviously doing it wrong, without even attempting to understand why they’re doing it a certain way. And then be irritated when the response is negative, and labeling others as elitist for using and providing open platforms over decades and not silos.
If you don’t know, feel free to ask. And then suggest (or provide!) improvements that factor in current requirements and goals instead of dismissing them as stupid.
Life advice: if you want anybody to change what they do, you need to first understand why they’re doing it, and then offer suggestions based on that understanding that improve it with them. Otherwise you’re going to continue to recreate your own victim position, and an “elite” position that you will never belong to.
uecker 4 hours ago [-]
yes, exactly this.
BobaFloutist 4 hours ago [-]
I think it's unsurprising that the people maintaining the tangled plumbing of the clunky, customizable, fiercely independent operating system most popular with highly-opinionated power-users prefer the clunky, customizable, open format that's a little inconvenient for non power-users but allows them to set up their own personal bespoke client.
kobalsky 7 hours ago [-]
maybe the old tools are prevailing for a good reason.
I prefer people to email me because half of the time they figure out their problems while writing them.
it's not an absolute rule but people who don't do their homework gravitate towards calls and messaging because they just don't prepare their questions.
asynchronous communication puts the burden on the sender, where it belongs.
skydhash 5 hours ago [-]
It’s an open format and with basic tools, you can create a very good pipeline to consume the information. There’s no ceiling on the convenience and automation. Gmail and other webmail clients are not representative of a good email workflow.
vfclists 6 hours ago [-]
Because they don't have to keep switching from discussion client to discussion client or whatever tools they use for each project they are involved with, at the same time being distracted by adverts, geegaws, random emojis and other kinds of nonsense.
They are simply more efficient and more importantly censoring is done by the user themselves, not by politically motivated admins who ban discussions based on their ideologies and whims.
rnxrx 9 hours ago [-]
It seems like LLMs are actually pretty good at the sorts of things needed to manage a high-volume mailing list (summarizing, looking for dupes, sentiment, flagging things, etc), even if only as augmentation for human eyes.
That said, I get why this would rankle a lot of the folks involved.
rolandog 8 hours ago [-]
That's just a security/protection racket with extra steps: "Someone is paying us to hurt your business/site; pay us money to defend your site against our attacks".
olive-n 9 hours ago [-]
I like to imagine that LLMs' ability to optimize code is like an extension of the training loop in deep learning. The loss function is some kind of metric representing security and/or performance (or the lack of it) of the code, and we use the LLM as the gradient/diff generator to iterate in batches over the code and fine-tune it.
Imagine the current state being for the most part a collection of local maxima in security. To push the system into a better state, you either need skilled people and time to overcome the barrier to a new local maximum, or you throw AI at it and evaluate whether you land in a better state.
I think after some time of turbulent exploit/patch cycles we will reach a stable state again, where the code converges to a new local maximum that even with AI requires significant effort (time and tokens) to overcome. Or ideally a global maximum.
With time, the LLMs improve, so the diffs/gradients get better and we will be able to reach optimal points for any software faster.
My problem with the idea is that apparently it is assumed that OSS contributors and especially maintainers will generously donate their time to get this machinery into a state that makes the optimization loop work well - just for the AI labs to turn around and sell access to the optimized models for increasingly larger amounts of money.
AI generated code can be great. Hand rolled code can be bad. The rules are the same in both cases. Make sure your code changes are focused (no random changes just because you happen to be in the file/dir or notice something) and make sure you don't break anything else along the way.
oncallthrow 8 hours ago [-]
I think this will sort itself out over time, as people realise that it’s no longer impressive whatsoever to land an AI-assisted PR to the Linux kernel.
VLM 7 hours ago [-]
Make it anonymous and the problem will go away.
The problem is people trying to get individual credit for merely running a script that spams a mailing list. Many of those people are likely not even C programmers or programmers at all.
Without the immense personal reward and recognition and job offers as a motivation, the problem will disappear.
The problem will also disappear with time as the people lauding and celebrating and hiring security researchers of the past will quickly abandon LLM generated spam as a positive signal; running a prompt that sends spam is, if anything, a strong negative indicator of infosec ability and skill.
LLMs are a tool. Like all tools, most people can't or won't use them responsibly or profitably although they are useful in the correct hands.
thewebguyd 6 hours ago [-]
I really like this idea. Removes the fame, blog & resume/job hunting incentive from it.
The kernel isn't the only OSS project with this issue either. Requiring submissions & issues to be anonymous could help a ton of other open source projects currently drowning in AI slop issues.
NoSalt 9 hours ago [-]
So ... who, exactly, is AI supposed to be "helping"???
pavon 6 hours ago [-]
The bug reports are helpful! Many Linux developers including Linus, Greg Kroah-Hartman, Andrew Morton, Chris Mason, and Willy Tarreau have all commented positively on all the legitimate problems that are being found with LLM. Here is just one example article[1].
This is just a workflow issue. In the past it was very rare for multiple people to find and report a security vulnerability at once, so it made sense to keep the discussion private until they were ready to release a fix. With AI that is happening all the time, so it makes more sense for the discussion to be in public to avoid duplication. So they changed the policy accordingly. That is it.
The "security researchers" who post those bugs. Their goal being self promotion.
newswasboring 9 hours ago [-]
> Torvalds' remarks contrast with recent comments from fellow kernel maintainer Greg Kroah-Hartman, who recently told The Register that AI has become an increasingly useful tool for the FOSS community
That's kind of a misrepresentation. They are talking about two different things. Linus is trying to point out incorrect use of a tool while GKH is praising a correct use. This sentence felt weird at the end of the article, kind of like rage bait. And I took it :P.
stabbles 10 hours ago [-]
Isn't it mostly the medium that's problematic? With an issue tracker it's easier to close as duplicate
Aurornis 10 hours ago [-]
An open visibility tracker would be a goldmine for finding new exploits before a fix is even available.
From what I’ve seen many of the AI bug search operators are newer to security research. They’re burning their tokens trying to find kernel bugs as their claim to fame before other people with AI tools find them first. They don’t spend time de-duplicating their own bugs.
Some of them may not be coming from real people. There are honeypot repos that are entirely fake and only have folders of simple files with clear security problems. They collect automated reports they get from all of the AI bots that people are running.
smallerfish 10 hours ago [-]
So make it a closed issue tracker with a public email gateway. Get Anthropic to donate LLM time to classify and combine incoming reports.
throwaway85825 10 hours ago [-]
If the LLM hallucinates bugs what makes you think any classification won't be hallucinated?
quuxplusone 9 hours ago [-]
The issue highlighted in Linus's message isn't that the LLM is hallucinating fake bugs; it's that 100 people running the same LLM on the same codebase find the same real bug 100 times, and if they all send it to the private security mailing list, it's (1) unmanageably high volume and (2) stupid security theater [because by definition any bad actor with the same LLM would find that bug — it's effectively public at that point].
throwaway85825 5 hours ago [-]
You don't need an LLM to deduplicate bugs, just categorize by files affected. The real security problem is that LLMs have a ~499/500 false positive rate, and the new 'security researchers' post this slop and DDoS the mailing list.
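A minimal Python sketch of the "categorize by files affected" idea, assuming each report contains a unified diff whose "+++ b/<path>" header lines name the touched files. The report IDs and the input format are hypothetical, and this is only one possible reading of the suggestion.

```python
import re
from collections import defaultdict

# Matches the "+++ b/<path>" header lines of a unified diff.
DIFF_PATH = re.compile(r"^\+\+\+ b/(\S+)", re.MULTILINE)


def files_affected(report_text: str) -> frozenset:
    """Return the set of file paths named in the report's diff headers."""
    return frozenset(DIFF_PATH.findall(report_text))


def group_by_files(reports: dict) -> dict:
    """Bucket report IDs by the exact set of files each report touches."""
    groups = defaultdict(list)
    for report_id, text in reports.items():
        groups[files_affected(text)].append(report_id)
    return dict(groups)


# Hypothetical reports: two touch the same file, one touches another.
reports = {
    "report-1": "--- a/net/core/sock.c\n+++ b/net/core/sock.c\n@@ -1 +1 @@\n",
    "report-2": "--- a/net/core/sock.c\n+++ b/net/core/sock.c\n@@ -9 +9 @@\n",
    "report-3": "--- a/fs/ext4/inode.c\n+++ b/fs/ext4/inode.c\n@@ -1 +1 @@\n",
}

for files, ids in group_by_files(reports).items():
    print(sorted(files), ids)
```

This only yields candidate duplicates: two unrelated bugs in the same file land in one bucket, and reports without a diff all share the empty set, so a human (or a second pass) still has to confirm each group.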
dgellow 9 hours ago [-]
You still spend time identifying duplicates and doing triage. That can be very significant for a project like Linux.
Interestingly enough doing that type of triage is something LLMs are actually great at
cduzz 10 hours ago [-]
If the AI is awesome at identifying security bugs in the linux kernel, it likely can also identify if the thing it's found is similar to something that is already found in the security mailing list?
Or, put another way -- what flags the duplicate? The filer or the system? If my cheese factory is measured by the volume of cheese instead of the quality, I'll churn out the cheese even if it's sloppy duplicated cheese. And that is the case if a person has to flag a new ticket as "same as this" or not.
What's that law that says that any sufficiently large problem turns into a moderation problem?
crote 10 hours ago [-]
The problem is that the tech companies are paying their research/marketing departments for headlines that go "Researcher uses powerful new Saga 6.2 release to find 597 kernel vulnerabilities! (Can your company afford NOT getting their $1000/month subscription?)", not for headlines that go "Researcher spends $50.000 to find 597 bugs, then spends $25.000 figuring out 540 of them are duplicates".
Unless the kernel community starts banning & publicly shaming repeat offenders, there's zero incentive for them to put any effort in filtering out duplicates. They are mostly doing it for marketing after all, not out of a genuine interest in making the kernel better.
fiedzia 9 hours ago [-]
> it likely can also identify if the thing it's found is similar to something that is already found in the security mailing list?
It can not because this mailing list is not public.
flumes_whims_ 10 hours ago [-]
> “AI detected bugs are pretty much by definition not secret, and treating them on some private list is a waste of time for everybody involved – and only makes that duplication worse because the reporters can't even see each other's reports.”
cduzz 9 hours ago [-]
Ah; so it _is_ a tool problem. It is _also_ a moderation problem.
One could ban orgs that flood the zone with AI generated trash, but is there some potential middle ground where there are sets of filters to identify duplicated bugs, and possibly just internally dump "AI spam" to a lower queue?
This seems like the sort of problem I'd addressed in the 90s with killfiles and spamassassin. In other words, can't the ingestion just go through some filters to shield the humans at the end of the pipe?
Cthulhu_ 9 hours ago [-]
While true, security reports should be treated as confidential until a patch is widely available.
stonogo 10 hours ago [-]
And with a mailing list you don't even have to do that! The problem doesn't really change, because you have to figure out whether it is a duplicate before you can mark it as duplicate, and that's the 'managing' part of 'unmanageable'.
mixxit 4 hours ago [-]
I feel like the ability to speed up finding bugs will exceed your ability to fix them and/or review ai PRs
It will most likely also find issues that require fundamental design changes to fix
Dangerous waters ahead for data security and vital infrastructure
827a 3 hours ago [-]
I hate to be "that guy" but there's a reason why most of the industry stopped using mailing lists for things like this. Extremely impressive that Linux lasted this long.
perching_aix 7 hours ago [-]
Nonsense advice, he's just asking for duplicate slop patches too this way.
It's a catch 22. Why not make a separate list for AI generated reports that can be subscribed to instead? If the claim is that these are not private anyhow, no reason not to, and then a reasonable expectation could be held against submitters to check against existing reports.
That is unless it is still absolutely sensitive, in which case the only way forward that I see is to start using AI for triaging and duplicate detection as well.
quotemstr 9 hours ago [-]
Maybe it's time to require public zero-knowledge proofs of a working exploit before privately-delivered exploit details can be considered.
shevy-java 9 hours ago [-]
So ... first, AI slop is killing mankind slowly. Skynet is winning here.
On the other hand ... IF the bug report is real, and let's assume that AI slop reports at the least a few bugs that are indeed real, then I really think it should not make a difference WHO or WHAT reports these bugs. I would not disagree on fake bugs or bogus bug reports wasting time of humans, but this is a quality difference then. Surely people can tweak AI models to be better at finding bugs too. Besides, they should auto-fix that. Is AI still too stupid to fully replace humans? Other than killing them with spam, as it does right now.
new_account_100 10 hours ago [-]
AI (read: LLM technology) is the most powerful spam weapon ever invented.
kadirbg 2 hours ago [-]
[flagged]
parweb 8 hours ago [-]
[flagged]
abelzentric 10 hours ago [-]
[flagged]
nave94hn 8 hours ago [-]
[flagged]
kirtivr 8 hours ago [-]
I'd really like maintainers to get their hands dirty with AI agents as well to help speed up the reviews.
Over the last year there have been way too many stories and Twitter posts like these.
Yes, maintainers are overloaded, but that's only because we haven't yet built the tools to support them.
Other than such statements, I would, as a builder like to hear the sorts of tools and requirements maintainers are looking for which would make their work easier!
We need to move fast without breaking things.
sockaddr 8 hours ago [-]
I'm a huge AI advocate but even I can't get on board with this.
Feel free to fork the kernel and maintain your own vibe-coded disaster.
dathinab 8 hours ago [-]
I'm confused by your answer, the previous post doesn't seem to be about vibe-coding at all.
It seems to be more about:
1. auto grouping duplicate security reports
2. auto validating if they are likely viable or likely nonsense
3. auto checking if they have recently been patched
4. auto assessing if they are likely "invalid" for other reasons (e.g. they are for a very old, long-unmaintained Linux version, out-of-tree drivers, etc.)
I mean practically all of that isn't trivial to get working in a way appropriate for the Linux security mailing list and comes with many not so obvious complications. But also none of that is vibe coding, and in most cases this is more about AI doing a pre-assessment of submitted security issues to speed up their review than it is about the AI making the final decision.
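To make the four steps listed above concrete, here is a rough Python sketch of a pre-assessment pass. Everything in it is hypothetical: the Report fields, the "fingerprint" used to recognise the same bug, the labels, and the version strings are illustration only, and the hard parts (computing a fingerprint, deciding whether a report is viable) are reduced to plain inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: str
    fingerprint: str      # hypothetical key identifying "the same bug"
    kernel_version: str   # version the report was written against
    in_tree: bool         # False for out-of-tree drivers
    reproduces: bool      # stand-in for the step 2 viability check


def triage(report, seen, patched, maintained_versions):
    """Return a pre-assessment label; a human still makes the final call."""
    if report.fingerprint in seen:
        return "duplicate"            # step 1: group duplicates
    if not report.reproduces:
        return "likely-nonsense"      # step 2: viability check
    if report.fingerprint in patched:
        return "already-patched"      # step 3: recently fixed
    if not report.in_tree or report.kernel_version not in maintained_versions:
        return "out-of-scope"         # step 4: invalid for other reasons
    return "needs-human-review"


# Hypothetical example data.
maintained = {"6.12"}
new_report = Report("r1", "fp-a", "6.12", in_tree=True, reproduces=True)
print(triage(new_report, seen=set(), patched=set(), maintained_versions=maintained))
print(triage(new_report, seen={"fp-a"}, patched=set(), maintained_versions=maintained))
```

The labels only sort the queue; nothing is rejected automatically, which matches the point that this is about speeding up review rather than having the AI decide.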
kirtivr 7 hours ago [-]
Exactly.
At the end of the day, we would rather have a more stable and bug-free kernel than not.
It's not that much work for me anymore to report and even fix that obscure monitor driver bug that sometimes causes my machine to bootloop, unless I boot without graphics and start the XOrg server manually.
I often find myself surprised at how easily frontier models are able to find bugs across abstraction layers, that only original authors can comprehend. We need more positivity around these contributions as well.
sirsinsalot 7 hours ago [-]
AI slop causes additional noise on the mailing list. Your suggestion is to use more AI to filter the noise?
How about we just reduce the noise?
wang_li 5 hours ago [-]
The world doesn't need to support the projects and research areas that interest you. How about we do something better: No one is allowed to say or write anything about AI or AI generated slop until AI is 100% perfect and produces zero errors and does everything with perfect efficiency.
AI trash like this is like showing up to a baseball game with a pitching machine and demanding that they let you join in and be the pitcher using your machine. Just because your slop cannon is fun and exciting to you doesn't make anyone else obligated to join your club just because you fired your slop on them.
Somebody is spamming kernel mailing lists under the name Marian Corcodel with a 26 MByte message multiple times per day containing a collection of nonsensical patches. Looks AI-generated, perhaps with the intention to poison LLMs. This has been going on for a few days now.
https://lore.kernel.org/all/CAGg4U=GNtCObd_Nbm_1Rr5FEvPb69Yz...
HN gets 24M views a day. Assume those views are evenly distributed across the front page (they aren’t), and that’s about 1M views for each front page post, assuming each user clicks on one post.
By the rule of 10s (also not exact), there are 10x fewer views on comment threads. So assume around 100k views on a comment thread as a theoretical average.
If everyone in this thread clicked on the link, that would be 2.6 TB of transfer across the day. But by the rule of 10s we have to assume 10x fewer people will interact (upvote, click, anything) than view. So we’re down to 260 GB of transfer over the course of a day.
I wonder how close that is. It seems plausible that a link in the top comment of a thread could garner 10,000 clicks.
That’s still about one click every 8 seconds, which at 10Mbit/s would indeed overwhelm the server by a factor of about 2.5x. But I clicked through and it loaded in just a few seconds, so presumably the pipe is faster than 10Mbit/s.
Another caveat is that many websites are already several megabytes, so it seems strange that 26 MB would be the breaking point for a reasonable web host.
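The napkin math above can be re-run as a short Python sketch. Every input is one of the comment's own guesses (26 MB message, 10,000 clicks a day, a 10 Mbit/s pipe), not measured traffic, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the estimate above.
# All inputs are guesses taken from the comment, not measurements.
MESSAGE_MB = 26                  # size of the spam message, in megabytes
CLICKS_PER_DAY = 10_000          # guessed clicks on the link per day
SECONDS_PER_DAY = 24 * 60 * 60
LINK_MBIT_S = 10                 # hypothetical server uplink, in Mbit/s


def daily_transfer_gb(clicks: int, message_mb: float) -> float:
    """Total transfer per day in decimal gigabytes."""
    return clicks * message_mb / 1000


def required_mbit_s(clicks: int, message_mb: float) -> float:
    """Average bandwidth needed if clicks are spread evenly over a day."""
    return clicks * message_mb * 8 / SECONDS_PER_DAY


if __name__ == "__main__":
    needed = required_mbit_s(CLICKS_PER_DAY, MESSAGE_MB)
    print(f"transfer per day: {daily_transfer_gb(CLICKS_PER_DAY, MESSAGE_MB):.0f} GB")
    print(f"seconds between clicks: {SECONDS_PER_DAY / CLICKS_PER_DAY:.2f}")
    print(f"average bandwidth needed: {needed:.1f} Mbit/s")
    print(f"overload vs {LINK_MBIT_S} Mbit/s: {needed / LINK_MBIT_S:.1f}x")
```

With these inputs the average works out to roughly 24 Mbit/s, about 2.4x a 10 Mbit/s link, which matches the "about 2.5x" figure. Real clicks would be bursty rather than evenly spread, so peak load would be higher than this average.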
This is available info?
2022 from dang:
> There's no stats page but last I checked it was around 5M monthly unique users (depending on how you count them), perhaps 10M page views a day (including a guess at API traffic), and something like 1300 submissions (stories) and 13k comments a day.
> The most interesting number is the 1300 submissions because that hasn't grown since 2011 - it just fluctuates. Everything else has been growing more or less linearly for a long time, which is how we like it.
How does that work?
https://x.com/gabriberton/status/2051873677998956851
Actual context: Linux 7.1-rc4 release, Linus remarked on a specific documentation change.
The Register somehow turned this into an "article" that says a lot less with roughly the same number of words, and provides "context" by linking to a number of unrelated articles.
see "If you resorted to AI assistance to identify a bug, you must treat it as public." and https://docs.kernel.org/process/security-bugs.html#responsib...
(and at least weird isn't like uncanny-valley-ai-written-weird)
Does it? Both points can be true at the same time.
“AI tools are great, but only if they actually help, rather than cause unnecessary pain and pointless make-believe work,” he wrote. “Feel free to use them, but use them in a way that is productive and makes for a better experience.”
So I think the closing remark from The Register isn’t really appropriate given the context from the quotes they pulled.
it's just that in the past people mostly didn't find security vulnerabilities independently of each other, without knowing about the others, en masse
worse, it's non-trivial to dedup on the submitter side or on the receiver side (as long as we stay with a classical mailing list format)
and while this might be fixable with an AI auto-grouping duplicates etc., getting that right is _hard_, especially if we consider that there can be a lot to gain for an adversary in using prompt injection and similar to effectively "hide" "useful" security issues (e.g. by wrongly causing them to be labeled as duplicates).
In addition to all the technical problems, this causes some other problems: 1.) additional cost that can be intentionally (maliciously) increased 2.) dependence on some LLM provider 3.) a trust problem wrt. the LLM provider used. Some of this can be avoided by running open models on sponsored, owned hardware, but at the cost of often outdated LLM tech, higher cost, now needing to maintain additional hardware, etc.
Not exactly, the submissions are reports about actual existing security issues. They are make-believe work because everybody has access to AI, and anybody could have done it. Deduping is not productive work, it's a search for productive work.
Instead of spamming bug reports generated by AI, people should spam cash or token credit of some sort so the project can generate these themselves. The real unnecessary part of the entire process is the submitter. There's no need for an AI middleman.
If somebody comes up with some witty trick that gets an AI to find a bug that it wouldn't have found on its own, submit the prompt.
A few of us will actually use these tools to reduce toil and achieve something useful.
But now we need an AI tool to consolidate the triage utilities.
It probably doesn't really change that much in this scenario but with a forum or any other topics-based platform you can at least just close and ignore these things without it affecting everyone else.
> It probably doesn't really change that much in this scenario but with a forum or any other topics-based platform you can at least just close and ignore these things without it affecting everyone else.
True, external moderation is a benefit of centralized platforms, but a mail client allows personalized moderation, which, with a well-organized list, lets you filter out only what you are not interested in. Usenet had the benefit of both: a centralized platform with moderation, and powerful clients for further personalization. Too bad it died for most uses.
The people who have this preference are processing the mailing lists with a highly specialized mailtool, not a web browser.
If you have only ever accessed email with a web browser it is not surprising that you find the mailing list format weird.
Mailing lists allow people to use threads if they want (assuming nobody trashes the thread headers by using terrible email software from Microsoft/Google), and also allow people to read from the firehose if they want. And there are plenty of threaded web views of mailing lists available for lurkers.
Local filtering is the key to ignoring threads you are not interested in. Depending on the client with 2 or 3 keystrokes you are ignoring the whole thread or this particular sub branch of it and automatically jumping to the next interesting, unread message.
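For anyone wondering how a client can ignore a whole thread or just one sub-branch, here is a rough Python sketch using only the standard `email` module. It assumes senders keep the standard `Message-ID`, `In-Reply-To` and `References` headers intact; `ancestor_ids`, `is_ignored` and the `killed` set are hypothetical names, not the API of any particular mail client.

```python
from email import message_from_string
from email.message import Message


def ancestor_ids(msg: Message) -> set[str]:
    """Message-IDs this message names as its ancestors in the thread."""
    ids: set[str] = set()
    for header in ("In-Reply-To", "References"):
        # split() also copes with References folded across several lines
        ids.update((msg.get(header) or "").split())
    return ids


def is_ignored(msg: Message, killed: set[str]) -> bool:
    """True if the message is a killed message or descends from one."""
    own_id = (msg.get("Message-ID") or "").strip()
    return own_id in killed or bool(ancestor_ids(msg) & killed)


if __name__ == "__main__":
    reply = message_from_string(
        "Message-ID: <r2@example>\n"
        "In-Reply-To: <r1@example>\n"
        "References: <root@example> <r1@example>\n"
        "Subject: Re: [PATCH] foo\n\nbody\n"
    )
    # Killing <r1@example> hides that sub-branch but keeps the thread root.
    print(is_ignored(reply, {"<r1@example>"}))
```

Adding the root's Message-ID to `killed` ignores the entire thread, while adding a reply's ID ignores only that branch. This only works while the thread headers survive, which is exactly what broken mail software tends to destroy.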
At the same time, it's often technically true, but for a good reason that you neglected to mention:
Those old tools tend to be very capable email clients, not web apps with their awkward attempts to simplify complex conversation structure. A good email client can handle large, high-traffic, frequently branching, long-lived threads with ease. All the web forums I've ever used fail miserably here.
The people who are tasked with participating in large scale discussion groups (like the LKML) know this through experience. They prefer email because it works better. It makes their lives easier. It helps them to be more efficient, which is absolutely necessary given the sheer volume of messaging that they handle.
Yes, a specialized tool is required to get these benefits, just as a specialized tool is required to make web server output easily readable. Thankfully, these tools have existed for decades.
"I like it this way because it's always been this way and once you change your entire email workflow and customize your email client, it's almost as good as PHPbb"
Forums are built for threads and are immediately visible and accessible for everyone, not just people who want to spend their limited time dicking with email clients.
Mailing lists are the proto-discord: knowledge locked away from the public behind a special frontend and elitist attitudes. It's only better because the list is technically visible, but only in the worst, most low-effort way possible. You dump a raw txt copy of the entire thread unstructured onto the user and make it their problem to figure out. After all, your email client makes it easy to read, so why should you care about what anyone else needs?
This discussion has been happening since forever. And also the idea that it serves anyone to complain how others are obviously doing it wrong, without even attempting to understand why they’re doing it a certain way. And then be irritated when the response is negative, and labeling others as elitist for using and providing open platforms over decades and not silos.
If you don’t know, feel free to ask. And then suggest (or provide!) improvements that factor in current requirements and goals instead of dismissing them as stupid.
Life advice: if you want anybody to change what they do, you need to first understand why they’re doing it, and then offer suggestions based on that understanding that improve it with them. Otherwise you’re going to continue to recreate your own victim position, and an “elite” position that you will never belong to.
I prefer people to email me because half of the time they figure out their problems while writing them.
it's not an absolute rule but people who don't do their homework gravitate towards calls and messaging because they just don't prepare their questions.
asynchronous communication puts the burden on the sender, where it belongs.
They are simply more efficient and more importantly censoring is done by the user themselves, not by politically motivated admins who ban discussions based on their ideologies and whims.
That said, I get why this would rankle a lot of the folks involved.
Imagine the current state being, for the most part, a collection of local maxima in security. To push the system into a more optimal state, you either need skilled people and time to overcome the barrier to a new local maximum, or you throw AI at it and evaluate whether you land in a more optimal state.
I think after some time of turbulent exploit/patch cycles we will reach a stable state again, where the code converges to a new local maximum that even with AI requires significant effort (time and tokens) to overcome. Or ideally a global maximum.
With time, the LLMs improve, so the diffs/gradients get better and we will be able to reach optimal points for any software faster.
My problem with the idea is that apparently it is assumed that OSS contributors and especially maintainers will generously donate their time to get this machinery into a state that makes the optimization loop work well - just for the AI labs to turn around and sell access to the optimized models for increasingly larger amounts of money.
AI generated code can be great. Hand rolled code can be bad. The rules are the same in both cases. Make sure your code changes are focused (no random changes just because you happen to be in the file/dir or notice something) and make sure you don't break anything else along the way.
The problem is people trying to get individual credit for merely running a script that spams a mailing list. Many of those people are likely not even C programmers or programmers at all.
Without the immense personal reward and recognition and job offers as a motivation, the problem will disappear.
The problem will also disappear with time as the people lauding and celebrating and hiring security researchers of the past will quickly abandon LLM generated spam as a positive signal; running a prompt that sends spam is, if anything, a strong negative indicator of infosec ability and skill.
LLMs are a tool. Like all tools, most people can't or won't use them responsibly or profitably although they are useful in the correct hands.
The kernel isn't the only OSS project with this issue either. Requiring submissions & issues to be anonymous could help a ton of other open source projects currently drowning in AI slop issues.
This is just a workflow issue. In the past it was very rare for multiple people to find and report a security vulnerability at once, so it made sense to keep the discussion private until they were ready to release a fix. With AI that is happening all the time, so it makes more sense for the discussion to be in public to avoid duplication. So they changed the policy accordingly. That is it.
[1]https://lwn.net/Articles/1066581/
That's kinda a misrepresentation. They are talking about two different things. Linus is trying to point out incorrect use of a tool while GKH is praising a correct use. This sentence felt weird at the end of the article, kind of like rage bait. And I took it :P.
From what I’ve seen many of the AI bug search operators are newer to security research. They’re burning their tokens trying to find kernel bugs as their claim to fame before other people with AI tools find them first. They don’t spend time de-duplicating their own bugs.
Some of them may not be coming from real people. There are honeypot repos that are entirely fake and only have folders of simple files with clear security problems. They collect automated reports they get from all of the AI bots that people are running.
Interestingly enough doing that type of triage is something LLMs are actually great at
Or, put another way -- what flags the duplicate? The filer or the system? If my cheese factory is measured by the volume of cheese instead of the quality, I'll churn out the cheese even if it's sloppy duplicated cheese. And that is the case if a person has to flag a new ticket as "same as this" or not.
What's that law that says that any sufficiently large problem turns into a moderation problem?
Unless the kernel community starts banning & publicly shaming repeat offenders, there's zero incentive for them to put any effort in filtering out duplicates. They are mostly doing it for marketing after all, not out of a genuine interest in making the kernel better.
It cannot, because this mailing list is not public.
One could ban orgs that flood the zone with AI generated trash, but is there some potential middle ground where there are sets of filters to identify duplicated bugs, and possibly just internally dump "AI spam" to a lower queue?
This seems like the sort of problem I'd addressed in the 90s with killfiles and spamassassin. In other words, can't the ingestion just go through some filters to shield the humans at the end of the pipe?
It will most likely find issues, some of which require fundamental design changes to even fix
Dangerous waters ahead for data security and vital infrastructure
It's a catch-22. Why not make a separate list for AI generated reports that can be subscribed to instead? If the claim is that these are not private anyhow, no reason not to, and then a reasonable expectation could be held against submitters to check against existing reports.
That is unless it is still absolutely sensitive, in which case the only way forward that I see is to start using AI for triaging and duplicate detection as well.
On the other hand ... IF the bug report is real, and let's assume that AI slop reports at least a few bugs that are indeed real, then I really think it should not make a difference WHO or WHAT reports these bugs. I would not disagree on fake bugs or bogus bug reports wasting time of humans, but this is a quality difference then. Surely people can tweak AI models to be better at finding bugs too. Besides, they should auto-fix that. Is AI still too stupid to fully replace humans? Other than killing them with spam, as it does right now.
Over the last year there have been way too many stories and Twitter posts like these.
Yes, maintainers are overloaded, but that's only because we haven't yet built the tools to support them.
Other than such statements, as a builder I would like to hear what sorts of tools and requirements maintainers are looking for that would make their work easier!
We need to move fast without breaking things.
Feel free to fork the kernel and maintain your own vibe-coded disaster.