We stopped AI bot spam in our GitHub repo using Git's –author flag

134 points by ildari 1 hour ago

arecsu 46 minutes ago

Makes me wonder if an ELO-based system would work to mitigate these issues. It would reward people who successfully merged PRs into a project, who had real issues acknowledged, whose responses were rated well by other users' reactions, and so on, possibly multiplied by the importance of the project where the activity took place. It wouldn't be about human vs AI, but about genuinely helpful, effective contributions vs low-effort/spammy ones. Issues and PRs could be sorted and filtered by their ELO score. I'm saying ELO as an analogy for "score based on the given context", not really a 1:1 translation of the ELO system.

Negative score would be reports from other users because of spammy content or not acknowledged issues, with a middle ground of neutral score (+-0) or little positive score to issues or whatever with clear good intention, but couldn't reach a proper merged PR or were not issues (e.g. issue existed but wasn't the correct repo to be addressed, PR was good but needed other stuff to be implemented prior to it, maybe in the long run, etc)
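
A minimal sketch of the kind of context-weighted score described above. Every weight and name here (`Activity`, `project_weight`, the point values) is a hypothetical placeholder; the comment gives no numbers, only the rough ordering of merged PRs, acknowledged issues, well-meant attempts, and spam reports.

```python
from dataclasses import dataclass

# Hypothetical point values; only their ordering comes from the comment above.
POINTS = {
    "merged_pr": 3.0,           # PR merged into the project
    "acknowledged_issue": 1.0,  # issue confirmed as real by maintainers
    "good_faith": 0.25,         # well-meant PR/issue that did not land
    "spam_report": -5.0,        # reported by other users as spam
}


@dataclass
class Activity:
    kind: str                    # one of the keys in POINTS
    project_weight: float = 1.0  # assumed measure of project importance


def reputation(activities):
    """Sum the points of each activity, scaled by the project's weight."""
    return sum(POINTS[a.kind] * a.project_weight for a in activities)


history = [Activity("merged_pr", project_weight=2.0), Activity("spam_report")]
print(reputation(history))
```

Issues and PRs could then be sorted by the author's score, which is also where the manipulation concerns raised in the replies would apply.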

philipwhiuk 36 minutes ago

The problem is you want the ELO score based on work on other community projects - you can't assume good faith here.
- btilly 28 minutes ago
  
  The problem with that is that there are certain kinds of users that like to take control of community projects. And then they take control of more, and bigger ones.
  There are a lot of political tricks that get used.
  What is scary is that one of those kinds of users are malicious state actors. Like North Korea and Russia...
doh 32 minutes ago

I have built something like this and am in the process of collecting the data.
Frontier users: 527,865
Light indexed: 527,865
Ready to queue: 9,083
Fast scores ready: 0
Activity events 24h: 30,266
Fast scores completed 24h: 19,123
Deep jobs completed 24h: 3,043
Fast-score ETA: n/a
Deep-hydrate ETA: 69h
Stale running jobs: 0
GitHub backpressure jobs: 19,113
High automation signals: 4,608
Medium automation signals: 1,327
Completed jobs: 74,714
Biggest challenge is GitHub's rate limits. At this pace it will take two more months to have 98% coverage. But after that the maintenance should be quite straightforward.
btilly 30 minutes ago

ELO is shockingly easy to manipulate. For example there was a literal jail with a decent chess player in it. He created a pool of players who got great ELOs by beating him, then used them to boost his rating higher. Wash, rinse, and repeat.
Given any manipulatable scheme, AI will figure out how to manipulate it. For the OP, what happens if a single AI manages to get through to contributor? Then it starts elevating other AIs to contributor, and we're off again. There doesn't have to be a purpose to this. Trolls will troll, and trolls armed with AI bots can devote endless energy to doing so. The more you work to keep them out, the more fun it becomes for them.
I wish I had an answer for that problem. But I don't.
- morkalork 8 minutes ago
  
  Reputation scores, review cartels. This all sounds familiar!
- chii 7 minutes ago
  
  fix this problem by tying the rating value to some paid currency - a repo owner would have to pay for the PR, and that PR contributor will now have more currency than previously. In order to have said currency to pay, the repo owner would need to have contributed to another repo whose owner has currency.
  The totality of someone's currency is their reputation.
  Of course, now the decision becomes...who is the central currency issuer that creates it?
ElijahLynn 29 minutes ago

For those wondering what Elo means, it is a person's last name, not an acronym (not all caps). More info here:
https://en.wikipedia.org/wiki/Elo_rating_system
sebastiansm7 22 minutes ago

It's Elo not ELO. Elo is not an acronym.
https://en.wikipedia.org/wiki/Elo_rating_system

captn3m0 41 minutes ago

This has a security implication which is overlooked. Contributors to a repository have higher rights, such as avoiding approval requirements for fork PR runs. GitHub warns in the docs:

> When requiring approvals only for first-time contributors (the first two settings), a user that has had any commit or pull request merged into the repository will not require approval. A malicious user could meet this requirement by getting a simple typo or other innocuous change accepted by a maintainer, either as part of a pull request they have authored or as part of another user's pull request.

ildari 34 minutes ago

fair point! We believe "Require approval for all external contributors" should be a default setting, as you cannot trust anyone who is not a member of the organization
- finseam 16 minutes ago
  
  Interesting approach. We’ve seen similar spam/noise problems appear in financial workflow automation too — especially when AI-generated submissions scale faster than manual review processes.
- cermicelli 13 minutes ago
  
  you can't trust org members either; I have seen projects have inter-maintainer fallouts. In general, trust doesn't exist.
  If companies can screw you over and claim it's a mistake, there isn't much a person can do.
  It's all about levels of trust: a maintainer going rogue is the least likely, a past contributor going rogue is somewhat more likely, a stranger with a merged typo PR more likely still, and a complete stranger is the least trustworthy.
orlp 27 minutes ago

No it doesn't have security implications.
If you are insecure because someone has had one of their otherwise completely innocent PRs merged into your repo... you are insecure, period.
- lgrapenthin 18 minutes ago
  
  What you are describing is exactly a security implication.
- stavros 12 minutes ago
  
  Security isn't a binary "secure/insecure". You can be more or less secure than something.

silverwind 55 minutes ago

PR spam is a major problem for repos that run bounties. Maybe GitHub should temporarily block accounts from raising PRs if like 95%+ of them are getting rejected.
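
A rough sketch of that rule, assuming the platform tracks merged and rejected PR counts per account. The `min_prs` floor is an added assumption so that a brand-new account is not blocked after a single rejection; the 95% threshold is the only number taken from the comment.

```python
def should_block(merged: int, rejected: int,
                 threshold: float = 0.95, min_prs: int = 20) -> bool:
    """Return True if the account's rejection rate is at or above the threshold.

    min_prs is a hypothetical floor: accounts with fewer PRs are never blocked.
    """
    total = merged + rejected
    if total < min_prs:
        return False
    return rejected / total >= threshold


print(should_block(merged=0, rejected=40))   # True: every PR was rejected
print(should_block(merged=10, rejected=30))  # False: 75% rejected
```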

marginalx 49 minutes ago

Problem is the bots can create any number of GitHub accounts and continue spamming. Though this would be a good, simple defense to start with.
hiccuphippo 47 minutes ago

GitHub has no incentive for blocking AI. It's like asking an ad company to build an adblocker into their browser.
cdrnsf 41 minutes ago

GitHub and Microsoft are actively contributing to the problem, why would they admit fault?
microtonal 30 minutes ago

I feel like GitHub should have a system where you can give out tokens that are valid for e.g. 1 PR. If someone shows to engage in meaningful discussion and has a good idea to address an issue/feature, you initially give them one PR token. If the PR is of good quality, you can give them a few more, until they are contributors that can just create PRs as they like.
A similar system would be nice for issues, though I'm not sure what it'd look like if issues are the springboard for contributing PRs.
Not likely to ever happen (as others said): GitHub/MS want to sell Copilot subscriptions/tokens, and LLM-generated PRs are a part of that business model.
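
One way the PR-token idea could be modeled, as a sketch only. Nothing like this exists on GitHub; `PrTokenLedger`, the `trusted_after` threshold, and the reward of two tokens per merged PR are all made-up illustrations of "one token first, a few more for good work, then unrestricted".

```python
class PrTokenLedger:
    """Hypothetical per-repo ledger of single-use PR tokens."""

    def __init__(self, trusted_after: int = 3):
        self.tokens = {}   # login -> remaining PR tokens
        self.merged = {}   # login -> number of merged PRs
        self.trusted_after = trusted_after

    def grant(self, login: str, n: int = 1) -> None:
        """Maintainer hands out n tokens, e.g. after a useful discussion."""
        self.tokens[login] = self.tokens.get(login, 0) + n

    def is_trusted(self, login: str) -> bool:
        """Enough merged PRs means the user no longer needs tokens."""
        return self.merged.get(login, 0) >= self.trusted_after

    def open_pr(self, login: str) -> bool:
        """Spend one token to open a PR; trusted users pass for free."""
        if self.is_trusted(login):
            return True
        if self.tokens.get(login, 0) > 0:
            self.tokens[login] -= 1
            return True
        return False

    def record_merge(self, login: str, reward: int = 2) -> None:
        """A merged PR counts toward trust and earns a few more tokens."""
        self.merged[login] = self.merged.get(login, 0) + 1
        self.grant(login, reward)


ledger = PrTokenLedger()
ledger.grant("newcomer")
print(ledger.open_pr("newcomer"))  # True, spends the only token
print(ledger.open_pr("newcomer"))  # False, no tokens left
```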
- ZeWaka 13 minutes ago
  
  My community does something vaguely similar, where you get credit for having bugfix PRs merged, and it's deducted when you get feature PRs merged.

aizk 8 minutes ago

I'm not sure why gh hasn't already implemented stricter measures / filters / tools for PRs. It would cut down on spam and also help save their servers that can't handle the increased AI load!

hiccuphippo 49 minutes ago

The irony of the .ai domain.

wafflemaker 30 minutes ago

Thanks for pointing it out. It has eluded me and it's incredibly funny
dbgrman 8 minutes ago

also, could the website plz fix its scrolling code? it's annoying. i can't read the article
- motakuk 2 minutes ago
  
  Would love to! Could you please share more? I can't quite see the issue

Muromec 22 minutes ago

How is the status revoked without rewriting git history?

_joel 34 minutes ago

Wouldn't it be trivial to farm the stats needed to pass the bot checker's threshold?

zer0tonin 55 minutes ago

> Should we stop giving fun test tasks to our job candidates?

Yes

Chaosvex 34 minutes ago

Yeah, fun for who exactly?
FartyMcFarter 22 minutes ago

It seems this particular company makes a payment for completing those tasks, so it might not be that bad.

optionalsquid 27 minutes ago

I don't have a better solution, unfortunately, but it doesn't seem like the spam problem has been solved. It has just been moved from pull requests to commits:

Currently, more than 10% of all commits in the archestra repo are essentially noise (369 of 3521 commits), accounting for more than half of all commits in the last month (303 of 578 commits).

But maybe (probably) the amount of such commits will go down over time, compared to the growing amounts of AI slop
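
The percentages follow directly from the commit counts quoted above; a quick check:

```python
# Counts as quoted in the comment above.
noise_total, commits_total = 369, 3521
noise_month, commits_month = 303, 578

share_total = 100 * noise_total / commits_total
share_month = 100 * noise_month / commits_month

print(f"{share_total:.1f}% of all commits")        # 10.5% of all commits
print(f"{share_month:.1f}% of last month's commits")  # 52.4% of last month's commits
```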

zzzeek 28 minutes ago

so...they are manually re-setting the "interaction limits" over and over again, since they are only temporary?

why not use hooks to automatically reject issue comments / PRs etc. from users that didn't go through onboarding, rather than repurposing GH features that aren't really designed for that use (and are hence in danger of being changed someday)?
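
A sketch of what such a hook could check, assuming a webhook receiver that gets GitHub's `pull_request`, `issues`, and `issue_comment` event payloads. The `onboarded` set is hypothetical (it stands in for whatever list the onboarding flow maintains), and the actual closing or deleting of the item is left out.

```python
def actor_login(event: str, payload: dict):
    """Extract the login of the user who opened the PR/issue or wrote the comment."""
    if event == "pull_request":
        return payload["pull_request"]["user"]["login"]
    if event == "issues":
        return payload["issue"]["user"]["login"]
    if event == "issue_comment":
        return payload["comment"]["user"]["login"]
    return None  # event types this sketch does not handle


def should_reject(event: str, payload: dict, onboarded: set) -> bool:
    """Reject anything from a user who is not in the (hypothetical) onboarded set."""
    login = actor_login(event, payload)
    return login is not None and login not in onboarded


onboarded = {"alice"}
payload = {"pull_request": {"user": {"login": "drive-by-bot"}}}
print(should_reject("pull_request", payload, onboarded))  # True
```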

petterroea 43 minutes ago

What I see is a (clever) hack, and GitHub continuing to provide good tools to its users.

skydhash 24 minutes ago

What I see is a solution for a problem that is self-inflicted, meaning lumping contributors and generic internet users in the same workflow. In big projects, you have the core team, a handful of well-known contributors, and everyone else.
I strongly prefer the git email model, where it’s often trivial to control the flow of change proposals. GitHub does not have the same wealth of tools and versatility.

ildari 1 hour ago

Hi HN community, I wanted to share our approach to reducing the amount of AI slop PRs and issues in our repo. We enabled the "require prior contribution" flag on GH and created a CI script that creates a tiny commit co-authored with you if you pass a captcha on our website. It worked really well, and we were able to block at least 500 bots in the first week. Sharing a screenshot from Cloudflare: https://archestra.ai/hn-comment-cloudflare-challenge-outcome...
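
For readers unfamiliar with co-authored commits, here is a sketch of how a CI step might build one. This is not the script from the post: the bot identity, the commit message, the use of `--allow-empty`, and the choice to put the new contributor in a `Co-authored-by:` trailer (rather than in `--author`) are all assumptions. The noreply address format is GitHub's usual `ID+login@users.noreply.github.com` convention.

```python
def coauthored_commit_argv(login: str, user_id: int,
                           bot_name: str = "onboarding-bot",
                           bot_email: str = "bot@example.invalid") -> list:
    """Build the git command line for a tiny commit credited to `login`.

    bot_name/bot_email are hypothetical; the real setup may differ.
    """
    noreply = f"{user_id}+{login}@users.noreply.github.com"
    # GitHub reads Co-authored-by trailers at the end of the commit message.
    message = f"chore: onboard {login}\n\nCo-authored-by: {login} <{noreply}>"
    return [
        "git", "commit", "--allow-empty",
        f"--author={bot_name} <{bot_email}>",
        "-m", message,
    ]


# The list can be passed to subprocess.run(...) inside a checked-out repo.
print(coauthored_commit_argv("octocat", 583231))
```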

satvikpendem 1 hour ago

Yep, this is similar to some other version control tools like Tangled which has vouching.
https://blog.tangled.org/vouching/
tln 59 minutes ago

That's a really elegant solution.
How does the website trigger the CI script? Through the GH REST API?
- ildari 52 minutes ago
  
  thank you, yep, through the REST API, here is the example: https://github.com/archestra-ai/website/blob/29ebdacbd8a22b9...
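
A generic sketch of triggering a workflow over GitHub's REST API, using the documented "create a workflow dispatch event" endpoint. Whether the linked file uses this endpoint is not confirmed here; the owner, repo, workflow file name, and `inputs` keys below are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           token: str, inputs: dict, ref: str = "main"):
    """Build (but do not send) a workflow_dispatch request for GitHub Actions."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# Placeholder names; sending would be urllib.request.urlopen(req).
req = build_dispatch_request("example-org", "example-repo", "onboard.yml",
                             token="<token>", inputs={"login": "octocat"})
print(req.get_method(), req.full_url)
```

The target workflow has to declare `on: workflow_dispatch` with matching inputs for the call to be accepted.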

ramon156 47 minutes ago

See, this is an article that uses dashes correctly. It adds value, creates a bit of buildup

chrismorgan 38 minutes ago

This is funny to me because the title on this submission currently refers to “Git's –author flag”, which is an extremely incorrect use of a dash. (The original article doesn’t make the mistake. Not sure if the error is from the submitter or from an HN title mangulation.)

IshKebab 27 minutes ago

That's a neat way to interface with GitHub's authentication system, but I don't see how they've solved the fundamental problem because their whitelisting process is just "click ok fine 10 times". Why won't the slop peddlers just do that too?

delduca 47 minutes ago

For now…

philipwhiuk 36 minutes ago

Until the AI learns the workflow on the next model update, indeed.