Ask HN: Did HN just start using Google recaptcha for logins?

104 points by neltnerb 3 years ago

In ten years I've never been asked to solve a captcha to log in. Is this new? What happened?

I know that my input into conversations is not some critical feature of HN or anything, but this is enough of a barrier to keep me from bothering to log in on most occasions.

Seems odd to enable google to track users logging into HN, but maybe it's always been this way and for some reason recaptcha is just flagging accounts from my network today.

dang 3 years ago

No recent changes, but we do sometimes turn captchas on for logins when HN is under some kind of (possible) attack or other. That's been happening for a few hours. Hopefully it goes away soon.

Btw I also fume when I have to work as an unpaid manual image recognizer, so I'm open to alternatives.

  • traceroute66 3 years ago

    > we do sometimes turn captchas on for logins when HN is under some kind of (possible) attack

    I don't think people are disputing the necessity, just the mechanism used.

    The other services (hCaptcha) are effectively drop-in replacements with minimal code changes.
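To illustrate the "drop-in" point: reCAPTCHA, hCaptcha, and Cloudflare Turnstile all document a similar server-side check, in which the backend POSTs the client's token plus its secret key to a siteverify endpoint and reads a `success` flag from the JSON reply. Below is a minimal Python sketch assuming only that shared shape. Provider-specific extras (scores, hostnames, error codes) are ignored, and the endpoint URLs should be checked against each provider's current documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Server-side verification endpoints. Each accepts a form-encoded POST with
# "secret" and "response" (plus an optional "remoteip") and returns JSON whose
# "success" field says whether the token was accepted.
VERIFY_URLS = {
    "recaptcha": "https://www.google.com/recaptcha/api/siteverify",
    "hcaptcha": "https://api.hcaptcha.com/siteverify",
    "turnstile": "https://challenges.cloudflare.com/turnstile/v0/siteverify",
}


def build_verify_request(provider, secret, token, remote_ip=None):
    """Return (url, form-encoded body) for the chosen provider."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    body = urllib.parse.urlencode(fields).encode()
    return VERIFY_URLS[provider], body


def verify_token(provider, secret, token, remote_ip=None, timeout=5.0):
    """POST the token to the provider and report whether it was accepted."""
    url, body = build_verify_request(provider, secret, token, remote_ip)
    # Passing data= makes urllib send a POST with a form-urlencoded body.
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.load(resp)
    return bool(result.get("success"))
```

Switching providers then mostly means swapping the endpoint, the secret, and the client-side widget script; the verification logic stays the same.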

    • nokya 3 years ago

      +1, nothing against some anti-bot feature, just hopefully not Google.

  • vdfs 3 years ago

    You can actually solve those captchas using speech-to-text; there are many tools that do that, e.g.:

    https://github.com/xHossein/PyPasser

    • pxx 3 years ago

      2captcha is a human-based solver service.

      • pxx 3 years ago

        (for posterity, since I can't edit this post anymore: the parent of this comment originally linked to 2captcha)

  • 2OEH8eoCRo0 3 years ago

    I'd love to know more. Historically, what kind of attacks do you see? What is their goal or what do they get out of attacking HN?

  • sillysaurusx 3 years ago

    I vote "What's the output of the following Arc snippet?"

    Be sure to include a few macros, otherwise the JS crowd will still be able to reverse engineer their way in.

  • btilly 3 years ago

    I have an idea for a button that will slow down bots while being less inconvenient for humans.

    I'll send details in an email.

  • LinuxBender 3 years ago

    Aside from 3rd-party code, perhaps one middle-of-the-road idea would be a table of a few hundred factoids and then code that turns them into multiple-choice checkbox puzzles like:

    Select everything that is a color:

    - Red

    - Blue

    - Monkey

    - Violet

    - Armchair

    I'm sure there are more clever open-ended questions, and maybe sometimes switch up "is" with "is not".

    People say that bots can learn such things, but if every site had their own in-house tool then bots would have to keep track of thousands of site-specific puzzles. Each site could even rotate through a dozen sets of different puzzle types and pause the ones that get learned. This would avoid sending cookies to a third party or depending on third-party code, thus mitigating some corporate capture.

    Bonus complexity: Don't use plain alphanumeric characters. Use something like "figlet" [1] and cycle through a few of its ASCII art fonts.

    [1] - https://github.com/xero/figlet-fonts

    • amf12 3 years ago

      > but if every site had their own in-house tool

      Costs money to maintain and build correctly, which naturally leads to buying existing solutions.

      • LinuxBender 3 years ago

        I've heard that too, but I think it would take a decent developer 1 hour to make a first pass at such a thing. It doesn't have to be complicated or perfect. I think it should not require images or JavaScript. It should probably be a server-side Lua script that caches the puzzle and answer.

        I found a few starter ideas [1][2] and concepts [3], but I would prefer to use something like figlet rather than GD-generated images. Figlet or something like it should be much lighter weight. I just have to find one that is readable on cell phones.

        [1] - https://github.com/lua-programming/lua-captcha

        [2] - https://github.com/mrDoctorWho/lua-captcha

        [3] - https://nedbatchelder.com/text/stopbots.html

        • waspight 3 years ago

          Said every customer

          • Filligree 3 years ago

            I've seen a couple sites that do this. One of them just asks for the square root of -1!

            And sure, qntm.org isn't nearly as big a website as HN, but I concur that this isn't likely to be super-difficult. The wide use of recaptcha seems like mostly laziness; most websites aren't big enough to get targeted attacks.

    • ghghgfdfgh 3 years ago

      I think your idea is exactly what I would want in a captcha, but an issue with your example is that it would only pertain to English speakers, and it would be difficult to translate it into a variety of different languages to accommodate everyone.

      • LinuxBender 3 years ago

        I assume the existing captcha services look at the Accept-Language header. That header could be read by the Lua script. Each puzzle set could be translated one time through Google Translate or perhaps a better translation site. It should probably be proofread by someone from each language used to ensure nothing translates out of context into something offensive. I think I've seen people do this on GitHub, having people proofread translations. In this case it should probably be a smaller group, to reduce the risk of leaking the puzzle mapping to the bot code authors.
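Picking a translated puzzle set from the Accept-Language header, as LinuxBender suggests, could look like this minimal Python sketch. It assumes the puzzle tables exist in a hypothetical `SUPPORTED` set of languages and that matching on the primary subtag (treating fr-CH as fr) is good enough; entries with q=0 are treated as "not acceptable".

```python
# Hypothetical set of languages the puzzle tables have been translated into.
SUPPORTED = {"en", "de", "fr", "es"}


def pick_language(accept_language, supported=SUPPORTED, default="en"):
    """Return the best supported primary language from an Accept-Language header."""
    candidates = []
    for index, part in enumerate(accept_language.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # q defaults to 1 when absent
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        primary = tag.strip().lower().split("-")[0]
        # Sort key: highest q first, then original header order for ties.
        candidates.append((-quality, index, primary))
    for neg_q, _, primary in sorted(candidates):
        if neg_q < 0 and primary in supported:  # skip q=0 entries
            return primary
    return default
```

For example, a header of `ja, de;q=0.3, en;q=0.5` selects the English puzzle set, since Japanese is not in the hypothetical table and English outranks German.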

    • chrismcb 3 years ago

      I think chatgpt is way ahead of you.

  • badrabbit 3 years ago

    It looks like the attack is login-based, since that's where your captcha is. Allow a single captcha-free attempt to log in successfully from a /24. If the login fails, then put the /24 on captcha for X hours. That way most legitimate login attempts won't see the captcha. Also, the HN crowd I think prefers hCaptcha.

    Lastly, what I would do is have users pick a login image: in addition to the password, they have to pick the correct image. So it would still be the process I suggested, except a failed login is allowed one time so long as the correct login image is selected. Also, the login images will be slow to load during times of attack, on purpose, to identify clients that are guessing before the image is served and to slow down their attack. I would also maintain a list of IP+UA combinations that have repeatedly logged in successfully, to exempt or prioritize them depending on the attack.
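badrabbit's /24 rule can be sketched in a few lines of Python. This is a minimal illustration, assuming IPv4 only, an in-memory dict in place of shared state, and a hypothetical `CAPTCHA_HOURS` value for the unspecified "X hours"; the login-image and IP+UA allowlist parts are not modeled.

```python
import ipaddress
import time

CAPTCHA_HOURS = 6  # the comment's "X hours"; this value is an assumption

# Hypothetical in-memory state: /24 network -> time until which captcha is required.
_captcha_until = {}


def network_of(ip):
    """Collapse an IPv4 address to its /24 network."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def captcha_required(ip, now=None):
    """True if this address's /24 recently produced a failed login."""
    now = time.time() if now is None else now
    return _captcha_until.get(network_of(ip), 0.0) > now


def record_login(ip, success, now=None):
    """A failed login puts the whole /24 on captcha for CAPTCHA_HOURS."""
    now = time.time() if now is None else now
    if not success:
        _captcha_until[network_of(ip)] = now + CAPTCHA_HOURS * 3600
```

Legitimate users who type their password correctly the first time never hit the captcha; one failure affects neighbours in the same /24 until the window expires.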

  • _-----_ 3 years ago

    hCaptcha would be an upgrade from Google's data farming.

    • mdaniel 3 years ago

      please god no; I appreciate the anti Google crowd's concern but hCaptcha can die in a fire

      • eatbots 3 years ago

        hCaptcha has completely passive score-only modes. When to challenge and how hard is up to the site.

        • mdaniel 3 years ago

          Well, if the "rager hard" mode is the default, that doesn't help the general population, now does it? I still hate hCaptcha with a burning passion and hope no site ever adopts it

          • cassianoleal 3 years ago

            I dislike any kind of CAPTCHA. That said, reCAPTCHA is the worst. It may be my combination of anonymised Firefox, CG-NAT and other tracking and privacy protections, but reCAPTCHA is overly hostile to me.

  • neilv 3 years ago

    One concern with Google Recaptcha on HN is that it seems a good number of HN users want to be pseudonymous, possibly including towards Google. Always-perfect browser OPSEC is hard in practice.

    (Condolences on the attack/headache.)

  • joshspankit 3 years ago

    How about genuinely-long delays between login attempts? 5 seconds slows down a bot, 15-30 seconds could make many login attacks unrealistic.

    Also: OTP 2FA?

    • tinus_hn 3 years ago

      It’s not easy to tell two login attempts are from one bot. This kind of workaround unfortunately doesn’t work in practice. Otherwise of course this whole problem wouldn’t exist.

      • joshspankit 3 years ago

        Why would you have to tell if they’re from one bot?

        • tinus_hn 3 years ago

          Because you want to have delays between bot login attempts?

          • joshspankit 3 years ago

            I think that’s overcomplicating it: Just do it site-wide for all login attempts (always on, or like the captcha: as-needed)

            • tinus_hn 3 years ago

              So now in the case of a bot attack no one can login. That doesn’t work.

              • joshspankit 3 years ago

                What do you mean?

                • tinus_hn 3 years ago

                  If you block all logins for 5 seconds after a bot attempts to login, and the bot attempts to login 50 times per second, no one will be able to login.

                  • joshspankit 3 years ago

                    I understand the confusion now.

                    No I mean when a specific user has a failed login attempt that user has to wait 5-30 seconds before being able to try again. A legitimate user would only be affected if a bot is trying to log in as them.

                    • tinus_hn 3 years ago

                      That’s account lockout. It doesn’t work against bots, because they can just try a million accounts instead of a million passwords on one account. It also makes it super easy to mount a denial of service on an account, and it doesn’t prevent a denial of service against the server, which has to service all these login attempts that might very well involve running hashes designed to be computationally intensive, like PBKDF2.

                      This is not a novel measure; rest assured that the people who choose to implement a captcha are aware of its existence and chose the captcha instead.
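For reference, joshspankit's per-account delay looks roughly like the minimal Python sketch below. The doubling schedule from 5 to 30 seconds is an assumption chosen to match the "5-30 seconds" range, and the state is kept in hypothetical in-memory dicts. Note that tinus_hn's objections apply to exactly this design: it is keyed per account, so it does nothing against a bot spraying many accounts, and an attacker can keep a victim's account throttled.

```python
import time

# Hypothetical backoff schedule: 5 s after the first failure, doubling, capped at 30 s.
BASE_DELAY = 5.0
MAX_DELAY = 30.0

_failures = {}      # username -> consecutive failed attempts
_next_allowed = {}  # username -> earliest time the next attempt is accepted


def attempt_allowed(username, now=None):
    """Check this before running the expensive password hash."""
    now = time.time() if now is None else now
    return now >= _next_allowed.get(username, 0.0)


def record_attempt(username, success, now=None):
    """Reset on success; otherwise push the next allowed attempt further out."""
    now = time.time() if now is None else now
    if success:
        _failures.pop(username, None)
        _next_allowed.pop(username, None)
        return
    count = _failures.get(username, 0) + 1
    _failures[username] = count
    delay = min(BASE_DELAY * 2 ** (count - 1), MAX_DELAY)
    _next_allowed[username] = now + delay
```

Rejecting early attempts before hashing at least avoids spending PBKDF2 work on throttled requests, but it does not address the many-accounts case.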

  • LoveMortuus 3 years ago

    How about if Dang assesses our humanity? That way we don't have to do image recognition stuff and neither does Dang! A win-win if I say so!

  • lrvick 3 years ago

    If possible, implement WebAuthn even if only for human verification.

    Bots will not have access to TouchID, Windows Hello, or a Yubikey but most humans have one of those in the device in front of them right now.

    Fallback to captcha for edge cases, but then at least /most/ people can skip it.

    Example: https://cloudflarechallenge.com/

    • tjohns 3 years ago

      Those can all easily be emulated in software, if you're determined enough.

      There's nothing about the WebAuthn protocol that forces hardware backed key storage, other than everyone collectively agreeing it's a good idea. A bot author would just ignore that.

      Firefox already includes this functionality, gated by flag (security.webauth.webauthn_enable_softtoken).

      • lrvick 3 years ago

        > Those can all easily be emulated in software, if you're determined enough.

        Not possible if vendor signature checking is enforced. All major webauthn device manufacturers sign the keys of all the devices they produce. You can prove a given device is unique and issued by Apple, Yubico, Google, Microsoft, etc.

  • cassianoleal 3 years ago

    Hi dang, I'm not sure you're still going to read this message since it's been many hours.

    First, I'm sorry to hear HN was under attack. That's never fun.

    Second, I understand your reasons for temporarily turning on the CAPTCHA, even though as a user I really dislike it - especially reCAPTCHA.

    Given the latter, I hope you will consider alternatives. Regardless though, it would be nice to add a message to the login page explaining that the CAPTCHA is temporary because the website is under attack. That would allow me to keep 3rd-party stuff blocked by uBO on the login page and still know what's going on. I would probably just keep the pages I'm interested in on a tab and come back to them later, when the CAPTCHA is gone.

    In any case, as always, thanks for your work keeping this forum alive and healthy.

  • andrewshadura 3 years ago

    Unfortunately, this breaks apps, Materialistic in my case.

    • dang 3 years ago

      Yes, the mobile apps are all third-party and this is one of the downsides.

      I'll whitelist your account for now (i.e. until the server restarts). If anyone else wants that, email hn@ycombinator.com and I'll do it as soon as I'm back online.

      (It looked like the attack had died down but then it un-died back up again)

RjQoLCOSwiIKfpm 3 years ago

What boggles me about this is:

I do NOT consent to working for free for Google to train their AI.

I'd be willing to solve any CAPTCHA the product of which would be open source, or even useless.

But Google is a for-profit company which uses the solutions to create proprietary software and profit off of it, they won't pay me, and I have no way to opt-out of working for them because the most useful places of the Internet use their CAPTCHAs.

(Yes, I can intentionally put wrong solutions into their CAPTCHAs to poison their data, but I'm afraid they get so many valid solutions that they can just calculate the wrong ones out.)

  • bun_at_work 3 years ago

    If your goal is to avoid Google using your data, putting in bad data that is filtered out accomplishes that, right?

    I don't personally have an opinion on HN using the captcha, but their reasoning is pretty obvious, and almost certainly comes from a good place (reducing any spam on the site). That said, you're welcome to your opinions, it just seems like you have an option, based on your stated goal.

    • RjQoLCOSwiIKfpm 3 years ago

      Even if it does accomplish that they will still have coaxed me into doing work for them even though I'm not consenting to working for Google.

      Consider it like this:

      If someone forced you to do physical work against your will, you wouldn't like it any more just because they throw away the product of your labor in the end.

      It would just make it more obscene.

  • loufe 3 years ago

    That's an interesting take. Everything costs money. You know the reason why the CAPTCHA service is free is because they have value in the results of the CAPTCHA, right? You're not viewing ads or paying for this service. I'd prefer not to help Google either, but nothing is truly free.

    • sebazzz 3 years ago

      Do you know how reCAPTCHA started? Digitizing old printed books. Probably just as commercial, but it feels better than training an ML algorithm for an international conglomerate.

  • oslac 3 years ago

    This just shows how little consent-based ethics matter (they break down immediately when the other party simply defects).

  • browningstreet 3 years ago

    You’re not. You’re getting access validation for the cost of a test.

  • jsnell 3 years ago

    There is basically no chance the captchas are actually being used for generating training data at this point. The puzzles have not changed for ages. Like, five years? How many billions (trillions?) of labels do you think they have for buses and traffic lights at this point?

    If there was an economic value to using captcha solutions for labeling, somebody would be rotating novel tasks into the mix. But they don't seem to be.

    (And if the goal of running the service was to generate labels, they would not have built solutions to make it possible to pass the captchas without a puzzle, like recaptcha v3.)

    So rest assured, your work in solving the captchas is totally useless, just like you wanted!

    • RjQoLCOSwiIKfpm 3 years ago

      I would rather guess that the emperor has no clothes, i.e. AI is still so bad that it needs insane amounts of training data and hasn't got enough yet.

      • ShamelessC 3 years ago

        That's a fun guess? Do you flip a coin every time you make a decision or...

        I can assure you that if CLIP and ALIGN exist, there is objectively no reason for them to collect what would amount to a dataset for...solving Google CAPTCHAs? Which I'm pretty sure is a solved problem even without the data.

    • gghffguhvc 3 years ago

      It is just continuous QA for Waymo. Measures how well existing ML is working in the real world.

      • jsnell 3 years ago

        That's a great example of something that would require them to mix in novel tasks, rather than recycle the stale traffic light detection puzzles. Because let's be honest, detecting traffic lights is nowhere close to the hard part of self-driving cars. Knowing how well you can do it tells you nothing about how well you can solve the actually difficult things.

        What would a task that uses humans to solve that problem actually look like? I'm guessing it would need to be short videos, not images. And look for things with some ambiguity. "Select any videos where a pedestrian looks like they intend to cross the street".

        • gghffguhvc 3 years ago

          I’m guessing Waymo has very little influence over Google.

          Waymo: “I see you have that hammer, we have a usecase for it”

          Google: “Ok, but we aren’t changing the hammer or how hard it is to use it”

    • neltnerb 3 years ago

      Lately I've seen captchas that ask me to identify things in images that are clearly generated with AI. Like frogs without backs.

      I think at this point it is clear we are not training image recognition so much as providing them with free scoring for their image generation algorithms.

      • jsnell 3 years ago

        Just to be clear, are you saying that you saw the "frogs without backs" puzzle on Recaptcha? Because I definitely have not seen anything but the streetview images there for ages.

        Now, if it is a captcha provider whose advertised business model is to sell access to the users for labeling and split the profits with the website that integrates their captcha (e.g. hCaptcha), then I can believe somebody would submit an image generation eval dataset. But it seems irrelevant to the discussion of whether solving a reCAPTCHA is free work.

        • neltnerb 3 years ago

          I mostly see streetview stuff, but twice I've seen one that had stuff like "which one is a ladybug" or even "which one looks like a blah without a blah".

          This prompt of course could still be for classification of images.

          But then the "ladybugs" often were heavily distorted to the point where they did things like morph into other animals or the background. They did not seem possible to be photos, but I could be wrong. The prompts were very odd.

  • dragonwriter 3 years ago

    > I do NOT consent to working for free for Google to train their AI.

    It's not free.

    In this case, you get access to HN when it is under attack.

    If you don’t consent to those terms, that's your choice, you can wait and come back later.

    • Veen 3 years ago

      Or we can complain, suggest alternatives, and hope that it motivates a change. Hacker News is, after all, a place for conversation—people are entitled to express an opinion.

      • dragonwriter 3 years ago

        Sure, my point is not “don't complain”, but “the current HN usage of CAPTCHA neither compels you to use it without consent, nor proposes that you train Google’s AI for free”.

        That is, it is about the specific content of the complaint, not a meta-level commentary on the appropriateness of complaining about practices you disapprove of.

  • version_five 3 years ago

    I'm pretty sure Google's AI has already reached the information-theoretic limits for recognizing fire hydrants etc., so you're not really training it anymore.

    What bothers me about recaptcha (other than the obvious first order task) is that I believe it's used to penalize people who don't let google track them, and by extension to make other browsers look worse. It's an abuse of their market power.

    • neltnerb 3 years ago

      I am not sure that's the "intent" but it sure is a suspiciously advantageous (for them) side effect.

      Like how I gave up using protonmail because my emails kept getting classified as spam by anyone using gmail or gmail-backed organizational email.

      • nokya 3 years ago

        I'm on the opposite end of the spectrum: I think the intent to collect data is at least above 50%, meaning that gathering information on individuals visiting third-party platforms has taken precedence over training their model.

        Also, I think we shouldn't underestimate the monetization value of being able to target "HN users" for advertising. From the moment we are flagged, Google can exploit this data point for targeted advertisement on any other website or app.

        This information should be given at the highest cost possible:)

  • paxys 3 years ago

    If you don't consent to it then don't fill it out. Plus CAPTCHAs haven't been used for ML training for many years now.

    • gghffguhvc 3 years ago

      They are being used by Waymo for continuous QA. Basically just checking their ML is still working well.

      • ShamelessC 3 years ago

        So they're using them to verify personhood, not to label a dataset...

  • 41amxn41 3 years ago

    Then create an open source, non-profit CAPTCHA and make money via donations. Win win.

    • cassianoleal 3 years ago

      And use it to feed open source AI models. Win win win.

user764743 3 years ago

Are there no viable alternatives more respectful of privacy than Google's recaptcha? Seems like an anti-user choice to me.

  • traceroute66 3 years ago

    > Are there no viable alternatives more respectful of privacy than Google's recaptcha?

    Two milliseconds on Google will lead you to hCaptcha[1]

    [1]https://www.hcaptcha.com

    • mdaniel 3 years ago

      > more respectful of privacy than Google

      > Two milliseconds on Google

      can't tell if this is satire or not

  • britneybitch 3 years ago

    There are alternatives:

    https://www.hcaptcha.com/

    https://blog.cloudflare.com/turnstile-private-captcha-altern...

    Unfortunately if you're a user (as opposed to a website author) you don't get a choice.

    • Semaphor 3 years ago

      hcaptcha started out nice, but it quickly got way worse. Maybe more privacy, but it wants me to solve so many dumb puzzles, while Google is usually happy with only one.

      If captchas weren’t a rare-ish occurrence for me, I’d buy a nopeCha subscription to solve that shit for me.

      • eatbots 3 years ago

        This is entirely configurable by the site owner. hCaptcha has entirely passive score-based detection, 99.9% passive mode, and more aggressive options as needed.

        (disclosure: work there)

        • Semaphor 3 years ago

          Passive scores rarely work for me, because I block trackers and as many third-party things as I can get away with. But I guess I’ll believe you and be slightly less annoyed with hCaptcha.

  • gabriel34 3 years ago

    Hcaptcha does pretty well in privacy for general purpose sites

  • arbol 3 years ago

    We're working on a distributed, privacy-focused captcha system at Prosopo. Captchas are served by a network of providers, so there's no central data store. We're going live this quarter and you can sign up for updates on https://prosopo.io

robgibbons 3 years ago

Upon upvoting this submission, I was prompted to login which included a captcha. Possibly due to use of a VPN. Are you using a VPN?

  • yborg 3 years ago

    I just got it and I'm not on a VPN. Would be nice if dang or someone just announced that this is in place now.

  • neltnerb 3 years ago

    I do often use a VPN, but not this time, I'm just connecting from MIT's network.

    As Dang says it sounds like it's just a short term attack mitigation, glad to hear it's not intended as a permanent feature.

  • greggarious 3 years ago

    I currently avoid VPNs, since it is my understanding using them can collapse all your Tor Browser circuits, ruining the utility of the tool.

    https://web.archive.org/web/20211120193211/https://matt.trau...

    If you need to geo-shift or pirate or something less risky than being a literal dissident, I recommend being VERY careful not to do it where you sleep in case you slip up.

    (And remember: your DNS queries go to the exit node or the VPN, not the ISP.)

greggarious 3 years ago

Hi Folks!

I was not prompted to do a CAPTCHA logging in on the clearnet, but this may be my last batch of posts I do if that ever changes... for I will never consent to asking GOOG if I can post here.

Let us set aside the myriad of issues with visual CAPTCHAs and how they exclude folks with disabilities such as blindness.

There are other solutions like hCaptcha[1] that do not use GOOG, a company which has strayed so far from its "Don't be evil" mission that it went from supporting Mozilla via the search deal to moving the Chrome team into the same building, poaching key employees, and aggressively pushing folks so young they can't ask for help without violating COPPA to switch to a browser that would allow Google to monitor them from the cradle to the grave.

I greatly sympathize with the goal of an authentic dialog... trust me.

But using GOOG to accomplish it is not going to do that.

(The true threats to HN, like any democratic space, come not from automatons, but human beings. Only when you stop giving undue attention to the wrong... metrics... will you find the ecstatic truths you claim to seek.)

- "Greg"

Filligree 3 years ago

I use Safari almost universally, and therefore can't pass a recaptcha challenge. For the moment I remain logged in, but I suppose once the cookie dies, I won't be able to get back here.

  • intelVISA 3 years ago

    Finally, we are released brother.

  • happyopossum 3 years ago

    > I use Safari almost universally, and therefore can't pass a recaptcha challenge

    That doesn't grok - Millions of people use safari daily and pass recaptcha challenges. You probably have a content blocker set too aggressively or something like that preventing you from it - it's not safari.

    • Filligree 3 years ago

      I don't use a content blocker. The only extension in my browser is Bitwarden.

IYasha 3 years ago

If true, that's really, really sad.

chaps 3 years ago

Gross, yep. And since I'm using a VPN I'm thrown into the "give them the most painful captchas just to be sure" bucket.

freitasm 3 years ago

Do what you have to do. Keeping the platform safe is your job. Others might put different values above yours.

If you want a drop-in replacement, there's Cloudflare Turnstile, as mentioned. Otherwise, put the site fully behind Cloudflare. The CDN won't help much due to the nature of the content, but WAF rules can be used to easily turn on an invisible captcha based on rules.

gjsman-1000 3 years ago

Considering the articles and general sentiment we have towards Google and AI training, if this is true HN failed to read the room.

Edit: Yep, it's real. Seriously?

Edit 2: Understandable, see https://news.ycombinator.com/item?id=34313452.

bobobob420 3 years ago

Can confirm: logging in from incognito, I had to match images of cars. Dang, please do not make me match images of cars, buses, and traffic lights. Please!

binarymax 3 years ago

Doesn’t HN use cloudflare? I’d guess that’s the source of the captcha and not something that was implemented by HN on purpose.

FeistySkink 3 years ago

Just tried re-signing in and got reCAPTCHA without any VPNs on Firefox/Linux.

justinator 3 years ago

I logged out, and upon login recaptcha made me solve FIVE puzzles. JFC.

telis 3 years ago

At least use FunCaptcha by Arkose, HN gods.