What is a CAPTCHA, and how does it work?

Written by: Lisa Whelan

CAPTCHAs are tests that determine whether traffic to a website originates from a human or a bot. They work by providing challenges that are difficult for computers to solve but easy for humans. Websites that use CAPTCHAs only allow you to access their website once you have solved the challenge, which helps them to block bot traffic.

What is CAPTCHA?

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. These tests act as a verification mechanism to check that traffic to a website originates from a real human, and not a robot. Sometimes they are called “I am not a robot” tests.

CAPTCHA tests can appear at various points during your interaction with a website, such as during login attempts, form submissions, or when your browsing activity mimics bot traffic – for example, by refreshing your browser many times or quickly opening a lot of links.

Websites can use many different kinds of CAPTCHA tests. They can be:

  • Text-based: Requiring you to decipher and input text from a distorted image
  • Image-based: Asking you to select images that match a given prompt (e.g., "Select all images with traffic lights")
  • Audio-based: Providing an audio challenge for you to transcribe
  • Puzzle or math-based: Asking you to solve a simple puzzle or math problem

What is reCAPTCHA?

ReCAPTCHA is a CAPTCHA service from Google that many websites use. While reCAPTCHA v1 required you to decipher and input text from a distorted image, reCAPTCHA v2 presents you with a much simpler “I am not a robot” checkbox. ReCAPTCHA v2 analyzes your behavior, such as how you move the mouse cursor, to establish whether you are a human or a robot.

If Google’s initial analysis is inconclusive, you may be presented with image-based challenges where you need to select images that match a given prompt (for example, "Select all images with traffic lights").

The latest version of reCAPTCHA (v3) uses a score-based system that does not interrupt your browsing. Instead, it runs in the background and assigns a score to your behavior based on the perceived likelihood of your activity originating from a bot or human. ReCAPTCHA v3 assigns a score from zero to one, with zero meaning the activity is perceived as almost certainly a bot, and human likelihood increasing as the score gets closer to one.
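reCAPTCHA v3 only produces the score. The website still has to decide what to do with it. The sketch below shows one way a server might do that in Python. It assumes Google's documented `siteverify` endpoint (`https://www.google.com/recaptcha/api/siteverify`, with `secret`, `response` and optional `remoteip` parameters) and uses 0.5, the threshold Google suggests as a starting point. The `decide` function and its three outcomes are a hypothetical policy, not part of the reCAPTCHA API.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None) -> dict:
    """Send the token from the page to Google's siteverify endpoint; return the JSON reply."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip
    body = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=body, timeout=10) as resp:
        return json.load(resp)


def decide(result: dict, expected_action: str, threshold: float = 0.5) -> str:
    """Hypothetical policy mapping a siteverify result to what the site does next."""
    if not result.get("success"):
        return "block"        # token invalid, expired, or already used
    if result.get("action") != expected_action:
        return "block"        # token was issued for a different page action
    if result.get("score", 0.0) >= threshold:
        return "allow"        # scores near 1.0 suggest a human
    return "challenge"        # low score: e.g. show a v2 checkbox or ask for MFA
```

For example, `decide(verify_token(SECRET, token_from_form), "login")` (with a hypothetical `SECRET` and form token) returns `"allow"`, `"challenge"` or `"block"`. Falling back to a visible challenge instead of blocking keeps low-scoring humans from being locked out.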

What are CAPTCHAs used for?

Websites use CAPTCHAs to detect and block automated traffic. High-traffic websites like social media platforms, ecommerce websites, and online banking services are prime targets for automated abuse. Bots can flood these platforms with spam or fraudulent activity, potentially causing significant harm to the business and its customers.

In this context, CAPTCHAs can serve several important purposes:

  • Prevent automated abuse: CAPTCHAs block bots and automated scripts from exploiting website functionalities, such as submitting forms, creating fake accounts, or spamming comment sections.
  • Protect against brute force attacks: By requiring human verification, CAPTCHAs stop bots from making repeated, automated attempts to guess passwords or other login credentials.
  • Ensure accurate data collection: CAPTCHAs ensure that responses come from real people, maintaining the integrity and accuracy of data from polls and surveys.
  • Prevent ticket scalping: CAPTCHAs make it harder for bots to rapidly purchase large quantities of event tickets, which scalpers resell at higher prices.
  • Reduce spam: By verifying that users are human, CAPTCHAs prevent automated systems from sending spam emails or posting spam comments.
  • Protect ecommerce sites: CAPTCHAs prevent bots from adding items to shopping carts and depleting stock, ensuring genuine customers have fair access to products.

Types of CAPTCHA

There are many different kinds of CAPTCHA tests. They can be text-based, image-based, or audio-based. Some CAPTCHAs are interactive puzzles or simple math problems.

ReCAPTCHA

Google’s reCAPTCHA is probably the most widely used CAPTCHA service. It aims to protect websites from bot traffic without inconveniencing the user. You will probably be familiar with reCAPTCHA’s image-based challenges and checkbox verification.

Text-based CAPTCHA

The most common form of text-based CAPTCHA requires you to decipher and input text from a distorted image. For example, there may be twisted letters and numbers that are hard for bots to read, but recognizable to humans.

Some text-based CAPTCHAs ask common knowledge questions that bots typically lack (for example, “What color is the sky?”) and require a correct text-based response.

Image-based CAPTCHA

Image-based CAPTCHAs require you to solve a puzzle based on an image. Typically there will be a prompt, for example, “Select all images with traffic lights”. This includes reCAPTCHA’s image-based puzzles.

Audio CAPTCHA

Audio CAPTCHAs provide an audio clip of spoken words or numbers that you must transcribe. This type is especially useful for people who are visually impaired, offering an alternative to text or image-based CAPTCHAs.

Math-based CAPTCHA

Math-based CAPTCHAs ask you to solve a simple math problem, such as “What is 3 + 4?” Sometimes these CAPTCHAs will use the same distortion strategy as text-based CAPTCHAs so bots can’t read the question. These problems are easy for humans to solve but can block basic bots.
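A math CAPTCHA needs only two pieces on the server: generating a question and checking the answer. This minimal Python sketch uses hypothetical function names, sticks to single-digit addition and subtraction, and leaves out the distortion step. The expected answer should stay on the server (for example, in the session) and never be sent to the page.

```python
from __future__ import annotations

import random


def make_math_challenge(rng: random.Random) -> tuple[str, int]:
    """Return a question such as 'What is 3 + 4?' and its expected answer.

    Pass random.SystemRandom() in production; a seeded Random is handy for tests.
    """
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    if rng.random() < 0.5:
        return f"What is {a} + {b}?", a + b
    a, b = max(a, b), min(a, b)          # keep the answer non-negative
    return f"What is {a} - {b}?", a - b


def check_math_answer(user_input: str, expected: int) -> bool:
    """Ignore surrounding whitespace; reject anything that is not an integer."""
    try:
        return int(user_input.strip()) == expected
    except ValueError:
        return False
```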

Puzzle CAPTCHA

Puzzle CAPTCHAs require you to complete a task such as dragging and dropping pieces to form a complete image. This type of CAPTCHA relies on human cognitive skills and interaction, making it difficult for bots to solve. For example, these CAPTCHAs can analyze how precise your solution is, and how quickly you solved the problem, and compare it to predictable bot behavior.
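The precision and timing checks described above can be sketched as a server-side function for a slider puzzle. This illustration makes several assumptions: the server knows where the gap is (`target_x`), the browser reports where the piece was dropped, and the tolerance and timing thresholds are hypothetical values that do not come from any real service. Timing uses server-side timestamps because anything the browser reports about itself can be forged.

```python
# Hypothetical thresholds; a real service would tune these on observed traffic.
TOLERANCE_PX = 5          # how far from the gap a drop may land
MIN_SOLVE_MS = 300        # faster than this is unlikely for a human
MAX_SOLVE_MS = 60_000     # slower than this: treat the challenge as stale


def check_slider(target_x: int, dropped_x: int,
                 issued_at: float, received_at: float) -> str:
    """Judge one attempt; issued_at/received_at are server timestamps in seconds."""
    elapsed_ms = (received_at - issued_at) * 1000
    if not MIN_SOLVE_MS <= elapsed_ms <= MAX_SOLVE_MS:
        return "suspicious timing"
    if abs(dropped_x - target_x) > TOLERANCE_PX:
        return "wrong position"
    return "pass"
```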

Honeypot CAPTCHA

Honeypot CAPTCHAs add hidden fields to a form. Humans cannot see these fields, so they leave them empty. Bots often fill in every field, including the invisible ones, which flags the submission as automated.
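A honeypot check is simple to express on the server. In this minimal Python sketch, the field name `website` and the HTML in the comment are hypothetical. Any field name that looks plausible to a bot would work.

```python
from __future__ import annotations

# The form would contain a field like this, hidden from sighted users by CSS and
# kept away from keyboard and screen-reader users by tabindex and aria-hidden:
#   <input type="text" name="website" tabindex="-1" autocomplete="off"
#          aria-hidden="true" style="position:absolute; left:-9999px">
HONEYPOT_FIELD = "website"   # hypothetical field name


def is_probably_bot(form: dict[str, str]) -> bool:
    """Flag the submission if the hidden honeypot field contains anything."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

Bots that post only the fields they know about will not trip this check, so a honeypot works best as one signal among several.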

Logical CAPTCHA

Logical CAPTCHAs present logic-based questions or riddles that require human-level reasoning abilities. For example, “If you have two apples and take away one, how many do you have?” These questions are designed to be simple for humans but challenging for bots.

How does CAPTCHA work?

CAPTCHAs leverage differences between humans and machines. Humans are good at pattern recognition, visual perception, and language understanding, while machines can struggle with these tasks.

All CAPTCHAs follow a similar process:

  1. Challenge presented: The website or app displays a CAPTCHA challenge.
  2. User response: You complete the challenge (for example, by typing the text, selecting images, or ticking the box).
  3. Analysis: The system analyzes your response.
    1. In simple CAPTCHAs, the system compares the response to the correct answer.
    2. In more advanced CAPTCHAs, the system may analyze mouse movements, typing patterns, or other behavioral factors.
  4. Verification: If the response is deemed human-like, you are allowed to proceed. Otherwise, you may be asked to try again or blocked.
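The four steps above can be sketched as a small server-side store. It issues a challenge, then checks the response exactly once before the challenge expires. The class name, the 120-second lifetime, and the case-insensitive comparison are assumptions made for illustration. A real site would keep this state in its session store or cache, not in process memory.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

CHALLENGE_TTL_S = 120  # hypothetical lifetime of a challenge, in seconds


def _digest(text: str) -> bytes:
    # Normalize, then hash to fixed-length bytes so compare_digest works for any input.
    return hashlib.sha256(text.strip().lower().encode()).digest()


class ChallengeStore:
    """In-memory store of pending challenges (illustration only)."""

    def __init__(self) -> None:
        self._pending: dict[str, tuple[bytes, float]] = {}

    def issue(self, answer: str, now: float) -> str:
        """Step 1: remember the expected answer and return an opaque challenge id."""
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = (_digest(answer), now + CHALLENGE_TTL_S)
        return challenge_id

    def verify(self, challenge_id: str, response: str, now: float) -> bool:
        """Steps 3-4: check the response; each challenge allows a single attempt."""
        entry = self._pending.pop(challenge_id, None)   # removed, so it cannot be replayed
        if entry is None:
            return False
        expected, expires_at = entry
        if now > expires_at:
            return False
        return hmac.compare_digest(_digest(response), expected)
```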

In reality, modern anti-bot solutions (including CAPTCHAs and other techniques) don’t rely solely on simple challenge-response processes. Instead, they collect and analyze a number of data points from your interaction with the website, which they use to build a profile about you and score the likelihood of you being a bot or a human.

The factors anti-bot solutions analyze include:

  • IP address quality
  • TCP fingerprints
  • TLS/SSL fingerprints
  • Web browser fingerprints
  • IP address leaks (DNS, WebRTC, etc.)
  • Behavior metrics (mouse movements, timings, etc.)

This means that sophisticated anti-bot solutions can detect and block bots based on these factors before they even present a CAPTCHA challenge. For example, a bot could be blocked outright if it uses a known malicious IP address or has an unusual TCP fingerprint. In this case, the bot won’t have the chance to attempt to solve a CAPTCHA.
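To show what a TLS fingerprint looks like, the sketch below computes one using JA3, a widely used method. JA3 takes fields from the TLS ClientHello, joins them into a string, and hashes that string with MD5. JA3 is only an example here; anti-bot vendors may use other methods. The sketch assumes the ClientHello has already been parsed into integers, since parsing is out of scope.

```python
import hashlib

# GREASE values (RFC 8701) are random placeholders that JA3 ignores.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3(version, ciphers, extensions, curves, point_formats):
    """Return the JA3 string and its MD5 hash from parsed ClientHello fields."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_string = ",".join([str(version), join(ciphers), join(extensions),
                           join(curves), join(point_formats)])
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()
```

Many clients with the same TLS library produce the same hash. A hash that matches a known automation tool while the User-Agent claims to be a mainstream browser is the kind of mismatch that can get a request blocked before any CAPTCHA appears.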

Why do you see CAPTCHAs?

A risk assessment algorithm decides whether to present you with a CAPTCHA or not. The algorithm can consider multiple factors, including:

  • Your behavior: You can trigger a CAPTCHA by displaying unusual behavior, such as rapidly making multiple form submissions or multiple failed login attempts.
  • IP address reputation: The algorithm may decide to present you with a CAPTCHA if your IP address is known for suspicious or malicious activity.
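A risk assessment like the one described above can be sketched as a weighted score over a few signals. Every weight, signal name, and threshold below is hypothetical. Production systems typically learn these values from traffic data instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    failed_logins_last_hour: int
    submissions_last_minute: int
    ip_on_blocklist: bool      # IP reputation: known for malicious activity
    ip_is_datacenter: bool     # IP reputation: hosting range, not a home connection


def risk_score(s: RequestSignals) -> float:
    """Combine signals into a score from 0 (low risk) to 1 (high risk)."""
    score = 0.0
    score += min(s.failed_logins_last_hour, 5) * 0.08    # contributes at most 0.40
    score += min(s.submissions_last_minute, 10) * 0.03   # contributes at most 0.30
    if s.ip_on_blocklist:
        score += 0.5
    if s.ip_is_datacenter:
        score += 0.2
    return min(score, 1.0)


def action_for(score: float) -> str:
    """Hypothetical thresholds: let low risk through, challenge the middle, block the rest."""
    if score < 0.3:
        return "allow"
    if score < 0.7:
        return "captcha"
    return "block"
```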

CAPTCHA alternatives

Although CAPTCHAs are a common tool for websites to use to prevent automated traffic, they are not the only option. Some websites use other anti-bot solutions to provide broader bot management and protection mechanisms, such as:

  • DataDome: Uses machine learning and behavioral analysis to detect bots without requiring legitimate users to interact with a challenge-response test.
  • Akamai: Analyzes traffic patterns and uses sophisticated algorithms to identify and block bots. This solution aims to be non-intrusive to legitimate users and avoid traditional CAPTCHA challenges.
  • Imperva: Uses behavioral analysis and threat intelligence to detect and block bots without interrupting the user experience, aiming to eliminate the need for CAPTCHAs.
  • Cloudflare: Uses rate limiting (restricting the number of requests from a single IP address), behavioral analysis (monitoring user interactions for suspicious patterns), and JavaScript challenges to manage bot traffic and allow legitimate users to access the site without encountering CAPTCHAs.
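Rate limiting, which Cloudflare uses among other techniques, can be illustrated with a sliding-window limiter keyed by IP address. This is a generic sketch of the technique, not Cloudflare's implementation. The class name and limits are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each IP address."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()              # forget requests that left the window
        if len(hits) >= self.limit:
            return False                # over the limit: reject, delay, or challenge
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout. A site could also choose to count them, which penalizes aggressive retry loops more heavily.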

Can you bypass CAPTCHAs?

Yes, you can bypass CAPTCHAs, but the difficulty and methods vary depending on the type of CAPTCHA and its complexity. As CAPTCHA technology evolves, so do the techniques for bypassing it.

Sophisticated bots can sometimes solve CAPTCHAs using machine learning, image recognition, and other techniques. They can be trained on massive datasets of solved CAPTCHAs and learn to identify patterns. 

The future of CAPTCHA

CAPTCHA use is shifting towards more seamless and user-friendly solutions, as illustrated by the introduction of Private Access Tokens (PATs). PATs are cryptographic tokens that act as proof of legitimacy for your device, rather than your IP address. Trusted parties (like Apple, Google, or Cloudflare) can issue cryptographic tokens to your device when it meets specific criteria – like not exhibiting bot-like behavior. The websites you visit can then verify the token using the issuer’s public key, without collecting any personal information from you or presenting you with a CAPTCHA.

PATs are a relatively new technology; Apple officially introduced them in June 2022 at the Worldwide Developers Conference (WWDC). However, major companies like Apple, Google, and Cloudflare are actively supporting and promoting the use of PATs, and as more websites implement them, we can expect PATs to become a more common way of verifying human activity.
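PATs build on the Privacy Pass protocol. Its publicly verifiable token type uses RSA blind signatures: the issuer signs a token without seeing it, and a website later checks the signature using only the issuer's public key. The toy Python sketch below illustrates that idea with textbook RSA and tiny primes. It is insecure, it leaves out the padding, device attestation, and token formats the real protocol specifies, and all names in it are hypothetical.

```python
from __future__ import annotations

import hashlib
import math
import secrets

# Textbook RSA key from tiny primes (p=61, q=53). Insecure: illustration only.
N, E, D = 3233, 17, 2753


def h(token: bytes) -> int:
    """Map a token to an integer mod N (real schemes use proper padding, e.g. RFC 9474)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(token: bytes) -> tuple[int, int]:
    """Client: hide the token behind a random factor r before sending it to the issuer."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (h(token) * pow(r, E, N)) % N, r


def issuer_sign(blinded: int) -> int:
    """Issuer: after its own checks on the device, sign without learning the token."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Client: remove r, leaving an ordinary RSA signature on the token."""
    return (blind_sig * pow(r, -1, N)) % N


def origin_verify(token: bytes, sig: int) -> bool:
    """Website: check the signature with the issuer's public key (N, E) alone."""
    return pow(sig, E, N) == h(token)
```

Because the issuer only ever sees the blinded value, it cannot later link the token a website receives back to the device it signed for. This is the property that lets the website trust the token without learning who you are.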

How does SOAX help to bypass CAPTCHA?

SOAX proxies mask the origin of your requests and distribute them across multiple IP addresses, so that websites find it more difficult to detect your automated traffic. Our Web Unblocker takes things a step further by automatically retrying failed requests and managing cookies and headers to avoid triggering CAPTCHA challenges.

Many businesses rely on bypassing CAPTCHAs so they can quickly and reliably scrape public data from the web to help inform their decisions. You can access SOAX’s purpose-built scraping APIs for ecommerce, social media, and SERP data, or sign up to try our new self-building AI Scraper that uses machine learning to solve all kinds of CAPTCHAs.

We use several sophisticated techniques to avoid detection and ensure successful data extraction, including:

  • Automatic proxy rotation
  • User-agent rotation
  • Headless browsers and browser automation
  • Solving CAPTCHAs automatically
  • JavaScript rendering

Glossary

Turing test

The foundation of CAPTCHA is the Turing test, a concept in artificial intelligence that evaluates a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. CAPTCHA is often called a reverse Turing test: a computer, rather than a human, administers the test, and the goal is to prove that you are human.

Challenge-response

CAPTCHAs use a process called challenge-response. This means they present a challenge (for example, “select all images with cars”) and require you to provide the correct response. The challenge is designed to be easy for humans but hard for bots to solve automatically.

Distortion

Text-based and image-based CAPTCHAs often distort characters or add background noise. This makes it harder for bots to decode text with Optical Character Recognition (OCR) software, or to identify objects with image-recognition software.

Behavior analysis

Advanced CAPTCHAs go beyond simple challenge-response and analyze the user's behavior during the interaction. They track mouse movements, typing speed, and other patterns. This data is used to create a "behavioral profile" that can be compared to known bot patterns.
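As a rough illustration of behavioral features, the sketch below computes two simple properties of one mouse movement from sampled (x, y, time) points: how straight the path is, and how much the speed varies along it. Human movements tend to curve and to speed up and slow down, while a naive script may move in a straight line at constant speed. The feature names, thresholds, and the `looks_scripted` rule are hypothetical. Real systems combine many more signals, usually with machine learning.

```python
from __future__ import annotations

import math
import statistics


def path_features(points: list[tuple[float, float, float]]) -> dict[str, float]:
    """Features of one movement from (x, y, t) samples, t in seconds; needs 2+ points."""
    segments = list(zip(points, points[1:]))
    path_len = sum(math.dist(p[:2], q[:2]) for p, q in segments)
    straight = math.dist(points[0][:2], points[-1][:2])
    speeds = [math.dist(p[:2], q[:2]) / (q[2] - p[2])
              for p, q in segments if q[2] > p[2]]
    mean_speed = statistics.fmean(speeds) if speeds else 0.0
    return {
        # 1.0 means the pointer moved in a perfectly straight line
        "straightness": straight / path_len if path_len else 1.0,
        # coefficient of variation of speed; 0.0 means perfectly constant speed
        "speed_variation": statistics.pstdev(speeds) / mean_speed if mean_speed else 0.0,
    }


def looks_scripted(f: dict[str, float],
                   min_straightness: float = 0.99, max_variation: float = 0.05) -> bool:
    """Hypothetical rule: a straight, constant-speed movement is treated as bot-like."""
    return f["straightness"] >= min_straightness and f["speed_variation"] < max_variation
```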

Frequently asked questions

What does CAPTCHA stand for?

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

Is CAPTCHA safe to use?

Generally, CAPTCHA is safe to use as it's designed to protect websites and the people that use them. However, some CAPTCHAs track your behavior, raising privacy concerns.

What is the most common CAPTCHA?

While image-based CAPTCHAs are common, reCAPTCHA, specifically reCAPTCHA v2 and v3, developed by Google, is arguably the most widely used CAPTCHA system across the internet.

What is a Cloudflare challenge?

A Cloudflare challenge is another anti-bot security measure used to verify web traffic. Like CAPTCHAs, these challenges help websites distinguish between human users and automated bots. They include various methods, such as managed challenges, JavaScript challenges, and interactive challenges.

Lisa Whelan

Lisa is a content professional, specializing in tech and cybersecurity. She's here to share valuable insights and break down complex technical concepts for the SOAX audience.
