Hacking HCaptcha through simulated clicks

In the previous article we walked through cracking ReCaptcha with simulated clicks. Besides ReCaptcha, there is another CAPTCHA whose verification process is very similar: HCaptcha.

ReCaptcha is a Google product and cannot be used in China, so HCaptcha has become the better choice for some international sites.

So today we’ll learn about HCaptcha and its simulated click cracking process.

HCaptcha

Let’s first take a look at HCaptcha’s verification flow. Its demo website is https://democaptcha.com/demo-form-eng/hcaptcha.html. After opening it, we can see the following CAPTCHA entry page.

HCaptcha

It looks very similar to ReCaptcha, and the verification process is also very similar.

When we click on the checkbox, HCaptcha first assesses the risk of the current user through its risk analysis engine. A low-risk user passes directly. Otherwise, HCaptcha pops up a dialog box and asks us to answer the question in it, similar to the following.

HCaptcha

At this point HCaptcha gives us a question. In the example above it is “Please click on each picture containing an airplane”, so we need to select every picture containing an airplane from the nine pictures shown. If none of the nine pictures contains an airplane, we click the “Skip” button. If some do, we select all of them, and the skip button turns into a “Verify” button. After the verification passes, we see the following success state.

HCaptcha successful

Isn’t the overall process very similar to ReCaptcha?

In fact, it is simpler than ReCaptcha: the captcha grid is always 3x3, never 4x4, and clicking a picture does not load a new picture in its place for a second round of selection. So the cracking approach is also relatively simple.

How to crack

Sorting out the whole process gives us the overall idea of the crack. There are two key points.

The first is to read the text of the question at the top, so that we know what to click on.
The second is to work out which images match that text, and then simulate clicks on them in turn.

It sounds simple, but the second point is the hard part: how do we know which images match the text?

We learned about using YesCaptcha for image recognition in the previous ReCaptcha article. Besides ReCaptcha, YesCaptcha also supports recognition of HCaptcha images, so we can use it to find out which images match the question.

Let’s try it out.

YesCaptcha

Before using the service we need to register at https://yescaptcha.com/auth/register. After registering an account, you can get an account key, the ClientKey, in the dashboard. Save it for later.

YesCaptcha

Then we can check the official documentation, which describes the API we need. The general content is as follows.

First there is an API for creating tasks, at https://api.yescaptcha.com/createTask. Let’s look at its request parameters.

yescaptcha createTask

Here we need to pass in these parameters.

type: the task type, here HCaptchaClassification
queries: the Base64 encodings of the captcha images, passed in directly as a list
question: the question to recognize, which is simply the whole sentence of the question
corrdinate: a switch that controls the format of the result. By default the API returns a true / false result for each picture, that is, whether the picture at each position matches the question. If this parameter is added, the API returns the indices of the matching pictures instead.

For example, here we can POST such a content to the server with the following structure.

{
    "clientKey": "cc9c18d3e263515c2c072b36a7125eecc078618f",
    "task": {
        "type": "HCaptchaClassification",
        "queries": [
            "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8Uw...",
            "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8Uw...",
            ...
            "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8Uw..."
        ],
        "question": "请单击每个包含卡车的图像。" // "Please click each image containing a truck." Upload the whole sentence of the question directly
    }
}
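
As a rough illustration of how such a request body could be assembled, here is a minimal sketch using only the standard library. The helper name build_classification_payload, the placeholder key and the fake image bytes are assumptions made for this example; only the field names come from the request structure above.

```python
import base64
import json


def build_classification_payload(client_key, image_bytes_list, question):
    """Build a createTask request body for an HCaptchaClassification task."""
    # Each image is sent as a Base64 string, in grid order.
    queries = [base64.b64encode(data).decode("ascii") for data in image_bytes_list]
    return {
        "clientKey": client_key,
        "task": {
            "type": "HCaptchaClassification",
            "queries": queries,
            "question": question,
        },
    }


payload = build_classification_payload(
    "YOUR_CLIENT_KEY",                   # placeholder, not a real key
    [b"fake-image-1", b"fake-image-2"],  # stand-ins for real image bytes
    "Please click each image containing a truck.",
)
print(json.dumps(payload, indent=4))
```

The resulting dictionary can be passed as the JSON body of the POST request.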

The server then returns a response like this.

{
    "errorId": 0,
    "errorCode": "",
    "status": "ready",
    "solution": {
        "objects": [true, false, false, true, true, false, true, true], // whether each image is a target
        "labels": ["truck", "boat", "boat", "truck", "truck", "airplane-right", "truck", "truck"] // the label recognized for each image
    },
    "taskId": "5aa8be0c-94a5-11ec-80d7-00163f00a53c"
}

We can see that the objects field in the solution field of the returned result contains a list of true and false, which represents whether each image matches the target or not.

Once we know this result, we just need to simulate the click on the image that returns true.
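
To make the shape of that result concrete, here is a small sketch that pulls the matching positions out of a parsed response. The function name matching_indices is hypothetical, and treating a non-zero errorId as a failure is an assumption based only on the sample above, where a successful response carries errorId 0.

```python
def matching_indices(response):
    """Return the positions of the images flagged true in a parsed response."""
    # Assumption: errorId is 0 on success, as in the sample response.
    if response.get("errorId") != 0:
        raise ValueError(f"recognition failed: {response.get('errorCode')!r}")
    objects = response.get("solution", {}).get("objects") or []
    return [index for index, is_match in enumerate(objects) if is_match]


sample = {
    "errorId": 0,
    "errorCode": "",
    "status": "ready",
    "solution": {
        "objects": [True, False, False, True, True, False, True, True],
    },
}
print(matching_indices(sample))  # [0, 3, 4, 6, 7]
```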

Code base implementation

Okay, so with the basic idea in mind, let’s start implementing the whole process in Python. Here we’ll take the https://democaptcha.com/demo-form-eng/hcaptcha.html website as a sample to explain the whole process of identifying and simulating clicks.

Encapsulation of recognition methods

First we implement a wrapper around the above task API, and write a class to start with.

from loguru import logger
from app.settings import CAPTCHA_RESOLVER_API_KEY, CAPTCHA_RESOLVER_API_URL
import requests


class CaptchaResolver(object):

    def __init__(self, api_url=CAPTCHA_RESOLVER_API_URL, api_key=CAPTCHA_RESOLVER_API_KEY):
        self.api_url = api_url
        self.api_key = api_key

    def create_task(self, queries, question):
        # submit the images and the question; returns the parsed JSON, or None on failure
        logger.debug(f'start to recognize image for question {question}')
        data = {
            "clientKey": self.api_key,
            "task": {
                "type": "HCaptchaClassification",
                "queries": queries,
                "question": question
            }
        }
        try:
            response = requests.post(self.api_url, json=data)
            result = response.json()
            logger.debug(f'captcha recognize result {result}')
            return result
        except requests.RequestException:
            # logger.exception already records the traceback
            logger.exception('error occurred while recognizing captcha')
            return None

OK, here we define a class CaptchaResolver that takes two parameters. One is api_url, which corresponds to the https://api.yescaptcha.com/createTask API address, and the other is api_key, which is the ClientKey introduced above.

Then we define a create_task method that takes two parameters, the first one, queries, is the Base64 encoding of each captcha image, and the second one, question, is the whole sentence of the question to be recognized. Here the whole request is simulated with requests, and finally the corresponding JSON is returned.

Basic framework

OK, so let’s use Selenium to open the example site, simulate a click to trigger the CAPTCHA, and then recognize the CAPTCHA.

First write a general framework.

import re
import time
from random import random

import requests
from loguru import logger
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.common.action_chains import ActionChains
from app.captcha_resolver import CaptchaResolver


class Solution(object):
    def __init__(self, url):
        self.browser = webdriver.Chrome()
        self.browser.get(url)
        # wait up to 10 seconds when looking up nodes
        self.wait = WebDriverWait(self.browser, 10)
        self.captcha_resolver = CaptchaResolver()

    def __del__(self):
        time.sleep(10)
        self.browser.close()

Here the constructor first initializes a Chrome browser object and calls its get method to open the example website. It then creates a WebDriverWait object and a CaptchaResolver object, which will be used later for node lookup and captcha recognition, respectively.

iframe toggle support

The next step is to simulate a click on the CAPTCHA entry to trigger the CAPTCHA.

Looking at this captcha, we see that it is very similar to ReCaptcha in that the entry point is loaded in an iframe, and the corresponding iframe looks like this.

hcaptcha

In addition, the pop-up captcha image is inside another iframe, as shown in the figure.

hcaptcha

Selenium needs to switch into the corresponding iframe before it can find nodes inside it. Otherwise it cannot find the node, let alone simulate a click on it.

So here we define several utility methods for switching to the iframe of the entry and to the iframe of the CAPTCHA itself, with the following code.

def get_captcha_entry_iframe(self) -> WebElement:
    # always start from the top-level document before looking up an iframe
    self.browser.switch_to.default_content()
    captcha_entry_iframe = self.browser.find_element(
        By.CSS_SELECTOR, '.h-captcha > iframe')
    return captcha_entry_iframe

def switch_to_captcha_entry_iframe(self) -> None:
    captcha_entry_iframe: WebElement = self.get_captcha_entry_iframe()
    self.browser.switch_to.frame(captcha_entry_iframe)

def get_captcha_content_iframe(self) -> WebElement:
    self.browser.switch_to.default_content()
    captcha_content_iframe = self.browser.find_element(
        By.XPATH, '//iframe[contains(@title, "Main content")]')
    return captcha_content_iframe

def switch_to_captcha_content_iframe(self) -> None:
    captcha_content_iframe: WebElement = self.get_captcha_content_iframe()
    self.browser.switch_to.frame(captcha_content_iframe)

In this case, we only need to call switch_to_captcha_content_iframe to find the content inside the captcha image and switch_to_captcha_entry_iframe to find the content inside the captcha entry.

Trigger CAPTCHA

OK, so the next step is to trigger the captcha by simulating a click on the captcha entry shown here.

CAPTCHA

The implementation is simple and the code is as follows.

def trigger_captcha(self) -> None:
    self.switch_to_captcha_entry_iframe()
    captcha_entry = self.wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, '#anchor #checkbox')))
    captcha_entry.click()
    time.sleep(2)
    self.switch_to_captcha_content_iframe()
    captcha_element: WebElement = self.get_captcha_element()
    # is_displayed is a method and must be called
    if captcha_element.is_displayed():
        logger.debug('triggered captcha successfully')

Here we first call switch_to_captcha_entry_iframe to switch the iframe, then find the node corresponding to the entry box and click on it.

After clicking on it, we call switch_to_captcha_content_iframe to switch to the iframe corresponding to the captcha itself and find out if the node corresponding to the captcha itself has been loaded, and if it has, then it proves that the trigger is successful.

Find the recognition target

OK, so now the captcha might look like this.

captcha

So next we need to do two things: find the match target, which is the question itself, and save each captcha image and convert it to Base64 encoding.

So how do we find the question? With Selenium’s regular node lookup. The get_captcha_target_text method called in the code below does this and returns the complete text of the question shown in the image above.

Captcha recognition

Next, we need to download and convert each image to Base64 encoding, and let’s observe its HTML structure.

Captcha

We can see that each captcha image corresponds to a .task-image node, which contains an .image-wrapper node, which in turn contains an .image node. So how is the image rendered? The .image node has an inline style attribute, and the address of the captcha image is set through the CSS background property.

So, it’s easy to extract the captcha image, we just need to find out the content of the style property of the .image node and extract the url from it.
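
As a small standalone sketch of that extraction step, the snippet below pulls the URL out of a style string with a regular expression. The function name extract_background_url and the example style value are made up for illustration, and the pattern is a slightly more tolerant variant (optional quotes, http or https) than strictly needed.

```python
import re

# Matches url("..."), url('...') or url(...) inside a CSS declaration.
URL_PATTERN = re.compile(r"""url\(\s*["']?(https?://[^"')]+)["']?\s*\)""")


def extract_background_url(style):
    """Return the first image URL found in a style string, or None."""
    match = URL_PATTERN.search(style or "")
    return match.group(1) if match else None


style = 'background: url("https://example.com/captcha/0.jpg") 50% 50% / cover no-repeat;'
print(extract_background_url(style))  # https://example.com/captcha/0.jpg
```

Returning None for a style without a URL lets the caller decide how to handle a missing image instead of failing inside the helper.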

After we get the URL, we download the image, convert it to Base64 encoding, and use captcha_resolver to recognize the content.

So the code can be written as follows.

def verify_captcha(self):
    # get target text
    self.captcha_target_text = self.get_captcha_target_text()
    logger.debug(
        f'captcha_target_text {self.captcha_target_text}'
    )
    # extract all images
    single_captcha_elements = self.wait.until(EC.visibility_of_all_elements_located(
        (By.CSS_SELECTOR, '.task-image .image-wrapper .image')))
    resized_single_captcha_base64_strings = []
    # the image URL is embedded in the inline style as url("https://...")
    pattern = re.compile(r'url\("(https.*?)"\)')
    for i, single_captcha_element in enumerate(single_captcha_elements):
        single_captcha_element_style = single_captcha_element.get_attribute(
            'style')
        match_result = pattern.search(single_captcha_element_style)
        single_captcha_element_url = match_result.group(
            1) if match_result else None
        logger.debug(
            f'single_captcha_element_url {single_captcha_element_url}')
        # download the image, then resize and Base64-encode it
        with open(CAPTCHA_SINGLE_IMAGE_FILE_PATH % (i,), 'wb') as f:
            f.write(requests.get(single_captcha_element_url).content)
        resized_single_captcha_base64_string = resize_base64_image(
            CAPTCHA_SINGLE_IMAGE_FILE_PATH % (i,), (100, 100))
        resized_single_captcha_base64_strings.append(
            resized_single_captcha_base64_string)

    logger.debug(
        f'length of resized_single_captcha_base64_strings {len(resized_single_captcha_base64_strings)}')

Here we extract the URL of each captcha image with a regular expression, download the image, resize it and Base64-encode it, and store the encoded string in the resized_single_captcha_base64_strings list.

The Base64 encoding is handled by a separate method that takes the image path and the target size and returns the encoded result. It is defined as follows.

from PIL import Image
import base64
from app.settings import CAPTCHA_RESIZED_IMAGE_FILE_PATH


def resize_base64_image(filename, size):
    width, height = size
    img = Image.open(filename)
    new_img = img.resize((width, height))
    new_img.save(CAPTCHA_RESIZED_IMAGE_FILE_PATH)
    with open(CAPTCHA_RESIZED_IMAGE_FILE_PATH, "rb") as f:
        data = f.read()
        encoded_string = base64.b64encode(data)
        return encoded_string.decode('utf-8')

Image Recognition

Okay, so now that we can get the content of the question and the corresponding Base64 encoding for each image, we can directly use YesCaptcha for image recognition, with the following code call.

# try to verify using API
captcha_recognize_result = self.captcha_resolver.create_task(
    resized_single_captcha_base64_strings,
    self.captcha_target_text
)
if not captcha_recognize_result:
    logger.error('could not get captcha recognize result')
    return
recognized_results = captcha_recognize_result.get(
    'solution', {}).get('objects')

if not recognized_results:
    logger.error('could not get captcha recognized indices')
    return

If it runs properly, we may get the following return results.

{
   "errorId":0,
   "errorCode":"",
   "status":"ready",
   "solution":{
      "objects":[
         true,
         false,
         false,
         false,
         true,
         false,
         true,
         true,
         false
      ],
      "labels":[
         "boat",
         "seaplane",
         "bicycle",
         "train",
         "boat",
         "train",
         "boat",
         "boat",
         "bus"
      ]
   },
   "taskId":"25fee484-df63-11ec-b02e-c2654b11608a"
}

Now we can see that the objects field in solution contains a list of true and false values. For example, the first true means that the first captcha image matches the question, and the second value, false, means that the second captcha image does not. How do the serial numbers correspond to the images? See the following figure.

yescaptcha response

The images are numbered from left to right, row by row, in increasing order starting from 0. For example, the first image in the first row has serial number 0, so its result is the first entry of the objects list, true.
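
The numbering rule can be sketched in a couple of lines: with three columns per row, the row and column of an image follow from integer division of its serial number. The name grid_position is hypothetical, and the objects list simply mirrors the sample result above.

```python
GRID_COLUMNS = 3  # the grid is 3x3, as described earlier


def grid_position(index, columns=GRID_COLUMNS):
    """Map a zero-based serial number to (row, column), both zero-based."""
    return divmod(index, columns)


objects = [True, False, False, False, True, False, True, True, False]
for index, is_match in enumerate(objects):
    if is_match:
        row, column = grid_position(index)
        print(f"index {index} -> row {row}, column {column}")
```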

Simulate a click

Now that we have the true / false list, we just need to extract the serial numbers whose result is true and click the corresponding small captcha images. The code is as follows.

# click captchas
recognized_indices = [i for i, x in enumerate(recognized_results) if x]
logger.debug(f'recognized_indices {recognized_indices}')
click_targets = self.wait.until(EC.visibility_of_all_elements_located(
    (By.CSS_SELECTOR, '.task-image')))
for recognized_index in recognized_indices:
    click_target: WebElement = click_targets[recognized_index]
    click_target.click()
    time.sleep(random())

Of course, we could also simulate the click on each node by executing JavaScript, and the effect is similar.

Here we use a list comprehension to turn the true / false list into a list of indices. Each element is the position of a true value in the original list, which is exactly a target we need to click.

Then we get all the nodes corresponding to the small captcha images and call the click method on the targets in turn.

This way, we click the matching captcha images one by one.

Click to verify

Okay, so with the above logic, we can complete the entire HCaptcha recognition and point-and-click.

Finally, let’s simulate clicking the validation button and we’re done.

# after all captcha clicked
verify_button: WebElement = self.get_verify_button()
# is_displayed is a method and must be called
if verify_button.is_displayed():
    verify_button.click()
    time.sleep(3)

The verify_button is located with Selenium as well.

def get_verify_button(self) -> WebElement:
    verify_button = self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.button-submit')))
    return verify_button

Verify the result

After clicking on it, we can try to check the page changes to see if the validation was successful.

For example, the sign of successful validation is the appearance of a small green check mark.

HCaptcha successful

The inspection method is as follows.

def get_is_successful(self):
    self.switch_to_captcha_entry_iframe()
    anchor: WebElement = self.wait.until(EC.visibility_of_element_located((
        By.CSS_SELECTOR, '#anchor #checkbox'
    )))
    checked = anchor.get_attribute('aria-checked')
    logger.debug(f'checked {checked}')
    return str(checked) == 'true'

Here we switch to the entry iframe first, and then check whether the aria-checked attribute of the checkbox is true.

Finally, if the result of get_is_successful is True, then it means the recognition is successful, and the whole process is finished.

If the result is False, we can recursively call the above logic for a second time until the recognition is successful.

# check if succeed
is_succeed = self.get_is_successful()
if is_succeed:
    logger.debug('verified successfully')
else:
    self.verify_captcha()
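
One caveat with the recursive approach is that it has no upper bound, and every attempt consumes points. As an alternative, here is a minimal sketch of a bounded retry loop. The helper retry_until_success is hypothetical and not part of the original code; attempt stands for any callable that runs one round of recognition and clicking and returns True on success.

```python
def retry_until_success(attempt, max_attempts=3):
    """Call attempt() until it returns True or the attempt budget runs out.

    Returns the number of the successful attempt, or None if all failed.
    """
    for attempt_number in range(1, max_attempts + 1):
        if attempt():
            return attempt_number
    return None


# Stand-in for real verification rounds: fails twice, then succeeds.
outcomes = iter([False, False, True])
result = retry_until_success(lambda: next(outcomes), max_attempts=5)
print(result)  # 3
```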

Code

The above code may be complicated, so I’ve organized the code here and put it on GitHub: https://github.com/Python3WebSpider/HCaptchaResolver

Finally, note that the recognition service above is paid: each recognition costs a certain number of points. For example, recognizing one 3x3 grid costs 10 points, and a one-dollar recharge buys 1000 points, so one recognition costs about one cent, which is relatively cheap.
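
The arithmetic behind that estimate can be written out as a short sketch. The two constants are the figures quoted above, and the function name recognition_cost is made up for this example. Actual pricing may differ.

```python
from fractions import Fraction

POINTS_PER_RECOGNITION = 10  # quoted above for one 3x3 grid
POINTS_PER_DOLLAR = 1000     # quoted above for a one-dollar recharge


def recognition_cost(recognitions,
                     points_each=POINTS_PER_RECOGNITION,
                     points_per_dollar=POINTS_PER_DOLLAR):
    """Return the cost in dollars of a number of recognitions, as an exact fraction."""
    return Fraction(recognitions * points_each, points_per_dollar)


print(float(recognition_cost(1)))    # 0.01
print(float(recognition_cost(250)))  # 2.5
```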
