You may not have heard of ReCAPTCHA by name, perhaps because this CAPTCHA rarely appears on sites in China, but you have most likely seen or used it at some point. It looks like this.

As soon as we click the checkbox at the top, the CAPTCHA first runs its “risk analysis engine” to perform a security check. If the check passes, we immediately get the following result.

If the engine detects a risk, such as an unfamiliar network environment or a program simulating a user, it requires a second round of verification and pops up something like the following.

In the picture above, for example, the CAPTCHA shows nine images with the word “tree” at the top. We need to click every image that contains a tree. After we click, new images may appear, and we need to click those as well. Finally, we click the “verify” button to complete the verification. Alternatively, we can click the “headset” icon below to switch to audio mode, where the CAPTCHA looks like this.

At this point, if we correctly enter the content of the audio that the CAPTCHA reads out, we also pass the verification. Either method works, and once verification is complete we can submit the form, for example to log in or register.

What is this CAPTCHA called? It is Google’s ReCAPTCHA V2, a kind of behavioral CAPTCHA. The behaviors include clicking the checkbox, selecting the matching images, and transcribing audio; the CAPTCHA passes only when these behaviors are verified. Compared with an ordinary image CAPTCHA, it offers a better interactive experience and higher security, and it is harder to crack.

In fact, the CAPTCHA introduced above is only one form of ReCAPTCHA: the explicit version of V2. There is also an implicit version of V2, which does not show up on the page during verification. Instead, it binds the CAPTCHA to the submit button through JavaScript and completes the verification automatically when the form is submitted.

Besides V2, Google has also launched the newer V3. ReCAPTCHA V3 calculates a score from the user’s behavior, ranging from 0.0 (very likely a robot) to 1.0 (very likely a human), and the site uses this score to decide whether the verification passes. It offers higher security and a better experience.

Experience

So where can we try ReCAPTCHA? Open https://www.google.com/recaptcha/api2/demo (if you are in China, you need a VPN) in an incognito window, so that existing cookies do not interfere with the test, as shown in the picture.

We can see a ReCAPTCHA widget at the bottom, and after clicking it, a verification challenge appears.

It can of course be solved manually, but a crawler cannot do that. So how do we automate the solution?

Next we will introduce a simple and useful platform.

Solutions

This time we introduce a ReCAPTCHA solving service called YesCaptcha. Its homepage is http://yescaptcha.365world.com.cn/, and it currently supports both V2 and V3.

Let’s use it to solve the V2 CAPTCHA on the demo page from earlier: https://www.google.com/recaptcha/api2/demo.

After a simple registration, you can find a Token on the home page. Copy it for later use, as shown here.

It has two key APIs: one creates a CAPTCHA task and the other queries the status of a task. In the code later in this article they appear as /v3/recaptcha/create and /v3/recaptcha/status.

API documentation can be found here: http://docs.yescaptcha.365world.com.cn/

In the API documentation, you can see that the following parameters can be configured.

Parameter Name   Required   Description
token            yes        Get it in your personal center (Token)
siteKey                     ReCaptcha SiteKey (fixed parameter)
siteReferer      yes        ReCaptcha Referer (generally also a fixed parameter)
captchaType      no         ReCaptchaV2 (default) / ReCaptchaV3
siteAction       no         ReCaptchaV3 optional Action; default: verify
minScore         no         ReCaptchaV3 optional minimum score (0.1 - 0.9)

Here are the three key pieces of information.

  • token: This is the Token we just copied from YesCaptcha.
  • siteKey: This is the identifier string of the ReCAPTCHA; we will show how to find it below.
  • siteReferer: This is generally the referer of the site that hosts the ReCAPTCHA. For the current case, the value is https://www.google.com/recaptcha/api2/demo
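
As a rough sketch of how these parameters fit together, the snippet below assembles the query string for a task-creation request using only the standard library. The helper name build_create_params and the placeholder token are hypothetical; the parameter names come from the table above, and the minScore check simply mirrors the documented 0.1 - 0.9 range.

```python
from urllib.parse import urlencode


def build_create_params(token, site_key, site_referer,
                        captcha_type=None, site_action=None, min_score=None):
    """Collect the documented parameters, leaving out optional ones that are unset."""
    params = {"token": token, "siteKey": site_key, "siteReferer": site_referer}
    if captcha_type is not None:
        params["captchaType"] = captcha_type
    if site_action is not None:
        params["siteAction"] = site_action
    if min_score is not None:
        if not 0.1 <= min_score <= 0.9:
            raise ValueError("minScore must be between 0.1 and 0.9")
        params["minScore"] = min_score
    return params


params = build_create_params(
    token="YOUR_TOKEN",  # placeholder, replace with your own token
    site_key="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
    site_referer="https://www.google.com/recaptcha/api2/demo",
)
query = urlencode(params)  # percent-encodes ':' and '/' in the referer
print(query)
```

Leaving the optional parameters out when they are unset lets the service fall back to its own defaults (ReCaptchaV2 and the verify action, according to the table).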

How do we find the siteKey? It’s actually very simple: look at the HTML source of the current ReCAPTCHA and find it there.

Here you can see that each ReCAPTCHA corresponds to a div, and the div has an attribute called data-sitekey. Its value here is:

6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-
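
If you would rather extract the key programmatically than read it off the page source, one possible approach is sketched below with the standard library's html.parser. The class and function names are hypothetical, and the sample markup is a simplified stand-in for the demo page, not its real source.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Collect the data-sitekey attribute of every tag that has one."""

    def __init__(self):
        super().__init__()
        self.site_keys = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.site_keys.append(value)


def find_site_keys(html):
    """Return all data-sitekey values found in the given HTML string."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_keys


# Simplified stand-in for the page markup (an assumption for illustration)
sample = ('<form><div class="g-recaptcha" '
          'data-sitekey="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-"></div></form>')
print(find_site_keys(sample))
```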

Well, everything is ready, all that’s missing is the code!

Code

Let’s implement it with the requests library, first defining the constants.

import time

import requests

TOKEN = '50a07xxxxxxxxxxxxxxxxxxxxxxxxxf78'  # Please replace with your own TOKEN
REFERER = 'https://www.google.com/recaptcha/api2/demo'
BASE_URL = 'http://api.yescaptcha.365world.com.cn'
SITE_KEY = '6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-'  # Please replace it with your own SITE_KEY

Here we define these constants.

  • TOKEN: the token copied from the website
  • REFERER: the link to the demo site
  • BASE_URL: the API URL of YesCaptcha
  • SITE_KEY: the data-sitekey we just found

Then we define a method to create the task.

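def create_task():
    """Create a ReCAPTCHA solving task and return its task ID (None on failure)."""
    url = f"{BASE_URL}/v3/recaptcha/create"
    # Pass the values via params so that requests URL-encodes them (REFERER contains ':' and '/')
    params = {'token': TOKEN, 'siteKey': SITE_KEY, 'siteReferer': REFERER}
    try:
        response = requests.get(url, params=params)
        if response.status_code == 200:
            data = response.json()
            print('response data:', data)
            return data.get('data', {}).get('taskId')
    except requests.RequestException as e:
        print('create task failed', e)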

This simply calls the API that creates a task.

If the creation succeeds, we get a task_id, which we then use to poll the status of the task. We define the polling method as follows.

def polling_task(task_id):
    """Poll the task status once per second, up to 120 times; return the token or None."""
    url = f"{BASE_URL}/v3/recaptcha/status?token={TOKEN}&taskId={task_id}"
    count = 0
    while count < 120:
        try:
            response = requests.get(url)
            if response.status_code == 200:
                data = response.json()
                print('polling result', data)
                status = data.get('data', {}).get('status')
                print('status of task', status)
                if status == 'Success':
                    return data.get('data', {}).get('response')
        except requests.RequestException as e:
            print('polling task failed', e)
        finally:
            # Runs after every attempt (including the successful one): count it and wait 1 second
            count += 1
            time.sleep(1)

If the status is Success, the task has succeeded, and the response field contains the token obtained from solving the CAPTCHA.
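
The polling logic can also be written independently of the HTTP layer, which makes it easy to try out without a network connection. The following is a sketch under that assumption: fetch_status is a hypothetical callable standing in for the status request, and the 'Working' and 'Success' values are the ones that appear in this article's output.

```python
import time


def poll_until_success(fetch_status, max_attempts=120, delay=1.0):
    """Call fetch_status() until it reports Success or the attempts run out.

    fetch_status must return a dict shaped like the 'data' part of the
    status reply, e.g. {'status': 'Working'} or
    {'status': 'Success', 'response': '<token>'}.
    """
    for attempt in range(max_attempts):
        data = fetch_status()
        if data.get("status") == "Success":
            return data.get("response")
        if attempt < max_attempts - 1:
            time.sleep(delay)  # wait before the next attempt
    return None  # gave up: the task never reached Success


# Usage with a fake status source: two 'Working' replies, then 'Success'
replies = iter([
    {"status": "Working"},
    {"status": "Working"},
    {"status": "Success", "response": "dummy-token"},
])
token = poll_until_success(lambda: next(replies), delay=0)
print(token)
```

Returning None on timeout makes the failure case explicit, so the caller can check the result before using it.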

Now we call the two methods.

if __name__ == '__main__':
    task_id = create_task()
    print('create task successfully', task_id)
    response = polling_task(task_id)
    if response:  # polling_task returns None if the task never reached Success
        print('get response:', response[0:40]+'...')

The results of the run are similar to the following.

response data: {'status': 0, 'msg': 'ok', 'data': {'taskId': '1479436991'}}
create task successfully 1479436991
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Working'}}
status of task Working
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Working'}}
status of task Working
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Working'}}
status of task Working
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Success', 'response': '03AGdBq27-ABqvNmgq96iuprN8Mvzfq6_8noknIed5foLb15oWvWVksq9KesDkDd7dgMMr-UmqULZduXTWr87scJXl3djhl2btPO721eFAYsVzSk7ftr4uHBdJWonnEemr9dNaFB9qx5pnxr3P24AC7cCfKlOH_XARaN4pvbPNxx_UY5G5fzKUPFDOV14nNkCWl61jwwC0fuwetH1q99r4hBQxyI6XICD3PiHyHJMZ_-wolcO1R9C90iGQyjzrSMiNqErezO24ODCiKRyX2cVaMwM9plbxDSuyKUVaDHqccz8UrTNNdJ4m2WxKrD9wZDWaSK10Ti1LgsqOWKjKwqBbuyRS_BkSjG6OJdHqJN4bpk_jAcPMO13wXrnHBaXdK4FNDR9-dUvupHEnr7QZEuNoRxwl8FnO2Fgwzp2sJbGeQkMbSVYWdAalE6fzJ8NwsFJxCdDyeyO817buBtvTJ4C06C1uZ92fpPTeYGJwbbicOuqbGfHNTyiSJeRNmt-5RKz0OUiPJOPnmVKGlWBOqwbwCW1WZt-E-hH4FEg4En5TITmmPb_feS9dWKUxudn1U0hHk2vV9PerjZLtI7F67KtgmcqRrARPbwnc6KyAi3Hy1hthP92lv4MRIcO2jx0Llvsja-G2nhjZB0ZoJwkb9106pmqldiwlXxky4Dcg7VPStiCYJvhQpRYol7Iq1_ltU2tyhMqsu_Xa8Z6Mr5ykRCLnmlLb8DV8isndrdwp84wo_vPARGRj7Up9ov-ycb5lDKTf1XRaHiMCa8d2WLy0Pjco9UnsRAPw0FW3MsBJah6ryHUUDho7ffhUUgV1k86ryJym6xbWch1sVC4D5owzrCFn6L-rSLc5SS1pza2zU5LK4kAZCmbXNRffiFrhUY8nP4T1xaR2KMhIaN8HhJQpR8sQh1Azc-QkDy4rwbYmxUrysYGMrAOnmDx9z7tWQXbJE4IgCVMx5wihSiE-T8nbF5y1aJ0Ru9zqg1nZ3GSqsucSnvJA8HV5t9v0QSG5cBC1x5HIceA-2uEGSjwcmYOMw8D_65Dl-d6yVk1YN2FZCgMWY5ewzB1RAFN1BMqKoITQJ64jq3lKATpkc5i7aTA2bRGQyXrbDyMRIrVXKnYMHegfMbDn0l4O81a8vxmevLspKkacVPiqLsAe-73jAxMvsOqaG7cKxMQO9CY3qbtD55YgN0W4p2jyNSVz3aEpffHRqYyWMsRI5LddLgaZQDoHHgGUhV580PSIdZJ5eKd0gOjxIYxKlr0IgbMWRmsG_TgDNImy1c5oey8ojl-zWpOQW7bnfq5Z4tZ10_sCTfoOZVLqRuOsqB1OOO9pLRQojLBP0HUiGhRAr_As9EIDu6F9NIQfdAmCaVvavJbi1CZITFjcywP-tBrHsxpwkCXlwl996MK_XyEDuyWnJVGiVSthUMY306tIh1Xxj93W3KQJCzsfJQcjN-3lGLLeDFddypHyG4yrpRqRHHBNyiNJHgxSk5SaShEhXvByjkepvhrKX3kJssCU04biqqmkrQ49GqBV9OsWIy0nN3OJTx8v05MP8aU8YYkYBF01UbSff4mTfLAhin6iWk84Y074mRbe2MbgFAdU58KnCrwYVxcAR8voZsFxbxNwZXdVeexNx5HlIlSgaAHLWm2kFWmGPPW-ZA7R8Wst-mc7oIKft5iJl8Ea0YFz8oXyVgQk1rd9nDR3xGe5mWL1co0MiW1yvHg'}}

If data in the above format is returned, the ReCAPTCHA has been solved successfully, and the content of the response field is the resulting token. We can take this token and put it into the form to submit it.
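
Judging from the output above, each reply wraps its payload in a status / msg / data envelope. A small helper can pull the token out defensively. Treating status 0 as the only success code is an assumption drawn from this sample output, not from the API documentation, and extract_token is a hypothetical name.

```python
def extract_token(payload):
    """Return the solved token from a status reply, or None if it is not ready."""
    if payload.get("status") != 0:  # assumption: 0 means the API call itself succeeded
        return None
    data = payload.get("data") or {}
    if data.get("status") != "Success":
        return None
    return data.get("response")


working = {"status": 0, "msg": "ok", "data": {"status": "Working"}}
done = {"status": 0, "msg": "ok",
        "data": {"status": "Success", "response": "03AGdBq27-..."}}
print(extract_token(working))
print(extract_token(done))
```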

So how do we use this token? When verification succeeds in the browser, a textarea named g-recaptcha-response in the form is assigned a value: the token obtained from the verification. When we submit the form, this token is sent to the server as part of the form data, and the server validates it. If the field checks out, the submission is accepted.

So the process above is equivalent to having someone click through the CAPTCHA for us, and the token we end up with is exactly what should be assigned to the g-recaptcha-response textarea. How do we assign it? With JavaScript: we select the textarea and assign the value to it directly, with the following code.

document.getElementById("g-recaptcha-response").innerHTML="TOKEN_FROM_YESCAPTCHA";

Note that TOKEN_FROM_YESCAPTCHA must be replaced with the token value we just obtained. When simulating a login in a crawler with Selenium, Puppeteer, or similar tools, we only need to execute this JavaScript in the automated browser to assign the value. After executing it and submitting the form, let’s look at the Network request.
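
With Selenium, for instance, the script can be built in Python and then passed to the driver. The sketch below only builds the script string, so it runs without a browser. build_inject_script is a hypothetical helper, and json.dumps is used so that the token is safely quoted as a JavaScript string literal.

```python
import json


def build_inject_script(token):
    """Return JavaScript that writes the token into the g-recaptcha-response textarea."""
    return (
        'document.getElementById("g-recaptcha-response").innerHTML = '
        + json.dumps(token) + ";"
    )


script = build_inject_script("TOKEN_FROM_YESCAPTCHA")
print(script)
# With a Selenium WebDriver instance named driver (not created here):
# driver.execute_script(script)
```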

You can see that it simply submits a form, and one of the fields is g-recaptcha-response. The field is sent to the server for verification, and if the verification passes, the submission succeeds. So if we obtain the token with YesCaptcha and assign it to the form’s textarea, the form can be submitted, and if the token is valid we pass the verification without clicking through the CAPTCHA ourselves. Finally we get the following success page.

Of course, we can also use requests to simulate the form submission:

def verify(response):
    """Submit the token to the demo page as the form would; return the result HTML."""
    url = "https://www.google.com/recaptcha/api2/demo"
    data = {"g-recaptcha-response": response}
    result = requests.post(url, data=data)
    if result.status_code == 200:
        return result.text

Finally, we complete the call.

if __name__ == '__main__':
    task_id = create_task()
    print('create task successfully', task_id)
    response = polling_task(task_id)
    if response:  # polling_task returns None if the task never reached Success
        print('get response:', response[0:40]+'...')
        result = verify(response)
        print(result)

The results of the run are as follows.

response data: {'status': 0, 'msg': 'ok', 'data': {'taskId': '1479436991'}}
create task successfully 1479436991
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Working'}}
status of task Working
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Working'}}
status of task Working
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Working'}}
status of task Working
polling result {'status': 0, 'msg': 'ok', 'data': {'status': 'Success', 'response': '03AGdBq27-ABqvNmgq96iuprN8Mvzfq6_8noknIed5foLb15oWvWVksq9KesDkDd7dgMMr-UmqULZduXTWr87scJXl3djhl2btPO721eFAYsVzSk7ftr4uHBdJWonnEemr9dNaFB9qx5pnxr3P24AC7cCfKlOH_XARaN4pvbPNxx_UY5G5fzKUPFDOV14nNkCWl61jwwC0fuwetH1q99r4hBQxyI6XICD3PiHyHJMZ_-wolcO1R9C90iGQyjzrSMiNqErezO24ODCiKRyX2cVaMwM9plbxDSuyKUVaDHqccz8UrTNNdJ4m2WxKrD9wZDWaSK10Ti1LgsqOWKjKwqBbuyRS_BkSjG6OJdHqJN4bpk_jAcPMO13wXrnHBaXdK4FNDR9-dUvupHEnr7QZEuNoRxwl8FnO2Fgwzp2sJbGeQkMbSVYWdAalE6fzJ8NwsFJxCdDyeyO817buBtvTJ4C06C1uZ92fpPTeYGJwbbicOuqbGfHNTyiSJeRNmt-5RKz0OUiPJOPnmVKGlWBOqwbwCW1WZt-E-hH4FEg4En5TITmmPb_feS9dWKUxudn1U0hHk2vV9PerjZLtI7F67KtgmcqRrARPbwnc6KyAi3Hy1hthP92lv4MRIcO2jx0Llvsja-G2nhjZB0ZoJwkb9106pmqldiwlXxky4Dcg7VPStiCYJvhQpRYol7Iq1_ltU2tyhMqsu_Xa8Z6Mr5ykRCLnmlLb8DV8isndrdwp84wo_vPARGRj7Up9ov-ycb5lDKTf1XRaHiMCa8d2WLy0Pjco9UnsRAPw0FW3MsBJah6ryHUUDho7ffhUUgV1k86ryJym6xbWch1sVC4D5owzrCFn6L-rSLc5SS1pza2zU5LK4kAZCmbXNRffiFrhUY8nP4T1xaR2KMhIaN8HhJQpR8sQh1Azc-QkDy4rwbYmxUrysYGMrAOnmDx9z7tWQXbJE4IgCVMx5wihSiE-T8nbF5y1aJ0Ru9zqg1nZ3GSqsucSnvJA8HV5t9v0QSG5cBC1x5HIceA-2uEGSjwcmYOMw8D_65Dl-d6yVk1YN2FZCgMWY5ewzB1RAFN1BMqKoITQJ64jq3lKATpkc5i7aTA2bRGQyXrbDyMRIrVXKnYMHegfMbDn0l4O81a8vxmevLspKkacVPiqLsAe-73jAxMvsOqaG7cKxMQO9CY3qbtD55YgN0W4p2jyNSVz3aEpffHRqYyWMsRI5LddLgaZQDoHHgGUhV580PSIdZJ5eKd0gOjxIYxKlr0IgbMWRmsG_TgDNImy1c5oey8ojl-zWpOQW7bnfq5Z4tZ10_sCTfoOZVLqRuOsqB1OOO9pLRQojLBP0HUiGhRAr_As9EIDu6F9NIQfdAmCaVvavJbi1CZITFjcywP-tBrHsxpwkCXlwl996MK_XyEDuyWnJVGiVSthUMY306tIh1Xxj93W3KQJCzsfJQcjN-3lGLLeDFddypHyG4yrpRqRHHBNyiNJHgxSk5SaShEhXvByjkepvhrKX3kJssCU04biqqmkrQ49GqBV9OsWIy0nN3OJTx8v05MP8aU8YYkYBF01UbSff4mTfLAhin6iWk84Y074mRbe2MbgFAdU58KnCrwYVxcAR8voZsFxbxNwZXdVeexNx5HlIlSgaAHLWm2kFWmGPPW-ZA7R8Wst-mc7oIKft5iJl8Ea0YFz8oXyVgQk1rd9nDR3xGe5mWL1co0MiW1yvHg'}}
status of task Success
get response: 03AGdBq27-ABqvNmgq96iuprN8Mvzfq6_8noknIe...
<!DOCTYPE HTML><html dir="ltr"><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><meta name="viewport" content="width=device-width, user-scalable=yes"><title>ReCAPTCHA demo</title><link rel="stylesheet" href="https://www.gstatic.com/recaptcha/releases/TbD3vPFlUWKZD-9L4ZxB0HJI/demo__ltr.css" type="text/css"></head><body><div class="recaptcha-success">Verification Success... Hooray!</div></body></html>

Finally, you can see that after the simulated submission, the result contains the text “Verification Success... Hooray!”, which means the verification succeeded.
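
To check the outcome programmatically rather than by eye, one option is to look for the success marker in the returned HTML. This is a minimal sketch with a hypothetical is_verified helper. Relying on the recaptcha-success class seen in the output above is an assumption about the demo page and may break if the page changes.

```python
def is_verified(html):
    """Return True if the demo page's HTML contains the success marker."""
    return 'class="recaptcha-success"' in html


sample = ('<body><div class="recaptcha-success">'
          'Verification Success... Hooray!</div></body>')
print(is_verified(sample))
print(is_verified("<body>Please solve the CAPTCHA</body>"))
```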

At this point, we have successfully solved the ReCAPTCHA.

Above we introduced the implementation with requests. Of course, the same can be achieved with tools such as Selenium. A concrete demo is included in the documentation; please refer to it for instructions.


Reference https://cuiqingcai.com/30026.html