You may not have heard of ReCAPTCHA, since it rarely appears on sites in China, but you have probably seen or used it at some point. It looks like this.
When we click the checkbox, the CAPTCHA first runs its “risk analysis engine” to perform a security check. If the check passes, we immediately get the following result.
If the algorithm detects a risk in the current environment, such as an unfamiliar network or a possibly automated program, it requires a second round of verification and pops up something like the following.
For example, in the picture above, nine images appear on the verification page with the word “tree” at the top. We need to click every image that contains a tree. After we click, several new images may appear, and we need to click the matching ones again. Finally, we click the “verify” button to complete the verification. Alternatively, we can click the “headphone” icon below, which switches to audio mode, and the verification code will look like this.
In this mode, if we correctly type in the audio content that the CAPTCHA reads out, we also pass the verification. Either way works, and once verification is complete we can submit the form to finish operations such as login or registration.
What is this CAPTCHA called? It is Google’s ReCAPTCHA V2, a kind of behavioral CAPTCHA. The behaviors include clicking checkboxes, selecting matching images, and audio dictation; the CAPTCHA passes only when these behaviors are verified. Compared with ordinary image CAPTCHAs, it offers a better interactive experience and higher security, and it is harder to crack.
So where can we try ReCAPTCHA? We can open this website: https://www.google.com/recaptcha/api2/demo (if you are in China, you need to use a VPN). Open it in an incognito window so that the test is not affected by existing cookies, as shown in the picture.
We can see a ReCAPTCHA widget at the bottom, and after clicking on it, a verification challenge appears.
Of course we can solve it manually, but a crawler cannot do that, so how do we automate the solution?
Next we will introduce a simple and useful platform.
The ReCAPTCHA solving service we introduce here is called YesCaptcha. Its homepage is http://yescaptcha.365world.com.cn/, and it currently supports solving both V2 and V3.
Let’s use it to solve the V2 verification code on the ReCAPTCHA demo page from before: https://www.google.com/recaptcha/api2/demo.
After a simple registration, you can find a Token on the home page. Copy it for later use, as shown here.
The service has two key APIs: one creates a CAPTCHA solving task and the other queries the status of a task. The APIs are as follows.
- Create task: http://api.yescaptcha.365world.com.cn/v3/recaptcha/create
- Query status: http://api.yescaptcha.365world.com.cn/v3/recaptcha/status
API documentation can be found here: http://docs.yescaptcha.365world.com.cn/
From the API documentation, you can see that the following parameters can be configured.
| Parameter | Required | Description |
|---|---|---|
| token | yes | Get it from your personal center (Token) |
| siteKey | yes | ReCaptcha SiteKey (fixed parameter) |
| siteReferer | yes | ReCaptcha Referer (generally also a fixed parameter) |
| captchaType | no | ReCaptchaV2 (default) / ReCaptchaV3 |
| siteAction | no | ReCaptchaV3 only, optional: the action; defaults to verify |
| minScore | no | ReCaptchaV3 only, optional: minimum score (0.1 - 0.9) |
Three of these parameters are essential.
- token: the value we just copied from YesCaptcha.
- siteKey: the string that identifies the ReCAPTCHA; we will show how to find it later.
- siteReferer: generally the referer of the site that hosts the ReCAPTCHA. For the current case, the value is https://www.google.com/recaptcha/api2/demo
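The parameters above can be assembled into a request payload roughly as follows. This is a minimal sketch, not YesCaptcha's official client: the helper name `build_create_params` and the placeholder values are hypothetical, and only the parameter names and the 0.1 to 0.9 score range come from the table.

```python
from urllib.parse import urlencode


def build_create_params(token, site_key, site_referer,
                        captcha_type=None, site_action=None, min_score=None):
    """Build the parameter dict for a create-task request (hypothetical helper)."""
    if min_score is not None and not 0.1 <= min_score <= 0.9:
        raise ValueError("minScore must be between 0.1 and 0.9")

    # The three required parameters.
    params = {"token": token, "siteKey": site_key, "siteReferer": site_referer}

    # Optional parameters are only included when explicitly provided.
    optional = {
        "captchaType": captcha_type,
        "siteAction": site_action,
        "minScore": min_score,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


# Placeholder token and site key; replace them with real values.
params = build_create_params(
    "YOUR_TOKEN",
    "YOUR_SITE_KEY",
    "https://www.google.com/recaptcha/api2/demo",
)
print(urlencode(params))
```

For a V3 task, the optional arguments would be passed as well, for example `captcha_type="ReCaptchaV3", min_score=0.7`.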
How do we find the siteKey? It is actually very simple: look at the HTML source of the page that contains the ReCAPTCHA and search for it there.
Each ReCAPTCHA corresponds to a div, and that div has an attribute called data-sitekey. Its value is the siteKey we need.
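Instead of reading the source by hand, the data-sitekey attribute can also be extracted programmatically. The sketch below uses only the standard library; the class and function names are hypothetical, and the sample markup and key are made-up placeholders rather than the real demo page.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Collects the data-sitekey attribute of every element that has one."""

    def __init__(self):
        super().__init__()
        self.site_keys = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.site_keys.append(value)


def find_site_keys(html):
    """Return all data-sitekey values found in the given HTML string."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_keys


# Illustrative markup only; EXAMPLE_SITE_KEY is a placeholder, not a real key.
sample_html = (
    '<form method="POST">'
    '<div class="g-recaptcha" data-sitekey="EXAMPLE_SITE_KEY"></div>'
    "</form>"
)
print(find_site_keys(sample_html))  # ['EXAMPLE_SITE_KEY']
```

In practice, `html` would be the text of the page fetched with requests or copied from the browser.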
Well, everything is ready, all that’s missing is the code!
Let’s implement it in the simplest way, with the requests library, starting with the constants.
Here we define these constants.
- TOKEN: the token copied from the website
- REFERER: the link to the demo site
- API_BASE_URL: the API URL of YesCaptcha
- SITE_KEY: the data-sitekey we just found
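A sketch of what these constants might look like is shown below. The token and site key are placeholders to fill in yourself, and reading the token from an environment variable named `YESCAPTCHA_TOKEN` is an assumption made here for convenience, not something the service requires. The two endpoint paths are the ones listed earlier.

```python
import os

# Token copied from the YesCaptcha home page. The environment variable name
# YESCAPTCHA_TOKEN is a hypothetical choice; "YOUR_TOKEN" is a placeholder.
TOKEN = os.environ.get("YESCAPTCHA_TOKEN", "YOUR_TOKEN")

# The page that hosts the ReCAPTCHA we want to solve.
REFERER = "https://www.google.com/recaptcha/api2/demo"

# Base URL of the YesCaptcha API.
API_BASE_URL = "http://api.yescaptcha.365world.com.cn"

# Placeholder: paste the data-sitekey value found in the page source.
SITE_KEY = "YOUR_SITE_KEY"

# Full endpoint URLs built from the base URL.
CREATE_URL = f"{API_BASE_URL}/v3/recaptcha/create"
STATUS_URL = f"{API_BASE_URL}/v3/recaptcha/status"
```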
Then we define a method to create the task.
It simply calls the create-task API, so there is not much to explain.
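A create-task call could be sketched as follows. Several details are assumptions, since they are not spelled out above: that the endpoint accepts a GET request with query parameters, and that the task id comes back as `data.taskId` in the JSON response. The `session` argument is expected to be a `requests.Session()` (or any object with a compatible `get` method), which keeps the function easy to test.

```python
def create_task(session, api_base_url, token, site_key, referer, timeout=10):
    """Ask the service to start solving; return the task id, or None on failure.

    `session` is expected to behave like requests.Session().
    """
    url = f"{api_base_url}/v3/recaptcha/create"
    params = {"token": token, "siteKey": site_key, "siteReferer": referer}

    # Assumption: the endpoint accepts GET with query parameters.
    response = session.get(url, params=params, timeout=timeout)
    if response.status_code != 200:
        return None

    # Assumption: the response looks like {"data": {"taskId": "..."}}.
    payload = response.json()
    return (payload.get("data") or {}).get("taskId")
```

A typical call would be `create_task(requests.Session(), API_BASE_URL, TOKEN, SITE_KEY, REFERER)`, using the constants defined for this example.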
If creation succeeds, we get a task_id. We then use this task_id to poll the status of the task, with a method defined as follows.
If the status is Success, the task has succeeded, and the response field holds the token obtained from solving the CAPTCHA.
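The polling step might look like the sketch below. The Success status and the response field come from the description above; the rest is assumed: the `taskId` query parameter name, the `data` wrapper in the JSON, and the limits of 120 attempts at one-second intervals. As before, `session` is meant to be a `requests.Session()` or a compatible object.

```python
import time


def poll_task(session, api_base_url, token, task_id,
              max_attempts=120, interval=1.0, sleep=time.sleep):
    """Poll the status endpoint until the task succeeds or attempts run out.

    Returns the solved token, or None if no attempt reported Success.
    """
    url = f"{api_base_url}/v3/recaptcha/status"
    # Assumption: the task is identified by a taskId query parameter.
    params = {"token": token, "taskId": task_id}

    for _ in range(max_attempts):
        response = session.get(url, params=params, timeout=10)
        if response.status_code == 200:
            # Assumption: status and response are nested under "data".
            data = response.json().get("data") or {}
            if data.get("status") == "Success":
                return data.get("response")
        # Wait before asking again so we do not hammer the API.
        sleep(interval)
    return None
```

The `sleep` parameter exists only so that the wait can be replaced in tests; normal callers can leave it at its default.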
Finally, we call the two methods.
The results of the run are similar to the following.
If it returns data in the above format, the ReCAPTCHA has been solved successfully, and the content of the response field is the resulting token. We can take this token and put it directly into the form to submit it.
So how do we use this token? When verification succeeds in the browser, a textarea named g-recaptcha-response in the form is assigned a value, which is the token obtained from the verification. When we submit the form, this token is sent to the server as part of the submission and validated there. If the field checks out, the submission is accepted.
As you can see, the request is just a form submission in which one of the fields is g-recaptcha-response; the server verifies it, and if the verification passes, the submission succeeds. So if we obtain this token with YesCaptcha and assign it to the textarea of the form, then submit the form, a valid token lets us pass verification without clicking through the CAPTCHA again. Finally we get the following successful page.
Of course, we can also use requests to simulate the form submission.
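A simulated submission could be sketched like this. The field name g-recaptcha-response is the one described above; the function name, the `extra_fields` argument, and the choice to return the page text on HTTP 200 are assumptions for illustration. `session` is expected to be a `requests.Session()` or a compatible object.

```python
def submit_form(session, form_url, recaptcha_token, extra_fields=None, timeout=10):
    """POST the form with the solved token; return the page text, or None.

    `extra_fields` holds any other inputs the target form expects.
    """
    data = dict(extra_fields or {})
    # The solved token travels in the same field the browser would fill in.
    data["g-recaptcha-response"] = recaptcha_token

    response = session.post(form_url, data=data, timeout=timeout)
    if response.status_code == 200:
        return response.text
    return None
```

For the demo page, this would be called as `submit_form(requests.Session(), "https://www.google.com/recaptcha/api2/demo", token)`, where `token` is the value returned by the solving service.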
Finally, we refine the calling code.
The results of the run are as follows.
After the simulated submission, the result contains the text Verification Success… Hooray!, which means that the verification was successful.
At this point, we have successfully completed the ReCAPTCHA crack.
Above we introduced an implementation based on requests. The same thing can of course be done with tools such as Selenium; the documentation includes a demo, so please refer to it for details.