
🛠 CAPTCHA solver based on TensorFlow. The project incorporates a Conditional Generative Adversarial Network (CGAN) architecture, which lets it solve CAPTCHAs effectively even when trained on small, limited datasets.

Alexander-Zadorozhnyy/CGAN_CAPTCHA_SOLVER

Description

This repository contains code for solving text-based CAPTCHAs with machine learning. Because it relies on a synthetic CAPTCHA generator, the approach requires significantly fewer real CAPTCHAs than other approaches while achieving a higher success rate. For security reasons, only the code is published, without the dataset; the code can be run independently on your own data. Note that it is not production-ready. If you encounter any problems, please file an issue on GitHub.

The work was carried out with the support of, and at, the Laboratory of Theoretical and Interdisciplinary Problems of Informatics of the Federal State Budgetary Institution of Science "St. Petersburg Federal Research Center of the Russian Academy of Sciences" (St. Petersburg FRC RAS). Official website: https://dscs.pro/

CAPTCHA image recognition

The CAPTCHA types this solver can recognize are illustrated below. Some trained models are available in app/models.

[Figure: CAPTCHA examples that can be solved by this CAPTCHA solver]

Requirements

pip install -r requirement.txt

Usage guide⚙️

Step0: Clone the Project
git clone https://github.com/Alexander-Zadorozhnyy/CGAN_CAPTCHA_SOLVER.git
cd CGAN_CAPTCHA_SOLVER
Step1: Create & Activate Conda Env
conda create -n "CGAN_CAPTCHA_SOLVER" python=3.9.12
conda activate CGAN_CAPTCHA_SOLVER
Step2: Install PIP Requirements
pip install -r requirement.txt
Step3: Configure captcha_setting.py
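The README does not list what captcha_setting.py contains. As a hypothetical sketch only, a settings module for a fixed-length text CAPTCHA solver usually fixes the symbol set, the number of characters and the image size, and it is worth checking that dataset labels agree with those settings before training. All names and values below (ALL_CHAR_SET, MAX_CAPTCHA, IMAGE_HEIGHT, IMAGE_WIDTH, the label-in-filename scheme, label_from_filename, invalid_labels) are assumptions, not the project's actual settings.

```python
# Hypothetical captcha_setting.py sketch. Every name and value here is an
# assumption for illustration; check the real file in the repository.
import os
import string

ALL_CHAR_SET = string.digits + string.ascii_lowercase  # assumed symbol set
MAX_CAPTCHA = 5       # assumed number of characters per CAPTCHA
IMAGE_HEIGHT = 50     # assumed image height in pixels
IMAGE_WIDTH = 200     # assumed image width in pixels


def label_from_filename(filename: str) -> str:
    """Assumed naming scheme: the label is the file name without extension,
    optionally followed by '_<index>' (e.g. 'a1b2c.png' or 'a1b2c_17.png')."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.split("_")[0]


def invalid_labels(filenames):
    """Return labels that do not match the configured length or symbol set."""
    bad = []
    for name in filenames:
        label = label_from_filename(name)
        if len(label) != MAX_CAPTCHA or any(ch not in ALL_CHAR_SET for ch in label):
            bad.append(label)
    return bad


print(invalid_labels(["a1b2c.png", "data/x9y8z_3.png", "ABCDE.png", "abc.png"]))
# ['ABCDE', 'abc']
```

Under these assumed settings, uppercase labels and labels of the wrong length are flagged, so either the settings or the dataset would need adjusting before training.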
Step4: Prepare dataset

If your CAPTCHA dataset contains many different styles, you can group the images by style with the clustering algorithm:

python -m src.Clustering.clustering --dataset path_to_dataset
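The README does not say which clustering algorithm src.Clustering.clustering uses. As an illustration only, the sketch below groups images by style with plain k-means, assuming each image has already been reduced to a short feature vector (for example mean brightness and fraction of dark pixels). The features, the value of k and the function name kmeans are assumptions, not the project's code.

```python
# Illustrative k-means clustering of CAPTCHA style features.
# Assumption: each image is already a short, equal-length feature tuple;
# this is a generic sketch, not the project's clustering implementation.
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length feature tuples; return (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels


features = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.13),
            (0.90, 0.90), (0.88, 0.91), (0.92, 0.87)]
_, style = kmeans(features, k=2)
print(style)
```

The resulting cluster labels could then be used to split the dataset into per-style folders, so that each style is handled separately in the later steps.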
Step5: Train CAPTCHA generator

If you have only a small amount of original data, you can train a generator to produce synthetic CAPTCHAs:

python -m src.GAN.train --dataset_folder --symbols --model_name --saved_model_name
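In a conditional GAN, the generator (and the discriminator) receive the condition alongside their usual input; a common way to condition the generator is to concatenate the noise vector z with an encoding of the condition y. The sketch below builds such an input for a CAPTCHA text, assuming the condition is the per-position one-hot encoding of the label. The symbol set, text length, noise size and function names are hypothetical, and this is not the project's architecture.

```python
# Sketch of conditional-GAN generator input: noise concatenated with a
# one-hot encoding of the CAPTCHA text. Symbol set, length and noise size
# are assumptions for illustration.
import random

SYMBOLS = "0123456789"   # assumed symbol set (cf. the --symbols option)
LENGTH = 4               # assumed number of characters per CAPTCHA
NOISE_DIM = 8            # assumed latent vector size


def one_hot_text(text):
    """Flattened one-hot encoding: LENGTH blocks of len(SYMBOLS) entries."""
    if len(text) != LENGTH:
        raise ValueError(f"expected {LENGTH} characters, got {len(text)}")
    vec = [0.0] * (LENGTH * len(SYMBOLS))
    for pos, ch in enumerate(text):
        vec[pos * len(SYMBOLS) + SYMBOLS.index(ch)] = 1.0
    return vec


def generator_input(text, rng):
    """Concatenate Gaussian noise z with the condition y, giving [z, y]."""
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return z + one_hot_text(text)


x = generator_input("2024", random.Random(42))
print(len(x))  # 48 = NOISE_DIM + LENGTH * len(SYMBOLS)
```

Because the text is part of the input, every generated image comes with a known label, which is what makes the synthetic CAPTCHAs usable as training data for the solver.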
Step6: Generate as many CAPTCHAs as you need for training the solver
python -m src.GAN.create_dataset --dataset_folder --count
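The create_dataset command takes --dataset_folder and --count. The sketch below shows one plausible way to decide what to generate: sample `count` random label strings from the symbol set with a seeded RNG and give each a unique file name that encodes the label. The generator call itself is omitted, and the `<label>_<index>.png` naming, the default symbol set and the function name plan_dataset are hypothetical.

```python
# Sketch: plan a synthetic dataset of `count` labelled CAPTCHAs.
# Symbol set, length and the '<label>_<index>.png' naming are assumptions;
# a trained generator would render one image per planned entry.
import random


def plan_dataset(count, symbols="0123456789abcdefghijklmnopqrstuvwxyz",
                 length=5, seed=0):
    """Return a list of (label, filename) pairs with uniformly random labels."""
    rng = random.Random(seed)
    plan = []
    for index in range(count):
        label = "".join(rng.choice(symbols) for _ in range(length))
        # The index keeps file names unique even if a label repeats.
        plan.append((label, f"{label}_{index}.png"))
    return plan


for label, filename in plan_dataset(3):
    print(label, filename)
```

Sampling labels uniformly keeps each symbol roughly equally represented at every position, which is one reason synthetic data can complement a small, possibly unbalanced, real dataset.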
Step7: Train CAPTCHA solver

Train the CNN solver on generated and original CAPTCHAs:

python -m src.CNN.train --gen_data --num_gen_train --num_gen_test --saved_model_name --orig_data --num_orig_train --num_orig_test --model_name --saved_model_name
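The flags suggest that the solver can combine generated data (--gen_data, --num_gen_train, --num_gen_test) with original data (--orig_data, --num_orig_train, --num_orig_test). A minimal sketch of such a split follows, assuming the num_* values are sample counts, the training set merges both sources, and the two test sets are kept separate so accuracy on real CAPTCHAs can be reported on its own. The function build_splits is hypothetical, not the project's code.

```python
# Sketch: build train/test splits from generated and original samples.
# Assumption: the num_* arguments mirror the CLI flags of src.CNN.train,
# but this function is illustrative only.
import random


def build_splits(gen_samples, orig_samples, num_gen_train, num_gen_test,
                 num_orig_train, num_orig_test, seed=0):
    """Return (train, gen_test, orig_test); train mixes both sources."""
    if num_gen_train + num_gen_test > len(gen_samples):
        raise ValueError("not enough generated samples")
    if num_orig_train + num_orig_test > len(orig_samples):
        raise ValueError("not enough original samples")
    rng = random.Random(seed)
    gen = rng.sample(gen_samples, len(gen_samples))     # shuffled copy
    orig = rng.sample(orig_samples, len(orig_samples))  # shuffled copy
    train = gen[:num_gen_train] + orig[:num_orig_train]
    rng.shuffle(train)  # interleave synthetic and real examples
    gen_test = gen[num_gen_train:num_gen_train + num_gen_test]
    orig_test = orig[num_orig_train:num_orig_train + num_orig_test]
    return train, gen_test, orig_test


gen = [f"g{i}" for i in range(50)]
orig = [f"o{i}" for i in range(10)]
train, gen_test, orig_test = build_splits(gen, orig, 40, 10, 6, 4)
print(len(train), len(gen_test), len(orig_test))  # 46 10 4
```

Keeping the real test set disjoint from training matters most here, since the real-CAPTCHA accuracy is the figure that reflects how the solver will behave on actual data.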

Metrics

Synthetic CAPTCHA

Time: 385 seconds to solve 5000 CAPTCHAs

Accuracy: ~99%

Real CAPTCHA

Time: 8 seconds to solve 100 CAPTCHAs

Accuracy: ~65%
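The timings above correspond to an average of 385 s / 5000 = 0.077 s (77 ms) per synthetic CAPTCHA and 8 s / 100 = 0.08 s (80 ms) per real CAPTCHA. The README does not define how accuracy is counted; the sketch below assumes a CAPTCHA counts as solved only when the whole predicted string matches the label, and shows per-character accuracy for comparison. The function names are hypothetical.

```python
# Sketch of the metrics arithmetic. Assumption: "accuracy" means the
# fraction of CAPTCHAs whose full text is predicted exactly.


def per_captcha_ms(total_seconds, count):
    """Average solving time per CAPTCHA in milliseconds."""
    return 1000.0 * total_seconds / count


def sequence_accuracy(predictions, labels):
    """Fraction of CAPTCHAs predicted exactly (whole string must match)."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def char_accuracy(predictions, labels):
    """Fraction of characters predicted correctly.
    Assumes each prediction has the same length as its label."""
    pairs = [(a, b) for p, t in zip(predictions, labels) for a, b in zip(p, t)]
    return sum(a == b for a, b in pairs) / len(pairs)


print(round(per_captcha_ms(385, 5000), 1))  # 77.0 (synthetic)
print(round(per_captcha_ms(8, 100), 1))     # 80.0 (real)
print(sequence_accuracy(["ab12", "cd34"], ["ab12", "cd35"]))  # 0.5
print(char_accuracy(["ab12", "cd34"], ["ab12", "cd35"]))      # 0.875
```

The two definitions can differ considerably: one wrong character makes a whole CAPTCHA count as failed under sequence accuracy, while per-character accuracy barely changes.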

Documentation

More details about this solver can be found in the docs directory:

  • docs/report.pdf - report on the educational practice (in Russian)

Authors

License

Source code of this repository is released under the Apache-2.0 license
