Added support for detecting bots through client-hints #7316

Open
wants to merge 2 commits into
base: master

Conversation

sanchezzzhak
Collaborator

@sanchezzzhak sanchezzzhak commented Dec 30, 2022

For Matomo, you will need to add a new HTTP_X_CLIENT key here:
https://github.com/matomo-org/matomo/blob/d1eaaca1b7abdddea15ff8d1d8e2075b6f92c672/core/Http.php#L996-L1012

[!] This PR is worth revisiting once more bots can be identified through the xClient header.

@sanchezzzhak sanchezzzhak linked an issue Dec 30, 2022 that may be closed by this pull request
@liviuconcioiu
Collaborator

@sanchezzzhak I have something to add:

  1. Can you also include the HTTP_FROM header? According to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/From, any crawler is expected to send it. So if this header is present, the request could be detected as a generic bot, for example.
  2. Here are some HTTP_FROM values that should be added so that they are detected as specific bots:
googlebot(at)googlebot.com
bingbot(at)microsoft.com
support@search.yandex.ru
the.knowledge.ai@gmail.com
crawler@alexa.com
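
The two points above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the function names, the `KNOWN_FROM_BOTS` table, and the bot labels are all assumptions made for the example, and treating any non-empty `HTTP_FROM` as a generic bot is the heuristic proposed in point 1, not an established rule.

```python
from typing import Optional

# Hypothetical lookup table: keys are the HTTP_FROM values listed above,
# values are illustrative labels (not names taken from the project's bot registry).
KNOWN_FROM_BOTS = {
    "googlebot@googlebot.com": "Googlebot",
    "bingbot@microsoft.com": "BingBot",
    "support@search.yandex.ru": "Yandex Bot",
    "the.knowledge.ai@gmail.com": "The Knowledge AI",
    "crawler@alexa.com": "Alexa Crawler",
}


def normalize_from(value: str) -> str:
    """Lower-case the value and turn the obfuscated '(at)' form into '@'."""
    return value.strip().lower().replace("(at)", "@")


def detect_bot_from_headers(headers: dict) -> Optional[str]:
    """Return a bot label based on HTTP_FROM, or None if the header is absent or empty."""
    value = headers.get("HTTP_FROM")
    if not value or not value.strip():
        return None
    # Known address: specific bot. Any other non-empty value: generic bot.
    return KNOWN_FROM_BOTS.get(normalize_from(value), "Generic Bot")
```

For example, `detect_bot_from_headers({"HTTP_FROM": "googlebot(at)googlebot.com"})` gives the specific label, while an unknown address falls back to the generic one.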

@sanchezzzhak
Collaborator Author

@liviuconcioiu Perhaps the implementation of HTTP_FROM should be added separately, since this PR is in limbo.
It would be good to have approximate statistics before implementing new features.

@sanchezzzhak
Collaborator Author

ChatGPT's crawler (GPTBot) sends the HTTP_FROM header:

From: gptbot(at)openai.com
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)

@liviuconcioiu
Collaborator

Here is a complete list of what I have so far:

bingbot(at)microsoft.com
googlebot(at)googlebot.com
gptbot(at)openai.com
robot@seokicks.de
support@search.yandex.ru
tech@babbar.tech
TGVnaXRpbWF0ZSBsaW5rIHRyYWNrZXI=
the.knowledge.ai@gmail.com
wc@verisign.com
crawler@alexa.com
pigafetta-bot(at)visual-seo.com
oai-searchbot@openai.com
"<?=print(9347655345-4954366);?>"
root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.oast.site
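
Not every value in this list looks like a crawler contact address: one is a Base64 string, one is a PHP injection probe, and one points at an `oast.site` host. A minimal sketch of a plausibility filter that could run before any bot lookup is shown below. It is an illustration in Python only: `plausible_from`, `EMAIL_SHAPE`, and `PROBE_DOMAINS` are hypothetical names, the regular expression is a deliberately loose email-shape check rather than a full address validator, and the domain denylist is an assumption based solely on the entry above.

```python
import re

# Loose "looks like an email address" pattern; intentionally not RFC-complete.
EMAIL_SHAPE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)+")

# Hypothetical denylist of domains seen in scanner/probe traffic (assumption).
PROBE_DOMAINS = ("oast.site",)


def plausible_from(value: str) -> bool:
    """Return True if an HTTP_FROM value looks like a genuine contact address."""
    candidate = value.strip().lower().replace("(at)", "@")
    if EMAIL_SHAPE.fullmatch(candidate) is None:
        return False
    domain = candidate.rsplit("@", 1)[1]
    # Reject the denylisted domains and any of their subdomains.
    return not any(domain == d or domain.endswith("." + d) for d in PROBE_DOMAINS)
```

With this check, addresses such as `pigafetta-bot(at)visual-seo.com` pass, while the Base64 string, the PHP snippet, and the `oast.site` address are rejected.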


Successfully merging this pull request may close these issues.

Improvements to the ClientHints - HTTP_X_CLIENT
2 participants