Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(1346): fix control frames flood #2108

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

kingluo
Copy link
Contributor

@kingluo kingluo commented Apr 24, 2024

part of #1346

@kingluo kingluo marked this pull request as ready for review April 24, 2024 13:23
fw/http_frame.c Outdated
@@ -280,6 +280,17 @@ __tfw_h2_send_frame(TfwH2Ctx *ctx, TfwFrameHdr *hdr, TfwStr *data,
unsigned char buf[FRAME_HEADER_SIZE];
TfwStr *hdr_str = TFW_STR_CHUNK(data, 0);
TfwH2Conn *conn = container_of(ctx, TfwH2Conn, h2);
bool is_control_frame = !hdr->stream_id || hdr->type == HTTP2_RST_STREAM;

// If the peer is causing us to generate a lot of control frames,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use another type of comments like:
/*
*
*/

@EvgeniiMekhanik
Copy link
Contributor

I think that it is better to implement new counter in frang and check max count of control frames as we do for all other limits in frang.

@kingluo
Copy link
Contributor Author

kingluo commented May 6, 2024

I think that it is better to implement new counter in frang and check max count of control frames as we do for all other limits in frang.

Maybe global configuration is better than frang? Just like "http_max_header_list_size", it should apply to all HTTP/2 connections regardless of virtual host, and it is not a request response related parameter.

@kingluo kingluo force-pushed the jinhua/fix-1346-flood branch from 1effa31 to 9153825 Compare May 6, 2024 16:05
@const-t
Copy link
Contributor

const-t commented May 9, 2024

I think that it is better to implement new counter in frang and check max count of control frames as we do for all other limits in frang.

I don't see much sense to move it to frang. 1. There is no appropriate frang handler. 2. We don't need history. 3. We have http_max_header_list_size that is not belong to frang as well.

Copy link
Contributor

@const-t const-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

fw/http_frame.c Outdated
*/
if (is_control_frame &&
atomic_read(&ctx->queued_control_frames)
> max_queued_control_frames) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, move brace to next line.

fw/http_frame.c Outdated
if (is_control_frame &&
atomic_read(&ctx->queued_control_frames)
> max_queued_control_frames) {
T_ERR("Too many control frames in send queue, closing connection");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we use T_WARN in such cases. frang as example.

fw/http_frame.c Outdated
* run out of memory.
*/
if (is_control_frame &&
atomic_read(&ctx->queued_control_frames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need atomic here. Both functions where we read/inc/dec queued_control_frames are called under the socket lock.

@const-t
Copy link
Contributor

const-t commented May 10, 2024

I think that it is better to implement new counter in frang and check max count of control frames as we do for all other limits in frang.

I don't see much sense to move it to frang. 1. There is no appropriate frang handler. 2. We don't need history. 3. We have http_max_header_list_size that is not belong to frang as well.

However also there is one disadvantage of using it outside the frang, we can't configure ip_block for ping flooding. And this could be the deciding factor for moving it to frang.

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't approve the PR since I don't understand what do we fix and why in this way.

Please, in this PR further development and other pull requests, write good commit messages what the commit does and why. In this case exact enumeration of CVEs and description of the approach to close the vulnerabilities is required. In #1346 for now we just have a bunch of CVE references, so it's not helpful in the review.

fw/http_frame.h Outdated
* SETTINGS, PING and RST_STREAM that will be queued for writing before
* the connection is closed to prevent memory exhaustion attacks.
*/
#define MAX_QUEUED_CONTROL_FRAMES 10000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant isn't used anywhere, and only a string default value "10000" is used in http.c configuration.

fw/sock_clnt.c Outdated
@@ -460,6 +460,9 @@ tfw_sk_write_xmit(struct sock *sk, struct sk_buff *skb, unsigned int mss_now,
if (h2_mode) {
h2 = tfw_h2_context(conn);
tbl = &h2->hpack.enc_tbl;
if (flags & SS_F_HTTT2_FRAME_CONTROL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (flags & SS_F_HTTT2_FRAME_CONTROL) {
if (unlikely(flags & SS_F_HTTT2_FRAME_CONTROL)) {

fw/http_frame.h Outdated
@@ -209,6 +223,7 @@ typedef struct {
*/
typedef struct tfw_h2_ctx_t {
spinlock_t lock;
unsigned int queued_control_frames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably there is a better place for the 4 byte field in the data structure to avoid alignment holes

T_WARN("Too many control frames in send queue, closing connection\n");
r = SS_BLOCK_WITH_RST;
goto err;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we limit egress control frames, not ingress from clients?

It is said in the comment above that control frames are PING, SETTINGS or RESET_STREAM, so what's the point to limit them on _egress_path? In general, (D)DoS attack implies some asymmetric conditions, when an attacker spends less resources than a victim server. It seems all the control frames sent from our side imply a some frame from a client. RESET even terminates a stream. E.g. we already have request_rate, which blocks HTTP/2 Rapid Reset - doesn't it solve the problem. Do we have a test using the rate limit and exhibiting a (D)DoS attack? @RomanBelozerov FYI

P.S. even with this comment tempesta-tech/tempesta-test#612 (comment) it seems request_rate will still do its job to block/mitigate the attack.

P.P.S. Is this a HTTP/2 slow read prevention? If so, then why don't we account the number and size of header and data frames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. According to the h2 spec, we should ack some control frames, e.g. PING, SETTINGS, RESET_STREAM, and there is no limit for the number of incoming frames in the spec, while it's necessary to limit them according to the responsive capability of the server, that is, when the server is under pressure or due to malicious client (zero TCP window), we should not ack the client in case of OOM or DDOS. If we limit the number of ingress paths instead, we cannot provision a reasonable threshold because the runtime changes always.
  2. request_rate is only used to limit the HTTP/1 or HTTP/2 requests, not control frames. So unfortunately it's useless for this purpose.
  3. for generic slow read prevention, there are other two PRs for it: test OOM by header leak tempesta-test#616, test OOM by slow read attack tempesta-test#618. All slow-read-related issues are located in http2: tests for CVE-2019-9512/9517 tempesta-test#612.

Copy link
Contributor Author

@kingluo kingluo Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case exact enumeration of CVEs and description of the approach to close the vulnerabilities is required. In #1346 for now we just have a bunch of CVE references, so it's not helpful in the review.

Yes. But 1346 refers to 612.

And Roman lists the exact CVEs we need to fix in 1346, and this PR, as part of PRs of 1346, it solves:

GHSA-hgr8-6h9x-f7q9 “Ping Flood”
GHSA-9259-5376-vjcj “Settings Flood”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krizhanovsky The inspiration of this fix comes from Nginx and golang's built-in http2 server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But 1346 refers to 612.

And Roman #1346 (comment) the exact CVEs we need to fix in 1346, and this PR, as part of PRs of 1346, it solves:

GHSA-hgr8-6h9x-f7q9 “Ping Flood”
GHSA-9259-5376-vjcj “Settings Flood”

I know. The idea is that we should have a good commit message history, so anyone should be able to quickly understand the changes history only from git, without digging on github: this is faster and sometimes it's not trivial to find required task and pull request on github and reducing github (as 3rd-party product and company) dependency is also good.

Copy link
Contributor

@krizhanovsky krizhanovsky Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we solve the problem with

  1. rate limit for ingress h2 frames
  2. limit the number of pending bytes (aka TCP send buffer)

Both the limits are easy to understand by a user and they follow the drop early strategy. With the current #2108 we accept control frames, process them and only after that terminate the connection.If we configure that a client connection can have not more than N bytes (we actually can just get the value of the current sysctl send buffer minus data in flight - we know everything about TCP state) for transmission to a client, we already have N pending bytes to the client and receive a control frame, which must be replied, we know upfront that we can't send the reply and can just reset send TCP zero window.This is part of #498 and seems not so hard to do. Definitely more work than #2108, but I'd prefer to make a good iteration on the important task (scheduled for the next milestone) now rather than pull tasks from 1.0

I'm not sure, but looks like we will broke #1973 with send buffer. The main goal of 1973 is to have send buffer with max size = CONG_WND

If we have non-zero congestion window, then we don't have the problem with slow read. We should be on the maximum transfer size as large as the congestion window, but do not keep more than send buffer size of pending data (waiting to be pushed down to the stack by TCP). I.e. 'send buffer' limit the the size of data queued for transmission to the TCP client connectio. This is plus to the data, which is transmitted in #1973

With #1973 we push for TCP transmission up to congestion window size bytes (h2 encode, ecrypt and send to the IP layer). But we also have some (maybe a lot) data for transmission (in the TCP send queue).

With #2108 we aiming to catch slow http2 attack, but each control frame sent by us is actually triggered by some frame received from a client. We also aim to drop malicious data early, so we should drop the malicious client frames early (this is where Nginx and Go http do bad job). Another problem with 2108, also inherited from Nginx, is why 10,000? The slow read attack targets memory exhaust, not "frames number" exhaust, so we should account memory, not frames. Is 10K too much for the default 100 streams? Is it to small if a user defines 1000 streams?

My proposal is to send zero receive TCP window to a client if our send buffer is full for the client, i.e. do no accept new data from a client and notify them that we can't accept data if we can't reply to the data.

  1. receive a control frame (do not send TCP ACK!)
  2. determine if we have a send buffer space for a reply for the frame
  3. if yes - send ACK, if not - send zero tcp window and drop the tcp data

This is partial version of #498. We do not implement backpresure at least. Probably a lot of more logic isn't in this scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kingluo After discussion with @krizhanovsky we decided to implement rate limit for PING, SETTINGS, PRIORITY frames. This limit must be simple as possible, just reset connection when receive frames too fast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kingluo @const-t @EvgeniiMekhanik we also agreed that the task should be the part of #498, i.e. also limit the memory consumption for slow read attacks. I.e. we should have the rate limiting as @const-t mentioned above and memory consumption by each client, just like Nginx provides 2-layer protection

fw/sock_clnt.c Outdated
@@ -460,6 +460,9 @@ tfw_sk_write_xmit(struct sock *sk, struct sk_buff *skb, unsigned int mss_now,
if (h2_mode) {
h2 = tfw_h2_context(conn);
tbl = &h2->hpack.enc_tbl;
if (flags & SS_F_HTTT2_FRAME_CONTROL) {
--h2->queued_control_frames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that we can't counted control frames here. Several skbs can be aggregated in one tls record in this case we don't decrement count of control frames.

@EvgeniiMekhanik EvgeniiMekhanik self-requested a review June 25, 2024 16:12
@kingluo kingluo force-pushed the jinhua/fix-1346-flood branch from c3d466b to 902025a Compare July 10, 2024 11:48
@EvgeniiMekhanik EvgeniiMekhanik requested a review from const-t July 11, 2024 10:18
@kingluo kingluo linked an issue Aug 12, 2024 that may be closed by this pull request
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTTP/2 (D)DoS prevention
4 participants