190 lines
7.2 KiB
Diff
190 lines
7.2 KiB
Diff
|
|
From aa156a439da681361ed6f53f1a8131892418838b Mon Sep 17 00:00:00 2001
|
||
|
|
From: James Coglan <jcoglan@gmail.com>
|
||
|
|
Date: Mon, 1 Jun 2020 21:50:56 +0100
|
||
|
|
Subject: [PATCH] Remove ReDoS vulnerability in the Sec-WebSocket-Extensions
|
||
|
|
header parser
|
||
|
|
|
||
|
|
There is a regular expression denial of service (ReDoS) vulnerability in
|
||
|
|
the parser we use to process the `Sec-WebSocket-Extensions` header. It
|
||
|
|
can be exploited by sending an opening WebSocket handshake to a server
|
||
|
|
containing a header of the form:
|
||
|
|
|
||
|
|
Sec-WebSocket-Extensions: a;b="\c\c\c\c\c\c\c\c\c\c ...
|
||
|
|
|
||
|
|
i.e. a header containing an unclosed string parameter value whose
|
||
|
|
content is a repeating two-byte sequence of a backslash and some other
|
||
|
|
character. The parser takes exponential time to reject this header as
|
||
|
|
invalid, and this can be used to exhaust the server's capacity to
|
||
|
|
process requests.
|
||
|
|
|
||
|
|
This vulnerability has been assigned the identifier CVE-2020-7663.
|
||
|
|
|
||
|
|
We believe this flaw stems from the grammar specified for this header.
|
||
|
|
[RFC 6455][1] defines the grammar for the header as:
|
||
|
|
|
||
|
|
Sec-WebSocket-Extensions = extension-list
|
||
|
|
|
||
|
|
extension-list = 1#extension
|
||
|
|
extension = extension-token *( ";" extension-param )
|
||
|
|
extension-token = registered-token
|
||
|
|
registered-token = token
|
||
|
|
extension-param = token [ "=" (token | quoted-string) ]
|
||
|
|
|
||
|
|
It refers to [RFC 2616][2] for the definitions of `token` and
|
||
|
|
`quoted-string`, which are:
|
||
|
|
|
||
|
|
token = 1*<any CHAR except CTLs or separators>
|
||
|
|
separators = "(" | ")" | "<" | ">" | "@"
|
||
|
|
| "," | ";" | ":" | "\" | <">
|
||
|
|
| "/" | "[" | "]" | "?" | "="
|
||
|
|
| "{" | "}" | SP | HT
|
||
|
|
|
||
|
|
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
|
||
|
|
qdtext = <any TEXT except <">>
|
||
|
|
quoted-pair = "\" CHAR
|
||
|
|
|
||
|
|
These rely on the `CHAR`, `CTL` and `TEXT` grammars, which are:
|
||
|
|
|
||
|
|
CHAR = <any US-ASCII character (octets 0 - 127)>
|
||
|
|
CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
|
||
|
|
TEXT = <any OCTET except CTLs, but including LWS>
|
||
|
|
|
||
|
|
Other relevant definitions to support these:
|
||
|
|
|
||
|
|
OCTET = <any 8-bit sequence of data>
|
||
|
|
LWS = [CRLF] 1*( SP | HT )
|
||
|
|
CRLF = CR LF
|
||
|
|
|
||
|
|
HT = <US-ASCII HT, horizontal-tab (9)>
|
||
|
|
LF = <US-ASCII LF, linefeed (10)>
|
||
|
|
CR = <US-ASCII CR, carriage return (13)>
|
||
|
|
SP = <US-ASCII SP, space (32)>
|
||
|
|
|
||
|
|
To expand some of these terms out and write them as regular expressions:
|
||
|
|
|
||
|
|
OCTET = [\x00-\xFF]
|
||
|
|
CHAR = [\x00-\x7F]
|
||
|
|
TEXT = [\t \x21-\x7E\x80-\xFF]
|
||
|
|
|
||
|
|
The allowable bytes for `token` are [\x00-\x7F], except [\x00-\x1F\x7F]
|
||
|
|
(leaving [\x20-\x7E]) and `separators`, which leaves the following set
|
||
|
|
of allowed chars:
|
||
|
|
|
||
|
|
! # $ % & ' * + - . ^ _ ` | ~ [0-9] [A-Z] [a-z]
|
||
|
|
|
||
|
|
`quoted-string` contains a repeated pattern of either `qdtext` or
|
||
|
|
`quoted-pair`. `qdtext` is any `TEXT` byte except <">, and the <">
|
||
|
|
character is ASCII 34, or 0x22. The <!> character is 0x21. So `qdtext`
|
||
|
|
can be written either positively as:
|
||
|
|
|
||
|
|
qdtext = [\t !\x23-\x7E\x80-\xFF]
|
||
|
|
|
||
|
|
or negatively, as:
|
||
|
|
|
||
|
|
qdtext = [^\x00-\x08\x0A-\x1F\x7F"]
|
||
|
|
|
||
|
|
We use the negative definition here. The other alternative in the
|
||
|
|
`quoted-string` pattern is:
|
||
|
|
|
||
|
|
quoted-pair = \\[\x00-\x7F]
|
||
|
|
|
||
|
|
The problem is that the set of bytes matched by `qdtext` includes <\>,
|
||
|
|
and intersects with the second element of `quoted-pair`. That means the
|
||
|
|
sequence \c can be matched as either two `qdtext` bytes, or as a single
|
||
|
|
`quoted-pair`. When the regex engine fails to find a trailing <"> to
|
||
|
|
close the string, it back-tracks and tries every alternate parse for the
|
||
|
|
string, which doubles with each pair of bytes in the input.
|
||
|
|
|
||
|
|
To fix the ReDoS flaw we need to rewrite the repeating pattern so that
|
||
|
|
none of its alternate branches can match the same text. For example, we
|
||
|
|
could try dividing the set of bytes [\x00-\xFF] into those that must not
|
||
|
|
follow a <\>, those that may follow a <\>, and those that must be
|
||
|
|
preceded by <\>, and thereby construct a pattern of the form:
|
||
|
|
|
||
|
|
(A|\?B|\C)*
|
||
|
|
|
||
|
|
where A, B and C have no characters in common. In our case the three
|
||
|
|
branch patterns would be:
|
||
|
|
|
||
|
|
A = qdtext - CHAR = [\x80-\xFF]
|
||
|
|
B = qdtext & CHAR = [\t !\x23-\x7E]
|
||
|
|
C = CHAR - qdtext = [\x00-\x08\x0A-\x1F\x7F"]
|
||
|
|
|
||
|
|
These sets do not intersect, and notice <"> appears in set C so must be
|
||
|
|
preceded by <\>. But we still have a problem: <\> (0x5C) and all the
|
||
|
|
alphabetic characters are in set B, so the pattern \?B can match all
|
||
|
|
these:
|
||
|
|
|
||
|
|
c
|
||
|
|
\
|
||
|
|
\c
|
||
|
|
|
||
|
|
So the sequence \c\c\c... still produces exponential back-tracking. It
|
||
|
|
also fails to parse input like this correctly:
|
||
|
|
|
||
|
|
Sec-WebSocket-Extensions: a; b="c\", d"
|
||
|
|
|
||
|
|
Because the grammar allows a single backslash to appear by itself, this
|
||
|
|
is arguably a syntax error where the parameter `b` has value `c\` and
|
||
|
|
then a new extension `d` begins with a <"> appearing where it should
|
||
|
|
not.
|
||
|
|
|
||
|
|
So the core problem is with the grammar itself: `qdtext` matches a
|
||
|
|
single backslash <\>, and `quoted-pair` matches a pair <\\>. So given a
|
||
|
|
sequence of backslashes there's no canonical parse and the grammar is
|
||
|
|
ambiguous.
|
||
|
|
|
||
|
|
[RFC 7230][3] remedies this problem and makes the grammar clearer.
|
||
|
|
First, it defines `token` explicitly rather than implicitly:
|
||
|
|
|
||
|
|
token = 1*tchar
|
||
|
|
|
||
|
|
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
|
||
|
|
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
|
||
|
|
/ DIGIT / ALPHA
|
||
|
|
|
||
|
|
And second, it defines `quoted-string` so that backslashes cannot appear
|
||
|
|
on their own:
|
||
|
|
|
||
|
|
quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
|
||
|
|
qdtext = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
|
||
|
|
obs-text = %x80-FF
|
||
|
|
quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
|
||
|
|
|
||
|
|
where VCHAR is any printing ASCII character 0x21-0x7E. Notice `qdtext`
|
||
|
|
is just our previous definition but with 5C excluded, so it cannot
|
||
|
|
accept a single backslash.
|
||
|
|
|
||
|
|
This commit makes this modification to our matching patterns, and
|
||
|
|
thereby removes the ReDoS vector. Technically this means it does not
|
||
|
|
match the grammar of RFC 6455, but we expect this to have little or no
|
||
|
|
practical impact, especially since the one main protocol extension,
|
||
|
|
`permessage-deflate` ([RFC 7692][4]), does not have any string-valued
|
||
|
|
parameters.
|
||
|
|
|
||
|
|
[1]: https://tools.ietf.org/html/rfc6455#section-9.1
|
||
|
|
[2]: https://tools.ietf.org/html/rfc2616#section-2.2
|
||
|
|
[3]: https://tools.ietf.org/html/rfc7230#section-3.2.6
|
||
|
|
[4]: https://tools.ietf.org/html/rfc7692
|
||
|
|
|
||
|
|
---
|
||
|
|
lib/websocket/extensions/parser.rb | 2 +-
|
||
|
|
1 file changed, 1 insertion(+), 1 deletion(-)
|
||
|
|
|
||
|
|
diff --git a/lib/websocket/extensions/parser.rb b/lib/websocket/extensions/parser.rb
|
||
|
|
index 06db917..38d1f19 100644
|
||
|
|
--- a/lib/websocket/extensions/parser.rb
|
||
|
|
+++ b/lib/websocket/extensions/parser.rb
|
||
|
|
@@ -6,7 +6,7 @@ module WebSocket
|
||
|
|
class Parser
|
||
|
|
TOKEN = /([!#\$%&'\*\+\-\.\^_`\|~0-9a-z]+)/
|
||
|
|
NOTOKEN = /([^!#\$%&'\*\+\-\.\^_`\|~0-9a-z])/
|
||
|
|
- QUOTED = /"((?:\\[\x00-\x7f]|[^\x00-\x08\x0a-\x1f\x7f"])*)"/
|
||
|
|
+ QUOTED = /"((?:\\[\x00-\x7f]|[^\x00-\x08\x0a-\x1f\x7f"\\])*)"/
|
||
|
|
PARAM = %r{#{TOKEN.source}(?:=(?:#{TOKEN.source}|#{QUOTED.source}))?}
|
||
|
|
EXT = %r{#{TOKEN.source}(?: *; *#{PARAM.source})*}
|
||
|
|
EXT_LIST = %r{^#{EXT.source}(?: *, *#{EXT.source})*$}
|
||
|
|
--
|
||
|
|
2.27.0
|
||
|
|
|