Compare commits

...

10 Commits

Author SHA1 Message Date
openeuler-ci-bot
336c5874d6
!66 [sync] PR-65: backport two patches
From: @openeuler-sync-bot 
Reviewed-by: @dillon_chen 
Signed-off-by: @dillon_chen
2024-09-06 00:13:21 +00:00
guojunding
f642eeecd8 backport two patches
(cherry picked from commit 34b8091eaf4a40ec882a32043589232a5e1b5a14)
2024-09-05 17:26:13 +08:00
openeuler-ci-bot
1a3b683cf3
!62 [sync] PR-61: fix lang file declaration
From: @openeuler-sync-bot 
Reviewed-by: @dillon_chen 
Signed-off-by: @dillon_chen
2024-08-12 02:18:43 +00:00
Funda Wang
57270d0994 fix lang file declaration
(cherry picked from commit ed0950e9a2018e37588db20b3bd6eeccc2dedce9)
2024-08-09 16:46:36 +08:00
openeuler-ci-bot
a635dce455
!58 backport upstream community patches
From: @addrexist 
Reviewed-by: @zhoupengcheng11, @gaoruoshu 
Signed-off-by: @gaoruoshu
2024-06-06 12:22:44 +00:00
wangziliang
8392fcf162 fix grep -m2 pattern bug 2024-06-03 10:11:56 +08:00
openeuler-ci-bot
05f4302413
!51 update to 3.11
From: @dillon_chen 
Reviewed-by: @overweight 
Signed-off-by: @overweight
2023-07-14 09:38:13 +00:00
dillon_chen
7c29d66a6f update to 3.11 2023-07-14 16:39:07 +08:00
openeuler-ci-bot
40b4a52cbc
!40 update version to 3.8
From: @gaoruoshu 
Reviewed-by: @hubin95 
Signed-off-by: @hubin95
2023-01-19 07:57:04 +00:00
gaoruoshu
314c208813 update version to 3.8 2023-01-19 15:32:36 +08:00
20 changed files with 187 additions and 1425 deletions

@ -0,0 +1,42 @@
From eda769be72def8a14098af968e04cc6952fc53a3 Mon Sep 17 00:00:00 2001
From: Bruno Haible <bruno@clisp.org>
Date: Mon, 8 Jul 2024 14:06:16 +0200
Subject: tests: Fix recognition of cs_CZ.UTF-8 locale on FreeBSD.
* tests/fmbtest: Use 'locale charmap' to determine the locale's encoding.
* tests/foad1: Likewise.
---
tests/fmbtest | 2 +-
tests/foad1 | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/fmbtest b/tests/fmbtest
index e64b6ba..ddc25d9 100644
--- a/tests/fmbtest
+++ b/tests/fmbtest
@@ -10,7 +10,7 @@
cz=cs_CZ.UTF-8
# If cs_CZ.UTF-8 locale doesn't work, skip this test.
-LC_ALL=$cz locale -k LC_CTYPE 2>/dev/null | grep -q charmap.*UTF-8 \
+test "`LC_ALL=$cz locale charmap 2>/dev/null`" = UTF-8 \
|| skip_ this system lacks the $cz locale
# If matching is done in single-byte mode, skip this test too
diff --git a/tests/foad1 b/tests/foad1
index f10c1d7..3e87656 100644
--- a/tests/foad1
+++ b/tests/foad1
@@ -150,7 +150,7 @@ Exit $failures
# The rest of this file is meant to be executed under this locale.
LC_ALL=cs_CZ.UTF-8; export LC_ALL
# If the UTF-8 locale doesn't work, skip these tests silently.
-locale -k LC_CTYPE 2>/dev/null | grep -q "charmap.*UTF-8" || Exit $failures
+test "`locale charmap 2>/dev/null`" = UTF-8 || Exit $failures
# Test character class erroneously matching a '[' character.
grep_test "[/" "" "[[:alpha:]]" -E
--
2.9.3.windows.1
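
The patch above replaces a grep over the output of `locale -k LC_CTYPE` with an exact comparison against the output of `locale charmap`. A rough C analogue of the same check is sketched below, assuming a POSIX system that provides `setlocale` and `nl_langinfo (CODESET)`. The names `codeset_is_utf8` and `locale_is_utf8` are hypothetical and do not come from grep.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: exact comparison, like `test "$charmap" = UTF-8`.  */
bool
codeset_is_utf8 (char const *codeset)
{
  return codeset != NULL && strcmp (codeset, "UTF-8") == 0;
}

/* Hypothetical helper: switch LC_CTYPE to NAME and report whether its
   codeset is UTF-8.  Returns false if the locale is unavailable.
   Note that this changes the process-wide locale as a side effect.  */
bool
locale_is_utf8 (char const *name)
{
  if (setlocale (LC_CTYPE, name) == NULL)
    return false;
  return codeset_is_utf8 (nl_langinfo (CODESET));
}
```

A caller would skip the locale-dependent tests when `locale_is_utf8 ("cs_CZ.UTF-8")` returns false, which is what the `skip_` and `Exit` branches do in the shell scripts.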

@ -0,0 +1,53 @@
From 53b889155f5ee53404a9873f48300fe5b50321d9 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 21 May 2024 09:50:43 -0700
Subject: doc: fix troff typos
* doc/grep.in.1: Fix troff typos found by mandoc and groff.
Problem reported by Bjarni Ingi Gislason (bug#71087).
---
doc/grep.in.1 | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/doc/grep.in.1 b/doc/grep.in.1
index 236791e..eb6a5d9 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -23,6 +23,7 @@
. \}
.\}
.
+.as la
.if !\w|\*(la| \{\
.\" groff an-ext.tmac does not seem to be in use, so define the parts of
.\" it that are used below. For a copy of groff an-ext.tmac, please see:
@@ -245,7 +246,7 @@ If this option is used multiple times or is combined with the
option, search for all patterns given.
The empty file contains zero patterns, and therefore matches nothing.
If
-.IR FILE
+.I FILE
is
.B \-
, read patterns from standard input.
@@ -674,7 +675,7 @@ whose base name matches
Ignore any redundant trailing slashes in
.IR GLOB .
.TP
-.BR \-I
+.B \-I
Process a binary file as if it did not contain matching data; this is
equivalent to the
.B \-\^\-binary\-files=without\-match
@@ -749,7 +750,7 @@ Like the
or
.B \-\^\-null
option, this option can be used with commands like
-.B sort -z
+.B "sort \-z"
to process arbitrary file names.
.
.SH "REGULAR EXPRESSIONS"
--
2.9.3.windows.1

@ -1,50 +0,0 @@
From ef6c7768b300678895348ba7c827fa919e3f1d5c Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 13 May 2022 23:28:30 -0700
Subject: [PATCH] build: update gnulib submodule to latest
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b19a10775e54f8ed17e3a8c08a72d261d8c26244
This fixes a bug introduced in 2019-12-18T05:41:27Z!eggert@cs.ucla.edu,
an earlier patch that fixed dfa.c to not match invalid UTF-8.
Unfortunately that patch had a couple of typos when dfa.c is
matching against the regular expression . (dot). One typo
caused dfa.c to incorrectly reject the valid UTF-8 sequences
(ED)(90-9F)(80-BF) corresponding to U+D400 through U+D7FF, which
are some Hangul Syllables and Hangul Jamo Extended-B. The other
typo caused dfa.c to incorrectly reject the valid sequences
(F4)(88-8F)(80-BF)(80-BF) which correspond to U+108000 through
U+10FFFF (Supplemental Private Use Area plane B).
* lib/dfa.c (utf8_classes): Fix typos.
* tests/test-dfa-match.sh: Test the fix.
Reference:https://git.savannah.gnu.org/cgit/grep.git/commit?id=ef6c7768b300678895348ba7c827fa919e3f1d5c
Conflict:delete ChangeLog and test-dfa-match.sh
---
lib/dfa.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/dfa.c b/lib/dfa.c
index a27d096f7..e88fabb44 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -1704,7 +1704,7 @@ add_utf8_anychar (struct dfa *dfa)
/* G. ed (just a token). */
/* H. 80-9f: 2nd byte of a "GHC" sequence. */
- CHARCLASS_INIT (0, 0, 0, 0, 0xffff, 0, 0, 0),
+ CHARCLASS_INIT (0, 0, 0, 0, 0xffffffff, 0, 0, 0),
/* I. f0 (just a token). */
@@ -1717,7 +1717,7 @@ add_utf8_anychar (struct dfa *dfa)
/* L. f4 (just a token). */
/* M. 80-8f: 2nd byte of a "LMCC" sequence. */
- CHARCLASS_INIT (0, 0, 0, 0, 0xff, 0, 0, 0),
+ CHARCLASS_INIT (0, 0, 0, 0, 0xffff, 0, 0, 0),
};
/* Define the character classes that are needed below. */
--
2.27.0
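
The two one-line changes above widen bit masks. As a sketch of why that matters, assume (this layout is an assumption and is not shown in the patch) that CHARCLASS_INIT takes eight 32-bit words and that bit b % 32 of word b / 32 stands for byte value b. Word 4 then covers bytes 0x80 through 0x9F: 0xffff selects only 0x80-0x8F while 0xffffffff selects 0x80-0x9F, and 0xff selects only 0x80-0x87 while 0xffff selects 0x80-0x8F. The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is byte B a member of a 256-bit class stored
   as eight 32-bit words (assumed layout, see above)?  */
bool
class_has_byte (uint32_t const words[8], unsigned char b)
{
  return (words[b / 32] >> (b % 32)) & 1;
}

/* Encode a code point in U+0800..U+FFFF as three UTF-8 bytes.
   No validation is done; rejecting surrogates is left to the caller.  */
void
utf8_encode3 (uint32_t cp, unsigned char out[3])
{
  out[0] = 0xE0 | (cp >> 12);
  out[1] = 0x80 | ((cp >> 6) & 0x3F);
  out[2] = 0x80 | (cp & 0x3F);
}
```

Under these assumptions, U+D45C encodes as ED 91 9C. Its second byte, 0x91, is absent from the old class H mask and present in the new one, which is consistent with the commit message.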

@ -1,40 +0,0 @@
From 6f84f3be1cdd3aadacc42007582116d1c2c0a3e4 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 12 Nov 2021 21:30:25 -0800
Subject: [PATCH] =?UTF-8?q?grep:=20Don=E2=80=99t=20limit=20jitstack=5Fmax?=
=?UTF-8?q?=20to=20INT=5FMAX?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT
stack size.
---
src/pcresearch.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index daa0c42..bf966f8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -59,10 +59,16 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
{
while (true)
{
+ /* STACK_GROWTH_RATE is taken from PCRE's src/pcre2_jit_compile.c.
+ Going over the jitstack_max limit could trigger an int
+ overflow bug within PCRE. */
+ int STACK_GROWTH_RATE = 8192;
+ size_t jitstack_max = SIZE_MAX - (STACK_GROWTH_RATE - 1);
+
int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
search_offset, options, pc->data, pc->mcontext);
if (e == PCRE2_ERROR_JIT_STACKLIMIT
- && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
+ && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
{
PCRE2_SIZE old_size = pc->jit_stack_size;
PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
--
1.8.3.1
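
The guard in the hunk above checks `jit_stack_size <= jitstack_max / 2` before doubling, so the multiplication cannot exceed the limit or wrap around. A standalone sketch of that pattern follows. The names `can_double` and `next_stack_size` are hypothetical, and the growth-rate constant is copied from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_GROWTH_RATE = 8192 };  /* value taken from the patch */

/* Upper bound used by the patch: SIZE_MAX minus the rounding slack.  */
size_t
jitstack_max (void)
{
  return SIZE_MAX - (STACK_GROWTH_RATE - 1);
}

/* True if SIZE is nonzero and can be doubled without exceeding
   jitstack_max ().  */
bool
can_double (size_t size)
{
  return 0 < size && size <= jitstack_max () / 2;
}

/* Return the doubled size, or 0 if doubling is not allowed.  */
size_t
next_stack_size (size_t size)
{
  return can_double (size) ? size * 2 : 0;
}
```

Comparing against half of the limit, instead of doubling first and comparing afterwards, is what keeps the check itself free of overflow.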

@ -1,56 +0,0 @@
From ad6de316cca655cd8b0b20b3e9dd18e7e98e443a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 21 Aug 2021 10:44:17 -0700
Subject: [PATCH] =?UTF-8?q?grep:=20avoid=20sticky=20problem=20with?=
=?UTF-8?q?=20=E2=80=98-f=20-=20-f=20-=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Inspired by bug#50129 even though this is a different bug.
* src/grep.c (main): For -f -, use clearerr (stdin) after
reading, so that grep -f - -f - reads stdin twice even
when stdin is a tty. Also, for -f FILE, report any
I/O error when closing FILE.
---
src/grep.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/src/grep.c b/src/grep.c
index 7a33686..b2a0566 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2477,7 +2477,6 @@ main (int argc, char **argv)
int matcher = -1;
int opt;
int prev_optind, last_recursive;
- int fread_errno;
intmax_t default_context;
FILE *fp;
exit_failure = EXIT_TROUBLE;
@@ -2648,11 +2647,17 @@ main (int argc, char **argv)
if (cc == 0)
break;
}
- fread_errno = errno;
- if (ferror (fp))
- die (EXIT_TROUBLE, fread_errno, "%s", optarg);
- if (fp != stdin)
- fclose (fp);
+ int err = errno;
+ if (!ferror (fp))
+ {
+ err = 0;
+ if (fp == stdin)
+ clearerr (fp);
+ else if (fclose (fp) != 0)
+ err = errno;
+ }
+ if (err)
+ die (EXIT_TROUBLE, err, "%s", optarg);
/* Append final newline if file ended in non-newline. */
if (newkeycc != keycc && keys[newkeycc - 1] != '\n')
keys[newkeycc++] = '\n';
--
1.8.3.1
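
The rewritten branch above calls clearerr on stdin after a successful read, so that a later `-f -` can read from a terminal again, and it reports fclose failures for regular files. A minimal sketch of that flow outside grep might look like this. The name `finish_pattern_file` is hypothetical, the second parameter stands in for stdin so the function can be exercised on any stream, and it returns an errno-style code instead of calling die.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper mirroring the patched logic.  Call it right
   after the read loop on FP.  Returns 0 on success, else the errno
   value saved from the failed read or from a failed fclose.  As in
   the patch, FP is not closed when a read error is reported.  */
int
finish_pattern_file (FILE *fp, FILE *std_in)
{
  int err = errno;
  if (!ferror (fp))
    {
      err = 0;
      if (fp == std_in)
        clearerr (fp);          /* drop the sticky end-of-file indicator */
      else if (fclose (fp) != 0)
        err = errno;
    }
  return err;
}
```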

@ -1,67 +0,0 @@
From b061d24916fb9a14da37a3f2a05cb80dc65cfd38 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 5 Dec 2022 14:16:45 -0800
Subject: [PATCH] backport: grep: bug backref in last of multiple patterns
---
src/dfasearch.c | 25 ++++++++++++-------------
tests/backref | 8 ++++++++
2 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/src/dfasearch.c b/src/dfasearch.c
index d6afa8d..2d0e861 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -267,20 +267,19 @@ GEAcompile (char *pattern, size_t size, reg_syntax_t syntax_bits,
if (compilation_failed)
exit (EXIT_TROUBLE);
- if (prev <= patlim)
+ if (patlim < prev)
+ buflen--;
+ else if (pattern < prev)
{
- if (pattern < prev)
- {
- ptrdiff_t prevlen = patlim - prev;
- buf = xrealloc (buf, buflen + prevlen);
- memcpy (buf + buflen, prev, prevlen);
- buflen += prevlen;
- }
- else
- {
- buf = pattern;
- buflen = size;
- }
+ ptrdiff_t prevlen = patlim - prev;
+ buf = xrealloc (buf, buflen + prevlen);
+ memcpy (buf + buflen, prev, prevlen);
+ buflen += prevlen;
+ }
+ else
+ {
+ buf = pattern;
+ buflen = size;
}
/* In the match_words and match_lines cases, we use a different pattern
diff --git a/tests/backref b/tests/backref
index 947981b..5cc3060 100755
--- a/tests/backref
+++ b/tests/backref
@@ -43,4 +43,12 @@ if test $? -ne 2 ; then
failures=1
fi
+# https://bugs.gnu.org/36148#13
+echo 'Total failed: 2 (1 ignored)' |
+ grep -e '^Total failed: 0$' -e '^Total failed: \([0-9]*\) (\1 ignored)$'
+if test $? -ne 1 ; then
+ echo "Backref: Multiple -e test, test #5 failed"
+ failures=1
+fi
+
Exit $failures
--
2.30.1 (Apple Git-130)
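
The test case added above expects no match: the input reports 2 failed and 1 ignored, so the back-reference \1 cannot match, and the first pattern requires a count of 0. Each pattern can be checked on its own with POSIX regcomp, as in the sketch below. This assumes a libc whose basic regular expressions support back-references (glibc does). The helper `bre_matches` is hypothetical, and it exercises one pattern at a time, not the multi-pattern code path in grep that the patch fixes.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if TEXT matches the basic regular expression
   PATTERN, 0 if it does not, -1 if PATTERN fails to compile.  */
int
bre_matches (char const *pattern, char const *text)
{
  regex_t re;
  regmatch_t m[2];
  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  int rc = regexec (&re, text, 2, m, 0);
  regfree (&re);
  return rc == 0;
}
```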

@ -1,38 +0,0 @@
From 0687c51c4792b997988c03a34a8b57717d9961cc Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 Aug 2021 13:58:13 -0700
Subject: [PATCH] grep: djb2 correction
Problem reported by Alex Murray (bug#50093).
* src/grep.c (hash_pattern): Use a nonzero initial value.
Reference:https://git.savannah.gnu.org/cgit/grep.git/commit?id=0687c51c4792b997988c03a34a8b57717d9961cc
Conflict:NA
---
src/grep.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/grep.c b/src/grep.c
index 271b6b9..7a33686 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -126,7 +126,15 @@ static Hash_table *pattern_table;
static size_t _GL_ATTRIBUTE_PURE
hash_pattern (void const *pat, size_t n_buckets)
{
- size_t h = 0;
+ /* This uses the djb2 algorithm, except starting with a larger prime
+ in place of djb2's 5381, if size_t is wide enough. The primes
+ are taken from the primeth recurrence sequence
+ <https://oeis.org/A007097>. h15, h32 and h64 are the largest
+ sequence members that fit into 15, 32 and 64 bits, respectively.
+ Since any H will do, hashing works correctly on oddball machines
+ where size_t has some other width. */
+ uint_fast64_t h15 = 5381, h32 = 3657500101, h64 = 4123221751654370051;
+ size_t h = h64 <= SIZE_MAX ? h64 : h32 <= SIZE_MAX ? h32 : h15;
intptr_t pat_offset = (intptr_t) pat - 1;
unsigned char const *s = (unsigned char const *) pattern_array + pat_offset;
for ( ; *s != '\n'; s++)
--
2.27.0
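
The patch above changes only the initial value of the hash. The sketch below shows one visible consequence of a zero seed in a djb2-style hash: leading zero bytes leave the hash at zero, so inputs that differ only in such bytes collide. Whether this is the exact problem from the bug report is not stated in the patch. The function name `djb2_line` is hypothetical, and the additive mixing step is the textbook form, assumed here because the hunk does not show grep's loop body.

```c
#include <stddef.h>

/* djb2-style hash of the bytes of S up to, but not including, the
   first newline.  SEED is the initial value; classic djb2 uses 5381.
   The mixing step h * 33 + byte is an assumption (see above).  */
size_t
djb2_line (char const *s, size_t seed)
{
  size_t h = seed;
  for (unsigned char const *p = (unsigned char const *) s; *p != '\n'; p++)
    h = h * 33 + *p;
  return h;
}
```

With a seed of 0, the empty line and a line holding a single NUL byte hash to the same value. With a seed of 5381 they do not.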

@ -1,126 +0,0 @@
From 5447010fdbdf3f1a874689dd41a7c916bb262b2a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 13 May 2022 23:46:21 -0700
Subject: [PATCH] grep: fix bug with . and some Hangul Syllables
* NEWS: Mention the fix, which comes from the recent Gnulib update.
* tests/hangul-syllable: New file.
* tests/Makefile.am (TESTS): Add it.
Reference:https://git.savannah.gnu.org/cgit/grep.git/commit?id=5447010fdbdf3f1a874689dd41a7c916bb262b2a
Conflict:delete NEWS
---
tests/Makefile.am | 1 +
tests/hangul-syllable | 88 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 89 insertions(+)
create mode 100755 tests/hangul-syllable
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 708980d..d72637f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -110,6 +110,7 @@ TESTS = \
grep-dev-null \
grep-dev-null-out \
grep-dir \
+ hangul-syllable \
hash-collision-perf \
help-version \
high-bit-range \
diff --git a/tests/hangul-syllable b/tests/hangul-syllable
new file mode 100755
index 0000000..9f94d2e
--- /dev/null
+++ b/tests/hangul-syllable
@@ -0,0 +1,88 @@
+#!/bin/sh
+# grep 3.4 through 3.7 mishandled matching '.' against the valid UTF-8
+# sequences (ED)(90-9F)(80-BF) corresponding to U+D400 through U+D7FF,
+# which are some Hangul Syllables and Hangul Jamo Extended-B. They
+# also mishandled (F4)(88-8F)(80-BF)(80-BF) which correspond to
+# U+108000 through U+10FFFF (Supplemental Private Use Area plane B).
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+
+LC_ALL=en_US.UTF-8
+export LC_ALL
+
+check_char ()
+{
+ printf "$1\\n" >in || framework_failure_
+
+ grep $2 '^.$' in >out || fail=1
+ cmp in out || fail=1
+}
+
+fail=0
+
+# "." should match U+D45C HANGUL SYLLABLE PYO.
+check_char '\355\221\234'
+
+# Check boundary-condition characters
+# while we are at it.
+
+check_char '\0' -a
+check_char '\177'
+
+for i in 302 337; do
+ for j in 200 277; do
+ check_char "\\$i\\$j"
+ done
+done
+for i in 340; do
+ for j in 240 277; do
+ for k in 200 277; do
+ check_char "\\$i\\$j\\$k"
+ done
+ done
+done
+for i in 341 354 356 357; do
+ for j in 200 277; do
+ for k in 200 277; do
+ check_char "\\$i\\$j\\$k"
+ done
+ done
+done
+for i in 355; do
+ for j in 200 237; do
+ for k in 200 277; do
+ check_char "\\$i\\$j\\$k"
+ done
+ done
+done
+for i in 360; do
+ for j in 220 277; do
+ for k in 200 277; do
+ for l in 200 277; do
+ check_char "\\$i\\$j\\$k\\$l"
+ done
+ done
+ done
+done
+for i in 361 363; do
+ for j in 200 277; do
+ for k in 200 277; do
+ for l in 200 277; do
+ check_char "\\$i\\$j\\$k\\$l"
+ done
+ done
+ done
+done
+for i in 364; do
+ for j in 200 217; do
+ for k in 200 277; do
+ for l in 200 277; do
+ check_char "\\$i\\$j\\$k\\$l"
+ done
+ done
+ done
+done
+
+Exit $fail
--
2.27.0

@ -1,26 +0,0 @@
From ad6e5cbcf598f55cafe83a11487ea4a6694e433b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 14 Nov 2021 10:54:12 -0800
Subject: [PATCH] grep: fix minor -P memory leak
* src/pcresearch.c (Pcompile): Free ccontext when no longer needed.
---
src/pcresearch.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index badcd4c..c287d99 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -184,6 +184,8 @@ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
die (EXIT_TROUBLE, 0, "%s", ep);
}
+ pcre2_compile_context_free (ccontext);
+
pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
--
1.8.3.1

@ -1,63 +0,0 @@
From e2aec8c91e9d6ed3fc76f9f145dec8a456ce623a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 24 Jun 2022 17:53:34 -0500
Subject: [PATCH] grep: fix regex compilation memory leaks
Problem reported by Jim Meyering in:
https://lists.gnu.org/r/grep-devel/2022-06/msg00012.html
* src/dfasearch.c (regex_compile): Fix memory leaks when SYNTAX_ONLY.
Reference:https://git.savannah.gnu.org/cgit/grep.git/commit?id=e2aec8c91e9d6ed3fc76f9f145dec8a456ce623a
Conflict:context adaptation
---
src/dfasearch.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/src/dfasearch.c b/src/dfasearch.c
index d6afa8d..2875453 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -148,24 +148,32 @@ regex_compile (struct dfa_comp *dc, char const *p, ptrdiff_t len,
ptrdiff_t pcount, ptrdiff_t lineno, reg_syntax_t syntax_bits,
bool syntax_only)
{
- struct re_pattern_buffer pat0;
- struct re_pattern_buffer *pat = syntax_only ? &pat0 : &dc->patterns[pcount];
- pat->buffer = NULL;
- pat->allocated = 0;
+ struct re_pattern_buffer pat;
+ pat.buffer = NULL;
+ pat.allocated = 0;
/* Do not use a fastmap with -i, to work around glibc Bug#20381. */
- pat->fastmap = (syntax_only | match_icase) ? NULL : xmalloc (UCHAR_MAX + 1);
+ pat.fastmap = syntax_only | match_icase ? NULL : ximalloc (UCHAR_MAX + 1);
- pat->translate = NULL;
+ pat.translate = NULL;
if (syntax_only)
re_set_syntax (syntax_bits | RE_NO_SUB);
else
re_set_syntax (syntax_bits);
- char const *err = re_compile_pattern (p, len, pat);
+ char const *err = re_compile_pattern (p, len, &pat);
if (!err)
- return true;
+ {
+ if (syntax_only)
+ regfree (&pat);
+ else
+ dc->patterns[pcount] = pat;
+
+ return true;
+ }
+
+ free (pat.fastmap);
/* Emit a filename:lineno: prefix for patterns taken from files. */
size_t pat_lineno;
--
2.27.0
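
The fix above compiles into a local pattern buffer and decides only after a successful compile whether to free it (syntax check only) or to move it into its permanent slot. The same ownership pattern can be sketched with the POSIX regcomp interface. This is an analogy under stated assumptions, not grep's code: `compile_or_check` is a hypothetical name, and POSIX regcomp has no caller-allocated fastmap, so the failure path has nothing to free.

```c
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper.  Compile PATTERN as a POSIX basic regular
   expression into a local object first.  On success, either free it
   (SYNTAX_ONLY) or move it into *SLOT, which the caller must later
   pass to regfree.  On failure *SLOT is left untouched.  */
bool
compile_or_check (regex_t *slot, char const *pattern, bool syntax_only)
{
  regex_t pat;
  int flags = syntax_only ? REG_NOSUB : 0;
  if (regcomp (&pat, pattern, flags) != 0)
    return false;   /* after a failed regcomp the caller frees nothing */

  if (syntax_only)
    regfree (&pat);
  else
    *slot = pat;    /* ownership moves to the caller's slot */
  return true;
}
```

Writing to the permanent slot only on success means that no exit path leaves a half-initialized or unreferenced pattern behind.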

@ -1,567 +0,0 @@
From e0d39a9133e1507345d73ac5aff85f037f39aa54 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com>
Date: Fri, 12 Nov 2021 16:45:04 -0800
Subject: [PATCH] grep: migrate to pcre2
Mostly a bug by bug translation of the original code to the PCRE2 API.
Code still could do with some optimizations but should be good as a
starting point.
The API changes the sign of some types and therefore some ugly casts
were needed, some of the changes are just to make sure all variables
fit into the newer types better.
Includes backward compatibility and could be made to build all the way
to 10.00, but assumes a recent enough version and has been tested with
10.23 (from CentOS 7, the oldest).
Performance seems equivalent, and it also seems functionally complete.
* m4/pcre.m4 (gl_FUNC_PCRE): Check for PCRE2, not the original PCRE.
* src/pcresearch.c (struct pcre_comp, jit_exec)
(Pcompile, Pexecute):
Use PCRE2, not the original PCRE.
* tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
---
doc/grep.in.1 | 8 +-
doc/grep.texi | 2 +-
m4/pcre.m4 | 21 ++--
src/pcresearch.c | 249 +++++++++++++++++++++++------------------------
tests/filename-lineno.pl | 4 +-
5 files changed, 138 insertions(+), 146 deletions(-)
diff --git a/doc/grep.in.1 b/doc/grep.in.1
index e8854f2..21bb471 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -767,7 +767,7 @@ In other implementations, basic regular expressions are less powerful.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and are
-documented in B<pcresyntax>(3) and B<pcrepattern>(3), but work only if
+documented in B<pcre2syntax>(3) and B<pcre2pattern>(3), but work only if
PCRE support is enabled.
.PP
The fundamental building blocks are the regular expressions
@@ -1371,9 +1371,9 @@ from the globbing syntax that the shell uses to match file names.
.BR sort (1),
.BR xargs (1),
.BR read (2),
-.BR pcre (3),
-.BR pcresyntax (3),
-.BR pcrepattern (3),
+.BR pcre2 (3),
+.BR pcre2syntax (3),
+.BR pcre2pattern (3),
.BR terminfo (5),
.BR glob (7),
.BR regex (7)
diff --git a/doc/grep.texi b/doc/grep.texi
index 01ac81e..aae8571 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1186,7 +1186,7 @@ In other implementations, basic regular expressions are less powerful.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and
-are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
+are documented in the @i{pcre2syntax}(3) and @i{pcre2pattern}(3) manual
pages, but work only if PCRE is available in the system.
@menu
diff --git a/m4/pcre.m4 b/m4/pcre.m4
index 78b7fda..a1c6c82 100644
--- a/m4/pcre.m4
+++ b/m4/pcre.m4
@@ -1,4 +1,4 @@
-# pcre.m4 - check for libpcre support
+# pcre.m4 - check for PCRE library support
# Copyright (C) 2010-2021 Free Software Foundation, Inc.
# This file is free software; the Free Software Foundation
@@ -9,7 +9,7 @@ AC_DEFUN([gl_FUNC_PCRE],
[
AC_ARG_ENABLE([perl-regexp],
AS_HELP_STRING([--disable-perl-regexp],
- [disable perl-regexp (pcre) support]),
+ [disable perl-regexp (pcre2) support]),
[case $enableval in
yes|no) test_pcre=$enableval;;
*) AC_MSG_ERROR([invalid value $enableval for --disable-perl-regexp]);;
@@ -21,24 +21,25 @@ AC_DEFUN([gl_FUNC_PCRE],
use_pcre=no
if test $test_pcre != no; then
- PKG_CHECK_MODULES([PCRE], [libpcre], [], [: ${PCRE_LIBS=-lpcre}])
+ PKG_CHECK_MODULES([PCRE], [libpcre2-8], [], [: ${PCRE_LIBS=-lpcre2-8}])
- AC_CACHE_CHECK([for pcre_compile], [pcre_cv_have_pcre_compile],
+ AC_CACHE_CHECK([for pcre2_compile], [pcre_cv_have_pcre2_compile],
[pcre_saved_CFLAGS=$CFLAGS
pcre_saved_LIBS=$LIBS
CFLAGS="$CFLAGS $PCRE_CFLAGS"
LIBS="$PCRE_LIBS $LIBS"
AC_LINK_IFELSE(
- [AC_LANG_PROGRAM([[#include <pcre.h>
+ [AC_LANG_PROGRAM([[#define PCRE2_CODE_UNIT_WIDTH 8
+ #include <pcre2.h>
]],
- [[pcre *p = pcre_compile (0, 0, 0, 0, 0);
+ [[pcre2_code *p = pcre2_compile (0, 0, 0, 0, 0, 0);
return !p;]])],
- [pcre_cv_have_pcre_compile=yes],
- [pcre_cv_have_pcre_compile=no])
+ [pcre_cv_have_pcre2_compile=yes],
+ [pcre_cv_have_pcre2_compile=no])
CFLAGS=$pcre_saved_CFLAGS
LIBS=$pcre_saved_LIBS])
- if test "$pcre_cv_have_pcre_compile" = yes; then
+ if test "$pcre_cv_have_pcre2_compile" = yes; then
use_pcre=yes
elif test $test_pcre = maybe; then
AC_MSG_WARN([AC_PACKAGE_NAME will be built without pcre support.])
@@ -50,7 +51,7 @@ AC_DEFUN([gl_FUNC_PCRE],
if test $use_pcre = yes; then
AC_DEFINE([HAVE_LIBPCRE], [1],
[Define to 1 if you have the Perl Compatible Regular Expressions
- library (-lpcre).])
+ library (-lpcre2).])
else
PCRE_CFLAGS=
PCRE_LIBS=
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 8070d06..2916d31 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -17,41 +17,32 @@
02110-1301, USA. */
/* Written August 1992 by Mike Haertel. */
+/* Updated for PCRE2 by Carlo Arenas. */
#include <config.h>
#include "search.h"
#include "die.h"
-#include <pcre.h>
+#define PCRE2_CODE_UNIT_WIDTH 8
+#include <pcre2.h>
-/* This must be at least 2; everything after that is for performance
- in pcre_exec. */
-enum { NSUB = 300 };
-
-#ifndef PCRE_EXTRA_MATCH_LIMIT_RECURSION
-# define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0
-#endif
-#ifndef PCRE_STUDY_JIT_COMPILE
-# define PCRE_STUDY_JIT_COMPILE 0
-#endif
-#ifndef PCRE_STUDY_EXTRA_NEEDED
-# define PCRE_STUDY_EXTRA_NEEDED 0
+/* Needed for backward compatibility for PCRE2 < 10.30 */
+#ifndef PCRE2_CONFIG_DEPTHLIMIT
+#define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
+#define PCRE2_ERROR_DEPTHLIMIT PCRE2_ERROR_RECURSIONLIMIT
+#define pcre2_set_depth_limit pcre2_set_recursion_limit
#endif
struct pcre_comp
{
- /* Compiled internal form of a Perl regular expression. */
- pcre *cre;
-
- /* Additional information about the pattern. */
- pcre_extra *extra;
-
-#if PCRE_STUDY_JIT_COMPILE
/* The JIT stack and its maximum size. */
- pcre_jit_stack *jit_stack;
- int jit_stack_size;
-#endif
+ pcre2_jit_stack *jit_stack;
+ PCRE2_SIZE jit_stack_size;
+ /* Compiled internal form of a Perl regular expression. */
+ pcre2_code *cre;
+ pcre2_match_context *mcontext;
+ pcre2_match_data *data;
/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
string matches when that flag is used. */
int empty_match[2];
@@ -60,54 +51,49 @@ struct pcre_comp
/* Match the already-compiled PCRE pattern against the data in SUBJECT,
of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
- options OPTIONS, and storing resulting matches into SUB. Return
- the (nonnegative) match location or a (negative) error number. */
+ options OPTIONS.
+ Return the (nonnegative) match count or a (negative) error number. */
static int
-jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
- int search_offset, int options, int *sub)
+jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
+ PCRE2_SIZE search_offset, int options)
{
while (true)
{
- int e = pcre_exec (pc->cre, pc->extra, subject, search_bytes,
- search_offset, options, sub, NSUB);
-
-#if PCRE_STUDY_JIT_COMPILE
- /* Going over this would trigger an int overflow bug within PCRE. */
- int jitstack_max = INT_MAX - 8 * 1024;
-
- if (e == PCRE_ERROR_JIT_STACKLIMIT
- && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
+ int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
+ search_offset, options, pc->data, pc->mcontext);
+ if (e == PCRE2_ERROR_JIT_STACKLIMIT
+ && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
{
- int old_size = pc->jit_stack_size;
- int new_size = pc->jit_stack_size = old_size * 2;
+ PCRE2_SIZE old_size = pc->jit_stack_size;
+ PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
+
if (pc->jit_stack)
- pcre_jit_stack_free (pc->jit_stack);
- pc->jit_stack = pcre_jit_stack_alloc (old_size, new_size);
- if (!pc->jit_stack)
+ pcre2_jit_stack_free (pc->jit_stack);
+ pc->jit_stack = pcre2_jit_stack_create (old_size, new_size, NULL);
+
+ if (!pc->mcontext)
+ pc->mcontext = pcre2_match_context_create (NULL);
+
+ if (!pc->jit_stack || !pc->mcontext)
die (EXIT_TROUBLE, 0,
_("failed to allocate memory for the PCRE JIT stack"));
- pcre_assign_jit_stack (pc->extra, NULL, pc->jit_stack);
+ pcre2_jit_stack_assign (pc->mcontext, NULL, pc->jit_stack);
continue;
}
-#endif
-
-#if PCRE_EXTRA_MATCH_LIMIT_RECURSION
- if (e == PCRE_ERROR_RECURSIONLIMIT
- && (PCRE_STUDY_EXTRA_NEEDED || pc->extra))
+ if (e == PCRE2_ERROR_DEPTHLIMIT)
{
- unsigned long lim
- = (pc->extra->flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION
- ? pc->extra->match_limit_recursion
- : 0);
- if (lim <= ULONG_MAX / 2)
- {
- pc->extra->match_limit_recursion = lim ? 2 * lim : (1 << 24) - 1;
- pc->extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION;
- continue;
- }
- }
-#endif
+ uint32_t lim;
+ pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
+ if (lim >= UINT32_MAX / 2)
+ return e;
+
+ lim <<= 1;
+ if (!pc->mcontext)
+ pc->mcontext = pcre2_match_context_create (NULL);
+ pcre2_set_depth_limit (pc->mcontext, lim);
+ continue;
+ }
return e;
}
}
@@ -118,27 +104,35 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
void *
Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
{
- int e;
- char const *ep;
+ PCRE2_SIZE e;
+ int ec;
+ PCRE2_UCHAR8 ep[128]; /* 120 code units is suggested to avoid truncation */
static char const wprefix[] = "(?<!\\w)(?:";
static char const wsuffix[] = ")(?!\\w)";
static char const xprefix[] = "^(?:";
static char const xsuffix[] = ")$";
int fix_len_max = MAX (sizeof wprefix - 1 + sizeof wsuffix - 1,
sizeof xprefix - 1 + sizeof xsuffix - 1);
- char *re = xnmalloc (4, size + (fix_len_max + 4 - 1) / 4);
- int flags = PCRE_DOLLAR_ENDONLY | (match_icase ? PCRE_CASELESS : 0);
+ unsigned char *re = xmalloc (size + fix_len_max + 1);
+ int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
char *patlim = pattern + size;
- char *n = re;
- char const *p;
- char const *pnul;
+ char *n = (char *)re;
struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
+ pcre2_compile_context *ccontext = pcre2_compile_context_create(NULL);
if (localeinfo.multibyte)
{
if (! localeinfo.using_utf8)
die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
- flags |= PCRE_UTF8;
+ flags |= PCRE2_UTF;
+#if 0
+ /* do not match individual code units but only UTF-8 */
+ flags |= PCRE2_NEVER_BACKSLASH_C;
+#endif
+#ifdef PCRE2_MATCH_INVALID_UTF
+ /* consider invalid UTF-8 as a barrier, instead of error */
+ flags |= PCRE2_MATCH_INVALID_UTF;
+#endif
}
/* FIXME: Remove this restriction. */
@@ -151,56 +145,42 @@ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
if (match_lines)
strcpy (n, xprefix);
n += strlen (n);
-
- /* The PCRE interface doesn't allow NUL bytes in the pattern, so
- replace each NUL byte in the pattern with the four characters
- "\000", removing a preceding backslash if there are an odd
- number of backslashes before the NUL. */
- *patlim = '\0';
- for (p = pattern; (pnul = p + strlen (p)) < patlim; p = pnul + 1)
+ memcpy (n, pattern, size);
+ n += size;
+ if (match_words && !match_lines)
{
- memcpy (n, p, pnul - p);
- n += pnul - p;
- for (p = pnul; pattern < p && p[-1] == '\\'; p--)
- continue;
- n -= (pnul - p) & 1;
- strcpy (n, "\\000");
- n += 4;
- }
- memcpy (n, p, patlim - p + 1);
- n += patlim - p;
- *patlim = '\n';
-
- if (match_words)
strcpy (n, wsuffix);
+ n += strlen(wsuffix);
+ }
if (match_lines)
+ {
strcpy (n, xsuffix);
+ n += strlen(xsuffix);
+ }
- pc->cre = pcre_compile (re, flags, &ep, &e, pcre_maketables ());
+ pcre2_set_character_tables (ccontext, pcre2_maketables (NULL));
+ pc->cre = pcre2_compile (re, n - (char *)re, flags, &ec, &e, ccontext);
if (!pc->cre)
- die (EXIT_TROUBLE, 0, "%s", ep);
-
- int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED | PCRE_STUDY_JIT_COMPILE;
- pc->extra = pcre_study (pc->cre, pcre_study_flags, &ep);
- if (ep)
- die (EXIT_TROUBLE, 0, "%s", ep);
+ {
+ pcre2_get_error_message (ec, ep, sizeof (ep));
+ die (EXIT_TROUBLE, 0, "%s", ep);
+ }
-#if PCRE_STUDY_JIT_COMPILE
- if (pcre_fullinfo (pc->cre, pc->extra, PCRE_INFO_JIT, &e))
- die (EXIT_TROUBLE, 0, _("internal error (should never happen)"));
+ pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
- /* The PCRE documentation says that a 32 KiB stack is the default. */
- if (e)
- pc->jit_stack_size = 32 << 10;
-#endif
+ ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
+ if (ec && ec != PCRE2_ERROR_JIT_BADOPTION && ec != PCRE2_ERROR_NOMEMORY)
+ die (EXIT_TROUBLE, 0, _("JIT internal error: %d"), ec);
+ else
+ {
+ /* The PCRE documentation says that a 32 KiB stack is the default. */
+ pc->jit_stack_size = 32 << 10;
+ }
free (re);
- int sub[NSUB];
- pc->empty_match[false] = pcre_exec (pc->cre, pc->extra, "", 0, 0,
- PCRE_NOTBOL, sub, NSUB);
- pc->empty_match[true] = pcre_exec (pc->cre, pc->extra, "", 0, 0, 0, sub,
- NSUB);
+ pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
+ pc->empty_match[true] = jit_exec (pc, "", 0, 0, 0);
return pc;
}
@@ -209,15 +189,15 @@ size_t
Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
char const *start_ptr)
{
- int sub[NSUB];
char const *p = start_ptr ? start_ptr : buf;
bool bol = p[-1] == eolbyte;
char const *line_start = buf;
- int e = PCRE_ERROR_NOMATCH;
+ int e = PCRE2_ERROR_NOMATCH;
char const *line_end;
struct pcre_comp *pc = vcp;
+ PCRE2_SIZE *sub = pcre2_get_ovector_pointer (pc->data);
- /* The search address to pass to pcre_exec. This is the start of
+ /* The search address to pass to PCRE. This is the start of
the buffer, or just past the most-recently discovered encoding
error or line end. */
char const *subject = buf;
@@ -229,14 +209,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
better and the correctness issues were too puzzling. See
Bug#22655. */
line_end = rawmemchr (p, eolbyte);
- if (INT_MAX < line_end - p)
+ if (PCRE2_SIZE_MAX < line_end - p)
die (EXIT_TROUBLE, 0, _("exceeded PCRE's line length limit"));
for (;;)
{
/* Skip past bytes that are easily determined to be encoding
errors, treating them as data that cannot match. This is
- faster than having pcre_exec check them. */
+ faster than having PCRE check them. */
while (localeinfo.sbclen[to_uchar (*p)] == -1)
{
p++;
@@ -244,10 +224,10 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
bol = false;
}
- int search_offset = p - subject;
+ PCRE2_SIZE search_offset = p - subject;
/* Check for an empty match; this is faster than letting
- pcre_exec do it. */
+ PCRE do it. */
if (p == line_end)
{
sub[0] = sub[1] = search_offset;
@@ -257,13 +237,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
int options = 0;
if (!bol)
- options |= PCRE_NOTBOL;
+ options |= PCRE2_NOTBOL;
- e = jit_exec (pc, subject, line_end - subject, search_offset,
- options, sub);
- if (e != PCRE_ERROR_BADUTF8)
+ e = jit_exec (pc, subject, line_end - subject,
+ search_offset, options);
+ /* PCRE2 provides 22 different error codes for bad UTF-8 */
+ if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))
break;
- int valid_bytes = sub[0];
+ PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
if (search_offset <= valid_bytes)
{
@@ -273,14 +254,15 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
/* Handle the empty-match case specially, for speed.
This optimization is valid if VALID_BYTES is zero,
which means SEARCH_OFFSET is also zero. */
+ sub[0] = valid_bytes;
sub[1] = 0;
e = pc->empty_match[bol];
}
else
e = jit_exec (pc, subject, valid_bytes, search_offset,
- options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
+ options | PCRE2_NO_UTF_CHECK | PCRE2_NOTEOL);
- if (e != PCRE_ERROR_NOMATCH)
+ if (e != PCRE2_ERROR_NOMATCH)
break;
/* Treat the encoding error as data that cannot match. */
@@ -291,7 +273,7 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
subject += valid_bytes + 1;
}
- if (e != PCRE_ERROR_NOMATCH)
+ if (e != PCRE2_ERROR_NOMATCH)
break;
bol = true;
p = subject = line_start = line_end + 1;
@@ -302,26 +284,35 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
{
switch (e)
{
- case PCRE_ERROR_NOMATCH:
+ case PCRE2_ERROR_NOMATCH:
break;
- case PCRE_ERROR_NOMEMORY:
+ case PCRE2_ERROR_NOMEMORY:
die (EXIT_TROUBLE, 0, _("%s: memory exhausted"), input_filename ());
-#if PCRE_STUDY_JIT_COMPILE
- case PCRE_ERROR_JIT_STACKLIMIT:
+ case PCRE2_ERROR_JIT_STACKLIMIT:
die (EXIT_TROUBLE, 0, _("%s: exhausted PCRE JIT stack"),
input_filename ());
-#endif
- case PCRE_ERROR_MATCHLIMIT:
+ case PCRE2_ERROR_MATCHLIMIT:
die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's backtracking limit"),
input_filename ());
- case PCRE_ERROR_RECURSIONLIMIT:
- die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's recursion limit"),
+ case PCRE2_ERROR_DEPTHLIMIT:
+ die (EXIT_TROUBLE, 0,
+ _("%s: exceeded PCRE's nested backtracking limit"),
input_filename ());
+ case PCRE2_ERROR_RECURSELOOP:
+ die (EXIT_TROUBLE, 0, _("%s: PCRE detected recurse loop"),
+ input_filename ());
+
+#ifdef PCRE2_ERROR_HEAPLIMIT
+ case PCRE2_ERROR_HEAPLIMIT:
+ die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's heap limit"),
+ input_filename ());
+#endif
+
default:
/* For now, we lump all remaining PCRE failures into this basket.
If anyone cares to provide sample grep usage that can trigger
diff --git a/tests/filename-lineno.pl b/tests/filename-lineno.pl
index 1e84b45..1ff3d6a 100755
--- a/tests/filename-lineno.pl
+++ b/tests/filename-lineno.pl
@@ -101,13 +101,13 @@ my @Tests =
],
['invalid-re-P-paren', '-P ")"', {EXIT=>2},
{ERR => $ENV{PCRE_WORKS} == 1
- ? "$prog: unmatched parentheses\n"
+ ? "$prog: unmatched closing parenthesis\n"
: $no_pcre
},
],
['invalid-re-P-star-paren', '-P "a.*)"', {EXIT=>2},
{ERR => $ENV{PCRE_WORKS} == 1
- ? "$prog: unmatched parentheses\n"
+ ? "$prog: unmatched closing parenthesis\n"
: $no_pcre
},
],
--
1.8.3.1


@ -1,50 +0,0 @@
From af79b17356f2edeca2908c14d922a24f659d4a96 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 20 Nov 2021 22:53:55 -0800
Subject: [PATCH] =?UTF-8?q?grep:=20-s=20does=20not=20suppress=20?=
=?UTF-8?q?=E2=80=9Cbinary=20file=20matches=E2=80=9D?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* src/grep.c (grep): Implement this.
* tests/binary-file-matches: Add regression test.
---
src/grep.c | 2 +-
tests/binary-file-matches | 8 +++++---
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/src/grep.c b/src/grep.c
index a55194c..19dff43 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1646,7 +1646,7 @@ grep (int fd, struct stat const *st, bool *ineof)
finish_grep:
done_on_match = done_on_match_0;
out_quiet = out_quiet_0;
- if (binary_files == BINARY_BINARY_FILES && ! (out_quiet | suppress_errors)
+ if (binary_files == BINARY_BINARY_FILES && !out_quiet
&& (encoding_error_output
|| (0 <= nlines_first_null && nlines_first_null < nlines)))
error (0, 0, _("%s: binary file matches"), input_filename ());
diff --git a/tests/binary-file-matches b/tests/binary-file-matches
index 7fc4a11..8fea071 100755
--- a/tests/binary-file-matches
+++ b/tests/binary-file-matches
@@ -14,8 +14,10 @@ fail=0
echo "grep: (standard input): binary file matches" > exp \
|| framework_failure_
-printf 'a\0' | grep a > out 2> err || fail=1
-compare /dev/null out || fail=1
-compare exp err || fail=1
+for option in '' -s; do
+ printf 'a\0' | grep $option a > out 2> err || fail=1
+ compare /dev/null out || fail=1
+ compare exp err || fail=1
+done
Exit $fail
--
1.8.3.1


@ -1,44 +0,0 @@
From 5e3d207d5b7dba28ca248475188a029570766bc1 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 30 May 2022 17:03:26 -0700
Subject: [PATCH] grep: sanity-check GREP_COLOR
This patch closes a longstanding security issue with GREP_COLOR that I
just noticed, where if the attacker has control over GREP_COLOR's
settings the attacker can trash the victim's terminal or have 'grep'
generate misleading output. For example, without the patch
the shell command:
GREP_COLOR="$(printf '31m\33[2J\33[31')" grep --color=always PATTERN
mucks with the screen, leaving behind only the trailing part of
the last matching line. With the patch, this GREP_COLOR is ignored.
* src/grep.c (main): Sanity-check GREP_COLOR contents the same way
GREP_COLORS values are checked, to not trash the user's terminal.
This follows up the recent fix to Bug#55641.
Reference:https://git.savannah.gnu.org/cgit/grep.git/commit?id=5e3d207d5b7dba28ca248475188a029570766bc1
Conflict:delete NEWS
---
src/grep.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/grep.c b/src/grep.c
index edefac6..59d3431 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2911,7 +2911,12 @@ main (int argc, char **argv)
/* Legacy. */
char *userval = getenv ("GREP_COLOR");
if (userval != NULL && *userval != '\0')
- selected_match_color = context_match_color = userval;
+ for (char *q = userval; *q == ';' || c_isdigit (*q); q++)
+ if (!q[1])
+ {
+ selected_match_color = context_match_color = userval;
+ break;
+ }
/* New GREP_COLORS has priority. */
parse_grep_colors ();
--
2.27.0
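
The check this patch adds to main can be read as a small predicate: accept GREP_COLOR only if it is a nonempty run of digits and semicolons. The sketch below restates that rule in isolation. The helper name sgr_params_ok is hypothetical and is not part of grep, and it uses a plain ASCII range test where the patch calls gnulib's c_isdigit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep: return true if S is a
   nonempty string consisting only of ASCII digits and ';', i.e. a
   plausible SGR parameter list such as "01;31".  */
static bool
sgr_params_ok (char const *s)
{
  assert (s != NULL);
  if (*s == '\0')
    return false;
  for (char const *q = s; *q != '\0'; q++)
    if (! (*q == ';' || ('0' <= *q && *q <= '9')))
      return false;
  return true;
}
```

Under this rule the value from the commit message, which embeds an escape character and the letter 'm', is rejected at its third byte, so the legacy variable is ignored rather than written to the terminal.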


@ -1,51 +0,0 @@
From 6e1450408a7921771c41973761995e06445ba18b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 13 Nov 2021 13:52:23 -0800
Subject: [PATCH] grep: speed up, fix bad-UTF8 check with -P
* src/pcresearch.c (bad_utf8_from_pcre2): New function. Fix bug
where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error.
Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
(Pexecute): Use it.
---
src/pcresearch.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 286e1dc..953aca2 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -104,6 +104,18 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
}
}
+/* Return true if E is an error code for bad UTF-8, and if pcre2_match
+ could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF. */
+static bool
+bad_utf8_from_pcre2 (int e)
+{
+#ifdef PCRE2_MATCH_INVALID_UTF
+ return false;
+#else
+ return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
+#endif
+}
+
/* Compile the -P style PATTERN, containing SIZE bytes that are
followed by '\n'. Return a description of the compiled pattern. */
@@ -248,9 +260,9 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
e = jit_exec (pc, subject, line_end - subject,
search_offset, options);
- /* PCRE2 provides 22 different error codes for bad UTF-8 */
- if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))
+ if (!bad_utf8_from_pcre2 (e))
break;
+
PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
if (search_offset <= valid_bytes)
--
1.8.3.1
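
The bug fixed here is an off-by-one at the upper end of a range test: the old condition used a strict comparison and so excluded PCRE2_ERROR_UTF8_ERR1 itself. The sketch below shows the two conditions side by side. The constant values are an assumption (they mirror what pcre2.h defines, -3 and -23), and the function names old_check and new_check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror PCRE2_ERROR_UTF8_ERR1 and PCRE2_ERROR_UTF8_ERR21
   from pcre2.h; redefined locally so the sketch needs no PCRE2.  */
enum { UTF8_ERR1 = -3, UTF8_ERR21 = -23 };

/* The codes are negative, so ERR21 is the low end of the range.  */
static_assert (UTF8_ERR21 < UTF8_ERR1, "range must be ordered");

/* Condition before the patch: the upper bound is exclusive.  */
static bool
old_check (int e)
{
  return UTF8_ERR21 <= e && e < UTF8_ERR1;
}

/* Condition after the patch: both bounds are inclusive.  */
static bool
new_check (int e)
{
  return UTF8_ERR21 <= e && e <= UTF8_ERR1;
}
```

The two functions differ only for e equal to UTF8_ERR1, which is exactly the case the commit message describes.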


@ -1,35 +0,0 @@
From b3a85a1a8a816f4f6f9c01399c16efe92a86ca06 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 9 Nov 2021 10:11:42 -0800
Subject: [PATCH] grep: work around PCRE bug
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
* src/pcresearch.c (jit_exec): Don't attempt to grow the JIT stack
over INT_MAX - 8 * 1024.
---
src/pcresearch.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 3bdaee9..09f92c8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -72,8 +72,11 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
search_offset, options, sub, NSUB);
#if PCRE_STUDY_JIT_COMPILE
+ /* Going over this would trigger an int overflow bug within PCRE. */
+ int jitstack_max = INT_MAX - 8 * 1024;
+
if (e == PCRE_ERROR_JIT_STACKLIMIT
- && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
+ && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
{
int old_size = pc->jit_stack_size;
int new_size = pc->jit_stack_size = old_size * 2;
--
1.8.3.1
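
The guard in this patch can be sketched as a standalone doubling step: grow the stack size only while the current size is positive and at most half of the cap, so the doubled value never exceeds INT_MAX - 8 * 1024. The helper name grow_jit_stack_size is hypothetical; in grep the logic sits inline in jit_exec.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep: double *SIZE if the result
   stays within the cap used by the patch.  Return true if *SIZE was
   changed, false if it was left alone.  */
static bool
grow_jit_stack_size (int *size)
{
  assert (size != NULL);
  int jitstack_max = INT_MAX - 8 * 1024;
  if (! (0 < *size && *size <= jitstack_max / 2))
    return false;
  *size *= 2;   /* cannot overflow: *size <= jitstack_max / 2 */
  return true;
}
```

Starting from the 32 KiB default mentioned in the migration patch, the size doubles on each call until the next doubling would pass the cap.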


@ -1,182 +0,0 @@
From e4a71086bf8143ae083f4e97d8226f30c7e1a079 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 17 May 2022 13:47:44 -0700
Subject: [PATCH] =?UTF-8?q?tests:=20improve=20tests=20of=20=E2=80=98.?=
=?UTF-8?q?=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* tests/hangul-syllable: Test some encoding errors too.
Reference:https://git.savannah.gnu.org/cgit/grep.git/commit?id=e4a71086bf8143ae083f4e97d8226f30c7e1a079
Conflict:NA
---
tests/hangul-syllable | 89 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 88 insertions(+), 1 deletion(-)
diff --git a/tests/hangul-syllable b/tests/hangul-syllable
index 9f94d2e..fce5c2c 100755
--- a/tests/hangul-syllable
+++ b/tests/hangul-syllable
@@ -12,6 +12,7 @@ require_en_utf8_locale_
LC_ALL=en_US.UTF-8
export LC_ALL
+# Check that '.' completely matches $1, i.e., that $1 is a single UTF-8 char.
check_char ()
{
printf "$1\\n" >in || framewmork_failure_
@@ -20,27 +21,52 @@ check_char ()
cmp in out || fail=1
}
+# Check that '.*' does not completely match $1, i.e., that
+# $1 contains an encoding error.
+check_nonchar ()
+{
+ printf "$1\\n" >in || framework_failure_
+
+ grep -a -v '^.*$' in >out || fail=1
+ cmp in out || fail=1
+}
+
fail=0
# "." should match U+D45C HANGUL SYLLABLE PYO.
check_char '\355\221\234'
-# Check boundary-condition characters
+# Check boundary-condition characters, and non-characters,
# while we are at it.
check_char '\0' -a
check_char '\177'
+check_nonchar '\200'
+check_nonchar '\277'
+check_nonchar '\300\200'
+check_nonchar '\301\277'
for i in 302 337; do
for j in 200 277; do
check_char "\\$i\\$j"
done
+ for j in 177 300; do
+ check_nonchar "\\$i\\$j"
+ done
done
for i in 340; do
for j in 240 277; do
for k in 200 277; do
check_char "\\$i\\$j\\$k"
done
+ for k in 177 300; do
+ check_nonchar "\\$i\\$j\\$k"
+ done
+ done
+ for j in 239 300; do
+ for k in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k"
+ done
done
done
for i in 341 354 356 357; do
@@ -48,6 +74,14 @@ for i in 341 354 356 357; do
for k in 200 277; do
check_char "\\$i\\$j\\$k"
done
+ for k in 177 300; do
+ check_nonchar "\\$i\\$j\\$k"
+ done
+ done
+ for j in 177 300; do
+ for k in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k"
+ done
done
done
for i in 355; do
@@ -55,6 +89,14 @@ for i in 355; do
for k in 200 277; do
check_char "\\$i\\$j\\$k"
done
+ for k in 177 300; do
+ check_nonchar "\\$i\\$j\\$k"
+ done
+ done
+ for j in 177 240; do
+ for k in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k"
+ done
done
done
for i in 360; do
@@ -63,6 +105,21 @@ for i in 360; do
for l in 200 277; do
check_char "\\$i\\$j\\$k\\$l"
done
+ for l in 177 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
+ done
+ for k in 177 300; do
+ for l in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
+ done
+ done
+ for j in 217 300; do
+ for k in 177 200 277 300; do
+ for l in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
done
done
done
@@ -72,6 +129,21 @@ for i in 361 363; do
for l in 200 277; do
check_char "\\$i\\$j\\$k\\$l"
done
+ for l in 177 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
+ done
+ for k in 177 300; do
+ for l in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
+ done
+ done
+ for j in 177 300; do
+ for k in 177 200 277 300; do
+ for l in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
done
done
done
@@ -81,6 +153,21 @@ for i in 364; do
for l in 200 277; do
check_char "\\$i\\$j\\$k\\$l"
done
+ for l in 177 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
+ done
+ for k in 177 300; do
+ for l in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
+ done
+ done
+ for j in 177 220; do
+ for k in 177 200 277 300; do
+ for l in 177 200 277 300; do
+ check_nonchar "\\$i\\$j\\$k\\$l"
+ done
done
done
done
--
2.27.0
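
The boundary bytes exercised by check_char and check_nonchar follow the well-formed UTF-8 byte ranges from the Unicode Standard. As a reading aid, the sketch below decides whether a byte string is exactly one well-formed UTF-8 character. The function is_single_utf8_char is hypothetical and independent of grep's own matcher; it only illustrates why, for example, \355\240\200 (a surrogate) and \364\220\200\200 (beyond U+10FFFF) are expected to be non-characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep: return true if the N bytes
   at S form exactly one well-formed UTF-8 character.  */
static bool
is_single_utf8_char (unsigned char const *s, size_t n)
{
  assert (s != NULL);
  if (n == 0)
    return false;

  unsigned char b0 = s[0];
  if (b0 <= 0x7F)
    return n == 1;

  size_t len;
  /* Allowed range for the second byte; narrowed for some lead bytes.  */
  unsigned char lo = 0x80, hi = 0xBF;
  if (0xC2 <= b0 && b0 <= 0xDF)
    len = 2;
  else if (0xE0 <= b0 && b0 <= 0xEF)
    {
      len = 3;
      if (b0 == 0xE0)
        lo = 0xA0;              /* reject overlong forms */
      if (b0 == 0xED)
        hi = 0x9F;              /* reject surrogates */
    }
  else if (0xF0 <= b0 && b0 <= 0xF4)
    {
      len = 4;
      if (b0 == 0xF0)
        lo = 0x90;              /* reject overlong forms */
      if (b0 == 0xF4)
        hi = 0x8F;              /* reject values above U+10FFFF */
    }
  else
    return false;               /* 0x80..0xC1 and 0xF5..0xFF */

  if (n != len || s[1] < lo || hi < s[1])
    return false;
  for (size_t i = 2; i < len; i++)
    if (s[i] < 0x80 || 0xBF < s[i])
      return false;
  return true;
}
```

The octal escapes used in the test script can be passed directly as C string literals, cast to unsigned char const *, together with the byte count.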

fix-grep-m2-pattern.patch Normal file

@ -0,0 +1,62 @@
From b9a8047099d2388c15e6ad39e7b8c91c6633096c Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 9 Feb 2024 01:06:49 -0800
Subject: =?UTF-8?q?grep:=20fix=20=E2=80=98grep=20-m2=20pattern=20<file=20>?=
=?UTF-8?q?/dev/null=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem reported by Grisha Levit <https://bugs.gnu.org/68989>.
* src/grep.c (grep, main): Don't set done_on_match if -m is used.
* tests/max-count-overread: Add a test case.
---
src/grep.c | 9 +++++++--
tests/max-count-overread | 6 ++++++
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/grep.c b/src/grep.c
index dab3be7..1256dfd 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1558,7 +1558,11 @@ grep (int fd, struct stat const *st, bool *ineof)
if (binary_files == WITHOUT_MATCH_BINARY_FILES)
return 0;
if (!count_matches)
- done_on_match = out_quiet = true;
+ {
+ out_quiet = true;
+ if (max_count == INTMAX_MAX)
+ done_on_match = true;
+ }
nlines_first_null = nlines;
nul_zapper = eol;
skip_nuls = skip_empty_lines;
@@ -2897,7 +2901,8 @@ main (int argc, char **argv)
if ((exit_on_match | dev_null_output) || list_files != LISTFILES_NONE)
{
count_matches = false;
- done_on_match = true;
+ if (max_count == INTMAX_MAX)
+ done_on_match = true;
}
out_quiet = count_matches | done_on_match;
diff --git a/tests/max-count-overread b/tests/max-count-overread
index 23c45cb..f829cc5 100755
--- a/tests/max-count-overread
+++ b/tests/max-count-overread
@@ -12,4 +12,10 @@ echo x > exp || framework_failure_
yes x | timeout 10 grep -m1 x > out || fail=1
compare exp out || fail=1
+# Make sure -m2 stops reading even when output is /dev/null.
+# In grep 3.11, it would continue reading.
+printf 'x\nx\nx\n' >in || framework_failure_
+(grep -m2 x >/dev/null && head -n1) <in >out || fail=1
+compare exp out || fail=1
+
Exit $fail
--
cgit v1.1
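
The rule this patch introduces can be stated as a single predicate: grep may stop at the first match only when the output is not needed and no -m limit was given. The sketch below assumes, as the patch's comparisons suggest, that max_count holds INTMAX_MAX when -m is absent. The function name may_stop_at_first_match is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of grep.  OUTPUT_UNNEEDED stands for
   the cases where matched lines are not printed (for example output
   going to /dev/null, -q, or -l).  MAX_COUNT is the -m limit, assumed
   to be INTMAX_MAX when -m was not given.  */
static bool
may_stop_at_first_match (bool output_unneeded, intmax_t max_count)
{
  assert (0 <= max_count);
  return output_unneeded && max_count == INTMAX_MAX;
}
```

With -m2 and output to /dev/null the predicate is false, so grep keeps counting matches and stops reading after the second one, which is what the new test case checks with the trailing head -n1.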

grep-3.11.tar.xz Normal file

Binary file not shown.

Binary file not shown.


@ -1,30 +1,19 @@
 Name: grep
-Version: 3.7
-Release: 8
+Version: 3.11
+Release: 4
 Summary: A string search utility
 License: GPLv3+
-URL: http://www.gnu.org/software/grep/
+URL: https://www.gnu.org/software/grep
 Source0: https://ftp.gnu.org/gnu/grep/grep-%{version}.tar.xz
 Source1: color_grep.sh
 Source2: colorgrep.csh
 Source3: grepconf.sh
-Patch1: backport-grep-avoid-sticky-problem-with-f-f.patch
-Patch2: backport-grep-s-does-not-suppress-binary-file-matches.patch
-Patch3: backport-grep-work-around-PCRE-bug.patch
-Patch4: backport-grep-migrate-to-pcre2.patch
-Patch5: backport-grep-Don-t-limit-jitstack_max-to-INT_MAX.patch
-Patch6: backport-grep-speed-up-fix-bad-UTF8-check-with-P.patch
-Patch7: backport-grep-fix-minor-P-memory-leak.patch
-Patch8: backport-grep-djb2-correction.patch
-Patch9: backport-build-update-gnulib-submodule-to-latest.patch
-Patch10: backport-grep-fix-bug-with-and-some-Hangul-Syllables.patch
-Patch11: backport-tests-improve-tests-of.patch
-Patch12: backport-grep-sanity-check-GREP_COLOR.patch
-Patch13: backport-grep-fix-regex-compilation-memory-leaks.patch
-Patch14: backport-grep-bug-backref-in-last-of-multiple-patter.patch
+Patch0001: fix-grep-m2-pattern.patch
+Patch0002: backport-Fix-troff-typos-found-by-mandoc-and-groff.patch
+Patch0003: backport-Fix-recognition-of-cs_CZ.UTF-8-locale-on-FreeBSD.patch
-BuildRequires: gcc pcre2-devel texinfo gettext libsigsegv-devel automake
+BuildRequires: gcc pcre2-devel texinfo gettext automake
 Provides: /bin/egrep /bin/fgrep /bin/grep bundled(gnulib)
 %description
@ -47,27 +36,38 @@ mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/profile.d
 install -pm 644 %{SOURCE1} %{SOURCE2} $RPM_BUILD_ROOT%{_sysconfdir}/profile.d
 install -Dpm 755 %{SOURCE3} $RPM_BUILD_ROOT%{_libexecdir}/grepconf.sh
-%pre
-%preun
-%post
-%postun
+%find_lang %{name}
 %check
-make check
+%make_build check
-%files
-%{_datadir}/locale/*
+%files -f %{name}.lang
 %config(noreplace) %{_sysconfdir}/profile.d/color_grep.sh
 %config(noreplace) %{_sysconfdir}/profile.d/colorgrep.csh
-%doc NEWS README THANKS TODO
-%license COPYING AUTHORS
+%doc NEWS README THANKS TODO AUTHORS
+%license COPYING
 %{_bindir}/*grep
 %{_libexecdir}/grepconf.sh
-%{_infodir}/grep.info.gz
-%{_mandir}/man1/*grep.1.gz
+%{_infodir}/grep.info*
+%{_mandir}/man1/*grep.1*
 %changelog
+* Mon Aug 26 2024 guojunding <guojunding@kylinos.cn> - 3.11-4
+- fix troff typos
+- fix recognition of cs_CZ.UTF-8 locale on FreeBSD
+* Tue Jul 30 2024 Funda Wang <fundawang@yeah.net> - 3.11-3
+- fix lang file declaration
+* Mon Jun 03 2024 wangziliang <wangziliang@kylinos.cn> - 3.11-2
+- fix grep -m2 pattern bug
+* Fri Jul 14 2023 dillon chen <dillon.chen@gmail.com> - 3.11-1
+- update version to 3.11
+* Thu Jan 19 2023 gaoruoshu <gaoruoshu@huawei.com> - 3.8-1
+- update version to 3.8
 * Mon Dec 26 2022 gaoruoshu <gaoruoshu@huawei.com> - 3.7-8
 - backport patch from upstream