sync bugfix patches for pcre2

panxiaohe 2022-06-28 17:37:08 +08:00
parent 2e8230ce33
commit 1da37c81e6
6 changed files with 216 additions and 601 deletions

@@ -0,0 +1,40 @@
From 6f84f3be1cdd3aadacc42007582116d1c2c0a3e4 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 12 Nov 2021 21:30:25 -0800
Subject: [PATCH] =?UTF-8?q?grep:=20Don=E2=80=99t=20limit=20jitstack=5Fmax?=
=?UTF-8?q?=20to=20INT=5FMAX?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* src/pcresearch.c (jit_exec): Remove arbitrary INT_MAX limit on JIT
stack size.
---
src/pcresearch.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index daa0c42..bf966f8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -59,10 +59,16 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
{
while (true)
{
+ /* STACK_GROWTH_RATE is taken from PCRE's src/pcre2_jit_compile.c.
+ Going over the jitstack_max limit could trigger an int
+ overflow bug within PCRE. */
+ int STACK_GROWTH_RATE = 8192;
+ size_t jitstack_max = SIZE_MAX - (STACK_GROWTH_RATE - 1);
+
int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
search_offset, options, pc->data, pc->mcontext);
if (e == PCRE2_ERROR_JIT_STACKLIMIT
- && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
+ && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
{
PCRE2_SIZE old_size = pc->jit_stack_size;
PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
--
1.8.3.1
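
Note on the guard in the patch above: what follows is a minimal sketch, not grep's actual code, of the doubling check that jit_exec performs before growing the JIT stack. The helper name grow_jit_stack_size is hypothetical; the 8192 constant and the jitstack_max formula are copied from the patch. The real pcre2_jit_stack_create call is left out so the sketch builds without PCRE2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value the patch says is taken from PCRE's src/pcre2_jit_compile.c.  */
enum { STACK_GROWTH_RATE = 8192 };

/* Hypothetical helper: double *SIZE when the patch's guard allows it.
   Return true if *SIZE was doubled, false if it was left unchanged.  */
bool
grow_jit_stack_size (size_t *size)
{
  assert (size != NULL);

  /* Same limit as the patch: leave room for one growth step below
     SIZE_MAX, so the doubled size stays at or under jitstack_max.  */
  size_t jitstack_max = SIZE_MAX - (STACK_GROWTH_RATE - 1);

  /* Same condition as the patch: a zero size means no JIT stack is in
     use, and anything above jitstack_max / 2 must not be doubled.  */
  if (! (0 < *size && *size <= jitstack_max / 2))
    return false;

  *size *= 2;
  return true;
}
```

Starting from the 32 KiB default that Pcompile sets, each PCRE2_ERROR_JIT_STACKLIMIT result doubles the size until the next doubling would exceed jitstack_max; at that point the condition fails and jit_exec returns the error to its caller instead of retrying.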

@@ -0,0 +1,26 @@
From ad6e5cbcf598f55cafe83a11487ea4a6694e433b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 14 Nov 2021 10:54:12 -0800
Subject: [PATCH] grep: fix minor -P memory leak
* src/pcresearch.c (Pcompile): Free ccontext when no longer needed.
---
src/pcresearch.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index badcd4c..c287d99 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -184,6 +184,8 @@ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
die (EXIT_TROUBLE, 0, "%s", ep);
}
+ pcre2_compile_context_free (ccontext);
+
pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
--
1.8.3.1

@@ -1,7 +1,7 @@
From e0d39a9133e1507345d73ac5aff85f037f39aa54 Mon Sep 17 00:00:00 2001 From e0d39a9133e1507345d73ac5aff85f037f39aa54 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas@gmail.com>
Date: Fri, 12 Nov 2021 16:45:04 -0800 Date: Fri, 12 Nov 2021 16:45:04 -0800
Subject: grep: migrate to pcre2 Subject: [PATCH] grep: migrate to pcre2
Mostly a bug by bug translation of the original code to the PCRE2 API. Mostly a bug by bug translation of the original code to the PCRE2 API.
Code still could do with some optimizations but should be good as a Code still could do with some optimizations but should be good as a
@@ -23,566 +23,15 @@ Performance seems equivalent, and it also seems functionally complete.
Use PCRE2, not the original PCRE. Use PCRE2, not the original PCRE.
* tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics. * tests/filename-lineno.pl: Adjust to match PCRE2 diagnostics.
--- ---
0001-grep-migrate-to-pcre2.patch | 543 +++++++++++++++++++++++++++++++ doc/grep.in.1 | 8 +-
doc/grep.in.1 | 8 +- doc/grep.texi | 2 +-
doc/grep.texi | 2 +- m4/pcre.m4 | 21 ++--
m4/pcre.m4 | 21 +- src/pcresearch.c | 249 +++++++++++++++++++++++------------------------
src/pcresearch.c | 244 +++++++------- tests/filename-lineno.pl | 4 +-
tests/filename-lineno.pl | 4 +- 5 files changed, 138 insertions(+), 146 deletions(-)
6 files changed, 681 insertions(+), 141 deletions(-)
create mode 100644 0001-grep-migrate-to-pcre2.patch
diff --git a/0001-grep-migrate-to-pcre2.patch b/0001-grep-migrate-to-pcre2.patch
new file mode 100644
index 0000000..8375f30
--- /dev/null
+++ b/0001-grep-migrate-to-pcre2.patch
@@ -0,0 +1,543 @@
+From 2b4c255e67ae835c18c5ec41f3b67dadfd190213 Mon Sep 17 00:00:00 2001
+From: licihua <licihua@huawei.com>
+Date: Sat, 14 May 2022 18:24:47 +0800
+Subject: [PATCH 1/1] grep: migrate to pcre2
+
+---
+ doc/grep.in.1 | 8 +-
+ doc/grep.texi | 2 +-
+ m4/pcre.m4 | 21 ++--
+ src/pcresearch.c | 244 +++++++++++++++++++--------------------
+ tests/filename-lineno.pl | 4 +-
+ 5 files changed, 138 insertions(+), 141 deletions(-)
+
+diff --git a/doc/grep.in.1 b/doc/grep.in.1
+index e8854f2..0178db1 100644
+--- a/doc/grep.in.1
++++ b/doc/grep.in.1
+@@ -767,7 +767,7 @@ In other implementations, basic regular expressions are less powerful.
+ The following description applies to extended regular expressions;
+ differences for basic regular expressions are summarized afterwards.
+ Perl-compatible regular expressions give additional functionality, and are
+-documented in B<pcresyntax>(3) and B<pcrepattern>(3), but work only if
++documented in B<pcres2yntax>(3) and B<pcre2pattern>(3), but work only if
+ PCRE support is enabled.
+ .PP
+ The fundamental building blocks are the regular expressions
+@@ -1371,9 +1371,9 @@ from the globbing syntax that the shell uses to match file names.
+ .BR sort (1),
+ .BR xargs (1),
+ .BR read (2),
+-.BR pcre (3),
+-.BR pcresyntax (3),
+-.BR pcrepattern (3),
++.BR pcre2 (3),
++.BR pcre2syntax (3),
++.BR pcre2pattern (3),
+ .BR terminfo (5),
+ .BR glob (7),
+ .BR regex (7)
+diff --git a/doc/grep.texi b/doc/grep.texi
+index 01ac81e..aae8571 100644
+--- a/doc/grep.texi
++++ b/doc/grep.texi
+@@ -1186,7 +1186,7 @@ In other implementations, basic regular expressions are less powerful.
+ The following description applies to extended regular expressions;
+ differences for basic regular expressions are summarized afterwards.
+ Perl-compatible regular expressions give additional functionality, and
+-are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
++are documented in the @i{pcre2syntax}(3) and @i{pcre2pattern}(3) manual
+ pages, but work only if PCRE is available in the system.
+
+ @menu
+diff --git a/m4/pcre.m4 b/m4/pcre.m4
+index 78b7fda..0ca510f 100644
+--- a/m4/pcre.m4
++++ b/m4/pcre.m4
+@@ -1,4 +1,4 @@
+-# pcre.m4 - check for libpcre support
++# pcre.m4 - check for PCRE library support
+
+ # Copyright (C) 2010-2021 Free Software Foundation, Inc.
+ # This file is free software; the Free Software Foundation
+@@ -9,7 +9,7 @@ AC_DEFUN([gl_FUNC_PCRE],
+ [
+ AC_ARG_ENABLE([perl-regexp],
+ AS_HELP_STRING([--disable-perl-regexp],
+- [disable perl-regexp (pcre) support]),
++ [disable perl-regexp (pcre2) support]),
+ [case $enableval in
+ yes|no) test_pcre=$enableval;;
+ *) AC_MSG_ERROR([invalid value $enableval for --disable-perl-regexp]);;
+@@ -21,24 +21,25 @@ AC_DEFUN([gl_FUNC_PCRE],
+ use_pcre=no
+
+ if test $test_pcre != no; then
+- PKG_CHECK_MODULES([PCRE], [libpcre], [], [: ${PCRE_LIBS=-lpcre}])
++ PKG_CHECK_MODULES([PCRE], [libpcre2-8], [], [: ${PCRE_LIBS=-lpcre2-8}])
+
+- AC_CACHE_CHECK([for pcre_compile], [pcre_cv_have_pcre_compile],
++ AC_CACHE_CHECK([for pcre2_compile], [pcre_cv_have_pcre2_compile],
+ [pcre_saved_CFLAGS=$CFLAGS
+ pcre_saved_LIBS=$LIBS
+ CFLAGS="$CFLAGS $PCRE_CFLAGS"
+ LIBS="$PCRE_LIBS $LIBS"
+ AC_LINK_IFELSE(
+- [AC_LANG_PROGRAM([[#include <pcre.h>
++ [AC_LANG_PROGRAM([[#define PCRE2_CODE_UNIT_WIDTH 8
++ #include <pcre2.h>
+ ]],
+- [[pcre *p = pcre_compile (0, 0, 0, 0, 0);
++ [[pcre2_code *p = pcre2_compile (0, 0, 0, 0, 0, 0);
+ return !p;]])],
+- [pcre_cv_have_pcre_compile=yes],
+- [pcre_cv_have_pcre_compile=no])
++ [pcre_cv_have_pcre2_compile=yes],
++ [pcre_cv_have_pcre2_compile=no])
+ CFLAGS=$pcre_saved_CFLAGS
+ LIBS=$pcre_saved_LIBS])
+
+- if test "$pcre_cv_have_pcre_compile" = yes; then
++ if test "$pcre_cv_have_pcre2_compile" = yes; then
+ use_pcre=yes
+ elif test $test_pcre = maybe; then
+ AC_MSG_WARN([AC_PACKAGE_NAME will be built without pcre support.])
+@@ -50,7 +51,7 @@ AC_DEFUN([gl_FUNC_PCRE],
+ if test $use_pcre = yes; then
+ AC_DEFINE([HAVE_LIBPCRE], [1],
+ [Define to 1 if you have the Perl Compatible Regular Expressions
+- library (-lpcre).])
++ library (-lpcre2).])
+ else
+ PCRE_CFLAGS=
+ PCRE_LIBS=
+diff --git a/src/pcresearch.c b/src/pcresearch.c
+index 37f7e40..38dc010 100644
+--- a/src/pcresearch.c
++++ b/src/pcresearch.c
+@@ -17,40 +17,32 @@
+ 02110-1301, USA. */
+
+ /* Written August 1992 by Mike Haertel. */
++/* Updated for PCRE2 by Carlo Arenas. */
+
+ #include <config.h>
+ #include "search.h"
+ #include "die.h"
+
+-#include <pcre.h>
++#define PCRE2_CODE_UNIT_WIDTH 8
++#include <pcre2.h>
+
+-/* This must be at least 2; everything after that is for performance
+- in pcre_exec. */
+-enum { NSUB = 300 };
+-
+-#ifndef PCRE_EXTRA_MATCH_LIMIT_RECURSION
+-# define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0
+-#endif
+-#ifndef PCRE_STUDY_JIT_COMPILE
+-# define PCRE_STUDY_JIT_COMPILE 0
+-#endif
+-#ifndef PCRE_STUDY_EXTRA_NEEDED
+-# define PCRE_STUDY_EXTRA_NEEDED 0
++/* Needed for backward compatibility for PCRE2 < 10.30 */
++#ifndef PCRE2_CONFIG_DEPTHLIMIT
++#define PCRE2_CONFIG_DEPTHLIMIT PCRE2_CONFIG_RECURSIONLIMIT
++#define PCRE2_ERROR_DEPTHLIMIT PCRE2_ERROR_RECURSIONLIMIT
++#define pcre2_set_depth_limit pcre2_set_recursion_limit
+ #endif
+
+ struct pcre_comp
+ {
+- /* Compiled internal form of a Perl regular expression. */
+- pcre *cre;
+-
+- /* Additional information about the pattern. */
+- pcre_extra *extra;
+-
+-#if PCRE_STUDY_JIT_COMPILE
+ /* The JIT stack and its maximum size. */
+- pcre_jit_stack *jit_stack;
+- int jit_stack_size;
+-#endif
++ pcre2_jit_stack *jit_stack;
++ PCRE2_SIZE jit_stack_size;
++
++ /* Compiled internal form of a Perl regular expression. */
++ pcre2_code *cre;
++ pcre2_match_context *mcontext;
++ pcre2_match_data *data;
+
+ /* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
+ string matches when that flag is used. */
+@@ -60,51 +52,50 @@ struct pcre_comp
+
+ /* Match the already-compiled PCRE pattern against the data in SUBJECT,
+ of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
+- options OPTIONS, and storing resulting matches into SUB. Return
+- the (nonnegative) match location or a (negative) error number. */
++ options OPTIONS.
++ Return the (nonnegative) match count or a (negative) error number. */
+ static int
+-jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
+- int search_offset, int options, int *sub)
++jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
++ PCRE2_SIZE search_offset, int options)
+ {
+ while (true)
+ {
+- int e = pcre_exec (pc->cre, pc->extra, subject, search_bytes,
+- search_offset, options, sub, NSUB);
+-
+-#if PCRE_STUDY_JIT_COMPILE
+- if (e == PCRE_ERROR_JIT_STACKLIMIT
++ int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
++ search_offset, options, pc->data, pc->mcontext);
++ if (e == PCRE2_ERROR_JIT_STACKLIMIT
+ && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
+ {
+- int old_size = pc->jit_stack_size;
+- int new_size = pc->jit_stack_size = old_size * 2;
++ PCRE2_SIZE old_size = pc->jit_stack_size;
++ PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
+ if (pc->jit_stack)
+- pcre_jit_stack_free (pc->jit_stack);
+- pc->jit_stack = pcre_jit_stack_alloc (old_size, new_size);
+- if (!pc->jit_stack)
++ pcre2_jit_stack_free (pc->jit_stack);
++ pc->jit_stack = pcre2_jit_stack_create (old_size, new_size, NULL);
++
++ if (!pc->mcontext)
++ pc->mcontext = pcre2_match_context_create (NULL);
++
++ if (!pc->jit_stack || !pc->mcontext)
+ die (EXIT_TROUBLE, 0,
+ _("failed to allocate memory for the PCRE JIT stack"));
+- pcre_assign_jit_stack (pc->extra, NULL, pc->jit_stack);
++ pcre2_jit_stack_assign (pc->mcontext, NULL, pc->jit_stack);
+ continue;
+ }
+-#endif
+
+-#if PCRE_EXTRA_MATCH_LIMIT_RECURSION
+- if (e == PCRE_ERROR_RECURSIONLIMIT
+- && (PCRE_STUDY_EXTRA_NEEDED || pc->extra))
++
++ if (e == PCRE2_ERROR_DEPTHLIMIT)
+ {
+- unsigned long lim
+- = (pc->extra->flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION
+- ? pc->extra->match_limit_recursion
+- : 0);
+- if (lim <= ULONG_MAX / 2)
+- {
+- pc->extra->match_limit_recursion = lim ? 2 * lim : (1 << 24) - 1;
+- pc->extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION;
+- continue;
+- }
++ uint32_t lim;
++ pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
++ if (lim >= UINT32_MAX / 2)
++ return e;
++
++ lim <<= 1;
++ if (!pc->mcontext)
++ pc->mcontext = pcre2_match_context_create (NULL);
++
++ pcre2_set_depth_limit (pc->mcontext, lim);
++ continue;
+ }
+-#endif
+-
+ return e;
+ }
+ }
+@@ -115,27 +106,35 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
+ void *
+ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
+ {
+- int e;
+- char const *ep;
++ PCRE2_SIZE e;
++ int ec;
++ PCRE2_UCHAR8 ep[128]; /* 120 code units is suggested to avoid truncation */
+ static char const wprefix[] = "(?<!\\w)(?:";
+ static char const wsuffix[] = ")(?!\\w)";
+ static char const xprefix[] = "^(?:";
+ static char const xsuffix[] = ")$";
+ int fix_len_max = MAX (sizeof wprefix - 1 + sizeof wsuffix - 1,
+ sizeof xprefix - 1 + sizeof xsuffix - 1);
+- char *re = xnmalloc (4, size + (fix_len_max + 4 - 1) / 4);
+- int flags = PCRE_DOLLAR_ENDONLY | (match_icase ? PCRE_CASELESS : 0);
++ unsigned char *re = xmalloc (size + fix_len_max + 1);
++ int flags = PCRE2_DOLLAR_ENDONLY | (match_icase ? PCRE2_CASELESS : 0);
+ char *patlim = pattern + size;
+- char *n = re;
+- char const *p;
+- char const *pnul;
++ char *n = (char *)re;
+ struct pcre_comp *pc = xcalloc (1, sizeof (*pc));
++ pcre2_compile_context *ccontext = pcre2_compile_context_create(NULL);
+
+ if (localeinfo.multibyte)
+ {
+ if (! localeinfo.using_utf8)
+ die (EXIT_TROUBLE, 0, _("-P supports only unibyte and UTF-8 locales"));
+- flags |= PCRE_UTF8;
++ flags |= PCRE2_UTF;
++#if 0
++ /* do not match individual code units but only UTF-8 */
++ flags |= PCRE2_NEVER_BACKSLASH_C;
++#endif
++#ifdef PCRE2_MATCH_INVALID_UTF
++ /* consider invalid UTF-8 as a barrier, instead of error */
++ flags |= PCRE2_MATCH_INVALID_UTF;
++#endif
+ }
+
+ /* FIXME: Remove this restriction. */
+@@ -149,55 +148,43 @@ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
+ strcpy (n, xprefix);
+ n += strlen (n);
+
+- /* The PCRE interface doesn't allow NUL bytes in the pattern, so
+- replace each NUL byte in the pattern with the four characters
+- "\000", removing a preceding backslash if there are an odd
+- number of backslashes before the NUL. */
+- *patlim = '\0';
+- for (p = pattern; (pnul = p + strlen (p)) < patlim; p = pnul + 1)
++ memcpy (n, pattern, size);
++ n += size;
++ if (match_words && !match_lines)
+ {
+- memcpy (n, p, pnul - p);
+- n += pnul - p;
+- for (p = pnul; pattern < p && p[-1] == '\\'; p--)
+- continue;
+- n -= (pnul - p) & 1;
+- strcpy (n, "\\000");
+- n += 4;
++ strcpy (n, wsuffix);
++ n += strlen(wsuffix);
+ }
+- memcpy (n, p, patlim - p + 1);
+- n += patlim - p;
+- *patlim = '\n';
+
+- if (match_words)
+- strcpy (n, wsuffix);
+ if (match_lines)
+- strcpy (n, xsuffix);
++ {
++ strcpy (n, xsuffix);
++ n += strlen(xsuffix);
++ }
+
+- pc->cre = pcre_compile (re, flags, &ep, &e, pcre_maketables ());
++ pcre2_set_character_tables (ccontext, pcre2_maketables (NULL));
++ pc->cre = pcre2_compile (re, n - (char *)re, flags, &ec, &e, ccontext);
+ if (!pc->cre)
+- die (EXIT_TROUBLE, 0, "%s", ep);
+-
+- int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED | PCRE_STUDY_JIT_COMPILE;
+- pc->extra = pcre_study (pc->cre, pcre_study_flags, &ep);
+- if (ep)
+- die (EXIT_TROUBLE, 0, "%s", ep);
++ {
++ pcre2_get_error_message (ec, ep, sizeof (ep));
++ die (EXIT_TROUBLE, 0, "%s", ep);
++ }
+
+-#if PCRE_STUDY_JIT_COMPILE
+- if (pcre_fullinfo (pc->cre, pc->extra, PCRE_INFO_JIT, &e))
+- die (EXIT_TROUBLE, 0, _("internal error (should never happen)"));
++ pc->data = pcre2_match_data_create_from_pattern (pc->cre, NULL);
+
+- /* The PCRE documentation says that a 32 KiB stack is the default. */
+- if (e)
+- pc->jit_stack_size = 32 << 10;
+-#endif
++ ec = pcre2_jit_compile (pc->cre, PCRE2_JIT_COMPLETE);
++ if (ec && ec != PCRE2_ERROR_JIT_BADOPTION && ec != PCRE2_ERROR_NOMEMORY)
++ die (EXIT_TROUBLE, 0, _("JIT internal error: %d"), ec);
++ else
++ {
++ /* The PCRE documentation says that a 32 KiB stack is the default. */
++ pc->jit_stack_size = 32 << 10;
++ }
+
+ free (re);
+
+- int sub[NSUB];
+- pc->empty_match[false] = pcre_exec (pc->cre, pc->extra, "", 0, 0,
+- PCRE_NOTBOL, sub, NSUB);
+- pc->empty_match[true] = pcre_exec (pc->cre, pc->extra, "", 0, 0, 0, sub,
+- NSUB);
++ pc->empty_match[false] = jit_exec (pc, "", 0, 0, PCRE2_NOTBOL);
++ pc->empty_match[true] = jit_exec (pc, "", 0, 0, 0);
+
+ return pc;
+ }
+@@ -206,15 +193,14 @@ size_t
+ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+ char const *start_ptr)
+ {
+- int sub[NSUB];
+ char const *p = start_ptr ? start_ptr : buf;
+ bool bol = p[-1] == eolbyte;
+ char const *line_start = buf;
+- int e = PCRE_ERROR_NOMATCH;
++ int e = PCRE2_ERROR_NOMATCH;
+ char const *line_end;
+ struct pcre_comp *pc = vcp;
+-
+- /* The search address to pass to pcre_exec. This is the start of
++ PCRE2_SIZE *sub = pcre2_get_ovector_pointer (pc->data);
++ /* The search address to pass to PCRE. This is the start of
+ the buffer, or just past the most-recently discovered encoding
+ error or line end. */
+ char const *subject = buf;
+@@ -226,14 +212,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+ better and the correctness issues were too puzzling. See
+ Bug#22655. */
+ line_end = rawmemchr (p, eolbyte);
+- if (INT_MAX < line_end - p)
++ if (PCRE2_SIZE_MAX < line_end - p)
+ die (EXIT_TROUBLE, 0, _("exceeded PCRE's line length limit"));
+
+ for (;;)
+ {
+ /* Skip past bytes that are easily determined to be encoding
+ errors, treating them as data that cannot match. This is
+- faster than having pcre_exec check them. */
++ faster than having PCRE check them. */
+ while (localeinfo.sbclen[to_uchar (*p)] == -1)
+ {
+ p++;
+@@ -241,10 +227,10 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+ bol = false;
+ }
+
+- int search_offset = p - subject;
++ PCRE2_SIZE search_offset = p - subject;
+
+ /* Check for an empty match; this is faster than letting
+- pcre_exec do it. */
++ PCRE do it. */
+ if (p == line_end)
+ {
+ sub[0] = sub[1] = search_offset;
+@@ -254,13 +240,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+
+ int options = 0;
+ if (!bol)
+- options |= PCRE_NOTBOL;
++ options |= PCRE2_NOTBOL;
+
+- e = jit_exec (pc, subject, line_end - subject, search_offset,
+- options, sub);
+- if (e != PCRE_ERROR_BADUTF8)
++ e = jit_exec (pc, subject, line_end - subject,
++ search_offset, options);
++ /* PCRE2 provides 22 different error codes for bad UTF-8 */
++ if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))
+ break;
+- int valid_bytes = sub[0];
++ PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
+
+ if (search_offset <= valid_bytes)
+ {
+@@ -270,14 +257,15 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+ /* Handle the empty-match case specially, for speed.
+ This optimization is valid if VALID_BYTES is zero,
+ which means SEARCH_OFFSET is also zero. */
++ sub[0] = valid_bytes;
+ sub[1] = 0;
+ e = pc->empty_match[bol];
+ }
+ else
+ e = jit_exec (pc, subject, valid_bytes, search_offset,
+- options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
++ options | PCRE2_NO_UTF_CHECK | PCRE2_NOTEOL);
+
+- if (e != PCRE_ERROR_NOMATCH)
++ if (e != PCRE2_ERROR_NOMATCH)
+ break;
+
+ /* Treat the encoding error as data that cannot match. */
+@@ -288,7 +276,7 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+ subject += valid_bytes + 1;
+ }
+
+- if (e != PCRE_ERROR_NOMATCH)
++ if (e != PCRE2_ERROR_NOMATCH)
+ break;
+ bol = true;
+ p = subject = line_start = line_end + 1;
+@@ -299,26 +287,34 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
+ {
+ switch (e)
+ {
+- case PCRE_ERROR_NOMATCH:
++ case PCRE2_ERROR_NOMATCH:
+ break;
+
+- case PCRE_ERROR_NOMEMORY:
++ case PCRE2_ERROR_NOMEMORY:
+ die (EXIT_TROUBLE, 0, _("%s: memory exhausted"), input_filename ());
+
+-#if PCRE_STUDY_JIT_COMPILE
+- case PCRE_ERROR_JIT_STACKLIMIT:
++ case PCRE2_ERROR_JIT_STACKLIMIT:
+ die (EXIT_TROUBLE, 0, _("%s: exhausted PCRE JIT stack"),
+ input_filename ());
+-#endif
+
+- case PCRE_ERROR_MATCHLIMIT:
++ case PCRE2_ERROR_MATCHLIMIT:
+ die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's backtracking limit"),
+ input_filename ());
+
+- case PCRE_ERROR_RECURSIONLIMIT:
+- die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's recursion limit"),
++ case PCRE2_ERROR_DEPTHLIMIT:
++ die (EXIT_TROUBLE, 0,
++ _("%s: exceeded PCRE's nested backtracking limit"),
++ input_filename ());
++
++ case PCRE2_ERROR_RECURSELOOP:
++ die (EXIT_TROUBLE, 0, _("%s: PCRE detected recurse loop"),
+ input_filename ());
+
++#ifdef PCRE2_ERROR_HEAPLIMIT
++ case PCRE2_ERROR_HEAPLIMIT:
++ die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's heap limit"),
++ input_filename ());
+++#endif
+ default:
+ /* For now, we lump all remaining PCRE failures into this basket.
+ If anyone cares to provide sample grep usage that can trigger
+diff --git a/tests/filename-lineno.pl b/tests/filename-lineno.pl
+index 1e84b45..1ff3d6a 100755
+--- a/tests/filename-lineno.pl
++++ b/tests/filename-lineno.pl
+@@ -101,13 +101,13 @@ my @Tests =
+ ],
+ ['invalid-re-P-paren', '-P ")"', {EXIT=>2},
+ {ERR => $ENV{PCRE_WORKS} == 1
+- ? "$prog: unmatched parentheses\n"
++ ? "$prog: unmatched closing parenthesis\n"
+ : $no_pcre
+ },
+ ],
+ ['invalid-re-P-star-paren', '-P "a.*)"', {EXIT=>2},
+ {ERR => $ENV{PCRE_WORKS} == 1
+- ? "$prog: unmatched parentheses\n"
++ ? "$prog: unmatched closing parenthesis\n"
+ : $no_pcre
+ },
+ ],
+--
+2.26.2
+
diff --git a/doc/grep.in.1 b/doc/grep.in.1 diff --git a/doc/grep.in.1 b/doc/grep.in.1
index e8854f2..0178db1 100644 index e8854f2..21bb471 100644
--- a/doc/grep.in.1 --- a/doc/grep.in.1
+++ b/doc/grep.in.1 +++ b/doc/grep.in.1
@@ -767,7 +767,7 @@ In other implementations, basic regular expressions are less powerful. @@ -767,7 +767,7 @@ In other implementations, basic regular expressions are less powerful.
@@ -590,7 +39,7 @@ index e8854f2..0178db1 100644
differences for basic regular expressions are summarized afterwards. differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and are Perl-compatible regular expressions give additional functionality, and are
-documented in B<pcresyntax>(3) and B<pcrepattern>(3), but work only if -documented in B<pcresyntax>(3) and B<pcrepattern>(3), but work only if
+documented in B<pcres2yntax>(3) and B<pcre2pattern>(3), but work only if +documented in B<pcre2syntax>(3) and B<pcre2pattern>(3), but work only if
PCRE support is enabled. PCRE support is enabled.
.PP .PP
The fundamental building blocks are the regular expressions The fundamental building blocks are the regular expressions
@@ -621,7 +70,7 @@ index 01ac81e..aae8571 100644
@menu @menu
diff --git a/m4/pcre.m4 b/m4/pcre.m4 diff --git a/m4/pcre.m4 b/m4/pcre.m4
index 78b7fda..0ca510f 100644 index 78b7fda..a1c6c82 100644
--- a/m4/pcre.m4 --- a/m4/pcre.m4
+++ b/m4/pcre.m4 +++ b/m4/pcre.m4
@@ -1,4 +1,4 @@ @@ -1,4 +1,4 @@
@@ -654,8 +103,8 @@ index 78b7fda..0ca510f 100644
LIBS="$PCRE_LIBS $LIBS" LIBS="$PCRE_LIBS $LIBS"
AC_LINK_IFELSE( AC_LINK_IFELSE(
- [AC_LANG_PROGRAM([[#include <pcre.h> - [AC_LANG_PROGRAM([[#include <pcre.h>
+ [AC_LANG_PROGRAM([[#define PCRE2_CODE_UNIT_WIDTH 8 + [AC_LANG_PROGRAM([[#define PCRE2_CODE_UNIT_WIDTH 8
+ #include <pcre2.h> + #include <pcre2.h>
]], ]],
- [[pcre *p = pcre_compile (0, 0, 0, 0, 0); - [[pcre *p = pcre_compile (0, 0, 0, 0, 0);
+ [[pcre2_code *p = pcre2_compile (0, 0, 0, 0, 0, 0); + [[pcre2_code *p = pcre2_compile (0, 0, 0, 0, 0, 0);
@@ -682,10 +131,10 @@ index 78b7fda..0ca510f 100644
PCRE_CFLAGS= PCRE_CFLAGS=
PCRE_LIBS= PCRE_LIBS=
diff --git a/src/pcresearch.c b/src/pcresearch.c diff --git a/src/pcresearch.c b/src/pcresearch.c
index 37f7e40..caedf49 100644 index 8070d06..2916d31 100644
--- a/src/pcresearch.c --- a/src/pcresearch.c
+++ b/src/pcresearch.c +++ b/src/pcresearch.c
@@ -17,40 +17,32 @@ @@ -17,41 +17,32 @@
02110-1301, USA. */ 02110-1301, USA. */
/* Written August 1992 by Mike Haertel. */ /* Written August 1992 by Mike Haertel. */
@@ -733,15 +182,15 @@ index 37f7e40..caedf49 100644
-#endif -#endif
+ pcre2_jit_stack *jit_stack; + pcre2_jit_stack *jit_stack;
+ PCRE2_SIZE jit_stack_size; + PCRE2_SIZE jit_stack_size;
+
+ /* Compiled internal form of a Perl regular expression. */ + /* Compiled internal form of a Perl regular expression. */
+ pcre2_code *cre; + pcre2_code *cre;
+ pcre2_match_context *mcontext; + pcre2_match_context *mcontext;
+ pcre2_match_data *data; + pcre2_match_data *data;
/* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty /* Table, indexed by ! (flag & PCRE_NOTBOL), of whether the empty
string matches when that flag is used. */ string matches when that flag is used. */
@@ -60,51 +52,50 @@ struct pcre_comp int empty_match[2];
@@ -60,54 +51,49 @@ struct pcre_comp
/* Match the already-compiled PCRE pattern against the data in SUBJECT, /* Match the already-compiled PCRE pattern against the data in SUBJECT,
of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
@@ -761,16 +210,21 @@ index 37f7e40..caedf49 100644
- search_offset, options, sub, NSUB); - search_offset, options, sub, NSUB);
- -
-#if PCRE_STUDY_JIT_COMPILE -#if PCRE_STUDY_JIT_COMPILE
- /* Going over this would trigger an int overflow bug within PCRE. */
- int jitstack_max = INT_MAX - 8 * 1024;
-
- if (e == PCRE_ERROR_JIT_STACKLIMIT - if (e == PCRE_ERROR_JIT_STACKLIMIT
- && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
+ int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes, + int e = pcre2_match (pc->cre, (PCRE2_SPTR)subject, search_bytes,
+ search_offset, options, pc->data, pc->mcontext); + search_offset, options, pc->data, pc->mcontext);
+ if (e == PCRE2_ERROR_JIT_STACKLIMIT + if (e == PCRE2_ERROR_JIT_STACKLIMIT
&& 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2) + && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
{ {
- int old_size = pc->jit_stack_size; - int old_size = pc->jit_stack_size;
- int new_size = pc->jit_stack_size = old_size * 2; - int new_size = pc->jit_stack_size = old_size * 2;
+ PCRE2_SIZE old_size = pc->jit_stack_size; + PCRE2_SIZE old_size = pc->jit_stack_size;
+ PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2; + PCRE2_SIZE new_size = pc->jit_stack_size = old_size * 2;
+
if (pc->jit_stack) if (pc->jit_stack)
- pcre_jit_stack_free (pc->jit_stack); - pcre_jit_stack_free (pc->jit_stack);
- pc->jit_stack = pcre_jit_stack_alloc (old_size, new_size); - pc->jit_stack = pcre_jit_stack_alloc (old_size, new_size);
@@ -778,7 +232,7 @@ index 37f7e40..caedf49 100644
+ pcre2_jit_stack_free (pc->jit_stack); + pcre2_jit_stack_free (pc->jit_stack);
+ pc->jit_stack = pcre2_jit_stack_create (old_size, new_size, NULL); + pc->jit_stack = pcre2_jit_stack_create (old_size, new_size, NULL);
+ +
+ if (!pc->mcontext) + if (!pc->mcontext)
+ pc->mcontext = pcre2_match_context_create (NULL); + pc->mcontext = pcre2_match_context_create (NULL);
+ +
+ if (!pc->jit_stack || !pc->mcontext) + if (!pc->jit_stack || !pc->mcontext)
@@ -789,11 +243,10 @@ index 37f7e40..caedf49 100644
continue; continue;
} }
-#endif -#endif
-
-#if PCRE_EXTRA_MATCH_LIMIT_RECURSION -#if PCRE_EXTRA_MATCH_LIMIT_RECURSION
- if (e == PCRE_ERROR_RECURSIONLIMIT - if (e == PCRE_ERROR_RECURSIONLIMIT
- && (PCRE_STUDY_EXTRA_NEEDED || pc->extra)) - && (PCRE_STUDY_EXTRA_NEEDED || pc->extra))
+
+ if (e == PCRE2_ERROR_DEPTHLIMIT) + if (e == PCRE2_ERROR_DEPTHLIMIT)
{ {
- unsigned long lim - unsigned long lim
@@ -806,6 +259,8 @@ index 37f7e40..caedf49 100644
- pc->extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION; - pc->extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION;
- continue; - continue;
- } - }
- }
-#endif
+ uint32_t lim; + uint32_t lim;
+ pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim); + pcre2_config (PCRE2_CONFIG_DEPTHLIMIT, &lim);
+ if (lim >= UINT32_MAX / 2) + if (lim >= UINT32_MAX / 2)
@@ -814,16 +269,14 @@ index 37f7e40..caedf49 100644
+ lim <<= 1; + lim <<= 1;
+ if (!pc->mcontext) + if (!pc->mcontext)
+ pc->mcontext = pcre2_match_context_create (NULL); + pc->mcontext = pcre2_match_context_create (NULL);
+
+ pcre2_set_depth_limit (pc->mcontext, lim); + pcre2_set_depth_limit (pc->mcontext, lim);
+ continue; + continue;
} + }
-#endif
-
return e; return e;
} }
} }
@@ -115,27 +106,35 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes, @@ -118,27 +104,35 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
void * void *
Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact) Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
{ {
@@ -867,10 +320,11 @@ index 37f7e40..caedf49 100644
} }
/* FIXME: Remove this restriction. */ /* FIXME: Remove this restriction. */
@@ -149,55 +148,43 @@ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact) @@ -151,56 +145,42 @@ Pcompile (char *pattern, size_t size, reg_syntax_t ignored, bool exact)
if (match_lines)
strcpy (n, xprefix); strcpy (n, xprefix);
n += strlen (n); n += strlen (n);
-
- /* The PCRE interface doesn't allow NUL bytes in the pattern, so - /* The PCRE interface doesn't allow NUL bytes in the pattern, so
- replace each NUL byte in the pattern with the four characters - replace each NUL byte in the pattern with the four characters
- "\000", removing a preceding backslash if there are an odd - "\000", removing a preceding backslash if there are an odd
@@ -888,20 +342,19 @@ index 37f7e40..caedf49 100644
- n -= (pnul - p) & 1; - n -= (pnul - p) & 1;
- strcpy (n, "\\000"); - strcpy (n, "\\000");
- n += 4; - n += 4;
+ strcpy (n, wsuffix); - }
+ n += strlen(wsuffix);
}
- memcpy (n, p, patlim - p + 1); - memcpy (n, p, patlim - p + 1);
- n += patlim - p; - n += patlim - p;
- *patlim = '\n'; - *patlim = '\n';
-
- if (match_words) - if (match_words)
- strcpy (n, wsuffix); strcpy (n, wsuffix);
+ n += strlen(wsuffix);
+ }
if (match_lines) if (match_lines)
- strcpy (n, xsuffix);
+ { + {
+ strcpy (n, xsuffix); strcpy (n, xsuffix);
+ n += strlen(xsuffix); + n += strlen(xsuffix);
+ } + }
- pc->cre = pcre_compile (re, flags, &ep, &e, pcre_maketables ()); - pc->cre = pcre_compile (re, flags, &ep, &e, pcre_maketables ());
@@ -949,7 +402,7 @@ index 37f7e40..caedf49 100644
return pc; return pc;
} }
@@ -206,15 +193,14 @@ size_t @@ -209,15 +189,15 @@ size_t
Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
char const *start_ptr) char const *start_ptr)
{ {
@@ -961,14 +414,14 @@ index 37f7e40..caedf49 100644
+ int e = PCRE2_ERROR_NOMATCH; + int e = PCRE2_ERROR_NOMATCH;
char const *line_end; char const *line_end;
struct pcre_comp *pc = vcp; struct pcre_comp *pc = vcp;
-
- /* The search address to pass to pcre_exec. This is the start of
+ PCRE2_SIZE *sub = pcre2_get_ovector_pointer (pc->data); + PCRE2_SIZE *sub = pcre2_get_ovector_pointer (pc->data);
- /* The search address to pass to pcre_exec. This is the start of
+ /* The search address to pass to PCRE. This is the start of + /* The search address to pass to PCRE. This is the start of
the buffer, or just past the most-recently discovered encoding the buffer, or just past the most-recently discovered encoding
error or line end. */ error or line end. */
char const *subject = buf; char const *subject = buf;
@@ -226,14 +212,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, @@ -229,14 +209,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
better and the correctness issues were too puzzling. See better and the correctness issues were too puzzling. See
Bug#22655. */ Bug#22655. */
line_end = rawmemchr (p, eolbyte); line_end = rawmemchr (p, eolbyte);
@ -985,7 +438,7 @@ index 37f7e40..caedf49 100644
while (localeinfo.sbclen[to_uchar (*p)] == -1) while (localeinfo.sbclen[to_uchar (*p)] == -1)
{ {
p++; p++;
@@ -241,10 +227,10 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, @@ -244,10 +224,10 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
bol = false; bol = false;
} }
@ -998,7 +451,7 @@ index 37f7e40..caedf49 100644
if (p == line_end) if (p == line_end)
{ {
sub[0] = sub[1] = search_offset; sub[0] = sub[1] = search_offset;
@@ -254,13 +240,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, @@ -257,13 +237,14 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
int options = 0; int options = 0;
if (!bol) if (!bol)
@ -1018,7 +471,7 @@ index 37f7e40..caedf49 100644
if (search_offset <= valid_bytes) if (search_offset <= valid_bytes)
{ {
@@ -270,14 +257,15 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, @@ -273,14 +254,15 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
/* Handle the empty-match case specially, for speed. /* Handle the empty-match case specially, for speed.
This optimization is valid if VALID_BYTES is zero, This optimization is valid if VALID_BYTES is zero,
which means SEARCH_OFFSET is also zero. */ which means SEARCH_OFFSET is also zero. */
@ -1036,7 +489,7 @@ index 37f7e40..caedf49 100644
break; break;
/* Treat the encoding error as data that cannot match. */ /* Treat the encoding error as data that cannot match. */
@@ -288,7 +276,7 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, @@ -291,7 +273,7 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
subject += valid_bytes + 1; subject += valid_bytes + 1;
} }
@ -1045,7 +498,7 @@ index 37f7e40..caedf49 100644
break; break;
bol = true; bol = true;
p = subject = line_start = line_end + 1; p = subject = line_start = line_end + 1;
@@ -299,26 +287,34 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size, @@ -302,26 +284,35 @@ Pexecute (void *vcp, char const *buf, size_t size, size_t *match_size,
{ {
switch (e) switch (e)
{ {
@ -1085,6 +538,7 @@ index 37f7e40..caedf49 100644
+ die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's heap limit"), + die (EXIT_TROUBLE, 0, _("%s: exceeded PCRE's heap limit"),
+ input_filename ()); + input_filename ());
+#endif +#endif
+
default: default:
/* For now, we lump all remaining PCRE failures into this basket. /* For now, we lump all remaining PCRE failures into this basket.
If anyone cares to provide sample grep usage that can trigger If anyone cares to provide sample grep usage that can trigger
@ -1109,5 +563,5 @@ index 1e84b45..1ff3d6a 100755
}, },
], ],
-- --
2.26.2 1.8.3.1


@ -0,0 +1,51 @@
From 6e1450408a7921771c41973761995e06445ba18b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 13 Nov 2021 13:52:23 -0800
Subject: [PATCH] grep: speed up, fix bad-UTF8 check with -P
* src/pcresearch.c (bad_utf8_from_pcre2): New function. Fix bug
where PCRE2_ERROR_UTF8_ERR1 was not treated as an encoding error.
Improve performance when PCRE2_MATCH_INVALID_UTF is defined.
(Pexecute): Use it.
---
src/pcresearch.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 286e1dc..953aca2 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -104,6 +104,18 @@ jit_exec (struct pcre_comp *pc, char const *subject, PCRE2_SIZE search_bytes,
}
}
+/* Return true if E is an error code for bad UTF-8, and if pcre2_match
+ could return E because PCRE lacks PCRE2_MATCH_INVALID_UTF. */
+static bool
+bad_utf8_from_pcre2 (int e)
+{
+#ifdef PCRE2_MATCH_INVALID_UTF
+ return false;
+#else
+ return PCRE2_ERROR_UTF8_ERR21 <= e && e <= PCRE2_ERROR_UTF8_ERR1;
+#endif
+}
+
/* Compile the -P style PATTERN, containing SIZE bytes that are
followed by '\n'. Return a description of the compiled pattern. */
@@ -248,9 +260,9 @@ Pexecute (void *vcp, char const *buf, idx_t size, idx_t *match_size,
e = jit_exec (pc, subject, line_end - subject,
search_offset, options);
- /* PCRE2 provides 22 different error codes for bad UTF-8 */
- if (! (PCRE2_ERROR_UTF8_ERR21 <= e && e < PCRE2_ERROR_UTF8_ERR1))
+ if (!bad_utf8_from_pcre2 (e))
break;
+
PCRE2_SIZE valid_bytes = pcre2_get_startchar (pc->data);
if (search_offset <= valid_bytes)
--
1.8.3.1
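
The off-by-one that this patch fixes can be illustrated outside of grep. The sketch below is not part of the patch: the `SKETCH_` constants are local stand-ins that are assumed to mirror the values of PCRE2_ERROR_UTF8_ERR1 and PCRE2_ERROR_UTF8_ERR21 in pcre2.h, and both function names are hypothetical. It compares the old range test (strict `<` on the upper bound) with the corrected one (`<=`).

```c
#include <stdbool.h>

/* Local stand-ins, assumed to match pcre2.h, where the bad-UTF-8
   error codes form a contiguous descending range.  */
enum
{
  SKETCH_UTF8_ERR1 = -3,   /* assumed value of PCRE2_ERROR_UTF8_ERR1 */
  SKETCH_UTF8_ERR21 = -23  /* assumed value of PCRE2_ERROR_UTF8_ERR21 */
};

/* Old test from Pexecute: the strict '<' leaves out ERR1 itself.  */
static bool
old_bad_utf8_check (int e)
{
  return SKETCH_UTF8_ERR21 <= e && e < SKETCH_UTF8_ERR1;
}

/* Corrected test, as in the #else branch of bad_utf8_from_pcre2:
   both ends of the range are included.  */
static bool
new_bad_utf8_check (int e)
{
  return SKETCH_UTF8_ERR21 <= e && e <= SKETCH_UTF8_ERR1;
}
```

With these assumed values, only the corrected test classifies the ERR1 code as an encoding error. Codes outside the range, such as a plain no-match result, are rejected by both.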


@ -0,0 +1,35 @@
From b3a85a1a8a816f4f6f9c01399c16efe92a86ca06 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 9 Nov 2021 10:11:42 -0800
Subject: [PATCH] grep: work around PCRE bug
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem reported by Carlo Marcelo Arenas Belón (Bug#51710).
* src/pcresearch.c (jit_exec): Don’t attempt to grow the JIT stack
over INT_MAX - 8 * 1024.
---
src/pcresearch.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 3bdaee9..09f92c8 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -72,8 +72,11 @@ jit_exec (struct pcre_comp *pc, char const *subject, int search_bytes,
search_offset, options, sub, NSUB);
#if PCRE_STUDY_JIT_COMPILE
+ /* Going over this would trigger an int overflow bug within PCRE. */
+ int jitstack_max = INT_MAX - 8 * 1024;
+
if (e == PCRE_ERROR_JIT_STACKLIMIT
- && 0 < pc->jit_stack_size && pc->jit_stack_size <= INT_MAX / 2)
+ && 0 < pc->jit_stack_size && pc->jit_stack_size <= jitstack_max / 2)
{
int old_size = pc->jit_stack_size;
int new_size = pc->jit_stack_size = old_size * 2;
--
1.8.3.1
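
The guard added by this patch can be read as a small predicate on the current stack size. The following is a minimal sketch, not grep code: the helper names and the `SKETCH_GROWTH_MARGIN` constant are hypothetical, and the 8 KiB margin is simply the `8 * 1024` figure from the patch. It shows that doubling is allowed only while the doubled size stays at or below INT_MAX minus the margin, so the multiplication itself cannot overflow an int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical name for the 8 KiB margin used in the patch.  */
enum { SKETCH_GROWTH_MARGIN = 8 * 1024 };

/* Return true if SIZE may be doubled without the result exceeding
   INT_MAX - SKETCH_GROWTH_MARGIN.  */
static bool
can_double_jit_stack (int size)
{
  int jitstack_max = INT_MAX - SKETCH_GROWTH_MARGIN;
  return 0 < size && size <= jitstack_max / 2;
}

/* Return the next stack size to try, or SIZE unchanged when the
   guard refuses further growth.  */
static int
next_jit_stack_size (int size)
{
  return can_double_jit_stack (size) ? size * 2 : size;
}
```

Under the old condition a size of INT_MAX / 2 was still doubled. The sketch rejects it, because the doubled value would land inside the margin that the patch reserves.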


@ -1,6 +1,6 @@
Name: grep Name: grep
Version: 3.7 Version: 3.7
Release: 4 Release: 5
Summary: A string search utility Summary: A string search utility
License: GPLv3+ License: GPLv3+
URL: http://www.gnu.org/software/grep/ URL: http://www.gnu.org/software/grep/
@ -8,7 +8,11 @@ Source0: https://ftp.gnu.org/gnu/grep/grep-%{version}.tar.xz
Patch1: backport-grep-avoid-sticky-problem-with-f-f.patch Patch1: backport-grep-avoid-sticky-problem-with-f-f.patch
Patch2: backport-grep-s-does-not-suppress-binary-file-matches.patch Patch2: backport-grep-s-does-not-suppress-binary-file-matches.patch
Patch3: backport-grep-migrate-to-pcre2.patch Patch3: backport-grep-work-around-PCRE-bug.patch
Patch4: backport-grep-migrate-to-pcre2.patch
Patch5: backport-grep-Don-t-limit-jitstack_max-to-INT_MAX.patch
Patch6: backport-grep-speed-up-fix-bad-UTF8-check-with-P.patch
Patch7: backport-grep-fix-minor-P-memory-leak.patch
BuildRequires: gcc pcre2-devel texinfo gettext libsigsegv-devel automake BuildRequires: gcc pcre2-devel texinfo gettext libsigsegv-devel automake
Provides: /bin/egrep /bin/fgrep /bin/grep bundled(gnulib) Provides: /bin/egrep /bin/fgrep /bin/grep bundled(gnulib)
@ -48,8 +52,13 @@ make check
%changelog %changelog
* Sat May 14 2022 licihua <licihua@huawei.com> -3.7-4 * Tue Jun 28 2022 panxiaohe <panxh.life@foxmail.com> - 3.7-5
- Modify the dependency from pcre to pcre2 - grep: Don't limit jitstack_max to INT_MAX
- grep: speed up, fix bad-UTF8 check with -P
- grep: fix minor -P memory leak
* Sat May 14 2022 licihua <licihua@huawei.com> - 3.7-4
- Modify the dependency from pcre to pcre2
* Fri Mar 18 2022 yangzhuangzhuang <yangzhuangzhuang1@h-partners.com> - 3.7-3 * Fri Mar 18 2022 yangzhuangzhuang <yangzhuangzhuang1@h-partners.com> - 3.7-3
- The -s option no longer suppresses "binary file matches" messages - The -s option no longer suppresses "binary file matches" messages