FS#39245 - [glibc] regex infinite loop
Attached to Project:
Arch Linux
Opened by test0 (test0) - Sunday, 09 March 2014, 00:46 GMT
Last edited by Allan McRae (Allan) - Saturday, 20 February 2016, 01:47 GMT
Details
Description:
Running the file command on a specific file causes it to get stuck using 100% CPU (the original report for that is here: http://bugs.gw.com/view.php?id=332). I traced the problem to a call to regexec and created a short program to replicate the issue (attached).

Additional info:
* Using glibc 2.19-3

Steps to reproduce:
Compile and run the attached C file. It should return either "Match" or "No match", but instead gets stuck.
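The attachment itself is not reproduced on this page, so the following is only a rough sketch of what such a reproducer might look like, not the attached file. The function name `match_long_line`, the all-'a' input line and the REG_EXTENDED | REG_NOSUB flags are assumptions made for illustration; only regcomp, regexec and regfree are the real POSIX calls involved.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the attached program): run `pattern` against a
 * single line made of `len` copies of 'a'.
 * Returns 1 on match, 0 on no match, -1 on any error.
 */
int match_long_line(const char *pattern, size_t len)
{
    regex_t re;
    char *line;
    int rc;

    assert(pattern != NULL);

    /* Compile as a POSIX extended regex; no submatch info is needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    line = malloc(len + 1);
    if (line == NULL) {
        regfree(&re);
        return -1;
    }
    memset(line, 'a', len);
    line[len] = '\0';

    /* This is the call that was reported to spin at 100% CPU. */
    rc = regexec(&re, line, 0, NULL, 0);

    free(line);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

A small main() that prints "Match" or "No match" depending on the return value, called with a large `len`, would approximate the steps to reproduce. Whether it shows the slowdown depends on the pattern, which is not quoted in this report.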
$ time file TFAC00I0.RCI
TFAC00I0.RCI: Non-ISO extended-ASCII text, with very long lines, with NEL line terminators
real 0m53.735s
user 0m53.717s
sys 0m0.000s
Running perf on the example file in the upstream bug report:
en_US.UTF-8:
Samples: 220K of event 'cycles', Event count (approx.): 183996098111
29.05% file libc-2.19.so [.] re_search_internal
24.70% file libc-2.19.so [.] re_acquire_state_context
13.97% file libc-2.19.so [.] merge_state_with_log
13.91% file libc-2.19.so [.] re_node_set_compare.part.2
6.99% file libc-2.19.so [.] re_string_context_at
5.04% file libc-2.19.so [.] check_node_accept_bytes.isra.26
3.98% file libc-2.19.so [.] clean_state_log_if_needed
1.84% file libc-2.19.so [.] memset
C:
Samples: 22K of event 'cycles', Event count (approx.): 18264791986
97.21% file libc-2.19.so [.] re_search_internal
1.93% file libc-2.19.so [.] check_halt_state_context.isra.20
0.56% file libc-2.19.so [.] re_string_context_at
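To compare the two locales outside of file, something along these lines might help. It is a minimal sketch: `match_in_locale` is a made-up helper name, and it assumes the locale has to be set before regcomp, since as far as I understand glibc takes the locale into account when compiling the pattern.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: switch to `locale`, then compile and run `pattern`
 * against `subject`.
 * Returns 1 on match, 0 on no match, -1 on a regex error,
 * -2 if the locale is not available on this system.
 */
int match_in_locale(const char *locale, const char *pattern,
                    const char *subject)
{
    regex_t re;
    int rc;

    assert(locale != NULL && pattern != NULL && subject != NULL);

    /* Set the locale first so that it is in effect for regcomp. */
    if (setlocale(LC_ALL, locale) == NULL)
        return -2;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Calling it once with "C" and once with "en_US.UTF-8" on the same pattern and input, under perf or time, should show whether the extra functions in the UTF-8 profile above come from the multibyte handling.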
BTW, the example attached here seems just to be a regex on a really long line and not related...
The attached example completes in 33 seconds with my default LANG or LANG=C, so I guess I just didn't wait long enough when first testing. However, if I change the regex in the attachment (e.g. to ".*aaa$") it completes in around half a second, so the slowness is not just due to a really long line. I compile it simply with `gcc regex_test.c -o regex_test`.
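For anyone wanting to compare patterns the same way, here is a rough sketch of timing a single regexec call. `time_regexec` is a hypothetical helper, not part of regex_test.c, and it measures CPU time with clock(), which is an assumption on my part (the numbers above came from running the whole program).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: compile `pattern`, run it once against `subject`,
 * and return the CPU time in seconds spent inside regexec.
 * *matched is set to 1 on match and 0 on no match.
 * Returns -1.0 if compiling or matching fails.
 */
double time_regexec(const char *pattern, const char *subject, int *matched)
{
    regex_t re;
    clock_t start, end;
    int rc;

    assert(pattern != NULL && subject != NULL && matched != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1.0;

    /* Time only the match itself, not the compilation. */
    start = clock();
    rc = regexec(&re, subject, 0, NULL, 0);
    end = clock();

    regfree(&re);

    if (rc != 0 && rc != REG_NOMATCH)
        return -1.0;

    *matched = (rc == 0);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running this with the original regex and with ".*aaa$" on the same long line should show whether the difference comes from the pattern rather than from the length of the input.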
I left `file TFAC00I0.RCI` running again and it returned after 8 minutes ('though I swear I left it for much longer than that initially & it didn't return). Considering it's usually as good as instant it shouldn't be taking even 8 minutes, should it?