FS#54834 - [dnsmasq] segfaults during get

Attached to Project: Arch Linux
Opened by Arne Wörner (riddicc) - Sunday, 16 July 2017, 11:08 GMT
Last edited by Christian Hesse (eworm) - Wednesday, 23 August 2017, 08:07 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Christian Hesse (eworm)
Architecture i686
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
The current dnsmasq crashes due to a segmentation violation during PXE boot.

It worked at some point before...

I would like to keep using dnsmasq, because I already have a working config file...

Additional info:
* package version: 2.77-1-i686
* config and/or log files etc.:
Jul 16 09:24:01 neo dnsmasq-tftp[1967]: error 8 User aborted the transfer received from 192.168.1.2
Jul 16 09:24:01 neo dnsmasq-tftp[1967]: failed sending /var/dnsmasq/tftpboot/pxe/x86_64-efi/core.efi to 192.168.1.2
Jul 16 09:24:01 neo dhcp[2239]: old 74:d4:35:54:cf:0d 192.168.1.2
Jul 16 09:24:01 neo dnsmasq-tftp[1967]: sent /var/dnsmasq/tftpboot/pxe/x86_64-efi/core.efi to 192.168.1.2
Jul 16 09:24:01 neo systemd[1]: dnsmasq.service: Main process exited, code=killed, status=11/SEGV

Steps to reproduce:
1. try to do a PXE boot
2. wait... after some progress it hangs
3. try again
4. wait... it hangs much earlier, because dnsmasq has already crashed

Closed by Christian Hesse (eworm)
Wednesday, 23 August 2017, 08:07 GMT
Reason for closing: Fixed
Additional comments about closing: dnsmasq 2.77-3
Comment by Arne Wörner (riddicc) - Sunday, 16 July 2017, 13:25 GMT
I just downgraded to dnsmasq-2.76-4 and it works again...
It seems like dnsmasq 2.77 has a problem...
-Arne
Comment by Christian Hesse (eworm) - Thursday, 20 July 2017, 06:55 GMT
Please provide your config file...
Comment by Arne Wörner (riddicc) - Thursday, 20 July 2017, 10:05 GMT
oki doke...
Comment by Christian Hesse (eworm) - Thursday, 20 July 2017, 10:29 GMT
Could not reproduce here... Possibly this depends on the PXE implementation of a specific client.

So some questions arise:

* With your downgraded dnsmasq, do you still see the message "error 8 User aborted the transfer received from..."?
* Do other clients produce the same issue?
* What is /root/pbt.sh?
* Any chance you could bisect the issue?
Comment by Arne Wörner (riddicc) - Thursday, 20 July 2017, 11:12 GMT
> With your downgraded dnsmasq, do you still see the message "error 8 User aborted the transfer received from..."?
Yes.
AFAIK, the BIOS first checks whether the file (and the TFTP server) is there, and then it starts the real download...
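For context on the "error 8" line in the log: in TFTP (RFC 1350) an ERROR packet is a 2-byte opcode 5, a 2-byte error code and a NUL-terminated message, and RFC 2347 assigns code 8 to terminating a transfer during option negotiation. PXE clients commonly use this to learn the file size first and then abort before the real download, which fits the explanation above. The C sketch below builds and parses such a packet; the function and macro names are hypothetical and are not taken from dnsmasq, and the message string is the one from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TFTP_OP_ERROR 5        /* RFC 1350: opcode of an ERROR packet */
#define TFTP_ERR_OPTION_NEG 8  /* RFC 2347: terminate transfer due to option negotiation */

/* Build an ERROR packet: opcode (2 bytes) | error code (2 bytes) | message | NUL.
   Returns the packet length, or 0 if buf is too small. */
static size_t tftp_build_error(unsigned char *buf, size_t cap,
                               unsigned code, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    size_t mlen = strlen(msg);
    if (cap < 4 + mlen + 1)
        return 0;
    buf[0] = 0;                                    /* opcode, network byte order */
    buf[1] = TFTP_OP_ERROR;
    buf[2] = (unsigned char)((code >> 8) & 0xff);  /* error code, network byte order */
    buf[3] = (unsigned char)(code & 0xff);
    memcpy(buf + 4, msg, mlen + 1);                /* message including its NUL */
    return 4 + mlen + 1;
}

/* Parse an ERROR packet. Returns 0 and fills *code and *msg on success,
   -1 if the packet is too short, not an ERROR packet, or unterminated. */
static int tftp_parse_error(const unsigned char *pkt, size_t len,
                            unsigned *code, const char **msg)
{
    if (len < 5)                                   /* 4 header bytes + at least a NUL */
        return -1;
    if ((((unsigned)pkt[0] << 8) | pkt[1]) != TFTP_OP_ERROR)
        return -1;
    if (memchr(pkt + 4, '\0', len - 4) == NULL)    /* message must be terminated */
        return -1;
    *code = ((unsigned)pkt[2] << 8) | pkt[3];
    *msg = (const char *)(pkt + 4);
    return 0;
}
```

Parsing the packet the client sent would yield code 8 and the text "User aborted the transfer", which dnsmasq then logs as "error 8 User aborted the transfer received from ...".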

> Do other clients produce the same issue?
Dunno, I only have one...

> What is /root/pbt.sh?
It is a shell script that should not be able to crash the server... *giggle*
#!/bin/sh
logger -t dhcp "$@"

> Any chance you could bisect the issue?
Not really... that server box (Atom N270) is too old for building packages, I guess...
and I would need to hibernate+thaw my other box each time, because I don't know how to provoke that "error 8" otherwise...

Isn't there something in the changelog that looks suspicious?
I mean, a segmentation violation should not happen... :)
Is there no report on the upstream mailing lists?
Comment by Arne Wörner (riddicc) - Sunday, 23 July 2017, 07:41 GMT
I removed the dhcp-script line from my config file.
Now it works fine, with and without strace (I did 3 suspend+thaw cycles without a segfault).
Thx.
-Arne
Comment by Arne Wörner (riddicc) - Sunday, 23 July 2017, 20:26 GMT
Comment by Arne Wörner (riddicc) - Monday, 07 August 2017, 05:09 GMT
I used a raw source tarball from thekelleys.org.uk, compiled it with -g, ran dnsmasq under gdb and sent the result to their mailing list,
but it arrived there all on one line and now nobody reacts... *sob*
So I am re-posting it here...
The gdb output looks quite weird to me...

It seems like something weird is going on in helper.c... see the gdb output...

Since transfer->file->filename can never be zero (as long as transfer->file plus 20 bytes or so is not zero),
it seems like someone is writing zeroes to the stack after the correct transfer->file->filename has been written and before strlen() is called
(or do they keep it in a register at -O2? Nope: pushl 0x40(%esp) // call 3700 <strlen@plt>)...
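The argument above relies on filename being an array stored inside the struct rather than a separate pointer. The sketch below assumes that layout: the field names are copied from the gdb print of *transfer->file, while the struct name and the trailing `char filename[]` are assumptions for illustration. With such a member, `f->filename` is just the struct address plus a fixed offset, so it cannot be NULL for a valid transfer->file, which is why frame 2's valid filename and frame 1's filename=0x0 contradict each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified struct: field names from the gdb print of
   *transfer->file; the trailing flexible array member is an ASSUMPTION. */
struct tftp_file_sketch {
    int refcount, fd;
    off_t size;
    dev_t dev;
    ino_t inode;
    char filename[];          /* C99 flexible array member: bytes follow the struct */
};

/* Allocate the struct and the name in one block, as a flexible array requires. */
static struct tftp_file_sketch *file_new(const char *name)
{
    size_t len = strlen(name) + 1;
    struct tftp_file_sketch *f = malloc(sizeof *f + len);
    if (f == NULL)
        return NULL;
    memset(f, 0, sizeof *f);
    f->refcount = 1;
    memcpy(f->filename, name, len);
    return f;
}

/* f->filename decays to (char *)f + offsetof(..., filename), so for any
   valid (non-NULL, properly allocated) f the result is non-NULL. */
static char *file_name_ptr(struct tftp_file_sketch *f)
{
    assert(f != NULL);
    return f->filename;
}
```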

Maybe someone who knows more about transfer->file can see what is wrong here...

-Arne

gdb output:
dnsmasq-dhcp: 3182551826 sent size: 4 option: 28 broadcast 192.168.1.255
dnsmasq-dhcp: 3182551826 sent size: 12 option:209 70:78:65:2f:67:72:75:62:2e:63:66:67
dnsmasq-dhcp: 3182551826 sent size: 4 option: 3 router 192.168.1.1
dnsmasq-tftp: error 8 User aborted the transfer received from 192.168.1.2
dnsmasq-tftp: failed sending /var/dnsmasq/tftpboot/pxe/x86_64-efi/core.efi to 192.168.1.2
dnsmasq-tftp: sent /var/dnsmasq/tftpboot/pxe/x86_64-efi/core.efi to 192.168.1.2

Program received signal SIGSEGV, Segmentation fault.
0xb7ef0bc6 in __strlen_sse2 () from /usr/lib/libc.so.6
(gdb) where
#0 0xb7ef0bc6 in __strlen_sse2 () from /usr/lib/libc.so.6
#1 0x8002b8b7 in queue_tftp (file_len=203776, filename=0x0, peer=0x8005bf68) at helper.c:819
#2 0x8002d3b3 in do_tftp_script_run () at tftp.c:811
#3 0x80006875 in main (argc=<optimized out>, argv=<optimized out>) at dnsmasq.c:955
(gdb) frame 1
#1 0x8002b8b7 in queue_tftp (file_len=203776, filename=0x0, peer=0x8005bf68) at helper.c:819
819 filename_len = strlen(filename) + 1;
(gdb) list
814
815 /* no script */
816 if (daemon->helperfd == -1)
817 return;
818
819 filename_len = strlen(filename) + 1;
820 buff_alloc(sizeof(struct script_data) + filename_len);
821 memset(buf, 0, sizeof(struct script_data));
822
823 buf->action = ACTION_TFTP;
(gdb) print filename
$1 = 0x0
(gdb) frame 2
#2 0x8002d3b3 in do_tftp_script_run () at tftp.c:811
811 queue_tftp(transfer->file->size, transfer->file->filename, &transfer->peer);
(gdb) list
806
807 if ((transfer = daemon->tftp_done_trans))
808 {
809 daemon->tftp_done_trans = transfer->next;
810 #ifdef HAVE_SCRIPT
811 queue_tftp(transfer->file->size, transfer->file->filename, &transfer->peer);
812 #endif
813 free_transfer(transfer);
814 return 1;
815 }
(gdb) print *transfer->file
$2 = {refcount = 1, fd = 15, size = 203776, dev = 20, inode = 5570, filename = 0x8005bf68 "/var/dnsmasq/tftpboot/pxe/x86_64-efi/core.efi"}
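The listing at helper.c:815-819 also shows why removing the dhcp-script line made the crash go away: with no script configured, daemon->helperfd is -1 and queue_tftp() returns before strlen() ever sees the filename. The hypothetical sketch below mirrors those lines, using a stand-in variable for daemon->helperfd, and adds a NULL check that is not in the listed code. Such a check would only hide the symptom; it does not explain why the callee sees filename=0x0 while the caller's transfer->file->filename is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for daemon->helperfd: -1 means no dhcp-script is configured. */
static int helperfd_sketch = -1;

/* Mirrors the early return and the strlen() call from helper.c:816-819.
   Returns filename_len (strlen + 1), or 0 when nothing would be queued.
   The NULL check is an ADDITION for illustration, not part of the listed code. */
static size_t queue_tftp_sketch(off_t file_len, const char *filename)
{
    assert(file_len >= 0);          /* a file size is never negative */
    if (helperfd_sketch == -1)      /* no script: return before strlen() runs */
        return 0;
    if (filename == NULL)           /* strlen(NULL) is undefined behaviour; it segfaulted here */
        return 0;
    return strlen(filename) + 1;    /* filename_len in helper.c */
}
```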
Comment by Arne Wörner (riddicc) - Monday, 21 August 2017, 08:13 GMT