ssh hangs when X11 forwarding is used

Symptom

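When connecting with X11 forwarding enabled, the session hangs as soon as a program on the remote side tries to open the display. In an interactive login it cannot be killed with SIGINT, and it cannot be disconnected with the escape sequence either.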

Details

With SSH2

% ssh -v -X remotehost
OpenSSH_3.0.2p1, SSH protocols 1.5/2.0, OpenSSL 0x0090603f
debug1: Reading configuration data /home/users/nishi/.ssh/config
...
debug1: Remote protocol version 1.99, remote software version OpenSSH_3.0.2p1
debug1: match: OpenSSH_3.0.2p1 pat ^OpenSSH
Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_3.0.2p1
...
debug1: channel 0: new [client-session]
debug1: send channel open 0
debug1: Entering interactive session.
debug1: ssh_session2_setup: id 0
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: channel request 0: shell
debug1: channel 0: open confirm rwindow 0 rmax 16384
remotehost:~% xlogo
debug1: client_input_channel_open: ctype x11 rchan 2 win 4096 max 2048
debug1: client_request_x11: request from 192.168.1.141 32974
debug1: fd 11 setting O_NONBLOCK
debug1: channel 1: new [x11]
debug1: confirm x11

With SSH1

% ssh -v -1 -X remotehost
OpenSSH_3.0.2p1, SSH protocols 1.5/2.0, OpenSSL 0x0090603f
debug1: Reading configuration data /home/users/nishi/.ssh/config
...
debug1: Remote protocol version 1.99, remote software version OpenSSH_3.0.2p1
debug1: match: OpenSSH_3.0.2p1 pat ^OpenSSH
debug1: Local version string SSH-1.5-OpenSSH_3.0.2p1
...
debug1: Requesting pty.
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: Requesting shell.
debug1: Entering interactive session.
remotehost:~% xlogo
debug1: Received X11 open request.
debug1: fd 8 setting O_NONBLOCK
debug1: channel 0: new [X11 connection from 192.168.1.141 port 32985]

Affected environment

Conditions for reproduction

Reproduces
Does not reproduce

Investigation

In the SSH2 case, the client enters client_input_channel_open and hangs after printing the message "confirm x11".

Use a debugger to find out exactly where it hangs.

% gdb ssh
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.4), 
Copyright 1996 Free Software Foundation, Inc...
(gdb) break client_input_channel_open
Breakpoint 1 at 0x22940: file /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c, line 1154.
(gdb) run -X remotehost
Starting program: /a/yugiri/mount/yugiri3/obj/openssh/ssh -X remotehost
warning: Unable to find dynamic linker breakpoint function.
warning: GDB will be unable to debug shared library initializers
warning: and track explicitly loaded dynamic code.
remotehost:~% xlogo

Breakpoint 1, client_input_channel_open (type=90, plen=40, ctxt=0xc6748)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:1154
1154            Channel *c = NULL;
(gdb) bt
#0  client_input_channel_open (type=90, plen=40, ctxt=0xc6748)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:1154
#1  0x31f8c in dispatch_run (mode=1, done=0x9a060, ctxt=<error type>)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/dispatch.c:71
#2  0x21504 in client_process_buffered_input_packets ()
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:744
#3  0x219dc in client_loop (have_pty=1, escape_char_arg=126, ssh2_chan_id=0)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:849
#4  0x167f4 in ssh_session2 ()
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/ssh.c:1171
#5  0x15320 in main (ac=0, av=0xffbef060)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/ssh.c:784
(gdb) fin
Run till exit from #0  client_input_channel_open (type=90, plen=40, 
    ctxt=0xc6748)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:1154
0x31f8c in dispatch_run (mode=1, done=0x9a060, ctxt=<error type>)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/dispatch.c:71
71                              (*dispatch[type])(type, plen, ctxt);
(gdb) 
Run till exit from #0  0x31f8c in dispatch_run (mode=1, done=0x9a060, 
    ctxt=<error type>)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/dispatch.c:71
client_process_buffered_input_packets ()
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:745
745     }
(gdb) 
Run till exit from #0  client_process_buffered_input_packets ()
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:745
client_loop (have_pty=1, escape_char_arg=126, ssh2_chan_id=0)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:851
851                     if (compat20 && session_closed && !channel_still_open())
(gdb) next
854                     rekeying = (xxx_kex != NULL && !xxx_kex->done);
(gdb) 
856                     if (rekeying) {
(gdb) 
863                             if (!compat20)
(gdb) 
870                             if (packet_not_very_much_data_to_write())
(gdb) 
871                                     channel_output_poll();
(gdb) 
877                             client_check_window_change();
(gdb) 
879                             if (quit_pending)
(gdb) 
886                     max_fd2 = max_fd;
(gdb) 
887                     client_wait_until_can_do_something(&readset, &writeset,
(gdb) 
890                     if (quit_pending)
(gdb) 
894                     if (!rekeying) {
(gdb) 
895                             channel_after_select(readset, writeset);
(gdb) 
897                             if (need_rekeying) {
(gdb) 
906                     client_process_net_input(readset);
(gdb) 
908                     if (quit_pending)
(gdb) 
911                     if (!compat20) {
(gdb) 
922                     if (FD_ISSET(connection_out, writeset))
(gdb) 
923                             packet_write_poll();
(gdb) 
924             }
(gdb) 
846             while (!quit_pending) {
(gdb) 
849                     client_process_buffered_input_packets();
(gdb) 
851                     if (compat20 && session_closed && !channel_still_open())
[snip]
895                             channel_after_select(readset, writeset);
(gdb) 
897                             if (need_rekeying) {
(gdb) 
906                     client_process_net_input(readset);
(gdb) 
908                     if (quit_pending)
(gdb) 
911                     if (!compat20) {
(gdb) 
922                     if (FD_ISSET(connection_out, writeset))
(gdb) 
924             }
(gdb) 
846             while (!quit_pending) {
[snip]
895                             channel_after_select(readset, writeset);
[snip]
846             while (!quit_pending) {
[snip]
894                     if (!rekeying) {
(gdb) 
895                             channel_after_select(readset, writeset);
(gdb) 

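It hangs in channel_after_select on the fourth pass through the loop. Restart from the first breakpoint, skip the first three calls to channel_after_select, and stop at the fourth.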

Breakpoint 1, client_input_channel_open (type=90, plen=40, ctxt=0xc6748)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/clientloop.c:1154
1154            Channel *c = NULL;
(gdb) break channel_after_select
Breakpoint 2 at 0x2c708: file /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c, line 1597.
(gdb) ignore 2 3
Will ignore next 3 crossings of breakpoint 2.
(gdb) cont
Continuing.

Breakpoint 2, channel_after_select (readset=0xc6f48, writeset=0xc6f38)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c:1597
1597            channel_handler(channel_post, readset, writeset);
(gdb) step
channel_handler (ftab=0xaf5c0, readset=0xc6f48, writeset=0xc6f38)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c:1548
1548            if (!did_init) {
(gdb) next
1552            for (i = 0; i < channels_alloc; i++) {
(gdb) 
1553                    c = channels[i];
(gdb) 
1554                    if (c == NULL)
(gdb) 
1556                    if (ftab[c->type] != NULL)
(gdb) 
1557                            (*ftab[c->type])(c, readset, writeset);
(gdb) 
1558                    channel_garbage_collect(c);
(gdb) 
1552            for (i = 0; i < channels_alloc; i++) {
(gdb) 
1553                    c = channels[i];
(gdb) 
1554                    if (c == NULL)
(gdb) 
1556                    if (ftab[c->type] != NULL)
(gdb) 
1557                            (*ftab[c->type])(c, readset, writeset);
(gdb) 

The hang occurs in the call through ftab when i == 1 (c == channels[1]). Go straight from the top of channel_handler to this point and continue from there.

(gdb) step
channel_handler (ftab=0xaf5c0, readset=0xc6f48, writeset=0xc6f38)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c:1548
1548            if (!did_init) {
(gdb) break 1557 if i == 1
Breakpoint 3 at 0x2c560: file /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c, line 1557.
(gdb) cont
Continuing.

Breakpoint 3, channel_handler (ftab=0xaf5c0, readset=0xc6f48, writeset=0xc6f38)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c:1557
1557                            (*ftab[c->type])(c, readset, writeset);
(gdb) step
channel_post_open_2 (c=0xd7310, readset=0xc6f48, writeset=0xc6f38)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c:1417
1417            if (c->delayed)
(gdb) next
1419            channel_handle_rfd(c, readset, writeset);
(gdb) 
(gdb) step
channel_handle_rfd (c=0xd7310, readset=0xc6f48, writeset=0xc6f38)
    at /usr/local/share/src/net/security/openssh-3.0.2p1+dante/channels.c:1259
1259            if (c->rfd != -1 &&
(gdb) next
1261                    len = read(c->rfd, buf, sizeof(buf));
(gdb) 

This shows that the hang is in this read. Because the binary is SOCKSified, this is in fact Dante's Rread. The source shows that the call chain is Rread -> Rrecv -> Rrecvmsg.

(gdb) step
Rread (d=11, buf=, nbytes=)
    at /usr/local/share/src/net/socks/dante-1.1.12-pre1/lib/Rcompat.c:193
193             const char *function = "Rread()";
(gdb) break Rrecvmsg
Breakpoint 4 at 0x611e4: file /usr/local/share/src/net/socks/dante-1.1.12-pre1/lib/Rcompat.c, line 261.
(gdb) cont
Continuing.

Breakpoint 4, Rrecvmsg (s=0, msg=0xffbea898, flags=0)
    at /usr/local/share/src/net/socks/dante-1.1.12-pre1/lib/Rcompat.c:261
261             const char *function = "Rrecvmsg()";
(gdb) next
256     {
(gdb) 
263             clientinit();
(gdb) 
265             slog(LOG_DEBUG, "%s", function);
(gdb) 
267             namelen = sizeof(name);
(gdb) 
268             if (getsockname(s, &name, &namelen) == -1) {
(gdb) 
273             switch (name.sa_family) {
(gdb) 
283                             return recvmsg(s, msg, flags);
(gdb) 

This shows that the hang is in this recvmsg. It looks like a system call, but it is actually a macro.

(gdb) step
recvmsgn (s=0, msg=0xffbea888, flags=0)
    at /usr/local/share/src/net/socks/dante-1.1.12-pre1/lib/io.c:214
214             const char *function = "recvmsgn()";
(gdb) 

This function does not return after a single read; it keeps reading until everything has been read.

ssize_t
recvmsgn __P((int s, struct msghdr *msg, int flags));
/*
 * Like recvmsg(), but tries to read until all has been read.
 */

Because of this, it may be waiting indefinitely even though there is nothing left to read.
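
The difference between one read and a read-until-full loop can be sketched as follows. This is not Dante's recvmsgn; read_until_full and demo_short_message are hypothetical helpers written for illustration, and they use read on a socket pair rather than recvmsg. The descriptor is made non-blocking so that the point where a blocking read would sleep is reported through a flag instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not Dante's code: keep reading until `len` bytes
 * have arrived.  On a blocking descriptor the read() in this loop would
 * sleep once the peer stops sending.  The descriptor used here is
 * non-blocking, so that point is reported through *stalled instead. */
static ssize_t
read_until_full(int fd, char *buf, size_t len, int *stalled)
{
	size_t got = 0;

	assert(buf != NULL && stalled != NULL);
	*stalled = 0;
	while (got < len) {
		ssize_t n = read(fd, buf + got, len - got);

		if (n > 0) {
			got += (size_t)n;
		} else if (n == 0) {
			break;			/* peer closed the connection */
		} else if (errno == EINTR) {
			continue;		/* interrupted, try again */
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			*stalled = 1;		/* a blocking read would wait here */
			break;
		} else {
			return -1;		/* real error */
		}
	}
	return (ssize_t)got;
}

/* Send `avail` bytes into a socket pair, then ask for `want` bytes.
 * Returns the number of bytes obtained and sets *stalled as above. */
static ssize_t
demo_short_message(size_t avail, size_t want, int *stalled)
{
	int sv[2];
	char out[64] = {0}, in[64];
	ssize_t got;

	assert(avail <= sizeof(out) && want <= sizeof(in));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (fcntl(sv[0], F_SETFL, O_NONBLOCK) == -1 ||
	    write(sv[1], out, avail) != (ssize_t)avail) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	got = read_until_full(sv[0], in, want, stalled);
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

With 5 bytes available and 16 requested, the first read returns the 5 bytes and the loop then goes back for more, which is where a blocking descriptor would wait. A caller that has just been told by select that the descriptor is readable expects the call to return after that first read.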

So, look for where the macro recvmsg is defined.

dante-1.1.11% fgrep -n -w recvmsg include/*.h
include/common.h:534:#ifdef recvmsg
include/common.h:535:#define recvmsg_system recvmsg
include/common.h:536:#undef recvmsg
include/common.h:537:#endif /* recvmsg */
include/common.h:539:#define recvmsg(s, msg, flags)     recvmsgn(s, msg, flags)
include/common.h:1706: * Like recvmsg(), but tries to read until all has been read.
include/interposition.h:64:#define SYMBOL_RECVMSG "recvmsg"
include/socks.h:153:#ifdef recvmsg
include/socks.h:154:#undef recvmsg
include/socks.h:155:#endif  /* recvmsg */
include/socks.h:157:#define recvmsg(s, msg, flags)                      sys_Erecvmsg(s, msg, flags)
include/socks.h:159:#define recvmsg(s, msg, flags)                      sys_recvmsg(s, msg, flags)

As shown, the same macro has several definitions. They relate to each other as follows.

#define recvmsg(s, msg, flags)	recvmsgn(s, msg, flags)
#if SOCKS_CLIENT
# if SOCKSLIBRARY_DYNAMIC
#  undef recvmsg
#  if HAVE_EXTRA_OSF_SYMBOLS
#   define recvmsg(s, msg, flags)		sys_Erecvmsg(s, msg, flags)
#  else
#   define recvmsg(s, msg, flags)		sys_recvmsg(s, msg, flags)
#  endif  /* HAVE_EXTRA_OSF_SYMBOLS */
# endif /* SOCKSLIBRARY_DYNAMIC */
#endif

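SOCKSLIBRARY_DYNAMIC is a macro that is true when libdsocks is being built. In other words, recvmsg maps to recvmsgn only when the library is not libdsocks; in libdsocks it maps to sys_recvmsg, that is, the system's original recvmsg. The only difference between libsocks and libdsocks is supposed to be whether the SOCKSified functions are called under the same names as the originals, so a behavioral difference like this is very likely a bug. Changing the first definition so that it does not take effect in client builds resolved the problem.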
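
The layering of the definitions, and one possible shape of the change, can be tried out in isolation. The sketch below is an assumption-laden model, not the actual patch: the demo_ names are hypothetical stand-ins that only report which implementation was selected, the switch values are picked for the demonstration, and the guard used for the changed definition is a guess at what "not taking effect in client builds" looks like.

```c
#include <assert.h>
#include <string.h>

/* Build-time switches.  The names mirror the ones in the text; the values
 * are chosen here only for the demonstration (static client library). */
#ifndef SOCKS_CLIENT
#define SOCKS_CLIENT 1
#endif
#ifndef SOCKSLIBRARY_DYNAMIC
#define SOCKSLIBRARY_DYNAMIC 0	/* 0: libsocks-like, 1: libdsocks-like */
#endif

/* Hypothetical stand-ins that only report which implementation ran. */
static const char *demo_recvmsgn(void)    { return "recvmsgn"; }
static const char *demo_sys_recvmsg(void) { return "sys_recvmsg"; }

/* Same layering as the excerpt above, with demo_ names. */
#define demo_recvmsg()	demo_recvmsgn()
#if SOCKS_CLIENT
# if SOCKSLIBRARY_DYNAMIC
#  undef demo_recvmsg
#  define demo_recvmsg()	demo_sys_recvmsg()
# endif
#endif

static const char *selected_before_change(void) { return demo_recvmsg(); }

/* Assumed form of the change: the read-until-full mapping is no longer
 * applied in client builds, so the call reaches the system function. */
#undef demo_recvmsg
#if SOCKS_CLIENT
# define demo_recvmsg()	demo_sys_recvmsg()
#else
# define demo_recvmsg()	demo_recvmsgn()
#endif

static const char *selected_after_change(void) { return demo_recvmsg(); }

/* Self-check for the static client configuration set above. */
static void
check_selection(void)
{
	assert(strcmp(selected_before_change(), "recvmsgn") == 0);
	assert(strcmp(selected_after_change(), "sys_recvmsg") == 0);
}
```

Compiling with -DSOCKSLIBRARY_DYNAMIC=1 makes the original layering pick sys_recvmsg as well, which is the asymmetry between the two libraries described above; check_selection only holds for the static configuration.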



西村 大介 <nishi@graco.c.u-tokyo.ac.jp>