最新消息:

TCP: time wait bucket table overflow

kernel admin 4149浏览 0评论

redhat网站查到下面的信息,说是因为内存不够的原因。我觉得这个可以当作出现这个问题的解释,但是却解释得不够“完美”,我仍旧还在疑惑中:如果是是因为内存不够的原因,那么在每次测试之前,只要保证机器状态一样,那么TCP: time wait bucket table overflow这条信息的输出次数就应该是一样的,或者差的不是很多,因为每次有一个socket来了,就给一个数据结构,直到内存用完输出TCP: time wait bucket table overflow信息,但是我测试的结果是差异不小,这就让我感觉疑惑了,莫非每次分配的数据结构不一样大?这不可能。我每次单独测试都是重启开发板,然后不运行任何其他程序就进行测试的,这样应该可以保证每次测试机器的状态不是差的很多吧。

The “TCP: time wait bucket table overflow” message shows when the kernel is unable to allocate a data structure to put a socket in the TIME_WAIT state.

This is happening according to linux/net/ipv4/tcp_minisocks.c:

 

if (tcp_tw_count < sysctl_tcp_max_tw_buckets)
tw = kmem_cache_alloc(tcp_timewait_cachep, SLAB_ATOMIC);

if(tw != NULL) {
(..)
} else {
/* Sorry, if we’re out of memory, just CLOSE this
* socket up. We’ve got bigger problems than
* non-graceful socket closings.
*/
if (net_ratelimit())
printk(KERN_INFO “TCP: time wait bucket table overflow\\n”);
}

This problem is more likely to happen on systems creating a lot of TCP connections at a fast pace. RFC 793 decided that those sockets should stay in the TIME_WAIT state for 2*MSL (Maximum Segment Life), but the Linux implementation seems to make the TIME_WAIT state last for 1 minute.

Monitor the resources used by those time wait buckets by watching:

# cat /proc/slabinfo | grep tcp_tw_bucket

The size of the time wait bucket can be adjusted by writing to /proc/sys/net/ipv4/tcp_max_tw_buckets. However, the link between the client and the server cannot cause packets to arrive out of order, then the TIME_WAIT state can be skipped and sockets can be recycled immediately. Socket recycling can be configured in /proc/sys/net/ipv4/tcp_tw_recycle, but check with your network administrator to verify whether it’s safe to do so.

集群中的节点中每台在/var/log/messages中发现大量错误,内容如下:

root@real2 ~]# tail -f /var/log/messages
Oct 27 22:45:55 real2 kernel: printk: 1438 messages suppressed.
Oct 27 22:45:55 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:00 real2 kernel: printk: 1682 messages suppressed.
Oct 27 22:46:00 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:05 real2 kernel: printk: 1752 messages suppressed.
Oct 27 22:46:05 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:10 real2 kernel: printk: 1681 messages suppressed.
Oct 27 22:46:10 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:15 real2 kernel: printk: 1660 messages suppressed.
Oct 27 22:46:15 real2 kernel: TCP: time wait bucket table overflow

 

root@real2 ~]# tail -f /var/log/messages

Oct 27 22:45:55 real2 kernel: printk: 1438 messages suppressed.

Oct 27 22:45:55 real2 kernel: TCP: time wait bucket table overflow

Oct 27 22:46:00 real2 kernel: printk: 1682 messages suppressed.

Oct 27 22:46:00 real2 kernel: TCP: time wait bucket table overflow

Oct 27 22:46:05 real2 kernel: printk: 1752 messages suppressed.

Oct 27 22:46:05 real2 kernel: TCP: time wait bucket table overflow

Oct 27 22:46:10 real2 kernel: printk: 1681 messages suppressed.

Oct 27 22:46:10 real2 kernel: TCP: time wait bucket table overflow

Oct 27 22:46:15 real2 kernel: printk: 1660 messages suppressed.

Oct 27 22:46:15 real2 kernel: TCP: time wait bucket table overflow

 

原因:/proc/sys/net/ipv4/tcp_max_tw_buckets的值太小,才2000

解决方法:增大 tcp_max_tw_buckets的值,并不是这个值越小越好,我看了我系统中TIME_WAIT 大部是由php-fpm产生的,是属于正常的现象

修改 /etc/sysctl.conf

net.ipv4.tcp_max_tw_buckets = 20000

sysctl -p 让其生效

附TIME_WAIT

 

 

[root@real2 ~]#

[root@real2 ~]# netstat -an | grep 80 | awk ‘{print $6}’ | sort | uniq -c | sort -rn

5395 ESTABLISHED

2671 TIME_WAIT

978 FIN_WAIT2

501 FIN_WAIT1

165 SYN_RECV

71 LAST_ACK

2 CLOSING

1 LISTEN

[root@real2 ~]# netstat -an | grep 9000 | awk ‘{print $6}’ | sort | uniq -c | sort -rn

8550 TIME_WAIT

1 LISTEN

1 FIN_WAIT1

1 ESTABLISHED

tcp_max_tw_buckets 参数类型:整型
系统在同时所处理的最大timewait sockets 数目。如果超过此数的话﹐time-wait socket 会被立即砍除并且显示警告信息。之所以要设定这个限制﹐纯粹为了抵御那些简单的 DoS 攻击﹐千万不要人为的降低这个限制﹐不过﹐如果网络条件需要比默认值更多﹐则可以提高它(或许还要增加内存)。

今天写了一个超级超级简陋的web服务器,采用epoll多线程的方式:主线程负责accpet,子线程处理链接。问题出在测试环节:

在笔记本上用ab模拟50个并发连接,10000次请求,进行测试,一切OK。可是移植到mini2440开发板上去运行,测试参数不变,就出现这样的输出(截取一部分)
TCP: time wait bucket table overflow
TCP: time wait bucket table overflow
__ratelimit: 1745 callbacks suppressed
TCP: time wait bucket table overflow
TCP: time wait bucket table overflow

我第一反应是服务器连接队列太小,我立刻改大了再测试,还是有问题。我又找了一个别人写的单进程服务器移植到mini2440开发板上来测试,一样有这样的输出。但是这些服务器在笔记本上测试都没有这种问题,很明显是硬件处理能力跟不上的原因。

索性用开发板上自带的boa服务器来测试,靠,问题更加严重,出现好多好多这样的输出。

不解。回头一定要搞明白这个问题。

性能测试结果对比如下,都是在mini2440开发板上进行测试:

1.epoll+pthread服务器,处理10000次请求总共耗时22.341秒,平均每秒处理447.61个请求,测试结果性能最好
2.单进程服务器,处理10000次请求总共耗时28.922秒,平均每秒处理个345.75请求,测试结果性能最差
3.mini2440系统里面自带的boa服务器,处理10000次请求总共耗时26.739秒,平均每秒处理个373.98个请求,测试性能第二

从结果来看多线程方式比单进程的处理方式要好一点,估计boa性能略差的原因是因为boa比较完善,支持动态网页。而另外两个都只支持静态页面。不过这样分析也不对,因为测试时我没有让他们输出任何内容,仅仅是一个报文头而已。所以其实这三个服务器的测试基础是一样的。后面完善了服务器之后再测试一下。

转载请注明:爱开源 » TCP: time wait bucket table overflow

您必须 登录 才能发表评论!