Don't Misuse the TCP ECN Flag - 爱开源

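A while ago I handled a case with very simple symptoms: among machines in the same network environment, the vast majority could not reach example.com with curl and only a few could. The mtr paths were exactly the same for both groups, and the machines were supposed to be configured identically.
For comparison, here is a machine that could connect: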

$ curl -IL "http://example.com:80/rest" -v
* About to connect() to example.com port 80 (#0)
*   Trying example.com… connected
* Connected to example.com (example.com) port 80 (#0)
> HEAD /rest HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3
> zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: example.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: nginx
Server: nginx
< Date: Fri, 24 Oct 2014 07:13:23 GMT
Date: Fri, 24 Oct 2014 07:13:23 GMT
< Content-Type: application/json;charset=UTF-8
Content-Type: application/json;charset=UTF-8
< Content-Length: 0
Content-Length: 0
< Connection: keep-alive
Connection: keep-alive
< s-rt: 2
s-rt: 2

<
* Connection #0 to host example.com left intact
* Closing connection #0

And one that timed out:

$ curl -IL "http://example.com:80/rest" -v
* About to connect() to example.com port 80 (#0)
*   Trying example.com… Connection timed out
* couldn't connect to host
* Closing connection #0
curl: (7) couldn't connect to host

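At first I assumed the problem was on the example.com side (example.com stands in for the real host here), but after contacting the service owners we confirmed that nothing was wrong there. My next guess was the timestamp problem behind a SNAT egress, where TCP timestamps interact badly with TIME-WAIT recycling (tcp_tw_recycle). I checked the NAT machine and that problem was not present.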

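So I captured packets directly, and the result was a surprise. In my experience, the first packet of a TCP connection should have only the SYN bit set in the 9-bit flags field, but on the machines that could not connect, three bits were set: SYN, ECN (ECN-Echo, also written ECE) and CWR.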
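To make the capture easier to read, here is a minimal Python sketch that decodes the 9-bit TCP flags field into flag names. The helper names `decode_tcp_flags` and `is_ecn_setup_syn` are made up for this illustration, and the example value 0x0c2 is simply what SYN + ECE + CWR adds up to, not a value copied from the original capture. Note that packet analyzers often display the ECE bit as "ECN".

```python
# Bit positions of the 9-bit TCP flags field, lowest bit first.
# ECE and CWR were added by RFC 3168; NS is the ninth bit (RFC 3540).
TCP_FLAGS = [
    (0x001, "FIN"),
    (0x002, "SYN"),
    (0x004, "RST"),
    (0x008, "PSH"),
    (0x010, "ACK"),
    (0x020, "URG"),
    (0x040, "ECE"),  # ECN-Echo, often shown as "ECN" in capture tools
    (0x080, "CWR"),  # Congestion Window Reduced
    (0x100, "NS"),
]


def decode_tcp_flags(value):
    """Return the names of all flag bits set in `value`, lowest bit first."""
    return [name for bit, name in TCP_FLAGS if value & bit]


def is_ecn_setup_syn(value):
    """True for an initial SYN that also requests ECN (SYN + ECE + CWR, no ACK)."""
    names = set(decode_tcp_flags(value))
    return {"SYN", "ECE", "CWR"} <= names and "ACK" not in names


if __name__ == "__main__":
    for flags in (0x002, 0x0C2):
        print(hex(flags), decode_tcp_flags(flags), is_ecn_setup_syn(flags))
```

A plain SYN decodes to a single flag, while the SYN seen on the failing machines decodes to three, which is exactly the difference a device that mishandles ECN reacts to.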

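After reading the wiki explanation of ECN and a practical write-up on stackexchange, the cause became obvious: somewhere along the route there must be a router that does not support ECN and simply drops packets that carry the ECN flags.
The machines that could connect had /proc/sys/net/ipv4/tcp_ecn at its default value of 2, which accepts ECN when the peer requests it but never requests it on outgoing connections. The remaining machines had it explicitly set to 1, so they requested ECN themselves, and their packets were dropped by an intermediate router somewhere on the round trip between source and destination.
Statistics from 2000 showed that about 8% of network unreachability worldwide was related to ECN. More than ten years have passed, so in theory the situation should have improved.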
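As a quick way to spot affected machines, the following Python sketch reads the sysctl and reports whether the host would send ECN-setup SYNs. The value meanings follow the Linux kernel's ip-sysctl documentation; the function names are hypothetical, and the script assumes a Linux host where /proc/sys/net/ipv4/tcp_ecn exists.

```python
from pathlib import Path

# Path taken from the text above; only present on Linux.
TCP_ECN_PATH = Path("/proc/sys/net/ipv4/tcp_ecn")

# Meanings as described in the kernel's ip-sysctl documentation.
TCP_ECN_MEANING = {
    0: "ECN disabled: neither requested nor accepted",
    1: "ECN requested on outgoing connections and accepted on incoming ones",
    2: "ECN accepted on incoming connections, not requested on outgoing ones (default)",
}


def describe_tcp_ecn(value):
    """Return a human-readable description of a tcp_ecn setting."""
    return TCP_ECN_MEANING.get(value, "unknown value")


def sends_ecn_syn(value):
    """Only tcp_ecn=1 makes the host set ECE and CWR on its own SYNs."""
    return value == 1


def read_tcp_ecn(path=TCP_ECN_PATH):
    """Read the current setting from procfs."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if TCP_ECN_PATH.exists():
        current = read_tcp_ecn()
        print(f"tcp_ecn={current}: {describe_tcp_ecn(current)}")
        if sends_ecn_syn(current):
            print("This host requests ECN; paths that drop ECN-setup SYNs will time out.")
    else:
        print("tcp_ecn sysctl not found (not a Linux host?)")
```

Running this across the fleet would separate the machines set to 1 from those left at the default of 2, which matches the split between failing and working hosts described above.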

Please credit the source when reposting: 爱开源 » Don't Misuse the TCP ECN Flag
