Environment: CentOS release 6.2, kernel 2.6.32-220.el6.x86_64
keepalived-1.2.7 ipvsadm v1.26 IPVS v1.2.1
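keepalived is used for health checking.
We currently run about 200 VIPs, each with roughly 5-10 real servers behind it.
Once a single keepalived instance manages more than roughly 1,100 real servers, its health-check process crashes and then restarts in an endless loop.
Every keepalived start or keepalived reload now produces a flood of log entries like these: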
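A rough count shows why this deployment sits right around the 1024-descriptor select() limit given in the resolution. It assumes that each real-server check holds one socket, and therefore one file descriptor, while it runs:

$$
N_{\text{RS}} \approx 200 \times (5 \ldots 10) = 1000 \ldots 2000,
\qquad
\text{select() requires every socket fd} < \texttt{FD\_SETSIZE} = 1024
$$

Other descriptors the process already holds, such as logging or netlink sockets, reduce the headroom further. For that reason the exact failure point can only be pinned between the 1,020 and 1,100 real servers measured in the tests.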
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14203]: Cannot send get request to [10.15.200.200]:80.
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14203]: Removing service [10.100.200.200]:80 from VS [10.15.177.177]:80
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14203]: SMTP connection ERROR to [127.0.0.1]:25.
    Jul 14 19:47:07 b02 Keepalived[13055]: Healthcheck child process(14203) died: Respawning
    Jul 14 19:47:07 b02 Keepalived[13055]: Starting Healthcheck child process, pid=14525
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: Interface queue is empty
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: No such interface, eth1
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: No such interface, usb0
    Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: No such interface, bond0
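What could be limiting this number? When the problem occurs, only keepalived's health-check process is affected; the VRRP process keeps running normally.
I could not find any hard-coded limit on the number of real servers in the source code.
At first I suspected the number of VIPs, but testing showed that the size of the real-server list is what matters.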
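The endless loop comes from the parent process's respawn handler, which restarts the health-check child every time it dies:

    /* Runs in the parent when the watched health-check child exits
     * or the RESPAWN_TIMER watch on it expires */
    int
    check_respawn_thread(thread_t * thread)
    {
        pid_t pid;

        /* Fetch thread args */
        pid = THREAD_CHILD_PID(thread);

        /* Restart respawning thread */
        if (thread->type == THREAD_CHILD_TIMEOUT) {
            thread_add_child(master, check_respawn_thread, NULL, pid, RESPAWN_TIMER);
            return 0;
        }

        /* We catch a SIGCHLD, handle it: no retry limit, so a child
         * that crashes on every start is restarted forever */
        log_message(LOG_ALERT, "Healthcheck child process(%d) died: Respawning", pid);
        start_check_child();
        return 0;
    }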
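Test 1: with more than about 1,100 real servers, keepalived's health-check process crashes and keeps restarting, logging the same messages as above; the main process and the VRRP child process are unaffected.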
Test 2: with about 1,020 real servers, keepalived runs normally; the parent process and both child processes are healthy.
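Resolution:
The cause is the system's __FD_SETSIZE limit. keepalived multiplexes its health-check sockets with select(), and select() can only monitor file descriptors numbered below FD_SETSIZE, which is 1024 by default. Each real-server check needs its own socket, so at around 1,000 real servers the descriptor numbers run past that limit. Calling FD_SET() on a descriptor greater than or equal to FD_SETSIZE is undefined behavior, which is consistent with the health-check process crashing while the VRRP process, with far fewer sockets, keeps running.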
When reposting, please credit: 爱开源 » keepalived: too many Real_servers cause the process to crash