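In production, we use icinga together with check_mk as our monitoring system.
Today, when I ran cmk -II to update a host's inventory, it reported the following error no matter which host I passed: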
Removing unimplemented check /
Removing unimplemented check oom_adj_for_cron
Removing unimplemented check oom_adj_for_sshd
Traceback (most recent call last):
  File "/usr/share/check_mk/modules/check_mk.py", line 5801, in <module>
    remove_autochecks_of(host, checknames)
  File "/usr/share/check_mk/modules/check_mk.py", line 2907, in remove_autochecks_of
    if splitted[3] not in check_info:
IndexError: list index out of range
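After searching online for a long time I could not find anything helpful, so I tried debugging the source code at the location named in the traceback:
Edit /usr/share/check_mk/modules/check_mk.py and add 'print splitted' to print the list being indexed, i.e. splitted.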
for fn in glob.glob(autochecksdir + "/*.mk"):
    lines = []
    count = 0
    for line in file(fn):
        # hostname and check type can be quoted with ' or with "
        double_quoted = line.replace("'", '"').lstrip()
        if double_quoted.startswith('("'):
            count += 1
            splitted = double_quoted.split('"')
            print splitted
            if splitted[1] != hostname or (checktypes != None and splitted[3] not in checktypes):
                if splitted[3] not in check_info:
                    sys.stderr.write('Removing unimplemented check %s\n' % splitted[3])
                    continue
                lines.append(line)
            else:
                removed += 1
    if len(lines) == 0:
Then I ran cmk -II again and got the following output:
...
("iad1-server5", job, 'oom_adj_for_sshd', None)
Removing unimplemented check oom_adj_for_sshd
("iad1-server5", kernel.util, None, kernel_util_default_levels)
Traceback (most recent call last):
  File "/usr/share/check_mk/modules/check_mk.py", line 5803, in <module>
    remove_autochecks_of(host, checknames)
  File "/usr/share/check_mk/modules/check_mk.py", line 2909, in remove_autochecks_of
    if splitted[3] not in check_info:
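As you can see, the entry
("iad1-server5", kernel.util, None, kernel_util_default_levels)
cannot be split on single or double quotes into a list with more than 3 elements, because the check type kernel.util is not quoted. Accessing splitted[3] therefore raises 'IndexError: list index out of range'.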
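The effect can be reproduced outside check_mk with a minimal sketch. The helper name split_on_quotes and the two sample lines are assumptions made for illustration (the quoted variant is what a well-formed entry would presumably look like), and the snippet uses Python 3 syntax rather than the Python 2 of check_mk.py:

```python
def split_on_quotes(line):
    """Mimic the parsing step: normalise ' to " and split on "."""
    double_quoted = line.replace("'", '"').lstrip()
    return double_quoted.split('"')


# Hypothetical autochecks entries: one with a quoted check type, one without.
quoted = """("iad1-server5", 'kernel.util', None, kernel_util_default_levels),"""
unquoted = """("iad1-server5", kernel.util, None, kernel_util_default_levels),"""

quoted_parts = split_on_quotes(quoted)
unquoted_parts = split_on_quotes(unquoted)

# The quoted entry yields 5 pieces, so index 3 holds the check type.
print(len(quoted_parts), quoted_parts[3])

# The unquoted entry yields only 3 pieces, so index 3 does not exist.
print(len(unquoted_parts))
try:
    unquoted_parts[3]
except IndexError as err:
    print("IndexError:", err)
```

Only the host name is surrounded by quotes in the second entry, so the split produces the pieces before, inside and after that single quoted string and nothing more.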
So I added a simple guard: the 'Removing unimplemented check' logic runs only when the list has more than 3 elements.
# vim /usr/share/check_mk/modules/check_mk.py
for fn in glob.glob(autochecksdir + "/*.mk"):
    lines = []
    count = 0
    for line in file(fn):
        # hostname and check type can be quoted with ' or with "
        double_quoted = line.replace("'", '"').lstrip()
        if double_quoted.startswith('("'):
            count += 1
            splitted = double_quoted.split('"')
            # Sometimes splitted has only 3 elements because some items in 'line' are not quoted.
            if len(splitted) > 3:
                if splitted[1] != hostname or (checktypes != None and splitted[3] not in checktypes):
                    if splitted[3] not in check_info:
                        sys.stderr.write('Removing unimplemented check %s\n' % splitted[3])
                        continue
                    lines.append(line)
                else:
                    removed += 1
    if len(lines) == 0:
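To see what the guarded loop does with each kind of entry, the filtering logic can be pulled into a standalone function and run on sample data. This is only a sketch in Python 3: the function name filter_autocheck_lines, its return values, the sample entries and the contents of check_info are all assumptions made for illustration, not part of check_mk.

```python
def filter_autocheck_lines(lines, hostname, check_info, checktypes=None):
    """Return (kept, removed, malformed) following the guarded loop above."""
    kept, removed, malformed = [], 0, []
    for line in lines:
        double_quoted = line.replace("'", '"').lstrip()
        if not double_quoted.startswith('("'):
            continue
        splitted = double_quoted.split('"')
        if len(splitted) <= 3:
            # Too few quoted fields: the patched code skips such a line,
            # so it is not written back. Here we record it instead.
            malformed.append(line)
            continue
        host, checktype = splitted[1], splitted[3]
        if host != hostname or (checktypes is not None and checktype not in checktypes):
            if checktype not in check_info:
                continue  # unimplemented check, dropped
            kept.append(line)
        else:
            removed += 1
    return kept, removed, malformed


# Hypothetical autochecks entries, for illustration only.
sample = [
    """  ("iad1-server1", 'df', '/', df_default_levels),\n""",
    """  ("iad1-server1", 'oom_adj_for_sshd', None, None),\n""",
    """  ("iad1-server5", 'cpu.loads', None, cpuload_default_levels),\n""",
    """  ("iad1-server5", kernel.util, None, kernel_util_default_levels),\n""",
]
known_checks = {"df", "cpu.loads", "kernel.util"}  # stand-in for check_info

kept, removed, malformed = filter_autocheck_lines(sample, "iad1-server5", known_checks)
print(len(kept), removed, len(malformed))
```

One consequence worth noting: with the guard in place, an entry that has too few quoted fields is neither kept nor counted as removed, so it silently disappears when the file is rewritten. Collecting such lines in a separate list, as the sketch does, makes that visible.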
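Then I ran 'cmk -II' and saw many 'Removing unimplemented check' messages. On a second run they no longer appeared, presumably because all the matching stale entries had already been removed.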
# cmk -II iad1-server1
...
Removing unimplemented check /
Removing unimplemented check oom_adj_for_cron
Removing unimplemented check oom_adj_for_sshd
Removing unimplemented check crond
Removing unimplemented check sshd
Removing unimplemented check xinetd
cpu.loads 1 new checks
df 2 new checks
kernel.util 1 new checks
lnx_if 1 new checks
local 5 new checks
mem.used 1 new checks
mrpe 4 new checks
postfix_mailq 1 new checks
ps 5 new checks
tcp_conn_stats 1 new checks
uptime 1 new checks
# cmk -II iad1-server1
cpu.loads 1 new checks
df 2 new checks
kernel.util 1 new checks
lnx_if 1 new checks
local 5 new checks
mem.used 1 new checks
mrpe 4 new checks
postfix_mailq 1 new checks
ps 5 new checks
tcp_conn_stats 1 new checks
uptime 1 new checks
Please credit the source when reposting: 爱开源 » Fixing a small bug in the check_mk source code