refactor: optimize structure, stability and runtime performance
60
scripts/HEALTH_MONITOR_README.md
Normal file
@@ -0,0 +1,60 @@
# Health Monitor (Email Edition)

This directory provides `health_email_monitor.py`, which calls the `/health` endpoint and sends alert emails using the **mail configuration already present inside the container**.

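
To make the probing step concrete, here is a minimal sketch of how a `/health` check could be done with the Python standard library. It is not taken from `health_email_monitor.py`: the function name `probe` and its return convention (`None` as the status when the request itself fails) are assumptions made for illustration only.

```python
import json
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Hypothetical helper: return (http_status, payload).

    http_status is None when the request itself failed (connection
    refused, timeout, DNS error). payload is always a dict and is
    empty when the body is missing or is not a JSON object.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code and possibly a body.
        status, raw = exc.code, exc.read()
    except (urllib.error.URLError, OSError):
        # No HTTP response at all.
        return None, {}

    try:
        payload = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers JSON and UTF-8 decoding errors
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return status, payload


if __name__ == "__main__":
    print(probe("http://127.0.0.1:51232/health"))
```

Separating "no response" from "bad response" in the return value keeps the two immediate-alert cases easy to tell apart downstream.
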
## 1) Quick trial run

```bash
cd /root/zsglpt
python3 scripts/health_email_monitor.py \
  --to your-alert-address@example.com \
  --container knowledge-automation-multiuser \
  --url http://127.0.0.1:51232/health \
  --dry-run
```

Remove `--dry-run` to actually send the email.

## 2) Suggested cron entry (every minute)

```bash
# crontab does not support backslash line continuation, so keep the entry on a single line.
* * * * * cd /root/zsglpt && /usr/bin/python3 scripts/health_email_monitor.py --to your-alert-address@example.com --container knowledge-automation-multiuser --url http://127.0.0.1:51232/health >> /root/zsglpt/logs/health_monitor.log 2>&1
```

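## 3) Supported rules

- `service_down`: the request to the health endpoint fails (alerts immediately)
- `health_fail`: the response reports an abnormal `ok`/`db_ok`, or the HTTP status is 5xx (alerts immediately)
- `db_pool_exhausted`: the connection pool is exhausted (by default, alerts only after 3 consecutive occurrences)
- `queue_backlog_high`: the task backlog is too high (by default, `pending_total >= 50` for 5 consecutive checks)

The script supports recovery notifications: when a rule returns to normal, a "recovered" email is sent.
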
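
The rules above can be sketched as a small streak-based state machine. This is an illustrative sketch, not the script's actual code: the function names, the state layout, and the `db_pool_exhausted` payload field are hypothetical. Only `ok`, `db_ok`, `pending_total`, and the default thresholds come from the list above.

```python
# Consecutive bad checks required before a rule alerts (defaults from the list above).
REQUIRED_STREAK = {
    "service_down": 1,        # immediate
    "health_fail": 1,         # immediate
    "db_pool_exhausted": 3,
    "queue_backlog_high": 5,
}


def check_conditions(http_status, payload, queue_threshold=50):
    """Map one probe result to {rule: is_bad}. http_status None = request failed."""
    if http_status is None:
        return {"service_down": True, "health_fail": False,
                "db_pool_exhausted": False, "queue_backlog_high": False}
    return {
        "service_down": False,
        "health_fail": (http_status >= 500
                        or not payload.get("ok", False)
                        or not payload.get("db_ok", False)),
        # Assumption: the real payload may expose pool exhaustion differently.
        "db_pool_exhausted": bool(payload.get("db_pool_exhausted", False)),
        "queue_backlog_high": payload.get("pending_total", 0) >= queue_threshold,
    }


def update_rules(conditions, state, required=REQUIRED_STREAK):
    """Update per-rule streaks in `state`; return [(rule, 'alert' | 'recovered')]."""
    events = []
    for rule, bad in conditions.items():
        entry = state.setdefault(rule, {"streak": 0, "alerting": False})
        if bad:
            entry["streak"] += 1
            if not entry["alerting"] and entry["streak"] >= required[rule]:
                entry["alerting"] = True
                events.append((rule, "alert"))
        else:
            entry["streak"] = 0
            if entry["alerting"]:
                # The rule was alerting and is healthy again: recovery notice.
                entry["alerting"] = False
                events.append((rule, "recovered"))
    return events
```

Requiring a streak for the pool and queue rules filters out short spikes, while `service_down` and `health_fail` use a streak of 1 so they fire on the first bad check.
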
## 4) Common options

- `--to`: recipient address (required)
- `--container`: Docker container name (default `knowledge-automation-multiuser`)
- `--url`: health endpoint URL (default `http://127.0.0.1:51232/health`)
- `--state-file`: path of the state file (default `/tmp/zsglpt_health_monitor_state.json`)
- `--remind-seconds`: interval between repeated alerts (default 3600 seconds)
- `--queue-threshold`: queue alert threshold (default 50)
- `--queue-streak`: number of consecutive checks required for the queue rule (default 5)
- `--db-pool-streak`: number of consecutive checks required for the connection pool rule (default 3)

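
As an illustration of how `--state-file` and `--remind-seconds` might work together, the sketch below persists a per-rule "last sent" timestamp as JSON and suppresses repeat alerts inside the reminder window. The helper names and the state layout (`last_sent`) are assumptions. The real state file format is not documented here.

```python
import json
import os


def load_state(path):
    """Read the JSON state file; a missing or corrupt file yields an empty state."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def save_state(path, state):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def should_send(state, rule, now, remind_seconds=3600):
    """Return True if an alert for `rule` is due, and record the send time."""
    last = state.get(rule, {}).get("last_sent")  # hypothetical state key
    if last is not None and now - last < remind_seconds:
        return False  # still inside the reminder window
    state.setdefault(rule, {})["last_sent"] = now
    return True
```

Because cron starts a fresh process every minute, some on-disk state of this kind is what allows streak counters and reminder intervals to carry over between runs.
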
## 5) Environment variables (optional)

Instead of command-line options, you can use environment variables:

- `MONITOR_EMAIL_TO`
- `MONITOR_DOCKER_CONTAINER`
- `HEALTH_URL`
- `MONITOR_STATE_FILE`
- `MONITOR_REMIND_SECONDS`
- `MONITOR_QUEUE_THRESHOLD`
- `MONITOR_QUEUE_STREAK`
- `MONITOR_DB_POOL_STREAK`

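
One plausible way to wire the options and environment variables together is to use each variable as the `argparse` default for its option. The sketch below assumes that an explicit command-line option overrides the environment. The script's real precedence is not stated here, and `build_parser` is a hypothetical name.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from environment variables."""
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(description="Health monitor (email edition)")

    to_default = env.get("MONITOR_EMAIL_TO")
    # --to is only mandatory on the command line when the variable is unset.
    p.add_argument("--to", default=to_default, required=to_default is None)
    p.add_argument("--container",
                   default=env.get("MONITOR_DOCKER_CONTAINER",
                                   "knowledge-automation-multiuser"))
    p.add_argument("--url",
                   default=env.get("HEALTH_URL", "http://127.0.0.1:51232/health"))
    p.add_argument("--state-file",
                   default=env.get("MONITOR_STATE_FILE",
                                   "/tmp/zsglpt_health_monitor_state.json"))
    p.add_argument("--remind-seconds", type=int,
                   default=int(env.get("MONITOR_REMIND_SECONDS", "3600")))
    p.add_argument("--queue-threshold", type=int,
                   default=int(env.get("MONITOR_QUEUE_THRESHOLD", "50")))
    p.add_argument("--queue-streak", type=int,
                   default=int(env.get("MONITOR_QUEUE_STREAK", "5")))
    p.add_argument("--db-pool-streak", type=int,
                   default=int(env.get("MONITOR_DB_POOL_STREAK", "3")))
    p.add_argument("--dry-run", action="store_true")
    return p
```

For example, with `MONITOR_EMAIL_TO` and `MONITOR_QUEUE_THRESHOLD=80` exported, passing only `--queue-streak 7` would leave every other setting at its environment or built-in default.
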