One of the company's production database servers, due to be delivered shortly, suddenly became abnormally slow at handling asynchronous requests, and colleagues also reported that installing software on the machine felt sluggish.

Troubleshooting Process

Initial Attempt

My first move was to restart the PostgreSQL service: its asynchronous processing latency had climbed to a full second, and nothing depends on this service so tightly that a brief restart would hurt.

$ kubectl rollout restart -n databases statefulset postgres-postgresql
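To keep an eye on the restart itself, the usual companion command is a rollout status watch (a small sketch, nothing specific to this incident):

$ kubectl rollout status -n databases statefulset postgres-postgresql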

That is where the trouble started: the Pod got stuck in the Terminating state. At first I suspected a PreStop hook that had not finished and was blocking the shutdown, but then I noticed the PostgreSQL logs had already stopped several minutes earlier; it looked like the cluster was simply not processing the termination at all.

$ kubectl get event -n databases
47m         Warning   Unhealthy                pod/postgres-postgresql-0                          Liveness probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to load task: no running task found: task dbd0aae87722bdababfdf720ec389aaeaab040f594b782451ea39edfbc1261bb not found: not found
47m         Warning   Unhealthy                pod/postgres-postgresql-0                          Readiness probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to load task: no running task found: task dbd0aae87722bdababfdf720ec389aaeaab040f594b782451ea39edfbc1261bb not found: not found
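When a Pod hangs in Terminating like this, one option worth trying first is a force delete from the API side (a minimal sketch; note that this only removes the Pod object from the apiserver and does not guarantee the underlying container is actually cleaned up):

$ kubectl delete pod postgres-postgresql-0 -n databases --grace-period=0 --force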

The error events show that the container's cleanup can no longer be handled correctly (the runtime has lost its task), so let's clean it up manually.

$ sudo k3s ctr c ls | grep postgres
d2a4902b4cef8355a3d62ff969e73d1fa8037235c0fb3c7165eeb9aacdec8236    docker.io/bitnami/postgresql:14.2.0-debian-10-r35                              io.containerd.runc.v2

That locates the underlying container. Deleting it directly at this point is useless, though; the cluster will simply refuse the request. Instead, find the host-side process behind it:

$ ps aux | grep $(sudo k3s ctr task ls | grep d2a4902b4cef8355a3d62ff969e73d1fa8037235c0fb3c7165eeb9aacdec8236 | awk '{print $2}')
1001      520183  0.1  0.0  82456 16784 ?        Ss   08:45   0:04 /opt/bitnami/postgresql/bin/postgres -D /bitnami/postgresql/data --config-file=/opt/bitnami/postgresql/conf/postgresql.conf --external_pid_file=/opt/bitnami/postgresql/tmp/postgresql.pid --hba_file=/opt/bitnami/postgresql/conf/pg_hba.conf

This gives us the task's PID as it appears on the host and confirms it is indeed the PostgreSQL process. Kill it and remove the container:

$ sudo kill -9 520183
$ sudo k3s ctr c rm d2a4902b4cef8355a3d62ff969e73d1fa8037235c0fb3c7165eeb9aacdec8236
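After that it is worth a quick sanity check on both sides, runtime and cluster (a sketch):

$ sudo k3s ctr c ls | grep postgres   # the stale container should be gone now
$ kubectl get pods -n databases       # the StatefulSet should recreate the pod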

Back in the cluster, the Pod was now deleted cleanly and the service restarted; unfortunately, the latency did not improve at all after the restart.

A First Pass at the Host

Since restarting the service did not help, the only option left is the machine itself. A quick iostat gives an overview of the I/O load:

$ iostat -xdm 2
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
dm-0             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-2             0.00      0.00     0.00   0.00    0.00     0.00    4.00      0.02     0.00   0.00  120.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.48  11.00
dm-3             0.00      0.00     0.00   0.00    0.00     0.00   46.50      2.04     0.00   0.00  115.57    44.90    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.37  93.00
dm-4             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
dm-5             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop3            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop4            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop5            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop6            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop7            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00   18.00      2.06    35.50  66.36  140.00   117.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.52  89.40
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sr0              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Clearly around 89% of the utilization is landing on /dev/sda. As for dm-3 above it, my assumption is that it is an LVM logical volume backed by /dev/sda. Let's see which mount point is generating that much load:

$ sudo dmsetup ls
ubuntu--vg-lv--0	(253:0)
ubuntu--vg-lv--1	(253:1)
ubuntu--vg-lv--2	(253:2)
ubuntu--vg-lv--3	(253:3)
ubuntu--vg-lv--4	(253:4)
ubuntu--vg-lv--5	(253:5)

$ sudo df -h | grep ubuntu--vg-lv--3
/dev/mapper/ubuntu--vg-lv--3  787G   37G  710G   5% /var/lib

$ sudo df -h | grep /var/lib
/dev/mapper/ubuntu--vg-lv--3  787G   37G  711G   5% /var/lib
tmpfs                          32G   12K   32G   1% /var/lib/kubelet/pods/8bbcb717-74df-4d28-8000-05eb79889481/volumes/kubernetes.io~projected/kube-api-access-zhlwb
tmpfs
...
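As a cross-check on that LVM assumption, lsblk shows the whole device tree in one shot and makes the dm-3 → sda relationship explicit (a sketch; the device and volume names follow the output above):

$ lsblk -o NAME,TYPE,SIZE,MOUNTPOINT /dev/sda
# expected: sda -> an LVM physical-volume partition -> ubuntu--vg-lv--3 mounted on /var/lib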

So we have a preliminary conclusion: the load is coming from the cluster itself, which lets us rule out a virus or an operating-system bug. Unfortunately there is no direct way to tell which PVC is responsible, because the local-path provisioner that K3s ships does not expose per-volume metrics. We can still work it out from the processes, though; after all, we just interfered with a container from the outside by killing its PID.

$ iotop
Total DISK READ:         0.00 B/s | Total DISK WRITE:         2.06 M/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       2.22 M/s
    PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    472 be/3 root        0.00 B/s 1328.16 B/s  ?unavailable?  [jbd2/dm-0-8]
   1072 be/4 root        0.00 B/s   12.97 K/s  ?unavailable?  k3s server
   1309 be/4 root        0.00 B/s    3.89 K/s  ?unavailable?  containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml~ate /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd
   2236 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  containerd-shim-runc-v2 -namespace k8s.io -id efb0846028315d6940575~6b5cc33b8ef66edceaa3a0f -address /run/k3s/containerd/containerd.sock
   2288 be/4 1001        0.00 B/s   27.24 K/s  ?unavailable?  redis-server *:6379
   3371 be/4 1001        0.00 B/s 1328.16 B/s  ?unavailable?  beam.smp -W w -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 5~slog_error_logger false -kernel prevent_overlapping_partitions false
  14330 be/4 1001        0.00 B/s    2.01 M/s  ?unavailable?  mongod --config=/opt/bitnami/mongodb/conf/mongodb.conf
 215311 be/4 root        0.00 B/s 1328.16 B/s  ?unavailable?  java -Xmx512M -Xms32M -Duser.timezone=Asia/Shanghai -XX:TieredStopA~ervice.gateway.GatewayServerApplication --spring.profiles.active=mmf
 536565 be/4 root        0.00 B/s 1328.16 B/s  ?unavailable?  python3.8 -m celery -A traffic_control worker -B -Q traffic_control~ntrol-85dfd8ddc5-dzngx --executable=/work/django_py3.8/bin/python3.8
 612037 be/4 root        0.00 B/s    2.59 K/s  ?unavailable?  java -Xmx6000M -Xms1024M -Duser.timezone=Asia/Shanghai -XX:+HeapDum~ai.moying.cloud.service.CoreApplication --spring.profiles.active=mmf
      1 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  init
      2 be/4 root        0.00 B/s    0.00 B/s  ?unavailable?  [kthreadd]

iotop is only a rough reference, but at this point the preliminary finding, that MongoDB is responsible for the heavy write load, was solid enough to pass on to my colleagues so they could start a self-check on their side.
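For completeness, a busy PID can also be traced back to its Pod through the cgroup path, which embeds the Pod UID (a rough sketch using the mongod PID from the iotop output; the jq filter is just one way to do the reverse lookup, and depending on the cgroup driver the UID may use underscores instead of dashes):

$ grep -o 'pod[0-9a-f_-]*' /proc/14330/cgroup | head -n1
$ kubectl get pods -A -o json \
    | jq -r '.items[] | select(.metadata.uid == "<uid-from-above>") | "\(.metadata.namespace)/\(.metadata.name)"'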

Digging Deeper

Create a throwaway debug Pod in the databases namespace, roughly in the spirit of a sidecar:

$ kubectl run -i --tty debug -n databases --image=mongo --restart=Never -- sh
If you don't see a command prompt, try pressing enter

$ apt update
$ apt install netcat
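netcat is only there for a quick connectivity sanity check against the MongoDB service before opening a shell on it (a sketch; the service name and port are the same ones used in the connection string below):

$ nc -zv mongo-mongodb 27017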

$ mongosh "mongodb://mongo-mongodb:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.6.0"
Current Mongosh Log ID:	634542bf6b5210ca53845522
Connecting to:		mongodb://mongo-mongodb:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.6.0
Using MongoDB:		4.4.13
Using Mongosh:		1.6.0

test> use admin
admin> db.auth("username","xxxx")
{ ok: 1 }

We are now logged in to the remote MongoDB instance and can investigate further.

admin> db.serverStatus().connections
{
  current: 165,
  available: 838695,
  totalCreated: 234251,
  active: 9,
  exhaustIsMaster: 2,
  exhaustHello: 6,
  awaitingTopologyChanges: 8
}

admin> db.serverStatus().wiredTiger.cache['maximum bytes configured'] / 1024 / 1024 / 1024
15.0986328125

Here we checked the connection count and the cache size. MongoDB manages its memory footprint through the size of the WiredTiger storage engine's cache; if the cache is set too low while the workload runs frequent, complex queries, latency is the result. By default this value is roughly half of the system RAM (more precisely, 50% of RAM minus 1 GB), and our host has 32 GB of memory, which matches the ~15 GB configured above.

db.serverStatus().wiredTiger.concurrentTransactions
db.adminCommand({setParameter: 1, wiredTigerEngineRuntimeConfig: 'cache_size=4G'})  // manually resize the cache at runtime
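Before resizing anything, it is also worth comparing the configured cache against what WiredTiger actually holds and how hard it is churning, since a full, busy cache points at an undersized working set (a sketch using the same serverStatus counters):

db.serverStatus().wiredTiger.cache['bytes currently in the cache'] / 1024 / 1024 / 1024
db.serverStatus().wiredTiger.cache['pages read into cache']   // climbs quickly when the working set does not fit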

Performance Profiling

Now let's profile MongoDB itself:

admin> db.setProfilingLevel(2)  // enable profiling for every operation
{ was: 0, slowms: 100, sampleRate: 1, ok: 1 }
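Level 2 records every single operation and adds its own overhead on an already busy instance; on a production system it is often safer to stay at level 1 with an explicit threshold (a sketch, the threshold value is arbitrary):

admin> db.setProfilingLevel(1, { slowms: 50 })  // only record operations slower than 50 ms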

admin> use core

core> db.system.profile.find({ millis: { $gt: 10 } }).pretty()  // operations that took longer than 10 ms
[
  {
    op: 'command',
    ns: 'core.device',
    command: {
      findAndModify: 'device',
      query: { _id: '391a82cc-d825-44d1-8485-1c84b677b9b0' },
      fields: {},
      new: true,
      update: {
        '$set': {
          'status.lidarData': [ { lidarType: 'MAIN', value: [Object] } ],
          'status.lidarDataTimestamp': Long("1666077698408"),
          statusTimestamp: Long("1666077698408"),
          statusLatency: Long("27"),
          operationStatus: 'IDLE',
          operationStatusSorter: 0,
          online: true,
          modified: {
            principalId: 'anonymous',
            principalType: 'ANONYMOUS',
            anonymous: true,
            timestamp: Long("1666077698435")
          }
        }
      },
      '$db': 'core',
      lsid: { id: new UUID("31d73b48-620c-4b1a-8683-c5527480e96f") }
    },
    keysExamined: 1,
    docsExamined: 1,
    nMatched: 1,
    nModified: 1,
    keysInserted: 1,
    keysDeleted: 1,
    numYield: 0,
    locks: {
      ParallelBatchWriterMode: { acquireCount: { r: Long("1") } },
      ReplicationStateTransition: { acquireCount: { w: Long("1") } },
      Global: { acquireCount: { w: Long("1") } },
      Database: { acquireCount: { w: Long("1") } },
      Collection: { acquireCount: { w: Long("1") } },
      Mutex: { acquireCount: { r: Long("1") } }
    },
    flowControl: { acquireCount: Long("1"), timeAcquiringMicros: Long("2") },
    responseLength: 289353,
    protocol: 'op_msg',
    millis: 42,
    planSummary: 'IDHACK',
    execStats: {
      stage: 'UPDATE',
      nReturned: 1,
      executionTimeMillisEstimate: 42,
      works: 1,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      nMatched: 1,
      nWouldModify: 1,
      wouldInsert: false,
      inputStage: {
        stage: 'IDHACK',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 1,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 1,
        restoreState: 1,
        isEOF: 1,
        keysExamined: 1,
        docsExamined: 1
      }
    },
    ts: ISODate("2022-10-18T07:21:38.483Z"),
    client: '10.42.0.211',
    allUsers: [ { user: 'admin', db: 'admin' } ],
    user: 'admin@admin'
  },
  {
    op: 'command',
    ns: 'core.device',
    command: {
      findAndModify: 'device',
      query: { _id: '391a82cc-d825-44d1-8485-1c84b677b9b0' },
      fields: {},
      new: true,
      update: {
        '$set': {
          'status.lidarData': [ { lidarType: 'MAIN', value: [Object] } ],
          'status.lidarDataTimestamp': Long("1666077699070"),
          statusTimestamp: Long("1666077699070"),
          statusLatency: Long("34"),
          operationStatus: 'IDLE',
          operationStatusSorter: 0,
          online: true,
          modified: {
            principalId: 'anonymous',
            principalType: 'ANONYMOUS',
            anonymous: true,
            timestamp: Long("1666077699104")
          }
        }
      },
      '$db': 'core',
      lsid: { id: new UUID("31d73b48-620c-4b1a-8683-c5527480e96f") }
    },
    keysExamined: 1,
    docsExamined: 1,
    nMatched: 1,
    nModified: 1,
    keysInserted: 1,
    keysDeleted: 1,
    numYield: 0,
    locks: {
      ParallelBatchWriterMode: { acquireCount: { r: Long("1") } },
      ReplicationStateTransition: { acquireCount: { w: Long("1") } },
      Global: { acquireCount: { w: Long("1") } },
      Database: { acquireCount: { w: Long("1") } },
      Collection: { acquireCount: { w: Long("1") } },
      Mutex: { acquireCount: { r: Long("1") } }
    },
    flowControl: { acquireCount: Long("1"), timeAcquiringMicros: Long("3") },
    responseLength: 289361,
    protocol: 'op_msg',
    millis: 24,
    planSummary: 'IDHACK',
    execStats: {
      stage: 'UPDATE',
      nReturned: 1,
      executionTimeMillisEstimate: 22,
      works: 1,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      nMatched: 1,
      nWouldModify: 1,
      wouldInsert: false,
      inputStage: {
        stage: 'IDHACK',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 1,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 1,
        restoreState: 1,
        isEOF: 1,
        keysExamined: 1,
        docsExamined: 1
      }
    },
    ts: ISODate("2022-10-18T07:21:39.133Z"),
    client: '10.42.0.211',
    allUsers: [ { user: 'admin', db: 'admin' } ],
    user: 'admin@admin'
  }
]

You can see that these commands were not backed by a useful index, and that the documents in the device collection had grown far too large (note the ~289 KB responseLength for a single findAndModify). After my colleague added the index and optimized the code, API responses became nearly twice as fast.
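The real fix lives in the application's schema and code, but in terms of database-side commands it boils down to creating the missing index and switching the profiler back off (a sketch; the field name is purely illustrative, not the actual schema):

core> db.device.createIndex({ statusTimestamp: 1 })  // hypothetical field, for illustration only
core> db.setProfilingLevel(0)                        // do not leave level-2 profiling on in production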