Search results: "troubleshooting"
84 pages found
Wiki Schema
- File names: lowercase, hyphens, no spaces (e.g., `pod-troubleshooting.md`)
tags: [kubernetes, troubleshooting, ...]
sources: [raw/articles/k8s-troubleshooting-article.md]
- troubleshooting: troubleshooting-related
- Troubleshooting steps
Docker Production Pitfalls Guide — 10 + 5 Common Problems
tags: [docker, container, troubleshooting, security, production, networking, debugging]
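| [[container-networking-troubleshooting]] | 6-layer container network troubleshooting model (Docker bridge networking covers layers ②~⑤) |
| [[resource-rbac-scheduling-troubleshooting]] | K8s resource quota/OOMKilled troubleshooting |
| [[jvm-container-oom-offheap-troubleshooting]] | JVM off-heap memory (DirectByteBuffer/Metaspace/thread stacks) causing container OOMKi |
| [[pod-troubleshooting]] | Pod CrashLoopBackOff troubleshooting (including Java OOM/Exit Code analysis) |
DevOps Technical Interview Guide — 59 Questions on Containers/Cloud Native/Kernel
| 51 | Pod hostPath permission problem? | Host permission 700 vs non-root Pod → Permission denied; fix with securityContext.runAsUser + fsGroup | [[pod-troubleshooting]] |
| 1 | Container orchestration and K8s advantages? | Automated deployment/scaling/management; advantages: autoscaling, service discovery, storage orchestration, rolling updates, ecosystem | [[k8s-troubleshooting-principles]] |
| 2 | K8s RBAC? | Role/ClusterRole define permissions; RoleBinding/ClusterRoleBinding assign them | [[resource-rbac-scheduling-troubleshooting]] |
| 50 | iptables vs ipvs? | iptables: rule-chain matching (inefficient at large scale); ipvs: hash-table forwarding (efficient, supports session persistence). Switched via kube-proxy mode | [[service-troubleshooting]] |
| 10 | Network fault diagnosis? | ping (connectivity) → traceroute (routing) → netstat (connections) → tcpdump (packet capture) → firewall | [[network-troubleshooting-order]] |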
Server Network Troubleshooting Methodology — Seven-Step Layered Localization
tags: [networking, troubleshooting, production, debugging, monitoring, linux]
sources: [raw/articles/network-troubleshooting-order.md, raw/articles/别再乱猜了-Linux网络不通-按这6个阶段排查-100-找到根因.md]
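| Slow application performance / API timeouts | Network troubleshooting → full-stack performance troubleshooting | [[fullstack-performance-troubleshooting]] |
| Pods cannot reach each other within K8s | K8s Service / NetworkPolicy checks | [[service-troubleshooting]] |
| K8s service access 502/504 timeouts | Ten-step K8s service access troubleshooting workflow | [[k8s-service-access-troubleshooting]] |
Production Incident Troubleshooting Checklist — CPU/Disk/Memory/GC/Network Four-Dimension Quick Reference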
tags: [troubleshooting, performance, linux, debugging, production, monitoring]
- raw/articles/online-troubleshooting-checklist-fredal-4a575c242fc47a28.md
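For the detailed troubleshooting flow, see [[cpu-spike-troubleshooting-guide]] (four-layer localization method) and [[cpu-spike-3-commands]] (three-command quick reference).
For detailed tuning, see [[linux-disk-io-tuning]] (in-depth IO optimization) and [[linux-disk-space-troubleshooting]] (space troubleshooting flow).
GC problems often drive CPU usage up; see the GC-related sections of [[cpu-spike-troubleshooting-guide]].
Ops Engineer Interview: 50 Questions — Full Coverage of Classic Linux/Networking/Database Fundamentals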
tags: [linux, networking, database, troubleshooting, production, security, monitoring, docker]
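See [[linux-disk-space-troubleshooting]] (complete disk space troubleshooting flow).
See [[network-troubleshooting-order]] (seven-step layered method).
See [[linux-disk-space-troubleshooting]] (complete cleanup flow).
See [[server-performance-four-dimensions]] (five-dimension troubleshooting) and [[online-troubleshooting-checklist]] (four-dimension quick-reference checklist).
6-Layer Container Network Troubleshooting Model — A Unified Diagnostic System for K8s/Docker/containerd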
tags: [kubernetes, networking, troubleshooting, docker, debugging, performance, container]
sources: [raw/articles/container-networking-troubleshooting-6-layer.md]
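| [[k8s-service-access-troubleshooting]] | Ten-step K8s Service/Ingress network troubleshooting workflow (deep dive into layer ⑥ of the 6-layer model) |
| [[network-troubleshooting-order]] | Seven-step server network troubleshooting method (supplements layers ③~⑤ of the 6-layer model) |
| [[k8s-top10-troubleshooting-checklist]] | Quick reference for 10 major K8s failure scenarios (identifying the layer of a network failure) |
K8s Architecture and Core Concepts In Depth — Interview Cheat Sheet (Part 1)
Combine with [[k8s-troubleshooting-principles]] to understand the underlying logic of troubleshooting.
See [[pod-troubleshooting]] (Pod troubleshooting) and [[k8s-probes-guide]] (probe mechanism).
See [[service-troubleshooting]] (Service troubleshooting) and [[k8s-service-access-troubleshooting]] (ten-step troubleshooting flow).
See [[k8s-persistent-storage-guide]] (PV/PVC in production) and [[storage-troubleshooting]] (storage troubleshooting).
| [[k8s-troubleshooting-principles]] | Basic troubleshooting principles and how they tie into the architecture |
Kubernetes CoreDNS Custom Domain Resolution — Five Scenarios from Principles to Production Practice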
tags: [kubernetes, dns, networking, troubleshooting, production]
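> Troubleshooting reference: [[k8s-service-access-troubleshooting]] (DNS is a key step in the ten-step workflow)
- [[k8s-service-access-troubleshooting]] — Ten-step K8s service access troubleshooting (DNS is the first step)
- [[service-troubleshooting]] — Service and network troubleshooting (CoreDNS/kube-proxy interplay)
- [[network-troubleshooting-order]] — Layered network troubleshooting methodology (diagnostic perspective for the DNS layer)
K8s Interview Guide — 100 Core Questions Fully Explained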
tags: [kubernetes, deployment, troubleshooting, production, networking, security]
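| 1 | What is Kubernetes? | An open-source container orchestration platform that automates deployment, scaling, and management of containerized applications | [[k8s-troubleshooting-principles]] |
| 4 | What is a Pod? | The smallest deployable unit in K8s; contains one or more containers that share network and storage | [[pod-troubleshooting]] |
| 6 | What is a Service? | A stable access address that exposes an application; types: ClusterIP/NodePort/LoadBalancer/ExternalName | [[service-troubleshooting]] |
| 16 | What is Ingress? | Manages external HTTP/HTTPS access; supports host/path routing and TLS termination | [[k8s-service-access-troubleshooting]] |
Pod Pending Troubleshooting Guide — 7 Angles for Quickly Locating the Root Cause of Scheduling Failures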
tags: [kubernetes, troubleshooting, pod, scheduling, networking, storage, deployment]
- [[pod-troubleshooting]] — Pod troubleshooting overview (CrashLoopBackOff/OOM/probes/basic Pending checks)
- [[resource-rbac-scheduling-troubleshooting]] — Scheduling failure/Taint/OOMKill troubleshooting
- [[storage-troubleshooting]] — Storage troubleshooting (PVC Pending)
- [[k8s-top10-troubleshooting-checklist]] — Top 10 high-frequency troubleshooting checklist
K8s Resource Limits Configuration Guide — Request / Limit / QoS / CPU Throttling
tags: [kubernetes, troubleshooting, pod, deployment, production, debugging]
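For details, see the Pending/OOMKilled sections of [[pod-troubleshooting]].
When ResourceQuota is exceeded, new Pods cannot be created. For details, see [[resource-rbac-scheduling-troubleshooting]].
- [[jvm-container-oom-offheap-troubleshooting]] — JVM off-heap memory (DirectByteBuffer/Metaspace/thread stacks) causing container OOMKi
- [[k8s-pod-pending-troubleshooting-guide]] — Pod Pending Troubleshooting Guide — 7 troubleshooting directions (insufficient resources/taints/affinity/storage/quota/selectors/端
K8s Rolling Update Zero-Downtime Release Misconceptions — The Truth About RollingUpdate and a Truly Seamless Release System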
tags: [kubernetes, deployment, networking, troubleshooting, production, performance]
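See [[k8s-probes-guide]] (complete probe configuration guide) and [[pod-troubleshooting]] (recommendations for combining the three probes).
| [[pod-troubleshooting]] | lifecycle hooks / preStop / recommendations for combining the three probes |
| [[k8s-service-access-troubleshooting]] | Connection draining and traffic governance |
| [[service-troubleshooting]] | Relationship between Service Endpoints and rolling updates |
One-Stop Checklist for High-Frequency K8s Problems — Quick Reference for 10 Major Failure Scenarios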
tags: [kubernetes, troubleshooting, production, debugging, pod, node, service, storage, networking, security]
- raw/articles/k8s-top10-troubleshooting-guide.md
📖 Deep dive → [[k8s-scheduling-strategy-guide]] | [[resource-rbac-scheduling-troubleshooting]]
📖 Deep dive → [[pod-troubleshooting]] | [[k8s-probes-guide]]
📖 Deep dive → [[service-troubleshooting]] | [[k8s-service-access-troubleshooting]]
K8s Production Troubleshooting: Basic Principles and Rapid Localization Flow
tags: [kubernetes, troubleshooting, production, kubectl]
- raw/articles/k8s-troubleshooting-production-guide.md
- raw/articles/k8s-service-access-troubleshooting.md
- raw/articles/k8s-node-notready-troubleshooting.md
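- [[k8s-service-access-troubleshooting]] — Ten-step troubleshooting workflow from Pod→Service→Ingress
Node Troubleshooting — NotReady Nine-Step Diagnosis / Kubelet / Container Runtime / Resource Pressure / Certificates / Prevention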
tags: [kubernetes, troubleshooting, node, production, monitoring, networking, certificate]
- raw/articles/k8s-troubleshooting-production-guide.md
- raw/articles/k8s-node-notready-troubleshooting.md
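- [[k8s-service-access-troubleshooting]] — Service access troubleshooting (prerequisite for Pod troubleshooting)
- [[pod-troubleshooting]] — Pod troubleshooting (follow-up checks after Pods are evicted when a node goes NotReady)
Pod Troubleshooting — CrashLoopBackOff / Exit Code Diagnosis / OOM / Probes / Dependent Services / ConfigMap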
tags: [kubernetes, troubleshooting, pod, deployment, production, networking, configmap]
- raw/articles/k8s-troubleshooting-production-guide.md
- raw/articles/k8s-service-access-troubleshooting.md
- raw/articles/pod-restart-troubleshooting-guide.md
- [[k8s-service-access-troubleshooting]] — Ten-step troubleshooting workflow from Pod→Service→Ingress
Service and Network Troubleshooting — Endpoints / DNS / kube-proxy / CNI / NetworkPolicy / Ingress
tags: [kubernetes, troubleshooting, service, networking, ingress, cni, dns]
- raw/articles/k8s-troubleshooting-production-guide.md
- raw/articles/k8s-service-access-troubleshooting.md
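- [[k8s-service-access-troubleshooting]] — Ten-step troubleshooting workflow from Pod→Service→Ingress
- [[node-troubleshooting]] — Node troubleshooting
Java Application at 100% CPU: Hands-On Troubleshooting — Four Steps from Alert to Line of Code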
tags: [linux, java, performance, troubleshooting, case-study, jvm]
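> For the troubleshooting methodology, see [[cpu-spike-troubleshooting-guide]] (general Linux four-layer system); for the three-command quick reference, see [[cpu-spike-3-commands]].
- [[cpu-spike-troubleshooting-guide]] — General Linux four-layer CPU troubleshooting system (host→process→thread→call stack)
- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting (including in-depth JVM GC analysis)
- [[online-troubleshooting-checklist]] — Production incident troubleshooting checklist
Linux Ops Engineer Quick Reference: 30 High-Frequency Commands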
tags: [linux, tools, troubleshooting, performance, automation]
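> Related troubleshooting: [[linux-disk-space-troubleshooting]] — `ls -lhS` to quickly locate large files
> Related troubleshooting: [[linux-disk-space-troubleshooting]] — `find -size` to locate disk usage
> Related troubleshooting: [[cpu-spike-troubleshooting-guide]] — top → strace to diagnose CPU spikes
> Related troubleshooting: [[linux-disk-space-troubleshooting]] — how to investigate a df/du discrepancy >5%
Linux Load Average Fully Explained — Kernel Principles / Troubleshooting Methodology / Container Environments in Practice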
tags: [linux, performance, monitoring, troubleshooting, debugging, production]
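| [[fullstack-performance-troubleshooting]] | First link in full-stack performance troubleshooting |
| [[network-troubleshooting-order]] | Used alongside when diagnosing network IO bottlenecks |
| [[node-troubleshooting]] | K8s Node resource pressure troubleshooting |
| [[linux-load-high-cpu-low-troubleshooting]] | Troubleshooting High Linux Load with Low CPU — a systematic diagnostic flow from confirming the symptom to locating the root cause (including vm |
Troubleshooting High Linux Load with Low CPU — A Systematic Diagnostic Flow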
tags: [linux, troubleshooting, performance, debugging]
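- [[online-troubleshooting-checklist]] — Online incident troubleshooting checklist
- [[cpu-spike-troubleshooting-guide]] — CPU spike troubleshooting guide
- [[linux-perf-troubleshooting-handbook]] — Linux Server Performance Troubleshooting Handbook — 60-second quick assessment/4 major bottlenecks/3 real-world cases/monitoring thresholds/
- [[nfs-troubleshooting-sop]] — NFS Troubleshooting SOP — 7-step method / 6 failure categories / 4 real-world cases / production best practices
NFS Mount Options Fully Explained — Testing and Tuning Guide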
tags: [linux, storage, networking, performance, troubleshooting, filesystem]
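> Related troubleshooting: [[network-troubleshooting-order]] (NFS depends on the network layer)
- [[network-troubleshooting-order]] — Network troubleshooting (NFS depends on the TCP network layer)
- [[nfs-troubleshooting-sop]] — NFS Troubleshooting SOP — 7-step method / 6 failure categories / 4 real-world cases / production best practices
- [[storage-troubleshooting]] — K8s storage troubleshooting (how PVC mount failures relate to NFS)
Troubleshooting Excessive Server Load — Case Studies / Netflix 60-Second Method / Common Root Causes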
tags: [linux, troubleshooting, performance, production, debugging, case-study]
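| [[linux-disk-space-troubleshooting]] | Disk space troubleshooting (lsof deleted / logrotate + production cleanup flow) |
| [[fullstack-performance-troubleshooting]] | Application-layer performance troubleshooting methodology |
| [[network-troubleshooting-order]] | Network troubleshooting (TIME_WAIT, etc.) |
| [[linux-load-high-cpu-low-troubleshooting]] | Troubleshooting High Linux Load with Low CPU — a systematic diagnostic flow from confirming the symptom to locating the root cause (including vm |
Five-Dimension Server Performance Troubleshooting — In-Depth Analysis of CPU/Memory/Disk/Network/Filesystem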
tags: [linux, troubleshooting, performance, monitoring, production, debugging, networking, mysql, filesystem]
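See [[network-troubleshooting-order]].
| [[fullstack-performance-troubleshooting]] | Application-layer performance troubleshooting methodology (Nginx→DB) |
| [[network-troubleshooting-order]] | Seven-step network connectivity troubleshooting method |
| [[node-troubleshooting]] | K8s Node resource pressure troubleshooting |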
Wiki Index
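- [[container-networking-troubleshooting]] — 6-layer container network troubleshooting model: a unified diagnostic system for K8s/Docker/containerd (layered localization + 30-second quick check + case reviews)
- [[cpu-spike-troubleshooting-guide]] — Complete CPU troubleshooting methodology + emergency response: four-layer localization (host→process→thread→call stack)/quick reference for common root causes/stop-the-bleeding measures
- [[dns-troubleshooting-practical-guide]] — Practical DNS Troubleshooting Guide — dig/nslookup tools explained, four-phase troubleshooting flow, quick reference for SERVFAIL/NXDOMAIN/TIMEOUT error types
- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting: Nginx → application → database → server
- [[jvm-container-oom-offheap-troubleshooting]] — Troubleshooting guide for container OOMKilled caused by JVM off-heap memory (DirectByteBuffer/Metaspace/thread stacks)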
Wiki Log
- Created raw: raw/articles/k8s-troubleshooting-production-guide.md
- Created concepts: k8s-troubleshooting-principles, pod-troubleshooting, node-troubleshooting, service-troubleshooting, storage-troubleshooting, resource-rbac-scheduling-troubleshooting
- Created raw: raw/articles/fullstack-performance-troubleshooting.md
- Created concepts: fullstack-performance-troubleshooting, mysql-performance-config
- Updated: resource-rbac-scheduling-troubleshooting
Full-Stack Performance Troubleshooting Methodology — Nginx → Application → Database → Server
tags: [troubleshooting, production, monitoring, networking, database, mysql, nginx]
sources: [raw/articles/fullstack-performance-troubleshooting.md]
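| [[linux-disk-space-troubleshooting]] | Disk space troubleshooting |
| [[network-troubleshooting-order]] | Seven-step network troubleshooting method |
Kubernetes Load Balancing in Depth: The Full Path from Service Data Plane to Production-Grade Traffic Governance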
tags: [kubernetes, networking, production, troubleshooting, performance]
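| [[k8s-service-access-troubleshooting]] | Ten-step K8s service access troubleshooting workflow (complements this article's checklist) |
| [[service-troubleshooting]] | Service network troubleshooting methodology (Endpoints/502/timeout diagnosis) |
| [[container-networking-troubleshooting]] | 6-layer container network troubleshooting model (load balancing belongs to layers ③-⑥) |
Postmortems of 10 Major K8s Production Failures — From Cluster-Level Disasters to Application-Level Problems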
tags: [kubernetes, troubleshooting, production, case-study, debugging]
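| [[node-troubleshooting]] | Case 4: node NotReady diagnosis |
| [[resource-rbac-scheduling-troubleshooting]] | Cases 2/6/9: OOMKilled / resource quota / HPA |
| [[pod-troubleshooting]] | Case 10: ImagePullBackOff |
Complete Guide to K8s Pod Scheduling Strategies — Six Mechanisms Fully Explained
- [[resource-rbac-scheduling-troubleshooting]] — Scheduling failure troubleshooting (Taint/OOMKill/RBAC)
- [[k8s-troubleshooting-principles]] — Basic K8s troubleshooting principles
- [[k8s-pod-pending-troubleshooting-guide]] — Pod Pending Troubleshooting Guide — 7 troubleshooting directions (insufficient resources/taints/affinity/storage/quota/selectors/端
- [[pod-troubleshooting]] — Complete Pod troubleshooting guide
K8s Service Access Troubleshooting — Ten-Step Workflow from Pod and Service to Ingress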
tags: [kubernetes, troubleshooting, service, pod, ingress, networking, cni, dns]
sources: [raw/articles/k8s-service-access-troubleshooting.md, raw/articles/cni-comparison-flannel-calico-cilium.md]
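| [[container-networking-troubleshooting]] | 6-layer container network troubleshooting model (Service networking is layer ⑥) |
| [[pod-troubleshooting]] | Pod troubleshooting: CrashLoopBackOff/Exit Code/probes/dependent services |
Resource Quota / OOMKilled / RBAC / Scheduling Troubleshooting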
tags: [kubernetes, troubleshooting, security, hpa, deployment]
sources: [raw/articles/k8s-troubleshooting-production-guide.md, raw/articles/k8s-resource-limits-configuration.md]
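- [[jvm-container-oom-offheap-troubleshooting]] — JVM off-heap memory (DirectByteBuffer/Metaspace/thread stacks) causing container OOMKi
- [[k8s-pod-pending-troubleshooting-guide]] — Pod Pending Troubleshooting Guide — 7 troubleshooting directions (insufficient resources/taints/affinity/storage/quota/selectors/端
Storage Troubleshooting — PVC Pending / Mount Failures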
tags: [kubernetes, troubleshooting, storage, pvc, statefulset]
sources: [raw/articles/k8s-troubleshooting-production-guide.md]
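- [[pod-troubleshooting]] — Pod troubleshooting: CrashLoopBackOff/OOM/probes/8-step method (Pod anomalies caused by storage mount failures)
- [[k8s-pod-pending-troubleshooting-guide]] — Pod Pending Troubleshooting Guide — 7 troubleshooting directions (insufficient resources/taints/affinity/storage/quota/selectors/端
Linux Server CPU Spike Troubleshooting — Complete Methodology + Emergency Response in Practice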
- raw/articles/cpu-spike-troubleshooting-guide.md
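| [[linux-load-high-cpu-low-troubleshooting]] | Troubleshooting High Linux Load with Low CPU — a systematic diagnostic flow from confirming the symptom to locating the root cause (including vm |
| [[linux-perf-troubleshooting-handbook]] | Linux Server Performance Troubleshooting Handbook — 60-second quick assessment/4 major bottlenecks/3 real-world cases/monitoring thresholds/ |
| [[online-troubleshooting-checklist]] | Four-dimension troubleshooting quick-reference checklist (CPU/disk/memory/network + Java tools jstack/jmap/jstat/tcpdump) |
Linux Disk Space Troubleshooting — 8 Commands / Four Scenarios / Production Cleanup Flow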
tags: [linux, troubleshooting, disk, storage, monitoring, production, debugging]
sources: [raw/articles/linux-disk-space-troubleshooting.md]
- [[node-troubleshooting]] — Handling K8s Node DiskPressure (node disk cleanup)
- [[storage-troubleshooting]] — K8s storage troubleshooting (PVC/mounts)
Nginx 502/504/Connection Reset In-Depth Troubleshooting Guide
tags: [nginx, troubleshooting, networking, production, debugging, performance]
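| [[network-troubleshooting-order]] | Basic network connectivity troubleshooting (firewall/routing/tcpdump) |
| [[fullstack-performance-troubleshooting]] | Entry point for full-stack performance troubleshooting |
| [[dns-troubleshooting-practical-guide]] | Practical DNS Troubleshooting Guide — dig/nslookup tools explained, four-phase troubleshooting flow, SERVFAIL/ |
CNI Network Plugins Compared In Depth — Flannel vs Calico vs Cilium
| [[service-troubleshooting]] | Hands-on CNI diagnosis (Flannel interfaces/Calico Pod status/BGP route checks) |
| [[k8s-service-access-troubleshooting]] | CNI is a key diagnostic step in the ten-step service access workflow |
| [[network-troubleshooting-order]] | Underlying network troubleshooting methodology (complements this article's Cross-cluster Overlay troubleshooting) |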
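Ops Automation Scripts 5-Piece Set — Health Inspection/Log Alerting/MySQL Backup/Batch Execution/Service Watchdog
- Use together with [[online-troubleshooting-checklist]] to cover routine operations plus emergency troubleshooting
| [[online-troubleshooting-checklist]] | Four-dimension troubleshooting quick reference (automation scripts and emergency troubleshooting complement each other) |
| [[fullstack-performance-troubleshooting]] | Entry point for full-stack performance troubleshooting (troubleshooting background for scripts 1-5) |
K8s Capacity Planning, Pod QoS, and Cost Optimization: A Practical Guide
See [[jvm-container-oom-offheap-troubleshooting]].
- [[jvm-container-oom-offheap-troubleshooting]] — JVM container OOM troubleshooting (off-heap memory)
- [[resource-rbac-scheduling-troubleshooting]] — Resource quota/OOMKilled troubleshooting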
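K8s Persistent Storage — PV / PVC / StorageClass in Production
> ⚠️ Pitfall: with a storage system that does not support RWX, the Pod stays Pending indefinitely. See [[storage-troubleshooting]].
For the complete troubleshooting method, see [[storage-troubleshooting]].
| [[node-troubleshooting]] | Where Node troubleshooting intersects with Pod behavior in k8s-statefulset-guide |
Linux System Tuning in Practice — Full Retrospective on Cutting API Response Time from 500ms to 100ms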
tags: [linux, performance, troubleshooting, case-study, production]
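| [[fullstack-performance-troubleshooting]] | Full-stack performance troubleshooting methodology |
| [[jvm-container-oom-offheap-troubleshooting]] | JVM off-heap memory (DirectByteBuffer/Metaspace/thread stacks) causing container OOMKi |
Linux Memory Management Deep Dive — Buffer/Cache/Page Cache/Slab/Reclaim/OOM End to End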
tags: [linux, memory, performance, troubleshooting, monitoring, debugging]
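- [[jvm-container-oom-offheap-troubleshooting]] — JVM off-heap memory (DirectByteBuffer/Metaspace/thread stacks) causing container OOMKi
- [[online-troubleshooting-checklist]] — Four-dimension troubleshooting quick-reference checklist (CPU/disk/memory/network + Java tools jstack/jmap/jstat/tcpdump)
Linux Server Performance Troubleshooting Handbook — Three Go-To Techniques/Cases/Thresholds/Parameter Quick Reference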
tags: [linux, troubleshooting, performance, monitoring, nginx, mysql]
- [[cpu-spike-troubleshooting-guide]] — CPU spike troubleshooting guide
- [[linux-load-high-cpu-low-troubleshooting]] — Troubleshooting high load with low CPU
NFS Troubleshooting SOP — 7-Step Method / 6 Failure Categories / Real-World Cases
tags: [linux, troubleshooting, nfs, storage, networking, performance, monitoring, security]
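- [[network-troubleshooting-order]] — Network troubleshooting order
- [[linux-load-high-cpu-low-troubleshooting]] — Troubleshooting high Linux load with low CPU
TCP Connection Count Through the Roof: Attack or Bug? A Troubleshooting Guide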
tags: [networking, tcp, troubleshooting, linux, security, case-study]
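- [[network-troubleshooting-order]] — Network troubleshooting order
- [[online-troubleshooting-checklist]] — Online troubleshooting checklist
MySQL Performance Tuning — Slow Queries / Lock Analysis / Deadlock Diagnosis / Index Invalidation / Deep Pagination / Buffer Pool / Configuration Templates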
- raw/articles/fullstack-performance-troubleshooting.md
- raw/articles/mysql-deadlock-troubleshooting.md
MySQL Primary-Replica Replication Guide — Principles / GTID / Architecture / Troubleshooting / Monitoring
tags: [mysql, database, replication, high-availability, troubleshooting, production]
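- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting (MySQL is part of it)
MySQL Slow Query Troubleshooting Case Review — The Complete Path from Discovery to Permanent Fix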
tags: [mysql, database, performance, troubleshooting, case-study]
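| [[fullstack-performance-troubleshooting]] | Full-stack troubleshooting methodology (Nginx→application→database→server) |
Five Performance Switches on Wi-Fi 6/7 Routers — TxBeamforming / OFDMA / MU-MIMO / MRU / TWT Fully Explained
- [[network-troubleshooting-order]] — Server network troubleshooting methodology (underlying principles of wired networking)
- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting (the network layer is one part of it)
ConfigMap Mount Pitfalls Guide — Symlinks / Read-Only / Hot Reload / Standard Mount Patterns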
tags: [kubernetes, troubleshooting, configmap, pod, storage, debugging]
- [[pod-troubleshooting]] — Pod troubleshooting (viewing Events for ConfigMap mount errors)
JVM Container OOM Troubleshooting Guide — The Off-Heap Memory Perspective
tags: [kubernetes, java, troubleshooting, memory, container, performance]
- [[resource-rbac-scheduling-troubleshooting]] — K8s resource quota/OOMKilled troubleshooting
Complete Guide to Java Memory Tuning on K8s — Budget Model, Production Configuration, and Governance System
tags: [kubernetes, java, jvm, memory, performance, troubleshooting, architecture]
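- [[jvm-container-oom-offheap-troubleshooting]] — JVM container OOM troubleshooting (off-heap memory perspective, quick troubleshooting reference)
K8s Probe Mechanism — Liveness / Readiness / Startup Configuration Guide + Million-Level Incident Postmortem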
tags: [kubernetes, pod, troubleshooting, production, monitoring]
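Use together with [[pod-troubleshooting]] (CrashLoopBackOff troubleshooting), [[k8s-rolling-update-pitfalls]] (judging real business readiness with the Readiness probe during rolling updates), and [[k8s-troubleshooting-principles]] (troubleshooting principles).
Three-Command Method for CPU Spikes — top → strace → /proc/PID/fd/ in Practice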
tags: [linux, performance, troubleshooting, case-study, debugging]
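| [[cpu-spike-troubleshooting-guide]] | Complete CPU troubleshooting methodology (four-layer localization: host→process→thread→call stack) |
Practical DNS Troubleshooting Guide — The Full Path from Local Resolution to Authoritative DNS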
tags: [dns, networking, linux, troubleshooting, command]
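- [[network-troubleshooting-order]] — Seven-step network troubleshooting method (includes a DNS step)
systemd Log Management and Real-Time Monitoring: The Complete Guide to the journalctl Command
- [[linux-disk-space-troubleshooting]] — Disk space troubleshooting (journalctl disk cleanup)
- [[online-troubleshooting-checklist]] — Production incident troubleshooting quick-reference checklist
Linux Compression and Decompression Tools: Comparison and Practical Guide
- [[linux-disk-space-troubleshooting]] — Disk space troubleshooting (log compression and cleanup strategy)
- [[cpu-spike-troubleshooting-guide]] — CPU spike troubleshooting (scenarios where gzip/xz compression causes abnormal CPU usage)
Disk Diagnostic Tools in Practice — iostat/smartctl/lsscsi Explained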
tags: [linux, disk, storage, command, monitoring, troubleshooting]
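- [[linux-disk-space-troubleshooting]] — Disk space troubleshooting
Production-Grade Linux Disk IO Tuning — From Core Concepts to Real-World Rollout
| [[linux-disk-space-troubleshooting]] | Disk space troubleshooting (space problems vs performance problems) |
- [[online-troubleshooting-checklist]] — Four-dimension troubleshooting quick-reference checklist (CPU/disk/memory/network + Java tools jstack/jmap/jstat/tcpdump)
Complete Guide to the Linux Directory Structure — The FHS Standard and Ops Practice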
tags: [linux, command, storage, disk, architecture, troubleshooting]
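- [[linux-perf-troubleshooting-handbook]] — Linux performance troubleshooting handbook
Linux Hardware Information Queries and Software Management Command Quick Reference — Full Coverage of CPU/Memory/Disk/Network/Motherboard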
tags: [linux, command, performance, monitoring, disk, networking, troubleshooting, architecture]
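- [[linux-disk-space-troubleshooting]] — Disk space troubleshooting
Linux High-Concurrency Kernel Optimization Handbook — Seven-Dimension Tuning: File Handles/Network/Memory/Scheduling/I/O/Security
- [[network-troubleshooting-order]] — Network troubleshooting order
- [[linux-perf-troubleshooting-handbook]] — Linux Server Performance Troubleshooting Handbook — 60-second quick assessment/4 major bottlenecks/3 real-world cases/monitoring thresholds/
Linux Kernel Parameter Tuning for Production — 6 Must-Tune Parameters
- [[container-networking-troubleshooting]] — 6-layer container network troubleshooting model (real-world scenarios for the rp_filter/ip_forward parameters)
- [[linux-perf-troubleshooting-handbook]] — Linux Server Performance Troubleshooting Handbook — 60-second quick assessment/4 major bottlenecks/3 real-world cases/monitoring thresholds/
Three Linux Port Inspection Tools — The Complete Guide to ss / netstat / lsof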
tags: [linux, networking, monitoring, debugging, troubleshooting]
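- [[network-troubleshooting-order]] — Seven-step server network troubleshooting method (includes real-world ss/netstat scenarios)
SSH Brute-Force Defense Guide — Public-Key Authentication / fail2ban / 2FA / Intrusion Detection / Connection Limits / Honeypots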
tags: [security, linux, production, networking, troubleshooting, automation]
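| [[network-troubleshooting-order]] | iptables / firewalld firewall rule configuration |
Complete Guide to Debugging SSH Connections — Reading ssh -vvv Output + Cross-Checking Server Logs + Diagnosing Typical Problems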
tags: [linux, networking, troubleshooting, security, debugging]
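- [[network-troubleshooting-order]] — Server network troubleshooting methodology: seven-step layered localization (the network-layer foundation of SSH connection setup)
Practical Guide to Choosing an Nginx Load Balancing Strategy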
tags: [nginx, networking, performance, troubleshooting, architecture, load-balancing]
- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting
Guide to Building an Nginx Log Analysis and Monitoring System
tags: [nginx, monitoring, performance, troubleshooting, automation, security]
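- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting
Choosing a Service Registry — Nacos / Zookeeper / Consul Compared In Depth
- [[k8s-service-access-troubleshooting]] — Ten-step K8s service access troubleshooting workflow (including service discovery and DNS troubleshooting)
Redis Connection Management and Circuit-Breaking Governance — Connection Exhaustion / Zombie Connections / Avalanche Protection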
tags: [redis, database, troubleshooting, production, monitoring, debugging]
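Four High-Concurrency Techniques: Caching / Rate Limiting / Peak Shaving / Idempotency — What Problem Does Each Solve?
- [[fullstack-performance-troubleshooting]] — Full-stack performance troubleshooting (a global view of high-concurrency problems from application to database)
Jenkins Multi-Master Architecture Deployment Plan — K8S + Gateway API
| [[service-troubleshooting]] | Service/Ingress troubleshooting, Gateway API diagnosis |
K8s Multi-Cluster + Istio Canary Releases — Production Guide to Global Multi-Active Traffic Governance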
tags: [kubernetes, networking, deployment, architecture, monitoring, troubleshooting, automation, security]
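Complete StatefulSet Guide — Stable Network Identity / Dedicated Storage / Ordered Deployment
See [[pod-troubleshooting]] for general Pod troubleshooting methods and [[storage-troubleshooting]] for storage-related troubleshooting.
Guide to Deleting Massive Numbers of Files on Linux — Comparing Four Methods: find/perl/rsync/shred
- [[linux-disk-space-troubleshooting]] — Disk space troubleshooting (space not released after deletion)
Linux RAID and LVM Basics Guide — Combining Disks and Managing Space
- [[linux-disk-space-troubleshooting]] — Disk space troubleshooting
Linux User Management Pitfalls Guide — From /etc/passwd/shadow to Secure User Operations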
tags: [linux, security, production, troubleshooting]
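Server Security Hardening Checklist — 20 Things to Do Before a New Machine Goes Live
| [[network-troubleshooting-order]] | Firewall / iptables configuration and troubleshooting |
Keepalived+Nginx High Availability in Practice — 3 Hidden Pitfalls and Production-Grade Safeguards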
tags: [nginx, keepalived, networking, production, ha, troubleshooting, monitoring, architecture]
Postmortem of Typical Nginx Configuration Mistakes — 20+ Pitfalls Explained
tags: [nginx, troubleshooting, networking, production, security, deployment]
Production-Grade Nginx Performance Optimization — The Full Path from Kernel to K8s
tags: [nginx, performance, networking, linux, architecture, troubleshooting]
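Nginx Real-Time Push in Production, Fully Explained: Principles, Architecture, Engineering, and Production-Grade Rollout of SSE and WebSocket
| [[fullstack-performance-troubleshooting]] | Full-stack performance troubleshooting system (extended to ws/SSE scenarios) |
Nginx Security Configuration in Practice — Anti-DDoS/Rate Limiting/Writing WAF Rules
| [[network-troubleshooting-order]] | Interplay between firewall/iptables and Nginx security |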