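Jenkins Multi-Master Deployment on K8S with Gateway API
Applies to: a small company's Jenkins service whose job count has grown to the point where the Master needs to be split, deployed on K8S and routed through Gateway API. ⚠️ A single Master is enough for most setups. Tune first, and split only when tuning no longer solves the problem.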
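1. Architecture Overview
Core principles
- The Master only schedules (executors=0) and runs no builds, because builds on the Master add load to it
- All builds run in Agent Pods that are created and destroyed on demand
- Each Master is fully independent and has its own PVC for persistence
- Gateway API is the single entry point and routes by path or hostname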
Architecture components
User request → Gateway (HTTPRoute) → Jenkins Masters (StatefulSet)
                                              ↓
                                   Agent Pods (Kubernetes Plugin)
                                              ↓
                                   K8s Nodes (physical or virtual machines)
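Each Jenkins Master is deployed as its own StatefulSet, together with:
- The Jenkins container (JCasC configuration, plugin management)
- A PVC for persistence (JENKINS_HOME)
- A Service (in-cluster access)
- A Gateway HTTPRoute (external routing rules)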
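2. Prerequisites
2.1 K8s Cluster Requirements
| Component | Required version | Notes |
|---|---|---|
| Kubernetes | ≥ 1.24 | A stable release, e.g. 1.28.15 |
| Gateway API CRD | ≥ 1.0.0 | Verify with kubectl get gatewayclass |
| Gateway controller | Depends on the implementation | Istio, Envoy Gateway, or Kong recommended |
| Helm | ≥ 3.8 | Used for chart-based installs (e.g. VictoriaMetrics in 7.1) |
| cert-manager | ≥ 1.12 | HTTPS certificate management (optional) |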
2.2 Install the Gateway API CRDs
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
# Verify
kubectl get crd | grep gateway
# Expected: gatewayclasses.gateway.networking.k8s.io, gateways.gateway.networking.k8s.io, httproutes.gateway.networking.k8s.io
2.3 Confirm the StorageClass
kubectl get storageclass
# Confirm a default StorageClass exists (its annotations include storageclass.kubernetes.io/is-default-class=true)
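3. Namespace Planning
Give each Jenkins Master a dedicated namespace to isolate resources and permissions:
kubectl create ns jenkins-team-a
kubectl create ns jenkins-team-b
⚠️ When several Masters share one namespace, problems surface over time: Service names collide, RBAC permissions overlap, and resource quotas interfere with each other.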
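One way to make the per-namespace resource isolation concrete is a ResourceQuota in each Master's namespace. The following is only a sketch: the object name and every number in it are illustrative assumptions, not values from this guide, and should be sized against the actual cluster.

```yaml
# Hypothetical quota for the jenkins-team-a namespace; all limits are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: jenkins-team-a-quota   # assumed name
  namespace: jenkins-team-a
spec:
  hard:
    requests.cpu: "4"            # total CPU requests allowed in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "2"  # the Master's JENKINS_HOME PVC plus headroom
```

Once a quota covers CPU and memory, every Pod in the namespace has to declare requests and limits, which the StatefulSet in 4.1 already does.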
4. Deploying the Jenkins Master
4.1 StatefulSet Definition
Use a StatefulSet rather than a Deployment, because Jenkins needs a stable network identity and its own persistent storage:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jenkins-master
  namespace: jenkins-team-a
spec:
  serviceName: jenkins-master-svc
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts-jdk17
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 50000
              name: agent
          env:
            - name: JAVA_OPTS
              value: "-Djenkins.install.runSetupWizard=false"
            - name: CASC_JENKINS_CONFIG
              value: /var/jenkins_home/casc.yaml
          volumeMounts:
            - name: jenkins-home
              mountPath: /var/jenkins_home
            - name: casc-config
              mountPath: /var/jenkins_home/casc.yaml
              subPath: casc.yaml
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
      volumes:
        - name: casc-config
          configMap:
            name: jenkins-casc
  volumeClaimTemplates:
    - metadata:
        name: jenkins-home
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: standard   # replace with a StorageClass confirmed in 2.3
        resources:
          requests:
            storage: 50Gi
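The StatefulSet above defines no health checks. A minimal sketch of probes for the jenkins container follows, assuming the unauthenticated /login page is an acceptable health endpoint; the timing values are illustrative assumptions and should be tuned to how long this Master takes to start.

```yaml
# Fragment: merge into the "jenkins" container of the StatefulSet in 4.1.
startupProbe:            # gives a slow-starting Master up to 30 x 10s before liveness applies
  httpGet:
    path: /login
    port: http
  periodSeconds: 10
  failureThreshold: 30
readinessProbe:          # removes the Pod from Service endpoints while Jenkins is not answering
  httpGet:
    path: /login
    port: http
  periodSeconds: 10
  failureThreshold: 3
livenessProbe:           # restarts the container if the web UI stays unresponsive
  httpGet:
    path: /login
    port: http
  periodSeconds: 10
  failureThreshold: 5
```

The startupProbe is there so that a long plugin-loading phase does not trip the livenessProbe and cause a restart loop.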
4.2 Managing JCasC (Configuration as Code) with a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: jenkins-casc
  namespace: jenkins-team-a
data:
  casc.yaml: |
    jenkins:
      systemMessage: "Jenkins Team A - Managed by JCasC"
      numExecutors: 0 # ⚠️ the Master runs no builds
      mode: NORMAL
    security:
      globalJobDslSecurityConfiguration:
        useScriptSecurity: true
    credentials:
      system:
        domainCredentials:
          - credentials:
              - string:
                  scope: SYSTEM
                  id: "kubernetes-token"
                  secret: "${K8S_TOKEN}"
    unclassified:
      location:
        url: "https://jenkins-team-a.example.com"
      shell:
        shell: "/bin/bash"
    jobs:
      - file: "/var/jenkins_home/init.groovy.d/seed-job.groovy"
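The ${K8S_TOKEN} placeholder in the JCasC file is resolved from the container's environment, but nothing above defines that variable. A sketch of one way to supply it, assuming a Secret named jenkins-k8s-token with a key named token (both names are hypothetical):

```yaml
# Hypothetical Secret that holds the token; the value is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: jenkins-k8s-token      # assumed name
  namespace: jenkins-team-a
type: Opaque
stringData:
  token: "<replace-with-real-token>"
---
# Fragment: append to the env list of the jenkins container in 4.1,
# so JCasC can substitute ${K8S_TOKEN} at load time.
- name: K8S_TOKEN
  valueFrom:
    secretKeyRef:
      name: jenkins-k8s-token
      key: token
```

Keeping the token in a Secret rather than in the ConfigMap means the JCasC file itself can be committed to version control without exposing the credential.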
4.3 Headless Service (for Agent Connections)
apiVersion: v1
kind: Service
metadata:
  name: jenkins-master-svc
  namespace: jenkins-team-a
  labels:
    app: jenkins   # needed so the ServiceMonitor selector in 7.2 can match this Service
spec:
  clusterIP: None # Headless Service
  ports:
    - name: http
      port: 8080
    - name: agent
      port: 50000
  selector:
    app: jenkins
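The Headless Service lets Jenkins Agents connect straight to port 50000 on the Master through the Pod DNS name (jenkins-master-0.jenkins-master-svc.jenkins-team-a.svc.cluster.local). DNS resolves to the Pod IP, so traffic skips the ClusterIP virtual IP and the extra kube-proxy hop behind it.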
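The Pod DNS name only helps if the Kubernetes Plugin hands it to the agents. A sketch of the two cloud-level settings that do this, assuming the plugin's jenkinsUrl and jenkinsTunnel fields and the cloud named "kubernetes" from 6.1:

```yaml
# Fragment: merge into the kubernetes cloud definition in 6.1.
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        # HTTP endpoint agents use to reach the Master inside the cluster
        jenkinsUrl: "http://jenkins-master-0.jenkins-master-svc.jenkins-team-a.svc.cluster.local:8080"
        # host:port of the inbound agent listener (no scheme)
        jenkinsTunnel: "jenkins-master-0.jenkins-master-svc.jenkins-team-a.svc.cluster.local:50000"
```

Without these, agents would typically fall back to the external URL from the location setting, which sends agent traffic out through the Gateway instead of straight to the Pod.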
5. Gateway API Configuration
5.1 GatewayClass and Gateway
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: internal-gateway
spec:
  controllerName: istio.io/gateway-controller # for Envoy Gateway: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: jenkins-gateway
  namespace: jenkins-team-a
spec:
  gatewayClassName: internal-gateway
  listeners:
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: jenkins-tls
5.2 HTTPRoute Configuration
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: jenkins-team-a-route
  namespace: jenkins-team-a
spec:
  parentRefs:
    - name: jenkins-gateway
  hostnames:
    - "jenkins-team-a.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: jenkins-master-svc
          port: 8080
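Multi-Master setup: each Master gets its own namespace, its own HTTPRoute, and its own hostname (e.g. jenkins-team-a.example.com, jenkins-team-b.example.com), and the Gateway routes on hostnames. The Gateway in 5.1 lives in jenkins-team-a, and by default a listener accepts routes only from its own namespace. A route in jenkins-team-b therefore attaches only if the listener's allowedRoutes admits that namespace and the route's parentRefs names the Gateway's namespace, or if each namespace runs its own Gateway. The TLS certificate must also cover every hostname served by the listener.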
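A sketch of the shared-Gateway variant follows, assuming a namespace label jenkins-gateway-access=true (a hypothetical label) marks the namespaces allowed to attach routes. Because a Selector replaces the default same-namespace rule, jenkins-team-a needs the label too.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: jenkins-gateway
  namespace: jenkins-team-a
spec:
  gatewayClassName: internal-gateway
  listeners:
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: jenkins-tls            # must cover every hostname routed through this listener
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              jenkins-gateway-access: "true"   # assumed label, set on each team namespace
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: jenkins-team-b-route
  namespace: jenkins-team-b
spec:
  parentRefs:
    - name: jenkins-gateway
      namespace: jenkins-team-a          # cross-namespace reference to the shared Gateway
  hostnames:
    - "jenkins-team-b.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: jenkins-master-svc       # the Service in jenkins-team-b
          port: 8080
```

The namespaces would be labeled with, for example, kubectl label ns jenkins-team-b jenkins-gateway-access=true. A Gateway per namespace avoids the shared certificate and the label bookkeeping, at the cost of one more load balancer per team on most implementations.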
5.3 cert-manager TLS Certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: jenkins-tls
  namespace: jenkins-team-a
spec:
  secretName: jenkins-tls
  dnsNames:
    - "jenkins-team-a.example.com"
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
6. Dynamic Agent Scaling
6.1 Kubernetes Plugin Configuration (JCasC)
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        serverUrl: "https://kubernetes.default.svc.cluster.local"
        namespace: "jenkins-agents"   # must exist; it is not created in section 3
        skipTlsVerify: false
        templates:
          - name: "default-agent"
            label: "default"
            nodeUsageMode: NORMAL
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:latest-jdk17"
                resourceRequestCpu: "500m"
                resourceRequestMemory: "512Mi"
                resourceLimitCpu: "1"
                resourceLimitMemory: "1Gi"
              - name: "docker"
                image: "docker:24.0-cli"
                command: "cat"
                ttyEnabled: true
                resourceRequestCpu: "100m"
                resourceRequestMemory: "128Mi"
            podRetention: "Never"
            idleMinutes: 5
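For the Master to create Agent Pods in jenkins-agents, the identity it runs under needs permissions in that namespace. A minimal sketch, assuming a ServiceAccount named jenkins in jenkins-team-a (a hypothetical name; the StatefulSet in 4.1 would also need serviceAccountName: jenkins to use it):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins                    # assumed name
  namespace: jenkins-team-a
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-agent-manager      # assumed name
  namespace: jenkins-agents
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]       # used when pipeline steps run in non-jnlp containers
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins-agent-manager
  namespace: jenkins-agents
subjects:
  - kind: ServiceAccount
    name: jenkins
    namespace: jenkins-team-a      # the Master's namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jenkins-agent-manager
```

A namespaced Role rather than a ClusterRole keeps each Master confined to its agent namespace. If every team should be isolated, a per-team agent namespace with its own Role and RoleBinding follows the same pattern.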
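6.2 Resource Limit Reference
| Agent tier | CPU request (cores) | Memory request | CPU limit (cores) | Memory limit | Typical workload |
|---|---|---|---|---|---|
| Light | 0.5 | 512Mi | 1 | 1Gi | Frontend builds, script execution |
| Medium | 1 | 1Gi | 2 | 2Gi | Java compilation, backend packaging |
| Heavy | 2 | 2Gi | 4 | 4Gi | Image builds, integration tests |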
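As an illustration of how a row in this table maps onto a pod template, here is a sketch of the medium tier. The template name and label are hypothetical, and since JCasC applies the cloud definition as a whole, the entry would be added to the templates list in 6.1 rather than loaded on its own.

```yaml
# Fragment: an additional entry for the templates list in 6.1 (medium tier).
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        templates:
          - name: "java-agent"          # assumed name
            label: "java"               # assumed label; pipelines select it with agent { label 'java' }
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:latest-jdk17"
                resourceRequestCpu: "1"
                resourceRequestMemory: "1Gi"
                resourceLimitCpu: "2"
                resourceLimitMemory: "2Gi"
```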
7. Monitoring (VictoriaMetrics)
7.1 Deploy VictoriaMetrics
helm repo add vm https://victoriametrics.github.io/helm-charts
helm upgrade --install victoria-metrics vm/victoria-metrics-single \
-n monitoring --create-namespace
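7.2 Expose the Metrics Endpoint
Install the Metrics Plugin and the Prometheus Plugin in Jenkins, then configure a ServiceMonitor. ServiceMonitor is a Prometheus Operator CRD, so the cluster needs that CRD plus a component that acts on it; the victoria-metrics-single chart from 7.1 does not do this by itself. The selector matches labels on the Service, not on the Pod, so the Service from 4.3 must carry the label app: jenkins: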
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jenkins-monitor
  namespace: jenkins-team-a
spec:
  endpoints:
    - interval: 15s
      path: "/prometheus"
      port: "http"
  selector:
    matchLabels:
      app: jenkins
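If the cluster runs the VictoriaMetrics operator instead of the Prometheus Operator, the same scrape target can be declared with the operator's own CRD. This sketch assumes the operator and a vmagent are installed, which the single-node chart from 7.1 does not provide:

```yaml
# Equivalent scrape definition for the VictoriaMetrics operator (assumed to be installed).
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: jenkins-monitor
  namespace: jenkins-team-a
spec:
  endpoints:
    - port: http            # named port on the Service from 4.3
      path: /prometheus
      interval: 15s
  selector:
    matchLabels:
      app: jenkins          # matches Service labels, not Pod labels
```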
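7.3 Core Metrics
| Metric | Meaning | Alert threshold |
|---|---|---|
| jenkins_queue_size | Build queue length | > 10 for 5 min |
| jenkins_node_online_count | Online Agents | = 0 for 5 min |
| jenkins_executor_count | Total executors | n/a |
| jenkins_executor_in_use_count | Executors in use | Close to the total |
| jenkins_job_duration | Job duration | p99 > 30 min |
| jenkins_job_success_rate | Job success rate | < 95% |
| jenkins_node_disk | Master disk usage | > 85% |
Exact metric names depend on the Prometheus Plugin version and its configured namespace prefix, so confirm them against the /prometheus output before writing alerts.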
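The thresholds with an explicit duration translate directly into alerting rules. Below is a sketch in the Prometheus rule-file format, which vmalert also reads. It uses the metric names exactly as listed in the table, which is an assumption to verify against the real /prometheus output; the group name, alert names, and severity labels are hypothetical.

```yaml
groups:
  - name: jenkins-team-a            # assumed group name
    rules:
      - alert: JenkinsQueueBacklog
        expr: jenkins_queue_size > 10
        for: 5m                     # "> 10 for 5 min" from the table
        labels:
          severity: warning
        annotations:
          summary: "Jenkins build queue has stayed above 10 for 5 minutes"
      - alert: JenkinsNoAgentsOnline
        expr: jenkins_node_online_count == 0
        for: 5m                     # "= 0 for 5 min" from the table
        labels:
          severity: critical
        annotations:
          summary: "No Jenkins agents have been online for 5 minutes"
```

The remaining rows are left out because their units (ratio versus percent, seconds versus milliseconds) are not stated in the table and would have to be checked first.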
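7.4 Grafana Dashboards
Import a Jenkins Performance Dashboard or build your own, focusing on:
- Queue backlog trend: are there too few Agents
- Job duration distribution: which Jobs are getting slower
- Resource utilization: Master and Agent CPU and memory
- Success rate trend: are failures increasing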
8. Operations Runbook
8.1 Backup and Restore
Option 1: PVC snapshot (recommended)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: jenkins-backup-20260430
  namespace: jenkins-team-a
spec:
  volumeSnapshotClassName: csi-hostpath-snap
  source:
    persistentVolumeClaimName: jenkins-home-jenkins-master-0
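Option 2: tar over SSH to external storage
# Run from a workstation with kubectl access; tar runs inside the Master Pod and streams the archive out
kubectl exec -n jenkins-team-a jenkins-master-0 -- \
  tar czf - /var/jenkins_home | \
  ssh backup-server "cat > /backups/jenkins/jenkins-team-a-$(date +%Y%m%d).tar.gz"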
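Restore steps:
- Stop the Jenkins StatefulSet (replicas=0)
- Delete the old PVC (if the PV is retained, deleting the PVC alone is enough)
- Restore the PVC from the snapshot, or copy the data back from the external backup
- Scale the StatefulSet back up (replicas=1)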
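For the snapshot path, the restore step can be sketched as a PVC that names the snapshot as its data source. The claim reuses the name the StatefulSet expects (jenkins-home-jenkins-master-0) so that the Pod binds to it when scaled back up. The storageClassName is an assumption and has to be backed by the same CSI driver that took the snapshot.

```yaml
# Create after the old PVC is deleted and before the StatefulSet is scaled back up.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-jenkins-master-0   # <volumeClaimTemplate name>-<pod name>
  namespace: jenkins-team-a
spec:
  storageClassName: standard            # assumed; must support restore from VolumeSnapshot
  dataSource:
    name: jenkins-backup-20260430       # the VolumeSnapshot from Option 1
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 50Gi                     # at least the size of the snapshotted volume
```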
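8.2 Upgrade Strategy
# 1. Back up JENKINS_HOME
# 2. Roll out the new image. Pin an explicit version tag (<new-version> is a placeholder): re-applying the floating tag already in the spec (lts-jdk17) changes nothing and triggers no rollout
kubectl set image -n jenkins-team-a sts/jenkins-master jenkins=jenkins/jenkins:<new-version>-lts-jdk17
# 3. Watch the startup log
kubectl logs -n jenkins-team-a -l app=jenkins --tail=100 -f
# 4. Verify the running version, reported in the X-Jenkins response header
kubectl exec -n jenkins-team-a jenkins-master-0 -- curl -sI http://localhost:8080/login
Upgrade order: plugins first, then the Jenkins version. After the upgrade, run a round of builds to verify.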
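8.3 Common Problems
| Problem | What to check | Fix |
|---|---|---|
| Agent cannot connect to the Master | Network path from Agent to Master port 50000 is blocked | Check Headless Service DNS and NetworkPolicy |
| PVC is full | kubectl exec -n jenkins-team-a jenkins-master-0 -- df -h /var/jenkins_home | Clean up build logs or expand the PVC by editing spec.resources.requests.storage on the PVC (some StorageClasses support online expansion) |
| Gateway route has no effect | HTTP 404/503 | kubectl describe httproute + kubectl describe gateway + check the gateway controller logs |
| JCasC configuration has no effect | Configuration not loaded after the Pod starts | Check the CASC plugin log with kubectl logs; validate the ConfigMap YAML indentation |
| Build stays Pending | Agent Pod creation failed or resources are insufficient | kubectl describe pod <agent-pod> to read Events + kubectl top nodes |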
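For the first row, when NetworkPolicy is enforced in the cluster, a policy along these lines would admit agent traffic to the Master. It is a sketch that assumes agents run in the jenkins-agents namespace from 6.1 and relies on the kubernetes.io/metadata.name label that Kubernetes sets on namespaces; the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-agents-to-master     # assumed name
  namespace: jenkins-team-a
spec:
  podSelector:
    matchLabels:
      app: jenkins                 # applies to the Master Pod
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: jenkins-agents
      ports:
        - protocol: TCP
          port: 8080               # agent bootstrap over HTTP
        - protocol: TCP
          port: 50000              # inbound agent channel
```

Once any Ingress policy selects the Master Pod, all other inbound traffic to it is denied, so a second rule admitting the gateway's data-plane Pods on port 8080 is needed to keep the UI reachable.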
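Related Pages
| Page | Relevance |
|---|---|
| k8s-statefulset-guide | How StatefulSets work; the Jenkins Master uses a StatefulSet with a PVC template |
| k8s-persistent-storage-guide | PV/PVC/StorageClass storage mechanics; persistence of JENKINS_HOME |
| k8s-probes-guide | Probe configuration for Jenkins Master health checks |
| k8s-resource-limits-configuration | K8s resource limit configuration; reference for Agent Pod requests and limits |
| service-troubleshooting | Service/Ingress troubleshooting; Gateway API diagnosis |
| server-security-hardening-checklist | Infrastructure security baseline; use of Gateway TLS certificates |
| jenkins-ansible-integration-guide | Hands-on Jenkins + Ansible integration guide: Ubuntu 24.04 environment setup, plugin configuration, |
| k8s-multicluster-istio-canary | Production guide to K8s multi-cluster + Istio canary releases and traffic governance: global active-active architecture, five-layer governance design, Cana |
| k8s-cicd-architecture-guide | End-to-end K8s CI/CD architecture (Jenkins / Argo CD / Helm), with Jenkins as the core CI component |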