Jenkins Multi-Master Architecture Deployment Plan: K8S + Gateway API

📅 Created 2026-05-12 🔄 Updated 2026-05-12 📝 1189 characters

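Use case: a small company's Jenkins service whose job count keeps growing, so the Master needs to be split; deployed on K8S and routed through Gateway API. ⚠️ A single Master is enough for most scenarios: if tuning solves the problem, do not split; split only when tuning cannot.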

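1. Architecture Overview

Core principles

  • The Master only schedules (executors=0) and runs no builds; running builds on it would add load to the Master
  • All builds run in Agent Pods, which are created and destroyed on demand
  • Each Master is fully independent and has its own PVC for persistence
  • Gateway API provides a single entry point and routes by path or hostname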

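Architecture components

User request → Gateway (HTTPRoute) → Jenkins Masters (StatefulSet)
                                            ↓
                                     Agent Pods (Kubernetes Plugin)
                                            ↓
                                     K8s Node (physical/virtual machine)

Each Jenkins Master is deployed as its own StatefulSet, which comprises:

  • The Jenkins container (JCasC configuration, plugin management)
  • A PVC for persistence (JENKINS_HOME)
  • A Service (in-cluster access)
  • A Gateway HTTPRoute (external routing rules)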

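2. Prerequisites

2.1 K8s cluster requirements

Component | Version | Notes
Kubernetes | ≥ 1.24 | A stable release, e.g. 1.28.15
Gateway API CRD | ≥ 1.0.0 | Verify with kubectl get gatewayclass
Gateway controller | Depends on the implementation | Istio / Envoy Gateway / Kong recommended
Helm | ≥ 3.8 | Used to deploy Jenkins
cert-manager | ≥ 1.12 | HTTPS certificate management (optional)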

2.2 Install the Gateway API CRDs

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml

# Verify
kubectl get crd | grep gateway
# Expected: gatewayclasses.gateway.networking.k8s.io, gateways.gateway.networking.k8s.io, httproutes.gateway.networking.k8s.io

2.3 Confirm the StorageClass

kubectl get storageclass
# Confirm that a default StorageClass exists (annotation storageclass.kubernetes.io/is-default-class=true)

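3. Namespace Planning

Give each Jenkins Master a dedicated namespace to isolate resources and permissions:

kubectl create ns jenkins-team-a
kubectl create ns jenkins-team-b
# The Agent Pods in section 6.1 are scheduled into this namespace
kubectl create ns jenkins-agents

⚠️ If several Masters share one namespace, problems such as Service name collisions, overlapping RBAC permissions, and resource quotas affecting each other gradually surface.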
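
One way to make the per-namespace isolation concrete is a ResourceQuota in each team namespace. The following is a minimal sketch, not a sizing recommendation: the object name jenkins-team-a-quota and all numbers are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: jenkins-team-a-quota    # hypothetical name
  namespace: jenkins-team-a
spec:
  hard:
    # Illustrative ceilings; size them to the Master defined in 4.1
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "2"
```

Once a quota on requests and limits is active, every Pod in the namespace has to declare both, which the StatefulSet in 4.1 already does.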


4. Deploy the Jenkins Master

4.1 StatefulSet definition

Use a StatefulSet rather than a Deployment, because Jenkins needs a stable network identity and its own persistent storage.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jenkins-master
  namespace: jenkins-team-a
spec:
  serviceName: jenkins-master-svc
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
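    spec:
      securityContext:
        fsGroup: 1000    # GID of the jenkins user in the official image, so it can write to the PVC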
      containers:
      - name: jenkins
        image: jenkins/jenkins:lts-jdk17
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 50000
          name: agent
        env:
        - name: JAVA_OPTS
          value: "-Djenkins.install.runSetupWizard=false"
        - name: CASC_JENKINS_CONFIG
          value: /var/jenkins_home/casc.yaml
        volumeMounts:
        - name: jenkins-home
          mountPath: /var/jenkins_home
        - name: casc-config
          mountPath: /var/jenkins_home/casc.yaml
          subPath: casc.yaml
        resources:
          requests:
            cpu: "1"
            memory: "2Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
      volumes:
      - name: casc-config
        configMap:
          name: jenkins-casc
  volumeClaimTemplates:
  - metadata:
      name: jenkins-home
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard    # replace with your StorageClass, or omit this line to use the default
      resources:
        requests:
          storage: 50Gi

4.2 Manage JCasC (Configuration as Code) with a ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: jenkins-casc
  namespace: jenkins-team-a
data:
  casc.yaml: |
    jenkins:
      systemMessage: "Jenkins Team A - Managed by JCasC"
      numExecutors: 0    # ⚠️ the Master runs no builds
      mode: NORMAL
    security:
      globalJobDslSecurityConfiguration:
        useScriptSecurity: true    # requires the Job DSL plugin
    credentials:
      system:
        domainCredentials:
          - credentials:
              - string:
                  scope: SYSTEM
                  id: "kubernetes-token"
                  secret: "${K8S_TOKEN}"    # K8S_TOKEN must be injected into the container, e.g. as an env var from a Secret
    unclassified:
      location:
        url: "https://jenkins-team-a.example.com"
      shell:
        shell: "/bin/bash"
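    jobs:
      # Requires the Job DSL plugin. The path is a placeholder; keep Job DSL scripts
      # out of init.groovy.d, whose scripts Jenkins runs as init hooks at startup.
      - file: "/var/jenkins_home/jobdsl/seed-job.groovy"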

4.3 Headless Service (for Agent connections)

apiVersion: v1
kind: Service
metadata:
  name: jenkins-master-svc
  namespace: jenkins-team-a
  labels:
    app: jenkins    # matched by the ServiceMonitor selector in section 7.2
spec:
  clusterIP: None    # Headless Service
  ports:
  - name: http
    port: 8080
  - name: agent
    port: 50000
  selector:
    app: jenkins

The Headless Service lets Jenkins Agents reach port 50000 on the Master directly through the Pod DNS name (jenkins-master-0.jenkins-master-svc.jenkins-team-a.svc.cluster.local), avoiding the extra network hop through kube-proxy.
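
To make Agents actually use that Pod DNS name, the Kubernetes Plugin cloud can be pointed at it explicitly. A minimal JCasC sketch, assuming the cloud is named "kubernetes" as in section 6.1 and that Agents run in a different namespace than the Master, so the fully qualified name is needed:

```yaml
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        # HTTP endpoint the Agent uses to fetch its connection details
        jenkinsUrl: "http://jenkins-master-0.jenkins-master-svc.jenkins-team-a.svc.cluster.local:8080"
        # host:port of the inbound Agent listener (no URL scheme)
        jenkinsTunnel: "jenkins-master-0.jenkins-master-svc.jenkins-team-a.svc.cluster.local:50000"
```

These two keys belong in the same cloud entry as the templates from 6.1 rather than in a second cloud definition.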


5. Gateway API Configuration

5.1 GatewayClass and Gateway

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: internal-gateway
spec:
  controllerName: istio.io/gateway-controller  # for Envoy Gateway: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: jenkins-gateway
  namespace: jenkins-team-a
spec:
  gatewayClassName: internal-gateway
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: jenkins-tls

5.2 HTTPRoute routing rules

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: jenkins-team-a-route
  namespace: jenkins-team-a
spec:
  parentRefs:
  - name: jenkins-gateway
  hostnames:
  - "jenkins-team-a.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: jenkins-master-svc
      port: 8080

Multi-Master scenario: each Master gets its own namespace, its own HTTPRoute, and its own hostname (e.g. jenkins-team-a.example.com, jenkins-team-b.example.com); the Gateway routes on hostnames.
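
Because the Gateway in 5.1 lives in jenkins-team-a, a route from another namespace only attaches if the listener allows it and the route names the Gateway's namespace. The sketch below shows one way to wire up a second Master. It assumes the team-b namespace contains a Service with the same name (jenkins-master-svc) and that the jenkins-tls certificate also covers jenkins-team-b.example.com; neither is stated in the original setup.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: jenkins-gateway
  namespace: jenkins-team-a
spec:
  gatewayClassName: internal-gateway
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: jenkins-tls
    allowedRoutes:
      namespaces:
        from: Selector    # the default is Same, which rejects team-b routes
        selector:
          matchExpressions:
          - key: kubernetes.io/metadata.name    # label set automatically on namespaces
            operator: In
            values: ["jenkins-team-a", "jenkins-team-b"]
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: jenkins-team-b-route
  namespace: jenkins-team-b
spec:
  parentRefs:
  - name: jenkins-gateway
    namespace: jenkins-team-a    # required when the Gateway is in another namespace
  hostnames:
  - "jenkins-team-b.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: jenkins-master-svc
      port: 8080
```

An alternative is to move the shared Gateway into a neutral namespace so that no team owns the entry point.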

5.3 TLS certificate with cert-manager

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: jenkins-tls
  namespace: jenkins-team-a
spec:
  secretName: jenkins-tls
  dnsNames:
  - "jenkins-team-a.example.com"
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

6. Dynamic Agent Scaling

6.1 Kubernetes Plugin configuration (JCasC)

jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        serverUrl: "https://kubernetes.default.svc.cluster.local"
        namespace: "jenkins-agents"
        skipTlsVerify: false
        templates:
          - name: "default-agent"
            label: "default"
            nodeUsageMode: NORMAL
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:latest-jdk17"
                resourceRequestCpu: "500m"
                resourceRequestMemory: "512Mi"
                resourceLimitCpu: "1"
                resourceLimitMemory: "1Gi"
              - name: "docker"
                image: "docker:24.0-cli"
                command: "cat"
                ttyEnabled: true
                resourceRequestCpu: "100m"
                resourceRequestMemory: "128Mi"
            podRetention: "Never"
            idleMinutes: 5

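6.2 Resource limit reference

Agent type | CPU Request | Memory Request | CPU Limit | Memory Limit | Use cases
Lightweight | 0.5 | 512Mi | 1 | 1Gi | Frontend builds, script execution
Medium | 1 | 1Gi | 2 | 2Gi | Java compilation, backend packaging
Heavy | 2 | 2Gi | 4 | 4Gi | Image builds, integration tests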
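
As an illustration of how a row of this table maps onto a pod template, here is a sketch of the medium tier as a JCasC entry. The template name medium-agent and the label medium are hypothetical; the entry is meant to be appended to the templates list of the existing cloud in 6.1, not loaded as a separate cloud.

```yaml
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        templates:
          - name: "medium-agent"    # hypothetical template name
            label: "medium"         # pipelines request this tier by label
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:latest-jdk17"
                resourceRequestCpu: "1"
                resourceRequestMemory: "1Gi"
                resourceLimitCpu: "2"
                resourceLimitMemory: "2Gi"
```

A pipeline would then select this tier with an agent label such as medium, leaving the default template for lightweight jobs.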

7. Monitoring (VictoriaMetrics)

7.1 Deploy VictoriaMetrics

helm repo add vm https://victoriametrics.github.io/helm-charts
helm upgrade --install victoria-metrics vm/victoria-metrics-single \
  -n monitoring --create-namespace

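7.2 Expose the metrics endpoint

Install the Metrics Plugin and the Prometheus Plugin in Jenkins, then configure a ServiceMonitor. ServiceMonitor is a Prometheus Operator CRD, so this step assumes the CRD is installed and that something consumes it (for example the VictoriaMetrics operator, which can convert ServiceMonitor objects); the victoria-metrics-single chart alone does not read it. The selector matches labels on the Service object, not the Pod, so the Service from 4.3 must carry the app: jenkins label.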

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jenkins-monitor
  namespace: jenkins-team-a
spec:
  endpoints:
  - interval: 15s
    path: "/prometheus"
    port: "http"
  selector:
    matchLabels:
      app: jenkins

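7.3 Core metrics

Metric | Meaning | Alert threshold
jenkins_queue_size | Build queue length | > 10 for 5 min
jenkins_node_online_count | Online Agents | = 0 for 5 min
jenkins_executor_count | Total executors | -
jenkins_executor_in_use_count | Executors in use | Approaching the total
jenkins_job_duration | Job duration | p99 > 30 min
jenkins_job_success_rate | Job success rate | < 95%
jenkins_node_disk | Master disk usage | > 85%

Exact metric names depend on the Prometheus Plugin version and may carry a suffix such as _value; check the /prometheus output of your own instance before writing alert rules.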
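
The first two thresholds in the table could be expressed as alert rules along these lines. This is a sketch that assumes the Prometheus Operator PrometheusRule CRD is available and uses the metric names exactly as listed above; the rule and alert names are hypothetical, and the expressions need adjusting if your instance exposes different metric names.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: jenkins-alerts    # hypothetical name
  namespace: jenkins-team-a
spec:
  groups:
  - name: jenkins
    rules:
    - alert: JenkinsQueueBacklog
      expr: jenkins_queue_size > 10    # metric name as listed in 7.3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Jenkins build queue above 10 for 5 minutes"
    - alert: JenkinsNoAgentsOnline
      expr: jenkins_node_online_count == 0    # metric name as listed in 7.3
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "No Jenkins Agents online for 5 minutes"
```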

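7.4 Grafana dashboards

Import the Jenkins Performance Dashboard or build your own, focusing on:

  • Queue backlog trend: are there too few Agents?
  • Job duration distribution: which Jobs are getting slower?
  • Resource usage: Master/Agent CPU and memory
  • Success rate trend: are failures on the rise?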

8. Operations Runbook

8.1 Backup and restore

Option 1: PVC snapshot (recommended)

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: jenkins-backup-20260430
  namespace: jenkins-team-a
spec:
  volumeSnapshotClassName: csi-hostpath-snap
  source:
    persistentVolumeClaimName: jenkins-home-jenkins-master-0

Option 2: stream a tar archive to external storage

# Run from a workstation with kubectl and SSH access (not inside the Pod)
kubectl exec -n jenkins-team-a jenkins-master-0 -- \
  tar czf - /var/jenkins_home | \
  ssh backup-server "cat > /backups/jenkins/jenkins-team-a-$(date +%Y%m%d).tar.gz"

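Restore steps:

  1. Stop the Jenkins StatefulSet (replicas=0)
  2. Delete the old PVC (if the PV reclaim policy is Retain, the PV and its data survive the PVC deletion)
  3. Restore the PVC from the snapshot, or copy the archived data back
  4. Scale the StatefulSet back up (replicas=1)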
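
Step 3 can be sketched as a PVC that uses the snapshot from option 1 as its data source. It assumes the standard StorageClass from 4.1 is backed by the same CSI driver that took the snapshot. The PVC name must match what the StatefulSet expects (jenkins-home-jenkins-master-0) so that the Pod binds to it when scaled back up.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-jenkins-master-0    # name derived from the volumeClaimTemplate
  namespace: jenkins-team-a
spec:
  storageClassName: standard
  accessModes: [ "ReadWriteOnce" ]
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: jenkins-backup-20260430    # the snapshot created in option 1
  resources:
    requests:
      storage: 50Gi    # must be at least the size of the snapshot source
```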

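8.2 Upgrade strategy

# 1. Back up JENKINS_HOME
# 2. Switch to the new image. Pin a versioned tag: re-applying the unchanged
#    lts-jdk17 tag does not trigger a rollout. NEW_VERSION is a placeholder
#    that you export beforehand.
kubectl set image -n jenkins-team-a sts/jenkins-master jenkins=jenkins/jenkins:${NEW_VERSION}-lts-jdk17

# 3. Watch the startup log
kubectl logs -n jenkins-team-a -l app=jenkins --tail=100 -f

# 4. Verify the version shipped in the running image
kubectl exec -n jenkins-team-a jenkins-master-0 -- java -jar /usr/share/jenkins/jenkins.war --version

Upgrade order: plugins first, then the Jenkins version. After the upgrade, run a round of builds to verify.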

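8.3 Common issues

Problem | Where to look | Resolution
Agent cannot connect to the Master | Agent → Master port 50000 is unreachable | Check the Headless Service DNS and NetworkPolicy
PVC full | kubectl exec df -h | Clean up build logs or expand the PVC: edit the PVC's spec.resources.requests.storage (some StorageClasses support online expansion)
Gateway route not taking effect | HTTP 404/503 | kubectl describe httproute + kubectl describe gateway + check the gateway controller logs
JCasC configuration not applied | Configuration not loaded after the Pod starts | Check the CASC plugin log with kubectl logs; validate the ConfigMap YAML indentation
Builds stay Pending | Agent Pod creation failed or resources are insufficient | kubectl describe pod <agent-pod> to view Events + kubectl top nodes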

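Related pages

Page | Relation
k8s-statefulset-guide | StatefulSet internals; the Jenkins Master uses StatefulSet + PVC template
k8s-persistent-storage-guide | PV/PVC/StorageClass storage mechanics; JENKINS_HOME persistence
k8s-probes-guide | Probe configuration for Jenkins Master health checks
k8s-resource-limits-configuration | K8s resource limit configuration; reference for Agent Pod requests/limits
service-troubleshooting | Service/Ingress troubleshooting; Gateway API diagnosis
server-security-hardening-checklist | Infrastructure security baseline; Gateway TLS certificate usage
jenkins-ansible-integration-guide | Jenkins + Ansible integration hands-on guide: Ubuntu 24.04 environment installation, plugin configuration,
k8s-multicluster-istio-canary | K8s multi-cluster + Istio canary release and traffic governance production guide: global active-active architecture, five-layer governance design, Cana
k8s-cicd-architecture-guide | K8s CI/CD end-to-end architecture (Jenkins / Argo CD / Helm); Jenkins as the core CI component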