草庐IT

calico-kube-controllers 启动失败处理

greenery 2024-03-22 原文

故障描述

calico-kube-controllers 异常,不断重启

日志信息如下

2023-02-21 01:26:47.085 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0221 01:26:47.086980       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-02-21 01:26:47.087 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2023-02-21 01:26:47.106 [INFO][1] main.go 153: Getting initial config snapshot from datastore
2023-02-21 01:26:47.120 [INFO][1] main.go 156: Got initial config snapshot
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 89: Start called
2023-02-21 01:26:47.120 [INFO][1] main.go 173: Starting status report routine
2023-02-21 01:26:47.120 [INFO][1] main.go 182: Starting Prometheus metrics server on port 9094
2023-02-21 01:26:47.120 [INFO][1] main.go 418: Starting controller ControllerType="Node"
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2023-02-21 01:26:47.120 [INFO][1] node_syncer.go 65: Node controller syncer status updated: wait-for-ready
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2023-02-21 01:26:47.120 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2023-02-21 01:26:47.120 [INFO][1] node_controller.go 143: Starting Node controller
2023-02-21 01:26:47.121 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-02-21 01:26:47.121 [INFO][1] resources.go 349: Main client watcher loop
2023-02-21 01:26:47.121 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.121 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.121 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.121 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.124 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/ipam/v2/assignment/"
2023-02-21 01:26:47.125 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2023-02-21 01:26:47.125 [INFO][1] node_syncer.go 65: Node controller syncer status updated: resync
2023-02-21 01:26:47.125 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-02-21 01:26:47.125 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.125 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.129 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-02-21 01:26:47.129 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.129 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2023-02-21 01:26:47.129 [INFO][1] node_syncer.go 65: Node controller syncer status updated: in-sync
2023-02-21 01:26:47.137 [INFO][1] hostendpoints.go 90: successfully synced all hostendpoints
2023-02-21 01:26:47.221 [INFO][1] node_controller.go 159: Node controller is now running
2023-02-21 01:26:47.226 [INFO][1] ipam.go 69: Synchronizing IPAM data
2023-02-21 01:26:47.236 [INFO][1] ipam.go 78: Node and IPAM data is in sync

定位问题在这里

Failed to write status error=open /status/status.json: permission denied

进入容器检查目录

尝试进入容器,但是该容器居然没 cat , ls 等常规命令,无法查看容器问题

检查配置

查看pod的配置,对比其它集群,没任何问题,一样的

[grg@i-A8259010 ~]$ kubectl describe pod calico-kube-controllers-9f49b98f6-njs2f -n kube-system
Name:                 calico-kube-controllers-9f49b98f6-njs2f
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 10.254.39.2/10.254.39.2
Start Time:           Thu, 16 Feb 2023 11:14:35 +0800
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=9f49b98f6
Annotations:          cni.projectcalico.org/podIP: 10.244.29.73/32
                      cni.projectcalico.org/podIPs: 10.244.29.73/32
Status:               Running
IP:                   10.244.29.73
IPs:
  IP:           10.244.29.73
Controlled By:  ReplicaSet/calico-kube-controllers-9f49b98f6
Containers:
  calico-kube-controllers:
    Container ID:   docker://21594e3517a3fc8ffc5224496cec373117138acf5417d9a335a1c5e80e0c3802
    Image:          registry.custom.local:12480/kubeadm-ha/calico_kube-controllers:v3.19.1
    Image ID:       docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:2ff71ba65cd7fe10e183ad80725ad3eafb59899d6f1b2610446b90c84bf2425a
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 21 Feb 2023 09:34:06 +0800
      Finished:     Tue, 21 Feb 2023 09:35:15 +0800
    Ready:          False
    Restart Count:  1940
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-55jbn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-55jbn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                        From     Message
  ----     ------     ----                       ----     -------
  Warning  Unhealthy  31m (x15164 over 4d22h)    kubelet  Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
  Warning  BackOff    6m23s (x23547 over 4d22h)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  79s (x11571 over 4d22h)    kubelet  Liveness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input

对比镜像

检查镜像版本,与其它集群一致,没问题

Image:          registry.custom.local:12480/kubeadm-ha/calico_kube-controllers:v3.19.1    
Image ID:       docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:2ff71ba65cd7fe10e183ad80725ad3eafb59899d6f1b2610446b90c84bf2425a

检查其余集群配置差异

检查与其它集群的配置信息,该机器的 docker 是原来已经安装的,版本是 19,其它机器是新安装的版本 20 。

处理方案

在无法重装 docker 的情况下

重启 pod,无效

百度,无相关信息

调整 calico-kube-controllers 配置

配置文件在 /etc/kubernetes/plugins/network-plugin/calico-typha.yaml

我们针对无法写入目录 /status ,添加卷映射

应用配置

mkdir /var/run/calico/status
chmod 777/var/run/calico/status
kubectl apply -f  /etc/kubernetes/plugins/network-plugin/calico-typha.yaml

到此系统恢复

有关calico-kube-controllers 启动失败处理的更多相关文章

  1. ruby - 如何指定 Rack 处理程序 - 2

    Rackup通过Rack的默认处理程序成功运行任何Rack应用程序。例如:classRackAppdefcall(environment)['200',{'Content-Type'=>'text/html'},["Helloworld"]]endendrunRackApp.new但是当最后一行更改为使用Rack的内置CGI处理程序时,rackup给出“NoMethodErrorat/undefinedmethod`call'fornil:NilClass”:Rack::Handler::CGI.runRackApp.newRack的其他内置处理程序也提出了同样的反对意见。例如Rack

  2. ruby-on-rails - 渲染另一个 Controller 的 View - 2

    我想要做的是有2个不同的Controller,client和test_client。客户端Controller已经构建,我想创建一个test_clientController,我可以使用它来玩弄客户端的UI并根据需要进行调整。我主要是想绕过我在客户端中内置的验证及其对加载数据的管理Controller的依赖。所以我希望test_clientController加载示例数据集,然后呈现客户端Controller的索引View,以便我可以调整客户端UI。就是这样。我在test_clients索引方法中试过这个:classTestClientdefindexrender:template=>

  3. ruby-on-rails - Rails 应用程序中的 Rails : How are you using application_controller. rb 是新手吗? - 2

    刚入门rails,开始慢慢理解。有人可以解释或给我一些关于在application_controller中编码的好处或时间和原因的想法吗?有哪些用例。您如何为Rails应用程序使用应用程序Controller?我不想在那里放太多代码,因为据我了解,每个请求都会调用此Controller。这是真的? 最佳答案 ApplicationController实际上是您应用程序中的每个其他Controller都将从中继承的类(尽管这不是强制性的)。我同意不要用太多代码弄乱它并保持干净整洁的态度,尽管在某些情况下ApplicationContr

  4. ruby-on-rails - rails : How to make a form post to another controller action - 2

    我知道您通常应该在Rails中使用新建/创建和编辑/更新之间的链接,但我有一个情况需要其他东西。无论如何我可以实现同样的连接吗?我有一个模型表单,我希望它发布数据(类似于新View如何发布到创建操作)。这是我的表格prohibitedthisjobfrombeingsaved: 最佳答案 使用:url选项。=form_for@job,:url=>company_path,:html=>{:method=>:post/:put} 关于ruby-on-rails-rails:Howtomak

  5. ruby-on-rails - 启动 Rails 服务器时 ImageMagick 的警告 - 2

    最近,当我启动我的Rails服务器时,我收到了一长串警告。虽然它不影响我的应用程序,但我想知道如何解决这些警告。我的估计是imagemagick以某种方式被调用了两次?当我在警告前后检查我的git日志时。我想知道如何解决这个问题。-bcrypt-ruby(3.1.2)-better_errors(1.0.1)+bcrypt(3.1.7)+bcrypt-ruby(3.1.5)-bcrypt(>=3.1.3)+better_errors(1.1.0)bcrypt和imagemagick有关系吗?/Users/rbchris/.rbenv/versions/2.0.0-p247/lib/ru

  6. ruby - 即使失败也继续进行多主机测试 - 2

    我已经构建了一些serverspec代码来在多个主机上运行一组测试。问题是当任何测试失败时,测试会在当前主机停止。即使测试失败,我也希望它继续在所有主机上运行。Rakefile:namespace:specdotask:all=>hosts.map{|h|'spec:'+h.split('.')[0]}hosts.eachdo|host|begindesc"Runserverspecto#{host}"RSpec::Core::RakeTask.new(host)do|t|ENV['TARGET_HOST']=hostt.pattern="spec/cfengine3/*_spec.r

  7. ruby-on-rails - 如何在 Rails Controller Action 上触发 Facebook 像素 - 2

    我有一个ruby​​onrails应用程序。我按照facebook的说明添加了一个像素。但是,要跟踪转化,Facebook要求您将页面置于达到预期结果时出现的转化中。即,如果我想显示客户已注册,我会将您注册后转到的页面作为成功对象进行跟踪。我的问题是,当客户注册时,在我的应用程序中没有登陆页面。该应用程序将用户带回主页。它在主页上显示了一条消息,所以我想看看是否有一种方法可以跟踪来自Controller操作而不是实际页面的转化。我需要计数的Action没有页面,它们是ControllerAction。是否有任何人都知道的关于如何执行此操作的gem、文档或最佳实践?这是进入布局文件的像素

  8. ruby - Sinatra set cache_control to static files in public folder编译错误 - 2

    我不知道为什么,但是当我设置这个设置时它无法编译设置:static_cache_control,[:public,:max_age=>300]这是我得到的syntaxerror,unexpectedtASSOC,expecting']'(SyntaxError)set:static_cache_control,[:public,:max_age=>300]^我只想将“过期”header设置为css、javaascript和图像文件。谢谢。 最佳答案 我猜您使用的是Ruby1.8.7。Sinatra文档中显示的语法似乎是在Ruby1.

  9. UE4 源码阅读:从引擎启动到Receive Begin Play - 2

    一、引擎主循环UE版本:4.27一、引擎主循环的位置:Launch.cpp:GuardedMain函数二、、GuardedMain函数执行逻辑:1、EnginePreInit:加载大多数模块int32ErrorLevel=EnginePreInit(CmdLine);PreInit模块加载顺序:模块加载过程:(1)注册模块中定义的UObject,同时为每个类构造一个类默认对象(CDO,记录类的默认状态,作为模板用于子类实例创建)(2)调用模块的StartUpModule方法2、FEngineLoop::Init()1、检查Engine的配置文件找出使用了哪一个GameEngine类(UGame

  10. ruby-on-rails - 创建 ruby​​ 数据库时惰性符号绑定(bind)失败 - 2

    我正在尝试在Rails上安装ruby​​,到目前为止一切都已安装,但是当我尝试使用rakedb:create创建数据库时,我收到一个奇怪的错误:dyld:lazysymbolbindingfailed:Symbolnotfound:_mysql_get_client_infoReferencedfrom:/Library/Ruby/Gems/1.8/gems/mysql2-0.3.11/lib/mysql2/mysql2.bundleExpectedin:flatnamespacedyld:Symbolnotfound:_mysql_get_client_infoReferencedf

随机推荐