Eu criei um cluster kubernetes 1 mestre e 2 nós de trabalho há 2 meses, hoje um nó de trabalho começou a falhar e não sei por quê. Acho que nada de anormal aconteceu com meu trabalhador.
Usei flanela e kubeadm para criar o cluster e estava funcionando muito bem.
Se eu descrever o nó:
tommy@bxybackend:~$ kubectl describe node bxybackend-node01
Name: bxybackend-node01
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=bxybackend-node01
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"06:ca:97:82:50:10"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.168.10.4
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 03 Nov 2019 09:41:48 -0600
Taints: node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 11 Dec 2019 11:17:05 -0600 Wed, 11 Dec 2019 10:37:19 -0600 KubeletNotReady Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Addresses:
InternalIP: 10.168.10.4
Hostname: bxybackend-node01
Capacity:
cpu: 12
ephemeral-storage: 102684600Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 14359964Ki
pods: 110
Allocatable:
cpu: 12
ephemeral-storage: 94634127204
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 14257564Ki
pods: 110
System Info:
Machine ID: 3afa24bb05994ceaaf00e7f22b9322ab
System UUID: 80951742-F69F-6487-F2F7-BE2FB7FEFBF8
Boot ID: 115fbacc-143d-4007-90e4-7fdcb5462680
Kernel Version: 4.15.0-72-generic
OS Image: Ubuntu 18.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.9.7
Kubelet Version: v1.17.0
Kube-Proxy Version: v1.17.0
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system kube-flannel-ds-amd64-sslbg 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 8m31s
kube-system kube-proxy-c5gxc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 8m52s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (0%) 100m (0%)
memory 50Mi (0%) 50Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning SystemOOM 52m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 12170
Normal NodeHasNoDiskPressure 52m (x12 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 52m (x12 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientPID
Normal NodeNotReady 52m (x6 over 23d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeNotReady
Normal NodeHasSufficientMemory 52m (x12 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientMemory
Warning ContainerGCFailed 52m (x3 over 6d23h) kubelet, bxybackend-node01 rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal NodeReady 52m (x13 over 38d) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeReady
Normal NodeAllocatableEnforced 43m kubelet, bxybackend-node01 Updated Node Allocatable limit across pods
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 9699
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 12639
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 16194
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 19618
Warning SystemOOM 43m kubelet, bxybackend-node01 System OOM encountered, victim process: dotnet, pid: 12170
Normal Starting 43m kubelet, bxybackend-node01 Starting kubelet.
Normal NodeHasSufficientMemory 43m (x2 over 43m) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 43m (x2 over 43m) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasSufficientPID
Normal NodeNotReady 43m kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeNotReady
Normal NodeHasNoDiskPressure 43m (x2 over 43m) kubelet, bxybackend-node01 Node bxybackend-node01 status is now: NodeHasNoDiskPressure
Normal Starting 42m kubelet, bxybackend-node01 Starting kubelet.
Se eu assistir syslog no trabalhador:
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552152 19331 kuberuntime_manager.go:981] updating runtime config through cri with podcidr 10.244.1.0/24
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552162 19331 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552352 19331 docker_service.go:355] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.244.1.0/24,},}
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552600 19331 kubelet_network.go:77] Setting Pod CIDR: -> 10.244.1.0/24
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.555142 19331 kubelet_node_status.go:70] Attempting to register node bxybackend-node01
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.652843 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy" (UniqueName: "kubernetes.io/configmap/d6b534db-c32c-491b-a665-cf1ccd6cd089-kube-proxy") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753179 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/d6b534db-c32c-491b-a665-cf1ccd6cd089-xtables-lock") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753249 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/d6b534db-c32c-491b-a665-cf1ccd6cd089-lib-modules") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753285 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy-token-ztrh4" (UniqueName: "kubernetes.io/secret/d6b534db-c32c-491b-a665-cf1ccd6cd089-kube-proxy-token-ztrh4") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753316 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "run" (UniqueName: "kubernetes.io/host-path/6a2299cf-63a4-4e96-8b3b-acd373de12c2-run") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753342 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "cni" (UniqueName: "kubernetes.io/host-path/6a2299cf-63a4-4e96-8b3b-acd373de12c2-cni") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753461 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-cfg" (UniqueName: "kubernetes.io/configmap/6a2299cf-63a4-4e96-8b3b-acd373de12c2-flannel-cfg") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753516 19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-token-ts2qt" (UniqueName: "kubernetes.io/secret/6a2299cf-63a4-4e96-8b3b-acd373de12c2-flannel-token-ts2qt") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753531 19331 reconciler.go:156] Reconciler: start to sync state
Dec 11 11:20:12 bxybackend-node01 kubelet[19331]: I1211 11:20:12.052813 19331 kubelet_node_status.go:112] Node bxybackend-node01 was previously registered
Dec 11 11:20:12 bxybackend-node01 kubelet[19331]: I1211 11:20:12.052921 19331 kubelet_node_status.go:73] Successfully registered node bxybackend-node01
Dec 11 11:20:13 bxybackend-node01 kubelet[19331]: E1211 11:20:13.051159 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:16 bxybackend-node01 kubelet[19331]: E1211 11:20:16.051264 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:18 bxybackend-node01 kubelet[19331]: E1211 11:20:18.451166 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:21 bxybackend-node01 kubelet[19331]: E1211 11:20:21.251289 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:25 bxybackend-node01 kubelet[19331]: E1211 11:20:25.019276 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:46 bxybackend-node01 kubelet[19331]: E1211 11:20:46.772862 19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:46 bxybackend-node01 kubelet[19331]: F1211 11:20:46.772895 19331 csi_plugin.go:281] Failed to initialize CSINodeInfo after retrying
Dec 11 11:20:46 bxybackend-node01 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Dec 11 11:20:46 bxybackend-node01 systemd[1]: kubelet.service: Failed with result 'exit-code'.
kubernetes
kubeadm
Tommy
fonte
fonte
Respostas:
Durante a instalação do kubeadm, você deve executar o seguinte comando para conter os pacotes kubelet, kubeadm e kubectl e impedir que eles sejam atualizados por engano.
Reproduzi seu cenário e o que aconteceu com seu cluster é que, três dias atrás, uma nova versão do Kubernetes foi lançada (v 1.17.0) e seu kubelet foi atualizado acidentalmente.
No novo Kubernetes, algumas alterações foram feitas no CSI e é por isso que você tem alguns problemas neste nó.
Sugiro que você drene esse nó, configure um novo com o Kubernetes 1.16.2 e junte o novo ao seu cluster.
Para drenar esse nó, você precisa executar:
Opcionalmente, você pode fazer o downgrade do seu kubelet para a versão anterior usando o seguinte comando:
Não se esqueça de marcar seu kubelet para impedir que ele seja atualizado novamente:
Você pode usar o comando
apt-mark showhold
para listar todos os pacotes retidos e verificar se o kubelet, kubeadm e kubectl estão em espera.Para atualizar de 1.16.x para 1.17.x, siga este guia da documentação do Kubernetes. Eu o validei e funciona como pretendido.
fonte
Também enfrentei esse mesmo problema hoje no CentOS Linux versão 7.7.1908. Minha versão do kubernetes era a v1.16.3 e executei o comando "yum update" e a versão do kubernetes foi atualizada para a v1.17.0. Depois disso, fiz o "yum history undo" no "e depois voltei para a versão antiga do kubernetes e ele começou a funcionar novamente. Depois disso, segui o método de atualização oficial e agora o kubernetes v1.17.0 está funcionando bem sem nenhum problema.
fonte
official upgrade method
!