Friday, June 25, 2021

Troubleshooting tips for AKS


Here I am providing some common issues we face in AKS clusters and the methods to troubleshoot them.


1. In some cases we may need to log in to the nodes or pods over SSH to collect logs, for troubleshooting, and so on. Let's check how to configure that.

First create an SSH connection to the Linux node; we have one pod running in the cluster.









To connect, use the kubectl debug command to run a container image on the node and attach to it.
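A minimal sketch of this, assuming the node name aks-agentpool-54305753-vmss000000 (list your own nodes first with "kubectl get nodes") and a recent kubectl version that supports node debugging; the debug image is just an example:

kubectl get nodes
kubectl debug node/aks-agentpool-54305753-vmss000000 -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0
# inside the debug container the node's filesystem is mounted under /host
chroot /host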





2. If we get a "quota exceeded" error during creation or upgrade, we have to request more vCPUs by creating a support request.

Code=OperationNotAllowed
Message=Operation results in exceeding quota limits of Core.
Maximum allowed: 4, Current in use: 4, Additional requested: 2.

To request the increase from the Azure portal:

Select Subscriptions.

Select the subscription for which we need to increase the quota.

Select Usage + quotas.

Select Request increase and the corresponding metric which we need to increase.
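As an alternative to the portal steps above, a quick sketch with the Azure CLI to see the current vCPU usage per family before raising the support request (the region is an assumption):

az vm list-usage --location eastus --output table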










3. Troubleshooting cluster issues with the AKS diagnostics tool

Azure provides a good tool along with AKS to identify common cluster and network related issues. It is called "Diagnose and solve problems" and is found on the left side of the AKS blade; two types of diagnostics are available: cluster insights and networking.










The cluster insights diagnostics are shown below; we can check each link to get more details about the diagnostic process.








There are other checks related to the networking side.







 


Once we click on each tab, we get more details on each networking check.



 





This is one of the best methods to identify cluster and network related issues in AKS.

4. Getting an error while connecting to the Kube API server, such as "Error dialing backend TCP ..."

In this case we have to make sure the "aks-link" or "tunnelfront" pod is working fine in the output of "kubectl get pods --namespace kube-system". If it is not working, we may need to delete the pod so that it gets recreated.
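A minimal sketch of that check (the exact pod name is hypothetical; use whatever name the grep returns):

kubectl get pods --namespace kube-system | grep -E 'aks-link|tunnelfront'
# delete the unhealthy pod; its controller will recreate it
kubectl delete pod <aks-link-pod-name> --namespace kube-system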








5. When we try to upgrade or scale the cluster, we get the error below:

"Changing property (image reference) is not allowed"

This error is caused by modifying or deleting tags on the agent nodes inside the AKS cluster; it is an unexpected error caused by changing the AKS cluster's managed properties.

6. The next error we used to get while scaling the cluster is "cluster is in a failed state and upgrading or scaling will not work until it is fixed".

This issue is usually due to a lack of compute quota, so first we have to bring the cluster back to a stable state within the existing quota, and then create a support request to increase the quota. A couple of commands that help here are shown below.
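A small sketch of confirming the state and scaling the cluster back within the available quota (resource group, cluster name and node count are assumptions):

az aks show --resource-group unixchipsrg1 --name unixchipsaks --query provisioningState -o tsv
az aks scale --resource-group unixchipsrg1 --name unixchipsaks --node-count 2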

7. Too many requests - 429 errors

When a Kubernetes cluster on Azure (AKS or not) scales up/down frequently or uses the cluster autoscaler (CA), those operations can result in a large number of HTTP calls, which in turn exceed the subscription's request limits, leading to failures.

Service returned an error. Status=429 Code=\"OperationNotAllowed\" Message=\"The server rejected the request because too many requests have been received for this subscription.\" Details=[{\"code\":\"TooManyRequests\",\"message\":\"{\\\"operationGroup\\\":\\\"HighCostGetVMScaleSet30Min\\\",\\\"startTime\\\":\\\"2021-05-20T07:13:55.2177346+00:00\\\",\\\"endTime\\\":\\\"2021-05-20T07:28:55.2177346+00:00\\\",\\\"allowedRequestCount\\\":1800,\\\"measuredRequestCount\\\":2208}\",\"target\":\"HighCostGetVMScaleSet30Min\"}] InnerError={\"internalErrorCode\":\"TooManyRequestsReceived\"}"}

Make sure you are running at least AKS 1.18.x; if not, we may need to upgrade to a newer version, for example as shown below.
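A hedged sketch of checking and performing the upgrade (resource group, cluster name and target version are assumptions; pick a version from the get-upgrades output):

az aks get-upgrades --resource-group unixchipsrg1 --name unixchipsaks --output table
az aks upgrade --resource-group unixchipsrg1 --name unixchipsaks --kubernetes-version 1.18.17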

We can integrate Prometheus with Azure Monitor to watch cluster/container issues very closely; I will explain that in another session.

Thank you for reading.












Wednesday, June 16, 2021

Kubernetes troubleshooting diagram

Here I am providing a detailed flow chart of Kubernetes troubleshooting scenarios.
                                                                                                                                   
                    









































1. As per the above diagram, first we have to check whether the pods are in a pending state or not, using the command below:

PS /home/unixchips> kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
webfrontend-5dc5464686-c7ncf   1/1     Running   0          18h
webfrontend-5dc5464686-ccrj2   1/1     Running   0          4d5h
webfrontend-5dc5464686-gjmwp   1/1     Running   0          18h

2. If the pods are in a pending state, describe the pod as below; any error related to the cluster status will be reflected in this output.

*********************************************************************
PS /home/unixchips> kubectl describe pod webfrontend-5dc5464686-c7ncf
Name:         webfrontend-5dc5464686-c7ncf
Namespace:    default
Priority:     0
Node:         aks-agentpool-54305753-vmss000000/10.240.0.4
Start Time:   Mon, 14 Jun 2021 19:29:16 +0000
Labels:       app.kubernetes.io/instance=webfrontend
              app.kubernetes.io/name=webfrontend
              pod-template-hash=5dc5464686
Annotations:  <none>
Status:       Running
IP:           10.240.0.107
IPs:
  IP:           10.240.0.107
Controlled By:  ReplicaSet/webfrontend-5dc5464686
Containers:
  webfrontend:
    Container ID:   containerd://9bba1ea0024a1e44d8d1d760985541365c48bc656e31214e06d7d68ae8905819
    Image:          unixchipsacr1.azurecr.io/webfrontend:v1
    Image ID:       unixchipsacr1.azurecr.io/webfrontend@sha256:156eeb3ef36728fbfb914591a852ab890c6eff297a463ef26d2781c373957f3e
Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 14 Jun 2021 19:29:17 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from webfrontend-token-92dz6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webfrontend-token-92dz6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webfrontend-token-92dz6
Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

3. If the cluster is full and you want to increase the cluster size, you can scale out the node pool manually (for AKS, with the az aks scale command), or scale a deployment with the kubectl scale command, as below.
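A minimal sketch, assuming an AKS cluster named unixchipsaks in resource group unixchipsrg1 (names and counts are assumptions):

# scale out the node pool
az aks scale --resource-group unixchipsrg1 --name unixchipsaks --node-count 3
# or scale the workload itself
kubectl scale deployment webfrontend --replicas=5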









If you want to add extra worker nodes to an existing self-managed cluster, you can use the kubeadm join command as below:

sudo kubeadm join 192.168.122.195:6443 \
  --token nx1jjq.u42y27ip3bhmj8vj \
  --discovery-token-ca-cert-hash sha256:c6de85f6c862c0d58cc3d10fd199064ff25c4021b6e88475822d6163a25b4a6c

Detailed steps are given below 

https://computingforgeeks.com/join-new-kubernetes-worker-node-to-existing-cluster/

4. If the pods are not running but are not in a pending state either, we may need to check the application-related logs; issues such as ports not listening correctly, or any other application-related problem, should be reflected here:

PS /home/unixchips> kubectl logs webfrontend-5dc5464686-c7ncf
Listening on port 80

PS /home/unixchips> kubectl logs webfrontend-5dc5464686-ccrj2 --previous
Error from server (BadRequest): previous terminated container "webfrontend" in pod "webfrontend-5dc5464686-ccrj2" not found

5. If the pods are not ready and the readiness probe is failing, we may need to increase initialDelaySeconds in the deployment.yaml inside the templates/ folder of the Helm chart.


PS /home/unixchips> kubectl describe pod webfrontend-5dc5464686-c7ncf | grep -i readiness
    Readiness:      http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3

          ports:
            - name: http
              containerPort: 80
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: http
          readinessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 20


6. If "kubectl describe  webfrontend-5dc5464686-c7ncf" will give error related to quota limits we have to increase the quota limits by updating the limit range config for the pod's 

**********************************************
apiVersion: v1
kind: LimitRange
metadata:
  name: webfrontend-limit
spec:
  limits:
  - max:
      memory: 1Gi
    min:
      memory: 500Mi
    type: Container

******************************************************
Apply the same as below:
kubectl apply -f limitrange.yml
More details are given in the link below:
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-memory-cpu-namespace/

7. Errors related to images: in most cases when the image being pulled is not correct, we get the error "ImagePullBackOff" when we run "kubectl get pods".

This is due to an image name mismatch, an image pull policy issue, or even an issue with the credentials of a private registry.

For example, if we check the pod status below, it is showing ErrImagePull & ImagePullBackOff.








If we check the logs, it is showing:

PS /home/unixchips/dev-spaces/samples/nodejs/getting-started/webfrontend/webfrontend> kubectl logs webfrontend-8485955f44-8fjsl
Error from server (BadRequest): container "webfrontend" in pod "webfrontend-8485955f44-8fjsl" is waiting to start: trying and failing to pull image

If we check more details using "kubectl describe pod webfrontend-8485955f44-8fjsl":

Warning  Failed   69m (x4 over 71m)    kubelet  Failed to pull image "unixchipsacr1.azurecr.io/webfronten:v1": [rpc error: code = NotFound desc = failed to pull and unpack image "unixchipsacr1.azurecr.io/webfronten:v1": failed to resolve reference "unixchipsacr1.azurecr.io/webfronten:v1": unixchipsacr1.azurecr.io/webfronten:v1: not found, rpc error: code = Unknown desc = failed to pull and unpack image "unixchipsacr1.azurecr.io/webfronten:v1": failed to resolve reference "unixchipsacr1.azurecr.io/webfronten:v1": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized]
  Warning  Failed   69m (x4 over 71m)    kubelet  Error: ErrImagePull
  Normal   BackOff  16m (x242 over 71m)  kubelet  Back-off pulling image "unixchipsacr1.azurecr.io/webfronten:v1"
  Warning  Failed   76s (x308 over 71m)  kubelet  Error: ImagePullBackOff


We can see that the image name is not correct, and because of that the pods are not coming up.

The image name is wrong because one "d" is missing in "webfrontend"; that is what caused the issue, and we have to correct the image name in the deployment.yml file, for example as below.
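A quick sketch of correcting the image without editing the manifest by hand (deployment, container and image names are taken from this example and may differ in your setup):

kubectl set image deployment/webfrontend webfrontend=unixchipsacr1.azurecr.io/webfrontend:v1
kubectl get pods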

Many more issues from this chart need to be highlighted, and I will cover them in later blogs.

Thank you for reading... stay tuned.











Friday, June 11, 2021

Deploy Azure Kubernetes Service ( AKS) with helm

What is Helm?

Helm is an open-source tool which helps us manage complex Kubernetes applications. It also acts as a package manager for Kubernetes, helping us install, upgrade and remove packages within the Kubernetes cluster. We can say Helm is the same as yum for Red Hat Linux and apt for Ubuntu.

Why do we have to use Helm?

Writing and maintaining Kubernetes YAML manifests for all the required Kubernetes objects can be a time consuming and tedious task. For the simplest of deployments, you would need at least 3 YAML manifests with duplicated and hardcoded values. Helm simplifies this process and creates a single package that can be advertised to your cluster.

Helm 2 uses a client-server model, where a server component called "Tiller" listens to requests from the Helm client and passes them to the Kubernetes cluster.













But in Helm 3 the Tiller component is removed, and the Helm client communicates directly with the Kubernetes cluster.
















Helm contains mainly three components as below 

Chart - This is a bundle of information necessary to create an instance of a Kubernetes application.
Config - The config contains configuration information which can be merged with a chart to create a release object.
Release - This is a running instance of a chart with a specific config.

A helm chart structure will be as below 

YOUR-CHART-NAME/
 |
 .helmignore -> lists all the files that should be ignored when packaging the chart
 |
 Chart.yaml -> all the information about the chart you are packaging
 |
 values.yaml -> all the values that need to be injected into the templates
 |
 charts/ -> this is the place where we keep other charts that the main chart depends on
 |
 templates/ -> this folder is where you put the actual manifests you are deploying with the chart.
 For example, you might be deploying an nginx deployment that needs a service, a configmap and secrets. You will have deployment.yaml, service.yaml, config.yaml and secrets.yaml all in the templates dir, and they will all get their values from values.yaml above.

Now let's configure the Azure Kubernetes cluster with Helm. Below are the prerequisites needed for the configuration:

Azure CLI installed on the local machine and Helm also installed locally. As I am configuring this using Azure Cloud Shell, Helm is installed by default.

Type "helm version" to verify:
 






Installing ACR (Azure Container Registry)

We need to create an ACR for storing the images used to run applications with Helm in AKS. So let's create an ACR named unixchipsacr1 using the command below:

az acr create --resource-group unixchipsrg1 --name unixchipsacr1 --sku basic
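To verify the registry is reachable before pushing images, a small sketch (assuming the same registry name):

az acr login --name unixchipsacr1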













Creating AKS 



The next step is to create the AKS cluster and attach the ACR to it. I am doing this task using the portal, as I was getting a quota-related error while using the CLI method. You can refer to the steps in my previous blog about creating AKS using the Azure portal:

http://unixchips.blogspot.com/2019/07/azure-kubernetes-service-aks.html




















To connect to this cluster using kubectl, we have to get the credentials of the cluster unixchipsaks; they will be stored in /home/unixchips/.kube/config.
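A minimal sketch of fetching the credentials (the resource group name is an assumption; use the one the cluster was created in):

az aks get-credentials --resource-group unixchipsrg1 --name unixchipsaks
kubectl get nodes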














We have to clone the application from Git and navigate to the application directory.
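A sketch of the clone step, assuming the sample comes from the Azure dev-spaces samples repository used by the official AKS/Helm quickstart:

git clone https://github.com/Azure-Samples/dev-spaces
cd dev-spaces/samples/nodejs/getting-started/webfrontend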










The next step is to create a Dockerfile as below.










We have to build the sample application and push it to ACR.
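A minimal sketch using ACR Tasks to build and push in one step (registry and tag are assumptions; run it from the webfrontend directory containing the Dockerfile):

az acr build --registry unixchipsacr1 --image webfrontend:v1 .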









Now we have to create the Helm chart as below. We can see that a webfrontend directory is created under the Git repository downloaded locally, and we can see the Helm chart structure under that webfrontend directory.
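The chart scaffold itself can be generated with a single command; a minimal sketch:

helm create webfrontend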

















We have to update values.yaml with the login server details of the ACR which we created, and change the service type to LoadBalancer as below. We can get the login server details from the ACR portal.










replicaCount: 1

image:
  repository: unixchipsacr1.azurecr.io/webfrontend
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

------------------output is omitted ---------------

service:
  type: LoadBalancer
  port: 80

*******************************************************

Install the helm chart as below 

helm install webfrontend webfrontend/










It takes a few minutes for the service to return a public IP address. Monitor progress using the kubectl get service command with the --watch argument.
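A minimal sketch of that watch (the service name follows the Helm release name used above):

kubectl get service webfrontend --watch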













If we access the application using the external IP, we can see the webpage as below.











So we have successfully deployed the application using Helm in AKS.

Thank you for reading and please post your comments 

Wednesday, June 9, 2021

Azure Kubernetes with Azure arc

This blog is about building Azure AKS and configuring it with Azure Arc. As we know, AKS is the Kubernetes platform built into Azure, and Azure Arc is the management platform which helps us manage hybrid and on-premises resources in a better way.

The basic architecture diagram and functional details of Azure Arc are given below.


From the diagram we can understand that Azure Arc is a common platform used to manage hybrid/built-in servers, databases and containers. Anthos by Google and Azure Arc are examples of control planes running in the public cloud, orchestrating and managing resources deployed in diverse environments. This investment is becoming key to delivering the promise of hybrid cloud and multicloud technologies. For example, a Linux VM deployed in Google Compute Engine (GCE) can be managed by Azure; the logs and metrics from the VM are ingested into Azure Monitoring and Log Analytics. Similarly, BigQuery Omni, the multicloud flavor of BigQuery, can be deployed in AWS. Anthos can take control of Azure Kubernetes Service (AKS) clusters and deploy workloads to them. All this is possible with the extension of the control plane and observability offerings.


Now let's create an AKS cluster using the Azure CLI and configure it with Azure Arc.

1. Let's create a resource group using the Azure CLI as below. We create a resource group named unixchipsrg2.
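The command used here is roughly the following sketch (the region is an assumption):

az group create --name unixchipsrg2 --location eastus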







2. The next step is to verify that Microsoft.OperationsManagement and Microsoft.OperationalInsights are registered on your subscription, as below. We can see from the output that they are not registered.
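A sketch of the verification (the output will show "NotRegistered" if the providers are not yet registered):

az provider show -n Microsoft.OperationsManagement -o table
az provider show -n Microsoft.OperationalInsights -o table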










3. Let's register the subscription with the Microsoft.OperationsManagement and Microsoft.OperationalInsights providers using the commands below.
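A sketch of the registration commands:

az provider register --namespace Microsoft.OperationsManagement
az provider register --namespace Microsoft.OperationalInsights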










4. So we have registered the subscription with the respective namespaces, and now we have to create the AKS cluster as below. The AKS cluster name is unixchipsaks, in the eastus region. This command produces detailed output in JSON format as below; I have truncated it due to space constraints.







output

***********************************************************************

PS /home/unixchips> az aks create --resource-group unixchipsrg2 --name unixchipsaks --node-count 2 --enable-addons monitoring --generate-ssh-keys
AAD role propagation done[############################################]  100.0000%{
  "aadProfile": null,
  "addonProfiles": {
    "omsagent": {
      "config": {
        "logAnalyticsWorkspaceResourceID": "/subscriptions/994b8397-cf9d-4f89-9aca-55b9313f9996/resourcegroups/defaultresourcegroup-eus/providers/microsoft.operationalinsights/workspaces/defaultworkspace-994b8397-cf9d-4f89-9aca-55b9313f9996-eus"
      },
      "enabled": true,
      "identity": {
        "clientId": "aec78f06-6bcb-4ed3-ba3a-e507fc79e48d",
        "objectId": "cf8fa808-c6e5-4258-8519-32a0a05f72cf",
        "resourceId": "/subscriptions/994b8397-cf9d-4f89-9aca-55b9313f9996/resourcegroups/MC_unixchipsrg2_unixchipsaks_eastus/providers/Microsoft.ManagedIdentity/userAssignedIdentities/omsagent-unixchipsaks"
      }
    }
  },
  "agentPoolProfiles": [
------------------output is omitted ---------------

5. We have created a 2-node cluster called unixchipsaks, and we have to connect to the cluster using the kubectl command. If we are using Azure Cloud Shell, kubectl is installed by default; we may need to install it separately if we are connecting from outside. Let's fetch the credentials in the Azure CLI to connect to the Kubernetes cluster.
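A minimal sketch of fetching those credentials:

az aks get-credentials --resource-group unixchipsrg2 --name unixchipsaks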


 




6. We can check the Kubernetes cluster status as below; both nodes are ready for deployment.
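The check is essentially the following (a minimal sketch):

kubectl get nodes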






7. Let's deploy an application on these nodes. We are using the azure-vote application, which contains a Python application instance and a Redis instance. So we have two deployments and two service configurations. Let's merge all of these into a single YAML file as below.

***************************************************************************

apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-vote-back
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-vote-back
  template:
    metadata:
      labels:
        app: azure-vote-back
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: azure-vote-back
        image: mcr.microsoft.com/oss/bitnami/redis:6.0.8
        env:
        - name: ALLOW_EMPTY_PASSWORD
          value: "yes"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
        ports:
        - containerPort: 6379
          name: redis
---
apiVersion: v1
kind: Service
metadata:
  name: azure-vote-back
spec:
  ports:
  - port: 6379
  selector:
    app: azure-vote-back
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-vote-front
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-vote-front
  template:
    metadata:
      labels:
        app: azure-vote-front
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: azure-vote-front
        image: mcr.microsoft.com/azuredocs/azure-vote-front:v1
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
        ports:
        - containerPort: 80
        env:
        - name: REDIS
          value: "azure-vote-back"
---
apiVersion: v1
kind: Service
metadata:
  name: azure-vote-front
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: azure-vote-front

**************************************************************************

8. Save the above configuration in a YAML file named azure-vote.yml and apply it to the cluster as below.
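The apply step is roughly:

kubectl apply -f azure-vote.yml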






9. We can monitor the progress as below; from the output, the app is deployed successfully.


 




10. We can connect to the app using the public IP 52.188.27.38 on port 80 (the default).









Also the pods are running successfully


 









11. The next step is to configure Azure Arc with our cluster unixchipsaks. For that we have to install the connectedk8s extension for the Azure CLI.
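A sketch of installing the extension:

az extension add --name connectedk8s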







12. Let's register the existing AKS cluster with Azure Arc as below. Please keep in mind that ports 443 and 9418 should be open outbound so the cluster can connect to Azure Arc.
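A minimal sketch of the onboarding command (using the same resource group as the cluster is an assumption; any existing resource group can be used for the Arc resource):

az connectedk8s connect --name unixchipsaks --resource-group unixchipsrg2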


























14. If we check the Azure portal under Kubernetes - Azure Arc, we can see that our cluster is successfully registered with Azure Arc.














We can use Azure Arc to manage, monitor and troubleshoot Kubernetes clusters and integrate it with GitOps for centralized manifest management. I will explain those details in another post.

Thank you for reading ...