# Setting up your first cluster with Kubespray

This tutorial walks you through the detailed steps for setting up Kubernetes
with [Kubespray](https://kubespray.io/).

The guide is inspired by the tutorial [Kubernetes The Hard Way](https://github.com/kelseyhightower/kubernetes-the-hard-way), with the
difference that here we want to showcase how to spin up a Kubernetes cluster
in a more managed fashion with Kubespray.

The target audience for this tutorial is someone looking for a
hands-on guide to get started with Kubespray.

* [kubespray](https://github.com/kubernetes-sigs/kubespray) v2.17.x
* [kubernetes](https://github.com/kubernetes/kubernetes) v1.17.9

* Google Cloud Platform: This tutorial leverages the [Google Cloud Platform](https://cloud.google.com/) to streamline provisioning of the compute infrastructure required to bootstrap a Kubernetes cluster from the ground up. [Sign up](https://cloud.google.com/free/) for $300 in free credits.
* Google Cloud Platform SDK: Follow the Google Cloud SDK [documentation](https://cloud.google.com/sdk/) to install and configure the `gcloud` command
  line utility. Make sure to set a default compute region and compute zone.
* The [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) command line utility is used to interact with the Kubernetes
  cluster.
* Linux or Mac environment with Python 3

## Provisioning Compute Resources

Kubernetes requires a set of machines to host the Kubernetes control plane and the worker nodes where containers are ultimately run. In this lab you will provision the compute resources required for running a secure and highly available Kubernetes cluster across a single [compute zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones).

The Kubernetes [networking model](https://kubernetes.io/docs/concepts/cluster-administration/networking/#kubernetes-model) assumes a flat network in which containers and nodes can communicate with each other. In cases where this is not desired [network policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) can limit how groups of containers are allowed to communicate with each other and external network endpoints.

> Setting up network policies is out of scope for this tutorial.

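Although setting them up is out of scope here, a minimal, purely illustrative NetworkPolicy manifest (with hypothetical `app=frontend` / `app=backend` labels) looks roughly like this:

```yaml
# Illustrative only - not part of this tutorial's steps.
# Allows pods labelled app=frontend to reach pods labelled app=backend on TCP 8080;
# all other ingress traffic to the backend pods is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```
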
#### Virtual Private Cloud Network

In this section a dedicated [Virtual Private Cloud](https://cloud.google.com/compute/docs/networks-and-firewalls#networks) (VPC) network will be set up to host the Kubernetes cluster.

Create the `kubernetes-the-kubespray-way` custom VPC network:

gcloud compute networks create kubernetes-the-kubespray-way --subnet-mode custom

A [subnet](https://cloud.google.com/compute/docs/vpc/#vpc_networks_and_subnets) must be provisioned with an IP address range large enough to assign a private IP address to each node in the Kubernetes cluster.

Create the `kubernetes` subnet in the `kubernetes-the-kubespray-way` VPC network:

gcloud compute networks subnets create kubernetes \
  --network kubernetes-the-kubespray-way \
  --range 10.240.0.0/24

> The `10.240.0.0/24` IP address range can host up to 254 compute instances.

Create a firewall rule that allows internal communication across all protocols.
It is important to note that the ipip protocol has to be allowed in order for
the calico networking plugin (see later) to work.

gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-internal \
  --allow tcp,udp,icmp,ipip \
  --network kubernetes-the-kubespray-way \
  --source-ranges 10.240.0.0/24

Create a firewall rule that allows external SSH, ICMP, and HTTPS:

gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-external \
  --allow tcp:80,tcp:6443,tcp:443,tcp:22,icmp \
  --network kubernetes-the-kubespray-way \
  --source-ranges 0.0.0.0/0

It is not feasible to restrict the firewall to a specific IP address from
which you are accessing the cluster, as the nodes also communicate over the
public internet and would otherwise run into this firewall. Technically you
could limit the firewall to the (fixed) IP addresses of the cluster nodes and
the remote IP addresses used for accessing the cluster.

The compute instances in this lab will be provisioned using [Ubuntu Server](https://www.ubuntu.com/server) 18.04.
Each compute instance will be provisioned with a fixed private IP address and
a public IP address (that can be fixed - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address)).
Using fixed public IP addresses has the advantage that our cluster node
configuration does not need to be updated with new public IP addresses every
time the machines are shut down and later restarted.

Create three compute instances which will host the Kubernetes control plane:

for i in 0 1 2; do
  gcloud compute instances create controller-${i} \
    --boot-disk-size 200GB \
    --image-family ubuntu-1804-lts \
    --image-project ubuntu-os-cloud \
    --machine-type e2-standard-2 \
    --private-network-ip 10.240.0.1${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes \
    --tags kubernetes-the-kubespray-way,controller
done

> Do not forget to fix the IP addresses if you plan on re-using the cluster
after temporarily shutting down the VMs - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address).

Create three compute instances which will host the Kubernetes worker nodes:

for i in 0 1 2; do
  gcloud compute instances create worker-${i} \
    --boot-disk-size 200GB \
    --image-family ubuntu-1804-lts \
    --image-project ubuntu-os-cloud \
    --machine-type e2-standard-2 \
    --private-network-ip 10.240.0.2${i} \
    --scopes compute-rw,storage-ro,service-management,service-control,logging-write,monitoring \
    --subnet kubernetes \
    --tags kubernetes-the-kubespray-way,worker
done

> Do not forget to fix the IP addresses if you plan on re-using the cluster
after temporarily shutting down the VMs - see [guide](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address).

List the compute instances in your default compute zone:

gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way"

NAME          ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
controller-0  us-west1-c  e2-standard-2               10.240.0.10  XX.XX.XX.XXX   RUNNING
controller-1  us-west1-c  e2-standard-2               10.240.0.11  XX.XXX.XXX.XX  RUNNING
controller-2  us-west1-c  e2-standard-2               10.240.0.12  XX.XXX.XX.XXX  RUNNING
worker-0      us-west1-c  e2-standard-2               10.240.0.20  XX.XX.XXX.XXX  RUNNING
worker-1      us-west1-c  e2-standard-2               10.240.0.21  XX.XX.XX.XXX   RUNNING
worker-2      us-west1-c  e2-standard-2               10.240.0.22  XX.XXX.XX.XX   RUNNING

### Configuring SSH Access

Kubespray relies on SSH to configure the controller and worker instances.

Test SSH access to the `controller-0` compute instance:

IP_CONTROLLER_0=$(gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way AND name:controller-0" --format="value(EXTERNAL_IP)")
USERNAME=$(whoami)
ssh $USERNAME@$IP_CONTROLLER_0

If this is your first time connecting to a compute instance, SSH keys will be
generated for you. In this case you will need to enter a passphrase at the
prompt to continue.

> If you get a 'Remote host identification changed!' warning, you probably
already connected to that IP address in the past with another host key. You
can remove the old host key by running `ssh-keygen -R $IP_CONTROLLER_0`.

Please repeat this procedure for all the controller and worker nodes to
ensure that SSH access is properly functioning for all nodes.

The following set of instructions is based on the [Quick Start](https://github.com/kubernetes-sigs/kubespray) but slightly altered for our set-up.

As Ansible is a Python application, we will create a fresh virtual
environment to install the dependencies for the Kubespray playbook:

python3 -m venv venv
source venv/bin/activate

Next, we will git clone the Kubespray code into our working directory:

git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
git checkout release-2.17

Now we need to install the dependencies for Ansible to run the Kubespray
playbook:

pip install -r requirements.txt

Copy `inventory/sample` as `inventory/mycluster`:

cp -rfp inventory/sample inventory/mycluster

Update the Ansible inventory file with the inventory builder:

declare -a IPS=($(gcloud compute instances list --filter="tags.items=kubernetes-the-kubespray-way" --format="value(EXTERNAL_IP)" | tr '\n' ' '))
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

Open the generated `inventory/mycluster/hosts.yaml` file and adjust it so
that controller-0, controller-1 and controller-2 are control plane nodes and
worker-0, worker-1 and worker-2 are worker nodes. Also update each host's `ip`
to the respective private VPC IP address and remove the `access_ip` entries.

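For reference, after these adjustments the inventory should look roughly like the sketch below. This is an illustrative sketch, not a verbatim copy of a generated file: the external IP placeholders need to be replaced with your values, and the group names (`kube_control_plane`, `kube_node`, `etcd`, `k8s_cluster`) are those used by recent Kubespray releases.

```yaml
# inventory/mycluster/hosts.yaml - illustrative sketch
all:
  hosts:
    controller-0:
      ansible_host: <external IP controller-0>
      ip: 10.240.0.10
    controller-1:
      ansible_host: <external IP controller-1>
      ip: 10.240.0.11
    controller-2:
      ansible_host: <external IP controller-2>
      ip: 10.240.0.12
    worker-0:
      ansible_host: <external IP worker-0>
      ip: 10.240.0.20
    worker-1:
      ansible_host: <external IP worker-1>
      ip: 10.240.0.21
    worker-2:
      ansible_host: <external IP worker-2>
      ip: 10.240.0.22
  children:
    kube_control_plane:
      hosts:
        controller-0:
        controller-1:
        controller-2:
    kube_node:
      hosts:
        worker-0:
        worker-1:
        worker-2:
    etcd:
      hosts:
        controller-0:
        controller-1:
        controller-2:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```
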
The main configuration for the cluster is stored in
`inventory/mycluster/group_vars/k8s_cluster/k8s_cluster.yml`. In this file we
will set `supplementary_addresses_in_ssl_keys` to a list of the external IP
addresses of the controller nodes. That way we can access the
Kubernetes API server as an administrator from outside the VPC network. You
can also see that `kube_network_plugin` is set to 'calico' by default.
Setting it to 'cloud' did not work on GCP at the time of testing.

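As an illustration, the relevant excerpt of `k8s_cluster.yml` would then contain something along these lines (the addresses are placeholders for your controllers' external IPs):

```yaml
# inventory/mycluster/group_vars/k8s_cluster/k8s_cluster.yml - illustrative excerpt
kube_network_plugin: calico
supplementary_addresses_in_ssl_keys:
  - <external IP controller-0>
  - <external IP controller-1>
  - <external IP controller-2>
```
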
Kubespray also offers to easily enable popular kubernetes add-ons. You can
find the list of add-ons in `inventory/mycluster/group_vars/k8s_cluster/addons.yml`.
Let's enable the metrics server, as this is a crucial monitoring element for
the kubernetes cluster: just change 'false' to 'true' for
`metrics_server_enabled`.

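The change in `addons.yml` then amounts to a single line, roughly:

```yaml
# inventory/mycluster/group_vars/k8s_cluster/addons.yml - illustrative excerpt
metrics_server_enabled: true
```
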
Now we will deploy the configuration:

ansible-playbook -i inventory/mycluster/hosts.yaml -u $USERNAME -b -v --private-key=~/.ssh/id_rsa cluster.yml

Ansible will now execute the playbook; this can take up to 20 minutes.

## Access the kubernetes cluster

We will leverage a kubeconfig file from one of the controller nodes to access
the cluster as administrator from our local workstation.

> In this simplified set-up, we did not include a load balancer that usually
sits in front of the three controller nodes for a highly available API server
endpoint. In this simplified tutorial we connect directly to one of the three
controllers.

First, we need to adjust the permissions of the kubeconfig file on one of the
controller nodes:

ssh $USERNAME@$IP_CONTROLLER_0
sudo chown -R $USERNAME:$USERNAME /etc/kubernetes/admin.conf
exit

Now we will copy over the kubeconfig file:

scp $USERNAME@$IP_CONTROLLER_0:/etc/kubernetes/admin.conf kubespray-do.conf

This kubeconfig file uses the internal IP address of the controller node to
access the API server. This kubeconfig file will thus not work from outside
of the VPC network. We will need to change the API server IP address to the
controller node's external IP address. The external IP address will be
accepted in the TLS negotiation as we added the controllers' external IP
addresses to the SSL certificate configuration.
Open the file and modify the server IP address from the local IP to the
external IP address of controller-0, as stored in $IP_CONTROLLER_0.

certificate-authority-data: XXX
server: https://35.205.205.80:6443

Now, we load the configuration for `kubectl`:

export KUBECONFIG=$PWD/kubespray-do.conf

We should be all set to communicate with our cluster from our local workstation:

kubectl get nodes

NAME           STATUS   ROLES    AGE   VERSION
controller-0   Ready    master   47m   v1.17.9
controller-1   Ready    master   46m   v1.17.9
controller-2   Ready    master   46m   v1.17.9
worker-0       Ready    <none>   45m   v1.17.9
worker-1       Ready    <none>   45m   v1.17.9
worker-2       Ready    <none>   45m   v1.17.9

Verify if the metrics server addon was correctly installed and works:

kubectl top nodes

NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
controller-0   191m         10%    1956Mi          26%
controller-1   190m         10%    1828Mi          24%
controller-2   182m         10%    1839Mi          24%
worker-0       87m          4%     1265Mi          16%
worker-1       102m         5%     1268Mi          16%
worker-2       108m         5%     1299Mi          17%

Please note that metrics might not be available at first; it can take a couple
of minutes before you can actually retrieve them.

Let's verify if the network layer is properly functioning and pods can reach
each other:

kubectl run myshell1 -it --rm --image busybox -- sh
hostname -i
# launch myshell2 in a separate terminal (see next code block) and ping the IP address of myshell2 (as printed by its `hostname -i`)
ping <IP of myshell2>

kubectl run myshell2 -it --rm --image busybox -- sh
hostname -i
ping <IP of myshell1>

PING 10.233.108.2 (10.233.108.2): 56 data bytes
64 bytes from 10.233.108.2: seq=0 ttl=62 time=2.876 ms
64 bytes from 10.233.108.2: seq=1 ttl=62 time=0.398 ms
64 bytes from 10.233.108.2: seq=2 ttl=62 time=0.378 ms

--- 10.233.108.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.378/1.217/2.876 ms

In this section you will verify the ability to create and manage [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/).

Create a deployment for the [nginx](https://nginx.org/en/) web server:

kubectl create deployment nginx --image=nginx

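For context, the imperative command above generates roughly the Deployment shown below; in particular, the `app: nginx` label it applies is what the label selector in the next step matches on. This is a sketch of the generated object, not a verbatim dump:

```yaml
# Roughly what `kubectl create deployment nginx --image=nginx` creates
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
```
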
List the pod created by the `nginx` deployment:

kubectl get pods -l app=nginx

NAME                     READY   STATUS    RESTARTS   AGE
nginx-86c57db685-bmtt8   1/1     Running   0          18s

In this section you will verify the ability to access applications remotely using [port forwarding](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/).

Retrieve the full name of the `nginx` pod:

POD_NAME=$(kubectl get pods -l app=nginx -o jsonpath="{.items[0].metadata.name}")

Forward port `8080` on your local machine to port `80` of the `nginx` pod:

kubectl port-forward $POD_NAME 8080:80

Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

In a new terminal make an HTTP request using the forwarding address:

curl --head http://127.0.0.1:8080

Date: Thu, 13 Aug 2020 11:12:04 GMT
Content-Type: text/html
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive

Switch back to the previous terminal and stop the port forwarding to the `nginx` pod:

Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Handling connection for 8080

In this section you will verify the ability to [retrieve container logs](https://kubernetes.io/docs/concepts/cluster-administration/logging/).

Print the `nginx` pod logs:

kubectl logs $POD_NAME

127.0.0.1 - - [13/Aug/2020:11:12:04 +0000] "HEAD / HTTP/1.1" 200 0 "-" "curl/7.64.1" "-"

In this section you will verify the ability to [execute commands in a container](https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/#running-individual-commands-in-a-container).

Print the nginx version by executing the `nginx -v` command in the `nginx` container:

kubectl exec -ti $POD_NAME -- nginx -v

nginx version: nginx/1.19.1

### Kubernetes services

#### Expose outside of the cluster

In this section you will verify the ability to expose applications using a [Service](https://kubernetes.io/docs/concepts/services-networking/service/).

Expose the `nginx` deployment using a [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport) service:

kubectl expose deployment nginx --port 80 --type NodePort

> The LoadBalancer service type can not be used because your cluster is not configured with [cloud provider integration](https://kubernetes.io/docs/getting-started-guides/scratch/#cloud-provider). Setting up cloud provider integration is out of scope for this tutorial.

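For context, the service created by `kubectl expose` is roughly equivalent to the manifest below; the `nodePort` value shown is only an example, as Kubernetes assigns one from the default 30000-32767 range, which is why the next step retrieves it dynamically:

```yaml
# Roughly what `kubectl expose deployment nginx --port 80 --type NodePort` creates
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - port: 80          # service port inside the cluster
      targetPort: 80    # container port on the nginx pods
      nodePort: 31234   # example value; assigned by Kubernetes from 30000-32767
```
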
Retrieve the node port assigned to the `nginx` service:

NODE_PORT=$(kubectl get svc nginx \
  --output=jsonpath='{range .spec.ports[0]}{.nodePort}')

Create a firewall rule that allows remote access to the `nginx` node port:

gcloud compute firewall-rules create kubernetes-the-kubespray-way-allow-nginx-service \
  --allow=tcp:${NODE_PORT} \
  --network kubernetes-the-kubespray-way

Retrieve the external IP address of a worker instance:

EXTERNAL_IP=$(gcloud compute instances describe worker-0 \
  --format 'value(networkInterfaces[0].accessConfigs[0].natIP)')

Make an HTTP request using the external IP address and the `nginx` node port:

curl -I http://${EXTERNAL_IP}:${NODE_PORT}

Date: Thu, 13 Aug 2020 11:15:02 GMT
Content-Type: text/html
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive

We will now also verify that the kubernetes built-in DNS works across namespaces.

Create a new `dev` namespace:

kubectl create namespace dev

Create an nginx deployment and expose it within the cluster:

kubectl create deployment nginx --image=nginx -n dev
kubectl expose deployment nginx --port 80 --type ClusterIP -n dev

Run a temporary container to see if we can reach the service from the default
namespace:

kubectl run curly -it --rm --image curlimages/curl:7.70.0 -- /bin/sh
curl --head http://nginx.dev:80

Date: Thu, 13 Aug 2020 11:15:59 GMT
Content-Type: text/html
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive

Type `exit` to leave the shell.

### Kubernetes resources

Delete the dev namespace, the nginx deployment and service:

kubectl delete namespace dev
kubectl delete deployment nginx
kubectl delete svc/nginx

Note: you can skip this step if you want to entirely remove the machines.

If you want to keep the VMs and just remove the cluster state, you can simply
run another Ansible playbook:

ansible-playbook -i inventory/mycluster/hosts.yaml -u $USERNAME -b -v --private-key=~/.ssh/id_rsa reset.yml

Resetting the cluster to the VMs' original state usually takes a couple of minutes.

### Compute instances

Delete the controller and worker compute instances:

gcloud -q compute instances delete \
  controller-0 controller-1 controller-2 \
  worker-0 worker-1 worker-2 \
  --zone $(gcloud config get-value compute/zone)

<!-- markdownlint-disable no-duplicate-heading -->
### Network
<!-- markdownlint-enable no-duplicate-heading -->

Delete the fixed IP addresses (assuming you named them equal to the VM names),
if you used them:

gcloud -q compute addresses delete controller-0 controller-1 controller-2 \
  worker-0 worker-1 worker-2

Delete the `kubernetes-the-kubespray-way` firewall rules:

gcloud -q compute firewall-rules delete \
  kubernetes-the-kubespray-way-allow-nginx-service \
  kubernetes-the-kubespray-way-allow-internal \
  kubernetes-the-kubespray-way-allow-external

Delete the `kubernetes` subnet and the `kubernetes-the-kubespray-way` VPC network:

gcloud -q compute networks subnets delete kubernetes
gcloud -q compute networks delete kubernetes-the-kubespray-way