The BioExcel Binder deployment
Under the hood, the Binder infrastructure is built on top of Kubernetes, the most popular container orchestration system, and relies on container and Kubernetes-based tools and infrastructure such as DockerHub, Helm, Helmsman, Grafana and Fluentd for k8s.
Prerequisites/recommended skill level
In this part, we detail the underlying technology of a BioExcel Binder deployment. The intended audience is contributors, collaborators, or IT services interested in forking the code and trying to run their own instances.
We rely heavily on Kubernetes for the BioExcel Binder deployment. A deep knowledge of how to install it (at least on a public provider) and operate it (e.g. Ingress networking, Kustomize, Helm charts) is recommended.
Moreover, since we have implemented customizations, an understanding of how to install and operate a JupyterHub instance is recommended.
All the deployment scripts are based on standard Linux tools like Bash and Make to make it easier to integrate the automation in CI/CD agents.
The BioExcel Binder is currently hosted in the EMBL-EBI private OpenStack cloud, an isolated Infrastructure as a Service (IaaS) capability within EMBL-EBI’s Data Centre.
The use of Kubernetes and deployment automation keeps the deployment portable between cloud infrastructures and will enable the installation of satellite deployments at other partners’ sites or in the public cloud (a cloud-agnostic approach).
Indeed, to increase availability and meet further needs, we have recently started extending the infrastructure to the Google Cloud Platform (GCP).
Kubernetes clusters setup
The cluster setup differs according to the infrastructure provider:
- In the OpenStack cloud, we use the Magnum OpenStack API.
Here you can find an example of a command to create a cluster using Magnum.
- In GCP, we use a Terraform script (based on the official module terraform-google-modules/kubernetes-engine).
You can download an example of Terraform configuration here.
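As a rough illustration of the Magnum route, the sketch below assembles an `openstack coe cluster create` command line. The cluster name, template name, and node counts are hypothetical placeholders, not the values used for the BioExcel clusters, and the OpenStack client with the Magnum plugin is assumed to be installed.

```python
import shlex


def magnum_create_command(name, template, node_count=3, master_count=1,
                          keypair=None):
    """Build the argument list for `openstack coe cluster create`.

    All argument values are supplied by the caller; nothing here reflects
    the actual BioExcel cluster settings.
    """
    argv = [
        "openstack", "coe", "cluster", "create", name,
        "--cluster-template", template,
        "--master-count", str(master_count),
        "--node-count", str(node_count),
    ]
    if keypair:
        # Optional SSH keypair registered in OpenStack.
        argv += ["--keypair", keypair]
    return argv


if __name__ == "__main__":
    # Hypothetical names, printed as a shell-safe command line.
    print(shlex.join(magnum_create_command("binder-test", "k8s-template")))
```

Building the command as a list rather than a string makes it straightforward to pass to `subprocess.run` from a CI/CD agent without shell quoting issues.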
Once the Kubernetes cluster is installed, we run a script that applies the configuration required to initialize a brand new instance, for example:
- Set up the access for the deploy agents.
- Set up user accounts.
After this step, all other operations should ideally be performed by a deployment agent authorized to operate on the cluster (e.g. Helmsman using a dedicated service account).
Here you can see the script we are using to accomplish that.
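To make the "access for the deploy agents" step more concrete, the sketch below generates a ServiceAccount and a ClusterRoleBinding manifest for a deployment agent. It is not the script referenced above: the account name `helmsman-deployer`, the namespace `deploy`, and the binding to the built-in `cluster-admin` role are assumptions chosen for illustration.

```python
import json


def deploy_agent_manifests(name="helmsman-deployer", namespace="deploy"):
    """Return Kubernetes manifests giving a deploy agent cluster access.

    The default name and namespace are hypothetical placeholders.
    """
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    # Binding to the built-in cluster-admin ClusterRole is an assumption;
    # a narrower role is usually preferable in production.
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{name}-binding"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace}
        ],
    }
    return [service_account, binding]


def to_kubectl_list(manifests):
    """Wrap manifests in a v1 List so they can be applied in one call."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2
    )


if __name__ == "__main__":
    print(to_kubectl_list(deploy_agent_manifests()))
```

The JSON output can be piped to `kubectl apply -f -`, provided the target namespace already exists.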
Deployments over Kubernetes
The BinderHub deployment is packaged as a Helm chart. We also use charts to install other components.
To further improve automation and configurability, we use Helmsman: "a Helm Charts as Code tool which allows you to automate the deployment/management of your Helm charts from version-controlled code".
This way we have a single configuration to manage all the components (Charts) in various environments (Cloud infrastructures and setup flavours).
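One way to picture a single configuration serving several environments is to layer shared values files with environment-specific overrides. The sketch below is an assumption about how such layering could be organized, not a description of the actual repository: the per-environment directories (such as `gcp`) and the helper name are hypothetical, and it relies on Helm giving precedence to values files listed later.

```python
from pathlib import PurePosixPath


def values_files(env, names, root="helm_configs"):
    """List values files for one environment: base first, then overrides.

    Helm gives later values files precedence, so the environment-specific
    files are placed after the shared base files.
    """
    base = [str(PurePosixPath(root, "base", n)) for n in names]
    overrides = [str(PurePosixPath(root, env, n)) for n in names]
    return base + overrides


if __name__ == "__main__":
    # "gcp" is a hypothetical environment directory name.
    for path in values_files("gcp", ["config_binder.yml"]):
        print(path)
```

The resulting list could be written under an application's `valuesFiles` key in the Helmsman desired state file for that environment.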
The configurations contain secrets at various levels, for example:
- Kubernetes cluster tokens
- Docker registry accounts
- JupyterHub tokens
- TLS certificates
To keep this sensitive data alongside the code we use Mozilla SOPS. It integrates well with Helm (through a dedicated Helm plugin) and with Helmsman (which natively supports the helm-secrets plugin).
In the Helmsman example configuration you can see how encrypted configurations can be passed transparently using the "secretsFiles" key:
    ...
    valuesFiles:
      - helm_configs/base/config_binder.yml
      - helm_configs/base/config_jupytherhub.yml
      - helm_configs/base/config_ingress.yaml
    secretsFiles: # Managed by helm-secret plugin (SOPS encrypted files).
      - helm_configs/base/secret.yaml
    ...
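Because files listed under `secretsFiles` live in version control, it can help to verify that they are really encrypted before committing them. The following is a minimal heuristic sketch, not part of the BioExcel tooling: it relies on the fact that SOPS replaces values with `ENC[AES256_GCM,...]` tokens and appends a top-level `sops` metadata block, and the function name is hypothetical.

```python
import re

# SOPS replaces each encrypted value with an ENC[AES256_GCM,...] token and
# appends a top-level "sops:" metadata block to the YAML file.
SOPS_METADATA = re.compile(r"^sops:\s*$", re.MULTILINE)
ENCRYPTED_VALUE = re.compile(r"ENC\[AES256_GCM,data:")


def looks_sops_encrypted(text):
    """Heuristically check that YAML text was encrypted by SOPS."""
    return bool(SOPS_METADATA.search(text) and ENCRYPTED_VALUE.search(text))


if __name__ == "__main__":
    import sys

    # Usage: python check_secrets.py helm_configs/base/secret.yaml
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            status = "ok" if looks_sops_encrypted(handle.read()) else "PLAINTEXT?"
        print(f"{path}: {status}")
```

A check like this could run as a pre-commit hook or a CI step. It only inspects the file's shape and does not validate the SOPS message authentication code.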