The Things Stack - Deployment Process, Test Scenarios and Incident Management

This article highlights the deployment process, test scenarios and incident management process of The Things Stack (V3) Cloud Hosted.

Section 1. Deployment Process

The Things Stack is an open source project (excluding some proprietary features), so anyone can inspect the source code, the issues the team is working on, and the release notes of each release.

Patch, Minor and Major Releases

We distinguish between patch, minor and major releases. Patch releases (e.g. from version 3.8.4 to 3.8.5) contain backward-compatible features and fixes, meaning that no user action is required to keep the Network Server operating as expected.

Minor releases (e.g. from version 3.8.5 to 3.9) contain new features that may require user action to enable. They may also contain security fixes that break existing operation; security fixes are the only breaking changes allowed in a minor release. If a minor release contains a breaking change, users are informed at least 5 working days in advance about the change and any additional configuration required, together with instructions for the steps they need to take.

Example of a breaking security fix in the minor release to version 3.9:

“Option to allow unauthenticated LoRa Basic Station connections (gs.basic-station.allow-unauthenticated). ⚠ Without this option, Basic Station gateways that do not use authentication will not be allowed to connect.”

All breaking functional changes that are not security fixes are reserved for the next major release, i.e. v4.
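
The release types above follow a major.minor.patch numbering scheme. As a minimal sketch (not part of The Things Stack itself), the Python snippet below classifies an upgrade between two version strings and summarizes what this section says each kind of release means for users. The names `Version`, `upgrade_kind` and `EXPECTATIONS` are hypothetical, and treating "3.9" as "3.9.0" is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """A release version in major.minor.patch form, e.g. 3.8.5."""
    major: int
    minor: int
    patch: int = 0

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Assumption: "3.9" is shorthand for "3.9.0"; a leading "v" is allowed.
        parts = text.removeprefix("v").split(".")
        if not 2 <= len(parts) <= 3 or not all(p.isdecimal() for p in parts):
            raise ValueError(f"invalid version: {text!r}")
        return cls(*(int(p) for p in parts))


def upgrade_kind(current: Version, target: Version) -> str:
    """Classify an upgrade from `current` to a newer `target`."""
    if target <= current:
        raise ValueError("target must be newer than current")
    if target.major != current.major:
        return "major"
    if target.minor != current.minor:
        return "minor"
    return "patch"


# What each kind of release means for users, as described in this section.
EXPECTATIONS = {
    "patch": "compatible features and fixes; no user action required",
    "minor": "new features may need enabling; only security fixes may break "
             "existing operation, announced at least 5 working days ahead",
    "major": "may contain breaking functional changes",
}

if __name__ == "__main__":
    for old, new in [("3.8.4", "3.8.5"), ("3.8.5", "3.9"), ("3.9", "4.0")]:
        kind = upgrade_kind(Version.parse(old), Version.parse(new))
        print(f"{old} -> {new}: {kind} ({EXPECTATIONS[kind]})")
```

Only the first differing component matters: 3.8.5 to 3.9 is a minor upgrade regardless of the patch numbers involved.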

Deployment cycles

The Things Industries team works in short sprints and aims to deploy every 2 weeks. Each deployment contains new features and bug fixes that have been run through automated tests to check for compatibility and regressions (for more information, see Section 2, Test Process).

Once these tests pass, the release cycle starts by deploying the new version to the staging environment, which contains physical devices and gateways that continuously run LoRaWAN tests to verify the inner workings of the Stack. When all tests with physical hardware pass, the new version is deployed to the production environment and the release notes are published on GitHub. In the post-deployment phase, The Things Industries engineers monitor all operations to validate the behavior of devices, gateways and integrations.

The Things Industries’ release commitments:

  1. We will not break the API towards gateways and applications within a major version. This covers how gateways communicate (with the Gateway Server) and how applications work with data (with the Application Server).

  2. We will not break the public command-line interface and configuration within a major version. This means that you can safely build scripts and migrate configuration.

  3. We will not break the API between components, or the events, within a minor version. This guarantees, at a minimum, that components of the same minor version are compatible with each other.
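
The three commitments can be read as a compatibility rule keyed on version numbers: sharing a major version covers the gateway and application APIs, the CLI and the configuration, while sharing both major and minor version additionally covers the API between components. The Python sketch below encodes that reading; `Version` and `guarantees` are hypothetical names, and the function reports only guaranteed compatibility, because components from different minor releases may still work together even though that is not promised.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A release version in major.minor.patch form (hypothetical helper)."""
    major: int
    minor: int
    patch: int = 0


def guarantees(a: Version, b: Version) -> list[str]:
    """List the release commitments that cover versions `a` and `b`.

    Only guaranteed compatibility is reported; versions in different
    minor releases may still interoperate, but that is not promised.
    """
    covered: list[str] = []
    if a.major == b.major:
        # Commitments 1 and 2: stable within a major version.
        covered += ["gateway and application APIs", "public CLI and configuration"]
        if a.minor == b.minor:
            # Commitment 3: stable within a minor version.
            covered.append("API between components and events")
    return covered


if __name__ == "__main__":
    print(guarantees(Version(3, 8, 4), Version(3, 8, 5)))  # all three commitments
    print(guarantees(Version(3, 8, 5), Version(3, 9, 0)))  # commitments 1 and 2
    print(guarantees(Version(3, 9, 0), Version(4, 0, 0)))  # []
```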

Staging environment for customers

By the end of the year, customers will get access to a staging environment that runs the beta version of the upcoming The Things Stack release. This allows them to test the latest features and experiment with The Things Stack in general without affecting the production Network Server.

Downtime

If noticeable downtime is expected during a deployment or maintenance activity, customers are informed at least 5 working days in advance. This is done by posting an update on The Things Industries Status Page that states the start date and time, the duration of the update, and the affected software components and server clusters. Subscribers of the status page are notified when the deployment or maintenance is scheduled, and again when the update is about to start.
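
For planning purposes, the 5-working-day notice period can be computed by counting back from the planned start date. The Python sketch below assumes that working days are Monday to Friday and ignores public holidays; `latest_announcement_date` and `notice_is_sufficient` are hypothetical helpers, not part of The Things Stack or the status page tooling.

```python
from datetime import date, timedelta


def latest_announcement_date(start: date, working_days: int = 5) -> date:
    """Latest date on which work starting on `start` may be announced.

    Counts back `working_days` weekdays (Monday to Friday) from `start`.
    Assumption: public holidays are not taken into account.
    """
    day = start
    remaining = working_days
    while remaining > 0:
        day -= timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return day


def notice_is_sufficient(announced: date, start: date, working_days: int = 5) -> bool:
    """True if the announcement gives at least `working_days` working days of notice."""
    return announced <= latest_announcement_date(start, working_days)


if __name__ == "__main__":
    maintenance = date(2024, 3, 18)  # hypothetical Monday maintenance window
    print(latest_announcement_date(maintenance))  # 2024-03-11
```

For work starting on a Wednesday, the count skips the weekend, so the latest announcement date falls on the Wednesday of the previous week.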

Section 2. Test Process

During the development process, two types of tests are executed:

  1. End-to-end tests 

  2. Physical tests

End-to-end tests

Before any feature or bug fix is merged into the upcoming deployment branch, automated end-to-end tests are executed to check for compatibility and regressions. End-to-end tests are designed to mimic user behavior and to simulate different LoRaWAN scenarios. User behavior scenarios contain complete user stories that are critical to the overall integrity of the application and usually span multiple components and views (e.g. creating applications, registering gateways and changing device settings). The LoRaWAN end-to-end tests check compatibility and regressions by simulating scenarios such as sending uplinks and downlinks and processing join requests.


Physical tests

Once the end-to-end tests pass, the new feature can be merged into the upcoming deployment branch. The release cycle then starts by deploying The Things Stack, including its newly merged features, to the staging environment. A release to the production environment is planned only when the staging environment passes all physical tests.

The Things Industries operates a physical test lab containing multiple gateways and devices that continuously run LoRaWAN test loops. These test loops include:

  • Sending uplinks.

  • Receiving downlinks.

  • Testing Class A and Class C operation.

  • Running through the join process for a Cluster Join Server and the Global Join Server.

  • Testing integrations using HTTP Webhooks, MQTT and Pub/Subs.

  • Exchanging traffic between different tenants within The Things Stack Cloud Hosted, and between dedicated The Things Stack deployments using Packet Broker.

The physical tests use gateways running the UDP Packet Forwarder as well as gateways running LoRa Basics Station.

Section 3. Incident Management Process

At least one support engineer is dedicated, 24 hours a day, to monitoring The Things Industries’ network operations and taking direct action when alerts are received from Opsgenie. When an incident is reported, our engineers triage the alerts and, in case of a global incident that affects service availability in a regional cluster, they add a report to The Things Industries status page to notify customers.