Monitoring Service Deployments

Preliminary (beta) design: what is described here is subject to change.

Kumori Platform tracks basic information about the execution of role instances within a deployed service. Typical measurements include the usage of various resources (CPU, memory, …) at the instance level.

However, in many circumstances it is necessary to extract other metrics from the execution of the roles in a service, metrics that are specific to the function the service is carrying out. The goal may be to raise an alarm on a specific condition, or even to take action when the high-level state of the service, as revealed by those metrics, falls outside an acceptable range.

Many different approaches can be devised, some of them applicable across a variety of components. For example, for components running within a Java VM, metrics of the VM itself could be captured to reveal conditions that must be dealt with, independently of the concrete functionality running in that VM. Another example is the number of high-level tasks waiting to be started/completed/…, which can feed a mechanism that autoscales parts of the deployed service.
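To illustrate the autoscaling example, a hypothetical policy could turn the backlog of pending tasks into a desired instance count. This is a minimal sketch, not a Kumori Platform API: the function name, the `target_per_instance` parameter and the instance bounds are all assumptions. The proportional rule resembles the one used by the Kubernetes Horizontal Pod Autoscaler.

```python
import math


def desired_instances(pending_tasks: int, target_per_instance: int,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return how many instances a role should run for a given backlog.

    Hypothetical policy: aim for at most `target_per_instance` pending tasks
    per instance, clamped to the range [min_instances, max_instances].
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    if min_instances < 1 or min_instances > max_instances:
        raise ValueError("invalid instance bounds")
    # Proportional rule: just enough instances to spread the backlog evenly.
    wanted = math.ceil(pending_tasks / target_per_instance)
    return max(min_instances, min(max_instances, wanted))
```

With 45 pending tasks and a target of 10 tasks per instance, `desired_instances(45, 10)` yields 5; an empty backlog still keeps the minimum of one instance.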

This document presents the basic mechanism Kumori Platform will employ to capture metrics for roles. Service application integrators can use this mechanism directly to capture such metrics.

Future versions of the monitoring facility will allow this capture to be optionally configured at deployment time.

Basic approach

As hinted above, the specific monitoring needed by a service application depends on what is important to track for each of the roles within the service.

Our basic approach is to deploy a special role, the service monitor role (SMR), within each deployed service. The SMR is in charge of capturing metrics from the rest of the roles in the service.

Within a service, the SMR is modeled like any other role. It must therefore be connected to all the roles it is to monitor. This connection is established through a complete (full) connector.

The SMR has a client channel connected to the full connector, which is in turn connected to the server channels provided by each role capable of being monitored.

By resolving its client channel, the SMR can find out the set of instances of the roles it has to monitor and then capture metrics according to the configuration it is provided with.
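The capture step can be sketched as a loop over the instances obtained by resolving the SMR client channel. This is a minimal sketch under assumptions: the `Instance` fields, the plain-HTTP transport and the `/metrics` path are hypothetical, since the actual resolution API and protocol are not defined here. The fetcher is injectable so the loop can be exercised without a network.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Instance:
    """One monitored instance as the SMR might see it after resolving its
    client channel (hypothetical fields)."""
    role: str
    instance_id: str
    host: str
    port: int


def http_fetch(url: str, timeout: float = 5.0) -> str:
    """Default fetcher: plain HTTP GET returning the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def scrape_all(instances: List[Instance],
               fetch: Callable[[str], str] = http_fetch,
               path: str = "/metrics") -> Tuple[Dict[str, str], Dict[str, str]]:
    """Request metrics from every resolved instance.

    Returns (successes, failures), both keyed by instance id. A failing
    instance does not abort the round, because instances may come and go
    between resolution and scraping.
    """
    ok: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for inst in instances:
        url = f"http://{inst.host}:{inst.port}{path}"
        try:
            ok[inst.instance_id] = fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            failed[inst.instance_id] = str(exc)
    return ok, failed
```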

It follows that a role's participation in service monitoring is optional. To participate in service metrics collection, a role must provide a server channel from which the SMR can request metrics data.
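On the role side, such a server channel could be as simple as an HTTP endpoint serving the Prometheus text exposition format, which fits the Prometheus-based SMR planned below. This is a hedged sketch: the metric names, the `STATE` dictionary and the `/metrics` path are hypothetical, and the platform has not fixed the protocol yet.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state the role wants to expose.
STATE = {"pending_tasks": 0, "completed_tasks": 0}
STATE_LOCK = threading.Lock()


def render_metrics() -> str:
    """Render STATE in the Prometheus text exposition format."""
    with STATE_LOCK:
        pending = STATE["pending_tasks"]
        completed = STATE["completed_tasks"]
    return (
        "# HELP app_pending_tasks Tasks waiting to be started.\n"
        "# TYPE app_pending_tasks gauge\n"
        f"app_pending_tasks {pending}\n"
        "# HELP app_completed_tasks_total Tasks completed since start.\n"
        "# TYPE app_completed_tasks_total counter\n"
        f"app_completed_tasks_total {completed}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve_metrics(port: int) -> HTTPServer:
    """Serve /metrics from a background thread; port 0 picks a free port."""
    server = HTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The role would update `STATE` (under `STATE_LOCK`) as work progresses, and the SMR would read the current values on each request.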

Going beyond the basics

The description above simply shows that service-specific monitoring can be layered on top of the basic service architecture imposed by our service model.

However, to make this practical, we need to impose some order so that the common cases can be automated. To this end, our platform imposes the following conditions:

A component implementing a monitorable role in a service must identify the server channel that provides its metrics and, when doing so, must also provide metadata (TBD) that can be used to further configure the SMR.
A component implementing a SMR must have a client channel capable of resolving all instances behind a complete connector and of retrieving the metadata associated with each monitored instance.
A component implementing a SMR must provide a server port through which other roles can request the captured data, using a TBD protocol.
A component implementing a SMR may declare a client port to persist the data.
A component implementing a SMR may declare a server port through which a visualization interface can be accessed.
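Since the metadata is still TBD, the following is only a sketch of what it might carry and how the SMR could turn it into configuration. Every field of `MonitorChannelMetadata` is hypothetical; the output mirrors the keys of a Prometheus `scrape_config` entry (`job_name`, `metrics_path`, `scheme`, `scrape_interval`, `file_sd_configs`), assuming the Prometheus-based SMR described below.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MonitorChannelMetadata:
    """Hypothetical metadata a monitorable role could attach to its metrics
    server channel. The real schema is still TBD."""
    role: str
    channel: str
    metrics_path: str = "/metrics"
    scheme: str = "http"
    scrape_interval_s: int = 15

    def __post_init__(self) -> None:
        if not self.metrics_path.startswith("/"):
            raise ValueError("metrics_path must start with '/'")
        if self.scheme not in ("http", "https"):
            raise ValueError("scheme must be http or https")
        if self.scrape_interval_s <= 0:
            raise ValueError("scrape_interval_s must be positive")


def to_scrape_config(meta: MonitorChannelMetadata, sd_file: str) -> Dict[str, Any]:
    """Build one Prometheus scrape_config entry whose targets are read from
    a file-based service discovery file maintained by the SMR."""
    return {
        "job_name": f"{meta.role}-{meta.channel}",
        "metrics_path": meta.metrics_path,
        "scheme": meta.scheme,
        "scrape_interval": f"{meta.scrape_interval_s}s",
        "file_sd_configs": [{"files": [sd_file]}],
    }
```

Validating the metadata when it is declared lets a misconfigured role be rejected before the SMR is configured, rather than failing silently at scrape time.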

Initial support from Kumori Platform

Kumori Platform will provide initial support for this basic scheme by making available a component capable of implementing an SMR role. In addition, we will define a specific component configuration to mark the monitor server channel.

The SMR role will consist of a combination of Prometheus and Grafana, with suitable configuration parameters.

The component will include a dedicated discovery mechanism that interfaces with the platform's DNS-based discovery.
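One plausible shape for this bridge is to resolve the addresses behind the SMR client channel and publish them through Prometheus file-based service discovery, whose files are JSON lists of `{"targets": [...], "labels": {...}}` groups. This is a sketch under assumptions: the DNS name of the channel and the `role` label are hypothetical, and Prometheus also offers native DNS discovery (`dns_sd_configs`) that may suffice in simple cases.

```python
import json
import os
import socket
import tempfile
from typing import Dict, List


def resolve_instances(hostname: str, port: int) -> List[str]:
    """Resolve every address behind `hostname` (hypothetical DNS name of the
    SMR client channel) into sorted, de-duplicated "ip:port" targets."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    targets = set()
    for family, _, _, _, sockaddr in infos:
        ip = sockaddr[0]
        host = f"[{ip}]" if family == socket.AF_INET6 else ip
        targets.add(f"{host}:{port}")
    return sorted(targets)


def file_sd_entries(targets_by_role: Dict[str, List[str]]) -> List[dict]:
    """Build file_sd target groups, one per role, skipping roles with no targets."""
    return [
        {"targets": targets, "labels": {"role": role}}
        for role, targets in sorted(targets_by_role.items())
        if targets
    ]


def write_file_sd(path: str, entries: List[dict]) -> None:
    """Write via a temp file and rename so Prometheus never reads a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(entries, f, indent=2)
    os.replace(tmp, path)
```

Re-running the resolution periodically and rewriting the file keeps the Prometheus target list in step with instances being added or removed.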

Access to Prometheus will be provided through a specific server channel. Likewise, access to Grafana will be provided through a specific server channel.
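Assuming the Prometheus server channel exposes the standard Prometheus HTTP API, another role could read captured data with an instant query against `/api/v1/query`. The base URL in the usage note is hypothetical, as is the metric name; the response layout (`status`, `data.resultType`, `data.result[].value`) is the one documented for that API.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is [unix_timestamp, "value-as-string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 5.0):
    """Run the query over HTTP and return the parsed samples."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("http://smr-prometheus:9090", "sum by (role) (app_pending_tasks)")` would return one `(labels, value)` pair per role, where `smr-prometheus` stands in for whatever name the channel actually resolves to.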

The intention is to expose the Grafana server channel as a service channel of the service application. Access to Grafana dashboards can then be set up through a variety of mechanisms, the most straightforward being to connect an inbound to the monitoring server channel.