Service Monitoring – Dashboards and Notifications

This post is a follow-up to the "Service Monitoring – capture the real user experience" post and shows what user-journey-based monitoring can provide. In this example we use Site24x7.com as the monitoring tool; other monitoring tools also provide user journey monitoring. Some settings and reports depend on the chosen tool, and in this blog I use examples and screenshots from Site24x7.com.

An example

Setup

Let's choose one of the services we support: Moodle – one of the virtual learning tools. We choose a very simple three-step user journey:

Moodle Entry page => Single Sign-On => Moodle Home page

We use the following settings for the monitoring:

Setting | Value | Comments
Check frequency | 5 minutes | Checks the user journey every five minutes
Locations | Three | Checks the user journey from three different locations in the world
Words we look for | Staff Moodle Help | Text we expect to be present on the Moodle Home page. If this text is not present, we mark the service as unavailable
Timeout | 30 sec | If any of the user journey checks does not respond within 30 seconds, we mark the service as down
Number of locations to report monitor as down | 2 | To rule out a single faulty location, at least two locations must report the service as down before the service is marked as down
Response time threshold | 8 sec | If the total user journey takes longer than 8 seconds, the service is marked as in trouble
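
To make the interplay of these settings concrete, here is a minimal sketch in Python of how the up / down / trouble decision could be derived from one round of checks. This is not how Site24x7 implements it internally. The names (LocationResult, evaluate and so on) are made up for illustration, and the rule that a single slow but otherwise healthy location is enough to report "trouble" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values taken from the settings table above.
TIMEOUT_SECONDS = 30.0
RESPONSE_THRESHOLD_SECONDS = 8.0
LOCATIONS_REQUIRED_FOR_DOWN = 2
EXPECTED_TEXT = "Staff Moodle Help"


@dataclass
class LocationResult:
    """Hypothetical result of one user journey run from one location."""
    location: str
    step_seconds: List[Optional[float]]  # one entry per step; None = no response
    final_page_text: str                 # text of the last page in the journey


def location_is_down(result: LocationResult) -> bool:
    """A location reports 'down' on a timeout or when the expected text is missing."""
    for seconds in result.step_seconds:
        if seconds is None or seconds > TIMEOUT_SECONDS:
            return True
    return EXPECTED_TEXT not in result.final_page_text


def journey_seconds(result: LocationResult) -> float:
    """Total time of all steps that responded."""
    return sum(s for s in result.step_seconds if s is not None)


def evaluate(results: List[LocationResult]) -> str:
    """Combine the results of all locations into a single service status."""
    down = [r for r in results if location_is_down(r)]
    if len(down) >= LOCATIONS_REQUIRED_FOR_DOWN:
        return "down"
    # Assumption: one slow (but otherwise healthy) location is enough for "trouble".
    healthy = [r for r in results if not location_is_down(r)]
    if any(journey_seconds(r) > RESPONSE_THRESHOLD_SECONDS for r in healthy):
        return "trouble"
    return "up"
```

With three locations, a single location that fails to find the expected text still leaves the service "up", while two failing locations mark it as "down". This is exactly the protection against a single faulty location described in the table.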

Notifications

Once everything is set up, the monitoring starts polling and information becomes available. There are normally different types of notification, such as:

  • Dashboard with up/down information
  • Emails
  • Text messages (SMS)

We implemented dashboards and email notifications, but not text messages.

We use several different dashboards; here are a few examples:

In-built Site24x7 Dashboard

Fig. 1: This Dashboard shows if services are currently up or down. The dashboard also shows the response of the last poll/check.

Fig. 2: Mobile app showing status of monitored services. In this example we have 79 services up, 1 down and none in trouble or under maintenance.

Fig. 3: Service availability is published on the Information Service’s “Status and Alerts” page. The availability data is populated in real time using an API to access the Site24x7.com data. In this case we display only the University's top-priority services. This dashboard lets end users see whether University priority services are available.
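
As a rough illustration of the step between the monitoring data and a status page, the Python sketch below takes status records that have already been fetched and reduces them to the priority services and to an overall summary. The record layout (a dict with "name" and "status") and the function names are assumptions for this example only; they are not the Site24x7 API response format, and the call to the API itself is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List

STATUSES = ("up", "down", "trouble", "maintenance")


def priority_status(records: List[Dict[str, str]],
                    priority_names: Iterable[str]) -> Dict[str, str]:
    """Keep only the services that should appear on the public status page."""
    wanted = set(priority_names)
    return {r["name"]: r["status"] for r in records if r["name"] in wanted}


def summarise(records: List[Dict[str, str]]) -> Dict[str, int]:
    """Count services per status, as shown in a summary view."""
    counts = Counter(r["status"] for r in records)
    return {status: counts.get(status, 0) for status in STATUSES}


# Made-up sample data for illustration.
records = [
    {"name": "Moodle", "status": "up"},
    {"name": "Email", "status": "down"},
    {"name": "Internal wiki", "status": "up"},
]

public_view = priority_status(records, ["Moodle", "Email"])
summary = summarise(records)
```

Separating the filtering from the fetching keeps the public page limited to the agreed list of priority services, while technical dashboards can still work from the full set of records.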

Fig. 4: This dashboard shows all monitored services on one screen. This dashboard is used by technical teams and operators and helps to escalate issues quickly (and resolve them) before users experience issues. In this case one system has downtime.

In my next blog I will show how a user-journey-based monitoring tool can help establish root causes, and how historic information can help establish when issues started and provide some level of capacity management.

 
