COSMOS Limits

COSMOS allows you to define limits for your telemetry items to provide feedback to users about how your telemetry is performing. If you define limits, COSMOS requires you to specify both red and yellow limits. Red limits are traditionally where damage may occur while yellow limits are set a certain percentage away from red to give the operator a chance to respond before the red limits are hit. COSMOS also allows the user to set “Operational” limits (blue limits) which specify a desired operational range inside the green limits. This is best illustrated in the COSMOS demo. Here is a partial screenshot of the INST HS screen in Telemetry Viewer.

Demo MetaData

The TEMP1 telemetry point is using the LABELTRENDLIMITSBAR widget in the INST HS screen definition to display both the value (the temperature), the trend (how the data has been moving over the past 60s), as well as the limits bar. The TEMP2 telemetry point is simply using the LABELVALUELIMITSBAR widget so it shows just the value and the limits bar. Notice how the scale of the yellow regions are different for TEMP1 and TEMP2. The regions are scaled according to their defined value ranges.

This type of limits bar is also used by the Limits Monitor tool to display out of limits items system wide.

Demo MetaData

Here I’ve ignored various other items to focus on the INST HEALTHSTATUS packet containing TEMP1 and TEMP2. Notice that the overall limit is yellow since TEMP1 is yellow even though TEMP2 is green. The worst case limts item is reflected in the Monitored Limits State. Also note that TEMP2 appears above TEMP1 because it happened to go out of limits first while I was running the demo. Items appear in Limits Monitor as they go out of limits. Also note that even though TEMP2 is green it still shows up in Limits Monitor because it _previously went out of limits. If you switch to the Log tab you can see exactly when an item went out of limits and when it went back to green.

Scripting Limits

You can get this same information from a script by executing the following script in Script Runner.

puts "Overall limits state: #{get_overall_limits_state}"
get_out_of_limits.each do |item|
  tgt, pkt, item, state = item
  puts "tgt:#{tgt} pkt:#{pkt} item:#{item} state:#{state}"
end

Note that the get_out_of_limits call returns the items which are out of limits at the exact instant of the API call. Thus it doesn’t provide history like the Limits Monitor tool.

You can also disable and enable limits on individual telemetry items using the API. The following script will disable limits on TEMP1 and then after 5 seconds it re-enables limits. Run this with the Limits Monitor open and you’ll notice the text for the INST HEALTH_STATUS TEMP1 telemetry point turns black when limits are disabled. Note that this disables limits events logging in the Command and Telemetry Server for this telemetry point.

puts "TEMP1 limits enable:#{limits_enabled?("INST", "HEALTH_STATUS", "TEMP1")}"
disable_limits("INST", "HEALTH_STATUS", "TEMP1")
puts "TEMP1 limits enable:#{limits_enabled?("INST", "HEALTH_STATUS", "TEMP1")}"
wait 5
enable_limits("INST", "HEALTH_STATUS", "TEMP1")
puts "TEMP1 limits enable:#{limits_enabled?("INST", "HEALTH_STATUS", "TEMP1")}"

Limits Sets

COSMOS requires the first item after the LIMITS keyword to be the name of the limits set. Limits sets are system wide sets of limits which can be applied depending on the environment you’re testing in. Perhaps you have a “Test” set of limits which are different from your “Operational” limits. At Ball Aerospace, we frequently define cyrogentic limits when testing spacecraft in a vacuum chamber at cyrogenic temperatures. COSMOS requires at least one limits definition to belong to “DEFAULT” which is used when COSMOS starts. If you forget to define a DEFAULT limits set, you’ll get this error when you start the COSMOS Command and Telemetry Server.

Demo MetaData

Once you’ve defined your limits sets how do you use them? The COSMOS Command and Telemetry Server has a Status tab which allows you to manually change the limits set using a drop down.

Demo MetaData

Notice the log contains “INFO: Limits Set Changed to: TVAC”. This happened immediately after I manually changed the Limits Set drop down from DEFAULT to TVAC. This message is automatically generated whenever the limits set is changed either manually or from the API.

Scripting Limits Sets

You can also manipulate the limits sets programatically from a script.

get_limits_sets.each { |set| puts "Limits Set:#{set}" }
puts "Current Set:#{get_limits_set()}" # Currently selected limits set
set_limits_set("DEFAULT") # Change to DEFAULT
set_limits_set("TVAC") # Change to TVAC

Limits Groups

COSMOS also has the concept of limits groups. Limits groups are used to group together telemetry items so they can be enabled and disabled together. For example, if you have several physical devices in your system, you can group together items from each device and enable their limits when they power on and disable them when they power off. This is useful to avoid limits violations on items which are currently off and returning zeros.

Limits groups are defined by the LIMITS_GROUP keyword. Typically this is a separate file in your target’s cmd_tlm folder. If you’re creating a group which combines items from multiple targets we recommend to define this in the SYSTEM target’s cmd_tlm folder. This is how the COSMOS demo is configured with a limits_groups.txt file in SYSTEM/cmd_tlm.

Items are assigned to a group using the LIMITS_GROUP_ITEM keyword. Here is the definition in the COSMOS demo.

LIMITS_GROUP INST2_TEMP2
  LIMITS_GROUP_ITEM INST2 HEALTH_STATUS TEMP2

LIMITS_GROUP INST2_GROUND
  LIMITS_GROUP_ITEM INST2 HEALTH_STATUS GROUND1STATUS
  LIMITS_GROUP_ITEM INST2 HEALTH_STATUS GROUND2STATUS

Scripting Limits Groups

You can also manipulate the limits groups programatically from a script.

get_limits_groups.each { |group| puts "Limits Group:#{group}" }
enable_limits_group("INST2_TEMP2")
disable_limits_group("INST2_GROUND")

Automating Limits Groups

COSMOS also has the ability to automatically enable and disable limits groups based on user configurable conditions. For example, you can automatically enable the group for a particular device when the power supply voltage for that device goes high (the device is powered on). The COSMOS demo has an example of this in the SYSTEM/lib/limits_groups.rb file.

require 'cosmos/tools/cmd_tlm_server/limits_groups_background_task'

module Cosmos
  # Inheriting from the LimitsGroupsBackgroundTask provides a framework for
  # automatically enabling and disabling limits groups based on telemetry.
  class LimitsGroups < LimitsGroupsBackgroundTask
    # Initial delay upon starting the server before staring the group checks
    # followed by the background task delay between check iterations
    def initialize(initial_delay = 0, task_delay = 0.5)
      super(initial_delay, task_delay)
      # Creating a Proc allows for arbitrary code to be executed when a group is enabled or disabled
      @temp2_enable_code = Proc.new do
        enable_limits_group('INST2_GROUND') # Enable the INST2_GROUND group
      end
      @temp2_disable_code = Proc.new do
        disable_limits_group('INST2_GROUND') # Disable the INST2_GROUND group
      end
    end
  end
end

You first must require the COSMOS base class and inherit from LimitsGroupsBackgroundTask. The constructor (initialize) takes two delay parameters. The first is the initial delay after starting the Command and Telemetry Server before the task starts checking your logical conditions. The second is the delay between doing the checks (it controls the rate of the task). After calling super (important!) a Proc is created to run arbitrary code when a particular group is enabled or disabled. Note this is entirely optional but in this case we wanted to enable the INST2_GROUND group when the INST2_TEMP2 group got enabled. The @temp2_enable_code and @temp2_disable_code names don’t really matter but they must be unique and have the ‘@’ symbol to make them Ruby instance variables (rather than locals).

After the constructor is configured the method to perform the check is implemented.

def check_inst2_temp2
  process_group(0, @temp2_enable_code, @temp2_disable_code) do
    val = tlm("INST2 HEALTH_STATUS TEMP2")
    return (!val.nan? && !val.infinite?)
  end
end

The method name is very important. It must begin with ‘check_’ and end with the name of a Limits Group defined by the LIMITS_GROUP keyword. In the demo, LIMITS_GROUP(s) are defined in config/targets/SYSTEM/cmd_tlm/limits_groups.txt. Since there is a LIMITS_GROUP INST2_TEMP2 we have a match (the method name should be lower case while the LIMITS_GROUP is typically defined to be uppercase).

Inside the method you must call process_group. The first parameter is the number of seconds to delay before enabling the group when the telemetry check is true. So in this example we are waiting zero seconds to enable the group. Note that when the telemetry check is false the group is instantly disabled. The next two parameters are Proc objects which are called when the group is enabled and disabled respectively. Remember that we defined our Proc objects in the constructor to enable and disable the INST2_GROUND group. These parameters are entirely optional so you can also simply call process_group(0) with just the delay.

The code inside the process_group block (the lines after ‘do’ and before ‘end’) must return a boolean. True will enable the group and false will disable the group. In this example, the group is enabled when the value is not NaN (Not a Number) and not Infinite. If the value is NaN or Infinite the group is disabled. You can watch this happening in the demo by watching the bottom log portion of the Command and Telemetry Server.

2018/02/12 09:12:58.118  ERROR: INST HEALTH_STATUS TEMP2 = NaN is RED_LOW
2018/02/12 09:12:58.121  ERROR: INST2 HEALTH_STATUS TEMP2 = NaN is RED_LOW
2018/02/12 09:12:58.220  INFO: Disabling Limits Group: INST2_TEMP2
2018/02/12 09:12:58.220  INFO: Disabling Limits Group: INST2_GROUND

First TEMP2 transitions to NaN (this is done on purpose for the demo) which is a RED_LOW limits violation. 100ms later (due to processing time) the INST2_TEMP2 group is disabled followed by the INST2_GROUND group due to the Proc code we created.

Other Limits APIs

There are a few other APIs which are provided to work with COSMOS limits.

get_stale returns all the stale packets in the system. Stale packets are those which have not been received by the Command and Telemetry Server for the STALENESS_SECONDS setting (by default 30s).

get_limits(tgt, pkt, item) returns the limits settings for a particular telemetry item. It returns an array of the group name, persistence setting, enable setting, red low, yellow low, yellow high, red high, blue low, and blue high limits values. If no blue operational limits were set those values are nil.

set_limits(tgt, pkt, item, red_low, yellow_low, yellow_high, red_high, blue_low = nil, blue_high = nil, limits_set = :CUSTOM, persistence = nil, enabled = true) allows the user to set new values for the limits settings. Some values have defaults and some do not. Thus you must provide at least the first 7 parameters for this call to be successful. Note the order of parameters: red low, yellow low, yellow high, red high, and then blue low, blue high. By default the limits_set is :CUSTOM which is probably not a limits set you already have defined. Thus is you don’t set the limits_set parameter it may appear as if nothing has happened. To enable the new limits you have to manually change to the CUSTOM limits set: set_limits_set("CUSTOM"). Alternatively, you can specify :DEFAULT for the limits_set parameter and you will typically immediately see the change (assuming DEFAULT is the active set). If you only want to change one particular value you can call get_limits first to get the current settings and only change the value you want.

You can also use API methods to subscribe to limits events. This allows you to process all limits events in the same way the Limits Monitor application does. To start you must subscribe to limits events using subscribe_limits_events(queue_size = 1000) which returns an ID. You can then call get_limits_event(id) and pass in the ID to get limits event type and data from the server. There are three different limits event types: LIMITS_CHANGE, LIMITS_SET, and LIMITS_SETTING. LIMITS_CHANGE is when a particular item goes out of limits. This is by far the most common and probably the one you’re most interested in. LIMITS_SET events happen when the overall limits set changes. LIMITS_SETTING events are generated when the set_limits method is called to change the limits on an individual item. The get_limits_event method takes a boolean second parameter to determine whether or not to block when waiting for an event. The default is false which means it blocks waiting for an event. You can also pass true which means to not block waiting for events. However if no event is present get_limits_event will raise a ThreadError. Thus you must catch the error to allow your code to continue. Finally you call unsubscribe_limits_events(id) when shutting down.

Overflown Queues Are Deleted

By default the limits queue is 1000 events deep. If you don't call get_limits_event fast enough to keep up with the population of this queue and it overflows, COSMOS will clean up the resources and delete the queue. At this point when you call get_limits_event you will get a "RuntimeError : Packet data queue with id X not found." Note you can pass a larger queue size to the subscribe_limits_events method.

Here’s a example of using the subscription APIs.

id = subscribe_limits_events()
while true
  type, data = get_limits_event(id) # block waiting
  puts "type:#{type} data:#{data}"
  # break if XXX # Break the loop when ...
end
unsubscribe_limits_events(id)

id = subscribe_limits_events()
while true
  begin
    type, data = get_limits_event(id, true) # don't block
    puts "type:#{type} data:#{data}" # break if XXX # Break the loop when ...
  rescue ThreadError # raised if no event present
    wait 1 # Wait before checking again
  end
end
unsubscribe_limits_events(id)

Conclusion

Limits are a powerful concept in COSMOS. Sets provide for a system wide setting that affects all limits items. Groups allow limits items to be grouped to typically enable and disable them together. APIs allow the user to query and set limits across the system while subscriptions provide notifications for every limits event action.

If you have a question which would benefit the community or find a possible bug please use our Github Issues.