At the heart of an RTOS is the scheduler: the algorithm responsible for choosing which task to run next. In a priority-based scheduler, the runnable task with the highest priority is selected. (A non-runnable task may be blocked waiting for a resource, “sleeping” while waiting for an event, and so on.)
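
As a rough illustration of that selection rule, here is a minimal sketch in Python. The Task class, the task names, and the convention that a larger number means a higher priority are all assumptions made for this example; a real RTOS keeps ready queues in the kernel and may number priorities the other way around.

```python
class Task:
    """Hypothetical task record, for illustration only."""

    def __init__(self, name, priority, runnable=True):
        self.name = name
        self.priority = priority  # assumption: larger number = higher priority
        self.runnable = runnable  # False while blocked or sleeping


def pick_next(tasks):
    """Return the highest-priority runnable task, or None if nothing can run."""
    candidates = [t for t in tasks if t.runnable]
    return max(candidates, key=lambda t: t.priority, default=None)


tasks = [
    Task("logger", priority=1),
    Task("motor_control", priority=3, runnable=False),  # blocked on a resource
    Task("telemetry", priority=2),
]

next_task = pick_next(tasks)
print(next_task.name)  # telemetry
```

Note that motor_control has the highest priority but is skipped because it is not runnable, which is exactly the situation the rest of this article is about.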


Sometimes a high priority task cannot run because a lower priority task is still holding onto a resource that the high priority task needs. Normally, good firmware programming practices would minimize the amount of time that the low priority task holds the shared resource. However, there are instances where unintended interactions between tasks can cause a problem known as priority inversion to arise, as in the case of the 1997 Mars Pathfinder robot rebooting itself after some hours of operation.


On Mars Pathfinder, a shared information bus was used for communication between different components of the robot. In this particular case, a low priority task and a high priority task were both using the information bus for different purposes. The software used a mutex (“mutual exclusion”, a primitive that controls access to a shared resource) to guard the bus: before accessing the bus, a task had to obtain the mutex.


What happened on Pathfinder is that, on some occasions, a low priority task was holding the mutex when a medium priority task (not using the bus) started to run, before the low priority task could finish its work and release the mutex. Because the medium priority task outranked the low priority task, the low priority task could not continue while the medium priority task was running. As a result, a high priority task that needed the bus had to wait an unexpectedly long time for the mutex to be unlocked. The firmware’s watchdog then kicked in and reset the system.


In essence, even though the high priority task was ready to run as soon as it could obtain the mutex, it was shut out because the low priority task holding the mutex could not run either. The high priority task ended up waiting on the medium priority task, as if their priorities had been inverted.
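
The sequence described above can be reproduced with a small tick-based simulation. This is only a sketch under stated assumptions: the three tasks, their release times, and their run lengths are invented for illustration and are not taken from the Pathfinder software, and a task marked needs_bus is assumed to hold the mutex for its entire run.

```python
class Task:
    """Hypothetical task for the simulation; all numbers are made up."""

    def __init__(self, name, priority, release, work, needs_bus):
        self.name = name
        self.priority = priority    # assumption: larger number = higher priority
        self.release = release      # tick at which the task becomes ready
        self.remaining = work       # ticks of CPU time still needed
        self.needs_bus = needs_bus  # True if its work requires the bus mutex


def simulate(tasks):
    """Run a plain priority scheduler and return which task ran on each tick."""
    owner = None   # task currently holding the bus mutex
    timeline = []
    tick = 0
    while any(t.remaining > 0 for t in tasks):
        ready = [
            t for t in tasks
            if t.release <= tick and t.remaining > 0
            # a task is blocked if it needs the mutex and someone else holds it
            and not (t.needs_bus and owner is not None and owner is not t)
        ]
        if ready:
            current = max(ready, key=lambda t: t.priority)
            if current.needs_bus:
                owner = current          # take (or keep) the mutex
            current.remaining -= 1
            if current.remaining == 0 and owner is current:
                owner = None             # release the mutex when done
            timeline.append(current.name)
        else:
            timeline.append("idle")
        tick += 1
    return timeline


tasks = [
    Task("low", priority=1, release=0, work=3, needs_bus=True),
    Task("medium", priority=2, release=1, work=4, needs_bus=False),
    Task("high", priority=3, release=2, work=1, needs_bus=True),
]

timeline = simulate(tasks)
print(timeline)
# ['low', 'medium', 'medium', 'medium', 'medium', 'low', 'low', 'high']
```

The high priority task becomes ready at tick 2 but does not run until tick 7: it waits for the whole of the medium task's run plus the remainder of the low task's run. In a real system with a watchdog, a delay like this is what triggers the reset.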


One fix for this issue is for the RTOS to provide priority inheritance: if a high priority task wants a mutex that is currently held by a lower priority task, the owning task has its priority temporarily raised to that of the high priority task until it releases the mutex. This prevents a medium priority task from preempting the normally low priority task while it holds the mutex.
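
A minimal sketch of the inheritance rule, again in Python, might look like the following. The Task and InheritanceMutex classes and their method names are hypothetical and do not correspond to any particular RTOS API. The sketch also assumes a task holds at most one mutex at a time, so restoring the base priority on unlock is sufficient; a real kernel has to handle nested and chained locks.

```python
class Task:
    """Hypothetical task with a fixed base priority and an effective priority."""

    def __init__(self, name, base_priority):
        self.name = name
        self.base_priority = base_priority
        self.priority = base_priority  # effective priority seen by the scheduler


class InheritanceMutex:
    """Illustrative mutex that boosts its owner to the highest waiter's priority."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def try_lock(self, task):
        """Return True if the mutex was acquired, False if the task must wait."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # Priority inheritance: the owner runs at the waiter's priority if higher.
        if task.priority > self.owner.priority:
            self.owner.priority = task.priority
        return False

    def unlock(self, task):
        """Release the mutex and hand it to the highest-priority waiter, if any."""
        if task is not self.owner:
            raise ValueError("only the owner may unlock the mutex")
        task.priority = task.base_priority  # drop the inherited priority
        if self.waiters:
            self.owner = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(self.owner)
        else:
            self.owner = None
        return self.owner


low, medium, high = Task("low", 1), Task("medium", 2), Task("high", 3)
bus = InheritanceMutex()

bus.try_lock(low)            # low takes the bus mutex
bus.try_lock(high)           # high must wait, so low inherits priority 3
boosted = low.priority       # 3: medium (priority 2) can no longer preempt low
new_owner = bus.unlock(low)  # low drops back to 1, high now owns the mutex
print(boosted, low.priority, new_owner.name)  # 3 1 high
```

While low holds the mutex and high is waiting, a priority-based scheduler would pick low over medium, so the mutex is released as soon as low finishes its work with the bus.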


When the Mars Pathfinder problem occurred in 1997, the Pathfinder RTOS did in fact provide optional priority inheritance, but the feature was not enabled for that mutex.


Note that priority inheritance does not solve all resource-sharing problems. For example, resource sharing between tasks may form a chain, with one task waiting on a resource held by a second task that is itself waiting on a third. Arguably, though, a well designed system should not exhibit those issues.


There is really no reason not to use priority inheritance, so any modern RTOS should provide that feature (or something similar) to address priority inversion. Whether you are writing an RTOS or using an existing one, make sure that this feature is supported.

NOTE: REXIS is coming! REXIS (Real-time EXecutive for Intelligent Systems) is a message passing executive kernel from ImageCraft. Some preliminary information including API documentation is available here: http://imagecraft.com/documentation/rexis-rtos. More information soon.