4/24/2012

Designing safe software systems? I've got three articles to keep you on track

Believe it or not, the men in this video are performing a safety procedure:



So are the men in this video:



I know what you're probably thinking: What's so safe about standing near, or hanging from, a moving train? Are these people nuts?

On the other hand, you may know exactly what is happening: The men are exchanging railway tokens. In a nutshell, only one token exists for any given section of track, and only the train driver possessing the token can access that section. (If you're a software developer, think mutex.) The idea, of course, is to prevent two or more trains, especially those traveling in opposite directions, from using the same section of track at the same time.
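To make the mutex comparison concrete, here is a minimal Python sketch of a single-token track section. The `TrackSection` class and its method names are hypothetical, invented for illustration; a real token system involves signalling equipment and operating procedures that this ignores.

```python
import threading


class TrackSection:
    """One section of track guarded by exactly one token (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self._token = threading.Lock()  # the single token for this section
        self.holder = None

    def take_token(self, train):
        # Non-blocking: a driver who cannot get the token must not enter.
        if self._token.acquire(blocking=False):
            self.holder = train
            return True
        return False

    def hand_back_token(self, train):
        # Only the train that holds the token may return it.
        if self.holder != train:
            raise RuntimeError(f"{train} does not hold the token for {self.name}")
        self.holder = None
        self._token.release()


section = TrackSection("A-B")
print(section.take_token("up train"))    # True: first train gets the token
print(section.take_token("down train"))  # False: second train is refused
section.hand_back_token("up train")
print(section.take_token("down train"))  # True: the token is free again
```

The non-blocking acquire mirrors the railway rule: a driver without the token waits outside the section rather than entering and hoping for the best.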

From what I can gather, token-based systems have proved highly effective in preventing train-to-train collisions. Indeed, they remain in use in several areas, particularly on heritage railway lines.

That said, the world of rail transportation has moved on. High-speed freights, such as the TGV postal in France, zoom along at over 250 km/h, while passenger trains, such as the China Railway High-speed, reach speeds of 350 km/h. The Shanghai Maglev Train, meanwhile, operates at a jaw-dropping 430 km/h and is designed for speeds up to 500 km/h.

Available and correct
None of these trains could run without software control systems. Let me re-phrase that: safe software control systems. A safe software system possesses two key characteristics: It always responds when a response is required, and it always provides the correct response.

For instance, the software system controlling a train’s brakes must be available whenever required — a delayed response could result in an accident. The software system must also apply the brakes appropriately — too little can result in a collision, and too much can damage the train or cause a derailment.

To meet these requirements, the software system needs to use a real-time OS (RTOS) that meets specific claims of reliability and availability. But the software that runs on top of the OS (i.e. the part you design) must also embody these qualities. Which is where my colleague Chris Hobbs comes in.

Chris spends a lot of time thinking about the design of safe software systems — when he isn't actually helping people design them. So, not surprisingly, he has produced a series of articles and white papers to ground developers in key concepts and to help companies develop a safety culture. Electronic Design magazine has published three of his pieces so far, and I wouldn't be surprised if they publish more in the future.

Without further ado, here are the Electronic Design articles:

The Limits of Testing in Safe Systems — Key takeaway: Testing can prove the presence of faults, but it can't prove their absence. It isn't enough to test your systems; you must use other methods, such as design validation, as well. That said, testing can tell you a lot, especially when you apply statistical analysis to your test results, and when you use techniques like fault injection to estimate remaining faults and to observe how the system behaves under fault conditions.
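As a rough illustration of the arithmetic involved (not taken from the article itself), the Python sketch below shows two textbook estimates. The first bounds the per-demand failure probability after a run of failure-free tests, assuming the tests are independent and representative of real use. The second is the classic fault-seeding estimate: inject known faults, see what fraction the tests catch, and assume real faults are caught at the same rate. The function names and example numbers are illustrative assumptions.

```python
def failure_bound(n_passed, confidence=0.95):
    """Upper bound on per-demand failure probability after n_passed
    failure-free, independent tests, at the given confidence level."""
    if n_passed <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need n_passed > 0 and 0 < confidence < 1")
    # Solve (1 - p) ** n_passed = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passed)


def estimate_remaining_faults(seeded, seeded_found, real_found):
    """Fault-seeding estimate of real faults the tests have not yet found."""
    if seeded_found <= 0 or seeded_found > seeded:
        raise ValueError("need 0 < seeded_found <= seeded")
    # Assume real faults are detected at the same rate as seeded ones.
    estimated_total = real_found * seeded / seeded_found
    return estimated_total - real_found


# Hypothetical numbers, for illustration only.
print(failure_bound(3000))                   # roughly 0.001, i.e. about 3 / n
print(estimate_remaining_faults(20, 16, 8))  # 8 * 20 / 16 - 8 = 2.0
```

Neither number proves the absence of faults. Both only quantify what the tests did and did not show, and both are only as good as the assumptions behind them.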

Define And State Your Safety Requirements Before Design and Test — Key takeaway: Safety must be built into a system from the start, and everything you do should follow from the premise that all software contains faults and these faults may lead to failures. As you build your system, you must reduce the number of faults included in the design and implementation, prevent faults from becoming errors, prevent errors from becoming failures, and handle failures when they do occur.
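The chain from fault to error to failure can be sketched in a few lines of Python. Everything here is hypothetical: the speed sensor, the scaling constant, the plausibility limits, and the `SAFE_STATE` fallback are assumptions chosen for illustration, not a design from the article.

```python
MAX_SPEED_KMH = 500.0  # assumed physical limit for this hypothetical train
MAX_STEP_KMH = 20.0    # assumed largest credible change between samples


def decode_speed(raw_counts):
    """Convert raw sensor counts to km/h (assumed scale: 10 counts per km/h)."""
    return raw_counts / 10.0


def is_plausible(speed, last_speed):
    """Detect an erroneous value before anything acts on it."""
    in_range = 0.0 <= speed <= MAX_SPEED_KMH
    credible_step = abs(speed - last_speed) <= MAX_STEP_KMH
    return in_range and credible_step


def control_step(raw_counts, last_speed):
    """Return (mode, speed). An implausible reading is contained here, so the
    error does not become a failure of the control loop."""
    speed = decode_speed(raw_counts)
    if not is_plausible(speed, last_speed):
        # Handle the problem explicitly: keep the last good value and
        # ask the rest of the system to move to its safe state.
        return ("SAFE_STATE", last_speed)
    return ("NORMAL", speed)


print(control_step(3010, 300.0))   # ('NORMAL', 301.0)
print(control_step(65535, 300.0))  # ('SAFE_STATE', 300.0)
```

The plausibility check stops an erroneous value from propagating, and the explicit fallback decides what happens once an error has been caught.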

Clear SOUP And COTS Software Can Reliably Serve Safety-Critical Systems — Key takeaway: Some device manufacturers want to use COTS software, but worry that COTS means SOUP — software of uncertain provenance. And SOUP can make a mess of safety claims... or perhaps not. If you take a nuanced approach and distinguish between opaque SOUP (which should be avoided) and clear SOUP (for which source code, fault histories, and long in-use histories are available), you may, in fact, discover that COTS software is a good choice for your safety-related project.
 
