Root causes for errors on DCC

DCC Data Transmission, Considerations on Bit Errors

Content

1. Introduction

2. Error sources

3. Receiving technologies

4. Usage of check codes

5. Practical experiments / measurements

6. Official tests

7. Summary

1. Introduction

RCN128, DCC-A


2. Error sources

Power interruptions
Power interruptions are usually caused by poor wheel-rail contact or by switching operations (e.g., in a reversing loop). Very short interruptions extend the current bit phase (the voltage level simply persists); longer interruptions cause the voltage at the decoder to drop and the DCC reception to abort and restart.

The duration of power interruptions is difficult to estimate, but two possible sources may be considered here:
- Contact wheel to rail:
  Suppose a wheel runs over a hair on the track. The wheel lifts off and touches down again behind the hair. How long does this take?
  A simple example calculation, with these assumptions: the model runs at 0.3m/s (about 100km/h at 1:87), the wheel diameter is 11mm (wheel circumference: 34.6mm), and a hair with 0.1mm diameter lies on the rail.
  The angular velocity of the wheel is 300mm/s / 34.6mm = 8.7 revolutions per second; multiplied by 360° this gives 3120°/s.
  With a wheel radius of 5.5mm and a hair height of 0.1mm, the wheel lifts off when cos(phi) = 5.4/5.5. This gives phi = arccos(5.4/5.5) = 11°, i.e. the wheel is in the air for 22°.
  With the angular velocity above, this results in an interruption of 7.05ms.
  Of course, more than one wheel is involved in the current transfer, but the possible interruptions may well extend into the millisecond range.
- Switching operations:
  Common relays have switching times from about 400µs to about 1ms. (These values come from a test series on a reversing loop module.)
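The hair example above can be reproduced numerically. The following is a minimal sketch of the simplified geometry used in the text; the function name and its parameters are illustrative and not part of the original article.

```python
import math

def lift_off_time_ms(speed_mm_s, wheel_diameter_mm, obstacle_mm):
    """Time (in ms) a wheel is off the rail while rolling over an obstacle.

    Simplified model from the text: the wheel is lifted while it rotates
    through the angle phi before and after the obstacle.
    """
    radius = wheel_diameter_mm / 2
    circumference = math.pi * wheel_diameter_mm
    # Revolutions per second, scaled to degrees per second
    angular_velocity = speed_mm_s / circumference * 360.0
    # Lift-off angle: cos(phi) = (radius - obstacle height) / radius
    phi = math.degrees(math.acos((radius - obstacle_mm) / radius))
    return 2 * phi / angular_velocity * 1000.0

# Values from the example: 0.3 m/s, 11 mm wheel, 0.1 mm hair
print(f"{lift_off_time_ms(300, 11, 0.1):.2f} ms")
```

Without rounding the intermediate values, the result is about 7ms, in line with the figure derived above.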
Short circuits
Wheel angles in the range of 5° to 15° are likely to be involved. Analogous to the above consideration, this results in drops in the range of 1ms to 3ms.
Systematic errors, correlated
These can be caused, for example, by unclean booster edges. However, systematic errors are likely to lead to such a high bit error rate that operation is hardly possible, and an error analysis is unnecessary.
Systematic errors, not correlated
These can be, for example, interference from other decoders whose motor drivers cause the level on the track to collapse. The result is very short voltage drops on the track (in the range of a few µs).

3. Receiving technologies

Edge triggering:
An edge (used as an interrupt) triggers the bit decision together with a timer. This method is extremely susceptible to interference and can only be used with stationary decoders.
Edge analysis:
One edge is (or both edges are) measured and the bit decision is made from this measurement. The procedure rejects the above errors very well, because the errors lengthen or shorten a bit beyond the admissible range. If both edges are evaluated, a 'skewed' return of the algorithm to a correct sequence of half bits is also detected.
Sampling:
The signal is sampled. Depending on the resolution of the sampler, interference is detected and the affected DCC message is discarded. The coarser the sampling raster, the more susceptible the method becomes to interference. With sampling, too, there is still the distinction between single and double edge evaluation. In the limiting case of an exceptionally fine sampling resolution, sampling is identical to edge evaluation.
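The edge analysis with evaluation of both half bits can be sketched as follows. This is a minimal illustration, not a decoder implementation: the function names are hypothetical, and the acceptance windows are assumed values that should be checked against the applicable DCC standard.

```python
# Assumed acceptance windows in microseconds (illustrative, not from the text)
ONE_MIN_US, ONE_MAX_US = 52, 64
ZERO_MIN_US, ZERO_MAX_US = 90, 10000

def classify_half(duration_us):
    """Classify one half bit by its measured duration; None = timing error."""
    if ONE_MIN_US <= duration_us <= ONE_MAX_US:
        return 1
    if ZERO_MIN_US <= duration_us <= ZERO_MAX_US:
        return 0
    return None

def decode_bit(first_half_us, second_half_us):
    """Return 0 or 1, or None if the bit (and thus the message) is rejected."""
    first = classify_half(first_half_us)
    second = classify_half(second_half_us)
    if first is None or second is None:
        return None   # half bit inadmissibly short or long
    if first != second:
        return None   # halves disagree: "half 1" followed by "half 0" or vice versa
    return first

print(decode_bit(58, 58))         # clean "1"
print(decode_bit(100, 100))       # clean "0"
print(decode_bit(58, 58 + 7000))  # second half stretched by a 7 ms interruption
```

In the last call the stretched half bit falls into the assumed "0" window, so only the comparison of both halves exposes the error; a single-edge evaluation would not see it.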

4. Usage of Check Codes

Initial situation

Theoretical consideration

XOR generates a false positive (a corrupted message is accepted as valid) if two bits flip in the same data word. Consider a 6-byte message, i.e. a length of 48 bits, and let p be the probability of a single bit error. Then the probability of a double error is:

(1-p)⁴⁶ * p² * (48 over 2)

where (48 over 2) evaluates to 48 * 47 / 2 = 1128. This would be the value for a single XOR bit. However, DCC uses a bytewise XOR. Here a false positive is only possible if both errors hit the same bit position within their bytes. This reduces the number of possible partners for a double error from 47 to 5, so the probability of a false positive is reduced by a factor of 5/47.
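The numbers above can be checked with a short calculation. This is a sketch only: the function names are illustrative, and the bit error probability used in the example is an arbitrary assumption, not a measured value.

```python
from math import comb

def p_double_error(p, n_bits):
    """Probability of exactly two bit errors in a word of n_bits."""
    return comb(n_bits, 2) * p**2 * (1 - p)**(n_bits - 2)

def p_undetected_bytewise_xor(p, n_bits):
    """Probability of a double error that a bytewise XOR does not detect.

    Only pairs hitting the same bit position in two different bytes
    cancel out: 8 bit positions times (number of bytes over 2) pairs.
    """
    n_bytes = n_bits // 8
    pairs_same_position = 8 * comb(n_bytes, 2)
    return pairs_same_position * p**2 * (1 - p)**(n_bits - 2)

p = 1e-4  # assumed bit error probability, for illustration only
ratio = p_undetected_bytewise_xor(p, 48) / p_double_error(p, 48)
print(comb(48, 2), ratio, 5 / 47)
```

For 48 bits there are 8 * 15 = 120 undetectable pairs out of 1128, which is the factor 5/47 stated above.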

How does this scale with the data word length? For small values of p, (1-p)^length can be expanded into a series: 1 - p*length + ... For small p the linear term p*length can be neglected as well, which leaves p^2 * (length over 2) as the dominant component. In general, the probability of an undetected double error with XOR rises with the square of the data word length; with bytewise XOR it is scaled by (B/8-1)/(B-1) (B = total data word length in bits).
In comparison, consider a (theoretically assumed perfect) CRC. Such a CRC spreads each bit error statistically evenly over all checksum bits. The chance that two errors produce identical spreads is 1/(2^crc_length); for an 8-bit CRC this is 1/256. Instead of a gain of 5/47 we get 1/256. (However, this is only valid if the spreading can actually take place between the bit errors, i.e. if there are sufficient shift and divide operations between the errors. This is not always true.)
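The difference can be made visible with a concrete double error. The sketch below flips the same bit position in two different bytes and compares a bytewise XOR with a bitwise CRC-8. The polynomial 0x07 and the message bytes are arbitrary assumptions chosen for illustration; the text does not name a specific polynomial.

```python
def xor_checksum(data):
    """Bytewise XOR over all data bytes."""
    check = 0
    for byte in data:
        check ^= byte
    return check

def crc8(data, poly=0x07):
    """Bitwise CRC-8, MSB first, initial value 0 (polynomial is an assumption)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

message = bytes([0x03, 0x3F, 0x10, 0x00, 0x55, 0xAA])  # arbitrary 6-byte example

# Double error: the same bit position flipped in byte 1 and byte 4
corrupted = bytearray(message)
corrupted[1] ^= 0x04
corrupted[4] ^= 0x04

xor_fooled = xor_checksum(corrupted) == xor_checksum(message)
crc_fooled = crc8(corrupted) == crc8(message)
print("XOR accepts corrupted message:", xor_fooled)
print("CRC accepts corrupted message:", crc_fooled)
```

The XOR checksum is unchanged by this error pattern, while the CRC value differs, so only the CRC rejects the corrupted message in this example.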

Implementation

XOR is available on all systems. Model railway decoders often use 8-bit processors, and such processors typically do not offer a hardware unit for CRC. Newer generations like ARM Cortex M3, M33 do have hardware support for some 16- or 32-bit polynomials.
CRC can be calculated bitwise or wordwise (based on tables). The table-based variant is very fast, requiring only a single XOR operation and a table lookup per word, but it needs a table of length 2ⁿ in memory.
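The two calculation methods can be compared in a short sketch. It assumes an 8-bit CRC with polynomial 0x07 and byte-wide table lookup (n = 8, so 256 entries); these choices and all names are illustrative and not prescribed by the text.

```python
POLY = 0x07  # assumed polynomial, for illustration only

def crc8_step_bitwise(crc):
    """Run the 8 shift/divide steps for one byte already XORed into crc."""
    for _ in range(8):
        if crc & 0x80:
            crc = ((crc << 1) ^ POLY) & 0xFF
        else:
            crc = (crc << 1) & 0xFF
    return crc

def crc8_bitwise(data, crc=0):
    """Bitwise calculation: 8 shift/divide steps per byte."""
    for byte in data:
        crc = crc8_step_bitwise(crc ^ byte)
    return crc

# Table with 2^8 entries, computed once (on a decoder it would sit in flash)
CRC8_TABLE = [crc8_step_bitwise(value) for value in range(256)]

def crc8_table(data, crc=0):
    """Table-based calculation: one XOR and one lookup per byte."""
    for byte in data:
        crc = CRC8_TABLE[crc ^ byte]
    return crc

sample = bytes([0x03, 0x3F, 0x10, 0x00, 0x55, 0xAA])
print(crc8_bitwise(sample) == crc8_table(sample))
```

Both functions return the same value; the table variant trades 256 bytes of memory for a shorter inner loop.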
Calculation timing can be critical, since the decoder needs to check the correct reception of the DCC message before it can send out the RailCom answer. The time gap between the end of the DCC message and the start of the RailCom message is very small, so a possible RailCom answer is typically precalculated and enabled only if the checksum is good. Therefore the checksum calculation must be performed inside the receiving interrupt routines.
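The idea of updating the checksum during reception, so that the result is available as soon as the last byte has arrived, can be sketched as follows. The class, its method names and the minimum message length are hypothetical; a real decoder would do this in its interrupt routine, typically in C or assembler.

```python
MIN_BYTES = 3  # assumed minimum message length including the check byte

class XorReceiver:
    """Accumulates the bytewise XOR while the message is being received."""

    def __init__(self):
        self.running_xor = 0
        self.byte_count = 0

    def on_byte(self, value):
        """Called for every completely received byte (e.g. from the receive interrupt)."""
        self.running_xor ^= value
        self.byte_count += 1

    def on_message_end(self):
        """Return True if the answer may be enabled, then reset for the next message."""
        # XOR over all bytes including the check byte is 0 for a good message
        good = self.byte_count >= MIN_BYTES and self.running_xor == 0
        self.running_xor = 0
        self.byte_count = 0
        return good

receiver = XorReceiver()
for value in (0x03, 0x3F, 0x03 ^ 0x3F):   # two data bytes plus XOR check byte
    receiver.on_byte(value)
print(receiver.on_message_end())
```

Because only one XOR per byte remains to be done at the end of the message, the decision to enable the precalculated answer costs almost no time.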

Literature

The Effectiveness of Checksums for Embedded Networks

Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks


5. Practical experiments / measurements

Döhler und Haass

Results:

Error source 1: Most of the errors were so-called "clock pause errors".
(Explanation: D&H also supports SX. With SX, the bits on the track are separated by a so-called "clock pause" (a short "cutout" with a length of 10µs). There must be a clock pause between every two SX bits, and it must not be longer than 15µs, otherwise reception is aborted.
Such a check is "unnecessary" for DCC, but because common code is used for data acquisition, DCC is also analyzed with the accuracy required for SX.)
Error source 2: timing errors in the first half of the DCC signal. Neither a "1" (around 58µs) nor a "0" (from 100µs) was recognized.
Error source 3: "mismatch" errors, meaning the first half of the bit was recognized as "half 1" and the second half as "half 0" (or vice versa).

Test series at Tams Elektronik

Tams Elektronik

1. DCC evaluation with edge evaluation on both sides, XOR check activated. (Regular operation of the decoder)
2. DCC evaluation with edge evaluation on both sides, XOR check deactivated.
3. DCC evaluation with one-sided edge evaluation, XOR check activated.
4. DCC evaluation with one-sided edge evaluation, XOR check deactivated.

Results:

Case 1. (both edges + XOR): Down to a reception quality of 10%, a locomotive could still be controlled manually in an acceptable manner.
Case 2. No difference to case 1. The DCC messages were already discarded before the XOR evaluation.
Case 3. Below a reception quality of 25% there were sporadic malfunctions, messages were passed on to the evaluation despite XOR.
Case 4. Hardly any reasonable operation possible.

OpenDCC
