Newline

As with so many other things, the newline (or line break, end-of-line, EOL, or whatever you call it) is something we couldn’t agree on from the beginning, so we ended up with many different flavors of the same thing.

The idea is simple: the newline character (or group of characters) says that the very next character should appear at the start of a new line, immediately below the current one. The problem is that the character(s) representing a newline vary across operating systems, and even across different versions of the same system.

The most common forms use one or two characters to encode a newline and among these the best known version is the ASCII one (or ones, as different systems based on ASCII use different versions).
These ASCII flavors use one or both of these two characters:

  • CR (carriage return, 0x0D, usually expressed as ‘\r’ in programming languages)
  • LF (line feed, 0x0A, usually expressed as ‘\n’ in programming languages)

Examples of systems that use each of these are:

  • CR – classic Mac OS (up to version 9)
  • LF – Unix, GNU/Linux, FreeBSD, Mac OS X
  • CR+LF – Windows
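
To make the three conventions concrete, here is a minimal Python sketch that spells them out as bytes and guesses which one a chunk of data uses. The helper name `detect_newline` and the "most frequent wins" rule are assumptions of this example, not something taken from any standard.

```python
CR = b"\r"    # carriage return, 0x0D
LF = b"\n"    # line feed, 0x0A
CRLF = CR + LF


def detect_newline(data: bytes) -> str:
    """Guess the dominant newline convention in `data` (hypothetical helper)."""
    crlf = data.count(CRLF)
    # Subtract the pairs so that only lone CR and lone LF bytes are counted.
    cr = data.count(CR) - crlf
    lf = data.count(LF) - crlf
    counts = {"CR+LF": crlf, "LF": lf, "CR": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"


print(detect_newline(b"one\r\ntwo\r\n"))  # Windows-style sample
print(detect_newline(b"one\ntwo\n"))      # Unix-style sample
print(detect_newline(b"one\rtwo\r"))      # old Mac OS-style sample
```

Counting the pairs first matters: a naive count of LF bytes would also count every CR+LF and misreport a Windows file as a Unix one.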

If you’re using Unicode, the same characters exist there at the same code points:

  • CR – U+000D
  • LF – U+000A
  • CR+LF – U+000D U+000A
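
Since these code points match the ASCII values, how many bytes a newline occupies depends only on the encoding you pick. A small Python sketch of that (the variable names are just for illustration):

```python
newline = "\u000D\u000A"  # CR+LF written as Unicode code points

# UTF-8 encodes code points below U+0080 as a single byte, identical to ASCII.
utf8 = newline.encode("utf-8")
# UTF-16 uses two bytes per code point here (little-endian, no byte order mark).
utf16 = newline.encode("utf-16-le")

print(utf8)   # b'\r\n'
print(utf16)  # b'\r\x00\n\x00'
```

So a UTF-8 file uses exactly the same newline bytes as an ASCII one, while a UTF-16 file does not, which is one more thing to watch for when scanning raw bytes for line breaks.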

OK, so why should we care about all these different notations for the same thing? If we’re developing for a single platform, we probably don’t need to care much. But as the Internet turns into one big computer, situations where you develop for one system and can be absolutely sure you will never exchange files with any other are becoming rarer and rarer.

So why don’t we need to care if we’re developing for a single platform? Because the good people who worked on the C standard thought of this. C provides two escape sequences that correspond to the two codes above: ‘\n’ (newline) and ‘\r’ (carriage return). What is probably unexpected is that they are not required to match the ASCII values. The only things the standard requires are:

  • each of these has a unique value that fits inside a char, but the actual value is implementation defined;
  • when writing to a file opened in text mode, the newline character (‘\n’) is transparently translated to the system’s character (or character group) for newline.

What this last point means is that if you take the same piece of code that writes to a text file, separating lines with ‘\n’, and compile and run it on both Windows and Linux, the two output files will be different. On Windows you will get CR+LF and on Linux just LF separating the lines.
This implies that if you’re not careful when reading such files, and you write code that depends on the actual character values of the newline, you will run into trouble when moving files from one system to another.
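
Here is a rough sketch of both halves of the problem. It is written in Python rather than C, on the assumption that Python's text mode is a fair stand-in: by default it also turns each ‘\n’ into the platform's newline on write. The `newline` argument of the built-in `open` is used to imitate both platforms on one machine; the file names and the helpers `write_text` and `read_lines` are made up for this example.

```python
import os
import tempfile

LINES = "first\nsecond\n"


def write_text(path: str, newline: str) -> None:
    # `newline` tells Python what each "\n" becomes on disk. Passing it
    # explicitly lets one machine imitate both platforms; by default the
    # platform's own convention (os.linesep) is used.
    with open(path, "w", encoding="ascii", newline=newline) as f:
        f.write(LINES)


def read_lines(path: str) -> list:
    # Read raw bytes, then fold CR+LF and lone CR into LF before splitting,
    # so the result does not depend on where the file was written.
    with open(path, "rb") as f:
        text = f.read().decode("ascii")
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a trailing newline
    return lines


with tempfile.TemporaryDirectory() as tmp:
    win = os.path.join(tmp, "windows.txt")
    nix = os.path.join(tmp, "unix.txt")
    write_text(win, "\r\n")  # what a Windows text-mode write would produce
    write_text(nix, "\n")    # what a Linux text-mode write would produce

    with open(win, "rb") as f:
        win_bytes = f.read()
    with open(nix, "rb") as f:
        nix_bytes = f.read()

    win_lines = read_lines(win)
    nix_lines = read_lines(nix)

print(win_bytes)               # b'first\r\nsecond\r\n'
print(nix_bytes)               # b'first\nsecond\n'
print(win_lines == nix_lines)  # True
```

The same string produces two different files, but a reader that normalizes the newline before splitting sees identical lines in both. The order of the two `replace` calls matters: CR+LF has to be folded first, otherwise each pair would turn into two line breaks.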
