2.4 - Whitespace

Whitespace characters are special characters used to represent vertical and horizontal spacing. The most common types are spaces, tabs, and newlines. Other whitespace types, such as form feeds, zero-width spaces, and vertical tabs, are rarely used, so we will ignore them.

Horizontal whitespace adds spacing from left to right, whereas vertical whitespace adds spacing from top to bottom.

Spaces

A space is the most common type of whitespace. It is a single-width horizontal character, created with the spacebar. An interesting note is that a space has an ASCII value of 32, or 0x20 in hexadecimal. (This is why some web URLs contain %20, which encodes a space.)
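As a small sketch of the numbers above, the following program prints the character code of a space in decimal and in hexadecimal. The class name SpaceValue and the helper codeOf are made up for this illustration.

```java
class SpaceValue {
    // Hypothetical helper: returns the numeric code of a character.
    static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        char space = ' ';
        System.out.println(codeOf(space));                      // 32
        System.out.println(Integer.toHexString(codeOf(space))); // 20
    }
}
```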

Tabs

A tab is the other main type of horizontal whitespace, created with the TAB key. The tab-width depends on the application rendering it. A typical tab-width is 4 spaces, so pressing TAB moves the cursor to the next multiple of 4. If you are at position 3, a tab moves to position 4. A tab at position 4 moves to position 8. Other tab-widths in use are 2, 3, and 8.
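The "next multiple" rule can be written as a short calculation. This is only a sketch: the class TabStops and the method nextTabStop are names invented here, and it assumes positions start at 0 with a tab stop at every multiple of the tab-width.

```java
class TabStops {
    // Hypothetical helper: the position the cursor lands on after a tab,
    // assuming tab stops sit at every multiple of tabWidth.
    static int nextTabStop(int position, int tabWidth) {
        return (position / tabWidth + 1) * tabWidth;
    }

    public static void main(String[] args) {
        System.out.println(nextTabStop(3, 4)); // 4
        System.out.println(nextTabStop(4, 4)); // 8
        System.out.println(nextTabStop(0, 8)); // 8
    }
}
```

The integer division drops any remainder, so every position inside the same tab stop interval jumps to the same place.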

Newlines

Newlines, also referred to as line feeds, create a single line of vertical whitespace, breaking the text at the current line and continuing at the start of the next. You can create newlines with the ENTER key.

Carriage Returns

A carriage return is a special control character that moves the cursor to the beginning of the current line. Anything typed after it will overwrite the contents of the current line, unless it is a newline character. While it is technically considered whitespace, it does not have any visual representation. Carriage returns are usually used in combination with line feeds on Windows systems.

Java Escape Sequences

You can use whitespace inside Java Strings. Some whitespace characters must be written using special escape sequences that begin with a backslash. Others can be typed in directly.

A space is typed using the spacebar.

A tab is typed with the TAB key, or using the \t escape sequence.

In a String, a newline can only be created with the \n escape sequence. In %chapter #%, we will learn about text blocks, which let us type newlines directly without \n.

A carriage return can only be created with the \r escape sequence.

Examples of each are shown below.

public static void main(String[] args) {
    System.out.println("Hello World"); // Space
    System.out.println("Hello\tWorld"); // Tab
    System.out.println("Hello\nWorld"); // Newline
    System.out.println("Hello\rWorld"); // Carriage Return
    System.out.println("Hello\r\nWorld"); // CRLF (carriage-return line-feed)
}

Try running the previous code, commenting out all but one line at a time to see the different outputs. When you run the carriage return line in a typical terminal, you should only see World in the output. As previously mentioned, a carriage return moves the cursor to the beginning of the line, and the characters that follow overwrite what was already there (unless the next character is \n). Because World is the same length as Hello, it covers it completely. Some IDE consoles handle \r differently, so your output may vary. We will rarely use this fact, but it is nice to be aware of it.
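To make the overwriting behavior concrete, here is a rough model of how a simple terminal might display a single line containing \r. It is a simplification for illustration, not how any particular terminal is implemented, and the class CarriageReturnDemo and method render are hypothetical names. It uses a loop and conditionals, which we have not covered yet, so treat it as a preview.

```java
class CarriageReturnDemo {
    // Hypothetical helper: models what ends up visible on one line
    // when a simple terminal processes the given text.
    static String render(String text) {
        StringBuilder line = new StringBuilder();
        int cursor = 0;
        for (char c : text.toCharArray()) {
            if (c == '\r') {
                cursor = 0;                // jump back to the start of the line
            } else if (cursor < line.length()) {
                line.setCharAt(cursor, c); // overwrite the existing character
                cursor++;
            } else {
                line.append(c);            // nothing here yet, so extend the line
                cursor++;
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Hello\rWorld")); // World
        System.out.println(render("Hello\rHi"));    // Hillo
    }
}
```

The second call shows that only the characters actually typed after the carriage return are replaced. The rest of the old line stays visible.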

Line Endings: Windows vs. UNIX

Line endings are used to end the current line, but they are not always a simple \n. On UNIX systems, a line ending is just a single \n. Windows systems, however, use \r\n instead. This is sometimes abbreviated as CRLF. This can sometimes be an issue when using files across different operating systems, but version control tools like Git can convert between them automatically.

Java provides a way to get the correct line ending for the current system with System.lineSeparator(). This returns \r\n for Windows, and \n for UNIX systems. Some methods, such as System.out.println(), will automatically use the correct line separator. Most console apps should work fine with \n (even on Windows), but text files can cause problems.
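Since line endings are invisible, a quick way to see what System.lineSeparator() returns on your machine is to spell the characters out. The sketch below does that. The class LineSeparatorDemo and the helper visible are names made up for this example.

```java
class LineSeparatorDemo {
    // Hypothetical helper: replaces invisible line-ending characters
    // with their escape sequences so they can be seen when printed.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String separator = System.lineSeparator();
        // Shows \r\n on Windows and \n on UNIX systems
        System.out.println("Line separator: " + visible(separator));
        // Using the separator by hand gives the same result as println
        System.out.print("first" + separator + "second" + separator);
    }
}
```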

print vs. println

Java provides a few methods to print to the standard output. So far, we have only used System.out.println(). Now that we know about whitespaces, we will introduce System.out.print(). This acts the same way as System.out.println(), except it does not add a line ending to the output. Recall that println stands for print line. Let's see some examples.

// Print "hello" and "world" with a line ending between
System.out.println("hello");
System.out.println("world");

// Use multiple 'print' methods to effectively print "hello world"
System.out.print("he");
System.out.print("llo");
System.out.print(" world");

// Use combinations of both
System.out.print("hello ");
System.out.println("world");
System.out.println("Exiting...");

Most of the time, println() is what we want. But sometimes we will use print() to keep the cursor on the same line, such as when prompting a user for input (starting in %chapter #%).

Whitespace Significance

Generally speaking, whitespace does not matter in Java, but there are some places where it does. Some places require at least one whitespace character (vertical or horizontal), usually between tokens. Anywhere whitespace is allowed, any amount of it may be used. An egregious example of a hello world program could be

// This works... DON'T do this.
 public     class
   HorribleExample                   {
      public        static
void     main    (    String  []    args
  )
 {         System  .
      out    . println  (   "hello world"
      )
   ;
      }             }

but this is ugly and very difficult to read. So we will use a standard format like we have already been doing.

Something we can't do is write publicstaticvoidmain. Without whitespace, it is not obvious where one token ends and the next begins. The Java compiler reads it as one long identifier rather than four separate words, so it fails to parse.
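For contrast, here is a sketch of the opposite extreme: a program with nearly all optional whitespace removed. It keeps whitespace only where two word-like tokens would otherwise run together. The class name Compact and the method greeting are invented for this illustration, and this style is just as unreadable as the example above.

```java
// Valid, but DON'T do this either. Whitespace remains only between
// word-like tokens such as "static" and "String".
class Compact{static String greeting(){return"hello world";}
public static void main(String[]args){System.out.println(greeting());}}
```

Symbols such as braces, brackets, and quotes already mark where a token ends, which is why no whitespace is needed around them.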

One area where whitespace can cause problems is Strings. Whitespace does matter inside a String, as it is part of the String: "hello world" is different from "helloworld". And as we mentioned with escape sequences, you have to escape newlines in Strings. This means that the following example will not work:

System.out.println(" 
    1. one
    2. two
    3. three
");

This is because String literals must begin and end on the same line. Instead we can use the newline escape sequences with System.out.println("1. one\n2. two\n3. three");, or wait until we learn how to use text blocks.
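Both points about Strings can be checked with a short program. This is a minimal sketch, and the class name StringWhitespaceDemo and the constant LIST are made up for the example.

```java
class StringWhitespaceDemo {
    // Escape sequences stand in for the newlines we cannot type directly.
    static final String LIST = "1. one\n2. two\n3. three";

    public static void main(String[] args) {
        // The space is part of the String, so these two are not equal
        System.out.println("hello world".equals("helloworld")); // false
        System.out.println("hello world".length());             // 11
        System.out.println("helloworld".length());              // 10

        // Prints the list on three separate lines
        System.out.println(LIST);
    }
}
```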

Whitespace can also matter for comments. Consider a single-line comment like:

// This is part of the comment
but this is not, and is invalid
System.out.println("hello world");

Because single-line comments only apply to a single line, a newline ends the comment. The next line is assumed to be code, unless it begins another comment. Thus newlines carry significance here.

Tabs vs. Spaces

While we are on the topic of whitespace, I will quickly bring up a long-running debate: should you use tabs or spaces for indenting your code? This is left as an exercise for the reader. It is mostly an inside joke among programmers, though some people take the debate seriously. IDEs can be configured to use either one, automatically converting one into the other based on the tab-width.
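To give a feel for what such a conversion involves, here is a rough sketch of replacing the tabs in one line of text with spaces. It is not how any particular IDE does it. The class TabExpander and the method expandTabs are hypothetical names, and the code assumes tab stops at every multiple of the tab-width.

```java
class TabExpander {
    // Hypothetical helper: replaces each tab with enough spaces
    // to reach the next multiple of tabWidth.
    static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // Always add at least one space, then pad to the next tab stop
                do {
                    out.append(' ');
                } while (out.length() % tabWidth != 0);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + expandTabs("\tx", 4) + "]");     // [    x]
        System.out.println("[" + expandTabs("abc\tx", 4) + "]");  // [abc x]
        System.out.println("[" + expandTabs("abcd\tx", 4) + "]"); // [abcd    x]
    }
}
```

Notice that the same tab character becomes a different number of spaces depending on where it appears in the line.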