2.4 - Whitespace
Whitespace characters are special characters used to represent vertical and horizontal spacing. The most common types are spaces, tabs, and newlines. There are other whitespace types that are rarely used, such as form-feeds, zero-width spaces or vertical tabs - but we will ignore these.
Horizontal whitespace adds spacing from left to right, where vertical whitespace adds spacing from top to bottom.
Spaces
A space is the most common type of whitespace. It is a
single width horizontal character, created with the spacebar. An
interesting note is that spaces have an ASCII value 32
, or
a hexadecimal value of 0x20
. (This is why some web URLs
have %20
, to encode a space.)
Tabs
A tab is a the other main type of horizontal whitespace, created with the TAB key. The tab-width depends on the application rendering it. A typical tab-width is 4 spaces, so pressing TAB will go to the next multiple of 4. If you are at position 3, tab moves to 4. A tab at position 4 moves to position 8. Some less common tab-widths are 2, 3, and 8.
Newlines
Newlines, also referred to as a line feed, create a single line of vertical whitespace. Breaking the text at the current line, and continuing at the start of the next. You can create newlines with the ENTER key.
Carriage Returns
A carriage return is a special control character to move the cursor to the beginning of a line. Anything typed after it will replace the contents of the current-line, unless it is a newline character. While it is technically considered whitespace, it does not have any visual representation. Carriage returns are usually used in combination with line-feeds on windows systems.
Java Escape Sequences
You can use whitespaces inside Java Strings. Some whitespaces must be escaped using special escape sequences that begin with a backslash. Others can be directly typed in.
A space is typed using the spacebar.
A tab is typed
with the TAB key, or using the \t
escape sequence.
A newline can only be created with the \n
escape
sequence for Strings. In %chapter #%, we will learn about text blocks to
directly type in newlines without \n
.
A carriage return can only be
created with the \r
escape sequence.
Examples of each are shown below.
public static void main(String[] args) { System.out.println("Hello World"); // Space System.out.println("Hello\tWorld"); // Tab System.out.println("Hello\nWorld"); // Newline System.out.println("Hello\rWorld"); // Carriage Return System.out.println("Hello\r\nWorld"); // CRLF (carriage-return line-feed) }
Try running the previous code, commenting off all but one line at a time
to see the different outputs. When you run the
Carriage return line, you should only see World
in
the output. As previously mentioned, carriage returns move the cursor to
the beginning of the line, and replaces the contents (unless
the next character is \n
). We will rarely use this fact,
but it is nice to be aware of it.
Line Endings: Windows vs. UNIX
Line endings are used to create a single line of whitespace, but they
are not always a simple \n
. On UNIX systems, newlines
are just a single \n
. But, Windows systems use
\r\n
instead. This is sometimes abbreviated as
CRLF. Sometimes be an issue when using files across different
operating systems, but versioning tools like git can seamlessly
convert between them.
Java provides a way to get the correct line ending for the current
system with System.lineSeparator()
. This returns
\r\n
for Windows, and \n
for UNIX systems.
Some methods, such as System.out.println()
, will
automatically use the correct line separator. Most console apps should
work fine with \n
(even on Windows), but text files can
cause problems.
print
vs. println
Java provides a few methods to print to the standard output. So far, we
have only used System.out.println()
. Now that we know about
whitespaces, we will introduce System.out.print()
. This
acts the same way as System.out.println()
, except it
does not add a line ending to the output. Recall that
println
stands for print line. Let's see some
examples.
// Print "hello" and "world" with a line ending between System.out.println("hello"); System.out.println("world"); // Use multiple 'print' methods to effectively print "hello world" System.out.print("he"); System.out.print("llo"); System.out.print(" world"); // Use combinations of both System.out.print("hello "); System.out.println("world"); System.out.println("Exiting...");
Most of the time, println()
is what we want. But, sometimes
we will use print()
to keep the cursor on the same line,
such as prompting a user for input (starting in %chapter #%).
Whitespace Significance
Generally speaking, whitespace does not matter in Java, but there are some places that do. Some places require at least one whitespace (vertical or horizontal). This is usually between tokens. An egregious example hello world program could be
// This works... DONT do this. public class HorribleExample { public static void main ( String [] args ) { System . out . println ( "hello world" ) ; } }
but this is ugly and very difficult to read. So we will use a standard format like we have already been doing.
Something we can't do is
publicstaticvoidmain
. It is not obvious where one token
starts and the next begins, so the Java compiler fails to parse this.
One area that whitespace can cause problems is with Strings. Whitespace
does matter for Strings, as that is part of the String.
"hello world"
is different from
"helloworld"
. And as we mentioned with escape
sequences, you have to escape newlines in Strings. This means that the following
example will not work:
System.out.println(" 1. one 2. two 3. three ");
This is because String literals must begin and end on the same line.
Instead we can use the newline escape sequences with
System.out.println("1. one\n2. two\n3. three");
,
or wait until we learn how to use text blocks.
Whitespace also can matter for comments. Consider a single-line comment like:
// This is part of the comment but this is not, and is invalid System.out.println("hello world");
Because single line comments only apply to a single line, newlines end the comment. The next line is assumed to be code, unless it begins another comment. Thus they carry significance here.
Tabs vs. Spaces
While we are on the topic of whitespace, I will quickly bring up a historical debate. Should you use tabs or spaces (for indenting your code)? This is left as an exercise for the reader. This is mostly an inside joke for programmers, though some people take the debate seriously. IDEs can be configured to use spaces/tabs interchangeably, replacing the opposite type based on tab-width.