
3.2.3.4 String Tokenization

Frequently a file consists of lines such as

  Wolfgang 28 81.5

that contain textual representations of several data objects separated by delimiting characters (such as the blank character ' '). To separate the individual objects of such a line, we can use an object of class StringTokenizer from package java.util (excerpt):

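   public class StringTokenizer implements Enumeration
   {
     public StringTokenizer(String s);            // tokenize s using the default delimiters
     public StringTokenizer(String s, String d);  // tokenize s using the characters of d as delimiters
     public boolean hasMoreTokens();              // true if another token is available
     public String nextToken();                   // return the next token
   }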

The string tokenizer may be used as in the following example:

  StringTokenizer st = new StringTokenizer("Wolfgang 28 81.5");
  while (st.hasMoreTokens())
  {
    String s = st.nextToken();
    System.out.println(s);
  }

The task of the string tokenizer is to separate the individual tokens of a string, i.e., those parts that are separated by the delimiting characters. By default, the set of delimiting characters is denoted by the string " \t\n\r\f", i.e., it consists of the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. However, a string tokenizer can also be created by the two-argument constructor, which receives as its second argument a string whose characters form the set of delimiting characters.
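The following sketch contrasts the two constructors. The class name DelimiterDemo and the helper method tokens are hypothetical and serve only for illustration. It also shows that a run of consecutive delimiters (such as ";;") does not produce an empty token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterDemo {
    // Hypothetical helper: drains the tokenizer and collects all remaining tokens in a list.
    static List<String> tokens(StringTokenizer st) {
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Default delimiters " \t\n\r\f": a tab and a double blank both separate tokens.
        System.out.println(tokens(new StringTokenizer("Wolfgang\t28  81.5")));
        // prints [Wolfgang, 28, 81.5]

        // Comma and semicolon are not default delimiters, so the whole string is one token.
        System.out.println(tokens(new StringTokenizer("Wolfgang,28;;81.5")));
        // prints [Wolfgang,28;;81.5]

        // Explicit delimiter set ",;": the run ";;" does not yield an empty token.
        System.out.println(tokens(new StringTokenizer("Wolfgang,28;;81.5", ",;")));
        // prints [Wolfgang, 28, 81.5]
    }
}
```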

As long as hasMoreTokens returns true, a call to nextToken returns the next token in the sequence; otherwise, nextToken throws the runtime exception NoSuchElementException (also from package java.util). The program above therefore prints

   Wolfgang
   28
   81.5
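The exception mentioned above can be observed with a small sketch that calls nextToken without consulting hasMoreTokens and counts how many tokens are returned before NoSuchElementException is raised. The names NoMoreTokensDemo and countUntilException are hypothetical.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class NoMoreTokensDemo {
    // Returns the number of tokens read before nextToken() raised NoSuchElementException.
    static int countUntilException(String s) {
        StringTokenizer st = new StringTokenizer(s);
        int count = 0;
        try {
            while (true) {
                st.nextToken();   // deliberately no hasMoreTokens() check
                count++;
            }
        } catch (NoSuchElementException e) {
            return count;         // raised once all tokens have been consumed
        }
    }

    public static void main(String[] args) {
        System.out.println(countUntilException("Wolfgang 28 81.5")); // prints 3
    }
}
```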

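As described in the section on string conversion, each token can then be converted to another Java datatype, e.g., the token 28 to an int by Integer.parseInt and the token 81.5 to a double by Double.parseDouble.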
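As a minimal sketch, the three tokens of the sample line can be read and converted in sequence. The class ParseLineDemo, the holder class Entry, and its field names are hypothetical; the text does not say what the values 28 and 81.5 denote.

```java
import java.util.StringTokenizer;

public class ParseLineDemo {
    // Hypothetical holder for a line of the form "name int double"; field names are illustrative only.
    static final class Entry {
        final String name;
        final int count;
        final double amount;

        Entry(String name, int count, double amount) {
            this.name = name;
            this.count = count;
            this.amount = amount;
        }
    }

    // Splits the line into tokens and converts the second and third token to int and double.
    static Entry parse(String line) {
        StringTokenizer st = new StringTokenizer(line);
        String name = st.nextToken();
        int count = Integer.parseInt(st.nextToken());         // "28"   -> 28
        double amount = Double.parseDouble(st.nextToken());   // "81.5" -> 81.5
        return new Entry(name, count, amount);
    }

    public static void main(String[] args) {
        Entry e = parse("Wolfgang 28 81.5");
        // The converted values can now take part in arithmetic.
        System.out.println(e.name + " " + (e.count + 1) + " " + (e.amount * 2)); // prints Wolfgang 29 163.0
    }
}
```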

We conclude our discussion of text input with the following example.

Example. We write a method processFile that processes a text file representing the daily turnover of a shop. Each line of the file has the form
   units price

where units denotes a non-negative integer number and price denotes a floating point number. The line states that some item was sold units times at the given price. The method shall return the total turnover of the shop; e.g., given the file

  2 1.5
  1 3.9
  3 1

the method shall return the floating point value 9.9 = 2*1.5 + 1*3.9 + 3*1. If an error occurs while processing the file, a corresponding message is printed and -1 is returned. The complete code of the method is given below:

   import java.io.*;    // BufferedReader, FileReader, FileNotFoundException, IOException
   import java.util.*;  // StringTokenizer, NoSuchElementException

   static double processFile(String name)
   {
     BufferedReader in = null;
     String line = null;  // declared outside the try block so that the handlers can report it
     try
     {
       FileReader file = new FileReader(name);
       in = new BufferedReader(file);
       double turnover = 0;
       while (true)
       {
         line = in.readLine();
         if (line == null) return turnover;  // end of file reached
         StringTokenizer t = new StringTokenizer(line);
         int units = Integer.parseInt(t.nextToken());
         double price = Double.parseDouble(t.nextToken());
         turnover += units*price;
       }
     }
     catch(FileNotFoundException e)
     {
       System.err.println("File " + name + " could not be opened.");
     }
     catch(IOException e)
     {
       System.err.println(e);
     }
     catch(NoSuchElementException e)
     {
       System.err.println("Error in tokenizing line: " + line);
     }
     catch(NumberFormatException e)
     {
       System.err.println("Error in parsing line: " + line);
     }
     finally
     {
       try
       {
         if (in != null) in.close();
       }
       catch(IOException e)
       { }
     }
     return -1;
   }

Note how the use of exceptions clearly separates the normal flow of control from the handling of the various kinds of errors that may occur.

Note also that we do not call hasMoreTokens before invoking nextToken; instead, we catch the runtime exception NoSuchElementException that is raised if an expected token does not exist (e.g., for a line 2 that lacks the price), which terminates the try block. Remember that the finally clause is executed whenever the try block is left; thus the input file in is closed also before the method returns the turnover to its caller. Since an exception may be raised while in still has the value null (e.g., if the file cannot be opened), we must check for this condition before attempting to close in.
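Since Java 7, the try-with-resources statement closes the reader automatically, which makes both the finally clause and the null check unnecessary. The following is a sketch of that alternative, not the method given above; the class name Turnover and the main method that writes the sample file to a temporary file are assumptions for illustration. The loop is also restructured to test for the end of the file in the while condition.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class Turnover {
    // Variant of processFile using try-with-resources: the reader is closed automatically
    // when the try block is left, whether normally, by return, or by an exception.
    static double processFile(String name) {
        String line = null;  // declared outside the try block so the handlers can report it
        try (BufferedReader in = new BufferedReader(new FileReader(name))) {
            double turnover = 0;
            while ((line = in.readLine()) != null) {
                StringTokenizer t = new StringTokenizer(line);
                int units = Integer.parseInt(t.nextToken());
                double price = Double.parseDouble(t.nextToken());
                turnover += units * price;
            }
            return turnover;
        } catch (FileNotFoundException e) {  // also raised while opening the resource
            System.err.println("File " + name + " could not be opened.");
        } catch (IOException e) {
            System.err.println(e);
        } catch (NoSuchElementException e) {
            System.err.println("Error in tokenizing line: " + line);
        } catch (NumberFormatException e) {
            System.err.println("Error in parsing line: " + line);
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Write the sample file from the text to a temporary file and process it.
        File f = File.createTempFile("turnover", ".txt");
        f.deleteOnExit();
        try (PrintWriter out = new PrintWriter(new FileWriter(f))) {
            out.println("2 1.5");
            out.println("1 3.9");
            out.println("3 1");
        }
        System.out.println(processFile(f.getPath()));
    }
}
```

Because the result is a sum of floating point products, the printed value is 9.9 only up to rounding.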


© Wolfgang Schreiner; February 3, 2005
