JDK-4899439

File uses strings for names but file names are byte arrays on OS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P4
    • Resolution: Duplicate
    • Affects Version/s: 1.4.2, 6, 6u3
    • Fix Version/s: None
    • Component/s: core-libs
    • Labels:
    • Subcomponent:
    • CPU:
      generic, x86, sparc
    • OS:
      linux, solaris_9, solaris_nevada

      Description

      Name: rmT116609 Date: 07/31/2003


      FULL PRODUCT VERSION :
      java version "1.4.2"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-b28)
      Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode)

      FULL OS VERSION :
      Any Solaris or Unix

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      This occurs on a Japanese machine where the locale is "ja", but it can probably occur in any locale.

      A DESCRIPTION OF THE PROBLEM :
      You can create a file that Java's java.io.File class cannot read. This is because file names are actually byte arrays in the OS, but java.io.File takes a String for a file name (which is composed of Unicode characters).

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      OK, create.c is a program that will create a file whose name is a single byte that does not correspond to any character in the 'ja' locale. Note that the OS has no problem with this.

      Lister.java defines a class that lists files in the current directory. For each file, it spits out the (a) 'toString()' version of the file, (b) the char array of the name as hex, and (c) the 'getBytes' byte array of the name.

      So, what you can do is compile and run create.c, which will create a file whose name is a single byte whose hex value is 99. Then compile and run Lister.java, which will give you the following output (shown for two different locales):

      ---------------------------------------------
      $ export LANG=
      $ java Lister
      name:M-^OÀ»; chars:99,; bytes:99,

      $ export LANG=ja
      $ java Lister
      name:?; chars:fffd,; bytes:3f,
      ---------------------------------------------

      Note that when running in the 'ja' locale, there is no character corresponding to byte value 0x99! So, Java decodes the name to the replacement character U+FFFD, and getBytes() then encodes that character as '?' (0x3F). Of course, you don't know which character set a file name was written in, so you can't just switch character sets arbitrarily when trying to load files using java.io.File.
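
      The lossy round trip described above can be reproduced without touching the file system. The following is only a sketch: the class and method names are made up for illustration, US-ASCII stands in for the 'ja' locale encoding (an assumption, chosen because it also has no mapping for 0x99), and StandardCharsets requires Java 7 or later, so it will not compile on 1.4.2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original report.
public class RoundTripDemo {
    // Decode raw file-name bytes to a String, then encode the String back,
    // which is roughly what happens between File.list() and a later open.
    public static byte[] roundTrip(byte[] rawName, Charset cs) {
        String decoded = new String(rawName, cs);
        return decoded.getBytes(cs);
    }

    // Format bytes the same way Lister does: unsigned hex, comma-terminated.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toHexString(b & 0xFF)).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] rawName = { (byte) 0x99 };
        // ISO-8859-1 maps every byte to a character, so the name survives.
        System.out.println("ISO-8859-1: "
                + hex(roundTrip(rawName, StandardCharsets.ISO_8859_1)));
        // US-ASCII has no character for 0x99, so the byte is replaced.
        System.out.println("US-ASCII:   "
                + hex(roundTrip(rawName, StandardCharsets.US_ASCII)));
    }
}
```

      With ISO-8859-1 the byte comes back as 99, while with US-ASCII it comes back as 3f, which mirrors the two Lister outputs shown above.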

      The point is that there are files which Java cannot uniquely represent as a straight String. I suppose we could get the filename via JNI, do the conversion ourselves, and then use the private-use area of Unicode to encode all our strings, but Ugh!

      //--------------------------------------------------------
      // create.c
      //--------------------------------------------------------

      #include <stdio.h>

      int main()
      {
              const char* name = "\x99";
              FILE* file = fopen( name, "w" );
              if( file == NULL )
              {
                      printf( "could not open file %s\n", name );
                      return 1;
              }

              fclose( file );
              return 0;
      }

      //--------------------------------------------------------
      // Lister.java
      //--------------------------------------------------------

      import java.io.*;

      public class Lister
      {
          public static void main( String[] args )
          {
              new Lister().run();
          }

          public void run()
          {
              try
              {
                  doRun();
              }
              catch( Exception e )
              {
                  System.out.println( "Encountered exception: " + e );
              }
          }

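          private void doRun() throws Exception
          {
              File cwd = new File( "." );
              // list() returns null if the directory cannot be read
              String[] children = cwd.list();
              if( children == null )
              {
                  throw new IOException( "could not list " + cwd );
              }
              for( int i = 0; i < children.length; ++i )
              {
                  printName( children[ i ] );
              }
          }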
          
          private void printName( String s )
          {
              System.out.print( "name:" );
              System.out.print( s );
          
              System.out.print( "; chars:" );
              printCharsAsHex( s );
          
              System.out.print( "; bytes:" );
              printBytesAsHex( s );
          
              System.out.println();
          }

          private void printCharsAsHex( String s )
          {
              for( int i = 0; i < s.length(); ++i )
              {
                  char ch = s.charAt( i );
          
                  System.out.print( Integer.toHexString( ch ) + "," );
              }
          }

          private void printBytesAsHex( String s )
          {
              // getBytes() encodes with the platform default charset, which follows the locale
              byte[] bytes = s.getBytes();
              for( int i = 0; i < bytes.length; ++i )
              {
                  byte b = bytes[ i ];
                  
                  System.out.print( Integer.toHexString( unsignedExtension( b ) ) + "," );
              }
          }

          private int unsignedExtension( byte b )
          {
              return (int)b & 0xFF;
          }
      }


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Being able to read a file and then use the name associated with the file to reopen the file.
      ACTUAL -
      ---------------------------------------------
      $ export LANG=
      $ java Lister
      name:M-^OÀ»; chars:99,; bytes:99,

      $ export LANG=ja
      $ java Lister
      name:?; chars:fffd,; bytes:3f,
      ---------------------------------------------
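
      The expected behavior can also be checked mechanically rather than by reading hex dumps. The sketch below lists a directory with java.io.File and reports every name that does not resolve back to an existing file. The class and method names are hypothetical, and the for-each loop and generics need Java 7 or later. Run in the directory containing the file made by create.c under LANG=ja, it would be expected to report that file, though that depends on the platform and JDK in use.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original report.
public class ReopenCheck {
    // Returns the listed names that cannot be resolved back to an existing
    // file. Dangling symbolic links are reported too, since exists()
    // follows links.
    public static List<String> unreachableNames(File dir) {
        List<String> lost = new ArrayList<>();
        String[] children = dir.list();
        if (children == null) {
            return lost; // not a directory, or not readable
        }
        for (String name : children) {
            if (!new File(dir, name).exists()) {
                lost.add(name);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : unreachableNames(dir)) {
            System.out.println("cannot reopen: " + name);
        }
    }
}
```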


      REPRODUCIBILITY :
      This bug can be reproduced always.

      CUSTOMER SUBMITTED WORKAROUND :
      None that I am aware of. It seems that java.io.File should accept a special object (or byte array) so that a name read from a directory listing can be used to open that same file.
      (Incident Review ID: 189047)
      ======================================================================
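
      On later Java versions (7 and up), the java.nio.file API returns Path objects rather than Strings from a directory listing, which is close in spirit to the "special object" suggested in the workaround section. The sketch below keeps each entry as a Path and never converts it to a String. Whether the Path preserves the original bytes of an undecodable name is an implementation detail of the platform's file system provider and is treated here as an assumption, not a guarantee. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original report.
public class PathLister {
    // Counts the directory entries that can be found again through the
    // Path object returned by the listing itself.
    public static int countReopenable(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // NOFOLLOW_LINKS so that a dangling symlink still counts
                // as an entry that exists.
                if (Files.exists(entry, LinkOption.NOFOLLOW_LINKS)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("reopenable entries: " + countReopenable(dir));
    }
}
```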

              People

              • Assignee:
                alanb Alan Bateman
                Reporter:
                rmandalasunw Ranjith Mandala (Inactive)