Type: Bug
Status: Resolved
Priority: P4
Resolution: Not an Issue
Affects Version/s: 8, 11, 12, 13
Fix Version/s: None
Component/s: core-libs
CPU: x86_64
OS: linux

A DESCRIPTION OF THE PROBLEM :
Charset.defaultCharset() is written so that, if it cannot obtain a usable charset from the file.encoding system property, it initialises defaultCharset to UTF-8. However, viewed across the whole VM, this else branch is effectively dead code. If file.encoding is not passed to the VM, the VM infers it from LC_ALL, LANG and LC_CTYPE, and even when none of these are set, file.encoding is initialised to US-ASCII. The two processes therefore contradict each other: Charset.defaultCharset() signals that the default encoding should be UTF-8, while the initialisation of file.encoding makes it US-ASCII.
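The fallback described above can be modelled roughly as follows. This is a simplified sketch, not the JDK source: the class DefaultCharsetFallback and its resolve helper are hypothetical, and the final UTF-8 return stands in for the else branch the report calls dead code. ANSI_X3.4-1968 is used only as an example of a locale codeset name; Java treats it as an alias of US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the fallback described in the report; NOT the JDK source.
public class DefaultCharsetFallback {

    // Resolve a charset name, falling back to UTF-8 when the name is
    // missing, illegal, or not supported by this JVM.
    static Charset resolve(String fileEncoding) {
        if (fileEncoding != null) {
            try {
                if (Charset.isSupported(fileEncoding)) {
                    return Charset.forName(fileEncoding);
                }
            } catch (IllegalCharsetNameException e) {
                // An illegal name is treated like an unsupported one.
            }
        }
        // The branch the report calls dead code: per the report, the VM
        // always sets file.encoding, so this is rarely, if ever, reached.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // Depends on how this VM was launched.
        System.out.println(resolve(System.getProperty("file.encoding")));
        System.out.println(resolve(null));              // prints UTF-8
        System.out.println(resolve("ANSI_X3.4-1968"));  // prints US-ASCII
        System.out.println(resolve("no-such-charset")); // prints UTF-8
    }
}
```

The model makes the reporter's point concrete: UTF-8 is only returned when file.encoding is unusable, whereas a locale-derived name such as ANSI_X3.4-1968 resolves to US-ASCII.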
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Unset the environment variables LC_ALL, LANG and LC_CTYPE in your shell.
2. Write a Java program that calls Charset.defaultCharset() and prints the result.
3. Run the program without specifying the file.encoding property.
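A minimal reproducer for these steps might look like the sketch below (the class name DefaultCharsetRepro is hypothetical). It also prints the locale variables and the file.encoding value the VM derived, so the link between the two is visible. On Linux it can be compiled with javac and run with the locale variables removed, for example env -u LC_ALL -u LANG -u LC_CTYPE java DefaultCharsetRepro. The report observes US-ASCII as the resulting default charset on the affected versions.

```java
import java.nio.charset.Charset;

// Hypothetical reproducer: shows the locale variables, the file.encoding
// value the VM derived from them, and the resulting default charset.
public class DefaultCharsetRepro {
    public static void main(String[] args) {
        // Locale variables the report says the VM consults (null if unset).
        for (String name : new String[] {"LC_ALL", "LANG", "LC_CTYPE"}) {
            System.out.println(name + "=" + System.getenv(name));
        }
        // No -Dfile.encoding was passed, so this value comes from the VM itself.
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("defaultCharset=" + Charset.defaultCharset());
    }
}
```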
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
The result should be UTF-8, or the fallback in Charset.defaultCharset() should be changed to US-ASCII as well so that the two are consistent.
ACTUAL -
The result is US-ASCII.
CUSTOMER SUBMITTED WORKAROUND :
Pass -Dfile.encoding=UTF-8 to the VM so that file.encoding matches the UTF-8 fallback in Charset.defaultCharset().
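To confirm that the workaround took effect, a startup check along these lines could be used. It is a sketch: WorkaroundCheck and defaultIsUtf8 are hypothetical names, and the check only inspects the charset the running VM ended up with, so it should report UTF-8 when the program is started with java -Dfile.encoding=UTF-8 WorkaroundCheck.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical startup check for the workaround: reports whether the
// running VM's default charset is UTF-8.
public class WorkaroundCheck {

    // Charset.equals compares canonical names, so this matches "UTF-8".
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        if (defaultIsUtf8()) {
            System.out.println("default charset is UTF-8");
        } else {
            System.out.println("default charset is " + Charset.defaultCharset()
                    + "; start the VM with -Dfile.encoding=UTF-8");
        }
    }
}
```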