JDK-8233385

Align some one-way conversion in MS950 charset with Windows

    Details

    • Type: CSR
    • Status: Closed
    • Priority: P3
    • Resolution: Approved
    • Fix Version/s: 15
    • Component/s: core-libs
    • Labels:
      None
    • Subcomponent:
    • Compatibility Kind:
      behavioral
    • Compatibility Risk:
      low
    • Compatibility Risk Description:
      Applications that expect the existing mapping for these one-way conversion code points may no longer work as before, because encoding these characters now produces different bytes. Since these are uncommon box-drawing code points, the risk is low. Introducing a property to switch back to the old behavior is best avoided.
    • Interface Kind:
      Java API
    • Scope:
      Implementation

      Description

      Summary

      The MS950 charset encoder maps 4 box-drawing characters differently from the Traditional Chinese Windows specification.

      Problem

      Windows code page 950 has some n:1 byte-to-char mappings: for certain code points, more than one byte sequence decodes to the same char. For 4 of these chars, the char-to-byte mapping in JDK's MS950 charset differs from the one used by Traditional Chinese Windows.
      (The original issue was reported in https://bugs.openjdk.java.net/browse/JDK-8232161)
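To check which byte sequence a particular JDK produces for these four characters, you can encode them with the MS950 charset and print the bytes in hex. The following is a minimal sketch. The class name `Ms950Probe` is hypothetical, and the sketch assumes a JDK that includes the MS950 charset. MS950 is an extended charset, so minimal runtime images may omit it. The pair you see depends on whether the JDK contains this change (Fix Version 15).

```java
import java.nio.charset.Charset;

// Hypothetical probe class: prints the MS950 encoding of the four
// box-drawing characters covered by this CSR.
class Ms950Probe {
    static final Charset MS950 = Charset.forName("MS950");

    // U+2550, U+255E, U+2561, U+256A, in the order used in the Solution lists.
    static final char[] BOX_CHARS = {'\u2550', '\u255E', '\u2561', '\u256A'};

    // Encodes one char with MS950 and returns its bytes as uppercase hex, e.g. "A2A4".
    static String encodeHex(char c) {
        byte[] bytes = String.valueOf(c).getBytes(MS950);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        for (char c : BOX_CHARS) {
            // One line per char: code point, then the bytes this JDK chose.
            System.out.printf("U+%04X -> %s%n", (int) c, encodeHex(c));
        }
    }
}
```

On a JDK without the change, you would expect the "Before" bytes. On a JDK with the change, you would expect the "After" bytes.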

      Solution

      I recommend changing the following 4 char-to-byte mappings.

      Before:

      \u2550 -> \xA2\xA4
      \u255E -> \xA2\xA5
      \u2561 -> \xA2\xA7
      \u256A -> \xA2\xA6
      

      After:

      \u2550 -> \xF9\xF9
      \u255E -> \xF9\xE9
      \u2561 -> \xF9\xEB
      \u256A -> \xF9\xEA
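The change affects only the encoder because decoding is n:1. Both the "Before" and the "After" byte pairs decode to the same character, so the encoder has to pick one of them. The following minimal sketch decodes each pair with MS950 and reports whether both members give the same char. The class name `Ms950Decode` is hypothetical, and the sketch assumes the running JDK includes the MS950 charset.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: decodes the "Before" and "After" byte pairs with MS950.
class Ms950Decode {
    static final Charset MS950 = Charset.forName("MS950");

    // Decodes a two-byte sequence (lead byte, trail byte) with MS950.
    static String decode(int lead, int trail) {
        return new String(new byte[] {(byte) lead, (byte) trail}, MS950);
    }

    public static void main(String[] args) {
        // Each row holds one "Before" sequence and the matching "After" sequence.
        int[][] pairs = {
            {0xA2A4, 0xF9F9}, {0xA2A5, 0xF9E9}, {0xA2A7, 0xF9EB}, {0xA2A6, 0xF9EA}
        };
        for (int[] p : pairs) {
            String a = decode(p[0] >> 8, p[0] & 0xFF);
            String b = decode(p[1] >> 8, p[1] & 0xFF);
            System.out.printf("%04X -> U+%04X, %04X -> U+%04X, same=%b%n",
                p[0], (int) a.charAt(0), p[1], (int) b.charAt(0), a.equals(b));
        }
    }
}
```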
      


      Definition:
      The Traditional Chinese Windows conversion table is:
      https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
      The newer MS950 definition (Windows best fit table) is:
      https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit950.txt
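The n:1 entries can be located mechanically in a conversion table. The following sketch assumes the CP950.TXT line layout: a hex byte sequence, a hex Unicode value, and an optional `#` comment. Lines with no Unicode column, such as undefined bytes, are skipped. The sketch groups the entries by Unicode value and keeps only the code points that more than one byte sequence decodes to. The class name `OneWayFinder` and the inline excerpt are hypothetical. The excerpt contains only the entries discussed here, plus one 1:1 entry, and is not the full file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: finds n:1 byte-to-char entries in a CP950.TXT-style table.
class OneWayFinder {
    // Hypothetical excerpt in the CP950.TXT layout (bytes, Unicode, optional comment).
    static final List<String> SAMPLE = List.of(
        "# excerpt: only the entries discussed in this CSR, plus one 1:1 entry",
        "0xA140\t0x3000",
        "0xA2A4\t0x2550",
        "0xA2A5\t0x255E",
        "0xA2A6\t0x256A",
        "0xA2A7\t0x2561",
        "0xF9E9\t0x255E",
        "0xF9EA\t0x256A",
        "0xF9EB\t0x2561",
        "0xF9F9\t0x2550");

    // Maps each Unicode value to the byte sequences that decode to it,
    // keeping only values with more than one byte sequence.
    static Map<Integer, List<Integer>> findManyToOne(List<String> lines) {
        Map<Integer, List<Integer>> byChar = new TreeMap<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (body.isEmpty()) continue;            // comment-only or blank line
            String[] cols = body.split("\\s+");
            if (cols.length < 2) continue;           // undefined byte: no Unicode column
            int bytes = Integer.decode(cols[0]);
            int ch = Integer.decode(cols[1]);
            byChar.computeIfAbsent(ch, k -> new ArrayList<>()).add(bytes);
        }
        byChar.values().removeIf(codes -> codes.size() < 2);
        return byChar;
    }

    public static void main(String[] args) {
        findManyToOne(SAMPLE).forEach((ch, codes) -> {
            StringBuilder sb = new StringBuilder();
            for (int code : codes) sb.append(String.format(" 0x%04X", code));
            System.out.printf("U+%04X <-%s%n", ch, sb);
        });
    }
}
```

Running this against the full CP950.TXT would list every n:1 entry, not only these four.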

      \u2550, \u255E, \u2561, and \u256A are in the Box Drawing Unicode block (U+2500 to U+257F).
      (See attached 4Chras.png for font glyphs)
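You can confirm the block membership with the standard `java.lang.Character` API. The following minimal sketch prints each code point's Unicode name and checks that it falls in `Character.UnicodeBlock.BOX_DRAWING`. The class name `BoxDrawingCheck` is hypothetical.

```java
// Hypothetical check class: confirms the four code points are box-drawing characters.
class BoxDrawingCheck {
    // True if the code point lies in the Unicode Box Drawing block.
    static boolean isBoxDrawing(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.BOX_DRAWING;
    }

    public static void main(String[] args) {
        int[] codePoints = {0x2550, 0x255E, 0x2561, 0x256A};
        for (int cp : codePoints) {
            // Character.getName returns the Unicode character name (Java 7+).
            System.out.printf("U+%04X %s box=%b%n", cp, Character.getName(cp), isBoxDrawing(cp));
        }
    }
}
```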

      Specification

      N/A

              People

              • Assignee:
                Ichiroh Takiguchi
                Reporter:
                Ichiroh Takiguchi
                Reviewed By:
                Naoto Sato, Roger Riggs
              • Votes:
                0
                Watchers:
                3
