Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8241931

DatabaseMetaData.getExtraNameCharacters() is incorrectly architected

    XMLWordPrintable

    Details

      Description

      A DESCRIPTION OF THE PROBLEM :
      The getExtraNameCharacters() method of DatabaseMetaData is currently defined as:

      > Retrieves all the "extra" characters that can be used in unquoted identifier names (those beyond a-z, A-Z, 0-9 and _).

      That is not exactly how identifiers work, at least not for any system that follows the Unicode specification for identifiers. According to "Unicode® Standard Annex #31: UNICODE IDENTIFIER AND PATTERN SYNTAX" ( https://www.unicode.org/reports/tr31/#Default_Identifier_Syntax ) there are two categories of characters: Start and Continue.

      * "Start" characters (having the Unicode properties of id_start / xid_start) are valid in any position.
      * "Continue" characters (having the Unicode properties of id_continue / xid_continue) are valid in any position _except_ the first character.

      So, all "start" characters are also "continue" characters, but _not_ all "continue" characters are "start" characters. For example, in Microsoft SQL Server (and likely some others), it is valid to have decimal digits 0-9 in the name, _but_ the name cannot _start_ with a digit. In the following example, the first statement succeeds but the second statement fails:

          USE [tempdb];
          CREATE TABLE dbo.A1 (col INT);
          GO
          CREATE TABLE dbo.1A (col INT);
          GO
          /*
          Msg 102, Level 15, State 1, Line XXXXX
          Incorrect syntax near '.1'.
          */

      This is an important distinction because in its current form, getExtraNameCharacters() returns potentially incorrect information. For example, in SQL Server this method ( https://github.com/microsoft/mssql-jdbc/blob/dev/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerDatabaseMetaData.java ) returns:

      $#@

      If someone is relying on this method to get those "extra" characters, it certainly appears to be saying that "$" is valid. Well, the following example works like the first example in that the first statement succeeds while the second statement fails:

          USE [tempdb];
          CREATE TABLE dbo.Bob$ (col INT);
          GO
          CREATE TABLE dbo.$Bob (col INT);
          GO
          /*
          Msg 102, Level 15, State 1, Line XXXXX
          Incorrect syntax near '$Bob'.
          */

      Hence, the getExtraNameCharacters method needs to be replaced with two methods:

      * getExtraNameStartCharacters
      * getExtraNameContinueCharacters


      To see the full list of characters that SQL Server should be returning for each of those two methods, please see:

      https://sqlquantumleap.com/reference/completely-complete-list-of-valid-t-sql-identifier-characters/


      For systems that do not follow the Unicode specification and only have a single set of characters, those two functions could simply return the same value.


      Take care,
      Solomon...
      https://SqlQuantumLift.com/
      https://SqlQuantumLeap.com/
      https://SQLsharp.com/



      FREQUENCY : always


        Attachments

          Activity

            People

            Assignee:
            lancea Lance Andersen
            Reporter:
            webbuggrp Webbug Group
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated: