JDK-8264765

BreakIterator sees bogus sentence boundary in parenthesized “i.e.” phrase

    Details

    • Subcomponent:
    • Resolved In Build:
      b18
    • Verification:
      Not verified

      Description

      ADDITIONAL SYSTEM INFORMATION :
      java version "1.8.0_112"
      Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
      Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

      This has also been reproduced on newer JDK versions, e.g., 14.

      A DESCRIPTION OF THE PROBLEM :
      When a sentence contains text like "blah blah (i.e., blah blah), blah blah", the iterator returned by BreakIterator.getSentenceInstance() incorrectly reports a break after the "i.e" and before the "., blah blah)", even though this is not a sentence boundary.

      FWIW, Stack Overflow discussion here: https://stackoverflow.com/q/66933006/263801

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Run the test case program below.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      bi.preceding(30) returned -1
      first sentence: "Due to a problem (e.g., software bug), the server is down."

      ACTUAL -
      bi.preceding(30) returned 21
      first sentence: "Due to a problem (e.g"

      ---------- BEGIN SOURCE ----------
      import java.text.BreakIterator;
      import java.util.Locale;
      public class BreakIteratorTest {
          public static void main(String[] args) {
              String text = "Due to a problem (e.g., software bug), the server is down.";
              BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
              bi.setText(text);
              // Ask for the last sentence boundary that precedes offset 30.
              int r = bi.preceding(30);
              System.out.println("bi.preceding(30) returned " + r);
              // Treat DONE as "no boundary found"; otherwise cut the text at the boundary.
              String sentence = r == BreakIterator.DONE ? text : text.substring(0, r);
              System.out.println("first sentence: \"" + sentence + "\"");
          }
      }

      ---------- END SOURCE ----------
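
      To see every boundary the iterator reports, not only the one before offset 30, the text can be walked with first() and next(). The following is a minimal sketch and not part of the original report; the class name SentenceBoundaryDump and the helper boundaries() are hypothetical names chosen for illustration. On an affected JDK, the printed list would be expected to include offset 21, matching the ACTUAL output above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original report.
public class SentenceBoundaryDump {
    // Collects every boundary offset the sentence iterator reports for the text.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            offsets.add(b);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "Due to a problem (e.g., software bug), the server is down.";
        List<Integer> offsets = boundaries(text, Locale.US);
        System.out.println("boundaries: " + offsets);
        // Print the text between each pair of consecutive boundaries.
        for (int i = 1; i < offsets.size(); i++) {
            String segment = text.substring(offsets.get(i - 1), offsets.get(i));
            System.out.println("segment: \"" + segment + "\"");
        }
    }
}
```

      Any offset printed between the start and the end of the text marks a place where the iterator splits this single sentence.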

      CUSTOMER SUBMITTED WORKAROUND :
      None known

      FREQUENCY : always


              People

              Assignee:
              Naoto Sato
              Reporter:
              Webbug Group