Regular expression failure for certain strings #8711

Closed
wenleix opened this issue Aug 9, 2017 · 6 comments · Fixed by #12406

@wenleix
Contributor

wenleix commented Aug 9, 2017

presto> select REGEXP_EXTRACT('Baby K', 'by ([A-Z].*)\b[a-z]');


Query 20170809_223707_70717_ucc54 failed: 6
java.lang.ArrayIndexOutOfBoundsException: 6
        at io.airlift.jcodings.MultiByteEncoding.safeLengthForUptoFour(MultiByteEncoding.java:64)
        at io.airlift.jcodings.specific.NonStrictUTF8Encoding.length(NonStrictUTF8Encoding.java:30)
        at io.airlift.jcodings.specific.BaseUTF8Encoding.mbcToCode(BaseUTF8Encoding.java:91)
        at io.airlift.jcodings.specific.NonStrictUTF8Encoding.mbcToCode(NonStrictUTF8Encoding.java:22)
        at io.airlift.jcodings.Encoding.isMbcWord(Encoding.java:469)
        at io.airlift.joni.ByteCodeMachine.opWordBound(ByteCodeMachine.java:1063)
        at io.airlift.joni.ByteCodeMachine.matchAt(ByteCodeMachine.java:239)
        at io.airlift.joni.Matcher.matchCheck(Matcher.java:304)
        at io.airlift.joni.Matcher.searchInterruptible(Matcher.java:457)
        at io.airlift.joni.Matcher.search(Matcher.java:318)
        at com.facebook.presto.operator.scalar.JoniRegexpFunctions.regexpExtract(JoniRegexpFunctions.java:256)
        at com.facebook.presto.operator.scalar.JoniRegexpFunctions.regexpExtract(JoniRegexpFunctions.java:242)
        at com.facebook.presto.$gen.PageProjection_20170809_223707_70717_ucc54_0_71_270090.project(Unknown Source)
        at com.facebook.presto.$gen.PageProjection_20170809_223707_70717_ucc54_0_71_270090.project(Unknown Source)
        at com.facebook.presto.operator.project.PageProcessor$PositionsPageProcessorIterator.processBatch(PageProcessor.java:186)
        at com.facebook.presto.operator.project.PageProcessor$PositionsPageProcessorIterator.computeNext(PageProcessor.java:132)
        at com.facebook.presto.operator.project.PageProcessor$PositionsPageProcessorIterator.computeNext(PageProcessor.java:106)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
        at com.facebook.presto.operator.project.PageProcessorOutput.hasNext(PageProcessorOutput.java:51)
        at com.facebook.presto.operator.FilterAndProjectOperator.isFinished(FilterAndProjectOperator.java:71)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:297)
        at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:538)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:623)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
        at com.facebook.presto.execution.executor.LegacyPrioritizedSplitRunner.process(LegacyPrioritizedSplitRunner.java:23)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
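
Reading the trace: 'Baby K' is 6 bytes in UTF-8, so index 6 is one past the end of the input, and the read is reached from the `\b` check (`opWordBound` calling `isMbcWord`). This suggests the word-boundary test decoded a character at the end position without first checking for end of input. Below is a minimal Java sketch of that failure mode. It is not joni's code: the class and method names are hypothetical, and it assumes an ASCII-only notion of "word character" purely for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from joni or Presto.
class WordBoundSketch {
    // True if the byte at index i is an ASCII letter, digit, or underscore.
    // Throws ArrayIndexOutOfBoundsException when i == bytes.length.
    static boolean isWordAt(byte[] bytes, int i) {
        int c = bytes[i] & 0xff;
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Unguarded check: reads the byte at pos even when pos is the end of input.
    static boolean wordBoundUnguarded(byte[] bytes, int pos) {
        boolean before = pos > 0 && isWordAt(bytes, pos - 1);
        boolean after = isWordAt(bytes, pos); // no end-of-input test
        return before != after;
    }

    // Guarded check: treats end of input as "not a word character".
    static boolean wordBoundGuarded(byte[] bytes, int pos) {
        boolean before = pos > 0 && isWordAt(bytes, pos - 1);
        boolean after = pos < bytes.length && isWordAt(bytes, pos);
        return before != after;
    }

    public static void main(String[] args) {
        byte[] input = "Baby K".getBytes(StandardCharsets.UTF_8);
        // The greedy ".*" in the pattern can consume up to the end of the input,
        // so "\b" may be tested at position input.length (6 here).
        System.out.println("guarded at end: " + wordBoundGuarded(input, input.length));
        try {
            wordBoundUnguarded(input, input.length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded at end threw: " + e);
        }
    }
}
```

In this sketch the guarded variant reports a boundary after the final `K`, while the unguarded variant fails with the same exception type as in the trace.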
wenleix added the bug label Aug 9, 2017
@satybald
Contributor

satybald commented Aug 10, 2017

Looks like this bug is caused by the regex library that Presto uses, airlift/joni. There's already an issue opened by @haozhun: jruby/joni#28

@haozhun
Contributor

haozhun commented Dec 6, 2017

This was fixed in joni 2.1.7. Latest is joni 2.1.13. We are currently on joni 2.1.5.

@electrum How do we update given we have a fork?

@lopex

lopex commented Dec 6, 2017

Is there a reason you're using NonStrictUTF8Encoding? It was introduced as a non-validating version for compatibility with the old Ruby regexp engine. We're going to remove it from the jruby/joni repo as it's no longer supported.

@haozhun
Contributor

haozhun commented Dec 6, 2017

@lopex Thank you for your attention. We are on non-strict because of jruby/joni#17

@lopex

lopex commented Dec 6, 2017

@haozhun ah totally forgot about that one.

@electrum
Contributor

Fixed in trinodb/trino#350
