New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21774][SQL] The rule PromoteStrings should cast a string to double type when compare with a int/long #18986
Conversation
@@ -127,8 +127,10 @@ object TypeCoercion { | |||
case (DateType, TimestampType) => Some(StringType) | |||
case (StringType, NullType) => Some(StringType) | |||
case (NullType, StringType) => Some(StringType) | |||
case (StringType, IntegerType) => Some(DoubleType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How abount other types (e.g., LongType)? Is the similar issue still exists?
I think the logic in pr-15880 (#15880) is reasonable.
ok to test |
@stanzhai Implicit type casting has to follow some rules. If you think this is a bug, could you do a complete investigation about this issue? For example, how Hive handles the type promotion? If needed, we can consider supporting different type promotion/casting modes. (e.g., one is Hive compatible) |
Test build #80849 has finished for PR 18986 at commit
|
In MySQL conversion of values from one string type to numeric, will be compared as floating-point (real) numbers. https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html The following rules describe how conversion occurs for comparison operations:
|
Test build #80866 has finished for PR 18986 at commit
|
Test build #80868 has finished for PR 18986 at commit
|
@gatorsmile @maropu Cast to
|
In Hive, |
Either way, we couldn't touch on this behaviour until the next major release? (It seems we've already fixed this policy for the 2.x releases in #15880... maybe). Btw, I feel we first should document this implicit casting behaviour somewhere like https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-AllowedImplicitConversions cuz this makes users much confused. |
@maropu Previously I did it in https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala#L34-L59 We can follow what MySQL did https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html. It looks more user friendly. |
Yea, since this topic is important for some users, I mean we better move the doc into |
@gatorsmile @wangyum For this issue, I think the behevior in PromoteStrings rule is reasonable, but there are problems in underlying converter UTF8String. As described in PR-15880 (#15880):
However, the underlying UTF8String still returns true for cases that are not castable. From the below code, we can get res=true and wrapper.value=0. Consequently, it leads to the wrong answer in SPARK-21774.
|
@DonnyZone @gatorsmile @cloud-fan PostgreSQL will throw an error when comparing a string to a int.
|
@gatorsmile @DonnyZone When comparing a string to a int in Hive, it will cast string type to double.
So, I think that cast a string to double type when compare with a numeric is more reasonable. Actually, my usage scenarios are for Spark compatibility. The problem I found when I upgraded Spark to 2.2.0, and lots of SQL's results are wrong. |
2.4.0 has the same problem without this patch. For example, |
I don't think there is a perfect solution here. When comparing a string and a number, you have no idea which type we should cast the string to, without looking into the string content. For this specific case: |
Can one of the admins verify this patch? |
any update for this issue right now? |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
The rule PromoteStrings should cast a string to double type when compare with a int.
This PR fixed this.
How was this patch tested?
Origin test cases updated.