-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fixed round-trip bug of double.ToString("R"). #13140
Conversation
Removed the "15 digits precision first, then 17 digits precision" converting logic in double.ToString("R"). This change fixes https://stackoverflow.com/questions/24299692/why-is-a-round-trip-conversion-via-a-string-not-safe-for-a-double Fix #13106
@tannergooding @tarekgh @jkotas
I fixed a bug of previous submitting. Now the performance of positive infinity, negative infinity, NaN and zeros after change applied is the same as old one, whereas other cases are nearly 2x faster than the old one. Before change
After change
|
I'll add the test case to CoreFX after this change officially included. |
@dotnet-bot test Ubuntu x64 corefx_baseline |
@jkotas @joperezr @AlexGhiondea please have a look. note that, this is kind of breaking change but I think we should go with this change for net core as it is producing better results when formatting the floating numbers. @mazong1123 LGTM. Thanks again for getting this done too. |
|
||
DoubleToNumber(value, DOUBLE_PRECISION, &number); | ||
|
||
// Per IEEE double-precision floating-point format, double number in 17 digits precision should be round-trippable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to quote the exact IEEE statement here:
Conversions from a supported binary format bf to an external character sequence and back again results in a copy of the original number so long as there are at least Pmin(bf) significant digits specified and the rounding-direction attributes in effect during the two conversions are round to nearest rounding-direction attributes
With an additional note that, for binary64 (double), Pmin(binary64)
is 17. The algorithm given is Pmin(bf) = 1 + ceiling(p * log10(2))
where p
is the number of significant bits in bf
(this is 53 for binary64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll add these comments.
I have enabled the corefx on this PR to catch any failure for the tests written expecting the old results. |
Interesting, looks many tests failed because of that change. The majority of the failures are in the serialization and Xml cases but I am seeing interesting cases in ComponentModel converters too. We'll need to look carefully if there are any wrong cases. |
@tarekgh I think this is why it is called a breaking change :). One of the failed test case is:
The representation of To be clarified about the source path, In https://github.com/dotnet/corefx/blob/master/src/System.ComponentModel.TypeConverter/src/System/ComponentModel/DoubleConverter.cs#L47 return ((Double)value).ToString("R", formatInfo); I'm getting started to understand why we added the "15 digits precision first, then 17 digits precision" logic. So in my opinion, we should introduce another special format specifier (may be "S" => "Special for displaying") which has the old "R" logic, and use that new format specifier in https://github.com/dotnet/corefx/blob/master/src/System.ComponentModel.TypeConverter/src/System/ComponentModel/DoubleConverter.cs#L47 return ((Double)value).ToString("S", formatInfo); @jkotas @tarekgh @tannergooding any thoughts? |
If the breaking changes are enough of a concern, then we would need to expose a new format specifier for the 17 digit IEEE compliant version (perhaps 'R17') and keep 'R' producing the current 15 digits. Otherwise, users who like the "prettier" string can just use "G15" 😉 |
@tannergooding I think keep |
To be clear, I agree. However, this behavior has been this way for nearly 17 years (going back to NetFX 1.0, as best as I can tell), so changing it at this point may be more harmful than helpful. This is something that will likely have to go to through several compat/bar checks to determine:
Additionally, if we determine that the breaking-change is not worthwhile, we need to determine how a user can get an IEEE compliant form of 'R'. This change is already shown to break tests in CoreFX and could also introduce breaking changes in production code (such as Roslyn, various serialization or IPC mechanisms, etc). So we need to be careful, even if we think it is beneficial. |
@tannergooding sounds reasonable from compatibility's perspective. Now it's another question - do we really need this change if the compatibility is a concern? Because if we keep the |
@mazong1123 considering the test results I am worried more about the app compat. as @tannergooding mentioned this can break the consumers of coreclr.
Considering that, now I prefer to keep the current behavior and not change it. what do you think? |
@tarekgh, while 'R' currently works on |
Well, introducing a new specifier will be just aliasing to G17 anyway. we can fix the doc of "R" and "G" to be explicit. if you think a new specifier will help, I will not object it but it will be really a low priority as currently we support all possible scenarios. |
The point of the new specifier is that it would alias G17 for double and G9 for single (the user doesn't need to be "aware" of the internals, just that it works and will round trip appropriately).
I'll probably add a formal proposal sometime later tonight. |
This can still be addressed in the docs. Also using G17 all the time wouldn't work? |
It sounds like the compat break is going to be high. I try to keep the current behavior as-is and figure out a different way forward (i.e a new specifier) |
It would work for double. It would not work for BigInteger (as that can be arbitrary length) and you are requesting twice the number of digits for I agree that it isn't very high priority and is mostly a "nice-to-have". However, it is also something that should be trivial to add and could easily be picked up by the community (or myself one evening). |
Per discussion here, we should close this PR and wait for @tannergooding to provide the proposal for the new specifier and then we can discuss it with the design reviewers and proceed with the implementation. @mazong1123, thanks for trying that and looking forward you participate in the new specifier work. let me know if you have any concern before closing this PR. |
Why do you think so? Number of failing corefx test is not necessarily a good measure of the compat break.
I think that a new specifier for this would be very hard to explain. We would end up with two format specifiers that do the same thing, except that one of them is buggy. |
My concern is that the CoreFx tests breaking is a symptom of what we'll see in the wild. In particular for serialization scenarios where data goes across different versions of the Framework. |
@jkotas, in my mind, it's the types of tests that failed (serialization, parsing, etc) more than the number. I think that investigating if this would break Roslyn and JSON.NET would be the minimum that should be done before the change is merged. That being said, it really comes down to how the user is consuming it. Serializing a value using The CoreFX tests are really only failing because we are checking that |
Agree. Data about whether this breaks real-world code would be useful. |
@mazong1123, that has the same "issue" this fix has. There are some numbers ( So far, the breaks have all been in tests which are explicitly checking that the string produced matches exactly (so they are validating Looking at the tests that have failed so far (both in CoreFX and JSON.NET) it is my view that they are also "technically" broken. The tests (which are effectively testing that the data is serialized properly) should not be testing that the data output exactly but instead validating that the data round-trips appropriately. That is, for most testcases, it shouldn't matter if There should also be tests that validate that the output is in the correct "shape" (that is, |
@mazong1123 @tannergooding it looks from the results we got, it is not clear how much the apps can be broken by this change. Looks the breaking scenario is when the app expect the exact formatted double number that used to get before (I mean when the app persists such formatted string). I am still seeing this change is risky even the broken scenario could be a not common scenario. My preference here is just to keep the current behavior and update the documentation to ask people use G17 as needed. I'll leave it to @AlexGhiondea and @jkotas to advise about that. |
@tarekgh, It only when they compare the outputted string to another string that it is breaking. If they persist the string, but parse it before any comparisons, then everything is fine.
|
Yes, that is what I meant, they compare formatted strings. I am still keeping my preference I mentioned. |
I think I agree with @tarekgh on this one. I would rather not take this due to potential breaks. It looks risky and we already have an alternative in place for it (G17) in case people need the roundtrip. Waiting on @AlexGhiondea and @jkotas to chime in, but I think we shouldn't take it. |
"do nothing" is fine with me for now, but we may want to look at this together with the fixing the parser to be IEEE compliant. Fixing the parser would be a similar kind of observable change. Note that we have been treating changes in floating point precision (precision improvements in particular) as acceptable changes, even in .NET Framework. People are sometimes surprised by it (e.g. #12737), and it breaks poorly written tests or numerically unstable algorithms occasionally (*). .NET Core is not meant to be bug-for-bug compatible from version to version, so I would think that precision improvements are no-brainer to take in .NET Core. (*) I have debugged a pool game once. The ball flown out from the table on a new .NET version from time to time. I turned out that the ball impact algorithm was numerically unstable and floating point precision improvement caused it to not converge. |
We already have the issue https://github.com/dotnet/coreclr/issues/1316 tracking fixing the parser. I think mainly we need to pick the implementation from Roslyn. |
I would second this, especially when the precision improvements also bring a perf improvement like this change. |
The documentation for the "R" format specifier already describes these problems (bugs) and advises the use of "G17" to avoid them. See #13140 (comment) |
@gafter The document is asking user to use "R" which would introduce the issue discussing here. Althougth there's a notice in the middle of the doc, user may not notice that and it does not make sense to ask user to use "G17" for x64 and any cpu, whereas use "R" for x86. So IMO we should always ask user to directly use "G17" for roundtrip , and mark "R" as obsoleted if we decide to keep current R's logic. |
you can log the doc issue here https://github.com/dotnet/docs I would avoid saying R is obsolete in the doc. but I agree the doc should be clear recommending G17 for who want to guarantee the round tripping. |
@tarekgh OK got it. I'm waiting for the conclusion here. Once we decide to keep current behavior, I'll file an issue at https://github.com/dotnet/docs |
go ahead and file the doc issue. we'll keep the current behavior for now. |
What is status of this PR? No update in last 2 weeks ... |
Closing it. @joperezr @AlexGhiondea please re-open if you think otherwise. |
@tarekgh, I still don't like that "Use If we ever extend things to support half precision or quad precision (half precision support may be partially available with the hardware intrinsic proposal from Intel), they will also have similar issues. I think we need to either take the break and fix 'R', or provide a separate formatter that provides IEEE compliant round-tripping. |
@tannergooding as @jkotas suggested before we'll look at the whole story of the formatting and parsing together. some people (including myself) showed app compat concern of changing "R" now. I would suggest we should look more into mitigating the app compat concern before we introduce a new specifier. @AlexGhiondea, what you think about fixing "R" as this PR did with having AppContext switch to go back to the old behavior if needed? does this mitigate the app compat concern? |
I would recommend as a fix a correct implementation of both formatting and conversion, rather than a workaround for bugs in the current implementation. |
Removed the "15 digits precision first, then 17 digits precision" converting logic in double.ToString("R"). This change fixes https://stackoverflow.com/questions/24299692/why-is-a-round-trip-conversion-via-a-string-not-safe-for-a-double
Fix #13106