Perf: double.ToString() 3x slower in some cases on Linux #7783

New issue

Closed

dotnet/coreclr

#12894

Closed

Perf: double.ToString() 3x slower in some cases on Linux#7783

dotnet/coreclr

#12894

Assignees

Labels

area-System.Runtimeenhancementhelp wantedos-linuxtenet-performance

Milestone

Future

tarekgh

opened

on Apr 2, 2017

Member

We have enhanced the performance of Double.ToString when running on Linux. we used to be 7x slower on Linux than on Windows but now we are around 3x slower in some cases.this issue is tracking looking at Double.ToString performance to optimize more when running on Linux.

Note that, the current Double.ToString performance is slightly better than Windows in some cases (like when using Double.MaxValue /2) for instance.

#7700 is the original issue which has more details about what we have done so far and what more can be done.

karelz

Member

@tarekgh just to clarify: Are we tracking improvements on Linux (title says that) or on Windows (first paragraph says that). Or both?

tarekgh

MemberAuthor

we are tracking to fix the perf on Linux as the title mention. thanks for the catch, I had typo in the first paragraph and I have fixed it.

karelz

changed the title ~~[-]Perf: double.ToString() slower in some cases on Linux[/-]~~ Perf: double.ToString() 3x slower in some cases on Linux

on Apr 2, 2017

mazong1123

Contributor

Given an up-for-grabs label here, I'm wondering if anybody is working on this issue? If not, I can help on this.

karelz

Member

It is not assigned to anyone, so no one is working on it (see Triage rules). Assigning to you ...

assigned

MemberAuthor

@mazong1123 thanks for looking at this. could you please share your thoughts before applying any changes?

mazong1123

Contributor

@tarekgh I'm still reading the context and previous works you guys have done. Once the preparation finished, I'll list the overview of the issue from my perspective and then put my proposals here for discussion.

This should be done by 7/4 CST.

mazong1123

Contributor

@tarekgh I've read the context of the double.ToString() performance issue. Following are my thoughts and plan. Please let me know if I missed anything or you have any input.

Overview

Originally, double.ToString() on Linux is 7x slower than Windows because of the inefficient implementation of pal _ecvt, which is trying to emulate the behavior of Windows _ecvt.
Re-write _ecvt by leveraging ecvt_r can get the double.ToString()'s performance back to Windows. However, ecvt_r breaks round tripping for edge cases (e.g. Double.MinValue) so we cannot use ecvt_r.
Re-write _ecvt using the code of CoreRT DoubleToNumber can improve the performance (but not reach the Windows performance). We have applied the changes to unblock the customers.

Plan

As mentioned by @jkotas in
https://github.com/dotnet/coreclr/issues/10390#issuecomment-288549085 https://github.com/dotnet/coreclr/issues/10390#issuecomment-289457558 , the long term fix is to re-implement DoubleToNumber in an OS-independent manner, probably in full-managed code, so I'll try followings:

Write a demo based on the paper: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf
If the demo is efficient as expected, apply it to DoubleToNumber.
If the performance improved as expected, convert the implementation to managed code.

tarekgh

MemberAuthor

@mazong1123 thanks for looking at that.

Write a demo based on the paper: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf

2 things we need to confirm here if we are going to use this paper:

there is no legal issues to implement this paper in open source. @terrajobst could you please help with this question?
May be this paper implementation will require having a static data. we need to know how big this data would be as we are concerned about the size too.

thanks again for looking at this.

mazong1123

Contributor

May be this paper implementation will require having a static data. we need to know how big this data would be as we are concerned about the size too.

Just to be clarified:

Are you concerning about the size of data stored in static variables of C/C++ code?
Do we have any specification/guideline with details of the size constrain?

tarekgh

MemberAuthor

Are you concerning about the size of data stored in static variables of C/C++ code?

Any data that increase the size of the net core on the disk will be a concern. If the data is really small (e.g. small arrays or static data). But if the data is few KB's then we need to look at that and evaluate the benefits. We had some work to optimize the size of the net core on the disk.

CC @jkotas

jkotas

Member

I would worry about speed and correctness first. Once you have prototype that is fast and correct, we will worry about the rest (disk footprint, license, etc.)

mazong1123

Contributor

@jkotas I'll write a demo first so that we can examine the correctness and speed. If it does not work as expected, we can have a quick turnaround.

BTW, do you have any idea other than trying the paper?

mazong1123

Contributor

@jkotas @tarekgh I've prototype the paper roughly. Before I go further, I'd like to share the information I've learnt so far. With the algorithm,

double.MinValue can be exactly converted to a string -17976931348623157 with an exponent value 308, same as full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), but is more accurarte than using double.ToString(), whose result is -1.79769313486232E+308;
The number 7.9228162514264338E+28 which was mentioned in https://github.com/dotnet/coreclr/issues/10390#issuecomment-290457104 can be converted to a string 7922816251426434 with an exponent value 28, which is less accurate comparing to full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), whose output is 7.9228162514264338E+28, but is more accurate comparing to using double.ToString(), whose output is 7.92281625142643E+28

So I believe generally, the algorithm is more accurate than current implementation, but I need to adopt the double.ToString("R") implementation to my prototype and run the test cases again.

@tarekgh BTW, in https://github.com/dotnet/coreclr/issues/10390#issuecomment-291011320 , when testing double.MinValue against evct_r implementation, were you using double.ToString() or double.ToString("R")? I'm asking because even full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 will produce the wrong number when using double.ToString() to convert double.MinValue.

tarekgh

MemberAuthor

double.MinValue can be exactly converted to a string -17976931348623157 with an exponent value 308, same as full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), but is more accurarte than using double.ToString(), whose result is -1.79769313486232E+308;

Yes this is expected to have ToString("R") is more accurate than ToString(). this is why we have "R" guarantee the round tripping but not ToString().

The number 7.9228162514264338E+28 which was mentioned in #7700 (comment) can be converted to a string 7922816251426434 with an exponent value 28, which is less accurate comparing to full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 using double.ToString("R"), whose output is 7.9228162514264338E+28, but is more accurate comparing to using double.ToString(), whose output is 7.92281625142643E+28

We need to validate parsing back the value produced from "R" is getting exact same number we originally started with before converting to string.

@tarekgh BTW, in #7700 (comment) , when testing double.MinValue against evct_r implementation, were you using double.ToString() or double.ToString("R")? I'm asking because even full .net framework 4.6.1 and .net core 2.0.0-preview2-005905 will produce the wrong number when using double.ToString() to convert double.MinValue.

I was referring to "R". we understand ToString() without "R" doesn't produce accurate numbers. it more rounded.

mazong1123

Contributor

I fixed several bugs of my prototype. Now the accuracy is great. The prototype code can be found at https://github.com/mazong1123/doubletonumber

I'll start to refactor & optimize the code and test the performance on Linux.

mazong1123

Contributor

@tarekgh I just have one question regarding double.ToString("R"). I saw the logic is like:

Try to convert the double to string in precision of 15.
Convert the string back to double and compare to the original double. If they are the same, we return the converted string whose precision is 15.
Otherwise, convert the double to string in precision of 17.

Is there any particular reason preventing us to convert the double to string in precision of 17 directly? I'm asking because:

There's a bug https://stackoverflow.com/questions/24299692/why-is-a-round-trip-conversion-via-a-string-not-safe-for-a-double . Let's say we have following code:

using System;

namespace lab
{
    class Program
    {
        static void Main(string[] args)
        {
            double d1 = 0.84551240822557006;
            string s = d1.ToString("R");
            double d2 = double.Parse(s);
            Console.WriteLine(d1 == d2);
        }
    }
}

In current .NET Core (2.0.0-preview2-005905), the output is False. The reason is d1.ToString("R") wrongly chose the result in precision of 15, which is 0.84551240822557. If we choose result in precision of 17, which is 0.84551240822557006, we can convert it back to double accurately.

Get rid of this logic will simply make the code faster IMO.

tarekgh

MemberAuthor

@mazong1123 I don't know the history why decided to format with "R" as 15 digit precision and not 17 directly. my guess is was the old implementation with 15 digit precision was faster than 17 digits and almost all common cases can fit in 15 digit precision.

The doc https://msdn.microsoft.com/en-us/library/kfsatb94(v=vs.110).aspx is stating this design but doesn't tell the rationale behind it. also the doc sample is exactly the same case mentioned in your code sample.

By default, the return value only contains 15 digits of precision although a maximum of 17 digits is maintained internally. If the value of this instance has greater than 15 digits, ToString returns PositiveInfinitySymbol or NegativeInfinitySymbol instead of the expected number. If you require more precision, specify format with the "G17" format specification, which always returns 17 digits of precision, or "R", which returns 15 digits if the number can be represented with that precision or 17 digits if the number can only be represented with maximum precision.
Notes to Callers:
In some cases, Double values formatted with the "R" standard numeric format string do not successfully round-trip if compiled using the /platform:x64 or /platform:anycpu switches and run on 64-bit systems. To work around this problem, you can format Double values by using the "G17" standard numeric format string. The following example uses the "R" format string with a Double value that does not round-trip successfully, and also uses the "G17" format string to successfully round-trip the original value.

I think we can go with the breaking change to always format as 17 digits precision if it is really faster in the common cases.

@jkotas what you think about that? and do you know why we had this design in the first place?

jkotas

Member

what you think about that?

I think we can give it a try. These kind of changes tend to come with compat surprises...

do you know why we had this design in the first place?

I do not know.

tannergooding

Member

I investigated the 'R' case and validated that the math for converting from string to double is correct.

I also left a comment (see https://github.com/dotnet/coreclr/issues/13106#issuecomment-318844961) indicating that we are not meeting the IEEE round tripping requirements which states that double->string->double is only required to match when done with at least 17 digits.

tarekgh

MemberAuthor

@tannergooding we have another issue tracking the parsing part https://github.com/dotnet/coreclr/issues/1316 as it looks we have other issues than just using 17 significant digits. Roslyn already introduced a new parsing implementation which conforms to IEEE which we can consider porting it to net core.