Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fixed the Equals bug of floating point number and user-defined struct. #13164

Merged
merged 1 commit into from Aug 23, 2017
Merged

Fixed the Equals bug of floating point number and user-defined struct. #13164

merged 1 commit into from Aug 23, 2017

Conversation

mazong1123
Copy link

@mazong1123 mazong1123 commented Aug 2, 2017

This PR disables the fast equal check and fast hash code helper for floating point numbers so that 0.0 and -0.0 can be considered as the same. Same for all NaNs.

This definately can be optimized. For example, we should only opt out 0 and NaN for double/single and let other values use fast equal check and fast hash code helper.

UPDATE

The final implementation is as following:

Other than ContainsPointers and IsNotTightlyPacked, added two new conditions for checking whether a valuetype can go to fast path or not. Following are the details of these 2 conditions:

  • Check whether a valuetype contains a Single/Double field. If it does, we cannot go to fast path. Floating point numbers have special Equals and GetHashCode implementation. We cannot simply compare floating point numbers via bits (e.g. +0.0 should equal to -0.0, but their underlying bits are different).

  • Check whether an user-defined valuetype overrides Equals or GetHashCode. If it does, we cannot go to fast path. Because Equals or GetHashCode result may be different to bit comparison.

To find Single/Double fields and overridden Equals/GetHashCode, we start a DFS to go through the whole hierachy of the source valuetype, and cache the result to MethodTable.

Fix #11452

@mazong1123 mazong1123 changed the title [WIP]Fixed the Equals bug of user defined struct with floating point numbe… [WIP]Fixed the Equals bug of floating point number Equals. Aug 2, 2017
@mazong1123 mazong1123 changed the title [WIP]Fixed the Equals bug of floating point number Equals. [WIP]Fixed the Equals bug of floating point number. Aug 2, 2017
@mazong1123
Copy link
Author

mazong1123 commented Aug 2, 2017

@tannergooding @tarekgh @jkotas
Current changes fixed the bug. However, the performance may be degraded as I disabled fast check for all floating point numbers other than just for 0 and NaN. I created this PR to check with CI and hope someone who is familiar with the type loader can review my solution - is it on the right way?

If the direction is correct, I may add a HasFloatingPointNumberField() method in EEClass, and call HasFloatingPointNumberField(), check 0 / NaN along with !mt->IsNotTightlyPacked() to determine whether we can use the fast check. The hard part is I do not know how to add a new flag for HasFloatingPointNumberField() - seems like all available bits have been taken

VMFLAG_NOT_TIGHTLY_PACKED = 0x10000000,
.

@mazong1123
Copy link
Author

@dotnet-bot test Ubuntu x64 corefx_baseline
@dotnet-bot test Windows_NT x86 corefx_baseline

@tarekgh
Copy link
Member

tarekgh commented Aug 2, 2017

CC @AlexGhiondea @joperezr

Would be nice to wait for @jkotas to review this change.

@@ -1809,7 +1811,7 @@ MethodTableBuilder::BuildMethodTableThrowing(

if (IsValueClass())
{
if (bmtFP->NumInstanceFieldBytes != totalDeclaredFieldSize || HasOverLayedField())
if (bmtFP->NumInstanceFieldBytes != totalDeclaredFieldSize || HasOverLayedField() || hasFloatingPointNumberField)
GetHalfBakedClass()->SetIsNotTightlyPacked();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This IsNotTighlyPacked bit is used to disable JIT "struct promotion" optimizations. I do not think we want to be disabling the "struct promotion" optimization when the struct has float field. You need to find a new bit for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas I tried to define a new bit for HasFloatingPointNumberField() but seems like we already exhausted the available bits. You can see all VMFLAG_*:

VMFLAG_MARSHALINGTYPE_STANDARD = 0xc0000000,

Or should I define another enum struct?

@jkotas
Copy link
Member

jkotas commented Aug 3, 2017

The same fundamental bug exists for all fields with a custom Equals operator that do not contain GC reference. float/double are just one instance of this problem. If we are doing something about this, it maybe nice to fix it for all cases, not just float/double.

@mazong1123
Copy link
Author

mazong1123 commented Aug 3, 2017

@jkotas I may miss some background knowledge about this bug. Could you please give me an example to demonstrate the bug in non-double/float fields?

@jkotas
Copy link
Member

jkotas commented Aug 3, 2017

Could you please give me an example to demonstrate the bug in non-double/float fields?

Actual output: true (NeverEquals.Equals is not called)
Expected output: false

using System;

struct NeverEquals
{
    byte _a;

    public override bool Equals(object obj)
    {
        return false;
    }

    public override int GetHashCode()
    {
        return 0;
    }
}

struct MyStruct
{
    NeverEquals _x;
    NeverEquals _y;
}

class Program
{
    static void Main(string[] args)
    {
        MyStruct i = new MyStruct();
        MyStruct j = new MyStruct();

        Console.WriteLine(i.Equals(j));
    }
}

@jkotas
Copy link
Member

jkotas commented Aug 3, 2017

More realistic example.

Actual output: false (NeverEquals.Equals is not called)
Expected output: true

using System;

struct BoolAndHint
{
    private byte _a;

    public bool Value => (_a & 1) != 0;
 
    public byte Hint => (byte)(_a >> 1);

    public BoolAndHint(bool value, int hint)
    {
        _a = (byte)((value ? 1 : 0) + (hint << 1));
    }

    public override bool Equals(object obj)
    {
        if (!(obj is BoolAndHint))
            return false;

        BoolAndHint other = (BoolAndHint)obj;
        return Value == other.Value;
    }

    public override int GetHashCode()
    {
        return 0;
    }
}

struct MyStruct
{
    public MyStruct(bool value, int hint)
    {
        _x = new BoolAndHint(value, hint);
    }

    BoolAndHint _x;
}

class Program
{
    static void Main(string[] args)
    {
        MyStruct i = new MyStruct(false, 42);
        MyStruct j = new MyStruct(false, 43);

        Console.WriteLine(i.Equals(j));
    }
}

@mazong1123
Copy link
Author

mazong1123 commented Aug 3, 2017

@jkotas Thanks for the example! Another question is seems like we've exhausted the bits for VMFLAG_*:

VMFLAG_MARSHALINGTYPE_STANDARD = 0xc0000000,

I mean we've already reached 0x80000000 , I cannot find a larger value that has only 1 bit set...

Should I create a new enum to store my new flag bit?

@jkotas
Copy link
Member

jkotas commented Aug 3, 2017

Find an existing unused bit, e.g. VMFLAG_REMOTING_PROXY_ATTRIBUTE or VMFLAG_NOSUPPRESSUNMGDCODEACCESS are not actually used in .NET Core. It is fine to submit separate cleanup PR to get these bits freed up.

@mazong1123
Copy link
Author

mazong1123 commented Aug 4, 2017

To be clear, I need two bits. One is for floating/double +0.0, -0.0, NaN scenario, another one is for user-defined struct with overrided Equals.

VMFLAG_REMOTING_PROXY_ATTRIBUTE is used by HasRemotingProxyAttribute at

inline BOOL HasRemotingProxyAttribute()

HasRemotingProxyAttribute is used by RuntimeTypeHandle::HasProxyAttribute at

FC_RETURN_BOOL(pMT->GetClass()->HasRemotingProxyAttribute());

RuntimeTypeHandle::HasProxyAttribute is a FCall which is NOT used in CoreFX now. So VMFLAG_REMOTING_PROXY_ATTRIBUTE is not used by .net core now. I can take this bit.

However, VMFLAG_NOSUPPRESSUNMGDCODEACCESS is used by HasSuppressUnmanagedCodeAccessAttr, which has been used in some important function like ECall::GetQCallImpl:

CONSISTENCY_CHECK_MSGF(pMD->HasSuppressUnmanagedCodeAccessAttr(),

I think I should find another available bit.

@jkotas
Copy link
Member

jkotas commented Aug 4, 2017

CONSISTENCY_CHECK_MSGF(pMD->HasSuppressUnmanagedCodeAccessAttr(),

This calls MethodDesc::HasSuppressUnmanagedCodeAccessAttr() that just returns true in .NET Core, it does not even check any bits.

@mazong1123
Copy link
Author

mazong1123 commented Aug 4, 2017

@jkotas how about another (last one) usage in NDirect::CreateCLRToNativeILStub:

!pMD->HasSuppressUnmanagedCodeAccessAttr())

Seems like it has some logic related to security.

@jkotas
Copy link
Member

jkotas commented Aug 4, 2017

All this is left over code from full .NET Framework partial trust that can be deleted. (It would be nice to do this cleanup in separate PR.)

@mazong1123
Copy link
Author

mazong1123 commented Aug 4, 2017

@jkotas Thanks. I'll finish this PR first and then clean up these left over code in another PR :) . For now, as I need to use these two bits, I may change something related to make it compile-able.

@mazong1123
Copy link
Author

mazong1123 commented Aug 4, 2017

@jkotas to fix the overrided Equals issue, I briefly did:

  1. Check if current class/struct has overrided Equals.
  2. Check if any field type of current class/struct has overrided Equals.
  3. If 1 or 2 is true, call GetHalfBakedClass()->SetHasOverridedEquals();
  4. In CanCompareBits, check mt->HasOverridedEquals().

To achieve step 2, I have to leverage pByValueClassCache to access MethodTable:
https://github.com/mazong1123/coreclr/blob/36f3c079de2cba5b58839721de435934eab281dd/src/vm/methodtablebuilder.cpp#L4205-L4208

I tried to use pByValueClass which looks like more elegant. However, it would cause a segmentation fault for cross gen. Please let me know if there's any elegant way to access MethodTable.

UPDATE:
I see some test failed. I'd like to investigate this tomorrow. @jkotas Please let me know if I'm on the right direction if you have time to take a look :)

// mazong1123: a valid overrided method.
if (!*hasOverridedEqualsMethod)
{
*hasOverridedEqualsMethod = strcmp((*it)->GetMethodSignature().GetName(), "Equals") == 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can have false positives.


// mazong1123: a valid overrided method.
if (!*hasOverridedEqualsMethod)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing method names and signatures is not free. Adding this kind of logic to type loader will make every type pay for it, but Equals will only ever be called on a few types. It would be better to do this lazily, only once somebody calls Equals on the type.

Copy link
Author

@mazong1123 mazong1123 Aug 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, try to find an overrided Equals when somebody calls Equals in a ValueType will need to DFS or BFS all the available fields of the types in worse case. If the hierarchy is deep it's very inefficient IMO. For instance:

    struct MyStruct2
    {
        public NeverEquals _x;
        public NeverEquals _y;
    }

    struct MyStruct3
    {
        public MyStruct2 _x;
        public MyStruct2 _y;
    }

    struct NeverEquals
    {
        public double _a;

        public override bool Equals(object obj)
        {
            return false;
        }

        public override int GetHashCode()
        {
            return 0;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            MyStruct3 i = new MyStruct3();
            MyStruct3 j = new MyStruct3();

            Console.WriteLine(i.Equals(j));
        }
    }

We need to drill down to NeverEquals to find an overrided Equals. That makes the "fast equal" no more fast...
Although we'll cache the result, that still break the original purpose of "fast equal" because it may be slower than the naive comparison at the first time.

I think we may need some trade-off. For example, something like "if you override any method in a ValueType, CanCompareBits will return false". Because we can easily record whether the user override any method in the hierarchy, we can simply check it in CanCompareBits.

Similarly, we can easily record whether the user used any floating point number field in the hierarchy, but it's hard to check if any floating point number field stores 0.0 or NaN at runtime (Need a DFS or BFS).

@jkotas please let me know if I missed something. Thanks.

@jkotas
Copy link
Member

jkotas commented Aug 5, 2017

If the hierarchy is deep it's very inefficient IMO

Right, checking for the Equals overrides is not cheap and that's why the result should be cached. The good place to cache the result of this check would be MethodTableWriteableData::m_dwFlags

it's hard to check if any floating point number field stores 0.0 or NaN

This is close to impossible to check, not just hard.

@jkotas
Copy link
Member

jkotas commented Aug 5, 2017

I think we may need some trade-off.

Yet another way to fix this issue would be to auto-generate the IL code for the Equals method. It would have good steady-state performance for all cases.

@mazong1123
Copy link
Author

mazong1123 commented Aug 7, 2017

@jkotas I've roughly implemented the algorithm to determine whether a value type has overridden Equals in its type hierachy.

I'm going to optimize the performance but since I'm not very familiar about the type system codebase, it would be great if you can give me a hint:

  1. I'm using ApproxFieldDescIterator to iterate fields of a type. It's inefficient since memory copy happens each time an iterator popped from or pushed into the stack during DFS. I'm wondering is there any other lightweight way to iterate all fields of a type?

  2. During the DFS, I'd like to know if a methodtable has already been checked previously. If yes, I can skip that field. I think hashtable should be the proper data structure to achieve this. However, I didn't find any suitable hashtable implementation in CLR. Can I use std::unordered_set?

@jkotas
Copy link
Member

jkotas commented Aug 7, 2017

I'm using ApproxFieldDescIterator to iterate fields of a type

That's fine. I do not think that the efficiency of ApproxFieldDescIterator is a problem - you can profile it to tell for sure. A similar algorithm is used in number of other places.

Also, value types are expected to be relatively simple and small. Everything will be slow for big complex structs.

During the DFS, I'd like to know if a methodtable has already been checked previously

MethodTableWriteableData::m_dwFlags is a good place to cache this information.

@mazong1123
Copy link
Author

@jkotas Thanks. I'll optimize the code and profile the performance tommorrow.

@mazong1123
Copy link
Author

mazong1123 commented Aug 8, 2017

@jkotas I just realized CanUseFastGetHashCodeHelper has a similar issue if user overrides GetHashCode method. And the 0.0 and NaN issue also exists in CanUseFastGetHashCodeHelper.

I think I should fix it in a separate PR. What's your opinion?

@jkotas
Copy link
Member

jkotas commented Aug 8, 2017

You may want to do the GetHashCode fix first, so that a.Equals(b) => a.GetHashCode() == b.GetHashCode() invariant is maintained at all times.

@jkotas
Copy link
Member

jkotas commented Aug 8, 2017

(Having both fixes in one PR would be fine with me as well.)

@mazong1123
Copy link
Author

mazong1123 commented Aug 8, 2017

Now I found another interesting case when we have a class type inside a value type:

    class Cls
    {
        public byte _a;

        public override bool Equals(object obj)
        {
            return false;
        }

        public override int GetHashCode()
        {
            return 0;
        }
    }

    struct MyStruct6
    {
        public Cls _x;
        public Cls _y;
    }

    static void Main(string[] args)
    {
            MyStruct6 i = new MyStruct6();
            MyStruct6 j = new MyStruct6();

            Console.WriteLine(i.Equals(j));
    }

The output is True. CanCompareBits returns false because it contains pointers. That means in a NORMAL compare process, i is considered equal to j. Although it seems like not related to this issue, I'm wondering is it an expected behavior or a bug?

_x and _y would be null so i should equal to j.

@jkotas
Copy link
Member

jkotas commented Aug 16, 2017

tests should be added to corefx

Agree. Thanks!

@jkotas
Copy link
Member

jkotas commented Aug 16, 2017

@mazong1123 Could you please squash the commits and write a description that matches the final change?

@mazong1123
Copy link
Author

@jktoas sure. Will do it later today.

@mazong1123
Copy link
Author

mazong1123 commented Aug 17, 2017

@jkotas I squashed most of the commits. However, some commits came before I merge commits from master branch to resolve a conflict. If I squash those commits, the history of the merged commits would be lost. So I kept them. Please let me know if I need to squash those commits as well.

@jkotas
Copy link
Member

jkotas commented Aug 17, 2017

If I squash those commits, the history of the merged commits would be lost

I do not think that the history for this merge is valuable. It can be all squashed into one commit.

@mazong1123
Copy link
Author

Oops.. I think I need to redo the squash as the merged commits has been lost.

@mazong1123
Copy link
Author

mazong1123 commented Aug 17, 2017

@jkotas I cherry-picked some commits so I'd like to run my local tests this morning (I don't have a dev environment now). Please don't merge it before I finish the tests in case of any regression bug.

@mazong1123
Copy link
Author

@dotnet-bot test Ubuntu x64 corefx_baseline
@dotnet-bot test Windows_NT x86 corefx_baseline

@mazong1123
Copy link
Author

mazong1123 commented Aug 18, 2017

@jkotas the code passed my local tests so the logic should be fine. However, the CI (Windows_NT x86 Checked Build and Test (Jit - CoreFx)) started to complain today. The code change would not cause this failure 2 days ago. Do we have any update on CI server?

08:14:46   SGEN: error SGEN1: Missing required command-line argument: The name of the source assembly..
08:14:46   In Development
08:15:04   
08:15:04 
  Assert failure(PID 4212 [0x00001074], Thread: 1256 [0x04e8]): Assertion failed 'src1->OperIs(GT_LONG)' in 'Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationWriter1:Write64_LongEnum(long):ref:this' (IL size 90)
08:15:04   
08:15:04       File: d:\j\workspace\x86_checked_w---92c0aac3\src\jit\lower.cpp Line: 2167
08:15:04       Image: D:\j\workspace\x86_checked_w---92c0aac3\_\fx\bin\testhost\netcoreapp-Windows_NT-Release-x86\dotnet.exe

@jkotas
Copy link
Member

jkotas commented Aug 18, 2017

There are two problems here:

@mazong1123
Copy link
Author

mazong1123 commented Aug 19, 2017

@jkotas I'll start a PR to test if master branch code has the same JIT asserting issue. If not, it would be introduced by this code change.

@mazong1123
Copy link
Author

mazong1123 commented Aug 19, 2017

@jkotas Now I'm pretty much sure the JIT assertion failure is not caused by my changes here. See the same failure here: https://ci.dot.net/job/dotnet_coreclr/job/master/job/jitstress/job/x86_checked_windows_nt_corefx_baseline_prtest/49/
which is from my test PR: #13480

I've updated the test PR to involve most recently changes. If it still fails, I will fire an issue.

But for this PR I think it should be ready to merge.

@mazong1123
Copy link
Author

@dotnet-bot test OSX10.12 x64 Checked Build and Test

@mazong1123
Copy link
Author

mazong1123 commented Aug 19, 2017

@jkotas I've filed an issue to track the JIT assertion failure: https://github.com/dotnet/coreclr/issues/13486

@mazong1123
Copy link
Author

@dotnet-bot test Windows_NT x86 corefx_baseline
@dotnet-bot test this

BOOL ret = FALSE;

HELPER_METHOD_FRAME_BEGIN_RET_ATTRIB_1(Frame::FRAME_ATTR_EXACT_DEPTH|Frame::FRAME_ATTR_CAPTURE_DEPTH_2, objRef);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra line

src/vm/class.h Outdated
@@ -1398,29 +1398,6 @@ class EEClass // DO NOT CREATE A NEW EEClass USING NEW!
}

public:

inline void SetDoesNotHaveSuppressUnmanagedCodeAccessAttr()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These diffs are not part of your change - they are in master already. Could you please rebase them away?

{
// Check current field type.
MethodTable* fieldMethodTable = pField->GetApproxFieldTypeHandleThrowing().GetMethodTable();
if(!CanCompareBitsOrUseFastGetHashCode(fieldMethodTable))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space after it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find extra space after this line or at tail. Did you mean I need add space after this line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to say add space between if and (.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK got it. I'll update it.

@mazong1123
Copy link
Author

I saw the CI failure has been fixed https://github.com/dotnet/coreclr/issues/13486#issuecomment-324081359

@jkotas Can this PR be merged?

@jkotas
Copy link
Member

jkotas commented Aug 22, 2017

I have just look at this. Yes, it looks good to me. I have just commented on a few last nits.

Other than `ContainsPointers` and `IsNotTightlyPacked`, added two new conditions for checking whether a valuetype can go to fast path or not. Following are the details of these 2 conditions:

- Check whether a valuetype contains a Single/Double field. If it does, we cannot go to fast path. Floating point numbers have special `Equals` and `GetHashCode` implementation. We cannot simply compare floating point numbers via bits (e.g. +0.0 should equal to -0.0, but their underlying bits are different).

- Check whether an user-defined valuetype overrides `Equals` or `GetHashCode`. If it does, we cannot go to fast path. Because `Equals` or `GetHashCode` result may be different to bit comparison.

To find Single/Double fields and overridden `Equals`/`GetHashCode`, we start a DFS to go through the whole hierachy of the source valuetype, and cache the result to `MethodTable`.

Fix #11452
@mazong1123
Copy link
Author

@jkotas I've updated accordingly. Please take a look. Thanks.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jkotas jkotas merged commit 495ece4 into dotnet:master Aug 23, 2017
@mazong1123 mazong1123 deleted the fix-11452 branch August 23, 2017 23:23
@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017
@jkotas
Copy link
Member

jkotas commented Feb 24, 2018

This change fixed ValueType.Equals, but it did not actually fixed ValueType.GetHashCode. Opened: https://github.com/dotnet/coreclr/issues/16545

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
6 participants