-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-4873][Streaming] Use Future.zip
instead of Future.flatMap
(for-loop) in WriteAheadLogBasedBlockHandler
#3721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #24542 has started for PR 3721 at commit
|
Test build #24543 has started for PR 3721 at commit
|
Test build #24542 has finished for PR 3721 at commit
|
Test PASSed. |
Test build #24543 has finished for PR 3721 at commit
|
Test PASSed. |
The flaky test fix here looks like it addresses https://issues.apache.org/jira/browse/SPARK-4790. /cc @harishreedharan, who is investigating that flaky test. |
This would not address the test issue as we are still returning from cleanupOldLogs before the deletion is completed. This |
The eventually ensures that the block is retried - but the failure is happening before the eventually, so the test would still throw. |
Hmm, looking at it again - this would fix the test as well, though I think the approach in #3726 is cleaner. |
My first trial was making |
Used |
Test build #24566 has started for PR 3721 at commit
|
Test build #24566 has finished for PR 3721 at commit
|
Test PASSed. |
@@ -125,7 +125,7 @@ private[streaming] class WriteAheadLogManager( | |||
* between the node calculating the threshTime (say, driver node), and the local system time | |||
* (say, worker node), the caller has to take account of possible time skew. | |||
*/ | |||
def cleanupOldLogs(threshTime: Long): Unit = { | |||
def cleanupOldLogs(threshTime: Long): Future[Unit] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be exposing this future. That is internal implementation detail and we'd have to stick with this implementation. At some point, we might want to delete the files synchronously - at which point returning this Future might not make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, I prefer to also return Future
for an asynchronous action. Returning Unit
hides the asynchronous feature and such method will be misused easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that deleting asynchronously is an implementation detail - which we choose to do now, which can be changed later, don't you think? If we change it later, we'd have to change this method's signature - which can cause pain if there is code that uses Await.result(*) on this future.
If we expose it via a parameter, we can choose to ignore the param and still the calling code will not have to change. Since it is unlikely that the calling code will actually depend on the async nature, it is unlikely to see any difference in functionality and no change in code is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that deleting asynchronously is an implementation detail - which we choose to do now, which can be changed later, don't you think? If we change it later, we'd have to change this method's signature - which can cause pain if there is code that uses Await.result(*) on this future.
I doubt if it will be changed. However, asynchronously
is an important implementation detail that the caller should know it, or they may misuse it.
If we expose it via a parameter, we can choose to ignore the param and still the calling code will not have to change. Since it is unlikely that the calling code will actually depend on the async nature, it is unlikely to see any difference in functionality and no change in code is required.
I don't think a parameter is enough. At least, it needs a more parameter, a timeout
parameter. In your PR, you used 1 second
which may not be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's more, if we really want to change it to a synchronously deleting. Returning Future does still work. Just simply writing something like:
deleteFiles()
return Promise[Unit]().success(null).future
@@ -169,10 +171,7 @@ private[streaming] class WriteAheadLogBasedBlockHandler( | |||
} | |||
|
|||
// Combine the futures, wait for both to complete, and return the write ahead log segment | |||
val combinedFuture = for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good catch. I wasnt aware that for-yield
is flatMap
which is obviously ordered.
High level comment, I think I completely agree on the Future.zip, but not sure if I find the other changed related to other two valid.
So it would be great if you can reduce the scope of this PR to the |
Future.zip
instead of Future.flatMap
(for-loop) in WriteAheadLogBasedBlockHandler
Now this PR only contains |
Test build #24799 has started for PR 3721 at commit
|
LGTM. Will merge if tests pass. |
Test build #24799 has finished for PR 3721 at commit
|
Test PASSed. |
Merged this, thanks very much! |
…for-loop) in WriteAheadLogBasedBlockHandler Use `Future.zip` instead of `Future.flatMap`(for-loop). `zip` implies these two Futures will run concurrently, while `flatMap` usually means one Future depends on the other one. Author: zsxwing <zsxwing@gmail.com> Closes #3721 from zsxwing/SPARK-4873 and squashes the following commits: 46a2cd9 [zsxwing] Use Future.zip instead of Future.flatMap(for-loop) (cherry picked from commit b4d0db8) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Use
Future.zip
instead ofFuture.flatMap
(for-loop).zip
implies these two Futures will run concurrently, whileflatMap
usually means one Future depends on the other one.