[SPARK-11389][CORE] Add support for off-heap memory to MemoryManager #9344
Test build #44556 has finished for PR 9344 at commit
|
Test build #44570 has finished for PR 9344 at commit
|
/cc @davies @andrewor14 for review. This is still WIP pending some tests, but the high-level design is ready for comments. I've posted a PR description to help guide you through the key changes. |
Test build #44633 has finished for PR 9344 at commit
|
One bad complication: until we can completely support off-heap memory for execution, we need to perform separate accounting for on-heap and off-heap memory, so |
Looking a bit more closely, it looks like all existing implementations of |
The current approach of having separate methods named |
Test build #44755 has finished for PR 9344 at commit
|
Test build #44770 has finished for PR 9344 at commit
|
Jenkins, retest this please. |
Test build #44793 has finished for PR 9344 at commit
|
Jenkins, retest this please. |
Test build #44843 has finished for PR 9344 at commit
|
@JoshRosen I've done a round of review; once you address these comments, I think it's good to go.
@davies, I've updated this PR to incorporate your feedback. In the process, I found and fixed a minor bug. |
One more minor thing that I might want to address: adding documentation for the new configuration. I'll do that now. |
Actually, I may want to defer the user-facing configuration to a followup since I still might want to rename it. Will add a small followup task so I don't forget. |
 */
abstract class MemoryPool(lock: Object) {

  @GuardedBy("lcok")
typo
LGTM, pending on tests. |
Test build #45260 has finished for PR 9344 at commit
|
Test build #45262 has finished for PR 9344 at commit
|
Test build #45265 has finished for PR 9344 at commit
|
Test build #2000 has finished for PR 9344 at commit
|
You can ignore the Python failure - since I broke the build. The test should be ok for this pr. |
Alright, going to merge this now. |
In order to lay the groundwork for proper off-heap memory support in SQL / Tungsten, we need to extend our MemoryManager to perform bookkeeping for off-heap memory.

## User-facing changes

This PR introduces a new configuration, `spark.memory.offHeapSize` (name subject to change), which specifies the absolute amount of off-heap memory that Spark and Spark SQL can use. If Tungsten is configured to use off-heap execution memory for allocating data pages, then all data page allocations must fit within this size limit.

## Internals changes

This PR contains a lot of internal refactoring of the MemoryManager. The key change at the heart of this patch is the introduction of a `MemoryPool` class (name subject to change) to manage the bookkeeping for a particular category of memory (storage, on-heap execution, and off-heap execution). These MemoryPools are not fixed-size; they can be dynamically grown and shrunk according to the MemoryManager's policies. In StaticMemoryManager, these pools have fixed sizes, proportional to the legacy `[storage|shuffle].memoryFraction`. In the new UnifiedMemoryManager, the sizes of these pools are dynamically adjusted according to its policies.

There are two subclasses of `MemoryPool`: `StorageMemoryPool` manages storage memory and `ExecutionMemoryPool` manages execution memory. The MemoryManager creates two execution pools, one for on-heap memory and one for off-heap. Instances of `ExecutionMemoryPool` manage the logic for fair sharing of their pooled memory across running tasks (in other words, the ShuffleMemoryManager-like logic has been moved out of MemoryManager and pushed into these ExecutionMemoryPool instances).

I think that this design is substantially easier to understand and reason about than the previous design, where most of these responsibilities were handled by MemoryManager and its subclasses. To see this, take a look at how simple the logic in `UnifiedMemoryManager` has become: it's now very easy to see when memory is dynamically shifted between storage and execution.

## TODOs

- [x] Fix handful of test failures in the MemoryManagerSuites.
- [x] Fix remaining TODO comments in code.
- [ ] Document new configuration.
- [x] Fix commented-out tests / asserts:
  - [x] UnifiedMemoryManagerSuite.
- [x] Write tests that exercise the new off-heap memory management policies.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9344 from JoshRosen/offheap-memory-accounting.

(cherry picked from commit 30b706b)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
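To make the pool design described above more concrete, here is a minimal sketch of how a resizable pool and a per-task execution pool could fit together. It is not the code from this patch: the class names `MemoryPoolSketch` and `ExecutionPoolSketch`, the method bodies, and the simple "at most 1/N of the pool per active task" cap are all assumptions for illustration. This version also never blocks, whereas a real implementation might wait for memory to be freed.

```scala
import scala.collection.mutable

// Hypothetical sketch: bookkeeping for one resizable category of memory.
// All state is guarded by the shared `lock`, as in the MemoryPool snippet under review.
abstract class MemoryPoolSketch(lock: Object) {
  private var _poolSize: Long = 0L

  def poolSize: Long = lock.synchronized { _poolSize }

  // Bytes currently handed out by this pool (defined by subclasses).
  def memoryUsed: Long

  def memoryFree: Long = lock.synchronized { _poolSize - memoryUsed }

  // Grow the pool, e.g. when the manager shifts memory into it.
  def incrementPoolSize(delta: Long): Unit = lock.synchronized {
    require(delta >= 0, "delta must be non-negative")
    _poolSize += delta
  }

  // Shrink the pool; only unused memory may be taken away.
  def decrementPoolSize(delta: Long): Unit = lock.synchronized {
    require(delta >= 0 && delta <= _poolSize, "invalid delta")
    require(_poolSize - delta >= memoryUsed, "cannot shrink below used memory")
    _poolSize -= delta
  }
}

// Hypothetical sketch: per-task accounting with a simple 1/N fairness cap.
class ExecutionPoolSketch(lock: Object) extends MemoryPoolSketch(lock) {
  private val memoryForTask = mutable.HashMap.empty[Long, Long]

  override def memoryUsed: Long = lock.synchronized { memoryForTask.values.sum }

  // Returns the number of bytes granted, which may be less than requested.
  def acquireMemory(numBytes: Long, taskAttemptId: Long): Long = lock.synchronized {
    require(numBytes > 0, "numBytes must be positive")
    val current = memoryForTask.getOrElseUpdate(taskAttemptId, 0L)
    val numActiveTasks = memoryForTask.size
    val maxPerTask = poolSize / numActiveTasks
    val headroom = math.min(maxPerTask - current, memoryFree)
    val toGrant = math.max(0L, math.min(numBytes, headroom))
    memoryForTask(taskAttemptId) = current + toGrant
    toGrant
  }

  // Returns the number of bytes that the task was holding.
  def releaseAllMemoryForTask(taskAttemptId: Long): Long = lock.synchronized {
    memoryForTask.remove(taskAttemptId).getOrElse(0L)
  }
}
```

Under these assumptions, a manager would create one `ExecutionPoolSketch` for on-heap memory and another for off-heap memory, both sharing the same lock object, and move capacity between pools with `incrementPoolSize` and `decrementPoolSize`.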
   *
   * @return the number of bytes granted to the task.
   */
  def acquireMemory(numBytes: Long, taskAttemptId: Long): Long = lock.synchronized {
Have you considered using ReadWriteLock (for lock) to improve performance?
This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.

- Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
- Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
- Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion.
- Document these configurations on the configuration page.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10237 from JoshRosen/SPARK-12251.

(cherry picked from commit 23a9e62)
Signed-off-by: Andrew Or <andrew@databricks.com>
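The validation rule described in that follow-up can be sketched roughly as below. This is only an illustration, not Spark's actual check: the `OffHeapConfigCheck` object is hypothetical, it reads settings from a plain `Map` instead of a `SparkConf`, and it assumes the size is given as a plain byte count.

```scala
// Hypothetical helper, not part of Spark: rejects the combination
// "off-heap enabled" + "off-heap size of zero" described in the follow-up patch.
object OffHeapConfigCheck {
  def validate(conf: Map[String, String]): Unit = {
    // Assumption: both settings default to "disabled" / zero when absent.
    val enabled = conf.getOrElse("spark.memory.offHeap.enabled", "false").toBoolean
    // Assumption: the size is a plain number of bytes (no "1g"-style suffixes).
    val sizeBytes = conf.getOrElse("spark.memory.offHeap.size", "0").toLong
    require(
      !enabled || sizeBytes > 0,
      "spark.memory.offHeap.size must be > 0 when spark.memory.offHeap.enabled == true")
  }
}
```

Failing fast at configuration time, as in this sketch, surfaces the misconfiguration as a clear error message instead of an out-of-memory failure on the first off-heap allocation.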