I checked the latest PR in spark-deep-learning, KerasImageFileEstimator. Reviewing the code, I found that it collects all training data to the driver and then broadcasts it to the executors. This means all training data must fit in the memory of a single server, which will not work in the real world, especially since deep learning is a data-hungry ML algorithm.
Maybe we can write the training data to a distributed message queue, e.g. Kafka, then use a TF queue to receive the data from Kafka and consume it from the TF queue once the TF session starts.
We could also put the data in HDFS as an option, but a message queue seems to be a perfect choice.
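To make the queue-based idea concrete, here is a minimal sketch of the pattern using only the Python standard library. It is not the spark-deep-learning code, and it uses neither the Kafka client nor TensorFlow queues: `produce` is a hypothetical stand-in for a Kafka consumer, and `batches` is a hypothetical stand-in for dequeuing inside a TF session. The point it illustrates is that a bounded queue lets the trainer see mini-batches while holding only `capacity` records in memory, instead of collecting the whole dataset first.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def produce(records, q):
    """Hypothetical stand-in for a Kafka consumer feeding a bounded queue."""
    for record in records:
        q.put(record)  # blocks while the queue is full, which caps memory use
    q.put(_SENTINEL)


def batches(q, batch_size):
    """Yield mini-batches drained from the queue until the sentinel arrives."""
    batch = []
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def stream_batches(records, batch_size=4, capacity=8):
    """Stream `records` as mini-batches through a queue of size `capacity`."""
    q = queue.Queue(maxsize=capacity)
    worker = threading.Thread(target=produce, args=(records, q), daemon=True)
    worker.start()
    for batch in batches(q, batch_size):
        yield batch
    worker.join()
```

A training loop would then iterate with something like `for batch in stream_batches(source, batch_size=32): ...`, where `source` can be any iterable, including a generator that never materializes the full dataset.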
allwefantasy changed the title from "[PR:#45] To Avoid collect tranning data to driver and broadcast them" to "[PR:#45] To Avoid collect trainning data to driver and broadcast them" on Sep 29, 2017
allwefantasy changed the title from "[PR:#45] To Avoid collect trainning data to driver and broadcast them" to "[PR:#45] To Avoid collecting trainning data to driver and broadcasting them" on Sep 29, 2017