[PR:#45] Avoid collecting training data to the driver and broadcasting it #53

Open
allwefantasy opened this issue Sep 29, 2017 · 0 comments


allwefantasy commented Sep 29, 2017

I checked the latest PR in spark-deep-learning, KerasImageFileEstimator, and while reviewing the code I found that it collects all training data to the driver and then broadcasts it to the executors. This means the entire training set must fit in a single server's memory, which will definitely not work in the real world, especially since deep learning is a data-hungry family of ML algorithms.

Maybe we can write the training data to a distributed message queue, e.g. Kafka, then have a TF queue receive data from Kafka and consume from that queue when the TF session starts. Something like:

class KerasImageFileEstimator(Estimator, HasInputCol, HasInputImageNodeName,
                              HasOutputCol, HasOutputNodeName, HasLabelCol,
                              HasKerasModel, HasKerasOptimizer, HasKerasLoss,
                              CanLoadImage, HasOutputMode):
    # proposed new params (names illustrative):
    distributedModel = Param(Params._dummy(), "distributedModel", "e.g. ParamsParallel")
    kafkaServer = Param(Params._dummy(), "kafkaServer", "e.g. 127.0.0.1:9092")
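
To make the data path concrete, here is a minimal sketch of the producer/consumer flow, assuming the kafka-python client. The broker address, topic name, helper names, and JSON serialization are all illustrative, and the Kafka-to-TF-queue bridge is shown as a plain Python generator rather than a real TF queue runner:

# A minimal sketch of the proposal; names and addresses are assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

KAFKA_SERVER = "127.0.0.1:9092"  # hypothetical broker
TOPIC = "training_data"          # hypothetical topic

def send_partition(rows):
    # Runs on each executor: stream one partition straight to Kafka
    # instead of collecting it to the driver.
    producer = KafkaProducer(bootstrap_servers=KAFKA_SERVER)
    for row in rows:
        producer.send(TOPIC, json.dumps(row.asDict()).encode("utf-8"))
    producer.flush()

# Producer side, where df is the training DataFrame:
#   df.rdd.foreachPartition(send_partition)

def training_batches(batch_size=32):
    # Runs next to the TF session: pull records from Kafka and yield
    # batches, so no full copy of the data ever sits in one process.
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=KAFKA_SERVER)
    batch = []
    for message in consumer:
        batch.append(json.loads(message.value.decode("utf-8")))
        if len(batch) == batch_size:
            yield batch
            batch = []

# Consumer side: feed batches to the Keras model, or push them into a
# tf.FIFOQueue before the session starts.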

We could also put the data in HDFS as an option, but a message queue seems to be the better choice.
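
For the HDFS route, the write side is already parallel (path is illustrative):

# Executors write the training data out in parallel; nothing is
# collected to the driver or broadcast.
df.write.parquet("hdfs://namenode:8020/training_data")
# Each TF worker then reads its own shard back from HDFS.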
