
[PR:#45] Avoid collecting training data to the driver and broadcasting it  #53

Open
@allwefantasy

Description

@allwefantasy

I checked the last PR in spark-deep-learning, KerasImageFileEstimator, and while reviewing the code I found that it collects all training data to the driver and then broadcasts it to the executors. This means the whole training set has to fit in a single server's memory, which will definitely not work in the real world, especially since deep learning is a data-hungry class of ML algorithms.
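
To make the concern concrete, the pattern is roughly the following (a simplified sketch with a toy DataFrame, not the actual KerasImageFileEstimator code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-in for the image/label DataFrame the estimator is fit on.
df = spark.createDataFrame([(bytearray(b"\x00" * 16), 0.0),
                            (bytearray(b"\x01" * 16), 1.0)],
                           ["image", "label"])

# Every row is pulled onto the driver ...
local_rows = df.collect()

# ... and then shipped back to every executor as a broadcast variable,
# so the whole training set has to fit in a single machine's memory.
broadcast_rows = spark.sparkContext.broadcast(local_rows)
print(len(broadcast_rows.value))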

Maybe we can write the training data to a distributed message queue, e.g. Kafka, then use a TF queue to receive the data from Kafka and consume it from the TF queue once the TF session starts.

class KerasImageFileEstimator(Estimator, HasInputCol, HasInputImageNodeName,
                              HasOutputCol, HasOutputNodeName, HasLabelCol,
                              HasKerasModel, HasKerasOptimizer, HasKerasLoss,
                              CanLoadImage, HasOutputMode,
                              # proposed new knobs for distributed training:
                              DistributedModel="ParamsParallel", KafkaServer="127.0.0.1"):

We could also put the data in HDFS as an option, but a message queue seems to be the better fit.
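
Roughly, the consumer side inside each TF worker could look like the sketch below (this assumes kafka-python and the TF 1.x queue API; the topic name, broker address, and record format are placeholders for illustration, not anything that exists in spark-deep-learning today):

import threading

import tensorflow as tf
from kafka import KafkaConsumer

# TF-side staging queue that the training graph reads from.
raw_record = tf.placeholder(tf.string)
queue = tf.FIFOQueue(capacity=1000, dtypes=[tf.string])
enqueue_op = queue.enqueue([raw_record])
next_record = queue.dequeue()  # the input pipeline would parse/decode this

def feed_from_kafka(sess):
    """Pull serialized training records from Kafka and push them into the TF queue."""
    consumer = KafkaConsumer("training-data",  # placeholder topic
                             bootstrap_servers="127.0.0.1:9092")
    for msg in consumer:
        sess.run(enqueue_op, feed_dict={raw_record: msg.value})

with tf.Session() as sess:
    feeder = threading.Thread(target=feed_from_kafka, args=(sess,))
    feeder.daemon = True  # stop feeding when training exits
    feeder.start()
    record = sess.run(next_record)  # blocks until a record arrives
    # ... decode record (image bytes + label) and run the training step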
