[SPARK-22463][YARN][SQL][Hive] Add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive #19663
Conversation
if (dir.isDirectory) {
  val files = dir.listFiles(new FileFilter {
    override def accept(pathname: File): Boolean = {
      pathname.isFile && pathname.getName.endsWith("xml")
Shall we explicitly match the file names, like "hive-site.xml"? Only checking that the file name ends with "xml" may also pull in unwanted files unintentionally.
According to the doc, "Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/." So here we may pick up more than just hive-site.xml.
Yes, I understand. My question is whether we need to explicitly check for the expected file names, rather than blindly matching any xml file.
I guess that we do not check the $HADOOP(YARN)_CONF_DIR either.
Please also add the [YARN] tag to the PR title; this is actually a YARN problem.
ok to test
      pathname.isFile && pathname.getName.endsWith("xml")
    }
  })
  files.foreach { f => hadoopConfFiles(f.getName) = f }
This indicates that files in SPARK_CONF_DIR have higher priority than those in HADOOP_CONF_DIR or YARN_CONF_DIR; is that expected?
Yes, we follow that order to build the classpath. Please check buildClassPath in spark/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java (line 134 at commit 12ab7f7):
List<String> buildClassPath(String appClassPath) throws IOException {
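Roughly, the precedence that buildClassPath establishes can be sketched in Scala like this (a simplified illustration of the ordering only, not the actual Java implementation):

```scala
import java.io.File

// Simplified: entries from SPARK_CONF_DIR come before HADOOP_CONF_DIR and
// YARN_CONF_DIR, so a file in SPARK_CONF_DIR shadows a same-named file in
// the Hadoop/YARN conf dirs when resources are resolved.
val classpathEntries: Seq[String] =
  Seq("SPARK_CONF_DIR", "HADOOP_CONF_DIR", "YARN_CONF_DIR").flatMap(sys.env.get)

val classpath: String = classpathEntries.mkString(File.pathSeparator)
```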
Test build #83487 has finished for PR 19663 at commit
This is not SPARK-21888. Please file a separate JIRA for this issue. SPARK-21888 is meant to add things to the client classpath on the gateway/launcher box.
Test build #83529 has finished for PR 19663 at commit
@@ -687,6 +687,20 @@ private[spark] class Client(
  private def createConfArchive(): File = {
    val hadoopConfFiles = new HashMap[String, File]()

    // SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR
    val localConfDir = System.getProperty("SPARK_CONF_DIR",
SPARK_CONF_DIR is set by Spark's launch scripts, so you should just be able to do:
sys.env.get("SPARK_CONF_DIR").foreach { ... }
Not exactly, at least not yet; please check #19688.
So it's being fixed and you can apply my suggestion, no?
ok, thanks for your advice
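For concreteness, the accepted suggestion might expand to something like the following sketch; the body of the foreach is my assumption, modeled on the diff quoted in this PR rather than the final merged code:

```scala
import java.io.{File, FileFilter}
import scala.collection.mutable.HashMap

val hadoopConfFiles = new HashMap[String, File]()  // as in the diff above

// Use the environment variable set by Spark's launch scripts instead of a
// system property; if it is unset, simply skip the local conf dir.
sys.env.get("SPARK_CONF_DIR").foreach { localConfDir =>
  val dir = new File(localConfDir)
  if (dir.isDirectory) {
    val files = dir.listFiles(new FileFilter {
      override def accept(pathname: File): Boolean =
        pathname.isFile && pathname.getName.endsWith(".xml")
    })
    files.foreach { f => hadoopConfFiles(f.getName) = f }
  }
}
```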
if (dir.isDirectory) {
  val files = dir.listFiles(new FileFilter {
    override def accept(pathname: File): Boolean = {
      pathname.isFile && pathname.getName.endsWith("xml")
".xml"
ok
  files.foreach { f => hadoopConfFiles(f.getName) = f }
}

// Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files
This comment doesn't make a lot of sense, at least not in this position. What are you trying to say?
ok, I'll remove it
Test build #83576 has finished for PR 19663 at commit
Test build #83632 has finished for PR 19663 at commit
thanks, merging to master!
What changes were proposed in this pull request?
When I ran self-contained SQL apps (of the kind sketched below) in yarn cluster mode, with hive-site.xml placed correctly within $SPARK_HOME/conf, they failed to connect to the right Hive metastore because hive-site.xml was not on the AM/Driver's classpath. Submitting them with --files/--jars local/path/to/hive-site.xml, or putting the file into $HADOOP_CONF_DIR/YARN_CONF_DIR, makes these apps work as well in cluster mode as they do in client mode. According to the official doc (see http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables), users are told to place these configuration files in conf/, so we should either respect them in cluster mode too or modify the hive-tables doc accordingly.
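A minimal self-contained SQL app of the kind described above might look like the following sketch; it is illustrative only, since the original example was elided from this description:

```scala
import org.apache.spark.sql.SparkSession

object SelfContainedHiveApp {
  def main(args: Array[String]): Unit = {
    // Needs hive-site.xml on the driver's classpath to locate the right
    // metastore; in yarn cluster mode that is exactly what was missing.
    val spark = SparkSession.builder()
      .appName("SelfContainedHiveApp")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW TABLES").show()
    spark.stop()
  }
}
```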
How was this patch tested?
cc @cloud-fan @gatorsmile