Skip to content

[SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive #19663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Nov 6, 2017

What changes were proposed in this pull request?

When I ran self contained sql apps, such as

import org.apache.spark.sql.SparkSession

object ShowHiveTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Show Hive Tables")
      .enableHiveSupport()
      .getOrCreate()
    spark.sql("show tables").show()
    spark.stop()
  }
}

with yarn cluster mode and hive-site.xml correctly within $SPARK_HOME/conf,they failed to connect the right hive metestore for not seeing hive-site.xml in AM/Driver's classpath.

Although submitting them with --files/--jars local/path/to/hive-site.xml or puting it to $HADOOP_CONF_DIR/YARN_CONF_DIR can make these apps works well in cluster mode as client mode, according to the official doc, see @ http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/.

We may respect these configuration files too or modify the doc for hive-tables in cluster mode.

How was this patch tested?

cc @cloud-fan @gatorsmile

@yaooqinn yaooqinn changed the title [SPARK-21888][Hive]add hadoop/hive/hdfs configuration files in SPARK_CONF_DIR to distribute archive [SPARK-21888][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive Nov 6, 2017
@yaooqinn yaooqinn changed the title [SPARK-21888][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive [SPARK-21888][SQL][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive Nov 6, 2017
if (dir.isDirectory) {
val files = dir.listFiles(new FileFilter {
override def accept(pathname: File): Boolean = {
pathname.isFile && pathname.getName.endsWith("xml")
Copy link
Contributor

@jerryshao jerryshao Nov 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we explicitly match the file name, like "hive-site.xml"? Looks like only checking file name ended with "xml" will also include other unwanted files indefinitely.

Copy link
Member Author

@yaooqinn yaooqinn Nov 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the doc,

Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/.

here we may not only get hive-site.xml

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand. My question is that do we need to explicitly check the expected file names, rather than blindly match any xml file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that we do not check the $HADOOP(YARN)_CONF_DIR either

@jerryshao
Copy link
Contributor

Please also add [YARN] tag to the PR title, this is actually a yarn problem.

@yaooqinn yaooqinn changed the title [SPARK-21888][SQL][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive [SPARK-21888][YARN][SQL][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive Nov 6, 2017
@cloud-fan
Copy link
Contributor

ok to test

pathname.isFile && pathname.getName.endsWith("xml")
}
})
files.foreach { f => hadoopConfFiles(f.getName) = f }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indicates files in SPARK_CONF_DIR have higher priority than HADOOP_CONF_DIR or YARN_CONF_DIR, is it expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we follow that order to build classpath. plz check

List<String> buildClassPath(String appClassPath) throws IOException {

@SparkQA
Copy link

SparkQA commented Nov 6, 2017

Test build #83487 has finished for PR 19663 at commit e56f039.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

This is not SPARK-21888. please file a separate jira for this issue. SPARK-21888 is meant to add things to the client classpath on the gateway/launcher box.

@yaooqinn yaooqinn changed the title [SPARK-21888][YARN][SQL][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive Nov 7, 2017
@SparkQA
Copy link

SparkQA commented Nov 7, 2017

Test build #83529 has finished for PR 19663 at commit 6e93d74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -687,6 +687,20 @@ private[spark] class Client(
private def createConfArchive(): File = {
val hadoopConfFiles = new HashMap[String, File]()

// SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR
val localConfDir = System.getProperty("SPARK_CONF_DIR",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPARK_CONF_DIR is set by Spark's launch scripts, so you should just be able to do:

sys.env.get("SPARK_CONF_DIR").foreach { ... }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly till now , plz check #19688

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's being fixed and you can apply my suggestion, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for you advise

if (dir.isDirectory) {
val files = dir.listFiles(new FileFilter {
override def accept(pathname: File): Boolean = {
pathname.isFile && pathname.getName.endsWith("xml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

".xml"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

files.foreach { f => hadoopConfFiles(f.getName) = f }
}

// Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't make a lot of sense, at least not in this position. What are you trying to say?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i'd remove it

@SparkQA
Copy link

SparkQA commented Nov 8, 2017

Test build #83576 has finished for PR 19663 at commit f8c1f63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 9, 2017

Test build #83632 has finished for PR 19663 at commit 61b342c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in c755b0d Nov 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants