[SPARK-22463][YARN][SQL][Hive] Add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive #19663
Conversation
if (dir.isDirectory) {
  val files = dir.listFiles(new FileFilter {
    override def accept(pathname: File): Boolean = {
      pathname.isFile && pathname.getName.endsWith("xml")
Shall we explicitly match the file names, like "hive-site.xml"? Only checking that the file name ends with "xml" may also pull in unwanted files unintentionally.
According to the doc, "Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/." So here we may pick up more than just hive-site.xml.
Yes, I understand. My question is whether we need to explicitly check for the expected file names, rather than blindly matching any xml file.
I guess that we do not check the $HADOOP(YARN)_CONF_DIR either.
Please also add the [YARN] tag to the PR title; this is actually a YARN problem.
ok to test
      pathname.isFile && pathname.getName.endsWith("xml")
    }
  })
  files.foreach { f => hadoopConfFiles(f.getName) = f }
This indicates that files in SPARK_CONF_DIR have higher priority than those in HADOOP_CONF_DIR or YARN_CONF_DIR; is that expected?
Yes, we follow that order to build the classpath. Please check buildClassPath in spark/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java (line 134 at commit 12ab7f7):
List<String> buildClassPath(String appClassPath) throws IOException {
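Roughly, the precedence that buildClassPath establishes can be sketched in Scala like this (a simplified illustration of the ordering only, not the actual Java implementation):

```scala
import java.io.File

// Simplified: entries from SPARK_CONF_DIR come before HADOOP_CONF_DIR and
// YARN_CONF_DIR, so a file in SPARK_CONF_DIR shadows a same-named file in
// the Hadoop/YARN conf dirs when resources are resolved.
val classpathEntries: Seq[String] =
  Seq("SPARK_CONF_DIR", "HADOOP_CONF_DIR", "YARN_CONF_DIR").flatMap(sys.env.get)

val classpath: String = classpathEntries.mkString(File.pathSeparator)
```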
Test build #83487 has finished for PR 19663 at commit
This is not SPARK-21888. Please file a separate JIRA for this issue. SPARK-21888 is meant to add things to the client classpath on the gateway/launcher box.
Test build #83529 has finished for PR 19663 at commit
@@ -687,6 +687,20 @@ private[spark] class Client(
  private def createConfArchive(): File = {
    val hadoopConfFiles = new HashMap[String, File]()

    // SPARK_CONF_DIR shows up in the classpath before HADOOP_CONF_DIR/YARN_CONF_DIR
    val localConfDir = System.getProperty("SPARK_CONF_DIR",
SPARK_CONF_DIR is set by Spark's launch scripts, so you should just be able to do:
sys.env.get("SPARK_CONF_DIR").foreach { ... }
Not exactly, at least not yet; please check #19688.
So it's being fixed and you can apply my suggestion, no?
ok, thanks for your advice
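For concreteness, the accepted suggestion might expand to something like the following sketch; the body of the foreach is my assumption, modeled on the diff quoted in this PR rather than the final merged code:

```scala
import java.io.{File, FileFilter}
import scala.collection.mutable.HashMap

val hadoopConfFiles = new HashMap[String, File]()  // as in the diff above

// Use the environment variable set by Spark's launch scripts instead of a
// system property; if it is unset, simply skip the local conf dir.
sys.env.get("SPARK_CONF_DIR").foreach { localConfDir =>
  val dir = new File(localConfDir)
  if (dir.isDirectory) {
    val files = dir.listFiles(new FileFilter {
      override def accept(pathname: File): Boolean =
        pathname.isFile && pathname.getName.endsWith(".xml")
    })
    files.foreach { f => hadoopConfFiles(f.getName) = f }
  }
}
```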
if (dir.isDirectory) {
  val files = dir.listFiles(new FileFilter {
    override def accept(pathname: File): Boolean = {
      pathname.isFile && pathname.getName.endsWith("xml")
".xml"
ok
  files.foreach { f => hadoopConfFiles(f.getName) = f }
}

// Ensure HADOOP_CONF_DIR/YARN_CONF_DIR not overriding existing files
This comment doesn't make a lot of sense, at least not in this position. What are you trying to say?
ok, I'll remove it
Test build #83576 has finished for PR 19663 at commit
Test build #83632 has finished for PR 19663 at commit
thanks, merging to master!
What changes were proposed in this pull request?
When I ran self-contained SQL apps (of the kind sketched below) in yarn cluster mode, with hive-site.xml placed correctly within $SPARK_HOME/conf, they failed to connect to the right Hive metastore because hive-site.xml was not on the AM/Driver's classpath. Submitting them with --files/--jars local/path/to/hive-site.xml, or putting the file into $HADOOP_CONF_DIR/YARN_CONF_DIR, makes these apps work as well in cluster mode as they do in client mode. According to the official doc (see http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables), users are told to place these configuration files in conf/, so we should either respect them in cluster mode too or modify the hive-tables doc accordingly.
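A minimal self-contained SQL app of the kind described above might look like the following sketch; it is illustrative only, since the original example was elided from this description:

```scala
import org.apache.spark.sql.SparkSession

object SelfContainedHiveApp {
  def main(args: Array[String]): Unit = {
    // Needs hive-site.xml on the driver's classpath to locate the right
    // metastore; in yarn cluster mode that is exactly what was missing.
    val spark = SparkSession.builder()
      .appName("SelfContainedHiveApp")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW TABLES").show()
    spark.stop()
  }
}
```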
How was this patch tested?
cc @cloud-fan @gatorsmile