Closed
Description
Starting from 2.0, requests does not send the filename attribute of the Content-Disposition header for multipart files with unicode names. Instead, an attribute named filename* is sent.
# requests 1.2.3
>>> requests.post('http://ya.ru', files={'file': (u'файл', '123')}).request.body
'--db7a9522a6344e26a4ca2933aecad887\r\nContent-Disposition: form-data; name="file"; filename="\xd1\x84\xd0\xb0\xd0\xb9\xd0\xbb"\r\nContent-Type: application/octet-stream\r\n\r\n123\r\n--db7a9522a6344e26a4ca2933aecad887--\r\n'
# requests 2.0
>>> requests.post('http://ya.ru', files={'file': (u'файл', '123')}).request.body
'--a9f0de2871da46df86140bc5b72fc722\r\nContent-Disposition: form-data; name="file"; filename*=utf-8\'\'%D1%84%D0%B0%D0%B9%D0%BB\r\n\r\n123\r\n--a9f0de2871da46df86140bc5b72fc722--\r\n'
And this is a big problem, because it looks like some systems do not recognize such fields as files. At least, we encountered a problem with Django: it places the entire file's content in request.POST instead of request.FILES. This is clear from the sources:
https://github.com/django/django/blob/1.7c1/django/http/multipartparser.py#L599-L601
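The difference between the two bodies above can be reproduced with the standard library alone. The following is a minimal sketch, not the code of requests or Django: the helper names (`legacy_disposition`, `rfc5987_disposition`, `naive_params`, `looks_like_file`) are hypothetical, and the parser is a deliberately simplified stand-in for any server that only looks for a plain filename key.

```python
from urllib.parse import quote


def legacy_disposition(field, filename):
    """requests 1.2.3 style: raw UTF-8 bytes inside a quoted filename parameter."""
    raw = filename.encode("utf-8")
    return b'form-data; name="' + field.encode("ascii") + b'"; filename="' + raw + b'"'


def rfc5987_disposition(field, filename):
    """requests 2.0 style: percent-encoded UTF-8 in an extended filename* parameter."""
    encoded = quote(filename.encode("utf-8"), safe="")
    header = 'form-data; name="' + field + "\"; filename*=utf-8''" + encoded
    return header.encode("ascii")


def naive_params(header):
    """Hypothetical simplified parser: splits on ';' and '=' and knows nothing of filename*."""
    params = {}
    for part in header.split(b";")[1:]:
        key, _, value = part.strip().partition(b"=")
        params[key.lower()] = value.strip(b'"')
    return params


def looks_like_file(header):
    # The part is classified as a file only when a plain `filename` key exists.
    return b"filename" in naive_params(header)


print(looks_like_file(legacy_disposition("file", "файл")))
print(looks_like_file(rfc5987_disposition("file", "файл")))
```

Under this assumption, the first header is classified as a file and the second as an ordinary field, which matches the symptom described for request.POST and request.FILES.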
Activity
sigmavirus24 commented on Jul 2, 2014
This is actually a bug in Django then. Using this syntax is what we're supposed to be using. We have to indicate to the server that we're sending a field whose content is not ASCII or Latin-1 (ISO-8859-1). The proper way to do so is the syntax you see there. It is defined in Section 3.2.1 of RFC 5987. This bug should be filed against Django instead for not conforming to the proper handling of that value.
Edit: Note specifically that parameter is redefined as reg-parameter or ext-parameter, where ext-parameter is defined as parmname (e.g., filename) concatenated with a * character followed by = and the ext-value. This confirms that this is the proper handling of those header values. I believe this RFC is also applied to MIME header values, which are what you're using in your multipart/form-data upload.
homm commented on Jul 2, 2014
I tested a PHP 5.5.9 server, and it also does not recognize form-data parts with the filename* attribute as files. I think requests should be closer to the real world than to the RFC, especially when a compromise is possible. How about both attributes?
Lukasa commented on Jul 2, 2014
It's unclear what the compromise should be. What text encoding should we use?
homm commented on Jul 2, 2014
The compromise is to use both attributes. The encoding is not so important. It could even be filename=blah.bin; this would be enough for the server to recognize files.
Lukasa commented on Jul 2, 2014
Filename is hugely important. If we just make random choices, I guarantee that people will simply raise a new bug report claiming that we mangled their filenames.
If you don't like the way we do it, encode the filename yourself. =)
homm commented on Jul 2, 2014
Feel the difference: "I can't even upload a file to the majority of servers" and "the server does not understand the file name".
There are two cases:
filename* property. No problems.
homm commented on Jul 2, 2014
By the way, before 2.0 requests used some encoding, and as far as I know everything worked well.
Lukasa commented on Jul 2, 2014
Resist the temptation to say "Understands unicode". That phrase is meaningless when it comes to networked connections. The server needs to understand the specific text encoding we've chosen, and there is no consistency here. We will get this wrong, and what happens in this situation is totally unclear. Some servers will decode the text and get lucky, and read the filename as something totally obtuse and unclear, and that would be terrible.
I would rather fail clearly and allow the user to make the choice than to guess and get it wrong. This is not PHP.
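The ambiguity described here is easy to demonstrate. The sketch below takes the UTF-8 bytes of the filename from the original report and decodes them under three candidate encodings; the helper name `decode_guesses` and the choice of encodings are assumptions made for illustration only.

```python
# The raw bytes that requests 1.2.3 put into the filename parameter.
raw = "файл".encode("utf-8")


def decode_guesses(data, encodings=("utf-8", "latin-1", "cp1251")):
    """Decode the same bytes under several encodings a server might assume."""
    return {enc: data.decode(enc, errors="replace") for enc in encodings}


guesses = decode_guesses(raw)
for enc, text in guesses.items():
    print(enc, repr(text))
```

Only the UTF-8 guess recovers the original name. The other two decode without raising an error and yield a different string, so a server that assumes the wrong encoding gets a wrong filename with no indication that anything failed.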
sigmavirus24 commented on Jul 2, 2014
How would using both attributes work? If you read the grammar I linked to, it updates the meaning to say that you can only send one, i.e., either filename= or filename*=. If you send both, you're sending the wrong thing.
It worked purely by chance that those characters could be encoded to bytes. We were just sending whatever the bytestring was (if you look at the string, it is what you would get if you took your original string and called encode('<encoding>')). You get the literal bytes and that's what we sent. You could potentially still encode it yourself and then pass that as the filename. You really should only be giving us bytes anyway.
Seeing as this is not a bug, and there is no good reason for us to change how we handle what is given to us by the user, I'm closing this.
homm commented on Jul 2, 2014
This is totally wrong.
sigmavirus24 commented on Jul 2, 2014
Interesting. I had forgotten that we could send both. Regardless, we can't provide an ASCII representation for you because we can't guess at how to generate ASCII from what you give us. You would still have to generate the value yourself because there's no sane way to provide that functionality via an API.
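As a rough sketch of what sending both attributes could look like when the caller supplies the ASCII value, the function below builds a Content-Disposition value by hand. It is not part of the requests API: the name `content_disposition` and its `ascii_fallback` parameter are hypothetical, and the fallback is never derived from the unicode name.

```python
from urllib.parse import quote


def content_disposition(field, filename, ascii_fallback):
    """Build a form-data disposition carrying both filename and filename*.

    `ascii_fallback` must be chosen by the caller; nothing here guesses it.
    """
    # Fail loudly instead of guessing if the fallback is not pure ASCII.
    ascii_fallback.encode("ascii")
    # Escape backslashes and double quotes inside the quoted-string value.
    escaped = ascii_fallback.replace("\\", "\\\\").replace('"', '\\"')
    # Percent-encode the UTF-8 bytes of the real name for the extended parameter.
    encoded = quote(filename, safe="", encoding="utf-8")
    return "form-data; name=\"%s\"; filename=\"%s\"; filename*=UTF-8''%s" % (
        field,
        escaped,
        encoded,
    )


print(content_disposition("file", "файл", "file.bin"))
```

A server that ignores filename* would still see filename="file.bin" and can treat the part as a file, while a server that understands the extended parameter can recover the original name.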
homm commented on Jul 3, 2014
When I try to encode the filename myself, as requests 1.2.3 does, I get an error:
I can't use just an ASCII representation for filename, because there is no way to send both filename with ASCII and an encoded filename*.
Lukasa commented on Jul 3, 2014
There's so much going on in that line of code I don't know where the breakage is. Can you split that up into multiple lines so I can see which action causes that traceback?
homm commented on Jul 3, 2014