Accept underscores in unicode escapes #43716

Merged
merged 1 commit into from Sep 12, 2017

Conversation

MaloJaffre
Contributor

Fixes #43692.

I don't know if this needs an RFC, but at least the impl is here!

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

///
/// At this point, we have already seen the \ and the u, the { is the current character. We
/// will read at least one digit, and up to 6, and pass over the }.
/// At this point, we have already seen the `\` and the `u`, the `{` is the current character. We
Member

The CI is unhappy with this line because it is too long 🙂

[00:03:14] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:968: line longer than 100 chars
[00:03:15] some tidy checks failed

@kennytm
Member

kennytm commented Aug 7, 2017

`issue-43692.rs` should be a run-pass test, not a compile-fail test. It should check (with `assert_eq!`) that `'\u{10__FFFF}'` and `"\u{10_F0FF}foo\u{1_0_0_0}"` are equal to their no-underscore counterparts.
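A minimal sketch of what such a check could look like, assuming a compiler that already accepts underscores in unicode escapes. The function name is hypothetical and is not taken from the PR's actual test file; a real run-pass test would more likely put the `assert_eq!` calls directly in `main`.

```rust
/// Hypothetical helper: returns true when the underscore-separated escapes
/// denote the same values as their plain counterparts.
pub fn underscore_escapes_match_plain() -> bool {
    // A single char literal with a doubled underscore inside the escape.
    let chars_match = '\u{10__FFFF}' == '\u{10FFFF}';
    // A string literal mixing two escapes with ordinary text.
    let strings_match = "\u{10_F0FF}foo\u{1_0_0_0}" == "\u{10F0FF}foo\u{1000}";
    chars_match && strings_match
}
```

Calling `underscore_escapes_match_plain()` from `main` and asserting on the result would give the run-pass behaviour described above.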

@arielb1
Contributor

arielb1 commented Aug 8, 2017

r? @petrochenkov

@arielb1 arielb1 assigned petrochenkov and unassigned nikomatsakis Aug 8, 2017
@arielb1 arielb1 added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 8, 2017
@petrochenkov
Contributor

petrochenkov commented Aug 9, 2017

@SimonSapin what do you think about this?

Also ping @rust-lang/lang
This probably doesn't need a full RFC, but will certainly need an FCP.

@SimonSapin
Contributor

Making numeric escape sequences consistent with integer literals makes sense to me. 👍

@joshtriplett
Member

Please don't allow prefixes or suffixes, but otherwise this seems like a great idea.

@petrochenkov
Contributor

@MaloJaffre
Could you somehow share the lexing code between unicode escapes and normal hexadecimal literals to ensure the rules are identical?
For example, `scan_digits` can be reused for unicode escapes.
Unicode escapes are more restrictive, but the restrictions could be enforced after a unicode escape is lexed (this can also give better error reporting and recovery).
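The "lex first, restrict afterwards" idea can be sketched roughly as follows. This is a standalone illustration, not the lexer code from the PR: both function names, the `&str` input, and the error strings are assumptions made for the example.

```rust
/// Lexing step: collect the hex digits of an escape body, skipping `_`.
/// Also returns the first character that is neither, if there is one.
pub fn scan_escape_digits(body: &str) -> (String, Option<char>) {
    let mut digits = String::new();
    let mut first_invalid = None;
    for c in body.chars() {
        if c.is_ascii_hexdigit() {
            digits.push(c);
        } else if c != '_' && first_invalid.is_none() {
            first_invalid = Some(c);
        }
    }
    (digits, first_invalid)
}

/// Validation step: enforce the escape-specific restrictions afterwards.
/// `body` is the text between `{` and `}`.
pub fn validate_escape_body(body: &str) -> Result<char, String> {
    if body.starts_with('_') {
        return Err("leading underscore in unicode escape".to_string());
    }
    let (digits, first_invalid) = scan_escape_digits(body);
    if let Some(c) = first_invalid {
        return Err(format!("invalid character in unicode escape: {}", c));
    }
    if digits.is_empty() {
        return Err("empty unicode escape".to_string());
    }
    if digits.len() > 6 {
        return Err("overlong unicode escape".to_string());
    }
    // At most six hex digits always fit in a u32.
    let value = u32::from_str_radix(&digits, 16).map_err(|e| e.to_string())?;
    // Rejects surrogates and values above 10FFFF.
    std::char::from_u32(value)
        .ok_or_else(|| format!("invalid unicode character escape: {}", digits))
}
```

Because the whole body is scanned before any restriction is checked, each failure can be described precisely instead of aborting at the first unexpected character.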

@MaloJaffre
Contributor Author

MaloJaffre commented Aug 13, 2017

Thanks for the suggestion @petrochenkov.
I've also rebased on master.

Edit: Travis failure looks spurious (workers failed to start)

loop {
    match self.ch {
        Some('}') => {
            if valid && count == 0 {
Contributor

`if count == 0` would give the same result

Contributor Author

No: in the case of `\u{#}`, we don't want to report that the escape is empty, so we check that there were no invalid characters before the `}`.

Contributor

Ah, right, this is in a loop, okay then.
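To make the point of this thread concrete, here is a simplified stand-in for the loop, assuming the input is the text after `{` up to and including `}`. The function name and the error strings are illustrative and not the lexer's real API.

```rust
/// Hypothetical stand-in for the lexer loop: returns the errors reported
/// while scanning an escape body that ends with `}`.
pub fn check_escape(input: &str) -> Vec<&'static str> {
    let mut errors = Vec::new();
    let mut valid = true; // no invalid character seen so far
    let mut count = 0; // number of hex digits seen so far
    for c in input.chars() {
        match c {
            '}' => {
                // Only call the escape "empty" if nothing invalid came first.
                if valid && count == 0 {
                    errors.push("empty unicode escape");
                }
                return errors;
            }
            '_' => {}
            c if c.is_ascii_hexdigit() => count += 1,
            _ => {
                errors.push("invalid character in unicode escape");
                valid = false;
            }
        }
    }
    errors.push("unterminated unicode escape");
    errors
}
```

With a bare `count == 0` check, the input `#}` would produce both the invalid-character error and the empty-escape error; the `valid` flag keeps it to the first one.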

self.err_span_char(start_bpos,
                   self.pos,
                   "invalid character in unicode escape",
                   c);
Contributor

This error can now be reported a lot of times in case of unterminated unicode escapes.
It probably should be reported only the first time.
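A small sketch of the difference, assuming a simplified scanner that only counts how many times the invalid-character error would be emitted. The function name and the `report_once` parameter are hypothetical.

```rust
/// Hypothetical counter: how many "invalid character" errors a scan would
/// emit for `input`, with or without a report-once guard.
pub fn invalid_char_reports(input: &str, report_once: bool) -> usize {
    let mut reports = 0;
    let mut already_reported = false;
    for c in input.chars() {
        if c == '}' {
            break; // the escape is terminated, stop scanning
        }
        if c.is_ascii_hexdigit() || c == '_' {
            continue;
        }
        // Without the guard, every invalid character produces a new error.
        if !(report_once && already_reported) {
            reports += 1;
        }
        already_reported = true;
    }
    reports
}
```

For an unterminated body such as `ghij`, the unguarded scan reports four errors, while the guarded one reports a single error for the first offending character.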

diag.struct_span_err(span, "invalid unicode character escape")
    .help("unicode escape must be at most 10FFFF")
    .emit();
None
Contributor

I think you can avoid changing the return type to `Option` here and just return something like the replacement character U+FFFD.
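One way this suggestion could look, as a sketch only: the function name, the `u32` input, and the `Vec<String>` error sink are assumptions standing in for the lexer's diagnostics machinery.

```rust
/// Hypothetical conversion: records an error for an invalid code point but
/// still returns a `char`, so callers do not need to handle an `Option`.
pub fn char_from_escape_value(value: u32, errors: &mut Vec<String>) -> char {
    match std::char::from_u32(value) {
        Some(c) => c,
        None => {
            errors.push(format!("invalid unicode character escape: {:X}", value));
            // U+FFFD stands in for the invalid value so lexing can continue.
            std::char::REPLACEMENT_CHARACTER
        }
    }
}
```

The error is still reported, but the caller keeps receiving a plain `char`, which avoids threading `Option` through the rest of the code.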

Contributor

@MaloJaffre
Could you also squash commits after updating the PR?

Contributor Author

Thanks for the review @petrochenkov!

Ok, I will shortly do another round of changes and squash everything.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 17, 2017
@petrochenkov
Contributor

Implementation LGTM, modulo comments.

@rfcbot fcp merge

@petrochenkov
Contributor

I don't have permission to use @rfcbot; could someone start an FCP?

@MaloJaffre
Contributor Author

MaloJaffre commented Aug 17, 2017

@petrochenkov Done.
I've also added a more precise help message for surrogates.

Edit: Travis failure looks spurious (OSX jobs failed to start).

@petrochenkov petrochenkov removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Aug 20, 2017
@MaloJaffre
Contributor Author

MaloJaffre commented Aug 25, 2017

Friendly ping @nikomatsakis to start an FCP, if there are no concerns about the implementation.

@petrochenkov
Contributor

petrochenkov commented Aug 25, 2017

@MaloJaffre
There's one more thing that I forgot about: this needs a feature gate (unless the lang team decides it doesn't), `#![feature(unicode_escape_underscores)]` or something.
See 50ecee2 for an example of how to add it.
"Parse session" is available from the lexer, so it shouldn't be a problem (I think).

@SimonSapin
Contributor

SimonSapin commented Aug 25, 2017

I think this is fine without a feature gate. (Though I’m not in any team that would make that decision.)

@aturon
Member

aturon commented Aug 25, 2017

@rfcbot fcp merge

@rfcbot

rfcbot commented Aug 25, 2017

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Aug 25, 2017
self.bump();
count += 1;
if let Some('_') = self.ch {
    // disallow leading `_`
Member

Do we need a compile-fail test checking that a leading `_` is disallowed?

Contributor Author

There is already a parse-fail test that checks that. Do I need to move it to compile-fail?

@rfcbot

rfcbot commented Sep 1, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Sep 1, 2017
@rfcbot

rfcbot commented Sep 11, 2017

The final comment period is now complete.

@petrochenkov
Contributor

The final comment period is now complete.

@bors r+

@bors
Contributor

bors commented Sep 11, 2017

📌 Commit d4e0e52 has been approved by petrochenkov

@bors
Contributor

bors commented Sep 12, 2017

⌛ Testing commit d4e0e52 with merge 11f64d8...

bors added a commit that referenced this pull request Sep 12, 2017
Accept underscores in unicode escapes

Fixes #43692.

I don't know if this needs an RFC, but at least the impl is here!
@bors
Contributor

bors commented Sep 12, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: petrochenkov
Pushing 11f64d8 to master...

@bors bors merged commit d4e0e52 into rust-lang:master Sep 12, 2017
@MaloJaffre MaloJaffre deleted the _-in-literals branch September 12, 2017 05:08
@chris-morgan
Member

chris-morgan commented Sep 20, 2017

It is worth noting that most syntax highlighters will need updating to support this. (I just did Vim.)

We need something like a mailing list for syntax highlighters where syntax changes can be announced.

Regular expression highlighters will now need something like `\\u\{(?:\x_*){1,6}\}`.
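For highlighters that do not use regular expressions, the same rule can be written as a hand-rolled matcher. This is a sketch using only the standard library, assuming `\x` in the pattern above means a hex digit (as it does in Vim patterns); the function name is hypothetical.

```rust
/// Hypothetical matcher for the pattern above: `\u{`, then one to six hex
/// digits, each optionally followed by underscores, then `}`.
pub fn is_unicode_escape(s: &str) -> bool {
    let body = match s.strip_prefix("\\u{").and_then(|rest| rest.strip_suffix('}')) {
        Some(body) => body,
        None => return false,
    };
    // Underscores may only follow a digit, so the body must start with one.
    let starts_with_digit = body
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_hexdigit());
    let only_allowed = body.chars().all(|c| c.is_ascii_hexdigit() || c == '_');
    let digit_count = body.chars().filter(|c| c.is_ascii_hexdigit()).count();
    starts_with_digit && only_allowed && digit_count <= 6
}
```

Like the pattern, this accepts `\u{10__FFFF}` and rejects a leading underscore such as `\u{_10}`. It does not check that the value is a valid code point, which a highlighter usually does not need.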

chris-morgan added a commit to rust-lang/rust.vim that referenced this pull request Sep 20, 2017
gnomesysadmins pushed a commit to GNOME/gtksourceview that referenced this pull request Jun 13, 2018
Labels
final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-lang Relevant to the language team, which will review and decide on the PR/issue.