Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router.add_resource(), regexps and curly braces #1778

Closed
socketpair opened this issue Apr 1, 2017 · 16 comments
Closed

router.add_resource(), regexps and curly braces #1778

socketpair opened this issue Apr 1, 2017 · 16 comments

Comments

@socketpair
Copy link
Contributor

socketpair commented Apr 1, 2017

Well. We stated in the docs:

image

What if regexp contains curly braces ?

Is it correct (now, and in future) to write regexps like:

resource = app.router.add_resource(r'/{name:\}+}')
resource = app.router.add_resource(r'/{name:[}]+}')

Technically speaking, they are correct regexps, but will aiohttp's resource format parser eat them?

In other words, it is not said which symbols (and how) must be escaped (from parser) in these regexps.

@justanr
Copy link

justanr commented Apr 2, 2017

I've been able to use curly brackets in regexes, though this should be clarified in the documentation

@fafhrd91
Copy link
Member

fafhrd91 commented Apr 6, 2017

could anyone create PR?

@socketpair
Copy link
Contributor Author

Ans also, can anyone explain how route parsing is done ? regexps from sources are very complex. I can't prove that they don't have bugs.

@fafhrd91
Copy link
Member

fafhrd91 commented Apr 8, 2017

I don't really know, @asvetlov wrote this part.

@fafhrd91
Copy link
Member

@asvetlov could you review this?

@justanr
Copy link

justanr commented Apr 15, 2017

Doing some research, it looks like the two examples given in the original report produce invalid route errors -- e.g. the default aiohttp router refuses to build the route map. However, a route like {id:\d{4}} builds just fine.

Edit: Looks like the the ROUTE_RE pattern on the UrlDispatcher is what needs examining: r'(\{[_a-zA-Z][^{}]*(?:\{[^{}]*\}[^{}]*)*\})')

Any unmatched { or } in a routing regex will cause the build error -- the error message is a little cryptic though: ValueError: Invalid path '/'['+}']

@asvetlov
Copy link
Member

I'm inclining to forbidding curved brackets inside embedded regexps.

@justanr
Copy link

justanr commented Apr 15, 2017

I think that's a move too far, though I could be bias since I have date based routes that use the {m,n} matching. I could move the actual matching groups inside the handler, but if the router can ward off unnecessary matches for me I'd rather use that.

@socketpair
Copy link
Contributor Author

Maybe, we should invent some escaping? if so, any regexp can be used.

@socketpair
Copy link
Contributor Author

socketpair commented Apr 16, 2017

For example, in sed, regexp borders may be specified by ANY symbol. for example, /regexp/ or |regexp| or even @regexp@. Additionally, if same (as border one) symbol should be used in regexp, it should be escaped using backslash, i.e. @some\@regexp@ is exactly the same as |some@regexp|

@fafhrd91
Copy link
Member

Is there any good routing open source libraries? I would replace our own implementation with some good library

@kxepal
Copy link
Member

kxepal commented Apr 16, 2017

I don't think there is a reason to do that for current router. It has own flaws, but afaik it was never pretended to be very fast, optimal and featured. Just the one that works in most of cases and easy to be implemented.

Instead of making it more complicated and slower, may be take a look on possible alternatives?

P.S. As for this issue, it seems to be possible solved by working with path segments instead of whole path and few more stringy work to decouple identifier and regexp matching. Curved brackets there are just a marker to strip and go.

@kxepal
Copy link
Member

kxepal commented Apr 16, 2017

@fafhrd91
Few from my bookmarks:
https://github.com/c9s/r3
https://github.com/nitely/kua

Though, not sure about good.

@fafhrd91
Copy link
Member

I prefer c-library.

@Dith3r
Copy link

Dith3r commented Feb 6, 2019

Aiohttp supports only simple regexps and disallow to implement own router with different matching behavior. One of issue is to match for example UUID (/[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}/) which is not possible now. With introduction of f-strings very complex regex could be presented in very simple manner. For example:

uuid = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"

which could be used:

test_uuid_1 = f"(?P<test_uuid_1>{uuid})"
test_uuid_2 = f"(?P<test_uuid_2>{uuid})"

router.add_route("*", f'/test/{test_uuid_1}/{test_uuid_2}', test_handler)

To recreate reverse url:

def reverse_regex(compiled: Pattern) -> str:
          reversed = []
          indexes_names = {value:key for key, value in compiled.groupindex.items()}
     
          for part in sre_parse.parse(compiled.pattern):
              code, data = part
              if code == sre_constants.SUBPATTERN:
                  index, *_ = data
                  reversed.append(f"{{{indexes_names[index]}}}")
              elif code == sre_constants.LITERAL:
                  reversed.append(chr(data))
          return "".join(reversed)

Example: '/test/(?P<test_uuid_1>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})/(?P<test_uuid_2>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})' reversed to '/test/{test_uuid_1}/{test_uuid_2}'. Inner capture groups are ignored.

This doesn't cover all regex possibilites, but allow to define more complex scenarios.

@asvetlov
Copy link
Member

I prefer keeping simple regexps only

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants