-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Right now if someone wants to use sidekiq they must configure the number of threads they want to run on or use the default (25). If they are using a database (likely) then to not hit a connection timeout error they will need to have at least as many connections in their database pool as sidekiq threads. Currently they must configure these two values in two different places, database pool in database.yml
for ActiveRecord, and sidekiq connections either via command line or the sidekiq yml file. This is a problem as it is difficult to remember when you are modifying one value that you need to modify both.
In Rails 5 there is a default environment variable to configure the number of threads that will be in the database connection pool
# config/database.yml
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
This convention is used along side of web server configuration values. For example Rails 5 also ships with puma which is configured to read from this value:
# config/puma.rb
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }.to_i
threads threads_count, threads_count
It is common that the number of sidekiq threads run in a "worker" process would be greater than the number of web threads used by a server like Puma. For this reason it does not make sense to default the sidekiq connections to RAILS_MAX_THREADS
. However it would make sense for sidekiq to use this enviornment variable interface and se the environment variable:
# sidekiq/lib/<somewhere>.rb
ENV["RAILS_MAX_THREADS"] = Sidekiq.number_of_threads #pseudo code
So now when the application boots in a "worker" process via the sidekiq
command and we know that we are using Rails, the appropriate number of database connections will be configured by default.
Potential Problems
This behavior may be slightly unexpected (for Sidekiq to change an environment variable at boot time) however I believe it is perfectly warranted. As a mitigation step we could detect when an enviornment variable was already set
# sidekiq/lib/<somewhere>.rb
puts "Sidekiq is setting RAILS_MAX_THREADS=#{ Sidekiq.number_of_threads }, was previously #{ ENV['RAILS_MAX_THREADS'] }"
ENV["RAILS_MAX_THREADS"] = Sidekiq.number_of_threads #pseudo code
A warning or similar would eliminate any confusion. It would also allow the use of RAILS_MAX_THREADS
for when the web process is booted, and a different configuration option for use when sidekiq is booted.
People may want to opt out of this behavior. They can do this by modifying their database.yml
file to use a different environment variable
# config/database.yml
pool: <%= ENV['SIDEKIQ_CUSTOM_DB_SIZE'] || ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
So now they have the ability to not rely on the behavior of setting RAILS_MAX_THREADS
.
Recap
Right now if you try the default database size of 5 will artificially limit the default sidekiq size of 25. Applying this change means that Rails applications will work with sidekiq out of the box. It also means that as Sidekiq is scaled up or down that the database size will be set appropriately without additional action (if conventions are being followed). I believe the costs to be minimal and the benefits to outweigh them.
Activity
nateberkopec commentedon May 24, 2016
This strikes me as easier to solve with documentation than with code.
or
then
All that would be required would be a change wiki docs, and maybe a nice announcement in that fancy Sidekiq Newsletter ;)
edit: or just set
:concurrency
equal toRAILS_MAX_THREADS
. either way.mperham commentedon May 24, 2016
Thanks for the video conference this morning, @schneems.
I think having Sidekiq use this variable over the config settings is reasonable if it means the db pool will auto-size itself. The logic would be something like:
Procfile would just need to be (some pretty nice symmetry here):
WDYT?
nateberkopec commentedon May 24, 2016
That Procfile does look very nice!
In production, I'll bet a lot of people are running with db pools that are too small for their Sidekiq concurrency anyway. If modifying
RAILS_MAX_THREADS
inside Sidekiq will fix that, then this change makes sense.What about non-AR ORMs? I've never really used them, though, so I don't know how they configure their connection pools.
schneems commentedon May 24, 2016
I'm not sure about the procfile solution, it would require you re-deploy after every change, which is a bad experience.
Maybe something like this in
Procfile
:Then you could change
SIDEKIQ_CONCURRENCY
at will.Ideally I would like this to work out of the box without a custom procfile.
@jeremyevans any thoughts on @nateberkopec's comment? Would be cool if Sequel could Just work (TM) with Rails 5 + Puma like Active Record. Happy to take that conversation to another thread if you want.
jeremyevans commentedon May 24, 2016
Sequel can usually work fine with a significantly smaller connection pool than ActiveRecord due to the fact that it doesn't leak connections. That said, if someone wants to use the number of Sidekiq threads as the maximum number of Sequel database connections, they would just need to set use the
:max_connections=>Sidekiq.number_of_threads
Sequel::Database option.I'm against adding Rails-specific code to non-Rails-specific libraries as a matter of principle, just as I'm against adding Sequel-specific code to non-Sequel-specific libraries. Giving special treatment to Rails gives Rails an unjust advantage and makes it more difficult for alternatives to Rails to compete on merit. On the flipside, adding support for Rails, Sequel, ROM, Mongoid, etc. leads to a slippery slope.
In generic terms, unless library A depends on library B (either directly or via an adapter for library B that ships with library A), I don't think library A should have library B-specific code.
As a matter of good engineering practices, I think libraries should never set environment variables. I think reading library-specific environment variables is fine, but library A should not care about library B-specific environment variables.
sgrif commentedon May 25, 2016
That tends to help, yes. XD
schneems commentedon May 25, 2016
Ideally I agree, pragmatically I disagree. There's lots of libraries that only serve to integrate with another library. For example devise. There are also the in-between libraries that bridge an ideologically pure library and something else, for example sprockets-rails adds the bridges needed to integrate the two libraries. In this case I actually wish that sprockets shipped with a railtie instead, yes it would add coupling but also decrease overall complexity in an end user's app. Right now sprockets-rails has to support Sprockets 2-4 which cause problems and lots of potentially unneeded flow control.
My gold standard is the end user's experience. If giving them the best possible experience doesn't introduce a huge maintenance burdeon on a library, I would consider doing it. I have some libaries that don't do authorization but need it, for that end I integrate with devise if it's available, however I also provide custom configuration endpoints. One example is oPRO https://github.com/opro/opro#custom-auth. This gives anyone using the currently number one auth library for Rails a drop in solution, but doesn't prevent anyone else from using it.
In the future if other libraries wanted to add support directly into oPRO, there is already a pattern to follow. If the maintenance burdeon got too large or there were too many people adding in support for their random library, then I would probably start pushing back and asking people to make bridge libraries like
opro-devise
etc. sort of like omniauth's strategies.Keep in mind that I make a living working for Heroku maintaining software whose core value proposition is that it is highly knowledgable and coupled to external software. Doing this isn't always the easiest or cleanest proposition, but sometimes it's needed to provide the absolutely best experience. Over time as standards develop hopefully others see that value and start using standard interfaces for example https://blog.heroku.com/archives/2016/5/2/container_ready_rails_5.
On the note of the unfair advantage, it wouldn't be an advantge if it wasn't beneficial for users. I feel it's fine to have each library maintainer make a call on a case by case basis. I agree that docs can do a lot for integrations, but there's a huge difference between "1 step and it works" to working out of the box. Ideally we would have standard interfaces across libraries, a good example of this is
DATABASE_URL
environment variable. It is supported by many ORMs and across multiple languages.jeremyevans commentedon May 25, 2016
If the point of the library is to integrate with one or more other libraries, then as I mentioned earlier, I think it's fine if the library has code specific to the other libraries. That's not the case here, though. If you want to integrate sidekiq with rails, that sounds like a good idea for a sidekiq-rails gem. If you want to ship such support with sidekiq, have it in a separate file that is not required by default (an adapter file).
While the end-user experience is important, I don't believe it trumps good library design principles. I think it is fairly easy to understand that the philosophy of "whatever is good for the majority is what we should do", is harmful to the diversity of any ecosystem.
nateberkopec commentedon May 25, 2016
That's what a Railtie is?
This point is far stronger, IMO. Setting an environment variable at runtime is kinda weird to begin with, then overriding Rails' env variables from Sidekiq doubles that weirdness. My original proposed approach fixes that issue, with the downside of making it not zero-config/plug-n-play for Rails users.
jeremyevans commentedon May 25, 2016
I don't think there is any problem if sidekiq ships a railtie, or has Rails-specific code in a Rails-specific file that sidekiq doesn't load by default (if Rails loads sidekiq's railtie by default, that's fine). I think you only run into issues if you start doing things by default for the most popular library and not other competing alternatives. For example, I would consider code like this to be a problem: https://github.com/phusion/passenger/blob/085504c2e00b6e6322bb0ec97aa5ff43d037c729/src/ruby_supportlib/phusion_passenger/loader_shared_helpers.rb#L381
schneems commentedon May 31, 2016
First of all, thanks for all the discussion. The original issue was me throwing things to the wall to see what stuck. I appreciate your input.
I never loved setting another library's env var.
Right now we have two proposed solutions. Auto setting env var and through explicit procfile writing. In thinking about the two proposed solutions I think I have a middle of the road, hybrid solution.
First a recap of the other solutions:
Auto Env Var
Have sidekiq set the
RAILS_MAX_THREADS
env var on worker boot.Procfile
Document how to set the RAILS_MAX_THREADS in the procfile like
Hybrid solution - Env var default and Procfile docs
I think we should have docs on how to set the procfile:
However we could also default Sidekiq's default number of threads to match Active Record's by default (do this in a Railtie perhaps). For example if a
RAILS_MAX_THREADS
is set and an explicit sidekiq concurrency level isn't specified use the RAILS_MAX_THREADS number instead of the sidekiq default.This still buys us explicitness through documentation. It also buys us that "just works" out of the box experience. In that code run by Sidekiq won't try to check out more connections than AR has.
The downside is that this drastically reduces the number of default threads from 25 to 5. However if they're using the database while running sidekiq they're already artificially limited as only 5 jobs would be able to run at a time since the others will be blocked waiting on a connection. Basically you're limited by amdahl's law anyway so it's not really a downside.
If you REALLY need all possible speed, you can't expect it from defaults, it's reasonable to expect people to look at docs (my opinion)
What do you think?
mperham commentedon Jun 3, 2016
Thanks for all the input, folks. I really have been reading and considering all of the things. I like Jeremy's point that ENV vars should be considered read-only for libraries but Sidekiq is not a library, it's the top-level framework and the binary being run. The user is configuring the Sidekiq process via command line parameters and so tuning various libraries within the Sidekiq process by auto-setting ENV vars if not already set is fair game.
For example, Sidekiq sets the RAILS_ENV and RACK_ENV variables based on the -e parameter. I'm leaning toward -c mapping to RAILS_MAX_THREADS if not set.
With that logic, the 90% use case is the user sets
sidekiq -c 15
and Rails will configure a DB connection pool with 15 connections. Exactly what we want. They can runRAILS_MAX_THREADS=10 sidekiq -c 15
if they want a smaller pool.That naming seems silly though; IMO the
config/database.yml
entry should change to something like:to make the env vars more descriptive:
DB_POOL_SIZE=10 sidekiq -c 15
.HoneyryderChuck commentedon Jun 3, 2016
Why not just make the db connection pool size an active record configuration setter?
From what I understand, rails doesn't care about threads, is thread-safe, these are an application server / bj framework details. Why call it "max threads" when it's just the "db pool size" I want?
schneems commentedon Jun 3, 2016
That's a tangential discussion. The problem we have with that approach would be now there are two canonical ways to set the pool size. What happens when you set via the config AND the database.yml? Who provides the canonical value? What if the ordering of your code loading changes? We ran into this problem by suggesting people set that value in Rails pre 4.1 see https://devcenter.heroku.com/articles/rails-database-connection-behavior#prior-to-rails-4-1. It was bad, causes confusion and would likely make things more confusing instead of less confusing.
I would be all for getting rid of database.yml in favor of non-yml configuration entry points like you've suggested, however that's a discussion that's not really on the table.
HoneyryderChuck commentedon Jun 3, 2016
I'd say that, if you take RAILS_MAX_THREADS as the highest priority (as rails does with DATABASE_URL(?)), you at least won't break backwards compatibility. and config would be the evangelized way (and in heroku you could just ignore the database.yml if pool is already set, which is yours and mine desired solution). But that's just me. I have a hard time seeing these suggested to stuff like sidekiq, which is a layer above rails, as the author suggested. Same as puma/passenger/unicorn/etc. If you make it configurable, as most of the libs are (I don't think devise relies on envvars, for example), you give freedom to the user, and it's not sidekiq's business anymore. It is indeed a breaking change from the link in heroku you sent, but suggesting that the framework should mitigate the lack of configuration in active record is unfair.
As for the author of sidekiq, I have a question: why is sidekiq handling RAILS_ENV and RACK_ENV? Why not SINATRA_ENV or UNICORN_ENV? There was a discussion before on the different meanings of these variables, and that not all of them correlate to one another. Why is sidekiq handling them then? From what I can get from a quick github search, it's used to require some expected rails project files, and all inside the cli.rb file, which is also loaded for non-rails. Isn't this an example of "giving special treatment to rails"? If you want to target rails, just
Rails.env if defined?(Rails)
, and you get 2 envvars less. In the same file, a quick search forRails
suggests me more tailor-made rails code. One suggestion: did you consider going the rack direction and load a config.sk by default? This way you could load environment.rb in rails, configure sidekiq server separately and other server-only configuration. Rails also ships its own custom config.ru for rack.mperham commentedon Aug 1, 2016
How does that look?
Better integration between Sidekiq thread count and Active Record poo…
Use RAILS_MAX_THREADS for Sidekiq concurrency and AR's DB pool size
RAILS_MAX_THREADS
parity with Redis pool for client (i.e. Rails server) #3806Use `RAILS_MAX_THREADS` for client pool size
Use `RAILS_MAX_THREADS` for client pool size
RAILS_MAX_THREADS
for client pool size #3807Use `RAILS_MAX_THREADS` for client pool size
kurtlaunchlabs commentedon Jun 21, 2018
Hi, May I know if the sidekiq concurrency is per client or shared? If shared then, I think it cannot served 50 client sending background task request at the same time, if default is 25. If per client, then much better.
ionutzp commentedon Dec 14, 2018
@kurtlaunchlabs i don't understand what you mean
sharshenov commentedon Sep 13, 2019
Another approach: