First, we’ll have a more general look at Bitcoin Core’s HTTP server to build an essential understanding of the area we’re working on. Then we’ll dive deeper into the specifics of this PR.
HTTP server
Since #5677, Bitcoin Core’s HTTP server is based on libevent2. Libevent is a general purpose event notification library, but is used in Bitcoin Core specifically for HTTP requests (which it supports natively).
Much (not all) of the libevent interface is hidden behind wrappers. For example, HTTPRequest wraps evhttp_request and HTTPEvent wraps event_base
The relevant workflow for how (for example) an RPC request is handled is roughly as follows:
the HTTP server receives an RPC command from a caller, creates an evhttp_request object and passes its pointer to http_request_cb() (this step is completely handled by libevent)
an HTTPWorkItem is created, containing the evhttp_request (wrapped in HTTPRequest hreq) as well as the path and reference to the handler function (which contains the business logic to be executed to deal with the request)
when the handler function of a HTTPWorkItem completes successfully, it calls HTTPRequest::WriteReply(), which triggers the libevent function evhttp_send_reply(), which in turn returns a response to the caller and destroys the evhttp_request object.
Endpoints are registered to the HTTP server by calling RegisterHTTPHandler(), such as e.g. in StartHTTPRPC()
The HTTP server is initiated and started from AppInitServers(), and stopped from Shutdown()
StartHTTPServer()adds a thread for each worker to g_thread_http_workers. These threads will keep running until WorkQueue::Interrupt() sets running to false and the queue is empty.
This PR
This PR changes StopHTTPServer() to destroy the eventHTTP HTTP server (by calling evhttp_free(eventHTTP)) as soon as all workers have finished executing their tasks, making the shutdown process significantly faster.
Libevent requires that no requests are being served when evhttp_free() is called, so we keep track of all the requests created by libevent that have not yet been completed in g_requests.
The change is unlikely to be significant to users, since node shutdowns are typically not very frequent. It does however significantly speed up (verify this yourself!) the feature_abortnode.py functional test, consequently speeding up developer workflows worldwide.
Which of Bitcoin Core’s interface(s) (RPC, REST, ZMQ) make use of the HTTP server? For each interface, how many handlers are registered to pathHandlers?
We already wait for all threads in g_thread_http_workers to be joined before destroying eventHTTP, which only happens when the queue is empty (see notes) - so we would expect all requests to be handled by then. Why do we separately need to track evhttp_request objects, too?
What’s the purpose of g_requests? Can you think of any alternative approaches that would achieve the same goal?
Which (smallest possible) part of the code change in this PR is responsible for the drastic performance speedup in feature_abortnode.py, and why is that making it faster?
What is a GlobalMutex, and why is it used here?
What does the EV_TIMEOUT passed to event_base_once() mean, and why is it used here?
(Bonus): why do we have both a StartRPC() as well as a StartHTTPRPC() function?
<stickies-v> welcome everyone - thank you for joining us again in the new year! Today we're looking at #26742, authored by fjahr, building on previous work done by promag. The notes and questions are available on https://bitcoincore.reviews/26742
<stickies-v> quick note before diving into the questions: the PR has been updated slightly since these notes were released, so the current commit hashes are different. I'll keep linking to the "old" (6d3a848 and earlier) commits here.
<stickies-v> most of the code of the PR actually does not (directly) contribute to the performance improvement, but deals with shutting down the HTTP server safely, i.e. ensure we don't kill it before all inbound requests are dealt with in one way or another
<stickies-v> Yaz: an easy way to observe the effect of the PR is by running `./test/functional/feature_abortnode.py` and on this branch. You should notice a drastic speedup on this branch
<stickies-v> 1. Which of Bitcoin Core's interface(s) (RPC, REST, ZMQ) make use of the HTTP server? For each interface, how many handlers are registered to `pathHandlers`?
<LarryRuane> stickies-v: "You should notice a drastic speedup on this branch" -- you're right, with this branch takes my system about 1.5s, without takes 31s
<stickies-v> Yaz: the RPC, REST and ZMQ interfaces are meant to directly interact with your node (e.g. request or alter information about your wallet, transactions, blocks, ...) in a user-facing way. This is separate from the networking that your node does with other peers, to relay blocks, transactions and addresses etc, which it needs to do in order to be operational
<stickies-v> LarryRuane: yes, which may be surprising given that the RPC interface is much more extensive. For RPC, we've just standardized the way we organize RPC methods quite well, so we can abstract all that away with just 2 handlers. For REST, however, everything is kinda ad-hoc, and every endpoint is its own handler
<stickies-v> I think a big reason (speculating) why we have the handlers the way we do, is because REST calls follow the HTTP URI scheme, where the endpoint is encoded in the URI (e.g. http://somewebsite.com/my/endpoint). For RPC, we don't include the method in the URI but in the payload, I think?
<LarryRuane> if anyone would like to fire up REST, add `-rest` to your config, restart, then for example `curl -s localhost:8332/rest/chaininfo.json|json_pp`
<stickies-v> 2. We already wait for all threads in `g_thread_http_workers` to be joined before destroying `eventHTTP`, which only happens when the queue is empty (see notes) - so we would expect all requests to be handled by then. Why do we separately need to track `evhttp_request` objects, too?
<stickies-v> LarryRuane: yes, exactly. I'd say in probably >99% of the time they should be synchronized, but if a new request comes in right before stopping the HTTP server, it's possible that the queue is empty and it looks like we can terminate but we actually still need to process that last-minute request
<stickies-v> LarryRuane: yes, but that's not really the issue here. Once a request is added onto `g_work_queue`, it will prevent the worker threads from terminating. So we wouldn't terminate the HTTP server prematurely
<LarryRuane> i know the convention is that the `g_` means it's a global variable (so they're easier to identify when we want to assassinate them later)
<stickies-v> it's just that libevent requires there to be no unhandled requests when we call `evhttp_free(eventHTTP)`, which we need to ensure ourselves
<stickies-v> so `g_requests` simply keeps track of all the `evhttp_request` objects created by libevent, so we can easily keep track of whether or not they're all destroyed before we destroy `eventHTTP`
<stickies-v> I see two alternative solutions: we could use a lock for almost the entire duration of `http_request_cb` to ensure that no requests are being processed to be put in the `g_work_queue`, but that would be quite expensive
<stickies-v> a more elegant approach is that instead of tracking all the `evhttp_request` objects (or well, pointers to them), we could just keep a simple counter on how many requests we haven't yet handled, as discussed here: https://github.com/bitcoin/bitcoin/pull/26742#discussion_r1063664153
<stickies-v> 4. Which (smallest possible) part of the code change in this PR is responsible for the drastic performance speedup in `feature_abortnode.py`, and why is that making it faster?
<stickies-v> then, no. the PR doesn't use an atomic counter, that's just something I suggested but is not implemented (because we need to track the actual requests in a follow up PR, so fjahr would rather not change that again)
<stickies-v> the majority of the code in this PR is to make it go from "as soon as all the workers have finished executing their tasks" to "as soon as all the `evhttp_request` objects created by libevent have been destroyed
<stickies-v> BUT we may encounter the edge case that we call `evhttp_free(eventHTTP)` before all `evhttp_request` objects have been destroyed, which libevent does not allow (I'm not sure if it leads to a crash, UB, or anything else)
<svav> Re Q6 I would just like to mention that a mutex is analogous to a rubber chicken being used in a meeting which people have to be holding to allow them to talk :')
<stickies-v> `g_thread_http` is the thread in which we run the HTTP server, and even though all the `evhttp_request`s may be handled, connections to the HTTP server can still be open, albeit idle. Calling `g_thread_http.join()` will not return until also all idle connections are closed