Multiprocess: Add basic spawn and IPC support (build system)

https://github.com/bitcoin/bitcoin/pulls/19160

Host: ryanofsky  -  PR author: ryanofsky

The PR branch HEAD was 0602f87 at the time of this review club meeting.

Notes

In this review club, we’ll focus on the second commit from this branch (multiprocess: Add basic spawn and IPC support). IPC here is an acronym for Inter-process communication.

The key things this commit adds are an IpcProcess::spawn() method, which a bitcoin parent process can call to spawn a new child process, and an IpcProcess::serve() method, which a bitcoin child process can call after startup to communicate back to its parent process.

In follow-up PR #10102 Multiprocess bitcoin, this functionality is used so a bitcoin-gui process can spawn a bitcoin-node process, and a bitcoin-node process can spawn a bitcoin-wallet process, and GUI, node, and wallet functionality can run in separate processes that are protected from each other. In further follow-ups #19460 Add bitcoin-wallet -ipcconnect option and #19461 Add bitcoin-gui -ipcconnect option, more flexibility is added so node, wallet, and GUI processes can be started and stopped independently.

All communication between Bitcoin Core processes happens through internal C++ interfaces, which are just C++ classes with pure virtual methods. The virtual methods allow bitcoin GUI, node, and wallet code to be written in a straightforward way that doesn’t require dealing with complications of IPC. IPC is handled by the multiprocess framework added in this PR, to avoid the need to complicate application code with low level I/O.

Specifically, follow-up PR #10102 Multiprocess bitcoin uses the IPC framework added here to generate subclasses for each C++ interface class, with every overridden virtual method of every subclass implemented to send method calls and arguments to a remote process, wait for a response, and then return the response as the method return value. For example, if the GUI wants to find out if a wallet address is spendable, it calls the interfaces::Wallet::isSpendable() method. If GUI code and wallet code are running in the same process, this directly invokes the interfaces::WalletImpl::isSpendable() method implementation. But if GUI and wallet code are running in different process, the multiprocess framework instead provides a different (generated) interfaces::Wallet class implementation that forwards the method request and arguments to a remote wallet process, waits for the results, and returns them, instead of directly calling wallet code.

The design goal is for cross-process communication to happen through normal method calls, and for node, wallet, and GUI code not to have to change drastically to support process separation.

IPC framework support is added in the second commit aa4d626 Add basic spawn and IPC support of #19160, which is then tested in the third commit 0602f87 Add echoipc RPC method and test (branch), and finally put to real use in #10102 Multiprocess bitcoin (branch). This review is focused on the second framework commit, and the 3 classes introduced there:

The IpcProcess class has spawn(exe_name, pid) and serve(exit_status) methods and is responsible for spawning new child processes, creating pipes child and parent processes can use to communicate, and passing pipe file descriptors to IpcProtocol objects in parent and child processes.

The IpcProtocol class is what actually sends method calls across the pipe, turning every method call into a request and a response, and tracking object lifetimes. The IpcProcotol and IpcProcess classes could have been melded together into a single class, but separating them allows the spawn and pipe setup code to work with protocols other than Cap’n Proto, which is the internal protocol currently used by the IpcProtocol class.

The Init interface is similar to other cross-process C++ interfaces like interfaces::Node, interfaces::Wallet, interfaces::Chain and interfaces::ChainClient, providing virtual methods that can be called from other processes. What makes it special is that unlike other interfaces which are not implemented by every process—interfaces::Node is only implemented by the node process and interfaces::Wallet is only implemented by the wallet process—interfaces::Init is implemented by every process that supports being spawned, and it is the initial interface returned by the IpcProtocol::connect(fd) method, allowing the parent process to control the child process after the connection is established. The interfaces::Init interface has methods that allow the parent process to get access to every interface supported by the child process, and when the parent process frees the interfaces::Init object, the child process shuts down.

Questions

  1. The entry points for spawned bitcoin-node and bitcoin-wallet processes both call IpcProcess::serve() then immediately exit when it returns. How do the child processes provide useful functionality to the parent processes if they never run the code after the IpcProcess::serve() calls?

  2. When does the IpcProcessImpl::serve() method return true and when does it return false? Is it ever expected to return false, or is it always an error?

  3. The IpcProcessImpl::spawn() implementation has a lambda that generates a vector of command line arguments for the process that should be spawned. What does the generated command line look like, and why does the generated command line for the child process depend on m_argv[0] of the parent process? Why does it include a pipe file descriptor (int fd).

  4. The MakeCapnpProtocol(LocalInit& init) function returns an IpcProtocolImpl protocol implementation which is a dumb wrapper around libmultiprocess functions that translate interfaces::Init method calls to pipe reads & writes (for a parent process) and translate pipe read & writes to interfaces::Init interface method calls (for a child process). If we wanted to replace libmultiprocess and use a different protocol to communicate across the pipe, would the interfaces::IpcProtocol interface need to change? If communication needed to go to a different channel other than a pipe, like an IP address, or an SSL socket, would the interfaces::IpcProtocol interface need to change then? How would it change?

  5. The new init_bitcoind.cpp file introduced in this PR is linked into the bitcoind executable and the new init_bitcoin-node.cpp file is linked into the bitcoin-node executable. Without this change, and before this PR, the bitcoind and bitcoin-node executables were identical. In follow-up PR #10102 Multiprocess bitcoin there are more changes to init_bitcoind.cpp and init_bitcoin-node.cpp that give bitcoin-node significantly different behavior from bitcoind, running wallet code in a new spawned bitcoin-wallet process instead of in the same process. In this PR, the differences between init_bitcoin-node.cpp and init_bitcoind.cpp are more minor, but what are they? Do they lead to differences in observable behavior?

Meeting Log

  117:00 <ryanofsky> #startmeeting
  217:00 <ryanofsky> (hope I did that right)
  317:00 <michaelfolkson> hi
  417:00 <troygiorshev> hi
  517:00 <ryanofsky> Hi, welcome one and all to the weekly PR review club!
  617:00 <fjahr> hi
  717:00 <lightlike> hi
  817:01 <jkczyz> hi
  917:01 <jonatack> hi
 1017:01 <ryanofsky> PR this week is https://bitcoincore.reviews/19160.html
 1117:01 <ryanofsky> You can say y/n if you've looked or haven't had a chance to look at it
 1217:01 <nehan> hi
 1317:02 <emzy> Hi
 1417:02 <fjahr> y
 1517:02 <emzy> n
 1617:02 <troygiorshev> y
 1717:02 <jkczyz> n
 1817:02 <lightlike> y
 1917:02 <nehan> y
 2017:02 <jonatack> (hi also for ariard who is here)
 2117:02 <jonatack> y/n
 2217:02 <ryanofsky> Thanks, this is a big PR that and notes and questions there are about a portion of it, mostly meant to be a prompt
 2317:03 <michaelfolkson> I've looked at it but I feel as if I haven't :)
 2417:03 <ariard> y
 2517:03 <ryanofsky> Yeah we can start off with any questions you might have. No stupid questions and we can really talk about anything
 2617:04 <ryanofsky> Even "how does a pipe work?" general stuff not specific to bitcoin
 2717:04 <ryanofsky> Or just general difficulties or feedback
 2817:05 <ariard> okay wrt to the threading model, it's handle lower by libmultiprocess?
 2917:06 <ryanofsky> Threading is just supposed to be transparent
 3017:06 <ryanofsky> So if GUI calls a wallet method to find out the wallet balance, and the wallet is running in a different process
 3117:07 <ariard> the wallet process spwans a thread to serve ?
 3217:07 <ryanofsky> On the GUI process side, the getbalance method sends a requests, blocks, waits for a response, and then returns
 3317:08 <ryanofsky> Yes, and the wallet process sees that a call has come in from gui thread #234
 3417:08 <fjahr> I haven't studied capnp before. I remember seeing some discussion on it in the core dev irc. Is it worth re-reading any of that to get more of the context about it's use in core?
 3517:08 <ryanofsky> If it already has a thread to handle requests for #234, it runs calls the wallet getbalance method on that thread, otherwise it makes a new thread to handle that request and future ones
 3617:09 <ariard> and this service thread die with the connection?
 3717:09 <ryanofsky> ariard, yes, the service thread dies if the connection is closed
 3817:10 <lightlike> a very general q: what is the main goal for introducing multiprocesssing to core: Mostly architectural, i.e. is better separation of wallet/node/gui? or would it also affect performance?
 3917:10 <ryanofsky> it also dies if the thread #234 is joined before the connection is closed
 4017:11 <nehan> why are the threads short lived instead of long lived? presumably the wallet, gui, etc are going to run for a long time
 4117:11 <michaelfolkson> fjahr: I don't think capnp understanding is key to this https://github.com/bitcoin/bitcoin/pull/10102#issuecomment-289999980
 4217:11 <ryanofsky> lightlike, there are different goals and tradeoffs. generally good for security, bad for performance, good for flexibility, like being able to have node run in backgground and attach/detach wallets and guis
 4317:11 <michaelfolkson> But yeah I haven't studied it either
 4417:12 <fjahr> michaelfolkson: but I want to understand it :)
 4517:12 <ryanofsky> what kinds of things do you want to know about capnp?
 4617:13 <ariard> memory separation, if your bitcoin-node gets corrupted, it won't be able to swallow your keys
 4717:13 <ryanofsky> it is basically a code generator, io frameowork, file format, and protocol
 4817:13 <ariard> also exposing interfaces means you can build tool against them
 4917:14 <michaelfolkson> lightlike: See the comments on this Slide 5 https://docs.google.com/presentation/d/1AeJ-7gD-dItUgs5yH-HoEzLvXaEWe_2ZiGUUxYIXcws/edit#slide=id.g255500ac67_0_21
 5017:14 <ryanofsky> you could write a custom protocol, the advantage of using capnp is that when you want to add a new method or class or parameter, you just add it in a schema instead of having to write manual code
 5117:14 <fjahr> ryanofsky: no specific things about it, just if there is something worth reading on why it was chosen (i guess you chose it?) for this job
 5217:15 <lightlike> great thanks!
 5317:15 <ryanofsky> It was chosen just because I played with it an liked it. A similar alternative would be gRPC
 5417:16 <ariard> and also you can now move wallet/gui code in different repo, you don't have wallet utxo tracking meddle with validation
 5517:16 <fjahr> ryanofsky: that's good enough for me, thanks!
 5617:16 <ryanofsky> gRPC is lower lever, though, it doesn't track objects so requires more work to support bidirectional callbacks
 5717:17 <ryanofsky> in capnp each object has an identity, and you call a method on a specific object. while in gRPC you just define request and response formats and have to look up the objects yourself from the request
 5817:18 <michaelfolkson> Adding spawn support seems like it would be one of the final steps after untangling all the code between the different components. Is that all done now and ready to be merged?
 5917:18 <michaelfolkson> I can see the various open PRs
 6017:19 <ryanofsky> michaelfolkson, yes basically all that code is already merged
 6117:19 <michaelfolkson> Oh wow. This is further along than I thought it was then
 6217:19 <ryanofsky> all the code in the src/interfaces/ directory was introduce to define the interfaces between node, wallet, and gui components
 6317:21 <michaelfolkson> We can go through the questions you posed ryanofsky https://bitcoincore.reviews/19160.html
 6417:21 <ryanofsky> So this PR adds support for spawning, and then the next pr 10102 calls it to actually make bitcoin-node, bitcoin-gui, bitcoin-wallet processes specialize and talk to each other with the spawn support here
 6517:22 <jkczyz> Ah, so by "bidirectional", it doesn't simply mean returning data back from child to parent process but rather supporting callbacks from child to parent more generally
 6617:22 <ryanofsky> michaelfolkson, sure, one pre-question. The notes focused on IpcProcess, IpcProtocol, and Init classes introduced in this PR. We it clear what these classes do, and anyone want to summarize?
 6717:23 <ryanofsky> jkczyz, exactly yes, lots of interface methods take std::function arguments, or objects arguments
 6817:24 <ryanofsky> when a client passes a server a std::function or an object, the server can call back to that function or call an object method at any time, and the framework handles it
 6917:24 <nehan> ryanofsky: IpcProcess sets things up, IpcProtocol is used for the parent and child to actually talk to each other
 7017:25 <michaelfolkson> IpcProcess spawns new child processes
 7117:25 <ryanofsky> nehan, right, and one reason IpcProtocol is separate from IpcProcess, is so different protocols other than capnp could be supported in the future
 7217:26 <michaelfolkson> Child being a process that the parent process needs to complete a task right
 7317:26 <ryanofsky> michaelfolkson, right, it handles all the details of spawning and being spawned
 7417:26 <troygiorshev> ryanofsky: and init actually, er, holds everything?
 7517:26 <ryanofsky> michaelfolkson, right in 10102, bitcoin-gui spawns a bitcoin-node, and bitcoin-node spawns a bitcoin-wallet, so these are long running tasks
 7617:27 <nehan> is there some kind of architecture/picture diagram of the processes?
 7717:27 <troygiorshev> ^ +1
 7817:27 <michaelfolkson> So the child processes could be running continuously. The "parent" would be the process which is was running first
 7917:28 <troygiorshev> Something like the steps in init.h but for those of us who like pictures :)
 8017:28 <ryanofsky> troygiorshev, yes. Init does a few things but the main reason it exists is because the only way the framework allows processes to communicate by calling object methods, and Init is the object a spawned process provides to start off with
 8117:29 <ariard> nehan: https://docs.google.com/presentation/d/1AeJ-7gD-dItUgs5yH-HoEzLvXaEWe_2ZiGUUxYIXcws/edit#slide=id.p
 8217:29 <ryanofsky> No diagram exists, but a sequence diagram would be a good thing to have summarizing the init.h comment I agree
 8317:29 <michaelfolkson> I get why you want to be able to start/stop processes. But generally the processes would be running continuously and concurrently. They aren't going to be regularly stop/started?
 8417:31 <ryanofsky> michaelfolkson, maybe could be. Use cases I'm thinking of is you leave node running, but start and stop wallets, and start and stop gui
 8517:31 <michaelfolkson> Maybe they would... you only really need the node process running all the time
 8617:31 <nehan> ariard: saw that. doesn't have a diagram.
 8717:31 <ryanofsky> It's just an interesting thing you could do, though
 8817:32 <ryanofsky> So question 1 was just about the entry point line 180 in main(): https://github.com/ryanofsky/bitcoin/blob/pr/ipc-echo.7/src/bitcoind.cpp#L180-L185
 8917:32 <nehan> in addition to what's described in init.h, it would be nice to see a diagram which shows the steps for the node, gui, and wallet. it's not obvious to me why the gui spawns a node and the node spawns the wallet?
 9017:33 <troygiorshev> related is, if the gui spawns a node, then does that mean we can't nicely shut down the gui without shutting down the node? Or is the distinction between parent and child sortof flexible?
 9117:34 <ryanofsky> nehan, that's a good question. gui spawning node and node spawning wallet are just artifacts of the way the code works currently and are just supported so no user changes are required
 9217:34 <michaelfolkson> You mean the ordering doesn't make sense? What is the parent and the child in this context nehan?
 9317:34 <nehan> michaelfolkson: gui is parent to node, node is parent to wallet
 9417:35 <ryanofsky> yes. Should note that the parent/child relationships talked about with respect to spawning aren't some permanent part of the connections. Connections are fully bidirectional
 9517:35 <michaelfolkson> But the roles can be reversed right? The node can be the parent to the wallet. Just what happens in the general case
 9617:35 <ryanofsky> So if the node spawns a few wallets on startup, and some separate wallet processes are started which connect back to the node, all the wallets are equivalent
 9717:35 <nehan> ryanofsky: ah, thanks for clarifying
 9817:36 <thomasb06> https://capnproto.org/
 9917:37 <ryanofsky> I guess the first question was: How do the child processes provide useful functionality to the parent processes if they never run the code after the IpcProcess::serve() calls in main()?
10017:38 <michaelfolkson> I didn't get this question :) I don't know why they would need to run the code *after* to be useful
10117:39 <ryanofsky> michaelfolkson, good point
10217:40 <troygiorshev> the communication is done, ultimately, through the fd used in serve. Is that what this question is going for?
10317:40 <ryanofsky> They definitely don't need to run code after. The question was assuming if you saw code in main that said if (condition) exit, you would might be suspicious
10417:41 <ryanofsky> troygiorshev, yes. Answer is just that the serve method blocks and handles requests, so there is nothing to do when it exits
10517:41 <ryanofsky> Probably bad question :) I definitely baked an assumption into it
10617:42 <ryanofsky> Next question goes into the serve() implementation: https://github.com/ryanofsky/bitcoin/blob/pr/ipc-echo.7/src/interfaces/ipc.cpp#L35-L61
10717:42 <nehan> ryanofsky: i had trouble tracing through the code to answer this question. where is the block?
10817:42 <michaelfolkson> Practise definitely makes perfect in all things (including PR review club hosting ;) )
10917:42 <ryanofsky> yep getting close to perfection soon I think :)
11017:43 <ryanofsky> nehan, yes, that's basically the question! the "if (init->m_process && init->m_process->serve(exit_status))" serve() call blocks
11117:44 <nehan> here? https://github.com/ryanofsky/bitcoin/blob/pr/ipc-echo.7/src/interfaces/capnp/ipc.cpp#L79
11217:44 <lightlike> i thought that the protocol implementation (capnp) has a loop in its serve() method, so everything useful would happen during the IpcProcess::serve() call.
11317:44 <troygiorshev> nehan: I was thinking Line 62 of that file
11417:44 <ryanofsky> that line is saying "if I am a child process spawned to handle requests from a parent, and I am done handling requests, then exit without executing the rest of main()"
11517:45 <ryanofsky> yes in this case it is line 62
11617:45 <ryanofsky> line 79 is the equivalent place, but in the parent process, not the child process
11717:46 <troygiorshev> ryanofsky: ah thanks
11817:47 <troygiorshev> i think these questions link nicely into your question 4
11917:47 <michaelfolkson> line 79 is the equivalent place, but in the parent process? Why is the parent exiting?
12017:48 <nehan> where is m_loop initialized?
12117:48 <ryanofsky> michaelfolkson, oh I just meant it is the equivalent place in terms of blocking, lines 62 and 79 both block the thread and wait for and respond to incoming requests
12217:49 <michaelfolkson> Oh ok gotcha
12317:50 <ryanofsky> nehan, it's initialized in the m_loop.emplace() calls
12417:50 <nehan> ryanofsky: ah, ok thanks
12517:50 <michaelfolkson> I think troygiorshev wants to answer question 4
12617:51 <ryanofsky> as you can see and as troygiorshev question 4 reference gets to, IpcProtocol is kind of a messy glue code class
12717:51 <ryanofsky> it has connect() and serve() methods that take file descriptors
12817:51 <ryanofsky> in a future PR it also gets a listen() method to be able to accept incoming connections
12917:52 <ryanofsky> but question 4 first asks if you wanted to get rid of capnproto, and use a different protocol like gRPC or JSONRPC or something custom, would the method definitions hve to change?
13017:53 <ryanofsky> and then if you wanted to use a different type of channel other than pipes/file descriptors, how would it have to change?
13117:55 <michaelfolkson> I'm guessing the first part of that question is no
13217:55 <troygiorshev> without looking deeply into any IPC protocols, I want to guess no to both questions
13317:56 <ryanofsky> michaelfolkson, yes, that's basically design goal to make protocol swappable
13417:56 <nehan> i guess if you wanted to use something other than file descriptors you'd have to tell the processes how to talk to each other
13517:56 <troygiorshev> i can't picture a useful connectionless IPC protocol (though I'm sure there's something out there)
13617:56 <ryanofsky> troygiorshev, could be yes/no in different cases, like this is just passing int file descriptors
13717:56 <nehan> like spawn() would have to pass in something else
13817:57 <ryanofsky> but on windows file descriptors aren't weird HANDLE types instead of ints, so maybe you'd pass handles
13917:57 <ryanofsky> or if you wanted to use openssl, maybe you'd pass objects with read and write methods, or callbacks or something else
14017:57 <troygiorshev> right, unless those can be represented by an int
14117:58 <troygiorshev> (but that's very antithetical to the whole "everything is an object" approach of windows)
14217:58 <ryanofsky> I think it works in practice if your int is big enough, but yeah
14317:59 <ryanofsky> THese questions were mostly intended to be prompts to look at the code, so glad we did a little bit of that :)
14418:00 <ryanofsky> Can wrap up, and happy to answer try to clarify anything else later
14518:00 <troygiorshev> thanks ryanofsky and thanks for the amazing notes!
14618:00 <nehan> ryanofsky: why did you choose file descriptors as the interface instead of something more general? what would it take to run processes on different machines?
14718:00 <nehan> thanks!
14818:01 <lightlike> thanks!
14918:01 <ryanofsky> nehan, just for convenience because I never did the windows port yet. When I do I'll probably just make a typedef
15018:01 <ryanofsky> Even using file descriptors though, you could pass connections to different machines since TCP sockets give you file descriptors too
15118:02 <michaelfolkson> So this process separation marathon is almost over? Just the four remaining PRs to get merged?! https://github.com/bitcoin/bitcoin/projects/10
15218:03 <michaelfolkson> It is really hard to catch up on all these years of work in an afternoon haha. I think we covered a previous PR on interfaces at a previous PR review club
15318:04 <ryanofsky> michaelfolkson, Basically yes with exception that starting bitcoin-wallet tool and connecting to node isn't greately useful even with those 4 prs
15418:04 <michaelfolkson> This in February https://bitcoincore.reviews/17954.html
15518:04 <ryanofsky> followup PR would add a "serve" wallet tool subcommand or something similar so the wallet tool could connect to the node and then actually do useful operations
15618:05 <ryanofsky> Yeah, there's a lot here. Appreciate you guys taking interest and digging in.