Multiprocess bitcoin (gui, rpc/rest/zmq)

https://github.com/bitcoin/bitcoin/pull/10102

Host: ryanofsky  -  PR author: ryanofsky

The PR branch HEAD was 8a18a12a232 at the time of this review club meeting.

Notes

Currently Bitcoin Core code always runs within a single operating system process. So if you are running bitcoind, then you are running node and wallet code together in the same process. If you are running bitcoin-qt, then you are running node, wallet, and GUI code in the same process. Running all code in one process does not provide a lot of isolation, so it means a crash in wallet code could bring down the node, or a vulnerablility in the node could expose wallet data. Additionally it’s not possible to run the node and wallet on different machines, or to stop and start the GUI independently of the node, or use the GUI to connect to nodes or wallets on headless machines.

PR #10102 starts to move away from the single process model by adding basic support needed for bitcoin node, wallet, and GUI code to run in different processes and communicate with each other internally. Followups #19460 and #19461 expand on it to allow wallet and GUI processes to be started and stopped independently, and allow the processes to communicate with each other externally, and run on different machines.

PR #10102, combined with the --enable-multiprocess build option, builds a new bitcoin-node executable that can be used as a drop-in replacement for bitcoind, and a new bitcoin-gui executable that can be used a drop-in replacement of bitcoin-qt. The new executables are used the same way as the previous ones and don’t provide any new features externally, but internally they will spawn multiple processes and use process separation. Instructions for building and testing #10102 can be found in doc/multiprocess.md and can be done in a few commands on linux:

cd <BITCOIN_SOURCE_DIRECTORY>
make -C depends NO_QT=1 MULTIPROCESS=1
CONFIG_SITE=$PWD/depends/x86_64-pc-linux-gnu/share/config.site ./configure
make
src/bitcoin-node -regtest -printtoconsole -debug=ipc

When this is run, bitcoin-node will spawn a bitcoin-wallet executable to run wallet code. The node and wallet processes will communicate across a socket, with the node code controlling the wallet with interfaces::WalletClient methods, and the wallet code calling the node with interfaces::Chain methods. Similarly when GUI support is enabled, bitcoin-gui will spawn a bitcoin-node process, and control it by calling interfaces::Node methods, and control wallets by calling interface::Wallet methods.

#10102 adds the plumbing that allows gui <-> node <-> wallet cross-process communication to work transparently, without changing existing code, only adding new code. This is possible because the interfaces in src/interfaces/ (interfaces::Chain, interfaces::Node, interfaces::Wallet described above) are all abstract classes with virtual methods, so different implementations can be substituted without changes to calling code. This PR adds new implementations of each interface that forward method calls from one process to another. In early versions of this PR, adding new implementations of each interface required adding a lot of boilerplate code. Every method in every interface, and every argument and return value of every method had to had custom C++ code written that would handle the method call serialization and forwarding from the calling process to the called process. For example, the Wallet::encryptWallet abstract method:

    virtual bool encryptWallet(const SecureString& wallet_passphrase) = 0;

had an ipc::capnp::WalletImpl::encryptWallet implmentation which forwarded the encrypt call from the gui process to the wallet process using the Cap’n Proto RPC framework:

    bool encryptWallet(const SecureString& wallet_passphrase) override
    {
        auto call = MakeCall(m_loop, [&]() {
            auto request = m_client.encryptWalletRequest();
            request.setWalletPassphrase(ToArray(wallet_passphrase));
            return request;
        });
        return call.send([&]() { return call.m_response->getResult(); });
    }

with corresponding code in an ipc::capnp::WalletServer::encryptWallet method in the wallet process to handle the incoming RPC:

    kj::Promise<void> encryptWallet(EncryptWalletContext context) override
    {
        context.getResults().setResult(
            m_impl->encryptWallet(ToSecureString(context.getParams().getWalletPassphrase())));
        return kj::READY_NOW;
    }

by forwarding it to the local ipc::local::WalletImpl::encryptWallet method:

    bool encryptWallet(const SecureString& wallet_passphrase) override
    {
        return m_wallet.EncryptWallet(wallet_passphrase);
    }

which is the same method that would have been called directly if code were running in a single process instead of multiple processes.

Because the ipc::capnp::WalletImpl::encryptWallet and ipc::capnp::WalletServer::encryptWallet methods above and all similar methods just contain boilerplate code forwarding arguments and return values, newer versions of this PR no longer define these methods manually, and instead generate them automatically. (You can see these methods in generated src/ipc/capnp/wallet.capnp.proxy-client.c++ and src/ipc/capnp/wallet.capnp.proxy-server.c++ files after building this PR, but they are not part of the PR source code). The code generation means that adding a new method or changing an existing method signature now just requires editing a single line in the interface’s .capnp file:

    encryptWallet @1 (context :Proxy.Context, walletPassphrase :Data) -> (result :Bool);

and the corresponding forwarding code is generated from that.

Library support for everything described above was merged previously in #19160 Multiprocess: Add basic spawn and IPC support (review club), so the most significant part of #10102 is just adding .capnp interfaces for all the interfaces in src/ipc/capnp/.

#10102 is a large PR and it is divided into multiple commits.

The most significant commits are the “Add canp serialization…” and “Add capnp wrapper…” commits which add Cap’n Proto schema definitions describing Bitcoin’s struct and interface types that are shared between processes: (common.capnp, chain.capnp, node.capnp, wallet.capnp). These commits also add a lot of glue code in BuildField / ReadField/ BuildMessage / ReadMessage function. There is a lot of ugly C++ template syntax needed to declare the function types, but the actual function bodies are pretty straightforward and just copy information between Bitcoin Core’s native data structures and corresponding capnp messages.

The other two significant commits in this PR are the “Make bitcoin-gui spawn a bitcoin-node process” commit and “Make bitcoin-node spawn a bitcoin-wallet process” commit. These commits don’t add much new code. Instead they strip out wallet initialiatation code from the BitcoinNodeInit class (so the bitcoin-node process no longer has wallet functionality linked in) and strip out node and wallet initialization code from the BitcoinGuiInit class (so the bitcoin-gui class no longer has node and wallet functionality linked in. Then they add spawnProcess calls in appropriate places to start making cross-process calls. A good entry point into these two commits is to search for the new spawnProcess calls they add.

The other commits in this PR are needed for completeness but aren’t related to its core functionality. They are smaller and just update dependencies, logging, and test code.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What are some disadvantages of separating bitcoin node, wallet, and GUI into separate processes? What are some benefits?

  3. Did you try building with --enable-multiprocess ? And did you try running with debug=1 to view IPC logging?

  4. Why is it important that Cap’n Proto types and libraries are only accessed in the src/ipc/capnp/ directory, and not in any other public header files?

  5. What is the difference between node and wallet running on different processes vs different threads? Why must they communicate via an IPC framework instead of calling each other’s functions normally?

  6. Where are the spawnProcess calls added in the “Make bitcoin-gui spawn…” and “Make bitcoin-node spawn…” commits? What type of interface pointer is requested from each of the spawned processes?

Meeting Log

  117:00 <ryanofsky> #startmeeting
  217:00 <ryanofsky> Hi!
  317:00 <lightlike> hi
  417:00 <ryanofsky> Today going to review Multiprocess bitcoin https://bitcoincore.reviews/10102
  517:00 <b10c> hi
  617:00 <michaelfolkson> hi
  717:00 <svav> Hi
  817:01 <ryanofsky> Hi and first question is the usual who's here and did you did you review the PR? (Concept ACK, approach ACK, tested ACK, or NACK)
  917:02 <lightlike> concept ACK, looked at it but not too much in-depth
 1017:02 <b10c> I haven't managed to look closer, but concept ACK
 1117:02 <michaelfolkson> Yeah all the ACKs from me from what I've done so far
 1217:02 <ryanofsky> Also wondering if there were parts of the description or implementation that could be explained better? Things that were confusing?
 1317:02 <svav> I read the notes
 1417:03 <michaelfolkson> I guess this particular PR has been open for years and has hundreds of comments that are hard to read due to GitHub
 1517:04 <ryanofsky> Yeah, but if people mostly think purpose is clear and top description is clear I'm happy
 1617:05 <svav> One thing I would say is interfaces::WalletClient  and interface::Wallet are very similarly named, so you don't get any hint of what is different about them.
 1717:06 <ryanofsky> Right WalletClient is the interface the bitcoin-node uses to control bitcoin-wallet, Walelt is the interface bitcoin-gui uses to control bitcoin-wallet
 1817:06 <michaelfolkson> Just wasn't clear to me initially whether this was a PR covering the whole project or a prototype or whether it was just one PR in a series
 1917:06 <michaelfolkson> It is the latter but in that case I don't know how it has been open for years in that case :)
 2017:07 <ryanofsky> I wonder what a better name for WalletClient might be. Really it only exists for backwards compatability so bitcoin-node will continue to load wallets by default. More ideally bitcoin-wallet would just connect to node, I think
 2117:08 <ryanofsky> It's not the whole project, or a prototype. It's base cross-process functionality that features can be built on later
 2217:08 <svav> ryanofsky: I guess what I am saying is from the names, you can't tell that one is for use by the Node, and the other for use by the GUI
 2317:09 <michaelfolkson> But it needed all the previous PRs in the project to be merged first right? https://github.com/bitcoin/bitcoin/projects/10
 2417:09 <svav> ryanofsky: So, it might be something that could create confusion
 2517:09 <ryanofsky> svav, do you think interfaces::Wallet is not a good name for the interface used to control a Wallet? It seems to me interfaces::Wallet is a good name, but only WalletClient is the bad name
 2617:10 <ryanofsky> michaelfolkson, that's right it has a lot of dependencies that were merged previously
 2717:11 <ryanofsky> Q2 is What are some disadvantages of separating bitcoin node, wallet, and GUI into separate processes? What are some benefits?
 2817:11 <michaelfolkson> For what its worth I think the names are fine as well they are well documented (ideally both in the code and in the separate multiprocess doc)
 2917:12 <michaelfolkson> *as long as they are
 3017:12 <ryanofsky> That's good, always looking for ideas of better names and things that are missing from docs
 3117:13 <svav> Well, as a suggestion, you could have interfaces::WalletForNode and interface::WalletForGUI so they are differentiated. I'm just saying at the moment, they both seem very similar.
 3217:14 <lightlike> i'd say one disadvantage is that it's slower if things get serialized at one process, sent over a socket, deserialized by the another process
 3317:14 <svav> One disadvantage of separation might be there is more code to maintain.
 3417:14 <ryanofsky> Thanks svav! Yes ForXXX would be possible convention to use for interface names
 3517:15 <michaelfolkson> +1
 3617:15 <ryanofsky> lightlike, yes it adds more code, more dependencies, strictly worse performance
 3717:15 <michaelfolkson> Lots of work too :)
 3817:16 <ryanofsky> Well work is in the benefits column for me
 3917:16 <svav> The advantages as said in the notes are greater resilience, as node fault will not bring down wallet and vice versa.
 4017:16 <lightlike> advantages: there could be alternative implementations for parts, such as GUI or wallet; separating code like this makes it easier to be sure that parts that should be independent really are
 4117:17 <ryanofsky> Yeah main thing I like about it is it forces codebase to be more modular, and can let you run a node continuosly and connect wallets and gui instances to it as needed
 4217:17 <ryanofsky> Q3 Did you try building with `--enable-multiprocess` ? And did you try running with `-debug=1` to view IPC logging?
 4317:18 <ryanofsky> (and feel free to keep going on Q2 if more to say there)
 4417:19 <lightlike> yes, building and running on regtest worked fine for me.
 4517:19 <michaelfolkson> Q3 - Yes I ran on regtest and signet
 4617:19 <michaelfolkson> I wasn't sure what I'd see. Like should the wallet only communicate the node when the wallet wants to make a transaction or wants to know its balance
 4717:20 <michaelfolkson> But there was so many log messages it was also hard to see specific logs when you made a specific action
 4817:21 <michaelfolkson> I think Sjors said that in a comment too
 4917:21 <ryanofsky> michaelfolkson, yeah, there is a lot of crosstalk between the processes, especially when you use the GUI
 5017:22 <ryanofsky> It just shows all the places where gui, node, and wallet are notifying or polling each other
 5117:22 <ryanofsky> michaelfolkson, I wonder if this is unexpected, or you see it as a problem
 5217:23 <ryanofsky> To me purpose of IPC logging is just to debug IPC, but if you want to use it for other things then it could be a problem
 5317:24 <michaelfolkson> I guess in an idealized world there would be less cross talk and they would only talk when absolutely necessary or when they need something. But in reality we are dealing with Satoshi's inheritance so I'm not sure what the idealized state would be
 5417:25 <michaelfolkson> A oversimplification would be the wallet only hears from the node say every block? Or if the wallet wants to make a transaction or hasn't figured out the UTXOs it controls
 5517:26 <ryanofsky> michaelfolkson, very much agree. I think removing the most egregious polling is probably low hanging fruit for cleanup prs (independent of anything in 10102), and would also be good for improving gui performance
 5617:27 <ryanofsky> michaelfolkson, well the wallet also needs to know about every transaction added to the mempool to see if it's relevant
 5717:28 <michaelfolkson> ryanofsky: Right for zero conf transactions
 5817:28 <ryanofsky> Exactly, yes
 5917:29 <michaelfolkson> And the node has to ask the wallet if it is interested in every transaction as the node doesn't know any prior keys, addresses. I guess that's a lot of chatter in itself
 6017:30 <lightlike> are the objects that are exchanged displayed in Cap'n'Protos serialization format? I see lots of hex code in my logs.
 6117:30 <ryanofsky> Any object that can be sent in bitcoin's native serialization format is serialized that way, and displayed as hex
 6217:31 <ryanofsky> UniValue objects are sent as JSON. A few objects that don't have bitcoin serialize methods are serialized as capnproto structs
 6317:32 <ryanofsky> A good example of the last case is NodeStats https://github.com/ryanofsky/bitcoin/blob/pr/ipc.168/src/ipc/capnp/node.capnp#L139
 6417:33 <ryanofsky> Q4 is semirelated
 6517:33 <ryanofsky> Q4 Why is it important that Cap'n Proto types and libraries are only accessed in the `src/ipc/capnp/` directory, and not in any other public header files?
 6617:35 <michaelfolkson> Don't know... so it doesn't impact non-multiprocess users?
 6717:35 <lightlike> 1) its better to keep it in one place so we can easily add alternatives to capnproto, 2) to be able to still build bitcoind without having capnproto installed
 6817:37 <ryanofsky> Yes to both. Goal is to make it possible to run without multiprocess code, and make the dependency confined and easier to replace if needed
 6917:38 <ryanofsky> Q5 What is the difference between node and wallet running on different processes vs different threads? Why must they communicate via an IPC framework instead of calling each other's functions normally?
 7017:38 <lightlike> are there plans of moving towards multi-process only in the long term?
 7117:40 <michaelfolkson> That would require https://github.com/chaincodelabs/libmultiprocess in the Core repo right?
 7217:40 <ryanofsky> What would advantages / disadvantages be?
 7317:41 <michaelfolkson> So threads run in shared memory space whereas processes run in separate memory spaces https://stackoverflow.com/questions/200469/what-is-the-difference-between-a-process-and-a-thread
 7417:41 <ryanofsky> michaelfolkson, that's probably true, but I think kind of a project management question more than a technical one
 7517:41 <lightlike> Maybe it reduces code complexity not to have to maintain code for both options? But I'm not sure about specifics.
 7617:41 <ryanofsky> lightlike, yeah I think that's the way I see it
 7717:42 <michaelfolkson> If there are trade-offs you kinda need to support both
 7817:42 <ryanofsky> No reason to require multiprocess unless you are really going to separate code into different projects
 7917:43 <lightlike> But I guess if the loss of performance is significant, it's not really an option anyway?
 8017:43 <ryanofsky> Right I'm also assuming supporting multiprocess + single process with everything in one repo is not more work than just supporting multiprocess
 8117:44 <michaelfolkson> For the security benefits of separation you want separate processes rather than separate threads
 8217:44 <svav> If you have multi-process, why do you have to keep single process?
 8317:44 <lightlike> did you try IBD with multiprocess and compare the performance?
 8417:45 <ryanofsky> lightlike, that would be a good experiment. I think there should be no change in performance really unless the wallet is somehow slowing down IBD
 8517:45 <michaelfolkson> Have to add Core multiprocess to Jameson Lopp's node performance reviews :)
 8617:46 <ryanofsky> svav, you don't have to keep single process, but dropping support for single process removes very little code
 8717:46 <michaelfolkson> https://blog.lopp.net/2020-bitcoin-node-performance-tests/
 8817:46 <ryanofsky> 10102 almost strictly adds code and changes very little existing code
 8917:48 <michaelfolkson> svav: If users like the trade-offs offered by single process rather than multiprocess you shouldn't take it away (assuming the trade-offs are significant)
 9017:48 <ryanofsky> In general on performance, I think most of the loss is caused by unnecessary cross talk, not necessary overhead
 9117:48 <lightlike> if not IBD, are there any specific things/user actions where you would expect a notable performance loss?
 9217:49 <ryanofsky> lightlike, that's a good question. I guess I'd expect performance loss in random weird places
 9317:50 <ryanofsky> Especially the GUI does a lot of weird things querying a lot of unnecessary information from the wallet
 9417:50 <ryanofsky> I wouldn't be surprised if IBD was slow, or if some other random thing was slow
 9517:52 <ryanofsky> Q6 is a code question
 9617:52 <ryanofsky> Q6 Where are the `spawnProcess` calls added in the "Make bitcoin-gui spawn..." and "Make bitcoin-node spawn..." commits? What type of interface pointer is requested from each of the spawned processes?
 9717:53 <ryanofsky> https://github.com/bitcoin/bitcoin/pull/10102/commits/11df1f8701f4f693b0c2fbdc0000649a625ec150
 9817:53 <ryanofsky> https://github.com/bitcoin/bitcoin/pull/10102/commits/188de5680348dfb6993d4f8c0a43437eb1436ffd
 9917:54 <lightlike> wallet/init.cpp the wallet is spawned when it didn't already exist
10017:54 <ryanofsky> lightlike, yes exactly
10117:54 <lightlike> I think the requested pointer is std::unique_ptr<interfaces::Init>
10217:54 <ryanofsky> Yes!
10317:55 <lightlike> will that wallet init code try something if we compile without a wallet?
10417:55 <ryanofsky> If you are running `bitcoind` executable `auto wallet_client = node.init->makeWalletClient(*node.chain);` is not null, so no need to call spawnProcess
10517:56 <ryanofsky> lightlike, if compiled with --disable-wallet, this code isn't compiled at all, and src/dummywallet.cpp is used instead
10617:56 <lightlike> ah, thanks!
10717:56 <michaelfolkson> So --enable-multiprocess and --disable-wallet configure options
10817:56 <ryanofsky> If you are running `bitcoin-node` `wallet_client` is null, so call to spawnProcess is made instead
10917:57 <ryanofsky> The idea is bitcoind has wallet code linked in, doesn't need to spawn anything. bitcoin-node doesn't have wallet code linked in so needs to spawn a bitcoin-wallet process
11017:58 <ryanofsky> Regardless of which happens, the node code calls wallet code with an interfaces::WalletClient pointer
11117:58 <ryanofsky> But in one case the calls do IPC communicate, in the other case IPC is skipped and everything happen locally
11218:00 <lightlike> That makes sense!
11318:00 <ryanofsky> Hours up but feel free to ask me any questions. I learned I need to actually do something about the WalletClient name (this complaint has been made before!)
11418:01 <lightlike> thanks ryanofsky, that was really interesting!
11518:01 <ryanofsky> Thanks you for putting in work to understand this and bringing up great points!
11618:02 <svav> Thanks ryanofsky and all
11718:02 <ryanofsky> Maybe WalletLoader would be a good name instead of WalletClient...
11818:03 <michaelfolkson> One final question. Could you re-introduce non security critical shared state between the wallet and node just for say addresses that the wallet is tracking? To reduce the chatter?
11918:04 <michaelfolkson> I get it is going back in the opposite direction (!) but just wondering if there is a grey area that is optimal
12018:04 <ryanofsky> michaelfolkson, yes that would be possible and it could reduce amount of traffic substantially
12118:04 <lightlike> michaelfolkson: you mean as an alternative way enabled only for non-multiprocess builds? wouldn't otherwise the multiprocess build be impossible?
12218:05 <ryanofsky> lightlike, probably you would only do this for multiprocess builds and leave single process case alone
12318:06 <michaelfolkson> Would a multiprocess build with some limited shared state between multiple processes be impossible?
12418:06 <ryanofsky> wallet code would tell node code what addresses it cares about
12518:07 <lightlike> isn't one idea of the mulitprocess build that it could also apply in situations where sharing state in other ways would be physically impossible (like different processes running on different computers)?
12618:07 <michaelfolkson> The shared state could be stored on both machines?!
12718:07 <ryanofsky> Easiest way to implement shared state would be to have it reside inside the bitcoin-node process. Of course it would be technically possible to have it somewhere else too though
12818:08 <ryanofsky> lightlike, I think I'm just talking about "shared state" at a high level, like what information each process knows about. Not shared memory literally
12918:09 <lightlike> oh ok, understood, i thought about shared memory.
13018:10 <ryanofsky> Yeah, I would avoid using shared memory / shared files by default unless there was some big advantage to using that because it adds many limitations
13118:13 <michaelfolkson> The wallet can't do anything without the node so it is not like the wallet process can continue doing useful stuff if the node goes down. So the wallet giving the node a bit of extra information that isn't strictly node information could make sense
13218:13 <michaelfolkson> Anyway, really interesting. Many thanks ryanofsky (for all your work on this)
13318:14 <ryanofsky> Thank you!
13418:15 <michaelfolkson> There's some interesting process rearchitecture going on in c-lightning too. I'm also struggling to understand that :)
13518:15 <michaelfolkson> https://btctranscripts.com/c-lightning/2021-11-29-developer-call/#rearchitecting-c-lightning-daemons
13618:18 <ryanofsky> Thanks for pointers, interesting to know about ibd and clightning stuff
13718:19 <michaelfolkson> Although that is daemons vs processes
13818:19 * michaelfolkson looks up definitions
13918:19 <sipa> a daemon is a background process
14018:19 <sipa> the "d" in bitcoind stands for daemon
14118:20 <michaelfolkson> Right but the foreground, background thing isn't clear to me
14218:20 <sipa> like: you don't see it running
14318:21 <sipa> you start it, it keeps running in the background
14418:21 <sipa> but it doesn't show a GUI, or occupy your terminal
14518:23 <lightlike> a bit weird though that bitcoind needs an additional "-daemon" argument in order to actually be a daemon :)
14618:23 <sipa> True!
14718:23 <michaelfolkson> Ha
14818:48 <ryanofsky> FWIW, made pr to rename WalletClient https://github.com/bitcoin/bitcoin/pull/23842