Add importmempool RPC (rpc/rest/zmq)

Host: larryruane  -  PR author: MarcoFalke

The PR branch HEAD was fab8b370257c7770abc32649ad4940eefc512f44 at the time of this review club meeting.


  • The mempool is the list of unconfirmed (pending) transactions. (Rabbit hole warning:) More information on the mempool can be found here

  • Initially, the mempool was stored only in memory, as its name implies. PR 8448 implemented persisting the mempool to disk so that its entries are available after a restart. This PR was merged in v0.14.0.

  • The mempool.dat file, located in the datadir, is a binary file in a custom (Bitcoin Core-specific) format, making it difficult to edit manually.

  • The entire mempool is kept in memory; it is not just a cached subset of a larger data structure.

  • The mempool is flushed to disk when the node shuts down, and also when requested using the savemempool RPC.

  • The -maxmempool configuration option sets the maximum mempool size in megabytes; the default is 300 MB.

  • Specifying the -blocksonly configuration option reduces the -maxmempool default to 5 MB.

  • The getmempoolinfo RPC shows a summary of the local mempool.

  • The getrawmempool RPC displays the full contents of the local mempool.

  • Another way to modify your node’s mempool is using the peer-to-peer network. BIP35 introduced the NetMsgType::MEMPOOL P2P message, which allows a node to request the contents of a peer’s mempool, although this message has mostly fallen out of use; there is a pull request (currently draft) to remove it.

  • This PR adds a new RPC, importmempool, to add the transactions in a given mempool.dat file to the existing mempool.
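As a rough illustration of how the new RPC might be invoked, the sketch below only builds a JSON-RPC request body and does not contact a node. The three option names come from the questions in these notes, and the meeting log indicates that use_current_time defaults to true. The argument layout (a file path followed by an options object), the other two defaults, and the helper name build_importmempool_request are assumptions; check `bitcoin-cli help importmempool` on a build that includes this PR.

```python
import json


def build_importmempool_request(filepath,
                                use_current_time=True,
                                apply_fee_delta_priority=False,
                                apply_unbroadcast_set=False):
    """Return a JSON-RPC request body for importmempool (hypothetical helper).

    Assumption: the RPC takes a file path followed by an options object
    holding the three booleans named in the review questions.
    """
    options = {
        "use_current_time": use_current_time,
        "apply_fee_delta_priority": apply_fee_delta_priority,
        "apply_unbroadcast_set": apply_unbroadcast_set,
    }
    return {
        "jsonrpc": "1.0",
        "id": "importmempool-example",
        "method": "importmempool",
        "params": [filepath, options],
    }


# Placeholder path; nothing is sent over the network.
request = build_importmempool_request("/path/to/mempool.dat")
print(json.dumps(request, indent=2))
```

The resulting dictionary could be posted to a node's RPC port with any HTTP client; authentication and transport are left out of this sketch.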


  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. What are the advantages of persisting the mempool to disk?

  3. Briefly, in your own words, what does this PR do, what would be some of its use cases, and what problems does it solve?

  4. How large is the mainnet mempool.dat file on your system? Does this size differ significantly from the -maxmempool setting? If so, why?

  5. What happens if the imported mempool file contains transactions that are already in the mempool?

  6. What happens if the -maxmempool configuration value is too small to accommodate the imported file?

  7. The RPC arguments include three boolean options:
    • use_current_time
    • apply_fee_delta_priority
    • apply_unbroadcast_set

    What does each of these do, and why are they options to this RPC?

  8. The second commit adds the ImportMempoolOptions struct. What is its purpose?

  9. The PR description states that it’s possible to copy an existing mempool.dat file between two data directories. Does this work even if the architectures are different (for example, 32-bit versus 64-bit, big-endian versus little-endian)?

  10. What are these calls to Ensure*() doing? (These occur in many RPC handlers.)

  11. What does the “1” mean here?
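As background for question 4, one reason raised in the meeting log for the gap between in-memory and serialized size is alignment padding: a struct holding one byte plus an 8-byte integer serializes to 9 bytes but typically occupies 16 bytes in memory. The sketch below illustrates this with Python's struct module, not with Bitcoin Core's own serialization code, and the native size it prints depends on the platform.

```python
import struct

# "<" packs fields back to back with no padding, like a serialized stream.
packed_size = struct.calcsize("<BQ")   # 1-byte field + 8-byte field

# "@" uses the platform's native alignment, like a C/C++ struct in memory.
native_size = struct.calcsize("@BQ")

print(f"serialized: {packed_size} bytes")
print(f"in memory:  {native_size} bytes (platform dependent)")
print(f"padding:    {native_size - packed_size} bytes")
```

Padding is only one contributor; the meeting log also mentions index overhead from the lookup structures the mempool maintains.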

Meeting Log

17:00 <LarryRuane> #startmeeting
17:00 <michaelfolkson> hi
17:00 <LarryRuane> Hi everyone, welcome! Today we'll be discussing
17:00 <abubakarsadiq> hi
17:00 <LarryRuane> Feel free to say hi to let everyone know you're here
17:00 <svanstaa> hi
17:01 <LarryRuane> any review club first-timers here?
17:01 <michaelfolkson> Quiet today
17:01 <ccdle12> hi
17:01 <effexzi> Hi every1
17:01 <turkycat> hello everyone
17:02 <LarryRuane> yes, I think there's some meeting going on that many of the usuals are busy with
17:03 <LarryRuane> Today's PR is pretty simple, so I added a few notes items that aren't directly related to the PR, but just for discussion, background, and learning
17:03 <LarryRuane> any questions about the notes, or is there anything I got wrong, or you'd like to expand on?
17:04 <LarryRuane> I'm not an expert on the mempool, so I may have made some mistakes :)
17:05 <michaelfolkson> No first thought was what people will use this for but that's question 3
17:05 <LarryRuane> One thing I'd like to make special mention of is the link provided in the first note: ... I wasn't aware of that myself until putting together these notes
17:05 <LarryRuane> It seems to be a nice way to search the mailing list discussions!
17:06 <LarryRuane> Yes, Michael, we can discuss that right now, what are the use cases for this proposed feature?
17:07 <michaelfolkson> No sorry, you can keep to the order :)
17:07 <LarryRuane> Or actually, first, what is being proposed here, what is the feature?
17:07 <LarryRuane> haha okay, let's first ask, did anyone have a chance to review the PR?
17:07 <svanstaa> importting mempool.dat via RPC as opposed to copying it into the .bitcoin dir
17:08 <svanstaa> yes
17:08 <abubakarsadiq> tested ACK the PR
17:08 <svanstaa> built and ran the tests
17:08 <LarryRuane> svanstaa: good! how does one use the RPC interface to do this, what exactly is the interface?
17:09 <LarryRuane> I guess I'm asking, what are the arguments to this new RPC?
17:09 <abubakarsadiq> this PR add rpc call for importing transactions in a mempool.dat file into a node mempool
17:09 <svanstaa> bitcoin-cli  importmempool path/mempool.dat
17:10 <michaelfolkson> svanstaa: On running the tests
17:10 <turkycat> no, didn't have time to review this week.
17:11 <LarryRuane> svanstaa: thanks for that link, I hadn't seen that, looks very helpful
17:11 <svanstaa> not sure about the meaning of the boolean arguments though
17:11 <michaelfolkson> svanstaa: Basically try to go a little further and fiddle around with the relevant tests a bit. Just running them can be of limited value. But doesn't hurt to obvs
17:12 <LarryRuane> maybe we can take a little diversion, why should reviewers test a PR when CI is already doing so?
17:12 <svanstaa> @Larry it was Michael who posted the link, not me
17:14 <michaelfolkson> It can be helpful on strange OSes, strange hardware if not overlapping with CI
17:14 <LarryRuane> yes, good answer by michael, his SE answer kind of covers what I was asking
17:15 <LarryRuane> also as he says there, it's good to modify the tests slightly if you have enough understanding to do so, and that may uncover new problems
17:15 <michaelfolkson> Yup. Especially as tests are changed in this PR
17:15 <LarryRuane> I also like running the tests in debuggers (both on the python test and on bitcoind itself) and look around at various points along the execution of the tests, to see if things are as expected by my understanding
17:16 <LarryRuane> michaelfolkson: +1
17:16 <LarryRuane> let's go back in history a ways, question 2, What are the advantages of persisting the mempool to disk?
17:17 <abubakarsadiq> to recover the mempool after restart
17:17 <LarryRuane> abubakarsadiq: yes, and why is that useful?
17:19 <LarryRuane> actually an even more basic question (i remember wondering this myself when first getting started), why do full nodes even need a mempool *if they're not mining*?
17:20 <LarryRuane> it's pretty obvious that miners need a mempool (to assumble a non-empty block so they can get the fees), but why do non-mining nodes want to maintain a mempool? there is some cost, after all
17:21 <turkycat> so that nodes can verify the transactions in a block. a single transaction doesn't contain key info like the amount of the input being spent or the scriptPubkey for that input. each node can independently verify that an input isn't a double spen
17:21 <abubakarsadiq> I might be wrong, since the received broadcasted transaction why not keep it, not to verify it twice when the received new blocks, some transactions in the block might be in their mempool
17:22 <michaelfolkson> Some argue they don't need to :) But more efficient if they have already verified all the transactions in a block before they receive details of a mined block
17:22 <LarryRuane> turkycat: I don't think that's correct, because there's a separate "coins" database that all full nodes maintain, and that's independent of the mempool, and the coins db is how double-spending is detected
17:22 <AlexWiederin> Agree with abubakarsadiq! Probably also reduces the "noise" of transactions going around if only transactions that are not in the mempool are forwarded to peers
17:23 <AlexWiederin> But not entirely sure
17:23 <LarryRuane> michaelfolkson: yes, that's a great reason ... there's a feature called compact blocks, and the reason they're compact is it's assumed that the receiver of the compact block has already seen and verified the "missing" transactions because they're in the mempool
17:24 <LarryRuane> another reason to have a mempool even if you're not mining is fee estimation ... if your own node is contructing a transaction, it needs to decide on a competitive fee
17:25 <LarryRuane> you don't want to either underpay or overpay ... the mempool helps a lot with that
17:25 <turkycat> I didn't know about the coins db, where is that located? (bit of an aside)
17:26 <turkycat> I thought there was only blocks, rev, chainstate, and indexes (including txindex if enabled)
17:26 <LarryRuane> abubakarsadiq: yes, there's a script verification "cache" that prevents us from having to re-verify transactions that we've already verified (i don't know much detail on that)
17:26 <LarryRuane> turkycat: it's in the `chainstate` subdirectory of the data directory
17:27 <turkycat> thanks
17:27 <LarryRuane> oh you mentioned it already, ... it's not the most intuitively named!
17:28 <LarryRuane> also i think having mempools within full nodes helps with transaction relay, as @AlexWiederin said (i think)
17:28 <LarryRuane> so if we don't persist the mempool to disk, then when we restart, we'd have the problems we just mentioned, because we have forgotten all about the mempool
17:29 <abubakarsadiq> why does it require upto 300mb as default
17:29 <LarryRuane> michaelfolkson: you may know this, correct me if i'm wrong, but maintaining a mempool also lets us construct transactions that use unconfirmed (mempool) transactions as inputs
17:30 <LarryRuane> abubakarsadiq: that's a great question! I have an idea, but anyone else want to answer that?
17:31 <michaelfolkson> LarryRuane: Huh yeah hadn't thought of that. If the parent transaction wasn't created by our wallet I guess. The wallet would trust transactions it itself had constructed
17:31 <LarryRuane> abubakarsadiq: yes the 300mb mempool size, my first impression is it seems kind of small by today's hardware standards, doesn't it?
17:32 <abubakarsadiq> for me it's kind of too much
17:32 <LarryRuane> that 300mb is *memory* size, by the way, not just the sum of the transactions as they appear on the wire or on disk ... the memory size is quite a bit larger, anyone know why?
17:32 <LarryRuane> *sum of the transactions SIZES i should have said
17:33 <svanstaa> because deserialized transactions take up more space
17:35 <LarryRuane> svanstaa: yes exactly! in C and C++, `struct` variables often have "holes" in them because of alignment requirements.. so for example, if an object is a byte plus an 8-byte integer, that serialzes to a 9-byte stream ...
17:35 <svanstaa> abubakarsadiq 300MB is less than 100 blocks... it happens that it gets filled up completely
17:36 <LarryRuane> but when stored in memory, the struct is padded out to the "alignment" of the struct, which in this case would be 8 bytes, so that struct would need 16 bytes in memory
17:37 <svanstaa> Is there a reaon why it is 300MB? Or just an arbitrary value? Would it hurt to increase the default value?
17:37 <LarryRuane> svanstaa: yes, that 300mb translates to i think around 150mb of transactions, which is somewhere around 100 blocks as you say, maybe a little less
17:38 <LarryRuane> so if we go with a rough estimate of 100 blocks worth of transactions, that's a LOT of blocks, the next block will be constructed from around the top 1% (by fee) of the mempool
17:39 <LarryRuane> in other words, 300mb is a lot from that perspective (which is what @abubakarsadiq said too)
17:39 <michaelfolkson> Just a default obvs, the user can lower it
17:40 <LarryRuane> svanstaa: i think the 300mb is a balance between nodes having a mempool that is pretty similar to miners' mempools, and also being a size that most nodes (even on a raspberry pi) can do
17:40 <LarryRuane> miners have an incentive to have a larger mempool, can anyone say why?
17:40 <yashraj> why does -blocksonly lower it all the way to 5 MB?
17:41 <AlexWiederin> LarryRuane to pick the one with the best fees?
17:41 <AlexWiederin> *pick the ones
17:41 <svanstaa> LarryRuane larger pool to pick high fee tranascations from
17:41 <svanstaa> *transactions
17:41 <LarryRuane> AlexWiederin: yes but, even with a 300MB mempool, they're only taking roughly the top 1%, so that will be the same even with a 500mb mempool
17:42 <LarryRuane> i think the reason may be (or at least one i thought of) is in case there are no new transactions being generated for an extended period of time, just a dropoff in demand...
17:43 <svanstaa> so what happens if my mempool is full, and I see another (high  fee) transaction incoming, would my node replace one of the txns currently in the mempool?
17:43 <michaelfolkson> yashraj: Blocksonly doesn't participate in transaction relay
17:43 <LarryRuane> the mempool will slowly shrink as miners produce blocks ... so a miner would hate to completely run out of transactions to include in the block, because they'd like to get at least SOME fees
17:44 <LarryRuane> michaelfolkson: thanks, that's a very good one!
17:44 <yashraj> ah crap, thanks michael
17:45 <michaelfolkson> yashraj: So yeah just verifying blocks right Larry? Not maintaining a mempool?
17:45 <LarryRuane> block explorer nodes, by the way, usually run a larger mempool, right now shows that the mempool is 279 out of 300mb
17:45 <michaelfolkson> If no mempool only needs something minimal like 5MB
17:45 <LarryRuane> but i've seen recently where the mempool size is greater than 300, it might say 400 / 300
17:46 <LarryRuane> the way that explorer knows that is by running a larger mempool
17:47 <LarryRuane> because of the inscription stuff going on recently, the mempool has exceeded 300mb (for those nodes that configured a higher value, obviously)
17:47 <LarryRuane> what happens if a node receives more transactions that it can fit in its mempool?
17:47 <LarryRuane> *than it can fit
17:48 <svanstaa> yeah that was my question. Does it kick out lower fee txns in favour of higher fee ones?
17:48 <yashraj> kick lower fees ones?
17:48 <LarryRuane> yashraj: yes, exactly... technically, the lowest *feerate* transactions are dropped
17:49 <svanstaa> that way 300MB should be completely sufficient
17:49 <LarryRuane> and actually tells you the min feerate needed to stay in a 300mb mempool
17:50 <yashraj> of course, thanks.
17:50 <michaelfolkson> I think package relay would incentivize the need for larger default mempools if your machine can handle it, not resource constrained
17:50 <abubakarsadiq> a basic question, what will make a transaction to be dropped from the mempool
17:51 <michaelfolkson> Ideally you'd keep those low fee transactions around rather than booting them
17:51 <svanstaa> are we going to talk about the PR a little more? I was wondering about the meaning of these options: use_current_time
17:51 <svanstaa> apply_fee_delta_priority
17:51 <svanstaa> apply_unbroadcast_set
17:51 <LarryRuane> @michaelfolkson pointed out that the mempool size is configurable, but there's an advantage to leaving it default (even if you have a lot of memory), which is that your mempool will be similar to that of other nodes, and that makes fee estimation more accurate, and tx relay more efficient
17:51 <LarryRuane> svanstaa: yes, sorry! all these sidetracks ... anyone want to say what those options are for?
17:52 <LarryRuane> i can answer the first one, the mempool records the time each tx entered, because when a tx gets to be 2 weeks old, it gets dropped, no matter what its feerate is
17:52 <LarryRuane> (and even if there's room to keep it)
17:53 <LarryRuane> so when you're importing a mempool, do you want to reset the tx entrance times to the current time? or use the times stored in the imported mempool.dat?
17:53 <LarryRuane> i think the author of the PR wanted to let the user decide
17:53 <abubakarsadiq> i dont know about use_current_time but apply_fee_rate option to true means the transactions will be prioritize based delta fee, while importing transactions with high delta fee rates are prioritize
17:53 <LarryRuane> (in some situations one might be better than the other, or opposite)
17:54 <svanstaa> so default is true:  use the current system time
17:54 <svanstaa> which means the txns are made to appear like they have just been broadcasted?
17:55 <LarryRuane> so there's an RPC to artifically change a mempool tx's feerate, `prioritisetransaction`
17:55 <LarryRuane> svanstaa: yes ... so it's as if they've been been relayed to us over the p2p network
17:56 <svanstaa> and this would 'extend' the expiry date of two weeks. In which circumstances is this favourable?
17:56 <abubakarsadiq> apply_unbroadcast_set option to true, while importing unbroadcasted transactions will be added to ubroadcast set, if it's false it will not be added
17:56 <LarryRuane> this `prioritisetransaction` value (per-tx) is also stored in `mempool.dat`, so do we want to import those from the file? or let them be zero?
17:56 <svanstaa> why extend their lifetime?
17:57 <LarryRuane> svanstaa: i don't know, guess it depends on exactly why you're importing transactions in the first place
17:57 <LarryRuane> abubakarsadiq: yes ... what are unbroadcast transactions, anyone?
17:58 <svanstaa> feels like you are pouring zombie transactions into the network, but thats just my gut feeling :)
17:58 <svanstaa> got to ask Marco about it
17:58 <LarryRuane> yes you could ask on the PR
17:58 <LarryRuane> why is `use_current_time` defaulting to true?
17:59 <michaelfolkson> We didn't answer the use case question yet right? I'm assuming testing, I can't think of why a user would want to do this
17:59 <yashraj> maybe i kicked low-fee-rate txs coz memool was full, now it's empty so give them another chance?
18:00 <LarryRuane> michaelfolkson: thanks, good point.. I think it may be if you're an enterprise and want to spin up a new node, and make it effective ASAP... you could copy a mempool.dat from one of your existing nodes
18:00 <abubakarsadiq> Unbroadcast transactions are transactions that have been created and signed but not yet broadcasted to the network, how do they get to the mempool?
18:00 <LarryRuane> welp, guess that's all we have time for
18:00 <LarryRuane> #endmeeting
18:00 <michaelfolkson> LarryRuane: Ok, thanks
18:00 <LarryRuane> but feel free to stick around (i will) to keep discussing
18:00 <svanstaa> thanks, everyone :)
18:00 <LarryRuane> sorry we didn't have time to get through all the questions
18:01 <yashraj> great stuff thanks :larryruane I loved the mini-detours
18:01 <AlexWiederin> Thanks all
18:02 <abubakarsadiq> larryRuane:thanks
18:02 <yashraj> thanks :michael gotta read that SE answer again
18:02 <LarryRuane> abubakarsadiq: unbroadcast has a very specific meaning, see this answer by @michaelfolkson
18:02 <michaelfolkson> yashraj: Sure don't expect you to be able to read it during the meeting :)
18:03 <michaelfolkson> (For later)
18:03 <LarryRuane> yes @michaelfolkson does an amazing job on stackexchange!
18:03 <michaelfolkson> Maybe if your mempool was corrupted you'd want to import a different mempool from another of your nodes. Can't imagine that happens too much
18:04 <michaelfolkson> LarryRuane: Ha thanks
18:04 <michaelfolkson> I'll ask on the PR, people on the PR seem to want it
18:04 <LarryRuane> michaelfolkson: regarding your SE answer there about unbroadcast ... is that concept only applicable to transactions that WE originate?
18:04 <LarryRuane> michaelfolkson: +1
18:06 <michaelfolkson> LarryRuane: Yeah I think so. You don't care really if it isn't your transaction
18:06 <michaelfolkson> Not your responsibility. But if your transaction isn't propagating that is your problem
18:06 <abubakarsadiq> thanks for the link larryRuane
18:06 <LarryRuane> got it, that makes sense.. if we received the tx on the p2p network (not one we're originating), then we know that it has been relayed (at least to us)
18:07 <michaelfolkson> Yeah "initial broadcast"
18:07 <michaelfolkson> LarryRuane: Right
18:08 <abubakarsadiq> +1 michealfolkson: i understand
18:09 <LarryRuane> what about question 4: How large is the mainnet mempool.dat file on your system? Does this size differ significantly from the -maxmempool setting? If so, why?
18:09 <LarryRuane> on my system, it's only 89mb
18:10 <abubakarsadiq> I did not check that on mine
18:10 <yashraj> checking
18:10 <abubakarsadiq> currently on signet for testing purposes
18:11 <LarryRuane> which is only only about 29% of 300mb ... so most of the 300mb is indeed deserialization overhead, and also (forgot to mention this earlier), index overhead (because lots of fast lookups are needed for mempool items)
18:11 <LarryRuane> there's a complicated map (container) that lets you look up mempool txes in various ways, such as largest feerate
18:12 <yashraj> wtf is happening with mine? node window says 277 MB...the file itself only 131 KB
18:13 <LarryRuane> oh wait sorry, on that system (with the 89mb mempool.dat), the configuration is: `maxmempool=250`
18:13 <LarryRuane> (this is a raspi system, so not a lot of memory, they reduce it slightly)
18:14 <LarryRuane> so the mempool.dat file size is 35% of 300mb (not 29% as i said earlier)
18:15 <LarryRuane> yashraj: what is your `maxmempool`?
18:15 <LarryRuane> is that what "node window" means? i'm not familiar with "node window"
18:16 <yashraj> haha when I press cmd+I in gui
18:16 <yashraj> Information tab on that window
18:16 <LarryRuane> oh i almost never use the gui, i should try that
18:16 <LarryRuane> you can look for `maxmempool` in your bitcoin.conf file, what does it say?
18:17 <yashraj> it's not set in my conf default?
18:18 <abubakarsadiq> if it's not set then it's probably the default
18:18 <LarryRuane> yes, then default, 300
18:18 <LarryRuane> 141kb is incredibly small!
18:18 <LarryRuane> is that mainnet?
18:18 <LarryRuane> if it's regtest or testnet or signet, then it might make sense
18:19 <yashraj> mainnet...block height shows 787114
18:20 <yashraj> is my OS counting it differently?
18:20 <LarryRuane> are you running `ls -l` on the mempool.dat file?
18:21 <LarryRuane> (to get that number, 141kb)
18:23 <yashraj> was doing it from Finder but running ls -l gives similar value
18:25 <LarryRuane> try `bitcoin-cli getmempoolinfo`, what does that say?
18:28 <yashraj> RPC tells me "size": 129831, "usage": 274325152, "maxmempool": 3000000
18:29 <yashraj> not sure what size means
18:34 <LarryRuane> i think size is the number of transactions
18:34 <LarryRuane> is there a "bytes" value shown?
18:34 <yashraj> "bytes" : 47345422
18:36 <LarryRuane> i think that's large your mempool is configured to be, that's even greater than the default?
18:37 <yashraj> what does that mean? maxmempool is 300 usage is ~270 what is bytes?
18:38 <LarryRuane> oh wait here's what we should look at:
18:38 <yashraj> also found this:
18:39 <LarryRuane> good find, that's easier to read!
18:39 <yashraj> {RPCResult::Type::NUM, "bytes", "Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted"}, so witness stuff is causing this?
18:40 <yashraj> mismatch is count
18:41 <LarryRuane> your "usage" seems about right.. (274 mb or so)
18:42 <LarryRuane> the in-memory mempool doesn't get flushed out until you shut down the node, can you do a clean shutdown and see if the file size increases?
18:43 <yashraj> bingo, now shows 94 MB
18:44 <yashraj> had this qn for the past half hour that why're we looking at mempool.dat if the mempool is in memory?
18:44 <LarryRuane> cool! so maybe when your node started the previous time, it wasn't fully synced with the blockchain?
18:45 <yashraj> how did you get the right number straight away?
18:45 <LarryRuane> my node was already synced
18:45 <yashraj> mine was synced too
18:45 <LarryRuane> (the most recent time it shutdown)
18:45 <LarryRuane> hmm i don't know then what's going on
18:45 <yashraj> oh
18:47 <LarryRuane> actually i'm not sure if the mempool gets flushed out to disk (mempool.dat) other than shutdown... I've been looking at the coins db recently (the chainstate), and i may be getting these two mixed up in my mind
18:47 <LarryRuane> the coins db doesn't get flushed except during shutdown, that i'm sure of
18:48 <yashraj> hmm, right when I shut down the node I saw a temp file like or something for a sec then the file size bumped to 94MB
18:53 <LarryRuane> that makes sense, it writes a new file and then renames it, in case the process (or entire system) crashes halfway through the writes
18:53 <LarryRuane> the way it's done prevents file corruption
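
The write-then-rename pattern described at the end of the log can be sketched in a few lines. This is a generic illustration in Python, not Bitcoin Core's implementation; the function name atomic_write, the temporary file naming, and the example file name are all hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to path so readers never observe a half-written file.

    The bytes go to a temporary file in the same directory, are synced to
    disk, and only then is the temporary file renamed over the target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # make sure the bytes reach the disk
        os.replace(tmp_path, path)    # swap the new file into place
    except BaseException:
        # On failure, clean up the temporary file; the old target is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


# Demonstration in a throwaway directory with a hypothetical file name.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "example.dat")
    atomic_write(target, b"first version")
    atomic_write(target, b"second version")
    with open(target, "rb") as f:
        content = f.read()
    leftovers = os.listdir(d)

print(content, leftovers)
```

If the process dies before the rename, the previous file is still intact, which matches the brief appearance of a temporary file during shutdown that was observed in the log.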