The mempool is the list of unconfirmed (pending) transactions.
(Rabbit hole warning:) More information on the mempool can be found
here
Initially, the mempool was stored only in memory, as its name implies.
PR 8448
implemented persisting the mempool to disk so that its entries are
available after a restart. This PR was merged in
v0.14.0.
The mempool.dat file, located in the datadir, is a binary file in a
proprietary format, making it difficult to edit it manually.
The entire mempool is kept in memory; it is not just a cached subset
of a larger data structure.
The mempool is flushed to disk when the node shuts down,
and also when requested using the
savemempool
RPC.
The -maxmempool
configuration option sets the maximum mempool size; the default is 300 MB.
Specifying the -blocksonly configuration option
reduces
the -maxmempool default to 5 MB.
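As a rough illustration of how these two defaults interact, here is a minimal Python sketch. The function and constant names are hypothetical (they are not Bitcoin Core identifiers), and the sketch assumes that an explicitly supplied -maxmempool value takes precedence over either default.

```python
# Hypothetical names for illustration; the values come from the notes above.
DEFAULT_MAX_MEMPOOL_MB = 300     # default -maxmempool
BLOCKSONLY_MAX_MEMPOOL_MB = 5    # default when -blocksonly is set


def effective_maxmempool_mb(maxmempool=None, blocksonly=False):
    """Return the mempool limit in MB.

    Assumption: an explicit -maxmempool value overrides both defaults,
    and -blocksonly only changes the default.
    """
    if maxmempool is not None:
        return maxmempool
    return BLOCKSONLY_MAX_MEMPOOL_MB if blocksonly else DEFAULT_MAX_MEMPOOL_MB


print(effective_maxmempool_mb())                  # plain node
print(effective_maxmempool_mb(blocksonly=True))   # -blocksonly node
```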
The getmempoolinfo
RPC shows a summary of the local mempool.
The getrawmempool
RPC displays the full contents of the local mempool.
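For readers who want to try these two RPCs from a script, the following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the URL and credentials in the usage comment are assumptions that must match your own node's RPC settings.

```python
import base64
import json
import urllib.request


def build_rpc_request(method, params=None, request_id="mempool-notes"):
    """Encode a JSON-RPC request body for the given method."""
    body = {
        "jsonrpc": "1.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    }
    return json.dumps(body).encode("utf-8")


def call_rpc(url, user, password, method, params=None):
    """POST the request using HTTP basic auth and return the 'result' field."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=build_rpc_request(method, params),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "text/plain"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["result"]


# Usage (assumed local mainnet node and placeholder credentials):
# info = call_rpc("http://127.0.0.1:8332/", "user", "pass", "getmempoolinfo")
# txids = call_rpc("http://127.0.0.1:8332/", "user", "pass", "getrawmempool")
```

The same calls can be made with bitcoin-cli; the script form is mainly useful when you want to post-process the results.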
Another way to modify your node’s mempool is using the peer-to-peer network.
BIP35
introduced the
NetMsgType::MEMPOOL
P2P message, which allows a node to request the contents of a peer’s mempool,
although this message has mostly fallen out of use; there is a
pull request (currently draft)
to remove it.
This PR adds a new RPC, importmempool, to add the transactions in a given mempool.dat
file to the existing mempool.
The PR description
states that it’s possible to copy an existing mempool.dat file between two
data directories. Does this work even if the architectures are different
(for example, 32-bit versus 64-bit, big-endian versus little-endian)?
What are these
calls to Ensure*()
doing? (These occur in many RPC handlers.)
<LarryRuane> Today's PR is pretty simple, so I added a few notes items that aren't directly related to the PR, but just for discussion, background, and learning
<LarryRuane> One thing I'd like to make special mention of is the link provided in the first note: https://bitcoinsearch.xyz/?q=mempool ... I wasn't aware of that myself until putting together these notes
<michaelfolkson> svanstaa: Basically try to go a little further and fiddle around with the relevant tests a bit. Just running them can be of limited value. But doesn't hurt to obvs
<LarryRuane> also as he says there, it's good to modify the tests slightly if you have enough understanding to do so, and that may uncover new problems
<LarryRuane> I also like running the tests in debuggers (both on the python test and on bitcoind itself) and look around at various points along the execution of the tests, to see if things are as expected by my understanding
<LarryRuane> actually an even more basic question (i remember wondering this myself when first getting started), why do full nodes even need a mempool *if they're not mining*?
<LarryRuane> it's pretty obvious that miners need a mempool (to assemble a non-empty block so they can get the fees), but why do non-mining nodes want to maintain a mempool? there is some cost, after all
<turkycat> so that nodes can verify the transactions in a block. a single transaction doesn't contain key info like the amount of the input being spent or the scriptPubkey for that input. each node can independently verify that an input isn't a double spend
<abubakarsadiq> I might be wrong, since the received broadcasted transaction why not keep it, not to verify it twice when the received new blocks, some transactions in the block might be in their mempool
<michaelfolkson> Some argue they don't need to :) But more efficient if they have already verified all the transactions in a block before they receive details of a mined block
<LarryRuane> turkycat: I don't think that's correct, because there's a separate "coins" database that all full nodes maintain, and that's independent of the mempool, and the coins db is how double-spending is detected
<AlexWiederin> Agree with abubakarsadiq! Probably also reduces the "noise" of transactions going around if only transactions that are not in the mempool are forwarded to peers
<LarryRuane> michaelfolkson: yes, that's a great reason ... there's a feature called compact blocks, and the reason they're compact is it's assumed that the receiver of the compact block has already seen and verified the "missing" transactions because they're in the mempool
<LarryRuane> another reason to have a mempool even if you're not mining is fee estimation ... if your own node is constructing a transaction, it needs to decide on a competitive fee
<LarryRuane> abubakarsadiq: yes, there's a script verification "cache" that prevents us from having to re-verify transactions that we've already verified (i don't know much detail on that)
<LarryRuane> so if we don't persist the mempool to disk, then when we restart, we'd have the problems we just mentioned, because we have forgotten all about the mempool
<LarryRuane> michaelfolkson: you may know this, correct me if i'm wrong, but maintaining a mempool also lets us construct transactions that use unconfirmed (mempool) transactions as inputs
<michaelfolkson> LarryRuane: Huh yeah hadn't thought of that. If the parent transaction wasn't created by our wallet I guess. The wallet would trust transactions it itself had constructed
<LarryRuane> that 300mb is *memory* size, by the way, not just the sum of the transactions as they appear on the wire or on disk ... the memory size is quite a bit larger, anyone know why?
<LarryRuane> svanstaa: yes exactly! in C and C++, `struct` variables often have "holes" in them because of alignment requirements.. so for example, if an object is a byte plus an 8-byte integer, that serializes to a 9-byte stream ...
<LarryRuane> but when stored in memory, the struct is padded out to the "alignment" of the struct, which in this case would be 8 bytes, so that struct would need 16 bytes in memory
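The byte-plus-8-byte-integer example can be checked with a short Python sketch that uses ctypes to mimic the C struct layout. The class and field names are hypothetical, and the exact in-memory size depends on the platform ABI (16 bytes is typical on 64-bit systems).

```python
import ctypes
import struct


class ByteThenU64(ctypes.Structure):
    """Mimics: struct { uint8_t flag; uint64_t value; } with native alignment."""
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint64)]


# Packed (no padding) little-endian encoding: 1 byte + 8 bytes.
serialized_size = struct.calcsize("<BQ")

# Native layout: the compiler pads after 'flag' so 'value' is aligned.
in_memory_size = ctypes.sizeof(ByteThenU64)

print("serialized bytes:", serialized_size)
print("in-memory bytes: ", in_memory_size)
print("padding bytes:   ", in_memory_size - serialized_size)
```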
<LarryRuane> svanstaa: yes, that 300mb translates to i think around 150mb of transactions, which is somewhere around 100 blocks as you say, maybe a little less
<LarryRuane> so if we go with a rough estimate of 100 blocks worth of transactions, that's a LOT of blocks, the next block will be constructed from around the top 1% (by fee) of the mempool
<LarryRuane> svanstaa: i think the 300mb is a balance between nodes having a mempool that is pretty similar to miners' mempools, and also being a size that most nodes (even on a raspberry pi) can do
<LarryRuane> AlexWiederin: yes but, even with a 300MB mempool, they're only taking roughly the top 1%, so that will be the same even with a 500mb mempool
<LarryRuane> i think the reason may be (or at least one i thought of) is in case there are no new transactions being generated for an extended period of time, just a dropoff in demand...
<svanstaa> so what happens if my mempool is full, and I see another (high fee) transaction incoming, would my node replace one of the txns currently in the mempool?
<LarryRuane> the mempool will slowly shrink as miners produce blocks ... so a miner would hate to completely run out of transactions to include in the block, because they'd like to get at least SOME fees
<LarryRuane> block explorer nodes, by the way, usually run a larger mempool, right now https://mempool.space/ shows that the mempool is 279 out of 300mb
<LarryRuane> because of the inscription stuff going on recently, the mempool has exceeded 300mb (for those nodes that configured a higher value, obviously)
<LarryRuane> @michaelfolkson pointed out that the mempool size is configurable, but there's an advantage to leaving it default (even if you have a lot of memory), which is that your mempool will be similar to that of other nodes, and that makes fee estimation more accurate, and tx relay more efficient
<LarryRuane> i can answer the first one, the mempool records the time each tx entered, because when a tx gets to be 2 weeks old, it gets dropped, no matter what its feerate is
<LarryRuane> so when you're importing a mempool, do you want to reset the tx entrance times to the current time? or use the times stored in the imported mempool.dat?
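To see why the choice matters, here is a minimal sketch of the two-week expiry rule combined with a use_current_time switch. The function names are hypothetical and the logic is a simplification for illustration, not Bitcoin Core's implementation.

```python
EXPIRY_SECONDS = 14 * 24 * 60 * 60  # two weeks, as described above


def entry_time_on_import(stored_time, now, use_current_time):
    """Pick the entry time for an imported tx (illustrative only)."""
    return now if use_current_time else stored_time


def is_expired(entry_time, now, expiry_seconds=EXPIRY_SECONDS):
    """A tx is dropped once it is older than the expiry window."""
    return now - entry_time > expiry_seconds


now = 1_700_000_000                   # arbitrary example timestamp
stored = now - 15 * 24 * 60 * 60      # tx entered the source mempool 15 days ago

# Keeping the stored time: the tx is already past the two-week limit.
print(is_expired(entry_time_on_import(stored, now, False), now))
# Resetting to the current time: the tx gets a fresh two-week window.
print(is_expired(entry_time_on_import(stored, now, True), now))
```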
<abubakarsadiq> i dont know about use_current_time but apply_fee_rate option to true means the transactions will be prioritize based delta fee, while importing transactions with high delta fee rates are prioritize
<abubakarsadiq> apply_unbroadcast_set option to true, while importing unbroadcasted transactions will be added to unbroadcast set, if it's false it will not be added
<LarryRuane> this `prioritisetransaction` value (per-tx) is also stored in `mempool.dat`, so do we want to import those from the file? or let them be zero?
<LarryRuane> michaelfolkson: thanks, good point.. I think it may be if you're an enterprise and want to spin up a new node, and make it effective ASAP... you could copy a mempool.dat from one of your existing nodes
<abubakarsadiq> Unbroadcast transactions are transactions that have been created and signed but not yet broadcasted to the network, how do they get to the mempool?
<michaelfolkson> Maybe if your mempool was corrupted you'd want to import a different mempool from another of your nodes. Can't imagine that happens too much
<LarryRuane> got it, that makes sense.. if we received the tx on the p2p network (not one we're originating), then we know that it has been relayed (at least to us)
<LarryRuane> what about question 4: How large is the mainnet mempool.dat file on your system? Does this size differ significantly from the -maxmempool setting? If so, why?
<LarryRuane> which is only about 29% of 300mb ... so most of the 300mb is indeed deserialization overhead, and also (forgot to mention this earlier), index overhead (because lots of fast lookups are needed for mempool items)
<yashraj> {RPCResult::Type::NUM, "bytes", "Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted"}, so witness stuff is causing this?
<LarryRuane> the in-memory mempool doesn't get flushed out until you shut down the node, can you do a clean shutdown and see if the file size increases?
<LarryRuane> actually i'm not sure if the mempool gets flushed out to disk (mempool.dat) other than shutdown... I've been looking at the coins db recently (the chainstate), and i may be getting these two mixed up in my mind