Transactions in Bitcoin are gossiped by
inv messages (a list of tx hashes) after they have been validated by the node. Peers can respond to these inv
messages with a getdata message to request the transaction.
Bitcoin Core keeps a “relay memory” of all transactions that were sent out as invs for 15 minutes. getdata
requests are answered from this relay memory.
A recent change (#16851) in the master branch of Bitcoin Core
enables relay of transactions that have been removed from the relay memory. The motivation for this change was that
low fee transactions that either didn’t make the inital relay or were evicted later on can now be requested. This
allows for example to bump a low fee transaction higher in the mempool with a CPFP transaction.
Also, it makes it possible for nodes to catch up on parent transaction when the node has been started with an empty
mempool.
Transactions that made it into the relay memory stay there for 15 minutes, even if they have been included in a block
or have been replaced by a higher fee transaction in the mempool. The proposed change
(#17303)) removes the relay memory and uses the mempool exclusively as
the relay memory. The change claims to improve privacy.
What is a shared pointer and why are transactions in Bitcoin Core generally managed as a shared pointer
(CTransactionRef)?
What is the difference between the memory pool (mempool) and the relay memory (mapRelay)?
Is there an upper bound on the memory usage of the relay memory (mapRelay)? Under what circumstances is the expected
memory usage low, under what circumstances is the memory usage high? (Hint: CTransactionRef is a shared pointer)
Transactions are added to mapRelay when they are announced via an inv. Does mapRelay keep track of which peers
it announced a transaction to? If no, how does that affect transaction relay privacy?
Are there any remaining transaction relay privacy issues after this change?
Do you think this change can be tested efficiently with unit or functional tests and why?
<@jnewbery> Please just jump right in! The host will prompt the conversation with the questions at https://bitcoincore.reviews/, but don't feel like you need to wait until after those questions to make your comment or ask your question.
<@jnewbery> We can have multiple threads going at the same time. If it gets too confusing the host can step in and ask people to address one topic first, but in general just go for it.
<@jnewbery> Also, don't ask to ask questions. There's no need to say things like "I have a question that I think might be off topic" or "is it ok if I ask my question now?". Just jump right in and ask. If it's off-topic, then people will tell you!
<@jnewbery> < pinheadmz> 1 why is the loglevel for the testframework not `debug` (or even `spam`) by default? Today's PR modifies a test where I found the `debug` level output especially helpful
<@jnewbery> < pinheadmz> 2 the python tests often have random elements or non-deterministic elements. Today's test runs 100 times and breaks once a certain case is found -- 100 iterations make the expected behavior very likely, but couldn't the tests be written more acutely?
<@jnewbery> < MarcoFalke> re 1) log level of debug by default would be a bit spammy in the normal case (no failures expected). If you debug a test, you are assumed to either set -l DEBUG or combine the logs after a failure has been detected, in which case the debug messages are included as well
<ajonas> So relay transaction seems like sort of an odd thing (was added for caching purposes?) and generally removing it simplifies the logic but also I'm a little unclear as to why this was added in the beginning
<MarcoFalke> fjahr: Good point. However, I think mempool and relay are too closely coupled to achieve good tx-origin privacy . There are a lot of tricks you can play to leak privacy
<MarcoFalke> I think the main goal of the mapRelay expiry is to have some upper bound on its memory usage, which is achieved by the tight loop. (To some extend)
<MarcoFalke> jonatack: It is a shared_ptr to minimize mental load. I think a unique_ptr makes most sense when there is exaclty one owner, but in Bitcoin Core there can be many owners of a single transaction: E.g. (1) the most recent block, (2) the mempool, (3) mapRelay, (4) anything I forgot?
<ajonas> am I right with a unique pointer once it's destroyed in one place it is reclaimed, but shared is all have to destroy it in order to deallocate?
<MarcoFalke> ajonas: The shared pointer uses a reference count as pointed out by jkczyz. This is a slight overhead over unique_ptr, but gives more flexibility
<MarcoFalke> ok, next question: Is there an upper bound on the memory usage of the relay memory (mapRelay)? Under what circumstances is the expected memory usage low, under what circumstances is the memory usage high? (Hint: CTransactionRef is a shared pointer)
<pinheadmz> hmm all this talk about sahred pointers - maprealy shouldn't increase mem usage at all except in the case that mempool evicts a tx that maprelay still hangs on to
<amiti> could mapRelay theoretically increase memory usage a lot if theres a lot of churn in the mempool? eg. small mempool and txns are getting INVed but then kicked out really rapidly
<amiti> sdaftuar mentions these changes can turn nodes with small mempools into "accidental DoSers", which I didn't fully understand.. so a node would send an INV to its peer, peer asks for GETDATA, during that time the txn has been evicted from the node's mempool so then it responds with NOTFOUND. why is this a DoS though? is it bc the peer waits until the request times out before asking another node for the tx?
<MarcoFalke> amiti: more specifically, we ignore NOTFOUND from peers to the extend that we don't use them to reschedule the tx download from a different peer
<@jnewbery> amiti: also look at https://github.com/bitcoin/bitcoin/pull/15834. Before that, there was a bug that if a node INVed us a transaction and then didn't respond to the GETDATA, it wouldn't clear from our tx-in-flight map
<MarcoFalke> Transactions are added to mapRelay when they are announced via an inv. Does mapRelay keep track of which peers it announced a transaction to? If no, how does that affect transaction relay privacy?
<amiti> I believe this PR added the param `m_last_inv_sent` to keep track of the time the last set of INVs was announced to a peer, but doesn't keep track of specifically which one
<pinheadmz> but i feel like i must be missing something, bc we woldnt send the same tx to a peer twice, or if it was the peer that sent it to us in the first place...
<ajonas> basically flood network with tx_a and then selectively announce tx_b (RBF of tx_a). Start querying for tx_a - whoever responds with NOTFOUND received tx_b which means the node is closer to the node target.
<amiti> I think currently the attack would be to have lots of connections to nodes on the network & send out GETDATAs for any new txn to identify which nodes already have the txn
<MarcoFalke> < pinheadmz> 2 the python tests often have random elements or non-deterministic elements. Today's test runs 100 times and breaks once a certain case is found -- 100 iterations make the expected behavior very likely, but couldn't the tests be written more acutely?
<pinheadmz> i downloaded the PR and tried the test with and without the change. also tried the new test against the current code (without PR) nothing really changes there
<MarcoFalke> amiti: I think that my set of changes should be kept to a minimum that is still an improvement. I don't claim to fix all privacy issues and I think even fixing one is reason enough
<jonatack> MarcoFalke: do you think the new tests cover the change of behavior sufficiently? What do you think about the current state of our p2p testing?
<@jnewbery> I'll try to get notes and questions up for next week's meeting before the weekend. I'm travelling so I might not be able to attend but I think I have a host lined up