Make orphan processing interruptible (p2p)

https://github.com/bitcoin/bitcoin/pulls/15644

Host: narula  -  PR author: sipa

The PR branch HEAD was 866c805 at the time of this review club meeting.

Notes

  • PR 15644 was merged over a year ago. It changes the way a node looks through its orphan set to figure out if orphans can now be processed and added to the mempool.

  • A transaction is an orphan if we’re missing one or more of its inputs (we refer to the transactions that create the transaction outputs spent by the orphan as parent transactions). This might be a valid transaction, but we don’t know yet. We have to be careful with orphan management because orphans cannot be validated.

  • Orphans are stored in a map called mapOrphanTransactions by their txid. We limit the number of orphan transactions to -maxorphantx which is by default 100.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? You’re always encouraged to put your PR review on GitHub, even after it has been merged.

  2. Why might we want a node to keep track of orphans at all? What does this help with?

  3. Look at the way orphans are added to the orphan maps and how orphans are re-evaluated when we accept new transactions to the mempool. What happens when a node receives a transaction where it has not seen the transaction’s inputs?

  4. Observe that mapOrphanTransactionsByPrev is a map of outpoints to a set of iterators over the orphan transactions map. Why do you think it’s written the way it is, instead of storing, say, orphan txids?

  5. Can you think of a problem with the pre-PR version of the code? What is it and how does the PR fix it? Hint: look at what we are iterating over when processing orphans pre- and post- PR.

  6. How might the adoption of something like SIGHASH_NOINPUT affect orphan processing?

Further Reading

Meeting Log

  117:00 <jnewbery> #startmeeting
  217:00 <jnewbery> hi everyone!
  317:00 <emzy> hi
  417:00 <troygiorshev> hi
  517:00 <willcl_ark> hi
  617:00 <nehan> hi everyone!
  717:00 <swapna> hi eveyone!
  817:00 <michaelfolkson> hi
  917:00 <ariard> hi
 1017:00 <felixweis> hiu
 1117:00 <lightlike> hi
 1217:00 <theStack> hi
 1317:00 <factoid> hi
 1417:00 <gzhao408> hi!
 1517:00 <adiabat> hi
 1617:00 <nehan> thanks for joining today. notes and questions here: https://bitcoincore.reviews/15644.html
 1717:01 <ajonas> hi
 1817:01 <amiti> hi
 1917:01 <nehan> reminder, feel free to ask ANY questions. and let us know if this is your first time at review club!
 2017:01 <nehan> today we're going to talk about orphan processing. who's had a chance to review the pr? (y/n)
 2117:01 <amiti> y
 2217:01 <adiabat> y
 2317:02 <willcl_ark> y
 2417:02 <troygiorshev> y
 2517:02 <theStack> n
 2617:02 <felixweis> n
 2717:02 <jnewbery> y
 2817:02 <lightlike> y
 2917:02 <emzy> y/n
 3017:02 <michaelfolkson> y
 3117:02 <swapna> n
 3217:02 <gzhao408> y
 3317:02 <factoid> y
 3417:03 <nehan> ok. so as stated in the notes, an orphan transaction is one where we are missing one or more of its inputs
 3517:03 <pinheadmz> hi
 3617:03 <nehan> first question: Why might we want a node to keep track of orphans at all? What does this help with?
 3717:03 <nehan> one could imagine just dropping transactions if we can't validate them immediately
 3817:04 <adiabat> It seems like if you don't, you still have to keep track of the fact that you saw it, and not request it again
 3917:04 <willcl_ark> if we already have the orphan then as soon as we have the parent we can validate immediately without needing to receive/request again
 4017:04 <adiabat> but if you only keep track of "this is bad/invalid" then when it's no longer an orphan you won't get it until it's in a block
 4117:04 <emzy> could be that the order they arive are different. So the missing input will arive later.
 4217:04 <michaelfolkson> Though it is removed after 20 mins. Is that right? Seems short time
 4317:05 <nehan> good points. adiabat: given that the orphan map is limited in size, we cannot guarantee we will never request something again
 4417:05 <adiabat> would a different strategy be to hang up on or ban nodes that give you an orphan tx? Are there problems or DoS attacks with doing that?
 4517:06 <amiti> also helps enable CPFP (child pays for parent). if the mempool conditions means the parent isn't accepted, the child would need to be accepted first in order to include both into a block
 4617:06 <jnewbery> adiabat: we can only add txs to 'recent rejects' if we know that they're definitely invalid. Otherwise, there are tx censorship attacks
 4717:06 <gzhao408> seems reasonable to only keep for a short period of time, I'm imagining the case where they're both relayed and you just happen to receive the child first
 4817:06 <nehan> adiabat: good question! what do people think?
 4917:06 <ariard> well the sending node may not be the one responsible for the tx being an orphan at reception, parent might have been replaced
 5017:06 <nehan> i think that would be a bad idea, especially on startup, when your mempool is empty and you might receive transactions out of order
 5117:06 <factoid> ariard +1
 5217:07 <jnewbery> nehan: +1. Orphans are most common at startup or when making new connections
 5317:07 <sipa> orphan transactions are not a sign of misbehavior; they arise naturally due to variations in network connections
 5417:07 <adiabat> it does seem to put a lot of responsibility on the sending node, they'd need to make sure to send all invs in order
 5517:07 <nehan> and a sending node just doesn't know what its peer has seen/not seen
 5617:07 <ariard> they do send inv in topology-order in SendMessages IIRC
 5717:07 <nehan> unless it sent it itself!
 5817:07 <sipa> and the sending node cannot know if perpahs they already gave you the parent, but you replaced it!
 5917:07 <nehan> or that ^
 6017:07 <nehan> ok let's move on to how orphans are handled
 6117:08 <amiti> also a node might request a parent from one peer, child from another & the child is delivered first. doesn't seem like a fault of the peer that delivered!
 6217:08 <nehan> what happens regarding orphans when we accept a new transaction to the mempool?
 6317:08 <jnewbery> adiabat: that is indeed what we do: https://github.com/bitcoin/bitcoin/blob/2c0c3f8e8ca6c64234b1861583f55da2bb385446/src/net_processing.cpp#L4203-L4206
 6417:08 <lightlike> also, if we just discarded the orphans, we might request and discard them again multiple times from different peers even before getting a parent, which seems ineffective.
 6517:08 <gzhao408> a clarification question: how sure are we that an orphan is a valid orphan as opposed to invalid? what validation has it passed before it's called a TX_MISSING_INPUTS?
 6617:09 <sipa> jnewbery, adiabat: but all we can do is _announce_ in topology order; it doesn't guarantee that things are requested/received in the same order when multiple nodes are involved (as amiti points out)
 6717:09 <sipa> gzhao408: none
 6817:09 <sipa> syntactic validity
 6917:09 <ariard> gzhao408: every check in PreChecks before hitting TX_MISSING_INPUTS, like transaction standard size or nLocktime finality
 7017:09 <pinheadmz> nehan when a tx is accpted ot mempool we check it against the map of known orphans
 7117:09 <pinheadmz> to see if it resolves anything
 7217:10 <adiabat> maybe make a new type of INV message that has clumps, like these 4 txids should be requested at once (this is probably a bad idea; just making stuff up :) )
 7317:10 <pinheadmz> with this PR, IIUC, we only resolve one orphan at a time
 7417:10 <ariard> requesting/reception might be biased by your connection type and sending nodes shouldn't make assumptions on receiving peer topology
 7517:10 <sipa> adiabat: there is a very long term plan for that, it's called package relay, but it's nontrivial
 7617:11 <nehan> pinheadmz: yes!
 7717:11 <sipa> adiabat: and it's indeed probably the only complete solution to orphan processing and a number of other issues
 7817:11 <jnewbery> adiabat: we're getting a bit off-topic, but the idea you're referring to is called 'package relay'
 7917:11 <factoid> gzhao408 sipa ariard, it's the case, isn't it, that we can be certain a transaction is invalid, we can't be certain it is valid -- right?
 8017:11 <michaelfolkson> It depends how "expensive" certain operations are. Rejecting it and then requesting the orphan transaction again has to be compared to storing and retrieving from memory
 8117:11 <pinheadmz> factoid somethings maybe like MAXMONEY etc
 8217:11 <pinheadmz> but most TX checks require context (utxo set)
 8317:11 <adiabat> sipa, jnewbery: oh cool that it isn't inherenly a bad idea but yes sounds complicated & getting of topic, thanks
 8417:11 <nehan> what happens if the orphan we are checking is now accepted to the mempool?
 8517:12 <nehan> (it has been un-orphaned)
 8617:12 <willcl_ark> it waits for a parent to arrive which validates it
 8717:12 <ariard> factoid: validity might change due to order of transaction reception or your block tip, but once it's a valid one it should be guaranteed to stay so
 8817:12 <nehan> willcl_ark: not exactly. that's what just happened (a parent arrived that enabled us to validate it; it was valid; we put it in the mempool)
 8917:13 <pinheadmz> nehan do we check the orphanmap again for more decesndants?
 9017:13 <ariard> adiabat: fyi https://bitcoincore.reviews/15644.html
 9117:13 <nehan> pinheadmz: yes! this newly unorphaned transaction might be a parent to _other_ orphan transactions!
 9217:13 <ariard> pinheadmz: yes we do recursively call ProcessOprhanTx IIRC
 9317:13 <amiti> pinheadmz: +1
 9417:14 <factoid> uh oh >:)
 9517:14 <factoid> the harbinger to resource exhaustion?
 9617:14 <nehan> yeah so we see that we need to be careful here.
 9717:15 <nehan> what happens when a node receives a transaction and it hasn't seen one or more of the transaction's inputs?
 9817:15 <nehan> (basically, how do orphans get processed and saved)
 9917:16 <michaelfolkson> Stored in mapOrphanTransactions
10017:16 <jnewbery> nehan: we add them to a global map, but there are also a few other data structures involved
10117:17 <nehan> michaelfolkson: jnewbery: yep
10217:17 <pinheadmz> and the map is limited to size of 100
10317:18 <sipa> and each orphan is limited in size
10417:18 <nehan> https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L916
10517:19 <pinheadmz> sipa ah didnt know - so if a tx is too big we wont even store it in the map?
10617:19 <nehan> that is done in AddOrphanTx. it ignores large orphans. the other important datastructure is mapOrphanTransactionsByPrev. how does that work?
10717:19 <pinheadmz> just a lost cause
10817:19 <pinheadmz> 100k is pretty fair i guess
10917:19 <nehan> pinheadmz: we could process it again once we get its parents
11017:20 <michaelfolkson> But removed at random rather than by lowest fee when full and also removed after 20 minutes. Was unsure why
11117:20 <michaelfolkson> 20 minutes by default, obviously can change that if you want
11217:20 <sipa> orphan transactions are primarily just a way to resolve things that are sent out of order
11317:21 <sipa> usually, when they happen, they resolve immediately
11417:21 <pinheadmz> nehan mapOrphanTransactionsByPrev is used to match new incoming TX with orphans it resolves
11517:21 <sipa> (as we request the parent)
11617:21 <nehan> i *think* that saving orphans is not required for correct operation. it's an optimization.
11717:21 <nehan> someone should check me on that
11817:21 <pinheadmz> nehan the reason a tx is orhpaned is bc we dont know the coins (prevs) its spending - so when a tx comes in that creates those prevs, we can find the orphan and REUNITE THE FAMILY
11917:21 <sipa> nehan: well that depends on how you define correct operation; ignoring every tx announcement is arguably equally correct
12017:22 <adiabat> yeah blocksonly works fine, as long as not everyone does that
12117:22 <nehan> sipa: so it would require some other way to hear about invs again
12217:22 <jnewbery> nehan: you're correct. We don't make any guarantees with orphans. They do help you get a better view of the mempool more quickly, and they may help with compact block relay
12317:22 <sipa> nehan: but sure, orphan processing is a best effort, and its primary motivation is making sure compact block reconstruction doesn't need a roundtrip because of a just-missed out of order relay
12417:22 <nehan> anyone want to answer the mapOrphanTransactionsByPrev question? Why is it a map of iterators?
12517:23 <sipa> iterators are smaller than uint256s
12617:23 <sipa> (8 vs 32 bytes)
12717:23 <aj> nehan: (map of sets of iterators)
12817:23 <nehan> sipa: yes. iterators are 8 bytes (on my platform) vs 32
12917:24 <factoid> If we don't know the the market is for transactions bidding into blocks, they other fee estimations will break -- so that's important to make sure orphans are represent in those stats
13017:24 <amiti> nehan: mapOrphanTransactionsByPrev is a map of outpoint -> to iterators, so it can point to the relevant entries on the orphan list. I thought for quicker access
13117:24 <nehan> amiti: yes that is also true! it saves you a lookup into mapOrphanTransactions. but that is limited in size to 100 so imho that's not necessarily worth optimizing
13217:25 <amiti> yeah fair
13317:25 <jnewbery> if it were txids, we'd just be using those txids to index into mapOrphanTransactions, so we just store the iterators
13417:25 <nehan> aj: yes good point
13517:25 <sipa> the point of having the byPrev map is to quickly find potential dependent orphans that can get reprocessed, without needing to walk the entire set
13617:25 <ariard> factoid: I think orphans aren't processed by fee estimator as we can't' know their validity and otherwise that would be a vector to manipulate your feerate
13717:26 <sipa> it being a map of iterators is both a size savings, and also avoids another lookup in the primary map (compared to it storing a uint256)
13817:27 <nehan> there are a couple of other interesting things to point out: if we receive an orphan from a peer, that peer is the best one to ask about the parents. cause it shouldn't be relaying a txn if it can't validate it!
13917:27 <jnewbery> important to note: mapOrphanTransactions is a std::map, so inserting or erasing elements doesn't invalidate iterators into the map (except an iterator to the erased element)
14017:28 <factoid> ariard ah yeah -- I suppose I meant if we have a delayed (maybe it's negligible amount of time) processing of a txn from orphan to valid, that'll delay a node's view of what the block-space market might be
14117:28 <nehan> jnewbery: good point
14217:28 <sipa> ariard: damn, i just realized you've responding to the nickname factoid here, instead of just dropping random factoids
14317:29 <factoid> ok fixed
14417:29 <factoid> oops
14517:29 <factoid> maybe now