Postscript: see also the #bitcoin-core-dev IRC discussions of
05 Feb 2020 and
06 Feb 2020 below.
What is a wtxid?
BIP
141
(Segregated Witness) introduced the definition of wtxid:
“A new wtxid is defined: the double SHA256 of the new serialization with
witness data.”
“If all txins (transaction inputs) are not witness program, a transaction’s
wtxid is equal to its txid.”
PR #11203 added wtxid to
the mempool entry output of entryToJSON() in src/rpc/blockchain.cpp,
thereby exposing wtxids to callers in the output of RPCs getmempoolentry,
getmempooldescendants, getmempoolancestors, and getrawmempool.
WTXID-based transaction relay
Using the txid (which does not include the witness) is problematic because
the witness can be malleated without changing the txid. See
#8279 for a full discussion
of the issue.
The PR description
contains a very full and clear description of the motivation and changes.
Describe recentRejects: what type of data structure is it, what data does
it contain, and what is it used for? (Hint: git grep -ni rejects).
In your opinion, does this PR save bandwidth for older nodes talking to newer
nodes? What about downloading from old and new peers alike?
According to commit
61e2e97,
using both txid and wtxid-based relay with peers means that we could sometimes
download the same transaction twice, if announced via 2 different hashes from
different peers. What do you think of the heuristic of delaying
txid-peer-GETDATA requests by 2 seconds, if we have at least one wtxid-based
peer?
In this
comment,
Suhas mentions a possible race condition where a peer could send a txid-based
INV to us before it receives the WTXIDRELAY message (changing the relay to be
wtxid-based), which would cause relay of that transaction to fail. What do you
think, and how best should this case be handled?
Do you see any other potential race conditions, DoS vectors, or backward
incompatibilities of this change?
Overall, do you think the potential benefits of these changes merit the
additional complexity and (map relay index) data size, if any?
<jonatack> That's great, but please ask any review questions here on this irc channel, which allows more people to help and also to benefit from the conversation.
<jonatack> We usually start Bitcoin Core IRC meetings with a 'hi' so it's clear who's at keyboard. Feel free to say hi, even if you arrive in the middle of the meeting!
<jonatack> "2. The wtxidrelay message must be sent in response to a VERSION message from a peer whose protocol version is >= 70016, and prior to sending a VERACK."
<jonatack> Question: Describe `recentRejects`: what type of data structure is it, what data does it contain, and what is it used for? (Hint: `git grep -ni rejects`).
<scone> I'll take my chance at summarizing (warning may be inaccurate): In order to improve p2p transaction relay overhead, let's develop a more efficient way of keeping track of transactions. But because of some issues with how tx's are kept track of right now, if we switched to tracking with the Witness txid - better methods would be enabled
<raj_> recentRejects is a bloom filter structure that keep tracks of rejected txs. Previously it only added non segwit txs which are not maleated. Now it adds either invalid segwit txs, or non segwit non maliated txs.
<ajonas> The problem as it currently stands is that we don't punish peers who relay us invalid or DoSy transactions because we don't know for certain whether it's a misbehaving peer or a false positive on the bloom filter
<jonatack> Question: In your opinion, does this PR save bandwidth for older nodes talking to newer nodes? What about downloading from old and new peers alike?
<ajonas> So Suhas mentions this is a concern of his which is why he is experimenting with the delaying download by 2 seconds of txs by txid if we have a wtxid peer
<raj_> It seems to me that old nodes will waste some bandwidth by asking the same tx both by txid and wtxid. New nodes will also face the same issue if they have any old nodes connected. This might be incomplete understanding.
<jonatack> Question: Is it important that feature negotiation of wtxidrelay happen between VERSION and VERACK, to avoid relay problems from switching after a connection is up?
<jonatack> Question: In https://github.com/bitcoin/bitcoin/pull/18044#discussion_r373775416 sdaftuar mentions a possible race condition where a peer could send a txid-based INV to us before it gets this message, which would cause relay of that transaction to fail. Do you agree?
<lightlike> i'd guess it is important to check that the behavior is correct if peers break protocol and send us old txids when we agreed on new mode, and vice versa
<jonatack> lightlike: yes. My initial impression is that tests like p2p_leak and others go in the right direction but can be more exhaustive... I would need to try adding tests to confirm that.
<raj_> yes aggreed with the concept overall. It really doesnt make sense to only see txids while a major chunk of the txs remains un commited to. Changing to wtxids makes a lot of sense. The only issue is correct transition between two phases.
<raj_> I would like to ask the same question here as i asked you personally jonatack. Its obvious that to make better contribution towards code review its essential to develop complete understanding of the different interactions between different structures that are happening, even sometimes asynchrobnnously. Which seems like a daunting hill to climb. So i would like to know how everyone else is going about it? Do
<raj_> thanks, this was a nice pr to discuss. willing to work on it next to provide any further review if i can. I am also willing to work on simulating p2p testing for these type of changes. If someone is already working on such please let me know, would love to extend some helping hand.
<sdaftuar> Is not sending any p2p message between VERSION and VERACK an important thing to do? i was going to add this new wtxidrelay feature negotiation there, but just realized I also would need to change some code that ignores messages received before VERACK, which gave me pause
<sdaftuar> but the issue i wanted to avoid is there being a relay failure in between processing a peer's VERACK (allowing announcement of transactions to that peer, generally) and processing that peer's WTXIDRELAY message (changing the relay to be via wtxid)
<sdaftuar> it's a pretty minor issue IMO, as it is a very short window that this could happen -- but i did notice a test failure due to this, so i thought it better to fix somehow
<sdaftuar> (the intermediate commit, where i tried to move this to before VERACK, was totally busted by the way; i pushed a better version a little while ago)
<jonatack> sdaftuar: in p2p_leak.py it actually states "A node should never send anything other than VERSION/VERACK/REJECT until it's received a VERACK" and test for that
<sdaftuar> sipa: extending the version message would be nice and simple, but i assume no one likes these variable length messages that ensue from that approach?
<sdaftuar> i guess if there's nothing intrinsically wrong with a design where we throw message in between VERSION and VERACK, that seems most extensible to me
<sdaftuar> one option could be to do a bunch more work to support txid or wtxid-based announcements with a peer, so that turning on wtxidrelay on a link is seamless, but that is not clearly worth the effort t ome
<sipa> sdaftuar: perhaps just do it with a message between version and verack (and gated by version number, i guess?), and when things are more ready, discuss on the ML whether that could cause problems
<sdaftuar> possibly i could also just move it to being after VERACK -- in some ways, transaction relay failing between VERACK and negotiation of wtxid relay is no different than transaction relay failing because the connection wasn't set up yet at the time the transaction was announced
<sipa> BIP60 argued that adding optional fields to the version message was problematic, and suggested a solution; even if that solutions wasn't adopted, that does not imply that its concern (optional fields at the end of version) does not matter
<sipa> it still has problems with serialization of deployments (what if two P2P extensions both want to add extra data?)... historically speaking that hasn't been a problem though :p
<sdaftuar> nice-to-have only because any tx-relay protocol change we make in the future (like erlay, or dandelion, etc) should be done on wtxid-based relay
<sdaftuar> sipa: ah yes i have no idea how to do that, if you can advise on how to update the memory calculation better than i did in the PR, please let me know
<sipa> sdaftuar: i have some WIP code that i can probably use to verify whether or current heuristic is accurate... actually replacing it is probably harder
<sdaftuar> i think in the case of package relay though, i might imagine that we'll get those hints, but not in a generic enough way that we could ever git rid of the index
<jeremyrubin> Yeah, I think given that we're going to kill mapTxLinks it's going to be fine (I just don't want people to have a reason not to upgrade to wtxid index)
<sdaftuar> the issue we have with txid relay is that when a peer announces a segwit transaction that doesn't pass your policy checks, then you don't know whether another peer announcing the same txid has the same transactionr or not
<sdaftuar> there's also a related CPU DoS issue with how we determine whether a transaction is witness-stripped, which will be alleviated in the future when we no longer need to worry about adding txid's for segwit transactions to our reject filter
<sdaftuar> maintaining a separate data structure just to shave a few bytes doesn't seem worth the effort to me. the mempool is probably already too big
<gribble>https://github.com/bitcoin/bitcoin/pull/15505 | p2p: Request NOTFOUND transactions immediately from other outbound peers, when possible by sdaftuar · Pull Request #15505 · bitcoin/bitcoin · GitHub
<sdaftuar> aj: i would definitely prefer to get rid of mapRelay, but i think the additional memory i propose using in 18044 is very minor, it's just an extra key