Use wtxid for transaction relay (mempool, p2p)

https://github.com/bitcoin/bitcoin/18044

Host: jonatack

The PR branch HEAD was c4a23a1 at the time of this review club meeting.

Notes

Context

  • The author of this PR, Suhas Daftuar, has been working to improve the privacy and resilience of Bitcoin’s peer-to-peer network. See his PR history.

  • This PR builds on PR #17951 “Use rolling bloom filter of recent block txs for AlreadyHave() check” by the same author, which was just merged.

  • Suhas wrote a BIP draft that this PR implements: WTXID-based transaction relay.

  • Postscript: see also the #bitcoin-core-dev IRC discussions of 05 Feb 2020 and 06 Feb 2020 below.

What is a wtxid?

  • BIP 141 (Segregated Witness) introduced the definition of wtxid:
    • “A new wtxid is defined: the double SHA256 of the new serialization with witness data.”
    • “If all txins (transaction inputs) are not witness program, a transaction’s wtxid is equal to its txid.”
  • PR #11203 added wtxid to the mempool entry output of entryToJSON() in src/rpc/blockchain.cpp, thereby exposing wtxids to callers in the output of RPCs getmempoolentry, getmempooldescendants, getmempoolancestors, and getrawmempool.

WTXID-based transaction relay

  • Using the txid (which does not include the witness) is problematic because the witness can be malleated without changing the txid. See #8279 for a full discussion of the issue.

  • The PR description contains a very full and clear description of the motivation and changes.

Questions

  • Did you review the PR? Concept ACK, approach ACK, ACK <commit>, or NACK?  Don’t forget to put your PR review on GitHub or ask questions.

  • Describe recentRejects: what type of data structure is it, what data does it contain, and what is it used for? (Hint: git grep -ni rejects).

  • In your opinion, does this PR save bandwidth for older nodes talking to newer nodes? What about downloading from old and new peers alike?

  • According to commit 61e2e97, using both txid and wtxid-based relay with peers means that we could sometimes download the same transaction twice, if announced via 2 different hashes from different peers. What do you think of the heuristic of delaying txid-peer-GETDATA requests by 2 seconds, if we have at least one wtxid-based peer?

  • In this comment, Suhas mentions a possible race condition where a peer could send a txid-based INV to us before it receives the WTXIDRELAY message (changing the relay to be wtxid-based), which would cause relay of that transaction to fail. What do you think, and how best should this case be handled?

  • Do you see any other potential race conditions, DoS vectors, or backward incompatibilities of this change?

  • Overall, do you think the potential benefits of these changes merit the additional complexity and (map relay index) data size, if any?

Meeting Log

18:03 <jonatack> We'll get started in an hour -- hopefully everyone isn't at Advancing Bitcoin today :)
18:03 <jonatack> Some thoughts I wanted to share before the meeting:
18:03 <jonatack> A few people have been contacting me privately with questions about the PR or how to review.
18:04 <jonatack> That's great, but please ask any review questions here on this irc channel, which allows more people to help and also to benefit from the conversation.
19:00 <jonatack> #startmeeting
19:00 ⚡ scone waves
19:00 <jonatack> Hi all! Welcome to this week's episode of the Bitcoin Core PR Review club!
19:00 <jonatack> #topic This week, we are looking at PR 18044 - "Use wtxid for transaction relay"(mempool, p2p) by sdaftuar
19:00 <jonatack> We usually start Bitcoin Core IRC meetings with a 'hi' so it's clear who's at keyboard. Feel free to say hi, even if you arrive in the middle of the meeting!
19:00 <SirRichard> hi
19:00 <jonatack> If you didn't have have a chance to read the session notes, the PR description, and the BIP draft, I encourage you to do so.
19:01 <scone> hi
19:01 <amiti> hi
19:01 <lightlike> hi
19:01 <fodediop> Hello, underworld!
19:01 <jonatack> They provide a good background for this discussion and for reviewing the PR.
19:01 <emzy> hi
19:01 <raj_> hi..
19:01 <jonatack> (It's a really interesting one.)
19:01 <jonatack> Thanks to raj_ who I believe put up a review!
19:01 <jonatack> Remember, the goal is to actually review the PR *and* publish it on GitHub, e.g. put the study work into practice and help Bitcoin Core!
19:02 <jonatack> Review is very important in Bitcoin Core. Consistent review is arguably the scarcest resource and the main bottleneck to progress.
19:02 <jonatack> There are currently 758 open issues and 341 open PRs, and the stack keeps growing.
19:02 <jonatack> Please jump in at any point with thoughts and questions. Don't be shy! This discussion is about your thoughts and input.
19:02 <kanzure> hi
19:02 <jonatack> Question 1: Did you review the PR?
19:02 <raj_> yes.
19:02 <emzy> no
19:03 <fodediop> no
19:03 <jonatack> review in progress for me
19:03 <lightlike> a bit
19:03 <jonatack> Question 2: Concept ACK, approach ACK, ACK, or NACK?
19:04 <jonatack> moving on... with this PR a Concept ACK atm is about the best that can be done
19:04 <raj_> Concept ACK.
19:04 <jonatack> Who read the BIP draft? It's short.
19:04 <emzy> yes
19:05 <amiti> yes
19:05 <fodediop> no
19:05 <raj_> read it. Its just on the communication protocol to activate wtxid based transaction exchange.
19:05 ℹ  hardforkthis7 is now known as tylerlevine
19:05 <tylerlevine> hi
19:05 <jonatack> Note that there was a discussion in the last hour on #bitcoin-core-dev that concerns part of the draft
19:05 <scone> BIP https://github.com/sdaftuar/bips/blob/2020-02-wtxid-relay/bip-wtxid-relay.mediawiki
19:06 <jonatack> "2. The wtxidrelay message must be sent in response to a VERSION message from a peer whose protocol version is >= 70016, and prior to sending a VERACK."
19:07 <jonatack> which is contradicted by parts of the current codebase
19:07 <raj_> i am a little dizzy on the rationale behind this. Can you elaborate the issue a little?
19:07 <jonatack> Question: Describe `recentRejects`: what type of data structure is it, what data does it contain, and what is it used for? (Hint: `git grep -ni rejects`).
19:08 <jonatack> raj_: I think the PR, the BIP draft, and the notes provide the background?
19:09 <raj_> regarding this, there is only Suhas' comment i found in the PR saying this might create a race condition.
19:10 <scone> I'll take my chance at summarizing (warning may be inaccurate): In order to improve p2p transaction relay overhead, let's develop a more efficient way of keeping track of transactions.  But because of some issues with how tx's are kept track of right now, if we switched to tracking with the Witness txid - better methods would be enabled
19:10 <jonatack> raj_: are you talking about the pre/post VERACK question?
19:10 <raj_> yes
19:11 <jonatack> raj_: I'd suggest git grepping VERACK
19:11 <raj_> noted.
19:11 <lightlike> recentRecjects: a bloom filter for hashes of txes that we have rejected (and won't send GETDATA for if other peers INV them to us)
19:11 <jonatack> scone: good start! of course, the devil is in the details
19:12 <scone> yep everything is simple at the 1000 m view :]
19:13 <scone> ok so what are we currently focused on in group discussion? ( i got lost)
19:14 <ajonas> So to build a little on what scone said - an attacker can send malleated witness data to the mempool which then corrupts AcceptedtoMempool()
19:14 <raj_> recentRejects is a bloom filter structure that keep tracks of rejected txs. Previously it only added non segwit txs which are not maleated. Now it adds either invalid segwit txs, or non segwit non maliated txs.
19:14 <jonatack> lightlike: yes, but careful -- there is bloomfilter and rollingbloomfilter
19:15 <ajonas> JamesOB left a great comment about recentrejects -> https://github.com/bitcoin/bitcoin/commit/b191c7dfb7ede3f74edb3a32b8ac6fa2f4d6b78a
19:15 <jonatack> some people who reviewed the previous PR about AlreadyHave() mixed the two
19:15 <jonatack> ajonas: agreed!
19:17 <jonatack> Also see net_processing.cpp:L2601-2612
19:17 <ajonas> The problem as it currently stands is that we don't punish peers who relay us invalid or DoSy transactions because we don't know for certain whether it's a misbehaving peer or a false positive on the bloom filter
19:18 <jonatack> ajonas: right! if the witness is malleated, the txid remains the same
19:19 <jonatack> (responding to your prev comment -- i'm slow!)
19:19 <jonatack> This PR covers a lot of domain areas in the codebase, which makes it compelling to study
19:20 <raj_> ajonas: This problem still stands even with wtxids right?
19:20 <jonatack> mempool, txn relay, recent rejects, the orphan map, the relay map, confirmed txns, network protocol
19:21 <ajonas> raj_: not for nodes passing wxtids between them. For the older nodes yes,
19:22 <ajonas> (meant . not ,)
19:23 <jonatack> Recent rejects are defined in the codebase in net_processing.cpp:L148. Let's move forward.
19:23 <jonatack> Question: In your opinion, does this PR save bandwidth for older nodes talking to newer nodes? What about downloading from old and new peers alike?
19:26 <ajonas> So Suhas mentions this is a concern of his which is why he is experimenting with the delaying download by 2 seconds of txs by txid if we have a wtxid peer
19:26 <raj_> It seems to me that old nodes will waste some bandwidth by asking the same tx both by txid and wtxid. New nodes will also face the same issue if they have any old nodes connected.  This might be incomplete understanding.
19:26 <jonatack> hint: see the PR description
19:27 <jonatack> ajonas: yes. interestingly, this isn't the only 2-second delay used to favor actions
19:27 <jonatack> e.g. git grep std::chrono::seconds{2}
19:27 <lightlike> if one of the two nodes is old, nothing should change wrt status quo.
19:27 <raj_> sorry, i will go with no bandwidth waste for old nodes. Old nodes cant see wtxids. Only new nodes are in trouble in the transition period.
19:28 <jonatack> lightlike: right! more an issue WRT to future policy, IIUC
19:29 <jonatack> raj_: i think so, the intent is to remain backward compatible while adding support for newer peers
19:29 <jonatack> Question: Is it important that feature negotiation of wtxidrelay happen between VERSION and VERACK, to avoid relay problems from switching after a connection is up?
19:29 <jonatack> (see latest commit https://github.com/bitcoin/bitcoin/pull/18044/commits/c4a23a1ffc588064f2fbffa9259335322a296a1b)
19:30 <jonatack> or would it be better to do it post-VERACK?
19:31 <jonatack> (this is being discussed RN on bitcoin-core-dev)
19:31 <ajonas> yeah, this is the discussion happening in real time in #bitcoin-core-dev
19:31 <scone> Time Notice: half way through meeting :]
19:32 <jonatack> Question: According to commit 61e2e97 (https://github.com/bitcoin/bitcoin/pull/18044/commits/61e2e97), using both txid and wtxid-based relay with peers means that we could sometimes
19:32 <jonatack> download the same transaction twice, if announced via 2 different hashes from different peers.
19:32 <jonatack> What do you think of the heuristic of delaying txid-peer-GETDATA requests by 2 seconds, if we have at least one wtxid-based peer?
19:33 <jonatack> How would you test this?
19:33 <raj_> I would like to see some result and also variation of perfomance with variation of huristic parameters.
19:33 <jonatack> One of the issues with p2p changes is that things can be broken without unit or functional tests necessarily failing.
19:33 <jonatack> This requires spending time thinking hard about the changes. Or setting up new testing or logging.
19:33 <jonatack> Or benchmarking.
19:34 <raj_> we can create a test network of nodes in regtest. Program them to send around some predefined transactions. and see the heuristic at play.
19:35 <lightlike> could the delay slow down the effectiveness of message propagation in the initial phase where just a low percentage of nodes are new?
19:35 <jonatack> We can also try debug logging. See: bitcoin-cli help logging
19:35 <ajonas> jonatack: Ah yes -- issue #14210 would be great for someone to make progress on
19:35 <jonatack> e.g. bitcoin-cli logging '["net"]'
19:36 <jonatack> ajonas: agree! that's a compelling issue
19:37 <jonatack> ajonas: i didn't realise you were working on it :+1:
19:37 <ajonas> I am?
19:38 <jonatack> lightlike: you worked on that as well?
19:38 <jonatack> ajonas: oops. misread the last comment.
19:39 <jonatack> Question: In https://github.com/bitcoin/bitcoin/pull/18044#discussion_r373775416 sdaftuar mentions a possible race condition where a peer could send a txid-based INV to us before it gets this message, which would cause relay of that transaction to fail. Do you agree?
19:39 <ajonas> I've got to bow out for the rest of the session. Thanks jonatack.
19:40 <jonatack> cheers ajonas
19:40 <lightlike> jonatack: a little, but not recently
19:40 <raj_> jonatack: i am not seeing how this would fail relay of those txs.
19:42 <jonatack> lightlike: at any rate it seems like a potentially valuable contribution to make
19:43 <jonatack> raj_: I would need to dive into the code more on this
19:44 <jonatack> Question: Does anyone see any other potential race conditions (or DoS vectors)?
19:45 <jonatack> These are the kind of questions that the complex state of the p2p network require thinking about.
19:45 <jonatack> Don't be shy :)
19:45 <raj_> Not that any i can see. Need to dive deeper.
19:45 <jonatack> Right.
19:45 <jonatack> Question: Overall, do you think the potential benefits of these changes merit the additional complexity and data storage (if any)?
19:46 <lightlike> i'd guess it is important to check that the behavior is correct if peers break protocol and send us old txids when we agreed on new mode, and vice versa
19:47 <jonatack> lightlike: yes. My initial impression is that tests like p2p_leak and others go in the right direction but can be more exhaustive... I would need to try adding tests to confirm that.
19:47 <jonatack> The p2p_* suite in general.
19:47 <raj_> yes aggreed with the concept overall. It really doesnt make sense to only see txids while a major chunk of the txs remains un commited to. Changing to wtxids makes a lot of sense. The only issue is correct transition between two phases.
19:48 <jonatack> Overall, I'm a concept ack but the details need thinking, verifying, and testing
19:48 <jonatack> Does anyone have any comments? Questions?
19:49 <jonatack> Last ten minutes.
19:49 <fodediop> Thank you jonatack. It's my first time here, so I'm just soaking in the process.
19:50 <jonatack> fodediop: Good to see you here! This PR was a wide one.
19:50 <fodediop> Thank you!
19:51 <jonatack> One thing that makes me hesitate here is the larger data storage and increase in code complexity.
19:51 <lightlike> why is there a larger data storage?
19:51 <scone> any simulations of the p2p communication overhead reductions? could one argue that the domains of efficiency are critical for nodes?
19:52 <raj_> I would like to ask the same question here as i asked you personally jonatack. Its obvious that to make better contribution towards code review its essential to develop complete understanding of the different interactions between different structures that are happening, even sometimes asynchrobnnously. Which seems like a daunting hill to climb. So i would like to know how everyone else is going about it? Do
19:52 <raj_>  you practise any methodical path to gain better understanding of the codebase?
19:52 <jonatack> lightlike: the map relay index iiuc
19:54 <jonatack> lightlike: WDYT?
19:54 <jonatack> scone: yes! better simulations of the p2p network gets back to the issue ajonas linked to above.
19:54 <lightlike> jonatack: oh, ok. I think you are right.
19:55 <jonatack> raj_: it's just time, really, and poking around
19:56 <jonatack> basically as described here: https://jonatack.github.io/articles/how-to-review-pull-requests-in-bitcoin-core#if-youre-not-sure-where-to-start
19:56 <jonatack> that's my understanding of it, at least
19:57 <jonatack> 3 minutes! any last remarks?
19:57 <jonatack> #action - Finish reviewing 18044 and put your review on GitHub!
19:58 <lightlike> thanks for hosting! will follow the pr (it will probably take some months until merged)
19:59 <fodediop> Thank you for hosting jonatack!
19:59 <jonatack> If anyone would like to host a review club session, or propose a PR, don't hesitate!
19:59 <emzy> thanks jonatack
19:59 <raj_> thanks, this was a nice pr to discuss.  willing to work on it next to provide any further review if i can. I am also willing to work on simulating p2p testing for these type of changes. If someone is already working on such please let me know, would love to extend some helping hand.
19:59 <raj_> thanks jonatack
19:59 <jonatack> lightlike: agree, for now the PR and BIP need concept acks and critical thinking
19:59 <scone> ty jonatack
19:59 <jonatack> Thanks everyone!
19:59 <jonatack> #endmeeting
20:00 <jonatack> Will post the meeting log shortly. Don't forget to volunteer to host or to propose PRs you'd like to see.
20:02 <jonatack> raj_: that's great if you can work on/make progress on simulating p2p testing.
20:03 <scone> +1

05 Feb 2020 #bitcoin-core-dev IRC discussion

17:02 <sdaftuar> Is not sending any p2p message between VERSION and VERACK an important thing to do? i was going to add this new wtxidrelay feature negotiation there, but just realized I also would need to change some code that ignores messages received before VERACK, which gave me pause
17:43 <sipa> sdaftuar: how is this different than sendheaders etc?
17:44 <sdaftuar> sendheaders, feefilter, etc all are sent in response to VERACK
17:45 <sdaftuar> but the issue i wanted to avoid is there being a relay failure in between processing a peer's VERACK (allowing announcement of transactions to that peer, generally) and processing that peer's WTXIDRELAY message (changing the relay to be via wtxid)
17:45 <sdaftuar> (of course, with extra bookkeeping there would not necessarily be a failure, but it would be tedious to do that)
17:46 <sdaftuar> it's a pretty minor issue IMO, as it is a very short window that this could happen -- but i did notice a test failure due to this, so i thought it better to fix somehow
17:57 <jonatack> sdaftuar: right, looking at the latest commit at https://github.com/bitcoin/bitcoin/pull/18044/commits/c4a23a1. Was the test failure in p2p_tx_download.py?
18:04 <sipa> sdaftuar: but receiving a message before verack may be a problem for other applications, maybe?
18:09 <sdaftuar> sipa: yeah, i wasn't sure about that. we assign a misbheavior point to a node that gives us an unexpected message before VERACK
18:10 <sdaftuar> jonatack: the test failure i saw when doing this after VERACK was in p2p_permissions i think
18:10 <sdaftuar> (the intermediate commit, where i tried to move this to before VERACK, was totally busted by the way; i pushed a better version a little while ago)
18:14 <jonatack> thanks -- seems safer (in principle) to handle a post-VERACK time gap before WTXIDRELAY, if reasonably feasible
18:27 <jonatack> sdaftuar: in p2p_leak.py it actually states "A node should never send anything other than VERSION/VERACK/REJECT until it's received a VERACK" and test for that
18:27 <sipa> sdaftuar: in the past people have used appending data to the version message for this purpose
18:27 <jonatack> (non-exhaustively apparently)
18:28 <sipa> heh, REJECT was permitted?
18:32 <jonatack> seems so in the test
18:32 <jonatack> (if i'm reading it correctly and it's functioning properly)
19:01 <sdaftuar> cfields: I think you wrote that comment jonatack is referencing, any thoughts?
19:08 <sdaftuar> sipa: extending the version message would be nice and simple, but i assume no one likes these variable length messages that ensue from that approach?
19:09 <sipa> sdaftuar: exactly
19:10 <sipa> some libbitcoin people complained about BIP37 adding an optional field to version
19:13 <sipa> that was also 8 years ago
19:14 <sdaftuar> i guess if there's nothing intrinsically wrong with a design where we throw message in between VERSION and VERACK, that seems most extensible to me
19:15 <sdaftuar> but if software complains, then i am not sure what to do
19:16 <sdaftuar> one option could be to do a bunch more work to support txid or wtxid-based announcements with a peer, so that turning on wtxidrelay on a link is seamless, but that is not clearly worth the effort t ome
19:17 <sipa> sdaftuar: perhaps just do it with a message between version and verack (and gated by version number, i guess?), and when things are more ready, discuss on the ML whether that could cause problems
19:17 <sdaftuar> that is where it's at right now (including gating it on version number, which i bumped)
19:17 <sdaftuar> seems reasonable
19:18 <sipa> i agree that's actually cleanest
19:18 <sdaftuar> possibly i could also just move it to being after VERACK -- in some ways, transaction relay failing between VERACK and negotiation of wtxid relay is no different than transaction relay failing because the connection wasn't set up yet at the time the transaction was announced
19:18 <sdaftuar> eg the test failure i observed could have happened for other reasons
19:19 <sdaftuar> it just happened to have happened because of the thing i could sort of control
19:20 <sdaftuar> anyway i'll leave it for now and revisit, thanks for thinking about this
20:40 <luke-jr> sdaftuar: sipa: for reference https://github.com/bitcoin/bips/blob/master/bip-0060.mediawiki
20:40 <sipa> luke-jr: ah yes, exactly
20:42 <luke-jr> it's not entirely clear to me if this BIP got implemented or not
20:42 <luke-jr> if not, then the ship has sailed, and IMO we should be free to extend this way again :P
20:43 <sipa> it's not because the BIP didn't get adopted that the problem it was trying to address is solved
20:44 <luke-jr> ?
20:47 <sipa> BIP60 argued that adding optional fields to the version message was problematic, and suggested a solution; even if that solutions wasn't adopted, that does not imply that its concern (optional fields at the end of version) does not matter
21:04 <luke-jr> sipa: my point is that if the version message is already variable-length, adding another field doesn't chnage that
21:04 <sipa> it still has problems with serialization of deployments (what if two P2P extensions both want to add extra data?)... historically speaking that hasn't been a problem though :p

06 Feb 2020 #bitcoin-core-dev IRC discussion

20:54 <sdaftuar> hi - i'm back, if anyone has questions about wtxid-relay i can discus
20:55 <jeremyrubin> yay! please do
20:55 <MarcoFalke> sdaftuar: There was a question whether it was needed for package relay
20:55 <sdaftuar> i think it's a nice-to-have, but non-essential
20:56 <sdaftuar> nice-to-have only because any tx-relay protocol change we make in the future (like erlay, or dandelion, etc) should be done on wtxid-based relay
20:56 <jeremyrubin> I guess more concretely where it fits into the https://github.com/bitcoin/bitcoin/projects/14 workflow and where you think it belongs timeline wise
20:56 <sdaftuar> well, i'm probably personally gated on it, as i don't want to work on more p2p relay things based on txid-relay at this point
20:57 <jeremyrubin> Like if new rebroadcasting stuff like what amiti is working on should be done on wtxids then do we try to slot this before it
20:57 <sdaftuar> barring some reason that wtxid-relay is a problem
20:57 <jeremyrubin> ah ok; so it slots before further package relay work for you
20:57 <luke-jr> I never really understood why we didn't do wtxid-relay from the start
20:57 <sdaftuar> luke-jr: we shoudl have! the second best time is now
20:57 <jeremyrubin> we didn't have wtxids before segwit
20:57 <luke-jr> (or if I did, I forgot)
20:58 <sdaftuar> it was just more work, and we were busy
20:58 <sdaftuar> but i think it's pretty straightforward, and we should do it, ideally before we make a standardness change to segwit transactions
20:58 <jonatack> ack
20:58 <sipa> i think initially it wasn't that clear that it was needed in the first place
20:58 <sipa> and when segwit was further along, it got pushed back to "later"
20:58 <sipa> seems later is now
20:59 <sdaftuar> yeah i'm not sure how much anyone thought about it until petertodd pointed out the issues in #8279
20:59 <gribble> https://github.com/bitcoin/bitcoin/issues/8279 | Mempool DoS risk in segwit due to malleated transactions · Issue #8279 · bitcoin/bitcoin · GitHub
20:59 <jeremyrubin> My only concern looking at the code is that a new index in maptx kinda sucks
20:59 <sdaftuar> a bit more memory, but i don't see a way around it, and i think the tradeoff is well worth the benefit
21:00 <luke-jr> could we add both entries to the same index?
21:00 <sdaftuar> luke-jr: we need to look up by both txid and wtxid
21:00 <sdaftuar> so two keys
21:00 <sdaftuar> we use the boost multiindex already, which i think is pretty efficient?
21:01 <jeremyrubin> sdaftuar: you can actually reuse the saltedtxid hasher across both indexes I think
21:01 <sipa> sdaftuar: i need to revive my use-allocator-to-count-multiindex-memory-use stuff... i'm not sure how accurate we currently are
21:02 <sdaftuar> sipa: ah yes i have no idea how to do that, if you can advise on how to update the memory calculation better than i did in the PR, please let me know
21:02 <jonatack> ^ +1
21:02 <sdaftuar> jeremyrubin: i don't think i follow
21:02 <sipa> sdaftuar: i have some WIP code that i can probably use to verify whether or current heuristic is accurate... actually replacing it is probably harder
21:03 <aj> is #14895 really still chasing concept ack?
21:03 <gribble> https://github.com/bitcoin/bitcoin/issues/14895 | Package relay design questions · Issue #14895 · bitcoin/bitcoin · GitHub
21:03 <jeremyrubin> sdaftuar: I need to think about it a little bit. Fundamentally you want to be able to index by either TXID or WTXID
21:04 <jeremyrubin> But because it's a hash table there's a lot of extra overhead (idk what load the boost table works well till)
21:05 <jeremyrubin> Just trying to think if there's a way to be able to index by either
21:06 <jeremyrubin> Do you envision that we ever remove the txid index?
21:07 <sdaftuar> we can't do that very easily
21:07 <jeremyrubin> Or you think it's there forever for compat
21:07 <sdaftuar> because transactions reference inputs by txid
21:07 <jeremyrubin> right
21:07 <sipa> jeremyrubin: UTXOs are indexed by txid
21:07 <luke-jr> hmm
21:07 <sdaftuar> so unless someone gave you a hint for what wtxid to look for, you're screwed
21:07 <luke-jr> sdaftuar: well, only for mempool-spending txs?
21:08 <jeremyrubin> OK I'm OK with it
21:08 <jeremyrubin> BUT
21:08 <sdaftuar> i think in the case of package relay though, i might imagine that we'll get those hints, but not in a generic enough way that we could ever git rid of the index
21:08 <sdaftuar> luke-jr: right
21:08 <jeremyrubin> You have to review my next two PRs first
21:08 <jeremyrubin> Because I get rid of mapTxLinks
21:08 <luke-jr> could hypothetically use wtxids in the tx structure there, and continue to use just the txids for signatures?
21:08 <jeremyrubin> which can pay for this new index ;)
21:08 <sdaftuar> luke-jr: that would be a big relay change though
21:08 <luke-jr> maybe worth it long-term
21:08 <sdaftuar> and it just seems like a lot of edge cases would break
21:08 <sdaftuar> yeah, i can't rule it out
21:09 <sipa> i don't understand what the point is
21:09 <sipa> not having wtxids in transactions is exactly what segwit made possible
21:09 <sdaftuar> yes :)
21:09 <sdaftuar> i think saving a little memory is not worth the effort here
21:10 <kanzure> i think the point was something about rebroadcast logic or first-seen issues?
21:10 <kanzure> right, bad witnesses or something?
21:10 <jeremyrubin> Yeah, I think given that we're going to kill mapTxLinks it's going to be fine (I just don't want people to have a reason not to upgrade to wtxid index)
21:10 <jeremyrubin> kanzure: anyone can malleate witnesses
21:10 <sdaftuar> jeremyrubin: i don't think this has anything to do with mapTxLinks though?  i didn't need to touch it for the wtxid-relay PR
21:11 <jeremyrubin> sdaftuar: I'm saying that one of the PRs I pinged you on kills mapTxLinks
21:11 <sdaftuar> (unless i am missing something!)
21:11 <sdaftuar> right, i imagine that should be fine
21:11 <jeremyrubin> so the memory/hashing overhead going away there is probably the same as a new index
21:12 <sdaftuar> one way to think about this is that only the net_processing layer needs to be able to look things up in the mempool by wtxid
21:12 <sdaftuar> as that's the only place in our code where we need to deteremine whether we already have a wtxid someone is offering
21:12 <jeremyrubin> So I'm OK with not introducing a regression
21:12 <sdaftuar> anything internal to the mempool is unaffected by this change
21:12 <jeremyrubin> Because we have a way to pay for it
21:12 <sdaftuar> oh, you're saying that the extra memory is a wash with those other changes?  even better :)
21:12 <jeremyrubin> Yes
21:13 <jeremyrubin> https://github.com/JeremyRubin/bitcoin/pull/7
21:14 <sipa> jeremyrubin: seems completely unrelated; we need wtxid based relay i think, and even if the only way to do it is by adding memory, we should
21:14 <jeremyrubin> sdaftuar previously said it was nice ot have but not required
21:15 <sipa> and if we can get rid of mapTxLinks, we should, unrelated of wtxid based relay
21:15 <sdaftuar> jeremyrubin: that was for package relay
21:15 <jeremyrubin> ah
21:15 <sdaftuar> i think package relay could be done with or without wtxid relay
21:15 <sdaftuar> but wtxid relay is required to solve some bandwidth-waste issues
21:15 <jeremyrubin> I thought you meant in general
21:15 <jeremyrubin> sipa: I just don't want there to be a reason for economic nodes to not upgrade that's all
21:16 <jeremyrubin> the changes are obviously independent
21:16 <sdaftuar> the issue we have with txid relay is that when a peer announces a segwit transaction that doesn't pass your policy checks, then you don't know whether another peer announcing the same txid has the same transactionr or not
21:16 <sdaftuar> because maybe just the witness was malleated
21:16 <jeremyrubin> But that we're not increasing overheads overall means we can not worry at all
21:16 <sdaftuar> so you have to download it (otherwise, an attacker could malleate transactions to interfere with relay)
21:17 <sdaftuar> and this is wasteful, particularly after a policy change to segwit-transaction-acceptance (eg taproot, or any other policy change)
21:17 <sdaftuar> when you would expect old nodes to reject a certain category of new transaction
21:18 <sdaftuar> there's also a related CPU DoS issue with how we determine whether a transaction is witness-stripped, which will be alleviated in the future when we no longer need to worry about adding txid's for segwit transactions to our reject filter
21:18 <sdaftuar> so i think we definitely need wtxid relay, even if we support txid-based-relay indefinitely to support old software
21:21 <jeremyrubin> I guess it's not clear to me why this has to be a new mempool index rather than a fixed size separate cache only in net_proc
21:21 <jeremyrubin> will look more closely
21:21 <sdaftuar> jeremyrubin: i think code simplicity?
21:21 <sipa> jeremyrubin: for my current mempool, the extra index would be a 0.55% memory usage increase
21:22 <sdaftuar> maintaining a separate data structure just to shave a few bytes doesn't seem worth the effort to me. the mempool is probably already too big
21:22 <jeremyrubin> sipa: I'm concerned with hashing too, we're slowing down all inserts
21:22 <aj> yeah, extra indexes on multi_index are pretty cheap
21:22 <sdaftuar> inserts aren't in the critical path of block acceptance, i think they're small compared to transaction validation speeds
21:22 <sipa> yeah
21:23 <jeremyrubin> Cool -- these are all good things to document & measure in advocating this change
21:23 <sdaftuar> if you want to worry about CPU usage in transaction acceptance, we should reopen by parallel-script-check-thread PR for mempool acceptance
21:23 <sdaftuar> (i do worry about it, but i think the mempool data structures are far from our biggest concern)
21:35 <aj> sdaftuar: did you see https://github.com/bitcoin/bitcoin/pull/17303#issuecomment-581363980 ? there's a patch there that's a different approach for #15505 ; worth trying? (seems silly to add wtxid for mapRelay if we could just get rid of mapRelay first instead)
21:35 <gribble> https://github.com/bitcoin/bitcoin/issues/15505 | p2p: Request NOTFOUND transactions immediately from other outbound peers, when possible by sdaftuar · Pull Request #15505 · bitcoin/bitcoin · GitHub
21:37 <sdaftuar> aj: i would definitely prefer to get rid of mapRelay, but i think the additional memory i propose using in 18044 is very minor, it's just an extra key
21:37 <sdaftuar> and it should be easy to remove either before or after the wtxid-relay PR
21:38 <sdaftuar> (if we can get rid of mapRelay before, i can easily pull that commit out of my branch)
21:39 <aj> sdaftuar: oh, yeah, not a meaningful criticism