#16512 Shuffle inputs and outputs after joining psbts (rpc)

https://github.com/bitcoin/bitcoin/16512

Host: jnewbery

Today’s PR is very small and should be a quick review. We’ll spend half the time talking about the PR and half covering general questions about the Bitcoin Core review process.

Please come prepared with your own questions about review in Bitcoin Core!

Notes

  • Partially Signed Bitcoin Transactions (PSBTs) are a way for Bitcoin wallets and applications to pass unsigned or partially-signed transactions around.
  • This is useful needed when a transaction requires multiple signatures (eg for a multisig output or an off-chain contract).
  • The PSBT format is defined in BIP 174. The BIP also defines several responsibilities in the PSBT workflow: creator, updater, signer, combiner, input finalizer, and transaction extractor.
  • The joinpsbts RPC utility method takes multiple PSBTs and joins them into a single PSBT.
  • Currently, the inputs and outputs are joined in order. This PR changes the method to shuffle both inputs and outputs.

Questions

  • What is the justification for this PR? Why would users want inputs/outputs to be shuffled?

Meeting Log

13:00 < jnewbery> hi
13:00 < jonatack> hi!
13:00 < sosthene> hello there!
13:00 < lightlike> hi
13:00 < fjahr> hi
13:01 < ariard> hi
13:01 < michaelfolkson> Hey
13:01 < emzy> hi
13:01 < jnewbery> first of all, sorry about the late notice for this week's PR, and for the date typo on the website.
13:01 < michaelfolkson> I'll ask you afterwards how I/we can help?
13:01 < jonatack> sosthene: i don't remember, but the discussions have been excellent
13:02 < jnewbery> It was a very busy week at Chaincode last week, and unfortunately PR review club suffered
13:02 < jnewbery> should be back to normal service next week!
13:02 < nehan> hi
13:02 < jnewbery> It was quite a small, straightforward PR this week. It shouldn't take too long to discuss
13:03 < jnewbery> we can use any extra time to just talk about the review process and ask/answer more general questions
13:03 < jnewbery> I only had one question: What is the justification for this PR? Why would users want inputs/outputs to be shuffled?
13:04 < dergigi> My guess would be privacy
13:04 < fjahr> As achow mentions in the message: inputs and outputs could be correlated
13:04 < sosthene> If I got it right in the PR, it is because not shuffling the order of the inputs and outputs in a psbt transaction makes it trivial to map input and ouputs together
13:04 < sosthene> yeah privacy I mean :)
13:04 < jnewbery> yeah, that's it. This is fixing a privacy hole
13:05 < jonatack> https://github.com/bitcoin/bips/blob/master/bip-0069.mediawiki BIP 69 discusses it in more detail
13:05 < jnewbery> thanks jonatack. I was just looking for that link
13:06 < jnewbery> There's some discussion around whether BIP 69 (Lexicographical Indexing of Transaction Inputs and Outputs) is good for privacy. Any thoughts about that?
13:07 < sosthene> maybe it's not very efficient against chain analysis?
13:07 < jonatack> IIRC the deterministic aspect it proposes is debatable or undesirable for privacy
13:08 < jnewbery> jonatack: why?
13:08 < ariard> e.g in multi-party protocols, if a transaction is leaked on chain, some party outputs may not be distinguished until redeem as they are P2WSH following bip69
13:09 < jonatack> jnewbery: a deterministic algo would enable
13:09 < jonatack> distinguishing joinpsbts
13:09 < jnewbery> ariard: I'm not sure I understand
13:09 < nehan> achow101's comment: "Not at all. That gives a different privacy leak as it will indicate that joinpsbts has been used on that transaction."
13:10 < jonatack> thus a different privacy leak iirc
13:10 < nehan> I don't quite understand
13:10 < jnewbery> jonatack: how? How would they look different from other transactions?
13:10 < nehan> why not enable it by default.  then is it still a privacy leak?
13:10 < ariard> jnewbery: in Lightning, commitment transaction outputs are ordered following BIP69
13:11 < ariard> so you shouldn't be able to distinguish between incoming and outgoing HTLC output
13:11 < ariard> and know party total balance until redeem
13:11 < jnewbery> Why can BIP 69 be a privacy leak?
13:12 < ariard> well in this case, it's good for privacy leak
13:12 < jnewbery> The reason is that if _every_ wallet uses BIP 69, then it's good for privacy, but if only a subset of wallets use BIP 69, then they're identifiable
13:13 < jonatack> because very few txns follow bip69 IIUC
13:13 < jnewbery> jonatack: correct
13:13 < nehan> jnewbery: but the trend is (hopefully) more and more wallets adopting it, increasing the anonymity set for good behavior and decreasing it for bad (not adopting it)
13:13 < jnewbery> so if this PR used bip 69 instead of random shuffle, these transactions would be identifiable by chain analysis
13:13 < jonatack> from nopara here: https://github.com/bitcoin/bitcoin/issues/12457#issuecomment-383400504
13:13 < ariard> at least in Lightning it's mandatory bip69, and funding_output is already a fingreprint
13:14 < jnewbery> nehan: perhaps, if wallets do adopt it
13:14 < lightlike> but are there any major advantages for lexicographic ordering as opposed to random?
13:15 < nehan> lightlike: determinism, not having to shuffle a bunch
13:15 < jnewbery> There's some discussion around bip 69 here: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-October/016457.html
13:16 < jnewbery> lightlike: another reason might be that two parties in a multi-party contract can agree what the order should be if it's deterministic (eg the Lightning case that ariard mentions)
13:17 < jnewbery> apart from the privacy aspect, did anyone have other questions or observations about the PR?
13:17 < michaelfolkson> Why are so many (up to 10 attempts) required to get a shuffle join? Just being extra safe?
13:18 < jonatack> michaelfolkson: shuffling doesn't ensure they aren't equal every time
13:18 < michaelfolkson> Not guaranteed, but highly likely to be?
13:18 < jnewbery> michaelfolkson: to minimize false positives I suppose
13:19 < jnewbery> (we're talking about the new test here: https://github.com/bitcoin/bitcoin/pull/16512/files#diff-9cd872c8f61017722f3accc95f7874d4R376)
13:19 < jonatack> the test cannot be flakey, thus instagibbs proposing 128 times
13:19 < michaelfolkson> Ok
13:20 < michaelfolkson> Have to do the math to work out probability of staying same after 10 shuffles
13:20 < jnewbery> if it's 2 inputs it's 2^-9
13:21 < jnewbery> any other observations on the PR?
13:21 < jnewbery> any thoughts about this: https://github.com/bitcoin/bitcoin/pull/16512/files#r318133389
13:21 < michaelfolkson> But makes sense you don't even want a tiny probability of a test failing for an invalid reason
13:21 < jonatack> Does everything think the proposed test coverage is enough?
13:22 < lightlike> could also fake the randomness by providing the seed to the RNG
13:22 < jnewbery> michaelfolkson: yeah, false positives are really bad because they reduce people's trust in the testing (and so they stop caring if tests fail)
13:23 < sosthene> I don't quite get this thing about having 2 vectors of inputs and outputs
13:23 < jnewbery> lightlike: I don't think we have the ability to do that in bitcoind. And to do that I think we'd need to do a unit test instead of a functional test
13:23 < jonatack> If this was spaceflight or aeronautical piloting software, for example, or security-critical I'd want to see more coverage behind changes, to be sure they remain valid over time.
13:24 < jnewbery> sosthene: achow is talking about the internal structure of the PSBT
13:24 < jnewbery> basically saying that we can't reach in and just change the ordering of the inputs or outputs because it'd leave the PSBT in an inconsistent state
13:25 < sosthene> jnewbery: yes, I think I got it now
13:25 < sosthene> so it means you have to make the same shuffle operation on both vectors and make sure that the state remains consistent?
13:26 < lightlike> jnewbery: i have encountered this in other places, I think we can do it for some RNG functions
13:26 < jnewbery> My concern about the code here is that it's manually copying the PSBT. If the PSBT object is updated in future to contain new fields, then this code would need to be updated too
13:26 < michaelfolkson> So you'd write a test for this?
13:26 < jonatack> jnewbery: bingo
13:26 < lightlike> there is a flag g_mock_deterministic_tests=true in the functional framework, which applies to GetRand() for example
13:27 < jnewbery> I can easily imagine PSBT being updated and someone forgetting to update the code here, so joinpsbt accidentally drops some of the PSBT details
13:28 < sosthene> how would you test for this?
13:28 < michaelfolkson> Just test that they are equivalent?
13:28 < jonatack> I would want to extract the copy and shuffle functions out and unit test them.
13:28 < jnewbery> jonatack: I agree
13:28 < jonatack> As standalone fns.
13:29 < jnewbery> Make shuffle a member function of PartiallySignedTransaction
13:29 < jonatack> Yes.
13:30 < jnewbery> Let's open up to more general questions about the review process (but feel free to continue discussing 16512)
13:30 < michaelfolkson> Interesting
13:31 < michaelfolkson> I have some basic ones that I'm happy to end with if someone else wants to go first
13:32 < jnewbery> just ask now
13:32 < jonatack> I find it a challenge to decide how much time to spend on a review.
13:32 < jonatack> Even trivial ones can lead down a deep rabbit hole (for me)
13:33 < jonatack> of reading BIPs, history, gathering context, writing tests, etc. It's never enough.
13:33 < michaelfolkson> What can one do to help the author of a PR? Is one able to rebase, address nits etc for the author?
13:33 < jnewbery> jonatack: I agree, it's easy for that to happen! That's the best way to learn about the project I think - when you're motivated to understand something because you need it
13:34 < jonatack> So how do you decide which ones to PR and how deep to go?
13:34 < jonatack> Also: Hugh-priority vs easy ones
13:34 < jonatack> Time allocation is a challenge!
13:34 < jnewbery> jonatack: I have a horrible guilt-driven-devlopment model, where I spend my time looking that the PR I feel most guilty about neglecting for too long
13:35 < jnewbery> Time allocation _is_ a challenge!
13:35 < jonatack> Could spend literally days on some of them.
13:35 < michaelfolkson> (Cont) Or is the help limited to review, testing, providing feedback? (by no means trivial)
13:35 < jonatack> Ha! I feel guilty about dozens of PRs... I empathize.
13:37 < lightlike> So far I just choose the ones that seem really interesting to me.
13:37 < jnewbery> I suspect there are diminishing returns on how much benefit you can provide in your review. Like your 4th hour of reviewing a PR is likely to lead to less interesting observations than your 3rd hour of reviewing it
13:37 < jnewbery> (perhaps)
13:38 < jnewbery> michaelfolkson: rebasing/addressing nits/adding tests can be very useful. I recommend you check with the author before you spend a bunch of time on it though
13:38 < jnewbery> either leave a comment on the PR or message them and say "I want to help by doing this. Would that be helpful for you?"
13:39 < jonatack> TBH there a few PRs I spend more than an hour on, before forcing myself to move on. But for the deeper ones that's not enough.
13:39 < michaelfolkson> Also signing commits. Is it important that a regular contributor signs the commits or is it fine for a new contributor to sign the commit? I don't know whether it is checking the contributor is who they say they are if it is checking someone experienced has looked at it
13:39 < jnewbery> The only commits that need to be signed are the merge commits when a maintainer merges a PR
13:40 < michaelfolkson> Ok so it isn't standard practice for a contributor to sign their commits?
13:40 < jnewbery> feel free to sign commits on your own PRs, but it's not necessary
13:40 < michaelfolkson> Got it
13:40 < jonatack> lightlike: how long do you typically spend reviewing them?
13:42 < jkczyz> Hello, joining late. If you could change something fundamental about Bitcoin Core to improve the PR review process, what would it be? :)
13:42 < michaelfolkson> Another question. I opened a PR to do something that you John had already opened a PR for. Bar going through all the PRs how would I have known that? Eventually DrahtBot told me. Is it fine to use DrahtBot for this purpose? I'm assuming you'd prefer someone checked all the open PRs first...
13:43 < lightlike> jonatack: varies a lot, but usually more than an hour. For example I used a long train ride to understand amiti's WIP PR.
13:43 < michaelfolkson> I'm monitoring a lot of GitHub activity via email notifications but it is a lot. It is swamping my inbox
13:44 < jnewbery> michaelfolkson: I think just search for function names and any key words in the open PRs first. I don't think there's any way to run drahtbot offline (but I do think that would be useful)
13:44 < jonatack> lightlike: yes. for the more involved PRs I would need more as well. I would like to dedicate more time to the deeper and high priority ones, but it takes waaay more time.
13:44 < jnewbery> jkczyz: I'd add more great reviewers :)
13:45 < jnewbery> michaelfolkson: I gave up using github email notifications long ago
13:45 < jkczyz> jnewbery: haha, Fair enough. But anything about the code itself?
13:46 < michaelfolkson> Carl said on an interview he used IFTTT for specific GitHub alerts. Might be worth playing around with
13:46 < jnewbery> jkczyz: better modularization so people aren't stepping on each other's toes as much. This has already improved a lot in the last few months/years
13:46 < jkczyz> michaelfolkson: Filter your mail by author@noreply.github.com may help a bit
13:47 < jnewbery> I think splitting wallet from node and having a well-determined interface makes it way easier to review and think about wallet changes
13:47 < michaelfolkson> <jkczyz>: That is filtering by the author of the PR/issue/comment?
13:48 < michaelfolkson> <jkczyz>: So get notifications on the activity of specific people?
13:49 < jonatack> michaelfolkson: for reviewing PRs at least collisions aren't a worry :)
13:50 < jkczyz> michaelfolkson: Actually, nevermind. My filter is actually anything to me with list:bitcoin.bitcoin.github.com
13:51 < michaelfolkson> <jonatack>: Haha yeah that's right. I'm finding doing some very basic PRs is useful too though. Can't really review well if you haven't done a couple smoothly yourself
13:51 < michaelfolkson> But yeah definitely agree once one is comfortable the 10:1 ratio or whatever it was needs to be upheld
13:52 < jnewbery> I still think that reviewing is the most effective way you can contribute as a new contributor (and also will teach you much more about the code than opening PRs)
13:53 < michaelfolkson> Agreed
13:53 < jonatack> Yes -- much more value-add
13:54 < jnewbery> last 5 minutes. Any final questions?
13:54 < jonatack> and reviewing is also a great way to see things that can be done if you are looking for that
13:54 < jnewbery> jonatack: definitely
13:54 < michaelfolkson> If no one has any other questions shall I ask if you need any help with this club <jnewbery>?
13:54 < jonatack> e.g. in rpc_psbt.py... there is no logging, nada
13:54 < michaelfolkson> How should the PRs be chosen each week? Continue with John choosing them?
13:55 < jnewbery> Thanks Michael. I'm always looking for people to suggest PRs or host meetings
13:55 < jnewbery> You can suggest PRs here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or by messaging me
13:57 < jnewbery> Best PRs are small(ish) and don't require too much contextual knowledge
13:57 < jnewbery> (so it doesn't take you hours and hours to review)
13:58 < michaelfolkson> The larger complicated ones are great for learning but for me at least won't get me in a position to give adequate review
13:58 < jnewbery> ok, if there are no more questions, let's wrap it up there.
13:58 < jnewbery> I'll post next week's PR later today
13:58 < jnewbery> Thanks everyone!
13:59 < jonatack> Thanks, John and all!
13:59 < michaelfolkson> Cool, thanks
13:59 < lightlike> thanks!
14:00 < jonatack> nehan: p.s. if you are still here, great job with the a16z presentation https://www.youtube.com/watch?v=5oQ0djZYXNI
14:03 < achow101> re bip69: according to p2sh.info, somewhere between 40% and 60% of txs use bip69. also core doesn't use bip69 at all
14:08 < jonatack> achow101: thanks
14:09 < milochen0418> thanks