Shuffle inputs and outputs after joining psbts (rpc/rest/zmq)

https://github.com/bitcoin/bitcoin/pull/16512

Host: jnewbery  -  PR author: achow101

Today’s PR is very small and should be a quick review. We’ll spend half the time talking about the PR and half covering general questions about the Bitcoin Core review process.

Please come prepared with your own questions about review in Bitcoin Core!

Notes

  • Partially Signed Bitcoin Transactions (PSBTs) are a way for Bitcoin wallets and applications to pass unsigned or partially-signed transactions around.
  • This is useful needed when a transaction requires multiple signatures (eg for a multisig output or an off-chain contract).
  • The PSBT format is defined in BIP 174. The BIP also defines several responsibilities in the PSBT workflow: creator, updater, signer, combiner, input finalizer, and transaction extractor.
  • The joinpsbts RPC utility method takes multiple PSBTs and joins them into a single PSBT.
  • Currently, the inputs and outputs are joined in order. This PR changes the method to shuffle both inputs and outputs.

Questions

  • What is the justification for this PR? Why would users want inputs/outputs to be shuffled?

Meeting Log

  113:00 <jnewbery> hi
  213:00 <jonatack> hi!
  313:00 <sosthene> hello there!
  413:00 <lightlike> hi
  513:00 <fjahr> hi
  613:01 <ariard> hi
  713:01 <michaelfolkson> Hey
  813:01 <emzy> hi
  913:01 <jnewbery> first of all, sorry about the late notice for this week's PR, and for the date typo on the website.
 1013:01 <michaelfolkson> I'll ask you afterwards how I/we can help?
 1113:01 <jonatack> sosthene: i don't remember, but the discussions have been excellent
 1213:02 <jnewbery> It was a very busy week at Chaincode last week, and unfortunately PR review club suffered
 1313:02 <jnewbery> should be back to normal service next week!
 1413:02 <nehan> hi
 1513:02 <jnewbery> It was quite a small, straightforward PR this week. It shouldn't take too long to discuss
 1613:03 <jnewbery> we can use any extra time to just talk about the review process and ask/answer more general questions
 1713:03 <jnewbery> I only had one question: What is the justification for this PR? Why would users want inputs/outputs to be shuffled?
 1813:04 <dergigi> My guess would be privacy
 1913:04 <fjahr> As achow mentions in the message: inputs and outputs could be correlated
 2013:04 <sosthene> If I got it right in the PR, it is because not shuffling the order of the inputs and outputs in a psbt transaction makes it trivial to map input and ouputs together
 2113:04 <sosthene> yeah privacy I mean :)
 2213:04 <jnewbery> yeah, that's it. This is fixing a privacy hole
 2313:05 <jonatack> https://github.com/bitcoin/bips/blob/master/bip-0069.mediawiki BIP 69 discusses it in more detail
 2413:05 <jnewbery> thanks jonatack. I was just looking for that link
 2513:06 <jnewbery> There's some discussion around whether BIP 69 (Lexicographical Indexing of Transaction Inputs and Outputs) is good for privacy. Any thoughts about that?
 2613:07 <sosthene> maybe it's not very efficient against chain analysis?
 2713:07 <jonatack> IIRC the deterministic aspect it proposes is debatable or undesirable for privacy
 2813:08 <jnewbery> jonatack: why?
 2913:08 <ariard> e.g in multi-party protocols, if a transaction is leaked on chain, some party outputs may not be distinguished until redeem as they are P2WSH following bip69
 3013:09 <jonatack> jnewbery: a deterministic algo would enable
 3113:09 <jonatack> distinguishing joinpsbts
 3213:09 <jnewbery> ariard: I'm not sure I understand
 3313:09 <nehan> achow101's comment: "Not at all. That gives a different privacy leak as it will indicate that joinpsbts has been used on that transaction."
 3413:10 <jonatack> thus a different privacy leak iirc
 3513:10 <nehan> I don't quite understand
 3613:10 <jnewbery> jonatack: how? How would they look different from other transactions?
 3713:10 <nehan> why not enable it by default. then is it still a privacy leak?
 3813:10 <ariard> jnewbery: in Lightning, commitment transaction outputs are ordered following BIP69
 3913:11 <ariard> so you shouldn't be able to distinguish between incoming and outgoing HTLC output
 4013:11 <ariard> and know party total balance until redeem
 4113:11 <jnewbery> Why can BIP 69 be a privacy leak?
 4213:12 <ariard> well in this case, it's good for privacy leak
 4313:12 <jnewbery> The reason is that if _every_ wallet uses BIP 69, then it's good for privacy, but if only a subset of wallets use BIP 69, then they're identifiable
 4413:13 <jonatack> because very few txns follow bip69 IIUC
 4513:13 <jnewbery> jonatack: correct
 4613:13 <nehan> jnewbery: but the trend is (hopefully) more and more wallets adopting it, increasing the anonymity set for good behavior and decreasing it for bad (not adopting it)
 4713:13 <jnewbery> so if this PR used bip 69 instead of random shuffle, these transactions would be identifiable by chain analysis
 4813:13 <jonatack> from nopara here: https://github.com/bitcoin/bitcoin/issues/12457#issuecomment-383400504
 4913:13 <ariard> at least in Lightning it's mandatory bip69, and funding_output is already a fingreprint
 5013:14 <jnewbery> nehan: perhaps, if wallets do adopt it
 5113:14 <lightlike> but are there any major advantages for lexicographic ordering as opposed to random?
 5213:15 <nehan> lightlike: determinism, not having to shuffle a bunch
 5313:15 <jnewbery> There's some discussion around bip 69 here: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-October/016457.html
 5413:16 <jnewbery> lightlike: another reason might be that two parties in a multi-party contract can agree what the order should be if it's deterministic (eg the Lightning case that ariard mentions)
 5513:17 <jnewbery> apart from the privacy aspect, did anyone have other questions or observations about the PR?
 5613:17 <michaelfolkson> Why are so many (up to 10 attempts) required to get a shuffle join? Just being extra safe?
 5713:18 <jonatack> michaelfolkson: shuffling doesn't ensure they aren't equal every time
 5813:18 <michaelfolkson> Not guaranteed, but highly likely to be?
 5913:18 <jnewbery> michaelfolkson: to minimize false positives I suppose
 6013:19 <jnewbery> (we're talking about the new test here: https://github.com/bitcoin/bitcoin/pull/16512/files#diff-9cd872c8f61017722f3accc95f7874d4R376)
 6113:19 <jonatack> the test cannot be flakey, thus instagibbs proposing 128 times
 6213:19 <michaelfolkson> Ok
 6313:20 <michaelfolkson> Have to do the math to work out probability of staying same after 10 shuffles
 6413:20 <jnewbery> if it's 2 inputs it's 2^-9
 6513:21 <jnewbery> any other observations on the PR?
 6613:21 <jnewbery> any thoughts about this: https://github.com/bitcoin/bitcoin/pull/16512/files#r318133389
 6713:21 <michaelfolkson> But makes sense you don't even want a tiny probability of a test failing for an invalid reason
 6813:21 <jonatack> Does everything think the proposed test coverage is enough?
 6913:22 <lightlike> could also fake the randomness by providing the seed to the RNG
 7013:22 <jnewbery> michaelfolkson: yeah, false positives are really bad because they reduce people's trust in the testing (and so they stop caring if tests fail)
 7113:23 <sosthene> I don't quite get this thing about having 2 vectors of inputs and outputs
 7213:23 <jnewbery> lightlike: I don't think we have the ability to do that in bitcoind. And to do that I think we'd need to do a unit test instead of a functional test
 7313:23 <jonatack> If this was spaceflight or aeronautical piloting software, for example, or security-critical I'd want to see more coverage behind changes, to be sure they remain valid over time.
 7413:24 <jnewbery> sosthene: achow is talking about the internal structure of the PSBT
 7513:24 <jnewbery> basically saying that we can't reach in and just change the ordering of the inputs or outputs because it'd leave the PSBT in an inconsistent state
 7613:25 <sosthene> jnewbery: yes, I think I got it now
 7713:25 <sosthene> so it means you have to make the same shuffle operation on both vectors and make sure that the state remains consistent?
 7813:26 <lightlike> jnewbery: i have encountered this in other places, I think we can do it for some RNG functions
 7913:26 <jnewbery> My concern about the code here is that it's manually copying the PSBT. If the PSBT object is updated in future to contain new fields, then this code would need to be updated too
 8013:26 <michaelfolkson> So you'd write a test for this?
 8113:26 <jonatack> jnewbery: bingo
 8213:26 <lightlike> there is a flag g_mock_deterministic_tests=true in the functional framework, which applies to GetRand() for example
 8313:27 <jnewbery> I can easily imagine PSBT being updated and someone forgetting to update the code here, so joinpsbt accidentally drops some of the PSBT details
 8413:28 <sosthene> how would you test for this?
 8513:28 <michaelfolkson> Just test that they are equivalent?
 8613:28 <jonatack> I would want to extract the copy and shuffle functions out and unit test them.
 8713:28 <jnewbery> jonatack: I agree
 8813:28 <jonatack> As standalone fns.
 8913:29 <jnewbery> Make shuffle a member function of PartiallySignedTransaction
 9013:29 <jonatack> Yes.
 9113:30 <jnewbery> Let's open up to more general questions about the review process (but feel free to continue discussing 16512)
 9213:30 <michaelfolkson> Interesting
 9313:31 <michaelfolkson> I have some basic ones that I'm happy to end with if someone else wants to go first
 9413:32 <jnewbery> just ask now
 9513:32 <jonatack> I find it a challenge to decide how much time to spend on a review.
 9613:32 <jonatack> Even trivial ones can lead down a deep rabbit hole (for me)
 9713:33 <jonatack> of reading BIPs, history, gathering context, writing tests, etc. It's never enough.
 9813:33 <michaelfolkson> What can one do to help the author of a PR? Is one able to rebase, address nits etc for the author?
 9913:33 <jnewbery> jonatack: I agree, it's easy for that to happen! That's the best way to learn about the project I think - when you're motivated to understand something because you need it
10013:34 <jonatack> So how do you decide which ones to PR and how deep to go?
10113:34 <jonatack> Also: Hugh-priority vs easy ones
10213:34 <jonatack> Time allocation is a challenge!
10313:34 <jnewbery> jonatack: I have a horrible guilt-driven-devlopment model, where I spend my time looking that the PR I feel most guilty about neglecting for too long
10413:35 <jnewbery> Time allocation _is_ a challenge!
10513:35 <jonatack> Could spend literally days on some of them.
10613:35 <michaelfolkson> (Cont) Or is the help limited to review, testing, providing feedback? (by no means trivial)
10713:35 <jonatack> Ha! I feel guilty about dozens of PRs... I empathize.
10813:37 <lightlike> So far I just choose the ones that seem really interesting to me.
10913:37 <jnewbery> I suspect there are diminishing returns on how much benefit you can provide in your review. Like your 4th hour of reviewing a PR is likely to lead to less interesting observations than your 3rd hour of reviewing it
11013:37 <jnewbery> (perhaps)
11113:38 <jnewbery> michaelfolkson: rebasing/addressing nits/adding tests can be very useful. I recommend you check with the author before you spend a bunch of time on it though
11213:38 <jnewbery> either leave a comment on the PR or message them and say "I want to help by doing this. Would that be helpful for you?"
11313:39 <jonatack> TBH there a few PRs I spend more than an hour on, before forcing myself to move on. But for the deeper ones that's not enough.
11413:39 <michaelfolkson> Also signing commits. Is it important that a regular contributor signs the commits or is it fine for a new contributor to sign the commit? I don't know whether it is checking the contributor is who they say they are if it is checking someone experienced has looked at it
11513:39 <jnewbery> The only commits that need to be signed are the merge commits when a maintainer merges a PR
11613:40 <michaelfolkson> Ok so it isn't standard practice for a contributor to sign their commits?
11713:40 <jnewbery> feel free to sign commits on your own PRs, but it's not necessary
11813:40 <michaelfolkson> Got it
11913:40 <jonatack> lightlike: how long do you typically spend reviewing them?
12013:42 <jkczyz> Hello, joining late. If you could change something fundamental about Bitcoin Core to improve the PR review process, what would it be? :)
12113:42 <michaelfolkson> Another question. I opened a PR to do something that you John had already opened a PR for. Bar going through all the PRs how would I have known that? Eventually DrahtBot told me. Is it fine to use DrahtBot for this purpose? I'm assuming you'd prefer someone checked all the open PRs first...
12213:43 <lightlike> jonatack: varies a lot, but usually more than an hour. For example I used a long train ride to understand amiti's WIP PR.
12313:43 <michaelfolkson> I'm monitoring a lot of GitHub activity via email notifications but it is a lot. It is swamping my inbox
12413:44 <jnewbery> michaelfolkson: I think just search for function names and any key words in the open PRs first. I don't think there's any way to run drahtbot offline (but I do think that would be useful)
12513:44 <jonatack> lightlike: yes. for the more involved PRs I would need more as well. I would like to dedicate more time to the deeper and high priority ones, but it takes waaay more time.
12613:44 <jnewbery> jkczyz: I'd add more great reviewers :)
12713:45 <jnewbery> michaelfolkson: I gave up using github email notifications long ago
12813:45 <jkczyz> jnewbery: haha, Fair enough. But anything about the code itself?
12913:46 <michaelfolkson> Carl said on an interview he used IFTTT for specific GitHub alerts. Might be worth playing around with
13013:46 <jnewbery> jkczyz: better modularization so people aren't stepping on each other's toes as much. This has already improved a lot in the last few months/years
13113:46 <jkczyz> michaelfolkson: Filter your mail by author@noreply.github.com may help a bit
13213:47 <jnewbery> I think splitting wallet from node and having a well-determined interface makes it way easier to review and think about wallet changes
13313:47 <michaelfolkson> <jkczyz>: That is filtering by the author of the PR/issue/comment?
13413:48 <michaelfolkson> <jkczyz>: So get notifications on the activity of specific people?
13513:49 <jonatack> michaelfolkson: for reviewing PRs at least collisions aren't a worry :)
13613:50 <jkczyz> michaelfolkson: Actually, nevermind. My filter is actually anything to me with list:bitcoin.bitcoin.github.com
13713:51 <michaelfolkson> <jonatack>: Haha yeah that's right. I'm finding doing some very basic PRs is useful too though. Can't really review well if you haven't done a couple smoothly yourself
13813:51 <michaelfolkson> But yeah definitely agree once one is comfortable the 10:1 ratio or whatever it was needs to be upheld
13913:52 <jnewbery> I still think that reviewing is the most effective way you can contribute as a new contributor (and also will teach you much more about the code than opening PRs)
14013:53 <michaelfolkson> Agreed
14113:53 <jonatack> Yes -- much more value-add
14213:54 <jnewbery> last 5 minutes. Any final questions?
14313:54 <jonatack> and reviewing is also a great way to see things that can be done if you are looking for that
14413:54 <jnewbery> jonatack: definitely
14513:54 <michaelfolkson> If no one has any other questions shall I ask if you need any help with this club <jnewbery>?
14613:54 <jonatack> e.g. in rpc_psbt.py... there is no logging, nada
14713:54 <michaelfolkson> How should the PRs be chosen each week? Continue with John choosing them?
14813:55 <jnewbery> Thanks Michael. I'm always looking for people to suggest PRs or host meetings
14913:55 <jnewbery> You can suggest PRs here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or by messaging me
15013:57 <jnewbery> Best PRs are small(ish) and don't require too much contextual knowledge
15113:57 <jnewbery> (so it doesn't take you hours and hours to review)
15213:58 <michaelfolkson> The larger complicated ones are great for learning but for me at least won't get me in a position to give adequate review
15313:58 <jnewbery> ok, if there are no more questions, let's wrap it up there.
15413:58 <jnewbery> I'll post next week's PR later today
15513:58 <jnewbery> Thanks everyone!
15613:59 <jonatack> Thanks, John and all!
15713:59 <michaelfolkson> Cool, thanks
15813:59 <lightlike> thanks!
15914:00 <jonatack> nehan: p.s. if you are still here, great job with the a16z presentation https://www.youtube.com/watch?v=5oQ0djZYXNI
16014:03 <achow101> re bip69: according to p2sh.info, somewhere between 40% and 60% of txs use bip69. also core doesn't use bip69 at all
16114:08 <jonatack> achow101: thanks
16214:09 <milochen0418> thanks