💡

Bitcoin Core PR Review Club

Shuffle inputs and outputs after joining psbts (rpc/rest/zmq)

Sep 4, 2019

https://github.com/bitcoin/bitcoin/pull/16512

Host: jnewbery - PR author: achow101

Today’s PR is very small and should be a quick review. We’ll spend half the time talking about the PR and half covering general questions about the Bitcoin Core review process.

Please come prepared with your own questions about review in Bitcoin Core!

Notes

Partially Signed Bitcoin Transactions (PSBTs) are a way for Bitcoin wallets and applications to pass unsigned or partially-signed transactions around.
This is useful needed when a transaction requires multiple signatures (eg for a multisig output or an off-chain contract).
The PSBT format is defined in BIP 174. The BIP also defines several responsibilities in the PSBT workflow: creator, updater, signer, combiner, input finalizer, and transaction extractor.
The joinpsbts RPC utility method takes multiple PSBTs and joins them into a single PSBT.
Currently, the inputs and outputs are joined in order. This PR changes the method to shuffle both inputs and outputs.

Questions

What is the justification for this PR? Why would users want inputs/outputs to be shuffled?

Meeting Log

13:00

<jnewbery> hi

13:00

<jonatack> hi!

13:00

<sosthene> hello there!

13:00

<lightlike> hi

13:00

<fjahr> hi

13:01

<ariard> hi

13:01

<michaelfolkson> Hey

13:01

<emzy> hi

13:01

<jnewbery> first of all, sorry about the late notice for this week's PR, and for the date typo on the website.

13:01

<michaelfolkson> I'll ask you afterwards how I/we can help?

13:01

<jonatack> sosthene: i don't remember, but the discussions have been excellent

13:02

<jnewbery> It was a very busy week at Chaincode last week, and unfortunately PR review club suffered

13:02

<jnewbery> should be back to normal service next week!

13:02

<nehan> hi

13:02

<jnewbery> It was quite a small, straightforward PR this week. It shouldn't take too long to discuss

13:03

<jnewbery> we can use any extra time to just talk about the review process and ask/answer more general questions

13:03

<jnewbery> I only had one question: What is the justification for this PR? Why would users want inputs/outputs to be shuffled?

13:04

<dergigi> My guess would be privacy

13:04

<fjahr> As achow mentions in the message: inputs and outputs could be correlated

13:04

<sosthene> If I got it right in the PR, it is because not shuffling the order of the inputs and outputs in a psbt transaction makes it trivial to map input and ouputs together

13:04

<sosthene> yeah privacy I mean :)

13:04

<jnewbery> yeah, that's it. This is fixing a privacy hole

13:05

<jonatack> https://github.com/bitcoin/bips/blob/master/bip-0069.mediawiki BIP 69 discusses it in more detail

13:05

<jnewbery> thanks jonatack. I was just looking for that link

13:06

<jnewbery> There's some discussion around whether BIP 69 (Lexicographical Indexing of Transaction Inputs and Outputs) is good for privacy. Any thoughts about that?

13:07

<sosthene> maybe it's not very efficient against chain analysis?

13:07

<jonatack> IIRC the deterministic aspect it proposes is debatable or undesirable for privacy

13:08

<jnewbery> jonatack: why?

13:08

<ariard> e.g in multi-party protocols, if a transaction is leaked on chain, some party outputs may not be distinguished until redeem as they are P2WSH following bip69

13:09

<jonatack> jnewbery: a deterministic algo would enable

13:09

<jonatack> distinguishing joinpsbts

13:09

<jnewbery> ariard: I'm not sure I understand

13:09

<nehan> achow101's comment: "Not at all. That gives a different privacy leak as it will indicate that joinpsbts has been used on that transaction."

13:10

<jonatack> thus a different privacy leak iirc

13:10

<nehan> I don't quite understand

13:10

<jnewbery> jonatack: how? How would they look different from other transactions?

13:10

<nehan> why not enable it by default. then is it still a privacy leak?

13:10

<ariard> jnewbery: in Lightning, commitment transaction outputs are ordered following BIP69

13:11

<ariard> so you shouldn't be able to distinguish between incoming and outgoing HTLC output

13:11

<ariard> and know party total balance until redeem

13:11

<jnewbery> Why can BIP 69 be a privacy leak?

13:12

<ariard> well in this case, it's good for privacy leak

13:12

<jnewbery> The reason is that if _every_ wallet uses BIP 69, then it's good for privacy, but if only a subset of wallets use BIP 69, then they're identifiable

13:13

<jonatack> because very few txns follow bip69 IIUC

13:13

<jnewbery> jonatack: correct

13:13

<nehan> jnewbery: but the trend is (hopefully) more and more wallets adopting it, increasing the anonymity set for good behavior and decreasing it for bad (not adopting it)

13:13

<jnewbery> so if this PR used bip 69 instead of random shuffle, these transactions would be identifiable by chain analysis

13:13

<jonatack> from nopara here: https://github.com/bitcoin/bitcoin/issues/12457#issuecomment-383400504

13:13

<ariard> at least in Lightning it's mandatory bip69, and funding_output is already a fingreprint

13:14

<jnewbery> nehan: perhaps, if wallets do adopt it

13:14

<lightlike> but are there any major advantages for lexicographic ordering as opposed to random?

13:15

<nehan> lightlike: determinism, not having to shuffle a bunch

13:15

<jnewbery> There's some discussion around bip 69 here: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-October/016457.html

13:16

<jnewbery> lightlike: another reason might be that two parties in a multi-party contract can agree what the order should be if it's deterministic (eg the Lightning case that ariard mentions)

13:17

<jnewbery> apart from the privacy aspect, did anyone have other questions or observations about the PR?

13:17

<michaelfolkson> Why are so many (up to 10 attempts) required to get a shuffle join? Just being extra safe?

13:18

<jonatack> michaelfolkson: shuffling doesn't ensure they aren't equal every time

13:18

<michaelfolkson> Not guaranteed, but highly likely to be?

13:18

<jnewbery> michaelfolkson: to minimize false positives I suppose

13:19

<jnewbery> (we're talking about the new test here: https://github.com/bitcoin/bitcoin/pull/16512/files#diff-9cd872c8f61017722f3accc95f7874d4R376)

13:19

<jonatack> the test cannot be flakey, thus instagibbs proposing 128 times

13:19

<michaelfolkson> Ok

13:20

<michaelfolkson> Have to do the math to work out probability of staying same after 10 shuffles

13:20

<jnewbery> if it's 2 inputs it's 2^-9

13:21

<jnewbery> any other observations on the PR?

13:21

<jnewbery> any thoughts about this: https://github.com/bitcoin/bitcoin/pull/16512/files#r318133389

13:21

<michaelfolkson> But makes sense you don't even want a tiny probability of a test failing for an invalid reason

13:21

<jonatack> Does everything think the proposed test coverage is enough?

13:22

<lightlike> could also fake the randomness by providing the seed to the RNG

13:22

<jnewbery> michaelfolkson: yeah, false positives are really bad because they reduce people's trust in the testing (and so they stop caring if tests fail)

13:23

<sosthene> I don't quite get this thing about having 2 vectors of inputs and outputs

13:23

<jnewbery> lightlike: I don't think we have the ability to do that in bitcoind. And to do that I think we'd need to do a unit test instead of a functional test

13:23

<jonatack> If this was spaceflight or aeronautical piloting software, for example, or security-critical I'd want to see more coverage behind changes, to be sure they remain valid over time.

13:24

<jnewbery> sosthene: achow is talking about the internal structure of the PSBT

13:24

<jnewbery> basically saying that we can't reach in and just change the ordering of the inputs or outputs because it'd leave the PSBT in an inconsistent state

13:25

<sosthene> jnewbery: yes, I think I got it now

13:25

<sosthene> so it means you have to make the same shuffle operation on both vectors and make sure that the state remains consistent?

13:26

<lightlike> jnewbery: i have encountered this in other places, I think we can do it for some RNG functions

13:26

<jnewbery> My concern about the code here is that it's manually copying the PSBT. If the PSBT object is updated in future to contain new fields, then this code would need to be updated too

13:26

<michaelfolkson> So you'd write a test for this?

13:26

<jonatack> jnewbery: bingo

13:26

<lightlike> there is a flag g_mock_deterministic_tests=true in the functional framework, which applies to GetRand() for example

13:27

<jnewbery> I can easily imagine PSBT being updated and someone forgetting to update the code here, so joinpsbt accidentally drops some of the PSBT details

13:28

<sosthene> how would you test for this?

13:28

<michaelfolkson> Just test that they are equivalent?

13:28

<jonatack> I would want to extract the copy and shuffle functions out and unit test them.

13:28

<jnewbery> jonatack: I agree

13:28

<jonatack> As standalone fns.

13:29

<jnewbery> Make shuffle a member function of PartiallySignedTransaction

13:29

<jonatack> Yes.

13:30

<jnewbery> Let's open up to more general questions about the review process (but feel free to continue discussing 16512)

13:30

<michaelfolkson> Interesting

13:31

<michaelfolkson> I have some basic ones that I'm happy to end with if someone else wants to go first

13:32

<jnewbery> just ask now

13:32

<jonatack> I find it a challenge to decide how much time to spend on a review.

13:32

<jonatack> Even trivial ones can lead down a deep rabbit hole (for me)

13:33

<jonatack> of reading BIPs, history, gathering context, writing tests, etc. It's never enough.

13:33

<michaelfolkson> What can one do to help the author of a PR? Is one able to rebase, address nits etc for the author?

13:33

<jnewbery> jonatack: I agree, it's easy for that to happen! That's the best way to learn about the project I think - when you're motivated to understand something because you need it

13:34

<jonatack> So how do you decide which ones to PR and how deep to go?

13:34

<jonatack> Also: Hugh-priority vs easy ones

13:34

<jonatack> Time allocation is a challenge!

13:34

<jnewbery> jonatack: I have a horrible guilt-driven-devlopment model, where I spend my time looking that the PR I feel most guilty about neglecting for too long

13:35

<jnewbery> Time allocation _is_ a challenge!

13:35

<jonatack> Could spend literally days on some of them.

13:35

<michaelfolkson> (Cont) Or is the help limited to review, testing, providing feedback? (by no means trivial)

13:35

<jonatack> Ha! I feel guilty about dozens of PRs... I empathize.

13:37

<lightlike> So far I just choose the ones that seem really interesting to me.

13:37

<jnewbery> I suspect there are diminishing returns on how much benefit you can provide in your review. Like your 4th hour of reviewing a PR is likely to lead to less interesting observations than your 3rd hour of reviewing it

13:37

<jnewbery> (perhaps)

13:38

<jnewbery> michaelfolkson: rebasing/addressing nits/adding tests can be very useful. I recommend you check with the author before you spend a bunch of time on it though

13:38

<jnewbery> either leave a comment on the PR or message them and say "I want to help by doing this. Would that be helpful for you?"

13:39

<jonatack> TBH there a few PRs I spend more than an hour on, before forcing myself to move on. But for the deeper ones that's not enough.

13:39

<michaelfolkson> Also signing commits. Is it important that a regular contributor signs the commits or is it fine for a new contributor to sign the commit? I don't know whether it is checking the contributor is who they say they are if it is checking someone experienced has looked at it

13:39

<jnewbery> The only commits that need to be signed are the merge commits when a maintainer merges a PR

13:40

<michaelfolkson> Ok so it isn't standard practice for a contributor to sign their commits?

13:40

<jnewbery> feel free to sign commits on your own PRs, but it's not necessary

13:40

<michaelfolkson> Got it

13:40

<jonatack> lightlike: how long do you typically spend reviewing them?

13:42

<jkczyz> Hello, joining late. If you could change something fundamental about Bitcoin Core to improve the PR review process, what would it be? :)

13:42

<michaelfolkson> Another question. I opened a PR to do something that you John had already opened a PR for. Bar going through all the PRs how would I have known that? Eventually DrahtBot told me. Is it fine to use DrahtBot for this purpose? I'm assuming you'd prefer someone checked all the open PRs first...

13:43

<lightlike> jonatack: varies a lot, but usually more than an hour. For example I used a long train ride to understand amiti's WIP PR.

13:43

<michaelfolkson> I'm monitoring a lot of GitHub activity via email notifications but it is a lot. It is swamping my inbox

13:44

<jnewbery> michaelfolkson: I think just search for function names and any key words in the open PRs first. I don't think there's any way to run drahtbot offline (but I do think that would be useful)

13:44

<jonatack> lightlike: yes. for the more involved PRs I would need more as well. I would like to dedicate more time to the deeper and high priority ones, but it takes waaay more time.

13:44

<jnewbery> jkczyz: I'd add more great reviewers :)

13:45

<jnewbery> michaelfolkson: I gave up using github email notifications long ago

13:45

<jkczyz> jnewbery: haha, Fair enough. But anything about the code itself?

13:46

<michaelfolkson> Carl said on an interview he used IFTTT for specific GitHub alerts. Might be worth playing around with

13:46

<jnewbery> jkczyz: better modularization so people aren't stepping on each other's toes as much. This has already improved a lot in the last few months/years

13:46

<jkczyz> michaelfolkson: Filter your mail by author@noreply.github.com may help a bit

13:47

<jnewbery> I think splitting wallet from node and having a well-determined interface makes it way easier to review and think about wallet changes

13:47

<michaelfolkson> <jkczyz>: That is filtering by the author of the PR/issue/comment?

13:48

<michaelfolkson> <jkczyz>: So get notifications on the activity of specific people?

13:49

<jonatack> michaelfolkson: for reviewing PRs at least collisions aren't a worry :)

13:50

<jkczyz> michaelfolkson: Actually, nevermind. My filter is actually anything to me with list:bitcoin.bitcoin.github.com

13:51

<michaelfolkson> <jonatack>: Haha yeah that's right. I'm finding doing some very basic PRs is useful too though. Can't really review well if you haven't done a couple smoothly yourself

13:51

<michaelfolkson> But yeah definitely agree once one is comfortable the 10:1 ratio or whatever it was needs to be upheld

13:52

<jnewbery> I still think that reviewing is the most effective way you can contribute as a new contributor (and also will teach you much more about the code than opening PRs)

13:53

<michaelfolkson> Agreed

13:53

<jonatack> Yes -- much more value-add

13:54

<jnewbery> last 5 minutes. Any final questions?

13:54

<jonatack> and reviewing is also a great way to see things that can be done if you are looking for that

13:54

<jnewbery> jonatack: definitely

13:54

<michaelfolkson> If no one has any other questions shall I ask if you need any help with this club <jnewbery>?

13:54

<jonatack> e.g. in rpc_psbt.py... there is no logging, nada

13:54

<michaelfolkson> How should the PRs be chosen each week? Continue with John choosing them?

13:55

<jnewbery> Thanks Michael. I'm always looking for people to suggest PRs or host meetings

13:55

<jnewbery> You can suggest PRs here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or by messaging me

13:57

<jnewbery> Best PRs are small(ish) and don't require too much contextual knowledge

13:57

<jnewbery> (so it doesn't take you hours and hours to review)

13:58

<michaelfolkson> The larger complicated ones are great for learning but for me at least won't get me in a position to give adequate review

13:58

<jnewbery> ok, if there are no more questions, let's wrap it up there.

13:58

<jnewbery> I'll post next week's PR later today

13:58

<jnewbery> Thanks everyone!

13:59

<jonatack> Thanks, John and all!

13:59

<michaelfolkson> Cool, thanks

13:59

<lightlike> thanks!

14:00

<jonatack> nehan: p.s. if you are still here, great job with the a16z presentation https://www.youtube.com/watch?v=5oQ0djZYXNI

14:03

<achow101> re bip69: according to p2sh.info, somewhere between 40% and 60% of txs use bip69. also core doesn't use bip69 at all

14:08

<jonatack> achow101: thanks

14:09

<milochen0418> thanks