Shuffle inputs and outputs after joining psbts (
rpc/rest/zmq) Sep 4, 2019
Today’s PR is very small and should be a quick review. We’ll spend half the
time talking about the PR and half covering general questions about the Bitcoin
Core review process.
Please come prepared with your own questions about review in Bitcoin Core!
Partially Signed Bitcoin Transactions (PSBTs) are a way for Bitcoin wallets
and applications to pass unsigned or partially-signed transactions around.
This is useful needed when a transaction requires multiple signatures (eg for
a multisig output or an off-chain contract).
The PSBT format is defined in
174. The BIP
also defines several responsibilities in the PSBT workflow: creator, updater,
signer, combiner, input finalizer, and transaction extractor. The
RPC utility method takes multiple PSBTs and joins them into a single PSBT.
Currently, the inputs and outputs are joined in order. This PR changes the
method to shuffle both inputs and outputs.
What is the justification for this PR? Why would users want inputs/outputs to
3 13:00 <sosthene> hello there!
7 13:01 <michaelfolkson> Hey
9 13:01 <jnewbery> first of all, sorry about the late notice for this week's PR, and for the date typo on the website.
10 13:01 <michaelfolkson> I'll ask you afterwards how I/we can help?
11 13:01 <jonatack> sosthene: i don't remember, but the discussions have been excellent
12 13:02 <jnewbery> It was a very busy week at Chaincode last week, and unfortunately PR review club suffered
13 13:02 <jnewbery> should be back to normal service next week!
15 13:02 <jnewbery> It was quite a small, straightforward PR this week. It shouldn't take too long to discuss
16 13:03 <jnewbery> we can use any extra time to just talk about the review process and ask/answer more general questions
17 13:03 <jnewbery> I only had one question: What is the justification for this PR? Why would users want inputs/outputs to be shuffled?
18 13:04 <dergigi> My guess would be privacy
19 13:04 <fjahr> As achow mentions in the message: inputs and outputs could be correlated
20 13:04 <sosthene> If I got it right in the PR, it is because not shuffling the order of the inputs and outputs in a psbt transaction makes it trivial to map input and ouputs together
21 13:04 <sosthene> yeah privacy I mean :)
22 13:04 <jnewbery> yeah, that's it. This is fixing a privacy hole
24 13:05 <jnewbery> thanks jonatack. I was just looking for that link
25 13:06 <jnewbery> There's some discussion around whether BIP 69 (Lexicographical Indexing of Transaction Inputs and Outputs) is good for privacy. Any thoughts about that?
26 13:07 <sosthene> maybe it's not very efficient against chain analysis?
27 13:07 <jonatack> IIRC the deterministic aspect it proposes is debatable or undesirable for privacy
28 13:08 <jnewbery> jonatack: why?
29 13:08 <ariard> e.g in multi-party protocols, if a transaction is leaked on chain, some party outputs may not be distinguished until redeem as they are P2WSH following bip69
30 13:09 <jonatack> jnewbery: a deterministic algo would enable
31 13:09 <jonatack> distinguishing joinpsbts
32 13:09 <jnewbery> ariard: I'm not sure I understand
33 13:09 <nehan> achow101's comment: "Not at all. That gives a different privacy leak as it will indicate that joinpsbts has been used on that transaction."
34 13:10 <jonatack> thus a different privacy leak iirc
35 13:10 <nehan> I don't quite understand
36 13:10 <jnewbery> jonatack: how? How would they look different from other transactions?
37 13:10 <nehan> why not enable it by default. then is it still a privacy leak?
38 13:10 <ariard> jnewbery: in Lightning, commitment transaction outputs are ordered following BIP69
39 13:11 <ariard> so you shouldn't be able to distinguish between incoming and outgoing HTLC output
40 13:11 <ariard> and know party total balance until redeem
41 13:11 <jnewbery> Why can BIP 69 be a privacy leak?
42 13:12 <ariard> well in this case, it's good for privacy leak
43 13:12 <jnewbery> The reason is that if _every_ wallet uses BIP 69, then it's good for privacy, but if only a subset of wallets use BIP 69, then they're identifiable
44 13:13 <jonatack> because very few txns follow bip69 IIUC
45 13:13 <jnewbery> jonatack: correct
46 13:13 <nehan> jnewbery: but the trend is (hopefully) more and more wallets adopting it, increasing the anonymity set for good behavior and decreasing it for bad (not adopting it)
47 13:13 <jnewbery> so if this PR used bip 69 instead of random shuffle, these transactions would be identifiable by chain analysis
49 13:13 <ariard> at least in Lightning it's mandatory bip69, and funding_output is already a fingreprint
50 13:14 <jnewbery> nehan: perhaps, if wallets do adopt it
51 13:14 <lightlike> but are there any major advantages for lexicographic ordering as opposed to random?
52 13:15 <nehan> lightlike: determinism, not having to shuffle a bunch
54 13:16 <jnewbery> lightlike: another reason might be that two parties in a multi-party contract can agree what the order should be if it's deterministic (eg the Lightning case that ariard mentions)
55 13:17 <jnewbery> apart from the privacy aspect, did anyone have other questions or observations about the PR?
56 13:17 <michaelfolkson> Why are so many (up to 10 attempts) required to get a shuffle join? Just being extra safe?
57 13:18 <jonatack> michaelfolkson: shuffling doesn't ensure they aren't equal every time
58 13:18 <michaelfolkson> Not guaranteed, but highly likely to be?
59 13:18 <jnewbery> michaelfolkson: to minimize false positives I suppose
61 13:19 <jonatack> the test cannot be flakey, thus instagibbs proposing 128 times
62 13:19 <michaelfolkson> Ok
63 13:20 <michaelfolkson> Have to do the math to work out probability of staying same after 10 shuffles
64 13:20 <jnewbery> if it's 2 inputs it's 2^-9
65 13:21 <jnewbery> any other observations on the PR?
67 13:21 <michaelfolkson> But makes sense you don't even want a tiny probability of a test failing for an invalid reason
68 13:21 <jonatack> Does everything think the proposed test coverage is enough?
69 13:22 <lightlike> could also fake the randomness by providing the seed to the RNG
70 13:22 <jnewbery> michaelfolkson: yeah, false positives are really bad because they reduce people's trust in the testing (and so they stop caring if tests fail)
71 13:23 <sosthene> I don't quite get this thing about having 2 vectors of inputs and outputs
72 13:23 <jnewbery> lightlike: I don't think we have the ability to do that in bitcoind. And to do that I think we'd need to do a unit test instead of a functional test
73 13:23 <jonatack> If this was spaceflight or aeronautical piloting software, for example, or security-critical I'd want to see more coverage behind changes, to be sure they remain valid over time.
74 13:24 <jnewbery> sosthene: achow is talking about the internal structure of the PSBT
75 13:24 <jnewbery> basically saying that we can't reach in and just change the ordering of the inputs or outputs because it'd leave the PSBT in an inconsistent state
76 13:25 <sosthene> jnewbery: yes, I think I got it now
77 13:25 <sosthene> so it means you have to make the same shuffle operation on both vectors and make sure that the state remains consistent?
78 13:26 <lightlike> jnewbery: i have encountered this in other places, I think we can do it for some RNG functions
79 13:26 <jnewbery> My concern about the code here is that it's manually copying the PSBT. If the PSBT object is updated in future to contain new fields, then this code would need to be updated too
80 13:26 <michaelfolkson> So you'd write a test for this?
81 13:26 <jonatack> jnewbery: bingo
82 13:26 <lightlike> there is a flag g_mock_deterministic_tests=true in the functional framework, which applies to GetRand() for example
83 13:27 <jnewbery> I can easily imagine PSBT being updated and someone forgetting to update the code here, so joinpsbt accidentally drops some of the PSBT details
84 13:28 <sosthene> how would you test for this?
85 13:28 <michaelfolkson> Just test that they are equivalent?
86 13:28 <jonatack> I would want to extract the copy and shuffle functions out and unit test them.
87 13:28 <jnewbery> jonatack: I agree
88 13:28 <jonatack> As standalone fns.
89 13:29 <jnewbery> Make shuffle a member function of PartiallySignedTransaction
91 13:30 <jnewbery> Let's open up to more general questions about the review process (but feel free to continue discussing 16512)
92 13:30 <michaelfolkson> Interesting
93 13:31 <michaelfolkson> I have some basic ones that I'm happy to end with if someone else wants to go first
94 13:32 <jnewbery> just ask now
95 13:32 <jonatack> I find it a challenge to decide how much time to spend on a review.
96 13:32 <jonatack> Even trivial ones can lead down a deep rabbit hole (for me)
97 13:33 <jonatack> of reading BIPs, history, gathering context, writing tests, etc. It's never enough.
98 13:33 <michaelfolkson> What can one do to help the author of a PR? Is one able to rebase, address nits etc for the author?
99 13:33 <jnewbery> jonatack: I agree, it's easy for that to happen! That's the best way to learn about the project I think - when you're motivated to understand something because you need it
100 13:34 <jonatack> So how do you decide which ones to PR and how deep to go?
101 13:34 <jonatack> Also: Hugh-priority vs easy ones
102 13:34 <jonatack> Time allocation is a challenge!
103 13:34 <jnewbery> jonatack: I have a horrible guilt-driven-devlopment model, where I spend my time looking that the PR I feel most guilty about neglecting for too long
104 13:35 <jnewbery> Time allocation _is_ a challenge!
105 13:35 <jonatack> Could spend literally days on some of them.
106 13:35 <michaelfolkson> (Cont) Or is the help limited to review, testing, providing feedback? (by no means trivial)
107 13:35 <jonatack> Ha! I feel guilty about dozens of PRs... I empathize.
108 13:37 <lightlike> So far I just choose the ones that seem really interesting to me.
109 13:37 <jnewbery> I suspect there are diminishing returns on how much benefit you can provide in your review. Like your 4th hour of reviewing a PR is likely to lead to less interesting observations than your 3rd hour of reviewing it
110 13:37 <jnewbery> (perhaps)
111 13:38 <jnewbery> michaelfolkson: rebasing/addressing nits/adding tests can be very useful. I recommend you check with the author before you spend a bunch of time on it though
112 13:38 <jnewbery> either leave a comment on the PR or message them and say "I want to help by doing this. Would that be helpful for you?"
113 13:39 <jonatack> TBH there a few PRs I spend more than an hour on, before forcing myself to move on. But for the deeper ones that's not enough.
114 13:39 <michaelfolkson> Also signing commits. Is it important that a regular contributor signs the commits or is it fine for a new contributor to sign the commit? I don't know whether it is checking the contributor is who they say they are if it is checking someone experienced has looked at it
115 13:39 <jnewbery> The only commits that need to be signed are the merge commits when a maintainer merges a PR
116 13:40 <michaelfolkson> Ok so it isn't standard practice for a contributor to sign their commits?
117 13:40 <jnewbery> feel free to sign commits on your own PRs, but it's not necessary
118 13:40 <michaelfolkson> Got it
119 13:40 <jonatack> lightlike: how long do you typically spend reviewing them?
120 13:42 <jkczyz> Hello, joining late. If you could change something fundamental about Bitcoin Core to improve the PR review process, what would it be? :)
121 13:42 <michaelfolkson> Another question. I opened a PR to do something that you John had already opened a PR for. Bar going through all the PRs how would I have known that? Eventually DrahtBot told me. Is it fine to use DrahtBot for this purpose? I'm assuming you'd prefer someone checked all the open PRs first...
122 13:43 <lightlike> jonatack: varies a lot, but usually more than an hour. For example I used a long train ride to understand amiti's WIP PR.
123 13:43 <michaelfolkson> I'm monitoring a lot of GitHub activity via email notifications but it is a lot. It is swamping my inbox
124 13:44 <jnewbery> michaelfolkson: I think just search for function names and any key words in the open PRs first. I don't think there's any way to run drahtbot offline (but I do think that would be useful)
125 13:44 <jonatack> lightlike: yes. for the more involved PRs I would need more as well. I would like to dedicate more time to the deeper and high priority ones, but it takes waaay more time.
126 13:44 <jnewbery> jkczyz: I'd add more great reviewers :)
127 13:45 <jnewbery> michaelfolkson: I gave up using github email notifications long ago
128 13:45 <jkczyz> jnewbery: haha, Fair enough. But anything about the code itself?
129 13:46 <michaelfolkson> Carl said on an interview he used IFTTT for specific GitHub alerts. Might be worth playing around with
130 13:46 <jnewbery> jkczyz: better modularization so people aren't stepping on each other's toes as much. This has already improved a lot in the last few months/years
131 13:46 <jkczyz> michaelfolkson: Filter your mail by email@example.com may help a bit
132 13:47 <jnewbery> I think splitting wallet from node and having a well-determined interface makes it way easier to review and think about wallet changes
133 13:47 <michaelfolkson> <jkczyz>: That is filtering by the author of the PR/issue/comment?
134 13:48 <michaelfolkson> <jkczyz>: So get notifications on the activity of specific people?
135 13:49 <jonatack> michaelfolkson: for reviewing PRs at least collisions aren't a worry :)
136 13:50 <jkczyz> michaelfolkson: Actually, nevermind. My filter is actually anything to me with list:bitcoin.bitcoin.github.com
137 13:51 <michaelfolkson> <jonatack>: Haha yeah that's right. I'm finding doing some very basic PRs is useful too though. Can't really review well if you haven't done a couple smoothly yourself
138 13:51 <michaelfolkson> But yeah definitely agree once one is comfortable the 10:1 ratio or whatever it was needs to be upheld
139 13:52 <jnewbery> I still think that reviewing is the most effective way you can contribute as a new contributor (and also will teach you much more about the code than opening PRs)
140 13:53 <michaelfolkson> Agreed
141 13:53 <jonatack> Yes -- much more value-add
142 13:54 <jnewbery> last 5 minutes. Any final questions?
143 13:54 <jonatack> and reviewing is also a great way to see things that can be done if you are looking for that
144 13:54 <jnewbery> jonatack: definitely
145 13:54 <michaelfolkson> If no one has any other questions shall I ask if you need any help with this club <jnewbery>?
146 13:54 <jonatack> e.g. in rpc_psbt.py... there is no logging, nada
147 13:54 <michaelfolkson> How should the PRs be chosen each week? Continue with John choosing them?
148 13:55 <jnewbery> Thanks Michael. I'm always looking for people to suggest PRs or host meetings
150 13:57 <jnewbery> Best PRs are small(ish) and don't require too much contextual knowledge
151 13:57 <jnewbery> (so it doesn't take you hours and hours to review)
152 13:58 <michaelfolkson> The larger complicated ones are great for learning but for me at least won't get me in a position to give adequate review
153 13:58 <jnewbery> ok, if there are no more questions, let's wrap it up there.
154 13:58 <jnewbery> I'll post next week's PR later today
155 13:58 <jnewbery> Thanks everyone!
156 13:59 <jonatack> Thanks, John and all!
157 13:59 <michaelfolkson> Cool, thanks
158 13:59 <lightlike> thanks!
160 14:03 <achow101> re bip69: according to p2sh.info, somewhere between 40% and 60% of txs use bip69. also core doesn't use bip69 at all
161 14:08 <jonatack> achow101: thanks
162 14:09 <milochen0418> thanks