Only allow getdata of recently announced invs (p2p)

https://github.com/bitcoin/bitcoin/pull/19109

Host: amitiuttarwar  -  PR author: sipa

The PR branch HEAD was 2da7ee37b at the time of this review club meeting.

Notes

  • If a spy is able to identify which node initially broadcast a transaction, there’s a high probability that that node is the source wallet for the transaction.

  • To avoid that privacy leak, we try to be intentional about how we relay and request transactions. We don’t want to reveal the exact contents of our mempool or the precise timing when we received a transaction.

  • PR 18861 improved transaction-origin privacy. The idea is that if we haven’t yet announced a transaction to a peer, we shouldn’t fulfill any GETDATA requests for that transaction from that peer. The implementation for that PR checks the list of transactions we are about to announce to the peer (setInventoryTxToSend), and if it finds the transaction that the peer has requested, then responds with a NOTFOUND instead of with the transaction. While this helps in many cases, it is an imperfect heuristic. Think about why that is (see question 3 below).

  • This week’s PR further reduces the possible attack surface. It introduces a per-peer rolling bloom filter (m_recently_announced_invs) to track which transactions were recently announced to the peer. When the peer requests a transaction, we check the filter before fulfilling the request and relaying the transaction.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? You’re always encouraged to put your PR review on GitHub, even after it has been merged.

  2. What are possible techniques for an adversary to probe a node for a particular mempool transaction? What changes with this PR?

  3. What were drawbacks of using setInventoryTxToSend as a heuristic?

  4. What is the UNCONDITIONAL_RELAY_DELAY? How is this delay constant being used when deciding whether or not to fulfill a request?

  5. What are some relevant considerations when reviewing/implementing a bloom filter? What choices does this patch make?

  6. After these changes, can you think of other ways a spy could probe for a mempool transaction?

Meeting Log

  117:00 <amiti> #startmeeting
  217:00 <amiti> hi
  317:00 <nehan> hi
  417:00 <benthecarman> hi
  517:00 <troygiorshev> hi
  617:00 <willcl_ark> hi
  717:00 <andrewtoth> hi
  817:00 <michaelfolkson> hi
  917:00 <jnewbery> hi
 1017:00 <amiti> hi everyone! welcome to this weeks review club. today, we'll chat about transaction privacy on the network
 1117:01 <overall> hi
 1217:01 <amiti> notes and questions in the normal place: https://bitcoincore.reviews/19109.html
 1317:01 <amiti> who has gotten a chance to look at the proposed changes?
 1417:01 <amiti> (y/n)
 1517:01 <jnewbery> y
 1617:02 <michaelfolkson> y
 1717:02 <troygiorshev> y
 1817:02 <nehan> y
 1917:02 <willcl_ark> n (sorry!)
 2017:02 <benthecarman> y
 2117:03 <amiti> cool! let's get started with the questions
 2217:03 <amiti> What are possible techniques for an adversary to probe a node for a particular mempool transaction? What changes with this PR?
 2317:04 <andrewtoth> is sending a GETDATA the only real way?
 2417:04 <michaelfolkson> So this is Gleb's comment. Attacker connects to us right after we put the transaction in MapRelay
 2517:05 <amiti> andrewtoth: the only real way to solicit a TX response?
 2617:05 <andrewtoth> to try and probe a node for a particular mempool transaction?
 2717:05 <felixweis_k> you could give it a transaction that spends a certain input
 2817:06 <andrewtoth> ahh felixweis_k interesting
 2917:06 <felixweis_k> if it requests the input tx, it didn't have it
 3017:06 <amiti> michaelfolkson: yes, could you break that down further? we could also go into this more in the next question
 3117:07 <amiti> felixweis_k: yeah! using children to observe request patterns can help a spy deduce tx presence
 3217:07 <michaelfolkson> So my (limited) understanding is that the previous PR fixed a lot of the premature announcement logic. And so now we are working out what still needs to be fixed
 3317:08 <michaelfolkson> #18861 (previous PR, now merged)
 3417:08 <amiti> alright, lets jump into the next question and discuss the previous PR (18861) and how this PR (19109) addresses a leak
 3517:08 <amiti> from the notes: PR 18861 improved transaction-origin privacy. The idea is that if we haven’t yet announced a transaction to a peer, we shouldn’t fulfill any GETDATA requests for that transaction from that peer. The implementation for that PR checks the list of transactions we are about to announce to the peer (setInventoryTxToSend), and if it
 3617:08 <amiti> finds the transaction that the peer has requested, then responds with a NOTFOUND instead of with the transaction. While this helps in many cases, it is an imperfect heuristic.
 3717:08 <amiti> What were drawbacks of using setInventoryTxToSend as a heuristic?
 3817:08 <michaelfolkson> Sorry I think I misunderstood the first question :)
 3917:09 <overall> michaelfolkson premature or non-privacy preserving announcement logic?
 4017:09 <amiti> michaelfolkson: nah, the two questions overlap.
 4117:10 <jnewbery> sipa's original PR description answers this question...
 4217:11 <michaelfolkson> overall: Premature is non-privacy preserving. I don't understand distinction...
 4317:12 <jnewbery> so setInventoryTxToSend is a set of transactions that we haven't yet announced to that peer
 4417:12 <troygiorshev> The issue in sipa's original PR description is a "false negative" sort of problem, right? i.e. we'll tell a node that we don't have a transaction even when they know that we actually do?
 4517:12 <amiti> troygiorshev: that's one of the issues, pointed out in the convo with luke-jr & sipa
 4617:12 <amiti> here: https://github.com/bitcoin/bitcoin/pull/18861#discussion_r419695831
 4717:12 <overall> michaelfolkson just checking if there was another metric, besides privacy, that wasn't needed to be managed
 4817:13 <amiti> but from my understanding, it seems pretty unlikely
 4917:13 <jnewbery> we'll only add txids to a peer's setInventoryTxToSend if we're connected to that peer when we relay the transaction
 5017:13 <sipa> overall: well, memory usage - we could also just keep a list of all things we've ever announced to a peer, and only allow them to query exactly that... but that would easily run into memory problems
 5117:14 <jnewbery> so using the old heuristic, if a node connects to us and immediately asks for a transaction, it won't be in the setInventoryTxToSend, and we'll happily send the transaction
 5217:14 <michaelfolkson> overall: Yeah I think it is just privacy. Apart from privacy, announcing prematurely doesn't matter. Unless you literally don't want to be announcing it at all (not a concern)
 5317:15 <amiti> jnewbery: exactly. does this make sense to people?
 5417:15 <andrewtoth> jnewbery: makes sense thanks!
 5517:15 <sipa> michaelfolkson: it's not announcing; it is allowing requests
 5617:15 <troygiorshev> jnewbery amiti: it does now :)
 5717:15 <sipa> michaelfolkson: this is about not permitting requests of things that weren't announced
 5817:16 <sipa> announcement itself has a bunch of privacy protections already, since a few years ago, and they're pretty good
 5917:16 <sipa> this is making sure that it can't be bypassed by tx requests
 6017:17 <andrewtoth> still a little unclear on the second point though. how does using setInventoryTxToSend prevent some locally resubmitted invs from being requested?
 6117:19 <amiti> andrewtoth: the idea is if we are using setInventoryTxToSend to prevent fulfilling a txn request, if we put a txn on there for a _second_ time, then it would be valid for the peer to request (since we prev announced), but we would deny them the txn
 6217:19 <michaelfolkson> sipa: Not permitting the providing of what they request right? They are free to request whatever they want. Just don't want to give it to them
 6317:19 <andrewtoth> amiti: thanks!
 6417:19 <sipa> michaelfolkson: exactly
 6517:20 <sipa> andrewtoth: it's a pretty contrived scenario, see https://github.com/bitcoin/bitcoin/pull/18861#discussion_r419707909
 6617:20 <felixweis_k> whats a mental model to aproximate what consitutes an acceptable amount of per peer memory?
 6717:21 <luke-jr> amiti: it seems like it could result in efforts to relay a tx, making it harder to relay
 6817:21 <amiti> sipa: my understanding of your comment there was that you were identifying the precise steps / timing that would have to happen to not serve the txn, is that correct?
 6917:21 <amiti> (aka not as simple as resubmit via sendrawtransaction)
 7017:22 <sipa> amiti: yeah, because it also requires the old submission to have been expired from filterAlreadyKnown
 7117:22 <amiti> luke-jr: I don't follow
 7217:22 <andrewtoth> sipa: thanks. both these cases seem pretty unlikely, right? but probably better to not have any edge cases
 7317:22 <sipa> andrewtoth: probability is the wrong metric when thinking about attacks
 7417:23 <sipa> andrewtoth: the question is how costly it is for an attacker to make it happen - they won't let it to chance
 7517:23 <jnewbery> felixweiss_k: that's a really good question! I don't know if there are any hard numbers
 7617:23 <luke-jr> amiti: unless I'm confusing my own comment (it's been a while), it seems sendrawtransaction (ie, the user explicitly wants to relay it) could end with it not being relayed even if it would have otherwise been?
 7717:23 <andrewtoth> interesting
 7817:23 <amiti> sipa: 👍🏽, and you were calculating the most feasible / quick way to expire (using mempool requests)
 7917:24 <amiti> luke-jr: so, you're talking about your comment here- https://github.com/bitcoin/bitcoin/pull/18861#discussion_r419695831?
 8017:24 <sipa> andrewtoth: an analogy i like: a civilian engineer designs building resistant to natural causes (earthquakes, ...), and needs probability. a military engineer designs structures that are designed to protect against attacks; they don't go compute the probability that a missile will randomly hit it; they care about how hard it is for an attack that make that happen
 8117:25 <amiti> ok, so these are some of the identified drawbacks of using `setInventoryTxToSend` as a heuristic. what does PR 19109 implement as an alternative solution?
 8217:25 <andrewtoth> sipa: :)
 8317:25 <michaelfolkson> But probability still informs what should be prioritized to fix!
 8417:25 <sipa> michaelfolkson: how so?
 8517:26 <sipa> we care about probabilities of things _randomly_ going wrong absent attack scenarios of course
 8617:26 <michaelfolkson> Well if there are two possible attacks and one attack is more likely than the other, fix that one first
 8717:26 <sipa> michaelfolkson: no, you fix the cheapest to perform first
 8817:26 <sipa> attacks are not random
 8917:27 <michaelfolkson> I guess I am seeing the cost of the attack as informing how likely that attack is to happen
 9017:27 <amiti> felixweis_k: good question about acceptable amount of memory / peer, something I also wonder
 9117:27 <luke-jr> andrewtoth: if an attack is possible, it increases the possibility
 9217:28 <sipa> at a very level view, you could see it that way
 9317:28 <sipa> but it's misleading imo
 9417:29 <andrewtoth> makes sense
 9517:29 <troygiorshev> sipa: we would want to prioritize on the "product" of the "ease" of the attack and the "damage" of the attack, no?
 9617:29 <jnewbery> felixweiss_k: without having specific numbers, I think a good rule is that keeping sets of items per-peer isn't scalable, but keeping a probabilistic filter is
 9717:30 <sipa> i think we'd like to have some sort of upper bound of memory usage per peer
 9817:30 <jnewbery> and in many cases, a probabilistic filter is good enough for the specific function
 9917:30 <sipa> or, alternatively, actual acconting of memory, and a way to deprioritize/disconnect peer if we're using to much memory on behalf of a peer, and it gets tight
10017:31 <jnewbery> sipa: I think deprioritizing based on memory usage would be difficult. How do you shed the memory? Deprioriritizing based on CPU/bandwidth usage seems much easier
10117:32 <sipa> jnewbery: agree
10217:32 <amiti> lets jump into this question: What are some relevant considerations when reviewing/implementing a bloom filter? What choices does this patch make?
10317:32 <amiti> since we are talking about one of the considerations- memory usage
10417:32 <jnewbery> (not saying they're exclusive - just that the latter is much more straightforward)
10517:34 <a__jonas> amiti: Generally things you'd want to consider are the are number of items to keep track of and the acceptable false-positive rate
10617:34 <amiti> a__jonas: yes!
10717:35 <amiti> did you catch what values were chosen in this PR ?
10817:35 <jnewbery> how are those parameters chosen?
10917:35 <a__jonas> (3500 and 0.000001)
11017:35 <troygiorshev> amiti: False positive rate is 0.000001
11117:36 <amiti> a__jonas, troygiorshev: yup!
11217:36 <andrewtoth> so number of items increases memory, false positive rate increases CPU time?
11317:36 <amiti> let's dig into jnewbery's question- how were these numbers chosen? so as reviewers, we can evaluate that it makes sense
11417:37 <sipa> andrewtoth: better false positive rate also increases memory usage
11517:37 <sipa> a rolling bloom filter has around -log2(fprate)*3/log(2) bits usage per tracked item
11617:39 <jnewbery> 'rolling bloom filter' is perhaps a bit of a misleading name because you might expect that you can remove items
11717:39 <amiti> if anyone is interested, I found this site useful to wrap my head around how bloom filters work: https://llimllib.github.io/bloomfilter-tutorial/
11817:39 <jnewbery> in fact, it's implemented as 3 separate bloom filters, each with capacity 1/2 of the desired capacity
11917:39 <andrewtoth> amiti: cool site!
12017:40 <amiti> (props to aj for sharing with me)
12117:40 <felixweis_k> thanks, amiti!
12217:40 <sipa> jnewbery: in general you can't delete things from bloom filters
12317:40 <sipa> there are generalizations that can, but they need more memory
12417:40 <jnewbery> and we roll through the different filters (ie put items into the first filter until it's at capacity, then the second, then the third, then delete the first filter and start filling it again)
12517:40 <felixweis_k> yeah rolling bloom filters are cool i've been thinking recently about something like that. didn't know we had them in core
12617:40 <sipa> jnewbery: exactly
12717:41 <jnewbery> so that's where the *3 in sipa's memory usage comes from
12817:41 <jnewbery> because we actually have 3 bloom filters
12917:41 <jnewbery> (I think)
13017:41 <sipa> correct
13117:41 <sipa> though it's not the entire picture
13217:42 <amiti> jnewbery: oh cool, I was wondering how it worked but didn't get the chance to dig in yet. thanks!
13317:42 <sipa> we also get a factor 2 gain from having 2 active generations (each of the 3 generations is only half the window size)
13417:42 <felixweis_k> when you add, you add to the latest 2, when you check you check for OR in any of the 3? then a counter as to when advance the "ring buffer"?
13517:42 <sipa> but also a factor 2 again because we need 2 bits per entry (which can store a number from 0 to 3... 0 for unused; 1-3 for each of the 3 generations)
13617:42 <felixweis_k> sorry im just going to read the code
13717:43 <jnewbery> felixweiss_k: you only add to the latest generation I believe. And yes, there's a counter to determine which generation we should be adding to
13817:43 <michaelfolkson> What is AJ doing re running the patch to find premature requests?
13917:44 <michaelfolkson> This is to explore when premature requests still happen when someone is running Core and is not malicious?
14017:44 <jnewbery> felixweis_k: oops, I've been inserting and extra s in your name!
14117:45 <sipa> michaelfolkson: indeed
14217:45 <sipa> to see if our logic isn't too strict in what it permits
14317:46 <amiti> 15 minutes left
14417:46 <michaelfolkson> You want to allow a certain number of premature requests?! Don't get it
14517:46 <sipa> michaelfolkson: no?
14617:46 <michaelfolkson> Surely better if there are zero premature requests unless doing testing
14717:46 <sipa> michaelfolkson: yes
14817:47 <sipa> i'm confused by your conclusion :)
14917:47 <felixweis_k> no worries, i've written new berry probably too in the past ;-)
15017:47 <michaelfolkson> The "logic isn't too strict" comment. Surely we want extreme strictness in this case (unless doing testing)
15117:47 <sipa> michaelfolkson: but it's an approximation
15217:48 <troygiorshev> michaelfolkson: we're seeing false positives, right?
15317:48 <sipa> we don't know exactly what all legitimate (and safe) behaviors are
15417:48 <sipa> so we want to know if this policy we have implemented isn't going to accidentally affect legitimate use
15517:49 <sipa> because we might have forgotten some protocol interaction where someone ends up legitimately requesting something that the filter approach disallows
15617:49 <sipa> aj's testing is to figure this by running it in the wild, and seeing if there are legitimately-looking requests that get blocked by it
15717:49 <jnewbery> I have a question on my new favorite subject in Bitcoin Core (locking), inspired by neha's question in the previous PR: https://github.com/bitcoin/bitcoin/pull/18861#discussion_r451666992
15817:50 <michaelfolkson> Ok makes sense, thanks. So some premature requests are legitimate. e.g. the example AJ found.
15917:50 <sipa> michaelfolkson: well, they're not premature - we're just accidentally marking them as such
16017:50 <jnewbery> is this new data structure threadsafe? If so, what synchronization method is used to guard it? Do you think it's in the right place?
16117:51 <a__jonas> to answer your question amiti, I think these numbers could be adjusted (as AJ maths and Sipa's response suggests)
16217:52 <overall> jnewbery nice question :)
16317:52 <a__jonas> It looks like caching was a consideration (as noted in the comment on the assert on line 134)
16417:52 <andrewtoth> jnewbery: looks like it's protected by cs_main
16517:52 <jnewbery> overall: credit goes to neha for asking in the previous PR :)
16617:53 <jnewbery> andrewtoth: yes, can you explain why/
16717:53 <overall> nehan +1
16817:53 <sipa> a__jonas: i don't think i've commented at all on those numbers or how they could be changed
16917:54 <andrewtoth> i need to look closer at the code here :/
17017:55 <amiti> a__jonas: are you talking about the convo early on in the PR?
17117:55 <troygiorshev> sipa: a__jonas: this? https://github.com/bitcoin/bitcoin/pull/19109#issuecomment-636712875
17217:55 <nehan> jnewbery: +1. Also, I don't see how cs_vRecv is taken out properly in the previous PR, but maybe that could be answered more quickly here.
17317:55 <amiti> ah, I think that convo lead to introducing `UNCONDITIONAL_RELAY_DELAY`
17417:55 <sipa> or maybe i'm just wrong! and clang didn't detect it
17517:56 <amiti> lets chat about it in the last few minutes here... How is this delay constant being used when deciding whether or not to fulfill a request?
17617:56 <jnewbery> andrewtoth: the answer is that it m_recently_announced_invs is added to CNode, which is guarded by cs_main.
17717:56 <sipa> nehan: i believe i'm wrong!
17817:56 <nehan> sipa: where does net lock cs_vRecv?
17917:56 <nehan> sipa: yeah ok good i'm not going crazy :)
18017:56 <a__jonas> troygiorshev, yes - that and AJ's previous comment
18117:56 <jnewbery> sipa: wrong about what?
18217:56 <sipa> jnewbery: about claiming that net grabs cs_vRecv
18317:57 <nehan> jnewbery: vRecvGetData isn't protected by cs_vRecv. But as you both pointed out it's only accessed in one thread. That seems worth documenting to me though.
18417:57 <overall> 3 minutes remaining
18517:58 <sipa> same with orphan_work_set
18617:58 <jnewbery> nehan: definitely document it at the very least
18717:58 <sipa> agree
18817:58 <nehan> ok
18917:58 <sipa> or fix it
19017:59 <jnewbery> vRecvGetData and orphan_work_set are similar - they're both work queues for an individual peer. As such they're only pushed to/popped from in ProcessMessages
19117:59 <nehan> sipa: ok!
19217:59 <amiti> a__jonas, troygiorshev: that conversation calculated the number of transactions in 15 minutes, because after 15 minutes is the time set by mapRelay. after that review, sipa introduced `UNCONDITIONAL_RELAY_DELAY` as 2 mins. so now, if a peer requests a txn thats >2 minutes old, we will serve
19317:59 <jnewbery> but agree it should be fixed!
19417:59 <sipa> net_processing.cpp:1653:43: warning: reading variable 'vRecvGetData' requires holding mutex 'pfrom.cs_vRecv' [-Wthread-safety-analysis] std::deque<CInv>::iterator it = pfrom.vRecvGetData.begin();
19518:00 <sipa> i recompiled and just missed the warning, silly me
19618:00 <amiti> but if the txn is < 2 minutes old, it must be in the `m_recently_announced_invs` filter
19718:00 <amiti> okay! it looks like thats time
19818:00 <sipa> and the filter is sized such that it can store as many transactions as we can possibly announce in 2 minutes
19918:01 <amiti> thanks for coming all :)
20018:01 <troygiorshev> thanks amiti!
20118:01 <felixweis_k> thanks all, and amiti for hosting! very informative as always.
20218:01 <sipa> thanks!
20318:01 <andrewtoth> thanks amiti! and everyone else for the discussion
20418:01 <overall> thanks amiti!
20518:01 <nehan> thanks amiti!
20618:01 <jnewbery> thanks amiti. Great discussion!
20718:02 <a__jonas> Thanks amiti
20818:02 <amiti> #endmeeting
20918:02 <michaelfolkson> Thanks amiti! New IRC nick?
21018:03 <sipa> jnewbery: so, you're right - the memory usage of the rolling bloom filter (which i'll completely unambiguously abbreviate to RBF) is indeed exactly equal to having 3 completely separate bloom filters
21118:03 <sipa> and even operationally it's very similar to that
21218:03 <amiti> hah, my normal irc client is down (I use irccloud) , hopefully it will be back soon
21318:03 <felixweis_k> using irccloud too usually. lol
21418:04 <troygiorshev> sipa: unambiguously :D
21518:04 * sipa suggests ssh + vps + screen + irssi :)
21618:04 <felixweis_k> maybe someone should create a decentralized IRC client
21718:04 <overall> lol
21818:04 <felixweis_k> (im kidding)
21918:04 <sipa> jnewbery: the difference is that you'd need to access 3x as many memory locations
22018:04 <overall> on a time chain
22118:05 <jnewbery> sipa: so a 3x constant factor in checking for membership
22218:05 <overall> sipa and connect irssi to freenet over tor? the sasl registration over clearnet bugs me :(
22318:06 <sipa> overall: indeed
22418:06 <a__jonas> thanks sipa
22518:07 <overall> I understand that abuse / rate limiting demands that require sasl but still wish there was another way to handle it (surety bond)
22618:07 <sipa> jnewbery: with a rolling cuckoo filter you only need 28ish bits per elememt (at 20 bits fprate), instead of 86ish
22718:08 <jnewbery> if anyone wants to dig deeper on probabilistic filters, sipa left this rather tantalizing comment about _cuckoo_ filters on a different PR: https://github.com/bitcoin/bitcoin/pull/19219#issuecomment-652685715
22818:08 <jnewbery> paper here: https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf
22918:09 <troygiorshev> jnewbery: thx
23018:09 <sipa> happy to do a review session about it, if/when it's done :)
23118:09 <jnewbery> sipa: that'd be great!
23218:09 <amiti> sipa: +1
23318:15 <jnewbery> sipa: I haven't looked at the implementation recently, but do the 3 bloom filters use the same hash functions? If so, my comment ("3x constant factor in checking for membership") isn't correct because you'd only do the hashing once
23418:15 <sipa> jnewbery: they dk
23518:15 <sipa> they do
23618:16 <jnewbery> ok, so checking 3x memory locations but that doesn't equate to 3x lookup times
23718:16 <sipa> right; number of hashes would be identical across them
23818:17 <sipa> in practice i expect the filter data to not be hot in the cache, so memory accesses dominate
23918:17 <sipa> (which is annoyingly hard to measure in microbenchmarks)