Prevent block index fingerprinting by sending additional getheaders messages (p2p)
Apr 6, 2022
The PR branch HEAD was 4e415067 at the time of this review club meeting.
Attackers may use fingerprinting techniques
to recognize the same node across different connections. This makes it
possible to test if two addresses belong to the same node, which we generally
try to avoid, especially for addresses belonging to privacy-centric networks
such as Tor. Some fingerprinting attacks work across restarts of the victim’s
node, making it possible to detect if a node changes addresses.
A variety of fingerprinting techniques have been patched or mitigated.
For example, attackers could have used requests for old non-main-chain
depth to fingerprint nodes.
The fingerprintable behaviour that
#24571 addresses occurs when
a node receives headers from a peer (handled in
If the received headers don’t connect to any header in the node’s block
index, then it will request additional headers via a
getheaders message in
an attempt to connect the chain.
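
The behaviour described above can be illustrated with a toy model. The following Python sketch is not Bitcoin Core code: `Header`, `ToyNode`, `handle_header`, and `probe` are hypothetical names, headers are reduced to a hash and a parent hash, and validation and proof of work are ignored entirely. It only shows how the presence or absence of a getheaders reply can reveal whether a given header is in a node's block index.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Header:
    """Toy header: only its own hash and its parent's hash (hypothetical)."""
    hash: str
    prev_hash: str


@dataclass
class ToyNode:
    """Hypothetical node model whose block index is a set of header hashes."""
    block_index: set = field(default_factory=set)

    def handle_header(self, header: Header) -> str:
        # Parent unknown: the header does not connect, so the node asks
        # the peer for more headers in an attempt to connect the chain.
        if header.prev_hash not in self.block_index:
            return "getheaders"
        # Parent known: no getheaders is sent. The header is not stored
        # here, since a real node would validate it first.
        return "no-getheaders"


def probe(node: ToyNode, candidate_stale_hashes: list) -> tuple:
    """Attacker side: send one bogus child per candidate stale header and
    record which candidates the node appears to have in its index."""
    known = []
    for stale_hash in candidate_stale_hashes:
        bogus_child = Header(hash="bogus-" + stale_hash, prev_hash=stale_hash)
        if node.handle_header(bogus_child) != "getheaders":
            known.append(stale_hash)
    return tuple(known)


if __name__ == "__main__":
    candidates = ["stale-a", "stale-b", "stale-c"]
    node = ToyNode({"genesis", "stale-a", "stale-c"})
    other = ToyNode({"genesis", "stale-b"})
    # The same node reached via two addresses yields the same fingerprint.
    print(probe(node, candidates) == probe(node, candidates))
    print(probe(node, candidates) == probe(other, candidates))
```

In this model the fingerprint is simply the subset of candidate stale headers that the node knows. The more candidates an attacker can try, the more likely it is that two different nodes produce different fingerprints.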
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or NACK?
What is the block index and what is it used for? (Hint: look at the usage of
Why and how can the block index be used for fingerprinting? (Hint: it has to
do with stale blocks/headers)
Why do we keep stale blocks in the block index?
In your own words, how does the fingerprinting technique outlined in the PR work?
Does the fingerprinting technique outlined in the PR work across restarts of
the target node?
introduces a new parameter to
PeerManagerImpl::BlockRequestAllowed. Why is
1 17:00 <dergoegge> #startmeeting
2 17:00 <dergoegge> Hi everyone, welcome to this week's PR review club!
7 17:00 <dergoegge> Feel free to say hi to let people know you are here
8 17:00 <dergoegge> Anyone here for the first time?
10 17:02 <dergoegge> This week we are looking at #24571 “Prevent block index fingerprinting by sending additional getheaders messages”
12 17:02 <dergoegge> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
13 17:03 <lightlike> yes, concept ACK
14 17:03 <dunxen> Yes, light review. Approach ACK
16 17:03 <b10c> I started reviewing. Concept and Approach ACK
17 17:04 <dergoegge> Ok cool! first question: What is the block index and what is it used for?
18 17:04 <ccdle12> approach ACK
19 17:04 <b10c> Haven't tested anything yet, but have a few ideas to test against my (still private) signet with reorgs. Have a lot of stale blocks there
20 17:04 <dergoegge> b10c: cool idea!
21 17:06 <lightlike> An in-memory index of the blockchain which contains the headers plus info where to find the rest of the block data on disk.
22 17:07 <b10c> and it's kinda a blocktree
23 17:07 <b10c> multiple branches (chains)
24 17:07 <b10c> allows us to determine the longest branch/chain
25 17:07 <b10c> and switching between branches if one becomes _longer_ (more work)
26 17:08 <dergoegge> lightlike b10c: correct!
27 17:09 <dergoegge> the fact that it is a tree is very important for this PR
28 17:09 <dergoegge> which brings us to the next question: Why and how can the block index be used for fingerprinting?
29 17:10 <lightlike> when do we accept multiple branches normally? only when we are witnessing a reorg as it happens? or also after the fact?
30 17:11 <dergoegge> lightlike: afaict we also accept any headers into the index that come after the last checkpoint and have enough work
31 17:12 <b10c> I might have a branch in that tree that not many others have. If you can find out that I have this branch, this leaks information. Not sure if I can pinpoint you with that information, but can get harmful when combined with more information
32 17:13 <b10c> dergoegge: that's my understanding too
33 17:13 <b10c> we don't download all blocks though
34 17:13 <lightlike> dergoegge: I mean, if I do an IBD now, will I accept any historical non-best-chain headers in my blockchain index? Or would I only get those if I am online when there are conflicting blocks for my tip, and don't know which will stay in the main chain?
35 17:13 <dergoegge> b10c: exactly you might have seen a header/block that for example after a reorg is no longer part of the main chain and has thus become stale
36 17:14 <b10c> lightlike: see the RPC docs for getchaintips too
37 17:14 <dergoegge> the exact number of stale blocks any specific node has seen will be unique to that node depending on where in the network the node sits
38 17:15 <b10c> dergoegge: right, from my experience older nodes have a lot of entries when calling the getchaintips RPC
39 17:15 <dergoegge> lightlike: during IBD you only request and download the blocks of the headers you got during initial header sync
40 17:16 <b10c> so you could probably find out how long that node has been running for (with ~months of accuracy)
41 17:16 <larryruane> If my node knows about a block that it thinks is stale, doesn't it forward to all its peers? If so, don't all nodes end up knowing about all the same stale blocks?
42 17:18 <b10c> it relays headers IIRC, but you don't request stale blocks
43 17:18 <sipa> only blocks we believe are part of the best chain are relayed
44 17:19 <lightlike> so in order to accept a stale block header, we must have believed it was in our best chain at the time of acceptance (and then changed our opinion/reorged)?
45 17:20 <b10c> specify "accept". do you mean in order to add it to our block index?
47 17:21 <dergoegge> lightlike: if someone sends you a valid header with enough work on it then you will store it in the index
48 17:21 <dergoegge> it does not need to extend the tip
49 17:21 <b10c> lightlike: then no, we accept stale headers too
50 17:21 <dergoegge> this would happen during a large reorg for example
51 17:22 <lightlike> dergoegge: "enough work" = "more work than our current tip"?
52 17:22 <dergoegge> no: "enough work" = "more work than the block it is extending"
53 17:23 <lightlike> ok, thanks
54 17:24 <dergoegge> so we have established that a node's block index is unique based on the fact that it can contain stale blocks that other nodes do not have
55 17:25 <dergoegge> if a peer can probe for stale blocks in the node's index then it can use that information to fingerprint the node
56 17:25 <larryruane> just to be sure, a node never drops blocks no matter how stale it thinks it is?
57 17:26 <b10c> larryruane: I don't think it does
58 17:26 <dergoegge> larryruane: do you mean after it has already accepted it into the index?
59 17:26 <dergoegge> or when receiving a new header?
60 17:31 <larryruane> i meant after being accepted into the index .. thanks
61 17:31 <dergoegge> afaict we dont prune stale headers/blocks from the index.
62 17:31 <dergoegge> which is also what the next question is about
63 17:31 <dergoegge> Why do we keep stale blocks in the block index?
64 17:32 <larryruane> (i think this is why once you have a valid `pindex` variable (getting that requires `cs_main`), you can use it without any lock)
65 17:33 <dergoegge> i am actually not sure why we keep old stale headers/block around in the index
67 17:34 <lightlike> at least from the saved block data, not from the index though
68 17:35 <dergoegge> sipa: do you know why we keep old stale blocks/headers in the index?
69 17:35 <dergoegge> lightlike: yea i think we delete them from the disk but not from the index
70 17:35 <b10c> we keep (recent) stale blocks to be able to reorg to that chain if it becomes _longer_
71 17:36 <dergoegge> b10c: that makes sense, but do we need say a year old stale header?
72 17:37 <dergoegge> maybe pruning the block index from old stale headers could prevent this class of fingerprinting bug entirely
73 17:38 <b10c> yeah I'm not sure either, that's why I added the (recent) :)
74 17:39 <dergoegge> ok well we will leave this as an open question and move on...
75 17:39 <dergoegge> In your own words, how does the fingerprinting technique outlined in the PR work?
76 17:41 <b10c> we extend a stale branch with header H+1, send H+1 to a node and see if it requests header H. If it does, it doesn't know about the stale branch. If it doesn't, it knows about the stale branch
77 17:41 <lightlike> Have a list of existing stale blocks that our peer might have or not, create bogus headers building on them, and send them to a peer to check and record for which of the headers we get a GETHEADERS in return.
78 17:41 <b10c> The PR says H+1 doesn't need to have a valid PoW, so this is very cheap for us to do
79 17:42 <dergoegge> b10c lightlike: exactly right, i think if you use multiple headers like lightlike suggested then the accuracy of the attack increases
80 17:43 <larryruane> but is the whole idea of this attack to do this procedure to two different peer network addresses, to try to link them to the same machine?
81 17:43 <lightlike> i think the peer wouldn't request header H, but just send a locator with their current tip in response if it didn't know H
82 17:43 <dergoegge> b10c: using invalid PoW headers actually makes things easier for the attacker since the node will disconnect if it knows the stale block
83 17:44 <b10c> dergoegge: oh didn't know!
84 17:44 <dergoegge> larryruane: right, the attacker would know of two addresses say one IPv4 and one Tor and could then, using this technique, figure out if the addresses belong to the same node
85 17:45 <larryruane> thanks.. would it be much work to just check the PoW on the header to see if it's sufficient? (to make it harder on the attacker)
86 17:45 <dergoegge> lightlike: yea that sounds right
87 17:46 <dergoegge> larryruane: yes that would probably also work, but as you said would only make it harder not impossible
88 17:46 <lightlike> nodes would probably need to have a decent number of stale blocks in their index to make it possible to have a unique fingerprint.
89 17:46 <b10c> is this something you've actually tried and written code for, dergoegge?
90 17:46 <larryruane> theStack asks, is it common to run two different network connections from a single node?
91 17:48 <dergoegge> larryruane: i don't know. don't have any statistics on that
92 17:48 <larryruane> lightlike: currently my node knows of 5 stale blocks (using `getchaintips`)
93 17:48 <larryruane> but I'm not gonna tell you which ones :)
94 17:48 <b10c> larryruane: I think it is. e.g. IPv4 and IPv6. Obviously some are Tor only
96 17:49 <dergoegge> and used a different technique to check if all those tor addresses belong to the same node (which appears to not be the case)
97 17:50 <dergoegge> yea you can not deanonymize a Tor only node with this
98 17:50 <dergoegge> Maybe you can if they switch back to IPv4 but thats a stretch
99 17:52 <lightlike> if we accept any stale headers that extend the work of their predecessor to our index (as was discussed before) couldn't we just send our victim one of these headers, and then probe again, making the fingerprinting possible even if our victim doesn't have any stale headers at the beginning?
100 17:53 <dergoegge> yea if you have a good collection of past stale headers or are able to mine new ones then you might be able to mark nodes with specific headers you sent to them
101 17:53 <b10c> makes the attack a lot more expensive though
102 17:54 <dergoegge> yes if you have to mine new ones
103 17:54 <ccdle12> the pow would eventually have to be below the stale relay age limit
104 17:54 <lightlike> yes, but it must be easy to get a list of historical ones?
105 17:54 <dergoegge> i wonder if anyone has a collection of *all* blocks that were ever created stale or not
106 17:56 <b10c> dergoegge: I'd guess many of the new tor nodes are RPi's with RaspiBlitz or similar. Many of the home nodes are Tor-only
107 17:56 <dergoegge> ccdle12: afaik the relay age limit prevents a peer from downloading a block that is older than the limit
108 17:57 <lightlike> Since it's almost time: I'd be interested in the answer to the last question, why the "allow_potentially_invalid_headers" parameter is necessary.
109 17:57 <dergoegge> ccdle12: which was also a fingerprint bug at some point, i think i linked that PR in the notes
110 17:57 <dergoegge> b10c: can be most of them have NODE_BLOOM set
111 17:58 <b10c> lightlike: +1, wasn't clear to me during my initial review round too
112 17:58 <dergoegge> lightlike: lets get to that then
113 17:59 <dergoegge> i introduced that because one of the p2p tests was failing, let me grab a link real quick
115 18:01 <dergoegge> crap i cant find the line
116 18:01 <dergoegge> #endmeeting
117 18:01 <dergoegge> sorry i answer this afterwards
118 18:01 <dergoegge> thanks everyone for coming!
119 18:02 <lightlike> Thanks dergoegge!
120 18:02 <dergoegge> i should have prepared an answer for the last one :D
121 18:02 <ccdle12> thanks dergoegge!
122 18:02 <b10c> maybe add a bit more details to the commit introducing it too, I was looking there and didn't find it
123 18:02 <dergoegge> b10c: +1
124 18:02 <b10c> thanks dergoegge! this was super interesting
125 18:03 <larryruane> thanks this was great!!
126 18:04 <lightlike> dergoegge: maybe also add the reason for this to the PR description or code, wherever it fits better (not that the test failed, but the root cause why it's necessary). It wasn't clear to me when reviewing.
127 18:05 <lightlike> oh, b10c said the same :)
129 18:10 <dergoegge> This extends a recent non main chain branch with a header and expects the node to send a getdata for the block
130 18:10 <dergoegge> but with the new logic the node would ignore the header because it extended a stale branch
131 18:10 <dergoegge> specifically "pindex->IsValid(BLOCK_VALID_SCRIPTS)" in PeerManagerImpl::BlockRequestAllowed always returns false for headers, so we need an exception for that if we are deciding if we should leak the info about a header
132 18:10 <dergoegge> but i will add this as a comment
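
The explanation at the end of the log can be made concrete with a simplified model. The Python sketch below is not the actual `PeerManagerImpl::BlockRequestAllowed`: `IndexEntry`, `fully_validated`, `age_seconds`, and the value of the age limit are assumptions made for illustration. It shows why a check that plays the role of `pindex->IsValid(BLOCK_VALID_SCRIPTS)` rejects every header-only entry, and how a flag like `allow_potentially_invalid_headers` lets a caller skip that check when deciding whether to reveal knowledge of a header.

```python
from dataclasses import dataclass

# Assumed value for illustration only; not taken from the PR.
STALE_AGE_LIMIT_SECONDS = 30 * 24 * 60 * 60


@dataclass
class IndexEntry:
    """Hypothetical, simplified stand-in for a block index entry."""
    in_active_chain: bool
    fully_validated: bool  # stand-in for IsValid(BLOCK_VALID_SCRIPTS)
    age_seconds: int       # how far behind the best header this entry is


def block_request_allowed(entry: IndexEntry,
                          allow_potentially_invalid_headers: bool = False) -> bool:
    # Entries on the active chain are always fine to reveal.
    if entry.in_active_chain:
        return True
    # A header-only entry has never been fully validated, so without the
    # exception this check fails for every header on a side branch.
    if not entry.fully_validated and not allow_potentially_invalid_headers:
        return False
    # Only recent non-main-chain entries may be revealed.
    return entry.age_seconds < STALE_AGE_LIMIT_SECONDS


if __name__ == "__main__":
    recent_header_only = IndexEntry(False, False, 3600)
    print(block_request_allowed(recent_header_only))
    print(block_request_allowed(recent_header_only,
                                allow_potentially_invalid_headers=True))
```

Under these assumptions, a recent header on a non-main-chain branch is treated as unknown by default and is only acknowledged when the caller explicitly opts in. This matches the failing-test scenario described in the log.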