Prevent block index fingerprinting by sending additional getheaders messages (p2p)

Apr 6, 2022

https://github.com/bitcoin/bitcoin/pull/24571

Host: dergoegge - PR author: dergoegge

The PR branch HEAD was 4e415067 at the time of this review club meeting.

Notes

Attackers may use fingerprinting techniques to recognize the same node across different connections. This makes it possible to test whether two addresses belong to the same node, which we generally try to prevent, especially for addresses on privacy-centric networks such as Tor. Some fingerprinting attacks work across restarts of the victim’s node, making it possible to detect when a node changes addresses.
A variety of fingerprinting techniques have been patched or mitigated. For example, attackers could previously fingerprint nodes using requests for old non-main-chain headers/blocks (#5820, #8408, #11113), addr message timestamps, and prune depth.
The fingerprintable behaviour that PR #24571 addresses occurs when a node receives headers from a peer (handled in PeerManagerImpl::ProcessHeadersMessage). If the received headers don’t connect to any header in the node’s block index, the node requests additional headers via a getheaders message in an attempt to connect the chain. Whether or not that getheaders message is sent therefore reveals whether the parent of the received header is present in the node’s block index.
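
The behaviour described in the last note can be sketched with a toy model. This is not Bitcoin Core code: the Header class, the handle_headers function and the short string identifiers are hypothetical, and all validation (proof of work, checkpoints, locators) is left out. It only shows that the node's reply depends on what is in its block index.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    block_hash: str  # hypothetical short identifier, not a real block hash
    prev_hash: str   # identifier of the header this one builds on


def handle_headers(block_index: dict[str, Header], headers: list[Header]) -> str:
    """Return the toy node's reaction to a headers message."""
    if not headers:
        return "ignore"
    if headers[0].prev_hash not in block_index:
        # The headers do not connect to anything we know:
        # ask the peer for more headers to bridge the gap.
        return "getheaders"
    # The headers connect to a known entry (main chain or stale).
    # A real node would validate them first; this toy model skips that.
    for header in headers:
        block_index[header.block_hash] = header
    return "accept"


# Toy index: a main-chain block and one stale block, both built on genesis.
index = {
    "genesis": Header("genesis", ""),
    "main_1": Header("main_1", "genesis"),
    "stale_1": Header("stale_1", "genesis"),
}
reply_known = handle_headers(index, [Header("probe_a", "stale_1")])
reply_unknown = handle_headers(index, [Header("probe_b", "never_seen")])
```

In this sketch a peer that sees "getheaders" learns that the parent was not in the index, and a peer that sees anything else learns that it was.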

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
What is the block index and what is it used for? (Hint: look at the usage of m_block_index)
Why and how can the block index be used for fingerprinting? (Hint: it has to do with stale blocks/headers)
Why do we keep stale blocks in the block index?
In your own words, how does the fingerprinting technique outlined in the PR work?
Does the fingerprinting technique outlined in the PR work across restarts of the target node?
One commit in the PR introduces a new parameter (allow_potentially_invalid_headers) to PeerManagerImpl::BlockRequestAllowed. Why is that necessary?

Meeting Log

17:00

<dergoegge> #startmeeting

17:00

<dergoegge> Hi everyone, welcome to this week's PR review club!

17:00

<dunxen> hi!

17:00

<b10c> hi

17:00

<ccdle12> hi

17:00

<justin> hey

17:00

<dergoegge> Feel free to say hi to let people know you are here

17:00

<dergoegge> Anyone here for the first time?

17:01

<lightlike> hi

17:02

<dergoegge> This week we are looking at #24571 “Prevent block index fingerprinting by sending additional getheaders messages”

17:02

<dergoegge> Notes are in the usual place: https://bitcoincore.reviews/24571

17:02

<dergoegge> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

17:03

<lightlike> yes, concept ACK

17:03

<dunxen> Yes, light review. Approach ACK

17:03

<larryruane> hi

17:03

<b10c> I started reviewing. Concept and Approach ACK

17:04

<dergoegge> Ok cool! first question: What is the block index and what is it used for?

17:04

<ccdle12> approach ACK

17:04

<b10c> Haven't tested anything yet, but have a few ideas to test against my (still private) signet with reorgs. Have a lot of stale blocks there

17:04

<dergoegge> b10c: cool idea!

17:06

<lightlike> An in-memory index of the blockchain which contains the headers plus info where to find the rest of the block data on disk.

17:07

<b10c> and it's kinda a blocktree

17:07

<b10c> multiple branches (chains)

17:07

<b10c> allows us to determine the longest branch/chain

17:07

<b10c> and switching between branches if one becomes _longer_ (more work)

17:08

<dergoegge> lightlike b10c: correct!

17:09

<dergoegge> the fact that it is a tree is very important for this PR
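
The description of the block index as a tree with several branches can be illustrated with a small sketch. It is a toy model under stated assumptions: IndexEntry, chain_work, tips, best_tip and stale_tips are hypothetical names, and work is counted in made-up integer units rather than derived from a difficulty target.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    block_hash: str
    prev_hash: Optional[str]  # None only for the genesis entry
    work: int                 # work of this header alone, in toy units


def chain_work(index: dict[str, IndexEntry], block_hash: str) -> int:
    """Sum the work from the given entry back to genesis."""
    total = 0
    cursor: Optional[str] = block_hash
    while cursor is not None:
        entry = index[cursor]
        total += entry.work
        cursor = entry.prev_hash
    return total


def tips(index: dict[str, IndexEntry]) -> list[str]:
    """Entries that no other entry builds on, i.e. the ends of all branches."""
    parents = {entry.prev_hash for entry in index.values()}
    return sorted(h for h in index if h not in parents)


def best_tip(index: dict[str, IndexEntry]) -> str:
    """The tip of the branch with the most accumulated work."""
    return max(tips(index), key=lambda h: chain_work(index, h))


def stale_tips(index: dict[str, IndexEntry]) -> list[str]:
    best = best_tip(index)
    return [h for h in tips(index) if h != best]


def add(index: dict[str, IndexEntry], block_hash: str, prev_hash: Optional[str]) -> None:
    index[block_hash] = IndexEntry(block_hash, prev_hash, work=1)


index: dict[str, IndexEntry] = {}
add(index, "genesis", None)
add(index, "a1", "genesis")
add(index, "a2", "a1")
add(index, "b1", "genesis")   # competing branch with less work
best_before = best_tip(index)
stale_before = stale_tips(index)

add(index, "b2", "b1")
add(index, "b3", "b2")        # branch b now has more work than branch a
best_after = best_tip(index)
stale_after = stale_tips(index)
```

The stale tips in this sketch play the role of the non-active entries that the getchaintips RPC reports for a real node.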

17:09

<dergoegge> which brings us to the next question: Why and how can the block index be used for fingerprinting?

17:10

<lightlike> when do we accept multiple branches normally? only when we are witnessing a reorg as it happens? or also after the fact?

17:11

<dergoegge> lightlike: afaict we also accept any headers into the index that come after the last checkpoint and have enough work

17:12

<b10c> I might have a branch in that tree that not many others have. If you can find out that I have this branch, this leaks information. Not sure if I can pinpoint you with that information, but can get harmful when combined with more information

17:13

<b10c> dergoegge: that's my understanding too

17:13

<b10c> we don't download all blocks though

17:13

<lightlike> dergoegge: I mean, if I do an IBD now, will I accept any historical non-best-chain headers in my blockchain index? Or would I only get those if I am online when there are conflicting blocks for my tip, and don't know which will stay in the main chain?

17:13

<dergoegge> b10c: exactly you might have seen a header/block that for example after a reorg is no longer part of the main chain and has thus become stale

17:14

<b10c> lightlike: see the RPC docs for getchaintips too

17:14

<dergoegge> the exact number of stale blocks any specific node has seen will be unique to that node depending on where in the network the node sits

17:15

<b10c> dergoegge: right, from my experience older nodes have a lot of entries when calling the getchaintips RPC

17:15

<dergoegge> lightlike: during IBD you only request and download the blocks of the headers you got during initial header sync

17:16

<b10c> so you could probably find out how long that node has been running for (with ~months of accuracy)

17:16

<larryruane> If my node knows about a block that it thinks is stale, doesn't it forward to all its peers? If so, don't all nodes end up knowing about all the same stale blocks?

17:18

<b10c> it relays headers IIRC, but you don't request stale blocks

17:18

<sipa> only blocks we believe are part of the best chain are relayed

17:19

<lightlike> so in order to accept a stale block header, we must have believed it was in our best chain at the time of acceptance (and then changed our opinion/reorged)?

17:20

<b10c> specify "accept". do you mean in order to add it to our block index?

17:20

<lightlike> yes

17:21

<dergoegge> lightlike: if someone sends you a valid header with enough work on it then you will store it in the index

17:21

<dergoegge> it does not need to extend the tip

17:21

<b10c> lightlike: then no, we accept stale headers too

17:21

<dergoegge> this would happen during a large reorg for example

17:22

<lightlike> dergoegge: "enough work" = "more work than our current tip"?

17:22

<dergoegge> no: "enough work" = "more work than the block it is extending"

17:23

<lightlike> ok, thanks

17:24

<dergoegge> so we have established that a node's block index is unique based on the fact that it can contain stale blocks that other nodes do not have

17:25

<dergoegge> if a peer can probe for stale blocks in the node's index then it can use that information to fingerprint the node

17:25

<larryruane> just to be sure, a node never drops blocks no matter how stale it thinks it is?

17:26

<b10c> larryruane: I don't think it does

17:26

<dergoegge> larryruane: do you mean after it has already accepted it into the index?

17:26

<dergoegge> or when receiving a new header?

17:31

<larryruane> i meant after being accepted into the index .. thanks

17:31

<dergoegge> afaict we dont prune stale headers/blocks from the index.

17:31

<dergoegge> which is also what the next question is about

17:31

<dergoegge> Why do we keep stale blocks in the block index?

17:32

<larryruane> (i think this is why once you have a valid `pindex` variable (getting that requires `cs_main`), you can use it without any lock)

17:33

<dergoegge> i am actually not sure why we keep old stale headers/block around in the index

17:33

<lightlike> i think we might remove stale blocks if we use -prune mode: see https://bitcoin.stackexchange.com/questions/112205/removing-stale-blocks-using-prune-1tb

17:34

<lightlike> at least from the saved block data, not from the index though

17:35

<dergoegge> sipa: do you know why we keep old stale blocks/headers in the index?

17:35

<dergoegge> lightlike: yea i think we delete them from the disk but not from the index

17:35

<b10c> we keep (recent) stale blocks to be able to reorg to that chain if it becomes _longer_

17:36

<dergoegge> b10c: that makes sense, but do we need say a year old stale header?

17:37

<dergoegge> maybe pruning the block index from old stale headers could prevent this class of fingerprinting bug entirely

17:38

<b10c> yeah I'm not sure either, that's why I added the (recent) :)

17:39

<dergoegge> ok well we will leave this as an open question and move on...

17:39

<dergoegge> In your own words, how does the fingerprinting technique outlined in the PR work?

17:41

<b10c> we extend a stale branch with header H+1, send H+1 to a node and see if it requests header H. If it does, it doesn't know about the stale branch. If it doesn't, it knows about the stale branch

17:41

<lightlike> Have a list of existing stale blocks that our peer might have or not, create bogus headers building on them, and send them to a peer to check and record for which of the headers we get a GETHEADERS in return.

17:41

<b10c> The PR says H+1 doesn't need to have a valid PoW, so this is very cheap for us to do

17:42

<dergoegge> b10c lightlike: exactly right, i think if you use multiple headers like lightlike suggested then the accuracy of the attack increases

17:43

<larryruane> but is the whole idea of this attack to do this procedure to two different peer network addresses, to try to link them to the same machine?

17:43

<lightlike> i think the peer wouldn't request header H, but just send a locator with their current tip in response if it didn't know H

17:43

<dergoegge> b10c: using invalid PoW headers actually makes things easier for the attacker since the node will disconnect if it knows the stale block

17:44

<b10c> dergoegge: oh didn't know!

17:44

<dergoegge> larryruane: right, the attacker would know of two addresses say one IPv4 and one Tor and could then, using this technique, figure out if the addresses belong to the same node
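
The attacker's side of the technique discussed above can be sketched as follows. This is a simplified model and not code from the PR: probe, fingerprint and the address and hash names are hypothetical. It assumes, as described in the discussion, that a bogus header with invalid PoW leads to a getheaders reply when its parent is unknown and to a disconnect when the parent is a stale block in the index.

```python
from __future__ import annotations


def probe(known_stale: set[str], parent_hash: str) -> str:
    """Toy node reaction to a bogus header built on parent_hash."""
    if parent_hash in known_stale:
        # Parent is in the index, so the invalid header is checked and rejected.
        return "disconnect"
    # Parent is unknown, so the node asks for headers to connect the chain.
    return "getheaders"


def fingerprint(known_stale: set[str], candidates: list[str]) -> tuple[str, ...]:
    """Record the reaction to one probe per candidate stale block."""
    return tuple(probe(known_stale, h) for h in candidates)


# Stale blocks the attacker has collected (hypothetical identifiers).
candidates = ["stale_1", "stale_2", "stale_3"]

# Two toy nodes with different sets of stale blocks in their index.
node_a = {"stale_1", "stale_3"}
node_b = {"stale_2"}

# Three addresses, two of which are served by the same node.
addresses = {"ipv4_addr": node_a, "onion_addr": node_a, "other_addr": node_b}
prints = {name: fingerprint(idx, candidates) for name, idx in addresses.items()}

same_node_suspected = prints["ipv4_addr"] == prints["onion_addr"]
different_node_suspected = prints["ipv4_addr"] != prints["other_addr"]
```

Matching fingerprints only suggest that two addresses belong to the same node. As noted in the discussion, the more stale blocks the probed nodes hold, the more distinctive the fingerprint becomes.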

17:45

<larryruane> thanks.. would it be much work to just check the PoW on the header to see if it's sufficient? (to make it harder on the attacker)

17:45

<dergoegge> lightlike: yea that sounds right

17:46

<dergoegge> larryruane: yes that would probably also work, but as you said would only make it harder not impossible

17:46

<lightlike> nodes would probably need to have a decent number of stale blocks in their index to make it possible to have a unique fingerprint.

17:46

<b10c> is this something you've actually tried and written code for, dergoegge?

17:46

<larryruane> theStack asks, is it common to run two different network connections from a single node?

17:48

<dergoegge> larryruane: i don't know. don't have any statistics on that

17:48

<larryruane> lightlike: currently my node knows of 5 stale blocks (using `getchaintips`)

17:48

<larryruane> but I'm not gonna tell you which ones :)

17:48

<b10c> larryruane: I think it is. e.g. IPv4 and IPv6. Obviously some are Tor only

17:49

<dergoegge> b10c: i was actually investigating the recent increase in Tor nodes see: https://bitnodes.io/dashboard/?days=1825

17:49

<dergoegge> and used a different technique to check if all those tor addresses belong to the same node (which appears to not be the case)

17:50

<dergoegge> yea you can not deanonymize a Tor only node with this

17:50

<dergoegge> Maybe you can if they switch back to IPv4 but thats a stretch

17:52

<lightlike> if we accept any stale headers that extend the work of their predecessor to our index (as was discussed before) couldn't we just send our victim one of these headers, and then probe again, making the fingerprinting possible even if our victim doesn't have any stale headers at the beginning?

17:53

<dergoegge> yea if you have a good collection of past stale headers or are able to mine new ones then you might be able to mark nodes with specific headers you sent to them

17:53

<b10c> makes the attack a lot more expensive though

17:54

<dergoegge> yes if you have to mine new ones

17:54

<ccdle12> the pow would eventually have to be below the stale relay age limit

17:54

<lightlike> yes, but it must be easy to get a list of historical ones?

17:54

<dergoegge> i wonder if anyone has a collection of *all* blocks that were ever created stale or not

17:56

<b10c> dergoegge: I'd guess many of the new tor nodes are RPi's with RaspiBlitz or similar. Many of the home nodes are Tor-only

17:56

<dergoegge> ccdle12: afaik the relay age limit prevents a peer from downloading a block that is older than the limit

17:57

<lightlike> Since it's almost time: I'd be interested in the answer to the last question, why the "allow_potentially_invalid_headers" parameter is necessary.

17:57

<dergoegge> ccdle12: which was also a fingerprint bug at some point, i think i linked that PR in the notes

17:57

<dergoegge> b10c: can be most of them have NODE_BLOOM set

17:58

<b10c> lightlike: +1, wasn't clear to me during my initial review round too

17:58

<dergoegge> lightlike: lets get to that then

17:59

<dergoegge> i introduced that because one of the p2p tests was failing, let me grab a link real quick

17:59

<dergoegge> https://github.com/bitcoin/bitcoin/blob/master/test/functional/p2p_sendheaders.py

18:01

<dergoegge> crap i cant find the line

18:01

<dergoegge> #endmeeting

18:01

<dergoegge> sorry i will answer this afterwards

18:01

<dergoegge> thanks everyone for coming!

18:02

<lightlike> Thanks dergoegge!

18:02

<dergoegge> i should have prepared an answer for the last one :D

18:02

<ccdle12> thanks dergoegge!

18:02

<b10c> maybe add a bit more details to the commit introducing it too, I was looking there and didn't find it

18:02

<dergoegge> b10c: +1

18:02

<b10c> thanks dergoegge! this was super interesting

18:03

<larryruane> thanks this was great!!

18:04

<lightlike> dergoegge: maybe also add the reason for this to the PR description or code, wherever it fits better (not that the test failed, but the root cause why it's necessary). It wasn't clear to me when reviewing.

18:05

<lightlike> oh, b10c said the same :)

18:10

<dergoegge> lightlike: https://github.com/bitcoin/bitcoin/blob/41720a1f540ef3c16a283a6cce6f0a63552a4937/test/functional/p2p_sendheaders.py#L497-L501

18:10

<dergoegge> This extends a recent non main chain branch with a header and expects the node to send a getdata for the block

18:10

<dergoegge> but with the new logic the node would ignore the header because it extended a stale branch

18:10

<dergoegge> specifically "pindex->IsValid(BLOCK_VALID_SCRIPTS)" in PeerManagerImpl::BlockRequestAllowed always returns false for headers, so we need an exception for that if we are deciding if we should leak the info about a header

18:10

<dergoegge> but i will add this as a comment
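
The closing explanation about PeerManagerImpl::BlockRequestAllowed can be illustrated with a rough sketch. It is one possible reading of the discussion, not the PR's implementation: the Entry class and its fields are hypothetical stand-ins, the 30-day value for STALE_RELAY_AGE_LIMIT is an assumption, and the proof-equivalent-time check that the real function also performs is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed value: 30 days, in seconds.
STALE_RELAY_AGE_LIMIT = 30 * 24 * 60 * 60


@dataclass
class Entry:
    on_active_chain: bool
    scripts_validated: bool  # stand-in for pindex->IsValid(BLOCK_VALID_SCRIPTS)
    block_time: int          # seconds since epoch


def block_request_allowed(
    entry: Entry,
    best_header_time: int,
    allow_potentially_invalid_headers: bool = False,
) -> bool:
    """Toy version of the check: may we reveal that we know this entry?"""
    if entry.on_active_chain:
        return True
    # A header whose block was never downloaded is not script-validated,
    # so without the exception this check would always fail for headers.
    validated = entry.scripts_validated or allow_potentially_invalid_headers
    recent = best_header_time - entry.block_time < STALE_RELAY_AGE_LIMIT
    return validated and recent


now = 1_650_000_000
recent_stale_header = Entry(False, False, block_time=now - 3600)
old_stale_header = Entry(False, False, block_time=now - 60 * 24 * 60 * 60)

without_exception = block_request_allowed(recent_stale_header, now)
with_exception = block_request_allowed(
    recent_stale_header, now, allow_potentially_invalid_headers=True
)
old_with_exception = block_request_allowed(
    old_stale_header, now, allow_potentially_invalid_headers=True
)
```

In this sketch the exception only relaxes the validity requirement. The age limit still applies, so old stale headers stay hidden either way.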