Prevent block index fingerprinting by sending additional getheaders messages (p2p)

https://github.com/bitcoin/bitcoin/pull/24571

Host: dergoegge  -  PR author: dergoegge

The PR branch HEAD was 4e415067 at the time of this review club meeting.

Notes

  • Attackers may use fingerprinting techniques to recognize the same node across different connections. This makes it possible to test if two addresses belong to the same node, which we generally try to avoid especially for addresses belonging to privacy-centric networks such as Tor. Some fingerprinting attacks work across restarts of the victim’s node, making it possible to detect if a node changes addresses.

  • A variety of fingerprinting techniques have been patched or mitigated. For example, attackers could have used requests for old non-main-chain headers/blocks (#5820, #8408, #11113), addr message timestamps and prune depth to fingerprint nodes.

  • The fingerprintable behaviour that PR #24571 addresses occurs when a node receives headers from a peer (handled in PeerManagerImpl::ProcessHeadersMessage). If the received headers don’t connect to any header in the node’s block index, then it will request additional headers via a getheaders message in an attempt to connect the chain.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What is the block index and what is it used for? (Hint: look at the usage of m_block_index)

  3. Why and how can the block index be used for fingerprinting? (Hint: it has to do with stale blocks/headers)

  4. Why do we keep stale blocks in the block index?

  5. In your own words, how does the fingerprinting technique outlined in the PR work?

  6. Does the fingerprinting technique outlined in the PR work across restarts of the target node?

  7. This commit introduces a new parameter to PeerManagerImpl::BlockRequestAllowed. Why is that necessary?

Meeting Log

  117:00 <dergoegge> #startmeeting
  217:00 <dergoegge> Hi everyone, welcome to this week's PR review club!
  317:00 <dunxen> hi!
  417:00 <b10c> hi
  517:00 <ccdle12> hi
  617:00 <justin> hey
  717:00 <dergoegge> Feel free to say hi to let people know you are here
  817:00 <dergoegge> Anyone here for the first time?
  917:01 <lightlike> hi
 1017:02 <dergoegge> This week we are looking at #24571 “Prevent block index fingerprinting by sending additional getheaders messages”
 1117:02 <dergoegge> Notes are in the usual place: https://bitcoincore.reviews/24571
 1217:02 <dergoegge> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
 1317:03 <lightlike> yes, concept ACK
 1417:03 <dunxen> Yes, light review. Approach ACK
 1517:03 <larryruane> hi
 1617:03 <b10c> I started reviewing. Concept and Approach ACK
 1717:04 <dergoegge> Ok cool! first question: What is the block index and what is it used for?
 1817:04 <ccdle12> approach ACK
 1917:04 <b10c> Haven't tested anything yet, but have a few ideas to test against my (still private) signet with reorgs. Have a lot of stale blocks there
 2017:04 <dergoegge> b10c: cool idea!
 2117:06 <lightlike> An in-memory index of the blockchain which contains the headers plus info where to find the rest of the block data on disk.
 2217:07 <b10c> and it's kinda a blocktree
 2317:07 <b10c> multiple branches (chains)
 2417:07 <b10c> allows us to determine the longest branch/chain
 2517:07 <b10c> and switching between branches if one becomes _longer_ (more work)
 2617:08 <dergoegge> lightlike b10c: correct!
 2717:09 <dergoegge> the fact that it is a tree is very important this PR
 2817:09 <dergoegge> which brings us to the next question: Why and how can the block index be used for fingerprinting?
 2917:10 <lightlike> when do we accept multiple branches normally? only when we are witnessing a reorg as it happens? or also after the fact?
 3017:11 <dergoegge> lightlike: afaict we also accept any headers into the index that come after the last checkpoint and have enough work
 3117:12 <b10c> I might have a branch in that tree that not many others have. If you can find out that I have this branch, this leaks information. Not sure if I can pinpoint you with that information, but can get harmful when combined with more information
 3217:13 <b10c> dergoegge: that's my understanding too
 3317:13 <b10c> we don't download all blocks though
 3417:13 <lightlike> dergoegge: I mean, if I do an IBD now, will I accept any historical non-best-chain headers in my blockchain index? Or would I only get those if I am online when there are conflicting blocks for my tip, and don't know which will stay in the main chain?
 3517:13 <dergoegge> b10c: exactly you might have seen a header/block that for example after a reorg is no longer part of the main chain and has thus become stale
 3617:14 <b10c> lightlike: see the RPC docs for getchaintips too
 3717:14 <dergoegge> the exact number of stale blocks any specific node has seen will be unique to that node depending on where in the network the node sits
 3817:15 <b10c> dergoegge: right, from my experience older nodes have a lot of entries when calling the getchaintips RPC
 3917:15 <dergoegge> lightlike: during IBD you only request and download the blocks of the headers you got during initial header sync
 4017:16 <b10c> so you could probably find out how long that node has been running for (with ~months of accuracy)
 4117:16 <larryruane> If my node knows about a block that it thinks is stale, doesn't it forward to all its peers? If so, don't all knows end up knowing about all the same stale blocks?
 4217:18 <b10c> it relays headers IIRC, but you don't request stale blocks
 4317:18 <sipa> only blocks we believe are part of the best chain are relayed
 4417:19 <lightlike> so in order to accept a stale block header, we must have believed it was in our best chain at the time of acceptance (and then changed our opinion/reorged)?
 4517:20 <b10c> specify "accept". do you mean in order to add it to our block index?
 4617:20 <lightlike> yes
 4717:21 <dergoegge> lightlike: if someone send you a valid header with enough work on it then you will store it in the index
 4817:21 <dergoegge> it does not need to extend the tip
 4917:21 <b10c> lightlike: then no, we accept stale headers too
 5017:21 <dergoegge> this would happen during a large reorg for example
 5117:22 <lightlike> dergoegge: "enough work" = "more work than our current tip"?
 5217:22 <dergoegge> no: "enough work" = "more work than the block it is extending"
 5317:23 <lightlike> ok, thanks
 5417:24 <dergoegge> so we have established that a node's block index is unique based on the fact that it can contain stale blocks that other nodes do not have
 5517:25 <dergoegge> if a peer can probe for stale blocks in the node's index then it can use that information to fingerprint the node
 5617:25 <larryruane> just to be sure, a node never drops blocks no matter how stale it thinks it is?
 5717:26 <b10c> larryruane: I don't think it does
 5817:26 <dergoegge> larryruane: do you mean after it has already accepted it into the index?
 5917:26 <dergoegge> or when receiving a new header?
 6017:31 <larryruane> i meant after being accepted into the index .. thanks
 6117:31 <dergoegge> afaict we dont prune stale headers/blocks from the index.
 6217:31 <dergoegge> which is also what the next question is about
 6317:31 <dergoegge> Why do we keep stale blocks in the block index?
 6417:32 <larryruane> (i think this is why once you have a valid `pindex` variable (getting that requires `cs_main`), you can use it without any lock
 6517:33 <dergoegge> i am actually not sure why we keep old stale headers/block around in the index
 6617:33 <lightlike> i think we might remove stale blocks if we use -prune mode: see https://bitcoin.stackexchange.com/questions/112205/removing-stale-blocks-using-prune-1tb
 6717:34 <lightlike> at least from the saved block data, not from the index though
 6817:35 <dergoegge> sipa: do you know why we keep old stale blocks/headers in the index?
 6917:35 <dergoegge> lightlike: yea i think we delete them from the disk but not from the index
 7017:35 <b10c> we keep (recent) stale blocks to be able to reorg to that chain if it becomes _longer_
 7117:36 <dergoegge> b10c: that makes sense, but do we need say a year old stale header?
 7217:37 <dergoegge> maybe pruning the block index from old stale headers could prevent this class of fingerprinting bug entirely
 7317:38 <b10c> yeah I'm not sure either, that's why I added the (recent) :)
 7417:39 <dergoegge> ok well we will leave this as an open question and move on...
 7517:39 <dergoegge> In your own words, how does the fingerprinting technique outlined in the PR work?
 7617:41 <b10c> we extend a stale branch with header H+1, send H+1 to a node and see if it requests header H. If it does, it doesn't know about the stale branch. If it doesn't, it knows about the stale branch
 7717:41 <lightlike> Have a list of existing stale blocks that our peer might have or not, create bogus headers building on them, and send them to a peer to check and record for which of the headers we get a GETHEADERS in return.
 7817:41 <b10c> The PR says H+1 doesn't need to have a valid PoW, so this is very cheap for us to do
 7917:42 <dergoegge> b10c lightlike: exactly right, i think if you use multiple headers like lightlike suggested then the accuracy of the attack increases
 8017:43 <larryruane> but is the whole idea of this attack to do this procedure to two different peer network addresses, to try to link them to the same machine?
 8117:43 <lightlike> i think the peer wouldn't request header H, but just send a locator with their current tip in response if it didn't know H
 8217:43 <dergoegge> b10c: using invalid PoW headers actually makes things easier for the attacker since the node will disconnect if it knows the stale block
 8317:44 <b10c> dergoegge: oh didn't know!
 8417:44 <dergoegge> larryruane: right, the attacker would know of two addresses say one IPv4 and one Tor and could then, using this technique, figure out if the addresses belong to the same node
 8517:45 <larryruane> thanks.. would it be much work to just check the PoW on the header to see if it's sufficient? (to make it harder on the attacker)
 8617:45 <dergoegge> lightlike: yea that sounds right
 8717:46 <dergoegge> larryruane: yes that would probably also work, but as you said would only make it harder not impossible
 8817:46 <lightlike> nodes would probably need to have a decent number of stale blocks in their index to make it possible to have a unique fingerprint.
 8917:46 <b10c> is this something you've actually tried and written code for, dergoegge?
 9017:46 <larryruane> theStack asks, is it common the run two different network connections from a single node?
 9117:48 <dergoegge> larryruane: i don't know. dont't have any statistics on that
 9217:48 <larryruane> lightlike: currently my node knows of 5 stale blocks (using `getchaintips`)
 9317:48 <larryruane> but I'm not gonna tell you which ones :)
 9417:48 <b10c> larrayruane: I think it is. e.g. IPv4 and IPv6. Obviously some are Tor only
 9517:49 <dergoegge> b10c: i was actually investigating the recent increase in Tor nodes see: https://bitnodes.io/dashboard/?days=1825
 9617:49 <dergoegge> and used a different technique to check if all those tor addresses belong to the same node (which appears to not be the case)
 9717:50 <dergoegge> yea you can not deanonymize a Tor only node with this
 9817:50 <dergoegge> Maybe you can if they switch back to IPv4 but thats a stretch
 9917:52 <lightlike> if we accept any stale headers that extend the work of their predecessor to our index (as was discussed before) couldn't we just send our victim one of these headers, and then probe again, making the fingerprinting possible even if our victim doesn't have any stale headers at the beginning?
10017:53 <dergoegge> yea if you have a good collection of past stale headers or are able to mine new ones then you might be able to mark nodes with specific headers you sent to them
10117:53 <b10c> makes the attack a lot more expensive though
10217:54 <dergoegge> yes if you have to mine new ones
10317:54 <ccdle12> the pow would eventually have to be below the stale relay age limit
10417:54 <lightlike> yes, but it must be easy to get a list of historical ones?
10517:54 <dergoegge> i wonder if anyone has a collection of *all* blocks that were ever created stale or not
10617:56 <b10c> dergoegge: I'd guess many of the new tor nodes are RPi's with RaspiBlitz or similar. Many of the home nodes are Tor-only
10717:56 <dergoegge> ccdle12: afaik the relay age limit prevents a peer from downloading a block that is older than the limit
10817:57 <lightlike> Since it's almost time: I'd be interested in the answer to the last question, why the "allow_potentially_invalid_headers" parameter is necessary.
10917:57 <dergoegge> ccdle12: which was also a fingerprint bug at some point, i think i linked that PR in the notes
11017:57 <dergoegge> b10c: can be most of them have NODE_BLOOM set
11117:58 <b10c> lightlike: +1, wasn't clear to me during my initial review round too
11217:58 <dergoegge> lightlike: lets get to that then
11317:59 <dergoegge> i introduced that because one of the p2p test was failing, let me grab a link real quick
11417:59 <dergoegge> https://github.com/bitcoin/bitcoin/blob/master/test/functional/p2p_sendheaders.py
11518:01 <dergoegge> crap i cant find the line
11618:01 <dergoegge> #endmeeting
11718:01 <dergoegge> sorry i answer this afterwards
11818:01 <dergoegge> thanks everyone for coming!
11918:02 <lightlike> Thanks dergoegge!
12018:02 <dergoegge> i should have prepared an answer for the last one :D
12118:02 <ccdle12> thanks dergoegge!
12218:02 <b10c> maybe add a bit more details to the commit introducing it too, I was looking there and didn't find it
12318:02 <dergoegge> b10c: +1
12418:02 <b10c> thanks dergoegge! this was super interesting
12518:03 <larryruane> thanks this was great!!
12618:04 <lightlike> dergoegge: maybe also add the reason for this to the PR description or code, wherever it fits better (not that the test failed, but the root cause why it's necessary). It wasn't clear to me when reviewing.
12718:05 <lightlike> oh, b10c said the same :)
12818:10 <dergoegge> lightlike: https://github.com/bitcoin/bitcoin/blob/41720a1f540ef3c16a283a6cce6f0a63552a4937/test/functional/p2p_sendheaders.py#L497-L501
12918:10 <dergoegge> This extends a recent non main chain branch with a header and expects the node to send a getdata for the block
13018:10 <dergoegge> but with the new logic the node would ignore the header because it extended a stale branch
13118:10 <dergoegge> specifically "pindex->IsValid(BLOCK_VALID_SCRIPTS)" in PeerManagerImpl::BlockRequestAllowed always returns false for headers, so we need an exception for that if we are deciding if we should leak the info about a header
13218:10 <dergoegge> but i will add this as a comment