The PR branch HEAD was 281fd1a at the time of this review club meeting.
Notes
This refactor PR moves existing code into new src/util/hasher.{cpp|h} files.
Along the way, it touches some interesting code in:
src/coins.{cpp|h}
src/index/blockfilter.h
src/script/sigcache.h
src/txmempool.{cpp|h}
src/validation.h
Hashers are generally used to construct hash maps, which is a data structure
that has fast item lookup, addition and removal. They allow lookup of elements
in 0(1) time in the typical/best-case scenario although implementation can
affect this.
BlockHasher is used by src/validation.h::BlockMap to construct a hash map
of blocks and their sequence ID (block height).
BlockManager uses the BlockMap when adding a new block to the index with
AddBlockToIndex(), first checking whether it’s a duplicate
(src/validation.cpp#L3106) and then adding the new block and sequence ID to
the index.
SignatureCacheHasher
SignatureCacheHasher is used to avoid performing costly verification of ECDSA
signatures twice; once when accepted into the mempool and again when accepted
into a block.
The cache type currently used for CSignatureCache is “cuckoo cache”,
implemented in PR 8895.
FilterHeaderHasher
FilterHeaderHasher is used by
src/index/blockfilterindex.h::BlockFilterIndex to construct an index of
(BIP157)
filter headers for each filter type enabled.
However, BlockFilterIndex only consults the m_headers_cache lookup if the
block index is at a “checkpoint” height – every 1000 blocks per BIP157.
SaltedOutpointHasher
The Coin object requires CCoinsMap to use SaltedOutpointHasher in its map
of CCoinsCacheEntry(s).
CCoinsMap is used by CCoinsViewCache with most of its methods. As discussed
in the review club meeting for PR 18113 each
CCoinsCacheEntry contains both the Coin itself in addition to the flags
associated with that coin.
<willcl_ark> A quick reminder of some conventions, you don't have to ask to ask a question (e.g. "I have a question about x but I don't know if it's on-topic?") just go ahead and ask.
<joelklabo> I have a quick C++ question, is putting SaltedOutpointHasher() in the public interface of itself how you create a singleton? A little confused by the syntax
<joelklabo> Ah, ok but I was thinking the salt would need to be consistent for the lifetime of the process, so things don't move around, I'll look that up. Thanks
<murtyjones> Using SaltedTxidHasher would require storing all of the outputs in a list in the value, vs. using the outpoint to point to one specific UTXO.
<willcl_ark> murtyjones: dhruvm right. If we used SaltedTxidHasher then all the UTXOs from the same transaction would hash to the same result which would be, sub-optimal.
<murtyjones> A salt is random data included in the input to a hash function. My guess is that it's used with those structures to prevent malicious construction of colliding values?
<dhruvm> The salt makes the output of the hash function unpredictable to parties that know the input but not the salt. The hash function remains deterministic so long as the same salt is used. The salt is updated every time the containers are initialized. As an example, SaltedTxidHasher is used in a multi_index for the mempool, which seems to be a non-persistent data structure.
<michaelfolkson> Right... so salts are generally used for things like storing passwords. If you didn't salt them you would be able to see everyone who uses "Password" as their password
<willcl_ark> correct! std::unordered_set puts objects into buckets based on their hash value. An adversary who can create hash collisions at will can force a large number of items into the same bucket meaning that lookup time goes from 0(1) to 0(n).
<dhruvm> Maybe because those hashers are used in persistent data structures? BlockHasher is used in BlockMap/m_block_index. (Un)LoadBlockIndex persist it to disk.
<michaelfolkson> OK... so I get the benefits of salting - use random number in your hash. Like a nonce in dig sigs I guess... I get the benefits of siphash, need secret key to generate.
<dhruvm> A multi_index maintains multiple indices on the same data set. index_transaction_set is a multi_index on the mempool. SaltedTxidHasher is used to maintain the mempool sorted by txid for fast lookups by txid.
<jnewbery> if someone wanted to create a collision in our BlockMap, they'd need to be mining valid blocks which _also_ collide in the BlockHasher function
<willcl_ark> dhruvm: right, this would allow a single hasher to be used with any of `uint256`, `CPubKey`, `CScript`, `CKeyId`, and generics like `std::vector<unsigned char>`
<willcl_ark> It's not easy to use the `SaltedSipHasher` with `Span` in places where the keyed hash is saved to disk, e.g. https://github.com/bitcoin/bitcoin/blob/3f205808/src/addrman.cpp#L14. If the hash function is changed, then the object would no longer hash to the same digest.
<michaelfolkson> I know you jnewbery wanted to only allow parts of C++17. Was it consensus that all C++17 should be allowed in Core v22? I know people argued for this
<jnewbery> There's a short survey at https://forms.gle/ZLZcxVRoAozVcAyE6 about review club this year. It'd be very helpful if you could fill it out and let me know how we could make review club even better next year.
<wumpus> felixweis | When leaving c++11 behind, why not go all the way to 20 ? <- if only :) every bump of the c++ version is a struggle, which depends on compiler support in even the most slow linux distros