Introduce internal kernel library (build system)

https://github.com/bitcoin/bitcoin/pull/28690

Host: stickies-v  -  PR author: TheCharlatan

The PR branch HEAD was 5a6a0b1693cf739a5e6cc1161160a1d621d215f9 at the time of this review club meeting.

2023-11-29 edit: the questions and references have been updated slightly to reflect the latest force-push to 5a6a0b1693cf739a5e6cc1161160a1d621d215f9

Notes

  • The libbitcoinkernel project is an effort to decouple Bitcoin Core’s consensus engine from other non-consensus modules in the codebase. We have previously covered libbitcoinkernel-related PRs #27711, #25527, #24410 and #20158. However, they are not essential to understanding this PR as they focused more on internal code reorganization as opposed to this PR’s focus on the build system.

  • Suggested reading: Static and dynamic libraries (up until the “Installing and using libraries” section which is not relevant).

  • Most of the libraries (required reading) that are built are internal and static, and can’t be used outside of Bitcoin Core. Examples of internal libraries are libbitcoin_cli, libbitcoin_common and libbitcoin_node. There are only two external (dynamic, or shared) libraries (required reading): libbitcoinconsensus and libbitcoinkernel, even though the latter is not currently documented as such since the API is not ready.

  • This PR introduces a new internal libbitcoin_kernel static library, in addition to the existing external libbitoincoinkernel dynamic library.

  • Internal libraries are solely used to modularize the build system, which helps with build performance as well as maintainability. In the future, the Bitcoin Core build system may start to use the external libbitcoinkernel library, but that would probably require more kernel work (such as having a more complete and stable API) to be completed first.

  • An explanation of some abbreviations used:

    • a vs la(as in libbitcoin_kernel_a_SOURCES and libbitcoinkernel_la_SOURCES):
      • a stands for archive (i.e. a static library)
      • la stands for Libtool Archive

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Did you read the Static and dynamic libraries, libraries.md and shared-libraries.md documentation?

  3. Are the Bitcoin Core internal libraries all statically or dynamically built, or a mix of both? Why?

  4. Why do we need to build external libraries? What’s the purpose of having both an internal libbitcoin_kernel and an external libbitoinkernel? Why do we build external libraries in the first place?

  5. Which of the libraries {libbitcoin_cli, libbitcoin_consensus, libbitcoinconsensus, libbitcoin_kernel, libbitcoinqt} are internal?

  6. How are we using kernel functionality in Bitcoin Core before this PR, if there is no internal library?

  7. Why does the libbitcoinkernel_la_SOURCES source file list specifically include $(libbitcoin_util_a_SOURCES) and $(libbitcoin_consensus_a_SOURCES) but libbitcoin_kernel_a_SOURCES doesn’t seem to?

  8. Commit 41a80de mentions: “Moving util/message.cpp to the common library also breaks an undocumented dependency of the util library on the consensus library’s CPubKey::RecoverCompact symbol.”. What was this dependency?

  9. Commit 5a6a0b1 de-duplicates the source file list. Are there any changes to the source file list used for libbitcoinkernel?

Meeting Log

  117:00 <stickies-v> #startmeeting
  217:00 <maxedw> hi
  317:00 <kashifs> hi
  417:00 <pablomartin> hello
  517:00 <marcofleon> Hi
  617:00 <abubakarsadiq_> hello
  717:00 <TheCharlatan> hi
  817:00 <stickies-v> welcome everyone! Today we're looking at #28690, authored by TheCharlatan. The notes and questions are available on https://bitcoincore.reviews/28690
  917:01 <hebasto> hi
 1017:01 <monlovesmango> hey
 1117:01 <stickies-v> note: we'll be focusing on the previous PR HEAD (https://github.com/bitcoin-core-review-club/bitcoin/commit/5a235048500a38fae691396cb59f6697032b4deb), which is _not_ the current HEAD anymore (but we don't want to overhaul the notes and questions last minute)
 1217:02 <stickies-v> anyone joining us for the first time today? even if you're just lurking, feel free to say hi!
 1317:03 <stickies-v> > who got the chance to review the PR or read the notes? (y/n)
 1417:03 <hebasto> y
 1517:03 <pablomartin> y
 1617:03 <monlovesmango> y
 1717:03 <marcofleon> y
 1817:03 <kashifs> y
 1917:03 <abubakarsadiq_> light review Concept ACK
 2017:03 <TheCharlatan> y :P
 2117:03 <hebasto> :)
 2217:03 <stickies-v> ooh look at that prep! would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK? what was your review approach?
 2317:04 <abubakarsadiq_> the notes are gem, thanks for the resource @stickies-v
 2417:04 <maxedw> y, Concept ACK
 2517:04 <hebasto> Approach ACK -- backported each commit to cmake-staging branch and analized undefined symbols in internal libraries
 2617:05 <stickies-v> glad they were helpful abubakarsadiq_
 2717:05 <pablomartin> Concept ACK but after reading notes and all conversations in it, most prob Approach ACK
 2817:05 <stickies-v> hebasto: i saw your comment on the undefined symbols analysis, great stuff! would you mind sharing how you did that?
 2917:05 <marcofleon> Ran some tests but would need to do a more thorough walkthrough of commits, concept ACK
 3017:06 <pablomartin> hebasto, stickies-v: yes pls
 3117:07 <hebasto> stickies-v: in CMake it is easy to build any (even internal) library as shared one. the `--no-undefined` option will force a linker to report undefined symbols
 3217:08 <stickies-v> ah that's very cool, another cmake benefit (for anyone interested, see https://github.com/bitcoin/bitcoin/issues/28607)
 3317:08 <maxedw> Is there a time that undefined symbols is acceptable?
 3417:08 <stickies-v> i'm assuming everyone's completed (or already knows about) the required reading from Q2, so we can just skip that and head straight into Q3
 3517:08 <stickies-v> quick reminder that review club is async - so don't hold back on continuing the conversations/questions about previous questions when we've moved on already!
 3617:08 <hebasto> yes, for internal static libs
 3717:09 <maxedw> thanks hebasto
 3817:09 <stickies-v> hebasto: but it's not really "acceptable" for internal static libs either, right? like, the current state is not perfect, but ideally we'd eliminate all such instances?
 3917:10 <maxedw> not ideal but could work if you provide the symbol in the rest of your code
 4017:10 <maxedw> perhaps is a smell?
 4117:10 <fanquake> To be clear, --no-undefined is not related to CMake, it's a linker option that can be used regardless of build system
 4217:10 <fanquake> If you want to use it with master, just put --no-undefined into your LDFLAGS etc (we already do that in some cases)
 4317:10 <maxedw> thanks for the clarification fanquake
 4417:11 <hebasto> ^ CMake-related thing is easy building SHARED instead of STATIC
 4517:11 <stickies-v> i'm very happy we've got all the experts in the room
 4617:11 <stickies-v> so, first question (keeping the numbering from the notes)
 4717:11 <stickies-v> 3. Are the Bitcoin Core internal libraries all statically or dynamically built, or a mix of both? Why?
 4817:12 <hebasto> well, configuring with such global LDFLAGS might not work
 4917:12 <abubakarsadiq_> From the list of the libraries in https://github.com/bitcoin/bitcoin/blob/master/doc/design/libraries.md seems like all internal libraries have the file extension .a which means they are all static libs. But I see libsecp256k1 and libbitcoin_crypto_base are dynamic I dont think they are internal libs.
 5017:12 <maxedw> I believe they are all static
 5117:13 <pablomartin> thought it was a mix...
 5217:13 <LarryRuane> yes I think static
 5317:13 <TheCharlatan> libbitcoin_crypto_base is in fact static, check crypto_libbitcoin_crypto_base_la_LDFLAGS
 5417:14 <abubakarsadiq_> oh thanks TheCharlatan will check
 5517:14 <maxedw> unrelated question but am I right in thinking that libsecp256k1 is copied into the bitcoin source code. It's not a git submodule.
 5617:15 <hebasto> maxedw: it is a subtree
 5717:15 <abubakarsadiq_> I see
 5817:15 <maxedw> ah gotcha hebasto
 5917:16 <stickies-v> so yes as most have pointed out already, all the internal libs are static! anyone got an answer for the "Why" part?
 6017:16 <monlovesmango> @pablomartin I thought so too bc of the bullet point in the notes, but I think that is about bitcoin core, not bitcoin core internal
 6117:17 <monlovesmango> for stability/security? dunno
 6217:17 <LarryRuane> stickies-v: is it more secure? the user can't be using a rogue library that an attacker installed on the system? (not sure at all)
 6317:17 <pablomartin> i see, thanks
 6417:18 <abubakarsadiq_> security? can you expand
 6517:18 <pablomartin> yeah they are all embedded into the binary
 6617:18 <stickies-v> LarryRuane: shared library/DLL hijacking is indeed something I've come across a few times in my research but I don't know how feasible it is in practice nowadays, hopefully someone can expand on that
 6717:19 <hebasto> internal static libraries, being simple archives of object files, are used to make build system efficient, modularized, maintanble
 6817:19 <maxedw> feels neater rather than needing to ship a bunch of libraries with the binary. I'm not sure about security as if someone is able to swap out one of the libs then why couldn't they swap out the main binary?
 6917:19 <LarryRuane> "... why couldn't they swap out ..." -- yes, very good point
 7017:20 <hebasto> the same internal static library might be reused to build several executables
 7117:20 <stickies-v> hebasto: wouldn't shared libraries would be equally modularized and maintainable?
 7217:22 <hebasto> then bitcoind must be shipped with the bunch of dependency shared libs
 7317:22 <sipa> shared libraries IMHO only make sense when they can commit to having a stable interface
 7417:22 <sipa> and i don't think that is something we want at all
 7517:24 <stickies-v> i think perhaps one way of summarizing is: static libraries is better (i.e. more efficient and maintainable build system) than not having libraries, and shared libraries don't offer any additional benefit that we need here (because we don't want anyone outside of core to use our internal libraries)
 7617:25 <stickies-v> 4. What's the purpose of having both an internal `libbitcoin_kernel` and an external `libbitoinkernel`?
 7717:25 <monlovesmango> interesting, hadn't thought about discouraging use outside of bitcoin core
 7817:25 <sipa> i feel this is a bit of a false dichotomy too... you can still having static internal libraries as intermediate build products/organization, and then compile them into a shared library
 7917:26 <sipa> shared libraries in that regard in contrast to having executables
 8017:26 <hebasto> ^ exactly
 8117:27 <hebasto> and internal static lib is a a contrast to just a bunch of object files
 8217:27 <sipa> right
 8317:28 <maxedw> Is the reason that it's not much effort to have both so why not
 8417:28 <maxedw> static still preferable for bitcoind and shared for others to use
 8517:28 <hebasto> to have a shared library, the well defined stable interface must go first
 8617:29 <hebasto> it makes sense for external libs only
 8717:29 <abubakarsadiq_> Does this mean their is no functionality distinction between `libbitcoin_kernel` and `libbitoinkernel`?
 8817:30 <hebasto> no
 8917:30 <hebasto> one is a build unit
 9017:30 <hebasto> another has (will have) a stable versioned API
 9117:30 <stickies-v> hebasto: that's not true at the moment, though?
 9217:31 <fanquake> abubakarsadiq: correct. the code in either is essentially the same
 9317:31 <monlovesmango> it says that libbitcoinkernel API is not ready. if we are making internal libbitcoin_kernel now will that have to keep up with changes from external library?
 9417:31 <hebasto> * my "no" means "no distinction" -- sorry\
 9517:32 <stickies-v> oh, i see, thanks!
 9617:32 <stickies-v> monlovesmango: both libraries are built from the same source files, so they are 100% in sync
 9717:32 <monlovesmango> gotcha thank you!
 9817:32 <hebasto> the only difference is `-DBUILD_BITCOIN_INTERNAL` flag
 9917:33 <hebasto> so, not 100% the same
10017:34 <TheCharlatan> (and kernel/bitcoinkernel.cpp for providing the translation function pointer)
10117:34 <hebasto> yeap
10217:35 <stickies-v> I think at some point in the future, we _could_ have a separate libbitcoinkernel project that produces a shared library, and then i think we wouldn't have use for an internal `libbitcoin_kernel` static library in core anymore? correct?
10317:35 <stickies-v> although that may complicate things if/when we want to make core-specific changes to the kernel
10417:37 <monlovesmango> stickies-v: can you explain why? seems like libbitcoinkernel already was a shared library but we are still making an internal one to replace it. why would that change if it became a separate project?
10517:37 <hebasto> we ship shared libconsensus and libkernel compiled with `-DBUILD_BITCOIN_INTERNAL`
10617:38 <stickies-v> monlovesmango: that's basically what one of the next questions is about, so let me just launch that question already:
10717:38 <hebasto> for bitcoind we need code compiled without `-DBUILD_BITCOIN_INTERNAL`
10817:38 <stickies-v> 6. How are we using kernel functionality in Bitcoin Core before this PR, if there is no internal library?
10917:40 <hebasto> we use an alternative -- a old good list of object files compiled from source
11017:42 <stickies-v> so to answer monlovesmango: as said before: using static libs is better than not using libs (which is the current state), the shared libbitcoinkernel isn't really used anywhere, we just build it
11117:42 <stickies-v> so we're not replacing anything, we're just improving the build system
11217:43 <monlovesmango> that makes sense
11317:44 <monlovesmango> why in the notes does it mention libbitcoinkernel already existing if this pr is creating it? or am I misunderstanding?
11417:45 <stickies-v> monlovesmango: libbitcoinkernel is the (existing) shared library. libbitcoin_kernel is the new static one
11517:45 <hebasto> ^ and internal one
11617:46 <stickies-v> 7. Why does the `libbitcoinkernel_la_SOURCES` source file list specifically include `$(libbitcoin_util_a_SOURCES)` and `$(libbitcoin_consensus_a_SOURCES)` but `libbitcoin_kernel_a_SOURCES` doesn't seem to?
11717:48 <monlovesmango> so hebasto said that current state we use a list of object files compiled from source, were these maintained to be in line with libbitcoinkernel? or libbitcoinkernel was built from these?
11817:50 <monlovesmango> is it bc it is archive and sources have already been compiled into library?
11917:52 <hebasto> we want to compile `libbitcoinkernel` with `-DBUILD_BITCOIN_INTERNAL`, that's why we have to compile `libbitcoin_consensus_a_SOURCES` one more time, and not reuse already compiled internal lib, which was compiled without that flag
12017:55 <monloves_> hello?
12117:55 <stickies-v> monlovesmango: see https://github.com/bitcoin-core-review-club/bitcoin/commit/983d0978a973a12c3128b2c8e13b73ed08155e67
12217:56 <TheCharlatan> Besides, next to having to define `-DBUILD_BITCOIN_INTERNAL` during compilation, it is not possible to for our internal libraries and end artifacts (like bitcoind) to eventually link against the external libbitcoin_kernel. Not only does the external libbitcoin_kernel have more symbols (like the translation function pointer), but it is also not possible to statically link a dynamic library
12317:56 <stickies-v> we add `$(LIBBITCOIN_KERNEL)` to `bitcoin_bin_ldadd`
12417:56 <TheCharlatan> (which the external libbitcoin_kernel is).
12517:56 <stickies-v> so we link directly to the (new) libbitcoin_kernel
12617:57 <stickies-v> TheCharlatan: thanks a lot for the extra info!
12717:57 <TheCharlatan> damn, this naming is has, should be external libbitcoinkernel :P
12817:57 <monloves_> thank you will read that over!
12917:58 <stickies-v> we're almost at time, so just gonna lqunch the last question:
13017:58 <stickies-v> 9. Commit 5a6a0b1 de-duplicates the source file list. Are there any changes to the source file list used for `libbitcoinkernel`?
13117:58 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/commit/5a6a0b1693cf739a5e6cc1161160a1d621d215f9)
13217:59 <abubakarsadiq_> I dont think there are any
13318:03 <stickies-v> abubakarsadiq_: what about libbitcoin_util_a_SOURCES including $(BITCOIN_CORE_H)?
13418:04 <stickies-v> we're a bit over time already so i'm going to end the meeting here, but we can continue the discussion for a bit longer on this last question for those who want to!
13518:04 <stickies-v> #endmeeting
13618:05 <maxedw> thanks stickies-v! I'll hang round for a bit
13718:06 <monloves_> same
13818:06 <pablomartin> sure
13918:06 <abubakarsadiq_> stickies-v Ahh missed that, thought there is no change in functionality means source list will stay the same
14018:06 <abubakarsadiq_> Thanks for hosting
14118:08 <pablomartin> stickies-v: sorry, did we miss the #8?
14218:09 <stickies-v> pablomartin: yeah we skipped that for running out of time, i'll have to run myself but feel free to discuss here though and hopefully people will chime in
14318:09 <TheCharlatan> After some change requests from reviewers, I decided to leave some of the small utilities in the util library, that previously were not used or part of the kernel, but which don't import further functionality, like `spanparsing`, `bytevectorhash`, and `readwritefile`. So the file list and thus the content of the kernel library does change a bit. They might be moved at later point though if
14418:09 <TheCharlatan> they prove in the way of things. See https://github.com/bitcoin/bitcoin/pull/28690#discussion_r1408086308 and https://github.com/bitcoin/bitcoin/pull/28690#discussion_r1416392404.
14518:10 <pablomartin> ok, nw, cheers! many thanks for preparing this meeting
14618:10 <monloves_> so i'm a bit slow, what were the changes to the source file list? how would you determine that?
14718:10 <pakaro> hi, very late, sorry
14818:10 <monloves_> do you manually have to check each previous dependency?
14918:11 <monloves_> yeah thanks for hosting stickies-v and TheCharlatan!
15018:11 <abubakarsadiq_> for 8 `CPubKey::RecoverCompact` defined in pubkey.h is part of the consensus, previously util library has message.cpp which uses RecoverCompact in https://github.com/bitcoin/bitcoin/blob/c46cc8d3c1a6250d56b68abb184db344565eff34/src/util/message.cpp#L46
15118:11 <abubakarsadiq_> This means util lib depends on consensus_lib
15218:11 <abubakarsadiq_> But in this PR message.cpp is moved to common, common already depends on consensus.
15318:13 <pablomartin> thanks abubakarsadiq!
15418:18 <pablomartin> TheCharlatan: I see... you mean on your latest push: from 5a23504 to 2086d1d