Improve sed robustness by not using sed (build system)

https://github.com/bitcoin/bitcoin/pulls/19761

Host: fanquake  -  PR author: fanquake

The PR branch HEAD was 3de365e at the time of this review club meeting.

Notes

  • Bitcoin Core requires several dependencies. These include large dependencies such as boost, libevent and qt, as well as several smaller dependencies.

  • These dependencies can be built locally as part of a depends build.

  • The versions of each dependency library is pinned by a specific hash (e.g. here for bdb).

  • Sometimes small changes need to be made to the library before building. This has previously been done using sed.

  • sed (stream editor) is a lightweight tool that is often used for text substitutions.

  • This PR changes some of those sed commands for patch files, which more explicitly document the exact substitution.

Questions

  1. Why might we want to replace the usage of sed in our dependency builder with patches?

  2. What makes patches any better than sed? Are there any potential downsides?

  3. After this PR, there are 9 usages of sed left. Why can’t we replace all of them?

  4. How would you test this PR?

  5. How could we verify that the patches are being applied correctly?

Meeting Log

  119:00 <@jnewbery> #startmeeting
  219:00 <amiti> hi
  319:00 <emzy> hi
  419:00 <@jnewbery> hi all. Welcome to PR Review Club. Feel free to say hi to let everyone know you're here
  519:00 <jonatack> ohai
  619:00 <punctured> hi
  719:00 <@jnewbery> Anyone here for the first time?
  819:01 <Paul_R> hi
  919:01 <michaelfolkson> hi
 1019:01 <Paul_R> 2nd time
 1119:01 <@jnewbery> welcome back Paul_R. Glad we didn't scare you off :)
 1219:01 <nehan> hi
 1319:01 <Paul_R> I was here on september 2, for bip 340...haha no i'm glad to be back :)
 1419:01 <@jnewbery> This week we're going to look at part of the repo that I know very little about - the build system!
 1519:02 <@jnewbery> (Sorry, I didn't prepare an ASCII art BUILD SYSTEM)
 1619:02 <@jnewbery> I'm really excited to get stuck in and learn more about our depends build, and fanquake has very kindly stayed up late in his timezone to teach us all. Thanks fanquake
 1719:02 <@jnewbery> I'll pass over to you now
 1819:02 <punctured> meaning a ascii art diagram of the build system?
 1919:03 <jonatack> [[[((((*=== build system ===*))))]]]
 2019:03 <punctured> of a build system for ascii art *scratches head*
 2119:03 <punctured> or a*
 2219:03 <michaelfolkson> Inside joke
 2319:03 <fanquake> jnewbery: cool
 2419:03 <punctured> got it
 2519:03 <michaelfolkson> Past PR review club session on Signet punctured
 2619:03 <@jnewbery> punctured: pinheadmz reasied the bar considerably with some art a couple of weeks ago
 2719:03 <fanquake> Prepare for me to drop is back down to whereever it was
 2819:04 <fanquake> The PR we're going to be looking at is #19761.
 2919:04 <jonatack> punctured: (see https://bitcoincore.reviews/18267)
 3019:04 <fanquake> https://github.com/bitcoin/bitcoin/pull/19761
 3119:04 <fanquake> I put these changes together about a month ago to hack some sed usage out of a our build system, and try and better document what the commands had been doing.
 3219:05 <fanquake> Has anyone had a look over/review the PR?
 3319:05 <michaelfolkson> y
 3419:05 <emzy> y
 3519:05 <@jnewbery> y
 3619:05 <jonatack> y, a little
 3719:05 <Paul_R> y, a little. i became aware of the sed tool today.
 3819:06 <matthewleon> y, a little
 3919:06 <nehan> y, some
 4019:06 <amiti> y, have taken a look, but its all very new to me !
 4119:06 <fanquake> Ok. Let just jump into some Qs.
 4219:06 <pinheadmz> hi! sorry late, just lurking today
 4319:06 <fanquake> Why might we want to replace the usage of sed in our dependency builder with patches?
 4419:06 <pinheadmz> but got that ping :-) nice job jonatack
 4519:06 <Paul_R> to fail loudly? very small rocks?
 4619:06 <Paul_R> (monty python)
 4719:07 <crma> Improve documentation
 4819:07 <michaelfolkson> So Carl said sed calls fail silently
 4919:07 <emzy> To get errors if a patch is not working. With sed there is no error.
 5019:07 <fanquake> failing loudly is one reason, yes. sed has a tendency to fail silently, which isn't great. Where as patches that fail will kill that build.
 5119:07 <jonatack> maintainability, robustness, not fail silently, per PR description
 5219:08 <amiti> sed seems to have a lot of operating-system-specific behaviors and a tendency to fail silently, where as patch seems to be more explicit about expectations and errors
 5319:08 <punctured> sed's just a functional mapping which doesn't seem like a robust way to handle the desired functionality
 5419:08 <michaelfolkson> Generally sed leads to "enigmatic behavior"
 5519:08 <fanquake> Improving documentation is another. It's nice to be able to bundle documentation / info with the changes themselves, rather than comments in the depends makefiles.
 5619:09 <michaelfolkson> It was initially investigated how to use exit status with sed but presumably patches is a better approach
 5719:09 <fanquake> A lot of our remaining sed commands are still undocumented, and I'm hoping to improve that in #19867.
 5819:09 <michaelfolkson> https://askubuntu.com/questions/1036912/how-do-i-get-the-exit-status-when-using-the-sed-command/1036918#1036918
 5919:09 <fanquake> and as amiti mentioned. sed can behave different per-platform, which is annoying / can be confusing.
 6019:09 <fanquake> i.e: https://unix.stackexchange.com/questions/401905/bsd-sed-vs-gnu-sed-and-i
 6119:10 <amiti> can confirm. very confusing πŸ˜›
 6219:10 <michaelfolkson> https://github.com/bitcoin/bitcoin/issues/19815
 6319:10 <emzy> Additionaly there are different sed versions that behalf slightly different.
 6419:10 <fanquake> You're likely to run into this behavior on macOS, which uses BSD sed. I'd advise installing and using GNU sed.
 6519:10 <fanquake> Cool. So Q 2
 6619:11 <fanquake> What makes patches any better than sed? Are there any potential downsides?
 6719:11 <fanquake> There's a bit of cross-over in the answers here but 🀷
 6819:11 <michaelfolkson> I'm not sure if patches solve *everything*
 6919:11 <crma> You need to know what the output, must be
 7019:12 <michaelfolkson> But on some things they do. More context, rollback etc
 7119:12 <Paul_R> fanquake is changing the default on macos GNU sed a complicated process?
 7219:12 <nehan> patches require reviewing more of a raw change
 7319:12 <emzy> I think sed may work even if you chenge the upsteam version as the patch needs to be changed.
 7419:12 <@jnewbery> Something that might seem like a downside, but could also be considered a benefit is that the patch expects an exact input file
 7519:12 <@jnewbery> so if you update the version of the library, the patch will start failing, whereas the sed command might keep working
 7619:12 <fanquake> crma: correct, and we'll expand on that in the next Q.
 7719:13 <matthewleon> that sounds like more of a benefit
 7819:13 <matthewleon> keep "working" silently can have serious unintended consequences
 7919:13 <@jnewbery> matthweleon: +1
 8019:13 <fanquake> nehan: yes. Patches can be a bit less convenient to generate than a simple sed command.
 8119:13 <Paul_R> it does feel like more of a benefit. explicitness is good.
 8219:13 <jonatack> in some cases, looking at PR 19761, patch seem more explicit but also more verbose
 8319:14 <crma> Haven't tested, but guess sed is faster..
 8419:14 <michaelfolkson> What did sed with a sed exit status solve? Just silently failing presumably
 8519:14 <fanquake> jonatack: It certainly is more verbose, however for our build system, I see that as a positive. I'm happy for us to be as explicit as possible in most cases.
 8619:14 <jonatack> the documentation looks like a real win
 8719:14 <jonatack> fanquake: agreed!
 8819:15 <fanquake> crma: that may be true, however as a proportion of total build time, any difference between patch and sed should be insignificant.
 8919:15 <fanquake> i.e compared to extracting sources and actually compiling.
 9019:15 <fanquake> Lets move on
 9119:15 <fanquake> After this PR, there are 9 usages of sed left. Why can’t we replace all of them?
 9219:16 <fanquake> This was also mentioned earlier.
 9319:16 <michaelfolkson> Using build time variables
 9419:17 <jonatack> OS compat?
 9519:17 ⚑ jonatack stabs in the dark
 9619:17 <Paul_R> haha
 9719:17 <emzy> you could use sed on the patch file und than use patch. Just joking.
 9819:17 <fanquake> Correct. Some of the sed commands are substituting values that we can't know ahead of time.
 9919:18 <fanquake> So we obviously can't generate a patch to replace them.
10019:18 <nehan> how does one calculate the number of sed usages left? a simple grep gives me way more than 9
10119:18 <fanquake> This values include things that might be set in depends, or come from the environment.
10219:18 <michaelfolkson> Impossible > non-trivial :) At least with patches
10319:18 <fanquake> i.e the compiler being used, or some C or CXXFLAGS. some examples:
10419:18 <fanquake> https://github.com/bitcoin/bitcoin/blob/8235dca6210dab8e9657c0b592ab928554155082/depends/packages/miniupnpc.mk#L17
10519:19 <fanquake> https://github.com/bitcoin/bitcoin/blob/7f609f68d835bece8b01da1b09b127c67769ae7d/depends/packages/qt.mk#L220
10619:19 <punctured> Paul_R add gnu sed on mac with `brew install gnu-sed`
10719:19 <fanquake> In these cases we're looking at any values like $()
10819:19 <matthewleon> fanquake: is there any possibility of generating the patch files from a template using environment variable substitution?
10919:19 <Paul_R> punctured thx
11019:19 <punctured> Paul_R then it'll be available as gsed
11119:20 <fanquake> nehan: grep sed depends/packages should do it
11219:20 <matthewleon> fanquake: for example, using GNU envsubst
11319:20 <fanquake> I see 7 in qt, 1 in miniupnpc and 1 in libxcb
11419:21 <fanquake> So you are correct that there are more than 9 left. I missed libxcb before.
11519:21 <jonatack> git grep sed ./depends/packages
11619:21 <nehan> fanquake: i see 8 in qt :)
11719:21 <@jnewbery> I see 12 total, including 8 in qt
11819:21 <nehan> but i was looking at sed usage in everything. why is it important to get rid of sed in depends but not everywhere?
11919:22 <fanquake> jnewbery: are you counting the 2 in boost that aren't actually seds?
12019:22 <@jnewbery> yes :(
12119:22 <jonatack> i c 10 lines, first saw 12 as well
12219:22 <fanquake> matthewleon: possibly, however I haven't looked at that tool before. So probably can't give you a good answer right now.
12319:22 <punctured> nehan proper question +1
12419:22 <matthewleon> that might be worth looking at in a successor PR
12519:23 <punctured> nehan I was wondering the same
12619:23 <fanquake> nehan: I'd probably advocate for removing sed from most places, where possible (and we are suffering from all the issues mentioned), however I've started in the build system as that's what I know best.
12719:24 <fanquake> There's also the potential for failing sed commands to cause determinism or security issues, which I'm very keen to avoid.
12819:24 <fanquake> *in our dependencies.
12919:25 <fanquake> The PTHREAD sed in cctools is a good example of a subtle determinism issue.
13019:25 <jonatack> there seems to be a fair number in /depends outside of /depends/packages
13119:26 <@jnewbery> I had a question about native_cctools which might be a bit tangential at this point, but I'll ask anyway. Is that just used for building on macos?
13219:26 <@jnewbery> or building for a macos target?
13319:26 <fanquake> jonatack: can you link to some? afiak we don't use sed elsewhere in the dependency builder.
13419:27 <jonatack> where are the cctools seds?
13519:27 <@jnewbery> there was only one, which was removed by this PR
13619:27 <fanquake> jnewbery: it's only used on linux to build for macos
13719:27 <fanquake> https://github.com/bitcoin/bitcoin/pull/19761/commits/cc107a3af17d821f66de9357efe73214a628803b
13819:28 <fanquake> Should we move on to Q4, or do we want to discuss this further?
13919:28 <jonatack> fanquake: in depends/config.guess
14019:28 <jonatack> idk if it's relevant
14119:29 <jonatack> (or just some cruft i have)
14219:29 <fanquake> jonatack: right, config.guess/sub aren't files we maintain, and they are only used to determine system information.
14319:29 <jonatack> πŸ‘Œ
14419:29 <@jnewbery> fanquake: keep on moving. People can always ask questions about previous questions
14519:30 <fanquake> Q4. How would you test this PR?
14619:31 <emzy> Maybe make a diff of the sed/patched files? Not sure.
14719:31 <emzy> Compare the by sed changed files with the ones changed by patch.
14819:32 <nehan> i'd love to learn more about this. i got worried when i saw jnewbery manually un-tar'ing things
14919:32 <jonatack> This is an obstacle for me to reviewing these kinds of PRs... I don't know how to test.
15019:32 <@jnewbery> I went about it in a rather convoluted way...
15119:32 ⚑ fanquake needs to type faster
15219:32 <@jnewbery> the Makefile in /depends does a lot of things: download, compare hashes, extract, build, ...
15319:33 <@jnewbery> I wanted to extract and stop before it did any of the preprocessing and building (because that changes the source files, and then they get deleted afterwards)
15419:33 <fanquake> So there are multiple things you could test / check here. emzy has outlined one, which is testing that when you run the sed comamnds over the source, you end up with the same patches I've got in the PR.
15519:33 <@jnewbery> I hacked the funcs.mk file to remove all steps after extract, but fanquake pointed out that I could just extract manually
15619:34 <fanquake> One way to do that (and it's the same way I generated them), is to just extra the source for a package, init a new git repo, commit the files, and then run the sed(s). Then check the diff.
15719:34 <@jnewbery> then I went into work/build/x86_64-pc-linux-gnu , turned it into a git repo, made the sed changes and ran `git diff`
15819:34 <jonatack> jnewbery: (re cctools, thanks!)
15919:35 <@jnewbery> fanquake: snap!
16019:35 <punctured> ginx
16119:35 <fanquake> However, you also want to check that the patches are being applied correctly before building, and the end result is what we want .
16219:35 <fanquake> So now we can actually do something practical.
16319:36 <fanquake> Has everyone here actually build depends before?
16419:36 <fanquake> If not it doesn't matter.
16519:36 <Paul_R> i haven't
16619:36 <@jnewbery> Before I did that, I just ran the sed commands and did a diff of the .old file and the new file, but that didn't work in some cases (eg here: https://github.com/bitcoin/bitcoin/pull/19761/files#diff-12c0a1e69780189a7700996b933eeb26L17-L18) Can you see why?
16719:37 <emzy> First time I build the depends. Except maybe from the deterministic builds I do.
16819:37 <fanquake> emzy: yep if you've gitian built before. building depends is part of that
16919:37 <emzy> figured that.
17019:38 <fanquake> One thing I forgot to mention re "testing" a PR like this, is sanity checking the documentation.
17119:38 <fanquake> Especially given that one of the supposed benefits is better docs, I need to know that any docs I'm adding are actually useful / make sense / link to the right places etc.
17219:39 β„Ή belcher_ is now known as belcher
17319:39 <fanquake> Q5 How could we verify that the patches are being applied correctly?
17419:40 <fanquake> If you've got the repo cloned, and want to run some commands, now would be the time to navigate to bitcoin/depends/
17519:40 <emzy> There will be a error if the patch can't be applied.
17619:41 <fanquake> You might also want to kick off a build like 'make NO_WALLET=1 NO_QT=1 NO_ZMQ=1 -j6'
17719:41 <@jnewbery> fanquake: i think it'd be really helpful if there were make targets (or a flag) that didn't clean up the work directory after it was done
17819:41 <@jnewbery> that'd make it very easy to verify these things
17919:42 <fanquake> jnewbery: so that you could inspect the patched source?
18019:42 <@jnewbery> fanquake: exactly
18119:42 <@jnewbery> and the pre-patched source
18219:42 <fanquake> You can run make "lib_name"_configured and the build will stop after the configure step
18319:42 <fanquake> which would include the patching
18419:43 <fanquake> *preprocessed
18519:43 <michaelfolkson> What do you mean by applied correctly? Applied at all? Equivalent to sed command? Applied consistently across different environments?
18619:43 <fanquake> I'll write that out again.
18719:43 <fanquake> You can run something like 'make zeromq_preprocessed' and that will only build up to the end of the preprocessing stage, which should include all patching.
18819:44 <fanquake> At that point you could inspect the patched source.
18919:44 <jonatack> One nuance for people: you need to run invocations like "make NO_WALLET=1 NO_QT=1 NO_ZMQ=1 -j6" from inside /depends and not root
19019:44 <@jnewbery> fanquake: cool! thanks
19119:44 <fanquake> i.e https://gist.github.com/fanquake/c6e0c4129a1741702cb91a1e0b39b7f2
19219:45 <Paul_R> curious what the -j6 flag does?
19319:45 <fanquake> jonatack: correct. If you're running from src/ you can pass -C depends OTHER=1 ARGS=1 ETC=1
19419:46 <sipa> Paul_R: run up to 6 compilation steps in parallel
19519:46 <jonatack> Paul_R: if you have multiple CPU cores, which is the usual case nowadays, you can tell make to use all of them and reduce compile time by passing -j <number of cores>
19619:46 <fanquake> Paul_R: it's the number of make jobs to run in parallel.
19719:46 <@jnewbery> toto answer my questions earlier about why comparing the original file and the .old file there didn't work: the second sed command overwrites the .old file from the first sed command, so the .old file isn't actually the file from before sed was run.
19819:46 <Paul_R> a wild sipa appears
19919:47 <@jnewbery> fanquake: I have some more general questions about the depends builds and this PR
20019:47 <jonatack> on linux you could do something like make -j "$(($(nproc)+1))", on mac (untested): make -j "$(($(sysctl -n hw.physicalcpu)+1))"
20119:47 <fanquake> jnewbery: sure. I did want to get to one example of inspecting changes in a lib we built, but fire away.
20219:48 <@jnewbery> in the fontconfig change, the sed command was previously being run in the build step, but the patch is being run in the preprocess step: https://github.com/bitcoin/bitcoin/pull/19761/commits/865cb23a485d88be603c1d6bf8c32ef7a5edeaa2
20319:48 <@jnewbery> The gperf_header_regen.patch didn't work for me because the makefile didn't exist. That might have just been a quirk of the way I extracted the source files though
20419:49 <@jnewbery> is it definitely ok to run a patch on the Makefile in the preprocess step?
20519:49 <fanquake> That's correct. I think there's a comment on the PR in regards to that. Basically it makes more sense to keep patching in preprocessing, so we moved it there.
20619:50 <fanquake> We actually also modified the patch to edit Makefile.in, rather than Makefile
20719:50 <fanquake> https://github.com/bitcoin/bitcoin/pull/19761#discussion_r475511755
20819:50 <@jnewbery> Ah!
20919:50 <@jnewbery> that makes so much more sense
21019:51 <@jnewbery> because the Makefile doesn't exist in the preprocess step. Thanks!
21119:51 <Paul_R> i'd like to learn more about the pre-process & build stages. are there any links that could introduce me to what happens in pre-process vs build steps and why?
21219:51 <nehan> Paul_R: +1
21319:52 <nehan> actually, could you just quickly overview the stages?
21419:52 <@jnewbery> Paul_R: as it happens, https://github.com/bitcoin/bitcoin/blob/8235dca6210dab8e9657c0b592ab928554155082/depends/packages.md#build-commands
21519:52 <fanquake> Thanks, that'll probably save me some furious typing.
21619:53 <nehan> jnewbery: thanks!
21719:53 <@jnewbery> fetch, extract, preprocess, configure, build, stage. Most should be self-evident
21819:53 <Paul_R> thx
21919:53 <fanquake> btw. Did anyone manage to complete a depends build?
22019:53 <@jnewbery> I guess preprocess and stage are the only two that might not be immediately obvious?
22119:54 <Paul_R> fetch - get the zip
22219:54 <@jnewbery> preprocess is exactly what we've talked about today. Making small tweaks to the source files before building
22319:54 <Paul_R> woops sent prematurely
22419:54 <michaelfolkson> Not Core specific but may be of interest too Paul_R https://codingnest.com/the-little-things-speeding-up-c-compilation/
22519:54 <Paul_R> thanks michaelfolkson
22619:55 <fanquake> jnewbery: correct
22719:55 <Paul_R> jnewbery these tweaks are mostly for versionning? i'm still confused why the values are unknown at runtime, which is why we can't use patch on everything
22819:55 <nehan> fanquake: i thought i did but it went relatively quickly so now i'm not sure. i followed the instructions of making in depends and then configuring with the prefix
22919:57 <jonatack> fanquake: is https://github.com/fanquake/core-review/blob/master/diffoscope.md what you use for comparing/diffing builds?
23019:57 <@jnewbery> nehan: how relatively is quickly? I think the thing that takes the most time in a depends build is qt
23119:57 <fanquake> nehan: cool. We are running our of time, but want to try running 'strings depends/x86_64-pc-linux-gnu/lib/libminiupnpc.a | grep User-Agent'
23219:58 <jonatack> User-Agent: x86_64-pc-linux-gnu, UPnP/1.1, MiniUPnPc/2.0.20180203
23319:58 <nehan> User-Agent: x86_64-pc-linux-gnu, UPnP/1.1, MiniUPnPc/2.0.20180203
23419:58 <jonatack> πŸ†
23519:59 <fanquake> So the reason I bring that up is it's an example of where we are substituting in build time values using sed
23619:59 <fanquake> https://github.com/bitcoin/bitcoin/blob/8235dca6210dab8e9657c0b592ab928554155082/depends/packages/miniupnpc.mk#L17
23719:59 <fanquake> You can see that the miniupnpc version (2.0.20180203) and the host (x86_64-pc-linux-gnu) end up embeded in the user agent string in the miniupnpc lib
23820:00 <fanquake> I was going to suggest editing miniupnpc.mk to remove line 17, and then rebuilding depends, and checking that same user agent string after rebuilding.
23920:00 <fanquake> However I think we are out of time.
24020:00 <@jnewbery> sounds like a good assignment :)
24120:01 <fanquake> Apologies that we didn't get to much interactive stuff like thatt.
24220:01 <fanquake> jnewbery: how do we wrap up?
24320:01 <@jnewbery> #endmeeting
24420:01 <jonatack> thanks fanquake!
24520:01 <@jnewbery> ^ like dat
24620:02 <nehan> User-Agent: Ubuntu/18.04, UPnP/1.1, MiniUPnPc/2.0
24720:02 <emzy> Tnx fanquake for hosting and staying up all night. :)
24820:02 <fanquake> Cool. Thanks everyone.
24920:02 <nehan> thanks!
25020:02 <@jnewbery> thanks fanquake. That was really interesting. It was fun to dig into something new
25120:02 <Paul_R> thx fanquake
25220:02 <crma> thank you for your time and your insights
25320:02 <fanquake> nehan: great.
25420:03 <@jnewbery> No need to apologize. People can make the changes you suggested and try rebuilding. I assume they can reach you here (probably tomorrow at this point)?
25520:03 <fanquake> So now you've ended up with the values that miniupnpc tries to put into the useragent: https://github.com/miniupnp/miniupnp/blob/master/miniupnpc/miniupnpcstrings.h.in
25620:04 <@jnewbery> oh, one last question. Do you have suggestions for other PRs to look at now that we're all experts in the depends builds?
25720:04 <fanquake> jnewbery: yea I'm around if people want to ping me.
25820:04 <jonatack> User-Agent: Debian/testing, UPnP/1.1, MiniUPnPc/2.0
25920:04 <fanquake> Sure. I'm looking for some review on: https://github.com/bitcoin/bitcoin/pull/19522
26020:05 <fanquake> https://github.com/bitcoin/bitcoin/pull/19867 is also related to the changes we discussed today, and adds some more documentation.
26120:07 <jonatack> πŸ‘
26220:17 <punctured> thanks fanquake