Improve sed robustness by not using sed (build system )
Sep 23, 2020
https://github.com/bitcoin/bitcoin/pull/19761
Host:
fanquake
-
PR author: fanquake
The PR branch HEAD was 3de365e at the time of this review club meeting.
Notes
Bitcoin Core requires several
dependencies .
These include large dependencies such as boost ,
libevent and qt , as well as
several smaller dependencies.
These dependencies can be built locally as part of a depends
build .
The versions of each dependency library is pinned by a specific hash
(e.g.
here
for bdb).
Sometimes small changes need to be made to the library before building. This
has previously been done using sed .
sed
(s tream ed itor) is a lightweight tool that is often used for
text substitutions.
This PR changes some of those sed
commands for
patch files, which more
explicitly document the exact substitution.
Questions
Why might we want to replace the usage of sed
in our dependency builder
with patches?
What makes patches any better than sed? Are there any potential downsides?
After this PR, there are 9 usages of sed left. Why canβt we replace all of
them?
How would you test this PR?
How could we verify that the patches are being applied correctly?
Meeting Log
1 19:00 <@jnewbery> #startmeeting
4 19:00 <@jnewbery> hi all. Welcome to PR Review Club. Feel free to say hi to let everyone know you're here
7 19:00 <@jnewbery> Anyone here for the first time?
9 19:01 <michaelfolkson> hi
10 19:01 <Paul_R> 2nd time
11 19:01 <@jnewbery> welcome back Paul_R. Glad we didn't scare you off :)
13 19:01 <Paul_R> I was here on september 2, for bip 340...haha no i'm glad to be back :)
14 19:01 <@jnewbery> This week we're going to look at part of the repo that I know very little about - the build system!
15 19:02 <@jnewbery> (Sorry, I didn't prepare an ASCII art BUILD SYSTEM)
16 19:02 <@jnewbery> I'm really excited to get stuck in and learn more about our depends build, and fanquake has very kindly stayed up late in his timezone to teach us all. Thanks fanquake
17 19:02 <@jnewbery> I'll pass over to you now
18 19:02 <punctured> meaning a ascii art diagram of the build system?
19 19:03 <jonatack> [[[((((*=== build system ===*))))]]]
20 19:03 <punctured> of a build system for ascii art *scratches head*
21 19:03 <punctured> or a*
22 19:03 <michaelfolkson> Inside joke
23 19:03 <fanquake> jnewbery: cool
24 19:03 <punctured> got it
25 19:03 <michaelfolkson> Past PR review club session on Signet punctured
26 19:03 <@jnewbery> punctured: pinheadmz reasied the bar considerably with some art a couple of weeks ago
27 19:03 <fanquake> Prepare for me to drop is back down to whereever it was
28 19:04 <fanquake> The PR we're going to be looking at is #19761.
31 19:04 <fanquake> I put these changes together about a month ago to hack some sed usage out of a our build system, and try and better document what the commands had been doing.
32 19:05 <fanquake> Has anyone had a look over/review the PR?
33 19:05 <michaelfolkson> y
36 19:05 <jonatack> y, a little
37 19:05 <Paul_R> y, a little. i became aware of the sed tool today.
38 19:06 <matthewleon> y, a little
40 19:06 <amiti> y, have taken a look, but its all very new to me !
41 19:06 <fanquake> Ok. Let just jump into some Qs.
42 19:06 <pinheadmz> hi! sorry late, just lurking today
43 19:06 <fanquake> Why might we want to replace the usage of sed in our dependency builder with patches?
44 19:06 <pinheadmz> but got that ping :-) nice job jonatack
45 19:06 <Paul_R> to fail loudly? very small rocks?
46 19:06 <Paul_R> (monty python)
47 19:07 <crma> Improve documentation
48 19:07 <michaelfolkson> So Carl said sed calls fail silently
49 19:07 <emzy> To get errors if a patch is not working. With sed there is no error.
50 19:07 <fanquake> failing loudly is one reason, yes. sed has a tendency to fail silently, which isn't great. Where as patches that fail will kill that build.
51 19:07 <jonatack> maintainability, robustness, not fail silently, per PR description
52 19:08 <amiti> sed seems to have a lot of operating-system-specific behaviors and a tendency to fail silently, where as patch seems to be more explicit about expectations and errors
53 19:08 <punctured> sed's just a functional mapping which doesn't seem like a robust way to handle the desired functionality
54 19:08 <michaelfolkson> Generally sed leads to "enigmatic behavior"
55 19:08 <fanquake> Improving documentation is another. It's nice to be able to bundle documentation / info with the changes themselves, rather than comments in the depends makefiles.
56 19:09 <michaelfolkson> It was initially investigated how to use exit status with sed but presumably patches is a better approach
57 19:09 <fanquake> A lot of our remaining sed commands are still undocumented, and I'm hoping to improve that in #19867.
59 19:09 <fanquake> and as amiti mentioned. sed can behave different per-platform, which is annoying / can be confusing.
61 19:10 <amiti> can confirm. very confusing π
63 19:10 <emzy> Additionaly there are different sed versions that behalf slightly different.
64 19:10 <fanquake> You're likely to run into this behavior on macOS, which uses BSD sed. I'd advise installing and using GNU sed.
65 19:10 <fanquake> Cool. So Q 2
66 19:11 <fanquake> What makes patches any better than sed? Are there any potential downsides?
67 19:11 <fanquake> There's a bit of cross-over in the answers here but π€·
68 19:11 <michaelfolkson> I'm not sure if patches solve *everything*
69 19:11 <crma> You need to know what the output, must be
70 19:12 <michaelfolkson> But on some things they do. More context, rollback etc
71 19:12 <Paul_R> fanquake is changing the default on macos GNU sed a complicated process?
72 19:12 <nehan> patches require reviewing more of a raw change
73 19:12 <emzy> I think sed may work even if you chenge the upsteam version as the patch needs to be changed.
74 19:12 <@jnewbery> Something that might seem like a downside, but could also be considered a benefit is that the patch expects an exact input file
75 19:12 <@jnewbery> so if you update the version of the library, the patch will start failing, whereas the sed command might keep working
76 19:12 <fanquake> crma: correct, and we'll expand on that in the next Q.
77 19:13 <matthewleon> that sounds like more of a benefit
78 19:13 <matthewleon> keep "working" silently can have serious unintended consequences
79 19:13 <@jnewbery> matthweleon: +1
80 19:13 <fanquake> nehan: yes. Patches can be a bit less convenient to generate than a simple sed command.
81 19:13 <Paul_R> it does feel like more of a benefit. explicitness is good.
82 19:13 <jonatack> in some cases, looking at PR 19761, patch seem more explicit but also more verbose
83 19:14 <crma> Haven't tested, but guess sed is faster..
84 19:14 <michaelfolkson> What did sed with a sed exit status solve? Just silently failing presumably
85 19:14 <fanquake> jonatack: It certainly is more verbose, however for our build system, I see that as a positive. I'm happy for us to be as explicit as possible in most cases.
86 19:14 <jonatack> the documentation looks like a real win
87 19:14 <jonatack> fanquake: agreed!
88 19:15 <fanquake> crma: that may be true, however as a proportion of total build time, any difference between patch and sed should be insignificant.
89 19:15 <fanquake> i.e compared to extracting sources and actually compiling.
90 19:15 <fanquake> Lets move on
91 19:15 <fanquake> After this PR, there are 9 usages of sed left. Why canβt we replace all of them?
92 19:16 <fanquake> This was also mentioned earlier.
93 19:16 <michaelfolkson> Using build time variables
94 19:17 <jonatack> OS compat?
95 19:17 β‘ jonatack stabs in the dark
97 19:17 <emzy> you could use sed on the patch file und than use patch. Just joking.
98 19:17 <fanquake> Correct. Some of the sed commands are substituting values that we can't know ahead of time.
99 19:18 <fanquake> So we obviously can't generate a patch to replace them.
100 19:18 <nehan> how does one calculate the number of sed usages left? a simple grep gives me way more than 9
101 19:18 <fanquake> This values include things that might be set in depends, or come from the environment.
102 19:18 <michaelfolkson> Impossible > non-trivial :) At least with patches
103 19:18 <fanquake> i.e the compiler being used, or some C or CXXFLAGS. some examples:
106 19:19 <punctured> Paul_R add gnu sed on mac with `brew install gnu-sed`
107 19:19 <fanquake> In these cases we're looking at any values like $()
108 19:19 <matthewleon> fanquake: is there any possibility of generating the patch files from a template using environment variable substitution?
109 19:19 <Paul_R> punctured thx
110 19:19 <punctured> Paul_R then it'll be available as gsed
111 19:20 <fanquake> nehan: grep sed depends/packages should do it
112 19:20 <matthewleon> fanquake: for example, using GNU envsubst
113 19:20 <fanquake> I see 7 in qt, 1 in miniupnpc and 1 in libxcb
114 19:21 <fanquake> So you are correct that there are more than 9 left. I missed libxcb before.
115 19:21 <jonatack> git grep sed ./depends/packages
116 19:21 <nehan> fanquake: i see 8 in qt :)
117 19:21 <@jnewbery> I see 12 total, including 8 in qt
118 19:21 <nehan> but i was looking at sed usage in everything. why is it important to get rid of sed in depends but not everywhere?
119 19:22 <fanquake> jnewbery: are you counting the 2 in boost that aren't actually seds?
120 19:22 <@jnewbery> yes :(
121 19:22 <jonatack> i c 10 lines, first saw 12 as well
122 19:22 <fanquake> matthewleon: possibly, however I haven't looked at that tool before. So probably can't give you a good answer right now.
123 19:22 <punctured> nehan proper question +1
124 19:22 <matthewleon> that might be worth looking at in a successor PR
125 19:23 <punctured> nehan I was wondering the same
126 19:23 <fanquake> nehan: I'd probably advocate for removing sed from most places, where possible (and we are suffering from all the issues mentioned), however I've started in the build system as that's what I know best.
127 19:24 <fanquake> There's also the potential for failing sed commands to cause determinism or security issues, which I'm very keen to avoid.
128 19:24 <fanquake> *in our dependencies.
129 19:25 <fanquake> The PTHREAD sed in cctools is a good example of a subtle determinism issue.
130 19:25 <jonatack> there seems to be a fair number in /depends outside of /depends/packages
131 19:26 <@jnewbery> I had a question about native_cctools which might be a bit tangential at this point, but I'll ask anyway. Is that just used for building on macos?
132 19:26 <@jnewbery> or building for a macos target?
133 19:26 <fanquake> jonatack: can you link to some? afiak we don't use sed elsewhere in the dependency builder.
134 19:27 <jonatack> where are the cctools seds?
135 19:27 <@jnewbery> there was only one, which was removed by this PR
136 19:27 <fanquake> jnewbery: it's only used on linux to build for macos
138 19:28 <fanquake> Should we move on to Q4, or do we want to discuss this further?
139 19:28 <jonatack> fanquake: in depends/config.guess
140 19:28 <jonatack> idk if it's relevant
141 19:29 <jonatack> (or just some cruft i have)
142 19:29 <fanquake> jonatack: right, config.guess/sub aren't files we maintain, and they are only used to determine system information.
144 19:29 <@jnewbery> fanquake: keep on moving. People can always ask questions about previous questions
145 19:30 <fanquake> Q4. How would you test this PR?
146 19:31 <emzy> Maybe make a diff of the sed/patched files? Not sure.
147 19:31 <emzy> Compare the by sed changed files with the ones changed by patch.
148 19:32 <nehan> i'd love to learn more about this. i got worried when i saw jnewbery manually un-tar'ing things
149 19:32 <jonatack> This is an obstacle for me to reviewing these kinds of PRs... I don't know how to test.
150 19:32 <@jnewbery> I went about it in a rather convoluted way...
151 19:32 β‘ fanquake needs to type faster
152 19:32 <@jnewbery> the Makefile in /depends does a lot of things: download, compare hashes, extract, build, ...
153 19:33 <@jnewbery> I wanted to extract and stop before it did any of the preprocessing and building (because that changes the source files, and then they get deleted afterwards)
154 19:33 <fanquake> So there are multiple things you could test / check here. emzy has outlined one, which is testing that when you run the sed comamnds over the source, you end up with the same patches I've got in the PR.
155 19:33 <@jnewbery> I hacked the funcs.mk file to remove all steps after extract, but fanquake pointed out that I could just extract manually
156 19:34 <fanquake> One way to do that (and it's the same way I generated them), is to just extra the source for a package, init a new git repo, commit the files, and then run the sed(s). Then check the diff.
157 19:34 <@jnewbery> then I went into work/build/x86_64-pc-linux-gnu , turned it into a git repo, made the sed changes and ran `git diff`
158 19:34 <jonatack> jnewbery: (re cctools, thanks!)
159 19:35 <@jnewbery> fanquake: snap!
160 19:35 <punctured> ginx
161 19:35 <fanquake> However, you also want to check that the patches are being applied correctly before building, and the end result is what we want .
162 19:35 <fanquake> So now we can actually do something practical.
163 19:36 <fanquake> Has everyone here actually build depends before?
164 19:36 <fanquake> If not it doesn't matter.
165 19:36 <Paul_R> i haven't
167 19:37 <emzy> First time I build the depends. Except maybe from the deterministic builds I do.
168 19:37 <fanquake> emzy: yep if you've gitian built before. building depends is part of that
169 19:37 <emzy> figured that.
170 19:38 <fanquake> One thing I forgot to mention re "testing" a PR like this, is sanity checking the documentation.
171 19:38 <fanquake> Especially given that one of the supposed benefits is better docs, I need to know that any docs I'm adding are actually useful / make sense / link to the right places etc.
172 19:39 βΉ belcher_ is now known as belcher
173 19:39 <fanquake> Q5 How could we verify that the patches are being applied correctly?
174 19:40 <fanquake> If you've got the repo cloned, and want to run some commands, now would be the time to navigate to bitcoin/depends/
175 19:40 <emzy> There will be a error if the patch can't be applied.
176 19:41 <fanquake> You might also want to kick off a build like 'make NO_WALLET=1 NO_QT=1 NO_ZMQ=1 -j6'
177 19:41 <@jnewbery> fanquake: i think it'd be really helpful if there were make targets (or a flag) that didn't clean up the work directory after it was done
178 19:41 <@jnewbery> that'd make it very easy to verify these things
179 19:42 <fanquake> jnewbery: so that you could inspect the patched source?
180 19:42 <@jnewbery> fanquake: exactly
181 19:42 <@jnewbery> and the pre-patched source
182 19:42 <fanquake> You can run make "lib_name"_configured and the build will stop after the configure step
183 19:42 <fanquake> which would include the patching
184 19:43 <fanquake> *preprocessed
185 19:43 <michaelfolkson> What do you mean by applied correctly? Applied at all? Equivalent to sed command? Applied consistently across different environments?
186 19:43 <fanquake> I'll write that out again.
187 19:43 <fanquake> You can run something like 'make zeromq_preprocessed' and that will only build up to the end of the preprocessing stage, which should include all patching.
188 19:44 <fanquake> At that point you could inspect the patched source.
189 19:44 <jonatack> One nuance for people: you need to run invocations like "make NO_WALLET=1 NO_QT=1 NO_ZMQ=1 -j6" from inside /depends and not root
190 19:44 <@jnewbery> fanquake: cool! thanks
192 19:45 <Paul_R> curious what the -j6 flag does?
193 19:45 <fanquake> jonatack: correct. If you're running from src/ you can pass -C depends OTHER=1 ARGS=1 ETC=1
194 19:46 <sipa> Paul_R: run up to 6 compilation steps in parallel
195 19:46 <jonatack> Paul_R: if you have multiple CPU cores, which is the usual case nowadays, you can tell make to use all of them and reduce compile time by passing -j <number of cores>
196 19:46 <fanquake> Paul_R: it's the number of make jobs to run in parallel.
197 19:46 <@jnewbery> toto answer my questions earlier about why comparing the original file and the .old file there didn't work: the second sed command overwrites the .old file from the first sed command, so the .old file isn't actually the file from before sed was run.
198 19:46 <Paul_R> a wild sipa appears
199 19:47 <@jnewbery> fanquake: I have some more general questions about the depends builds and this PR
200 19:47 <jonatack> on linux you could do something like make -j "$(($(nproc)+1))", on mac (untested): make -j "$(($(sysctl -n hw.physicalcpu)+1))"
201 19:47 <fanquake> jnewbery: sure. I did want to get to one example of inspecting changes in a lib we built, but fire away.
203 19:48 <@jnewbery> The gperf_header_regen.patch didn't work for me because the makefile didn't exist. That might have just been a quirk of the way I extracted the source files though
204 19:49 <@jnewbery> is it definitely ok to run a patch on the Makefile in the preprocess step?
205 19:49 <fanquake> That's correct. I think there's a comment on the PR in regards to that. Basically it makes more sense to keep patching in preprocessing, so we moved it there.
206 19:50 <fanquake> We actually also modified the patch to edit Makefile.in, rather than Makefile
209 19:50 <@jnewbery> that makes so much more sense
210 19:51 <@jnewbery> because the Makefile doesn't exist in the preprocess step. Thanks!
211 19:51 <Paul_R> i'd like to learn more about the pre-process & build stages. are there any links that could introduce me to what happens in pre-process vs build steps and why?
212 19:51 <nehan> Paul_R: +1
213 19:52 <nehan> actually, could you just quickly overview the stages?
215 19:52 <fanquake> Thanks, that'll probably save me some furious typing.
216 19:53 <nehan> jnewbery: thanks!
217 19:53 <@jnewbery> fetch, extract, preprocess, configure, build, stage. Most should be self-evident
219 19:53 <fanquake> btw. Did anyone manage to complete a depends build?
220 19:53 <@jnewbery> I guess preprocess and stage are the only two that might not be immediately obvious?
221 19:54 <Paul_R> fetch - get the zip
222 19:54 <@jnewbery> preprocess is exactly what we've talked about today. Making small tweaks to the source files before building
223 19:54 <Paul_R> woops sent prematurely
225 19:54 <Paul_R> thanks michaelfolkson
226 19:55 <fanquake> jnewbery: correct
227 19:55 <Paul_R> jnewbery these tweaks are mostly for versionning? i'm still confused why the values are unknown at runtime, which is why we can't use patch on everything
228 19:55 <nehan> fanquake: i thought i did but it went relatively quickly so now i'm not sure. i followed the instructions of making in depends and then configuring with the prefix
230 19:57 <@jnewbery> nehan: how relatively is quickly? I think the thing that takes the most time in a depends build is qt
231 19:57 <fanquake> nehan: cool. We are running our of time, but want to try running 'strings depends/x86_64-pc-linux-gnu/lib/libminiupnpc.a | grep User-Agent'
232 19:58 <jonatack> User-Agent: x86_64-pc-linux-gnu, UPnP/1.1, MiniUPnPc/2.0.20180203
233 19:58 <nehan> User-Agent: x86_64-pc-linux-gnu, UPnP/1.1, MiniUPnPc/2.0.20180203
235 19:59 <fanquake> So the reason I bring that up is it's an example of where we are substituting in build time values using sed
237 19:59 <fanquake> You can see that the miniupnpc version (2.0.20180203) and the host (x86_64-pc-linux-gnu) end up embeded in the user agent string in the miniupnpc lib
238 20:00 <fanquake> I was going to suggest editing miniupnpc.mk to remove line 17, and then rebuilding depends, and checking that same user agent string after rebuilding.
239 20:00 <fanquake> However I think we are out of time.
240 20:00 <@jnewbery> sounds like a good assignment :)
241 20:01 <fanquake> Apologies that we didn't get to much interactive stuff like thatt.
242 20:01 <fanquake> jnewbery: how do we wrap up?
243 20:01 <@jnewbery> #endmeeting
244 20:01 <jonatack> thanks fanquake!
245 20:01 <@jnewbery> ^ like dat
246 20:02 <nehan> User-Agent: Ubuntu/18.04, UPnP/1.1, MiniUPnPc/2.0
247 20:02 <emzy> Tnx fanquake for hosting and staying up all night. :)
248 20:02 <fanquake> Cool. Thanks everyone.
250 20:02 <@jnewbery> thanks fanquake. That was really interesting. It was fun to dig into something new
251 20:02 <Paul_R> thx fanquake
252 20:02 <crma> thank you for your time and your insights
253 20:02 <fanquake> nehan: great.
254 20:03 <@jnewbery> No need to apologize. People can make the changes you suggested and try rebuilding. I assume they can reach you here (probably tomorrow at this point)?
256 20:04 <@jnewbery> oh, one last question. Do you have suggestions for other PRs to look at now that we're all experts in the depends builds?
257 20:04 <fanquake> jnewbery: yea I'm around if people want to ping me.
258 20:04 <jonatack> User-Agent: Debian/testing, UPnP/1.1, MiniUPnPc/2.0
262 20:17 <punctured> thanks fanquake