Today’s PR is the recently merged #18468: Span Improvements.
It brings our Span type closer to the functionality of C++20’s proposed std::span and
then demonstrates some of its uses by simplifying code that uses it in various places.
A Span can be thought of as a pair composed of a pointer to a data type together with a length,
identifying a range of contiguous elements in memory. In many ways it acts like a vector,
but it doesn’t own any data — which means no copying of the actual data is involved. It is
an extremely lightweight object, but it does come with a cost: higher-level code is responsible
for guaranteeing that the pointed-to data is still available.
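
As a rough illustration of the pointer-plus-length idea, here is a minimal sketch. `MiniSpan` and `SumAndBumpFirst` are hypothetical names invented for this example; this is not Bitcoin Core's actual `Span` class, only the smallest type that shows a non-owning view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal span: a non-owning (pointer, length) view.
// It never allocates or frees; the caller keeps the data alive.
template <typename T>
class MiniSpan
{
    T* m_data;
    std::size_t m_size;

public:
    MiniSpan(T* data, std::size_t size) : m_data(data), m_size(size) {}

    std::size_t size() const { return m_size; }
    T* begin() const { return m_data; }
    T* end() const { return m_data + m_size; }

    T& operator[](std::size_t pos) const
    {
        assert(pos < m_size); // debug-only bounds check
        return m_data[pos];
    }
};

// Takes the view by value: only a pointer and a size are copied,
// never the elements themselves.
inline int SumAndBumpFirst(MiniSpan<int> sp)
{
    int total = 0;
    for (int x : sp) total += x;
    if (sp.size() > 0) sp[0] += 1; // writes through to the caller's storage
    return total;
}
```

Calling `SumAndBumpFirst(MiniSpan<int>(v.data(), v.size()))` on a `std::vector<int> v` modifies `v[0]` in place, which shows that the view aliases the vector rather than copying it. It also shows the cost mentioned above: if `v` were destroyed or reallocated first, the view would dangle.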
std::span is a new data type that will likely be part of the upcoming C++20 standard
(see the cppreference.com page on std::span).
While Bitcoin Core won’t switch to that standard any time soon (we’re currently on C++11 and will
be transitioning to C++17 over the next 2 releases), it is a remarkably useful and simple
abstraction, which is why we have our own backported version that is compatible with C++11.
Span isn’t quite as powerful as std::span, but today’s PR brings it a lot closer.
The gist of the changes is introducing implicit construction of Span objects from range-like
objects (arrays, std::array, std::vector, prevector, std::string, etc.) and
automatic conversion between Spans of compatible element types. Implicit construction and
conversion are powerful but dangerous C++ features that should be used cautiously. Most of
the complexity in the code changes is in making sure these operations cannot be used in
dangerous or unexpected ways.
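
The following sketch shows one way such guarded implicit construction can be written in C++11. It is a simplified illustration under assumptions, not the PR's code: `MiniSpan` and `Sum` are hypothetical names, and the `enable_if` condition is reduced to a single check that a pointer to an array of the container's elements converts to a pointer to an array of the span's element type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical simplified span with a guarded implicit constructor.
template <typename C>
class MiniSpan
{
    C* m_data;
    std::size_t m_size;

public:
    MiniSpan(C* data, std::size_t size) : m_data(data), m_size(size) {}

    // Implicit construction from any object exposing data() and size().
    // Enabled only when "pointer to array of the object's element type"
    // converts to "pointer to array of C" (int -> const int is accepted,
    // long -> int is rejected).
    template <typename V,
              typename std::enable_if<
                  std::is_convertible<
                      typename std::remove_pointer<decltype(std::declval<V&>().data())>::type (*)[],
                      C (*)[]>::value,
                  int>::type = 0>
    MiniSpan(V& v) : m_data(v.data()), m_size(v.size())
    {
    }

    C* begin() const { return m_data; }
    C* end() const { return m_data + m_size; }
    std::size_t size() const { return m_size; }
};

// Written once; callable with any compatible range-like object.
inline long Sum(MiniSpan<const int> sp)
{
    long total = 0;
    for (int x : sp) total += x;
    return total;
}
```

With this in place, `Sum(v)` compiles for a `std::vector<int>` or a `std::array<int, N>`, while passing a `std::vector<long>` is rejected at compile time because the constructor is removed from overload resolution.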
Several other PRs have been merged and proposed that make use of Span. Looking over those
may give an intuition for why this data type is so useful:
Do you think Span is a useful abstraction? Can you think of more places where it could
be used to simplify existing code?
When reviewing the PR, did you compare with the proposed std::span interface? What differences
did you notice?
What condition is imposed on converting Span<T1> into a Span<T2>? Why is it useful to
permit such conversion, and what are the risks in doing so unconditionally?
Why is MakeSpan useful? Can’t it be replaced with just invoking the Span::Span
constructor?
What are some other examples of features from future C++ versions that have been backported
as utility features in the Bitcoin Core codebase?
<jnewbery> Today we're looking at PR 18468 (Span improvements), although I expect the discussion might touch on other aspects of Spans that aren't specifically in that PR.
<jnewbery> sipa has very kindly offered to host today. He also contributed all of our Span implementation in Bitcoin Core, so he should be able to answer at least some of our questions :)
<sipa> so the first real question: Do you think Span is a useful abstraction? Can you think of more places where it could be used to simplify existing code?
<michaelfolkson> Can we quickly first just confirm the motivation for Span? We are concerned with buffer overflows right? And including a size with a pointer ensures these don't occur?
<sipa> michaelfolkson: i wouldn't say that preventing buffer overflows is the primary motivation... indirectly i expect that more readable code for working with ranges of objects reduces the chance of that, but i'd say readability is the goal
<michaelfolkson> sipa: Thanks. So is this use case of making the script interpreter independent from CScript possible without Span? Span just makes it easier due to more readable code?
<sipa> the script interpreter for witness scripts, it has to deal with a range of elements that are the actual stack, and then a final one that is the script
<willcl_ark> It certainly seems handy for the experienced programmer, but also seems to introduce some "hidden" pitfalls (which are described in #19367) that you might be more inclined to consider with a pointer/length pair.
<fjahr> sipa: I was just thinking of at which points we might want to handle data efficiently and I thought it might be connected with data that gets saved to disk sometimes. But yes, it does not directly make the case to use a span.
<jnewbery> Often, we'll have data stored in contiguous range-like containers (arrays, vectors, ...). There are functions that we want to call with that data, but those functions don't want to care about the specifics of the container, so currently they take a pointer and a size, which the caller needs to fish out of the container.
<jnewbery> With a Span, and implicit conversion from those containers to Span, the function can just take a Span, the caller can pass any of those range-like containers, and the conversion will take care of it for you.
<sipa> and an anti-pattern that sometimes emerges is "oh, let's make the function only accept a vector, and if someone has it in some other form, they can construct a vector with a copy of the data" - which is obviously wasteful
<sipa> with a span you'd write the function once, and it'll work when called with a vector, or an array, or an std::array, or a prevector, or a uint256, or a CPubKey (the latter two also being types that act as ranges of bytes!)
<sipa> which may be worth it, but in many cases it's also not: if literally all your function needs is a range, there is no reason to instantiate it for vector and prevector and array and ...
<sipa> so perhaps it's reasonable to state it as: a more lightweight alternative to making functions templated over various input types, in case all they need is a range of elements... while simultaneously also supporting passing down sub-ranges
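
A small sketch of the point about writing a function once and still being able to pass sub-ranges. `ByteView` and `XorAll` are hypothetical names for this example; the `first`/`subspan` helpers are modelled loosely on the std::span interface and are not copied from Bitcoin Core.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view over contiguous bytes.
class ByteView
{
    const unsigned char* m_data;
    std::size_t m_size;

public:
    ByteView(const unsigned char* data, std::size_t size) : m_data(data), m_size(size) {}

    // Implicit construction from two common containers.
    ByteView(const std::vector<unsigned char>& v) : m_data(v.data()), m_size(v.size()) {}
    template <std::size_t N>
    ByteView(const std::array<unsigned char, N>& a) : m_data(a.data()), m_size(N) {}

    std::size_t size() const { return m_size; }
    const unsigned char* begin() const { return m_data; }
    const unsigned char* end() const { return m_data + m_size; }

    // View of the first `count` bytes.
    ByteView first(std::size_t count) const
    {
        assert(count <= m_size);
        return ByteView(m_data, count);
    }

    // View of everything from `offset` to the end.
    ByteView subspan(std::size_t offset) const
    {
        assert(offset <= m_size);
        return ByteView(m_data + offset, m_size - offset);
    }
};

// One non-templated function; it neither knows nor cares which container
// (or which part of a container) the bytes live in.
inline unsigned XorAll(ByteView bytes)
{
    unsigned acc = 0;
    for (unsigned char b : bytes) acc ^= b;
    return acc;
}
```

`XorAll(v)`, `XorAll(arr)` and `XorAll(ByteView(v).subspan(2))` all reuse the same compiled function, whereas a function templated on the container type would be instantiated separately for each container.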
<sipa> What condition is imposed on converting Span<T1> into a Span<T2>? Why is it useful to permit such conversion, and what are the risks in doing so unconditionally?
<sipa> if it's possible to treat an array of T as an array of C (e.g. if you have an array of ints, you can treat it as an array of const ints), then converting a Span of T to a Span of C is also possible
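
The "array of T usable as array of C" condition can be expressed with a type trait on pointers to arrays. The sketch below uses a hypothetical helper named `ElementsConvertible`; it illustrates the condition only and is not taken from the PR.

```cpp
#include <type_traits>

// Hypothetical helper: true when an array of From can be viewed as an
// array of To, i.e. when From (*)[] implicitly converts to To (*)[].
template <typename From, typename To>
struct ElementsConvertible : std::is_convertible<From (*)[], To (*)[]> {
};

// Adding const to the element type is safe and allowed.
static_assert(ElementsConvertible<int, const int>::value, "int -> const int");
// Dropping const would allow writes to read-only data, so it is rejected.
static_assert(!ElementsConvertible<const int, int>::value, "const int -> int");
// Unrelated element types are rejected even though int converts to long.
static_assert(!ElementsConvertible<int, long>::value, "int -> long");
```

Testing pointers to arrays rather than plain pointers or values is what makes the check strict: `int` converts to `long` as a value, but a block of `int`s in memory cannot be read as a block of `long`s.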
<jnewbery> The commit log confused me: "This prevents constructing a Span<A> given two pointers into an array of B (where B is a subclass of A), at least without explicit cast to pointers to A." I think it is possible to construct Span<A> (as long as those pointers to arrays are implicitly convertible)
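
To make the subclass case concrete, here is a sketch with two hypothetical types, `Base` and `Derived`. It only demonstrates the type-trait behaviour behind the concern in the commit log; it does not reproduce the PR's constructor.

```cpp
#include <type_traits>

// Hypothetical types for illustration.
struct Base {
    int a;
};
struct Derived : Base {
    int b;
};

// A pointer to one Derived object converts implicitly to a pointer to Base...
static_assert(std::is_convertible<Derived*, Base*>::value, "single object is fine");

// ...but a pointer to an array of Derived does not convert to a pointer to
// an array of Base, so a guard written in terms of arrays rejects it.
static_assert(!std::is_convertible<Derived (*)[], Base (*)[]>::value, "arrays are not");

// The reason: the element stride differs, so stepping through Derived
// objects with Base-sized steps would land in the middle of an element.
static_assert(sizeof(Derived) > sizeof(Base), "different element sizes");
```

An explicit cast of each pointer to `Base*` still compiles, which matches the "at least without explicit cast" wording in the commit message: the check prevents the accidental case, not a deliberate one.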
<fjahr> sorry, was the answer to the risks that the elements could have a different size or was there something else? i think i misunderstood the question that there was something else