Cpp Notes

Taking a Byte Out of C++ - Avoiding Punning by Starting Lifetimes - Robert Leahy

Understanding the C++ type system

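#include <cstdint>
#include <cstdio>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

static_assert(sizeof(foo) == sizeof(std::uint64_t));

std::uint32_t bar(std::uint64_t& i, const foo& f) noexcept {
  if (f.a == 2) {
    i = 4;
  }
  if (f.a == 2) {
    return f.a;
  }
  return f.b;
}

int main() {
  foo f{2, 3};
  std::printf("%u\n", bar((std::uint64_t&)f, f));
  // shows 0!? Why did f.b get cleared to 0?
  // On a little-endian machine the 64-bit store of 4 through i sets a = 4, b = 0.
  // But it's actually UB: i and f have unrelated types, so an optimizing
  // compiler may assume they don't alias and print something else.
}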


// NOTE: No Type Punning Rule

// An object within its lifetime may only be accessed in certain ways

// - Through a reference to its type (addition of cv qualification allowed)
// - Through a reference to its signed or unsigned equivalent
// - Through a reference to char, unsigned char, or std::byte

// Any other access modality is undefined behavior.
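To make the permitted access paths above concrete, here is a minimal sketch reusing the talk's foo. It writes foo::a through a reference to its signed equivalent and reads the bytes of a foo through unsigned char; both are among the allowed cases. The helper names set_a_signed and byte_sum are hypothetical, and byte_sum assumes foo has no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

// Permitted: write foo::a through a reference to its signed equivalent.
// The reference still names the same uint32_t object f.a.
void set_a_signed(foo& f, std::int32_t value) noexcept {
  reinterpret_cast<std::int32_t&>(f.a) = value;
}

// Permitted: read the object representation of f as unsigned char.
// Returns the sum of all bytes of f (assumes foo has no padding bytes).
unsigned byte_sum(const foo& f) noexcept {
  const auto* bytes = reinterpret_cast<const unsigned char*>(&f);
  unsigned sum = 0;
  for (std::size_t k = 0; k < sizeof(foo); ++k) {
    sum += bytes[k];
  }
  return sum;
}
```

For example, after `foo f{2, 3}; set_a_signed(f, 7);`, f.a holds 7 and byte_sum(f) is 10 regardless of endianness, since each member contributes a single non-zero byte.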
  • Code Structure and Functionality:

    • Defines a struct foo with two std::uint32_t members (a and b).
    • Asserts that the size of foo is equal to the size of std::uint64_t.
    • Implements a function bar that writes 4 through i when f.a == 2, then returns f.a if it is (still) 2 and f.b otherwise.
    • A main function demonstrates unexpected behavior when the same object f is accessed both as a foo and, through a cast, as a std::uint64_t.
  • Memory's Role in Programming:

    • Stresses the importance of memory in programming and the need for developers to have a solid mental model of it.
    • Discusses the creation of simple rules to manage cognitive overhead, though these rules can break under complex scenarios.
  • Introduction to Higher-Level Constructs in C++:

    • Uses the foo struct as an example to illustrate the abstraction from raw memory to higher-level constructs in C++, simplifying memory management and representation.
    • Examines how these abstractions allow developers to bundle values and work with them without dealing with low-level details.
  • Exploration of Type System and Memory Model:

    • Proposes types as lenses through which memory is viewed and manipulated in C++, highlighting the disconnect that can occur between the programmer's intentions and the actual memory representation.
    • Presents a practical scenario where foo and a std::uint64_t might refer to the same memory, leading to unexpected results due to type aliasing.
  • Compiler Optimizations and Type Aliasing:

    • Explains how compilers optimize based on the assumption that objects of unrelated types do not occupy the same memory location; a program that violates this assumption can be miscompiled.
    • Clarifies the rules around type aliasing, stating that accessing an object through a reference of an unrelated type is undefined behavior, which compilers exploit to optimize code.
  • Practical Implications and Undefined Behavior:

    • Demonstrates through an example how the compiler's assumptions regarding type aliasing can lead to unexpected behavior: a write through i changes f's members, while the compiler is entitled to assume f was left untouched.
    • Concludes that the program's behavior can become unpredictable when violating the C++ standard's rules on type aliasing.

Demonstrating the compiler rule

// The updated C++ code example demonstrates an important concept in C++
// related to type aliasing and how the compiler handles optimizations based on
// the type system.

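// change to "related" type uint32_t (the type of foo::a)
std::uint32_t bar(std::uint32_t& i, const foo& f) noexcept {
  if (f.a == 2) {
    i = 4;  // may modify f.a, so the compiler must reload it below
  }
  if (f.a == 2) {
    return f.a;
  }
  return f.b;
}

int main() {
  foo f{2, 3};
  std::printf("%u\n",
              bar((std::uint32_t&)f, f));  // shows 3 now!!!
}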
  • Modification to Use Related Type:

    • The function bar is changed to accept a reference to std::uint32_t instead of std::uint64_t.
    • This makes the type of the reference i the same as the type of foo's member a, so accessing f.a through i is permitted by the C++ standard.
  • Behavioral Change in Program Output:

    • With the modification, the main function now prints 3 instead of the previously unexpected 0.
    • This change in output indicates that the compiler now accounts for the possibility that i refers to f.a, because their types are related.
  • Compiler's Handling of Related Types:

    • When the types are related, the compiler cannot assume that i and f refer to non-overlapping memory.
    • As a result, the compiler must emit a load (a mov instruction on x86) to re-read f.a after the write through i, acknowledging that i and f.a may refer to the same memory location.
  • Implications of Using Related Types:

    • By making i and f.a related types, the program conforms to the C++ standard's rules on type aliasing, avoiding undefined behavior.
    • This demonstrates how closely the C++ type system and memory model are intertwined, and how understanding these relationships is crucial for writing correct and predictable C++ programs.
  • Reflection on Memory Model and Type System:

    • The example underscores the complexity of C++'s memory model and type system, challenging the simplistic view of memory as just bytes.
    • It highlights the significance of type aliasing rules and compiler optimizations in influencing program behavior.

C++ Has an Object Model

  • Bytes supply storage for objects
  • Objects have lifetimes
  • Duration of storage is not necessarily the same as object lifetime
  • Accessing an object outside its lifetime is undefined behavior

C++ Object Model and Undefined Behavior

  • Storage vs. Object Lifetime:
    • Storage provides the space for objects, but does not automatically start an object's lifetime.
    • Object lifetimes are crucial; accessing an object outside its lifetime results in undefined behavior.
    • The distinction is essential for understanding how C++ manages memory and objects.
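A small sketch of storage outliving object lifetimes: one aligned std::byte buffer hosts two objects in turn, with placement new starting each lifetime and std::destroy_at ending it. The tracked type and the reuse_storage function are hypothetical illustrations, not from the talk.

```cpp
#include <cassert>
#include <cstddef>  // std::byte
#include <memory>   // std::destroy_at
#include <new>      // placement new

// Hypothetical type that counts how many of its objects are currently alive.
struct tracked {
  static inline int alive = 0;
  int value;
  explicit tracked(int v) : value(v) { ++alive; }
  ~tracked() { --alive; }
};

// One buffer hosts two tracked objects in turn: the storage exists for the
// whole call, each object's lifetime only for part of it.
int reuse_storage() {
  alignas(tracked) std::byte storage[sizeof(tracked)];  // storage, no object yet
  assert(tracked::alive == 0);

  tracked* p = ::new (static_cast<void*>(storage)) tracked(1);  // lifetime #1 begins
  const int first = p->value;
  std::destroy_at(p);  // lifetime #1 ends; the bytes are still there
  assert(tracked::alive == 0);

  p = ::new (static_cast<void*>(storage)) tracked(2);  // lifetime #2, same storage
  const int second = p->value;
  std::destroy_at(p);

  return first * 10 + second;
}
```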

Example 1: Allocation of Integers

const auto ptr = (int*)std::malloc(sizeof(int) * 4);
if (!ptr) {
  throw std::bad_alloc();
}

for (int i = 0; i < 4; ++i) {
  ptr[i] = i;
}
  • Code Description:

    • Allocates storage for four integers using malloc but, under the pre-C++20 rules, does not start the lifetime of any int objects.
    • Assigning values into this storage therefore writes to int objects that do not exist, which is undefined behavior under those rules.
  • Implications:

    • While the code may compile and run as expected, it technically operates on objects outside their lifetimes, reflecting a common confusion between obtaining storage and starting an object's lifetime.

Example 2: Allocation of Strings

const auto ptr = (std::string*)std::malloc(sizeof(std::string) * 4);
if (!ptr) {
  throw std::bad_alloc();
}

for (int i = 0; i < 4; ++i) {
  ptr[i] = std::to_string(i);  // UB: no std::string object exists at ptr[i]
}
  • Code Adaptation:

    • Similar structure to the first example but allocates storage for std::string objects.
    • Attempts to assign string values to the allocated storage, which is more evidently problematic due to std::string's complex invariants and constructor requirements.
  • Key Differences:

    • Strings have constructors that establish invariants, unlike trivial types like integers.
    • Overlaying std::string objects onto raw, uninitialized storage without properly constructing them violates these invariants, leading to more obvious undefined behavior.
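For contrast with Example 2, here is a sketch of the conventional pattern: malloc supplies storage only, placement new constructs each std::string (establishing its invariants), and std::destroy_n ends the lifetimes before free. The function name is hypothetical, and the sketch is deliberately not exception-safe: if std::to_string throws midway, the strings already constructed leak.

```cpp
#include <cassert>
#include <cstdlib>  // std::malloc, std::free, std::size_t
#include <memory>   // std::destroy_n
#include <new>      // placement new, std::bad_alloc
#include <string>

// malloc gives raw storage; each std::string is then constructed, not assigned.
std::size_t total_length_of_four_strings() {
  constexpr std::size_t n = 4;
  void* raw = std::malloc(sizeof(std::string) * n);
  if (!raw) {
    throw std::bad_alloc();
  }
  auto* strs = static_cast<std::string*>(raw);
  for (std::size_t i = 0; i < n; ++i) {
    // Construction starts the string's lifetime and sets up its invariants.
    ::new (static_cast<void*>(strs + i)) std::string(std::to_string(i));
  }
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += strs[i].size();
  }
  std::destroy_n(strs, n);  // end each string's lifetime before releasing storage
  std::free(raw);
  return total;
}
```

Here the four strings are "0" through "3", so the function returns 4.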

C++ Types May Have Invariants

  • One of the core value propositions of C++
    • Invariants are established by constructors
    • Invariants are maintained by members
  • Some types don't have such strict requirements
    • Contain basic values
    • Don't maintain complicated (or any) invariants
  • Such types are trivial types

Core Value Propositions of C++

  • Type Invariants:

    • C++ enforces type invariants through constructors, destructors, and member functions, establishing and maintaining the semantic meaning of types beyond mere bytes.
    • Constructors bring objects into existence with established invariants, while destructors clean up resources in a structured way.
  • Trivial vs. Non-Trivial Types:

    • Trivial types, like integers, do not have complex invariants or construction/destruction requirements.
    • Non-trivial types, like std::string, maintain invariants that guide their usage, making it unsafe to treat them as mere byte storage without proper initialization and destruction.
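The trivial/non-trivial distinction can be checked at compile time with the standard <type_traits> header. A sketch comparing the talk's foo with std::string:

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

// foo has no invariants: default construction, copying and destruction are trivial.
static_assert(std::is_trivially_default_constructible_v<foo>);
static_assert(std::is_trivially_copyable_v<foo>);
static_assert(std::is_trivially_destructible_v<foo>);

// std::string establishes invariants in its constructors and releases
// resources in its destructor, so none of these hold.
static_assert(!std::is_trivially_default_constructible_v<std::string>);
static_assert(!std::is_trivially_copyable_v<std::string>);
static_assert(!std::is_trivially_destructible_v<std::string>);
```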

Implicit-Lifetime Types (C++20)

  • Certain types are "implicit-lifetime"

    • Scalar types, array types, and aggregate class types
    • Classes with at least one trivial eligible constructor and a trivial, non-deleted destructor
  • Certain operations implicitly create objects of implicit-lifetime type

    • std::malloc et al.
    • std::memcpy and std::memmove
    • Starting the lifetime of an array of char, unsigned char, or std::byte
    • operator new and operator new[]
  • See P0593

Implicit-Lifetime Types in C++20

  • Definition: Implicit-lifetime types are scalar types, array types, and implicit-lifetime classes (aggregates, or classes with at least one trivial eligible constructor and a trivial, non-deleted destructor). This gives more flexibility in how the lifetimes of their objects begin.

  • Operations That Implicitly Create Objects:

    • Operations like std::malloc, std::memcpy, std::memmove, and starting the lifetime of an array of char, unsigned char, or std::byte can implicitly create objects of implicit-lifetime types and start their lifetimes.
    • This also extends to operator new and operator new[].
  • Practical Implications:

    • Allows for a more pragmatic approach to memory management for certain types, reducing the burden of explicitly managing every aspect of an object's lifetime.
    • However, it doesn't negate the importance of understanding and reasoning about object lifetimes, especially for non-trivial types.
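Under the C++20 rules (P0593), a malloc'd foo needs no placement new, because foo is an aggregate with trivial special members and therefore an implicit-lifetime type. A minimal sketch; sum_via_malloc is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>  // std::malloc, std::free
#include <new>      // std::bad_alloc

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

// std::malloc implicitly creates a foo in the returned storage (C++20),
// so these member writes access an object within its lifetime.
std::uint32_t sum_via_malloc() {
  auto* p = static_cast<foo*>(std::malloc(sizeof(foo)));
  if (!p) {
    throw std::bad_alloc();
  }
  p->a = 2;
  p->b = 3;
  const std::uint32_t sum = p->a + p->b;
  std::free(p);  // trivial destructor: releasing the storage ends the lifetime
  return sum;
}
```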

Guiding Principles and Limitations

  • The Lifetimes of Trivial Types: Despite the introduction of implicit-lifetime types, trivial types still have lifetimes that need to be reasoned about. The flexibility offered by C++20 does not eliminate the need for careful consideration of object lifetimes.

  • The Role of Operations in Starting Lifetimes:

    • Certain "blessed" operations can implicitly initiate the lifetimes of implicit-lifetime types, offering a path to avoid undefined behavior in some scenarios.
    • This includes commonly used functions like malloc and memcpy, which play a crucial role in memory management.
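A sketch of std::memcpy as one of the blessed operations, in the style of deserializing a received buffer: copying bytes into aligned unsigned char storage implicitly creates a foo there, and std::launder yields a usable pointer to it. decode is a hypothetical function, and the code assumes C++20 semantics for memcpy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>  // std::memcpy
#include <new>      // std::launder

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

// Hypothetical decoder: `wire` points to sizeof(foo) bytes produced elsewhere.
foo decode(const unsigned char* wire) {
  alignas(foo) unsigned char buf[sizeof(foo)];  // provides storage for a foo
  std::memcpy(buf, wire, sizeof(foo));          // implicitly creates a foo in buf
  // The cast alone still points at buf[0]; std::launder yields a pointer
  // to the foo object that now lives in that storage.
  const foo* p = std::launder(reinterpret_cast<const foo*>(buf));
  return *p;
}
```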

Understanding the Impact

  • Superposition of Objects: The idea that performing certain operations can start the lifetime of a "superposition of objects" introduces a quantum-like concept where the exact type whose lifetime is started is not predetermined but is defined by subsequent operations in the program.

  • Wave Function Collapse: The analogy of wave function collapse is used to describe how the set of possible types narrows down based on program operations, leading to well-defined behavior if a self-consistent set of types is achieved.
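The superposition idea can be sketched with a single allocation whose type is decided only by later accesses: on one path the storage is used as a foo, on the other as a std::uint64_t, and C++20 implicit object creation makes each path well-defined on its own. use_storage is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>  // std::malloc, std::free
#include <new>      // std::bad_alloc

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

static_assert(sizeof(foo) == sizeof(std::uint64_t));

// malloc creates whichever implicit-lifetime objects give the program
// defined behavior; the accesses that follow decide which one that is.
std::uint64_t use_storage(bool as_foo) {
  void* raw = std::malloc(sizeof(std::uint64_t));
  if (!raw) {
    throw std::bad_alloc();
  }
  std::uint64_t result;
  if (as_foo) {
    auto* f = static_cast<foo*>(raw);  // "collapses" to a foo
    f->a = 2;
    f->b = 3;
    result = f->a + f->b;
  } else {
    auto* i = static_cast<std::uint64_t*>(raw);  // "collapses" to a uint64_t
    *i = 4;
    result = *i;
  }
  std::free(raw);
  return result;
}
```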

Example of applying the feature
