
Taking a Byte Out of C++ - Avoiding Punning by Starting Lifetimes - Robert Leahy

Understanding C++ type system

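#include <cstdint>
#include <cstdio>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

static_assert(sizeof(foo) == sizeof(std::uint64_t));

// i and f may refer to the same memory, but the compiler assumes they do not.
std::uint32_t bar(std::uint64_t& i, const foo& f) noexcept {
  if (f.a == 2) {
    i = 4;
  }
  if (f.a == 2) {
    return f.a;
  }
  return f.b;
}


int main() {
  foo f{2, 3};
  std::printf("%u\n", static_cast<unsigned>(bar((std::uint64_t&)f, f)));
  // shows 0!? Why did f.b get cleared to 0?
  // (on a little-endian target the 8-byte store of 4 overwrites both a and b)
  // It's actually UB, so no particular output is guaranteed...
}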


// NOTE: No Type Punning Rule

// An object within its lifetime may only be accessed in certain ways

// - Through a reference to its type (addition of cv qualification allowed)
// - Through a reference to its signed or unsigned equivalent
// - Through a reference to char, unsigned char, or std::byte

// Any other access modality is undefined behavior.
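
The third bullet of the rule can be illustrated with a short sketch. It reads the object representation of foo through const unsigned char*, which is one of the permitted access paths. The names sum_bytes and example_sum_bytes are hypothetical helpers, not part of the talk, and the sketch assumes foo has no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};
static_assert(sizeof(foo) == 2 * sizeof(std::uint32_t),
              "this sketch assumes foo has no padding");

// Sums every byte of f's object representation.
// Reading through const unsigned char* is a permitted access path.
std::uint32_t sum_bytes(const foo& f) noexcept {
  const auto* bytes = reinterpret_cast<const unsigned char*>(&f);
  std::uint32_t total = 0;
  for (std::size_t i = 0; i < sizeof(foo); ++i) {
    total += bytes[i];
  }
  return total;
}

// Example use: 2 and 3 each occupy a single non-zero byte,
// so the sum is 5 regardless of byte order.
void example_sum_bytes() {
  const foo f{2, 3};
  assert(sum_bytes(f) == 5);
}
```

Unlike the cast to std::uint64_t& above, this access does not depend on the compiler's aliasing assumptions.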

Code Structure and Functionality:
- Defines a struct foo with two std::uint32_t members (a and b).
- Asserts that the size of foo is equal to the size of std::uint64_t.
- Implements a function bar that modifies and returns values based on the condition of foo's a member.
- A main function demonstrates an unexpected behavior when foo is treated as both its type and a std::uint64_t.
Memory's Role in Programming:
- Stresses the importance of memory in programming and the need for developers to have a solid mental model of it.
- Discusses the creation of simple rules to manage cognitive overhead, though these rules can break under complex scenarios.
Introduction to Higher-Level Constructs in C++:
- Uses the foo struct as an example to illustrate the abstraction from raw memory to higher-level constructs in C++, simplifying memory management and representation.
- Examines how these abstractions allow developers to bundle values and work with them without dealing with low-level details.
Exploration of Type System and Memory Model:
- Proposes types as lenses through which memory is viewed and manipulated in C++, highlighting the disconnect that can occur between the programmer's intentions and the actual memory representation.
- Presents a practical scenario where foo and a std::uint64_t might refer to the same memory, leading to unexpected results due to type aliasing.
Compiler Optimizations and Type Aliasing:
- Explains how compilers optimize based on the assumption that different types do not alias the same memory location, which can lead to optimizations that affect program correctness.
- Clarifies the rules around type aliasing, stating that accessing an object through a reference of an unrelated type is undefined behavior, which compilers exploit to optimize code.
Practical Implications and Undefined Behavior:
- Demonstrates through an example how the compiler's assumptions regarding type aliasing can lead to unexpected behavior, such as modifying one value through a reference potentially affecting another seemingly unrelated value.
- Concludes that the program's behavior can become unpredictable when violating the C++ standard's rules on type aliasing.
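
One commonly used way to stay inside the aliasing rules is to copy bytes instead of reinterpreting a reference. The following is a minimal sketch of that idea, not necessarily the approach the talk goes on to recommend; to_bits, from_bits, and example_round_trip are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};
static_assert(sizeof(foo) == sizeof(std::uint64_t));

// Copies the object representation of f into a separate std::uint64_t.
// No std::uint64_t reference ever refers to the foo object itself.
std::uint64_t to_bits(const foo& f) noexcept {
  std::uint64_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Copies the bytes back into a separate foo object.
foo from_bits(std::uint64_t bits) noexcept {
  foo f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Example use: a round trip preserves both members.
void example_round_trip() {
  const foo original{2, 3};
  const foo copy = from_bits(to_bits(original));
  assert(copy.a == 2 && copy.b == 3);
}
```

The numeric value returned by to_bits depends on byte order, so only the round trip is checked here.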

Proof of the compiler rule

// The updated C++ code example demonstrates an important concept in C++
// related to type aliasing and how the compiler handles optimizations based on
// the type system.

// change to "related" type uint32_t
std::uint32_t bar(std::uint32_t& i, const foo& f) noexcept {
  // ... same as above
}

int main() {
  foo f{2, 3};
  std::printf("%u\n",
              static_cast<unsigned>(bar((std::uint32_t&)f, f)));  // 3 now!!!
}

Modification to Use Related Type:
- The function bar is changed to accept a reference to std::uint32_t instead of std::uint64_t.
- This modification aligns the type of the reference i with the type of foo's member a, making them related types according to the C++ standard.
Behavioral Change in Program Output:
- With the modification, the main function now prints 3 instead of the previously unexpected 0.
- This change in output indicates that the compiler treats the access to i and f.a differently due to them being related types.
Compiler's Handling of Related Types:
- When the types are related, the compiler cannot assume that the reference to i and the foo structure f do not alias the same memory.
- As a result, the compiler must emit a mov instruction to reload f.a after the store to i, acknowledging the possibility that i and f.a refer to the same memory location.
Implications of Using Related Types:
- By making i and f.a related types, the program conforms to the C++ standard's rules on type aliasing, avoiding undefined behavior.
- This demonstrates how closely the C++ type system and memory model are intertwined, and how understanding these relationships is crucial for writing correct and predictable C++ programs.
Reflection on Memory Model and Type System:
- The example underscores the complexity of C++'s memory model and type system, challenging the simplistic view of memory as just bytes.
- It highlights the significance of type aliasing rules and compiler optimizations in influencing program behavior.
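
The related-type case can be reproduced without any cast by passing f.a itself as the reference. This is a sketch under that assumption; bar_related and example_bar_related are hypothetical names standing in for the modified bar.

```cpp
#include <cassert>
#include <cstdint>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

// Same logic as bar, but i has the same type as foo::a,
// so the compiler must allow for i aliasing f.a.
std::uint32_t bar_related(std::uint32_t& i, const foo& f) noexcept {
  if (f.a == 2) {
    i = 4;  // may change f.a if i aliases it
  }
  if (f.a == 2) {  // f.a has to be re-read after the store
    return f.a;
  }
  return f.b;
}

void example_bar_related() {
  foo f{2, 3};
  // Aliasing: the store changes f.a to 4, so f.b is returned.
  assert(bar_related(f.a, f) == 3);

  foo g{2, 3};
  std::uint32_t unrelated = 0;
  // No aliasing: g.a is still 2 after the store.
  assert(bar_related(unrelated, g) == 2);
}
```

Both results are well defined, which is the difference from the std::uint64_t version.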

C++ Has an Object Model

Bytes supply storage for objects
Objects have lifetimes
Duration of storage is not necessarily the same as object lifetime
Accessing object outside lifetime is undefined behavior

C++ Object Model and Undefined Behavior

Storage vs. Object Lifetime:
- Storage provides the space for objects, but does not automatically start an object's lifetime.
- Object lifetimes are crucial; accessing an object outside its lifetime results in undefined behavior.
- The distinction is essential for understanding how C++ manages memory and objects.
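
The difference between storage duration and object lifetime can be sketched with a local buffer. The buffer exists for the whole function, while the std::string inside it lives only between the placement new and the std::destroy_at call (C++17). The function name is a hypothetical helper.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <string>

std::size_t length_in_borrowed_storage() {
  // Storage exists from here on, but it holds no std::string yet.
  alignas(std::string) unsigned char storage[sizeof(std::string)];

  // Placement new starts the lifetime of a std::string in that storage.
  std::string* s = ::new (static_cast<void*>(storage)) std::string("hello");
  const std::size_t length = s->size();

  // The lifetime ends here; the storage remains until the function returns.
  std::destroy_at(s);
  // Using *s from this point on would be undefined behavior.
  return length;
}
```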

Example 1: Allocation of Integers

const auto ptr = (int*)std::malloc(sizeof(int) * 4);
if (!ptr) {
  throw std::bad_alloc();
}

for (int i = 0; i < 4; ++i) {
  ptr[i] = i;
}

Code Description:
- Allocates storage for four integers using malloc, which under the rules before C++20 does not start the lifetime of any int object.
- Assigns values into this storage even though no int was ever created there, which is undefined behavior under those rules.
Implications:
- The code compiles and runs as expected in practice, but it technically operates on objects outside their lifetimes; it relies on the common conflation of storage with object lifetime.

Example 2: Allocation of Strings

const auto ptr = (std::string*)std::malloc(sizeof(std::string) * 4);
if (!ptr) {
  throw std::bad_alloc();
}

for (int i = 0; i < 4; ++i) {
  ptr[i] = std::to_string(i);  // UB: no std::string was ever constructed here
}

Code Adaptation:
- Similar structure to the first example but allocates storage for std::string objects.
- Attempts to assign string values to the allocated storage, which is more evidently problematic due to std::string's complex invariants and constructor requirements.
Key Differences:
- Strings have constructors that establish invariants, unlike trivial types like integers.
- Overlaying std::string objects onto raw, uninitialized storage without properly constructing them violates these invariants, leading to more obvious undefined behavior.
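
For contrast, a minimal sketch of how the string example could be written so that every std::string is constructed before use and destroyed afterwards. It uses placement new and std::destroy_at (C++17). join_digits is a hypothetical helper, and the sketch leaks if a constructor throws partway through.

```cpp
#include <cstdlib>
#include <memory>
#include <new>
#include <string>

std::string join_digits() {
  constexpr int count = 4;
  void* raw = std::malloc(sizeof(std::string) * count);
  if (!raw) {
    throw std::bad_alloc();
  }
  auto* slots = static_cast<std::string*>(raw);

  // Start each lifetime explicitly and keep the pointers placement new returns.
  std::string* elems[count];
  for (int i = 0; i < count; ++i) {
    elems[i] = ::new (static_cast<void*>(slots + i)) std::string(std::to_string(i));
  }

  std::string joined;
  for (int i = 0; i < count; ++i) {
    joined += *elems[i];
  }

  // End each lifetime before releasing the storage.
  for (int i = 0; i < count; ++i) {
    std::destroy_at(elems[i]);
  }
  std::free(raw);
  return joined;
}
```

Here the constructors establish the invariants that the plain assignment in Example 2 skipped.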

C++ Types May Have Invariants

One of the core value propositions of C++
- Invariants are established by constructors
- Invariants are maintained by members
Some types don't have such strict requirements
- Contain basic values
- Don't maintain complicated (or any) invariants
Such types are trivial types

Core Value Propositions of C++

Type Invariants:
- C++ enforces type invariants through constructors, destructors, and member functions, establishing and maintaining the semantic meaning of types beyond mere bytes.
- Constructors bring objects into existence with established invariants, while destructors clean up resources in a structured way.
Trivial vs. Non-Trivial Types:
- Trivial types, like integers, do not have complex invariants or construction/destruction requirements.
- Non-trivial types, like std::string, maintain invariants that guide their usage, making it unsafe to treat them as mere byte storage without proper initialization and destruction.
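
The trivial versus non-trivial distinction can be checked at compile time with the standard type traits. In this sketch is_plain_data_v is a hypothetical helper that combines three of them; it is an illustration, not the standard's definition of a trivial type.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

// Hypothetical helper: true when T can be created, copied, and destroyed
// without running any user-provided code.
template <class T>
inline constexpr bool is_plain_data_v =
    std::is_trivially_default_constructible_v<T> &&
    std::is_trivially_copyable_v<T> &&
    std::is_trivially_destructible_v<T>;

static_assert(is_plain_data_v<int>);
static_assert(is_plain_data_v<foo>);
// std::string has a non-trivial destructor that releases its buffer.
static_assert(!is_plain_data_v<std::string>);
```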

Implicit-Lifetime Types (C++20)

Certain types are "implicit-lifetime"
- Aggregate types
- At least one trivial constructor and trivial destructor
Certain operations implicitly create objects of implicit-lifetime type
- std::malloc et al.
- std::memcpy and std::memmove
- Starting lifetime of array of char, unsigned char, or std::byte
- operator new and operator new[]
See P0593

Implicit-Lifetime Types in C++20

Definition: Implicit-lifetime types include aggregate types and those with at least one trivial constructor and trivial destructor. This makes their management more flexible in terms of object lifetime.
Operations That Implicitly Create Objects:
- Operations like std::malloc, std::memcpy, and std::memmove, as well as starting the lifetime of an array of char, unsigned char, or std::byte, can implicitly start the lifetimes of objects of implicit-lifetime types.
- This also extends to operator new and operator new[].
Practical Implications:
- Allows for a more pragmatic approach to memory management for certain types, reducing the burden of explicitly managing every aspect of an object's lifetime.
- However, it doesn't negate the importance of understanding and reasoning about object lifetimes, especially for non-trivial types.
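
A minimal sketch of what implicit object creation permits, assuming a C++20 compiler. foo is an aggregate and therefore an implicit-lifetime type, so std::malloc may start its lifetime without a placement new. sum_via_malloc is a hypothetical helper.

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>

struct foo {
  std::uint32_t a;
  std::uint32_t b;
};

std::uint32_t sum_via_malloc() {
  // Under C++20, std::malloc implicitly creates a foo in the returned storage
  // because doing so gives the code below defined behavior.
  auto* p = static_cast<foo*>(std::malloc(sizeof(foo)));
  if (!p) {
    throw std::bad_alloc();
  }
  p->a = 2;
  p->b = 3;
  const std::uint32_t sum = p->a + p->b;
  std::free(p);
  return sum;
}
```

The same pattern with std::string in place of foo stays undefined, because std::string is not an implicit-lifetime type.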

Guiding Principles and Limitations

The Lifetimes of Trivial Types: Despite the introduction of implicit-lifetime types, trivial types still have lifetimes that need to be reasoned about. The flexibility offered by C++20 does not eliminate the need for careful consideration of object lifetimes.
The Role of Operations in Starting Lifetimes:
- Certain "blessed" operations can implicitly initiate the lifetimes of implicit-lifetime types, offering a path to avoid undefined behavior in some scenarios.
- This includes commonly used functions like malloc and memcpy, which play a crucial role in memory management.

Understanding the Impact

Superposition of Objects: The idea that performing certain operations can start the lifetime of a "superposition of objects" introduces a quantum-like concept where the exact type whose lifetime is started is not predetermined but is defined by subsequent operations in the program.
Wave Function Collapse: The analogy of wave function collapse is used to describe how the set of possible types narrows down based on program operations, leading to well-defined behavior if a self-consistent set of types is achieved.

Example of applying the feature
