Pointers: Thinking in Memory

The Typeless Pointer

A void* is a pointer that has forgotten what it points at. This sounds useless until you realise it's how malloc returns memory, how qsort sorts anything, how callbacks carry user data, and how libraries hide their internals. This post is about when to reach for void*, how to do it without hurting yourself, and why the very thing that makes it powerful also makes it dangerous.

What a void* actually is

Every pointer we've met so far has had a type attached. int* points at an int, char* points at a char, struct Node* points at a Node. The type does two jobs: it tells the compiler how big the pointee is (for pointer arithmetic and dereferencing), and it tells the programmer what kind of thing lives at that address.

A void* is a pointer that has neither of those. It's an address with no type information attached. The compiler knows something lives at that location, but nothing about what. You can't dereference it. You can't do pointer arithmetic on it. You can only pass it around, store it, or cast it back to a real pointer type when you want to actually use it.

int x = 42;
void* p = &x;     // fine: any pointer converts to void* implicitly

*p;                // ERROR: can't dereference void*, don't know the size
p++;               // ERROR: can't do arithmetic on void*, don't know the stride

int* q = (int*)p; // cast back to int*, now we can use it
printf("%d\n", *q); // 42

If you're coming from higher-level languages, void* is similar in spirit to "any pointer" types: Object in Java, interface{} (or any) in Go, Any in Swift. The key difference is that those languages remember the original type at runtime, so they can check casts for you. C doesn't. A void* is just an address; nothing anywhere tracks what type was originally stored there. If you cast it to the wrong type, you get silence from the compiler and garbage (or worse) at runtime.

A void* is an address without a type. That's its superpower (it fits any pointer) and its curse (the compiler can't help you if you get the type wrong).

Casting rules

There are two directions, and C and C++ disagree about one of them. In both languages, any object pointer converts to void* implicitly; no cast needed. Going the other way, from void* back to a typed pointer, is also implicit in C but requires an explicit cast in C++:

// C
int x = 42;
void* p = &x;       // implicit, both C and C++
int* q = p;            // implicit in C, error in C++

// C++
int* q = (int*)p;     // explicit cast required
int* q2 = static_cast<int*>(p);   // C++'s preferred spelling

The C++ rule is stricter on purpose. C++'s designers felt that casting from void* back to a typed pointer is the exact step where things can go wrong, and wanted to make it visible at the call site. You'll see this difference echo through a lot of the C-to-C++ porting experience.

[Interactive demo: watch a typed pointer lose its type when assigned to void* and (maybe) recover it.]

The bytes in memory never change. What changes is how you interpret them. When the recovery type matches the original type, you read back what you wrote. When the recovery type disagrees with the original type, what you get is nonsense. Nothing in the language stops this; the discipline has to come from you.

Where void* actually shows up

If you only ever wrote application code, you'd almost never need to declare a void* yourself. But you'd use functions that return or take one every day, because void* is how C's standard library implements things that the type system can't express.

1. malloc and friends

The most familiar example:

void* malloc(size_t size);
void free(void* ptr);

Think about what malloc is being asked to do. The caller says "give me 40 bytes." malloc doesn't know if those 40 bytes are going to hold ten ints, or five doubles, or one struct Book. It shouldn't have to know. It just returns an address, and the caller interprets the bytes however they want. A typed return (say, int*) would force a cast at every other use site; a void* return is honest about the fact that the allocator doesn't care.

int* numbers = malloc(10 * sizeof(int));     // C: implicit conversion from void*
double* values = malloc(5 * sizeof(double));  // same function, different interpretation

struct Book* b = malloc(sizeof(struct Book));      // same function, struct interpretation

free(numbers);   // free takes void*, so any pointer type goes in
free(values);
free(b);

This is void* at its most helpful. The function is genuinely type-agnostic, and the type erasure is the whole point. (In C++ you'd write new int[10] instead, which is typed end-to-end; we'll get to why in Part 10.)
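One habit that keeps the erased type from drifting out of sync is to derive the allocation size from the pointer being assigned, rather than naming the type a second time. A minimal sketch of that idiom follows; the helper name make_squares is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n ints and fills them with squares.
   Returns NULL if the allocation fails. The caller owns the result
   and must free() it. */
int* make_squares(size_t n) {
    int* out;
    assert(n > 0);                       /* malloc(0) is implementation-defined */
    if (n > SIZE_MAX / sizeof *out) {
        return NULL;                     /* n * sizeof *out would overflow */
    }
    /* sizeof *out follows the pointer's type, so changing int to long
       later cannot leave a stale sizeof(int) behind. */
    out = malloc(n * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)(i * i);
    }
    return out;
}
```

The void* that malloc returns is converted to int* exactly once, at the assignment, and everything after that line is ordinary typed code.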

2. qsort and the generic algorithm trick

The standard library's qsort is a more interesting case:

void qsort(void* base, size_t nmemb, size_t size,
           int (*compar)(const void*, const void*));

The signature is a short story. base is a void*: the start of the array to sort, any element type. nmemb is the number of elements, and size is the byte-width of each element. Together, base, nmemb, and size are the type-erased stand-in for "an array of N somethings." And compar is a function pointer that takes two const void*s and returns a comparison result. The caller supplies that function because only the caller knows what type lives at those pointers.

Here's what it looks like in practice:

int cmp_int(const void* a, const void* b) {
    const int* ia = a;        // recover types at the edge
    const int* ib = b;
    return (*ia > *ib) - (*ia < *ib);
}

int arr[] = {5, 2, 8, 1, 9};
qsort(arr, 5, sizeof(int), cmp_int);

Read this as a contract between two parties. qsort says: "I know how to rearrange blobs of bytes in an array; I don't know what those bytes mean." The caller says: "Here's the array, here's the size of each blob, and here's how to compare two of them." They meet in the middle via void*, and neither needs to know the other's full type.

Compare this with a language that has generics (C++ templates, Rust generics, Java generics). In those languages, sort would take a comparator as a type-safe function and the compiler would check types at compile time. In C there's no such machinery, so void* plus an explicit size is what we get. The trade is type safety for code reuse: one qsort works for every element type, but there's nothing stopping you from passing an int array with a cmp_double comparator and getting nonsense back.

The void* plus size_t plus callback pattern is C's template mechanism. It's how one function can operate on any element type, at the cost of losing compile-time type checking.
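The same recipe works for structs, and swapping the comparator changes the ordering without touching the sort itself. This is a small sketch; struct Book's fields and the function names are assumptions made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Book {
    const char* title;
    int year;
};

/* Comparator for qsort: orders books by year, oldest first. */
int cmp_book_year(const void* a, const void* b) {
    const struct Book* ba = a;   /* recover types at the edge */
    const struct Book* bb = b;
    return (ba->year > bb->year) - (ba->year < bb->year);
}

/* Comparator for qsort: orders books alphabetically by title. */
int cmp_book_title(const void* a, const void* b) {
    const struct Book* ba = a;
    const struct Book* bb = b;
    return strcmp(ba->title, bb->title);
}

/* Sorts in place. The element count and element size travel with the
   array because qsort cannot work them out from a void*. */
void sort_books_by_year(struct Book* books, size_t n) {
    assert(books != NULL || n == 0);
    qsort(books, n, sizeof books[0], cmp_book_year);
}
```

Passing cmp_book_title to qsort instead reorders the same array by title. Nothing checks that the comparator matches the element type, so keeping the two next to each other in the source is the only safeguard.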

[Interactive demo: pick an element type and watch qsort sort it through void*.]

The same qsort call drives four different element types. What changes between runs is the comparator and the element size; the algorithm's loop over void* chunks is identical. This is what the void* plus size_t trick buys: one sort routine, every element type.

3. Callbacks with user context

The third place void* shows up is in callback APIs, and it's the use case that catches people by surprise the most. Here's the shape:

// API:
void for_each_line(const char* path,
                   void (*cb)(const char* line, void* user),
                   void* user);

The function reads a file line by line and calls cb on each line. The callback takes the line and a void* called user. That second parameter is the clever bit. Without it, callbacks can only talk to global state, which is terrible for testability, thread safety, and re-entrancy. With it, the caller can pass their own data through to the callback, and the API itself doesn't need to know what that data is.

In use:

struct Stats { int lines; int bytes; };

void count_line(const char* line, void* user) {
    struct Stats* s = user;    // recover the type
    s->lines++;
    s->bytes += strlen(line);
}

struct Stats stats = {0, 0};
for_each_line("input.txt", count_line, &stats);
printf("%d lines, %d bytes\n", stats.lines, stats.bytes);

The API doesn't know Stats exists. The callback does. The void* is the pipe that lets caller-owned context flow through an API that's designed to be generic. This pattern shows up almost everywhere there are callbacks in C: event loops, thread entry points, library APIs.
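To see both halves of the pattern in one place without needing a file on disk, here is a sketch that iterates over an in-memory array of strings instead. for_each_string, struct Longest, and the other names are hypothetical stand-ins, not part of any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic API: calls cb once per string and passes `user`
   through untouched. It never looks at what `user` points to. */
void for_each_string(const char* const* items, size_t n,
                     void (*cb)(const char* item, void* user),
                     void* user) {
    assert(cb != NULL);
    for (size_t i = 0; i < n; i++) {
        cb(items[i], user);
    }
}

struct Longest {
    const char* text;
    size_t len;
};

/* Callback: remembers the longest string seen so far. */
void track_longest(const char* item, void* user) {
    struct Longest* best = user;   /* recover the type at the edge */
    size_t len = strlen(item);
    if (best->text == NULL || len > best->len) {
        best->text = item;
        best->len = len;
    }
}

/* Convenience wrapper: all state lives in a local, no globals needed.
   Returns NULL when there are no items. */
const char* longest_of(const char* const* items, size_t n) {
    struct Longest best = {NULL, 0};
    for_each_string(items, n, track_longest, &best);
    return best.text;
}
```

Passing &best is safe here because for_each_string finishes calling the callback before longest_of returns, so the local outlives every use of the pointer.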

Compare with a library that doesn't provide a user pointer:

// Older API without user pointer:
void for_each_line(const char* path,
                   void (*cb)(const char* line));

// The only way to accumulate state is a global:
static struct Stats g_stats;       // terrible for threading, re-entrancy

The void* user pointer is what upgraded this pattern from unusable to idiomatic. It's why pthread_create, qsort_r (the context-taking variant of qsort found in glibc and the BSDs), and most event-loop libraries all have one. The classic counterexamples are plain qsort and signal() handlers: their callbacks receive no context pointer, so any extra state has to live in a global. The rule of thumb: if you're designing a callback API in C, always include a void* context parameter, even if your first use case doesn't need it. You will want it later, and it's very hard to add without breaking every caller.

A callback without a user-data pointer forces everyone who uses it into global state. A void* context is the small piece that makes callbacks actually composable.

4. Opaque pointers: hiding implementation

The last big use of void* (or something closely related) is opaque pointers. This is a design pattern where a library exposes a pointer type but hides what it actually points at. Callers can pass the pointer around and hand it back to library functions, but they cannot peek inside.

Here's the classic setup, used in countless C libraries (SQLite's sqlite3, POSIX's DIR, and FILE from the standard library, which callers treat as opaque even on platforms whose headers spell out its fields):

// database.h (public)
typedef struct Database Database;    // forward declaration, no definition

Database* db_open(const char* path);
int       db_query(Database* db, const char* sql);
void      db_close(Database* db);

// database.c (private)
struct Database {
    int   fd;
    char  buf[4096];
    struct QueryCache* cache;
    // ... all the internal machinery
};

Database* db_open(const char* path) {
    Database* db = malloc(sizeof(Database));
    // ... initialise
    return db;
}

Notice what the public header shows: a name (Database) but no definition. Callers can hold a Database*, pass it around, print its address for debugging. But they can't access db->fd, because the compiler, reading the header, doesn't know that field exists. The full definition lives in the .c file, where only the library implementation can see it.

This is called an opaque pointer (in standard-ese, a pointer to an "incomplete type"). Strictly speaking it's not a void*, because the pointer has a specific type name. But the mechanism is the same in spirit: the library hands out a typed handle whose internals are hidden, and clients can't reach inside. It's how C libraries enforce encapsulation without classes.

You get three large benefits from this pattern:

1. ABI stability. The library can change the struct's layout, add fields, reorder them, change sizes, without breaking any compiled client code. The clients only know the pointer type, not the struct size.

2. Access control. Callers physically cannot reach in and mess with the internals. They must go through the API.

3. Dependency isolation. The public header doesn't need to #include all the internal types. A library whose public header pulls in 30 transitive includes is a compile-time nightmare; opaque pointers trim that dependency graph aggressively.
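The whole pattern fits in one listing if the header and the implementation are shown back to back. This is a sketch with a made-up Counter type; in a real library the first half would live in a public header and the second half in a private .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what the public header would contain (hypothetical counter.h) ---- */
typedef struct Counter Counter;     /* incomplete type: no fields visible */

Counter* counter_new(int start);
void     counter_add(Counter* c, int delta);
int      counter_value(const Counter* c);
void     counter_free(Counter* c);

/* ---- what the private .c file would contain ---- */
struct Counter {
    int value;
    int updates;    /* internal bookkeeping; can change without touching callers */
};

Counter* counter_new(int start) {
    Counter* c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
        c->updates = 0;
    }
    return c;       /* NULL if the allocation failed */
}

void counter_add(Counter* c, int delta) {
    assert(c != NULL);
    c->value += delta;
    c->updates++;
}

int counter_value(const Counter* c) {
    assert(c != NULL);
    return c->value;
}

void counter_free(Counter* c) {
    free(c);        /* free(NULL) is a no-op, so NULL is fine here */
}
```

Code that sees only the header half can create, update, read, and free a Counter, but an expression like c->value or sizeof(Counter) would not compile there, which is exactly the encapsulation the three benefits rely on.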

Some libraries go one step further and expose the handle as a void*:

// Even more opaque:
typedef void* DBHandle;

DBHandle db_open(const char* path);

This is a weaker pattern. You've thrown away the name, so now a client could accidentally pass a DBHandle to a function expecting some other void*-typed handle, and the compiler would let them. The typed-but-opaque version (Database*) is strictly better: you get all the encapsulation benefits, plus the compiler can still catch type confusion at call sites.

[Interactive demo: what each side of the library sees, public header versus private .c file.]

Counting a fully visible struct definition as a third option, the three approaches trade off along two axes: how much the caller knows about the type name, and how much the caller knows about the layout. A fully visible struct exposes both, a void* handle hides both, and the opaque typedef sits in between: it keeps the type name useful (catches bugs) while keeping the layout private (enables evolution). That's why it's the standard recipe.

The hazards

Everything void* gives you, it gives by removing type checking. That means every class of bug that the type system would have caught for you becomes a runtime bug. Four hazards worth learning to watch for:

1. Wrong-type recovery

double x = 3.14;
void* p = &x;
int* q = p;             // compiler doesn't complain in C
printf("%d\n", *q);    // reads the first 4 bytes of a double as int: nonsense

No compiler warning. No runtime check. Just garbage. This is also a strict-aliasing violation (from Part 8), so it's UB on top of being wrong. The typed-pointer-to-typed-pointer version (int* q = &x;) would have been flagged at compile time. The void* version waits until production.
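If the goal really is to look at the raw bits of a double, copying the bytes with memcpy is the well-defined route, because it never reads the double through a pointer of the wrong type. The sketch below assumes a 64-bit double; the function name double_bits is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern of a double.
   Assumes double is 64 bits wide, which holds on mainstream platforms. */
uint64_t double_bits(double d) {
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);   /* copies bytes; no aliasing violation */
    return bits;
}
```

On a platform that uses IEEE-754 doubles, double_bits(1.5) comes back as 0x3FF8000000000000: the encoding of 1.5, not the integer 1 or 2.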

2. Lifetime bugs through callbacks

void schedule_later(void) {
    struct Stats s = {0, 0};
    event_loop_register(count_line, &s);
    // returns here, s goes out of scope
}
// Later, the event loop calls count_line with a pointer to a dead stack slot. UB.

The void* hid the fact that &s was a stack pointer with a lifetime shorter than the callback's registration. Because the callback signature is generic (void*), nothing in the type system pushed back on this mistake. Callback-with-user-pointer APIs have to document lifetime contracts carefully, because the compiler can't help.

3. Pointer arithmetic on void*

void* p = malloc(100);
p += 10;       // ERROR in standard C (no size to advance by)
                // In GCC/Clang, this is allowed as an extension and treats void as 1 byte.
                // Portable code should cast to char* first.

Pointer arithmetic needs to know the stride, and void has none. Standard C forbids it; GCC and Clang allow it as an extension and pretend void is one byte. Don't rely on this; always cast to char* (or uint8_t*) when you want byte-level arithmetic.
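The portable version is short enough to wrap in a helper. This is a sketch with made-up names (byte_offset, element_at); the only technique it relies on is converting to unsigned char* before doing the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address `offset` bytes past `base`.
   The arithmetic happens on unsigned char*, whose stride is exactly 1. */
void* byte_offset(void* base, size_t offset) {
    assert(base != NULL);
    return (unsigned char*)base + offset;
}

/* Returns a pointer to element `index` of a type-erased array whose
   elements are `size` bytes each. The caller must keep index in range. */
void* element_at(void* base, size_t index, size_t size) {
    return byte_offset(base, index * size);
}
```

For an int array arr, element_at(arr, 2, sizeof arr[0]) yields the same address as &arr[2], and the result converts back to int* at the call site.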

4. void* doesn't work with typeof or sizeof the way you'd hope

void* p = malloc(100);
sizeof(*p);           // ERROR: can't take sizeof(void)
sizeof(p);            // fine: sizeof a pointer (usually 8 on 64-bit)

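This seems obvious but trips people up in generic code. (GCC and Clang accept sizeof(void) as an extension and evaluate it to 1, which tells you nothing about the pointee either.) Anything you want to know about the pointee's size has to be passed in separately, or derived from context. It's why qsort takes a size parameter.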
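A generic swap makes the point concrete: the function body has no way to discover how many bytes to move, so the caller has to say. This is a minimal sketch; swap_any and its 64-byte scratch buffer are choices made for the example, not a standard routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Swaps two objects of `size` bytes each. The objects must either be
   the same object or not overlap at all. The size has to come from
   the caller because sizeof(*a) is meaningless for a void*. Works in
   fixed-size chunks so it needs no heap allocation. */
void swap_any(void* a, void* b, size_t size) {
    unsigned char* pa = a;
    unsigned char* pb = b;
    unsigned char tmp[64];
    assert(a != NULL && b != NULL);
    if (a == b) {
        return;                      /* swapping with itself: nothing to do */
    }
    while (size > 0) {
        size_t n = size < sizeof tmp ? size : sizeof tmp;
        memcpy(tmp, pa, n);
        memcpy(pa, pb, n);
        memcpy(pb, tmp, n);
        pa += n;
        pb += n;
        size -= n;
    }
}
```

A call such as swap_any(&x, &y, sizeof x) works for any object type, and passing the wrong size is exactly the kind of mistake the compiler can no longer catch.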

Every feature void* lacks (dereference, arithmetic, sizeof) corresponds to a piece of knowledge the type system doesn't have. You have to supply it some other way.

Patterns for using void* well

Four rules that keep the chaos contained:

Recover the type immediately, at the edge. When a callback receives a void*, the first line of code should convert it to a typed pointer. Don't pass the void* deeper into the function; convert it once, then work in the typed world.

// Good:
void callback(void* user) {
    struct Context* ctx = user;   // immediate recovery
    do_work(ctx);                      // typed from here on
}

Pair void* with a discriminator if you need real type-safety. If a generic container needs to remember what's in it, store a type tag alongside the pointer:

enum Kind { KIND_INT, KIND_DOUBLE, KIND_STRING };

struct Any {
    enum Kind kind;
    void*     data;
};

void print_any(struct Any a) {
    switch (a.kind) {
        case KIND_INT:    printf("%d\n", *(int*)a.data); break;
        case KIND_DOUBLE: printf("%f\n", *(double*)a.data); break;
        case KIND_STRING: printf("%s\n", (char*)a.data); break;
    }
}

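The enum tag restores enough information for the code to check at runtime what it's looking at. This is the idea behind a tagged union (or discriminated union), here with a void* standing in for the union. It's not as good as a real sum type (Rust's enum, ML-style variant types), because nothing forces the tag and the pointee to agree, but it's what you have in C.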
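When the set of payload types is small and fixed, the same tag can guard an actual union instead of a void*, which stores the value inline and removes the separate pointee whose lifetime would otherwise need tracking. The sketch below uses hypothetical names (struct Value, value_format) and formats into a buffer rather than printing.

```c
#include <assert.h>
#include <stdio.h>

enum ValueKind { VALUE_INT, VALUE_DOUBLE, VALUE_STRING };

/* Tag plus inline payload. Only the member named by `kind` is valid. */
struct Value {
    enum ValueKind kind;
    union {
        int i;
        double d;
        const char* s;   /* not owned: must outlive the Value */
    } as;
};

/* Constructors set the tag and the payload together, so the two
   cannot disagree as long as callers go through them. */
struct Value value_int(int i) {
    struct Value v;
    v.kind = VALUE_INT;
    v.as.i = i;
    return v;
}

struct Value value_double(double d) {
    struct Value v;
    v.kind = VALUE_DOUBLE;
    v.as.d = d;
    return v;
}

struct Value value_string(const char* s) {
    struct Value v;
    v.kind = VALUE_STRING;
    v.as.s = s;
    return v;
}

/* Formats v into buf and returns what snprintf returns, or -1 if the
   tag is not one of the known kinds. */
int value_format(struct Value v, char* buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    switch (v.kind) {
        case VALUE_INT:    return snprintf(buf, cap, "%d", v.as.i);
        case VALUE_DOUBLE: return snprintf(buf, cap, "%.2f", v.as.d);
        case VALUE_STRING: return snprintf(buf, cap, "%s", v.as.s);
    }
    return -1;
}
```

The trade-off is flexibility: a void* payload can point at anything, while the union can only hold the members it was declared with.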

Document the lifetime contract of every void*. Who owns the pointee? Who frees it? How long does it live? When a library accepts a void* user_data, the documentation must answer all three questions. "Caller retains ownership; pointer must remain valid until unregister() is called" is the kind of sentence that saves weekends.
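One way some C APIs make the ownership answer explicit is to accept a release function alongside the user pointer, so the contract is visible in the signature as well as the docs. The sketch below is a hypothetical single-slot handler, not any particular library's API.

```c
#include <assert.h>
#include <stdlib.h>   /* NULL, and free() as a ready-made destroy function */

/* Hypothetical registration record: the library stores the callback,
   the user pointer, and a function that knows how to release it. */
struct Handler {
    void (*on_event)(int event, void* user);
    void* user;
    void (*destroy)(void* user);   /* may be NULL: caller keeps ownership */
};

/* Contract: if destroy is non-NULL, the handler owns `user` from now on
   and releases it in handler_clear(). Otherwise `user` must remain
   valid until handler_clear() has been called. */
void handler_set(struct Handler* h,
                 void (*on_event)(int event, void* user),
                 void* user,
                 void (*destroy)(void* user)) {
    assert(h != NULL && on_event != NULL);
    h->on_event = on_event;
    h->user = user;
    h->destroy = destroy;
}

/* Invokes the registered callback with the stored user pointer. */
void handler_fire(struct Handler* h, int event) {
    assert(h != NULL && h->on_event != NULL);
    h->on_event(event, h->user);
}

/* Releases the user data if the handler owns it, then resets the slot. */
void handler_clear(struct Handler* h) {
    assert(h != NULL);
    if (h->destroy != NULL) {
        h->destroy(h->user);
    }
    h->on_event = NULL;
    h->user = NULL;
    h->destroy = NULL;
}

/* Example callback: adds each event value to a running total. */
void add_to_total(int event, void* user) {
    int* total = user;
    *total += event;
}
```

Registering a heap-allocated int with free as the destroy function hands ownership to the handler; registering the address of a longer-lived variable with NULL keeps ownership with the caller.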

Prefer typed-but-opaque over void* for library handles. As we saw, struct Database* (opaque typedef) is strictly better than void* for handle-style APIs. The compiler still gets to catch "wrong handle passed to wrong function."

A brief C++ note

C++ has void* too, with the same mechanics, but the idioms are very different. Where a C programmer reaches for void*, a C++ programmer usually reaches for one of:

Templates. For the qsort case, C++ uses std::sort, which is templated on the iterator type. The compiler generates a specialised sort for each element type and does full type-checking along the way. No void*, no size parameter, no comparator taking const void*.
std::function and lambda captures. For the callback case, C++ callbacks typically take a std::function<void(Args)>, and user context gets bundled into the lambda's capture. No user pointer, no manual type-recovery.
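std::any and std::variant (C++17). For the "any type" case, these are type-safe stand-ins for the tagged union pattern. std::variant is the C++ version of a sum type: its set of alternatives is fixed at compile time, and std::visit won't compile unless every alternative is handled. std::any defers the check to runtime, where a mismatched std::any_cast fails loudly instead of handing back garbage.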

You still see void* in C++ when calling into C APIs (OS calls, legacy libraries, FFI boundaries), and sometimes in highly tuned code that wants type erasure without template bloat. But the default posture in modern C++ is "use templates or std::variant instead." If you find yourself writing raw void* in new C++ code, it's worth asking whether one of these alternatives would fit.

Opaque pointers, interestingly, are still useful in C++. The PIMPL idiom ("pointer to implementation") is exactly the opaque-pointer pattern we described, adapted to classes. A public class holds a std::unique_ptr<Impl>, where Impl is forward-declared in the header and defined in the .cpp. Same ABI-stability and dependency-isolation benefits as in C, dressed up in C++ clothes.

Summing up

A void* is an address without a type. You can't dereference it, can't do arithmetic on it, can't take sizeof the pointee. Its usefulness comes from being the universal pointer, implicitly convertible from (and, in C, to) any other object pointer type.

Four use cases earn their keep: malloc/free (the allocator genuinely doesn't know the type), qsort-style generic algorithms (one function, every element type, via void* plus a size), callback APIs with user context (type-agnostic APIs that still let callers bring their own data), and opaque pointers (library handles whose internals are hidden from callers). The last is usually better expressed as a typed incomplete type (struct Database*) than as a raw void*, because it keeps the compiler's type-checking helpful while still hiding layout.

The hazards are all about type-checking being turned off: wrong-type recovery gives garbage, lifetime bugs hide behind generic signatures, pointer arithmetic is either illegal or non-portable, and size information has to be carried out-of-band. The disciplines that keep these under control are: recover types at the edge, pair void* with discriminators when necessary, document lifetimes explicitly, and prefer opaque typedefs over raw void* for handles.

C++ has void* but mostly doesn't use it for new code; templates, std::function, and std::variant do the same jobs with full type-checking. Opaque pointers, rebranded as PIMPL, remain idiomatic in C++ too.

What's next

Part 10 moves us into C++ and the idea that ownership can be encoded in the type system. So far every pointer we've seen has been a raw address, with ownership invisible to the compiler. Smart pointers turn "who owns this, and when does it die?" from a comment into a type. It's the tool that takes the edge off most of the memory-management pain of this whole series.

You've now seen every variety of raw pointer C has to offer. Next: pointers that know who owns them, and why that changes everything.

Test yourself

Seven questions on void*, its canonical uses, its pitfalls, and the patterns that keep it manageable. Five correct means you're ready for Part 10.

Which of these statements about void* are true?

True: A, C, E. False: B, D.
A is true. In C, any object-pointer type converts to void* without a cast. (In C++ this is also true going to void*.)
B is false. Dereferencing a void* is a compile error, not a runtime byte-read. There's no type to dereference to.
C is true. Standard C has no stride for void, so pointer arithmetic is not defined. GCC and Clang allow it as a non-portable extension that treats void as 1 byte.
D is false. C++ requires an explicit cast (static_cast<T*> or C-style) to convert from void* back to a typed pointer. This is a deliberate difference from C.
E is true. void is an incomplete type with no size, so sizeof(void) is not allowed. This is why generic routines like qsort take a separate size parameter.

You're calling qsort on an array of doubles. Write a correct comparator function.

// Starting point, fill in the body:
int cmp_double(const void* a, const void* b) {
    // TODO
}

Recover types at the edge, then compare.

int cmp_double(const void* a, const void* b) {
    const double* da = a;
    const double* db = b;
    if (*da < *db) return -1;
    if (*da > *db) return 1;
    return 0;
}

Why not return *da - *db;? Because that returns a double, which gets truncated when cast to int. Small differences round to zero, giving wrong "equal" verdicts. The integer-subtraction trick (common for ints) doesn't work for doubles at all.

Why not return (*da > *db) - (*da < *db);? You can: for doubles it behaves exactly like the explicit if version above, so the choice between them is a matter of taste. Neither form handles NaN. Every ordered comparison involving a NaN is false, so both versions return 0 and NaN "equals" everything, which breaks the consistent ordering qsort requires and can leave the array in garbage order. For production code, decide up front how you want to handle NaN (filter them out, treat as always-greatest, etc.) and document it.
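As one concrete way to make that decision, here is a sketch of the "treat NaN as always-greatest" policy. The function names are made up; the policy itself is just one of the options listed above.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparator for qsort: ascending order, with every NaN sorted after
   every number and all NaNs treated as equal to each other. */
int cmp_double_nan_last(const void* a, const void* b) {
    double da, db;
    int a_nan, b_nan;
    assert(a != NULL && b != NULL);
    da = *(const double*)a;
    db = *(const double*)b;
    a_nan = isnan(da) != 0;
    b_nan = isnan(db) != 0;
    if (a_nan || b_nan) {
        return a_nan - b_nan;        /* NaN counts as "greater" */
    }
    return (da > db) - (da < db);
}

/* Sorts v in place using the NaN-last policy. */
void sort_doubles_nan_last(double* v, size_t n) {
    qsort(v, n, sizeof v[0], cmp_double_nan_last);
}
```

Because NaNs are handled before any ordered comparison runs, the comparator gives qsort a consistent ordering: numbers in ascending order first, then every NaN grouped at the end.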

A colleague writes this event-loop registration:

void increment_callback(void* user) {
    int* c = user;
    (*c)++;
}

void schedule(void) {
    int counter = 0;
    event_loop_on_tick(increment_callback, &counter);
}

What's wrong, and what does "fix it" actually look like?

The lifetime of counter ends at the } of schedule(), but the callback keeps pointing at it. &counter is a pointer into schedule's stack frame. When schedule returns, that stack memory gets reused. Later, when the event loop calls increment_callback, user points at whatever garbage occupies that slot now. Classic dangling pointer (use after return), hidden by the void*.

The fix depends on who should own the counter:

// Option 1: heap-allocate it, transfer ownership to the event loop
int* counter = malloc(sizeof(int));
*counter = 0;
event_loop_on_tick(increment_callback, counter);
// callback or the event loop is responsible for free()

// Option 2: use a static or global
static int counter = 0;
event_loop_on_tick(increment_callback, &counter);
// counter lives forever, but now it's shared state

// Option 3: make schedule() not return until the callback is done

This is the class of bug that C's type system cannot prevent. The void* callback signature is a handshake that says "I'll take anything," and "anything" includes pointers that are about to die. Good callback-API documentation includes a prominent note about the lifetime of user_data; good callers match the lifetime to the registration.

Which approach to library handles is best, and why?

// (a) Transparent:
struct Database { int fd; char buf[4096]; /* ... */ };
struct Database* db_open(const char* path);

// (b) Opaque typedef:
typedef struct Database Database;    // declaration only, no fields
Database* db_open(const char* path);

// (c) void* handle:
typedef void* DBHandle;
DBHandle db_open(const char* path);

B. Opaque typedef is the sweet spot.
(a) is worst. Exposing the layout couples every client to the struct. Adding a field breaks ABI; reordering fields breaks it. Clients can (and will) reach in and touch internals, bypassing your API. And the header has to pull in every type the struct uses, bloating compile times.
(b) is best. Callers have a distinct type name, so the compiler catches "passed a FILE* where a Database* was expected." But because the struct has no visible fields, callers can't do db->fd, can't take sizeof(Database), and the library is free to change internals at will. Perfect encapsulation.
(c) is worst-of-both. You lose the distinct type name (now any void* from anywhere can be passed as a DBHandle) without gaining anything in return. The layout is still hidden, but layout was already hidden in (b). Avoid this unless you're genuinely writing a type-erased API (e.g., for FFI or multi-language compatibility).

You're designing a logging library. The log_set_sink function accepts a function to call for every log event. Design the signature so that users can attach their own state without needing a global. Explain your design choices.

// Starting point:
void log_set_sink(void (*fn)(const char* msg));

Add a void* context pointer, passed through to the callback.

void log_set_sink(void (*fn)(const char* msg, void* ctx),
                  void* ctx);

Why: the original signature forces every user of the library to write their sink against global state, because the callback has no way to reach back to the user's own data. Adding a void* context parameter (and threading it through to the callback) means each user can bundle whatever state they need (a file handle, a counter, a buffer, a lock) into a pointer and have it handed back on every call.

Two things the documentation must also cover:

Lifetime: "The pointer ctx must remain valid until log_set_sink is called again with a different value, or until the program exits." Without this, users will pass stack pointers that die.
Thread safety: "The sink function may be called concurrently from any thread; ctx access inside the sink must be synchronised if the library is used from multiple threads."

These are the things the type system can't express but the API has to enforce socially. The void* is the mechanism; the docs are the seat belt.
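For completeness, here is roughly what the library side of that signature could look like, together with a user-side sink. Everything beyond log_set_sink's signature (log_message, struct Tally, the static storage) is an assumption for illustration, and the sketch ignores thread safety entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Library side (sketch): one sink and its context, stored together. */
static void (*g_sink)(const char* msg, void* ctx) = NULL;
static void* g_sink_ctx = NULL;

/* Installs fn as the sink; ctx is stored and handed back on every call.
   Passing NULL for fn disables logging. */
void log_set_sink(void (*fn)(const char* msg, void* ctx), void* ctx) {
    g_sink = fn;
    g_sink_ctx = ctx;
}

/* Hypothetical entry point the rest of the program would call. */
void log_message(const char* msg) {
    assert(msg != NULL);
    if (g_sink != NULL) {
        g_sink(msg, g_sink_ctx);   /* hand the caller's pointer straight back */
    }
}

/* User side: a sink that tallies messages without user-side globals. */
struct Tally {
    int messages;
    size_t bytes;
};

void tally_sink(const char* msg, void* ctx) {
    struct Tally* t = ctx;         /* recover the type at the edge */
    t->messages++;
    t->bytes += strlen(msg);
}
```

The library stores the pointer without ever dereferencing it, so the only code that needs to know about struct Tally is the code that registered it.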

Given the following code, predict what happens and explain why.

double d = 1.5;
void* p = &d;
int* q = p;          // void* converts to int* (implicit in C)
printf("%d\n", *q); // what gets printed?

C. The cast compiles, the read is UB, the output is whatever.
A is wrong. There's no conversion happening here. *q reads 4 bytes from &d and interprets them as an int's bit pattern. That bit pattern is part of the IEEE-754 encoding of 1.5 (0x3FF8000000000000), not an integer representation of it. With a 4-byte int, a little-endian machine would typically read the all-zero low half and print 0, while a big-endian machine would read 0x3FF80000 and print 1073217536.
B is wrong. In C the conversion is implicit and the compiler won't complain. In C++ an explicit cast is needed but still compiles. No compile error in either language.
C is correct. Reading a double through an int* violates the strict aliasing rule (Part 8). Under UB, the compiler may reorder, cache, or optimise in ways that make the result unpredictable. Even if by coincidence the read returned the "right" bits, the behaviour is undefined and the compiler is not required to preserve it at higher optimisation levels.
D is unlikely. A double is 8-byte aligned on most platforms, which is also valid alignment for int. An alignment crash would be a more exotic concern than the aliasing UB.

This is a void*-enabled version of exactly the strict-aliasing violation from Part 8. The void* is the accessory; strict aliasing is the rule being broken.

You want to write a generic map function that applies a user-supplied transformation to every element of an array. Any element type. Write the function signature, then write the function body. Explain your parameter choices.

Same recipe as qsort: void* plus element size plus callback.

void map(void* base,
         size_t nmemb,
         size_t size,
         void (*transform)(void* elem, void* user),
         void* user) {
    char* p = base;                // char* so we can do byte arithmetic
    for (size_t i = 0; i < nmemb; i++) {
        transform(p + i * size, user);
    }
}

Parameter-by-parameter:
base is void* because the array element type isn't known to map.
nmemb is the number of elements.
size is the byte-width of one element; needed because void* arithmetic isn't standard, and it's how we step through the array.
transform is the callback that actually knows the element type: it takes void* (recovered inside) and a user context.
user is the user context, carried through like in the callback pattern.

Note the conversion to char* inside. Pointer arithmetic on void* isn't portable, so we convert to char* (whose stride is always one byte) and multiply by size manually. This is the idiomatic way to step through a type-erased array.

Sample call:

void double_it(void* elem, void* user) {
    (void)user;                 // unused here
    int* x = elem;
    *x *= 2;
}

int arr[] = {1, 2, 3, 4, 5};
map(arr, 5, sizeof(int), double_it, NULL);

How did you do?

5 or more correct, you're ready for Part 10. Fewer than that, the common trouble spots are Q4 (the opaque typedef sweet spot) and Q7 (stepping through a type-erased array). Re-read "Opaque pointers" and "Patterns for using void* well", then try again.
