Programming in C: Enumerations, Bit-Fields, and Unions

This article is part of a series – How to Program Anything: C Programming

Preface

So far we’ve covered basic data types and aggregate data types such as structures, arrays, pointers, and such. There are three more, which I call miscellaneous data types, to deal with in C: enumerations, bit-fields, and unions. The latter two are somewhat esoteric and aren’t used much except when dealing with hardware, implementing compilers, and the like. An enumeration is a data type that maps a set of names to integer constants. A bit-field lets you give names to groups of bits packed into one or more bytes, but its layout in memory depends on the compiler. A union lets you access the same space in memory as different data types, so for example you might access the same address either as a two-element char array or as one short int.

Enumerations

Enumerations are a fairly simple concept. An enumeration is a sequence of integer constants where each integer is assigned a name. That is, I might use the name “one” for 1, “deux” for 2, “thrice” for 3, etc. The names are not confined to the data type: a variable can be declared with the enumeration as its type so that you can assign the enumeration’s names to it, but you can also use the names on their own wherever you need their integer values. Enumerations are defined very much like structures, but with their own idiosyncrasies. A code snippet may help to illustrate:
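A minimal sketch of such an enumeration. The names months, myMonth, and may follow the discussion in this section; the wrapper function demoMonths is a hypothetical name of my own, there only so the snippet can be called from a main() of your choosing.

```c
#include <stdio.h>

/* An enumeration of the months; the values start at 0 and count up by 1. */
enum months {
    january, february, march, april, may, june,
    july, august, september, october, november, december
};

/* demoMonths is a hypothetical wrapper, not part of the language. */
void demoMonths(void)
{
    enum months myMonth;            /* a variable whose type is the enumeration */

    myMonth = april;                /* assign one of the enumeration's names */
    printf("%d\n", (int)myMonth);   /* prints 3 */

    /* The names also work on their own, outside of any enum variable. */
    printf("%d\n", may);            /* prints 4 */
}
```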

In this code example, I’ve defined an enumeration of all the months of the year. Each month is an identifier that stands for an integer value. In fact, each name stands for an integer value 1 greater than the last. In this example january = 0, february = 1, march = 2, etc. In the code snippet I’ve shown how you can declare a variable meant to hold the values found in the enumeration by using enum as a data type, but you can also use the enumeration names “outside” of their data type scope. The last statement in this example “prints” the value of may to the standard output (the screen). We haven’t covered this type of functionality yet, but you can see I am able to use may outside of myMonth.

In this example, the names of the months are assigned to corresponding integers counting up from 0. It is possible to specify the value of an enumerator by assigning a value to its name inside the enum declaration. Each identifier that comes after that name is one greater than the identifier before it. An example of this follows:
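A short sketch of an enumeration with one explicit value. The enumerator names and the value 300 come from the discussion of this example; the tag places and the function printPlaces are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* third is set explicitly; fourth and fifth continue counting from it. */
enum places { first, second, third = 300, fourth, fifth };

/* printPlaces is a hypothetical helper that shows the resulting values. */
void printPlaces(void)
{
    printf("%d %d %d %d %d\n", first, second, third, fourth, fifth);
    /* prints: 0 1 300 301 302 */
}
```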

In this enum, first would equal 0 and second would be 1. However, when we get to third we set it to equal 300. This alters the values of fourth and fifth, which now equal 301 and 302 respectively. Note that you can only assign integer constants to enum values.

From the above code snippet you can see the generic format of an enum declaration goes as follows:
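Roughly, the general shape is the keyword enum, an optional tag, a brace-enclosed list of names, and an optional list of variables. The sketch below spells that out in a comment and then gives one concrete instance; color, red, green, blue, and favoriteColor are hypothetical names used only for illustration.

```c
/*
 * General shape (the lowercase words are placeholders, not literal C):
 *
 *     enum tag { name1, name2 = constant, name3 } variable_list;
 *
 * The tag and the variable list are each optional.
 */

/* A concrete instance: a tag, three enumerators, and one variable declared at once. */
enum color { red, green = 10, blue } favoriteColor = blue;
```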

Enumerations are handy when you want a list of identifiers whose values don’t collide with each other. Say you want to refer to a large swath of items not by their values, but by their identifiers. Enums would serve well for this purpose. An example might be all the operation codes of a compiler. While writing the compiler you could refer to each op-code using a human-readable name, but to the computer it’d just be one value out of many. We could do the same thing with global variables and constants, but enums are a much more elegant solution, as they can be packaged up into a data type.

Enumerations Aren’t Strings

There is an important caveat when it comes to enumerations. The programmer must remember that an enumerator is simply an identifier for a number. In the context of the compiled or running program, the identifier has no significance. This means that you cannot expect to retrieve the identifiers themselves in program operations. For example, following our months enumeration, the following wouldn’t work:
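A sketch of the kind of mistake meant here, assuming the months enumeration and the myMonth variable from earlier. The two faulty lines are left inside a comment so the rest still compiles; showMonthValue is a hypothetical wrapper name.

```c
#include <stdio.h>

enum months {
    january, february, march, april, may, june,
    july, august, september, october, november, december
};

/* showMonthValue is a hypothetical wrapper so the snippet can be called from main(). */
int showMonthValue(void)
{
    enum months myMonth = april;

    /* Neither of these does what it appears to, so both stay commented out:
     *
     *     myMonth = "april";          a string is not an enumerator
     *     printf("%s\n", myMonth);    %s expects a string, myMonth is an integer
     */

    printf("%d\n", (int)myMonth);   /* prints 3, the integer that april stands for */

    return (int)myMonth;
}
```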

In the code myMonth is assigned the integer code that april stands for in the enumeration.  myMonth does not actually equal the identifier april.  Assigning myMonth as if it were a string is a type mismatch.  All enumerators are integers.  To actually map an enumeration value to a string counterpart, you’d have to either create a switch statement (we haven’t covered switch statements yet), which would test the enumeration to see if it was a particular value and then output a string, or you could create a look up table.

A look up table would consist of an array of strings that mimics the identifiers in the enumeration, in the same order. A look up table for our months enum might look like the following:
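One way such a table might be written, as a sketch. The array name monthNames is an assumption of mine; the strings simply repeat the enumerator names in declaration order.

```c
/* One string per enumerator, in the same order as the enum declaration. */
const char *monthNames[] = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december"
};
```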

This works because no identifier in the enumeration was initialized to anything special, such as 100. If there were special initialization, the values would no longer line up with the array indices, and special accommodations would have to be made. If we wanted to get the string version of an enumerator value in this scenario then, following our earlier months enumeration, we’d write:
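A minimal sketch of the lookup, assuming a table named monthNames (a hypothetical name) that lists the strings in the same order as the enum. The enumeration value is used directly as the array index; printMyMonth is likewise a made-up wrapper.

```c
#include <stdio.h>

enum months {
    january, february, march, april, may, june,
    july, august, september, october, november, december
};

/* Same order as the enum, so monthNames[april] is "april". */
const char *monthNames[] = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december"
};

/* printMyMonth is a hypothetical wrapper so the snippet can be called from main(). */
void printMyMonth(void)
{
    enum months myMonth = april;

    printf("%s\n", monthNames[myMonth]);   /* prints april */
}
```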

Unions

Sometimes you want to refer to the same piece of memory containing data in two different ways. In some instances, you wish to work with it as if it were a float, but in other instances you wish to use it as if it were an array of chars, accessing the same exact data in memory byte by byte. Perhaps you would do this to inspect or manipulate the float’s internal representation. Or perhaps you want a way to quickly break down a float into bytes you can then write to a file. Sometimes compilers use unions as catch-all values, allowing one union to act as many different types of data structures. These examples are where you would use a union.

A union is precisely what we just described. Let’s set up our float that’s interpretable as four chars below:
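A sketch of that union. The member names f and ch[4] follow the discussion; the tag floatBytes is a hypothetical name, and the array size assumes a float occupies four bytes, which is common but not guaranteed.

```c
/* Both members start at the same address and share the same storage. */
union floatBytes {
    float f;       /* the value viewed as a float */
    char  ch[4];   /* the same bytes viewed one at a time (assumes a 4-byte float) */
};
```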

As you can see, its construction is very much like the struct. In this example, the float f and the char ch[4] would take up the same space. As with a structure, if we wish to access the elements of a union we use the dot operator ( . ) or the arrow operator ( -> ), depending on whether we are going through a pointer or not. This will become clearer in code, so let’s set up a union variable and then access the various bytes of data using the dot ( . ) operator:
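A sketch of both kinds of access, assuming a 4-byte float. The tag floatBytes and the helpers floatToBytes and printFloatBytes are hypothetical names; the first writes through f and reads the same storage back through ch.

```c
#include <stdio.h>

union floatBytes {
    float f;
    char  ch[4];   /* assumes sizeof(float) == 4 */
};

/* Hypothetical helper: store a float, then copy its bytes out one by one. */
void floatToBytes(float input, char out[4])
{
    union floatBytes value;          /* a union variable */
    union floatBytes *p = &value;    /* and a pointer to it */
    int i;

    value.f = input;                 /* dot operator on the variable */

    for (i = 0; i < 4; i++) {
        out[i] = p->ch[i];           /* arrow operator through the pointer */
    }
}

/* Print the four bytes in hex; their order depends on the machine's byte order. */
void printFloatBytes(float input)
{
    char bytes[4];
    int i;

    floatToBytes(input, bytes);
    for (i = 0; i < 4; i++) {
        printf("%02x ", (unsigned int)(unsigned char)bytes[i]);
    }
    printf("\n");
}
```

Calling printFloatBytes(1.0f) would show the raw bytes the machine uses to store 1.0.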

When a union variable is declared, the compiler makes enough space in memory for the largest member of the given union, in this case the float. If we had a member of a union that was 24 bytes large, say a character array, then anything smaller than that size would map onto THAT memory allocation. That being said, the memory footprint of any given union variable is set by the largest member declared in the union (the compiler may add a little padding for alignment). Unions are useful when you want to use the same memory space as, say, an array of short ints, but then need to do more careful work byte by byte, so you would access it as a series of chars instead.
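A small sketch to check this with sizeof. The union scratch and the function scratchSize are hypothetical names; the 24-byte array plays the role of the large member described above.

```c
#include <stddef.h>

/* Three members of different sizes sharing one block of memory. */
union scratch {
    char  text[24];   /* the largest member sets the size */
    float f;
    short s;
};

/* Hypothetical helper: reports how much memory one union scratch takes. */
size_t scratchSize(void)
{
    return sizeof(union scratch);   /* at least 24 */
}
```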

Bit-Fields

Because C is a middle-level language, some of its features are geared towards setting and retrieving exact bits. We’ll study bitwise operators in more depth in our C expressions articles, but for now we will focus on a built-in feature that lets programmers access specific bits in data by name. This is called a bit-field, and its support differs somewhat depending on your computing environment and compiler. That being said, bit-fields are handy when you want to reference a particular bit or group of bits by name. Hardware often hands back compacted information, where different pieces are encoded into one or several bits of a single byte. Or perhaps you’ve set up what are known as flags (on or off values), packed them into one byte rather than keeping a variable for each state, and want a handy way to access each one.

Bit-fields must be declared inside structures (see our previous article) or unions; they cannot exist on their own. Because of this we’ll cover the basic generic form of a bit-field, er, field as below:
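In rough terms, a bit-field member is a type, a name, a colon, and a width in bits. The sketch below shows that shape in a comment, followed by a small hypothetical structure (example, ready, and level are names of my own choosing).

```c
/*
 * General shape of a bit-field member (placeholders, not literal C):
 *
 *     type name : width;
 */
struct example {
    unsigned int ready : 1;   /* 1 bit wide: holds 0 or 1 */
    unsigned int level : 3;   /* 3 bits wide: holds 0 through 7 */
};
```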

Bit-fields can only have three basic types (or four if you’re using C99). The allowed types are int, signed int, and unsigned int (and _Bool if using C99). To see how bit-fields might be used, imagine a piece of hardware that returns a byte of information in which each bit means something. To borrow from Wikipedia, suppose we had the status register of a 6502 processor (8 bit). Every time you read this status register you’d get the following information from each bit:

  • Bit 7. Negative flag
  • Bit 6. Overflow flag
  • Bit 5. Unused
  • Bit 4. Break flag
  • Bit 3. Decimal flag
  • Bit 2. Interrupt-disable flag
  • Bit 1. Zero flag
  • Bit 0. Carry flag

Thus, I could create a structure using bit-fields to correspond to this status register.  This would allow me to specify a particular bit in the register by a human-readable name:
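A sketch of such a structure, with the fields listed from the high end of the register to the low end. Only overflowFlag is a name taken from the discussion; the tag statusRegister and the other field names are assumptions of mine. Which physical bit each field lands on is up to the compiler.

```c
/* One 1-bit field per flag in the status register. */
struct statusRegister {
    unsigned int negativeFlag  : 1;
    unsigned int overflowFlag  : 1;
    unsigned int               : 1;   /* unused bit: no name, just skipped */
    unsigned int breakFlag     : 1;
    unsigned int decimalFlag   : 1;
    unsigned int interruptFlag : 1;   /* interrupt-disable flag */
    unsigned int zeroFlag      : 1;
    unsigned int carryFlag     : 1;
};
```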

You’ll notice a peculiar-looking definition below the overflowFlag field: a bit-field with a width but no name. That bit is unused, so we instruct the compiler to ‘skip’ that number of bits. There is no reason to give it an identifier if we are never going to use it, so we leave the name out and simply move over it.

To access the value of a bit-field we treat it like any other structure field, using the dot ( . ) operator, or the arrow ( -> ) operator if we are going through a pointer. We can actually not only retrieve bit-fields but assign to them as well. They accept integer values up to the limit set by the number of bits that make them up. For example, an unsigned 3-bit bit-field can hold values up to 7, while an unsigned 4-bit bit-field can hold values up to 15.
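A short sketch of reading and writing bit-fields through both operators. The structure nibbles, its fields, and the function demoBitFieldAccess are hypothetical names used only for illustration.

```c
struct nibbles {
    unsigned int small : 3;   /* 3 bits: 0 through 7 */
    unsigned int large : 4;   /* 4 bits: 0 through 15 */
};

/* Hypothetical helper: assign the largest value each field can hold, then read both back. */
int demoBitFieldAccess(void)
{
    struct nibbles n;
    struct nibbles *p = &n;

    n.small  = 7;                /* dot operator on the variable */
    p->large = 15;               /* arrow operator through the pointer */

    return n.small + p->large;   /* 7 + 15 = 22 */
}
```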

It is possible to mix bit-fields with normal structure elements. From our structure article we’ll extend our example. In that article I cobbled up an employee structure that held various information about employees. Suppose I wanted to track whether they were currently working or suspended, and furthermore whether they were hourly employees or salaried. I could take up two whole new bytes with that information, assigning each to a char. However, with bit-fields I can pack those two one-bit “flags” together, so that both fit within a single byte:
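A sketch of what that might look like. The fields name and payRate are placeholders of mine standing in for whatever the employee structure from the previous article holds, and isWorking and isHourly are likewise hypothetical names; how much storage the compiler actually sets aside for the two flags is implementation-defined.

```c
struct employee {
    char  name[50];              /* placeholder for the ordinary fields */
    float payRate;               /* placeholder for the ordinary fields */

    unsigned int isWorking : 1;  /* 1 = currently working, 0 = suspended */
    unsigned int isHourly  : 1;  /* 1 = hourly, 0 = salaried */
};
```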

NOTE: Bit-fields have restrictions! You cannot make an array of bit-fields, and you cannot take the address of a bit-field, because a bit-field can be smaller than the smallest addressable unit: a byte. Plus, depending on the machine the program is running on, the fields may run from right to left or from left to right. So if you write the information to a file, you cannot be sure that reading it on a different machine will reproduce the data faithfully. Thus, when you use bit-fields you may be introducing machine-specific dependencies into your program.

Conclusion

Structures, as studied in the previous article, are very handy but they aren’t the only advanced data type in town. Enumerations allow us to specify identifiers for integer values without having to write out a long list of global variables or constants. For example, in this article we created an enumeration that numbered all the months of the year. We also covered unions, which are a bit more esoteric, but useful anyway. Sometimes we need to access an array of long ints as individual char bytes, and with a union you can do that! Lastly, we rounded things off with bit-fields: a C built-in that allows us to identify and work with specific bits by name. In all of these data types, including structures, the key is that we’re setting up ways to identify very specific pieces of memory by name. If we couldn’t shorten some of these arrangements to single identifier names, our code would be repeating itself over and over as we kept addressing the same pieces in long-hand again and again.

With basic and advanced data types under our belt, we can now start to look at what to do with all this data.  The first step on that road is a sojourn through expressions and the operators that make them up.  We will cover arithmetic operators, bitwise operators, relational and logic operators, and program operators (as I refer to them).  I hope I was able to illuminate something of advanced data types in the C language.  Thanks for reading!


photo credit: Kyle McDonald small-enumeration via photopin (license)

kadar

I'm just a wunk, trying to enjoy life. I am a cofounder of http//originalpursuitssoc.com/ and I like computers, code, creativity, and friends.
