Programming In C: Basic Data Types

This article is part of a series – How to Program Anything: C Programming

Preface

There are times when we want to store a piece of information, be it a number, or a name, or a memory address, somewhere within the computer where we can go back later and retrieve it. This is usually accomplished with a variable, as discussed in my Programming Language Crash Course. A variable serves as a sort of memory for the program once it is defined and used in a piece of code. For example, I can define a variable called ashersAge and assign it the number 34. Later on in the program, when I reference ashersAge, I get the value I assigned it, which is 34.

This is all well and good in some more dynamic programming languages, like Python or PHP, where a variable can be assigned any type of data. However, in C we are working closer to the processor, as a sort of step up from straight assembly, which requires us to be more specific and definite about what our variables can hold. In C, for instance, every type of variable is essentially a number; even strings are just series of numbers that end with a zero value. Constraining our variables in this way is known as data typing.

Data typing, something I also covered in the crash course, is where you declare what kind of data, or how much data, a variable can hold. Typing a variable specifies that variable’s limits and constraints in terms of how it can be used in the program, such as what can be assigned to it and what values it can hold.

The Basic Data Types

There are multiple versions of C, each building on the last, but the basic C89 standard (as it’s called) defines five basic data type keywords that can be used in a variable definition. C99, a later standard of C, actually adds three more: _Bool, _Complex, and _Imaginary, and we’ll cover those at the end.

The five basic data types that can be indicated in a variable declaration are: char, int, float, double, and void. Void is not generally used in variable declarations outside of defining pointers, something this post series has yet to get to, or indicating that a function doesn’t return a value, so for the most part we’ll skip it for now.

char is the most basic and usually the most “stable” type declaration. What I mean by that is that a char is always exactly one byte, and on virtually every platform you are likely to code for, be it 16-, 32-, or 64-bit, that byte is 8 bits of memory. (Strictly speaking, the standard only guarantees that a char has at least 8 bits.) If you are unfamiliar with what I mean by bits you might consider looking at my article series “Wonderful World of Binary”. It helps explain what I mean by bits, in that one bit is one place value in a binary number. The char data type is actually short for “character” and in this sense it lives up to its name. Before Unicode came along, most text on western computers was stored in ASCII format, where one byte equaled one letter in our alphabet or so. Thus, one western character was one byte. In fact, an array of chars is still what’s usually constructed these days when we want to store a string of characters, otherwise known as just a string.

The reason I said that the char data type is the most stable is that, when it comes to the other data types, you aren’t guaranteed the exact number of bits each is going to take up on a specific architecture. When I say architecture here I am referring to the combination of processor, device, and operating system. Some processors are 16-bit based, others 32- or 64-bit, and thus handle different sizes of data differently.

This is why, when we get to the int data type (short for integer), the size may vary. An int was traditionally the same as the “word length” of the particular platform being programmed for, that is, the natural size of data the processor handles at once. Don’t worry, “word length” is a more technical bit measurement term found in assembly programming. What this boils down to is that an int may be 16 bits, but it may also be 32 bits depending on the architecture you’re programming for (most 64-bit systems today still use a 32-bit int). The C89 standard specifies the minimal range of each data type, which we’ll get to, not how many bytes that range has to take up. This caveat about the int data type is important when you are writing “portable” code that is meant to run on more than one type of machine.

Likewise, floating-point numbers, or the float type for short, depend on how they are implemented in the given architecture you’re programming for. This data type is generally used to store numbers with decimal components (hence the “floating point”). Due to how they are organized they can hold very “large” values in their respective space, to varying degrees of accuracy. A double is typically twice the size of a normal float and holds its values with considerably more precision. If you need to store a really small or really large value, or a value with a decimal component to varying degrees of accuracy, you would use a float or double data type. I have outlined a table here that deals with these specific five data types, showing how many bits they typically take up and their minimal ranges according to the C89 standard:

Type      Size in Bits     Minimal Range
char      8 bits           -127 to 127
int       16 or 32 bits    -32,767 to 32,767
float     32 bits          1E-37 to 1E+37*
double    64 bits          1E-37 to 1E+37**

* with six digits precision
** with ten digits precision

Data Type Size Modifiers

Now, we complicate things. Of course we do! Most of the built-in data types given here can be modified with additional keywords to mean something more specific. These additional keywords are signed, unsigned, long, and short.

Not all modifier keywords can be applied to all data types. Let’s take char for a spin first. Char can be modified by signed and unsigned. These let you specify how the compiler handles a char or int when it comes to its sign, being positive or negative. For example, maybe we want a variable that is a char to go up to 255, rather than revert to a negative value when it maxes out its positive value. We can accomplish this through the use of the unsigned keyword.

Depending on how a char, basically a byte, is interpreted, it can be either a positive or negative number. This is usually accomplished by a two’s-complement implementation where the high-order bit indicates the sign of the number. If you are unfamiliar with two’s-complement you might look at my binary articles (the list posted above), specifically the Binary Negative Space article.

By prepending unsigned to the char declaration we are stating that we want the computer to treat the high-order bit as part of the value instead of as a sign, and simply continue counting up in a positive direction. This allows a char, for instance, to store a value up to 255, rather than just 127.

Long and short can be applied to the int data type, as can unsigned and signed. Short in essence specifies the smaller end of a given data type size, while long indicates the larger end. For example, if you want to use an int data type, but it equals 32 bits on your architecture, a short int will usually make it only 16 bits long instead of 32. Likewise, if int is only 16 bits on your architecture, the long modifier gives you a long int of at least 32 bits. In C99 the long modifier can be applied to the long modifier itself, so a long long int is at least 64 bits on any architecture that supports it. Long and short cannot be applied to a char data type.

It is possible to use these modifiers on their own, so that you only type signed instead of signed int.  The compiler will fill in the int part for you.  However, for code readability and future portability it’s best to include the int with the data type declaration.

I have put together another chart here for ease of understanding. Remember that this chart indicates the usual pattern of things, as they may be different on your target architecture. It’s always best to make sure, lest you get halfway through programming your masterpiece and realize your data sizes are all wrong. And of course, this table gives the minimum range of values as specified by the C89 standard. Your mileage may vary:

Type                      Size in Bits     Minimal Range
signed char               8 bits           -127 to 127
unsigned char             8 bits           0 to 255
signed int                16 or 32 bits    -32,767 to 32,767
unsigned int              16 or 32 bits    0 to 65,535
signed short int          16 bits          -32,767 to 32,767
unsigned short int        16 bits          0 to 65,535
signed long int           32 bits          -2,147,483,647 to 2,147,483,647
unsigned long int         32 bits          0 to 4,294,967,295
signed long long int*     64 bits          -(2^63 - 1) to 2^63 - 1
unsigned long long int*   64 bits          0 to 2^64 - 1
long double               80 bits          1E-37 to 1E+37**

* specified in C99
** with ten digits precision

C99 Data Types

The C99 standard adds the _Bool data type. It is capable of exactly what it sounds like, storing the values of 1 and 0. Presumably 1 is true, and 0 is false. _Bool is actually an integer type. To define the constants true and false, C99 utilizes a special header, <stdbool.h>, which defines bool, true, and false. Anyone familiar with C++, which is not covered in this article series, can see the incompatibilities this might have with C++.

C99 also adds support for complex arithmetic, being operations done over complex and imaginary numbers. This requires additional headers and new library functions, and it added the _Complex and _Imaginary keywords to the language. They are used alongside a floating-point type in a declaration, as in double _Complex.

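The header <complex.h> defines the macros complex and imaginary as friendlier names for these keywords, along with constants such as I for the imaginary unit. Because those names only exist when you include the header, existing code that already uses them for something else isn’t broken.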

Utilizing The Data Types

In C, as mentioned, you declare variables on which your code or instructions are going to operate.  A variable is made up of an “identifier”, or a string of characters that indicate to you its name.

NOTE: Identifiers have special rules to their construction in C, like many programming languages.  This will be covered more in depth later when discussing variables, but for now you should be aware that the first character must be a letter or underscore, and then the following characters must be either letters, digits, or underscores.

So when I declare the existence of a variable, such as ashersAge from above, I would define it in my code with something like this: unsigned int ashersAge;

“ashersAge” would be the identifier, and of course unsigned and int are the data types we just discussed. As you can see, the pattern for a variable declaration is any modifiers first, then the data type, then the identifier, ending with a semicolon.

This declaration becomes a bit important soon when we talk about pointers and references.

Conclusion

As you can see, with the modifiers attached to the basic type declarations we can have even more control over how many bits a given variable will be.  This is very important in C programming because we need to know the size of the data we’re handling, or pointing at, and what not, lest we accidentally overwrite some piece of memory because we think our storage area is bigger than it really is.

You’ll notice that all these data types specify a number, and that there isn’t any kind of data type for strings where you can form strings of words “like this.” This is because C is still a low-level enough language that it only deals in numbers for everything. When we want to specify a string we really end up building an array (a more advanced kind of data type) of chars, with each individual char holding a numerical value that we interpret as an individual letter. Remember, C is middle-level, being more abstract than the processor itself (assembly), but not being so abstract as to hide from us the details of handling things like strings as numerical arrays.

Do not worry if the number of data types seems small; you can do just about anything you can imagine by using these data types in many different ways. After all, programming is just a matter of pushing bits around in such a way that it makes sense to us. I hope I was able to illuminate and describe the basic C data types for you and how they might be used. Thanks for reading!


kadar

I'm just a wunk, trying to enjoy life. I am a cofounder of http//originalpursuitssoc.com/ and I like computers, code, creativity, and friends.
