Programming in C: Arrays and Strings

This article is part of a series – How to Program Anything: C Programming

Preface

In my previous article I discussed basic data types that can make up a given variable declaration.  This article takes on variable declaration from another angle.  Some variables can hold more than one value.  These values are specified in a sequence from beginning to end.  You’ve run into some of these when we’ve discussed “strings,” which are really a sequence of numerical values indicating letters.  Variables that can hold a sequence of values are commonly called arrays, and yes, C supports them.

An array is a sequence of individual variables that can be accessed under one identifier using an index.  Say I have an array of three chars,  char myArray[3] = {6, 7, 8}; .  I can access the first char by typing  myArray[0] , the second char by typing  myArray[1] , and so on.  Here myArray is the identifier, and the 0 or 1 is the index.  The first expression results in the value 6, and the second expression results in the value 7.  Array indexing starts at 0 in C, not 1.

Arrays can even be made up of arrays, and in this sense they are known as multidimensional arrays.  Each additional “layer” of arrays is called a dimension.  So for example we could have a two dimensional array like so:
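The original listing is not reproduced here, so the following is only a minimal sketch.  The name my2DArray is taken from later in this preface; the element values are assumptions, chosen so that they agree with the indices discussed in the next paragraphs.

```c
/* A two-dimensional array: 3 arrays, each holding 3 ints.
   The values are illustrative assumptions, not the original listing. */
int my2DArray[3][3] = {
    {7, 8, 9},   /* my2DArray[0] */
    {4, 5, 6},   /* my2DArray[1] */
    {1, 2, 3}    /* my2DArray[2] */
};
```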

You would index into such an array by specifying the position of the first array, that is the array of arrays, and then the index of the second internal array.  For example to access the number 3 above you would write [2][2], or to access 6 above you would write [1][2].  For the beginning of the array, the 7, you would write [0][0].

Each value in an array is a separate variable from all the other values.  I can assign a new value to a specific location in an array, such as my2DArray[0][2] = 12;  and the first array would then be {7, 8, 12} .

As you can see an array is simply a sequence of values gathered under one variable name.  In C the most common array is known as the string.  We’ve discussed strings in previous articles.  This is where they are fully explained.  In C a string is simply a series of numerical values, stored usually as chars, that indicate a character or letter in an encoding, usually ASCII.  They are null-terminated, meaning that the string is assumed to continue on until it hits the null value (0).
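To make null-termination concrete, here is a small sketch of how code finds the end of a string.  The function name count_chars is hypothetical; the standard library's strlen does the same job.

```c
#include <stddef.h>

/* Count the characters that come before the terminating null value.
   count_chars is an illustrative name; strlen behaves the same way. */
size_t count_chars(const char *text)
{
    size_t length = 0;

    while (text[length] != '\0') {   /* stop at the null value (0) */
        length++;
    }
    return length;
}
```

Note that the terminating null is not counted, so a five-letter string needs an array of at least six chars to hold it.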

NOTE: It is important to remember that C does not bounds-check arrays.  What that means is that the C compiler does not usually check to see if you are referencing array elements that don’t exist.  It is up to you to make sure you only reference array elements that you have defined, lest you write over some portion of memory accidentally.

Single Dimensional Arrays

In C, when you define an array you are really indicating that you’d like to have a sequence of values stored in memory one right after the other.  This is known as being stored contiguously in memory, with the first element having the lowest memory address, and the last element having the highest memory address.  Each element of the array sits “next” to its neighbors in memory.  Because this is a requirement of C, it is necessary in general (except in some cases with C99) to declare how “big” the array is going to be when you define it for the first time.  This way the C compiler can determine how much space it needs to reserve in memory to store your values.

The usual way of doing this is straight away when you declare the array.  Since an array is simply a normal variable extended into a sequence, declaring it is not very different from declaring a normal variable.  The pattern for a basic single-dimension array is  type name[size];  (examples follow):
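The original examples are missing here, so this is a reconstruction from the names and sizes mentioned later in this section.  The int element type of someNumbers is an assumption based on the 4-byte element size used in the memory calculation further down.

```c
/* Pattern:  type name[size];  */
char myName[6];        /* six chars, indices 0 through 5 */
int  someNumbers[23];  /* twenty-three ints, indices 0 through 22 */
```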

In C89 the size of an array must be declared explicitly with an integer constant expression, such as the literal 5 (meaning not a variable in this case), setting its size in stone for the compiler.  In C99 it’s possible to use a variable as the size, causing the array’s size to be determined and its storage allocated post-compilation, meaning as the program runs (run-time).

To access an element in an array we place its index, a numerical value counting from 0, between two brackets.  So to access the first element of myName above I would write myName[0] , for the second element I would write myName[1] , and so on.  Be careful here though!  When you declare the array, the number in the brackets is the count of elements, counting from 1, not the highest index.  What this means is that an array declared with [5] has indices 0 through 4, and myName, declared with [6], has indices 0 through 5.

Since arrays are contiguous in memory, it’s sometimes necessary to calculate how much memory they take up.  The first thing you have to determine is how large the data type making up the array is.  If it is a char, for instance, then each element takes up one byte.  If it’s a short int, each element typically takes up 2 bytes, or 16 bits (the sizes of types other than char depend on the platform, and the sizeof operator reports them).  Once you know the space taken up by one element, you simply multiply that amount by the number of elements in the array to arrive at the memory footprint of the array.  So myName would be 6 bytes, and someNumbers, with 4-byte elements, would be (23 x 4 bytes) equaling 92 bytes.  This method can be broken down into the following mathematical template: bytes in array = bytes per element x number of elements.
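The calculation can be checked against the compiler’s own sizeof operator.  This is a sketch only: array_bytes is a hypothetical helper, and someNumbers is assumed to be an array of int.

```c
#include <stddef.h>

char myName[6];
int  someNumbers[23];   /* element type assumed to be int */

/* bytes in array = bytes per element * number of elements */
size_t array_bytes(size_t element_size, size_t element_count)
{
    return element_size * element_count;
}
```

Here array_bytes(sizeof(char), 6) matches sizeof(myName), and array_bytes(sizeof(int), 23) matches sizeof(someNumbers).  On a platform where int is 4 bytes, the latter is 92.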

Multi-dimensional Arrays

As written above, it is possible for an array to have multiple “dimensions”.  I think this causes some confusion, because there is no special quality of the array that makes it “multi-dimensional”.  All this really boils down to is that an array can contain further arrays.  That is, each element in one array can contain another array of a fixed size, and each element in THAT array could contain another array of a fixed size, etc.  The “dimension” is really just a count of how many levels deep the nesting of arrays goes.

A two dimensional array, for instance, is simply an array of arrays.  A three dimensional array, following the same logic, is simply an array of arrays of arrays.  In C, since arrays generally must be declared with a fixed size, you specify how big these arrays are at the declaration.  Each additional array requires its own “index bracket” to be tacked on to the array definition.  As you saw above in the preface, you would specify a two dimensional array, say an array of 5 arrays of 2, like so:
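A minimal sketch of that declaration follows.  The name pairs and the int element type are illustrative assumptions.

```c
/* An array of 5 arrays, each holding 2 ints.
   First bracket: how many inner arrays.  Second bracket: size of each one. */
int pairs[5][2];
```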

We can boil that declaration down into this template:  type name[size1][size2]...[sizeN];

In this template, the largest N in sizeN is how many “dimensions” the array has.  It can become important to determine the size of a given multi-dimensional array.  This is simply an extension of the original equation given above: you multiply in the size of each extra dimension at the end of the equation, so bytes in array = bytes per element x size1 x size2 x ... x sizeN.
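As a sketch, the multi-dimensional size calculation can be written as a loop over the dimension sizes.  The helper name total_bytes and its parameters are hypothetical.

```c
#include <stddef.h>

/* total bytes = bytes per element * size1 * size2 * ... * sizeN
   sizes holds the dimension sizes; dimensions is how many there are. */
size_t total_bytes(size_t element_size, const size_t sizes[], size_t dimensions)
{
    size_t total = element_size;
    size_t i;

    for (i = 0; i < dimensions; i++) {
        total *= sizes[i];
    }
    return total;
}
```

For an int array declared with [5][2], passing sizeof(int) and the sizes 5 and 2 gives the same number as applying sizeof to the array itself.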

One way you can further grasp multi-dimensional arrays is to examine a two-dimensional array.  Imagine laying out each element of the “first” array in a stack, one on top of the other.  This creates a grid, where the first index acts as the row number in the stack, and the second index acts as the column within that particular row.
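The row-and-column picture can also be sketched in code.  The names grid, ROWS, COLS, fill_grid and print_grid are all illustrative.  Each cell stores a value built from its own indices, so the printed grid shows which index is the row and which is the column.

```c
#include <stdio.h>

#define ROWS 3
#define COLS 4

int grid[ROWS][COLS];

/* Store row * 10 + col in every cell so each value shows its own position. */
void fill_grid(void)
{
    int row, col;

    for (row = 0; row < ROWS; row++) {
        for (col = 0; col < COLS; col++) {
            grid[row][col] = row * 10 + col;
        }
    }
}

/* Print one inner array per line: the first index picks the row,
   the second index picks the column within that row. */
void print_grid(void)
{
    int row, col;

    for (row = 0; row < ROWS; row++) {
        for (col = 0; col < COLS; col++) {
            printf("%3d", grid[row][col]);
        }
        printf("\n");
    }
}
```

Calling fill_grid and then print_grid prints three lines of four numbers each, one line per inner array.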

Array Initialization

A nice thing about C is that it allows the contents of arrays to be specified at the same time they are declared, just as I can give a single variable a specific value at the time I define it, for example  int myNumber = 5;

I can also declare arrays to have specific values when they are defined (you saw examples of this earlier in this article).  To do so I must provide a sequence of values.  This sequence can be a string, which we’ll look at, but outside of strings it is a series of values separated by commas and enclosed in curly braces.  These values must be constants, meaning actually typed values and not variables themselves (see note below).  The values in the sequence MUST be compatible with the data type given by the array.  The array is then constructed by inserting the first value in the sequence into the first element of the array, the second value into the second element, and so on.  It can be reduced to this template and example:  type name[size] = { value1, value2, ..., valueN };
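A minimal example matching the description is sketched below.  The int type and the middle values are assumptions; the text only confirms the first and last values.

```c
/* type name[size] = { value1, value2, ..., valueN }; */
int myNumbers[5] = {1, 2, 3, 4, 5};
```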

In this instance myNumbers[0] will equal 1 and myNumbers[4] will equal 5.

Note: For advanced users, please note that C99 allows non-constant expressions to be used in the value sequence for local arrays.  C89 requires that all initializer values be constant expressions.

Multi-dimensional arrays are initialized much the same way, through a list of values.  For example, this array holds a number in the 0 index of a set of arrays, and double that number in the 1 index of a set of arrays:
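A sketch of such an array follows.  The name doubles and the specific numbers are illustrative assumptions.  The values are written as one flat list and fill the array in order, one inner array at a time.

```c
/* Each inner array holds a number at index 0 and double that number at index 1.
   Some compilers warn about the missing inner braces; the code is still valid C. */
int doubles[5][2] = {
    1, 2,
    2, 4,
    3, 6,
    4, 8,
    5, 10
};
```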

However, with multi-dimensional arrays there’s a cool thing you can do in its initialization and that is known as subaggregate grouping.  All that means is that you supply curly braces around the definition of each smaller array in the bigger array.  It would look something like this:
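A sketch with subaggregate grouping is shown below, using a hypothetical array named doubles in which each inner array holds a number and its double.

```c
/* One pair of curly braces per inner array. */
int doubles[5][2] = {
    {1, 2},
    {2, 4},
    {3, 6},
    {4, 8},
    {5, 10}
};
```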

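The advantage of subaggregate grouping (besides readability) is control over where the values land.  If you don’t specify enough values in a sub-list to fill its inner array, the rest of that inner array is set to zero automatically, and the next sub-list starts at the beginning of the next inner array.  Without the inner braces the values simply fill the elements in order, and only the elements left over at the very end are set to zero.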
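The difference shows up when fewer values are supplied than the array can hold.  In this sketch the names grouped and flat are illustrative.

```c
/* With grouping, each short sub-list zero-fills the rest of its own inner array. */
int grouped[3][2] = { {1}, {2}, {3} };   /* {1,0} {2,0} {3,0} */

/* Without grouping, values fill in order and only the tail is zeroed. */
int flat[3][2] = { 1, 2, 3 };            /* {1,2} {3,0} {0,0} */
```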

In C a string, such as "something like this", is a one-dimensional array of chars terminated by a null value (null here being zero).  This is the only “official” form of a string in the C language, and all the standard library functions that expect strings expect them to be in this form.  Some libraries implement their own string formats, but these formats are not supported by the standard library.

There is a short-hand for strings in array initialization beyond simply the value list.  It is possible to assign an array of chars straight to a literal string representation (that is a string you type out yourself in between quotation marks).  So you might do something like this:
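The original example is missing, so this sketch uses a hypothetical error message.  The array size is counted by hand: 11 visible characters plus 1 for the terminating null.

```c
/* "Disk error." is 11 characters; the 12th element holds the null terminator.
   The name and message are illustrative assumptions. */
char errorMessage[12] = "Disk error.";
```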

This is equivalent to:
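Assuming a hypothetical 12-element array initialized from the string literal "Disk error.", the longhand value list spells out every character, including the terminating null:

```c
/* The same 12 elements written as a value list.
   The name and message are illustrative assumptions. */
char errorMessage[12] = {
    'D', 'i', 's', 'k', ' ', 'e', 'r', 'r', 'o', 'r', '.', '\0'
};
```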

As you can see that’s a lot handier, but we haven’t got to the most handy of initialization techniques yet.  That beautiful thing is called unsized array initialization.  This is very useful when you’re specifying a lot of strings by hand in your code.  As you could see above, we had to manually set the size of the array to store our error message.  But what happens if we change our error message?  Let’s say we add a word, or remove the period, we’d have to count out how many characters were in the string again and specify them exactly in our array declaration.  But, what if we didn’t?  That’s right, we don’t.  We could actually write the following:
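A sketch of an unsized declaration, again with a hypothetical message:

```c
/* No size in the brackets: the compiler counts the characters,
   plus one for the terminating null. */
char errorMessage[] = "Disk error.";
```

If the message text changes, sizeof(errorMessage) changes with it and nothing has to be recounted by hand.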

And the C compiler would automatically determine how many elements the array needs to accommodate the given string literal, including the terminating null.  This doesn’t just work on strings, but on any array initialization.  It also works on multi-dimensional arrays; however, only the first (leftmost) dimension may be left unsized, so when working with an array that has more than one dimension you DO have to specify the remaining dimensions’ sizes.  It might look something like this:
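A sketch with illustrative names and values follows.  Only the first bracket is left empty; the compiler works out that size from the number of inner arrays supplied.

```c
/* First dimension unsized, second dimension given explicitly. */
int doubles[][2] = {
    {1, 2},
    {2, 4},
    {3, 6}
};

/* An array of strings: each message gets a fixed 16-char slot. */
char messages[][16] = {
    "No error.",
    "Disk error."
};
```

The number of inner arrays can be recovered with sizeof(doubles) / sizeof(doubles[0]).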

Designated Initializers

In C99 you have a few more options when determining the contents of an array.  Sometimes you only want to set certain indices of an array when you first define it.  You can accomplish this by using a designated initializer.  Inside the curly braces, this takes the form  [index] = value  where index is the index you want to initialize and value is the value you wish it to have.  Any element you don’t mention is set to zero.  For example, to only initialize elements 5 and 7 of a 10 element array you would write something like the following:
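A minimal sketch, with an illustrative array name and values:

```c
/* Requires C99 or later.  Elements 5 and 7 get values;
   every other element is set to zero. */
int sparse[10] = { [5] = 50, [7] = 70 };
```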

This can come in handy when dealing with sparse arrays.  Keep in mind though that this feature was introduced in C99 and is not part of C89, so check which version of C your compiler supports.

C99’s Variable Length Arrays

In C89 array dimensions must be declared using integer constants, such as 5 or 7.  In C99 it is possible to use a variable or other non-constant expression to determine the size of an array at declaration.  This is an advanced topic, which is why I’m covering it last.  An array whose dimensions are given by an expression that is not a constant, and so are only known at run-time, is called a variable-length array.  DON’T confuse this with a dynamic array whose size can change: once created, a variable-length array keeps its size for its whole lifetime.  There are limitations to such declarations, however.  They can only be used for arrays in block or prototype scope (an advanced topic), which rules out global and static arrays.
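A small sketch of a variable-length array in block scope is shown below.  The function name sum_of_squares is hypothetical.  This needs a C99 compiler, and from C11 onward compilers are allowed to leave the feature out.

```c
/* Requires C99; variable-length arrays are optional from C11 onward. */
int sum_of_squares(int count)
{
    int i;
    int total = 0;

    if (count <= 0) {
        return 0;   /* a variable-length array must have a size greater than zero */
    }

    int squares[count];   /* size is only known when the function runs */

    for (i = 0; i < count; i++) {
        squares[i] = (i + 1) * (i + 1);
    }
    for (i = 0; i < count; i++) {
        total += squares[i];
    }
    return total;
}
```

The array lives on only until the function returns, and its size cannot be changed after the declaration.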

Conclusion

Single value variables are great, and can solve a whole host of problems, but sometimes we need to keep a bunch of values in a sequence.  This is particularly useful for strings, which C sees as simply a series of one byte numerical values that to us represent letters.  Imagine storing each letter of a string in a separate variable.  That would be atrocious and confusing.

But sequences of values can go beyond that, for example multi-dimensional arrays.  Maybe we’re setting up an error system and we need to store a bunch of error messages in a sequence.  This is a sequence, or array, of arrays, or strings.  Or maybe we’re looking at a two-dimensional plot of heights on a map and need to get those into the computer in a way the programmer can easily process and understand.

Sequences of values play even more important roles when we start getting into memory management and start allocating buffers.  Without buffers we wouldn’t be able to read in files from different input streams, or output our information to other devices or programs.

Arrays give us a wealth of opportunities to creatively solve our problems, and I hope I was able to elaborate on more of their functioning in this article.  They can be confusing at first, particularly multi-dimensional arrays, but they are simply sequences (of sequences of sequences…).  Thanks for reading!

kadar

I'm just a wunk, trying to enjoy life. I am a cofounder of http//originalpursuitssoc.com/ and I like computers, code, creativity, and friends.
