Collections in .NET - Part 1

This is part 1 of a series of blog posts about collections in .NET, to start we will look at collection characteristics. We then turn to the basic collection types including Arrays and a couple of basic generic collection types.

Collections are one of the fundamental building blocks of the .NET framework. They allow us to group objects together and do useful things with them. They hold data in memory that a running application can read or alter. You are unlikely to encounter many non-trivial programs that do not utilise collections of one sort or another in some way.

Arrays

Arrays are a fundamental structure in .NET and underpin many of the internal structures and processes in the Framework itself (many of the other collection types actually utilise Arrays). They are one of the simplest ways in which you can group data items together.

.NET arrays are similar to arrays in lower level languages such and C or C++ in that they reference a particular range of contiguous memory. .NET arrays however, are actually objects and provide some additional properties and methods that make dealing with data easier and more intuitive.

Arrays enable you to store objects in a simple, fixed length collection that can be accessed by position. They’re a good choice when you don’t need any special inbuilt functionality to do things like sorting, resizing or searching.

One point to note is that values stored in arrays must be of the same type. To create a new array of string we would use:

string[] animals = newstring[] { "pig","cat","dog","cow","pig" };

Let’s work with this array. We can determine it’s length, sort it, copy it to a new array, compare it to another array, find where a particular element exists (indexof), reverse it or sort it. Here’s a few examples:

using System;

namespaceCollectionWorkspace
{
    publicclassArrayDemo
    {
        public ArrayDemo()
        {
            string[] animals = newstring[] { "pig","cat","dog","cow","pig","sheep" };

            Console.WriteLine($"Array length {animals.Length}, position of cat {Array.IndexOf(animals, "cat")}\n");

            Console.WriteLine("Array values:");
            DisplayAnimals(animals);

            Array.Sort(animals);
            Console.WriteLine("Sorted array values:");
            DisplayAnimals(animals);

            Console.WriteLine("Copy a subset of array:");
            string[] someAnimals = newstring[3];
            Array.Copy(animals, someAnimals, 3);
            DisplayAnimals(someAnimals);

            Console.WriteLine("Compare two arrays:");
            Console.WriteLine($"animals == someAnimals: {animals.Equals(someAnimals)}\n");

            Console.WriteLine("Cleared first 4 items:");
            Array.Clear(animals, 0, 3);
            DisplayAnimals(animals);
        }

        private void DisplayAnimals(string[] animals)
        {
            foreach (string animal in animals) Console.Write($"'{animal}' ");
            Console.WriteLine("\n");
        }
    }
}

Another interesting point is that all arrays are reference types, that is they are objects and regardless of whether they contain value types or not. They are themselves reference types.

So to summarise, here’s a few key characteristics of arrays to bear in mind:

Arrays are useful for dealing with a fixed number or strongly typed objects
Array dimensions can be single, multidimensional or jagged
Arrays cannot be resized
They are zero indexed (i.e. items are references starting at 0, not 1)
All array elements must be of the same type
Arrays may contain other arrays

Choosing the Right Collections in .NET

So when should you use an array over something like a List or Dictionary or any other types of collections in .NET?

The answer to that lies in how .NET stores and retrieves data from an array. Because arrays have a fixed size, their data can be stored in a contiguous range of memory and potentially offer faster retrieval. For example, if we have a 8 element array of 32 bit ints, we can safely say that:

element 1 will begin at memory location 0
element 2 will begin at memory location 32
element 3 will begin at memory location 64
element 4 will begin at memory location 96
element 5 will begin at memory location 128
element 6 will begin at memory location 160

..or that element 4 will begin at memory location 4 * 32 for example.

Data stored contiguously like this is generally less expensive to write or retrieve. In reality however, the data stored by a List of T will be stored in memory contiguously. The advantages of arrays in terms of performance in certain situations are negated by the flexibility and ease of use of Lists or other collection types.

Introducing Generic Collections in .NET

Let’s look at a category of collections in .NET called Generic Collections. We know what a collection is broadly speaking; it’s an object that contains a group of other objects and can be worked upon or processed in some way. Let’s try and define what we mean by generic.

How They Work

Generics were introduced into the .NET framework in version 2.0 and offer a way to provide a ‘type parameter’ when instantiating an object.

What this means is that we can design a class that will be given parameters, the types of these parameters are defined when an generic class object is instantiated.

So in the case of a generic collection, we can instantiate a list that only contains strings. No list items that are not strings may be added to this. We could then define another List that may only accept ints, and so on. In this way we can design classes that can be re-used in many different contexts, and increase type safety as well.

It’s easy to see why collection types are ideal candidates for generic behaviour. A collection is a container, like a bag for example. We probably wouldn’t want to go to the store and buy several bags, just because we might put different things in them at different times. We’d probably just buy one bag and accept that we can use it for carrying different types of objects as and when required.

So for instance, we could create a generic collection class that allows us to hold strings in one point in our code, and to hold ints at another point. This is prefereable to creating two classes, one to hold only strings, and one to hold ints.

Types of Generic Collections

Let’s take a quick look at the various types of generic collections in .NET. Note that all these types fall within the System.Collections.Generic namespace and implement the ICollection interface.

The pertanant point here is that by implementing ICollection, some standard methods and properties such as Add, Contains and Remove that make dealing with collections very intuative are provided.

List

Let’s jump straight in with an example of one of the simplest generic collections in .NET: a List of strings. The generic List collection type is very similar to the non generic ArrayList type. It may contain a list of items including duplicates:

var list = new List<int>();
list.Add(10);
list.Add("Hello"); // this is not possible

One difference between the non generic ArrayList and a generic list here, is to do with the fact that ArrayList stores objects, and when a value type is added to an ArrayList, a boxing operation occurs. When you want to retrieve the value, an unboxing operation occurs.

Boxing and unboxing

Let’s define what we mean by boxing and unboxing here. There are two kinds of variables in .NET, value types and reference types. Most numeric types are value types, so ints, doubles and decimals are all in this category, as are the bool, enum and char types.

When you assign a value to one of these variable, memory is allocated on the stack and the value is stored placed there directly. With reference type variables, such as those containing objects, the object is stored on the heap and the variable contains a reference to it.

Note the way in which data is stored, value types are stored in the stack and reference types in the managed heap. These two areas of memory are managed differently and have their own performance characteristics. Reference types stored on the heap are managed and subject to being cleaned up by the garbage collector, whereas values on the stack are not.

This is a simplistic description and there are additional complexities to memory allocation, but all we need to consider at this point is that boxing and unboxing operations can be costly.

So, to illustrate this, here’s an example of the non generic ArrayList:

var list = new ArrayList();
int n = 10;
list.Add(n);          /* boxing   */
int i = (int)list[0]; /* unboxing */

These boxing and unboxing operation are not great for performance and of course, there is no guarantee of the types of objects stored in the ArrayList, so exceptions could occur when items are unboxed.

You may not think this is a big problem as you know the type of what you put in the ArrayList and have cast it appropriately when you have retrieved the value – which is fine.

Problems may occur however if this ArrayList is passed elsewhere in your code, and it becomes less obvious what types it contains potentially leading to exceptions.

HashSet

A HashSet contains a set of unordered, unique values and can be accessed and used in a very similar way to Lists, this is due to the fact it implements the ICollection interface and all of those really useful methods are available. You can create a HashSet from an array:

string[] animals = newstring[] { "pig","cat","dog","cow","pig" };
var hashSet = new HashSet<string>(animals);
foreach (var animal in hashSet) Console.WriteLine(animal);

/* outputs:
pig
cat
dog
cow */

As you can see here, a set must contain unique values, the second ‘pig’ is not added to the set.

To understand how and when to use a HashSet, we need to dive into the way in which they work internally.

A key takeaway is to consider that because HashSets do not worry about the order of items and access items using a hash, they can identify whether an item already exists in the collection very very quickly indeed.

The downside to this is that you are not able to access items by their index.

Performance

Using ‘big O’ notation, HashSets are referred to as O(1) which means that the time it takes to retrieve an item is the same regardless of the number of items in the collection, whereas Lists for instance are referred to as O(n) meaning the amount of time it takes to perform calls to Contains or Remove is dependant on the value of n, or the number of items in the collection.

Adding items to a HashSet:

Console.WriteLine(hashSet.Add("llama")); // returns True
Console.WriteLine(hashSet.Add("cow")); // returns False

As you can see, trying to add an item that already exists returns False, whereas is the new item doesn’t exist and gets inserted into the set, True is returned.

Again, HashSets may be the right choice for you if you know your values are unique and need to retrieve specific items very quickly.

Conclusion

I hope that this brief exploration of a couple of collections in .NET has been useful, we have only just scratched the surface and I hope to explore some further collection types and discuss their uses in p art two of this series.

CODINGTHOUGHTS

Collections in .NET – Part 1