CODINGTHOUGHTS

A blog about C#, Python, Azure and full stack development

Collections in .NET – Part 2

Introduction

This is part 2 of a series of blog posts about collections in .NET. In part one we looked at some of the simpler collection types and discussed common characteristics of collections in .NET.

Just to recap, Part 1 covered the following topics:

  • Arrays
  • Generic Collections
  • Lists
  • Boxing and unboxing
  • HashSets

In this post we will look at some further types of collections in .NET that you might find useful in general day to day circumstances.

First, we will continue looking at some generic collection types that were not discussed in Part 1: Hashtable, Dictionary and LinkedLists. Then we shall run through some common features found in most collection types: Iterators, Comparers and Sorting.

So without further ado, we will examine Hashtables and Dictionaries.

Hashtable and Dictionary Type Collections in .NET

Hashtables

Hashtable and Dictionary are broadly similar in their usage, but have a subtle difference. Let’s start by looking at how a Hashtable works.

Both Hashtables and Dictionaries are key/value stores that implement the IDictionary<TKey,TValue> interface. This means that each entry must have a key (to identify the entry) and a value (the entry’s payload).

Elements of a Hashtable are stored in ‘buckets’ that are defined by the key’s hash, the GetHashCode method determines its hash.

Hashtable hashtable = new Hashtable();

hashtable.Add("pig", "four");
hashtable.Add("chicken", "two");
hashtable.Add("dog", "four");

foreach (var item in hashtable) Console.WriteLine(item.GetHashCode());

This code will return the following:

2068647613
-482357880
-46073517

As you can see there are three unique hash codes returned based upon the name of the key. A custom hashcode provider can be passed into a Hashtable’s constructor using the EqualityComparer

Elements of a Hashtable are of type Object and are therefore subject to boxing and unboxing. This makes a Hashtable less performant than a Dictionary type as this uses value types.

Dictionaries

Dictionaries are very similar to Hashtables in that:

  • They are a collection key/value pairs
  • Both require immutable keys, that is, once an entry if added, you cannot alter the key

In fact, one could say that dictionaries are hashtables, conceptually at least. A dictionary is a generic collection type, Hashtable is not, and relies on boxing/unboxing. Effectively a Hashtable is similar to a Dictionary<object,object>.

Let’s look at an example of a Dictionary:

Dictionary<string, string> dictionary = new Dictionary<string, string>();

dictionary.Add("pig", "oink");
dictionary.Add("cow", "moo");
dictionary.Add("duck", "quack");

Console.WriteLine(dictionary["duck"]);

foreach (var animal in dictionary) Console.WriteLine(animal);

This code will display:

[pig, oink]
[cow, moo]
[duck, quack]

When we iterate through a Dictionary, the elements are returned as type KeyValuePair<string,string>. This of course is obscured by the use of var in the foreach statement above, but we could just as well have used:

foreach (KeyValuePair<string, string> animal in dictionary) Console.WriteLine(animal);

In contrast, iterating through a Hashtable would return each item as a DictionaryEntry struct.

Dictionary Capacity

You may specify the initial capacity of a Dictionary by using the Dictionary<TKey,TValue>(Int32) constructor. This can be useful if you know how many elements are to be added. If you do not know this and a size is not specified, the new dictionary will be given a default capacity, this value will grow as more items are added and the capacity is exceeded.

The big downside to this is that expanding the capacity involves allocating memory which in turn can be a performance hit in terms of CPU usage.

LinkedLists

.NET’s LinkedList class provides a more specialised use case and implements a classic linked list structure. In certain scenarios this can be beneficial, for example, where there is a need to insert additional elements into the middle of an existing list, or where particular elements in the list are likely to be removed. When compared to a List, LinkedLists can facilitate these operations in a more performance manner.

As a general rule, you could say that a List is better suited if you are adding a new elements to the beginning or end of the collection, whereas a LinkedList would confer similar performance inserting into the middle of the collection.

On the downside, LinkedList does no perform well when random access is required as it has no indexer, whereas a List and other types of collections in .NET do. A LinkedList is a good choice if access is going to be sequential.

In summary, choose a LinkedList if insertions/deletions are required within the collection and access is going to be sequential

Let’s look at a couple of examples of LinkedLists in use:

Adding Elements

var animals = new LinkedList<string>();
animals.AddFirst("cow");
animals.AddLast("duck");
animals.AddLast("cat");

Create From Enumerable

var names = {"cow", "duck", "cat"};
var animals = new LinkedList<string>(names);

Inserting Elements

LinkedListNode<string> duck = animals.Find("duck");
animals.AddBefore(duck, "dog");
animals.AddAfter(duck, "pig");

LinkedList Performance

As you can see in the example above, to actually find the point at which to insert a new element requires some effort, if you already have a reference to the node to insert before or after, that’s great – this is where LinkedList can be the right choice, but if not, and the collection is large, the performance hit involved in finding the insertion point may wipe out any performance gain from using a LinkedList in the first place.

As with many decision around which types of collections in .NET to use, understanding the context and it’s subtleties is often key.

Iterators, Comparers and Sorting

These three concepts are key when dealing with collections in .NET. They allow us to process a collection’s contents in an appropriate and suitable manner and are a jumping off point for creating your own customised functionality for collections.

Iterators

What is the point of a collection if you cannot get at the elements contained within it? A great many times, we need to process each element in a collection, or a subset thereof.

Any class in .NET that is enumerable, i.e. implements the IEnumerable interface may be iterated through. For straightforward use cases, you may iterate though a collection with a simple foreach loop. This will work for any collection that implements the IEnumerable or IEnumerable of T class. In fact, this is almost certainly the method you will be used to using in the vast majority of instances. If however you need to iterate through a collection in a different way, then you it may be appropriate to create a custom iterator.

Custom iterators employ the yield keyword and, again, act upon any collection that implements the IEnumerable or IEnumerable of T interface.

Here is a quick example of a custom iterator in action:

static void Main()
{
    foreach (int value in MyList()) => Console.Write($"{value} ";
}

publicstatic System.Collections.IEnumerable PrimeNumbers()
{
    yieldreturn2;
    yieldreturn3;
    yieldreturn5;
    yieldreturn7;
    yieldreturn11;
}

So how is this working? How after the first value (2 in this case) has been returned, does the runtime process know where it left off and resume from that point? The simple answer is that the compiler creates a state machine, this is able to keep track of the current position for then length of the foreach loops lifetime. This elegant construct will iterate round each ‘return yield’ statement until it either runs out or reaches a ‘yield break’ statement.

So when should you consider using a custom iterator?

  • A great use case for custom iterators is if you have a huge amount of elements in a collection, and you do not want to return the entire list. Using the yield keyword gives you more control over how the values are returned. For example, a conventional foreach loop will load the entire collection on its first iteration. This may not be desirable behaviour and be detrimental to performance.
  • Related to the point above, if you need to modify the contents of the collection you are iterating round after the first foreach iteration.

Comparing and sorting objects

IComparable of T

Before we get onto the topic of sorting collections, it may be a good idea to investigate how elements are compared against one another. This is straightforward if you are sorting a collection of ints or strings, but if the elements’ types are more complex, there may be some ambiguity about how to order them. For example, if we have a collection of books, we may want to sort them by title, price or number of pages, e.g.

publicclassBook
{
    publicstring Title { get; set; }
    publicint NumberOfPages { get; set; }
    publicdecimal Price { get; set; }
}

To control how books should be compared against each other, and by extension, how a list of books should be sorted, we may implement the IComparable of T interface. In doing so we implement the CompareTo method. This method returns a positive or negative value depending on whether the elements are greater or less than a second book given as a parameter. For example:

public class Book : IComparable<Book>
{
    publicstring Title { get; set; }
    publicint NumberOfPages { get; set; }
    publicdecimal Price { get; set; }

    public int CompareTo(Book otherBook)
    {
        if (this.NumberOfPages == otherBook.NumberOfPages)
            return0;
        elseif (this.NumberOfPages < otherBook.NumberOfPages)
            return-1;

        return1;
    }
}

In this example books are compared based upon number of pages. One could just as easily specify price or title on which to compare by. Indeed, more complex comparison and sorting could be achieved using this method, for example price per page:

public int CompareTo(Book otherBook)
{
    var thisPricePerPage = this.Price / this.NumberOfPages;
    var otherPricePerPage = otherBook.Price / otherBook.NumberOfPages;
    if (thisPricePerPage == otherPricePerPage)
        return0;
    elseif (thisPricePerPage < otherPricePerPage)
        return-1;

    return1;
}

The output of which would be:

Catch-22, 519 pages, £7.99
Hard Times, 430 pages, £6.99
Dubliners, 288 pages, £9.99

Sorting

Now we have established how any two items may be weighed up against each other, sorting a collection of these objects is trivial:

var books = new List<Book>();
books.Add(new Book { NumberOfPages = 519, Title = "Catch-22", Price = 7.99M });
books.Add(new Book { NumberOfPages = 430, Title = "Hard Times", Price = 6.99M });
books.Add(new Book { NumberOfPages = 288, Title = "Dubliners", Price = 9.99M });
books.Sort();

foreach (var book in books) { Console.WriteLine($"{book.Title}, {book.NumberOfPages} pages, £{book.Price}"); }

This code will output the following:

Dubliners, 288 pages, £9.99
Hard Times, 430 pages, £6.99
Catch-22, 519 pages, £7.99

The sort method will have used the IComparable interface previously defined against each element to produce the sorted list.

Conclusion

Parts 1 and 2 of this series should have given you a grounding in some of the main uses and mechanisms around collections in .NET, this is a key topic that unlocks many further framework areas. I hope these posts have been helpful, regardless of your level of experience and knowledge. I always think it is a valuable exercise to return to the fundamental building blocks of a technology periodically as these are so often used by many other higher level constructs.


Posted

in

by

Tags:

Comments

Leave a Reply