Working with Collections — Part 1

The goal of this series of articles is to help readers build a mental model of how to work with collections of data, with an eye toward making sense of the myriad of Ruby methods in the Enumerable module. Really, it should be helpful for understanding the basics of these methods in almost any language since we will not be going into technical detail or looking at much code. Rather, we will build our mental model of operations that we can perform on collections by focusing on the following:

  1. Understanding what the data looks like going in
  2. Getting a clear idea of what the data should look like coming out
  3. Understanding what needs to happen to each element to get from Step 1 to Step 2

For the sake of simplicity I will just refer to ‘lists’ or ‘collections’ here, which you can think of as simply a list with a specific order (though not necessarily sorted). You can think of it as a list on a piece of paper if you want. If it’s helpful to think of a data structure you’re used to (such as an array), go for it; just don’t get too hung up on language-level details. I will use ‘elements’ and ‘items’ interchangeably to refer to each distinct part of the list.

Let’s get a real-life example to work with: a shopping list. Remember, you can simply think of this as a list on a sheet of paper for now.

Here is what our data will look like going in:

  • Apples
  • Bananas
  • Cream of Wheat
  • Eggplant
  • Fish sticks

Congratulations, we’ve just completed Step 1 from above! While it’s important to have a clear idea of your initial data, I think Step 2 is where many people get tripped up. There are many, many ways we could transform this list and that’s why Ruby’s Enumerable methods are so overwhelming for beginners. It would be helpful if we could categorize the possible outputs. Let’s give it a try.

Given a simple list on a piece of paper we could imagine the following possibilities for what the data will look like coming out:

  1. The same list
  2. A list, of the same length, with each item changed in some way
  3. A (possibly) smaller list, with certain items removed
  4. A single item from the list
  5. An answer to a question about the list
  6. A list that is reordered, combined with another list, or grouped in some way

Anything we can imagine doing to this list should be able to fit in one of the above categories. Let’s give it a try:

1. The same list:

Maybe your partner asks you to read off the list and you do. You’ve now iterated through the entire collection and done something with each element, and at the end you have the same collection left. Maybe you copy down every element from the whiteboard on your fridge onto a pad of paper to take to the grocery store. There are lots of things you can do with this list that involve going through each element, but leaving the original alone.
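
In Ruby, "go through every item and leave the list alone" is roughly what `each` does. Here is a minimal sketch, assuming the shopping list is stored as an array of strings. The names `SHOPPING_LIST` and `read_aloud` are made up for illustration.

```ruby
# The shopping list as an array of strings (an assumption for this sketch).
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze

# Hypothetical helper: read every item aloud, one element at a time.
def read_aloud(list)
  list.each { |item| puts item }
end

# `each` hands back the very same collection it was called on.
returned = read_aloud(SHOPPING_LIST)
puts returned.equal?(SHOPPING_LIST)  # prints true
```

Whatever we do with each item inside the block, what comes out the other end is the original list.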

2. A transformed list, of the same length, with each item changed in some way

Let’s say you are reading through the list and your partner asks you to write something down along with each element:

  • Apples (Granny Smith)
  • Bananas (green)
  • Cream of Wheat (12 oz box)
  • Eggplant (2 large)
  • Fish sticks (gluten free)

Here we have the same collection, but with each element transformed. You might say that we’ve passed each element of our list to our partner and used their comments to transform our list.

Or what if we wanted to copy down the list from our fridge onto a scrap of paper to take to the store? The paper is pretty small so we need to abbreviate:

  • Apl
  • Bna
  • CoW
  • Egp
  • FS

In both of these examples we can see the direct connection from each element of our first list to each element of the second list. There is a one-to-one correspondence from an item in our original list to an item in our transformed list.
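
This one-to-one transformation is the kind of job Ruby's `map` is built for. Below is a rough sketch of the first example. The `NOTES` hash and the `add_notes` helper are hypothetical stand-ins for our partner's comments, not anything built into Ruby.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze

# Hypothetical notes from our partner, keyed by item name.
NOTES = {
  'Apples' => 'Granny Smith',
  'Bananas' => 'green',
  'Cream of Wheat' => '12 oz box',
  'Eggplant' => '2 large',
  'Fish sticks' => 'gluten free'
}.freeze

# `map` builds a new list by transforming each element in turn.
def add_notes(list, notes)
  list.map { |item| "#{item} (#{notes[item]})" }
end

with_notes = add_notes(SHOPPING_LIST, NOTES)
puts with_notes.first                             # prints Apples (Granny Smith)
puts with_notes.length == SHOPPING_LIST.length    # prints true
```

The new list always has exactly as many items as the old one, and the original list is left untouched.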

3. A (possibly) smaller list, with certain items removed

Let’s say your partner says,

“We’re going to the farmer’s market. Let’s buy our produce there and get the rest at the store later.”

So, now how will you need to transform your list? You won’t be changing any elements on the list, but you will be filtering items based on whether or not they are produce:

  • Apples
  • Bananas
  • Eggplant

(We'll ignore the fact that people in the northern latitudes, like me, can't get locally grown bananas at the farmer's market!)

In this case we have decided to select the produce items on the list. But it would be the same to say we are crossing off or rejecting anything that is not produce.
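
Ruby has a method for each way of saying it: `select` keeps the items that pass a test, and `reject` drops the items that pass a test. A small sketch follows. The `PRODUCE` list and the `produce?` helper are my own assumptions about what counts as produce.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze

# Which items count as produce is a judgment call made for this sketch.
PRODUCE = ['Apples', 'Bananas', 'Eggplant'].freeze

def produce?(item)
  PRODUCE.include?(item)
end

# Select the produce...
selected = SHOPPING_LIST.select { |item| produce?(item) }

# ...or, equivalently, reject everything that is not produce.
remaining = SHOPPING_LIST.reject { |item| !produce?(item) }

p selected               # prints ["Apples", "Bananas", "Eggplant"]
puts selected == remaining  # prints true
```

Notice that none of the surviving items were changed. They were only kept or left out.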

4. A single item from the list

Maybe your partner asks:

"What's the first item on the list that starts with the letter 'a'?" → "Apples"

Or how about,

"What's the first vegetable on the list?" → "Eggplant"

In both of these cases, there may be more than one answer that qualifies, but we only care about the first answer that satisfies our question. If our list also had 'Aquafina' and 'Green Beans', the answers would be the same, because this type of answer is only concerned with finding the first answer.
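
In Ruby, "give me the first item that qualifies" is what `find` does. Here is a sketch of the first question. The helper name `first_starting_with` is invented for this example.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze

# Return the first item that begins with the given letter, ignoring case.
def first_starting_with(list, letter)
  list.find { |item| item.downcase.start_with?(letter.downcase) }
end

puts first_starting_with(SHOPPING_LIST, 'a')  # prints Apples

# Adding more matches later in the list does not change the answer.
longer_list = SHOPPING_LIST + ['Aquafina', 'Green Beans']
puts first_starting_with(longer_list, 'a')    # prints Apples

# When nothing qualifies, `find` returns nil.
p first_starting_with(SHOPPING_LIST, 'z')     # prints nil
```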

Can you think of other questions that would produce an answer that's just a single element from the list?

5. An answer to a question about the list

This is very similar to the previous category. The way we traverse the list will be similar, but here we are not looking for an element in the list, we are trying to answer a question about the list or elements within it. So, questions like this:

Is there anything on the list that begins with the letter 'a'? → yes

Is there a vegetable on the list? → yes

Notice our search was exactly the same as the examples directly above. If you were going through the list by hand, you should have stopped on the same elements each time. But our question was different so we got a different answer. Instead of retrieving an element from the list, we find an element that answers our question.

Let's try one more:

Is there any candy on the list? → no

In this case, we go through the same process of searching the list, but we don't find anything that qualifies for our question. Only once we reach the end of the list can we answer 'no'. Compare that with a question about all the elements in a list. Something like this:

Are all the items on this list produce? → no

Here we would be able to stop as soon as we get to 'Cream of Wheat' since it is not produce. It seems like we are asking a question about all the elements, but sometimes we can quit as soon as we get to an element that answers our question.

We could reformulate the question like this:

Are any of the items on this list not produce? → yes

This might sound a little awkward, but do you see how it is really the same question as above? The yes/no answer is flipped, but it tells us the same thing, and with this formulation it is much clearer that we might not have to traverse the whole list.
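
Ruby's `any?` and `all?` ask these yes/no questions, and both stop as soon as the answer is known. The sketch below assumes the same made-up `PRODUCE` list as a stand-in for "is this produce?". The `items_checked_by_all` helper exists only to show where the search stops.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze
PRODUCE = ['Apples', 'Bananas', 'Eggplant'].freeze  # an assumption for this sketch

def produce?(item)
  PRODUCE.include?(item)
end

# Is there anything on the list that begins with the letter 'a'?
puts SHOPPING_LIST.any? { |item| item.downcase.start_with?('a') }  # prints true

# Are all the items on this list produce?
puts SHOPPING_LIST.all? { |item| produce?(item) }                  # prints false

# Are any of the items on this list not produce? Same search, flipped answer.
puts SHOPPING_LIST.any? { |item| !produce?(item) }                 # prints true

# Record which items `all?` actually looks at before it can answer.
def items_checked_by_all(list)
  checked = []
  list.all? do |item|
    checked << item
    produce?(item)
  end
  checked
end

p items_checked_by_all(SHOPPING_LIST)  # prints ["Apples", "Bananas", "Cream of Wheat"]
```

The last line shows the early exit: once 'Cream of Wheat' fails the test, the remaining items are never examined.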

For certain questions you must traverse the whole list if it is unordered. This is often true about counting questions:

  • How many items are on the list? → 5
  • How many times does the letter 'p' appear in our list? → 3
  • How many items begin with the letter 'g'? → 0

It is also true of questions that search for a maximum or minimum value in a list:

  • What item on the list has the most letters? → 'Cream of Wheat'
  • Which item in the list has the most occurrences of the letter 'n'? → 'Bananas'

Each of these answers requires us to search the whole list (although if we know the list is ordered we might be able to stop early, but that is beyond the scope of this article). It's always possible that the last item in the list will have one more of the thing we are looking for, so we have to keep going.
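
For the curious, here is roughly how these counting and maximum questions might look in Ruby, using `count`, `sum`, and `max_by`. The helper names are invented for this sketch, and I am assuming we ignore upper versus lower case and do not count spaces as letters.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze

# How many times does a letter appear across the whole list?
def letter_total(list, letter)
  list.sum { |item| item.downcase.count(letter) }
end

# Which item has the most letters (spaces are not counted)?
def longest_item(list)
  list.max_by { |item| item.count('a-zA-Z') }
end

# Which item has the most occurrences of a given letter?
def item_with_most(list, letter)
  list.max_by { |item| item.downcase.count(letter) }
end

puts SHOPPING_LIST.count                                        # prints 5
puts letter_total(SHOPPING_LIST, 'p')                           # prints 3
puts SHOPPING_LIST.count { |item| item.start_with?('G', 'g') }  # prints 0
puts longest_item(SHOPPING_LIST)                                # prints Cream of Wheat
puts item_with_most(SHOPPING_LIST, 'n')                         # prints Bananas
```

Every one of these has to look at all five items before it can give an answer.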

This category covered a lot of ground, but the common thread is that we are trying to answer a single question about some or all of the items in our list.

6. A list that is reordered, combined with another list, or grouped

This is the most complex and diverse group and we are not going to cover it in depth here. It's probably wise to master the previous categories before delving into this one too deeply. However, it is important to recognize when your desired output falls into this category.

All of the previous categories had a few things in common:

  1. We iterated through the list once
  2. We were only dealing with items from one list at a time
  3. We only had to consider one item from that list at a time

The examples below will each break one or more of these rules. Sometimes we are dealing with multiple lists, other times answering the question will require us to make multiple passes through the list, and other times we need to consider multiple elements of the list at the same time.

Let's imagine some scenarios here:

You decide to reorder your list in reverse alphabetical order:

  • Fish sticks
  • Eggplant
  • Cream of Wheat
  • Bananas
  • Apples

Although this is the same list, with the same elements, do you see what has fundamentally changed? In some data structures, the order of the elements doesn't matter, but in others it does. If this is stored in a basic array or list, this is a different list than what went in.
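
If the list is stored as a Ruby array, we can see this difference directly. This is a minimal sketch, and `reverse_alphabetical` is just a name I made up.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze

# Sort alphabetically, then flip the order. Both steps return a new array.
def reverse_alphabetical(list)
  list.sort.reverse
end

reordered = reverse_alphabetical(SHOPPING_LIST)
p reordered
# prints ["Fish sticks", "Eggplant", "Cream of Wheat", "Bananas", "Apples"]

# Same elements, but arrays care about order, so the two are not equal.
puts reordered == SHOPPING_LIST            # prints false
puts reordered.sort == SHOPPING_LIST.sort  # prints true
```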

How about another example: Your partner has another list and they would like you to get these items when you go to the store:

  • Dog food
  • Granola

If you're anything like me, you don't want two pieces of paper to lose, so you'll want to combine your partner's list with yours. You might just append their list onto yours:

  • Apples
  • Bananas
  • Cream of Wheat
  • Eggplant
  • Fish sticks
  • Dog food
  • Granola

Or, you might want to keep the alphabetical order of your list:

  • Apples
  • Bananas
  • Cream of Wheat
  • Dog food
  • Eggplant
  • Fish sticks
  • Granola

(Note that this probably means you will need to write out a new list.)
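
Both ways of combining the lists can be sketched in Ruby with `+` and `sort`. The names `PARTNER_LIST`, `append_lists`, and `merge_alphabetically` are assumptions for illustration only.

```ruby
SHOPPING_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Eggplant', 'Fish sticks'].freeze
PARTNER_LIST = ['Dog food', 'Granola'].freeze

# Tack the second list onto the end of the first.
def append_lists(mine, theirs)
  mine + theirs
end

# Combine both lists, then put the result in alphabetical order.
def merge_alphabetically(mine, theirs)
  (mine + theirs).sort
end

p append_lists(SHOPPING_LIST, PARTNER_LIST).last(2)
# prints ["Dog food", "Granola"]

p merge_alphabetically(SHOPPING_LIST, PARTNER_LIST)
# prints ["Apples", "Bananas", "Cream of Wheat", "Dog food", "Eggplant", "Fish sticks", "Granola"]
```

In both cases Ruby hands back a brand new array, which matches the idea of writing out a new list rather than squeezing items into the old one.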

For small lists, alphabetical order or even random order is probably fine. It won't take you too long to scan your list in each aisle of the store to see if you need something. But let's say we wanted to be a little more efficient and group the items based on where we will find them. We end up with a new nested list like this:

  • Produce:
    — Apples
    — Bananas
    — Eggplant
  • Cereal:
    — Cream of Wheat
    — Granola
  • Pet food:
    — Dog food
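
A grouping like this is what Ruby's `group_by` produces. In the sketch below, the `SECTION_FOR` lookup and the `group_by_section` helper are hypothetical, and the section names are my own guesses about where each item lives in the store (including a made-up 'Frozen' section for the fish sticks).

```ruby
COMBINED_LIST = ['Apples', 'Bananas', 'Cream of Wheat', 'Dog food',
                 'Eggplant', 'Fish sticks', 'Granola'].freeze

# Hypothetical lookup: which section of the store holds each item.
SECTION_FOR = {
  'Apples' => 'Produce',
  'Bananas' => 'Produce',
  'Eggplant' => 'Produce',
  'Cream of Wheat' => 'Cereal',
  'Granola' => 'Cereal',
  'Dog food' => 'Pet food',
  'Fish sticks' => 'Frozen' # assumed section, chosen for this sketch
}.freeze

# `group_by` returns a hash: each key is a section, each value is a list of items.
def group_by_section(list, sections)
  list.group_by { |item| sections[item] }
end

grouped = group_by_section(COMBINED_LIST, SECTION_FOR)
p grouped['Produce']  # prints ["Apples", "Bananas", "Eggplant"]
p grouped['Cereal']   # prints ["Cream of Wheat", "Granola"]
p grouped.keys        # prints ["Produce", "Cereal", "Pet food", "Frozen"]
```

What comes out is no longer a flat list at all. It is a collection of smaller lists, each with a label.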

There are many possibilities here, but the basic idea is that these collections have a different order, a different structure, and/or new elements. Can you think of other ways you could transform your list in this way?

Practice

Before moving on, let's get a little bit of practice. Use the list below and answer the series of questions that follow. After answering mine, come up with as many questions as you can that fit in that same category. The goal is to get a feel for what kinds of questions fit into each category. This practice will help you immensely as you learn to get a feel for which Ruby method to reach for when you are solving a particular problem.

Continents of the world:

  • North America
  • South America
  • Asia
  • Europe
  • Africa
  • Oceania/Australia
  • Antarctica

(I recognize there are some discrepancies in what is considered a continent. Feel free to substitute the list you know.)

Try to come up with some questions that will give you a result in each of our categories:

  1. The same list
  2. A list, of the same length, with each item changed in some way
  3. A (possibly) smaller list, with certain items removed
  4. A single item from the list
  5. An answer to a question about the list
  6. A list that is reordered, combined with another list, or grouped in some way

If you would like even more practice, here is Wikipedia's List of lists of lists. Find one that looks interesting and go through the categories again.

Father of two. Musician. Music therapist. Teacher. Aspiring software engineer.
