Collections of Data

Chapter 6 Collections of Data

Section 6.1 Data Structures

So far, we’ve only dealt with single values at a time. All our functions take in single value arguments and return single values. Sometimes, we need to be able to work with more than just single values. In theses times, we need a Data Structure.

A data structure is a combination of multiple values, the way those values are arranged, and a set of rules on how we can interact with those values. There are many different kinds of data structures used in computer science, each with their own costs and benefits. Understanding how they work allows us to pick the best one for a particular task.

Before we get into our first programming data structure, let’s look at a non-cs related example. Consider how two different people store their clean clothes. We’ll call these people Lisa and Bart.

Lisa stores all her clothes in her dresser, and she has different drawers for socks, underwear, shirts and pants.
Bart stores all his clothes in a pile on the floor.

Both Lisa and Bart store their clothes in a single place, but the ways the clothes are organized and how one interacts with those clothes are quite different. Lisa’s clothes are well organized, and when she wants to get a clean shirt, she can go directly to the shirt drawer and find one. But this organization comes at a cost. Whenever Lisa gets new clean clothes, she has to take the time to fold them and put them in the right place.

When Bart wants a clean shirt, he has to pull clothes off the pile until he finds one. This is certainly less effective that Lisa’s drawer method. But, when Bart gets new clean clothes, he can just empty the laundry basket on the floor and move on with his day.

Section 6.2 Linked Lists

Racket is a descendant of the Lisp programming language. Lisp is an acronym for LIST Processing. So it is fair to say that Lists are fairly important to the development of Racket. The lists we will work with are specifically called linked lists. A linked list collects values in nodes, where each node has two parts:

A particular value.
A link to the next node.

Section 6.3 Linked Lists in Racket

In Racket, there are many functions that work with lists. Let’s start with some basic list manipulation functions:

list: Takes 0 or more arguments and returns a list containing those arguments as elements.
- (list) returns the empty list: '()
- (list 1 2 3) returns the list: '(1 2 3)
- (list 1 #t "happy" -9.4) returns the list: '(1 #t "happy" -9.4)
first: Takes a list as an argument and returns the value stored in the first node of that list. first will result in an error if you try to use it on the empty list.
- (first (list 1 2 3)) returns: 1
- (first (list "hello" 2)) returns: "hello"
- (first (list (list 1 2) 3)) returns: '(1 2)
rest: Takes a list as an argument and returns node that the first element of the list is linked to. This will be the entire list after the first element. rest will result in an error if you try to use it on the empty list.
- (rest (list 1 2 3)) returns: '(2 3)
- (rest (list 2)) returns: '()
- (rest (list (list 1 2) 3)) returns: '(3)

Subsection 6.3.1 `build-list`

build-list is a function that, as you can probably guess, builds a list. But how it builds a list is where it gets interesting.

$ (build-list 5 sqrt)
> '(0 1 1.4142135623730951 1.7320508075688772 2)

(build-list n func) takes a positive integer, n and a function func, and creates a list of n elements generated by performing func on each integer starting a t 0 and going. to n-1.

(build-list 5 sqrt)

Is equivalent to:

(list (sqrt 0) (sqrt 1) (sqrt 2) (sqrt 4) (sqrt 5))

The second argument of build-list must be a function. Sometimes (often) you’ll need to provide your own function instead of relying on one of Racket’s built in ones. Let’s say you wanted to write a list that contains the powers of 2. There’s no powersOfTwo function in Racket, so we’d have to build our own.

(define powersOfTwo
  (lambda (n)
    (expt 2 n)))
(build-list 5 powersOfTwo)

This code will generate the list (1 2 4 8 16), but it’s worth asking Did I need to write and define powersOfTwo? The only reason we needed that function was because Racket didn’t have the function we needed. So do we have to write helper functions every time we need a custom build-list?

Yes,

but...

We don’t need to define a new function every time.

You may vaguely recall that we we introduced writing functions, lambda was described as something that creates and returns an anonymous function. Because we can pass functions as values, we can then pass the return value of lambda, which is a function, into define which will assign a name to the anonymous function just created. At the time, we didn’t have a good use for anonymous functions, so we kept using define every time we made a function with lambda.

But now we do have a good use for anonymous functions! Why spend time defining a function to use in build-list if that’s the only time we’re going to need it? If build-list takes a function as a parameter, then we can use lambda to make that function. Since we don’t need to reuse it, it can be anonymous. Turns out, anonymous functions are incredibly useful for list processing!

(build-list 5 (lambda (n) (expt 2 n)))
(build-list 5 (lambda (n) (* 3 n)))
(build-list 5 (lambda (n) n))

These are all valid build-list function calls that use lambda in place of a named function. The three lines of code above would produce:

'(1 2 4 8 16)
'(0 3 6 9 12)
'(0 1 2 3 4)

It is worth noting that even though func must be a function with one parameter, you don’t have to use that parameter. The function call below will make a list of 5 random integers between 0 and 10. In this case, the argument 5 is there to provide the size of the list, but the values 0 - 4 are not used.

(build-list 5 (lambda (x) (random 10)))

Section 6.4 Manipulating Lists

Now we can make lists using list or build-list, but how do we work with them after we’ve made them? There are two major ways to approach list manipulation in Racket. One was involved using built in functions that work with an entire list at once, the other is to work with each list node individually. We’ll explore both options, but let’s start with the build in functions.

Subsection 6.4.1 `filter`

filter is a function that we use to filter out (or more accurately, in) certain values from a list.

(define g (list 58 36 73 46 21 42 28 46 34 64))

Let’s say we wanted to get only the odd values in g as a list. To do this, we’d use:

(filter odd? g)

Similarly to build-list one of the arguments to filter is a function. In the example above, we’re using Racket’s odd? function. Specifically, the function argument to filter must be a function that takes one argument and returns a boolean value. filter will call the provided function on every element in the provided list. Any value that results in #true will be put into the list returned by filter.

Also like build-list, anonymous functions are quite useful here. Instead of getting the odd numbers, let’s get all the values in g greater than 50:

(filter (lambda (n)
          (> n 50))
        g)

It’s important to pay attention to the syntax here. (lambda (n) (> n 50)) is the first argument to filter, and g is the second. It’s a common error to accidentally add an extra ) to the end of the lambda function, which would actually close off filter.

Subsection 6.4.2 `foldl`

So far, our list functions have had reasonably good names list first rest build-list filter all do a great job of describing what they do. Then we have foldl. To be fair, the name does make sense, but in a more technical way than is useful to start. In other languages, the same operation is named reduce, which might be a little more helpful. The purpose is to take a list, perform a function on the elements in the list and get a value back. Often, the returned value is a single value (as opposed to a list) so we say that the list is reduced to a single value.

So how does this thing work? Here are the parameters for foldl:

(foldl func startValue g)

func is a function that must take exactly two arguments.
startValue is a value used the first time func is called.
g is the list to work on.

Each time func is called, the first argument will be the next value from the list, and the second argument will be the value returned from the previous time func was called. startValue takes the place of the previous return value at the beginning, since there will not have been a previous value. So foldl starts by doing:

(func (first g) startValue)

Let’s look at a concrete example:

(define g (list 58 36 73))
(foldl + 0 g)

To start, we’ll have:

(+ 58 0) ;(first g) is 58 0 is startValue

That will return 58 (0 being the ideal startValue for this case). The next operation will be:

(+ 36 58) ;(first(rest g)) is 36, 58 was the previous return value

That will return 94, and next:

(+ 73 94)

Which gives us the answer, 167.

Like filter and build-list we can also use lambda to help with foldl. Here is another more complex example:

(define g (list 58 73 36 46))
(foldl  (lambda (value previous)
          (and previous
               (> value 50)))
        #t
        g)

Table 6.1.

`value`	`func`	result
`58`	`(and #t (> 58 50))`	`#t`
`73`	`(and #t (> 73 50))`	`#t`
`36`	`(and #t (> 36 50))`	`#f`
`46`	`(and #f (> 46 50))`	`#f`

Subsubsection 6.4.2.1 What’s in a Name?

So what’s up with the name foldl? As it turns out, there are actually two parts to the name, fold and l, which in this instance stands for left. Another way of thinking about the function is that if folds the list down, element by element, from left to right. In the examples above, the direction of the folding doesn’t matter much, because order doesn’t matter + and and. When order is important, there is another option, foldr which works exactly like foldl, except it works right to left:

(foldl - 0 (list 58 73 36 46))
; this would be run as:
(46 (- 36 (- 73 (- 58 0))))
; and return:
25

(foldr - 0 (list 58 73 36 46))
; this would be run as:
(- 58 (- 73 (- 36 (- 46 0))))
; and return:
-25

Subsection 6.4.3 `map`

The last list processing function we’ll look at is map. Like filter and foldl, map goes through each element of a list, performing a provided function. Each individual return value is placed into a new list that is returned:

(define g (list 58 36 73 46 21 42 28 46 34 64))
(map  (lambda (value)
        (if (> value 50)
            value
            50))
        g)
;this would return:
'(58 50 73 50 50 50 50 50 50 64)

If you give map a function that has one parameter and a single list, it will call that function on each element of the provided list. But if you give map a function with two parameters and two lists, it will call the function using one element from each list at a time:

(define g (list 1 2 3))
(define h (list 5 7 11))
(map  (lambda (value0 value1)
        (* value0 value1))
        g
        h)
;this would return:
'(5 14 33)

In fact you can give as many lists as you want, as long as the following rules are followed:

The number of list arguments to map must equal the number of arguments to the provided function.
Each list must be the same length.

Here is another example worked out further:

(define g (list 58 73 36 46))
(define h (list 32 97 42 14))
(map (lambda (value0 value1)
      (if (> value0 value1)
          value0
          value1))
      g
      h)

Table 6.2.

`value0`	`value1`	`func`	result
`58`	`32`	`(if (> 58 32) 58 32)`	`'(58)`
`73`	`97`	`(if (> 73 92) 73 97)`	`'(58 97)`
`36`	`42`	`(if (> 36 42) 36 42)`	`'(58 97 42)`
`46`	`14`	`(if (> 46 14) 46 14)`	`'(58 97 42 46)`

Prev Top Next