Introduction to University Mathematics (2025-26): 3 Functions

3.0 Definitions and examples

Definition 3.1 (Function)

Let \(X\) and \(Y\) be sets. A function \(f\) from \(X\) to \(Y\) is a subset of \(X\times Y\) with the property that for each \(x\in X\), there is exactly one value of \(y \in Y\) such that \((x,y)\) is in the subset.

Of course, this is probably not how you think of functions in practice (and indeed, some people would say that what I've defined above is in fact the graph of a function). The property in the definition means that we can think of the function as a map or mapping from \(X\) to \(Y\) such that for each \(x\in X\) there is an assignment of a particular element \(f(x)\in Y\).

With this way of thinking, a function takes an input element from the set \(X\), and maps it to an output in the set \(Y\). The input is often referred to as the argument of the function (the word `argument' has many different meanings!).

Notation.

We write \(f:X\rightarrow Y\) to mean that \(f\) is a function from \(X\) to \(Y\).

The arrow here highlights the idea that the function takes values from \(X\) and to \(Y\).

A set X on the left, and a set Y on the right, with A green arrow labelled f from an element x of X to an element f(x) of Y.

Definition 3.2 (Domain, codomain)

For a function \(f:X\rightarrow Y\), the set \(X\) is called the domain of \(f\), and the set \(Y\) is called the codomain of \(f\).

Definition 3.3 (Equal functions)

Two functions \(f\colon X\rightarrow Y\) and \(g\colon X\rightarrow Y\) are equal (in which case we can write \(f = g\)), if and only if \(f(x) = g(x)\) for every \(x \in X\).

Note that we require functions to have the same domain as each other, and also the same codomain as each other, in order to be equal.

Example 3.4

We can define a function \(f\colon \mathbb{R} \rightarrow \mathbb{Z}\) that takes any real number \(x\) and returns the least integer that is greater than or equal to \(x\). (This is called the ceiling function, and is denoted by \(\lceil x \rceil\). You may be familiar with it as "rounding up"; there is a related floor function, \(\lfloor x \rfloor\) that `rounds down').

Then \(f(3.5)=4\) and \(f(-4)=-4\) but, for example, \(f(1.01+2.5\mathrm{i})\) is not defined.

Notation.

We might define a particular function by explaining what each element \(x\) maps to, using the notation \(x\mapsto f(x)\)

Example 3.5

The function that adds one to a natural number can be defined as \(f:\mathbb{N}\rightarrow \mathbb{N}\) with \(n\mapsto n+1\).

Beware; the function \(f\) is not the same thing as the value \(f(x)\). This matters when we start to talk about the properties of the function \(f\) itself, which might be very different from properties of values of the function \(f(x)\).

The definition of a function requires that a unique element of the codomain is assigned for every element of the domain, so when we define a function we need to take account of this. For example if we want to define a function \(f\colon\mathbb{R} \rightarrow \mathbb{R}\), the assignment \(x\mapsto 1/x\) is not sufficient, since it fails at \(x = 0\). Similarly, the recipe "\(f(x) = y\) where \(y^2 = x\)" fails for two reasons: one is that \(f(x)\) is undefined for \(x <0\), and the other is that for \(x>0\) it does not return a unique value - does \(f(4)\) equal \(2\) or \(-2\)? In such cases, we say the "function" is ill-defined (it doesn't satisfy the definition of a function). We are interested in the opposite; functions that are well-defined. When we start to define more complicated functions, it may not be so immediately obvious whether this is the case, so some effort is often required to demonstrate that a given recipe produces a well-defined function.

Another example of what might seem to be a function, but which is not well-defined (and therefore not actually a function), is if we try to let \(f\colon \mathbb{Q} \rightarrow \mathbb{Z}\) be given by \(f(\frac{m}{n}) = n\). The problem here is that there is not a unique way of expressing an element of \(\mathbb{Q}\) as \(\frac{m}{n}\), so this assignment gives multiple different values for the same argument (for example \(f\left(\frac{2}{3}\right) = 3\) and \(f\left(\frac{4}{6}\right) = 6\), but for the function to be well-defined we need these to be the same).

As the remarks above highlight, the definition of a function needs to make clear the domain and codomain, not just the "formula". The following are all different functions.

Example 3.6

\( \begin{align*} f_{1}\colon \mathbb{R}\rightarrow\mathbb{R}\qquad & \text{given by}\qquad f_{1}(x)=x^{2}.\\ f_{2}\colon\mathbb{R}\rightarrow\lbrack0,\infty)\qquad & \text{given by}\qquad f_{2}(x)=x^{2}.\\ f_{3}\colon\lbrack0,\infty)\rightarrow\mathbb{R}\qquad & \text{given by}\qquad f_{3}(x)=x^{2}.\\ f_{4}\colon\lbrack0,\infty)\rightarrow\lbrack0,\infty)\qquad & \text{given by }\qquad f_{4}(x)=x^{2}. \end{align*} \)

Definition 3.7 (Image, range, pre-image)

Given a function \(f\colon X\rightarrow Y,\) the image or range of \(f\) is

\(\displaystyle f(X)=\{f(x) : x\in X\} \subseteq Y.\)

More generally, given \(A \subseteq X\), the image of \(A\) (under \(f\)) is

\(\displaystyle f(A)=\{f(x) : x\in A\} \subseteq Y.\)

Given \(B \subseteq Y\), the pre-image of \(B\) (under \(f\)) is

\(\displaystyle f^{-1}(B)=\{ x : f(x) \in B\} \subseteq X.\)

Beware that some authors say "range" for what I would call the codomain of a function.

Example 3.8

For the function \(f_1\) defined above in Example 3.6, the image is \([0,\infty)\), and \(f_1([0,1]) = [0, 1]\), \(f_1^{-1}([0,1]) = [-1,1]\), and \(f_1^{-1}((-\infty,0]) = \left\{ 0 \right\} \).

For \(f_4\), the image is \([0, \infty)\), and \(f_4([0,1]) = [0,1]\), \(f_4^{-1}([0,1]) = [0,1]\), and \(f_4^{-1}((-\infty,0])\) is not defined, since \((-\infty,0]\) is not a subset of the codomain in this case.

Beware the confusing notation: for \(x \in X\), \(f(x)\) is a single element of \(Y\), but for \(A \subseteq X\), \(f(A)\) is a set (a subset of \(Y\)). Formally, the function \(f\) induces a map of power sets \(f_\ast:\mathcal{P}(X)\rightarrow\mathcal{P}(Y)\) defined by \(f_\ast(A)=\{f(x):x\in A\}\) for all \(A\subseteq X\). But outside of Category Theory, this distinction is not usually made, and we'll write \(f\) for both functions.

Beware also; the notation \(f^{-1}(B)\) should be read as "the pre-image of \(B\)" and not as "\(f\)-inverse of \(B\)". The pre-image is defined even if no inverse function exists (in which case \(f^{-1}\) on its own has no meaning; we discuss invertibility of a function below). Also, we'll see that even when the inverse function exists, it's a function on \(Y\), whereas this notation describes a function on the power set of \(Y\). The notation \(f^{-1}(B)\) for the pre-image of a set is widely used like this, I'm afraid, so we'll have to work with it.

Proposition 3.9

Let \(f\colon X\rightarrow Y\) be a function.

(a) For any \(A\subseteq X\) we have \(A\subseteq f^{-1}(f(A))\), but do not have equality in general.

(b) For any \(C\subseteq Y\) we have \(f(f^{-1}(C))\subseteq C\), but do not have equality in general.

Proof. (a) By definition \(A\subseteq f^{-1}(f(A))\) as the elements of \(A\) map into \(f(A)\). However other elements may also map into \(f(A)\). If we consider the map \(f(x)=x^{2}\) from \(\mathbb{R}\) to \(\mathbb{R}\) then we see for \(A=\{1\}\) that

\(\displaystyle f^{-1}(f(A))=f^{-1}(\{1\})=\{-1,1\}\neq\{1\}=A.\)

(b) We immediately have \(f(f^{-1}(C))\subseteq C\) as \(f\) maps the elements of \(f^{-1}(C)\) into \(C\) by definition. However \(f(f^{-1}(C))\) need not equal \(C\). For example with the same \(f\) as in (a) and \(C=\{-1\}\) we see that \(f^{-1}(C) = \varnothing\), so

\(\displaystyle f(f^{-1}(C))=f(\varnothing)=\varnothing\neq C.\)

\(\square\)

If a function is defined on some larger domain than we care about, it may be helpful to restrict the domain:

Definition 3.10 (Restriction of a function)

Given a function \(f\colon X \rightarrow Y\) and a subset \(A \subseteq X\), the restriction of \(f\) to \(A\) is the map \(f|_{A}\colon A \rightarrow Y\) defined by \(f|_{A}(x) = f(x)\) for all \(x \in A\).

The restriction is almost the same function as the original \(f\) - just the domain has changed.

Another rather trivial but nevertheless important function is the identity map:

Definition 3.11 (Identity function)

Given a set \(X\), the identity \(\text{id}_{X}\colon X\rightarrow X\) is defined by \(\text{id}_{X}(x) = x\) for all \(x\in X\). Other notation is sometimes used, such as \(1_{X}\), and if the domain is unambiguous, we might just write \(\text{id}\).

3.1 Injectivity and surjectivity

Definition 3.12 (Injective, surjective, bijective, bijection)

Let \(f\colon X\rightarrow Y\) be a function.

(i) We say that \(f\) is injective, or one-to-one, if whenever \(f(x_{1})=f(x_{2})\) then \(x_{1}=x_{2}\). In words, each element of the image has a unique pre-image.

(ii) We say that \(f\) is surjective, or onto, if for every \(y \in Y \) there exists \(x\in X\) such that \(f(x)=y\). In words, the image of the domain is the codomain.

(iii) We say that \(f\) is bijective if it is both injective and surjective. In words, each element of the codomain has a unique pre-image.

A bijective function is called a bijection (and similarly, we say injection and surjection for injective and surjective functions respectively).

The figure below shows schematic representations of functions \(f\colon X \rightarrow Y\) that are (a) both injective and surjective, (b) injective but not surjective, (c) surjective but not injective, (d) neither injective nor surjective. Note that \(X\) and \(Y\) are not the same between pictures.

Functions. (a) 3 points map to 3, 1:1. (b) 2 points to 3, missing 1. (c) 3 points to 2, with clash. (d) 3 to just 2 of 3.

Example 3.13

The ceiling function from \(\mathbb{R}\) to \(\mathbb{Z}\) defined in Example 3.4 is surjective, because for any integer \(n \in \mathbb{Z}\), we have \(\lceil n \rceil = n\). But it is not injective, because there are other elements in \(\mathbb{R}\) (infinitely many of them in fact) that will return the same output; for example \(\lceil n-\frac{1}{2} \rceil = n\).

Example 3.14

Returning to the functions defined in Example 3.6, we see that the function \(f_1\) is not injective (because, for example, \(f(-1) = f(1) = 1\)), and it is not surjective (because there are no \(x \in \mathbb{R}\) that give \(x^2 = y\) for \(y<0\)).

The function \(f_2\) is also not injective but is surjective (because \(y<0\) is not in the codomain this time, and for every \(y \in [0,\infty)\) there is some \(x \in \mathbb{R}\) with \(x^2 = y\)).

The function \(f_3\) is again not surjective, but this time it is injective (because negative values are now excluded from the domain).

The function \(f_4\) is both injective and surjective (and is therefore a bijection).

Having introduced injective and surjective functions, we can give an alternative and more intuitive definition of the cardinality of finite sets:

Definition 3.15 (Finite set, cardinality of a set (with bijections))

The empty set \(\varnothing\) is finite and has cardinality \(|\varnothing| = 0\). A non-empty set \(S\) is said to finite and have cardinality \(|S| = n \in \mathbb{N}\) if and only if there exists a bijection from \(S\) to the set \(\{ 1,2,\ldots n \}\).

The bijection provides a way of counting the elements of \(S\). You might like to convince yourself that this definition is equivalent to the inductive definition given earlier.

Note that for finite sets \(X\) and \(Y\), a function \(f\colon X \to Y\) can only be injective if \(|Y| \ge |X|\), since for any injective function the number of elements in the image \(f(X)\), is equal to the number of elements in the domain, and \(f(X) \subseteq Y\). In other words, the codomain of an injective function cannot be smaller than the domain. This is sometimes referred to as the pigeonhole principle (so called from the observation that if \(n\) letters are placed in \(m\) pigeonholes and \(n>m\), then at least one hole must contain more than one letter; the non-injective function in that case is the assignment of pigeonholes to letters).

Similarly, a function between finite sets \(f\colon X \to Y\) can only be surjective if \(|Y|\le |X|\). Hence if \(f\) is bijective, then \(|X| = |Y|\); that is, the domain and codomain of a bijection have equal cardinality.

For infinite sets, it is a version of this new `bijective' definition that leads to the conclusion that \(\mathbb{N}\) and \(\mathbb{Z}\) and \(\mathbb{Q}\) have the same cardinality (there are bijections between these sets), but \(\mathbb{R}\) does not have the same cardinality (there is no bijection between \(\mathbb{N}\) and \(\mathbb{R}\)).

3.2 Composition of functions and invertibility

Definition 3.16 (Function composition)

Given two functions \(f\colon X\rightarrow Y\) and \(g\colon Y\rightarrow Z\), the composition \(g\circ f\colon X\rightarrow Z\) is defined by

\(\displaystyle (g\circ f)(x)=g(f(x))\qquad\text{for all }x\in X.\)

If \(Z = X\) then we can similarly define \(f \circ g \colon Y \rightarrow Y\), but in general \(f\circ g \neq g \circ f\) (indeed, if \(X \neq Y\), these functions have different domain and codomain, but even in the case \(X = Y\) the functions will generally not be the same). For example, if \(f(x) = x^2\) and \(g(x) = e^{x}\) are both maps from \(\mathbb{R}\) to \(\mathbb{R}\), then

\(\displaystyle (f\circ g)(x)=e^{2x}\neq e^{x^2} =(g\circ f)(x).\)

This shows that composition of functions is not commutative. However, composition is associative, as the following result shows:

Proposition 3.17 (Function composition is associative)

Let \(f\colon X\rightarrow Y,\,g\colon Y\rightarrow Z,\,h\colon Z\rightarrow W\) be three functions. Then

\(\displaystyle f\circ(g\circ h)=(f\circ g)\circ h.\)

Proof. Let \(x\in X\). Then, by the definition of composition, we have

\(\displaystyle (f\circ(g\circ h))(x)=f((g\circ h)(x))=f(g(h(x)))=(f\circ g)(h(x))=((f\circ g)\circ h)(x).\)

\(\square\)

The following proposition addresses the extent to which composition of functions preserves injectivity and surjectivity:

Proposition 3.18

Let \(f\colon X\rightarrow Y\) and \(g\colon Y\rightarrow Z\) be functions.

(i) If \(f\) and \(g\) are injective then so is \(g\circ f\). Conversely, if \(g\circ f\) is injective, then \(f\) is injective, but \(g\) need not be.

(ii) If \(f\) and \(g\) are surjective then so is \(g\circ f\). Conversely, if \(g\circ f\) is surjective, then \(g\) is surjective, but \(f\) need not be.

Proof and commentary. We prove (i), and leave the proof of (ii) as an exercise.

It is helpful to clarify for each part of the proposition what exactly we are told (the hypotheses), and what exactly we need to show. For the first part of (i), we can take it that \(f\) and \(g\) are injective, and need to show that \(g \circ f\) is injective. From the definition of injectivity, that means we need to show that for any \(x_1,x_2\in X\), if \((g\circ f)(x_1) = (g\circ f)(x_2)\) then \(x_1 = x_2\). So let's suppose \(x_1,x_2 \in X\) and \((g\circ f)(x_1) = (g\circ f)(x_2)\), and aim to show \(x_1 = x_2\), making use of what we know. From the injectivity of \(g\) we know that if \(g(f(x_1)) = g(f(x_2))\) then \(f(x_1) = f(x_2)\), so this must be the case here. Then from the injectivity of \(f\) we know that this means \(x_1 = x_2\). So we have indeed shown what is needed for \(g\circ f\) to be injective.

For the second part of (i), we are told that \(g \circ f\) is injective, and we need to show that \(f\) is injective; that is, we need to show that if \(f(x_1) = f(x_2)\) then \(x_1 = x_2\). So let's suppose that \(f(x_1) = f(x_2)\) and aim to show this, making use of what we know about \(g \circ f\) this time. Applying \(g\) to both sides gives \(g(f(x_1)) = g(f(x_2))\), and then we see that the injectivity of \(g \circ f\) immediately tells us that \(x_1 = x_2\). So we have shown that \(f\) is injective.

An alternative approach here could have been to use contradiction. If we start with the supposition that \(f\) is not injective, then it means there exist some \(x_1,x_2 \in X\) for which \(x_1 \neq x_2\) but \(f(x_1) = f(x_2)\). Then \(g(f(x_1)) = g(f(x_2))\), so in fact this means there exist some \(x_1,x_2 \in X\) for which \(x_1 \neq x_2\) but \((g\circ f)(x_1) = (g\circ f)(x_2)\). But that would contradict the definition of \(g\circ f\) being injective, so our supposition that \(f\) was not injective was incorrect, and we have therefore shown that \(f\) is injective.

To show that \(g\) need not be injective, we should give a counterexample. A bit of thought may lead to the observation that \(g\) could have a larger domain than the image of \(f\). An extreme example is to take \(X = Z = \{ 0\}\) and \(Y = \mathbb{R}\), and have \(f\) and \(g\) defined by \(f(0) = 0\) and \(g(y) = 0\) for all \(y \in \mathbb{R}\). Then \((g\circ f)\colon X \rightarrow Z\) is injective (it simply maps \(0\) to \(0\)). But clearly \(g\) is not injective.

\(\square\)

I have written much more than is needed in the proof above because I am spelling out the thought process as well as the logic. Having worked out what to do, we could streamline it:

Proof. For the first part of (i), suppose \((g\circ f)(x_1) = (g\circ f)(x_2)\) for some \(x_1,x_2 \in X\). From the injectivity of \(g\) we know that \(g(f(x_1)) = g(f(x_2))\) implies \(f(x_1) = f(x_2)\), and then from the injectivity of \(f\) we know that this implies \(x_1 = x_2\). So \(g\circ f\) is injective.

For the second part of (i), suppose \(f(x_1) = f(x_2)\) for some \(x_1,x_2 \in X\). Then applying \(g\) gives \(g(f(x_1)) = g(f(x_2))\), and by the injectivity of \(g\circ f\) this means \(x_1=x_2\). So \(f\) is injective.

To see that \(g\) need not be injective, a counterexample is \(X = Z = \{ 0\}\), \(Y = \mathbb{R}\), with \(f(0) = 0\) and \(g(y) = 0\) for all \(y \in \mathbb{R}\).

\(\square\)

Recalling that \(\text{id}_X\) is the identity map on a set \(X\), we are now in a position to define invertibility:

Definition 3.19 (Invertible function, inverse function)

A function \(f\colon X \rightarrow Y\) is invertible if there exists a function \(g\colon Y \rightarrow X\) such that \(g\circ f = \text{id}_X\) and \(f\circ g = \text{id}_Y\). The function \(g\) is the inverse of \(f\), and we write \(g = f^{-1}\).

Note that directly from the definition, if \(f\) is invertible then \(f^{-1}\) is also invertible, and \((f^{-1})^{-1} = f\).

An immediate concern we might have is whether there could be multiple such functions \(g\), in which case the inverse \(f^{-1}\) would not be well-defined. This is resolved by the following result:

Proposition 3.20

If \(f\colon X \rightarrow Y\) is invertible then its inverse is unique.

Proof. Let \(g_1\) and \(g_2\) be two functions for which \(g_i\circ f = \text{id}_X\) and \(f\circ g_i = \text{id}_Y\). Using the fact that composition is associative, and the definition of the identity maps, we can write

\(\displaystyle g_1 = g_1 \circ \text{id}_Y = g_1 \circ (f \circ g_2) = (g_1 \circ f) \circ g_2 = \text{id}_X \circ g_2 = g_2. \)

\(\square\)

As a result, we can describe a rather non-trivial example of a function.

Example 3.21

Let \(X\) and \(Y\) be sets, and let \(F\) be the set of all invertible functions with domain \(X\) and codomain \(Y\). Then there is a function \(T:F\rightarrow F\) such that \(T(f) = f^{-1}\), and this function \(T\) is its own inverse, with \(T\circ T = \text{id}_F\).

The following result shows how to invert the composition of invertible functions.

Proposition 3.22

Let \(f\colon X\rightarrow Y\) and \(g\colon Y \rightarrow Z\) be functions between sets \(X, Y, Z\). If \(f\) and \(g\) are invertible, then \(g\circ f\) is invertible, and \((g\circ f)^{-1} = f^{-1} \circ g^{-1}\).

Proof. Making repeated use of the fact that function composition is associative, and the definition of the inverses \(f^{-1}\) and \(g^{-1}\), we note that

\( \begin{align*} (f^{-1}\circ g^{-1}) \circ (g\circ f) & = \left( (f^{-1}\circ g^{-1}) \circ g \right) \circ f\\ & = \left( f^{-1}\circ (g^{-1} \circ g) \right) \circ f \\ & = \left( f^{-1}\circ \text{id}_Y \right) \circ f \\ & = f^{-1} \circ f \\ & = \text{id}_X, \end{align*} \)

and similarly,

\( \begin{align*} (g\circ f) \circ (f^{-1}\circ g^{-1}) & = g \circ \left( f\circ (f^{-1} \circ g^{-1}) \right) \\ & = g \circ \left( (f\circ f^{-1}) \circ g^{-1}\right) \\ & = g \circ \left( \text{id}_Y \circ g^{-1}\right) \\ & = g\circ g^{-1} \\ & = \text{id}_Z, \end{align*} \)

which shows that \(f^{-1} \circ g^{-1}\) satisfies the properties required to be the inverse of \(g\circ f\).

\(\square\)

The following result provides an important and useful criterion for invertibility:

Theorem 3.23

A function \(f\colon X \rightarrow Y\) is invertible if and only if it is bijective.

Proof. First suppose \(f\) is invertible, so that it has an inverse \(f^{-1}\colon Y \rightarrow X\). To show \(f\) is injective, suppose that for some \(x_1,x_2 \in X\) we have \(f(x_1) = f(x_2)\). Then applying \(f^{-1}\) to both sides and noting that by definition \(f^{-1} \circ f = \text{id}_X\), we see that \(x_1 = f^{-1}(f(x_1)) = f^{-1}(f(x_2)) = x_2\). So \(f\) is injective. To show that \(f\) is surjective, let \(y\in Y\), and note that \(f^{-1}(y) \in X\) has the property that \(f(f^{-1}(y)) = y\). So \(f\) is surjective. Therefore \(f\) is bijective.

Conversely, suppose that \(f\) is bijective. We aim to show that there is a well-defined \(g\colon Y \rightarrow X\) such that \(g\circ f = \text{id}_X\) and \(f\circ g = \text{id}_Y\). Since \(f\) is surjective, we know that for any \(y\in Y\), there is an \(x \in X\) such that \(f(x) = y\). Furthermore, since \(f\) is injective, we know that this \(x\) is unique. So for each \(y\in Y\) there is a unique \(x\in X\) such that \(f(x) = y\). This recipe provides a well-defined function \(g(y) = x\), for which we have \(g(f(x)) = x\) for any \(x\in X\) and \(f(g(y)) = y\) for any \(y \in Y\). So \(g\) satisfies the property required to be an inverse of \(f\) and therefore \(f\) is invertible.

\(\square\)

It is also possible to define left-inverse and right-inverse functions as functions that partially satisfy the definition of the inverse.

Definition 3.24 (Left-invertible, right-invertible)

A function \(f\colon X \rightarrow Y\) is left-invertible if there exists a function \(g\colon Y \rightarrow X\) such that \(g\circ f = \text{id}_X\), and is right-invertible if there exists a function \(h\colon Y \rightarrow X\) such that \(f\circ h = \text{id}_Y\).

As may be somewhat apparent from the previous proof, being left- and right- invertible is equivalent to being injective and surjective, respectively. We leave this as an exercise to show.

Proposition 3.25

(i) A function \(f\colon X \rightarrow Y\) is left-invertible if and only if it is injective.

(ii) A function \(f\colon X \rightarrow Y\) is right-invertible if and only if it is surjective.

The proof is left as an exercise.

Last modified: Wednesday, 1 October 2025, 1:37 PM