Dedekind at the bookshop

Recently, I studied the so-called Dedekind cuts again. They are an elegant idea for constructing the real numbers out of the rational numbers.

The whole problem comes from the fact that the set of rational numbers \mathbb{Q} (made of the numbers that can be defined by a ratio of two integers, such as m/n) has holes. For example, the number \sqrt{2} doesn’t fit in \mathbb{Q}.

If you don’t know why this is true, I think that you should have a look; it’s very interesting indeed.

But let’s go back to the subject of this post: Dedekind’s really smart idea.

He thought that the best way to define real numbers is by… not defining them! Let me explain.

For Dedekind, a cut is a division of the set of rational numbers into two disjoint, non-empty sets, with every element of the first smaller than every element of the second. These two sets sit just around the new number you want to define: in other words, your number is the hole between the two. He then defines the set of real numbers \mathbb{R} as the set of all cuts.

Isn’t that smart? Yes, of course.

Unfortunately, when I read about it for the first time I felt that this idea didn’t really help my intuition:

Definition. A cut in \mathbb{Q} is a pair of subsets A, B of \mathbb{Q} such that:

  • (a) A \cup B = \mathbb{Q}, A \neq \emptyset, B \neq \emptyset, A \cap B = \emptyset.
  • (b) If a \in A and b \in B then a < b.
  • (c) A contains no largest element.

Definition. A real number is a cut in \mathbb{Q}.

I closed the book and went for a coffee. After a few hours of thinking about it, something clicked. I had my EUREKA moment and a metaphor came to my mind.

I imagined Dedekind going to a bookshop to buy some books. With a detailed list in hand, he enters the bookshop and starts searching for them.

He heads to the right shelf and starts browsing the books (which are alphabetically ordered). When he arrives at the position on the shelf where the book he is looking for should be, he finds a… hole. Ouch, the book is not there.

Then he moves on to the second book on his list. And… this second book is also missing! Another empty slot on the shelf.

As you may know, Dedekind is a highly motivated person, so he continues down his list for the whole afternoon. Imagine his disappointment when he discovers that NONE of the books he wanted to buy are available in the bookshop.

Dedekind is sad but after a few seconds (and not after a few hours, like me with his cuts…) he realises something: for him, it is perfectly fine to collect just the empty slots of the books he wanted to buy. After all, each slot is uniquely determined by the alphabetically ordered books around it on the shelf!

So he walks to the exit and tells the cashier: “Sir, I’d like to buy all the books around the empty slots of the books I didn’t find. How much is it? And can you deliver them to my place, preserving the current ordering?”

The guy at the desk doesn’t really understand his questions and just answers: “Five bucks, please”.

For what it’s worth, I now know that the set of real numbers is like a large (infinite) collection of… missing books (and it only costs five bucks)!

Geometric signal theory

This post provides a brief introduction to the geometric representation
of signals. After some background material in linear algebra, the concepts of analysis and synthesis will be explained and convolution will be defined.

NOTE: a print-friendly (and more complete) version of this post can be found here.

Vector spaces

Discrete signals x of length n can be thought of as vectors
in \mathbb{R}^n or \mathbb{C}^n. As such, they can be added to other vectors and multiplied by scalars, as in:

(1)   \begin{equation*} a_1v_1 + a_2v_2 + \ldots + a_nv_n \end{equation*}

where the a_i are scalars and the v_i are vectors.

Equation 1 is called a linear combination, and a set of
vectors closed under such combinations forms a vector space.

Several quantities can be computed on these vectors:

  • mean: \mathcal{M} = \frac{1}{N}\sum_{n=0}^{N-1}{x_n};
  • energy: \mathcal{E} = \sum_{n=0}^{N-1}|x_n|^2;
  • power: \mathcal{P} = \frac{\mathcal{E}}{N} = \frac{1}{N}\sum_{n=0}^{N-1}|x_n|^2;
  • L_2-norm: \mathcal{N} = \sqrt{\mathcal{E}} = \sqrt{\sum_{n=0}^{N-1}|x_n|^2}.
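As a quick sketch, these quantities can be computed in a few lines of NumPy (the signal below is a made-up example):

```python
import numpy as np

# A made-up example signal; any real or complex vector works.
x = np.array([1.0, -2.0, 3.0, -4.0])
N = len(x)

mean = x.sum() / N               # M = (1/N) * sum(x_n)
energy = np.sum(np.abs(x) ** 2)  # E = sum(|x_n|^2)
power = energy / N               # P = E / N
norm2 = np.sqrt(energy)          # ||x||_2 = sqrt(E)
```

Note that `np.abs` makes the same code work for complex-valued signals.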

The last of these, the norm, is often written \|x\|_2 and
represents the length of the vector in the space. A vector space with a norm (complete with respect to it) is called a Banach space. The L_2-norm satisfies
the triangle inequality:

(2)   \begin{equation*}     \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 \end{equation*}

and it can be generalized to any order p:
\|x\|_p = \left({\sum_{n=0}^{N-1}|x_n|^p}\right)^\frac{1}{p}.

Inner product spaces

A very important operation, called the inner product, can be defined on a Banach space as follows:

(3)   \begin{equation*}      \langle x, y \rangle = \sum_{n=0}^{N-1}{x_n\overline{y_n}} \end{equation*}

where \overline{x} denotes the complex conjugate of x. A Banach space with an inner product is called a Hilbert space. The inner product in equation 3 induces the norm: taking the inner product of a vector with itself gives the squared L_2-norm:

(4)   \begin{equation*}     \langle x, x \rangle = \sum_{n=0}^{N-1}{x_n\overline{x_n}}=\sum_{n=0}^{N-1}|x_n|^2=\|x\|_2^2. \end{equation*}
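A small NumPy sketch of equations 3 and 4 (the complex vectors are made up for illustration):

```python
import numpy as np

# Made-up complex vectors.
x = np.array([1 + 1j, 2 - 1j, 0 + 3j])
y = np.array([2 + 0j, 1 + 1j, 0 - 1j])

# Equation 3: <x, y> = sum_n x_n * conj(y_n)
inner = np.sum(x * np.conj(y))

# Equation 4: <x, x> is real and equals the squared L2-norm.
norm_sq = np.sum(x * np.conj(x)).real
```

Beware of the conjugation convention: NumPy's `np.vdot(a, b)` conjugates its *first* argument, so the sum above equals `np.vdot(y, x)`.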

The inner product has several important properties, among which:

Cauchy–Schwarz inequality
|\langle x, y \rangle| \leq \|x\| \cdot \|y\|.

Two vectors are said to be orthogonal (indicated as x \perp y) if their inner product is zero:

(5)   \begin{equation*}     x \perp y \equiv \langle x, y \rangle = 0. \end{equation*}

One of the most important applications of the inner product is the projection of one vector onto another. The projection of y onto x is defined as:

(6)   \begin{equation*}     \mathfrak{P}_x(y) = \frac{\langle y, x \rangle}{\|x\|^2} \cdot x \end{equation*}

where the ratio between the inner product and the squared norm of x is called the coefficient of projection.
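A minimal NumPy sketch of the projection, for real vectors (the vectors are made up for illustration):

```python
import numpy as np

def project(y, x):
    """Projection of y onto x: (<y, x> / ||x||^2) * x (real vectors)."""
    return (np.dot(y, x) / np.dot(x, x)) * x

y = np.array([3.0, 4.0])
x = np.array([1.0, 0.0])
p = project(y, x)   # the component of y along x
r = y - p           # the residual is orthogonal to x
```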

Under specific conditions, a vector can be reconstructed from some projections by summing them.

It is important to remark that this reconstruction only works if the vectors onto which the projections are taken are pairwise orthogonal; if the vectors used for the projections are linearly independent but not orthogonal, they can be orthogonalized by a method called Gram–Schmidt orthogonalization.
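A sketch of the classical Gram–Schmidt procedure in NumPy (a minimal, non-optimized version for illustration; in practice one would use a QR decomposition):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for q in basis:
            w = w - np.dot(w, q) * q   # remove the component along q
        basis.append(w / np.linalg.norm(w))
    return basis

q = gram_schmidt([np.array([1.0, 1.0]), np.array([1.0, 0.0])])
```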

The subspace given by all linear combinations of a set of vectors \{s_0, \ldots, s_n\} is called their span. If the vectors are linearly independent, then the set is called a basis of the space it spans. It is easy to show that a basis of \mathbb{R}^d contains exactly d vectors. Clearly, a vector can be reconstructed by a linear combination of its projections on another set of vectors if and only if that set is a basis.

Analysis and synthesis

The previous sections showed that a vector in a vector space can be written as a linear combination of a basis of that space: each basis vector is multiplied by a constant and the products are summed.

It is therefore possible to define analysis as the estimation of those constants and synthesis as the linear combination that recovers the signal.

Analysis gives the representation \phi_x of a signal as its inner products with the elements of a basis; it is therefore given by the projections:

(7)   \begin{equation*} \phi_x(k) = \sum_t{x(t)\overline{b_k(t)}} = \langle x, b_k \rangle \end{equation*}

where b_k is a given basis vector and t is time.

Synthesis is the reconstruction of the original signal x by summing the basis vectors weighted by the representation \phi_x produced by the analysis:

(8)   \begin{equation*} x(t) = \sum_k{\phi_x(k) b_k(t)} = \sum_k{\langle x, b_k \rangle b_k(t)}. \end{equation*}
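Analysis and synthesis can be sketched in a few lines of NumPy; the rotated basis below is a made-up example of an orthonormal basis of R^2:

```python
import numpy as np

# A made-up orthonormal basis of R^2 (the standard basis rotated by 0.3 rad).
b = [np.array([np.cos(0.3), np.sin(0.3)]),
     np.array([-np.sin(0.3), np.cos(0.3)])]
x = np.array([2.0, -1.0])

# Analysis (eq. 7): the coefficients are inner products with the basis (real case).
phi = [np.dot(x, bk) for bk in b]

# Synthesis (eq. 8): the weighted sum of basis vectors recovers x exactly.
x_rec = sum(p * bk for p, bk in zip(phi, b))
```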

The Fourier representation can be interpreted in this context as a specific case of analysis and synthesis, where the basis is a set of complex sinusoids, b_k(t) = e^{\frac{i2\pi k t}{T}} (where i is the imaginary unit).

The discrete Fourier transform (DFT) will therefore be:

(9)   \begin{equation*}     \boxed{\hat{x}(k) = \sum_t x(t) e^\frac{-i2\pi k t}{T} } \end{equation*}

and, in the same way, the reconstruction (the inverse DFT, or IDFT) is given by:

(10)   \begin{equation*}     \boxed{{x}(t) = \frac{1}{T}\sum_k \hat{x}(k) e^\frac{i2\pi k t}{T} }.  \end{equation*}
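Equations 9 and 10 translate almost literally into code. A naive NumPy sketch, O(T^2) and for illustration only (np.fft is the practical choice):

```python
import numpy as np

def dft(x):
    """Eq. 9: x_hat(k) = sum_t x(t) exp(-i 2 pi k t / T)."""
    T = len(x)
    t = np.arange(T)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * t / T)) for k in range(T)])

def idft(xh):
    """Eq. 10: x(t) = (1/T) sum_k x_hat(k) exp(i 2 pi k t / T)."""
    T = len(xh)
    k = np.arange(T)
    return np.array([np.sum(xh * np.exp(2j * np.pi * k * t / T)) for t in range(T)]) / T

x = np.array([1.0, 2.0, 0.0, -1.0])  # a made-up example signal
xh = dft(x)
```

With these conventions the result matches `np.fft.fft` / `np.fft.ifft` exactly.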

The basis of complex sinusoids is only one possible basis among infinitely many. This basis focuses on representing frequencies correctly, and is therefore well localized in frequency but not localized in time. On the other hand, it is possible to build a basis of Dirac pulses that provides perfect localization in time but no localization in frequency.

A compromise between sinusoids and impulses is given by bases made of oscillating signals with a temporal limitation, such as wavelets. A wavelet is a bandpass filter centered on a specific frequency with a specific bandwidth, and therefore has localization both in time and in frequency.
The Gabor wavelet represents the best compromise, in terms of the Heisenberg uncertainty principle, between time and frequency. More information on this vast subject can be found elsewhere online.


Convolution

Convolution is a mathematical operation defined on vector spaces that has important applications in signal theory, and it can be defined in terms of the inner product. In general, the inner product between two vectors can be seen as:

  • the projection of one vector onto the other, as discussed in the previous sections (or, in other words, the product of their magnitudes scaled by the angle between them);
  • the sum of the elements of one vector, weighted by the elements of the other;
  • a measure of the similarity (covariance) of the two vectors.

Convolution, then, can be defined as an inner product that is repeated over time:

(11)   \begin{equation*}     (x * h)(t) = \sum_m {x(t-m)h(m)}  \end{equation*}

where h is called the kernel and has finite length. Convolution is therefore:

  • the time series given by one signal weighted by another that slides along it;
  • the cross-covariance between two signals (similarity in time);
  • the time series given by the mapping between two signals;
  • a filtering process.
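Equation 11 can be sketched directly as a naive double loop (for illustration only; `np.convolve` computes the same "full" convolution):

```python
import numpy as np

def convolve(x, h):
    """Eq. 11 for finite signals: (x*h)(t) = sum_m x(t-m) h(m), 'full' length."""
    out = np.zeros(len(x) + len(h) - 1)
    for t in range(len(out)):
        for m in range(len(h)):
            if 0 <= t - m < len(x):    # x is zero outside its support
                out[t] += x[t - m] * h[m]
    return out

y = convolve(np.array([1.0, 2.0, 3.0]), np.array([1.0, -1.0]))  # made-up example
```

Note how the kernel [1, -1] acts as a filter: the output is the first-difference of the signal.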

There is an important relation between the DFT and convolution: (circular) convolution of two signals in the time domain corresponds to the pointwise product of their DFTs. Formally:

(12)   \begin{equation*} \widehat{x * h} = \hat{x} \cdot \hat{h} \end{equation*}

where \hat{x} and \hat{h} are the DFTs of the respective signals.
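A numeric sketch of this relation (with the DFT the correspondence holds for *circular* convolution; the signals below are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0])
h = np.array([0.5, 0.25, 0.0, 0.0])
N = len(x)

# Circular convolution via the DFT: IDFT of the pointwise product of the DFTs.
conv_via_dft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# Direct circular convolution for comparison (indices wrap around modulo N).
conv_direct = np.array([sum(x[(t - m) % N] * h[m] for m in range(N))
                        for t in range(N)])
```

To get ordinary (linear) convolution this way, both signals must first be zero-padded to at least len(x) + len(h) - 1 samples.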

Multidimensional Haar transforms

This post summarizes how to compute the multidimensional Haar transform, both directly and in a separable way.

NOTE: A print-friendly (and more complete) version of this post can be found here.

Haar functions

The Haar functions are the simplest possible wavelets. The 1D mother wavelet \psi_1(t) can be described as:

    \begin{equation*}     \psi_1(t) =      \begin{cases}         1, & \text{for } 0 \leq t < \frac{1}{2} \\         -1, & \text{for } \frac{1}{2} \leq t < 1 \\         0, & \text{otherwise.}     \end{cases} \end{equation*}

The corresponding scaling function \phi(t) is described as:

    \begin{equation*}     \phi_1(t) =     \begin{cases}         1, & \text{for } 0 \leq t < 1\\         0, & \text{otherwise}.     \end{cases} \end{equation*}

2D Haar functions can be constructed using tensor products as follows:

    \begin{equation*}     2D =      \begin{cases}         {\psi}_2^1 = \psi_1 \otimes \phi_1 \\         {\psi}_2^2 = \phi_1 \otimes \psi_1 \\         {\psi}_2^3 = \psi_1 \otimes \psi_1 \\         {\phi}_2 = \phi_1 \otimes \phi_1.\\      \end{cases} \end{equation*}
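In the discrete case, taking phi = [1, 1] and psi = [1, -1] as the (unnormalized) Haar filters, the 2D filters are just outer products — a NumPy sketch:

```python
import numpy as np

# Discrete (unnormalized) Haar filters: scaling phi = [1, 1], wavelet psi = [1, -1].
phi1 = np.array([1.0, 1.0])
psi1 = np.array([1.0, -1.0])

# 2D filters as tensor (outer) products, as in the equations above.
psi2_1 = np.outer(psi1, phi1)   # horizontal detail
psi2_2 = np.outer(phi1, psi1)   # vertical detail
psi2_3 = np.outer(psi1, psi1)   # diagonal detail
phi2   = np.outer(phi1, phi1)   # approximation
```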

In the same way, 3D filters are:

    \begin{equation*}     3D =  \begin{cases}     {\psi}_3^1 = \psi_2^1 \otimes \psi_1 \\ {\psi}_3^2 = \psi_2^2 \otimes \psi_1 \\ {\psi}_3^3 = \psi_2^3 \otimes \psi_1 \\ {\psi}_3^4 = \phi_2 \otimes \psi_1 \\ {\psi}_3^5 = \psi_2^1 \otimes \phi_1 \\ {\psi}_3^6 = \psi_2^2 \otimes \phi_1 \\ {\psi}_3^7 = \psi_2^3 \otimes \phi_1 \\ {\phi}_3 = \phi_2 \otimes \phi_1. \end{cases} \end{equation*}

The number of functions needed grows with the number of dimensions as 2^d, where d is the number of dimensions. So there are two filters in 1D (2^1), four in 2D (2^2), eight in 3D (2^3), and so on.

For the 4D case the filters are:

    \begin{equation*} 4D = \begin{cases} {\psi}_4^1 = \psi_3^1 \otimes \psi_1 \\ \ldots \\ {\psi}_4^{15} = \psi_3^7 \otimes \phi_1 \\ {\phi}_4 = \phi_3 \otimes \phi_1. \end{cases} \end{equation*}

This can be generalised to any number of dimensions.

Haar transform

The Haar transform is the simplest of the wavelet transforms. It is computed by convolving the signal to be transformed with all the necessary Haar functions and by subsampling accordingly afterwards.

The single-level 1D transform \hat{x} is described by the following two 1D-convolutions:

    \begin{equation*} \hat{x}(k_1) =  \begin{cases}     \displaystyle\sum_{u_1 = -\infty}^{\infty}{\psi_1(u_1)x(k_1 - u_1)}\\     \displaystyle\sum_{u_1 = -\infty}^{\infty}{\phi_1(u_1)x(k_1 - u_1)}\\ \end{cases} \end{equation*}
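A sketch of the single-level 1D transform in NumPy, using the normalized discrete Haar filters and downsampling by 2 (conventions on signs and sample alignment vary between implementations):

```python
import numpy as np

def haar_1d(x):
    """Single-level 1D Haar transform: convolve with the normalized Haar
    filters, then downsample by 2. Assumes len(x) is even."""
    lo = np.array([1.0, 1.0]) / np.sqrt(2)   # scaling filter phi
    hi = np.array([1.0, -1.0]) / np.sqrt(2)  # wavelet filter psi
    approx = np.convolve(x, lo)[1::2]        # keep every other sample
    detail = np.convolve(x, hi)[1::2]
    return approx, detail

a, d = haar_1d(np.array([4.0, 6.0, 10.0, 12.0]))  # made-up example signal
```

With this alignment, `approx` holds the scaled pairwise sums and `detail` the scaled pairwise differences of the input.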

and by analogy the 2D transform is given by the following four 2D convolutions:

    \begin{equation*} \hat{x}(k_1, k_2) =  \begin{cases} \displaystyle\sum_{u_1 = -\infty}^{\infty}\sum_{u_2 = -\infty}^{\infty}{\psi_2^1(u_1, u_2)x(k_1 - u_1, k_2 - u_2)}\\ \displaystyle\sum_{u_1 = -\infty}^{\infty}\sum_{u_2 = -\infty}^{\infty}{\psi_2^2(u_1, u_2)x(k_1 - u_1, k_2 - u_2)}\\ \displaystyle\sum_{u_1 = -\infty}^{\infty}\sum_{u_2 = -\infty}^{\infty}{\psi_2^3(u_1, u_2)x(k_1 - u_1, k_2 - u_2)}\\ \displaystyle\sum_{u_1 = -\infty}^{\infty}\sum_{u_2 = -\infty}^{\infty}{\phi_2(u_1, u_2)x(k_1 - u_1, k_2 - u_2)}. \end{cases} \end{equation*}

In the previous equations the u variables range over an infinite interval; in practice, however, the convolutions are computed over finite intervals.
Generalizing to d dimensions, \hat{x} is computed by 2^d d-dimensional convolutions:

    \begin{equation*} \hat{x}(k_1, \ldots, k_d) =  \begin{cases} \displaystyle\sum_{u_1 = -\infty}^{\infty}\ldots\sum_{u_d= -\infty}^{\infty}{\psi_d^1(u_1, \ldots, u_d)x(k_1 - u_1, \ldots, k_d - u_d)}\\ \ldots \\ \displaystyle\sum_{u_1 = -\infty}^{\infty}\ldots \sum_{u_d = -\infty}^{\infty}{\psi_d^{2^d -1}(u_1, \ldots, u_d)x(k_1 - u_1, \ldots, k_d - u_d)}\\ \displaystyle\sum_{u_1 = -\infty}^{\infty}\ldots \sum_{u_d = -\infty}^{\infty}{\phi_d(u_1,\ldots, u_d)x(k_1 - u_1,\ldots, k_d - u_d)}. \end{cases} \end{equation*}

Separable convolutions

Computing multidimensional convolutions directly can be complicated or impractical. For this reason, in some specific cases, a multidimensional convolution can be computed as one-dimensional convolutions along all dimensions. The necessary condition for computing a convolution this way is that at least one of the signals being convolved is separable.
A signal is said to be separable if it can be written as a product of one-dimensional signals:

    \begin{equation*}     x(u_1, \ldots, u_d) = f_1(u_1) f_2(u_2)\ldots f_d(u_d) = \prod_{i=1}^{d}f_i(u_i). \end{equation*}

In this case we can compute the so-called row-column convolution, described below.

Given a 2D signal x and the 2D Haar functions described above, a separable convolution for the 2D Haar transform is:

    \begin{eqnarray*}     \hat{x}(k_1, k_2) =& \displaystyle\sum_{u_1 = -\infty}^{\infty}\displaystyle\sum_{u_2= -\infty}^{\infty}{\psi_2^p(u_1, u_2)x(k_1 - u_1, k_2 - u_2)}\\          &= \displaystyle\sum_{u_1 = -\infty}^{\infty}\displaystyle\sum_{u_2= -\infty}^{\infty}{\psi_2^{(p,1)}(u_1)\psi_2^{(p,2)}(u_2)x(k_1 - u_1, k_2 - u_2) }\\          &= \displaystyle\sum_{u_1 = -\infty}^{\infty}{\psi_2^{(p,1)}(u_1)  \left[\displaystyle\sum_{u_2 = -\infty}^{\infty}\psi_2^{(p,2)}(u_2)x(k_1 - u_1, k_2 - u_2) \right]} \end{eqnarray*}

for any filter p of the 2D Haar functions.
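A numeric sketch comparing the direct 2D convolution with the row-column computation for a separable kernel (the signal is made up, and the direct 2D convolution is implemented naively for illustration):

```python
import numpy as np

def conv2d(x, k):
    """Direct 'full' 2D convolution (naive, for small arrays)."""
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H + h - 1, W + w - 1))
    for i in range(h):
        for j in range(w):
            out[i:i + H, j:j + W] += k[i, j] * x   # shift-and-add
    return out

x = np.arange(16.0).reshape(4, 4)
psi1 = np.array([1.0, -1.0])

# Separable kernel psi2^3 = psi1 (x) psi1; row-column computation:
rows = np.apply_along_axis(lambda r: np.convolve(r, psi1), 1, x)  # 1D conv per row
rc = np.apply_along_axis(lambda c: np.convolve(c, psi1), 0, rows)  # then per column
```

For an N×N signal and an m×m separable kernel, this replaces O(N²m²) work with O(N²m).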

The multidimensional Haar transform can thus be computed as a sequence of 1D Haar transforms along all dimensions.

A plan for the next months

I’ve made a list of books that I’ve decided to study/re-study in the next few months (not necessarily in this order):

  • Baby Rudin, 1-8 // Pugh, 1-5 (or more chapters if I want to go on multivariable analysis…)
  • Munkres, Topology (selection)
  • Brezis, Functional analysis (selection)
  • Dummit & Foote or MacLane, Groups (again… will it never stop?)
  • Some introductory books on algebraic topology

I’m curious to see how my preferences will change over the next months….

Old(er) mathematicians

Mathematics is a discipline for young people.

Most of the greatest mathematicians in history started their work very early and often obtained their best results while still very young.

I believe, anyway, that there is still some space for older mathematicians…

After some research, I ended up with this funny list:

  • Joan Birman got her PhD when she was 41; she did great work in several areas, among which braid theory;
  • Eugène Ehrhart got his PhD when he was… 60;
  • Albrecht Fröhlich worked on Galois theory (there is even a prize named after him); he got his PhD when he was 36;
  • George Green was an autodidact who began undergraduate studies at 40; apparently Albert Einstein commented that Green had been 20 years ahead of his time.

But the most striking case, for me, is Preda Mihăilescu: he got his PhD when he was 42 and, just a few years later, proved the 158-year-old Catalan conjecture.

A more scientific study examined the correlation between age and scientific creativity, but I believe that, after all, it is not only a matter of age.

Check this very touching blog post by Tanya Khovanova to understand what I mean…

Lebesgue integrals

Yesterday, I reviewed some principles behind Lebesgue’s theory of integration.

In a few words: instead of slicing the domain of a function into small equal parts, Lebesgue slices the range of the function. This produces parts that are not of the same size (unlike the dx in Riemann’s approach) and makes it possible to integrate functions that are not Riemann-integrable (for example, because of discontinuities).

Using this approach, the integral of a function is the sum of the elementary areas contained in the thin horizontal strips between y = t and y = t − dt. Each elementary area is just:

\mu (\{x | f(x) > t\})dt.

If we let f^*(t) = \mu (\{x | f(x) > t\}) then we can define the Lebesgue integral of a function f as:

\int f d\mu = \int_0^\infty f^* (t) dt, where the integral on the right is an ordinary improper Riemann integral.
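A numeric sketch of this formula, on a made-up step function, with the measure approximated by counting grid points:

```python
import numpy as np

# Made-up step function on [0, 1]: f = 2 on [0, 0.25), 1 on [0.25, 0.75), 0 elsewhere.
# Its integral, computed directly, is 2 * 0.25 + 1 * 0.5 = 1.0.
xs = np.linspace(0.0, 1.0, 100_001)
dx = xs[1] - xs[0]
f = np.where(xs < 0.25, 2.0, np.where(xs < 0.75, 1.0, 0.0))

# f*(t) = mu({x : f(x) > t}), approximated by counting grid points.
ts = np.linspace(0.0, 2.0, 201)
f_star = np.array([np.sum(f > t) * dx for t in ts])

# The improper Riemann integral of f* (which vanishes for t >= 2),
# computed with the trapezoidal rule: it recovers the integral of f.
lebesgue = np.sum((f_star[:-1] + f_star[1:]) / 2 * np.diff(ts))
```

Note how the slicing happens in the *range*: each value of t selects the set of points where f exceeds t, regardless of where those points sit in the domain.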

To do this, Lebesgue needs a measure to compute how big a set is. This measure agrees with area in 2D and with volume in 3D, and generalises to higher dimensions.

If a function can be integrated both in the Riemann and in the Lebesgue sense, then the two integrals have the same value.

Charles Pugh, in his book Real Mathematical Analysis, tells a funny story about counting money on a table in the Riemann and in the Lebesgue way. The first way collects the pieces one by one and sums them up. The second instead first organises the money into groups of equal value. It’s a useful metaphor.

I think that the following image from Wikipedia is indeed meaningful…

What I studied in my (math) life

Not all the mathematics I have studied up to today has left a mark on me. Many books and articles come and go, and only a few things remain.

I’d like to list here some of the things that somehow had an impact in my personal and professional life:

  • some books about recreational mathematics that I read when I was in elementary school (mostly by Martin Gardner): at the time I was following the suggestions of my uncle Marcello (who was obsessed with prime numbers);
  • some classes in high school, with the related manuals and exercises: I was bad at analytic geometry but I think I understood calculus (which we called analysis at the time) well;
  • logic at university [Negri, Elementi di logica (several chapters); Odifreddi, Classical recursion theory (a few sections); Chang-Keisler, Model theory (a few sections); …]: I applied a sort of pattern recognition to formulas without really understanding them (the problem was that we were never asked to do exercises), but this material had a deep impact on me;
  • abstract algebra at university: I don’t remember which books I used, and I complemented them with some online tutorials on group theory; unfortunately I didn’t perform so well at the final exams, but group theory is one of the best things I’ve ever seen;
  • Sergei Treil, Linear Algebra Done Wrong: for the very first time, with this book, I tried to really understand proofs without skimming through them; I think I retained several parts of the book, and linear algebra is the branch of mathematics that I know best today;
  • MIT online courses on calculus and linear algebra: I understood them pretty well while following them, and I believe that every professor should watch Gilbert Strang to learn how to teach;
  • [1999-now] self-study in signal processing [some online tutorials, parts of Lyons’ book on DSP, parts of Ripples in Mathematics (wavelets), some chapters of Smith’s Mathematics of the DFT, sections of Mallat’s book, …]: I believe I understood some linear algebra, something about filters, something about Fourier, something about wavelets and convolution;
  • selected parts in some analysis textbooks.

There are probably several other books or articles on mathematics that I read (especially during my PhD in… mathematical logic or my post-doc at the École normale supérieure in Paris), but they probably didn’t mean much to me.

I believe, anyway, that what one needs most when doing mathematics is exactly doing it. Lack of practice is always an issue.

Hello world!

After many years of hesitation, I finally decided to come out and say it:

serious mathematics is very difficult!

While mathematics has been very important in both my personal and professional life, I feel that I didn’t understand it deeply enough.

Nel mezzo del cammin di nostra vita, then, I decided to start again from scratch (or almost).

I’m an assistant professor at UCB now, and given my professional duties (research, teaching, service to the community) there is not much time left to review fundamental topics in mathematics.

Anyway, I’m not in a rush and I think that it makes sense for me to do it. This blog is the journal of this journey.

Berkeley, March 2019.