Cosine similarity

alex_ber
12 min read · Jun 22, 2024


Let's start with Euclidean geometry first.

Note: if you need to refresh your memory, read here about vectors in Euclidean geometry.

cos α

Modern definitions express cos α as an infinite series. See here for more details. There is also the unit-circle definition. But we will use the simplest version: the right-angled-triangle definition.

For the angle α, the cosine function gives the ratio of the length of the adjacent side to the length of the hypotenuse.

To define the cosine of an acute angle α, start with a right triangle that contains an angle of measure α; in the accompanying figure, angle α in triangle ABC is the angle of interest. The three sides of the triangle are named as follows:

  • The opposite side is the side opposite to the angle of interest, in this case side a.
  • The hypotenuse is the side opposite the right angle, in this case side h. The hypotenuse is always the longest side of a right-angled triangle.
  • The adjacent side is the remaining side, in this case side b. It forms a side of (and is adjacent to) both the angle of interest (angle A) and the right angle

Once such a triangle is chosen, the cosine of the angle is equal to the length of the adjacent side, divided by the length of the hypotenuse:

cos α = b / h

Well-definedness: As stated, the value of cos α appears to depend on the choice of right triangle containing an angle of measure α. However, this is not the case: all such triangles are similar (by the congruent angles), and so the ratio is the same for each of them.

Theorem of cosine

For any triangle with sides a, b, c, where γ is the angle between sides a and b (opposite side c):

The square of one side of a triangle is equal to the sum of the squares of the other two sides minus twice the product of these sides and the cosine of the angle between them:

c² = a² + b² − 2ab cos γ

Proof:

An acute triangle with perpendicular

Let’s draw altitude through vertex C — a segment perpendicular to side c.

The distance from the foot of the altitude to vertex A plus the distance from the foot of the altitude to vertex B is equal to the length of side c (see Fig). Each of these distances can be written as one of the other sides multiplied by the cosine of the adjacent angle:

c = a cos β + b cos α

(This is still true if α or β is obtuse, in which case the perpendicular falls outside the triangle).

Multiplying both sides by c yields:

c² = ac cos β + bc cos α

The same steps work just as well when treating either of the other sides as the base of the triangle:

a² = ac cos β + ab cos γ

b² = bc cos α + ab cos γ

Taking the equation for c² and subtracting the equations for b² and a²:

c² − a² − b² = (ac cos β + bc cos α) − (ab cos γ + ac cos β) − (ab cos γ + bc cos α) = −2ab cos γ

so c² = a² + b² − 2ab cos γ.

Q.E.D.

Note:

* This proof is independent of the Pythagorean theorem, insofar as it is based only on the right-triangle definition of cosine and obtains squared side lengths algebraically. Other proofs typically invoke the Pythagorean theorem explicitly, and are more geometric, treating a cos γ as a label for the length of a certain line segment.
* This theorem is a generalization of the Pythagorean theorem. You can obtain the Pythagorean theorem by choosing γ = π/2. Then the preconditions of the Pythagorean theorem hold. Using the theorem of cosine we get

c² = a² + b² − 2ab cos(π/2) = a² + b²

That is exactly what we should prove in the Pythagorean theorem.
Note: We’ve used the fact that cos π/2=0.

You can recall the following definitions from high school:

What is a Vector?

A vector is a quantity that has both magnitude (length) and direction. You can think of it as an arrow pointing from one point to another. For example, if you walk 5 meters north, your movement can be represented as a vector.

What is the Length of a Vector?

The length (or magnitude) of a vector is a measure of how long the vector is. If you have a vector v = (x, y, z) in 3D space, the length is calculated using the Pythagorean theorem: Length(v) = sqrt(x² + y² + z²)=∥v∥.
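As a quick sketch, this formula is a one-liner in Python (the helper name `length` is mine, not from any library):

```python
import math

def length(v):
    """Length (magnitude) of a vector, via the Pythagorean theorem."""
    return math.sqrt(sum(x * x for x in v))

print(length((3, 4)))     # 5.0, the classic 3-4-5 right triangle
print(length((1, 2, 2)))  # 3.0
```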

What is a Scalar Product?

The scalar product, also known as the dot product, is a way to multiply two vectors to get a scalar (a single number).

For vectors u = (u1, u2, u3) and v = (v1, v2, v3), the dot product is: u . v = u1 * v1 + u2 * v2 + u3 * v3.
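A minimal sketch of the same computation in Python (the helper name `dot` is mine):

```python
def dot(u, v):
    """Scalar (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

print(dot((1, 2, 3), (4, 5, 6)))  # 1*4 + 2*5 + 3*6 = 32
```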

Note: Generalization of scalar product (dot product) is inner product.

Properties of scalar product:

u⋅(v+w)=u⋅v+u⋅w

v⋅w=w⋅v

v⋅v=∥v∥²

(cv)⋅w=v⋅(cw)=c(v⋅w)

v⋅0=0

If v⋅v=0 then v=0

The proofs of these properties are mostly “computational”. You can find some of the proofs here.
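Since the proofs are "computational", the properties are also easy to spot-check numerically on sample vectors; a sketch (all helper names are mine):

```python
import random

def dot(u, v):
    """Scalar product on R^3."""
    return sum(a * b for a, b in zip(u, v))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(s, a):
    return tuple(s * x for x in a)

random.seed(0)  # deterministic sample vectors
u, v, w = [tuple(random.randint(-5, 5) for _ in range(3)) for _ in range(3)]
c = 3

assert dot(u, add(v, w)) == dot(u, v) + dot(u, w)  # u.(v+w) = u.v + u.w
assert dot(v, w) == dot(w, v)                      # v.w = w.v
assert dot(scale(c, v), w) == c * dot(v, w)        # (cv).w = c(v.w)
assert dot(v, (0, 0, 0)) == 0                      # v.0 = 0
print("all properties hold on this sample")
```

Of course, passing on a sample is not a proof; the algebraic proofs are what establish the properties.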

Theorem

Let θ be the angle between a and b such that 0 ≤ θ ≤ π as shown in the image below.

Then

a⋅b=∥a∥ ∥b∥ cos θ

Note: The angle between two vectors is the measure of the smallest rotation from one vector to the other.

Proof:

Let’s give a modified version of the sketch above.

The three vectors above form the triangle AOB and note that the length of each side is nothing more than the length of the vector forming that side.

From the theorem of cosine:

(1) ∥a−b∥² = ∥a∥² + ∥b∥² − 2∥a∥ ∥b∥ cos θ holds

Now, using the properties of scalar product (see above):

∥a−b∥² = (a−b)⋅(a−b) = a⋅a − a⋅b − b⋅a + b⋅b = ∥a∥² − 2 a⋅b + ∥b∥²

Then

(2) ∥a∥² − 2 a⋅b + ∥b∥² = ∥a∥² + ∥b∥² − 2∥a∥ ∥b∥ cos θ holds

Then:

(3) −2 a⋅b = −2∥a∥ ∥b∥ cos θ holds

Then

(4) a⋅b = ∥a∥ ∥b∥ cos θ holds

Q.E.D.
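In coordinates, the theorem gives a direct way to recover the angle θ from a⋅b; a minimal sketch (the names `dot`, `norm`, `angle` are mine):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle theta between u and v, from a.b = ||a|| ||b|| cos(theta)."""
    c = dot(u, v) / (norm(u) * norm(v))
    c = max(-1.0, min(1.0, c))  # guard against float round-off outside [-1, 1]
    return math.acos(c)

print(angle((1, 0), (0, 1)))  # pi/2 for perpendicular axes
```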

Corollary:

The dot product gives us a very nice method for determining if two vectors are perpendicular, and it gives another method for determining when two vectors are parallel.

Corollary:

  1. If two vectors are orthogonal, then the angle between them is π/2. So,

a⋅b = ∥a∥ ∥b∥ cos(π/2) = ∥a∥ ∥b∥ · 0 = 0

2. If two vectors are parallel, then the angle between them is either 0 (pointing in the same direction) or π (pointing in the opposite direction).

a⋅b = ∥a∥ ∥b∥ cos 0 = ∥a∥ ∥b∥ · 1 = ∥a∥ ∥b∥

a⋅b = ∥a∥ ∥b∥ cos π = ∥a∥ ∥b∥ · (−1) = −∥a∥ ∥b∥

This can be summarized as follows:

Note: The value of cosine θ ranges from -1 to 1.

* 1 indicates that the vectors are identical in direction.

* 0 indicates that the vectors are orthogonal (perpendicular).

* -1 indicates that the vectors are diametrically opposed.
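The three cases can be checked on concrete vectors; a minimal sketch (`cos_theta` is my name for the ratio):

```python
import math

def cos_theta(u, v):
    """Cosine of the angle between u and v: (u.v) / (||u|| ||v||)."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return d / (nu * nv)

print(cos_theta((2, 0), (5, 0)))    # 1.0: same direction
print(cos_theta((1, 0), (0, 3)))    # 0.0: orthogonal
print(cos_theta((1, 2), (-2, -4)))  # approx -1: opposite direction
```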

This formula holds in more general cases.

The inner product is a generalization of the scalar product. In the context of Euclidean space (like 2D or 3D space), the inner product is the same as the scalar product. However, in more abstract vector spaces, the inner product can be defined differently.

Definition:

Let V be a vector space over the field R. Let u, v, w be vectors from V and α be a real number. An inner product is a function
<*, *>: V×V → R that satisfies:

  1. <u+v, w> = <u, w> + <v, w>
  2. <αv, w> = α<v, w>
  3. <v, w> = <w, v>
  4. <v, v> ≥ 0, with equality if and only if v = 0

Theorem: The scalar product (aka dot product) is an inner product
Let V = R³. Let u = (u1, u2, u3), v = (v1, v2, v3) be vectors from V and α be a real number.

<*, *>: V×V → R defined as

<u, v> = u1 * v1 + u2 * v2 + u3 * v3

satisfies the axioms of an inner product. (The same argument works in R^n.)

Proof:

Let u = (u1, u2, u3), v = (v1, v2, v3), w = (w1, w2, w3).
<u,v> = u1 * v1 + u2 * v2 + u3 * v3.
We need to prove that:

  1. <u+v, w> = <u, w>+<v,w>

<u+v, w> = <(u1+v1, u2+v2, u3+v3), (w1, w2, w3)> = (u1+v1)w1 + (u2+v2)w2 + (u3+v3)w3 = u1w1 + v1w1 + u2w2 + v2w2 + u3w3 + v3w3 = (u1w1 + u2w2 + u3w3) + (v1w1 + v2w2 + v3w3) = <u, w> + <v, w>. Q.E.D.

2. <αv, w> = α<v, w>

<αv, w> = <α(v1, v2, v3), (w1, w2, w3)> = <(αv1, αv2, αv3), (w1, w2, w3)> = αv1w1 + αv2w2 + αv3w3 = α(v1w1 + v2w2 + v3w3) = α<v, w>. Q.E.D.

3.<v, w>=<w,v>
<v, w>=< (v1, v2, v3), (w1, w2, w3)> = v1w1+v2w2+v3w3=w1v1+w2v2+w3v3 = <(w1, w2, w3), (v1, v2, v3)>=<w, v>.

4. <v, v> ≥ 0, with equality if and only if v = 0

(<=) If v = 0, let's calculate <v, v>. We should prove that it is 0.

<v, v> = <(0, 0, 0), (0, 0, 0)> = 0*0 + 0*0 + 0*0 = 0.

(=>) If <v, v> = 0, we should prove that v = 0.

0 = <v, v> = <(v1, v2, v3), (v1, v2, v3)> = v1² + v2² + v3²

We should prove that v1 = v2 = v3 = 0.

Suppose not; without loss of generality, assume that v1 ≠ 0. Then v1² > 0 and then v1² + v2² + v3² > 0. Contradiction. Q.E.D.

Definition. A vector space together with an inner product on it is called an inner product space.

Examples of inner product spaces include:

  1. The real numbers R where the inner product is given by:

<x, y> = xy

2. The Euclidean space R^n where the inner product is given by the scalar product (aka dot product):

For vectors u = (u1, u2, …, un) and v = (v1, v2, …, vn), the inner product is:

<u, v> = u1 * v1 + u2 * v2 + … + un * vn.

Exercise: Check that examples 1 and 2 satisfy the axioms of an inner product.

Definition:

Given a vector space X over field R, a norm on X is a real-valued function p : X → R with the following properties, where |s| denotes the usual absolute value of a scalar s:

  • Positive definiteness: p(x) ≥ 0 for every x in X, and p(x) = 0 if and only if x = 0.
  • Triangle inequality: p (x + y) ≤ p (x) + p (y)
  • Absolute homogeneity: p( s x) = |s| p (x)

Exercise: Check that absolute value on R is norm.

Definition:

The Euclidean length of a vector x = (x1, x2, …, xn) in R^n is given by the Euclidean norm:

||x||_2 = sqrt(x1² + x2² + … + xn²)

The Euclidean distance between two points x and y is the length ||x-y||_2 of the straight line between the two points. In many situations, the Euclidean distance is appropriate for capturing the actual distances in a given space. In contrast, consider taxi drivers in a grid street plan who should measure distance not in terms of the length of the straight line to their destination, but in terms of the rectilinear distance, which takes into account that streets are either orthogonal or parallel to each other. The class of Lp-norms generalizes these two examples and has an abundance of applications in many parts of mathematics, physics, and computer science.

Exercise: Check that the Euclidean norm satisfies all axioms of the norm.

Definition:

For a real number p ≥ 1, the Lp-norm of x is defined by:

||x||_p = (|x1|^p + |x2|^p + … + |xn|^p)^(1/p)

Note:

* The absolute value bars can be dropped when p is a rational number with an even numerator in its reduced form and x is drawn from the set of real numbers, or one of its subsets.

* The Euclidean norm from above falls into this class and is the L2 -norm, and the L1-norm is the norm that corresponds to the rectilinear distance (aka Manhattan distance).

The L∞-norm or maximum norm is the limit of the Lp-norms for p → ∞. It turns out that this limit is equivalent to the following definition:

||x||_∞ = max(|x1|, |x2|, …, |xn|)
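A sketch of the Lp family in Python (helper names are mine), showing that large p already gets close to the maximum norm:

```python
def lp_norm(x, p):
    """Lp-norm for p >= 1: (sum |xi|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def max_norm(x):
    """L-infinity norm: the limit of the Lp-norms as p -> infinity."""
    return max(abs(xi) for xi in x)

x = (3, -4)
print(lp_norm(x, 1))    # 7.0: Manhattan (rectilinear) length
print(lp_norm(x, 2))    # 5.0: Euclidean length
print(lp_norm(x, 100))  # approx 4: already close to the max norm
print(max_norm(x))      # 4
```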

Definition:

A metric space is a set X together with a function d:X×X→R called a metric or distance function. The metric d must satisfy the following properties for all x,y,z in X:

  • Non-negativity: d(x,y)≥0 and d(x,y)=0 if and only if x=y.
  • Symmetry: d(x,y)=d(y,x).
  • Triangle Inequality: d(x,z)≤d(x,y)+d(y,z).

Note:
1. Every normed vector space is also a metric space (where the metric is defined by d(x,y)=∥x−y∥)

2. Not every metric space is a normed vector space.
Metric spaces are more general and do not require the underlying set to have a vector space structure or a norm.

Exercise: Prove that d(x, y) = ∥x−y∥ from note 1 satisfies all axioms of a metric space.

An inner product induces both a norm and a metric

Definition: Induced Norm by inner product

Given a vector space V with an inner product <*, *>, the norm || * || is defined by:

||x|| = sqrt(<x, x>)

Exercise:

  1. Prove that the formula above is well defined: if we have a vector space V over the field R, then sqrt(<x, x>) is in R (and not, say, in C)
  2. Prove that this formula satisfies all axioms of a norm.

Definition: Induced metric by inner product

Using the norm induced by the inner product (above), a metric d can be defined on V by:

d(x, y) = ||x-y||=sqrt( <x-y , x-y> )
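Both constructions can be sketched generically in Python: given any inner product, build the induced norm and metric (`make_norm_and_metric` is my name):

```python
import math

def make_norm_and_metric(inner):
    """From an inner product, derive the induced norm and metric."""
    def norm(x):
        return math.sqrt(inner(x, x))
    def dist(x, y):
        return norm(tuple(a - b for a, b in zip(x, y)))
    return norm, dist

# Instantiate with the scalar product as the inner product on R^n.
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
norm, dist = make_norm_and_metric(dot)

print(norm((3, 4)))          # 5.0
print(dist((1, 1), (4, 5)))  # 5.0, the Euclidean distance
```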

Theorem: Pythagorean theorem in inner-product space

Let u and v be any orthogonal vectors in an inner product space V with inner product <u, v>. Let ||u|| be the norm induced by the inner product, that is ||u|| = sqrt(<u, u>). Then

||u+v||² = ||u||² + ||v||²

Note: This is called the Pythagorean theorem because the vectors u and v are akin to the sides of a right triangle with hypotenuse given by the vector sum u + v.

Proof:

Vectors u and v are orthogonal, so by definition <u, v> = <v, u> = 0.

||u+v||² = <u+v, u+v> = <u, u> + <u, v> + <v, u> + <v, v> = <u, u> + 0 + 0 + <v, v> = ||u||² + ||v||².

Q.E.D.
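A quick numeric check of the identity with concrete orthogonal vectors (a sketch, using the dot product as the inner product):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

u = (3, 0, 0)
v = (0, 4, 0)  # orthogonal to u: dot(u, v) == 0
s = tuple(a + b for a, b in zip(u, v))  # u + v

assert dot(u, v) == 0
print(dot(s, s))              # ||u+v||^2 = 25
print(dot(u, u) + dot(v, v))  # ||u||^2 + ||v||^2 = 25
```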

Theorem: Cauchy–Bunyakovsky–Schwarz inequality

Let u and v be arbitrary vectors in an inner product space V with inner product <u, v>. Let ||u|| be the norm induced by the inner product, that is ||u|| = sqrt(<u, u>). Then

| <u, v> | <= || u || || v ||

with equality holding if and only if u and v are linearly dependent.

Moreover, if | <u, v> | = || u || || v || and v ≠ 0, then u = (<u, v>/<v, v>) v.

Proof:

  1. Let's assume that u and v are linearly dependent. We need to prove that

| <u, v> | = || u || || v ||

By definition, u and v being linearly dependent means that there exists a scalar c such that u = cv or v = cu. Without loss of generality, let u = cv. Then

| <u, v> | = | <cv, v> | = |c| * |<v, v>| = |c| * ||v|| ||v|| = ||cv|| ||v|| = || u || || v ||.

2. Let's assume that u or v is 0. Without loss of generality, u = 0. Then u = cv where c = 0. So, by definition, u and v are linearly dependent. So, by 1)

| <u, v> | = || u || || v ||

3. Left to be proved:

(*) | <u, v> | <= || u || || v || where u ≠ 0 and v ≠ 0

(**) if | <u, v> | = || u || || v ||, then u and v are linearly dependent.

The case v = 0 was proven, so we can assume that v ≠ 0. Then <v, v> ≠ 0.

Let

z = u − (<u, v>/<v, v>) v

It follows from the linearity of the inner product in its first argument that:

<z, v> = <u, v> − (<u, v>/<v, v>) <v, v> = 0

Therefore, z is a vector orthogonal to the vector v (indeed, z was chosen to be the projection of u onto the plane orthogonal to v). We can thus apply the Pythagorean theorem in inner-product space to:

u = (<u, v>/<v, v>) v + z

which gives

||u||² = (<u, v>/<v, v>)² ||v||² + ||z||² = <u, v>²/||v||² + ||z||² ≥ <u, v>²/||v||²

The Cauchy–Schwarz inequality follows by multiplying by ||v||² and then taking the square root. Moreover, if the relation ≥ in the above expression is actually an equality, then ||z||²=0 and hence z=0; the definition of z establishes a relation of linear dependence between u and v.

Q.E.D.
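A numeric illustration of both the inequality and the equality case (a sketch with the dot product as inner product):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

u, v = (1, 2, 3), (4, 5, 6)
print(abs(dot(u, v)), norm(u) * norm(v))  # 32 vs about 32.83: strict inequality

w = tuple(2 * x for x in u)  # w = 2u, linearly dependent with u
print(abs(dot(u, w)), norm(u) * norm(w))  # both about 28: equality
```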

Recall that we proved that in R^n with the Euclidean norm the following equation holds:

a⋅b = ∥a∥ ∥b∥ cos θ

The Cauchy–Schwarz inequality allows one to extend the notion of "angle between two vectors" to any real inner-product space by defining:

cos θ = <u, v> / (||u|| ||v||)

The Cauchy–Schwarz inequality proves that this definition is sensible, by showing that the right-hand side lies in the interval [−1, 1].

Formulas for cos θ are widely used in machine learning for calculating cosine similarity.
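As a closing sketch, here is cosine similarity the way it is typically computed over embedding vectors (`cosine_similarity` is my name; production code would usually call a library such as NumPy or scikit-learn):

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = <u, v> / (||u|| ||v||); assumes non-zero vectors."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return d / (nu * nv)

# Toy "embeddings": same direction -> similarity close to 1.
print(cosine_similarity((1.0, 2.0, 3.0), (2.0, 4.0, 6.0)))  # approx 1.0
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))            # 0.0
```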
