Dot Product to detect patterns

Introduction

Recently, I asked myself whether there is a fast and reliable way to detect patterns in a data set. I knew about linear regression, which works great for detecting linear trends, but it was not the right fit for my need: searching for any given pattern (irregular or not) inside a data set.

Dot Product

The Dot Product is really interesting when it comes to detecting patterns. Wikipedia describes it as the operation that takes two equal-length sequences of numbers and returns a single number.

I much prefer to understand the Dot Product as the operation that tells us how similar one sequence of numbers is to another, as long as both are normalized.

This is because if two normalized sequences of numbers are equal, the Dot Product returns a value of 1.0; if they are opposite, it returns -1.0; and if they are perpendicular, it returns 0.0.
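
Here is the operation as a minimal sketch in Python (the function name dot is my own, and the inputs are assumed to be plain lists of floats):

def dot(a, b):
    # Sum of the pairwise products of two equal-length sequences
    return sum(x * y for x, y in zip(a, b))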

Normalization

Every non-zero sequence of numbers can be normalized. This means that we can change the values of that sequence to make its length equal to 1, and normalization works in any number of dimensions. Let's see an example:

A = [1, 2, 3]

Before we can normalize A, we need its current length:

lenA = sqrt( (1 * 1) + (2 * 2) + (3 * 3) )

In this case, lenA will be 3.741, and then we only need to divide all the numbers in the sequence by this length to get a normalized sequence:

A = [ 1/3.741, 2/3.741, 3/3.741] = [0.267..., 0.534..., 0.801...]

After performing those divisions, we have a normalized sequence of numbers, which is crucial when working with the Dot Product if we want result values within the range of [-1.0 -- 1.0]
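
As a small sketch in Python (the helper names length and normalize are my own):

import math

def length(a):
    # Euclidean length: square root of the sum of squared components
    return math.sqrt(sum(x * x for x in a))

def normalize(a):
    # Divide every component by the length so the result has length 1.0
    l = length(a)
    return [x / l for x in a]

print(normalize([1, 2, 3]))  # [0.267..., 0.534..., 0.801...]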

Step by Step

Let's imagine that we have two vectors, A and B, and these two vectors live in a 3D world.

A = [1, 0, 0]
B = [1, 1, 0]

First, we need both vectors normalized (length = 1.0) to end up with a good result.

[    1,     0, 0] = normalize(A)
[0.707, 0.707, 0] = normalize(B)

Now we can apply the Dot Product:

dot = A • B = (1 * 0.707) + (0 * 0.707) + (0 * 0) = ~0.707

But what does the value of ~0.707 mean? Well, if you project B onto A, the length of the projection is ~0.707. I like to think of it as how alike vector B is to A, which is about 70%.

As we can see in the image above, the projection of B onto A is ~0.707, which means that vector B is roughly 70% similar to A.
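
Putting both steps together in code (a small sketch reusing the dot and normalize helpers from above):

def similarity(a, b):
    # Normalize both sequences, then take their Dot Product
    return dot(normalize(a), normalize(b))

print(similarity([1, 0, 0], [1, 1, 0]))  # ~0.707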

More simple examples

Let's run a few more simple tests. If we have these two vectors:

A = [1, 0, 0]
B = [1, 0, 0]

The Dot Product will return 1.0; in other words, B is 100% equal to A.

Another example:

A = [1, 0, 0]
B = [0, 1, 0]

The Dot Product will return 0.0 because B is 0% equal to A; in fact, it is perpendicular to A, so these vectors have no direction in common.

If we have these two vectors:

A = [ 1, 0, 0]
B = [-1, 0, 0]

The Dot Product will return -1.0, which means that not only are the vectors not equal, they are completely opposite to each other.
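
The three cases look like this with the similarity helper sketched earlier:

print(similarity([1, 0, 0], [1, 0, 0]))   # 1.0  -> same direction
print(similarity([1, 0, 0], [0, 1, 0]))   # 0.0  -> perpendicular
print(similarity([1, 0, 0], [-1, 0, 0]))  # -1.0 -> opposite direction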

Pattern recognition

This process is very powerful because if you have two equal-length sequences of numbers, you can normalize them and compute the Dot Product, which will tell you how similar one sequence is to the other.

Let's imagine that we want to find patterns in the stock market; more precisely, we want to find similar patterns between Company A's stock price and Company B's stock price:

This is a trivial example, but if we do the math we will have:

A = [ 2, 1,  2]
B = [10, 5, 10]

A = [0.6666..., 0.3333..., 0.6666...] = normalize(A)
B = [0.6666..., 0.3333..., 0.6666...] = normalize(B)

dot = A • B = 1.0

Well, this example is trivial because both vectors are the same after normalization, but what if B is slightly different?

A = [ 2, 1, 2]
B = [10, 5, 7]

A = [0.6666..., 0.3333..., 0.6666...] = normalize(A)
B = [0.758...., 0.379...., 0.530....] = normalize(B)

dot = A • B = ~0.986

The value ~0.986 is still very close to 1.0, which is good because it tells us that the similarity between A and B is about 98.6%
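
In code, again with the similarity helper from above:

print(similarity([2, 1, 2], [10, 5, 10]))  # 1.0, B is exactly A scaled by 5
print(similarity([2, 1, 2], [10, 5, 7]))   # ~0.986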

Too similar values

What happens if we replace the last value of B from the previous example with 0.0?

A = [0.6666..., 0.3333..., 0.6666...] = normalize(A)
B = [0.894...., 0.447...., 0.0......] = normalize(B)

dot = A • B = ~0.745

Well, the Dot Product is telling us that vector B is ~75% similar to vector A, but if we look at the image below, it does not look like a good pattern recognition match.

In those cases, what we can do is rescale the result and decide on a rejection threshold. For example:

dot = A • B
dot = (dot - 0.6)            // bias: move the 0.6 threshold down to 0.0
dot = clamp(dot, 0.0, 1.0)   // reject anything below the threshold
dot = dot * (1.0 / 0.4)      // scale: stretch [0.0 -- 0.4] into [0.0 -- 1.0]

With this code we are basically biasing and scaling the result, so the previous range of [0.6 -- 1.0] becomes the range [0.0 -- 1.0], and anything below 0.6 is rejected as 0.0
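
Here is the same idea as a runnable Python sketch (the threshold of 0.6 is just an example choice of mine, and the clamp is written out because Python has no built-in for it):

def scaled_similarity(a, b, threshold=0.6):
    # Map raw similarities in [threshold, 1.0] to [0.0, 1.0]
    d = similarity(a, b) - threshold
    d = max(0.0, min(1.0, d))  # clamp: anything below the threshold becomes 0.0
    return d * (1.0 / (1.0 - threshold))

print(scaled_similarity([2, 1, 2], [10, 5, 0]))  # ~0.36 instead of ~0.745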

Scaled patterns

What if we want to detect patterns that are a scaled version of the original values?

Something like this:

A = [ 2, 1, 2]
B = [10, 8, 10]

Here B has the same shape as A, but scaled and shifted. In this case, something that we can do is pre-bias and pre-scale the values before computing the Dot Product.

maxBeforeB = max(B) = 10
B = B - 3 // [7, 5, 7]
maxAfterB = max(B) = 7
B = B * (maxBeforeB / maxAfterB) // [10, 7.14, 10]

If we bias the vector by 3, and scale it by the ratio between the previous maximum and the current maximum, we end up computing the Dot Product between the vectors [2, 1, 2] and [10, 7.14, 10]. However, if instead of 3 we bias by 6, we end up computing the Dot Product between [2, 1, 2] and [10, 5, 10], which is exactly [2, 1, 2] scaled by 5, so after normalization the Dot Product will return a value of 1.0.

In this scenario, we can see that we would need to iterate over candidate bias values to find the one that gives us a perfect match (if there is one). I am not sure how valuable this method is, but it will certainly tell us something about the data we are comparing.
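
A brute-force sketch of that search in Python (the candidate bias values and the step size are arbitrary choices of mine):

def rebias(b, bias):
    # Subtract the bias, then rescale so the maximum returns to its old value
    max_before = max(b)
    shifted = [x - bias for x in b]
    return [x * (max_before / max(shifted)) for x in shifted]

def best_bias(a, b, biases):
    # Keep the candidate bias that produces the highest similarity
    return max(biases, key=lambda bias: similarity(a, rebias(b, bias)))

A = [2, 1, 2]
B = [10, 8, 10]
print(best_bias(A, B, [i * 0.5 for i in range(16)]))  # 6.0, a perfect match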

Patterns everywhere

So, after everything we have discussed here, we only need to transform data into vectors to start comparing them and finding patterns.

For example, let's imagine that we are a Social Media company, and we allow our customers to post pictures with tags.

And also, let's imagine that we track which tags are associated with the pictures our customers like.

After a while, we will have a "history" of visited tags per user, with a normalized value associated with each of them. This value represents, for each tag, the number of pictures with that tag the user liked divided by the number of pictures with that tag shown to the user.

Let's imagine profiles like these:

name: Jane
tag: #cats        ratio: 1.0
tag: #dog         ratio: 0.7
tag: #flowers     ratio: 0.8
tag: #books       ratio: 0.7

name: Mike
tag: #cats        ratio: 0.95
tag: #dog         ratio: 0.1
tag: #flowers     ratio: 0.05
tag: #snake       ratio: 1.0

name: Susan
tag: #cats        ratio: 1.0
tag: #dog         ratio: 0.8
tag: #books       ratio: 0.75

Given the users above, and knowing that we want to show Susan some new pictures she might like, which kind of pictures should we show her?

If we try to find patterns, most probably we will find that applying the Dot Product to the tags each pair of users shares (sequences of different lengths, depending on the pair) returns a result near 100% between Jane and Susan over the shared tags [#cats, #dog, #books]. So we will have a better chance of success if we show Susan a picture of #flowers rather than a picture of #snakes.
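
A sketch of that comparison in Python, restricting each pair of users to the tags they have in common (the dictionaries mirror the profiles above, and the helper name is my own):

jane  = {"#cats": 1.0, "#dog": 0.7, "#flowers": 0.8, "#books": 0.7}
mike  = {"#cats": 0.95, "#dog": 0.1, "#flowers": 0.05, "#snake": 1.0}
susan = {"#cats": 1.0, "#dog": 0.8, "#books": 0.75}

def tag_similarity(u, v):
    # Compare two users only over the tags they both have
    shared = sorted(set(u) & set(v))
    return similarity([u[t] for t in shared], [v[t] for t in shared])

print(tag_similarity(jane, susan))  # ~0.998
print(tag_similarity(mike, susan))  # ~0.84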

Of course this example is very simple, but I believe the Dot Product can be very powerful when it comes to finding patterns, similarities, etc. In addition, Dot Product operations are very fast, which makes things even better :)
