# Find Euclidean / cosine distance between a tensor and all tensors stored in a DataFrame column efficiently

## Issue

I have a tensor `input_sentence_embed` with shape `torch.Size([1, 768])`.

There is a DataFrame `matched_df` that looks like this:

```
   INCIDENT_NUMBER           enc_rep
0  INC000030884498      [[tensor(-0.2556), tensor(0.0188), tensor(0.02...
1  INC000029956111      [[tensor(-0.3115), tensor(0.2535), tensor(0.20..
2  INC000029555353      [[tensor(-0.3082), tensor(0.2814), tensor(0.24...
3  INC000029555338      [[tensor(-0.2759), tensor(0.2604), tensor(0.21...
```

Each tensor in the `enc_rep` column has this shape:

```python
matched_df['enc_rep'].iloc[0].size()
torch.Size([1, 768])
```
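Stacking the column into one `(n, 768)` matrix only works if every cell has that same shape, so checking the whole column rather than just the first row can be worthwhile. A minimal sketch on a hypothetical stand-in frame (random values and three rows, invented for illustration):

```python
import pandas as pd
import torch

torch.manual_seed(0)  # reproducible random values

# Hypothetical stand-in for matched_df: one random (1, 768) tensor per row
matched_df = pd.DataFrame({
    "INCIDENT_NUMBER": ["INC000030884498", "INC000029956111", "INC000029555353"],
    "enc_rep": [torch.randn(1, 768) for _ in range(3)],
})

# Collect the shape of every cell; they must all agree before stacking
shapes = matched_df["enc_rep"].map(lambda t: tuple(t.shape))
print(shapes.unique())
```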

I want to compute the Euclidean distance or cosine similarity between `input_sentence_embed` and each row of `matched_df` efficiently.

If these were scalar values, I could simply broadcast `input_sentence_embed` into a new column of `matched_df` and then compute the similarity between the two columns.

I am struggling with two problems:

1. How to broadcast `input_sentence_embed` as a new column of `matched_df`
2. How to compute the cosine similarity between tensors stored in two columns

Maybe someone can also suggest a simpler way to reach the end goal: efficiently computing the similarity between one tensor and every tensor stored in a DataFrame column.

## Solution

Input data:

```python
import pandas as pd
import numpy as np
from torch import tensor

# 4-element stand-ins for the 768-element embeddings in the question
match_df = pd.DataFrame({
    'INCIDENT_NUMBER': ['INC000030884498',
                        'INC000029956111',
                        'INC000029555353',
                        'INC000029555338'],
    'enc_rep': [[[tensor(0.2971), tensor(0.4831), tensor(0.8239), tensor(0.2048)]],
                [[tensor(0.3481), tensor(0.8104), tensor(0.2879), tensor(0.9747)]],
                [[tensor(0.2210), tensor(0.3478), tensor(0.2619), tensor(0.2429)]],
                [[tensor(0.2951), tensor(0.6698), tensor(0.9654), tensor(0.5733)]]],
})

input_sentence_embed = [[tensor(0.0590), tensor(0.3919), tensor(0.7821), tensor(0.1967)]]
```
1. How to broadcast `input_sentence_embed` as a new column of `matched_df`:
```python
# Repeat the same embedding once per row (every cell references the same object)
match_df["input_sentence_embed"] = [input_sentence_embed] * len(match_df)
```
2. How to find the cosine similarity between tensors stored in two columns:
```python
# Stack the column into an (n, d) matrix and flatten the query to shape (d,)
a = np.vstack(match_df["enc_rep"])
b = np.hstack(input_sentence_embed)
# axis=1 gives one norm per row; without it, np.linalg.norm returns a single
# norm for the whole matrix and the result is not a per-row cosine similarity
match_df["cosine_similarity"] = a.dot(b) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b))
```

Output result (cosine similarities rounded to four decimal places):

```
   INCIDENT_NUMBER                                            enc_rep                               input_sentence_embed  cosine_similarity
0  INC000030884498  [[tensor(0.2971), tensor(0.4831), tensor(0.823...  [[tensor(0.0590), tensor(0.3919), tensor(0.782...             0.9717
1  INC000029956111  [[tensor(0.3481), tensor(0.8104), tensor(0.287...  [[tensor(0.0590), tensor(0.3919), tensor(0.782...             0.6244
2  INC000029555353  [[tensor(0.2210), tensor(0.3478), tensor(0.261...  [[tensor(0.0590), tensor(0.3919), tensor(0.782...             0.8203
3  INC000029555338  [[tensor(0.2951), tensor(0.6698), tensor(0.965...  [[tensor(0.0590), tensor(0.3919), tensor(0.782...             0.9530
```
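The question also asks for Euclidean distance, which the stacked matrix gives just as directly. A minimal sketch, reusing the sample values from the input data but stored as NumPy arrays instead of tensors so it runs without PyTorch (an assumption for illustration; `np.vstack` converts a column of CPU tensors that do not require gradients in the same way):

```python
import numpy as np
import pandas as pd

# Sample values from the input data, as NumPy arrays instead of tensors
match_df = pd.DataFrame({
    "INCIDENT_NUMBER": ["INC000030884498", "INC000029956111",
                        "INC000029555353", "INC000029555338"],
    "enc_rep": [np.array([[0.2971, 0.4831, 0.8239, 0.2048]]),
                np.array([[0.3481, 0.8104, 0.2879, 0.9747]]),
                np.array([[0.2210, 0.3478, 0.2619, 0.2429]]),
                np.array([[0.2951, 0.6698, 0.9654, 0.5733]])],
})
input_sentence_embed = np.array([[0.0590, 0.3919, 0.7821, 0.1967]])

a = np.vstack(match_df["enc_rep"].to_list())  # (n, d): one row per stored embedding
b = input_sentence_embed.ravel()              # (d,): the query embedding

# Euclidean distance: broadcast b against every row, then take each row's norm
match_df["euclidean_distance"] = np.linalg.norm(a - b, axis=1)

# Cosine similarity: divide by per-row norms (axis=1)
match_df["cosine_similarity"] = a @ b / (np.linalg.norm(a, axis=1) * np.linalg.norm(b))

print(match_df[["INCIDENT_NUMBER", "euclidean_distance", "cosine_similarity"]])
```

A smaller distance means more similar, so the two columns sort in opposite directions. On these sample values the two metrics also disagree on whether row 2 or row 3 is closer, because the vectors are not unit-length; for L2-normalised embeddings `||a - b||^2 = 2 - 2 cos(a, b)`, so both metrics give the same ranking.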
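When the embeddings are already tensors, another option is to stay in PyTorch: concatenate the column into one `(n, 768)` tensor and let `torch.nn.functional.cosine_similarity` and `torch.cdist` compute every row at once, which also covers the Euclidean distance. A sketch with random stand-in embeddings (hypothetical data), assuming each cell holds a `torch.Size([1, 768])` tensor as described in the question:

```python
import pandas as pd
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible stand-in data

# Hypothetical stand-in: one random (1, 768) tensor per row, as in the question
matched_df = pd.DataFrame({
    "INCIDENT_NUMBER": ["INC000030884498", "INC000029956111",
                        "INC000029555353", "INC000029555338"],
    "enc_rep": [torch.randn(1, 768) for _ in range(4)],
})
input_sentence_embed = torch.randn(1, 768)

# Concatenate the (1, 768) cells along dim 0 -> (n, 768)
stacked = torch.cat(matched_df["enc_rep"].to_list(), dim=0)

# (1, 768) broadcasts against (n, 768); similarity is taken along dim=1 -> (n,)
cos = F.cosine_similarity(stacked, input_sentence_embed, dim=1)

# cdist on 2-D inputs: (n, 768) vs (1, 768) -> (n, 1), squeezed to (n,)
dist = torch.cdist(stacked, input_sentence_embed).squeeze(1)

matched_df["cosine_similarity"] = cos.tolist()
matched_df["euclidean_distance"] = dist.tolist()

# Most similar incidents first
print(matched_df.sort_values("cosine_similarity", ascending=False)
      [["INCIDENT_NUMBER", "cosine_similarity", "euclidean_distance"]])
```

Because `torch.cdist` accepts several query rows, passing a `(q, 768)` tensor in place of `input_sentence_embed` yields an `(n, q)` distance matrix in a single call.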