## Issue

I have a tensor `input_sentence_embed` with shape `torch.Size([1, 768])`.

There is a dataframe `matched_df` which looks like this:

```
INCIDENT_NUMBER enc_rep
0 INC000030884498 [[tensor(-0.2556), tensor(0.0188), tensor(0.02...
1 INC000029956111 [[tensor(-0.3115), tensor(0.2535), tensor(0.20..
2 INC000029555353 [[tensor(-0.3082), tensor(0.2814), tensor(0.24...
3 INC000029555338 [[tensor(-0.2759), tensor(0.2604), tensor(0.21...
```

The shape of each tensor element in the dataframe is:

```
matched_df['enc_rep'].iloc[0].size()
torch.Size([1, 768])
```

I want to find the Euclidean or cosine similarity between `input_sentence_embed` and each row of `matched_df` efficiently.

If they were scalar values, I could easily have broadcast `input_sentence_embed` as a new column in `matched_df` and then computed the cosine similarity between the two columns.

I am struggling with two problems:

- How to broadcast `input_sentence_embed` as a new column to the `matched_df`
- How to find cosine similarity between tensors stored in two columns

Maybe someone can also suggest an easier way to reach the end goal: efficiently finding the similarity between one tensor and all tensors stored in a dataframe column.

## Solution

Input data:

```
import numpy as np
import pandas as pd
from torch import tensor

match_df = pd.DataFrame({
    'INCIDENT_NUMBER': [
        'INC000030884498',
        'INC000029956111',
        'INC000029555353',
        'INC000029555338',
    ],
    'enc_rep': [
        [[tensor(0.2971), tensor(0.4831), tensor(0.8239), tensor(0.2048)]],
        [[tensor(0.3481), tensor(0.8104), tensor(0.2879), tensor(0.9747)]],
        [[tensor(0.2210), tensor(0.3478), tensor(0.2619), tensor(0.2429)]],
        [[tensor(0.2951), tensor(0.6698), tensor(0.9654), tensor(0.5733)]],
    ],
})

input_sentence_embed = [[tensor(0.0590), tensor(0.3919), tensor(0.7821), tensor(0.1967)]]
```

*How to broadcast `input_sentence_embed` as a new column to the `matched_df`*

```
match_df["input_sentence_embed"] = [input_sentence_embed] * len(match_df)
```
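One detail about the list-repetition idiom used here: `[x] * n` repeats a reference to the same object rather than copying it. A small sketch, with plain Python lists standing in for the embedding (the names `embed`, `column`, `shared`, and `first_values` are illustrative and not part of the original answer):

```python
# Stand-in for input_sentence_embed, using plain floats instead of tensors.
embed = [[0.0590, 0.3919, 0.7821, 0.1967]]

# Same construction as in the answer, assuming a dataframe with 4 rows.
column = [embed] * 4

# Every entry is the very same object, so no embedding data is copied.
shared = all(item is embed for item in column)

# The flip side: an in-place change to the original shows up in every row.
embed[0][0] = 1.0
first_values = [item[0][0] for item in column]
```

This keeps the new column cheap in memory, but it is safest to treat the broadcast embedding as read-only.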

*How to find cosine similarity between tensors stored in two columns*

```
a = np.vstack(match_df["enc_rep"])   # shape (n_rows, dim): one embedding per row
b = np.hstack(input_sentence_embed)  # shape (dim,): the query embedding
# Cosine similarity needs the norm of each row, hence axis=1. Without it,
# np.linalg.norm(a) returns a single Frobenius norm for the whole matrix.
match_df["cosine_similarity"] = a.dot(b) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b))
```

Output result:

```
INCIDENT_NUMBER enc_rep input_sentence_embed cosine_similarity
0 INC000030884498 [[tensor(0.2971), tensor(0.4831), tensor(0.823... [[tensor(0.0590), tensor(0.3919), tensor(0.782... 0.971748
1 INC000029956111 [[tensor(0.3481), tensor(0.8104), tensor(0.287... [[tensor(0.0590), tensor(0.3919), tensor(0.782... 0.624403
2 INC000029555353 [[tensor(0.2210), tensor(0.3478), tensor(0.261... [[tensor(0.0590), tensor(0.3919), tensor(0.782... 0.820259
3 INC000029555338 [[tensor(0.2951), tensor(0.6698), tensor(0.965... [[tensor(0.0590), tensor(0.3919), tensor(0.782... 0.952969
```
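The question also asks about Euclidean distance, which the answer does not cover. A minimal sketch of one way to compute it, assuming the embeddings have already been converted to a plain NumPy array (hard-coded here with the example values; the names `enc`, `query`, `euclidean`, and `closest` are illustrative and not from the original answer):

```python
import numpy as np

# Example values from the answer, as plain floats (assumption: the tensors
# were already converted to an array, e.g. with np.vstack as shown earlier).
enc = np.array([
    [0.2971, 0.4831, 0.8239, 0.2048],
    [0.3481, 0.8104, 0.2879, 0.9747],
    [0.2210, 0.3478, 0.2619, 0.2429],
    [0.2951, 0.6698, 0.9654, 0.5733],
])
query = np.array([0.0590, 0.3919, 0.7821, 0.1967])

# Broadcasting subtracts `query` from every row of `enc`;
# axis=1 then yields one distance per row instead of a single number.
euclidean = np.linalg.norm(enc - query, axis=1)

# Position of the row closest to the query (smallest distance).
closest = int(np.argmin(euclidean))
```

The resulting array can be assigned to a new dataframe column in the same way as the cosine similarity. Note that a smaller distance means a closer match, whereas a larger cosine similarity does.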

Answered By – Corralien

**This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.**