## Issue

I have a Pandas dataframe `df`

with two columns, `A`

and `B`

such that all values in `B`

would be different for a value in `A`

, but two different values in `A`

can have duplicate values in `B`

.

```
A B
ABC1 XYZ XYZ 123
ABC2 XYZ XYZ 123
ABC1 XYZ XYZ 135
ABC3 XYZ PQR 123
ABC2 XYZ PQR 123
ABC3 XYZ XYZ 135
ABC4 XYZ XYZ 135
ABC2 XYZ PQR 987
ABC4 XYZ PQR 123
ABC5 PQR PQR 567
```

How can I create a heatmap from this dataframe that shows me the number of common `B`

values between any two `A`

values? I presume I’ll have to create a pivot that might look something like this –

```
ABC1 ABC2 ABC3 ABC4 ABC5
ABC1 2 1 ...
ABC2 1 3
ABC3 0 1 ...
ABC4 0 0 ... ...
ABC5 0 0
```

Any help on how this can be done efficiently?

## Solution

I think we can use `get_dummies`

and `np.tensordot`

for this.

First, let’s find out for each value of `A`

which `B`

are present

```
>>> common = pd.get_dummies(df['A']).groupby(df['B']).max()
>>> common
Bar Foo
B
2 1 0
3 0 1
4 0 1
5 1 1
6 1 0
9 1 0
```

Then to compute the intersections of each rows we need the dot product of every pair of rows, for that we can use:

```
>>> np.tensordot(common, common, (0, 0))
array([[4, 1],
[1, 3]], dtype=uint8)
>>> pd.DataFrame(np.tensordot(common, common, (0, 0)), columns=common.columns, index=common.columns)
Bar Foo
Bar 4 1
Foo 1 3
```

The diagonal shows the number of unique `B`

values for each `A`

column.

Answered By – Cimbali

**This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 **