Using a Python pandas dataframe column as input to a loop through another column

Issue

I’ve got two dataframes. One looks like:

Year  Count
1      3
2      2
3      1
4      5
5      4

The other looks like:

ID   Value
1     100
2      50
3       0
4      25
5      50

I want to use the Count column in the first dataframe to drive random sampling from the second. For each row, I want to randomly select N values from the Value column of the second dataframe, where N is that row's Count, and add them up, giving a new column in the first dataframe:

Year  Count  RandSum
1      3      200
2      2       50
3      1      100
4      5      225
5      4      200

That is, the RandSum column added to the first dataframe is the sum of "Count" random selections from the Value column in the second dataframe. For example, in the first row Count = 3, and the three random draws from the Value column were 100, 50 and 50, which sum to 200.

Any help would be appreciated by this relative Python novice.

Solution

Here’s another idea using numpy.random.choice in a list comprehension. By default it samples with replacement, so the same value can be drawn more than once:

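import numpy as np

# df1 and df2 are the two dataframes from the question
np.random.seed(0)  # fix the seed so the random draws are reproducible

# For each Count n, draw n values at random from df2['Value'] and sum them
df1['RandSum'] = [np.random.choice(df2['Value'], n).sum() for n in df1['Count']]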

[out]

   Year  Count  RandSum
0     1      3      175
1     2      2       50
2     3      1       50
3     4      5      275
4     5      4      200
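
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the two example dataframes from the question and wraps the same list comprehension in a helper. The function name add_rand_sum and its seed parameter are hypothetical, introduced only for illustration, and the sketch swaps in NumPy's newer Generator API (np.random.default_rng) in place of the global np.random.seed, so the numbers it produces will generally differ from the output shown above.

```python
import numpy as np
import pandas as pd

# Rebuild the two example dataframes from the question.
df1 = pd.DataFrame({"Year": [1, 2, 3, 4, 5], "Count": [3, 2, 1, 5, 4]})
df2 = pd.DataFrame({"ID": [1, 2, 3, 4, 5], "Value": [100, 50, 0, 25, 50]})


def add_rand_sum(counts_df, values_df, seed=None):
    """Hypothetical helper: return a copy of counts_df with a RandSum column."""
    rng = np.random.default_rng(seed)  # local generator, no global state
    values = values_df["Value"].to_numpy()
    out = counts_df.copy()
    # For each Count n, draw n values with replacement and sum them.
    out["RandSum"] = [int(rng.choice(values, size=n).sum()) for n in out["Count"]]
    return out


result = add_rand_sum(df1, df2, seed=0)
print(result)
```

Returning a copy leaves df1 untouched, and passing the same seed reproduces the same draws, which can be convenient when testing.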

Answered By – Chris Adams

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
