Calculate pandas groupby difference iteratively

Issue

Hi, I created a column called "group" based on whether the rows before and after are fruit.

How could I create the new_group column? It’s based on 10-minute fruit gaps. The dataframe is sorted by person, time.

person   time_bought  product  group  new_group
abby     2:21         fruit     1       1
abby     2:24         other     
abby     2:25         fruit     2       1  (2:25 is within 10 minutes of 2:21, so part of the same group)
abby     10:35        fruit     2       2  
abby     10:40        other
abby     10:42        fruit     3       2  (10:42 is within 10 minutes of 10:35)
abby     10:53        fruit     4       3  (10:53 is not within 10 minutes of 10:42)
abby     11:04        fruit     d
barry    12:00        fruit     1

I tried

m = df.groupby(["person", "group"]).time_bought.diff()
df["new_group"] = df.groupby(["person", "group"]).mask(m).ffill()

Solution

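To generate your new group, convert time_bought to datetime, take the difference between consecutive rows within each person, and start a new group whenever that gap is greater than 10 minutes: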

m1 = pd.to_datetime(df['time_bought']).groupby(df['person']).diff().gt('10min')
df['new_group'] = m1.cumsum().add(1)

output:

  person time_bought product group  new_group
0   abby        2:21   fruit     1          1
1   abby        2:24   other  None          1
2   abby        2:25   fruit     2          1
3   abby       10:35   fruit     2          2
4   abby       10:40   other  None          2
5   abby       10:42   fruit     3          2
6   abby       10:53   fruit     4          3
7   abby       11:04   fruit     d          4
8  barry       12:00   fruit     1          4
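
As a self-contained check, the grouping above can be reproduced on the question's sample data. This is a minimal sketch rather than the answerer's exact code: the explicit format="%H:%M", the use of pd.Timedelta, and the per_person_group column are assumptions added for illustration, and the original group column is left out because the calculation does not need it.

```python
import pandas as pd

# Sample data from the question (the original "group" column is omitted).
df = pd.DataFrame({
    "person": ["abby"] * 8 + ["barry"],
    "time_bought": ["2:21", "2:24", "2:25", "10:35", "10:40",
                    "10:42", "10:53", "11:04", "12:00"],
    "product": ["fruit", "other", "fruit", "fruit", "other",
                "fruit", "fruit", "fruit", "fruit"],
})

# Parse the times; the explicit format is an assumption about the data.
times = pd.to_datetime(df["time_bought"], format="%H:%M")

# True where the gap to the previous row of the same person exceeds 10 minutes.
# The first row of each person has no previous row (NaT), which compares as False.
m1 = times.groupby(df["person"]).diff().gt(pd.Timedelta(minutes=10))

# Running count of gaps over the whole frame, as in the answer.
df["new_group"] = m1.cumsum().add(1)

# Hypothetical variant: restart the numbering for each person.
df["per_person_group"] = m1.astype(int).groupby(df["person"]).cumsum().add(1)

print(df)
```

Because the cumulative sum runs over the whole frame, barry's first row continues at 4 instead of restarting at 1. The per_person_group variant shows one way to restart the count per person, if that is what is wanted.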

Potential masking: blank out new_group for non-fruit rows and for the last row of each person (the intended logic is unclear from the question):

m2 = df['product'].ne('fruit')
m3 = df['person'].ne(df['person'].shift(-1))
df['new_group'] = m1.cumsum().add(1).mask(m2|m3).convert_dtypes()

output:

  person time_bought product group  new_group
0   abby        2:21   fruit     1          1
1   abby        2:24   other  None       <NA>
2   abby        2:25   fruit     2          1
3   abby       10:35   fruit     2          2
4   abby       10:40   other  None       <NA>
5   abby       10:42   fruit     3          2
6   abby       10:53   fruit     4          3
7   abby       11:04   fruit     d       <NA>
8  barry       12:00   fruit     1       <NA>
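
The masking step can be tried end to end with the sketch below. It rebuilds the sample data so it runs on its own; the explicit format="%H:%M" and pd.Timedelta are assumptions for illustration, and whether these two masks match the asker's intent remains a guess.

```python
import pandas as pd

# Sample data from the question (the original "group" column is omitted).
df = pd.DataFrame({
    "person": ["abby"] * 8 + ["barry"],
    "time_bought": ["2:21", "2:24", "2:25", "10:35", "10:40",
                    "10:42", "10:53", "11:04", "12:00"],
    "product": ["fruit", "other", "fruit", "fruit", "other",
                "fruit", "fruit", "fruit", "fruit"],
})

times = pd.to_datetime(df["time_bought"], format="%H:%M")

# Gap of more than 10 minutes to the previous row of the same person.
m1 = times.groupby(df["person"]).diff().gt(pd.Timedelta(minutes=10))

# Rows that are not fruit.
m2 = df["product"].ne("fruit")

# Last row of each person: the next row belongs to someone else (or does not exist).
m3 = df["person"].ne(df["person"].shift(-1))

# Masked rows become missing; convert_dtypes restores a nullable integer column.
df["new_group"] = m1.cumsum().add(1).mask(m2 | m3).convert_dtypes()

print(df)
```

The mask turns the integer column into floats with NaN, which is why convert_dtypes is applied at the end to get back whole numbers with <NA> for the blanked rows.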

Answered By – mozway

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
