How to create an algorithm that helps me improve results and automate process?

Issue

i’ve already posted here my problem and since then i’ve been trying to find solution to help me optimise my results , in the previous post ,Yaloa understood what i wanted to do but sadly i always end up in a dead end
My previous Post

the fact is that I want to improve my results in order to visualize them
This is my dataframe:

ID           TimeandDate        Date       Time
10   2020-08-07 07:40:09  2022-08-07   07:40:09
10   2020-08-07 08:50:00  2022-08-07   08:50:00
10   2020-08-07 12:40:09  2022-08-07   12:40:09
10   2020-08-08 07:40:09  2022-08-08   07:40:09
10   2020-08-08 17:40:09  2022-08-08   17:40:09
12   2020-08-07 08:03:09  2022-08-07   08:03:09
12   2020-08-07 10:40:09  2022-08-07   10:40:09
12   2020-08-07 14:40:09  2022-08-07   14:40:09
12   2020-08-07 16:40:09  2022-08-07   16:40:09
13   2020-08-07 09:22:45  2022-08-07   09:22:45
13   2020-08-07 17:57:06  2022-08-07   17:57:06

first of all the data is collected from time clock , i want to create new dataframe with 2 new columns the first one is df["Check-in"] , as you can see my data doesnt have any indicator to show what time the id has checked in , so i will suppose that the first time for every id is a check-in , and the next row is a check-out and it will be inserted in df["Check-out"] , also if a check-in doesnt have a check-out time it has to be registred as the check-out for the previous check-out of the same day(sometimes id forgot to check-out) because it has to be the same number of rows for check-in and check-out cant have 2 check-ins and 3 check-outs

What i’ve tried ?
what i meant with i need better results is because what i’ve tried is not the best solution , i took the min as a check-in and the max is a check-out of time for every id without adding the two columns , and after that i started calculating the time difference , now imagine ID=13 has entered at 07:40:09 and the he check out at 08:40:09 , later that day he returns at 19:20:00 and leave in the next 10 minutes 19:30:00 if i do that fonction it will show that he worked for 12 hours while his real working time is 1 hour

Result Desired

ID         Date   Check-in    Check-out
10   2020-08-07   07:40:09     12:40:09
10   2020-08-08   07:40:09     17:40:09
12   2020-08-07   08:03:09     10:40:09
12   2020-08-07   14:40:09     16:40:09 
13   2020-08-07   09:22:45     17:57:06

Solution

It took me a while to correctly understand your question. One way to do this is to group your df by ID and Date and check the number of rows of the sub-df. Then delete the next-to-last row if you have an odd number of rows.
Finally create your checkin and checkout columns (ffill checkin and dropna to delete double entries):

df = df.drop('TimeandDate', axis=1)
df_output = pd.DataFrame()

for (id, date), subdf in df.groupby(['ID', 'Date']):
    subdf = subdf.reset_index(drop=True)

    # handling case where more checkins than checkouts
    nb_checks = len(subdf.index)
    if nb_checks % 2 and nb_checks > 1:
        subdf.iloc[-2, :] = subdf.iloc[-1, :]
        subdf = subdf.head(-1)
    
    subdf['Check-in'] = subdf.loc[::2, 'Time']
    subdf['Check-out'] = subdf.loc[1::2, 'Time']
    subdf['Check-in'].ffill(inplace=True)

    df_output = df_output.append(subdf.drop('Time', axis=1).dropna())

print(df_output.reset_index(drop=True))

Output:

   ID        Date  Check-in Check-out
0  10  2022-08-07  07:40:09  12:40:09
1  10  2022-08-08  07:40:09  17:40:09
2  12  2022-08-07  08:03:09  10:40:09
3  12  2022-08-07  14:40:09  16:40:09
4  13  2022-08-07  09:22:45  17:57:06

Edit/Note: if you have only one entry for a given ID/Date, it won’t appear in your final result. (You’d have to handle the case where nb_checks==1 separately)

Answered By – Tranbi

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published