Issue
i’ve already posted here my problem and since then i’ve been trying to find solution to help me optimise my results , in the previous post ,Yaloa understood what i wanted to do but sadly i always end up in a dead end
My previous Post
the fact is that I want to improve my results in order to visualize them
This is my dataframe:
ID TimeandDate Date Time
10 2020-08-07 07:40:09 2022-08-07 07:40:09
10 2020-08-07 08:50:00 2022-08-07 08:50:00
10 2020-08-07 12:40:09 2022-08-07 12:40:09
10 2020-08-08 07:40:09 2022-08-08 07:40:09
10 2020-08-08 17:40:09 2022-08-08 17:40:09
12 2020-08-07 08:03:09 2022-08-07 08:03:09
12 2020-08-07 10:40:09 2022-08-07 10:40:09
12 2020-08-07 14:40:09 2022-08-07 14:40:09
12 2020-08-07 16:40:09 2022-08-07 16:40:09
13 2020-08-07 09:22:45 2022-08-07 09:22:45
13 2020-08-07 17:57:06 2022-08-07 17:57:06
first of all the data is collected from time clock , i want to create new dataframe with 2 new columns the first one is df["Check-in"]
, as you can see my data doesnt have any indicator to show what time the id
has checked in , so i will suppose that the first time for every id
is a check-in
, and the next row is a check-out and it will be inserted in df["Check-out"]
, also if a check-in
doesnt have a check-out
time it has to be registred as the check-out
for the previous check-out
of the same day(sometimes id
forgot to check-out
) because it has to be the same number of rows for check-in
and check-out
cant have 2 check-ins
and 3 check-outs
What i’ve tried ?
what i meant with i need better results is because what i’ve tried is not the best solution , i took the min
as a check-in
and the max
is a check-out
of time
for every id
without adding the two columns , and after that i started calculating the time difference , now imagine ID=13
has entered at 07:40:09
and the he check out at 08:40:09
, later that day he returns at 19:20:00
and leave in the next 10 minutes 19:30:00
if i do that fonction it will show that he worked for 12 hours while his real working time is 1 hour
Result Desired
ID Date Check-in Check-out
10 2020-08-07 07:40:09 12:40:09
10 2020-08-08 07:40:09 17:40:09
12 2020-08-07 08:03:09 10:40:09
12 2020-08-07 14:40:09 16:40:09
13 2020-08-07 09:22:45 17:57:06
Solution
It took me a while to correctly understand your question. One way to do this is to group your df by ID
and Date
and check the number of rows of the sub-df. Then delete the next-to-last row if you have an odd number of rows.
Finally create your checkin and checkout columns (ffill checkin and dropna to delete double entries):
df = df.drop('TimeandDate', axis=1)
df_output = pd.DataFrame()
for (id, date), subdf in df.groupby(['ID', 'Date']):
subdf = subdf.reset_index(drop=True)
# handling case where more checkins than checkouts
nb_checks = len(subdf.index)
if nb_checks % 2 and nb_checks > 1:
subdf.iloc[-2, :] = subdf.iloc[-1, :]
subdf = subdf.head(-1)
subdf['Check-in'] = subdf.loc[::2, 'Time']
subdf['Check-out'] = subdf.loc[1::2, 'Time']
subdf['Check-in'].ffill(inplace=True)
df_output = df_output.append(subdf.drop('Time', axis=1).dropna())
print(df_output.reset_index(drop=True))
Output:
ID Date Check-in Check-out
0 10 2022-08-07 07:40:09 12:40:09
1 10 2022-08-08 07:40:09 17:40:09
2 12 2022-08-07 08:03:09 10:40:09
3 12 2022-08-07 14:40:09 16:40:09
4 13 2022-08-07 09:22:45 17:57:06
Edit/Note: if you have only one entry for a given ID/Date, it won’t appear in your final result. (You’d have to handle the case where nb_checks==1
separately)
Answered By – Tranbi
This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0