pandas GroupBy plotting two lines for each group on one plot

Issue

I’ve been struggling to plot the results of the GroupBy on three columns.
I have the data on the different absences (AbsenceType) of employees (Employee) over 3 years (MonthYear). I would like to plot in one plot, how many absences of a particular type an employee had in each month-year. I have only two employees in the example, but there are more in the data as well as more month-year values.

Create data

data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}

columns = ['Employee','MonthYear','AbsenceType']

df = pd.DataFrame(data, columns=columns)

Then I map each of the codes of the AbsenceType into two categories: Sick or Injury.

df['SickOrInjury'] =df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})

What I want to achieve is the following groupby:

test = df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count()

But, when I try to plot it, it does not fully show what I want. So far I managed to get to the stage:

df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count().unstack('SickOrInjury', fill_value=0).plot()
plt.show()

test plot
However, employee’s ID are shown on the X axis and not in the legend.

What I want to have is something like this:
desired plot
I would like to have time on the X axis and the count for each absence type (sick or injury) on the Y axis. There should be two different types of lines (e.g. solid and dashed) for each absence type and different colors for each employee (e.g. black and red).

Solution

I think unstacking is the right approach to fill missing values but you should probably convert MonthYear to date and resample by month. You can then plot your dataframe using seaborn.lineplot:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}

columns = ['Employee','MonthYear','AbsenceType']

df = pd.DataFrame(data, columns=columns)

df['SickOrInjury'] = df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
df['MonthYear'] = pd.to_datetime(df['MonthYear'], format="%Y%m")
df = df.groupby(['MonthYear', 'Employee', 'SickOrInjury']).count()

# renaming the aggregated (and unique) column
df = df.rename(columns={'AbsenceType': 'EmpAbsCount'})

df = df.unstack(['Employee', 'SickOrInjury'], fill_value=0)
# resampling for monthly values:
df = df.resample('M').sum().stack(['Employee', 'SickOrInjury'])

sns.lineplot(x='MonthYear', y='EmpAbsCount', data=df, hue='Employee', style='SickOrInjury', markers=True, ci=None)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Output:

enter image description here

Answered By – Tranbi

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published