Issue
I’ve been struggling to plot the results of the GroupBy on three columns.
I have the data on the different absences (AbsenceType) of employees (Employee) over 3 years (MonthYear). I would like to plot in one plot, how many absences of a particular type an employee had in each month-year. I have only two employees in the example, but there are more in the data as well as more month-year values.
Create data
data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}
columns = ['Employee','MonthYear','AbsenceType']
df = pd.DataFrame(data, columns=columns)
Then I map each of the codes of the AbsenceType into two categories: Sick or Injury.
df['SickOrInjury'] =df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
What I want to achieve is the following groupby:
test = df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count()
But, when I try to plot it, it does not fully show what I want. So far I managed to get to the stage:
df.groupby(['Employee', 'MonthYear', 'SickOrInjury'])['SickOrInjury'].count().unstack('SickOrInjury', fill_value=0).plot()
plt.show()
test plot
However, employee’s ID are shown on the X axis and not in the legend.
What I want to have is something like this:
desired plot
I would like to have time on the X axis and the count for each absence type (sick or injury) on the Y axis. There should be two different types of lines (e.g. solid and dashed) for each absence type and different colors for each employee (e.g. black and red).
Solution
I think unstacking is the right approach to fill missing values but you should probably convert MonthYear
to date and resample by month. You can then plot your dataframe using seaborn.lineplot
:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'Employee': ['ID1', 'ID1','ID1','ID1','ID1','ID1','ID1', 'ID1', 'ID1', 'ID2','ID2','ID2','ID2','ID2', 'ID2'],
'MonthYear': ['201708', '201601','201601','201708','201710','201801','201801', '201601', '201601', '201705', '201705', '201705', '201810', '201811', '201705'],
'AbsenceType': ['0210', '0210','0250','0215','0217','0260','0210', '0210', '0210', '0260', '0250', '0215', '0217', '0215', '0250']}
columns = ['Employee','MonthYear','AbsenceType']
df = pd.DataFrame(data, columns=columns)
df['SickOrInjury'] = df['AbsenceType'].replace({'0210':'Sick', '0215':'Sick', '0217':'Sick', '0250':'Injury', '0260':'Injury'})
df['MonthYear'] = pd.to_datetime(df['MonthYear'], format="%Y%m")
df = df.groupby(['MonthYear', 'Employee', 'SickOrInjury']).count()
# renaming the aggregated (and unique) column
df = df.rename(columns={'AbsenceType': 'EmpAbsCount'})
df = df.unstack(['Employee', 'SickOrInjury'], fill_value=0)
# resampling for monthly values:
df = df.resample('M').sum().stack(['Employee', 'SickOrInjury'])
sns.lineplot(x='MonthYear', y='EmpAbsCount', data=df, hue='Employee', style='SickOrInjury', markers=True, ci=None)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Output:
Answered By – Tranbi
This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0