Calculate mean values of subsets of a dataframe column and subtract those mean values from a whole dataset where columns match

Issue

So I have the following dataframe:

{'Sample': {0: 'AA',
  1: 'AA',
  2: 'AA',
  3: 'AA',
  4: 'AA',
  5: 'AA',
  6: 'AA',
  7: 'AA',
  8: 'AA',
  9: 'AA',
  10: 'BB',
  11: 'BB',
  12: 'BB',
  13: 'BB',
  14: 'BB',
  15: 'BB',
  16: 'BB',
  17: 'CC',
  18: 'CC',
  19: 'CC',
  20: 'CC',
  21: 'CC',
  22: 'CC',
  23: 'CC',
  24: 'DD',
  25: 'DD',
  26: 'DD',
  27: 'DD',
  28: 'DD',
  29: 'DD',
  30: 'DD',
  31: 'DD',
  32: 'DD',
  33: 'DD',
  34: 'DD',
  35: 'DD',
  36: 'AA',
  37: 'AA',
  38: 'AA',
  39: 'AA',
  40: 'AA',
  41: 'AA',
  42: 'AA',
  43: 'AA',
  44: 'AA',
  45: 'AA',
  46: 'AA',
  47: 'AA',
  48: 'AA',
  49: 'AA',
  50: 'AA',
  51: 'BB',
  52: 'BB',
  53: 'BB',
  54: 'BB',
  55: 'BB',
  56: 'BB',
  57: 'BB',
  58: 'BB',
  59: 'BB',
  60: 'BB',
  61: 'BB',
  62: 'BB',
  63: 'BB',
  64: 'BB',
  65: 'BB',
  66: 'BB',
  67: 'BB',
  68: 'BB',
  69: 'BB'},
 'Test No': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 1,
  5: 1,
  6: 1,
  7: 1,
  8: 1,
  9: 1,
  10: 1,
  11: 1,
  12: 1,
  13: 1,
  14: 1,
  15: 1,
  16: 1,
  17: 1,
  18: 1,
  19: 1,
  20: 1,
  21: 1,
  22: 1,
  23: 1,
  24: 1,
  25: 1,
  26: 1,
  27: 1,
  28: 1,
  29: 1,
  30: 1,
  31: 1,
  32: 1,
  33: 1,
  34: 1,
  35: 1,
  36: 2,
  37: 2,
  38: 2,
  39: 2,
  40: 2,
  41: 2,
  42: 2,
  43: 2,
  44: 2,
  45: 2,
  46: 2,
  47: 2,
  48: 2,
  49: 2,
  50: 2,
  51: 2,
  52: 2,
  53: 2,
  54: 2,
  55: 2,
  56: 2,
  57: 2,
  58: 2,
  59: 2,
  60: 2,
  61: 2,
  62: 2,
  63: 2,
  64: 2,
  65: 2,
  66: 2,
  67: 2,
  68: 2,
  69: 2},
 'Person': {0: '7A',
  1: '7A',
  2: '7A',
  3: '7A',
  4: '7A',
  5: '7A',
  6: '7A',
  7: '7A',
  8: '7A',
  9: '7A',
  10: '7A',
  11: '7A',
  12: '7A',
  13: '7A',
  14: '7A',
  15: '7A',
  16: '7A',
  17: '9A',
  18: '9A',
  19: '9A',
  20: '9A',
  21: '9A',
  22: '9A',
  23: '9A',
  24: '9A',
  25: '9A',
  26: '9A',
  27: '9A',
  28: '9A',
  29: '9A',
  30: '9A',
  31: '9A',
  32: '9A',
  33: '9A',
  34: '9A',
  35: '9A',
  36: '7A',
  37: '7A',
  38: '7A',
  39: '7A',
  40: '7A',
  41: '7A',
  42: '7A',
  43: '7A',
  44: '7A',
  45: '7A',
  46: '7A',
  47: '7A',
  48: '7A',
  49: '7A',
  50: '7A',
  51: '12B',
  52: '12B',
  53: '12B',
  54: '12B',
  55: '12B',
  56: '12B',
  57: '12B',
  58: '12B',
  59: '12B',
  60: '12B',
  61: '12B',
  62: '12B',
  63: '12B',
  64: '12B',
  65: '12B',
  66: '12B',
  67: '12B',
  68: '12B',
  69: '12B'},
 'Level': {0: 0,
  1: 0,
  2: 0,
  3: 0,
  4: 10,
  5: 10,
  6: 10,
  7: 20,
  8: 20,
  9: 20,
  10: 0,
  11: 0,
  12: 10,
  13: 10,
  14: 20,
  15: 20,
  16: 30,
  17: 0,
  18: 0,
  19: 0,
  20: 0,
  21: 40,
  22: 40,
  23: 70,
  24: 0,
  25: 0,
  26: 0,
  27: 100,
  28: 100,
  29: 200,
  30: 200,
  31: 300,
  32: 300,
  33: 400,
  34: 400,
  35: 400,
  36: 0,
  37: 0,
  38: 0,
  39: 400,
  40: 400,
  41: 400,
  42: 500,
  43: 500,
  44: 500,
  45: 800,
  46: 800,
  47: 800,
  48: 900,
  49: 900,
  50: 900,
  51: 0,
  52: 0,
  53: 0,
  54: 0,
  55: 5,
  56: 5,
  57: 5,
  58: 5,
  59: 7,
  60: 7,
  61: 7,
  62: 7,
  63: 8,
  64: 8,
  65: 8,
  66: 8,
  67: 9,
  68: 9,
  69: 9},
 'Response': {0: 5,
  1: 7,
  2: 4,
  3: 9,
  4: 10,
  5: 14,
  6: 13,
  7: 22,
  8: 25,
  9: 29,
  10: 3,
  11: 2,
  12: 5,
  13: 3,
  14: 12,
  15: 18,
  16: 22,
  17: 10,
  18: 12,
  19: 18,
  20: 12,
  21: 22,
  22: 27,
  23: 57,
  24: 21,
  25: 22,
  26: 32,
  27: 23,
  28: 31,
  29: 45,
  30: 45,
  31: 34,
  32: 54,
  33: 59,
  34: 58,
  35: 57,
  36: 2,
  37: 3,
  38: 1,
  39: 5,
  40: 8,
  41: 7,
  42: 17,
  43: 14,
  44: 19,
  45: 46,
  46: 45,
  47: 43,
  48: 70,
  49: 87,
  50: 67,
  51: 6,
  52: 9,
  53: 1,
  54: 12,
  55: 15,
  56: 22,
  57: 27,
  58: 24,
  59: 47,
  60: 45,
  61: 87,
  62: 67,
  63: 98,
  64: 97,
  65: 99,
  66: 120,
  67: 200,
  68: 230,
  69: 210}}

I want to calculate the mean values of Response of where Level = 0, but grouped by Sample, Test No, Person. Then, I want to create a new column, which contains the mean value of the Level 0 subtracted from the Response of the other rows, where the Sample, Test No and Person rows match what the mean value was calculated on for the 0 Level.
Or to put it another way, calculate the mean value of the first set of 0 Levels and subtract it from the next rows down, until the next 0 level it reached, then repeat.

The resulting dataframe would look like this (its no problem if the 0 levels in the new column have values or not):

{'Sample': {0: 'AA',
  1: 'AA',
  2: 'AA',
  3: 'AA',
  4: 'AA',
  5: 'AA',
  6: 'AA',
  7: 'AA',
  8: 'AA',
  9: 'AA',
  10: 'BB',
  11: 'BB',
  12: 'BB',
  13: 'BB',
  14: 'BB',
  15: 'BB',
  16: 'BB',
  17: 'CC',
  18: 'CC',
  19: 'CC',
  20: 'CC',
  21: 'CC',
  22: 'CC',
  23: 'CC',
  24: 'DD',
  25: 'DD',
  26: 'DD',
  27: 'DD',
  28: 'DD',
  29: 'DD',
  30: 'DD',
  31: 'DD',
  32: 'DD',
  33: 'DD',
  34: 'DD',
  35: 'DD',
  36: 'AA',
  37: 'AA',
  38: 'AA',
  39: 'AA',
  40: 'AA',
  41: 'AA',
  42: 'AA',
  43: 'AA',
  44: 'AA',
  45: 'AA',
  46: 'AA',
  47: 'AA',
  48: 'AA',
  49: 'AA',
  50: 'AA',
  51: 'BB',
  52: 'BB',
  53: 'BB',
  54: 'BB',
  55: 'BB',
  56: 'BB',
  57: 'BB',
  58: 'BB',
  59: 'BB',
  60: 'BB',
  61: 'BB',
  62: 'BB',
  63: 'BB',
  64: 'BB',
  65: 'BB',
  66: 'BB',
  67: 'BB',
  68: 'BB',
  69: 'BB'},
 'Test No': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 1,
  5: 1,
  6: 1,
  7: 1,
  8: 1,
  9: 1,
  10: 1,
  11: 1,
  12: 1,
  13: 1,
  14: 1,
  15: 1,
  16: 1,
  17: 1,
  18: 1,
  19: 1,
  20: 1,
  21: 1,
  22: 1,
  23: 1,
  24: 1,
  25: 1,
  26: 1,
  27: 1,
  28: 1,
  29: 1,
  30: 1,
  31: 1,
  32: 1,
  33: 1,
  34: 1,
  35: 1,
  36: 2,
  37: 2,
  38: 2,
  39: 2,
  40: 2,
  41: 2,
  42: 2,
  43: 2,
  44: 2,
  45: 2,
  46: 2,
  47: 2,
  48: 2,
  49: 2,
  50: 2,
  51: 2,
  52: 2,
  53: 2,
  54: 2,
  55: 2,
  56: 2,
  57: 2,
  58: 2,
  59: 2,
  60: 2,
  61: 2,
  62: 2,
  63: 2,
  64: 2,
  65: 2,
  66: 2,
  67: 2,
  68: 2,
  69: 2},
 'Person': {0: '7A',
  1: '7A',
  2: '7A',
  3: '7A',
  4: '7A',
  5: '7A',
  6: '7A',
  7: '7A',
  8: '7A',
  9: '7A',
  10: '7A',
  11: '7A',
  12: '7A',
  13: '7A',
  14: '7A',
  15: '7A',
  16: '7A',
  17: '9A',
  18: '9A',
  19: '9A',
  20: '9A',
  21: '9A',
  22: '9A',
  23: '9A',
  24: '9A',
  25: '9A',
  26: '9A',
  27: '9A',
  28: '9A',
  29: '9A',
  30: '9A',
  31: '9A',
  32: '9A',
  33: '9A',
  34: '9A',
  35: '9A',
  36: '7A',
  37: '7A',
  38: '7A',
  39: '7A',
  40: '7A',
  41: '7A',
  42: '7A',
  43: '7A',
  44: '7A',
  45: '7A',
  46: '7A',
  47: '7A',
  48: '7A',
  49: '7A',
  50: '7A',
  51: '12B',
  52: '12B',
  53: '12B',
  54: '12B',
  55: '12B',
  56: '12B',
  57: '12B',
  58: '12B',
  59: '12B',
  60: '12B',
  61: '12B',
  62: '12B',
  63: '12B',
  64: '12B',
  65: '12B',
  66: '12B',
  67: '12B',
  68: '12B',
  69: '12B'},
 'Level': {0: 0,
  1: 0,
  2: 0,
  3: 0,
  4: 10,
  5: 10,
  6: 10,
  7: 20,
  8: 20,
  9: 20,
  10: 0,
  11: 0,
  12: 10,
  13: 10,
  14: 20,
  15: 20,
  16: 30,
  17: 0,
  18: 0,
  19: 0,
  20: 0,
  21: 40,
  22: 40,
  23: 70,
  24: 0,
  25: 0,
  26: 0,
  27: 100,
  28: 100,
  29: 200,
  30: 200,
  31: 300,
  32: 300,
  33: 400,
  34: 400,
  35: 400,
  36: 0,
  37: 0,
  38: 0,
  39: 400,
  40: 400,
  41: 400,
  42: 500,
  43: 500,
  44: 500,
  45: 800,
  46: 800,
  47: 800,
  48: 900,
  49: 900,
  50: 900,
  51: 0,
  52: 0,
  53: 0,
  54: 0,
  55: 5,
  56: 5,
  57: 5,
  58: 5,
  59: 7,
  60: 7,
  61: 7,
  62: 7,
  63: 8,
  64: 8,
  65: 8,
  66: 8,
  67: 9,
  68: 9,
  69: 9},
 'Response': {0: 5,
  1: 7,
  2: 4,
  3: 9,
  4: 10,
  5: 14,
  6: 13,
  7: 22,
  8: 25,
  9: 29,
  10: 3,
  11: 2,
  12: 5,
  13: 3,
  14: 12,
  15: 18,
  16: 22,
  17: 10,
  18: 12,
  19: 18,
  20: 12,
  21: 22,
  22: 27,
  23: 57,
  24: 21,
  25: 22,
  26: 32,
  27: 23,
  28: 31,
  29: 45,
  30: 45,
  31: 34,
  32: 54,
  33: 59,
  34: 58,
  35: 57,
  36: 2,
  37: 3,
  38: 1,
  39: 5,
  40: 8,
  41: 7,
  42: 17,
  43: 14,
  44: 19,
  45: 46,
  46: 45,
  47: 43,
  48: 70,
  49: 87,
  50: 67,
  51: 6,
  52: 9,
  53: 1,
  54: 12,
  55: 15,
  56: 22,
  57: 27,
  58: 24,
  59: 47,
  60: 45,
  61: 87,
  62: 67,
  63: 98,
  64: 97,
  65: 99,
  66: 120,
  67: 200,
  68: 230,
  69: 210},
 'Response minus 0': {0: nan,
  1: nan,
  2: nan,
  3: nan,
  4: 3.75,
  5: 7.75,
  6: 6.75,
  7: 15.75,
  8: 18.75,
  9: 22.75,
  10: nan,
  11: nan,
  12: 2.5,
  13: 0.5,
  14: 9.5,
  15: 15.5,
  16: 19.5,
  17: nan,
  18: nan,
  19: nan,
  20: nan,
  21: 9.0,
  22: 14.0,
  23: 44.0,
  24: nan,
  25: nan,
  26: nan,
  27: -2.0,
  28: 6.0,
  29: 20.0,
  30: 20.0,
  31: 9.0,
  32: 29.0,
  33: 34.0,
  34: 33.0,
  35: 32.0,
  36: nan,
  37: nan,
  38: nan,
  39: 3.0,
  40: 6.0,
  41: 5.0,
  42: 15.0,
  43: 12.0,
  44: 17.0,
  45: 44.0,
  46: 43.0,
  47: 41.0,
  48: 68.0,
  49: 85.0,
  50: 65.0,
  51: nan,
  52: nan,
  53: nan,
  54: nan,
  55: 8.0,
  56: 15.0,
  57: 20.0,
  58: 17.0,
  59: 40.0,
  60: 38.0,
  61: 80.0,
  62: 60.0,
  63: 91.0,
  64: 90.0,
  65: 92.0,
  66: 113.0,
  67: 193.0,
  68: 223.0,
  69: 203.0}}

I’ve thought about creating a new df that contains a subset of the original df based on Response = 0 and then doing a groupby on Sample, Test No and Person, but I don’t know how to ‘match’ the columns in the original df to the subset df to subtract the Level 0 values.

Can anyone please offer me some assistance?

Many thanks in advance

Solution

First create a mask where Level is 0. then use it to keep only the Response where 0. groupby the three columns as wanted and transform the mean to get alignthe mean value for each group, even for the non 0 level. Remove this to the Reponse column and populate our new column only where level is not 0 so the opposite of your mask.

m0 = df['Level'].eq(0)
df.loc[~m0, 'Response minus 0'] = (
    df['Response'] 
    - df['Response'].where(m0)
        .groupby([df['Sample'], df['Test No'], df['Person']])
        .transform('mean')
)
print(df)
#    Sample  Test No Person  Level  Response  Response minus 0
# 0      AA        1     7A      0         5               NaN
# 1      AA        1     7A      0         7               NaN
# 2      AA        1     7A      0         4               NaN
# 3      AA        1     7A      0         9               NaN
# 4      AA        1     7A     10        10              3.75
# 5      AA        1     7A     10        14              7.75
# 6      AA        1     7A     10        13              6.75
# 7      AA        1     7A     20        22             15.75
# 8      AA        1     7A     20        25             18.75
# 9      AA        1     7A     20        29             22.75
# 10     BB        1     7A      0         3               NaN
# 11     BB        1     7A      0         2               NaN
# 12     BB        1     7A     10         5              2.50
# 13     BB        1     7A     10         3              0.50
# 14     BB        1     7A     20        12              9.50
# 15     BB        1     7A     20        18             15.50
# 16     BB        1     7A     30        22             19.50
# 17     CC        1     9A      0        10               NaN
# 18     CC        1     9A      0        12               NaN
# ...

Answered By – Ben.T

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published