Using boolean indexing or groupby to sum customers and identify products

Issue

In my dataset I found that disloyal 30-40 year old customers were dissatisfied with Company X. I want to find out the sums of the customers’ responses.

In other words, how many Dissatisfied Disloyal 30-40 year old customers ranked Product A between 0-2 and Product B between 0-2 out of a possible rating of 5.

Customer Cat    Age    Satisfaction    Prod A Rank   Prod B Rank  Gender    House Style    Distance
    Disloyal     28    Dissatisfied              1           2
       Loyal     30       Satisfied              3           5
    Disloyal     36    Dissatisfied              0           2

Finally, what products did these customers rank <=2?

Solution

You can do everything with boolean indexing:

sub = df[
    (df["Customer Cat"] == "Disloyal")
    & (df["Satisfaction"] == "Dissatisfied")
    & df["Age"].between(30, 40)
]

Then you run your analysis like:

sub[(sub["Prod A Rank"] <= 2) & (sub["Prod B Rank"] <= 2)].shape[0]
# Given your example this outputs 1

Alternatively, if you want to know whether your subset of customer was dissatisfied with either one of your products you can use the logical operator | (OR):

sub[(sub["Prod A Rank"] <= 2) | (sub["Prod B Rank"] <= 2)].shape[0]

If you want to study the products, you can try this:

(sub.melt(value_vars=[c for c in sub.columns if c.startswith("Prod")])
    .groupby("variable")
    .value_counts()
    .to_frame()
    .reset_index()
    .rename(columns={0: "count"}))

This outputs:

    variable    value   count
0   Prod A Rank 0       1
1   Prod B Rank 2       1 

Answered By – 965311532

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published