Issue
In my dataset I found that disloyal 30-40 year old customers were dissatisfied with Company X. I want to find out the sums of the customers’ responses.
In other words, how many Dissatisfied
Disloyal
30-40
year old customers ranked Product A between 0-2 and Product B between 0-2 out of a possible rating of 5.
Customer Cat Age Satisfaction Prod A Rank Prod B Rank Gender House Style Distance
Disloyal 28 Dissatisfied 1 2
Loyal 30 Satisfied 3 5
Disloyal 36 Dissatisfied 0 2
Finally, what products did these customers rank <=2?
Solution
You can do everything with boolean indexing:
sub = df[
(df["Customer Cat"] == "Disloyal")
& (df["Satisfaction"] == "Dissatisfied")
& df["Age"].between(30, 40)
]
Then you run your analysis like:
sub[(sub["Prod A Rank"] <= 2) & (sub["Prod B Rank"] <= 2)].shape[0]
# Given your example this outputs 1
Alternatively, if you want to know whether your subset of customer was dissatisfied with either one of your products you can use the logical operator |
(OR):
sub[(sub["Prod A Rank"] <= 2) | (sub["Prod B Rank"] <= 2)].shape[0]
If you want to study the products, you can try this:
(sub.melt(value_vars=[c for c in sub.columns if c.startswith("Prod")])
.groupby("variable")
.value_counts()
.to_frame()
.reset_index()
.rename(columns={0: "count"}))
This outputs:
variable value count
0 Prod A Rank 0 1
1 Prod B Rank 2 1
Answered By – 965311532
This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0