I have a DataFrame with a column of strings and a column of IDs:
```python
>>> import pandas as pd
>>> df
ID  Sentence
0   The cat is running away
1   The lazy dog jumped over the brown fox just now
2   Hello
```
I would like to remove rows whose strings are too short or too long. For example, I want each string to have at least 2 words and at most 8 words. After filtering, only the first row (ID 0) should remain:
```
ID  Sentence
0   The cat is running away
```
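For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data and counts the words in each sentence. It assumes `ID` is an ordinary column holding 0, 1 and 2 (the original display does not make clear whether those numbers are the index or a column), and it treats "words" as whitespace-separated tokens, which is what `Series.str.split()` with no arguments produces.

```python
import pandas as pd

# Rebuild the example data; assumes "ID" is an ordinary column holding 0, 1, 2
df = pd.DataFrame({
    "ID": [0, 1, 2],
    "Sentence": [
        "The cat is running away",
        "The lazy dog jumped over the brown fox just now",
        "Hello",
    ],
})

# Number of whitespace-separated words in each sentence
word_counts = df["Sentence"].str.split().str.len()
print(word_counts.tolist())  # [5, 10, 1]
```

Only the first sentence falls inside the 2 to 8 word range: the second has 10 words and the third has 1.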
Could anyone give me a suggestion on how to do this?
Hello, this can be done by creating a new column that holds the number of words in each sentence, then filtering your df on that column.
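```python
# Count the words in each sentence without overwriting the original strings
df["WordsCount"] = df["Sentence"].str.split().str.len()
# Keep rows with between 2 and 8 words (inclusive)
df = df[(df["WordsCount"] >= 2) & (df["WordsCount"] <= 8)]
```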
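If you would rather not add a helper column, the same filter can be written as a single boolean mask. The sketch below wraps it in a function with configurable thresholds; `filter_by_word_count` is a hypothetical name used only for illustration, not a pandas API. It relies on the standard pandas methods `Series.str.split`, `Series.str.len` and `Series.between` (inclusive on both ends by default), and again assumes `ID` is an ordinary column.

```python
import pandas as pd


def filter_by_word_count(df, column, min_words, max_words):
    """Return the rows whose `column` has between min_words and max_words words.

    Hypothetical helper for illustration. Bounds are inclusive, words are
    whitespace-separated tokens, and the input DataFrame is not modified.
    """
    counts = df[column].str.split().str.len()
    # Series.between includes both endpoints by default
    return df[counts.between(min_words, max_words)]


df = pd.DataFrame({
    "ID": [0, 1, 2],
    "Sentence": [
        "The cat is running away",
        "The lazy dog jumped over the brown fox just now",
        "Hello",
    ],
})

result = filter_by_word_count(df, "Sentence", 2, 8)
print(result)  # keeps only the row with ID 0
```

Because the mask is built from a temporary Series, the original `Sentence` strings and the column layout stay untouched. Missing values in `Sentence` produce a NaN count, which `between` treats as outside the range, so such rows are dropped.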
Answered By – ahmed awada