Issue
I have a dataframe with a column of IDs and a column of strings:
>>> import pandas as pd
>>> df
ID Sentence
0 The cat is running away
1 The lazy dog jumped over the brown fox just now
2 Hello
What I would like to do is remove rows whose strings are too short or too long. For example, I want to set a minimum of 2 words and a maximum of 8 words per string. After applying these thresholds, only the row with ID 0 is returned.
ID Sentence
0 The cat is running away
Could anyone give me a suggestion on how to do this?
Solution
Hello, this can be done by creating a new column that contains the word count of each sentence and then filtering your df on that column.
# Count the words per sentence without overwriting the original strings
df["WordsCount"] = df["Sentence"].str.split().str.len()
# Keep only rows with 2 to 8 words (inclusive)
df = df[(df["WordsCount"] >= 2) & (df["WordsCount"] <= 8)]
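For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the example dataframe from the question and filters without adding a helper column. The names MIN_WORDS, MAX_WORDS, word_counts and filtered are only illustrative, and it assumes words are separated by whitespace.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "ID": [0, 1, 2],
    "Sentence": [
        "The cat is running away",
        "The lazy dog jumped over the brown fox just now",
        "Hello",
    ],
})

# Illustrative thresholds (assumed names, not from the original answer)
MIN_WORDS, MAX_WORDS = 2, 8

# Split on whitespace and count the resulting tokens per row
word_counts = df["Sentence"].str.split().str.len()

# Series.between is inclusive on both ends by default
filtered = df[word_counts.between(MIN_WORDS, MAX_WORDS)]
print(filtered)
```

With the sample data the word counts are 5, 10 and 1, so only the row with ID 0 is kept, and the Sentence column still holds the original strings.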
Answered By – ahmed awada
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.