Remove rows with too short or too long strings from a pandas column

Issue

I have a dataframe with an ID column and a column of strings:

>>> import pandas as pd
>>> df

ID   Sentence                           
0    The cat is running away
1    The lazy dog jumped over the brown fox just now 
2    Hello

I would like to remove the rows whose strings are too short or too long. For example, I want to set a minimum of 2 words and a maximum of 8 words. After filtering on these thresholds, only the row with ID 0 is returned:

ID   Sentence                           
0    The cat is running away 

Could anyone give me a suggestion on how to do this?

Solution

You can do this by creating a new column that contains the word count of each sentence and then filtering your df on that column.

 # Count whitespace-separated words without overwriting the original sentences.
 df["WordsCount"] = df["Sentence"].str.split().str.len()
 # Keep only the rows with 2 to 8 words (inclusive).
 df = df[(df["WordsCount"] >= 2) & (df["WordsCount"] <= 8)]
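If you would rather not keep a helper column, the same idea can be sketched as a small function. This is only an illustration, not part of the original answer: the name `filter_by_word_count` and its parameters are hypothetical, the sample data is rebuilt from the question, and "word" is assumed to mean a whitespace-separated token.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [0, 1, 2],
    "Sentence": [
        "The cat is running away",
        "The lazy dog jumped over the brown fox just now",
        "Hello",
    ],
})


def filter_by_word_count(frame, column="Sentence", min_words=2, max_words=8):
    """Hypothetical helper: keep rows whose word count is in [min_words, max_words]."""
    # Split on whitespace, then take the length of each resulting list.
    counts = frame[column].str.split().str.len()
    # Series.between is inclusive on both ends by default; missing values compare as False.
    return frame[counts.between(min_words, max_words)]


filtered = filter_by_word_count(df)
print(filtered)
```

With the sample data, the word counts are 5, 10 and 1, so only the row with ID 0 passes the filter, and the Sentence column still holds the original strings.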

Answered By – ahmed awada

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
