Pandas Query for date

Issue

I was looking through the pandas DataFrame.query documentation but couldn’t find anything specific about the following.
Is it possible to query for the row whose date is closest to a given date, instead of matching a specific date?

For example, let’s say we use the wine dataset and create some random dates.

    import pandas as pd
    import numpy as np
    from sklearn import datasets

    # Load the wine dataset into a DataFrame with named, whitespace-free columns
    wine = datasets.load_wine()
    df = pd.DataFrame(wine.data, columns=wine.feature_names)
    df.columns = df.columns.str.strip()

    def random_dates(start, end, n, unit='D'):
        # Draw n random timestamps between start and the end of the end day
        ndays = (end - start).days + 1
        return pd.to_timedelta(np.random.rand(n) * ndays, unit=unit) + start

    np.random.seed(0)
    start = pd.to_datetime('2015-01-01')
    end = pd.to_datetime('2022-01-01')
    datelist = random_dates(start, end, 178)

    df['Dates'] = datelist

If you perform a simple query on hue

    df.query('hue == 0.6')

you’ll receive three rows with three random dates. Is it possible to pick the query result that’s closest to, let’s say, 2017-1-1?

So something like

    df.query('hue == 0.6').query('Dates ~ 2017-1-1')

I hope this makes sense!

Solution

Given a series, find the entries closest to a given date:

    def closest_to_date(series, date, n=5):
        # Return the n smallest absolute time differences, keeping the original index
        date = pd.to_datetime(date)
        return abs(series - date).nsmallest(n)
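
To see what the function returns, here is a minimal sketch on a small made-up series (the four dates are illustrative only, not taken from the wine data). The result is a series of time differences whose index labels point back at the original entries:

```python
import pandas as pd

def closest_to_date(series, date, n=5):
    # Same helper as above: n smallest absolute differences to `date`
    date = pd.to_datetime(date)
    return abs(series - date).nsmallest(n)

# Hypothetical example dates, chosen only for illustration
dates = pd.Series(pd.to_datetime(
    ["2016-06-01", "2016-12-20", "2017-03-15", "2019-01-01"]
))

nearest = closest_to_date(dates, "2017-1-1", n=2)
print(nearest)  # index labels 1 and 2, with their distances as timedeltas
```

The values are the distances, not the dates themselves, so the index is what you use to go back to the original rows.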

Then we can use the index of the returned series to select the matching rows (or change the API to suit your needs):

    (df.loc[df.hue == 0.6]
     .loc[lambda df_: closest_to_date(df_.Dates, "2017-1-1", n=1).index]
    )
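
If only the single closest row is needed, a variant based on idxmin may be simpler. This is a sketch, not part of the original answer: the toy frame and its values are made up for illustration, and it assumes the Dates column already has a datetime dtype.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine DataFrame
df = pd.DataFrame({
    "hue": [0.6, 0.6, 0.6, 1.04],
    "Dates": pd.to_datetime(
        ["2015-05-01", "2017-02-10", "2021-08-30", "2017-01-02"]
    ),
})

target = pd.to_datetime("2017-1-1")

# Filter first, then find the label of the smallest absolute difference
subset = df.query("hue == 0.6")
closest_label = (subset["Dates"] - target).abs().idxmin()

# Double brackets keep the result as a one-row DataFrame
closest_row = df.loc[[closest_label]]
print(closest_row)
```

Note that idxmin returns only the first label when several rows are equally close, whereas nsmallest with a larger n lets you inspect the runners-up.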

Answered By – creanion

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
