Creating new column from transformed existing column in Python

Issue

I have a dataset where I would like to create a new column derived from one of my existing columns. The column is created by extracting the values between the first and last colons.

Data

site        stat    crate   
AA - site 1 ok      AD1:00:AB5.30:100   
AA - site 1 ok      AD1:00:AB5.30:111   
A1 - site 2 fail    AD1:00:AB5.30:200   
AA - site 1 ok      AD1:00:AB5.30:555   
BB - site 8 fail    BB5:01:BA8.40:777   

Desired

site        stat    main_cr     crate
AA - site 1 ok      00:AB5.30   AD1:00:AB5.30:100
AA - site 1 ok      00:AB5.30   AD1:00:AB5.30:111
A1 - site 2 fail    00:AB5.30   AD1:00:AB5.30:200
AA - site 1 ok      00:AB5.30   AD1:00:AB5.30:555
BB - site 8 fail    01:BA8.40   BB5:01:BA8.40:777   

Doing

My approach is to use some form of regex or split.

df['main_cr'] = df['crate'].str.split(':').str[1:3]

This does not work: it returns a list of the pieces instead of a single string, for example

[00, AB5.30]

I would like to create a new column by extracting the values between the first and last colons of an existing column within my dataframe.

Any suggestions would be helpful, thank you.

Solution

Use Series.str.extract (there is no pd.extract) with a regex pattern that captures the part you want:

df['main_cr']=df['crate'].str.extract(r':(\d{2}:.*):')
df

regex:
The pattern matches the first ":" that is followed by exactly two digits (\d{2}) and another ":", then any number of characters (.*), and finally a closing ":". Because .* is greedy, that closing colon is the last one in the string. Only the part between the parentheses is extracted.
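To see the greedy behaviour in isolation, here is a minimal sketch that applies the same pattern with the standard-library re module instead of pandas. The first two strings come from the question; the third, with an extra colon-separated field, is a made-up value added only for illustration.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r':(\d{2}:.*):')

# Sample values taken from the question's "crate" column.
samples = ["AD1:00:AB5.30:100", "BB5:01:BA8.40:777"]
extracted = [pattern.search(s).group(1) for s in samples]
print(extracted)  # ['00:AB5.30', '01:BA8.40']

# Hypothetical value (not in the original data) with one more field:
# the greedy .* runs up to the LAST colon, so the extra field is kept.
extra = pattern.search("AD1:00:AB5.30:XY:100").group(1)
print(extra)  # 00:AB5.30:XY
```

Note that the pattern assumes the second field is always exactly two digits; a value that does not fit would produce no match (NaN in pandas).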

site         stat  crate              main_cr
AA - site 1  ok    AD1:00:AB5.30:100  00:AB5.30
AA - site 1  ok    AD1:00:AB5.30:111  00:AB5.30
A1 - site 2  fail  AD1:00:AB5.30:200  00:AB5.30
AA - site 1  ok    AD1:00:AB5.30:555  00:AB5.30
BB - site 8  fail  BB5:01:BA8.40:777  01:BA8.40
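For completeness, a self-contained sketch that rebuilds the sample data from the question, applies the regex, and also shows one way to repair the split-based attempt: slice off the first and last pieces and join the rest back together with ":". The variable name via_split and the final column reordering (to match the "Desired" layout) are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "site": ["AA - site 1", "AA - site 1", "A1 - site 2",
             "AA - site 1", "BB - site 8"],
    "stat": ["ok", "ok", "fail", "ok", "fail"],
    "crate": ["AD1:00:AB5.30:100", "AD1:00:AB5.30:111",
              "AD1:00:AB5.30:200", "AD1:00:AB5.30:555",
              "BB5:01:BA8.40:777"],
})

# Regex approach from the answer; expand=False returns a Series.
df["main_cr"] = df["crate"].str.extract(r":(\d{2}:.*):", expand=False)

# Split-based alternative: drop the first and last pieces,
# then join the remaining pieces back into one string.
via_split = df["crate"].str.split(":").str[1:-1].str.join(":")

# Reorder the columns to match the desired layout in the question.
df = df[["site", "stat", "main_cr", "crate"]]
print(df)
```

Both approaches give the same values on this data; the split version does not require the second field to be two digits.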

Answered By – Naveed

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
