Add the missing value from one dataframe column to another column using python pandas

Issue

I have two different excel files which I read using pd.readExcel. The first excel file is kind of a master file which has a lot of columns. showing only those columns which are relevant:
df1

Company Name                                              Excel Company ID
0                                    cleverbridge AG      IQ109133656
1  BT España, Compañía de Servicios Globales de T...        IQ3806173
2                                   Technoserv Group       IQ40333012
3                                    Blue Media S.A.       IQ50008102
4            zeb.rolfes.schierenbeck.associates gmbh       IQ30413992

and the second excel is basically an output excel file which looks like this:
df2

company_id          found_keywords  no_of_url                                       company_name
0  IQ137156215      insurance         15                         Zühlke Technology Group AG
1    IQ3806173      insurance         15  BT España, Compañía de Servicios Globales de T...
2   IQ40333012      insurance          4                                   Technoserv Group
3   IQ51614192      insurance         15                             Octo Telematics S.p.A.

I want this output excel file/ df2 to include those company_id and company name from df1 where company id and company name from df1 is not a part of df2. Something like this:
df2

company_id found_keywords  no_of_url                                       company_name
0  IQ137156215      insurance         15                         Zühlke Technology Group AG
1    IQ3806173      insurance         15  BT España, Compañía de Servicios Globales de T...
2   IQ40333012      insurance          4                                   Technoserv Group
3   IQ51614192      insurance         15                             Octo Telematics S.p.A.
4   IQ30413992      NaN               NaN              zeb.rolfes.schierenbeck.associates gmbh          

I tried several ways of achieveing this by using pd.merge as well as np.where but nothing worked out. What exactly do I need to do so that it works as expected.

Solution

How did you use merge? You can rename df1 columns and merge with how='outer':

df1.rename({'Company Name': 'company_name', 'Excel Company ID': 'company_id'}, axis=1, inplace=True)
print(df2.merge(df1, how='outer'))

Output:

    company_id found_keywords  no_of_url                                       company_name
0  IQ137156215      insurance       15.0                         Zühlke Technology Group AG
1    IQ3806173      insurance       15.0  BT España, Compañía de Servicios Globales de T...
2   IQ40333012      insurance        4.0                                   Technoserv Group
3   IQ51614192      insurance       15.0                             Octo Telematics S.p.A.
4  IQ109133656            NaN        NaN                                    cleverbridge AG
5   IQ50008102            NaN        NaN                                    Blue Media S.A.
6   IQ30413992            NaN        NaN            zeb.rolfes.schierenbeck.associates gmbh

Answered By – Tranbi

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published