Issue
Hi there stack overflow community,
I have the following dataframe in an excel:
sparte sparten status stati gesellschaft gesellschaften
10 Krankenvoll B beantragt 0 - Allgemein -
11 Reisekranken A aktiv 10000 nordinvest
12 Krankenkasse N beitragsfrei M552D SV SparkassenVersicherung
and the folliwing column for merging in a csv:
sparten status gesellschaft
10 B 0
11 A 10000
12 N M552D
to merge some columns from an excel and a csv file I’m using the following code:
df1 = pd.read_csv(r'path', sep=',').drop(columns = ['risiko'])
df2 = pd.read_excel(r'path')
df3 = pd.merge(df1,df2[['status','stati']],on='status', how='left').drop(columns = ['status'])
df4 = df3.merge(df2[['sparte','sparten']],on='sparte', how='left').drop(columns = ['sparte'])
It works fine for me, but now i want to me merge the following column:
df4 = df3.merge(df2[['gesellschaft','gesellschaften']],on='gesellschaft', how='left')
print(df4)
…and it does not work. It merges only the cells with this format M552D
, but leaves the cells with numbers untouched. I don’t understand what I’m doing wrong. If I try to put how='right'
the merge works, but the other columns disappear.
Maybe someone has an idea what is happening here! Thanks for any hint!
Solution
The problem is that the geselschaft
column contains only strings in df1
which is loaded with read_csv
, because the column is not fully numeric. But in df2
which is loaded with read_excel
, it contains a mix of int and string values. And at Pandas level and int and a string cannot be equal.
A possible workaround is to force a string conversion at merge time:
df4 = df3.merge(df2[['gesellschaft','gesellschaften']], left_on='gesellschaft',
right_on = df2['gesellschaft'].astype('str'), how='left')
Answered By – Serge Ballesta
This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0