I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (currency, adj date) for example.
What is the best way to merge these by index, but to not take two copies of currency and adj date.
Each data frame is 90 columns, so I am trying to avoid writing everything out by hand.
df: currency adj_date data_col1 ... date cusip 2012-01-01 XSDP USD 2012-01-03 0.45 ... df2: currency adj_date data_col2 ... date cusip 2012-01-01 XSDP USD 2012-01-03 0.45 ...
If I do:
dfNew = merge(df, df2, left_index=True, right_index=True, how='outer')
dfNew: currency_x adj_date_x data_col2 ... currency_y adj_date_y date cusip 2012-01-01 XSDP USD 2012-01-03 0.45 USD 2012-01-03
You can work out the columns that are only in one DataFrame and use this to select a subset of columns in the merge.
cols_to_use = df2.columns.difference(df.columns)
Then perform the merge (note this is an index object but it has a handy
dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')
This will avoid any columns clashing in the merge.
Answered By – EdChum