Accessing dictionaries when having dictionary dataframe values

Issue

The problem

I am facing a problem as I am managing a dataset each of which entry has associated a dictionary in the form of

dictionary = {
    'Step_1': {
        'Q':'123', 
        'W':'456', 
        'E':'789'
    },
    'Step_2': {
        'Q':'753', 
        'W':'159', 
        'E':'888'
    }
}

please note that the dicts have a variable number of Steps
So I am organising the data into a pandas df like:

                                        dicts
0    {'Step_1': {'Q': '123', 'W': '456', ...
1    {'Step_1': {'Q': '123', 'W': '456', ...
2    {'Step_1': {'Q': '123', 'W': '456', ...

and would like now to do some row-wise operations, like getting each dict['Step_1']['Q'] value.

I know that it’s generally suggested to not work with dicts as df values, so I’d like to use a good, pythonic (read: fast) solution.

How would you proceed to get each dict['Step_1']['Q'] row-wise?

What I tried

A simple solution that came to my mind was:
df[dicts]['Step_1']['Q'], but it doesn’t seem to work. (Why? Might it be because this way pandas doesn’t "unpack" the row values, hence cannot access the dicts?)

A more complex solution that I found to work is to use a function to access the data, as follows:

def access(x): 
    return (x["Step_1"]["V"])

df['new_col'] = df['dicts'].apply(lambda x: access(x))

but I don’t quite like this solution. As far as I know, using the apply method is not the optimal way to tackle the problem.

Solution

I think you should reshape your dataset. Check this out:

# Let's say we have this
dictionary = {
    "Step_1": {"Q":"123", "W":"456", "E":"789"},
    "Step_2": {"Q":"753", "W":"159", "E":"888"},
}
dicts = [dictionary, dictionary] # This would be your dataset

Do this:

better = []
for i, d in enumerate(dicts):  # d is a dictionary
    # Iterate over the keys and values of the dictionary
    for k, v in d.items():
        # Get the step from the key
        step = k.split("_")[1]
        # Add this step to the new list
        better.append({"id": i, "step": step, **v})

This is what better is now:

[{'id': 0, 'step': '1', 'W': '456', 'Q': '123', 'E': '789'},
 {'id': 0, 'step': '2', 'W': '159', 'E': '888', 'Q': '753'},
 {'id': 1, 'step': '1', 'W': '456', 'Q': '123', 'E': '789'},
 {'id': 1, 'step': '2', 'W': '159', 'E': '888', 'Q': '753'}]

Now you can re-build your DataFrame and perform all kind of row-wise operations:

df = pd.DataFrame(better)

    id  step W    Q    E
0   0   1    456  123  789
1   0   2    159  753  888
2   1   1    456  123  789
3   1   2    159  753  888

For example, get all of the "Step 1 W" values:

df[df.step == "1"].W
# Output:
0    456
2    456
Name: W, dtype: object

Note: it’s probably a good idea to turn the columns to ints, right now they are stored as strs

Answered By – 965311532

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published