Difference between df[df['col a']] and df['col a']?

Issue

I'm new to coding, and I feel that to really understand it I have to truly grasp the concepts.

Quality of life edit:

Why do we write df[df['col a'] == x] instead of df['col a'] == x when making a search? I understand that with the second expression I would be looking at column names that equal x, but I'd love to know what the addition of wrapping it in another df[] (making it a list?) does for the code.

I would love to know the difference between those two and what I am actually doing when I nest the column in a list.

Any help is appreciated. Thank you so much!

Solution

In general, df[...] selects part of a DataFrame, and what it selects depends on what you pass inside the brackets.
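A minimal sketch of three common things that can go inside df[...], assuming pandas is installed. The frame and the column names col_a and col_b are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({'col_a': [10, 20, 30], 'col_b': ['x', 'y', 'x']})

# A single column label returns that column as a Series.
col = df['col_a']

# A slice of integers selects rows by position.
first_two_rows = df[0:2]

# A boolean Series (one True/False per row) selects the rows marked True.
rows_with_x = df[df['col_b'] == 'x']

print(type(col).__name__)             # Series
print(len(first_two_rows))            # 2
print(rows_with_x['col_a'].tolist())  # [10, 30]
```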

Pandas supports several different indexing methods, and the expression in your question chains two of them together. First, the inner df['col_a'] selects column col_a as a Series. Comparing that Series with a value (for example df['col_a'] == x) is evaluated element by element and returns a boolean Series, a "mask" that is True where the value in the column meets the condition and False elsewhere. The outer df[...] then uses boolean indexing: given the mask, it selects every row of the entire DataFrame where the mask is True.
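Splitting the chained expression into separate steps shows what each part produces. A minimal sketch, assuming a hypothetical numeric column col_a and a value x = 2 picked only for illustration:

```python
import pandas as pd

# Hypothetical data; 'col_a', 'other' and x are illustrative only.
df = pd.DataFrame({'col_a': [1, 2, 3, 2], 'other': ['p', 'q', 'r', 's']})
x = 2

# Step 1: the inner df['col_a'] returns the column as a Series.
column = df['col_a']

# Step 2: comparing it with x is done element by element -> boolean Series (the mask).
mask = column == x

# Step 3: passing the mask back into df[...] keeps only the rows where it is True.
filtered = df[mask]

# The one-liner from the question does the same thing in a single expression.
same = df[df['col_a'] == x]

print(mask.tolist())            # [False, True, False, True]
print(filtered.index.tolist())  # [1, 3]
print(filtered.equals(same))    # True
```

On its own, df['col_a'] == x only answers "which rows match?"; the outer df[...] is what pulls those rows out of the frame.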

Example:

import pandas as pd
df = pd.DataFrame({'column1': ['a', 'b', 'c', 'd', 'e'], 'column2': ['x', 'x', 'x', 'y', 'y']})

[In] df
[Out]
  column1 column2
0       a       x
1       b       x
2       c       x
3       d       y
4       e       y

Selecting a single column:

[In] df['column2']
[Out] 
0    x
1    x
2    x
3    y
4    y
Name: column2, dtype: object
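Since the question mentions lists: passing an actual Python list of column names (note the double brackets) returns a DataFrame rather than a Series. A small sketch rebuilding the example frame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'column1': ['a', 'b', 'c', 'd', 'e'],
                   'column2': ['x', 'x', 'x', 'y', 'y']})

# One label -> a one-dimensional Series.
single = df['column2']

# A list containing one label -> a DataFrame with just that column.
as_frame = df[['column2']]

# A list with several labels -> a DataFrame with those columns, in that order.
reordered = df[['column2', 'column1']]

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(list(reordered.columns))  # ['column2', 'column1']
```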

Creating a mask:

[In] df['column2'] == 'x'
[Out]
0     True
1     True
2     True
3    False
4    False
Name: column2, dtype: bool
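Because the mask is an ordinary boolean Series sharing the row index of df, it can be stored in a variable, inspected, and reused. A small sketch; the df.loc line is an alternative spelling that accepts the same mask and also lets you pick a column in the same step:

```python
import pandas as pd

df = pd.DataFrame({'column1': ['a', 'b', 'c', 'd', 'e'],
                   'column2': ['x', 'x', 'x', 'y', 'y']})

mask = df['column2'] == 'x'

# The mask is labelled with the same row index as df.
print(mask.index.equals(df.index))        # True

# True counts as 1, so summing the mask counts the matching rows.
print(int(mask.sum()))                    # 3

# .loc takes the mask for the rows plus a column label.
print(df.loc[mask, 'column1'].tolist())   # ['a', 'b', 'c']
```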

Selecting all rows that have the value x in column column2:

[In] df[df['column2'] == 'x']
[Out]
  column1 column2
0       a       x
1       b       x
2       c       x

Answered By – lua

This answer was collected from Stack Overflow and is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
