df.ix[] inconsistency between axis for MultiIndex #2904

@lodagro

In [42]: from itertools import product

In [43]: import pandas as pd

In [44]: import numpy as np

In [45]: index = pd.MultiIndex.from_tuples([t for t in product([10, 20, 30], ['a', 'b'])])

In [46]: df = pd.DataFrame(np.random.randn(6, 6), index, index)

In [47]: df
Out[47]:
            10                  20                  30
             a         b         a         b         a         b
10 a  0.077368  0.360018  0.649403 -0.221877 -1.527411  0.485647
   b  0.890805 -2.142297  0.758411 -1.650710  0.041276 -0.040894
20 a -0.401678  0.481390 -1.080735  0.621861  1.410940 -1.106015
   b -0.504422 -1.555415 -0.023859  0.211287 -0.321643  0.140895
30 a -0.118969 -0.432082 -0.888786  1.167191 -1.642356 -0.281661
   b -0.580182  2.920769 -0.685617  1.327784  0.691514 -0.692361

Slicing ranges is consistent across both axes.

In [48]: df.ix[10:20, :]
Out[48]:
            10                  20                  30
             a         b         a         b         a         b
10 a  0.077368  0.360018  0.649403 -0.221877 -1.527411  0.485647
   b  0.890805 -2.142297  0.758411 -1.650710  0.041276 -0.040894
20 a -0.401678  0.481390 -1.080735  0.621861  1.410940 -1.106015
   b -0.504422 -1.555415 -0.023859  0.211287 -0.321643  0.140895

In [49]: df.ix[:, 10:20]
Out[49]:
            10                  20
             a         b         a         b
10 a  0.077368  0.360018  0.649403 -0.221877
   b  0.890805 -2.142297  0.758411 -1.650710
20 a -0.401678  0.481390 -1.080735  0.621861
   b -0.504422 -1.555415 -0.023859  0.211287
30 a -0.118969 -0.432082 -0.888786  1.167191
   b -0.580182  2.920769 -0.685617  1.327784
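`.ix` has been removed from current pandas, so readers on a current release would write the same range slices with the label-based `.loc` indexer instead. The sketch below assumes pandas 1.x/2.x. It replaces the random data with `np.arange` so the result is deterministic, and `index` and `df` mirror the names in the report.

```python
from itertools import product

import numpy as np
import pandas as pd

# Same layout as the report: outer level 10/20/30, inner level a/b,
# used for both the row and the column index. np.arange replaces randn
# so the values are reproducible.
index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(np.arange(36.0).reshape(6, 6), index=index, columns=index)

# .loc slices by label on the outer level; both endpoints are included.
rows = df.loc[10:20, :]  # rows (10, a) .. (20, b)
cols = df.loc[:, 10:20]  # columns (10, a) .. (20, b)

print(rows.shape)  # (4, 6)
print(cols.shape)  # (6, 4)
```

The two calls are mirror images of each other, which is the symmetry the report expects from `.ix`.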

This, however, looks inconsistent to me:

In [50]: df.ix[10, :]
Out[50]:
         10                  20                  30
          a         b         a         b         a         b
a  0.077368  0.360018  0.649403 -0.221877 -1.527411  0.485647
b  0.890805 -2.142297  0.758411 -1.650710  0.041276 -0.040894

In [51]: df.ix[:, 10]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
...
IndexError: index out of bounds
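The following sketch shows an unambiguous way to select the outer label 10 on either axis. It assumes a recent pandas in which `.loc` is strictly label-based. `DataFrame.xs` with an explicit `axis` produces the same cross-sections. Deterministic `np.arange` data stands in for the random values.

```python
from itertools import product

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(np.arange(36.0).reshape(6, 6), index=index, columns=index)

# .loc looks 10 up as a label on whichever axis it is given for, and
# drops the matched outer level from the result.
row_block = df.loc[10, :]  # 2 x 6, rows labelled 'a', 'b'
col_block = df.loc[:, 10]  # 6 x 2, columns labelled 'a', 'b'

# xs spells the same cross-sections with an explicit axis argument.
print(row_block.equals(df.xs(10, axis=0)))  # True
print(col_block.equals(df.xs(10, axis=1)))  # True
```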

And so is this:

In [52]: df.ix[0, :]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
KeyError: 0

In [53]: df.ix[:, 0]
Out[53]:
10  a    0.077368
    b    0.890805
20  a   -0.401678
    b   -0.504422
30  a   -0.118969
    b   -0.580182
Name: (10, a), Dtype: float64
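Here is the positional counterpart, again assuming a recent pandas. `.iloc` always interprets integers as positions, so `0` picks the first row or the first column symmetrically. It never tries to look `0` up as a label the way `df.ix[0, :]` did above.

```python
from itertools import product

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(np.arange(36.0).reshape(6, 6), index=index, columns=index)

# .iloc treats integers purely as positions, on either axis.
first_row = df.iloc[0, :]  # Series named (10, 'a'): the first row
first_col = df.iloc[:, 0]  # Series named (10, 'a'): the first column

print(first_row.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(first_col.tolist())  # [0.0, 6.0, 12.0, 18.0, 24.0, 30.0]
```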

jreback commented on Mar 6, 2013

I think with #2922 these make more sense (.ix is obviously unchanged, but users now have the choice of non-ambiguous selectors instead).

For example:

In [8]: df
Out[8]: 
            10                  20                  30          
             a         b         a         b         a         b
10 a -0.969812  0.892646 -0.098479  1.416564 -0.415579 -1.863745
   b  0.757901  0.441437  0.049145  0.903316 -0.606608  1.106424
20 a -1.025012 -0.588379 -0.011694  0.005748  1.149368 -1.557020
   b -0.527607  0.897994 -1.043933 -1.200322  0.056026 -2.562151
30 a -0.361386  0.172049  0.663303  0.545051 -1.071491 -0.144815
   b  1.339875 -0.831864  0.742964  1.297208  0.719399 -0.488385

# this treats the 10 like a label
In [9]: df.loc[10,:]
Out[9]: 
         10                  20                  30          
          a         b         a         b         a         b
a -0.969812  0.892646 -0.098479  1.416564 -0.415579 -1.863745
b  0.757901  0.441437  0.049145  0.903316 -0.606608  1.106424

# this treats the 10 like a label
In [10]: df.loc[:,10]
Out[10]: 
             a         b
10 a -0.969812  0.892646
   b  0.757901  0.441437
20 a -1.025012 -0.588379
   b -0.527607  0.897994
30 a -0.361386  0.172049
   b  1.339875 -0.831864

# slices are INCLUSIVE since these are labels
In [11]: df.loc[10:20,:]
Out[11]: 
            10                  20                  30          
             a         b         a         b         a         b
10 a -0.969812  0.892646 -0.098479  1.416564 -0.415579 -1.863745
   b  0.757901  0.441437  0.049145  0.903316 -0.606608  1.106424
20 a -1.025012 -0.588379 -0.011694  0.005748  1.149368 -1.557020
   b -0.527607  0.897994 -1.043933 -1.200322  0.056026 -2.562151

# same here
In [12]: df.loc[:,10:20]
Out[12]: 
            10                  20          
             a         b         a         b
10 a -0.969812  0.892646 -0.098479  1.416564
   b  0.757901  0.441437  0.049145  0.903316
20 a -1.025012 -0.588379 -0.011694  0.005748
   b -0.527607  0.897994 -1.043933 -1.200322
30 a -0.361386  0.172049  0.663303  0.545051
   b  1.339875 -0.831864  0.742964  1.297208

# positional selection
In [13]: df.iloc[0,:]
Out[13]: 
10  a   -0.969812
    b    0.892646
20  a   -0.098479
    b    1.416564
30  a   -0.415579
    b   -1.863745
Name: (10, a), dtype: float64

# same
In [14]: df.iloc[:,0]
Out[14]: 
10  a   -0.969812
    b    0.757901
20  a   -1.025012
    b   -0.527607
30  a   -0.361386
    b    1.339875
Name: (10, a), dtype: float64

lodagro commented on Mar 6, 2013

Hmm, I clearly did not follow the thread in #2922 closely enough, since I am surprised by the failure of df.loc[10:20, :]. Some catching up to do :-)

jreback commented on Mar 14, 2013

Close this? I'm going to add it to the cookbook in any event.

lodagro commented on Mar 14, 2013

You prefer to close this since .ix is old stuff now and there are no plans to change it?

On the above DataFrame, df.loc[10, :] and df.loc[:, 10] (in contrast to ix) work fine. However, slicing on an integer MultiIndex level does not, as you already indicated (would that require a separate issue?).

jreback commented on Mar 14, 2013

Your example probably SHOULD work, but ix is quite tricky; I am not sure there are plans to change/fix it. I could certainly bump this to 0.12 if you would like.

Slicing does work on an integer MultiIndex; it just respects labels or positions depending on which indexer you choose. Your example in this issue is good at showing the ambiguity!

Am I missing something?
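The following sketch makes the label/position distinction concrete. With integer outer labels, the same `10:20` means different things to the two indexers. It assumes a recent pandas and uses deterministic data.

```python
from itertools import product

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(list(product([10, 20, 30], ["a", "b"])))
df = pd.DataFrame(np.arange(36.0).reshape(6, 6), index=index, columns=index)

# Label-based: 10 and 20 are outer-level labels; both ends are included.
by_label = df.loc[10:20]
# Position-based: rows 0 and 1; the stop position is excluded.
by_position = df.iloc[0:2]
# Position-based with the same numbers: there are only 6 rows, so
# positions 10..19 select nothing (iloc slices do not raise here).
out_of_range = df.iloc[10:20]

print(by_label.shape)      # (4, 6)
print(by_position.shape)   # (2, 6)
print(out_of_range.shape)  # (0, 6)
```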

lodagro commented on Mar 14, 2013

OK, we agree that both df.ix[10, :] and df.ix[:, 10] should work. For me it is even fine to defer this indefinitely; I can work around it just fine. It is just something I noticed and thought could be improved.

The label slicing with loc is something else. I don't think I am mixing up label and position; it is all labels here. However:

In [38]: df
Out[38]: 
            10                  20                  30          
             a         b         a         b         a         b
10 a -0.799097 -0.450663 -0.003029  0.340621 -1.248213 -0.900263
   b -0.049115 -1.540385 -0.299996 -3.520201 -0.631406  1.036550
20 a -1.051028 -0.952631  2.114734 -0.285703 -1.346419  0.791299
   b -1.225570  1.063159  0.731514 -0.153996  0.382094  0.797084
30 a -1.176216  1.235405 -0.226777  0.852648  2.481304  0.587310
   b  1.786893 -0.042711  0.742734 -0.041659  2.544889  0.558397

In [40]: df.loc[20:30, :]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
KeyError: 'stop bound [29] is not in the [index]'

Compare this to:

In [43]: df
Out[43]: 
            5                   6                   7          
            a         b         a         b         a         b
5 a -1.312814 -0.839775  0.812328  0.041647  0.231441  0.439760
  b -0.102015  2.163313 -0.489461  0.931466  1.168450  1.134386
6 a -0.173297 -0.319528  0.546089 -0.392548  1.034875  1.825187
  b  1.201444 -0.195438  0.762748 -0.880005 -0.247503 -0.589713
7 a  0.310798 -0.556815  0.355492 -1.554151  0.677812 -1.798690
  b -0.871106 -0.932847  0.678469 -1.226688  0.595985 -0.738877

In [44]: df.loc[6:7, :]
Out[44]: 
            5                   6                   7          
            a         b         a         b         a         b
6 a -0.173297 -0.319528  0.546089 -0.392548  1.034875  1.825187
  b  1.201444 -0.195438  0.762748 -0.880005 -0.247503 -0.589713
7 a  0.310798 -0.556815  0.355492 -1.554151  0.677812 -1.798690
  b -0.871106 -0.932847  0.678469 -1.226688  0.595985 -0.738877
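With a current pandas release, the label slice that raised the `KeyError` above returns both endpoint groups, just as the `5`/`6`/`7` frame does. The `make_frame` helper below is hypothetical. It exists only to build both frames with deterministic values.

```python
from itertools import product

import numpy as np
import pandas as pd


def make_frame(outer):
    """Hypothetical helper: square frame with an (outer, a/b) MultiIndex
    on both axes and deterministic float values."""
    idx = pd.MultiIndex.from_tuples(list(product(outer, ["a", "b"])))
    n = len(idx)
    return pd.DataFrame(np.arange(float(n * n)).reshape(n, n), index=idx, columns=idx)


df = make_frame([10, 20, 30])
df_small = make_frame([5, 6, 7])

# Both are inclusive label slices on the outer level.
late = df.loc[20:30, :]
small = df_small.loc[6:7, :]

print(late.index.get_level_values(0).tolist())   # [20, 20, 30, 30]
print(small.index.get_level_values(0).tolist())  # [6, 6, 7, 7]
```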

jreback commented on Mar 14, 2013

I think you are right: I am treating the integer slice by expanding it to the integers in the range rather than to the labels, so your first example should work. I will file a bug on this. Note that it will be an INCLUSIVE range because these are labels.
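Here is a toy model of the mechanism described above, in plain Python. `expanded_stop_bound` and `label_slice` are hypothetical helpers, not pandas internals. The first mimics treating `20:30` like `range(20, 30)`, whose last element, 29, is not a label; this matches the `stop bound [29]` message in the traceback. The second performs an inclusive label slice over sorted labels using `bisect`.

```python
from bisect import bisect_left, bisect_right


def expanded_stop_bound(start, stop):
    """Hypothetical model of the bug: the integer slice is expanded like
    range(start, stop), so the stop label that gets looked up is stop - 1."""
    return range(start, stop)[-1]


def label_slice(sorted_labels, start, stop):
    """Label-based slice over sorted labels; both endpoints are included."""
    lo = bisect_left(sorted_labels, start)
    hi = bisect_right(sorted_labels, stop)
    return sorted_labels[lo:hi]


outer = [10, 20, 30]
bad_stop = expanded_stop_bound(20, 30)
print(bad_stop)                    # 29
print(bad_stop in outer)           # False, hence the KeyError in the report
print(label_slice(outer, 20, 30))  # [20, 30]
```

Under this model, `6:7` on the `[5, 6, 7]` frame expands to a last value of `6`. Since `6` is a label, the lookup would not raise, which fits that example working. The toy model does not claim to reproduce pandas' exact old output.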

jreback commented on Mar 14, 2013

@lodagro I updated the example...thanks for the catch!

jreback commented on Dec 18, 2013

@lodagro this should be closed by #3055, right?

lodagro commented on Dec 19, 2013

@jreback We in fact discussed two issues here. One is the ix inconsistency (the reason this issue was opened); the other is the loc failure on integer slices (which are labels instead of positions). The loc one is resolved; the ix behavior seems unchanged.

TomAugspurger commented on Jan 19, 2017

Closing due to the deprecation of .ix in #15113.

Metadata

Assignees

No one assigned

    Labels

    Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @jreback @lodagro @TomAugspurger

          df.ix[] inconsistency between axis for MultiIndex · Issue #2904 · pandas-dev/pandas