Question
I have a table like this:
- import pandas as pd
- datas = {'CustID': ['A', 'B', 'C', 'A'], 'Purchase': ['Item1', 'Item2', 'Item1', 'Item2']}
- df = pd.DataFrame.from_dict(datas)
I would like to select the rows whose CustID appears more than once in the table.
How-To
This will work:
- counts = df['CustID'].value_counts()
- df[df['CustID'].isin(counts.index[counts > 1])]
Alternatively, the code below works too:
- display(df['CustID'].duplicated(keep=False))
- df[df['CustID'].duplicated(keep=False)]
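This finds the rows in the data frame whose CustID value occurs more than once. keep=False tells the duplicated function to mark every occurrence of a repeated value as True. With the default keep='first' (or with keep='last'), the first (or last) occurrence is left as False and only the remaining repeats are marked.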
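To make the effect of keep concrete, here is a small sketch on the same sample data. It compares the three keep settings and checks that the value_counts approach and the duplicated(keep=False) approach select the same rows. The variable names (first_mask, last_mask, all_mask, via_counts, via_duplicated) are made up for this illustration.

```python
import pandas as pd

datas = {'CustID': ['A', 'B', 'C', 'A'], 'Purchase': ['Item1', 'Item2', 'Item1', 'Item2']}
df = pd.DataFrame.from_dict(datas)

# Default keep='first': the first occurrence of each value is left unmarked.
first_mask = df['CustID'].duplicated()
# keep='last': the last occurrence of each value is left unmarked.
last_mask = df['CustID'].duplicated(keep='last')
# keep=False: every occurrence of a repeated value is marked.
all_mask = df['CustID'].duplicated(keep=False)

# Approach 1: count each CustID, then keep rows whose CustID has a count above 1.
counts = df['CustID'].value_counts()
via_counts = df[df['CustID'].isin(counts.index[counts > 1])]

# Approach 2: filter directly with the keep=False mask.
via_duplicated = df[all_mask]

print(via_duplicated)
```

On this data both approaches return the two rows for customer A (index 0 and 3). The duplicated version is shorter, while the value_counts version is easier to adapt if the threshold should be something other than "more than once".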