DAY 12

## Sorting

### Sorting a single list

#### Default: smallest to largest (ascending)

``````
A = [1, 3, 5, 1, 2]
sorted(A)
``````

``````
[1, 1, 2, 3, 5]
``````

#### Largest to smallest (descending)

``````
A = [1, 3, 5, 1, 2]
sorted(A, reverse=True)
``````

``````
[5, 3, 2, 1, 1]
``````

### Sorting compound data

#### By default, elements are compared item by item, in order

``````
B = [("Alice", 100), ("Bob", 97), ("Carol", 97), ("Bob", 95)]
sorted(B)
``````

``````
[('Alice', 100), ('Bob', 95), ('Bob', 97), ('Carol', 97)]
``````
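
The result above follows from how Python compares tuples: first items are compared first, and later items only break ties. A minimal sketch of that rule (the `records` list is a shortened, illustrative version of `B`):

```python
# Tuples compare lexicographically: first items first, later items only on ties.
records = [("Bob", 97), ("Alice", 100), ("Bob", 95)]

# First items tie ("Bob" == "Bob"), so the scores decide.
print(("Bob", 95) < ("Bob", 97))      # True

# "Alice" < "Bob" decides; the scores are never compared.
print(("Alice", 100) < ("Bob", 95))   # True

result = sorted(records)
print(result)  # [('Alice', 100), ('Bob', 95), ('Bob', 97)]
```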

#### You can also specify the sort key or rule

``````
# Specify which item to sort by
B = [("Alice", 100), ("Bob", 97), ("Carol", 97), ("Bob", 95)]
sorted(B, key=lambda x: x[1])  # x is each element of the list; elements are compared by x[1]
``````

``````
[('Bob', 95), ('Bob', 97), ('Carol', 97), ('Alice', 100)]
``````
``````
# Specify a sort rule
C = ["Alice", "Bob", "Bob", "Bob", "Carol"]
sorted(C, key=len)  # sort by length
``````

``````
['Bob', 'Bob', 'Bob', 'Alice', 'Carol']
``````
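
A key function can also return a tuple, which makes it possible to sort on several criteria at once. The sketch below is one possible way to do this, not the only one: it reuses `B` from above, uses `operator.itemgetter` from the standard library as an alternative to the lambda, and negates the score to get a descending order for that field only (this trick assumes numeric values).

```python
from operator import itemgetter

B = [("Alice", 100), ("Bob", 97), ("Carol", 97), ("Bob", 95)]

# itemgetter(1) behaves like lambda x: x[1].
# sorted() is stable, so the two 97s keep their input order (Bob, then Carol).
by_score = sorted(B, key=itemgetter(1))
print(by_score)
# [('Bob', 95), ('Bob', 97), ('Carol', 97), ('Alice', 100)]

# Tuple key: score descending (negated), then name ascending.
by_score_desc_then_name = sorted(B, key=lambda x: (-x[1], x[0]))
print(by_score_desc_then_name)
# [('Alice', 100), ('Bob', 97), ('Carol', 97), ('Bob', 95)]
```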

### What is the difference between `.sort()` and `sorted`?

#### `sorted` returns a sorted copy and leaves the original unchanged

``````
A = [1, 3, 5, 1, 2]
B = sorted(A)
print("A:", A)
print("B:", B)
``````
``````
A: [1, 3, 5, 1, 2]
B: [1, 1, 2, 3, 5]
``````

#### `sort()` sorts the original list in place (so it returns `None`)

``````
A = [1, 3, 5, 1, 2]
B = A.sort()
print("A:", A)
print("B:", B)
``````
``````
A: [1, 1, 2, 3, 5]
B: None
``````
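
One more practical difference: `.sort()` is a method of `list` only, while `sorted` accepts any iterable and always returns a new list. A short sketch with illustrative variable names (`scores`, `ages`):

```python
# A tuple has no .sort() method, but sorted() accepts any iterable.
scores = (3, 1, 2)
ordered = sorted(scores)
print(ordered)   # [1, 2, 3]  (always a new list)
print(scores)    # (3, 1, 2)  (the tuple is unchanged)

# Iterating over a dict yields its keys, so sorted() returns the sorted keys.
ages = {"b": 2, "a": 1}
sorted_keys = sorted(ages)
print(sorted_keys)  # ['a', 'b']

# list.sort() takes the same key= and reverse= arguments as sorted().
A = [1, 3, 5, 1, 2]
A.sort(reverse=True)
print(A)  # [5, 3, 2, 1, 1]
```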

## Natural sorting

``````
A = ["file" + str(x) for x in range(15)]
A
``````
``````
['file0',
 'file1',
 'file2',
 'file3',
 'file4',
 'file5',
 'file6',
 'file7',
 'file8',
 'file9',
 'file10',
 'file11',
 'file12',
 'file13',
 'file14']
``````

### Without natural sorting: strings sort in lexicographic order

``````
sorted(A)
``````
``````
['file0',
 'file1',
 'file10',
 'file11',
 'file12',
 'file13',
 'file14',
 'file2',
 'file3',
 'file4',
 'file5',
 'file6',
 'file7',
 'file8',
 'file9']
``````

### natsort: natural sorting

``````
pip install natsort
``````
``````
from natsort import natsorted
natsorted(A)
``````
``````
['file0',
 'file1',
 'file2',
 'file3',
 'file4',
 'file5',
 'file6',
 'file7',
 'file8',
 'file9',
 'file10',
 'file11',
 'file12',
 'file13',
 'file14']
``````
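
To see roughly what natural sorting does, here is a simplified sketch using only the standard library. It is not how `natsort` is implemented, and `natural_key` is a hypothetical helper name: it splits each string into digit and non-digit chunks and compares the digit chunks as integers. It only handles plain non-negative integers (no signs, decimals, or locale rules).

```python
import re

def natural_key(text):
    """Split text on runs of digits; convert the digit runs to int."""
    # With one capturing group, re.split keeps the digit runs at odd indices:
    # "file10" -> ["file", "10", ""]
    chunks = re.split(r"(\d+)", text)
    return [int(chunk) if i % 2 else chunk for i, chunk in enumerate(chunks)]

A = ["file" + str(x) for x in range(15)]

# Start from the lexicographic order, then sort it naturally.
lexicographic = sorted(A)
result = sorted(lexicographic, key=natural_key)

print(natural_key("file10"))  # ['file', 10, '']
print(result[:3], result[-3:])
```

Because 10 is compared with 2 as an integer rather than as the strings `"10"` and `"2"`, `file10` lands after `file9`.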

## Sorting a DataFrame

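``````
import pandas as pd
df  # assumed to be an existing DataFrame (not constructed here) with "平均值" (mean) and "標準差" (standard deviation) columns
``````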

### Sort by a single column with `sort_values`

``````
df.sort_values("平均值")
``````

### Sort by multiple columns with `by=["col1", "col2"]`

``````
df.sort_values(by=["標準差", "平均值"])
``````

### Sort by multiple columns, one ascending and one descending, with `ascending=[True, False]`

``````
df.sort_values(by=["標準差", "平均值"], ascending=[True, False])
``````
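
The `df` used in this section is not constructed in the text, so here is a self-contained sketch with made-up data. The values and the `name` column are assumptions for illustration only; the `平均值` (mean) and `標準差` (standard deviation) column names match the calls above.

```python
import pandas as pd

# Hypothetical data: the original DataFrame is not shown in the text.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "平均值": [3.0, 1.0, 2.0, 4.0],   # mean
    "標準差": [0.5, 0.2, 0.5, 0.2],   # standard deviation
})

# Standard deviation ascending; within equal standard deviations, mean descending.
# The ascending list pairs up with the by list position by position.
result = df.sort_values(by=["標準差", "平均值"], ascending=[True, False])
print(result)

# sort_values returns a new DataFrame by default; df keeps its original order.
print(df["name"].tolist())  # ['a', 'b', 'c', 'd']
```

With this data the rows come out in the order `d`, `b`, `a`, `c`: the two rows with standard deviation 0.2 first (higher mean first), then the two with 0.5.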