Python Pandas Basics Tutorial: A Grade Table

張凱喬

14 min read · Jun 8, 2018

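I've recently been learning to process data with Pandas.
I got by fine without it, but once I tried it I was amazed! It is seriously powerful.

That said, there are already plenty of Pandas introductions online
(the best part is that there are lots of articles in Chinese, so no need to practice English),
offered here for reference.

Let me start with the learning resources I benefited from the most.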

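Tag list page - iT 邦幫忙: helping each other solve problems and save an IT person's day

iT 邦幫忙 is a technical Q&A and sharing community for the IT field, where IT people help one another get through the frustrating moments of every day. Come be a superhero for IT people and rescue the next one who is stuck.

ithelp.ithome.com.tw

[Data Analysis & Machine Learning] Lecture 2.3: Introduction to basic Pandas functions (Series, DataFrame, Selection, Grouping)

Today's topic is a basic Pandas tutorial. In lecture 2.1 we imported the iris data with sklearn, applied some simple processing to convert the dictionary format into pandas, and used head(3) to show the first 3 rows so the display doesn't get cluttered by too much data.…

medium.com

莫烦Python

Getting started with machine learning from zero is not difficult. Machine learning or deep learning can be quite simple; often we don't need to spend much effort on complex mathematics. Math is just a tool for reaching a goal, and often we only need to know how to use the tool…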

morvanzhou.github.io

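This article is just to show off how good pandas is
and to let me organize the commands I've used recently.

A Series is a sequence: it only stores key:value-style data (an index label paired with a value),
so it's used less often on its own; most discussion of pandas revolves around the DataFrame.

You still have to know it, though, because a DataFrame sometimes gets converted into a Series,
and some syntax does not carry over between the two formats. An example comes at the end.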

import pandas as pd

# The simplest way to create a Series: pass in a dictionary
scores = pd.Series({'小明':90, '小華':80, '小李':70})
# Adding a new entry works a bit like assigning into an array
scores['小強'] = 55

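Apparently pd has become the standard alias, so don't go changing it to pds or the like XD

print(scores.describe())
# Prints summary statistics such as the mean and standard deviation
print(scores.mean())
# You can also ask for a single statistic directly, here the mean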

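pandas actually accepts many kinds of things as the index.

For example:

c = ['apple','banana','cat','dog'] # list one
d = [123,456,789,56789] # list two
sr = pd.Series(d, index=c) # use c as the index

# What about using dates as the index?
f = [1,2,3,4,5,6,7,8,9]
sr_2 = pd.Series(f, index=pd.date_range(start='2018-05-01', end='2018-05-09'))
# pd.date_range is the built-in date sequence generator
# As in this example, the index runs from 2018-05-01 to 2018-05-09, nine days in total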
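As a small sketch of what a date index buys you: entries can be looked up by date, and a range of days can be sliced by label. The names `one_day` and `three_days` are made up for this illustration.

```python
import pandas as pd

f = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sr_2 = pd.Series(f, index=pd.date_range(start='2018-05-01', end='2018-05-09'))

# Look up a single day by its timestamp label
one_day = sr_2.loc[pd.Timestamp('2018-05-03')]

# Slice a range of days; label slices include both endpoints
three_days = sr_2.loc['2018-05-02':'2018-05-04']

print(one_day)
print(three_days)
```

Note that the label slice returns three rows, because the end date is part of the result.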

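OK, now for the impressive part.
You can apply a comparison directly to a Series object,
and pandas gives you True or False for every entry.

scores = pd.Series({'小明':90, '小華':80, '小李':70, '小強':55})
# A simple comparison
print(scores > 60)
# Outputs True or False as a Series
#>> 小明     True
#>> 小華     True
#>> 小李     True
#>> 小強    False
# The more common approach is to filter directly (entries that don't match are left out)
print(scores[scores > 60])
#>> 小明    90
#>> 小華    80
#>> 小李    70
# 小強 didn't pass, so he doesn't appear in the result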

And it extends to multiple conditions:

print((scores>60) & (scores<90))
#>> 小明    False
#>> 小華     True
#>> 小李     True
#>> 小強    False
print(scores[(scores>60) & (scores<90)])
#>> 小華    80
#>> 小李    70
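Besides `&` (and), boolean Series can also be combined with `|` (or) and negated with `~`. Here is a minimal sketch; the variable names `extremes` and `failed` are only for illustration. Each condition needs its own parentheses, because `&` and `|` bind more tightly than the comparison operators.

```python
import pandas as pd

scores = pd.Series({'小明': 90, '小華': 80, '小李': 70, '小強': 55})

# "or": scores at or above 90, or below 60
extremes = scores[(scores >= 90) | (scores < 60)]

# "not": everyone who did NOT score above 60
failed = scores[~(scores > 60)]

print(extremes)
print(failed)
```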

Taking the square root and multiplying by 10 directly is no problem either:

new_scores = scores**0.5*10
print(new_scores)
#>> 小明    94.868330
#>> 小華    89.442719
#>> 小李    83.666003
#>> 小強    74.161985

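A single Series can only handle so much data;
this is where the DataFrame takes center stage.

Here is how to create a DataFrame.

# The most basic form, a list of dicts, can be passed in directly
scores = [{"姓名":"小華","數學":90, "國文":80},
          {"姓名":"小明","數學":70, "國文":55},
          {"姓名":"小李","數學":45, "國文":75}]
score_df = pd.DataFrame(scores)

# A plain dictionary of lists -> use from_dict
scores = {"姓名":["小華","小明","小李"],
          "國文":[80,55,75],
          "數學":[90,70,45]}
score_df = pd.DataFrame.from_dict(scores)

# Both produce a DataFrame with the same data
# (column order can differ between pandas versions)
#>>    姓名  國文  數學
#>> 0  小華  80  90
#>> 1  小明  55  70
#>> 2  小李  75  45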

Other approaches

小華 = {'數學':90, '國文':80}
小明 = {'數學':70, '國文':55}
小李 = {'數學':45, '國文':75}
# A list of dicts goes to the DataFrame constructor (from_dict is meant for dict input)
df = pd.DataFrame([小華,小明,小李])
#>>    國文  數學
#>> 0  80  90
#>> 1  55  70
#>> 2  75  45
# How to add a column
df['姓名'] = ['小華','小明','小李']
# Now it holds the same data as the one above
print(df)
#>>    國文  數學  姓名
#>> 0  80  90  小華
#>> 1  55  70  小明
#>> 2  75  45  小李

You can also convert an existing Series into a DataFrame.

scores = pd.Series({'小明':90, '小華':80, '小李':70, '小強':55})
# Converting a Series to a DataFrame takes just one method call
score_df = scores.to_frame() # done
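To illustrate the earlier point that a DataFrame can turn into a Series and that the two do not share every method, here is a minimal sketch. The names `one_col` and `sub_df` are made up for the example.

```python
import pandas as pd

score_df = pd.DataFrame({"姓名": ["小華", "小明", "小李"],
                         "國文": [80, 55, 75],
                         "數學": [90, 70, 45]})

one_col = score_df['國文']      # single brackets -> Series
sub_df = score_df[['國文']]     # double brackets -> DataFrame

print(type(one_col).__name__)   # Series
print(type(sub_df).__name__)    # DataFrame

# to_frame() exists on a Series but not on a DataFrame
print(hasattr(one_col, 'to_frame'))  # True
print(hasattr(sub_df, 'to_frame'))   # False
```

So when a method suddenly "does not exist", checking `type(...)` of the object is a quick first step.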

Reference

Creating Pandas DataFrames from Lists and Dictionaries - Practical Business Python

Pandas offer several options to create DataFrames from lists or dictionaries

pbpython.com

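If you find the data came in
with the wrong orientation and needs transposing,

there are basically two ways:
1. When building the df with from_dict, set orient="index" (the default is "columns")
2. After building it, call the .transpose() method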
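The two options above can be sketched as follows. The dictionary `by_student` is hypothetical data, keyed by student, where each key is meant to become a row.

```python
import pandas as pd

# Hypothetical input: outer keys are students, inner keys are subjects
by_student = {"小華": {"國文": 80, "數學": 90},
              "小明": {"國文": 55, "數學": 70}}

# Default orient="columns": the students end up as columns
wrong_way = pd.DataFrame.from_dict(by_student)

# Option 1: tell from_dict that the outer keys are row labels
right_way = pd.DataFrame.from_dict(by_student, orient="index")

# Option 2: build first, then transpose
also_right = wrong_way.transpose()

print(right_way)
```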

Next, the index: it's what was displayed on the left just now, the 0, 1, 2.
The reason to use an index is to quickly look up specific rows.

How to set the index

scores = [{"姓名":"小華","數學":90, "國文":80},
          {"姓名":"小明","數學":70, "國文":55},
          {"姓名":"小李","數學":45, "國文":75}]
score_df = pd.DataFrame(scores)
score_df.set_index('姓名', inplace=True)
# set_index turns the 姓名 column into the index
# inplace controls whether the old df is modified directly instead of a new df being returned; the default is False
# Equivalent to score_df = score_df.set_index('姓名')
#>>      國文  數學
#>> 姓名        
#>> 小華  80  90
#>> 1  小明  55  70
#>> 小李  75  45
# To change the index labels
score_df.index = ['張小華','吳小明','李小李']
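As a small sketch of the non-inplace form: `set_index` hands back a new DataFrame and leaves the original alone, and `reset_index` moves the index back into an ordinary column. The names `indexed` and `restored` are only for illustration.

```python
import pandas as pd

score_df = pd.DataFrame({"姓名": ["小華", "小明", "小李"],
                         "國文": [80, 55, 75],
                         "數學": [90, 70, 45]})

indexed = score_df.set_index('姓名')   # new DataFrame; score_df is unchanged
restored = indexed.reset_index()       # the index becomes a regular column again

print(indexed)
print(restored)
```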

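Also worth introducing is the rename method,
because it works on both index and columns,
and it's written as a dictionary mapping, which makes mistakes less likely.

# rename returns a new DataFrame, so assign the result
# (an alternative to setting score_df.index directly as above)
score_df = score_df.rename(index={"小華": "張小華","小明":"吳小明","小李":"李小李"})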

Rename Multiple pandas Dataframe Column Names

Rename multiple pandas dataframe column names.

chrisalbon.com
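Here is a minimal sketch of rename handling index and columns in one call. The English column names `Chinese` and `Math` are made up for the example; labels that are missing from the mapping are simply left as they are.

```python
import pandas as pd

score_df = pd.DataFrame({"國文": [80, 55, 75], "數學": [90, 70, 45]},
                        index=["小華", "小明", "小李"])

# Rename one row label and both column labels at once
renamed = score_df.rename(index={"小華": "張小華"},
                          columns={"國文": "Chinese", "數學": "Math"})

print(renamed)
```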

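Next is basic column and row selection.
This can be a bit confusing, but after a few tries it should be fine.

# Selecting a column works like indexing an array
score_df['國文']
# Then you can compare or do arithmetic on it
score_df['國文']>60
score_df['國文']+10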

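For selecting rows, meet the very common loc and iloc.
You must know both and be able to use them flexibly.

# loc refers to index labels
score_df.loc["張小華"] # select 張小華's row
# iloc refers to index positions
score_df.iloc[0]
# Ranges work too
score_df.loc['吳小明':] # rows from 吳小明 onward (吳小明 included)
score_df.iloc[1:]
score_df.iloc[-1] # the last row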
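One difference that is easy to trip over when slicing: a loc slice includes its end label, while an iloc slice excludes its end position, just like a Python list. A minimal sketch, with `by_label` and `by_position` as illustrative names:

```python
import pandas as pd

score_df = pd.DataFrame({"國文": [80, 55, 75], "數學": [90, 70, 45]},
                        index=["張小華", "吳小明", "李小李"])

by_label = score_df.loc["張小華":"吳小明"]   # label slice: both ends included
by_position = score_df.iloc[0:1]             # position slice: end excluded

print(len(by_label))      # 2
print(len(by_position))   # 1
```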

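loc and iloc also work on a Series;
in both cases they select rows.

The difference is that on a DataFrame
you can pick columns in addition to the index.

score_df.loc["張小華",['國文','數學']] # 張小華's 國文 and 數學 scores
score_df.loc[:,['國文']] # everyone's 國文 score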

Indexing and Selecting Data - pandas 0.22.0 documentation

In this section, we will focus on the final point: namely, how to slice, dice, and generally get and set subsets of…

pandas.pydata.org

To pick the value at a single position, you can write:
score_df.loc['吳小明']['國文']
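As a sketch of a few equivalent ways to read one cell: the chained form above works for reading, but passing the row and column labels to a single `.loc` (or `.at`, which is meant for scalars) is usually the safer habit, especially when assigning a value.

```python
import pandas as pd

score_df = pd.DataFrame({"國文": [80, 55, 75], "數學": [90, 70, 45]},
                        index=["張小華", "吳小明", "李小李"])

value_chained = score_df.loc['吳小明']['國文']   # row first, then column
value_direct = score_df.loc['吳小明', '國文']    # row and column in one lookup
value_at = score_df.at['吳小明', '國文']         # scalar accessor

# Assigning through one .loc call updates the DataFrame itself
score_df.loc['吳小明', '國文'] = 60
print(score_df)
```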

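Finally, let's pick out minimum and maximum values.
There are a few methods: min(), idxmin(), max(), idxmax().
As the names suggest, they give the min/max value and the index label of the min/max value.

score_df.loc['張小華'].idxmin() # 張小華's worst subject (loc, because we look up by label)
score_df.loc['張小華'].min() # 張小華's lowest score
score_df['國文'].idxmin() # the student with the lowest 國文 score
score_df['國文'].min() # the lowest 國文 score
# And that's not all
score_df.idxmin() # the lowest-scoring student in each subject
score_df.transpose().idxmin() # each student's worst subject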
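The last line above can also be written without transposing, by passing `axis=1` so the search runs across the columns of each row. A minimal sketch, with `worst_subject` and `same_thing` as illustrative names:

```python
import pandas as pd

score_df = pd.DataFrame({"國文": [80, 55, 75], "數學": [90, 70, 45]},
                        index=["張小華", "吳小明", "李小李"])

worst_subject = score_df.idxmin(axis=1)       # per student, across columns
same_thing = score_df.transpose().idxmin()    # the transpose-based version

print(worst_subject)
```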

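OK, now let's build the ranked grade table.

We'll set up a scenario: suppose a homeroom teacher
has received grade sheets from different subject teachers and first wants to see the ranking for each subject,
then add up the scores and see the ranking by total.

So we'll have grades for three subjects.
Assume the data comes from different teachers, so there are three Series.
(How to read data in is covered at the end; for now we build them by hand.)

# First we load the grades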
math = pd.Series({'小明':90, '小華':80, '小李':70, '小強':55})
chinese = pd.Series({'小明':86, '小華':76, '小李':58, '小強':92})
english = pd.Series({'小明':0, '小華':45, '小李':69, '小強':32})

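Because we want each subject's ranking, we need an extra column.
But a Series holds only one column of values, so we use a DataFrame:
convert first, then add a column to store the rank.

# A general-purpose helper that works for every Series
def have_rank(sr, subject): # takes a Series and the subject name
  new_sr = sr.to_frame() # convert to a DataFrame (an unnamed Series gives a column named 0)
  new_sr = new_sr.rename(columns={0: subject}) # name the column after the subject
  # add a new column, using the built-in rank method to compute the ranking
  new_sr[subject+'_rank'] = new_sr[subject].rank(method='min',ascending=False)
  return new_sr # return the DataFrame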

About the rank tool

pandas.DataFrame.rank - pandas 0.22.0 documentation

pandas.pydata.org

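The method parameter says whether tied entries get the minimum, maximum, or average rank,
and ascending, as the name suggests, controls ascending or descending order.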
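To make the effect of method concrete, here is a minimal sketch with a made-up Series `tied` in which B and C share the same score. With descending order they occupy places 2 and 3, and the three methods resolve that tie differently.

```python
import pandas as pd

tied = pd.Series({'A': 90, 'B': 80, 'C': 80, 'D': 70})

rank_min = tied.rank(method='min', ascending=False)      # tie gets the best place: 2, 2
rank_max = tied.rank(method='max', ascending=False)      # tie gets the worst place: 3, 3
rank_avg = tied.rank(method='average', ascending=False)  # tie gets the mean place: 2.5, 2.5

print(pd.DataFrame({'min': rank_min, 'max': rank_max, 'average': rank_avg}))
```

For a grade table, `method='min'` matches the usual convention where two students tied for second are both "2nd" and the next student is "4th".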

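So the next step is to run the function on each set of grades
and then combine the results into the complete grade table.

Combining is a big topic in its own right;
code first.

# Process the three subjects
math = have_rank(math, '數學')
chinese = have_rank(chinese, '國語')
english = have_rank(english, '英文')
sum_table = pd.concat([math,chinese,english], axis=1) # combine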

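Here we combine with concat. Basically, as long as the index is in good shape,
I don't need to set any extra parameters; concat lines the rows up by index for me automatically.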
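As a sketch of that automatic alignment: in the made-up data below, the second Series lists the students in a different order and the third is missing one student. With `axis=1`, concat still matches rows by index label and fills the gap with NaN.

```python
import pandas as pd

math = pd.Series({'小明': 90, '小華': 80}, name='數學')
chinese = pd.Series({'小華': 76, '小明': 86}, name='國語')   # different order on purpose
english = pd.Series({'小明': 0}, name='英文')                # 小華 is missing on purpose

# Rows are matched by index label, not by position
table = pd.concat([math, chinese, english], axis=1)

print(table)
```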

If you're familiar with SQL, take a look at merge and join.
References below.

Merge, join, and concatenate - pandas 0.23.0 documentation

pandas provides various facilities for easily combining together Series, DataFrame, and Panel objects with various…

pandas.pydata.org

Join And Merge Pandas Dataframe

"Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where…

chrisalbon.com

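[Notes] pandas usage (2): reading/writing files, combining with concat and merge, charts

pandas usage (2) * The source for this post is 莫煩 python: https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/ Reading and writing data files with pandas: first prepare a…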

violin-tao.blogspot.com
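For readers coming from SQL, here is a minimal sketch of merge on a shared column. The two small tables are made up for the example; `how='inner'` keeps only students present in both, while `how='outer'` keeps everyone and fills missing scores with NaN.

```python
import pandas as pd

math = pd.DataFrame({'姓名': ['小明', '小華', '小李'], '數學': [90, 80, 70]})
english = pd.DataFrame({'姓名': ['小明', '小華', '小強'], '英文': [0, 45, 32]})

# Like an SQL INNER JOIN on 姓名
inner = pd.merge(math, english, on='姓名', how='inner')

# Like an SQL FULL OUTER JOIN on 姓名
outer = pd.merge(math, english, on='姓名', how='outer')

print(inner)
print(outer)
```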

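After combining, add up the scores, assign an overall rank, and sort the whole table.

sum_table['總分'] = sum_table['數學']+sum_table['國語']+sum_table['英文']
# total score
sum_table['總排名'] = sum_table['總分'].rank(method='min',ascending=False)
# overall rank, using rank again
sum_table = sum_table.sort_values('總排名')
# sort by overall rank with sort_values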

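That produces the ranked grade table we wanted.

(The terminal output comes out a bit misaligned; below is how to export an Excel file.)

Finally, reading and writing are both simple;
pandas has it all wrapped up for you.

sum_table.to_excel('table.xlsx') # that's it (writing .xlsx needs an Excel writer package such as openpyxl installed)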

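Other fairly common ways to read and write include:

.read_csv
.read_sql
.read_excel
.read_html
(wherever there's a read there's a to; extend the list yourself)
(if you get garbled characters, check the encoding)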
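As a small sketch of the read/to pairing, here is a CSV round trip done entirely in memory. The tiny `sum_table` below is made-up stand-in data, and `io.StringIO` is used only so the example doesn't touch the disk; with a real file you would pass a path instead.

```python
import io
import pandas as pd

# Stand-in data for the example, with the student names as a named index
sum_table = pd.DataFrame({'數學': [90, 80], '國語': [86, 76]},
                         index=pd.Index(['小明', '小華'], name='姓名'))

# With no path given, to_csv returns the CSV text as a string
csv_text = sum_table.to_csv()

# Read it back, restoring the 姓名 column as the index
loaded = pd.read_csv(io.StringIO(csv_text), index_col='姓名')

print(csv_text)
print(loaded)
```

When writing a real file that contains Chinese text, passing an explicit encoding such as `encoding='utf-8-sig'` to `to_csv` is one common way to avoid garbled characters when the file is opened in Excel.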

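And last of all,
one more thing: axis, which shows up a lot in pandas' built-in methods.

About axis in Python

The concept of axis is so easy to mix up and forget that I made a note just for it. First, let me create a table: import pandas as pd import numpy as np a =…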

itselementary993.wordpress.com
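A minimal sketch of axis using `sum` on a made-up two-student table: `axis=0` works down the rows and gives one result per column, while `axis=1` works across the columns and gives one result per row.

```python
import pandas as pd

df = pd.DataFrame({'數學': [90, 80], '國語': [86, 76]}, index=['小明', '小華'])

per_subject = df.sum(axis=0)   # collapse the rows: one total per subject
per_student = df.sum(axis=1)   # collapse the columns: one total per student

print(per_subject)
print(per_student)
```

The same reading applies to the concat call earlier: `axis=1` there means the pieces are placed side by side as columns.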

OK, that's it for a first taste of pandas.
Bye-bye
