
Python (48 posts)

[Python] DataFrame: replacing specific characters with replace()

- Replacing and removing characters
Remove a comma: df = df.replace(',', '')
Change , to !: df = df.replace(',', '!')
Without regex, replace() only matches cells whose entire value is exactly ','.
Changing a character in the middle of a value: if the character you want to change sits between other characters (e.g. 123,125), the change only happens when the regex option is set to True.
Change , to * (with the regex option): df = df.replace(',', '*', regex = True)
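The two modes are easiest to compare side by side. The sketch below is illustrative, not from the original post. It uses a small made-up DataFrame with a hypothetical amount column. Without regex, only a cell that is exactly "," changes. With regex=True, the comma inside "123,125" is replaced as well.

```python
import pandas as pd

# Hypothetical data: one cell is exactly ",", another has a comma inside a number string
df = pd.DataFrame({"amount": ["123,125", ",", "7"]})

# regex=False (the default): only cells whose whole value equals "," are replaced
exact = df.replace(",", "")

# regex=True: the pattern is searched inside each string value
partial = df.replace(",", "", regex=True)

# Optional follow-up: turn the cleaned strings into numbers ("" becomes NaN)
numbers = pd.to_numeric(partial["amount"], errors="coerce")

print(exact["amount"].tolist())    # ['123,125', '', '7']
print(partial["amount"].tolist())  # ['123125', '', '7']
```

With regex=True, the character to replace is treated as a regular expression. A metacharacter such as . or * therefore has to be escaped first, for example with re.escape('.'). The ',' used here needs no escaping.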

Code/Python 2022.11.08

[Python] Creating global variables in bulk in a for loop, globals()

globals() returns the current module's global namespace as a dictionary, so assigning to it inside a for loop creates global variables in bulk. I mainly use it to build DataFrames made up of variables (columns) that start with a particular letter. First, load the titanic data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, StratifiedKFold
import pandas as pd
# data load
df = pd.read_csv('train.csv')
df.head()
Now, set the global variables. I use data made up of variables starting with A, P, and C..
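Here is a minimal sketch of the idea. It uses a small hypothetical DataFrame whose column names only mimic the Titanic data, so it runs without train.csv. The df_A / df_P / df_C names are illustrative, not taken from the original post. Each loop iteration writes a new key into the dict returned by globals(), which creates a module-level variable with that name.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (column names only mimic the Titanic data)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
    "Parch": [0, 0, 0],
    "Cabin": [None, "C85", None],
    "Fare": [7.25, 71.2833, 7.925],
})

# For each prefix, create a global variable df_<prefix> that holds
# only the columns whose names start with that prefix.
for prefix in ["A", "P", "C"]:
    cols = [c for c in df.columns if c.startswith(prefix)]
    globals()[f"df_{prefix}"] = df[cols]

print(df_P.columns.tolist())  # ['PassengerId', 'Pclass', 'Parch']
```

This works at module level, for example in a script or a notebook cell. A plain dict avoids creating variables dynamically and is usually easier to debug, for example frames = {p: df.loc[:, df.columns.str.startswith(p)] for p in ['A', 'P', 'C']}.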

Code/Python 2022.11.04

[Python] List union, intersection, difference, and symmetric difference

๋ฐ์ดํ„ฐ ์›์†Œ li1 = ['A', 'B', 'C', 'D'] li2 = ['C', 'D', 'E', 'F'] 1) ํ•ฉ์ง‘ํ•ฉ union = list(set(li1 + li2)) print(union) union1 = list(set(li1) | set(li2)) print(union1) union2 = list(set().union(li1, li2)) print(union2) 2) ๊ต์ง‘ํ•ฉ inter = list(set(li1) & set(li2)) print(inter) inter1= list(set(li1).intersection(li2)) print(inter1) 3) ์ฐจ์ง‘ํ•ฉ comp = list(set(li1) - set(li2)) print(comp) comp1 = list(set(li1).diffe..

Code/Python 2022.11.03

[Python] Sorting a DataFrame: sort_values() / sort_index(), multi-column sorting

Let's look at how to sort a DataFrame by its data.
- sort_values()
Load the titanic data.
import pandas as pd
# data load
df = pd.read_csv('train.csv')
Sort the data by Pclass. The by option sets the column to sort on.
df.sort_values(by = 'Pclass')
The data is now sorted by Pclass in ascending order. sort_values() returns a new DataFrame, so assign the result (e.g. df = df.sort_values(by = 'Pclass')) if you want to keep it.
Next, sort by Pclass in descending order by setting ascending = False.
df.sort_values(by = 'Pclass', ascending = False)
- Multi-column sorting
With the by option you can also sort by multiple columns..
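The sketch below shows multi-column sorting and sort_index(). It uses a tiny hypothetical sample with made-up Fare values and index labels instead of the real train.csv. Passing a list to by sorts on several columns in priority order. ascending can take a matching list to set the direction per column.

```python
import pandas as pd

# Hypothetical Titanic-like sample (not the real train.csv)
df = pd.DataFrame(
    {"Pclass": [3, 1, 3, 2], "Fare": [7.25, 71.28, 8.05, 13.00]},
    index=[10, 11, 12, 13],
)

# Sort by Pclass ascending, then by Fare descending within each class
multi = df.sort_values(by=["Pclass", "Fare"], ascending=[True, False])
print(multi.index.tolist())  # [11, 13, 12, 10]

# sort_index() orders rows by their index labels instead of by column values
by_index = multi.sort_index(ascending=False)
print(by_index.index.tolist())  # [13, 12, 11, 10]
```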

Code/Python 2022.11.02

[Python] Saving and loading numpy arrays, fixing a ValueError

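Save one array in binary (.npy) form
import numpy as np
li = ['a', 'b', 'c', 'd']
np.save('filename.npy', li)
Load the saved array
li = np.load('filename.npy')
If you get ValueError: Object arrays cannot be loaded when allow_pickle=False, adding the allow_pickle = True option fixes it.
li = np.load('filename.npy', allow_pickle = True)
The error only occurs for arrays with dtype=object (for example mixed types or arbitrary Python objects), which np.save stores with pickle. A plain list of strings like li above is saved as a Unicode string array and loads without the option. Only pass allow_pickle = True for files you trust, because unpickling can execute arbitrary code.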
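The sketch below shows when the error actually appears. It uses a temporary directory and hypothetical file names. A list of plain strings loads normally, while an array with dtype=object needs allow_pickle=True.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # A list of plain strings is stored as a fixed-width Unicode array (no pickle needed)
    str_path = os.path.join(tmp, "strings.npy")
    np.save(str_path, ['a', 'b', 'c', 'd'])
    strings = np.load(str_path)

    # An object array is stored via pickle, so the default np.load refuses it
    obj_path = os.path.join(tmp, "objects.npy")
    np.save(obj_path, np.array(['a', 1, None], dtype=object))
    try:
        np.load(obj_path)
        needed_pickle = False
    except ValueError:
        needed_pickle = True

    # Loading succeeds once pickle is explicitly allowed (trusted files only)
    objects = np.load(obj_path, allow_pickle=True)

print(strings.dtype, needed_pickle, objects.tolist())
```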

Code/Python 2022.11.01

[Python] Checking for duplicates (finding rows and columns with identical data)

- Finding identical rows
For rows, you can check for duplicated rows with the code below.
df[df.duplicated()]
*The keep option can be set to 'first', 'last', or False (the boolean False, not the string 'False').
first (default): marks every duplicate except the first occurrence, so the first row of each group is not shown.
last: marks every duplicate except the last occurrence.
False: marks every duplicated row, so all of them are shown.
- Finding identical columns
There is no dedicated function for finding columns with identical values. Instead, transpose the DataFrame and then use DataFrame.duplicated().
import pandas as pd
import numpy as np
# To transpose, use .T or np.transpose()..
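The sketch below runs both checks on a small hypothetical DataFrame in which a_copy is deliberately a copy of column a. It shows which rows each keep option selects. It also shows how transposing with .T lets duplicated() find repeated columns.

```python
import pandas as pd

# Hypothetical data: rows 0, 1 and 3 are identical; "a_copy" repeats column "a"
df = pd.DataFrame({
    "a": [1, 1, 2, 1],
    "b": [5, 5, 6, 5],
    "a_copy": [1, 1, 2, 1],
})

# Rows: duplicated() returns a boolean mask over the rows
first = df[df.duplicated()]            # keep='first' (default): rows 1, 3
last = df[df.duplicated(keep="last")]  # every repeat except the last: rows 0, 1
every = df[df.duplicated(keep=False)]  # every row that has a duplicate: rows 0, 1, 3

# Columns: transpose so each column becomes a row, then reuse duplicated()
col_mask = df.T.duplicated().to_numpy()
dup_cols = df.columns[col_mask].tolist()  # ['a_copy']
deduped = df.loc[:, ~col_mask]            # keeps "a" and "b" only
```

Transposing a large DataFrame, especially one with mixed dtypes, produces an object-dtype copy. This approach can therefore be slow or memory-hungry on big tables.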

Code/Python 2022.10.31

[Python] Filtering columns that contain a specific string

While deciding what my first Python post should be, I chose the code I find most useful. It is handy when you work with large amounts of data and many variables. It also helps when you have created derived variables and can no longer take in all the columns at once.
First, load the titanic data.
import pandas as pd
df = pd.read_csv('train.csv')
df.head()
To check the Titanic columns, write the following.
df.columns
This returns every column name. To see only specific column names, write it as below; here we look for column names that contain 'class'. str.contains() is case-sensitive by default, so this matches Pclass.
df.columns[df.columns.str.contains('class')]
..
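The sketch below shows the same filter plus two alternatives. It uses an empty hypothetical DataFrame that only carries Titanic-style column names. str.contains() accepts case=False for case-insensitive matching. DataFrame.filter() with like or regex returns the matching columns as a DataFrame rather than just their names.

```python
import pandas as pd

# Hypothetical frame: only the column names matter here
df = pd.DataFrame(columns=["PassengerId", "Survived", "Pclass", "Name",
                           "Sex", "Age", "Fare", "Embarked"])

# Boolean mask over the column names (case-sensitive by default)
cols = df.columns[df.columns.str.contains("class")]

# Case-insensitive match: "p" also matches the capital P
cols_ci = df.columns[df.columns.str.contains("p", case=False)]

# DataFrame.filter returns a DataFrame restricted to the matching columns
sub = df.filter(like="class")
starts_p = df.filter(regex="^P")

print(cols.tolist(), cols_ci.tolist())  # ['Pclass'] ['PassengerId', 'Pclass']
```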

Code/Python 2022.10.31