Nacho

Chapter 07. Python Pandas Basics (1)

Python


Nacho_13 2024. 2. 23. 20:41

Alright, let's review pandas.

# Import the library
import pandas as pd  # no questions asked, just import it

 

 

Creating a DataFrame

1. From a dictionary

# Build a dictionary: each key becomes a column name, each list becomes that column's values
dict1 = {'Name': ['Gildong', 'Sarang', 'Jiemae', 'Yeoin'],
        'Level': ['Gold', 'Bronze', 'Silver', 'Gold'],
        'Score': [56000, 23000, 44000, 52000]}

df = pd.DataFrame(dict1)

Output:

      Name   Level  Score
0  Gildong    Gold  56000
1   Sarang  Bronze  23000
2   Jiemae  Silver  44000
3    Yeoin    Gold  52000
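As a side-by-side check, here is a small sketch (my own example, not part of the original lesson) showing that the same table can also be built from a list of row dictionaries, where each dict becomes one row. The name `rows` is hypothetical and only used for this comparison.

```python
import pandas as pd

# Hypothetical row-oriented version of the same data: one dict per row
rows = [
    {"Name": "Gildong", "Level": "Gold", "Score": 56000},
    {"Name": "Sarang", "Level": "Bronze", "Score": 23000},
    {"Name": "Jiemae", "Level": "Silver", "Score": 44000},
    {"Name": "Yeoin", "Level": "Gold", "Score": 52000},
]
df_rows = pd.DataFrame(rows)

# Column-oriented dict, as in the text above
dict1 = {"Name": ["Gildong", "Sarang", "Jiemae", "Yeoin"],
         "Level": ["Gold", "Bronze", "Silver", "Gold"],
         "Score": [56000, 23000, 44000, 52000]}
df_cols = pd.DataFrame(dict1)

print(df_rows.equals(df_cols))  # True: same values, dtypes, columns and index
print(df_rows.shape)            # (4, 3): 4 rows, 3 columns
```

The column-oriented dict is usually handier when you already have whole columns; the list of dicts fits data that arrives record by record.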

 

 

 

2. Reading a CSV file

# Read the data
path = 'https://raw.githubusercontent.com/DA4BAM/dataset/master/titanic_simple.csv'
df = pd.read_csv(path)  

# Show only the first 5 rows
df.head()

Output:

  PassengerId Survived Pclass Name Sex Age Fare Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22 7.25 Southampton
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 71.2833 Cherbourg
2 3 1 3 Heikkinen, Miss. Laina female 26 7.925 Southampton
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 53.1 Southampton
4 5 0 3 Allen, Mr. William Henry male 35 8.05 Southampton
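To try `read_csv` without network access, a minimal sketch is to hand it an in-memory file through `io.StringIO`. The CSV text below is a few hand-typed rows in the Titanic column layout (some columns left out), not the real file.

```python
import io
import pandas as pd

# Small hand-typed sample; the empty Age field on the last row becomes NaN
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.2833
3,1,3,female,26,7.925
4,1,1,female,35,53.1
5,0,3,male,35,8.05
6,0,3,male,,8.4583
"""

# read_csv accepts a path, a URL, or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first 5 rows (default n=5)
print(df.head(2))  # pass n to change how many rows are shown
print(len(df))     # 6 rows in total
```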

 

 

 

Inspecting DataFrame attributes

 

pd.DataFrame.info()

# Check each column's data type (Dtype) and non-null count
df.info()
'''
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   Fare         891 non-null    float64
 7   Embarked     889 non-null    object 
dtypes: float64(2), int64(3), object(3)
memory usage: 55.8+ KB
'''
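Note that `info` needs parentheses: without them Python only shows the bound method object and no summary is printed. The sketch below, on a tiny hypothetical frame, contrasts the two and shows how the Non-Null Count and Dtype columns relate to `notna().sum()` and `dtypes`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Embarked": ["Southampton", "Cherbourg", "Southampton", None],
})

print(df.info)   # just the method object, no summary
df.info()        # prints the summary and returns None

# Same numbers as the "Non-Null Count" column: Age 3, Embarked 3
print(df.notna().sum())
# Missing values per column (row count minus non-null count)
print(df.isna().sum())
# The "Dtype" column on its own
print(df.dtypes)
```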

 

pd.DataFrame.columns

# Check the column names
df.columns
'''
Output:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'Fare',
       'Embarked'],
      dtype='object')
'''
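`columns` is an attribute, not a method, so it takes no parentheses. Here is a short sketch, on a hypothetical three-column frame, of a few things you can do with the `Index` it returns.

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1], "Fare": [7.25, 71.2833]})

cols = df.columns                 # a pandas Index, not a plain list
print(isinstance(cols, pd.Index)) # True
print(cols.tolist())              # ['PassengerId', 'Survived', 'Fare']
print("Fare" in cols)             # True: membership test on column labels

# An Index can be sliced and used to select columns
subset = df[cols[1:]]             # every column except the first

# Assigning a new list renames all columns at once (length must match)
df.columns = ["id", "survived", "fare"]
print(df.columns.tolist())
```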

 

pd.DataFrame.describe()

# Check basic summary statistics
df.describe()

Output:

       PassengerId  Survived    Pclass      Age     Fare
count          891       891       891      714      891
mean           446  0.383838   2.30864  29.6991  32.2042
std        257.354  0.486592  0.836071  14.5265  49.6934
min              1         0         1     0.42        0
25%          223.5         0         2   20.125   7.9104
50%            446         0         3       28  14.4542
75%          668.5         1         3       38       31
max            891         1         3       80  512.329
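The output above hints at two things about `describe()`: only numeric columns are summarized by default (Name, Sex and Embarked do not appear), and `count` skips missing values (Age shows 714, not 891). The sketch below checks both on a hypothetical five-row frame. For a text column, calling `describe()` on that single Series gives count, unique, top and freq instead.

```python
import pandas as pd

# Hypothetical data: one missing Age, one text column
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, None],
    "Sex": ["male", "female", "female", "female", "male"],
})

num_summary = df.describe()           # 'Sex' is left out: not numeric
print(num_summary)

print(num_summary.loc["count", "Age"])                      # 4.0: the missing value is not counted
print(num_summary.loc["50%", "Age"] == df["Age"].median())  # True: the 50% row is the median

# Summary of a text column: count, unique, top (most frequent value), freq
print(df["Sex"].describe())
```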

 

pd.DataFrame.describe().T

# Check basic summary statistics, transposed (one row per column)
df.describe().T
Output:
             count      mean       std   min     25%      50%    75%      max
PassengerId    891       446   257.354     1   223.5      446  668.5      891
Survived       891  0.383838  0.486592     0       0        0      1        1
Pclass         891   2.30864  0.836071     1       2        3      3        3
Age            714   29.6991   14.5265  0.42  20.125       28     38       80
Fare           891   32.2042   49.6934     0  7.9104  14.4542     31  512.329
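`.T` is simply the transpose, so rows and columns swap places. A quick shape check on a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.2833, 7.925]})

summary = df.describe()      # rows: statistics, columns: variables
summary_t = df.describe().T  # rows: variables, columns: statistics

print(summary.shape)    # (8, 2): count, mean, std, min, 25%, 50%, 75%, max
print(summary_t.shape)  # (2, 8)

# With the transposed table, one statistic for every variable is a single column
print(summary_t["mean"])
```

The transposed form tends to be easier to read when a frame has many columns, since the table grows downward instead of sideways.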

 

 

pd.DataFrame.value_counts()

# Count how often each unique (Embarked, Pclass) combination occurs
df[['Embarked','Pclass']].value_counts()
'''
Output:
Embarked     Pclass
Southampton  3         353
             2         164
             1         127
Cherbourg    1          85
Queenstown   3          72
Cherbourg    3          66
             2          17
Queenstown   2           3
             1           2
dtype: int64
'''
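`value_counts()` also works on a single column (a Series). The counts above add up to 889 rather than 891 because the 2 missing Embarked values are dropped by default. The sketch below, using made-up Embarked values, shows `dropna=False` to keep missing values as their own group and `normalize=True` to get proportions instead of counts.

```python
import pandas as pd

# Made-up sample with one missing Embarked value
df = pd.DataFrame({
    "Embarked": ["Southampton", "Cherbourg", "Southampton", "Queenstown", None, "Southampton"],
    "Pclass": [3, 1, 3, 3, 1, 2],
})

# Counts per unique value, sorted from most to least frequent; None is dropped
counts = df["Embarked"].value_counts()
print(counts)

# Keep missing values as a separate group
print(df["Embarked"].value_counts(dropna=False))

# Proportions instead of raw counts (Southampton: 3 of 5 non-missing = 0.6)
print(df["Embarked"].value_counts(normalize=True))
```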

 

pd.DataFrame.sort_values()

# Sort by Fare, highest first, and show the top 10 rows
df.sort_values(by='Fare',ascending=False).head(10)

 

ascending=False: descending order; ascending=True: ascending order (the default)
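`by` and `ascending` also accept lists, so you can sort by several columns with a different direction for each. A sketch with hypothetical passenger names and fares:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],   # hypothetical names
    "Pclass": [3, 1, 3, 1],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

# One key, descending: largest Fare first
print(df.sort_values(by="Fare", ascending=False))

# Two keys: Pclass ascending, then Fare descending within each class
sorted_df = df.sort_values(by=["Pclass", "Fare"], ascending=[True, False])
print(sorted_df)

# sort_values returns a new DataFrame; df keeps its original order
print(df["Name"].tolist())  # ['A', 'B', 'C', 'D']
```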
