Code&Data Insights
[Pandas] Pandas DataFrame | Series | Index | Basic APIs
paka_corn 2023. 6. 2. 08:25
Pandas : a Python library for working with data sets.
-> Pandas provides functions for analyzing, cleaning, exploring, and manipulating data.
[ DataFrame ]
DataFrame : a Pandas DataFrame is a two-dimensional data structure, like a 2D array or a table with rows and columns in a relational database (RDB, SQL).
[ Series ]
Series : a Series is a one-dimensional array holding data of any type, like a single column in a table.
[ Index ]
Index : the set of labels that identifies the rows.
- Pandas creates an index by default (starting from 0)
How to extract the index? -> DataFrame.index | Series.index
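As a minimal sketch of how DataFrame, Series, and Index relate to each other (the column names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cho"], "age": [31, 25, 42]})

# Selecting one column of a DataFrame gives a Series.
ages = df["age"]

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series

# Both objects carry the same default index: 0, 1, 2.
print(list(df.index))       # [0, 1, 2]
print(list(ages.index))     # [0, 1, 2]

# A custom set of labels can be supplied instead of the default.
labeled = pd.Series([31, 25, 42], index=["a", "b", "c"])
print(list(labeled.index))  # ['a', 'b', 'c']
```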
[ Basic APIs ]
read_csv() : Load the CSV into a DataFrame
-----------------------------------------------------------------
import pandas as pd
df = pd.read_csv('data.csv')  # 'data.csv' is the path to the CSV file
-----------------------------------------------------------------
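To try read_csv() without a file on disk, one option is to feed it an in-memory text buffer. This is only a sketch: the CSV content is invented, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,age\nAnn,31\nBob,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# The first line of the CSV becomes the column names.
print(df_csv.shape)          # (2, 2)
print(list(df_csv.columns))  # ['name', 'age']
```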
Convert DataFrame to ndarray
-----------------------------------------------------------------
import pandas as pd
df = pd.read_csv('data.csv')
arr = df.to_numpy()  # df.values also works; it is a property, so no parentheses
-----------------------------------------------------------------
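A small self-contained check of the conversion, assuming a made-up numeric DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df_num = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

arr = df_num.to_numpy()  # method call
same = df_num.values     # property access, no parentheses

print(type(arr).__name__)  # ndarray
print(arr.shape)           # (2, 2)
print(arr.tolist())        # [[1, 3], [2, 4]]
```

Each row of the DataFrame becomes one row of the ndarray; the index and column names are not carried over.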
drop()
- row : axis = 0 | column : axis = 1
- inplace = False (default) => the original DataFrame is kept and drop() returns a new DataFrame, which you assign to a variable
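The axis and inplace options above can be sketched as follows (the DataFrame is invented for illustration):

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df_drop = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis=0 drops rows by index label.
no_first_row = df_drop.drop(0, axis=0)
print(list(no_first_row.index))  # [1, 2]

# axis=1 drops columns by name.
no_b = df_drop.drop("b", axis=1)
print(list(no_b.columns))        # ['a', 'c']

# With the default inplace=False the original is unchanged.
print(df_drop.shape)             # (3, 3)

# inplace=True modifies the DataFrame it is called on.
working = df_drop.copy()
working.drop("c", axis=1, inplace=True)
print(list(working.columns))     # ['a', 'b']
```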
loc[] - indexing by row and column label
-----------------------------------------------------------------
import pandas as pd
df.loc[row, column]  # row and column are labels
-----------------------------------------------------------------
iloc[] - indexing by integer position
-----------------------------------------------------------------
import pandas as pd
df.iloc[row, column]
-----------------------------------------------------------------
* iloc does not accept a boolean Series as a mask! Use loc for boolean indexing, or pass iloc a plain boolean array.
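A short comparison of loc and iloc on the same data, as a sketch with invented row labels and values:

```python
import pandas as pd

# Hypothetical data with explicit row labels.
df_sel = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cho"], "age": [31, 25, 42]},
    index=["r1", "r2", "r3"],
)

# loc: row label, column label.
print(df_sel.loc["r2", "age"])  # 25

# iloc: row position, column position (both start at 0).
print(df_sel.iloc[1, 1])        # 25

# Boolean indexing with loc takes a boolean Series directly.
mask = df_sel["age"] > 30
older = df_sel.loc[mask, "name"]
print(list(older))              # ['Ann', 'Cho']

# With iloc, convert the boolean Series to a plain array first.
older_pos = df_sel.iloc[mask.to_numpy(), 0]
print(list(older_pos))          # ['Ann', 'Cho']
```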
groupby()
: the groupby method returns a DataFrameGroupBy object
-------------------------------------------------------------------------------------------------
import pandas as pd
dataframe.groupby(by, axis, level, as_index, sort, group_keys, observed, dropna)
-------------------------------------------------------------------------------------------------
-> To get a DataFrame back, apply an aggregation method (sum, mean, max, min, ...)
==> DataFrameGroupBy -> DataFrame
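The DataFrameGroupBy -> DataFrame step can be sketched like this (the team and score columns are hypothetical):

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df_g = pd.DataFrame(
    {"team": ["x", "y", "x", "y"], "score": [10, 20, 30, 40]}
)

# groupby alone does not compute anything yet.
grouped = df_g.groupby("team")
print(type(grouped).__name__)    # DataFrameGroupBy

# An aggregation turns the grouped object into a DataFrame.
totals = grouped.sum()
print(type(totals).__name__)     # DataFrame
print(totals.loc["x", "score"])  # 40
print(totals.loc["y", "score"])  # 60

# as_index=False keeps the group key as a regular column.
flat = df_g.groupby("team", as_index=False).mean()
print(list(flat.columns))        # ['team', 'score']
```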
[ Processing for Missing Data ]
isna()
: it returns a DataFrame of the same shape in which every value is replaced with a Boolean: True for missing values (NaN, None) and False otherwise.
- NaN : returns True
-----------------------------------------------------------------
import pandas as pd
df.isna()
-----------------------------------------------------------------
=> To count how many NaN values each column of a DataFrame has
DataFrame.isna().sum()
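A minimal sketch of isna() and the counting idiom, assuming a small made-up DataFrame with a few missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries.
df_na = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None]})

# Same shape as df_na, True where a value is missing.
na_mask = df_na.isna()
print(na_mask["a"].tolist())  # [False, True, False]
print(na_mask["b"].tolist())  # [False, True, True]

# sum() counts the True values per column.
per_column = df_na.isna().sum()
print(int(per_column["a"]), int(per_column["b"]))  # 1 2

# Summing once more gives the total for the whole DataFrame.
total = int(df_na.isna().sum().sum())
print(total)  # 3
```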
fillna()
: it replaces missing (NaN) values with a specified value.
-----------------------------------------------------------------
import pandas as pd
df.fillna(value, method, axis, inplace, limit, downcast)
-----------------------------------------------------------------
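As a sketch of the most common use, filling with the value argument only (the data is invented, and the choice of 0, -1.0, and the column mean as fill values is just an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing entries.
df_fill = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# One value for every missing cell.
filled_zero = df_fill.fillna(0)
print(filled_zero["a"].tolist())    # [1.0, 0.0, 3.0]
print(filled_zero["b"].tolist())    # [0.0, 5.0, 0.0]

# A dict gives a different fill value per column.
filled_by_col = df_fill.fillna({"a": -1.0, "b": df_fill["b"].mean()})
print(filled_by_col["a"].tolist())  # [1.0, -1.0, 3.0]
print(filled_by_col["b"].tolist())  # [5.0, 5.0, 5.0]

# Without inplace=True the original still has its missing values.
print(int(df_fill.isna().sum().sum()))  # 3
```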
https://www.w3schools.com/python/pandas/pandas_ref_dataframe.asp