[Pandas] Pandas DataFrame | Series | Index | Basic APIs

paka_corn 2023. 6. 2. 08:25

Pandas : a Python library used for working with data sets.

-> Pandas has functions for analyzing, cleaning, exploring, and manipulating data.

[ DataFrame ]

DataFrame : a Pandas DataFrame is a two-dimensional, labeled data structure, like a 2D array or a table with rows and columns in a relational database (RDB, SQL).

[ Series ]

Series : a one-dimensional labeled array holding data of any type, like a single column in a table.

[ Index ]

Index : the set of labels that identifies the rows of a DataFrame or the elements of a Series.

- Pandas creates a default index when none is specified (it starts from 0)

How to extract the index? -> DataFrame.index | Series.index
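The three objects above can be seen together in a small sketch. The column names and values here are hypothetical sample data, not taken from the post.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"name": ["Ann", "Bob", "Cho"], "age": [31, 25, 40]})

# Selecting one column of a DataFrame gives a Series
ages = df["age"]
print(type(ages).__name__)   # Series

# Both objects expose their row labels through .index
print(list(df.index))        # [0, 1, 2]
print(list(ages.index))      # [0, 1, 2]

# Custom labels can replace the default 0-based index
labeled = df.set_index("name")
print(list(labeled.index))   # ['Ann', 'Bob', 'Cho']
```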

[ Basic APIs ]

read_csv() : Load the CSV into a DataFrame

-----------------------------------------------------------------

import pandas as pd

df = pd.read_csv('data.csv')

-----------------------------------------------------------------
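For trying read_csv() without a file on disk, a rough sketch is to pass an in-memory text buffer instead of a path. The CSV contents below are made up for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a file such as 'data.csv' (hypothetical contents)
csv_text = "name,age\nAnn,31\nBob,25\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'age']
```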

Convert DataFrame to ndarray

-----------------------------------------------------------------

import pandas as pd

df.values  # an attribute, not a method (df.to_numpy() is the recommended equivalent)

-----------------------------------------------------------------
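A minimal sketch of the conversion, assuming a small all-numeric DataFrame (hypothetical data). Both forms give a NumPy ndarray.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

arr = df.to_numpy()   # method form
same = df.values      # attribute form, written without parentheses

print(type(arr).__name__)  # ndarray
print(arr.shape)           # (2, 2)
```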

drop()

- row : axis = 0 | column : axis = 1

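- inplace = False (default) => keeps the original DataFrame unchanged and returns a new DataFrame, which has to be assigned to a variable | inplace = True => modifies the original DataFrame and returns None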
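The axis and inplace options of drop() could look like this in practice. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with three columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis=0 drops a row by its index label
no_first_row = df.drop(0, axis=0)

# axis=1 drops a column by name; df itself is unchanged here
no_b = df.drop("b", axis=1)
print(list(df.columns))    # ['a', 'b', 'c']

# inplace=True changes df directly and returns None
result = df.drop("c", axis=1, inplace=True)
print(result)              # None
print(list(df.columns))    # ['a', 'b']
```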

loc[] - indexing by label (row index labels and column names)

-----------------------------------------------------------------

import pandas as pd

df.loc[row, column]

-----------------------------------------------------------------

iloc[] - indexing by integer position (0-based)

-----------------------------------------------------------------

import pandas as pd

df.iloc[row, column]

-----------------------------------------------------------------

* iloc does not accept a boolean Series as a mask! Use loc for boolean indexing, or pass a plain NumPy boolean array to iloc.
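The difference between label-based and position-based indexing is easier to see with a non-numeric index. This is a sketch with hypothetical labels and values.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {"age": [31, 25, 40], "city": ["Seoul", "Busan", "Daegu"]},
    index=["ann", "bob", "cho"],
)

# loc uses labels, iloc uses integer positions
by_label = df.loc["bob", "city"]
by_position = df.iloc[1, 1]
print(by_label, by_position)   # Busan Busan

# Boolean indexing: loc takes the boolean Series directly
mask = df["age"] > 30
older = df.loc[mask]

# With iloc, convert the mask to a NumPy array first
older_iloc = df.iloc[mask.to_numpy()]
print(list(older.index))       # ['ann', 'cho']
```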

Groupby

: the groupby method returns a DataFrameGroupBy object

-------------------------------------------------------------------------------------------------

import pandas as pd

dataframe.groupby(by, axis, level, as_index, sort, group_keys, observed, dropna)

-------------------------------------------------------------------------------------------------

-> To get a DataFrame back, apply an aggregation method (sum, mean, max, min, ...) to the DataFrameGroupBy object

==> DataFrameGroupBy -> DataFrame
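A small sketch of the DataFrameGroupBy -> DataFrame step, using hypothetical team and score columns.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"team": ["A", "A", "B"], "score": [10, 20, 5]})

# groupby alone only builds the groups
grouped = df.groupby("team")
print(type(grouped).__name__)    # DataFrameGroupBy

# An aggregation turns the groups into a DataFrame indexed by team
totals = grouped.sum()
print(type(totals).__name__)     # DataFrame
print(totals.loc["A", "score"])  # 30
```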

[ Processing for Missing Data ]

isna()

: it returns a DataFrame of the same shape in which every value is replaced with a Boolean: True for NA (missing) values such as NaN or None, and False otherwise.

- NaN : returns True

-----------------------------------------------------------------

import pandas as pd

df.isna()

-----------------------------------------------------------------

=> To count how many NaN values each column of a DataFrame has

DataFrame.isna().sum()
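As a sketch with hypothetical data: isna().sum() counts missing values per column, and a second sum() gives the total for the whole DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data with three missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# True counts as 1 when summed, so this is the NaN count per column
per_column = df.isna().sum()
print(per_column["a"], per_column["b"])  # 1 2

# Summing again gives the total for the whole DataFrame
total = int(df.isna().sum().sum())
print(total)                             # 3
```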

fillna()

: it replaces NULL (NaN) values with a specified value.

-----------------------------------------------------------------

import pandas as pd

df.fillna(value, method, axis, inplace, limit, downcast)

-----------------------------------------------------------------
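Two common ways of passing the value argument, sketched with hypothetical data: one scalar for every column, or a dict with a separate fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# One scalar for every missing value
filled_zero = df.fillna(0)

# A dict gives each column its own fill value (here: the mean of a, -1 for b)
filled_per_col = df.fillna({"a": df["a"].mean(), "b": -1})
print(filled_per_col["a"].tolist())  # [1.0, 2.0, 3.0]
print(filled_per_col["b"].tolist())  # [-1.0, 5.0, -1.0]

# Without inplace=True the original still has its missing values
print(int(df.isna().sum().sum()))    # 3
```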

https://www.w3schools.com/python/pandas/pandas_ref_dataframe.asp

Pandas - DataFrame Reference
