[Mathematics of Data Management ] study notes | basic concepts related to data analytics

Code&Data Insights

paka_corn 2023. 6. 13. 03:14

[ Type of Data ]

Quantitative data may be either discrete or continuous

[ Sampling Methods ]

1) Random Sampling : the sample is chosen by chance, without any pattern, so the selections are unrelated to one another

2) Simple Random Sampling : every member of the population is equally likely to be selected, for example drawing one name from a hat where each name has the same chance of being picked

3) Systematic Random Sampling : the sample is selected using a fixed pattern, for example choosing a random starting point and then taking every k-th member of the population

4) Stratified Random Sampling : the population is divided into groups (strata) based on gender, age, or any other characteristic, and members are then randomly chosen from each stratum

Stratify : to divide into layers (strata)

5) Cluster Random Sampling : the population is first organized into groups (clusters). Some groups are then randomly chosen, and all the members of the chosen groups are surveyed.

6) Multi-stage Random Sampling : groups are randomly chosen from the population, subgroups are randomly chosen from these groups, and individuals are then randomly chosen from these subgroups to be surveyed

7) Destructive Sampling : situations where the sample is damaged or destroyed in order to extract information ( mostly the sample would not be human! )
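Several of the sampling methods above can be sketched in a few lines of Python. This is only an illustration using the standard `random` module; the function names, the `seed` parameter, and the rule of taking a fixed fraction from each stratum are assumptions made for this example, not part of the course notes.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Pick k members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def systematic_sample(population, step, seed=None):
    """Pick a random starting point, then take every step-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(population, key, fraction, seed=None):
    """Split the population into strata with key(), then sample each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        # Assumption: take the same fraction of every stratum (at least one member).
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and survey every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]
```

For example, `stratified_sample(people, key=lambda p: p[0], fraction=0.1)` keeps each group's share of the sample close to its share of the population, while `cluster_sample` surveys everyone in the chosen groups and nobody else.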

[ Types of Bias ]

1) Sampling Bias : the chosen sample does not accurately represent the population; this is the most common form of bias

(ex) people outside McDonald's are surveyed about their opinions of fast food

2) Non-Response Bias : occurs when data is not collected from all of the potential respondents, so the people who reply may differ from those who do not

(ex) people do not return mail-in surveys

3) Household Bias : occurs when respondents are over- or under-represented because groups of different sizes are polled equally

(ex) one person is surveyed from each household, so people living in large households are under-represented

4) Response Bias : occurs when the design of the survey leads respondents to give answers that do not reflect their true opinions

(ex) questions are poorly worded, or the interviewer leads the answers

[ Central Tendency ]

Weighted Mean : when the data is organized in a frequency table, the mean can still be calculated by using the frequency as a weighting factor: multiply each value by its frequency, add the products, and divide by the total frequency
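As a minimal sketch of the weighted mean described above, the function below takes the values and frequencies from a frequency table. The name `weighted_mean` and the sample scores are made up for illustration.

```python
def weighted_mean(values, frequencies):
    """Mean of a frequency table: sum(value * frequency) / sum(frequency)."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be greater than zero")
    return sum(v * f for v, f in zip(values, frequencies)) / total


# Hypothetical frequency table: test scores and how many students got each one.
scores = [60, 70, 80, 90]
counts = [2, 5, 8, 5]
print(weighted_mean(scores, counts))  # 1560 / 20 = 78.0
```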

[ Measures of Spread ]

Variability : how far apart data points lie from each other and from the center of a distribution.

Along with measures of central tendency, measures of variability give you descriptive statistics that summarize your data. Variability is also referred to as spread, scatter or dispersion.

Interquartile Range(IQR) : the quartiles (Q1, Q2, Q3) divide the ordered data into four equal groups; the IQR is the range of the middle half of the data, IQR = Q3 - Q1

https://www.youtube.com/watch?v=esskJJF8pCc
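The quartiles and the IQR can be sketched as below. Note that there are several conventions for computing quartiles; this example assumes the "median of each half" method (the overall median is left out of both halves when the number of data points is odd), so spreadsheets or libraries that interpolate may give slightly different values. The function names and data are illustrative only.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


def iqr(data):
    """Interquartile range: Q3 - Q1, the spread of the middle half of the data."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
print(quartiles(data))  # (3.5, 7, 11.0)
print(iqr(data))        # 7.5
```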

Standard Deviation : a valuable tool to measure the spread of data ( How far? or How close? )

=> a measure that indicates how much the data scatter around the mean

- deviation : the distance between a data point and the mean ( data value - mean )

- variance : a measure of spread calculated by averaging the squared deviations; the standard deviation is the square root of the variance

** Calculating standard deviation using EXCEL : STDEV.S(first_cell:last_cell) ( sample standard deviation )
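The steps from deviation to variance to standard deviation can be sketched as follows. One detail worth flagging: averaging the squared deviations over all n data points gives the population standard deviation, while Excel's STDEV.S divides by n - 1 (the sample standard deviation). The function name, the `sample` flag, and the data below are assumptions for this example.

```python
import math
import statistics


def standard_deviation(data, sample=True):
    """Standard deviation from first principles.

    sample=True divides by n - 1 (like Excel's STDEV.S);
    sample=False divides by n (population standard deviation).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    deviations = [x - mean for x in data]           # distance of each point from the mean
    squared_total = sum(d ** 2 for d in deviations)
    variance = squared_total / (n - 1 if sample else n)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, sample=False))  # 2.0
print(standard_deviation(data, sample=True))   # same value as statistics.stdev(data)
```

The standard library offers the same two calculations as `statistics.pstdev` (population) and `statistics.stdev` (sample).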

[ Normal Distribution ]

Normal Distribution : a distribution whose histogram has a symmetrical 'bell' shape centered on the mean

Mean == Median == Mode

[ Z Scores ]

Z Scores : a statistical measurement that describes a value's relationship to the mean of a group of data

=> the number of standard deviations a data point is away from the mean

=> Positive value : above the mean | Negative value : below the mean

Z score = (Data value - Mean) / Standard deviation

** Calculating z-score using EXCEL : STANDARDIZE(x,mean,standard_dev)
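The z-score formula above translates directly into code. This is a small sketch; the function name and the example numbers (a test with mean 70 and standard deviation 10) are invented for illustration.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / standard_deviation


# Hypothetical test results: mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  -> above the mean
print(z_score(55, 70, 10))  # -1.5 -> below the mean
```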

[ Mathematical Indices ]

Indices : indices are valuable because they combine data into a single value that we can use to make comparisons

=> an index value does not necessarily represent an actual measurement or quantity; it is often expressed relative to a base value or on a fixed scale

(ex) BMI (Body Mass Index), SLG (Slugging Percentage), Consumer Price Index
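Two of the indices listed above have simple formulas, sketched here under the usual definitions: BMI is mass in kilograms divided by the square of height in metres, and slugging percentage is total bases divided by at-bats. The function names and the sample numbers are made up for this example.

```python
def body_mass_index(mass_kg, height_m):
    """BMI = mass (kg) / height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats, where a double counts 2, a triple 3, a home run 4."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


print(round(body_mass_index(70, 1.75), 1))        # 22.9
print(slugging_percentage(100, 30, 5, 20, 500))   # 0.51
```

Neither result is a direct measurement of anything; each is only meaningful when compared with other values of the same index.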
