Code&Data Insights

[ Mathematics of Data Management ] study notes | basic concepts related to data analytics

paka_corn 2023. 6. 13. 03:14

[ Type of Data ]

Quantitative data may be either discrete (counted values, such as the number of siblings) or continuous (measured values, such as height).

 

 

 

[ Sampling Methods ]

1) Random Sampling : the sample is chosen without any pattern, so the selections are unrelated to one another

2) Simple Random Sampling : every member of the population is equally likely to be selected, for example drawing one name when each name has the same chance of being selected

3) Systematic Random Sampling : the selection is more organized and follows a pattern, for example every 10th member after a random starting point

4) Stratified Random Sampling : the population is divided into groups (strata) based on gender, age or any other characteristic, and a random sample is taken from each stratum

Stratify : to divide into layers (strata)

5) Cluster Random Sampling : the population is first organized into groups, then groups are randomly chosen and all the members of each chosen group are surveyed

6) Multi-stage Random Sampling : groups are randomly chosen from the population, subgroups are randomly chosen from these groups, and then individuals within these subgroups are randomly chosen to be surveyed

7) Destructive Sampling : used in situations where the sample is damaged or destroyed in order to extract information ( so the sample is mostly not human! )
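
As a rough illustration of the first few methods above, here is a minimal Python sketch. The function names, the `seed` parameter and the 20-student roster are all hypothetical and only serve the example; real surveys would need more care (for instance, when strata have very different sizes).

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every member has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=0):
    """Pick a random starting point, then take every `step`-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, seed=0):
    """Choose whole groups at random and survey everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: 20 students, two grades, four classes.
students = [f"s{i:02d}" for i in range(20)]
grades = {"grade9": students[:10], "grade10": students[10:]}
classes = {f"class{c}": students[c * 5:(c + 1) * 5] for c in range(4)}

print(simple_random_sample(students, 5))
print(systematic_sample(students, 4))
print(stratified_sample(grades, 0.2))
print(cluster_sample(classes, 2))
```

Multi-stage sampling could be sketched by chaining these: choose clusters first, then draw a simple random sample inside each chosen cluster.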

 

 

 

 

[ Types of Bias ]

1) Sampling Bias : the chosen sample does not accurately represent the population; this is the most common form of bias

(ex) people outside McDonalds are surveyed about their opinions of fast food

2) Non-Response Bias : a situation where the data is not collected from all of the potential respondents

(ex) people do not return mail-in surveys 

3) Household Bias : occurs in situations where respondents are over- or under-represented because groups of different sizes are polled equally

(ex) one person is surveyed per household, so people living in large households are under-represented

4) Response Bias : occurs when the design of the study or the way the survey is carried out influences the answers that respondents give

(ex) questions are poorly worded, or the interviewer leads the respondent toward an answer
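
Household bias in particular can be checked with simple arithmetic. The sketch below assumes one respondent is polled per household regardless of its size; the household sizes are invented for illustration.

```python
# Hypothetical town: household sizes are invented for illustration.
household_sizes = [1, 1, 2, 4, 6]

population = sum(household_sizes)        # 14 people in total
n_households = len(household_sizes)      # 5 households, one respondent each

# Compare households of 4 or more people: share of people vs. share of sample.
large = [size for size in household_sizes if size >= 4]
population_share = sum(large) / population   # 10 of 14 people
sample_share = len(large) / n_households     # 2 of 5 respondents

print(f"large households: {population_share:.0%} of people, "
      f"{sample_share:.0%} of respondents")
# prints: large households: 71% of people, 40% of respondents
```

People in large households make up most of this made-up town but less than half of the sample, which is the under-representation described above.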

 

 

 

 

[ Central Tendency ]

 

Weighted Mean : when the data is organized in a frequency table, the mean can still be calculated by using each frequency as a weighting factor: multiply each value by the frequency with which it occurs, add the products, and divide by the total frequency
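
A minimal sketch of that calculation, assuming a made-up frequency table of quiz scores (the table and the function name `weighted_mean` are hypothetical):

```python
# Hypothetical frequency table: quiz score -> number of students.
frequency_table = {6: 2, 7: 5, 8: 8, 9: 4, 10: 1}

def weighted_mean(table):
    """Sum of (value * frequency) divided by the total frequency."""
    total = sum(value * freq for value, freq in table.items())
    count = sum(table.values())
    return total / count

# (6*2 + 7*5 + 8*8 + 9*4 + 10*1) / 20 = 157 / 20
print(weighted_mean(frequency_table))  # 7.85
```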

 

 

[ Measures of Spread ]

Variability : how far apart data points lie from each other and from the center of a distribution.

Along with measures of central tendency, measures of variability give you descriptive statistics that summarize your data. Variability is also referred to as spread, scatter or dispersion.

 

 

Interquartile Range (IQR) : the quartiles Q1, Q2 and Q3 divide the ordered data into four equal groups; the IQR is the range covered by the middle 50% of the data

IQR = Q3 - Q1
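
As a quick sketch, the quartiles can be computed with Python's standard `statistics.quantiles`. The data set is made up, and note that textbooks and software use slightly different quartile conventions, so Q1 and Q3 may differ a little from a hand calculation.

```python
import statistics

# Hypothetical data set (11 values).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 18]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
# The default method is "exclusive"; other conventions can give other values.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 4.0 8.0 12.0
print(iqr)         # 8.0
```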

 

 

 

https://www.youtube.com/watch?v=esskJJF8pCc 

Standard Deviation : a valuable tool for measuring the spread of data ( How far? or How close? )

=> a measure that indicates how much the data scatter around the mean

 

- deviation : the distance between a data point and the mean (data value - mean)

- variance : a measure of spread, calculated by averaging the squared deviations; the standard deviation is the square root of the variance

 

** Calculating standard deviation using EXCEL : STDEV.S(first cell:last cell) for a sample ( STDEV.P for a whole population )
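
The steps from deviation to variance to standard deviation can be sketched in Python as follows. The data set is made up. The sample version divides by n - 1 instead of n, which is also what Excel's STDEV.S does.

```python
import math
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                           # 5.0

# Deviation: distance of each data point from the mean.
deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]         # sums to 32.0

# Variance: average of the squared deviations.
population_variance = sum(squared) / n         # 4.0
sample_variance = sum(squared) / (n - 1)       # 32 / 7

# Standard deviation: square root of the variance.
population_sd = math.sqrt(population_variance) # 2.0
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library.
print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```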

 

 

 

 

 

 

[ Normal Distribution ]

Normal Distribution : a distribution whose histogram has a symmetrical 'bell' shape centered on the mean

 

Mean == Median == Mode 

 

Notation : X ~ N(μ, σ²), where μ is the mean and σ² is the variance (standard deviation squared)
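
These properties can be checked with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are arbitrary values chosen for illustration. Note that `NormalDist` takes the standard deviation, while the N(...) notation lists the variance.

```python
from statistics import NormalDist

# Hypothetical parameters: mean 100, standard deviation 15, i.e. N(100, 15^2).
mean, sd = 100, 15
dist = NormalDist(mu=mean, sigma=sd)

# Mean, median and mode coincide for a normal distribution.
print(dist.mean, dist.median, dist.mode)

# The second parameter in N(mean, standard deviation^2) is the variance.
print(dist.variance)  # 225.0

# Symmetry: the curve has the same height at equal distances from the mean,
# and exactly half of the area lies below the mean.
print(dist.pdf(mean - 20), dist.pdf(mean + 20))
print(dist.cdf(mean))  # 0.5
```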

 

 

 

 

 

[ Z Scores ]

Z Scores : a statistical measurement that describes a value's relationship to the mean of a group of data values

=> calculated as the number of standard deviations a data point is away from the mean

=> Positive value : above the mean | Negative value : below the mean 

 

 

Z score = (Data value - Mean) / Standard deviation

 

** Calculating a z score using EXCEL : STANDARDIZE(x,mean,standard_dev)
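
The z score formula translates directly into a small Python function. The test scores, the mean of 70 and the standard deviation of 10 are made-up values, and the function name `z_score` is hypothetical; its argument order follows STANDARDIZE(x, mean, standard_dev).

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / standard_deviation

# Hypothetical test: class mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  -> above the mean
print(z_score(55, 70, 10))  # -1.5 -> below the mean
print(z_score(70, 70, 10))  # 0.0  -> exactly at the mean
```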

 

 

 

 

 

 

[ Mathematical Indices ]

Indices : an index is valuable because it condenses information into a single value that we can use to make comparisons

=> index values do not necessarily represent an actual measurement or quantity, but they are usually defined relative to a starting or ending point

 

(ex) BMI, SLG (Slugging Percentage), Consumer Price Index
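
A short sketch of how these three indices are computed, using their standard definitions. The input numbers are invented, the function names are hypothetical, and `price_index` is a simplified fixed-basket ratio rather than the full method a statistics agency uses for the Consumer Price Index.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases divided by at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def price_index(basket_cost, base_period_cost):
    """Simplified price index: cost of a basket relative to a base period (= 100)."""
    return basket_cost / base_period_cost * 100

# Hypothetical inputs, for illustration only.
print(round(bmi(70, 1.75), 1))                              # 22.9
print(round(slugging_percentage(100, 30, 5, 20, 500), 3))   # 0.51
print(round(price_index(115, 100), 1))                      # 115.0
```

None of the three results is a direct measurement: each one only becomes meaningful when compared with another value on the same index.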
