Code&Data Insights
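Exam Format => A 60-minute exam. It reportedly used to include hands-on questions, so googling was allowed, but since 2021 it has reportedly been a written exam only. The next level up, Tableau Data Analyst, does include hands-on questions, and some of them reportedly cannot be solved with Tableau Public (the free version). Exam Content => Searching Udemy for "Tableau Desktop Specialist Certification Prep" brings up several courses; I picked the 3-hour one! (From what I found, most people seem to take this course.) It seems you only need to sit the exam within 3 months of paying for it. With student verification, Tableau Desktop (the paid version) is free for one year, and the certification exam fee is discounted from $100 to $80..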
[ What is Association Rule Learning? ] Association Rule Learning : Association Rule Learning is a data mining technique that discovers rules indicating the co-occurrence of two or more items. => It identifies the relationships between items and discovers valuable rules indicating their co-occurrence. => For example, "People who bought this item also bought that item." | "You may also like"..
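The idea of co-occurrence can be sketched in a few lines of plain Python. This is only an illustration: the transaction data is hypothetical, and the `support` and `confidence` helpers are the standard measures used to score a rule, not something taken from the excerpt above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (not from the original post).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_counts(transactions):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

counts = pair_counts(transactions)
print(counts.most_common())
print(confidence({"bread"}, {"milk"}, transactions))
```

A rule such as "bread => milk" is then read as: among baskets that contain bread, what share also contain milk.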
[ Hierarchical Clustering ] Hierarchical Clustering : Hierarchical clustering is a data analysis technique that groups data hierarchically based on similarity or distance - Uses Euclidean distance or Manhattan distance - 2 approaches to hierarchical clustering : 1) Agglomerative - Bottom-up 2) Divisive - Top-down [ Agglomerative Hierarchical Clustering ] ( Agglomerative Hierarchical Clustering : ..
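A minimal sketch of the agglomerative (bottom-up) approach, assuming single linkage (the distance between two clusters is the distance between their closest points). The linkage choice, the function names, and the sample points are assumptions made for illustration; the excerpt is cut off before it describes the actual steps.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def agglomerative(points, n_clusters, dist=euclidean):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until only `n_clusters` remain (single linkage)."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical sample points.
points = [(0, 0), (0, 1), (5, 5), (6, 5), (10, 0)]
result = agglomerative(points, 3)
print(result)
```

Passing `dist=manhattan` switches the distance measure without changing the merge logic.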
[ K-Nearest Neighbours ] K-Nearest Neighbors (KNN) => KNN is a supervised learning classifier that uses proximity to make classifications or predictions about the grouping of an individual data point. How does it work? Step 1) Choose the number K of neighbors Step 2) Take the K nearest neighbors of the new data point, according to the Euclidean distance - Euclidean Distance : √((x₂ - x₁)² + (y₂ - y..
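Steps 1 and 2 can be sketched as follows. The final step used here, assigning the new point to the most common label among its K neighbors, is the usual KNN rule but is an assumption, since the excerpt is truncated before it gets there. The training data and the name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify `new_point` by majority vote among its k nearest neighbors.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    # Step 2: sort the training points by distance and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    # Assumed final step: pick the label that occurs most often among them.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two classes.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_predict(train, (2, 2), k=3))
print(knn_predict(train, (6.5, 6.5), k=3))
```

`math.dist` requires Python 3.8 or later; an odd K avoids ties in a two-class problem.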
[ Ensemble Learning ] Ensemble Learning : combining multiple algorithms, or the same algorithm multiple times, into one much more powerful version. => It helps to improve the performance and accuracy of machine learning algorithms. => Multiple ML algorithms are put together to create one bigger ML algorithm that leverages all of them. Type of Ensem..
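One simple way to "put models together" is a majority vote over their predictions. The sketch below is an illustration only: the threshold-based "models" are hypothetical stand-ins for real classifiers, and majority voting is just one possible combination rule, not necessarily one of the ensemble types the truncated excerpt goes on to list.

```python
from collections import Counter

def threshold_model(threshold):
    """Return a toy classifier that labels x as "high" at or above `threshold`."""
    def predict(x):
        return "high" if x >= threshold else "low"
    return predict

def majority_vote(models, x):
    """Ask every model for a prediction and return the most common answer."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Three hypothetical weak models that disagree near the boundary.
models = [threshold_model(t) for t in (3, 5, 7)]

print(majority_vote(models, 6))
print(majority_vote(models, 4))
```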
[ R-squared ] R-squared - a measure of the goodness-of-fit of a regression model. - It represents the proportion of the variance in the dependent variable that is predictable from the independent variables. - the percentage of variation explained by the relationship between two variables. => range : 0 to 1 => R² = 1 - (SSR/SST) SSR = the sum of squared residuals (the sum of the squared differenc..
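The formula R² = 1 - (SSR/SST) can be checked by hand with a short sketch. SSR follows the definition above; SST is taken here to be the total sum of squares (squared differences between each observed value and the mean of the observed values), which is the standard meaning but is an assumption because the excerpt is cut off. The sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - SSR / SST for observed and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    # SSR: sum of squared residuals (observed minus predicted).
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # SST: total sum of squares (observed minus mean of observed).
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ssr / sst

# Hypothetical observed values and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(r_squared(y_true, y_pred))
```

A model that always predicts the mean of the observed values gives SSR = SST and therefore R² = 0, while perfect predictions give R² = 1.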
Linear Model - Easy to optimize, fast training and prediction - Good Interpretability - ONLY suitable for linearly separable classes => The capacity of the linear model depends on the input dimensionality D. => VC dimensions : D + 1 for Logistic regression VC dimension? : a measure of the capacity or complexity of a hypothesis space Linear Regression - Parameter space is convex - Objective funct..
In a Multiple Linear Regression model, there are many variables. To build a model, we need to choose the right variables! (Using all the variables given in the data is NOT a good idea.) [ 5 methods of building models ] 1. All-in - Prior knowledge - Preparing for Backward Elimination 2. Backward Elimination Step 1) Select a significance level to stay in the model (ex) SL = 0.05 ----> SL = Signific..