List: Artificial Intelligence (67)
Code&Data Insights
Linear Model - Easy to optimize, with fast training and prediction - Good interpretability - ONLY suitable for linearly separable classes => The capacity of a linear model depends on the input dimensionality D. => VC dimension: D + 1 for logistic regression. VC dimension: a measure of the capacity or complexity of a hypothesis space. Linear Regression - Parameter space is convex - Objective funct..
In a Multiple Linear Regression model, there are many variables. To build a model, we need to choose the right variables! (Using all the variables given in the data is NOT a good idea.) [ 5 methods of building models ] 1. All-in - Prior knowledge - Preparing for Backward Elimination 2. Backward Elimination Step 1) Select a significance level to stay in the model (e.g.) SL = 0.05 ----> SL = Signific..
[ Type of Data ] [ Sampling Methods ] 1) Random Sampling: samples are chosen without any pattern and are completely unrelated to each other 2) Simple Random Sampling: all selections are equally likely, for example drawing one name when each name has the same chance of being selected 3) Systematic Random Sampling: a more organized selection that follows a pattern to choose the samples 4) Stratif..
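As a rough illustration of the difference between simple random sampling and systematic sampling described above, here is a minimal sketch using only the Python standard library. The helper names `simple_random_sample` and `systematic_sample`, the `names` list, and the fixed seed are all hypothetical, chosen only for this example.

```python
import random


def simple_random_sample(population, k, seed=0):
    """Draw k distinct items; every item has the same chance of selection."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(population, k)


def systematic_sample(population, step, start=0):
    """Follow a fixed pattern: take every `step`-th item beginning at `start`."""
    return population[start::step]


# Hypothetical population of 20 names.
names = [f"person_{i}" for i in range(20)]

random_pick = simple_random_sample(names, 5)
pattern_pick = systematic_sample(names, step=4)

print(random_pick)
print(pattern_pick)
```

The systematic version is fully determined by `start` and `step`, while the simple random version depends only on chance (here pinned by the seed).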
Pandas: a Python library used for working with data sets. -> Pandas has functions for analyzing, cleaning, exploring, and manipulating data. [ DataFrame ] DataFrame: a Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns in an RDB (relational database, SQL). [ Series ] Series: a Series is a one-dimensional array holding data of any type, lik..
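A small sketch of the two structures described above, assuming pandas is installed. The column names and values are made up for illustration and do not come from the original post.

```python
import pandas as pd

# A Series: a one-dimensional labeled array.
heights = pd.Series([170, 182, 165], name="height_cm")

# A DataFrame: a two-dimensional table with rows and columns.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],
        "height_cm": [170, 182, 165],
    }
)

# Selecting a single column of a DataFrame gives back a Series.
height_column = df["height_cm"]

print(heights.ndim, df.ndim)   # number of dimensions of each structure
print(df.shape)                # (rows, columns)
print(type(height_column).__name__)
```

Each column of a DataFrame behaves like a Series, which is why the two concepts are usually introduced together.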
[ Decision Trees ] Decision Tree: a type of supervised machine learning where the data is repeatedly split according to a certain parameter (feature). The tree can be explained by two entities, namely decision nodes and leaves. Decision Tree Learning Decision 1. How to choose which feature to split on at each node? Decision 2. When do you stop splitting? => stopping criteria - When a node is 1..
[ Bias ] Bias: the inability of a machine learning method to capture the true relationship -> In linear regression, the straight line has high bias (underfit). -> Compared to the squiggly line, it has low variance, since the sums of squares are very similar for different data sets. [ Variance ] Variance: the difference in fits between data sets. The squiggly line has very little bias, but high variance! --> fi..
https://www.youtube.com/watch?v=sbbYntt5CJk&t=4680s https://www.youtube.com/watch?v=Gv9_4yMHFhI&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF Enough said... they explain concepts I couldn't understand in a really easy way. https://www.coursera.org/collections/machine-learning Andrew Ng’s Machine Learning Collection + Korean YouTubers (both seem to be PhD students.. so cool!) https://www.youtube.com/watch?v=74NvFfKZm7A https://www.youtube.com/watch?v=Eyxyn..
Unsupervised Learning - Clustering Unsupervised Learning - Unsupervised learning uses unlabeled data. The training examples do not have targets or labels "y". Recall the T-shirt example: the data was height and weight, with no target size. Clustering: finding data points that are related or similar - mostly used in marketing | segmentation | tracking | anomaly detection - different clusters must have di..