All Contents (170)
Code&Data Insights
[ Common Problem Types ] 1. Making predictions 2. Categorizing things - categorizing by a specific keyword or score 3. Spotting something unusual 4. Identifying themes - grouping categorized info into broader concepts 5. Discovering connections 6. Finding patterns - using historical data to understand what happened in the past and is therefore likely to happen again [ SMART questions ] => kinds of q..
[ The six phases of data analysis ] Ask : Business challenge/objective/question Prepare : Data generation, collection, storage, and data management Process : Data cleaning/data integrity - what type of data do we have? Is any data missing or wrongly collected? Analyze : Data exploration, visualization, and analysis => should be unbiased! Look for patterns Share : Communicating and interpreting resul..
[ Cross Validation ] : Cross validation allows us to compare different machine learning methods and get a sense of how well they will work in practice. - K-Fold ( K can be chosen arbitrarily! ) [ Confusion Matrix ] Confusion Matrix : To decide which method to use with the given data sets, we need to summarize how each method performed on the testing data. => one way to do this is by creati..
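The two ideas in this excerpt can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `k_fold_indices` and `confusion_matrix` are hypothetical, the folds are contiguous and unshuffled, and the confusion matrix covers the binary case only.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


# Each sample lands in the test fold exactly once across the k rounds.
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)

# Made-up labels, purely for illustration.
print(confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In practice one would train each candidate method on `train`, predict on `test`, and compare the resulting confusion matrices across methods.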
[ Terminology in Machine Learning ] - Training set : data used to train the model => input features + target variables ( x + y ) - x : input variable | (input) features - y : output variable | target variable - m : number of training examples - (x, y) : single training example - ŷ (y hat) : predicted output The data type in ML : Machine learning builds predictive models based on your data and lea..
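To make the notation concrete, here is a small sketch that maps each symbol to a Python value. The data points and the linear model `predict` (with hypothetical weights `w` and `b`) are assumptions made up for illustration; they are not from the original post.

```python
# Training set: a list of (x, y) pairs, i.e. input feature and target variable.
training_set = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # made-up data

# m: number of training examples.
m = len(training_set)


def predict(x, w=2.0, b=1.0):
    """Hypothetical model: returns y_hat, the predicted output for input x."""
    return w * x + b


# (x, y): a single training example.
x_i, y_i = training_set[0]

# y_hat: the model's prediction for that example's input.
y_hat = predict(x_i)

print(f"m = {m}, x = {x_i}, y = {y_i}, y_hat = {y_hat}")
```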
The Art of Statistics: Learning from Data by David Spiegelhalter => David Spiegelhalter is a prominent British statistician who provides numerous examples from his professional experience in the field of statistics. He endeavors to explain these concepts in an accessible way, without relying on mathematical formulas. I took a probability and statistics course in my second year, but I could not get a feel for the subject, so I read this book. It gives a wide variety of examples, so the overall statistics..
Schemas and Instances - Database instance: the current content of the DB - Database schema: the structure of the data (relations/classes) -> A relation (table) is a set of tuples! What is the meaning of data independence? : the ability to modify the definition of a schema at one level with little or no effect on the schema at the next higher level - Logical data independence: adding new fields to a record or ..
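The schema/instance distinction and the "adding new fields" case can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a definitive demonstration: the `student` table, its columns, and the `student_names` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the structure of the data (table name, columns, types).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Instance: the rows stored in the database right now.
conn.executemany(
    "INSERT INTO student (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob")],
)


def student_names(connection):
    """Application-level query that only depends on the 'name' column."""
    rows = connection.execute("SELECT name FROM student ORDER BY id")
    return [row[0] for row in rows]


before = student_names(conn)

# Change the schema by adding a new field to the record.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The same query still works and returns the same result.
after = student_names(conn)

# PRAGMA table_info returns one row per column; index 1 is the column name.
schema_columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

print(before, after, schema_columns)
```

The query keeps working because it names the columns it needs rather than relying on the full column list, which is the kind of insulation data independence describes.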
What is a Database?: a collection of data (not a random pile of data) that exists over a long period of time. DBMS: a complex software package developed to store and manage databases - MySQL, MongoDB, … - Provides convenient, efficient, and secure access to and manipulation of large amounts of data - Controls access to shared data from multiple, simultaneous users with properties Atomicity, Consistency, l..
Key differences between SQL keys: SQL keys are used to uniquely identify rows in a table. An SQL key can be either a single column or a group of columns. A super key is a single attribute or a group of attributes that can uniquely identify tuples in a table. Super keys can contain redundant attributes that are not needed for identifying tuples. Candidate keys are a subset of super keys. They cont..
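The relationship between super keys and candidate keys (a candidate key is a minimal super key) can be sketched as follows. One loud assumption: keys are really a property of the schema's constraints, while this sketch only checks uniqueness within a small made-up sample of rows. The function names and the sample table are hypothetical.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if the attribute combination is unique across the sample rows."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal super keys, searching smallest combinations first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any combination that contains an already-found key:
            # it would be a super key with redundant attributes.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found


# Made-up sample data, purely for illustration.
rows = [
    {"student_id": 1, "email": "a@x.org", "name": "Kim"},
    {"student_id": 2, "email": "b@x.org", "name": "Lee"},
    {"student_id": 3, "email": "c@x.org", "name": "Kim"},
]

print(is_super_key(rows, ("student_id", "name")))  # super key with a redundant attribute
print(is_super_key(rows, ("name",)))               # not unique in this sample
print(candidate_keys(rows, ["student_id", "email", "name"]))
```

Here `("student_id", "name")` is a super key but not a candidate key, because `name` can be dropped without losing uniqueness.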