Code&Data Insights
[ Data & Data Science Field ]
Data - traditional / big
Data science - BI(Business Intelligence) / Traditional Methods / Machine Learning
* Data Collection in TRADITIONAL DATA
After the data has been gathered, it needs to go through data pre-processing.
raw data -> data pre-processing -> processing -> Information
Where?
: Basic customer data / historical stock price data
< data pre-processing >
- Class labeling : < Numerical > / < categorical >
- Data cleansing (cleaning) : deal with inconsistent data, e.g. find misspellings and change them to the correct value
< case-specific >
- balancing
- data shuffling: prevents unwanted patterns, improves predictive performance, helps avoid misleading results
- entity-relationship diagram (ER diagram) , relational schema
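The pre-processing steps above (cleansing, class labeling, balancing, shuffling) can be sketched in a few lines of Python. This is only a rough illustration: the records, the `CITY_FIXES` lookup table and the helper names are all made up for the example, and balancing is done here by simple downsampling, which is just one of several possible approaches.

```python
import random

# Hypothetical raw customer records: (name, city, age)
raw = [
    ("Ann", "New York", 34),
    ("Bob", "new yrok", 51),
    ("Cid", "Boston", 27),
    ("Dee", "Bostn", 45),
    ("Eve", " New York ", 62),
]

# Data cleansing: map known misspellings to the correct value.
# The lookup table is an assumption made for this sketch.
CITY_FIXES = {"new yrok": "New York", "bostn": "Boston"}

def clean_city(city):
    stripped = city.strip()
    return CITY_FIXES.get(stripped.lower(), stripped)

cleaned = [(name, clean_city(city), age) for name, city, age in raw]

# Class labeling: mark each column as numerical or categorical.
def label_column(values):
    is_number = all(isinstance(v, (int, float)) for v in values)
    return "numerical" if is_number else "categorical"

column_labels = {
    "name": label_column([row[0] for row in cleaned]),
    "city": label_column([row[1] for row in cleaned]),
    "age": label_column([row[2] for row in cleaned]),
}

# Balancing: downsample so that every category appears equally often.
def balance(rows, key_index):
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    smallest = min(len(group) for group in groups.values())
    return [row for group in groups.values() for row in group[:smallest]]

balanced = balance(cleaned, key_index=1)

# Shuffling: randomize the row order so the original ordering
# cannot introduce an unwanted pattern.
rng = random.Random(0)  # fixed seed only to keep the sketch reproducible
shuffled = balanced[:]
rng.shuffle(shuffled)

print(column_labels)
print(shuffled)
```

After cleansing, the two misspelled cities and the padded one collapse into "New York" and "Boston", so balancing keeps two rows per city.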
* Data Collection in BIG DATA
Where?
: Social Media / Financial Trading Data
< data pre-processing >
- Class labeling : numbers, text, digital images, video, audio, etc.
- Data cleansing(cleaning)
< case-specific >
- text data mining
- data masking (hide personal information) : analyze the information without compromising private details
(e.g) confidentiality-preserving data mining techniques
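As a rough sketch of data masking, the snippet below replaces names with a salted hash and partially hides e-mail addresses, then shows that an aggregate analysis still works on the masked records. The records, the `SALT` value and the helper names `pseudonymize` and `mask_email` are assumptions made for this example. Salted hashing is only a simple form of pseudonymization, not a full confidentiality-preserving technique.

```python
import hashlib

SALT = "example-salt"  # hypothetical secret; a real system would store this securely

def pseudonymize(value):
    """Replace a personal value with a short, stable, non-readable token."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
    return digest[:12]

def mask_email(email):
    """Keep only the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

# Hypothetical records containing personal information
records = [
    {"name": "Ann Lee", "email": "ann.lee@example.com", "spend": 120.0},
    {"name": "Bob Roy", "email": "bob.roy@example.com", "spend": 80.0},
    {"name": "Ann Lee", "email": "ann.lee@example.com", "spend": 30.0},
]

masked = [
    {
        "customer": pseudonymize(r["name"]),
        "email": mask_email(r["email"]),
        "spend": r["spend"],
    }
    for r in records
]

# The analysis still works: total spend per (pseudonymous) customer.
totals = {}
for r in masked:
    totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["spend"]

print(masked)
print(totals)
```

Because the same name always maps to the same token, the two purchases by one customer can still be added together without exposing who that customer is.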
Analysis vs Analytics
Analysis : examines things that have already happened in the past
Analytics : explores potential future events
Data Science : a discipline reliant on data availability, whereas business analytics does not rely completely on data.
Machine Learning (ML)
: Creating an algorithm which a computer then uses to find a model that fits the data as well as possible, and which makes very accurate predictions based on that model.
ML algorithm : 'a trial-and-error process' in which each consecutive trial is at least as good as the previous one
( Data -> Model -> Objective Function -> Optimization Algorithm )
" Training your model! " : the model becomes more precise with each trial / you give the machine a final goal instead of giving it step-by-step instructions
Benefit : the machine (e.g. a robot) can learn to fire more effectively than a human
Use : improve complex computational models
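The four ingredients (Data -> Model -> Objective Function -> Optimization Algorithm) can be illustrated with a tiny linear model trained by gradient descent. This is a minimal sketch, not the method from the course: the data points, the mean-squared-error objective, the learning rate and the function names are all assumptions chosen for the example. With this small learning rate each trial happens to be at least as good as the previous one; that is not guaranteed for every optimizer or step size.

```python
# Data: hypothetical points that lie exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Model: a straight line with two parameters, w and b
def model(w, b, x):
    return w * x + b

# Objective function: mean squared error between predictions and targets
def objective(w, b):
    return sum((model(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimization algorithm: plain gradient descent
def train(steps=500, learning_rate=0.05):
    w, b = 0.0, 0.0
    n = len(xs)
    losses = [objective(w, b)]  # loss before any training
    for _ in range(steps):
        grad_w = sum(2 * (model(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (model(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        losses.append(objective(w, b))  # one entry per trial
    return w, b, losses

w, b, losses = train()
print(round(w, 3), round(b, 3))
print(losses[0], losses[-1])
```

Note that the code never tells the computer what the slope and intercept are; it only states the goal (a small objective value) and lets the optimizer find parameters that reach it.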
Type of ML
1. Supervised learning : uses labelled data; training the algorithm resembles a teacher supervising her students
(e.g) deep learning, SVMs, NNs, random forests, Bayesian networks
2. Unsupervised learning : uses unlabelled data
(e.g) deep learning , k-means
3. Reinforcement learning : the system receives a reward when it succeeds. Similar to supervised learning, but instead of minimizing the loss, one maximizes the reward.
(e.g) deep learning
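To make the difference between the first two types concrete, here is a small sketch on one-dimensional data: a nearest-neighbour classifier that needs labels (supervised) and a k-means clustering that finds groups without labels (unsupervised). The data, the labels and the function names are invented for this example, nearest-neighbour is used only because it is short, and the k-means initialization (centers spread across the sorted points, assuming k >= 2) is a simplification. Reinforcement learning is not covered in this sketch.

```python
def nearest_neighbor_predict(train_points, train_labels, x):
    """Supervised: return the label of the closest labelled example."""
    best = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[best]

def k_means_1d(points, k=2, iterations=10):
    """Unsupervised: group unlabelled points around k centers (assumes k >= 2)."""
    ordered = sorted(points)
    # Simplified deterministic start: spread the centers over the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical measurements forming two obvious groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# Supervised: a "teacher" has provided the correct label for every point.
labels = ["low", "low", "low", "high", "high", "high"]
prediction_a = nearest_neighbor_predict(points, labels, 1.1)
prediction_b = nearest_neighbor_predict(points, labels, 4.5)

# Unsupervised: the same points without labels; the groups are discovered.
centers, clusters = k_means_1d(points, k=2)

print(prediction_a, prediction_b)
print(centers, clusters)
```

The supervised part can name its output ("low" or "high") because the names were given in the training data; the unsupervised part only reports that two groups exist and where their centers are.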
* Deep Learning
- new revolutionary approach
- fundamentally different from the other approaches
- broad practical scope of application (can reach extremely high accuracy)
Programming languages : Python, R, SQL
MATLAB, Java, Scala (useful when combining multiple data sources)
=> Can create application software ( or software solutions )
Software : Excel, Apache Hadoop (software framework), Power BI
reference : 365 Data Science