Code&Data Insights

Data Science

Data & Data Science Field

paka_corn 2023. 2. 3. 00:11

 

[ Data & Data Science Field ]

 

 

Data - traditional / big

Data science - BI(Business Intelligence) / Traditional Methods / Machine Learning

 

* Data Collection in TRADITIONAL DATA

 

After gathering the data, it needs to go through data pre-processing.

 

raw data -> data pre-processing -> processing -> Information

 

Where?

: Basic customer data / historical stock price data

 

< data pre-processing >

- Class labeling : < Numerical > / < Categorical >

- Data cleansing (cleaning) : fix inconsistent data, e.g. find misspellings and change them to the correct value
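
The two pre-processing steps above can be sketched in a few lines of Python. This is only an illustration: the records, the field names and the `CORRECTIONS` table are made up for the example and are not from the course.

```python
# Hypothetical customer records; field names and values are invented for illustration.
records = [
    {"name": "Alice", "age": 34, "country": "Canada"},
    {"name": "Bob", "age": 41, "country": "Cnada"},  # misspelled on purpose
    {"name": "Chen", "age": 29, "country": "Korea"},
]


def label_field(values):
    """Class labeling: numerical if every value is a number, otherwise categorical."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Assumed lookup table that maps known misspellings to the correct value.
CORRECTIONS = {"Cnada": "Canada"}


def clean(record):
    """Data cleansing: return a copy of the record with the country spelling fixed."""
    fixed = dict(record)
    fixed["country"] = CORRECTIONS.get(fixed["country"], fixed["country"])
    return fixed


cleaned = [clean(r) for r in records]
labels = {field: label_field([r[field] for r in cleaned]) for field in cleaned[0]}
print(labels)
```

The original records are left untouched, so the raw data stays available if a correction turns out to be wrong.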

 

< case-specific >

- balancing

- data shuffling: prevents unwanted patterns, improves predictive performance, helps avoid misleading results

- entity-relationship diagram (ER diagram), relational schema
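
Balancing and shuffling from the list above could look like the sketch below. It assumes balancing by undersampling (cutting every class down to the size of the smallest one), which is only one of several ways to balance data; the rows and the helper name `balance_by_undersampling` are hypothetical.

```python
import random


def balance_by_undersampling(rows, label_key, rng):
    """Keep the same number of rows per class by sampling down to the smallest class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, unbalanced data: 8 "yes" rows and 2 "no" rows.
rows = [{"id": i, "label": "yes"} for i in range(8)]
rows += [{"id": i, "label": "no"} for i in range(8, 10)]

balanced = balance_by_undersampling(rows, "label", rng)

# Shuffling removes the ordering by class that the balancing step left behind.
rng.shuffle(balanced)
print(balanced)
```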

 

 

* Data Collection in BIG DATA

 

Where?

: Social Media / Financial Trading Data

 

< data pre-processing >

- Class labeling : numbers, text, digital images, video, audio, etc.

- Data cleansing(cleaning)


 

< case-specific >

- text data mining

- data masking (hide personal information): analyze the information without compromising private details

(e.g.) confidentiality-preserving data mining techniques
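
One simple way to picture data masking is to replace a personal identifier with a salted hash before analysis. The sketch below assumes that approach; the salt, the e-mail addresses and the `mask` helper are hypothetical, and real confidentiality-preserving techniques are usually more involved.

```python
import hashlib

SALT = "example-salt"  # hypothetical secret; keep it out of the shared dataset


def mask(value, salt=SALT):
    """Replace a personal identifier with a short, salted SHA-256 token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


# Made-up purchase records containing personal information.
purchases = [
    {"email": "alice@example.com", "amount": 30},
    {"email": "bob@example.com", "amount": 15},
    {"email": "alice@example.com", "amount": 20},
]

# The masked copy no longer contains the e-mail addresses.
masked = [{"customer": mask(p["email"]), "amount": p["amount"]} for p in purchases]

# The analysis still works: total spend per (masked) customer.
totals = {}
for row in masked:
    totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
print(totals)
```

Because the same input always maps to the same token, purchases by one customer can still be grouped without revealing who that customer is.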

 

 

 

Analysis vs Analytics

 

Analysis : examines things that have already happened in the past

 

Analytics : explores potential future events

 

Data Science : a discipline reliant on data availability, while business analytics does not completely rely on data.

 

 

Machine Learning (ML)

: Creating an algorithm that a computer then uses to find a model that fits the data as well as possible, and then makes accurate predictions based on that model.

 

ML algorithm : 'a trial-and-error process', where each consecutive trial is at least as good as the previous one

( Data -> Model -> Objective Function -> Optimization Algorithm )
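
The four ingredients above can be shown with a minimal sketch, assuming a one-parameter linear model, mean squared error as the objective function, and plain gradient descent as the optimization algorithm. The data points and the learning rate are made up for illustration.

```python
# Data: hypothetical points that follow y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def predict(w, x):
    """Model: a line through the origin with slope w."""
    return w * x


def loss(w):
    """Objective function: mean squared error over the data."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)


# Optimization algorithm: gradient descent, one "trial" per step.
w = 0.0
learning_rate = 0.01
history = [loss(w)]
for _ in range(50):
    w -= learning_rate * gradient(w)
    history.append(loss(w))

print(w, history[0], history[-1])
```

With a learning rate this small, every entry in `history` is no larger than the one before it, which is the "each trial is at least as good as the previous one" behaviour. A learning rate that is too large would break that property.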

 

"Training your model!" : to make the model more precise, you give the machine a final goal instead of giving it step-by-step instructions

 

Benefit : the robot can learn to fire more effectively than a human

 

Use : improve complex computational models

 

Type of ML

1. Supervised learning : uses labeled data; training an algorithm resembles a teacher supervising her students

(e.g.) deep learning, SVMs, NNs, random forests, Bayesian networks

 

2. Unsupervised learning : uses unlabeled data

(e.g) deep learning , k-means

 

3. Reinforcement learning : the algorithm receives a reward when it succeeds. Similar to supervised learning, but instead of minimizing the loss, one maximizes the reward.

(e.g) deep learning
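
As an illustration of unsupervised learning, here is a minimal one-dimensional k-means sketch (k-means is listed as an example under type 2 above). The points and the starting centers are made up, and the helper `kmeans_1d` is a simplified version written for this note, not a library function.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group unlabeled 1-D points around the given number of centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data with two obvious groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

No labels are given anywhere: the algorithm finds the two groups from the distances between the points alone.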

 

 

* Deep Learning

- new revolutionary approach

- fundamentally different from the other approaches

- broad practical scope of application (extremely high accuracy)

 

 

Programming language : Python & R & SQL

MATLAB, Java, Scala (useful when combining multiple sources)

 

=> Can create application software ( or software solutions )

 

Software : Excel, Apache Hadoop (software framework), Power BI

 

 

reference : 365 Data Science
