Code&Data Insights

[Tableau Desktop Specialist Certification] Exam Prep - Domain 1 : Connecting to & Preparing Data (중요 내용 정리) 본문

Data Science/Certificate

[Tableau Desktop Specialist Certification] Exam Prep - Domain 1 : Connecting to & Preparing Data (중요 내용 정리)

paka_corn 2023. 8. 26. 11:25

[ Domain 1: Connecting to & Preparing Data ] 

1.1 Create live connections and extracts

1) Create a live connection to a data source

 

Live connection : connecting to the data source directly rather than connecting to a copy. 

( default in Tableau Desktop ) 

 

Extract : the subset of data (that we can use to improve performance or to take advandatage of Tableau functionality not available or supported in the original data) 

( default in Tableau Public ) 

 

 

 

2) Explain the differences between using live connections versus extracts

Live Connections

- need to enter credentials to access the data

- workbook won't work unless it is connected to the data source ( can't use it when offline ) 

 

Extract

- once an extract created, no longer access to the data 

- when an extract created, the Tableau view will use the same underlying data unless we choose to refresh the extract

- no credential needed for the extract 

- get out of sync with the data source

- the dashboard won't have to wait for unresponsive database

 

 

3) Create an extract

- when we create an extract, we get many size options  

( can choose only include a sample of the data, or aggregate to visible dimensions or use extract filters  ) 

 

 

 

 

1) Whether to store logical or physical tables

 

- physical tables option could improve performance and help reduce the size of the extract file.

=> When you use the physical table option, the demormalized table resulting from the join is saved.

 

-       Whether to limit the data extracted using filters

 

-       Whether to aggregate measures for visible dimensions

 

-       Number of rows

 

-       Hide All Unused Fields

 

 

 

 

 

- incremental loads : only the new data is refreshed ( reduce the data loading time but unlike to reduce the size ) 

 

* Extracts cannot materialize table calculations! 

(테이블 계산의 계산 과정과 결과를 포함하지 않고, 원본 데이터의 요약 정보와 필요한 데이터만을 추출에 저장) 

=> Materializing calculations may reduce the time to open the workbook since the result of the calculations will be saved in extract, but will not reduce the workbook size

 

 

4) Save metadata properties in a .TDS

metadata : data about the data ( column name, the number of columns ) 

 

TDS = Tableau Data Source

-> TDS files store the connection information (server name and port, or file location) as well as changes you've made to the metadata properties such as field names, default formatting, and additions such as calculated fields, parameters, groups, and sets )

=> TDS file saves metadata + connection information + default data formatting + sort order 

 

< File types > 

- Tableau workbook (.twb)

- Tableau packaged workbook (.twbx) : Tableau workbook + extract

 

- Tableau extract (.hyper(newer ver) and .tde) 

 

- Tableau data source (.tds)

- Tableau packaged data source(.tdsx) : information of data source + extract

 

 

5) Create a data source that uses multiple connections

- can combine tables from multiple connections using join or relationship(ex. UNION) 

 

 

1.2 Create and manage the data model

Physical layer - It involves the specifics of connecting to real data sources like databases, Excel files, web services, etc. Detailssuch as the database server address, user authentication credentials, data retrieval methods.

(데이터 소스와 실제로 연결되는 물리적인 측면, 실제 데이터 원본에 대한 연결 정보)

 

Logical layer - It represents the conceptual understanding and manipulation of data from a user's perspective. It encompasses the logical structure used to interact with and visualize data. 

It includes fields, tables, relationships, and other structures that make it easier for users to comprehend and work with the data. 

(사용자의 관점에서 데이터를 이해하고 조작하는 데 사용되는 개념, 데이터 조작 및 시각화를 위한 필드 매핑, 필터링, 계산 필드 등을 정의)

 

 

1) Add relationships to a data source

=> the values of Order ID are unique in the Returns table, so we can set the cardinality "Many-to-One".

 

< Relationship >

- we use relationship to avoid data duplication when combining table with different granularity with joins

- relationship works similar as 'Outer Join'

- when the view includes a measure, nonmatches from that table are included in the view.

- to ensure that all values from both tables in the relationships are included, we can add a measure to the view

- relationships are simliar to joins in that they are a way of combining columns from one table with columns from another

- relationships don't work with geographic fields.

 

2) Add joins and unions

- Join or a blend are adding more measures and dimensions to an exisiting data set. 

 

< Joins >

physical layer where we can create joins rather than relationship

=> unlike with the relationships(logical), it will create one big table that combines the columns from both

( relationships keep the tables seperate, while joins create a new combined table )

 

* a row is addes each time the join criteria is met, which can result in duplication

( left, right, full outer join - the null is assigned for the empty value ) 

 

< Blends >

blends : the technique of combining multiple data sources to perform a single visualization or analysis. Unlike joins or SQL queries, data blending allows for the integration of data sources stored in different databases or sources.

 

- blends can help combine data with different levels of granularity without duplication

( granularity : 데이터의 세분화 정도나 데이터 항목의 상세함, 더 세분화 된 데이터일 수록 더 낮은 granularity을 가짐 ) 

=> with a blend, a primary data source is linked to a secondary data source.

the data in the secondary data source will not be duplicated even if the linking criteria is satisfied mulitiple times. 

 

- Unlike joins, data blending keeps the data sources separate and simply displays their information together.

=> Blend works similar as 'left join'

 

- Each data source is queried independently and the results are aggregated to the appropriate level then visualized together.

 

Join vs blend

: with a join, the query combines the tables before Tableau aggregates the results in the view

, whereas with a blend the aggregation happens before the results are combined.

 

< Unions >

Union : can be used to combine the rows from one table with the rows from a second table

=> after we performed the UNION,

tableau won't know to line the columns up if their names in the two tables don't match. Then select two columns and then click "merge mismatched fields"

 

- If the column name doesn't match from the tables, the null value will be assigned 

 

- Union 생성하는 방법 

1) Data source창에서 테이블 밑으로 드래그

2) Data source창에서 'New Union' 더블클릭 후 원하는 테이블 끌어다 놓기 

 

 

3) Explain when to use a join versus a relationship

UNION : combining rows

JOIN, Blend, Relationship : combining columns

 

- relationship is defalut is to use ( relationships combine data at different levels of detail without introducing duplication ) 

=> relationships don't query data unless columns from that table are used in the viz. 

 

- joins produce a new row each time the join criteria is met, which can duplicate data

 

 

 

 

1.3 Manage data properties

1) Rename a data field (Click Data pane, right click > drop down menu > click 'Rename' 

 

2) Assign an alias to a data value

 

3) Assign a geographic role to a data field

+ ZIP code / postal code 

 

4) Change data type for a data field (number, date, string, Boolean, etc.)

- the data type has to be make sense 

 

5) Change default properties for a data field (number format, aggregation, color, date

format, etc.)

 

- Default properties which can be assigned to a dimension: Comment / Color / Shape / Sort 

- Default property for a measure : Comment / Color / Number Format / Aggregation / Total Using

 

 

 

 

 

reference : https://www.udemy.com/course/tableau-specialist-certification-prep/

 

Comments