iT邦幫忙

2024 iThome Ironman Contest

DAY 21

Questions

Q15

A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly propagated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically. Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?

  • [x] A. AWS DataSync
  • [ ] B. AWS Glue
  • [ ] C. AWS Direct Connect
  • [ ] D. Amazon S3 Transfer Acceleration

AWS DataSync is a managed data transfer service that simplifies and accelerates moving large amounts of data online between on-premises storage and Amazon S3, EFS, or FSx for Windows File Server. DataSync is optimized for efficient, incremental, and reliable transfers of large datasets, making it suitable for transferring 5 TB of data with daily updates.

Description

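  • An engineer needs to move 5 TB of data from an on-premises data center to S3.
  • About 5% of the content changes every day, and those changes must also be propagated to S3.
  • The sync job must be triggered on a schedule.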
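For a sense of scale (decimal units, 1 TB = 1000 GB), the daily change is only a small slice of the full dataset, which is why a scheduled, incremental copy is far cheaper than re-uploading all 5 TB every day:

```latex
5\ \text{TB} \times 5\% = 0.25\ \text{TB} = 250\ \text{GB per day}
```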

Analysis

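  • This tests whether you know AWS DataSync.
  • Glue is an ETL tool and has nothing to do with this scenario.
  • Direct Connect is a dedicated network connection service; it provides the network link, not the mechanism for moving and syncing the data.
  • Amazon S3 Transfer Acceleration speeds up uploads of large files to S3, but it does not schedule daily jobs or detect and sync only the changed data, so it doesn't fit here.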
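As a rough sketch of what "scheduled and incremental" looks like in DataSync terms, the snippet below only builds the keyword arguments for boto3's DataSync `create_task` call, so it runs without AWS credentials. The location ARNs, task name, and the 02:00 UTC run time are hypothetical placeholders. `TransferMode="CHANGED"` asks DataSync to copy only data that differs between source and destination, and `Schedule.ScheduleExpression` takes an AWS cron expression.

```python
def build_datasync_task_kwargs(source_arn: str, dest_arn: str, name: str) -> dict:
    """Build kwargs for boto3's DataSync client.create_task (sketch only).

    Nothing is sent to AWS here; in a real setup you would call
    boto3.client("datasync").create_task(**kwargs).
    """
    return {
        "SourceLocationArn": source_arn,      # e.g. an on-premises NFS/SMB location
        "DestinationLocationArn": dest_arn,   # the S3 bucket location
        "Name": name,
        "Options": {
            "TransferMode": "CHANGED",                # copy only data that differs
            "VerifyMode": "ONLY_FILES_TRANSFERRED",   # verify just what this run copied
            "OverwriteMode": "ALWAYS",                # updated files replace older S3 copies
        },
        # Run once a day at 02:00 UTC (AWS cron syntax; the time is arbitrary).
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},
    }


if __name__ == "__main__":
    kwargs = build_datasync_task_kwargs(
        # Hypothetical placeholder ARNs, not real resources.
        "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
        "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
        "daily-onprem-to-s3",
    )
    print(kwargs["Options"]["TransferMode"], kwargs["Schedule"]["ScheduleExpression"])
```

Putting the schedule on the task itself is what keeps this "operationally efficient": there is no separate scheduler or script to maintain.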

Q16

A company uses an on-premises Microsoft SQL Server database to store financial transaction data. The company migrates the transaction data from the on-premises database to AWS at the end of each month. The company has noticed that the cost to migrate data from the on-premises database to an Amazon RDS for SQL Server database has increased recently. The company requires a cost-effective solution to migrate the data to AWS. The solution must cause minimal downtime for the applications that access the database. Which AWS service should the company use to meet these requirements?

  • [ ] A. AWS Lambda
  • [x] B. AWS Database Migration Service (AWS DMS)
  • [ ] C. AWS Direct Connect
  • [ ] D. AWS DataSync

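Description

  • The company stores transaction data in an on-premises Microsoft SQL Server database.
  • The data is migrated to the cloud every month.
  • Migration costs have gone up recently, so the company wants a cheaper option.
  • Applications that access the database must see minimal downtime.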

Analysis

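  • This mainly tests whether you know what DMS is for: copying database data to the cloud. The source database stays operational while DMS runs, which covers the minimal-downtime requirement. Note that DMS is not limited to one-off copies: besides a full load, it can also replicate ongoing changes through change data capture (CDC).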
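A minimal sketch of the monthly job, assuming boto3's DMS client: the function below only assembles the arguments for `create_replication_task`, so it runs offline. The endpoint and instance ARNs, the task identifier, and the `dbo` schema are hypothetical. `MigrationType` chooses between a one-time full load and a full load followed by ongoing change capture, and `TableMappings` must be passed as a JSON string.

```python
import json


def build_dms_task_kwargs(source_endpoint_arn: str, target_endpoint_arn: str,
                          replication_instance_arn: str, ongoing: bool = False) -> dict:
    """Build kwargs for boto3's DMS client.create_replication_task (sketch only)."""
    # Selection rule: include every table in the (assumed) "dbo" schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-dbo-tables",
                "object-locator": {"schema-name": "dbo", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": "monthly-sqlserver-to-rds",  # hypothetical name
        "SourceEndpointArn": source_endpoint_arn,    # on-premises SQL Server endpoint
        "TargetEndpointArn": target_endpoint_arn,    # RDS for SQL Server endpoint
        "ReplicationInstanceArn": replication_instance_arn,
        # "full-load" copies a snapshot; "full-load-and-cdc" keeps applying later changes.
        "MigrationType": "full-load-and-cdc" if ongoing else "full-load",
        "TableMappings": json.dumps(table_mappings),  # DMS expects a JSON string
    }


if __name__ == "__main__":
    kwargs = build_dms_task_kwargs("src-endpoint-arn", "tgt-endpoint-arn", "repl-instance-arn")
    print(kwargs["MigrationType"])
```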

Q17

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour. Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)

  • [x] A. Configure AWS Glue triggers to run the ETL jobs every hour.
  • [ ] B. Use AWS Glue DataBrew to clean and prepare the data for analytics.
  • [ ] C. Use AWS Lambda functions to schedule and run the ETL jobs every hour.
  • [x] D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
  • [ ] E. Use the Redshift Data API to load transformed data into Amazon Redshift.

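Description

  • The engineer is building a data pipeline from Amazon RDS and MongoDB into Amazon Redshift.
  • Glue handles the ETL, and the data must be refreshed every hour.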

Analysis

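  • AWS tends to steer users toward Glue and pitches it as easy to set up; Glue triggers (A) can run the ETL jobs on an hourly schedule with no extra infrastructure.
  • The transformed data obviously has to be written from Glue into Redshift, and Glue connections (D) provide the connectivity to the sources and to Redshift.
  • AWS Glue DataBrew is a visual, no-code data preparation tool aimed at occasional hands-on work by people; since the goal here is an automated pipeline, it isn't the right pick.
  • Lambda (C) looks reasonable, but is it necessary? The question asks for two options with the least operational overhead, and the data comes from more than one kind of database; scheduling with Lambda would mean writing and maintaining a lot of extra code, while Glue triggers already do the job.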
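To make options A and D concrete, here is a sketch that builds the request arguments for boto3's Glue `create_trigger` and `create_connection` calls without contacting AWS. The trigger name, connection name, job names, JDBC URL, and Secrets Manager secret ID are hypothetical. A real Redshift connection usually also needs `PhysicalConnectionRequirements` (subnet, security groups), and the RDS and MongoDB sources would each get their own connection; both are omitted here for brevity.

```python
def build_glue_hourly_trigger_kwargs(job_names: list) -> dict:
    """Kwargs for boto3's Glue client.create_trigger (sketch only)."""
    return {
        "Name": "hourly-etl-trigger",          # hypothetical trigger name
        "Type": "SCHEDULED",
        "Schedule": "cron(0 * * * ? *)",       # minute 0 of every hour (UTC)
        "Actions": [{"JobName": name} for name in job_names],
        "StartOnCreation": True,               # activate the schedule immediately
    }


def build_glue_redshift_connection_kwargs(jdbc_url: str, secret_id: str) -> dict:
    """Kwargs for boto3's Glue client.create_connection (sketch only)."""
    return {
        "ConnectionInput": {
            "Name": "redshift-target",         # hypothetical connection name
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                "JDBC_CONNECTION_URL": jdbc_url,
                "SECRET_ID": secret_id,        # credentials kept in Secrets Manager
            },
        }
    }


if __name__ == "__main__":
    trigger = build_glue_hourly_trigger_kwargs(["rds-to-redshift", "mongo-to-redshift"])
    conn = build_glue_redshift_connection_kwargs(
        "jdbc:redshift://example-cluster:5439/dev", "example-redshift-secret")
    print(trigger["Schedule"], conn["ConnectionInput"]["ConnectionType"])
```

Both pieces are declarative Glue configuration, which is the point of the "least operational overhead" answer: no custom scheduler code to run or patch.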

Q18

A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling. Which solution will meet this requirement?

  • [ ] A. Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.
  • [x] B. Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.
  • [ ] C. Turn on concurrency scaling in the settings during the creation of any new Redshift cluster.
  • [ ] D. Turn on concurrency scaling for the daily usage quota for the Redshift cluster.

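Description

  • A company runs Redshift on RA3 nodes.
  • It wants to scale read and write capacity to meet demand.
  • Which approach turns on concurrency scaling?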

Analysis

  • Know Redshift's WLM (workload management) feature: concurrency scaling is turned on per WLM queue.

Concurrency scaling in Amazon Redshift allows the cluster to automatically add and remove compute resources in response to workload demands. Enabling concurrency scaling at the workload management (WLM) queue level allows you to specify which queues can benefit from concurrency scaling based on the query workload.
https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling-queues.html

You route queries to concurrency scaling clusters by enabling a workload manager (WLM) queue as a concurrency scaling queue. To turn on concurrency scaling for a queue, set the Concurrency Scaling mode value to auto.
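As a sketch of what "concurrency scaling mode = auto" looks like outside the console, the function below builds a manual WLM JSON configuration in which only the first queue has `concurrency_scaling` set to `"auto"`. The `dashboard` query group and the concurrency values are hypothetical. The resulting string would be supplied as the `wlm_json_configuration` parameter, for example through boto3's Redshift `modify_cluster_parameter_group`; that call is not made here.

```python
import json


def build_wlm_config(scaled_query_group: str) -> str:
    """Return a wlm_json_configuration value with one concurrency-scaling queue (sketch)."""
    queues = [
        {
            # Queries tagged with this (hypothetical) query group land here.
            "query_group": [scaled_query_group],
            "query_concurrency": 5,
            "concurrency_scaling": "auto",   # route eligible queued queries to scaling clusters
        },
        {
            # Last entry acts as the default queue; scaling left off.
            "query_concurrency": 5,
            "concurrency_scaling": "off",
        },
    ]
    return json.dumps(queues)


if __name__ == "__main__":
    print(build_wlm_config("dashboard"))
```

Setting the mode per queue is why option B is correct: it lets you decide which workloads may spill onto extra capacity instead of enabling it cluster-wide.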

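Conclusion

  • The comparison between AWS Database Migration Service (DMS) and AWS DataSync comes up fairly often on the exam.
    • DMS is for migrating databases: a full load, optionally followed by ongoing replication of changes (CDC).
    • DataSync, as the name suggests, is about syncing files and objects: each scheduled run compares source and destination and copies only what has changed.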

Series: 老闆,外帶一份 AWS Certified Data Engineer ("Boss, one AWS Certified Data Engineer to go")