2024 iThome 鐵人賽 DAY 30
Previously, we prepared using the 32 free questions from ExamTopics.
Next, you can also register for AWS Skill Builder to watch the free courses and work through the free practice questions.
(Although, in my experience, those questions are unlikely to actually appear on the exam.)

Questions

### Q1

A data engineer at an analytics company is building a consumer for a Kinesis Data Streams application. They have written the consumer using the Kinesis Client Library (KCL); however, they are currently receiving an ExpiredIteratorException when reading records from Kinesis Data Streams. What would you recommend to the engineer to solve the issue?

  • [ ] A. Change the capacity mode of the Kinesis Data Stream to on-demand.
  • [x] B. Increase WCU in DynamoDB checkpointing table.
  • [ ] C. Increase the amount of shards in Kinesis Data Streams.
  • [ ] D. Increase RCU in DynamoDB checkpointing table.

With the KCL, an ExpiredIteratorException typically occurs because checkpoint writes to the DynamoDB lease table are being throttled, so the fix is to raise that table's WCU limit.

  • Read capacity unit (RCU): each API call that reads data from the table counts as one read request. Read requests can be strongly consistent, eventually consistent, or transactional. For items up to 4 KB, one RCU performs one strongly consistent read request per second; items larger than 4 KB require additional RCUs. For items up to 4 KB, one RCU performs two eventually consistent read requests per second. Two RCUs are needed to perform one transactional read request per second for items up to 4 KB. For example, a strongly consistent read of an 8 KB item requires two RCUs, an eventually consistent read of an 8 KB item requires one RCU, and a transactional read of an 8 KB item requires four RCUs. See Read Consistency for more details.
  • Write capacity unit (WCU): each API call that writes data to the table counts as one write request. For items up to 1 KB, one WCU performs one standard write request per second; items larger than 1 KB require additional WCUs. Two WCUs are needed to perform one transactional write request per second for items up to 1 KB. For example, a standard write of a 1 KB item requires one WCU, a standard write of a 3 KB item requires three WCUs, and a transactional write of a 3 KB item requires six WCUs.
  • Replicated write capacity unit (rWCU): with DynamoDB global tables, data is automatically written to multiple AWS Regions of your choice. Each write occurs in the local Region as well as in the replicated Regions.
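If the lease table uses provisioned capacity, the WCU limit can be raised with an UpdateTable call. A minimal boto3 sketch, assuming a hypothetical table named my-kcl-app (the KCL names the lease table after your KCL application):

```python
import boto3

# Minimal sketch: raise the provisioned write capacity of the KCL
# checkpoint (lease) table so checkpoint writes stop throttling.
# "my-kcl-app" is a hypothetical table name.
dynamodb = boto3.client("dynamodb")
dynamodb.update_table(
    TableName="my-kcl-app",
    ProvisionedThroughput={
        "ReadCapacityUnits": 10,   # left unchanged
        "WriteCapacityUnits": 50,  # raised from the previous setting
    },
)
```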

### Q2

You are building a pipeline to process, analyze, and classify images. Your datasets contain images that you need to preprocess as a first step by resizing them and enhancing their contrast. Which AWS service should you consider using to preprocess the datasets?

  • [ ] A. AWS Step Functions
  • [ ] B. AWS Data Pipeline
  • [x] C. AWS SageMaker Data Wrangler
  • [ ] D. AWS Glue DataBrew

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With Amazon SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete every step of the data preparation workflow, including data selection, cleansing, exploration, visualization, and processing at scale, from a single visual interface. You can use SQL to select the data you want from a variety of data sources and import it quickly. You can then use the data quality and insights report to automatically verify data quality and detect anomalies such as duplicate rows and target leakage. Amazon SageMaker Data Wrangler contains more than 300 built-in data transformations, so you can quickly transform data without writing any code.

### Q3

A social media company currently stores 1 TB of data in S3 and wants to be able to analyze this data using SQL queries. Which method allows the company to do this with the LEAST effort?

  • [ ] A. Use AWS Kinesis Data Analytics to query the data from S3
  • [x] B. Use Athena to query the data from S3
  • [ ] C. Create an AWS Glue job to transfer the data to Redshift. Query the data from Redshift
  • [ ] D. Use Kinesis Data Firehose to transfer the data to RDS. Query the data from RDS

Pick the option with the fewest steps and the least effort: the data is already sitting in S3, and Athena can query it in place.
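A minimal sketch of what "query it in place" looks like through the Athena API via boto3; the database, table, and results bucket below are hypothetical placeholders:

```python
import boto3

# Minimal sketch: run a SQL query directly against data in S3 with Athena.
# The database/table and the results bucket are hypothetical placeholders.
athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString=(
        "SELECT user_id, COUNT(*) AS posts "
        "FROM social_media.events GROUP BY user_id"
    ),
    QueryExecutionContext={"Database": "social_media"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution with this ID
```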

### Q4

A solutions architect at a retail company is working on an application that utilizes SQS for decoupling its components. However, when inspecting application logs, they see that SQS messages are often being processed multiple times. What would you advise the architect to do to fix the issue?

  • [ ] A. Decrease the visibility timeout for the messages
  • [ ] B. Change the application to use long polling with SQS
  • [x] C. Increase the visibility timeout for the messages
  • [ ] D. Change the application to use short polling with SQS

The retail company uses SQS queues to decouple its components, but finds that the same messages are being processed multiple times.
Raising the visibility timeout keeps a received message hidden from other consumers for longer, giving the original consumer enough time to finish processing and delete it before it becomes visible again.
To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents all consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours. For information about configuring visibility timeout for a queue using the console, see Configuring queue parameters using the Amazon SQS console.
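Applying the fix described in the docs above, a minimal boto3 sketch that raises the queue's visibility timeout; the queue URL is a hypothetical placeholder:

```python
import boto3

# Minimal sketch: give consumers more time to process and delete a message
# before SQS makes it visible (and deliverable) again.
sqs = boto3.client("sqs")
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue",
    Attributes={"VisibilityTimeout": "300"},  # seconds; the default is 30
)
```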

### Q5

Which of the following services are capable of reading from AWS Kinesis Data Streams (SELECT THREE)?

  • [x] A. Amazon Managed Service for Apache Flink
  • [ ] B. EFS
  • [ ] C. S3
  • [x] D. EC2
  • [x] E. EMR

The services currently able to process data from Kinesis Data Streams are: Amazon Managed Service for Apache Flink, Spark on Amazon EMR, EC2, Lambda, Kinesis Data Firehose, and the Kinesis Client Library. S3 and EFS can be output locations for the previously mentioned services; however, they cannot consume from Kinesis Data Streams directly, so an intermediary processing step/service is needed before data can land in S3 or EFS.
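For illustration, a minimal sketch of a custom consumer of the kind you might run on EC2, using boto3 directly; the stream name is hypothetical, and real consumers would typically use the KCL or Managed Service for Apache Flink instead:

```python
import boto3

# Minimal sketch: read records from one shard of a Kinesis Data Stream.
# "clickstream" is a hypothetical stream name.
kinesis = boto3.client("kinesis")
shards = kinesis.describe_stream(StreamName="clickstream")
shard_id = shards["StreamDescription"]["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName="clickstream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
)["ShardIterator"]

resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
for record in resp["Records"]:
    print(record["Data"])  # raw bytes; decode/deserialize as needed
```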

### Q6

A company has a daily ETL process that processes transactions from a production database. The process is not time sensitive and can run at any point of the day. The company is currently migrating the ETL job to an AWS Glue Spark job. As a Certified Data Engineer, what would be the most cost-efficient way to configure the Glue ETL job?

  • [ ] A. Set the Glue version to 2.0
  • [x] B. Set the execution class of the Glue job to FLEX
  • [ ] C. Set the execution class of the Glue job to STANDARD
  • [ ] D. Set the Glue job to use Spot instances

Glue is serverless, so you cannot run it on Spot instances. The FLEX execution class runs jobs on spare capacity at a lower price, which fits non-urgent, non-time-sensitive workloads like this one.
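A minimal sketch of setting the execution class when creating the job via boto3; the job name, IAM role, and script location are hypothetical:

```python
import boto3

# Minimal sketch: create a Glue Spark job with the FLEX execution class,
# which runs on spare capacity at a lower price for non-urgent jobs.
glue = boto3.client("glue")
glue.create_job(
    Name="daily-transactions-etl",                      # hypothetical
    Role="arn:aws:iam::123456789012:role/GlueJobRole",  # hypothetical
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-etl-scripts/daily_transactions.py",
    },
    GlueVersion="4.0",
    ExecutionClass="FLEX",  # STANDARD is the default; FLEX trades startup latency for cost
    WorkerType="G.1X",
    NumberOfWorkers=10,
)
```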

### Q7

Which of the following statements are CORRECT regarding AWS SQS (SELECT TWO)?

  • [x] A. Message size is limited to 256KB
  • [ ] B. Messages cannot be duplicated
  • [ ] C. Messages will be delivered in order
  • [x] D. Messages can be duplicated

Standard SQS queues provide at-least-once delivery (so duplicates are possible) and only best-effort ordering, and the maximum message size is 256 KB.
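A minimal sketch illustrating the size limit in practice; the queue URL is hypothetical, and a body one byte over 256 KB is rejected:

```python
import boto3
from botocore.exceptions import ClientError

# Minimal sketch: SQS rejects message bodies larger than 256 KB.
sqs = boto3.client("sqs")
body = "x" * (256 * 1024 + 1)  # one byte over the limit
try:
    sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/demo-queue",
        MessageBody=body,
    )
except ClientError as err:
    print(err)  # the send fails with a message-size error
```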

### Q8

A data engineer at a company is tasked with designing a data integration and transformation solution for the organization's data lake in AWS. The goal is to ensure efficient, automated, and scalable data ingestion and transformation workflows. Which AWS service is best suited for achieving this, offering capabilities for data cataloging, ETL job orchestration, and serverless execution?

  • [x] A. AWS Glue
  • [ ] B. AWS Lambda
  • [ ] C. AWS Data Pipeline
  • [ ] D. AWS Step Functions

The answer is A: AWS Glue runs serverless and provides data cataloging, built-in ETL, and job orchestration. AWS Step Functions is great for orchestrating serverless workflows, but it does not offer the full range of ETL and data-cataloging capabilities required for a data lake integration and transformation solution.
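To make the cataloging point concrete, a minimal boto3 sketch that crawls an S3 prefix into the Glue Data Catalog; the crawler name, role, database, and path are hypothetical:

```python
import boto3

# Minimal sketch: a Glue crawler populates the Data Catalog from S3 so that
# downstream Glue ETL jobs (and Athena) can query the data by table name.
glue = boto3.client("glue")
glue.create_crawler(
    Name="datalake-raw-crawler",                            # hypothetical
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical
    DatabaseName="datalake_raw",
    Targets={"S3Targets": [{"Path": "s3://my-datalake/raw/"}]},
)
glue.start_crawler(Name="datalake-raw-crawler")
```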

### Q9

A data engineer at an analytics company has been tasked with migrating their Apache Cassandra database to AWS. Which AWS service should the engineer use to migrate the database to AWS with the LEAST amount of operational overhead?

  • [ ] A. DocumentDB
  • [ ] B. Amazon Neptune
  • [x] C. Amazon Keyspaces
  • [ ] D. Amazon RDS

A memorization question. Amazon Keyspaces is managed Apache Cassandra, an open-source distributed NoSQL database originally developed at Meta (Facebook) to improve search over simple-format data in its inbox/email system.
  • Amazon Neptune: Managed Graph Database - AWS
  • JSON Document Database - Amazon DocumentDB - AWS

### Q10

A data analyst at a social media company wants to create a new Redshift table from a query. What would you recommend to the analyst?

  • [x] A. Use the SELECT INTO command on Redshift to query data and create a table from the results of the query
  • [ ] B. Use the CREATE TABLE command on Redshift to create a table from a given query
  • [ ] C. Use the COPY command on Redshift to create a table from a given query
  • [ ] D. Use the SELECT command on Redshift and save the intermediary results to S3.
  • [ ] E. Use the COPY command to create a new table on Redshift from the S3 data

A concept question: it tests whether you know that Redshift supports SELECT INTO, which creates a new table from the results of a query.
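A minimal sketch of running SELECT INTO through the Redshift Data API; the cluster, database, user, and table names are hypothetical:

```python
import boto3

# Minimal sketch: SELECT INTO creates a new Redshift table from a query.
rsd = boto3.client("redshift-data")
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical
    Database="dev",
    DbUser="analyst",
    Sql=(
        "SELECT user_id, COUNT(*) AS posts "
        "INTO user_post_counts "
        "FROM events GROUP BY user_id"
    ),
)
```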

Conclusion

  • Some questions are debatable, and several options can look correct, but I still recommend actually working through the whole set yourself.
