2024 iThome 鐵人賽

DAY 18
Questions

Q7

A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application. Which solution will meet these requirements with the LEAST operational overhead?

  • [ ] A. Establish WebSocket connections to Amazon Redshift.
  • [x] B. Use the Amazon Redshift Data API.
  • [ ] C. Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.
  • [ ] D. Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.

Description

  • A financial services company stores financial data in Redshift (a data warehouse: relational data with its own compute).
  • A data engineer wants to run real-time queries on that data to support a web-based trading application.
  • The queries must run from within the trading application itself.
  • Find the approach that is easiest to implement and maintain.

Analysis

  1. Redshift provides the Data API precisely for queries like this; it does not support WebSocket connections (option A). See the sketch after this list.
  2. JDBC (option C) is overkill here: you would also have to manage drivers and persistent connections between the web-based application and Redshift.
  3. Dumping Redshift data into S3 first (option D) still requires another query tool such as Athena on top, so it is a detour as well.
  4. Recommended video: https://www.youtube.com/watch?v=q3TTNBYcDG4&ab_channel=AWSDevelopers
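
A minimal sketch of what calling the Redshift Data API from application code might look like with boto3; the cluster identifier, database, user, and table below are hypothetical placeholders. The Data API is asynchronous and HTTP-based: submit a statement, poll its status, fetch the result, with no JDBC drivers or long-lived connections to manage.

```python
import time

import boto3

client = boto3.client("redshift-data")

# Hypothetical identifiers -- replace with your own cluster, database, and user.
response = client.execute_statement(
    ClusterIdentifier="trading-cluster",
    Database="finance",
    DbUser="trading_app",
    Sql="SELECT symbol, price FROM quotes ORDER BY quote_time DESC LIMIT 10",
)
statement_id = response["Id"]

# ExecuteStatement returns immediately; poll until the statement completes.
while True:
    status = client.describe_statement(Id=statement_id)
    if status["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(0.5)

if status["Status"] == "FINISHED":
    result = client.get_statement_result(Id=statement_id)
    for record in result["Records"]:
        # Each field is a tagged union, e.g. {"stringValue": ...} or {"doubleValue": ...}.
        print([list(field.values())[0] for field in record])
```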

Q8

A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The company has several use cases. The company must implement permission controls to separate query processes and access to query history among users, teams, and applications that are in the same AWS account. Which solution will meet these requirements?

  • [ ] A. Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.
  • [x] B. Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an IAM policy that uses the tags to apply appropriate permissions to the workgroup.
  • [ ] C. Create an IAM role for each use case. Assign appropriate permissions to the role for each use case. Associate the role with Athena.
  • [ ] D. Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate individual IAM users for each use case. Apply the resource policy to the specific tables that Athena uses.

Description

  • A company uses Athena for one-time queries against data in S3.
  • Permission controls are needed so that different users, teams, and applications in the same account have separate query processes and query histories.

Analysis

  1. The "one-time queries" here just mean ad hoc queries over data that is not constantly changing. Option A only carves the data into separate buckets; a bucket policy controls access to the data itself and says nothing about separating query processes or query history.
  2. Option B is exactly the feature built for this: anyone who has played with Athena knows each workgroup isolates query execution and history, and tag-based IAM policies grant the right principals access to the right workgroup. A sketch follows this list.
  3. Option C looks reasonable, but a shared role cannot split query results per use case: after a query runs, each IAM user can either see everything that role queried or nothing at all.
  4. Option D is the Glue Data Catalog approach, which restricts access to tables, not to Athena query processes and history. https://aws.amazon.com/blogs/big-data/restrict-access-to-your-aws-glue-data-catalog-with-resource-level-iam-permissions-and-resource-based-policies/
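
A sketch of the workgroup-per-use-case setup in boto3; the workgroup name, tag, and result bucket are hypothetical. Each workgroup keeps its own query history and result location, and the tag is what the IAM policy keys on.

```python
import json

import boto3

athena = boto3.client("athena")

# Hypothetical workgroup, one per use case; the tag drives the IAM policy below.
athena.create_work_group(
    Name="team-analytics",
    Configuration={
        "ResultConfiguration": {
            "OutputLocation": "s3://example-bucket/athena-results/team-analytics/"
        }
    },
    Tags=[{"Key": "team", "Value": "analytics"}],
)

# Tag-based IAM policy: only principals granted this policy can run queries and
# read query history in workgroups tagged team=analytics.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:ListQueryExecutions",
            ],
            "Resource": "arn:aws:athena:*:*:workgroup/*",
            "Condition": {"StringEquals": {"aws:ResourceTag/team": "analytics"}},
        }
    ],
}
print(json.dumps(policy, indent=2))
```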

Q9

A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time. Which solution will run the Glue jobs in the MOST cost-effective way?

  • [x] A. Choose the FLEX execution class in the Glue job properties.
  • [ ] B. Use the Spot Instance type in Glue job properties.
  • [ ] C. Choose the STANDARD execution class in the Glue job properties.
  • [ ] D. Choose the latest version in the GlueVersion field in the Glue job properties.

Description

  • An engineer needs to schedule Glue jobs to run every day.
  • Nobody cares exactly when they start or finish.
  • What is the cheapest way to run them?

Analysis

  1. FLEX is the cheaper execution class: jobs run on spare capacity at a reduced rate, with no guarantee on start time, which fits this workload exactly. See the sketch below, and the post: https://aws.amazon.com/blogs/big-data/introducing-aws-glue-flex-jobs-cost-savings-on-etl-workloads/
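
A sketch of setting the execution class when defining a job with boto3; the job name, role ARN, and script location are hypothetical. FLEX requires Glue 3.0 or later and the glueetl command.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical job definition -- substitute your own role and script location.
glue.create_job(
    Name="daily-etl",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/daily_etl.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    # FLEX runs on spare capacity at a reduced rate; start time is not guaranteed.
    ExecutionClass="FLEX",
    WorkerType="G.1X",
    NumberOfWorkers=10,
)
```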

Q10

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?

  • [x] A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
  • [ ] B. Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
  • [ ] C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
  • [ ] D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.

Description

  • An engineer uses a Lambda function to convert file formats
  • from .csv to Apache Parquet
  • triggered whenever a .csv file is uploaded to S3
  • pick the solution with the least operational overhead

Analysis

  1. This tests whether you have heard of S3 Event Notifications. A configuration sketch follows this list.
  2. Option B listens for s3:ObjectTagging:*, which fires on tagging, clearly not on an upload (an object being created in the bucket).
  3. Option C listens for s3:*, so the slightest rustle in the bucket would trigger a conversion; obviously wrong.
  4. As for the trigger itself, just set the Lambda function's ARN as the notification destination; routing through an SNS topic (option D) adds an unnecessary hop.
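
A sketch of option A's configuration in boto3, with a hypothetical bucket and function ARN. Note the Lambda function must first grant s3.amazonaws.com permission to invoke it (via lambda add-permission); the suffix filter keeps non-.csv uploads from firing the function.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and function ARN.
s3.put_bucket_notification_configuration(
    Bucket="example-upload-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet",
                # Fire only when a new object is created...
                "Events": ["s3:ObjectCreated:*"],
                # ...and only when its key ends in .csv.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    },
)
```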

Conclusion

  • Apache Parquet is something I am not yet familiar with.
  • From what I have read online, it roughly "transposes" a relational table: data is laid out column by column (columnar storage) instead of row by row.
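
To make that concrete, here is a minimal sketch of what the Q10 Lambda handler body might look like, assuming pandas and pyarrow are packaged with the function (for example as a Lambda layer); the output key convention is my own. Because the output ends in .parquet, it does not match the .csv suffix filter and cannot re-trigger the function.

```python
import io
import urllib.parse

import boto3
import pandas as pd  # pandas + pyarrow must be packaged, e.g. as a Lambda layer

s3 = boto3.client("s3")


def handler(event, context):
    # The S3 event notification carries the bucket and key of the new .csv object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Read the CSV and rewrite it in columnar Parquet form.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_csv(io.BytesIO(body))

    buf = io.BytesIO()
    df.to_parquet(buf, engine="pyarrow", index=False)
    s3.put_object(
        Bucket=bucket,
        Key=key.rsplit(".", 1)[0] + ".parquet",
        Body=buf.getvalue(),
    )
```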
