iT邦幫忙

2024 iThome Ironman Contest

DAY 5
Questions

Q1

A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint. The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket. Which solution will meet this requirement?

  • [ ] A. Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC gateway endpoint.
  • [ ] B. Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access the S3 bucket.
  • [ ] C. Review the AWS Glue job code to ensure that the AWS Glue connection details include a fully qualified domain name.
  • [x] D. Verify that the VPC's route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint.

Summary

  • A data engineer is configuring an AWS Glue job to read data from S3.
  • The AWS Glue connection and IAM role are already set up, but running the Glue job fails with an error saying "there are problems with the Amazon S3 VPC gateway endpoint".

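Analysis

  1. First, you need to know three terms:
    • AWS Glue: a serverless data integration (ETL) service that makes it easy for data scientists to gather and consolidate data.
    • IAM role: an IAM role is used to grant permissions to operate AWS services. In this scenario, the Glue service must be allowed to access the data in the S3 bucket, so a role is assigned to Glue that permits it to touch the bucket.
    • VPC gateway endpoint: as the name suggests, it is a passage that lets devices or network interfaces inside a VPC (Virtual Private Cloud) reach specific AWS services, namely S3 or DynamoDB, over AWS's internal network. It is typically used to avoid routing traffic out over the Internet.
  2. The error message already says the problem lies with the VPC endpoint. Once that private passage has been built, the network interfaces in the subnet where Glue runs still need to know how to reach S3 through it, so the thing to check is the route table used by that subnet.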
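
The route table check described above can be sketched in Python. This is a minimal sketch, not a definitive tool: it assumes the route table is given as a dict in the shape returned by the EC2 `describe_route_tables` API, where a gateway endpoint route carries a `DestinationPrefixListId` starting with `pl-` and a `GatewayId` starting with `vpce-`. The helper names and all sample IDs are hypothetical. The check cannot tell an S3 endpoint from a DynamoDB one without also looking up the prefix list.

```python
from typing import Any


def gateway_endpoint_routes(route_table: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the routes in a route table that target a VPC gateway endpoint."""
    return [
        route
        for route in route_table.get("Routes", [])
        if route.get("GatewayId", "").startswith("vpce-")
        and route.get("DestinationPrefixListId", "").startswith("pl-")
    ]


def has_active_gateway_endpoint_route(route_table: dict[str, Any]) -> bool:
    """True if at least one gateway endpoint route is in the 'active' state."""
    return any(
        route.get("State") == "active"
        for route in gateway_endpoint_routes(route_table)
    )


# Hypothetical sample data (all IDs are made up for illustration).
table_with_endpoint = {
    "RouteTableId": "rtb-0example1",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationPrefixListId": "pl-0example", "GatewayId": "vpce-0example", "State": "active"},
    ],
}
table_without_endpoint = {
    "RouteTableId": "rtb-0example2",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    ],
}

for table in (table_with_endpoint, table_without_endpoint):
    status = "ok" if has_active_gateway_endpoint_route(table) else "missing endpoint route"
    print(table["RouteTableId"], status)
```

In practice the input would come from something like `boto3.client("ec2").describe_route_tables(...)`, filtered to the route table associated with the subnet that the Glue connection uses.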

Q2

A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company's data analysts can access data only for customers who are within the same country as the analysts. Which solution will meet these requirements with the LEAST operational effort?

  • [ ] A. Create a separate table for each country's customer data. Provide access to each analyst based on the country that the analyst serves.
  • [x] B. Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies.
  • [ ] C. Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.
  • [ ] D. Load the data into Amazon Redshift. Create a view for each country. Create separate IAM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.

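Summary

  • A multinational retailer has an S3 bucket that stores customer data.
  • Rules for personal data differ by country and region, so the cross-border data teams must be prevented from digging into the data of customers in other countries.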

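Analysis

  1. This tests whether you know AWS Lake Formation, whose row-level security can restrict each analyst to the rows for their own country without splitting the data into per-country tables, Regions, or views.
  2. https://aws.amazon.com/tw/lake-formation/features/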
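
To make the row-level security idea concrete, here is a minimal sketch that builds the `TableData` payload for a Lake Formation data cells filter restricting rows to one country. The database name `customer_hub`, table name `customers`, column name `country`, the account ID, and the helper function itself are all assumptions for illustration. The source does not describe the table schema.

```python
from typing import Any


def country_filter_table_data(
    catalog_id: str,
    database: str,
    table: str,
    country_code: str,
    country_column: str = "country",  # assumed column name
) -> dict[str, Any]:
    """Build a data cells filter definition that keeps only one country's rows."""
    # Accept only two-letter codes so nothing unexpected ends up in the expression.
    if not (len(country_code) == 2 and country_code.isalpha()):
        raise ValueError("country_code must be a two-letter code such as 'TW'")
    code = country_code.upper()
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": f"{table}_{code.lower()}_only",
        # Row-level filter: analysts granted this filter see only matching rows.
        "RowFilter": {"FilterExpression": f"{country_column} = '{code}'"},
        # Keep every column; only the rows are restricted.
        "ColumnWildcard": {"ExcludedColumnNames": []},
    }


# Hypothetical values for illustration only.
tw_filter = country_filter_table_data("111122223333", "customer_hub", "customers", "tw")
print(tw_filter["Name"], "->", tw_filter["RowFilter"]["FilterExpression"])
```

A dict like this could be passed as `TableData` to `boto3.client("lakeformation").create_data_cells_filter(...)`, after which `SELECT` on the filter is granted to the analysts for that country. One filter per country replaces the per-country tables or views proposed in the other options.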

Q3

A media company wants to improve a system that recommends media content to customer based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform. The company wants to minimize the effort and time required to incorporate third-party datasets. Which solution will meet these requirements with the LEAST operational overhead?

  • [x] A. Use API calls to access and integrate third-party datasets from AWS Data Exchange.
  • [ ] B. Use API calls to access and integrate third-party datasets from AWS DataSync.
  • [ ] C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.
  • [ ] D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).

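Summary

  • The data comes from third parties rather than from the company's own platform.
  • No elaborate setup: the integration should be as simple as possible.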

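Analysis

  1. This tests two terms: AWS Data Exchange is a service for finding, subscribing to, and using third-party datasets; AWS DataSync is a data transfer service for moving or replicating your own data between storage systems (for example, from on-premises storage into AWS).
  2. Amazon Kinesis Data Streams is a service designed for streaming data. It has nothing to do with Git repositories (CodeCommit) or Docker image registries (ECR).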
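
As a rough illustration of what "use API calls" might look like with AWS Data Exchange, the sketch below builds the request for an export job that copies a subscribed dataset revision into an S3 bucket. It assumes the company already holds a subscription. The dataset ID, revision ID, bucket name, and the helper function are hypothetical, and the key pattern is only an example layout.

```python
from typing import Any


def export_revision_job_request(
    data_set_id: str,
    revision_id: str,
    bucket: str,
    key_pattern: str = "${Revision.CreatedAt}/${Asset.Name}",  # example layout
) -> dict[str, Any]:
    """Build keyword arguments for a Data Exchange export-revisions-to-S3 job."""
    return {
        "Type": "EXPORT_REVISIONS_TO_S3",
        "Details": {
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                "RevisionDestinations": [
                    {
                        "RevisionId": revision_id,
                        "Bucket": bucket,
                        "KeyPattern": key_pattern,
                    }
                ],
            }
        },
    }


# Hypothetical identifiers for illustration only.
request = export_revision_job_request(
    data_set_id="example-data-set-id",
    revision_id="example-revision-id",
    bucket="example-analytics-bucket",
)
print(request["Type"], "->", request["Details"]["ExportRevisionsToS3"]["RevisionDestinations"][0]["Bucket"])
```

With boto3, such a request would typically be submitted with `client = boto3.client("dataexchange")`, then `job = client.create_job(**request)` followed by `client.start_job(JobId=job["Id"])`. Once the files land in S3, the existing analytics platform can read them like any other data.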

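Conclusion

  • Use the question bank to see which areas of knowledge need strengthening.
  • IAM roles, VPC endpoints, and the like belong to the commonly used parts of AWS.
  • For the other product-specific terms, a passing familiarity is enough.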
