A financial company wants to implement a data mesh. The data mesh must support centralized data governance, data analysis, and data access control. The company has decided to use AWS Glue for data catalogs and extract, transform, and load (ETL) operations. Which combination of AWS services will implement a data mesh? (Choose two.)
[ ] A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data analysis.
[x] B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.
[ ] C. Use AWS Glue DataBrew for centralized data governance and access control.
[ ] D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.
[x] E. Use AWS Lake Formation for centralized data governance and access control.
A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions. The data engineer requires a less manual way to update the Lambda functions. Which solution will meet this requirement?
[ ] A. Store a pointer to the custom Python scripts in the execution context object in a shared Amazon S3 bucket.
[x] B. Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.
[ ] C. Store a pointer to the custom Python scripts in environment variables in a shared Amazon S3 bucket.
[ ] D. Assign the same alias to each Lambda function. Call each Lambda function by specifying the function's alias.
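As a rough illustration of option B, the sketch below builds a layer archive in memory. For Python runtimes, Lambda extracts layer contents under /opt and adds /opt/python to the import path, so the shared module has to sit under a top-level `python/` directory. The module name `data_formatter` and the `format_record` function are hypothetical stand-ins for the custom scripts in the question.

```python
import io
import zipfile

# Hypothetical shared module; stands in for the custom formatting scripts.
FORMATTER_SOURCE = '''\
def format_record(record):
    """Trim whitespace and lowercase the keys of a flat dict."""
    return {key.strip().lower(): value for key, value in record.items()}
'''


def build_layer_zip(module_name: str, source: str) -> bytes:
    """Return the bytes of a layer archive holding one Python module.

    The module is written under "python/" so that a Python runtime
    can import it once the layer is attached to a function.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"python/{module_name}.py", source)
    return buffer.getvalue()


layer_bytes = build_layer_zip("data_formatter", FORMATTER_SOURCE)

# List the archive contents to confirm the expected layout.
with zipfile.ZipFile(io.BytesIO(layer_bytes)) as archive:
    print(archive.namelist())  # ['python/data_formatter.py']
```

The archive would then be published as a layer version (for example with `aws lambda publish-layer-version`) and attached to each function, which could `import data_formatter` without bundling its own copy. Functions reference a specific layer version, so a script change still means publishing a new version and pointing the functions at it, but the code itself lives in one place.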
A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline. Which AWS service or feature will meet these requirements MOST cost-effectively?
[ ] A. AWS Step Functions
[x] B. AWS Glue workflows
[ ] C. AWS Glue Studio
[ ] D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
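Description
A company has built a data pipeline that runs on AWS Glue.
A data engineer needs to crawl a table in Microsoft SQL Server.
The ETL output is loaded into an S3 bucket.
Which option is the most cost-effective?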
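Explanation
Option A, Step Functions, is a serverless workflow orchestration service for microservices. Its main use is to trigger AWS services and Lambda functions, receive the results returned by the triggered functions, and start different tasks depending on the resulting state.
In principle it can chain AWS services together, but connecting it to MS SQL Server takes extra work (code has to be written).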
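To make the Step Functions behavior described above concrete, here is a minimal sketch of a state machine definition in Amazon States Language, assembled as a Python dict and serialized to JSON. It invokes one Lambda function and branches on the value the function returns. The ARN, the state names, and the assumption that the function returns a payload like `{"status": "READY"}` are all hypothetical.

```python
import json

# Placeholder ARN: the account ID, region, and function name are hypothetical.
CHECK_FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-source"

state_machine = {
    "Comment": "Invoke a function, then branch on the status it returns.",
    "StartAt": "CheckSource",
    "States": {
        # Task state: calls the Lambda function and passes its result on.
        "CheckSource": {
            "Type": "Task",
            "Resource": CHECK_FUNCTION_ARN,
            "Next": "RouteOnStatus",
        },
        # Choice state: picks the next state from the returned "status".
        "RouteOnStatus": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.status",
                    "StringEquals": "READY",
                    "Next": "RunLoad",
                }
            ],
            "Default": "SkipLoad",
        },
        # Stand-ins for the real follow-up tasks.
        "RunLoad": {"Type": "Pass", "End": True},
        "SkipLoad": {"Type": "Succeed"},
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```

Note that the connection to SQL Server would still have to live inside the Lambda function or a Glue job that the state machine calls, which is the extra work the explanation refers to.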