I have over 6 years of experience in data engineering and machine learning, mainly in NLP, LLM training and application, and the design and development of data architectures for streaming and batch ETL/ELT. I am eager to learn new knowledge and skills and to share them, so that I can make greater contributions to the Big Data and ML fields in the future. My tech blog: https://marssu.coderbridge.io/
Professional Skills:
• Programming Languages: Python, SQL, Golang, Java
• Cloud Service Platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure
• Big Data Tools: Apache Spark, Apache Airflow, Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Delta Lake, Databricks, dbt
• Databases: MySQL, MariaDB, PostgreSQL, MongoDB, Cassandra
• Data Warehouses: ClickHouse, Apache Druid
• Data Visualization: Redash, Apache Superset, Grafana
• ML Frameworks: TensorFlow 2, PyTorch, PyTorch Lightning
• LLM: LangChain, OpenAI, GPTCache, Chroma, PGVector
• Backend: FastAPI, Flask
• Infrastructure as Code: Packer, Terraform, AWS CDK, AWS CloudFormation
• Containers: Docker, Kubernetes (K8s)
• Others: Vector.dev, OpenTelemetry, Git, GitHub Actions, Jenkins, dbdocs, Label Studio, Datasaur, Prometheus, Grafana, Weights & Biases (W&B), Jira
Talks, Publications, and Awards:
• 2024 iThome SRE Conference Keynote Speaker
• 2023 Sciwork Conference Program Committee, Reviewer and Speaker
• 2022 PyCon APAC Speaker
• 2022 Published a Chinese-language book on Apache NiFi data pipelines
• 2021 iThome AI&Data Contest - Champion