We are looking for a highly skilled and experienced Data Engineer with expertise in Python and the AWS ecosystem to join our dynamic team. In this role, you will design, build, and maintain scalable data pipelines and manage large datasets within AWS infrastructure. You will work closely with data scientists, analysts, and other stakeholders to ensure the availability of clean, reliable, and high-performing data systems to support our analytical and operational needs.

Key Responsibilities:

  • Design and Build Data Pipelines: Develop robust and scalable data pipelines using Python, AWS services, and other technologies to move and process large datasets efficiently. (Java experience is also a plus.)
  • Data Integration and ETL Processes: Implement ETL (Extract, Transform, Load) processes to aggregate data from multiple sources into data lakes and data warehouses.
  • AWS Services Management: Utilize AWS services such as S3, Lambda, EC2, Glue, Redshift, RDS, Athena, and Kinesis to manage and optimize data processing workflows, demonstrating comprehensive, hands-on expertise across the services the organization's pipelines depend on, including:
  • AWS Glue: Design, develop, and maintain Glue jobs for data extraction, transformation, and loading (ETL) tasks. Optimize Glue job performance and reliability.
  • AWS Lambda: Architect and implement serverless data processing functions using Lambda, integrating them with other AWS services as needed.
  • AWS API Gateway: Build and manage secure, scalable, and RESTful APIs using API Gateway to expose data and functionalities.
  • Amazon S3: Utilize S3 for reliable and scalable data storage, including the ingestion of JSON files and integration with downstream services (a minimal sketch of this ingestion pattern follows the responsibilities list).
  • Amazon EC2: Provision and manage EC2 instances as needed to support the data processing infrastructure, such as running custom applications or triggering Lambda functions.
  • AWS IAM: Establish appropriate IAM roles, policies, and permissions to ensure secure access to AWS resources and data.
  • Snowflake Integration: Integrate data processing workflows with Snowflake, leveraging features such as Snowpipe to load data into the data warehouse.
  • Design and implement end-to-end data processing pipelines, seamlessly integrating the various AWS services.
  • Ensure data integrity, security, and compliance throughout the data processing lifecycle.
  • Optimize the performance, cost-effectiveness, and scalability of the data processing infrastructure.
  • Automate and streamline the deployment, monitoring, and maintenance of the AWS-based data processing environment.
  • Collaborate with cross-functional teams to understand requirements and deliver data-driven solutions.
  • Stay up-to-date with the latest AWS service updates, best practices, and industry trends to continuously improve the data processing capabilities.
  • Automation and Monitoring: Automate data ingestion, transformation, and quality checks, ensuring data availability and integrity across platforms. Monitor the health of data pipelines and optimize them for performance and cost.
  • Data Storage and Retrieval: Design and manage cloud-based storage solutions, including data lakes and warehouses, ensuring easy retrieval and accessibility of data.
  • Collaboration with Data Teams: Work closely with data scientists, analysts, and business teams to understand their data requirements and design solutions that meet their needs.
  • Data Quality and Consistency: Implement data quality checks and validation mechanisms to ensure the accuracy, completeness, and consistency of data across various systems.
  • Documentation and Reporting: Maintain comprehensive documentation on data systems, pipeline workflows, and solutions. Provide insights and recommendations based on data analysis.
  • Stay Current with Trends: Keep up-to-date with the latest technologies, tools, and best practices in data engineering, cloud computing, and the AWS ecosystem.
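
Several of the items above (Lambda functions, S3 JSON ingestion, staging data for Snowpipe or Glue) describe one common pattern: a JSON file lands in S3, a Lambda function is triggered, and cleaned output is written to a staging location for downstream loading. The sketch below illustrates that pattern only; the bucket layout, the staging/ prefix, and the field-filtering step are hypothetical placeholders, not this role's actual pipeline.

```python
import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    """Triggered by an S3 PUT event; reads the new JSON object and applies
    a trivial cleanup before writing it to a staging prefix."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 event keys are URL-encoded, so decode before use.
    key = unquote_plus(record["object"]["key"])

    # Read and parse the newly landed JSON file.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    payload = json.loads(body)

    # Placeholder transformation: drop null fields.
    cleaned = {k: v for k, v in payload.items() if v is not None}

    # Hand off to the next stage, e.g. a prefix that Snowpipe or a Glue job
    # picks up. The "staging/" prefix is hypothetical; in practice the S3
    # trigger should be scoped to the incoming prefix so this write does not
    # re-invoke the function.
    s3.put_object(
        Bucket=bucket,
        Key=f"staging/{key}",
        Body=json.dumps(cleaned).encode("utf-8"),
    )
    return {"status": "ok", "source_key": key}
```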

Required Skills and Qualifications:

  • 5+ years of experience as a Data Engineer or in a similar role, with expertise in Python and the AWS ecosystem.
  • Proficiency in Python: Strong experience in writing clean, efficient, and scalable Python code for data processing, automation, and pipeline development.
  • AWS Ecosystem Expertise: Deep understanding and hands-on experience with key AWS services, including S3, Lambda, Glue, EC2, Redshift, RDS, Athena, Kinesis, API Gateway, and EMR clusters.
  • Data Pipeline Development: Proven experience in designing and building scalable and reliable data pipelines, ETL processes, and automating data workflows.
  • Database Management: Experience working with relational (SQL) and NoSQL databases. Familiarity with AWS Redshift, RDS, DynamoDB, and other database systems.
  • Data Warehousing: Experience with cloud-based data warehousing solutions such as AWS Redshift or similar.
  • Snowflake: Deep understanding of the Snowflake cloud data warehouse platform, including its features, capabilities, and best practices for data loading, transformation, and querying.
  • Big Data Tools: Familiarity with big data technologies like Apache Spark, Hadoop, and Kafka is a plus.
  • Data Integration: Experience integrating data from multiple sources, including APIs, third-party services, and on-premise data systems.
  • Data Quality Assurance: Experience in ensuring data quality, consistency, and security across data pipelines (a small validation sketch follows this list).
  • Version Control: Proficient in using Git or similar version control systems for code management.
  • CI/CD: Familiar with Continuous Integration and Continuous Deployment practices and tools.
  • Problem-Solving: Strong analytical and troubleshooting skills, with the ability to quickly resolve data issues and optimize pipeline performance.
  • Collaboration Skills: Excellent communication and interpersonal skills with the ability to work effectively across teams and stakeholders.
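
As a minimal illustration of the data-quality expectation above, the sketch below shows the kind of lightweight completeness check a pipeline might run before loading records into Redshift or Snowflake; the field names and validation rules are hypothetical.

```python
from typing import Iterable

REQUIRED_FIELDS = {"id", "event_time", "amount"}  # hypothetical schema


def validate_records(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (valid, rejected) using simple completeness and
    type checks before they are loaded into the warehouse."""
    valid, rejected = [], []
    for rec in records:
        missing = REQUIRED_FIELDS - rec.keys()
        if missing or not isinstance(rec.get("amount"), (int, float)):
            rejected.append({"record": rec, "missing": sorted(missing)})
        else:
            valid.append(rec)
    return valid, rejected


if __name__ == "__main__":
    rows = [
        {"id": 1, "event_time": "2025-01-09T00:00:00Z", "amount": 12.5},
        {"id": 2, "event_time": "2025-01-09T00:01:00Z"},  # missing amount
    ]
    ok, bad = validate_records(rows)
    print(f"{len(ok)} valid, {len(bad)} rejected")
```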

Preferred Skills:

  • Containerization: Experience with Docker or Kubernetes for containerizing data applications and pipelines.
  • DevOps Practices: Familiarity with infrastructure-as-code tools like Terraform, CloudFormation, and AWS CDK.
  • Data Visualization: Experience with data visualization tools (e.g., Tableau, Power BI) for reporting and presenting data insights.
  • Machine Learning: Familiarity with machine learning concepts or frameworks for integrating data pipelines into ML workflows.
  • Apache Airflow: Experience with workflow orchestration tools like Apache Airflow for managing complex data pipelines (see the DAG sketch after this list).
  • Client Communication: Experience managing project communication or working in client-facing roles.
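
For the Airflow item above, here is a deliberately minimal sketch of a two-task DAG, assuming Airflow 2.x; the DAG name, schedule, and task bodies are placeholders rather than a real pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Placeholder extract step, e.g. pull a file from S3."""
    print("extracting")


def load():
    """Placeholder load step, e.g. copy data into the warehouse."""
    print("loading")


# Minimal two-task DAG: extract runs before load, once per day.
with DAG(
    dag_id="example_ingest",        # hypothetical DAG name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load
```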

Education:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field (or equivalent work experience).

Why Join Us:

  • Innovative Projects: Work on challenging and impactful data engineering projects using the latest AWS tools and technologies.
  • Career Growth: Take advantage of continuous learning opportunities and career development in a fast-growing field.
  • Collaborative Culture: Join a team of talented professionals in a collaborative and supportive work environment.
  • Competitive Compensation: Enjoy a competitive salary and benefits package based on your experience.

Job Details:

  • Total Positions: 1 Post
  • Job Shift: Morning
  • Job Type:
  • Job Location: Johar Town, Lahore, Pakistan
  • Gender: No Preference
  • Minimum Education: Bachelor's (BS) only
  • Career Level: Experienced Professional
  • Experience: 5 Years - 8 Years
  • Apply Before: Feb 10, 2025
  • Posting Date: Jan 09, 2025

Mavric Technology

  • 51-100 employees - Lahore
