Based in Hyderabad, this full-time role is designed for an experienced infrastructure management professional with 6 to 8 years of experience, particularly in Linux systems and cloud platforms. The position focuses on designing, implementing, and optimizing both cloud and on-premises infrastructure to support AI/ML applications. The successful candidate will ensure that this infrastructure is highly available, scalable, and secure. Strong skills in Linux administration, AWS and Azure cloud services, infrastructure as code, and AI/ML infrastructure deployment are essential.
Key Responsibilities
Demonstrate in-depth knowledge of Linux distributions such as CentOS, Ubuntu, and Red Hat Enterprise Linux, including proficiency in shell scripting, package management, and routine system administration. Configure and optimize Linux servers for performance, security, and resource efficiency, covering areas such as kernel tuning, file system management, and network configuration.
Apply hands-on experience with AWS and Azure services, including EC2, S3, Lambda, RDS, Azure Virtual Machines, Blob Storage, and Azure Functions. Architect cloud solutions that follow best practices for scalability, reliability, and cost-effectiveness. Implement and manage hybrid and multi-cloud environments that integrate on-premises infrastructure with both AWS and Azure.
Develop and maintain Infrastructure as Code (IaC) templates using tools such as Terraform or AWS CloudFormation. Automate provisioning and configuration, and keep templates under version control so that every infrastructure change is traceable, auditable, and reproducible.
Set up cloud infrastructure stacks, databases, and service endpoints, and manage CPU/GPU resource scaling and optimization for AI/ML workloads. Deploy AI/ML applications using Docker and Kubernetes with a focus on scalability, high availability, and reliability. Manage GPU clusters dedicated to AI/ML training and inference, handling performance tuning and ongoing maintenance.
Apply AIOps and MLOps practices to streamline AI/ML operations and improve overall efficiency.
Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related discipline. A minimum of 6 years’ experience in infrastructure management roles, with a strong emphasis on cloud platforms, preferably AWS and Azure. Proficiency in scripting languages such as Python, PowerShell, or Bash is required. A solid understanding of DevSecOps principles and best practices is essential. Excellent communication and collaboration skills are necessary to work effectively within cross-functional teams.
Preferred Qualifications and Benefits
Relevant certifications such as AWS Certified Solutions Architect (Associate), AWS Certified Cloud Practitioner, Microsoft Certified: Azure DevOps Engineer Expert, Microsoft Certified: Azure Administrator Associate, or Certified Kubernetes Administrator (CKA) are highly desirable. The role offers the opportunity to work on impactful technical challenges with global reach.
Employees will have access to continuous learning resources, including online university courses and knowledge-sharing forums. Participation in sponsored tech talks and hackathons is encouraged to foster innovation and skill development.
A comprehensive benefits package is provided, featuring health insurance, retirement plans, flexible working hours, and more. The organization promotes a supportive work culture that encourages exploring personal passions alongside professional growth.
This position presents a unique opportunity to contribute to cutting-edge business solutions while advancing your career in a collaborative and innovative environment.