DevOps Engineer - Machine Learning
At CoMind, we are developing a non-invasive neuromonitoring technology that will result in a new era of clinical brain monitoring. In joining us, you will be helping to create cutting-edge technologies that will improve how we diagnose and treat brain disorders, ultimately improving and saving the lives of patients across the world.
The Role
CoMind is seeking a skilled DevOps Engineer to join our dynamic Research Data Science team to lead the orchestration of a robust ML training pipeline in AWS. This role is critical to enabling the scalable training and testing of a range of ML models on large volumes of a totally new form of clinical neuromonitoring data.
Responsibilities:
- Architect and implement a scalable solution that supports the Research Data Science team in running a large number of assorted machine learning pipelines, including model training, evaluation, and inference
- Create a CI/CD pipeline that builds containers from in-house Python packages, runs integration tests, and publishes images to AWS ECR
- Set up Amazon ECS tasks or AWS Batch jobs to run containers stored in AWS ECR
- Establish a robust configuration management system to store, version, and retrieve the configurations associated with multiple machine learning workflows
- Implement error handling and monitoring across the pipeline, with centralised logging and error reporting, to enable timely debugging
- Implement cost monitoring to track and manage compute costs across different runs, and build dashboards that provide insight into resource usage and cost optimization
- Integrate security and data protection into the pipelines by applying AWS best practices for security protocols and data management
- Monitor and manage the team's compute resources, including both cloud (AWS) and on-premises GPU nodes, ensuring efficient use and scalability
- Implement Infrastructure as Code (IaC) to set up and manage the pipeline architecture, using Terraform, AWS CloudFormation, or similar tools
Skills & Experience:
- Git for version control (including hosted platforms such as Bitbucket), with experience managing versioned infrastructure-as-code (IaC) repositories
- CI/CD pipelines for automating workflows, including integration testing and containerization pipelines
- Experience managing and orchestrating complex cloud workflows (e.g., ECS tasks, AWS Batch), with a focus on event-driven and parallel processing
- Infrastructure as Code (IaC) experience (e.g., Terraform, AWS CloudFormation) for creating, maintaining, and scaling cloud infrastructure
- Docker for containerization, including containerizing machine learning workflows and publishing images to registries such as AWS ECR
Benefits:
- Company equity plan
- Company pension scheme
- Private medical, dental and vision insurance
- Group life assurance
- Comprehensive mental health support and resources
- Unlimited holiday allowance (+ bank holidays)
- Hybrid working (3 days in-office)
- Quarterly work-from-anywhere policy
- Weekly lunches
- Breakfast and snacks provided
Tech Jobs London
© 2024 techjobslondon.co.uk