For one of our clients based in the Czech Republic, we are currently looking for a highly skilled Lead Site Reliability Engineer (SRE). This role involves ensuring the reliability and performance of services while collaborating with development teams. Key responsibilities include designing and implementing scalable infrastructure on Azure cloud, managing CI/CD pipelines, and ensuring system reliability. Development experience is necessary, ideally in C# and .NET, along with a working knowledge of SQL queries.
The role is not just about performance troubleshooting; it also calls for problem-solving and an understanding of the overall system.
The candidate should be able to find hotspots in the code and SQL queries and suggest practical solutions.
Details:
- Role: Lead Site Reliability Engineer (SRE)
- Location: Remote, with possible occasional in-person team sessions, workshops, or gatherings (about once per quarter), likely to take place in Prague
- Working overlap needed: 2 - 6 CET; flexibility for a wider overlap is appreciated
- Start: ASAP
- Duration: 6 months + extension
Scope and responsibilities:
- Design and maintain scalable infrastructure solutions with standard CI/CD processes to deliver microservices-based applications
- Optimize deployment processes and reduce release cycle times
- Proactively monitor systems and troubleshoot potential issues
- Analyze and integrate modern technologies where they offer opportunities to improve
- Performance engineering:
  - Conduct performance analysis and optimization, assist with performance troubleshooting efforts, and communicate findings to various teams
  - Collaborate with teams to identify and resolve performance bottlenecks in C# code and SQL queries
  - Engage in failover testing, infrastructure hardening, and performance troubleshooting
- Address system-wide issues by viewing the infrastructure and applications holistically
- Operate independently and confidently, building relationships and providing innovative solutions
- Migrate current application services to App Service plans, legacy apps to Azure Cloud, and IaaS to PaaS
- Knowledge sharing:
  - Educate team members on best practices, architectural patterns, and performance engineering
  - Share lessons learned, including frequent issues and how to overcome them
Requirements and skills:
- Experience working in agile, cross-functional teams
- Development Experience: Strong background in C# and SQL development
- DevOps Expertise: Comprehensive knowledge of Azure services and tools, including AppInsights, Azure CLI, PowerShell, and bash scripting
- Performance Engineering: Ability to identify performance issues, analyze hotspots in code, and suggest effective solutions
- Problem-Solving: Strong analytical skills to diagnose and resolve outages and performance bottlenecks
- Communication: Excellent ability to talk to various teams, understand their concerns, and provide actionable insights
- Independence: Capable of operating independently with loosely defined tasks, confident in executing tasks, and comfortable in building relationships
- Professional proficiency in English
- Nice-to-have Skills:
  - Experience with Python scripting
  - Knowledge of Databricks
  - Familiarity with DataDog and SonarQube
- Success Factors:
  - High level of confidence and independence in daily operations
  - Ability to execute tasks with minimal guidance
  - Strong relationship-building skills with team members and stakeholders
  - Proactive in sharing ideas and innovative solutions
  - Hands-on experience and practical knowledge in development and DevOps
- Additional Expectations:
  - Provide practical hands-on experience and context to team members
  - Share knowledge on patterns, architecture, and best practices
  - Not expected to solve everything but to help educate and guide the team
- Systems and tools:
  - Required:
    - C#, .NET Core, ASP.NET Web API, Entity Framework
    - SQL, SQL Profiler
    - Azure DevOps CI/CD pipelines
    - Azure tooling - Azure CLI, Azure Key Vault
    - PowerShell, bash
    - AppInsights
    - Docker, containers
    - k8s, Azure Kubernetes Service
    - Terraform
    - ARM
    - Git
  - Nice to have:
    - Python scripting, DataDog, SonarQube