Senior Site Reliability Engineer (Kubernetes) | (IVN-849)

21 Nov
El Prat de Llobregat

SITA FOR AIRCRAFT is the air travel industry’s trusted connected aircraft service expert. With its unrivalled industry-backed heritage, SITA FOR AIRCRAFT empowers 400+ airlines, 16,000+ aircraft and 30+ operators to navigate the complexity of connectivity and unlock connected aircraft value.

SITA FOR AIRCRAFT was born to drive new thinking and address challenges airlines face in ground and inflight connectivity. We provide cockpit data services, air traffic management solutions, aircraft communications and infrastructure solutions, alongside application development for both passengers and crew.

We are a fully owned subsidiary of SITA. In January 2015, SITA and OnAir formed SITAONAIR (now SITA FOR AIRCRAFT) as part of the SITA Group to help airlines realise the full potential of the connected aircraft.

At SITA FOR AIRCRAFT, we believe that creating and nurturing an inclusive culture is about who we are as an organisation and as an employer. Diversity is more than a target to us; it’s a key part of our collective identity and values.


As a Site Reliability Engineer, you will solve exciting technical challenges by analysing, troubleshooting, and designing vital services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, security, and performance.

As an SRE, you will understand the end-to-end configuration, technical dependencies, and overall behavioural characteristics of the production services you collaborate with. In partnership with your Development colleagues, you will have the responsibility to ensure that services are designed and delivered to be mission critical with a focus on security, resiliency, scale, and performance.

You'll help support 24x7 uptime and availability of mission-critical, customer-facing production cloud services distributed across multiple regions. You'll help create more consistent, automated, push-button environments across all tiers; proactively test and tune all aspects of the infrastructure; streamline CI/CD processes; monitor and respond to system notifications and alerts; and continually work to optimize and improve the performance, security, and reliability of our systems.

Your role will involve:

• Help build a Site Reliability Engineering culture across the organization by sharing your best practices, approaches, documentation, and code with other engineering teams.

• Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.

• Troubleshoot complicated, cross-platform issues spanning OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.

• Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation.

• Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability.

• Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and organizational efficiency.

• Work closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability.

• Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it.

• Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.

• Keep up to date with security, and proactively identify, diagnose, and solve complex security issues.


Ideally, you will have the following qualifications, knowledge, and experience:

• 9+ years of IT Operations experience in large-scale, mission-critical, heterogeneous enterprise infrastructure leveraging DevOps, SRE, and Agile methodologies, with a B.Tech./B.E. degree in Electronics & Telecomm or Computer Science.

• Demonstrated understanding of ITIL methodologies, with ITIL v3 or v4 certification.

• 5+ years of experience writing automation scripts and building application dashboards for proactive monitoring using Ruby, PowerShell, Python, or similar technologies; ability to debug and optimize code and automate routine tasks.

• 5+ years of experience supporting or managing hypervisor-based products and infrastructure (VMware, KVM, etc.).

• Experience operating applications and databases with demanding scalability or availability requirements

• Database experience, including knowledge of SQL and NoSQL databases such as MySQL, MongoDB, and PostgreSQL.

• Experience with CI/CD in cloud environments and with container technologies such as Docker, Kubernetes, and Docker Swarm.

• Experience as a Linux systems administrator (e.g. CentOS, Red Hat) and with command-line administration tools such as Bash, Vim, and SSH.

• Experience in monitoring and analysing infrastructure performance using standard performance monitoring tools such as Perfmon, PerfView, ProcDump, DebugDiag, Nagios, and New Relic.

• Extensive expertise in core infrastructure components: storage, systems, and/or networking.

• Strong understanding of TCP/IP networking, including familiarity with concepts such as the OSI model.

• Strong understanding of Internet protocols and applications such as SMTP, DNS, HTTP, SSH, and SNMP.

• Hands-on experience in configuration management of server farms (using tools such as Puppet, Chef, Ansible, etc.).


If you apply, we will carefully review your fit against the position criteria and give you feedback. If your profile does not meet the criteria, we will retain your profile as an active applicant for future consideration.
