Hardware Systems Engineer – RTP – Meta (formerly Facebook)
Software Engineer
Meta (formerly Facebook) (501+ Employees, 34% 2 Yr Employee Growth Rate)
16% 1-Year Employee Growth Rate | 34% 2-Year Employee Growth Rate | $16.1B Venture Funding
Meta is seeking an experienced Systems Engineer to join our RTP (Release to Production) Infrastructure Sustaining team in Dublin. This team is responsible for the hardware health of all of Facebook’s servers. Our servers and data centers are the foundation on which our rapidly scaling infrastructure operates, so hardware health is critical to delivering services.
Engineers on the Infrastructure Sustaining team work closely with Production Engineering, compute, storage, network, and datacenter operations teams. RTP Engineers use their hardware insights and software development expertise to maintain the hardware health of the entire fleet at a high level. This is a full-time position based in Dublin. The Dublin team is made up of several pods, and we are hiring an engineer to focus either on automation and tooling or on supporting the Meta CDN systems, including Cloud Gaming.
Hardware Systems Engineer – RTP Responsibilities:
- develop, modify and extend new and existing automation infrastructure to manage, monitor and support fleet health at scale
- proactively conduct investigations and create tooling to detect and diagnose hardware health issues
- identify, validate and implement remediations across software and hardware stack
- interface with outside vendors and internal hardware, mechanical, power, thermal and software engineers to understand and debug system architecture
- interface with internal service, infrastructure, capacity and tooling teams to understand software and service requirements and behaviour
- develop and publish updates on resolutions and communicate findings internally
- troubleshoot, diagnose and root-cause system failures, isolating the failing components and failure scenarios while working with internal and external stakeholders
- employ data visualisation techniques to highlight issues and implement systemic solutions to hardware health issues
Minimum Qualifications:
- OPTION 1: 5+ years of experience with computer hardware and systems
- OPTION 1: hands-on experience of debugging and resolving hardware and systems issues
- OPTION 1: facility with Python or equivalent development / scripting language
- OPTION 2: 5+ years of professional Python (or equivalent) experience
- OPTION 2: familiarity with server architecture and components
- troubleshooting and analytical skills
- strong communicator
Preferred Qualifications:
- strong experience working with Linux systems
- experience with server GPU solutions (AI/ML and/or Gaming)
- experience with HPE server solutions