Site Reliability Engineer
Dev Ops & SysAdmin
FireHydrant (11-50 Employees, 680% 2 Yr Employee Growth Rate)
179% 1-Year Employee Growth Rate | 680% 2-Year Employee Growth Rate | $9.5M Venture Funding
Overview
COVID Update – We are actively hiring for this position and have been operating as a fully distributed organization across the United States since FireHydrant started. We anticipate a future where we will be able to periodically gather as a company again, but we put the health and wellbeing of our employees (and their families) first.
About FireHydrant
FireHydrant helps companies recover from IT disasters more quickly. The FireHydrant platform includes Incident Response, Status Pages, Retrospectives, and Deploy Events so you can take control of your complex system, reduce downtime, and work better together. We’re a Series A company with around 35 employees across the United States, and we’re growing.
About the Role
This is an exciting opportunity to work across our engineering team and help all of us level up by teaching us, and building with us, excellent tooling and best practices around monitoring, observability and reliability. You’ll collaborate with engineers across our growing team and find ways to help all of us be more successful at providing a valuable tool our customers use to mitigate incidents and learn from them.
We’re looking for people with a strong background in working with an organization to shape the SRE practice from the ground up. And, as with all roles here, you should be comfortable learning new skills and mentoring others new to the space.
You’ll be working on
- Helping to improve and refine our on-call and incident management process, how we handle incidents and how we learn from them
- Updating and maintaining observability across our development and production environments to ensure they are reliable and available
- Working to further improve our disaster recovery plan and acting as a mentor for the rest of engineering around how to plan for these types of incidents
- Refining and improving our Service Level Objectives to help us improve our internal Service Level Agreements and Indicators
We’re looking for someone who
- Has experience as an SRE working to help ensure the reliability, stability and scalability of applications by increasing observability
- Has experience working on Developer Empowerment by implementing and advocating for tooling automation for engineering teams
- Is looking to help educate and mentor not only our engineers but the larger FireHydrant organization on best practices for site and service reliability
- Is excited to pilot new programs that help us grow in our incident response maturity, such as running Game Days and auditing availability and performance, and to keep improving our customer experience
Life at FireHydrant
- We’re remote-first with engineers around the US; our headquarters is in NYC (Union Square)
- We collaborate through Slack, Zoom, GitHub pull requests, Notion, and Clubhouse
- We believe in a healthy work-life balance; we’re early stage but work reasonable hours and want you to use your vacation time
Benefits
- 100% employer-paid health, vision and dental premiums for the employee and 75% of premiums for dependents
- Unlimited vacation policy with a minimum requirement of three weeks off per year
- Wellness program: reimbursements for your gym membership, athletic equipment, nutrition plans, etc.
- Education budget: conferences, books, online courses, etc.
- 401k match