Sr. Site Reliability Engineer

Discipline: DevOps
Salary: $150000 to $170000
Contact email: rgarcia@brightmetro.com
Job ref: 516304
Published: 3 months ago
About the Company:
Our client created a digital marketing platform that Spotify, Airbnb, and many other leading companies use to win in key moments of the customer journey.

About the Position:
  • 100% remote from any U.S. city
  • No Visa Sponsorship
As a Site Reliability Engineer, you will be part developer, part operations, and all continuous integration and delivery expert. You will be integral to the design, setup, automation, and maintenance of our entire integration and delivery pipeline. The ideal candidate has a deep software development background combined with strong communication skills to promote collaboration with developers, support engineers, customers, and senior management.

Responsibilities: 
  • Be part of the PagerDuty rotation, responding to platform incidents and supporting other engineers who are responding to customer issues.
  • Use your daily interactions with the platform and your experience and skills to constantly improve our environment and ensure that issues do not reoccur.
  • Maintain and augment our monitoring systems so that they alert on symptoms rather than causes.
  • Be proactive and take ownership in identifying, raising, and resolving issues or deficiencies you see anywhere in our environment.
  • Produce and improve internal documentation and SOPs where they are missing or lacking quality or details.
  • Write new tools and improve existing ones to help automate and remove toil from the team.
  • Live-debug applications and issues, and identify, resolve, or own resolution for functionality and performance deficiencies.
Qualifications:
  • Have a bachelor’s degree in computer science or other highly technical, scientific discipline
  • Are able to program (structured and OO) with one or more high-level languages, preferably Python and either C#, Java, or Go
  • Comfortably “own” the Linux shell
  • Have a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Have coding experience beyond simple scripts