• Site Reliability Engineer

    Location US-CA-Sunnyvale
    Posting date 1 month ago (9/17/2018 2:13 AM)
    Job ID
    Software Engineering
  • Company description

    At Red Hat, we connect an innovative community of customers, partners, and contributors to deliver an open source stack of trusted, high-performing solutions. We offer cloud, Linux, middleware, storage, and virtualization technologies, together with award-winning global customer support, consulting, and implementation services. Red Hat is a rapidly growing company supporting more than 90% of Fortune 500 companies.

    Job summary

    Come work with some of the brightest engineers in the open source industry. The Red Hat OpenShift Engineering Services team is looking for a Site Reliability Engineer to join us in San Francisco, CA. In this role, you will help develop and run the infrastructure and tools that Red Hat's software developers use to build OpenShift by Red Hat and related technologies. As a Site Reliability Engineer, you will play a key part in the development and operation of our offerings, helping the organization become more productive and agile in delivering the platform through continuous integration (CI) processes and tools. You’ll also manage some of the production systems that host critical services used by development engineers and customers.

    Primary job responsibilities

    • Monitor systems and offerings on a daily basis to ensure that systems are performing as expected
    • Investigate, communicate, and troubleshoot escalated issues from the engineering team
    • Collaborate with team members to determine the root cause of problems and the best course of action to resolve them and prevent their recurrence
    • Maintain development, staging, and production environments and their monitoring and alerting systems
    • Participate as a member of an on-call rotation to support multiple systems during off-hours
    • Participate in internal tooling development
    • Constantly learn new things and understand current technologies
    • Inspire and contribute to upstream open source projects where necessary

    Required skills

    • Experience working on a Site Reliability Engineering or DevOps team
    • 5+ years of related experience   
    • Solid experience with Linux, cloud computing and related software like Amazon Web Services (AWS), distributed web technologies, database administration, high availability, load balancing, failover, monitoring and alerting, shell scripting, and build tools
    • Experience with container technologies like Docker and Kubernetes
    • Ability to conduct research into a wide range of computing issues as required
    • Ability to effectively prioritize and carry out tasks in a high-pressure environment   
    • Application and process support experience in a 24/7 environment; ability and willingness to work flexible hours   
    • Good written and verbal communication skills to handle communication in a distributed team
    • Knowledge of Git, GitHub, JIRA, Jenkins, OpenShift by Red Hat, etcd, and Prometheus is a plus

    Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, uniformed services, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.

    Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.

