Certified K8s Site Reliability Engineer - Remote (US-based position)

Olive

Orlando Florida

United States

Engineering

Description

Olive’s AI workforce is built to fix our broken healthcare system by addressing healthcare’s most burdensome issues -- delivering hospitals and health systems increased revenue, reduced costs, and increased capacity. People feel lost in the system today and healthcare employees are essentially working in the dark due to outdated technology that creates a lack of shared knowledge and siloed data. Olive is designed to drive connections, shining a new light on the broken healthcare processes that stand between providers and patient care. She uses AI to reveal life-changing insights that make healthcare more efficient, affordable and effective. Olive’s vision is to unleash a trillion dollars of hidden potential within healthcare by connecting its disconnected systems. Olive is improving healthcare operations today, so everyone can benefit from a healthier industry tomorrow.

Our Infrastructure team is looking to add an Engineer and continue to advance the cloud capabilities and services/systems for our internal engineering teams. As part of our engineering team, you’ll be responsible for ensuring Olive’s applications build, deploy and run smoothly. You’ll help keep our infrastructure up to date, and use new and existing tools to solve technical problems. At Olive, automation, reliability and efficiency are part of everything we do.

Responsibilities

  • Work directly with product engineering teams to architect and deploy applications using AWS services and methodologies.
  • Be one of the Kubernetes experts on the team.
  • Design and implement pipelines for Continuous Integration and Continuous Delivery.
  • Create high-quality alerts based on business-centric performance metrics, including uptime, error rate, performance baseline, and infrastructure load metrics.
  • Partner with product engineering teams and other SREs to optimize performance and solve issues across the entire stack: hardware, software, application, and network.
  • Plan, develop, and implement automated systems for deployment and automated issue remediation.
  • Embrace changing requirements.
  • Actively participate in architecture, design reviews and operational readiness exercises for new and existing services.
  • Experience working with container deployment and orchestration technologies with knowledge of fundamentals including service discovery, deployments, monitoring, scheduling, load balancing. Knowledge of Kubernetes, Go and Docker preferred.
  • Incident Triage & Response
  • Scalability Reviews (JVM tuning, Load testing, Architecture reviews, Database performance)

Requirements

  • High level experience in architecting systems using AWS services and methodologies.
  • Strong experience with Kubernetes. Certification is preferred
  • The ability to identify, document, and execute common deployment patterns to increase service coherency.
  • Past experience with being an SRE or Software Engineer with a keen interest in performance and scalability of large systems.
  • Programming experience in Python, Golang or Ruby.
  • Experience with APM tools like New Relic and Datadog.
  • Experience with deploying large projects using infrastructure as code tools like Terraform, Serverless Framework, or AWS CDK.
  • Experience running services in a large scale environment is a bonus but not required
  • Understanding of Linux operating system, networking, and databases
  • A degree in computer science is helpful but not required. We value skills and technical aptitude over degrees
  • Detect abnormalities in performance and proactively address alerts and deviation to reduce risk to platform before it impacts customer
  • You will be part of an on-call rotation consisting of SREs and Engineers but you are not required to solve every infrastructure problem. Our entire engineering team practices DevOps culture and owns their respective services

Requirements

  • Past experience with being an SRE or Software Engineer with a keen interest in performance and scalability of large systems
  • Programming skills in Python and a shell scripting language
  • Experience with APM tools like New Relic and Datadog; understanding the difference between APM and infrastructure monitoring tools is preferred.
  • Experience with infrastructure as code (e.g. Terraform) on AWS or Azure
  • Experience running services in a large scale environment is a bonus but not required
  • Understanding of Linux operating system, networking, and databases
  • Knowledge of TCP/IP, HTTP, web application security
  • Able to configure or learn to fix network systems including DNS, DHCP, and Load Balancer technologies.
  • A degree in computer science is helpful but not required. We value skills and technical aptitude over degrees
  • Detect abnormalities in performance and proactively address alerts and deviation to reduce risk to platform before it impacts customer
  • You will be part of an on-call rotation consisting of SREs and Engineers but you are not required to solve every infrastructure problem. Our entire engineering team practices DevOps culture and owns their respective services

At Olive, we're committed to growing and empowering an inclusive community within our company and industry. This is why we hire and cultivate diverse teams of the best and brightest from all backgrounds, experiences, and perspectives across our organization. Research shows that oftentimes women and other minority groups only apply to open roles if they meet 100% of the listed criteria. Olive encourages everyone, including women, people of color, individuals with disabilities and those in the LGBTQIA+ community, to apply for our available positions, even if they don't necessarily check every box on the job description.

    Disclaimer:

    This job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee. Duties, responsibilities and activities may change or new ones may be assigned.

    This job description does not constitute a contract of employment and Olive AI, Inc. may exercise its employment-at-will rights at any time.

    Benefits:

    We take the health and happiness of our employees seriously and consistently evaluate new ways to provide an amazing place to work. From retirement planning, to a wellness program designed to actively incorporate mental and physical wellness into daily interactions amongst fellow Olivians, we make sure to take care of our own.

  • Health, Dental, and Vision insurance that starts on your first day at Olive with 100% of premiums covered for team members and 75% covered for dependents
  • Monthly Grid stipend to cover work-related expenses
  • Unlimited PTO
  • Telemedicine
  • EAP/Mental health resources
  • Getaways by Marriott Bonvoy
  • Family-building and fertility support via Kindbody
  • 12 weeks of parental leave
  • 401(K) match
  • Wellness program
  • Stock Options