Certified K8s Site Reliability Engineer - Remote (US-based position)

Olive

2021-09-26T11:14:43Z

Orlando Florida

United States

Engineering

(No Timezone Provided)

Description

Olive’s AI workforce is built to fix our broken healthcare system by addressing healthcare’s most burdensome issues -- delivering hospitals and health systems increased revenue, reduced costs, and increased capacity. People feel lost in the system today and healthcare employees are essentially working in the dark due to outdated technology that creates a lack of shared knowledge and siloed data. Olive is designed to drive connections, shining a new light on the broken healthcare processes that stand between providers and patient care. She uses AI to reveal life-changing insights that make healthcare more efficient, affordable and effective. Olive’s vision is to unleash a trillion dollars of hidden potential within healthcare by connecting its disconnected systems. Olive is improving healthcare operations today, so everyone can benefit from a healthier industry tomorrow.

Our Infrastructure team is looking to add an Engineer and continue to advance the cloud capabilities and services/systems for our internal engineering teams. As part of our engineering team, you’ll be responsible for ensuring Olive’s applications build, deploy and run smoothly. You’ll help keep our infrastructure up to date, and use new and existing tools to solve technical problems. At Olive, automation, reliability and efficiency are part of everything we do.

Responsibilities

Work directly with product engineering teams to architect and deploy applications using AWS services and methodologies.

Be one of the Kubernetes experts on the team.

Design and implement pipelines for Continuous Integration and Continuous Delivery.

Create high-quality alerts based on business-centric performance metrics, including uptime, error rate, performance baselines, and infrastructure load.

Partner with product engineering teams and other SREs to optimize performance and solve issues across the entire stack: hardware, software, application, and network.

Plan, develop, and implement automated systems for deployment and automated issue remediation.

Embrace changing requirements.

Actively participate in architecture, design reviews and operational readiness exercises for new and existing services.

Experience working with container deployment and orchestration technologies with knowledge of fundamentals including service discovery, deployments, monitoring, scheduling, load balancing. Knowledge of Kubernetes, Go and Docker preferred.

Incident Triage & Response

Scalability Reviews (JVM tuning, Load testing, Architecture reviews, Database performance)

Requirements

High level experience in architecting systems using AWS services and methodologies.

Strong experience with Kubernetes. Certification is preferred

The ability to identify, document, and execute common deployment patterns to increase service coherency.

Past experience with being an SRE or Software Engineer with a keen interest in performance and scalability of large systems.

Programming experience in Python, Golang or Ruby.

Experience with APM tools like New Relic and Datadog.

Experience with deploying large projects using infrastructure as code tools like Terraform, Serverless Framework, or AWS CDK.

Experience running services in a large scale environment is a bonus but not required

Understanding of Linux operating system, networking, and databases

A degree in computer science is helpful but not required. We value skills and technical aptitude over degrees

Detect performance anomalies and proactively address alerts and deviations to reduce risk to the platform before customers are affected.

You will be part of an on-call rotation consisting of SREs and engineers, but you are not required to solve every infrastructure problem. Our entire engineering team practices a DevOps culture, and each team owns its respective services.

Programming skills in Python and a shell scripting language.

Experience with APM tools like New Relic and Datadog; an understanding of the difference between APM and infrastructure monitoring tools is preferred.

Experience with infrastructure as code (for example, Terraform) on AWS or Azure.

Knowledge of TCP/IP, HTTP, and web application security.

Able to configure, or learn to fix, network systems including DNS, DHCP, and load balancer technologies.

At Olive, we're committed to growing and empowering an inclusive community within our company and industry. This is why we hire and cultivate diverse teams of the best and brightest from all backgrounds, experiences, and perspectives across our organization. Research shows that oftentimes women and other minority groups only apply to open roles if they meet 100% of the listed criteria. Olive encourages everyone — including women, people of color, individuals with disabilities and those in the LGBTQIA+ community — to apply for our available positions, even if they don't necessarily check every box on the job description.

Benefits

Disclaimer:

This job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee. Duties, responsibilities and activities may change or new ones may be assigned.

This job description does not constitute a contract of employment and Olive AI, Inc. may exercise its employment-at-will rights at any time.

Benefits:

We take the health and happiness of our employees seriously and consistently evaluate new ways to provide an amazing place to work. From retirement planning, to a wellness program designed to actively incorporate mental and physical wellness into daily interactions amongst fellow Olivians, we make sure to take care of our own.

Health, Dental, and Vision insurance that starts on your first day at Olive with 100% of premiums covered for team members and 75% covered for dependents

Monthly Grid stipend to cover work-related expenses

Unlimited PTO

Telemedicine

EAP/Mental health resources

Getaways by Marriott Bonvoy

Family-building and fertility support via Kindbody

12 weeks of parental leave

401(K) match

Wellness program

Stock Options
