Technical Manager of Cloud Infrastructure Operations (Remote)

Humana

Troy, Michigan

United States

Information Technology

About this job

Description

The Manager of Cloud Infrastructure Operations will lead the Hosting Operations of our Azure, AWS, and GCP Cloud offerings. This role is 50% management and 50% hands-on. The Manager is an expert in 24/7 operations of high-performing, scalable systems that meet a high degree of uptime, and in all facets of Cloud hosting operations, with the ability to communicate effectively with customers and internal stakeholders. The Manager will oversee the Managed Service Provider within the 24/7 environment and will be a leader in continuous integration and continuous delivery with automated deployment in an Agile SDLC. This will include generating recommendations, through analysis, on how Humana should optimize its usage of Cloud services and compliance controls, and developing automated reporting that enables teams to leverage best practices for running efficient Cloud solutions.

If you’re passionate about innovation and love working in an environment where you can constantly improve and adopt new technologies to drive business results, then Humana’s Cloud Infrastructure Operations team could be the place for you!

Responsibilities

  • Manage operations plans, staffing, budget and execution.
  • Lead the escalated Incident Management team and develop maturity plans that continually reduce MTTD/MTTR.
  • Improve incident and problem management functions while working to build a world-class incident response function for our customers.
  • Build out and/or automate the required L2 SOPs for the L2 MSP team.
  • Apply hands-on experience with the Cloud Well-Architected Framework to support overall Operations initiatives.
  • Establish and refine automated monitoring tools to track systems’ health, uptime and outages.
  • Ensure compliance with best security practices and continuously assess potential vulnerabilities.
  • Optimize operations costs across vendors and service providers.
  • Partner with monitoring team to build maturity around event management.
  • Partner with Engineering, L3 teams, and DevOps on CI/CD and automated deployment.
  • Collaborate with vendor partners, maintain strategic relationships, and identify continuous improvement opportunities.
  • Identify key procedures that can be automated and either automate them or work with platform engineering team to develop automation.
  • Establish, report on, and improve metrics for the efficiency of operating the Humana foundation environment suite, delivering value to our customers.
  • Adhere to established customer SLAs.
  • Make data-driven decisions by delivering operational metrics and analyzing operational data to identify trends and potential problems.

What you bring and what you will do:

    People and Leadership

  • Strong leadership and people management skills.
  • Experience working within or leading an outsourced Managed Service Provider.
  • Ability to make solid business decisions in a dynamic and fast-paced environment.
  • Ability to work with minimal supervision, making decisions based upon priorities, schedules and an understanding of business initiatives.
  • Manage and optimize Cloud infrastructure and services.
  • Solution-oriented leadership (lead by example) and a management-based approach.
  • The ability to communicate effectively with executives, engineers, and customers.
  • The ability to build effective relationships with internal business stakeholders and external partners.
  • A creative leadership style that inspires and influences others.
  • Availability for off-hours work related to 24/7 uptime and availability of the Cloud product suite; willingness to support the team that has on-call coverage expectations.
  • Provide guidance, objectives, metrics, and oversight to help teams maintain 24/7 uptime and availability of mission-critical, customer-facing production services.
  • Oversee and refine, with a hands-on approach, the processes, practices, and tooling that teams use to meet their service level objectives.

    Technical Acumen

  • Deep understanding of the key concepts and practices of Cloud observability, coupled with experience implementing robust systems that leverage metrics, logs, and traces to provide a holistic view of the state of Cloud operations.
  • Deep understanding of how to apply best practices for monitoring, alerting, and logging, with hands-on implementation experience with one or more monitoring, alerting, and logging systems (Azure Monitor, CloudWatch, Application Insights, Log Analytics, Splunk, Dynatrace, BigPanda, ThousandEyes, SolarWinds, etc.).
  • Knowledge of corporate IT, data centers, ticketing system implementations, monitoring software implementation, troubleshooting, and continuous improvement approaches.
  • Serverless computing experience, together with experience running container (AKS/EKS) and VM-based workloads, and a solid understanding of the trade-offs among the different serverless implementations emerging in public Cloud.
  • Experience with and enthusiasm for operating in an Agile, DevOps-oriented organization and culture.
  • Technical and business acumen to ensure the organization operates efficiently and effectively in a hybrid environment.
  • Knowledge of infrastructure monitoring and application performance monitoring systems, including SLAs/KPIs and reporting approaches for multi-Cloud platforms.
  • Hands-on experience and knowledge in ITIL processes related to Incident Management, Service Requests, Event Management, Access Management, Change Management, Knowledge Management and Escalated Incident Management.
  • Partner with the Engineering and Architecture teams to design an Observability strategy, drawing on experience implementing robust systems that leverage metrics, logs, and traces to provide understanding of system state. Advocate for that strategy with engineers, managers, and executives.

Required Qualifications

  • Bachelor’s Degree in Computer Science, Information Technology, or equivalent experience.
  • 2+ years of experience managing 24/7 production operations for a high-volume, business-critical Cloud service.
  • 2+ years’ experience with Azure and/or AWS.
  • 2+ years’ experience working with a Managed Service Provider and managing IT vendor relationships.
  • 2+ years of transformational experience running Cloud at scale.
  • Must be passionate about contributing to an organization focused on continuously improving consumer experiences.
  • 2+ years of management experience.

Desired Qualifications

  • Azure cloud certification
  • Advanced understanding of Cloud platforms, consoles, and services (Azure, Google and AWS).
  • Knowledge or experience with Ansible Tower, API queries, and Power BI.
  • Scripting knowledge using Python, Perl, PowerShell, JavaScript, or similar scripting languages.

Scheduled Weekly Hours

40
