We are DIRECTV

Revolutionize the streaming world?
Challenge accepted.
Join the #WeAreDIRECTV Movement

Principal, IT Software Engineer 2 - AIOps Lead

IT Software Engineering | Job ID: R250134 | El Segundo, California; Virtual, California

DIRECTV is seeking an AIOps Lead (Principal, IT Software Engineer 2) who will play a crucial role in driving the adoption and execution of Artificial Intelligence for IT Operations (AIOps) practices across the organization. This individual will lead observability standards, AIOps initiatives, and automation-first strategies, leveraging AI and machine learning technologies to optimize IT operations, detect anomalies, improve system performance, and automate incident and problem management processes.

The ideal candidate will have a strong background in IT operations, SRE, DevOps, and software development; a deep understanding of observability platforms and AIOps tools; and the ability to lead cross-functional teams to drive innovation in IT operations automation and monitoring.

Here’s what you’ll do:

Team Leadership and Guidance:

  • Lead projects for a team of 3-4 NPW engineers dedicated to improving stability, observability, and operational efficiency.
  • Serve as technical lead for a team that designs and develops end-to-end solutions, managing dependencies and cross-team impacts.
  • Provide hands-on guidance and support to team members (50% hands-on, 50% managerial).
  • Lead a team of AIOps engineers and specialists, ensuring their development, coaching, and alignment with organizational goals.
  • Develop and report on team performance KPIs.
  • Foster a culture of continuous learning and DevOps excellence through regular technical sessions and internal workshops.
  • Participate actively in the development community (Business Unit) to promote best practices by educating peers.
  • Manage risk and request help from leadership, when necessary, to meet commitments or change directions.

Observability and AIOps Strategy and Execution:

  • Define and implement an observability and AIOps strategy aligned with business objectives and an autonomous IT operations vision.
  • Plan short-term (sprint-to-sprint) and long-term (multiple PI) initiatives, and organize work and designs to meet the long-term target.
  • Implement and optimize AI and machine learning algorithms to detect performance anomalies, predict outages, automate incident response, and improve overall operational efficiency.
  • Implement automated workflows for proactive issue resolution, reducing manual intervention and improving operational agility.
  • Seek opportunities to improve processes and take an automation-first approach.
  • Lead the evaluation, selection, and deployment of AIOps platforms and tools.
  • Design and implement cost-efficient observability and AIOps solutions across cloud and on-premises environments using a mix of commercial, open-source, and CNCF solutions.
  • Leverage data analytics and monitoring systems to generate actionable insights that improve system health, application performance, and availability.
  • Develop internal resources and training materials to ease the adoption and implementation of AIOps tools and practices.

Cross-functional Collaboration:

  • Work closely with IT operations, DevOps, SRE and application development teams to identify pain points and automate processes with AIOps tools and techniques.
  • Present findings, improvements, and key metrics to senior management and stakeholders.

Automation and Process Improvement:

  • Leverage scripting, AI/ML, and automation skills to support an automation-first approach.
  • Embed observability and AIOps capabilities into reusable platform services using DevOps, CI/CD, and IaC tools and practices such as Terraform, Jenkins, GitHub, ArgoCD, Harness, and Ansible.

Technical Implementation and Management:

  • Establish and enforce observability standards, policies, and best practices across the enterprise.
  • Ensure compliance with regulatory and security requirements.
  • Plan and migrate legacy tools and functions to the new AIOps approach.
  • Develop and maintain AIOps dashboards, extensions, applications, and workflow automation.
  • Integrate AIOps with tools like Jira, ServiceNow, MS Teams, Slack, xMatters, Confluence/wiki/KB, and Moogsoft/BigPanda.
  • Set up and manage observability stacks for cloud monitoring (AWS, Azure), VMs, Kubernetes, and various databases.
  • Optimize naming conventions, management zones, alerting profiles, and tagging to align with business processes.

Performance Monitoring and Reporting:

  • Analyze and report on observability metrics, KPIs, Service Level Indicators (SLIs), and Service Level Objectives (SLOs).
  • Develop and recommend baseline monitoring thresholds, SLOs, and error budgets to drive continuous improvement in MTR and availability.

What you’ll need to be successful:

Educational and Professional Experience:

  • Bachelor’s degree in computer science, engineering, or a related field.
  • 5-7 years of experience required (7+ years preferred) in IT operations, DevOps, or site reliability engineering, with at least 2 years in AIOps-related roles.
  • Strong experience with AIOps tools such as Moogsoft, BigPanda, Splunk, Dynatrace, Datadog, ServiceNow, xMatters or similar.
  • Solid understanding of machine learning algorithms and their application in IT operations.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).

Technical Skills:

  • 3+ years of experience with Dynatrace SaaS, DQL, and Logs on Grail or similar.
  • Strong scripting/automation skills in Python, Perl, Shell, and JavaScript.
  • Experience with automation, DevOps, GitOps, CI/CD, and IaC tools (Terraform, Jenkins, GitHub, Ansible).
  • Experience integrating and automating ITSM tools like ServiceNow, xMatters, PagerDuty, and Jira.
  • Hands-on experience building and operating open-source observability tools like ELK, Grafana, Prometheus, Fluentd, Fluent Bit, Loki, OpenTelemetry, OpenSearch, and Thanos.
  • Experience in designing and implementing observability and AIOps solutions for complex, distributed systems.
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions (both frontend and backend).
  • Experience with operating systems (Linux and Windows), Java, Node.js, ReactJS, databases (Oracle, Cassandra), Kafka, MuleSoft, Salesforce, and networking.
  • Expertise in incident management, monitoring systems, and ITSM processes.

Leadership and Communication:

  • 2+ years of experience leading engineering teams in Observability, SRE, Platform, Infrastructure, or Application organizations.
  • Excellent communication, collaboration, and problem-solving skills.
  • Proficient in developing and maintaining technical documentation, runbooks, and processes.
  • Proven track record of driving change and innovation in a fast-paced, dynamic environment.

May require a background check due to job duties requiring routine access to DIRECTV's and DIRECTV customers’ proprietary data. Qualified applicants with arrest and conviction records will be considered for employment in accordance with local ordinances and state law.

This role may require occasional travel, less than 5%.

This is a remote position that can be located anywhere in the United States. #LI-Remote

A career with us comes with big rewards:

DIRECTV's compensation structure is designed to be market-competitive and fully supports efforts to attract and retain employees. It is the company's policy to offer pay that is competitive with other employers in the local market. Our salary ranges are determined by role, level, and location.

The Base Salary range displayed below reflects the minimum and maximum target salary for each of DIRECTV's 4 (four) US Labor Market Zones. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.

DIRECTV WAGE ZONES: $140,790 - $255,530

Low (N1): $140,790 - $211,090

Mid (N2): $148,200 - $222,200

High (N3): $163,020 - $244,420

Top (N4): $170,430 - $255,530

Click HERE to review information on some of the largest Designated Market Areas (DMAs). Your recruiter can share more about the specific salary range for your preferred location during the hiring process. 

Please note that the salary ranges reflect base salary only and do not include bonus or benefits. When you consider all of these together, they represent a pretty impressive total compensation package.

Apply today!

Fair Chance Ordinance Notice for Los Angeles County applicants applying for jobs at DIRECTV

Compliance Notice Regarding Use of Automated Decision-Making Tools in Hiring Process
