DevOps Engineer Lead at Tek Spikes
Tek Spikes
Richmond, VA
Information Technology
Posted 2 days ago
Job Description
Job Title: DevOps Engineer - Lead
Job ID: 94330-1, 94329-1 & 94503-1
Only ex-Capital One, C2C
Client: Capital One
Location: 15075 Capital One Drive, Richmond, VA 23238 (Hybrid)
Duration: 12+ months with possible extension

Key Skills & Tools:
Observability Tools: Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch.
Programming Languages: Expertise in languages such as Python and Go for scripting and automation.
Infrastructure & Cloud Platforms: Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes.
Infrastructure as Code (IaC): Familiarity with Terraform and Ansible for managing infrastructure and configurations.
CI/CD & Automation: Experience with CI/CD pipelines and automation tools like Jenkins.
System & Software Engineering: A strong background in both system operations and software development.
Optimize cloud agent instrumentation, with cloud certifications being a plus.
Datadog Fundamental, APM and Distributed Tracing Fundamentals & Datadog Demo Certification (Mandatory)
Strong understanding of observability concepts (logs, metrics, tracing)
Expertise in security and vulnerability management in observability
2 years of experience in cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments.

Job Description:
Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.
System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.
Data Analysis & Visualization: Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
Automation: Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.
Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
Instrument agents for on-premises, cloud, and hybrid environments to enable comprehensive monitoring.
Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.
Configure and integrate Datadog with third-party services such as ServiceNow, SSO enablement, and other ITSM tools.