Production Support (Java or SRE)
Job Title: Tech Lead/Engineers, Production Support & Platform (Java Developers and SRE Engineers)
Location: Richmond, McLean, or Remote

Key Responsibilities:
• Tech Lead - Lead and mentor a team of 15+ engineers for production support and platform stability.
• Engineer - More than 50% production support and deployment, followed by application development and migration work based on sprint scope.
• Manage pager duty rotations and ensure timely incident resolution.
• Provide Level 4 support (deep technical troubleshooting and fixes).
• Oversee rotational night shifts (approximately once every 2.5 months).
• Ensure compliance with SLAs and operational excellence for critical systems.
• Collaborate with stakeholders on platform strategy and migration planning.
• Drive Run-the-Engine development work and support enhancements.
• Prepare for and lead the platform migration phase in the third year.
• Monitor application performance, batch jobs, and system health across production and lower environments.
• Respond to incidents, alerts, and Sev1/Sev2 outages, and provide real-time support following the bank's incident management processes.
• Perform root cause analysis (RCA), create remediation plans, and ensure issues are permanently resolved.
• Support on-call rotations and pager duty responsibilities.
• Collaborate with development, SRE, and infrastructure teams to troubleshoot application, database, and integration issues.
• Execute deployments, configuration changes, and release support using CI/CD pipelines (OnePipeline preferred).
• Create and maintain operational dashboards, runbooks, SOPs, and automation scripts.
• Ensure compliance with bank technology and security standards.

Required Skills & Experience:
• Tech Lead - Ability to manage large teams (10-50 members) and complex platforms.
• Java development and Site Reliability Engineering (SRE) expertise.
• Strong experience in production support and incident management.
• Hands-on experience with pager duty tools and support workflows.
• Excellent problem-solving and communication skills.
• Minimum 2 years of experience in similar roles.
• Strong experience in Unix/Linux, shell scripting, and troubleshooting distributed systems.
• Hands-on experience with AWS (CloudWatch, Lambda, EC2, S3, IAM, RDS, DynamoDB).
• Familiarity with Java-based applications, microservices, APIs, and log analysis (Splunk, CloudWatch Logs).
• Experience with CI/CD tools such as Jenkins, OnePipeline, and Git, and with automated deployment strategies.
• Knowledge of incident management, problem management, and change management processes.
• Strong analytical skills and the ability to quickly diagnose complex issues.