Background

Paymenttools had grown quickly, but incident response practices remained inconsistent across teams. Different squads used their own tooling, had uneven escalation flows, and lacked shared expectations. This created confusion during high-pressure events and led to unnecessary customer impact.

My Role

I led the end-to-end definition and rollout of the new incident management framework. This included shaping the operating model, partnering with engineering and product leadership, running training for 14 teams, and aligning SRE, Security, and Operations stakeholders on a single standard.

Execution

  • Defined a unified incident taxonomy and severity model tailored to PSP requirements and PCI workloads.
  • Introduced a consistent on-call and escalation workflow for all teams, fully aligned with SRE best practices.
  • Built templates for incident timelines, incident commander (IC) roles, communication flows, and post-incident reviews.
  • Partnered with the Platform, TXP, and Payment Services teams to embed the process into daily operations.
  • Ran training sessions for engineers, product owners, and incident commanders across the organization.
  • Established a repeatable process for post-incident learning and reliability improvements.

Results

  • Clear ownership during incidents, reducing confusion and response delays.
  • Improved MTTD and MTTR through structured escalation and communication.
  • Production readiness reviews (PRRs) embedded SRE and Security requirements into production readiness for all product teams.
  • Stronger culture of accountability and learning across Platform and Product.

Technologies & Tools

incident.io, ISMS, Slack, Backstage