Web3

Scaling security and reliability with DevOps maturity

Services Provided

Engineering

Platform

Specialisms

Platforms

The Challenge

Ledger was looking to enhance the consistency of value delivery, targeting areas such as incident response, platform stability, and developer productivity. Strengthening these areas was key to improving operational reliability and accelerating delivery.

The Client

Ledger provides security and infrastructure solutions for critical digital assets, serving consumers and institutional investors.

Client’s Goal

Improve efficiency, enhance collaboration and drive sustainable service improvements, with a focus on proactive reliability and shared responsibility, including a redefined SRE role and empowered development teams.

The Approach

As Ledger’s business continued to scale, so did the complexity and risk associated with its operations. Recognising this, they saw the need to adopt a DevOps maturity model as a strategic foundation and partnered with YLD to support them in achieving this goal. YLD worked side by side with the Ledger Enterprise team, and this hands-on involvement gave us a grounded understanding of existing practices and challenges. Building on these insights, we developed a tailored set of recommendations and a roadmap to guide their DevOps maturity journey.

Current state

During our engagement, we observed opportunities to strengthen collaboration between development, SRE, and platform teams and their product counterparts. Improving this integration was key to supporting consistent service quality and enabling more efficient delivery of innovation.

Using the core principles of DevOps, Platform Engineering, and Site Reliability Engineering, we worked on recommendations to improve the following pain points:

Reactive incident response: Teams were spending a large amount of time reacting to incidents, diverting resources from strategic initiatives, with an impact on developer productivity.
Limited visibility & slow recovery: Monitoring and alerting capabilities didn’t fully meet the teams’ needs, making it harder to identify and resolve issues quickly and resulting in prolonged outages and slow recovery times.
Firefighting culture: A largely reactive approach to incidents made it difficult to establish a stable and predictable platform, contributing to ongoing operational challenges.
Enhanced production: Strengthening confidence in the stability and reliability of the production environment presented a key opportunity to accelerate release cycles and create more space for innovation.

Recommendations

Our proposed realignment focused on proactive reliability and shared responsibility, including a redefined SRE role and empowered development teams, which would improve efficiency, enhance collaboration and drive sustainable service improvements.

Our recommendations focused on improving reliability, empowering teams, and accelerating engineering velocity. We structured them into two overarching categories to reflect these priorities:

Unified monitoring and observability

To establish a unified approach to monitoring and observability, we recommended adopting consistent, organisation-wide standards. Doing so would improve visibility across systems and help detect issues earlier, reducing downtime and operational risk.

Clarifying accountability for escalation and resolution was also recommended as a way to improve response speed and effectiveness when issues arise.

Regular incident response simulations, or “fire drills”, were also highly recommended. These exercises would help teams build familiarity with, and confidence in, their processes, so they’re better prepared to handle real incidents under pressure.

Finally, we recommended that Ledger consider extending data retention periods. With more historical data, teams could perform more meaningful trend analysis, support predictive maintenance initiatives, and make better-informed decisions about system health and capacity planning.
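As a purely illustrative sketch of the kind of trend analysis that longer retention makes possible, the Python snippet below fits a straight line to daily usage samples and estimates how many days remain before a capacity limit is reached. The function name, the linear-growth assumption, and the sample figures are all hypothetical; they are not drawn from Ledger's systems or from the recommendation report.

```python
from typing import Optional, Sequence


def days_until_capacity(daily_usage: Sequence[float], capacity: float) -> Optional[float]:
    """Estimate days until usage reaches capacity, assuming linear growth.

    daily_usage holds one sample per day, oldest first (hypothetical data).
    Returns None when the fitted trend is flat or falling.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")

    # Ordinary least-squares fit of usage against the day index 0..n-1.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    if slope <= 0:
        return None  # no projected exhaustion under this model

    intercept = mean_y - slope * mean_x
    # Day index where the fitted line crosses capacity, measured from the last sample.
    crossing_day = (capacity - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


if __name__ == "__main__":
    # Hypothetical disk usage in GB over five days, with a 30 GB limit.
    print(days_until_capacity([10, 12, 14, 16, 18], capacity=30))
```

The more history is retained, the more samples such a fit can draw on, which is why short retention windows tend to limit capacity planning to guesswork.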

Realignment of scopes and responsibilities

Clarifying and realigning scopes and responsibilities could reduce friction between teams while supporting faster, safer delivery. Rather than treating these boundaries as fixed, we recommended revisiting them to ensure they match Ledger’s evolving needs.

The Platform Team could be positioned as a true enabler, offering centralised, well-maintained infrastructure, tooling, and core services that development teams can build on confidently. This shift would promote consistency and allow product teams to focus on delivering value rather than managing shared foundations.

We also advised development teams to expand their ownership beyond feature delivery to include operational responsibility, which would strengthen the feedback loop between building and running software. This would encourage better design decisions, improve reliability, and make teams accountable for the end-to-end experience.

Meanwhile, the SRE function could transition away from purely reactive firefighting toward a more proactive, partnership-based role. By investing in reliability engineering practices and collaborating closely with other teams, SREs could embed resilience into the platform itself and align service levels with business priorities.
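One common SRE technique for aligning service levels with business priorities is the error budget: the amount of failure a service level objective (SLO) allows over a period. The Python sketch below is a generic illustration of that idea, not a description of what Ledger or YLD implemented; the function name, the 99% target, and the request counts are assumptions chosen for the example.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for a period.

    slo_target is the agreed success ratio, e.g. 0.99 (a hypothetical target).
    1.0 means no budget spent, 0.0 means fully spent, negative means overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # Number of failures the SLO tolerates over this many requests.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 4,000 failures, 99% SLO.
    remaining = error_budget_remaining(0.99, 1_000_000, 4_000)
    print(f"Error budget remaining: {remaining:.0%}")
```

A team might treat a healthy remaining budget as room to ship faster and a depleted one as a signal to prioritise reliability work, which gives development and SRE teams a shared, business-facing measure to plan around.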

The Deliverables

Our team delivered a comprehensive recommendation report, which informed the creation of a prioritised backlog of initiatives by the Ledger Enterprise team. Rather than serving as a static set of suggestions, the report provided a structure for change, proposing a shift in practices and ways of working. It set out how to move away from reactive patterns towards a more proactive, preventative approach, laying the groundwork for greater long-term stability and sustainable growth.

By streamlining processes and reducing operational overhead, Ledger would be able to achieve gains in efficiency and free up resources for higher-value work. Improved communication and shared responsibility between teams would strengthen collaboration, breaking down silos and ensuring alignment on priorities. At the same time, empowering development teams on a more stable platform would enable faster innovation and accelerate the delivery of new features.

Ultimately, this approach was designed to deliver a superior quality of service with enhanced visibility, allowing for quicker and more effective responses, and a better overall experience for stakeholders and users.

Closing the Engagement

With our support, Ledger gained a clear, external perspective on their current ways of working, alongside a prioritised set of practical recommendations. Drawing on our DevOps expertise, we provided a structured assessment that helped surface the right conversations and build alignment across teams.

Equipped with an actionable roadmap, Ledger was able to confidently take forward and implement the backlog, building momentum toward a more proactive and preventative culture. This now provides a strong foundation for their continued operational excellence and an increasingly secure, stable, and resilient platform.