Github Designing Data-intensive Applications Hot! Instant

Second, and more radically, GitHub implemented (horizontal partitioning) using a custom middleware layer called gh-ost (GitHub Online Schema Transfers) and later, their Vitess-inspired system. They split the massive issues and pull_requests tables by repository ID. This meant that data for a single repository always lived on one shard. This is a thoughtful choice: most queries (e.g., “list all issues in this repo”) are naturally local to a shard, avoiding costly distributed joins. The downside, as Kleppmann warns, is the loss of cross-shard transactional guarantees. For example, moving an issue from one repository to another becomes a complex distributed transaction, something GitHub handles with asynchronous workflows and idempotent retries. Reliability and the Chaos of Large Scale Designing a reliable system at GitHub’s scale means accepting that components will fail—and not just servers, but also network partitions, clock skews, and software bugs. Kleppmann emphasizes that reliability is not about preventing failure, but about building systems that tolerate it.

Ultimately, GitHub’s success lies in its relentless pragmatism. It does not aim for pure, mathematical data consistency (like Spanner’s TrueTime). Instead, it aims for good-enough consistency, coupled with fast performance and high developer productivity. For every trade-off—between consistency and availability, between normalization and denormalization, between immediate integrity and eventual convergence—GitHub makes a conscious choice and then builds tooling to manage the consequences. In doing so, it transforms the abstract principles of designing data-intensive applications into the living, breathing reality of a platform that hosts the world’s code. And that, perhaps, is the ultimate lesson: the best architecture is not the one that is theoretically perfect, but the one that actually works at scale. github designing data-intensive applications

This is where gh-ost (GitHub Online Schema Tool) shines. Traditional ALTER TABLE locks the table, blocking writes for minutes or hours. gh-ost instead creates a shadow table with the new schema, copies data in small chunks, and replays the binary log of writes from the original table onto the shadow table—all while the application continues running. At the final moment, it performs a near-instantaneous atomic swap of table names. This is a direct implementation of Kleppmann’s discussion of and eventual consistency . The system is in a temporary, inconsistent state (rows exist in both tables), but the application logic hides this complexity. The maintainability payoff is immense: GitHub can deploy schema changes hundreds of times per day, a velocity unthinkable in a system that required scheduled maintenance windows. Conclusion: The Eternal Trade-Offs GitHub is not a perfect system. It has suffered outages, data inconsistencies, and scaling pains. But its evolution from a single MySQL database to a global, polyglot data platform exemplifies every major idea in Designing Data-Intensive Applications . It teaches us that there is no “one true way.” Reliable systems use replication, but fight lag. Scalable systems use sharding, but lose distributed transactions. Maintainable systems evolve online, but pay the complexity of dual-writes and temporary inconsistency. This is a thoughtful choice: most queries (e

	SpinRite v6.1 — Purchase To purchase and immediately download your own personally-licensed copy of SpinRite:
	Post-Purchase Support — Updates or Loss Replacement Any of our software purchased or upgraded online can be replaced or updated to its latest version at any time. The original links contained in your purchase receipt will always work. Our system's 13‑character codes can be used to obtain a copy of your original receipt with active download links. See our Customer Service page for more information.
	Contacting Gibson Research Corporation Unlike many Internet-based companies, we have been in business for over 20 years. Although we have "gone virtual" to streamline our operations, we remain HIGHLY RESPONSIVE to all contact and are fully committed to supporting our customers. Please write to us using the links below to receive our prompt attention: