diff --git a/README.md b/README.md index 2451245..893dc01 100644 --- a/README.md +++ b/README.md @@ -1,388 +1,320 @@ # System Design Resources These are the best resources for System Design on the Internet. -## -## Video Processing - -Transcoding Videos at Scale: https://www.egnyte.com/blog/2018/12/transcoding-how-we-serve-videos-at-scale/ -Facebook Video Broadcasting: https://engineering.fb.com/ios/under-the-hood-broadcasting-live-video-to-millions/ +# Table of Contents + +- [Video Processing](#video-processing) +- [Cluster and Workflow Management](#cluster-and-workflow-management) +- [Intra-Service Messaging](#intra-service-messaging) +- [Message Queue Antipattern](#message-queue-antipattern) +- [Service Mesh](#service-mesh) +- [Practical System Design](#practical-system-design) +- [Distributed File System](#distributed-file-system) +- [Time Series Databases](#time-series-databases) +- [Rate Limiting](#rate-limiting) +- [In Memory Database - Redis](#in-memory-database---redis) +- [Network Protocols](#network-protocols) +- [Chess Engine Design](#chess-engine-design) +- [Subscription Management System](#subscription-management-system) +- [Google Docs](#google-docs) +- [API Design](#api-design) +- [NoSQL Database Internals](#nosql-database-internals) +- [NoSQL Database Algorithms](#nosql-database-algorithms) +- [Database Replication](#database-replication) +- [Containers and Docker](#containers-and-docker) +- [Capacity Estimation](#capacity-estimation) +- [Publisher Subscriber](#publisher-subscriber) +- [Event Driven Architectures](#event-driven-architectures) +- [Software Architectures](#software-architectures) +- [Microservices](#microservices) +- [Distributed Transactions consistency Patterns](#distributed-transactions-consistency-patterns) +- [Load Balancing](#load-balancing) +- [Alerts and Anomaly Detection](#alerts-and-anomaly-detection) +- [Distributed Logging](#distributed-logging) +- [Metrics and Text Search Engine](#metrics-and-text-search-engine) +- [Single Point of Failure](#single-point-of-failure) +- [Location Based Services](#location-based-services) +- [Batch Processing](#batch-processing) +- [Real Time Stream Processing](#real-time-stream-processing) +- [Caching](#caching) +- [Distributed Consensus](#distributed-consensus) +- [Authorization](#authorization) +- [Content Delivery Network](#content-delivery-network) +- [Testing Distributed Systems](#testing-distributed-systems) +- [System Design Resources](#system-design-resources) -Netflix Video Encoding at Scale: https://netflixtechblog.com/high-quality-video-encoding-at-scale-d159db052746 - -Netflix Shot based encoding: https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830 +## +## Video Processing +- [Transcoding Videos at Scale](https://www.egnyte.com/blog/2018/12/transcoding-how-we-serve-videos-at-scale/) +- [Facebook Video Broadcasting](https://engineering.fb.com/ios/under-the-hood-broadcasting-live-video-to-millions/) +- [Netflix Video Encoding at Scale](https://netflixtechblog.com/high-quality-video-encoding-at-scale-d159db052746) +- [Netflix Shot based encoding](https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830) -## +## ## Cluster and Workflow Management +- [Facebook Cluster Management](https://engineering.fb.com/data-center-engineering/twine/) +- [Google Autopilot - Autoscaling](https://dl.acm.org/doi/pdf/10.1145/3342195.3387524) +- [Netflix Workflow Orchestration](https://netflix.github.io/conductor/) +- [Opensource Workflow Management](https://github.com/spotify/luigi) +- [Meta Hardware Management](https://engineering.fb.com/2020/12/09/data-center-engineering/how-facebook-keeps-its-large-scale-infrastructure-hardware-up-and-running/) +- [Meta Capacity Assignment](https://engineering.fb.com/2022/09/06/data-center-engineering/viewing-the-world-as-a-computer-global-capacity-management/) +- [Amazon EC2](https://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html) -Facebook Cluster Management: https://engineering.fb.com/data-center-engineering/twine/ - -Google Autopilot - Autoscaling: https://dl.acm.org/doi/pdf/10.1145/3342195.3387524 - -Netflix Workflow Orchestration: https://netflix.github.io/conductor/ - -Opensource Workflow Management: https://github.com/spotify/luigi - -Meta Hardware Management: https://engineering.fb.com/2020/12/09/data-center-engineering/how-facebook-keeps-its-large-scale-infrastructure-hardware-up-and-running/ - -Meta Capacity Assignment: https://engineering.fb.com/2022/09/06/data-center-engineering/viewing-the-world-as-a-computer-global-capacity-management/ - -Amazon EC2: https://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html - -## -## Intra-Service Messaging - -What is a message queue: https://www.cloudamqp.com/blog/what-is-message-queuing.html - -AirBnb Idempotency: https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb - -Nginx Service Mesh: https://www.nginx.com/learn/service-mesh/ - -Meta Async Task Computing: https://engineering.fb.com/2023/01/31/production-engineering/meta-asynchronous-computing/ +## +## Intra-Service Messaging +- [What is a message queue](https://www.cloudamqp.com/blog/what-is-message-queuing.html) +- [AirBnb Idempotency](https://medium.com/airbnb-engineering/avoiding-double-payments-in-a-distributed-payments-system-2981f6b070bb) +- [Nginx Service Mesh](https://www.nginx.com/learn/service-mesh/) +- [Meta Async Task Computing](https://engineering.fb.com/2023/01/31/production-engineering/meta-asynchronous-computing/) -## ## Message Queue Antipattern +- [DB as queue Antipattern](http://blog.codepath.com/2012/11/15/asynchronous-processing-in-web-applications-part-1-a-database-is-not-a-queue/) +- [Using a database as a message queue](https://softwareengineering.stackexchange.com/questions/231410/why-database-as-queue-so-bad) +- [Anti-pattern of DB as a queue](http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html) +- [Drawbacks of DB as a queue](https://www.cloudamqp.com/blog/why-is-a-database-not-the-right-tool-for-a-queue-based-system.html) -DB as queue Antipattern: http://blog.codepath.com/2012/11/15/asynchronous-processing-in-web-applications-part-1-a-database-is-not-a-queue/ - -Using a database as a message queue: https://softwareengineering.stackexchange.com/questions/231410/why-database-as-queue-so-bad - -Anti-pattern of DB as a queue: http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html - -Drawbacks of DB as a queue: https://www.cloudamqp.com/blog/why-is-a-database-not-the-right-tool-for-a-queue-based-system.html - -## +## ## Service Mesh +- [Kubernetes Service Mesh](https://akomljen.com/kubernetes-service-mesh/) +- [Kubernetes Sidecar](https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery) +- [Service Mesh](https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery) +- [NginX Service Mesh](https://www.nginx.com/learn/service-mesh/) -Kubernetes Service Mesh: https://akomljen.com/kubernetes-service-mesh/ - -Kubernetes Sidecar:https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery - -Service Mesh: https://www.weave.works/blog/introduction-to-service-meshes-on-kubernetes-and-progressive-delivery - -NginX Service Mesh: https://www.nginx.com/learn/service-mesh/ - -## +## ## Practical System Design +- [Facebook Messenger Optimisations](https://spectrum.ieee.org/how-facebooks-software-engineers-prepare-messenger-for-new-years-eve) +- [YouTube Architecture](http://highscalability.com/youtube-architecture) +- [YouTube scalability 2012](https://www.youtube.com/watch?v=w5WVu624fY8) +- [Distributed Design Patterns](http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html) +- [Monolith to Microservice](https://martinfowler.com/articles/break-monolith-into-microservices.html) +- [Zerodha Tech Stack](https://zerodha.tech/blog/hello-world/) -Facebook Messenger Optimisations: https://spectrum.ieee.org/how-facebooks-software-engineers-prepare-messenger-for-new-years-eve - -YouTube Architecture: http://highscalability.com/youtube-architecture - -YouTube scalability 2012: https://www.youtube.com/watch?v=w5WVu624fY8 - -Distributed Design Patterns: http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html - -Monolith to Microservice: https://martinfowler.com/articles/break-monolith-into-microservices.html - -Zerodha Tech Stack: https://zerodha.tech/blog/hello-world/ - -## +## ## Distributed File System +- [Open Source Distributed File System](https://docs.ceph.com/en/latest/architecture/) +- [Amazon S3 Performance hacks](https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/) +- [Amazon S3 object expiration](https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/) -Open Source Distributed File System: https://docs.ceph.com/en/latest/architecture/ - -Amazon S3 Performance hacks: https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/ - -Amazon S3 object expiration: https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/ - -## +## ## Time Series Databases +- [Pinterest Time Series Database](https://medium.com/pinterest-engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181) +- [Uber Time Series DB](https://eng.uber.com/aresdb/) +- [TimeSeries Relational DB](https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c) +- [Facebook Gorilla Time Series DB](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf) -Pintrest Time Series Database: https://medium.com/pinterest-engineering/goku-building-a-scalable-and-high-performant-time-series-database-system-a8ff5758a181 - -Uber Time Series DB: https://eng.uber.com/aresdb/ - -TimeSeries Relational DB: https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/ - -Facebook Gorilla Time Series DB: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf - -## +## ## Rate Limiting - -Circuit Breaker Algorithm: https://martinfowler.com/bliki/CircuitBreaker.html - -Uber Rate Limiter: https://github.com/uber-go/ratelimit/blob/master/ratelimit.go +- [Circuit Breaker Algorithm](https://martinfowler.com/bliki/CircuitBreaker.html) +- [Uber Rate Limiter](https://github.com/uber-go/ratelimit/blob/master/ratelimit.go) ## ## In Memory Database - Redis +- [Redis Official Documentation](https://redis.com/) +- [Learn Redis through Redis University](https://university.redis.com/) +- [Redis Open Source Repo](https://github.com/redis/redis) +- [Redis Architecture](https://medium.com/opstree-technology/redis-cluster-architecture-replication-sharding-and-failover-86871e783ac0) -Redis Official Documentation : https://redis.com/ - -Learn Redis through Redis University : https://university.redis.com/ - -Redis Open Source Repo : https://github.com/redis/redis - -Redis Architecture : https://medium.com/opstree-technology/redis-cluster-architecture-replication-sharding-and-failover-86871e783ac0 - -## +## ## Network Protocols +- [What is HTTP](https://engineering.cred.club/head-of-line-hol-blocking-in-http-1-and-http-2-50b24e9e3372) +- [QUIC Protocol](https://www.akamai.com/blog/performance/http3-and-quic-past-present-and-future) +- [TCP Protocol algorithms](https://ee.lbl.gov/papers/congavoid.pdf) (First 10 pages are important) +- [WebRTC](https://webrtc.github.io/webrtc-org/blog/2012/07/23/a-great-introduction-to-webrtc.html) +- [WebSockets](https://datatracker.ietf.org/doc/html/rfc6455#section-1.2) +- [Dynamic Source Routing using QUIC](https://fb.watch/fSEbI4KHlA/) -What is HTTP: https://engineering.cred.club/head-of-line-hol-blocking-in-http-1-and-http-2-50b24e9e3372 - -QUIC Protocol: https://www.akamai.com/blog/performance/http3-and-quic-past-present-and-future - -TCP Protocol algorithms: https://ee.lbl.gov/papers/congavoid.pdf (First 10 pages are important) - -WebRTC: https://webrtc.github.io/webrtc-org/blog/2012/07/23/a-great-introduction-to-webrtc.html - -WebSockets: https://datatracker.ietf.org/doc/html/rfc6455#section-1.2 - -Dynamic Source Routing using QUIC: https://fb.watch/fSEbI4KHlA/ - -## +## ## Chess Engine Design +- [Chess Engine Building](https://www.youtube.com/watch?v=U4ogK0MIzqk) -Chess Engine Building: https://www.youtube.com/watch?v=U4ogK0MIzqk - -## +## ## Subscription Management System +- [Subscription Manager](https://netflixtechblog.com/building-a-rule-based-platform-to-manage-netflix-membership-skus-at-scale-e3c0f82aa7bc) -Subscription Manager: https://netflixtechblog.com/building-a-rule-based-platform-to-manage-netflix-membership-skus-at-scale-e3c0f82aa7bc - -## +## ## Google Docs +- [Operational Transform](http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation) +- [Google Docs](https://www.youtube.com/watch?v=uOFzWZrsPV0&list=PLX) -Operational Transform: http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation - -Google Docs: https://www.youtube.com/watch?v=uOFzWZrsPV0&list=PLXDe3d8o9VFtydBV5biyz9iS3WqKsBMD5&index=3 ## ## API Design -API Design: https://medium.com/airbnb-engineering/building-services-at-airbnb-part-1-c4c1d8fa811b +- [API Design at Airbnb](https://medium.com/airbnb-engineering/building-services-at-airbnb-part-1-c4c1d8fa811b) +- [Swagger APIs](https://swagger.io/docs/specification/about/) -Swagger APIs: https://swagger.io/docs/specification/about/ - -## +## ## NoSQL Database Internals -Cassandra Architecture: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archIntro.html - -Google BigTable Architecture: https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf - -Amazon Dynamo DB Internals: https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html - -Design Patterns in Amazon Dynamo DB: https://www.youtube.com/watch?v=HaEPXoXVf2k - -Internals of Amazon Dynamo DB: https://www.youtube.com/watch?v=yvBR71D0nAQ +- [Cassandra Architecture](https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archIntro.html) +- [Google BigTable Architecture](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf) +- [Amazon Dynamo DB Internals](https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html) +- [Design Patterns in Amazon Dynamo DB](https://www.youtube.com/watch?v=HaEPXoXVf2k) +- [Internals of Amazon Dynamo DB](https://www.youtube.com/watch?v=yvBR71D0nAQ) -## +## ## NoSQL Database Algorithms -Hyperloglog Algorithm: https://odino.org/my-favorite-data-structure-hyperloglog/ - -Log Structured Merge Tree: https://www.cs.umb.edu/~poneil/lsmtree.pdf - -Sorted String Tables and Compaction Strategies: https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies - -Leveled Compaction Cassandra: https://www.datastax.com/blog/leveled-compaction-apache-cassandra +- [Hyperloglog Algorithm](https://odino.org/my-favorite-data-structure-hyperloglog/) +- [Log Structured Merge Tree](https://www.cs.umb.edu/~poneil/lsmtree.pdf) +- [Sorted String Tables and Compaction Strategies](https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies) +- [Leveled Compaction Cassandra](https://www.datastax.com/blog/leveled-compaction-apache-cassandra) +- [Scylla DB Compaction](https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies) +- [Indexing in Cassandra](https://www.bmc.com/blogs/cassandra-clustering-columns-partition-composite-key/) -Scylla DB Compaction: https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies - -Indexing in Cassandra: https://www.bmc.com/blogs/cassandra-clustering-columns-partition-composite-key/ - -## +## ## Database Replication -Database replication: https://dev.mysql.com/doc/refman/8.0/en/replication.html - -Netflix Data replication - Change Data Capture: https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b - -LinkedIn Logging Usecases: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying +- [Database replication](https://dev.mysql.com/doc/refman/8.0/en/replication.html) +- [Netflix Data replication - Change Data Capture](https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b) +- [LinkedIn Logging Usecases](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying) -## -## Containers and Docker - -Facebook Twine Containerization: https://engineering.fb.com/developer-tools/zookeeper-twine/ - -CloudFlare Containerization: https://blog.cloudflare.com/cloud-computing-without-containers/ +## +## Containers and Docker -Docker Architecture: https://docs.docker.com/get-started/overview/#docker-architecture +- [Facebook Twine Containerization](https://engineering.fb.com/developer-tools/zookeeper-twine/) +- [CloudFlare Containerization](https://blog.cloudflare.com/cloud-computing-without-containers/) +- [Docker Architecture](https://docs.docker.com/get-started/overview/#docker-architecture) -## +## ## Capacity Estimation -Google Capacity Estimation: https://www.youtube.com/watch?v=modXC5IWTJI - -Scalability at YouTube 2012: https://www.youtube.com/watch?v=G-lGCC4KKok - -Back of envelope Calculations at AWS: https://www.youtube.com/watch?v=-3qetLv2Yp0 +- [Google Capacity Estimation](https://www.youtube.com/watch?v=modXC5IWTJI) +- [Scalability at YouTube 2012](https://www.youtube.com/watch?v=G-lGCC4KKok) +- [Back of envelope Calculations at AWS](https://www.youtube.com/watch?v=-3qetLv2Yp0) +- [Capacity Estimation](http://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf) -Capacity Estimation: http://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf - -## +## ## Publisher Subscriber -Oracle Publisher Subscriber: https://docs.oracle.com/cd/B10501_01/appdev.920/a96590/adg15pub.htm +- [Oracle Publisher Subscriber](https://docs.oracle.com/cd/B10501_01/appdev.920/a96590/adg15pub.htm) +- [Amazon Pub Sub Messaging](https://aws.amazon.com/pub-sub-messaging/) +- [Asynchronous processing](http://blog.codepath.com/2013/01/06/asynchronous-processing-in-web-applications-part-2-developers-need-to-understand-message-queues/) +- [Async Request Response](https://www.enterpriseintegrationpatterns.com/patterns/conversation/RequestResponse.html) -Amazon Pub Sub Messaging: https://aws.amazon.com/pub-sub-messaging/ - -Asynchronous processing: http://blog.codepath.com/2013/01/06/asynchronous-processing-in-web-applications-part-2-developers-need-to-understand-message-queues/ - -Async Request Response: https://www.enterpriseintegrationpatterns.com/patterns/conversation/RequestResponse.html - -## +## ## Event Driven Architectures -Martin Fowler- Event Driven Architecture: https://www.youtube.com/watch?v=STKCRSUsyP0 +- [Martin Fowler- Event Driven Architecture](https://www.youtube.com/watch?v=STKCRSUsyP0) +- [Event Driven Architecture](https://martinfowler.com/articles/201701-event-driven.html) -Event Driven Architecture: https://martinfowler.com/articles/201701-event-driven.html - -## +## ## Software Architectures -Hexagonal Architecture: https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749 - -Hexagonal architecture (Alistair Cockburn) https://alistair.cockburn.us/hexagonal-architecture/ - -The Clean Code by Robert C. Martin (Uncle Bob) https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html - -CQRS https://martinfowler.com/bliki/CQRS.html +- [Hexagonal Architecture](https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749) +- [Hexagonal architecture (Alistair Cockburn)](https://alistair.cockburn.us/hexagonal-architecture/) +- [The Clean Code by Robert C. Martin (Uncle Bob)](https://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html) +- [CQRS](https://martinfowler.com/bliki/CQRS.html) +- [DomainDrivenDesign](https://martinfowler.com/bliki/DomainDrivenDesign.html) -DomainDrivenDesign https://martinfowler.com/bliki/DomainDrivenDesign.html - - -## +## ## Microservices -Monolith Architecture: https://buttercms.com/books/microservices-for-startups/should-you-always-start-with-a-monolith/ - -Monoliths vs Microservices: https://articles.microservices.com/monolithic-vs-microservices-architecture-5c4848858f59 - -Microservices: http://highscalability.com/blog/2018/4/5/do-you-have-too-many-microservices-five-design-attributes-th.html - -Uber Nanoservices antipattern: https://www.youtube.com/watch?v=kb-m2fasdDY - -Uber Domain oriented microservice: https://eng.uber.com/microservice-architecture/ - +- [Monolith Architecture](https://buttercms.com/books/microservices-for-startups/should-you-always-start-with-a-monolith/) +- [Monoliths vs Microservices](https://articles.microservices.com/monolithic-vs-microservices-architecture-5c4848858f59) +- [Microservices](http://highscalability.com/blog/2018/4/5/do-you-have-too-many-microservices-five-design-attributes-th.html) +- [Uber Nanoservices antipattern](https://www.youtube.com/watch?v=kb-m2fasdDY) +- [Uber Domain oriented microservice](https://eng.uber.com/microservice-architecture/) ## ## Distributed Transactions consistency Patterns -Transactional outbox https://microservices.io/patterns/data/transactional-outbox.html - -SAGAS Long lived transactions (LLTs) https://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf +- [Transactional outbox](https://microservices.io/patterns/data/transactional-outbox.html) +- [SAGAS Long lived transactions (LLTs)](https://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf) -## +## ## Load Balancing -Load Balancer with Sticky Sessions: https://stackoverflow.com/questions/10494431/sticky-and-non-sticky-sessions - -Citrix what is load balancing: https://www.citrix.com/en-in/solutions/app-delivery-and-security/load-balancing/what-is-load-balancing.html - -Nginx Load Balancing: https://www.nginx.com/resources/glossary/load-balancing/ +- [Load Balancer with Sticky Sessions](https://stackoverflow.com/questions/10494431/sticky-and-non-sticky-sessions) +- [Citrix what is load balancing](https://www.citrix.com/en-in/solutions/app-delivery-and-security/load-balancing/what-is-load-balancing.html) +- [Nginx Load Balancing](https://www.nginx.com/resources/glossary/load-balancing/) +- [Consistent hashing](https://michaelnielsen.org/blog/consistent-hashing/) -Consistent hashing: https://michaelnielsen.org/blog/consistent-hashing/ - -## +## ## Alerts and Anomaly Detection -Outlier Detection: https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e - -Anomaly Detection: https://towardsdatascience.com/machine-learning-for-anomaly-detection-and-condition-monitoring-d4614e7de770 - -Uber Real Time Monitoring and Root Cause Analysis Argos: https://eng.uber.com/argos-real-time-alerts/ - -Microsoft Anomaly Detection: https://www.youtube.com/watch?v=12Xq9OLdQwQ&t=0s - -Facebook Data Engineering: https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/ - -LinkedIn Real Time Alerting: https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor +- [Outlier Detection](https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e) +- [Anomaly Detection](https://towardsdatascience.com/machine-learning-for-anomaly-detection-and-condition-monitoring-d4614e7de770) +- [Uber Real Time Monitoring and Root Cause Analysis Argos](https://eng.uber.com/argos-real-time-alerts/) +- [Microsoft Anomaly Detection](https://www.youtube.com/watch?v=12Xq9OLdQwQ&t=0s) +- [Facebook Data Engineering](https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/) +- [LinkedIn Real Time Alerting](https://engineering.linkedin.com/blog/2019/06/smart-alerts-in-thirdeye--linkedins-real-time-monitoring-platfor) +- [LinkedIn Isolation Forests](https://engineering.linkedin.com/blog/2019/isolation-forest) -LinkedIn Isolation Forests: https://engineering.linkedin.com/blog/2019/isolation-forest - -## +## ## Distributed Logging -Uber Distributed Request Tracing: https://eng.uber.com/distributed-tracing/ - -Pintrest Logging: https://medium.com/@Pinterest_Engineering/open-sourcing-singer-pinterests-performant-and-reliable-logging-agent-610fecf35566 - -Google Monitoring Infrastructure: https://www.facebook.com/atscaleevents/videos/959344524420015/ +- [Uber Distributed Request Tracing](https://eng.uber.com/distributed-tracing/) +- [Pintrest Logging](https://medium.com/@Pinterest_Engineering/open-sourcing-singer-pinterests-performant-and-reliable-logging-agent-610fecf35566) +- [Google Monitoring Infrastructure](https://www.facebook.com/atscaleevents/videos/959344524420015/) -## +## ## Metrics and Text Search Engine -Facebook real time text search engine: https://www.facebook.com/watch/?v=432864835468 - -Elastic Search Time Based Querying: https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html +- [Facebook real-time text search engine](https://www.facebook.com/watch/?v=432864835468) +- [Elastic Search Time Based Querying](https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html) +- [Elastic Search Aggregation](https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations.html) -Elastic Search Aggregation: https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations.html - -## +## ## Single Point of Failure -Avoiding Single Points of Failure: https://medium.com/the-cloud-architect/patterns-for-resilient-architecture-part-3-16e8601c488e +- [Avoiding Single Points of Failure](https://medium.com/the-cloud-architect/patterns-for-resilient-architecture-part-3-16e8601c488e) +- [Netflix Multi-Region Availability](https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b) +- [Oracle Single Points of failure](https://docs.oracle.com/cd/E19693-01/819-0992/fjdch/index.html) +- [DNS single point of failure 2004](http://www.tenereillo.com/GSLBPageOfShame.htm) +- [Sharding](https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6) -Netflix Multi-Region Availability: https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b - -Oracle Single Points of failure: https://docs.oracle.com/cd/E19693-01/819-0992/fjdch/index.html - -DNS single point of failure 2004: http://www.tenereillo.com/GSLBPageOfShame.htm - -Sharding: https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6 - -## +## ## Location Based Services -Google S2 library: https://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/ +- [Google S2 library](https://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/) -## +## ## Batch Processing -Map Reduce Architecture: https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf +- [Map Reduce Architecture](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) -## +## ## Real Time Stream Processing -LinkedIn Brooklin- Real time data streaming: https://engineering.linkedin.com/blog/2019/brooklin-open-source - -Netflix Real Time Stream Processing: https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a +- [LinkedIn Brooklin- Real-time data streaming](https://engineering.linkedin.com/blog/2019/brooklin-open-source) +- [Netflix Real Time Stream Processing](https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a) +- [KSQLDB for Kafka](https://docs.ksqldb.io/en/latest/operate-and-deploy/how-it-works/) -KSQLDB for Kafka: https://docs.ksqldb.io/en/latest/operate-and-deploy/how-it-works/ - -## +## ## Caching -Google Guava Cache: https://github.com/google/guava/wiki/CachesExplained - -Caching (See the README): https://github.com/ben-manes/caffeine/ - -Caching: http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html +- [Google Guava Cache](https://github.com/google/guava/wiki/CachesExplained) +- [Caching (See the README)](https://github.com/ben-manes/caffeine/) +- [Caching](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html) +- [Microsoft Caching Guide](https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn589802(v%3dpandp.10) +- [Caching patterns](https://hazelcast.com/blog/a-hitchhikers-guide-to-caching-patterns/) -Microsoft Caching Guide: https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn589802(v%3dpandp.10) - -Caching patterns: https://hazelcast.com/blog/a-hitchhikers-guide-to-caching-patterns/ - -## -## Distributed concensus - -Paxos: http://ifeanyi.co/posts/understanding-consensus/ +## +## Distributed consensus -Raft: https://raft.github.io/ +- [Paxos](http://ifeanyi.co/posts/understanding-consensus/) +- [Raft](https://raft.github.io/) -## +## ## Authorization -Designing an Authorization Model for an Enterprise: https://cerbos.dev/blog/designing-an-authorization-model-for-an-enterprise - -The Architectural Patterns of Cloud-native Authorization Systems: https://www.aserto.com/blog/5-laws-cloud-native-authorization +- [Designing an Authorization Model for an Enterprise](https://cerbos.dev/blog/designing-an-authorization-model-for-an-enterprise) +- [The Architectural Patterns of Cloud-native Authorization Systems](https://www.aserto.com/blog/5-laws-cloud-native-authorization) ## ## Content Delivery Network -AWS CloudFront CDN with S3: https://aws.amazon.com/blogs/networking-and-content-delivery/amazon-s3-amazon-cloudfront-a-match-made-in-the-cloud/ +- [AWS CloudFront CDN with S3](https://aws.amazon.com/blogs/networking-and-content-delivery/amazon-s3-amazon-cloudfront-a-match-made-in-the-cloud/) ## ## Testing Distributed Systems -Deterministic Testing: https://www.youtube.com/watch?v=4fFDFbi3toc +- [Deterministic Testing](https://www.youtube.com/watch?v=4fFDFbi3toc) ## ## System Design Resources -Designing Data Intensive Applications Book: https://amzn.to/3SyNAOy - -WhitePapers: https://interviewready.io/blog/white-papers-worth-reading-for-software-engineers - -InterviewReady Videos: [https://interviewready.io](https://interviewready.io?source=github) +- [Designing Data Intensive Applications Book](https://amzn.to/3SyNAOy) +- [WhitePapers](https://interviewready.io/blog/white-papers-worth-reading-for-software-engineers) +- [InterviewReady Videos](https://interviewready.io?source=github)