
What the author does is lay down the principles of current distributed big data systems, and he does a very fine job of it. If you are after the obscure details of a particular product, or some tutorials and "how-to"s, go elsewhere. But if you want to understand the main principles, issues, and challenges of data-intensive and distributed systems, you've come to the right place. Martin Kleppmann starts out by giving the reader a solid conceptual framework in the first chapter: what does reliability mean? How is it defined? What is the difference between a "fault" and a "failure"? How do you describe load on a data-intensive system? How do you talk about performance and scalability in a meaningful way? What does it mean to have a "maintainable" system? The second chapter gives a brief overview of different data models and shows how well each suits different use cases, drawing on real challenges faced by companies such as Twitter.

I consider this book a mini-encyclopedia of modern data engineering. Like a specialized encyclopedia, it covers a broad field in considerable detail. But it is not a practical guide or a cookbook for a particular Big Data, NoSQL or NewSQL product.

Peek behind the scenes of major online services, and learn from their architectures. Understand the distributed systems research upon which modern databases are built. Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity. Make informed decisions by identifying the strengths and weaknesses of different tools. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

Software keeps changing, but the fundamental principles remain the same.


Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data.
