Designing Data-Intensive Applications
Martin Kleppmann, Designing data-intensive applications: the big ideas behind reliable, scalable, and maintainable systems. ISBN 978-1-4493-7332-0
Kleppmann gives a tour covering everything from the low-level details of disk storage layout to the high-level question of which guarantees it even makes sense to offer.
While reading this book, I did not have a job that involved data at an intensive scale, but through some data management in my research work, I had run into many of the problems discussed.
While I did know about databases and SQL, I never really learned to use them until I had to manage an optimization problem using a kind of distributed computing.
The solutions I found worked, but I also felt like I needed to both dig a bit deeper and learn more about the big picture of these ideas.
Designing Data-Intensive Applications was an excellent map of the landscape. Where many technology discussions come down to an unfortunate pro-con comparison, Kleppmann’s book explains the fundamental models behind various approaches and then discusses how they might solve various problems. This gives a far more nuanced insight into how to actually design a data system. A quote from page 452 exemplifies this well:
As we have seen throughout this book, there is no single system that can satisfy all data storage, querying, and processing needs. In practice, most nontrivial applications need to combine several different technologies in order to satisfy their requirements. — p. 452
A lot of this nuance also speaks to the difference between idealized views of correct solutions and the messier engineering of real life. It might seem weird to have different storage solutions for different users, but it is an acknowledgment of different usage patterns, and thus can be the right approach to take.