Solr: Instant Apache Solr for Indexing Data How-to Book Review

I’ve been writing this review of Apache Solr Beginner’s Guide for too long; sadly, some work-related issues kept me from finishing it sooner. This is why I want to formally apologize to the author, Alfredo Serafini, and to Punit Shetty, the “guy” from Packt Publishing: wonderful fellows who gave me the opportunity to write these reviews and provided access to the book, which I otherwise couldn’t afford.

When I start reading a book (any technical book, actually) I like to take a sneak peek at the table of contents before actually reading it, and my first impression of the contents of this great book was: this is a BEGINNER’S GUIDE? That impression came from seeing sections about segment merging and its impact on your indexes, and another section about writing Solr plugins. You don’t need to be a rocket scientist to understand any of those topics, and yet you don’t expect to see them covered in a book with the word “beginner’s” in the title. Nevertheless, it’s fair to say that this only increased my interest in reading the book. As I said, this was my first impression BEFORE reading it, but when you start reading you get hit by this sentence in the Acknowledgments:

“I know I have probably oversimplified some of the more advanced topics, in order to expose readers to a broader vision of the context in which this technology exists. But when conducting technical courses I have learned that people often need to share ideas in order to construct their own path into a practical knowledge. So I thank you in advance for every time you’ll want to share this read with your teammates, integrating different knowledge and points of view, exploring these topics outside an approach oriented merely to the technical features.”

After reading this, I hope you can rest assured that the author has taken a lot of trouble to put these hard problems into very simple terms that any reader can understand. This is in fact a great thing about this book, and it’s the first point I want to highlight: the book is written for beginners, but this doesn’t stop the author from presenting advanced topics for a more knowledgeable reader, keeping simplicity as a “mantra”.

One key aspect I would like to highlight about this book (and about the author himself) is how it manages to keep things simple, and how it carries this simplicity into deeper topics.

The book is organized into 9 chapters and two appendices. The first appendix is one of the great things about this book, but let’s not get ahead of ourselves and instead follow the order in which the book presents its content. The book focuses on the current “branch” of the Solr project, Solr 4.x: it actually uses Solr 4.5, and it should remain compatible with later versions up to 4.9 (the latest stable release at the time of this post).

Throughout the book you will find a lot of Tips. I really appreciate these tips, where the author lays down a sort of “methodology” for gaining knowledge and confidence in the skills you’re getting from the book. This gives the book the feel of a school “textbook”, and it took me on a little trip down memory lane :-). You’ll also encounter plenty of Notes, and even pointers from the author to online resources about specific topics.

At the end of each chapter you’ll find a Pop quiz that lets you double-check your knowledge of that chapter. I found this very handy for highlighting what was covered in each chapter. A summary is also available, but the pop quiz gives you a sense of the important bullet points, if you will. I strongly advise you to go through these quizzes.

One thing I really love about this book is that it uses Scala for the coding parts. A little shell scripting is used too, but putting that aside, I think that writing Scala code to extend Solr is just great. I’ve written more Java code for Solr than Scala code, but sometimes that is a requirement of the client or project I’m working on, and in some cases it’s just “laziness” on my part.

If, like me, you are using Mac OS X and iBooks to read the EPUB version, keep in mind that in some code snippets you’ll see dashes that really shouldn’t be there. This is an iBooks issue that appears when you’ve enabled the Auto-hyphenate words option in the preferences. In any case, if you copy and paste the code into your IDE or text editor it will be copied correctly.

In Chapter 1, Getting Ready with the Essentials, you’ll encounter a fresh introduction to what Solr is and what it will allow you to do. One positive aspect is the use of real-life examples to explain a feature and help the reader relate to what Solr offers.

This chapter also gives you a tour of who is using Solr, and you’ll be surprised to hear some of those big names. One outstanding point of this book is that, right from the start, it provides resources for going deeper into the Solr world.

The second chapter (despite its title) will introduce you to the fundamental concepts of Lucene and Solr. Something I really appreciate is that this chapter doesn’t start by introducing the inverted index used by Solr/Lucene to do its magic. Yes, this is a wonderful data structure and you should really get to know it, but let’s face it: for a user taking a first dive into Solr it wouldn’t be attractive and wouldn’t give any advantage at such an early stage. In the author’s own words: “the real internal structure adopted for storing index data (and the actual process to search over the index data) is less intuitive”.

In Chapter 2, Indexing with Local PDF Files, you will be introduced to some basic Solr configuration and the useful admin UI. The REST-like interface provided by Solr is also introduced. This is a great chapter for getting your hands dirty. You will meet some essential concepts (tokenizers, filters, etc.) that will eventually become your day-to-day work with Solr. You will also be introduced to the deduplication component. I must say I was a little surprised to see this topic so early in the book, but there is no need to be scared of it: it’s pretty simple and yet tremendously useful. Basically, you’ll learn how to put data into Solr and how to get your data out, and along the way you’ll learn how to configure this process.
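To give a flavor of what “putting data into Solr” through that REST-like interface looks like, here is a minimal sketch of mine in Python. It is not taken from the book (which works with PDF files and uses Scala and shell scripts), and the host, the core name `example`, and the document fields are all assumptions for illustration. It only builds the request; nothing is sent unless you pass it to `urllib.request.urlopen` with a Solr instance running.

```python
import json
from urllib.request import Request

# Hypothetical host, core name, and fields; change them to match your own setup.
UPDATE_URL = "http://localhost:8983/solr/example/update?commit=true"

def build_update_request(docs):
    """Build (but do not send) a POST request that would index the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

docs = [{"id": "doc-1", "title": "A first document"}]
request = build_update_request(docs)
print(request.get_method(), request.full_url)
```

The `commit=true` parameter asks Solr to make the documents searchable right away, which is convenient when experimenting but usually too expensive to do on every request in production.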

Chapter 3. Indexing Example Data from DBpedia – Painting, takes a more comprehensive journey through analyzers, tokenizers, filters, etc., and covers more advanced features as well. It mainly focuses on how to get your data into Solr, and it gradually introduces you to how the ingestion pipeline in Solr works.

Chapter 4. Searching the Example Data, will teach you how to ask Solr questions. You’ll learn the different parameters you can use in your queries, and basically everything you need to know about the default query parser bundled with Solr. I really enjoyed the boolean queries section; this is a great piece of Solr functionality that’s worth knowing. You’ll also learn about other query features: sorting, fuzzy queries, range queries, etc.
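As an illustration of the kinds of queries mentioned above (my own sketch in Python, not an example from the book), this is how boolean, fuzzy, and range queries might be expressed as request URLs using the standard `q`, `sort`, `rows`, `fl`, and `wt` parameters. The host, the `paintings` core, and the field names `artist`, `title`, and `year` are assumptions.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical local Solr core; adjust host, port, and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/paintings/select"

def build_query_url(q, sort=None, rows=10, fields=None):
    """Return a select URL for a query string plus a few common parameters."""
    params = {"q": q, "rows": rows, "wt": "json"}
    if sort:
        params["sort"] = sort            # e.g. "year asc"
    if fields:
        params["fl"] = ",".join(fields)  # restrict the returned fields
    return SOLR_SELECT + "?" + urlencode(params)

# Boolean query: documents by one artist, excluding a given title.
boolean_url = build_query_url("artist:picasso AND NOT title:guernica")
# Fuzzy query: the trailing ~ tolerates small misspellings of the term.
fuzzy_url = build_query_url("title:gernica~")
# Range query, sorted by year, returning only two fields.
range_url = build_query_url("year:[1900 TO 1950]", sort="year asc",
                            fields=["title", "year"])

for url in (boolean_url, fuzzy_url, range_url):
    print(url)
```

Building the URL with `urlencode` takes care of escaping the spaces, colons, and brackets that the query syntax relies on.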

Chapter 5. Extending Search, will teach you a little more about the different query parsers available in Solr. A common use case runs through the entire chapter, showing how to put several Solr components together to make it work.

Chapter 6. Using Faceted Search – from Search to Finding, covers faceted search, a feature of Solr that is simple to start with and yet powerful and complex. Several ways of improving search are also explored, including the More Like This component, filter queries, etc.
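For readers who have never seen a faceted request, here is a small sketch of mine in Python (again, not from the book) that combines the `facet`, `facet.field`, and `fq` parameters. The host, the `paintings` core, and the fields `artist`, `museum`, and `city` are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical local Solr core; adjust host, port, and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/paintings/select"

def build_facet_url(q, facet_fields, filter_queries=()):
    """Return a select URL that asks for facet counts on the given fields."""
    # A list of pairs is used because facet.field and fq may repeat.
    params = [("q", q), ("rows", 0), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in filter_queries]
    return SOLR_SELECT + "?" + urlencode(params)

# Count documents per artist and per museum, restricted to one city.
facet_url = build_facet_url("*:*", ["artist", "museum"], ['city:"Paris"'])
print(facet_url)
```

Setting `rows` to 0 returns only the facet counts, and each `fq` narrows the result set without affecting scoring, which is what makes filter queries a natural fit for “drill down” navigation.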

Chapter 7. Working with Multiple Entities, Multicores, and Distributed Search, provides new ways of dealing with your data and introduces the concepts around distributed search with SolrCloud, although I must say I would have liked this topic to be taken a little further.

Chapter 8. Indexing External Data Sources, introduces you to the DataImportHandler (DIH), which essentially lets Solr import your data for you. For fast prototyping this is awesome: it allows you to get a dataset into your Solr index very quickly, which keeps you from worrying about your data ingestion pipeline until you really have to. There are cases where the DataImportHandler may be all you need, and once you become familiar with the DIH you will be amazed at how powerful it really is.

Chapter 9. Introducing Customizations, covers the new syntax for the solr.xml file, which will be mandatory as of version 5.0 of Solr. The book also offers an introduction to extending and customizing Solr. In this chapter you will encounter a diagram explaining how a plugin works, and the author shows you how a plugin is instantiated by Solr. One thing I would have loved to see covered is all the extension points Solr has: not just a mention of them, but a graph showing the lifecycle of a document inside Solr, highlighting the points where you can plug in your own logic. Part of this is covered in the section “Pointing ideas for Solr’s customizations”, but it doesn’t fulfill my wish, basically because you get a bit lost about “where” these components sit in the Solr engine.

In Appendix A you’ll encounter a comprehensive overview of different Solr clients, including SolrJ, the default library that comes with the Solr distribution. Several CMSs that offer integration with Solr are also mentioned. Several languages are covered, and I recommend paying special attention to the JavaScript clients, mainly AJAX-Solr; I’ve been playing with it this summer and it is very good for creating fast prototypes. PHP and its ecosystem are also extensively covered in this appendix. Generally speaking, this appendix will open your eyes to how to use Solr from your favorite programming language, framework or platform.

Appendix B contains the Pop Quiz answers for all chapters, so use it to check the answers you came up with while reading. I recommend answering and checking each pop quiz at the end of its chapter; it is very satisfying and lets you check the knowledge you’ve gained.

To summarize, Apache Solr Beginner’s Guide is a great book about Solr. It doesn’t only give you a gentle introduction; the author also provides resources and tips for continuing your journey beyond the book. It’s also really great to see that this book takes the phrase “beginner’s guide” a little further: while keeping the simplicity and the introductory purpose of the book in mind, it lets you gently dive into the main aspects of Solr. I really enjoyed the sections about extending Solr (but this could be my inner developer’s hunger for code talking), although I think a couple of diagrams would improve the content. I really recommend this book for people starting with Solr. It will help you grasp the basic concepts of Solr, and even users with some experience will find it very valuable.
