This is a crawler project I built a few years ago and use to index all of my projects and sites. It also powers real-time capabilities, advanced analytics, and media indexing through a Search API for clients and friends. If you want to use this system to index your content, please get in touch.