http://bentasker.i2p/posts/blog/software-development/building-a-self-hosted-url-and-tags-search-engine.html
Once the tags have been processed, the main index file is read and iterated over to build a dict representing each entry:

{
"u" : index_l[0], # url
"h" : index_l[1], # index hash
"t" : index_l[2], # content-type
"n" : index_l[3], # title/name
"i" : idx_num, # Index number
"p" : pth, # path
"d" : dom, # domain
"k" : [], # tag IDs
}

The first four attributes are drawn directly from the index file. The others are calculated or derived: idx_num is a...
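As a rough illustration of the step described above, the sketch below builds one such dict per index line. It is not the post's actual code: the helper names `build_entry` and `build_entries`, the tab separator, the use of `urllib.parse` to derive `pth` and `dom` from the URL, and treating `idx_num` as the line's position in the file are all assumptions made for the example.

```python
from urllib.parse import urlparse


def build_entry(index_l, idx_num):
    """Build one entry dict from a split index line (hypothetical helper)."""
    parsed = urlparse(index_l[0])
    return {
        "u": index_l[0],     # url
        "h": index_l[1],     # index hash
        "t": index_l[2],     # content-type
        "n": index_l[3],     # title/name
        "i": idx_num,        # index number (assumed: position in the input)
        "p": parsed.path,    # path (assumed: derived from the URL)
        "d": parsed.netloc,  # domain (assumed: derived from the URL)
        "k": [],             # tag IDs, left empty here
    }


def build_entries(lines, sep="\t"):
    """Turn raw index lines into entry dicts; the separator is an assumption."""
    entries = []
    for idx_num, line in enumerate(lines):
        index_l = line.rstrip("\n").split(sep)
        if len(index_l) < 4:
            continue  # skip lines that lack the four expected fields
        entries.append(build_entry(index_l, idx_num))
    return entries
```

Each entry gets its own fresh `[]` for `"k"`, so appending tag IDs to one entry cannot leak into another.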