Lunr.js - Simple full-text search in your browser

kristopolous · on March 5, 2013

I see you are using my stemmer implementation ... snowball and porter2 are better - git clone that instead.

In fact, if you had poked around, you probably could have snagged about 90% of this code from various projects ... too bad I didn't put it together like you did.

Ah well ... internet fame points to you I guess.

jchapron · on March 5, 2013

I'm not the author; I just stumbled on the project and found it interesting. I guess you could suggest it to him on Twitter: @olivernn

FuzzyDunlop · on March 4, 2013

There's a detailed write up from the author about how it all works at: http://blog.new-bamboo.co.uk/2013/02/26/full-text-search-in-...

jchapron · on March 5, 2013

There's also a website with docs and an example by the author here: http://lunrjs.com/

smagch · on March 4, 2013

As for the server side, reds is a simple full-text search module for Node.js. https://github.com/visionmedia/reds

tantalor · on March 4, 2013

How does this compare to http://reyesr.github.com/fullproof/?

slashdotdash · on March 6, 2013

I created a small Jekyll plugin to add full-text search using lunr.js for the generated, static sites.

https://github.com/slashdotdash/jekyll-lunr-js-search

augustl · on March 4, 2013

I think I'll put this to use in an internal admin/support system in need of search. All the data is on the client already (AngularJS), and it's fewer than a thousand docs for now.

ErikRogneby · on March 4, 2013

I am sure there are some applications that might need this due to obfuscation of data from the users... But doesn't the browser already have full-text search (Control-F)?

nzadrozny · on March 4, 2013

Full-text search usually presumes an index, which brings a lot of functional differences compared to the browser's naive substring-matching Ctrl-F. And any proper search index is going to be a better user experience than naive string matches.

I haven't read through all of Lunr's docs and source, but based on my Solr/Elasticsearch experience, I'd expect to see (in time)…

Tokenization and (presumably) term normalization/analysis; a faster and smarter query language, for term order independence and boolean combinations of clauses; relevance scores and maybe even score boosting per field.

Better queryability really shouldn't be understated here. Just having term order independence focused on a specific set of JSON is going to be way better than naively matching any substring on the entire rendered page.

olivernn · on March 4, 2013

That is almost exactly what lunr is doing. It tokenises the input text, stems the tokens and filters out any stop words. The index can then be searched; the order of the terms is not relevant, and a prefix search is currently used so that you can find documents containing terms without having to type the whole term exactly. The matching documents are also scored by how relevant they are to the search term.
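
To make those steps concrete, here is a minimal sketch in plain JavaScript. It is not lunr's code or API: the stop-word list, the `toyStem` suffix stripper, and the term-frequency scoring are all simplified assumptions, chosen only to show tokenising, stemming, stop-word filtering, order-independent prefix matching, and ranking in one place.

```javascript
// Toy full-text index. NOT lunr's implementation, only the same broad steps.
const STOP_WORDS = new Set(["a", "an", "and", "in", "is", "of", "the", "to"]);

// Crude stand-in for a real stemmer (e.g. Porter): strips a few common suffixes
// from longer tokens. Assumption for illustration only.
function toyStem(token) {
  return token.length > 4 ? token.replace(/(ing|ed|es|s)$/, "") : token;
}

// Lowercase, split on non-alphanumerics, drop stop words, then stem.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t))
    .map(toyStem);
}

// Inverted index: term -> Map(docId -> term frequency).
function buildIndex(docs) {
  const index = new Map();
  for (const { id, body } of docs) {
    for (const term of analyze(body)) {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      postings.set(id, (postings.get(id) || 0) + 1);
    }
  }
  return index;
}

// Each query term is treated as a prefix. A document matches only if every
// query term prefix-matches at least one of its indexed terms, in any order.
// The score is the summed frequency of the matched terms.
function search(index, query) {
  const queryTerms = analyze(query);
  if (queryTerms.length === 0) return [];
  const perTermScores = queryTerms.map((q) => {
    const scores = new Map();
    for (const [term, postings] of index) {
      if (!term.startsWith(q)) continue;
      for (const [id, tf] of postings) scores.set(id, (scores.get(id) || 0) + tf);
    }
    return scores;
  });
  const results = [];
  for (const [id, score] of perTermScores[0]) {
    let total = score;
    let matchedAll = true;
    for (const other of perTermScores.slice(1)) {
      if (!other.has(id)) { matchedAll = false; break; }
      total += other.get(id);
    }
    if (matchedAll) results.push({ id, score: total });
  }
  return results.sort((a, b) => b.score - a.score);
}

const index = buildIndex([
  { id: 1, body: "Searching text in the browser" },
  { id: 2, body: "Indexing documents on the server" },
  { id: 3, body: "Browser text search, with text indexes" },
]);

// Same hits regardless of term order; doc 3 ranks first because it mentions "text" twice.
console.log(search(index, "brows text").map((r) => r.id)); // [ 3, 1 ]
console.log(search(index, "text brows").map((r) => r.id)); // [ 3, 1 ]
```

A real stemmer and a proper relevance measure (tf-idf or similar) would replace the toy pieces; the overall shape stays the same.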

In the future I want to add even more powerful querying, restricting search to specific fields, taking into account the distance between terms, and adding faceted search to reduce the total documents being searched over.

One of the original goals of the project was specifically to provide a better alternative to just using the browser's built-in find-in-page functionality.

bambax · on March 4, 2013

What browsers do cannot really be called full-text search.

For example, the ability to search for a paragraph that contains two non-contiguous words would be very useful, but no browser (that I know of) is able to return elements that contain a set of tokens.

All browsers do is return exact matches from a string, with no concept of words.
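
A small sketch of the difference, in plain JavaScript. The function names are made up for illustration; this is neither a browser API nor lunr's code, only a contrast between contiguous substring matching and matching on a set of word tokens.

```javascript
// Find-in-page style: the query must appear as one contiguous substring.
function substringMatch(text, query) {
  return text.toLowerCase().includes(query.toLowerCase());
}

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Token style: every query word must appear somewhere in the text, in any order.
function containsAllTokens(text, query) {
  const words = new Set(tokenize(text));
  return tokenize(query).every((w) => words.has(w));
}

const paragraph = "The index is built once and can then be searched many times.";
console.log(substringMatch(paragraph, "index searched"));    // false
console.log(containsAllTokens(paragraph, "searched index")); // true
```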

It would be interesting to know if in this solution the index can be persisted to file or if it has to be rebuilt every time?

goldfeld · on March 4, 2013

Supposedly you can persist it with HTML5, in localStorage or IndexedDB.
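
A rough sketch of what that could look like, assuming the index can be represented as plain JSON (the thread does not confirm that lunr supports this). `loadOrBuildIndex`, `createMemoryStorage`, and the key name are hypothetical; the only real API relied on is the `getItem`/`setItem` shape of Web Storage, which `window.localStorage` provides.

```javascript
// Load a cached index from a Web Storage-like object, or build it and cache it.
// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function loadOrBuildIndex(storage, key, buildFn) {
  const cached = storage.getItem(key);
  if (cached !== null) {
    try {
      return { index: JSON.parse(cached), fromCache: true };
    } catch (err) {
      // Corrupt cache entry: fall through and rebuild.
    }
  }
  const index = buildFn();
  try {
    storage.setItem(key, JSON.stringify(index));
  } catch (err) {
    // Storage may be full or disabled; the freshly built index still works.
  }
  return { index, fromCache: false };
}

// In-memory stand-in for localStorage so the sketch also runs outside a browser.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (k) => (items.has(k) ? items.get(k) : null),
    setItem: (k, v) => { items.set(k, String(v)); },
  };
}

const storage = createMemoryStorage();
const build = () => ({ search: [1, 3], index: [2, 3] }); // hypothetical term -> doc ids
const first = loadOrBuildIndex(storage, "search-index-v1", build);
const second = loadOrBuildIndex(storage, "search-index-v1", build);
console.log(first.fromCache, second.fromCache); // false true
```

Putting a version in the key is one simple way to throw away a stale index when the documents change.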

abecedarius · on March 4, 2013

Relevance ranking. Not sure how valuable it is at this scale, but it seems worth a try.

tantalor · on March 4, 2013

> A browser is required for running the tests.

Why? This is a red flag.

olivernn · on March 4, 2013

Why is it a red flag for a browser-based javascript library to require a browser for testing?

tantalor · on March 4, 2013

Continuous integration usually runs JavaScript tests in browserless environments.

altcognito · on March 5, 2013

While I agree with downstream comments that you can run tests headless or browserless, arguably, you're not really testing the user experience until you execute it the way that the user will execute it. Perhaps this is a case of perfect vs. good enough.

cheald · on March 5, 2013

I run mine in a PhantomJS environment, which works just fine for headless browser testing.

erichocean · on March 4, 2013

Uh, platforms like Node can easily simulate the browser, despite being 'browserless'.

hajrice · on March 4, 2013

Any stats on the limitations?

I'm wondering how efficient this would be, given that indexing a lot of data via JavaScript might really not be a good idea.

olivernn · on March 4, 2013

The example (http://lunrjs.com/example/) indexes 100 Stack Overflow questions, some of which are relatively long.

If indexing performance starts to become an issue, the whole search index can be moved into a web worker, which prevents indexing from blocking the rest of the page.

piranha · on March 4, 2013

Maybe you can create the index on the server and then just load it on the client?

Edit: looking at the docs, it's unclear if that's possible. I guess the index should be a JS object, so it should be pretty simple to save it to disk and then fetch it from the client.

olivernn · on March 4, 2013

Since the library can be run outside of the browser (using Node.js, for example), the index could be generated server side and then just passed to the client. I hadn't considered this before but it might be worth looking at.
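
As a sketch of the idea only (this is a toy inverted index, not lunr's index or any serialization it provides), the server could build the index, turn it into JSON, and the client could rebuild the in-memory structure from that payload. The function names and the `version` field are assumptions for illustration.

```javascript
// "Server" side: build a toy inverted index (term -> Set of doc ids).
function buildIndex(docs) {
  const index = new Map();
  for (const { id, body } of docs) {
    for (const term of body.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Map and Set do not survive JSON.stringify, so emit [term, [ids]] pairs instead.
function serializeIndex(index) {
  const terms = [...index].map(([term, ids]) => [term, [...ids]]);
  return JSON.stringify({ version: 1, terms });
}

// "Client" side: rebuild the Map of Sets from the JSON payload.
function deserializeIndex(json) {
  const { version, terms } = JSON.parse(json);
  if (version !== 1) throw new Error("Unsupported index version: " + version);
  return new Map(terms.map(([term, ids]) => [term, new Set(ids)]));
}

const payload = serializeIndex(
  buildIndex([
    { id: 1, body: "full text search" },
    { id: 2, body: "text index" },
  ])
);
const clientIndex = deserializeIndex(payload);
console.log([...clientIndex.get("text")]); // [ 1, 2 ]
```

The client then skips tokenising and indexing entirely and only pays for the download and `JSON.parse`.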

pudo · on March 4, 2013

Depending on the performance of this, it might be awesome to have some serialization format (i.e. inverted, normalized, tokenized JSON).

BaconJuice · on March 4, 2013

How do you index your pages? Is that a manual process of creating a JSON file to be read?

napoleond · on March 4, 2013

This is amazing, and perfect timing for me. Thanks!!

on March 4, 2013

[deleted]

chris24 · on March 4, 2013

Wrong thread?

taylorlapeyre · on March 4, 2013

Whoops! Very sorry about this.