Searchex Readme
A full-text search engine written in pure Elixir.
Three goals: 1) a simple CLI, 2) a scalable API, and 3) shareable document repos.
BEAM, OTP, and GenStage give us the best possible foundation on which to build.
About Searchex
Searchex provides a search capability for full-text documents. Example document types include:
- text, markdown, XML and JSON files
- source code files
- product descriptions
- blog and forum posts
- chat rooms and twitter feeds
- web pages
Searchex allows you to create searchable Repos that can be shared over the Internet. See a sample repo on GitHub.
Searchex is a new project, usable for testing but not for production. For testing, we’re using collections of up to 2,000 documents with 1MB of raw text. See the Roadmap for development plans.
Demo
Searchex Architecture
A Searchex DOCUMENT has two key elements:
document
META-DATA, liketitle,author_name,publication_datethe
FULL-TEXTof the document
Searchex organizes documents into separate COLLECTIONS. Each collection has
two main elements:
the
CATALOG, a table-like structure that contains the document ID, meta-data, and document location.the
INDEX, an inverted index built for fast search and retrieval
Each collection is defined by a CONFIG file, a yaml file that specifies
things like:
- document directories
- file types
- meta-data fields definitions and extraction regexes
- document separator regex (for multi-doc files)
The Searchex CLI
The searchex command-line program can manage config files, build catalogs and
indexes, and perform searches.
Escript Installation
If you have Elixir 1.4+ enter this at the console:
mix escript.install hex searchex
Make sure ~/.mix/escripts is on your path!
The Searchex API
Elixir developers can embed Searchex into their applications.
Add searchex to your list of dependencies in mix.exs:
def deps do
[{:searchex, "~> 0.0.4"}]
end
Then run mix deps.get
View API documentation at https://hexdocs.pm/searchex
Quick Start
After the searchex escript is installed…
1) Fetch a Searchex repository
> searchex fetch elixir-search/sample
2) Run
> searchex help # show help page
> searchex ls # list collections
> searchex info # show collection stats
> searchex query tiny . # list docs from collection: tiny
> searchex show tiny 1 # show the doc 1 from 'tiny'
> searchex query genesis 'cain abel' # query docs from collection: genesis
Roadmap
- [x] Config management
- [x] Porter stemming algorithm
- [x] BM25 query algorithm
- [x] Middleware framework
- [x] CLI Indexing and Query
- [x] LRU Cache
- [x] CLI Results Display
- [x] Fetchable document repos
- [ ] Docsource Adapters (filesys, ecto)
- [ ] Incremental add/remove/update
- [ ] Git-based file-change detection
- [ ] Server mode
- [ ] Phoenix/Firestorm integration
- [ ] Streaming document ingestion (GenStage/Flow)
- [ ] Faceted Search
- [ ] LRU Registry
- [ ] Typeahead support
- [ ] Alerting
- [ ] Multi-collection search
- [ ] Configuration GUI
- [ ] Toolchain Integration (ExDoc, Hex, GitHub Issues)
- [ ] Searchable Tutorials (Elixir Blogs, Slide Decks, Videos)
- [ ] Output formatting plugins (Vim, Emacs, etc.)
- [ ] Internationalization
- [ ] Dockerization
- [ ] Bayesian/ML Classifiers
Links
- Searchex Website: http://searchex.org
- GitHub Source: https://github.com/elixir-search/searchex
- Sample Repository: https://github.com/elixir-search/sample
- StemEx: https://github.com/elixir-search/stem_ex
- Hex Package: https://hex.pm/packages/searchex
- API Documentation: https://hexdocs.pm/searchex
