Full-text search with Whoosh and Jieba

Environment dependencies

pip install django-haystack
pip install jieba
pip install whoosh
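Before going further, it can help to confirm that all three packages are importable from the active environment. The following is a minimal standard-library sketch. Note that django-haystack is imported as `haystack`. The helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # django-haystack installs the top-level package "haystack"
    missing = missing_packages(["haystack", "jieba", "whoosh"])
    print("missing:", missing or "none")
```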

Environment configuration

Add the following configuration to settings.py:

import os  # needed for os.path.join below

INSTALLED_APPS = (
    # ... existing apps ...
    'haystack',  # register the Haystack full-text search framework
)

# Full-text search configuration
HAYSTACK_CONNECTIONS = {
    'default': {
        # use the Whoosh search engine
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        # directory where Whoosh stores its index files; to keep the index elsewhere, put that path here
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    },
}

# Update the index automatically whenever rows are added, changed or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
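RealtimeSignalProcessor connects to Django's post_save and post_delete model signals. As a result, every save or delete updates the index immediately instead of waiting for a manual rebuild. The toy analogue below uses plain callbacks to illustrate that idea only. `ToyIndex`, `ToyModelStore` and their methods are hypothetical and are not part of Haystack or Django.

```python
class ToyIndex:
    """Hypothetical in-memory index keyed by object id."""

    def __init__(self):
        self.docs = {}

    def update_object(self, obj_id, text):
        self.docs[obj_id] = text

    def remove_object(self, obj_id):
        self.docs.pop(obj_id, None)


class ToyModelStore:
    """Hypothetical 'table' that fires save/delete callbacks, loosely like Django signals."""

    def __init__(self):
        self.rows = {}
        self.post_save = []    # callbacks run after a row is saved
        self.post_delete = []  # callbacks run after a row is deleted

    def save(self, obj_id, text):
        self.rows[obj_id] = text
        for handler in self.post_save:
            handler(obj_id, text)

    def delete(self, obj_id):
        self.rows.pop(obj_id, None)
        for handler in self.post_delete:
            handler(obj_id)


# Wire the store to the index, mimicking what a realtime signal processor does
index = ToyIndex()
store = ToyModelStore()
store.post_save.append(index.update_object)
store.post_delete.append(index.remove_object)

store.save(1, "Haystack and Whoosh")
store.save(2, "Jieba segmentation")
store.delete(1)
print(index.docs)  # {2: 'Jieba segmentation'}
```

The trade-off is that every save now also writes to the index. Haystack's default, BaseSignalProcessor, does not react to saves at all. With it, you would refresh the index yourself, for example with python manage.py update_index.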

Create search_indexes.py in each app that should be searchable. Haystack discovers indexes by this exact module name.

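from haystack import indexes
from apps.blog.models import Article


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    # Main document field; its content is rendered from a template (use_template=True)
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        # The model this index covers
        return Article

    def index_queryset(self, using=None):
        # Only rows returned here get indexed; status 'p' presumably marks published articles
        return self.get_model().objects.filter(status='p')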
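Conceptually, index_queryset decides which rows are indexed, and the document=True field holds the text that gets tokenized and looked up. The pure-Python sketch below shows that pipeline in rough form. It assumes a hypothetical `Article` dataclass standing in for the Django model, and it uses naive whitespace tokenization instead of Whoosh's analyzer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical stand-in for apps.blog.models.Article."""
    id: int
    title: str
    body: str
    status: str  # 'p' is assumed to mean "published"


def build_inverted_index(articles):
    """Map each lowercase term to the ids of published articles containing it."""
    index = defaultdict(set)
    for article in articles:
        if article.status != "p":  # same filter as index_queryset()
            continue
        # the document text combines title and body, like the text template
        document = f"{article.title}\n{article.body}"
        for term in document.lower().split():
            index[term].add(article.id)
    return index


articles = [
    Article(1, "Whoosh basics", "pure python search", "p"),
    Article(2, "Draft post", "python notes", "d"),
]
index = build_inverted_index(articles)
print(sorted(index["python"]))  # [1]
```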

In the project's templates folder, create search/indexes/blog/article_text.txt. Here, blog is the app label, article is the lowercased model name, and text is the name of the document field. The file lists the fields to index:

{# Specifies which fields of the model are indexed #}
{{ object.title }}  {# index the title field #}
{{ object.body }}  {# index the body field #}
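With use_template=True, Haystack looks the template up by app label, lowercased model name and field name, in the form search/indexes/<app_label>/<model_name>_<field_name>.txt. The helper below reproduces that naming rule for illustration; its name is hypothetical.

```python
def index_template_path(app_label, model_name, field_name="text"):
    """Build the template path Haystack expects for a use_template field."""
    # the model name appears lowercased in the path
    return f"search/indexes/{app_label}/{model_name.lower()}_{field_name}.txt"


print(index_template_path("blog", "Article"))
# search/indexes/blog/article_text.txt
```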

Add the search route to urls.py

path('search', include('haystack.urls')),  # the view inside haystack.urls is named 'haystack_search'

Add a search form to the template:

<form action="/search" method="get">
    <span id="searchAria" tabindex="0" onclick="searching()" onblur="offsearch()">
        <input type="text" id="searchInput" style="display: none; border: none;" name="q">
        <input type="submit" value="Search" style="border: none; display: none;" id="submitInput">
        <a href="javascript:void(0);" class="navPlugs"><i class="fa fa-search" aria-hidden="true"></i></a>
    </span>
</form>

Tip: the form must contain an input whose name attribute is q, because Haystack's search view reads the query from the q parameter. No {% csrf_token %} is needed for a GET form.
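Because the form uses method="get", the browser appends the input to the URL as a query string such as /search?q=..., and the search view reads q from it. The standard library can show both directions. Non-ASCII keywords such as Chinese are percent-encoded as UTF-8.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# What the browser builds when the form is submitted with the keyword "全文检索"
url = "/search?" + urlencode({"q": "全文检索"})
print(url)  # the Chinese characters appear as %XX escapes

# What the server side recovers from that URL
query = parse_qs(urlparse(url).query)["q"][0]
print(query)  # 全文检索
```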

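Next comes the search results page. Haystack's default view renders templates/search/search.html and passes it, among other variables, query and page (the current page of results; result.object is the matching model instance). truncatesmart and load_pages are not Django built-ins, so they must come from a custom template tag library loaded with {% load %}.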

    <div class="jupe main-body">
        <ul class="post-list">
            {% if query and page.object_list %}
                {% for result in page.object_list %}
                    <li class="post-item">
                        <a class="post-title"
                           href="{{ result.object.get_absolute_url }}"
                           title="{{ result.object.title }}">{{ result.object.title | truncatesmart:34 }}</a>
                        <span class="post-time">{{ result.object.create_time | date:"Y.m.d" }}</span>
                    </li>
                {% endfor %}

                {% if page.has_other_pages %}{% load_pages %}{% endif %}
            {% else %}
                <h3>Nothing found. Try searching with a different keyword.</h3>
            {% endif %}

        </ul>
    </div>
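Haystack's default view splits results into pages with Django's Paginator, which is where page.object_list comes from. It shows 20 results per page unless HAYSTACK_SEARCH_RESULTS_PER_PAGE says otherwise. The simplified stand-in below uses a hypothetical `paginate` helper. It shows which results land on a given page and when pagination links are needed.

```python
import math


def paginate(results, page_number, per_page=20):
    """Return (items on this 1-based page, whether more than one page exists)."""
    num_pages = max(1, math.ceil(len(results) / per_page))  # an empty result set still has one page
    if not 1 <= page_number <= num_pages:
        raise ValueError(f"page {page_number} out of range 1..{num_pages}")
    start = (page_number - 1) * per_page
    return results[start:start + per_page], num_pages > 1


items, has_other_pages = paginate(list(range(45)), page_number=3)
print(items, has_other_pages)  # [40, 41, 42, 43, 44] True
```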

Build the index

python manage.py rebuild_index

Configure Jieba for Chinese search

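Whoosh's default analyzer (StemmingAnalyzer) splits text into tokens at non-word characters such as spaces and punctuation. Chinese is written without spaces, so an entire phrase up to the next punctuation mark becomes one token, and searches for individual words inside it fail. The fix is to use Jieba's Chinese word segmentation instead.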
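The regex tokenizer below makes the problem visible. It approximates the \w+-style pattern behind Whoosh's default tokenizer, but it is not Whoosh's exact code (it skips stop-word removal and stemming, for example).

```python
import re

# Approximation of a \w+-based tokenizer such as Whoosh's default RegexTokenizer
WORD_RE = re.compile(r"\w+(?:\.?\w+)*")


def regex_tokens(text):
    """Split text on non-word characters and lowercase the tokens."""
    return [m.group(0).lower() for m in WORD_RE.finditer(text)]


print(regex_tokens("Full-text search with Whoosh"))
# ['full', 'text', 'search', 'with', 'whoosh']
print(regex_tokens("我爱自然语言处理"))
# ['我爱自然语言处理']  (one token, so a query for 语言 cannot match it)
```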

Copy Haystack's default backend file \site-packages\haystack\backends\whoosh_backend.py into the project (here, the apps/search app) and rename it whoosh_cn_backend.py.

Open it and import Jieba's Chinese analyzer:

from jieba.analyse import ChineseAnalyzer

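Replace the StemmingAnalyzer() call with ChineseAnalyzer(). The call is in build_schema, where the TEXT field is defined. The import line for StemmingAnalyzer can stay as it is.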
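ChineseAnalyzer runs Jieba's word segmentation before indexing, so a phrase like 自然语言处理 is stored as separate words that can be matched individually. Jieba itself builds a word graph from a prefix dictionary, picks the most probable path with dynamic programming, and uses an HMM for unknown words. The much simpler forward maximum matching below uses a tiny hypothetical dictionary and only illustrates what dictionary-based segmentation produces.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # try the longest candidate first, falling back to a single character
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


DICTIONARY = {"自然语言", "自然", "语言", "处理", "全文", "检索"}  # hypothetical toy dictionary
print(forward_max_match("我爱自然语言处理", DICTIONARY))
# ['我', '爱', '自然语言', '处理']
```

Greedy longest match indexes 自然语言 here but not 语言. Jieba's search mode (jieba.cut_for_search) also emits shorter sub-words to address this kind of miss.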

In settings.py, point ENGINE at the custom backend:

'ENGINE': 'apps.search.whoosh_cn_backend.WhooshEngine'
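The ENGINE value is a dotted Python path. Haystack imports the module part and looks up the class name, so apps/search must be an importable package (with an __init__.py), and the copied file must keep its WhooshEngine class. The standard-library sketch below shows that kind of resolution. Haystack uses its own loader, and `load_class` is hypothetical.

```python
from importlib import import_module


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the named attribute."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a dotted path: {dotted_path!r}")
    module = import_module(module_path)  # raises ModuleNotFoundError if the module is missing
    return getattr(module, class_name)   # raises AttributeError if the class was renamed


print(load_class("collections.OrderedDict"))
# <class 'collections.OrderedDict'>
```

To catch a typo in the path before rebuilding the index, call load_class('apps.search.whoosh_cn_backend.WhooshEngine') from python manage.py shell, where Django settings are already loaded.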

Rebuild the index:

python manage.py rebuild_index

Chinese keywords can now be searched.


Prev OAuth2简述