Monday, August 19, 2013

MySQL + Sphinx full-text search

 

I have recently been building a search engine, mainly for object-level book search, so the first step is to get to know Sphinx.

 

It can speed up your queries, and not by a small margin.

 

Sphinx is a SQL-based full-text search engine that can be combined with MySQL or PostgreSQL for full-text search. It offers more specialized search functionality than the database itself, which makes dedicated full-text search easier to implement in an application. Sphinx provides search API interfaces designed for several scripting languages, such as PHP, Python, Perl, and Ruby, and it also ships a storage engine plugin for MySQL.

 

A single Sphinx index can hold up to 100 million records, and with 10 million records queries still complete in milliseconds. As for indexing speed: building an index over 1 million records takes only 3 to 4 minutes, an index over 10 million records can be built within 50 minutes, and rebuilding an incremental index that contains only the latest 100,000 records takes only tens of seconds.

 

Sphinx's main features include:

high indexing speed (nearly 10 MB/sec on modern CPUs);
high search speed (average query time under 0.1 seconds on 2-4 GB of text);
high scalability (up to 100 GB of text and 100 million documents on a single CPU);
good relevance ranking;
distributed search;
document excerpt (summary) generation;
searching from within MySQL through a pluggable storage engine;
Boolean, phrase, and synonym queries;
multiple full-text fields per document (at most 32 by default);
multiple attributes per document;
word breaking;
single-byte encodings and UTF-8.

 

These features look quite good, so let's look at how Sphinx is used.

 

Retrieval process with a native MySQL storage engine:

 

 

Retrieval process with the Sphinx storage engine:

 

 

I still prefer the second approach, the storage engine: even if your programming language has no Sphinx API interface, you can still use Sphinx through it.

 

Before starting the installation, install the required components:

yum -y install gcc g++ gcc-c++ libjpeg libjpeg-devel libpng libpng-devel freetype freetype-devel libxml2 libxml2-devel zlib zlib-devel glibc glibc-devel glib2 glib2-devel bzip2 bzip2-devel ncurses ncurses-devel curl curl-devel e2fsprogs e2fsprogs-devel krb5 krb5-devel libidn libidn-devel openssl openssl-devel openldap openldap-devel nss_ldap openldap-clients openldap-servers patch libtool automake imake mysql-devel expat-devel

 

 

 

 

(1) Install Python support

yum install -y python python-devel

 

 

(2) Compile and install LibMMSeg (LibMMSeg is a Chinese word segmentation package designed for the Sphinx full-text search engine. It is released under the GPL and uses Chih-Hao Tsai's MMSEG algorithm. In this article LibMMSeg is used to generate the Chinese word segmentation dictionary).

wget http://www.coreseek.com/uploads/sources/mmseg-0.7.3.tar.gz
tar zxvf mmseg-0.7.3.tar.gz
cd mmseg-0.7.3
./configure
make
make install

 

(3) Compile and install MySQL 5.1.26-rc, Sphinx, and the SphinxSE storage engine

wget http://blog.s135.com/soft/linux/nginx_php/mysql/mysql-5.1.26-rc.tar.gz
tar zxvf mysql-5.1.26-rc.tar.gz
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
wget http://www.coreseek.com/uploads/sources/sphinx-0.98rc2.zhcn-support.patch
wget http://www.coreseek.com/uploads/sources/fix-crash-in-excerpts.patch
tar zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2/
patch -p1 < ../sphinx-0.98rc2.zhcn-support.patch # apply the Chinese support patch
patch -p1 < ../fix-crash-in-excerpts.patch # apply the excerpts crash fix
cp -rf mysqlse ../mysql-5.1.26-rc/storage/sphinx
cd ../
cd mysql-5.1.26-rc/
sh BUILD/autorun.sh
./configure --with-plugins=partition,innobase,myisammrg,sphinx --prefix=/usr/local/mysql/ --enable-assembler --with-extra-charsets=complex --enable-thread-safe-client --with-big-tables --with-readline --with-ssl --with-embedded-server --enable-local-infile
make && make install
cd ../

 

Start the MySQL database

cp mysql-5.1.26-rc/support-files/my-medium.cnf /etc/my.cnf # copy the configuration file
cp mysql-5.1.26-rc/support-files/mysql.server /etc/rc.d/mysqld # add the MySQL service control script
cd /usr/local/mysql
bin/mysql_install_db --user=mysql # initialize the database
bin/mysqld_safe --user=mysql & # start the server to check that the installation succeeded
bin/mysql # enter the MySQL command prompt

Start and stop:

/etc/rc.d/mysqld start
/etc/rc.d/mysqld stop

To start MySQL at boot, we create the file /etc/rc.local ourselves and make it executable. Its contents are roughly:

#!/bin/sh
/usr/local/mysql/bin/mysqld_safe --user=mysql &

or

/etc/rc.d/mysqld start

 

Run the following command; if SPHINX appears in the output, SphinxSE has been built into MySQL.

                      

show engines;

 

This article uses version 0.9.8, but version 0.9.9 is recommended: it is the most stable version, and I eventually switched to 0.9.9 myself.

 

Sphinx does not support Chinese indexing and retrieval by default. This used to be fixed with a Coreseek patch, but Coreseek no longer provides the patch separately. Instead it develops Coreseek, a full-text search server based on Sphinx, which is probably the most widely used Chinese full-text search solution built on Sphinx today. It provides LibMMSeg, the Chinese word segmentation package designed for Sphinx, which contains the mmseg Chinese segmenter. In fact coreseek-3.2.14.tar.gz already contains a copy of Sphinx, so when installing SphinxSE as described earlier you can also use the mysqlse directory from this archive.

 

Install autoconf

tar zxvf autoconf-2.64.tar.gz
cd autoconf-2.64
./configure --prefix=/usr
make
make install

 

Install Coreseek

tar zxvf coreseek-3.2.14.tar.gz
cd coreseek-3.2.14
cd mmseg-3.2.14/
./bootstrap
./configure --prefix=/usr/local/mmseg3
make
make install
cd ../csft-3.2.14/
sh buildconf.sh
./configure --prefix=/usr/local/coreseek --without-python --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql --host=arm
make
make install
cd /usr/local/coreseek/etc

 

Enter the configuration directory; the ls command shows three files:

example.sql sphinx.conf.dist sphinx-min.conf.dist

example.sql is a sample SQL script; import it into the test database as test data (it creates the documents table and the tags table).

vi sphinx.conf

Enter the following content:

                      

source src1
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = 12345678
    sql_db = test
    sql_port = 3306 # optional, default is 3306
    sql_sock = /tmp/mysql.sock
    sql_query_pre = SET NAMES utf8
    sql_query = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents
    sql_attr_uint = group_id
    sql_attr_timestamp = date_added
    sql_query_info = SELECT * FROM documents WHERE id=$id
}

index test1
{
    source = src1
    path = /usr/local/coreseek/var/data/test1
    docinfo = extern
    charset_type = zh_cn.utf-8
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    charset_dictpath = /usr/local/mmseg3/etc/
    ngram_len = 0
}

indexer
{
    mem_limit = 32M
}

searchd
{
    port = 9312
    log = /usr/local/coreseek/var/log/searchd.log
    query_log = /usr/local/coreseek/var/log/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /usr/local/coreseek/var/log/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 0
    unlink_old = 1
}

 

Description: the block source src1 { ... } defines a data source and contains the database connection settings. src1 is the name of the data source, and you can choose any name.

The block index test1 { ... } defines an index built from a data source. Index and source blocks come in pairs: the value of the source parameter must be the name of a defined data source.
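The pairing rule can be checked mechanically before running the indexer. The following is only a rough sketch, not part of Sphinx or Coreseek: the function name check_index_sources and the SAMPLE text are made up for illustration, and the regular expressions assume the simple block layout used in this article (block header on one line, closing brace on its own line).

```python
import re

# Matches "source NAME { ... }" and "index NAME { ... }" blocks. An optional
# ": parent" after the name is tolerated. Assumes the closing brace sits on
# its own line, as in the configuration shown in this article.
BLOCK_RE = re.compile(
    r"^\s*(source|index)\s+(\w+)(?:\s*:\s*\w+)?\s*\{(.*?)^\s*\}",
    re.S | re.M,
)
# Matches a "source = NAME" setting inside an index block.
SOURCE_KEY_RE = re.compile(r"^\s*source\s*=\s*(\S+)", re.M)


def check_index_sources(conf_text):
    """Return (index_name, source_name) pairs whose source is not defined."""
    defined_sources = set()
    index_sources = []
    for kind, name, body in BLOCK_RE.findall(conf_text):
        if kind == "source":
            defined_sources.add(name)
        else:
            for src in SOURCE_KEY_RE.findall(body):
                index_sources.append((name, src))
    return [(idx, src) for idx, src in index_sources if src not in defined_sources]


# Hypothetical sample: test1 is paired correctly, broken points at a
# data source that does not exist.
SAMPLE = """
source src1
{
    type = mysql
}

index test1
{
    source = src1
    path = /usr/local/coreseek/var/data/test1
}

index broken
{
    source = src2
}
"""

if __name__ == "__main__":
    for index_name, source_name in check_index_sources(SAMPLE):
        print("index %s refers to undefined source %s" % (index_name, source_name))
```

To check a real file, pass the contents of /usr/local/coreseek/etc/sphinx.conf to check_index_sources instead of SAMPLE; an empty list means every index refers to a defined source.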

 

Generate the index:

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all
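If you want to trigger indexing from a script, for example from a scheduled job, the same command can be assembled in Python. This is a minimal sketch under stated assumptions: the helper names build_indexer_command and run_indexer are hypothetical, the paths are the install prefix used in this article, and --config is the long form of the -c option.

```python
import os
import subprocess

# Paths follow the install prefix used in this article; adjust as needed.
INDEXER = "/usr/local/coreseek/bin/indexer"
CONFIG = "/usr/local/coreseek/etc/sphinx.conf"


def build_indexer_command(indexer=INDEXER, config=CONFIG, indexes=None):
    """Build the indexer argument list: all indexes by default, else only the named ones."""
    cmd = [indexer, "--config", config]
    if indexes:
        cmd.extend(indexes)   # rebuild only the listed indexes, e.g. ["test1"]
    else:
        cmd.append("--all")   # rebuild every index defined in the config
    return cmd


def run_indexer(indexes=None):
    """Run indexer if the binary exists; return its exit code, or None if it is missing."""
    cmd = build_indexer_command(indexes=indexes)
    if not os.path.exists(cmd[0]):
        return None
    return subprocess.call(cmd)


if __name__ == "__main__":
    print(" ".join(build_indexer_command()))
```

Calling run_indexer(["test1"]) would rebuild only the test1 index from the configuration above.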

 

Problems:

Question 1: If sphinx does not appear in the output of ./configure -h after running sh BUILD/autorun.sh, run sh BUILD/cleanup, then run sh BUILD/autorun.sh again, and then run ./configure -h; sphinx should now be listed.

 

Question 2: If compiling MySQL fails, check whether the ncurses package is installed.

You can check with:

yum list | grep ncurses

and install it with:

yum -y install ncurses-devel

Then run ./configure again.

 

Question 3: Before installing LibMMSeg, you need to run yum install mysql-devel libxml2-devel expat-devel

 

Question 4: Installing MMSeg fails with the error: css/UnigramCorpusReader.cpp:89: error: 'strncmp' was not declared in this scope

Manually edit src/css/UnigramCorpusReader.cpp and add the following line at the top:

#include <string.h>

Then compile and install again.
