I have recently been working on a search engine, mainly for searching book records, so let's first get to know Sphinx.
It can speed up your queries considerably.
Sphinx is a SQL-based full-text search engine that can be combined with MySQL or PostgreSQL for full-text search. It offers more specialized search features than the database itself, which makes it easier for an application to implement professional full-text search. Sphinx provides search API bindings for several scripting languages, such as PHP, Python, Perl and Ruby, and also ships a MySQL storage engine plugin.
A single Sphinx index can hold up to 100 million records, and with 10 million records queries still take on the order of milliseconds. Indexing is fast as well: building an index over 1 million records takes only 3 to 4 minutes, an index over 10 million records can be completed within 50 minutes, and rebuilding an incremental index that contains only the latest 100,000 records takes just tens of seconds.
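As a rough sanity check, the build times quoted above can be turned into records per second. The sketch below uses only the figures from the paragraph; the function name records_per_sec is made up for illustration, and the shell's integer division rounds down.

```shell
# Hypothetical helper: records indexed per second, given a record count
# and a build time in minutes (integer arithmetic, rounded down).
records_per_sec() {
    echo $(( $1 / ($2 * 60) ))
}

records_per_sec 1000000 4     # 1 million records in 4 minutes
records_per_sec 1000000 3     # 1 million records in 3 minutes
records_per_sec 10000000 50   # 10 million records in 50 minutes
```

By this estimate the quoted figures correspond to somewhere between roughly 3,300 and 5,500 records per second.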
The main features of Sphinx include:
high-speed indexing (nearly 10 MB/sec on a modern CPU);
high-speed search (average query time under 0.1 seconds on 2-4 GB of text);
high scalability (up to 100 GB of text and 100M documents on a single CPU);
good relevance ranking;
distributed search;
document excerpt (summary) generation;
searching from within MySQL through a pluggable storage engine;
Boolean, phrase and word proximity queries;
multiple full-text fields per document (up to 32 by default);
multiple additional attributes per document;
stopwords;
single-byte encodings and UTF-8.
These features look quite good, so let's look at how Sphinx is used.
Retrieval flow with a native MySQL storage engine:
Retrieval flow with the Sphinx storage engine:
I still prefer the second approach, the storage engine: it can be used even if your programming language has no Sphinx API binding.
Before starting the installation, install the required packages:
    yum -y install gcc g++ gcc-c++ libjpeg libjpeg-devel libpng libpng-devel \
        freetype freetype-devel libxml2 libxml2-devel zlib zlib-devel \
        glibc glibc-devel glib2 glib2-devel bzip2 bzip2-devel \
        ncurses ncurses-devel curl curl-devel e2fsprogs e2fsprogs-devel \
        krb5 krb5-devel libidn libidn-devel openssl openssl-devel \
        openldap openldap-devel nss_ldap openldap-clients openldap-servers \
        patch libtool automake imake mysql-devel expat-devel
(1) Install Python support
    yum install -y python python-devel
(2) Compile and install LibMMSeg. LibMMSeg is a Chinese word segmentation package designed for the Sphinx full-text search engine and released under the GPL; it uses Chih-Hao Tsai's MMSEG algorithm. In this article LibMMSeg is used to generate the Chinese segmentation dictionary.
    wget http://www.coreseek.com/uploads/sources/mmseg-0.7.3.tar.gz
    tar zxvf mmseg-0.7.3.tar.gz
    cd mmseg-0.7.3
    ./configure
    make
    make install
(3) Compile and install MySQL 5.1.26-rc, Sphinx and the SphinxSE storage engine
    wget http://blog.s135.com/soft/linux/nginx_php/mysql/mysql-5.1.26-rc.tar.gz
    tar zxvf mysql-5.1.26-rc.tar.gz

    wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
    wget http://www.coreseek.com/uploads/sources/sphinx-0.98rc2.zhcn-support.patch
    wget http://www.coreseek.com/uploads/sources/fix-crash-in-excerpts.patch
    tar zxvf sphinx-0.9.8-rc2.tar.gz
    cd sphinx-0.9.8-rc2/        # the patches below are applied from inside the Sphinx source tree
    patch -p1 < ../sphinx-0.98rc2.zhcn-support.patch    # Chinese support patch
    patch -p1 < ../fix-crash-in-excerpts.patch          # excerpts crash fix
    cp -rf mysqlse ../mysql-5.1.26-rc/storage/sphinx
    cd ../

    cd mysql-5.1.26-rc/
    sh BUILD/autorun.sh
    ./configure --with-plugins=partition,innobase,myisammrg,sphinx \
        --prefix=/usr/local/mysql/ --enable-assembler \
        --with-extra-charsets=complex --enable-thread-safe-client \
        --with-big-tables --with-readline --with-ssl \
        --with-embedded-server --enable-local-infile
    make && make install
    cd ../
Start the MySQL database
Run the two cp commands from the mysql-5.1.26-rc source directory:
    cp support-files/my-medium.cnf /etc/my.cnf        # configuration file
    cp support-files/mysql.server /etc/rc.d/mysqld    # add the MySQL service control script
    cd /usr/local/mysql
    bin/mysql_install_db --user=mysql                 # install the system databases
    bin/mysqld_safe --user=mysql &                    # test whether the installation succeeded
    bin/mysql                                         # enter the MySQL command prompt
Start and stop:
    /etc/rc.d/mysqld start
    /etc/rc.d/mysqld stop
To start MySQL at boot, create the file /etc/rc.local yourself and make it executable. Its contents are roughly:
    #!/bin/sh
    /usr/local/mysql/bin/mysqld_safe --user=mysql &
or
    /etc/rc.d/mysqld start
Enter the following command at the MySQL prompt; if SPHINX appears in the output, SphinxSE has been built into MySQL.
    show engines;
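The same check can be scripted. This is a minimal sketch, not part of the original setup: the function name has_sphinx_engine is hypothetical, and it assumes the tab-separated output that the mysql client prints when its output is piped, with the engine name in the first column.

```shell
# Hypothetical helper: succeeds if SHOW ENGINES output read from stdin
# contains a row whose first column is SPHINX (case-insensitive).
has_sphinx_engine() {
    grep -qi '^SPHINX[[:space:]]'
}

# Usage (assumes the install prefix used in this article):
#   /usr/local/mysql/bin/mysql -e 'SHOW ENGINES' | has_sphinx_engine \
#       && echo "SphinxSE is available"
```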
This article uses version 0.9.8, but I recommend version 0.9.9, which is the most stable release; I eventually switched to 0.9.9 myself.
By default Sphinx does not support Chinese indexing and retrieval. This used to be fixed with the Coreseek patch, but Coreseek no longer provides the patch separately. Instead it develops Coreseek, a full-text search server based on Sphinx, which is probably the most widely used Chinese full-text search built on Sphinx today. It provides LibMMSeg, a Chinese word segmentation package designed for Sphinx, which contains the mmseg segmenter. In fact coreseek-3.2.14.tar.gz already contains Sphinx, so when installing SphinxSE as described earlier you can also use the mysqlse directory from this archive.
Install autoconf
    tar zxvf autoconf-2.64.tar.gz
    cd autoconf-2.64
    ./configure --prefix=/usr
    make
    make install
Install Coreseek
    tar zxvf coreseek-3.2.14.tar.gz
    cd coreseek-3.2.14
    cd mmseg-3.2.14/
    ./bootstrap
    ./configure --prefix=/usr/local/mmseg3
    make
    make install
    cd ../csft-3.2.14/
    sh buildconf.sh
    ./configure --prefix=/usr/local/coreseek --without-python --without-unixodbc \
        --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ \
        --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql --host=arm
    make
    make install
    cd /usr/local/coreseek/etc
After entering the configuration directory, ls shows three files:
    example.sql sphinx.conf.dist sphinx-min.conf.dist
example.sql is a sample SQL script; import it into the test database to serve as test data (it creates the documents and tags tables).
    vi sphinx.conf
Enter the following content:
    source src1
    {
        type                = mysql
        sql_host            = localhost
        sql_user            = root
        sql_pass            = 12345678
        sql_db              = test
        sql_port            = 3306    # optional, default is 3306
        sql_sock            = /tmp/mysql.sock
        sql_query_pre       = SET NAMES utf8
        sql_query           = \
            SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
            FROM documents
        sql_attr_uint       = group_id
        sql_attr_timestamp  = date_added
        sql_query_info      = SELECT * FROM documents WHERE id=$id
    }

    index test1
    {
        source              = src1
        path                = /usr/local/coreseek/var/data/test1
        docinfo             = extern
        charset_type        = zh_cn.utf-8
        mlock               = 0
        morphology          = none
        min_word_len        = 1
        html_strip          = 0
        charset_dictpath    = /usr/local/mmseg3/etc/
        ngram_len           = 0
    }

    indexer
    {
        mem_limit           = 32M
    }

    searchd
    {
        port                = 9312
        log                 = /usr/local/coreseek/var/log/searchd.log
        query_log           = /usr/local/coreseek/var/log/query.log
        read_timeout        = 5
        max_children        = 30
        pid_file            = /usr/local/coreseek/var/log/searchd.pid
        max_matches         = 1000
        seamless_rotate     = 1
        preopen_indexes     = 0
        unlink_old          = 1
    }
Description: the block source src1 {...} defines the main data source and contains the database connection settings. src1 is the name of the data source, and you can choose it freely.
The block index test1 {...} defines an index built from a data source. Index and source blocks come in pairs: the value of the source parameter inside an index block must be the name of a defined data source.
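The pairing rule can be checked mechanically before running the indexer. The sketch below is only an illustration: check_sources is a hypothetical helper, not a Sphinx or Coreseek tool, and it assumes one setting per line with spaces around "=", as in the configuration shown in this article.

```shell
# Hypothetical helper: report index blocks whose "source = NAME" line names
# a data source that is not defined in the same config file.
# Assumes one setting per line, with spaces around "=".
check_sources() {
    awk '
        # "source NAME" opens a data source block: remember the name
        $1 == "source" && $2 != "=" { name = $2; sub(/[:{].*/, "", name); defined[name] = 1 }
        # "source = NAME" inside an index block: remember the reference
        $1 == "source" && $2 == "=" { used[$3] = 1 }
        END {
            bad = 0
            for (name in used)
                if (!(name in defined)) { print "undefined source: " name; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Usage (path as used in this article):
#   check_sources /usr/local/coreseek/etc/sphinx.conf && echo "sources OK"
```

The function exits with a non-zero status when it finds a dangling reference, so it can gate the indexing step in a script.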
Generate the index
    /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all
Problems:
Problem 1: after running sh BUILD/autorun.sh, sphinx does not appear in the output of ./configure -h. In that case run sh BUILD/cleanup, then run sh BUILD/autorun.sh again, then run ./configure -h; sphinx should now be listed.
Problem 2: if compiling MySQL fails, check whether the ncurses packages are installed:
    yum list | grep ncurses
If ncurses-devel is missing, install it:
    yum -y install ncurses-devel
then run ./configure again.
Problem 3: before installing LibMMSeg you need to run yum install mysql-devel libxml2-devel expat-devel
Problem 4: installing MMSeg fails with the error: css/UnigramCorpusReader.cpp:89: error: 'strncmp' was not declared in this scope
Manually edit src/css/UnigramCorpusReader.cpp
and add the following line at the top:
    #include <string.h>
then compile and install again.
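The manual edit can also be scripted. This is a sketch under one assumption: the header to add is <string.h>, the standard header that declares strncmp. The function name add_string_include is made up for illustration.

```shell
# Hypothetical helper: prepend "#include <string.h>" to a source file
# unless the file already has that include line.
add_string_include() {
    file="$1"
    if grep -q '^#include <string\.h>' "$file"; then
        return 0    # already patched, nothing to do
    fi
    { printf '#include <string.h>\n'; cat "$file"; } > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from the mmseg source directory:
#   add_string_include src/css/UnigramCorpusReader.cpp
```

Running it a second time leaves the file unchanged, so it is safe to keep in a build script.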