Using MySQL Full-text Searching

November 30, -0001

Tutorials

Intended Audience
Overview
Learning Objectives
Definitions
Background Information
Prerequisites
Synopsis

Initial Ideas
The Solution: Setup
The Solution: Actually Doing Something
More about Basic Searching
Example: Basic Searching Application
Advanced Boolean Searching
Boolean: The Basic Technical Aspect
A Basic Boolean Searching Application
Resources

About The Author:


Intended Audience

This tutorial is intended for developers using MySQL
(http://www.MySQL.com/) and PHP
(http://www.php.net) who want to create a
searchable database of some sort of textual data. It will focus on the
Full-text capabilities presented by MySQL, moving into the Boolean opportunities
that are presented in the latest alpha version, 4.1, of MySQL.


Overview

Using directories to group articles by category is a great way
to help people to navigate through many articles. At some point, however,
someone will want to find all the articles that pertain to a certain
topic that may not have a directory of it’s own, or may span many
directories . This is what the search engine is for.


Learning Objectives


In this tutorial, you will learn:

  • How to modify your current
    database to accommodate Full-text
    searching
  • How to use a simple
    Full-text search to quickly gather relevant
    responses
  • How to execute a more
    complex Full-text search with Boolean
    capabilities
  • Tips on what to do,
    and what not to do, as well as the security implications of some of the example
    scripts presented.

Definitions

  • MySQL – An Open
    Source database that is used by many PHP developers for it’s support and
    speed, as well as because it’s
    free.
  • Full-text
    Built in functionality in MySQL that allows users to search through certain
    tables for matches to a
    string.
  • Boolean Search

    – A search which allows users to narrow their results through the use of
    Boolean operators.

  • Boolean
    Operators
    – A deductive logical system by which a user can narrow
    results through the use of AND, OR, XOR, and other
    operators.


Background Information


Before the advent of the search engine, users had to search
manually through dozens – or hundreds – of articles and tidbits to
find the ones that were right for them. Nowadays, in our more user-centered
world, we expect the results to come to the user, not the other way around. The
search engine gets the computer to do the work for the user.


Prerequisites

  • MySQL version 3.23.23 or better
    for the simple Full-text
    searching
  • MySQL version 4.1 alpha
    or better for the complex Boolean
    searching
  • PHP & A knowledge
    thereof.


Synopsis


Let’s start with a quick review of our
situation:

We have a database that contains articles. We might create a
table of database contents using a statement like this:

CREATE TABLE articles (body
TEXT, title VARCHAR(250), id INT NOT NULL auto_increment, PRIMARY
KEY(id);

Let’s say we have about 100 of these articles, covering
various topics: MySQL, PHP, and various other topics of that sort. How do the
users find the tutorials they want? Remember, we need to bring the results to
the user. This is going to be a search engine operation.


Initial Ideas


When I started to work with my first database which was only a
tenth of the size, my MySQL query went something like this:

SELECT FROM articles WHERE body LIKE '%$keyword%';

This was slow and inefficient.Every time someone searched for an article, they got
far too many results, and as the database grew the system became downright
shameful.

So what is the solution? It’s right here: Full-text
Searching.



The Solution: Setup

Full-text Search is a feature introduced to MySQL in version
3.23.23. This is how I used it to fix my problem:

I started out with an update to my table:


ALTER TABLE articles ADD FULLTEXT(bodytitle);


This set ups our Full-text index. The (body, title) part tells us that we can search
the body and title for keywords later on. We’ll find out how to use this
later, once we’ve overcome a potential problem.

In my original database BLOB was my datatype for the body of
the article. What’s the problem, you ask? BLOBs are meant primarily for
binary data. What use is searching binary data? MySQL has been programmed not
to index BLOB datatypes for Full-text searching. If you try to index BLOB
datatypes, you get an Error 140.

The fix for this is simple:


ALTER TABLE articles MODIFY body TEXT;


That switches datatype from BLOB to TEXT, thus making a useful column for searching.


The Solution: Actually Doing Something


How do we get results? Let’s jump right in and try it
out:


<?php

    MySQL_connect
("hostname""username""password");

    MySQL_select_db("our_db");

    
$query "

        SELECT * FROM articles

        WHERE MATCH(title, body) AGAINST ('PHP')

    "
;

    
$sql MySQL_query($query);

    /* output results */

?>

What will this give us? Well, let’s go over
Full-Text first.

According to the MySQL manual, Full-text is a “natural
language search”; it indexes words that appear to represent the row, using
the columns you specified. As an example, if all your rows contain
“MySQL” then “MySQL” won’t match much. It’s
not terribly unique, and it would return too many results. However, if

“MySQL” were present in only 5% of the rows, it would return those
rows because it doesn’t appear too often to be known as a keyword
that’s very common. (If you have “MySQL” in none of your
rows, it’ll return nothing; duh.)

MySQL also does something pretty useful. It creates a score.
This score is usually something like .9823475 or .124874, but always larger than
zero. It can range up above 1, and I have seen it at 4 sometimes. (Don’t
try to multiply it by 100 and portray it as a % value; people will wonder why
their keyword matches an article 431%!)

MySQL will also order a row by its score, descending.

Another useful tidbit: If you use
MATCH() AGAINST() Change the
document style for this to “Inline Code” twice in a query, as we
will, there is no additional speed penalty. You might expect that because you
are executing the same search twice the query would take twice as long, but in
fact MySQL remembers the results from the first search as it runs the second.

So, let’s talk about the actual query: We are taking every
column from articles, and searching “title” and “body” for
$keyword This is also Inline
Code. Pretty simple.

And if we want to display the score too:


<?php

    
/* connect to MySQL (same as always) */

    
$query "

        SELECT *,

            MATCH(title, body) AGAINST ('PHP') AS score

        FROM articles

        WHERE MATCH(title, body) AGAINST('PHP')

    "
;

    
$sql MySQL_query($query);

    /* display the results... */

?>


More about Basic Searching


What more is there to say? Well, there’s another feature I
can introduce to you.

When most people meet up with a search box they don’t type in
only one word. Not knowing the backend, they just type in as many words as they
feel like!

MySQL realizes this and deals with it. If I were you, the
only thing I would do is remove the commas that might be there, using
str_replace. MySQL will take
all the words, split them up, and then match using a natural language search.

As a secondary note, you should never send input directly from
the user to the MySQL prompt because any number of characters could terminate
your MySQL query and begin another dastardly statement. (This is presuming you
replace PHP with a $keyword in
the above script.)


Example: Basic Searching Application


Now that we know all about basic searching, you ask: What more
is there to learn? Well, not much except that I’ve created a quick sample
application for you.

Let’s launch straight into the code. This bare bones
application will search for a phrase or a keyword that the user
inputs:

<?php

    
/* call this script "this.php" */

    
if ($c != 1) {

?>

<form action="this.php?c=1">

<input type="text" name="keyword">

<input type="submit" value="Search!">

</form>

<?php

    
} else if ($c==1) {

        
MySQL_connect("hostname""username""password");

        MySQL_select_db("database");

        
$sql "

            SELECT *,

                MATCH(title, body) AGAINST('$keyword') AS score

                FROM articles

            WHERE MATCH(title, body) AGAINST('$keyword')

            ORDER BY score DESC

        "
;

        
$res MySQL_query($sql);

?>

<table>

<tr><td>SCORE</td><td>TITLE</td><td>ID#</td></tr>

<?php

        
while($row MySQL_fetch_array($rest)) {

            echo 
"<tr><td>{$sql2['score']}</td>";

            echo "<td>{$sql2['title']}</td>";

            echo 
"<td>{$sql2['id']}</td></tr>";

        }

        echo 
"</table>";

    }

?>

What does this script do? First, it checks
$c to see if user input has been
sent. If it has not, the form is displayed. If it has, the script moves
onwards.

The same query that we’ve been using is used here: we match
against what the user inputs. We then draw a table and display it in
[semi-]pretty form. The ORDER BY score
DESC

Code Inline makes sure that the best scores (the most
accurate matches) are shown first.

Important note: Never use this simple script in any production
form because I have done absolutely no error checking. The
$query variable provides an easy
opening for an intruder to input something nasty into your query that might
destroy your data.


Advanced Boolean Searching

If you need more options in your MySQL searching, or you
haven’t finished your coffee yet, keep reading. The advanced search
engine tutorial begins here.

Before we get started into the magic of bool, I recommend you
do a quick SELECT version();

Code Inlineon your MySQL server. I spent several hours battling my
computer until I read this line in the MySQL manual:

As of Version 4.0.1, MySQL can also perform Boolean
full-text searches using the IN BOOLEAN MODE modifier.

Whoops; 4.0.1 is the newest, alpha release of MySQL
Still
check for this… it should by MySQL. If you’re looking to use this on a
production server, I’d strongly recommend against that decision. I found I was
using a 3.23.23, and I had to set up an experimental MySQL server to use the
bool functions that it offers now.

Overall, I was very pleased with the performance of the new
bool searching; the scoring system is changed, but one can still manage. Within
15 minutes of upgrading, I had a simple bool search page up and running on my
articles database.


Boolean: The Basic Technical Aspect

The only thing you change to use Boolean mode is the
AGAINST() part of your query.
You add IN BOOLEAN MODE to the
very end of it, and place the arguments right before it. E.g. to search for all
the articles that contain the word PHP, you could
write:

SELECT * FROM articles WHERE
MATCH(title, body) AGAINST('PHP' IN BOOLEAN MODE)DR[10]

That will find all the articles that contain the word “PHP”
somewhere in them. It’s a fairly simple search. If you were to get more
complex, and wanted everything that has to do with PHP, but not with
MySQL
, then you could execute this statement:


SELECT FROM articles WHERE MATCH(titlebodyAGAINST('+PHP -MySQL' IN BOOLEAN MODE);


There are more modifiers that one can use to search with, and
I will quote from the MySQL manual since I see no point in typing out a synopsis
of the manual:





































Operator

Meaning


By default (when neither plus nor minus is specified) the word
is optional, but the rows that contain it will be rated higher. This mimics the
behavior of MATCH() ...
AGAINST()DR


+

A leading plus sign indicates that this word must be present
in every row returned.

-

A leading minus sign indicates that this word must not be
present in any row returned.

< >

These two operators are used to change a word’s contribution
to the relevance value that is assigned to a row. The < operator decreases
the contribution and the > operator increases it. See the example below.


( )

. Parentheses are put round sub-expressions to give them
higher precedence in the search.

~


A leading tilde acts as a negation operator, causing the
word’s contribution to the row relevance to be negative. It’s useful for marking
noise words. A row that contains such a word will be rated lower than others,
but will not be excluded altogether, as it would be with the minus
operator.

*

An asterisk is the truncation operator. Unlike the other
operators, it is appended to the word, or fragment, not prepended.



Double quotes at the beginning and end of a phrase, matches
only rows that contain the complete phrase, as it was typed.



A Basic Boolean Searching Application


Again, we’ll start with the code straight off:


<?php 

   
/* call this script "advs.php" */

   
if(!$c) { 

?>

<form action="advs.php?c=1" method=POST>

<b>Find Results with: </b><br>

Any of these words: <input type="text" length=40 name="any"> <br>

All of these words: <input type="text" length=40 name="all"> <br>

None of these words: <input type="text" length=40 name="none"> <br>

<input type="submit" value="Search">

</form>

<?

   
} else if($c) {

   
MySQL_connect("hostname""username""password");

       MySQL_select_db("database");

   if((!
$all) || ($all == "")) { $all ""; } else { $all "+(".$all.")"; }

   if((!$any) || ($any == "")) { $any ""; } 

   if((!
$none) || ($none == "")) { $none ""; } else { $none "-(".$none.")"; }

   $query "

       SELECT *,

          MATCH(title, story) AGAINST ('$all $none $any' IN BOOLEAN MODE) AS score 

          FROM compsite 

       WHERE MATCH(title, story) AGAINST ('$all $none $any' IN BOOLEAN MODE)"
;

      
$artm1 MySQL_query($query);

      if(!$artm1) { 

         echo 
MySQL_error()."<br>$query<br>"

      }

      echo 
"<b>Article Matches</b><br>";

      if(MySQL_num_rows($artm1) > 0) {

         echo 
"<table>";

          echo 
"<tr><td>Score </td><td>Title </td><td>Body</td></tr>";

             while($artm2 MySQL_fetch_array($artm1)) {

            
$val round($artm2['score'], 3);

            $val $val*100;

            echo 
"<tr><td>$val</td>";

            echo 
"<td>{$artm2['title']}</td>";

            echo "<td>{$artm2['body']}</td></tr>";

         }

      echo 
"</table>";

   }

   else { 

      echo 
"No Results were found in this category.<br>"

   }

   echo 
"<br>"

   }



After we get the input from the form,
$c Code Inlineis set to 1
and we start the real work.

First we check our input. If it’s empty, we leave it empty, if
it’s not, we append the proper + or – to it. The parentheses are to allow for
the user typing more than 1 word in a given field.


$query "

    SELECT *

        MATCH(title, story) AGAINST ('$all $none $any' IN BOOLEAN MODE) AS score

    FROM compsite

    WHERE

        MATCH(title, story) AGAINST ('$all $none $any' IN BOOLEAN MODE)"
;



That’s the final query that we use.
$all,
$none
Code Inline, and $any have already been prepared for
the query, and they are inserted. Score is returned as a column to order them
by (if we wanted to do that), and from there on, we just have to output the
results.


   if(MySQL_num_rows($artm1) > 0) {

      echo "<table>";

    echo 
"<tr><td>Score </td><td>Title </td><td>Body</td></tr>";

          while($artm2 MySQL_fetch_array($artm1)) {

         
$val round($artm2['score'], 3);

         $val $val*100;

         echo 
"<tr><td>$val</td>";

         echo 
"<td>{$artm2['title']}</td>";

         echo "<td>{$artm2['body']}</td></tr>";

      }

   echo 
"</table>";



That’s the output code. If there’s less than 1 row to output, we send a “no records found”
message out.

That’s about it for Full-Text searching.


Resources


MySQL Man Page for Full-Text:
http://www.MySQL.com/doc/F/u/Fulltext_Search.html

I found the following note on the Man pages for Full-text. It
might be useful for anyone trying to work out why their database returns no
matches.

           Extra [important] Note from the Man Page: Tim Gustafson:
Note: you should add at least 3 rows to the table before you try to match
anything, and what you’re searching for should only be contained in one of the
three rows. This is because of the 50% threshold. If you insert only one row,
then now matter what you search for, it is in 50% or more of the rows in the
table, and therefore disregarded.


About The Author


Jim Ferrara has never held an IT job in his life, but will be
heading off to Old Dominion University in Virginia, USA in the fall. He
currently works at Outback Steakhouse as a cook and loves it. Questions or
comments can be directed to
jferr007@odu.edu.

14 Responses to “Using MySQL Full-text Searching”

  1. lysanderd Says:

    Good article for certain, and still making peoples’ lives easier even years later I imagine. So that noobies finding this article have an even easier time, here’s a few tips (and corrections):

    First off, to answer a question posted by rajuru (quite awhile ago now), the IN BOOLEAN MODE as specified here actually disables the auto-sort, so ORDER BY ‘score’ DESC becomes necessary to get things straight again (it’s also been mentioned, but to reiterate this WILL slow things down on larger databases with a few thousand records or more).

    Second, the AND and NOT operations are actually wrong in the article (at the very least wrong for MySQL 5):

    [code]
    if((!$all) || ($all == "")) { $all = ""; } else { $all = "+(".$all.")"; }

    if((!$none) || ($none == "")) { $none = ""; } else { $none = "-(".$none.")"; }
    [/code]

    This should be something more like:

    [code]
    if((!$search_all) || ($search_all == "")) { $search_all = ""; } else { $search_all = "+".str_replace(' ', '+', trim(str_replace(',', ' ', $search_all)))." "; }

    if((!$search_none) || ($search_none == "")) { $search_none = ""; } else { $search_none = "-".str_replace(' ', '-', trim(str_replace(',', ' ', $search_none)))." "; }
    [/code]

    This will turn commas into spaces, trim leading and trailing spaces off, and then turn spaces into ‘+’ or ‘-’ so that the terms of the select statement are nice and separated and work properly (extra +’s and -’s are ignored and won’t hurt anything as long as you trim() your GET and POST values). In the original setup, whether by design of newer MySQL syntax or even in older version (I don’t know), it wouldn’t actually search for the proper terms and instead only put a + or – in front of the first term entered (second term, third term, etc treated as OR).

    Also be absolutely sure to AT THE VERY LEAST use a function like this at the top of your scripts:

    [code]
    function from_array($x=null){
    if(!isset($x)) return null;
    else if(is_string($x)) return trim(strip_tags(mysql_real_escape_string($x)));
    else if(is_array($x)){
    foreach($x as $k=>$v){
    $k2 = trim(strip_tags(mysql_real_escape_string($k)));
    if($k!=$k2) unset($x[$k]);
    $x[$k2]=from_array($v);
    }
    return $x;
    }
    }
    [/code]

    Throw your POST and GET arrays into that, with $_GET = from_array($_GET); for instance so you bring a little security into play before implementing a search function. strip_tags can be removed if you want to be able to search for HTML through the function though.

    Hope that helps somebody, cheers.

  2. mark1inla Says:

    Hello,

    Thanks for the article, this is a very popular page based on the number of times it ranked #1 on searches.

    It gives a very breakdown of what full-text searches are how theyre useful.

    There are a couple ‘curveballs’ but it actually forces people to learn more about it:
    -the default setting for the min. number chars to search is 4. so the PHP example above will probably return an empty search (i haven’t tested it specifically but did encounter in my own experience).
    -also, you can index multiple columns together like title and description but i found that the searches are too strict. if the keyword isn’t in both, it will not be returned. so in my case, i had to create an fulltext index for columns and add the score together for relevance.

    Hope this helps as much as this helped me.

    Thanks

  3. rajuru Says:

    Hi,
    it is a nice and clean article. However, I do not see any necessity of using ORDER BY `score` DESC because when use Match() is used in WHERE clause MySQL automatically sorts it in descending order off the relevance. Isn’t it?

    Yes, we may need it when we want different orders.

    Thanks again for nice tut!

  4. _____anonymous_____ Says:

    Very good article

  5. _____anonymous_____ Says:

    Good article, but warning if you are using a database of any substantial size you should avoid ORDER BY on any FULLTEXT searches, unless you want extremely large query times…

  6. _____anonymous_____ Says:

    So, I understand how to do it, and it works for the most part.

    But, let us assume that your term column has a term like:

    Relative Strength Index (RSI)

    If i do a full text search for ‘RSI’ it doesn’t pull anything up; yet we can clearly see that there is an entry for it.

    How can i get that to work?

  7. _____anonymous_____ Says:

    That is a great tutorial!!
    You should mention thow that fulltext is supported only when using
    MyISAM as engine for the specific tables, otherwise it won’t work :-(

    One question thow…
    Do I have to perform some extra customization to this if I use utf8 encoding?
    My db stores text in greek also (encoded as utf-8) and when I query my table I always get no results…

    Thanx a lot!

  8. _____anonymous_____ Says:

    Yes, It is very beneficial for us.

    Thanks

  9. _____anonymous_____ Says:

    @bayofislands:

    fulltext only searches queries larger than 3 letters (perhaps 4) so that is your problem. i believe you can change this in the mysql configuration.

  10. _____anonymous_____ Says:

    If it doesn’t work on your ISP you may have to change the "c" variable to $_GET['c'] and the "all" "any" and "none" variables to $_POST['all']

  11. _____anonymous_____ Says:

    Astanda.com uses this:

    $result = mysql_query(‘SELECT *, MATCH(`url`,`title`,`description`) AGAINST ("’.$searchwords.’") as Relevance FROM `links` WHERE MATCH (`url`,`title`,`description`) AGAINST("’.$searchwords.’" IN BOOLEAN MODE) ORDER BY `Relevance` DESC, `gpr` DESC LIMIT ‘.($page-1)*$limit.’,’.$limit.”);

  12. bayofislands Says:

    clair – i had the same problem, on my side it seemed to be from me copying and pasting the mySql query from a web browser into the mySql query box. not sure if that’s the same problem you’re having (i was using cocoa sql on a mac), but thought i’d chip in.
    just got it working today on my <a href="http://www.bayofislands.net/">bay of islands</a> site, have aplay around in the search box top right. Not yet got any kind of three character search going, thinking of doing a LIKE % $search % just for single 3 letter words just to get something.

  13. turnerleafe Says:

    This is a great article. Thank You.

  14. _____anonymous_____ Says:

    Hi Jim –
    First, thanks for excellent (clear) explanation of how to implement FULLTEXT.

    1. I tried hard to register but wouldn’t let me because it wanted a ZIP CODE. BUT, there is no entry area for entering a zip code– you might want to check this out.

    2. I modified my MySQL db for FULLTEXT and all went well. Then I tried to use the exact query you used:

    $query = "SELECT * FROM blog_entries WHERE MATCH
    (title, entry) AGAINST (‘$searchword’)";

    the "searchword" comes from a one line form also exactly like yours.

    This is the error msg. I get:
    You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near ‘(blog_entry=) AGAINST (searchterm=float) )’ at line 2.

    The query was SELECT * FROM blog_entries WHERE MATCH ( (blog_entry=) AGAINST (searchterm=testing) ).

    "testing" was the search term I entered in the form. The table name is "blog_entries". I have no idea where the "blog_entry" in the error msg. came from.

    title and entry are both columns in the table and are the columns which I modified to be treated as FULLTEXT.

    Any help very much appreciated.

    And again, thanks for your work.

    Clair