FS#12822 - Search in wiki bad.

Attached to Project: Arch Linux
Opened by kongokris 2 (nut543) - Friday, 16 January 2009, 20:00 GMT
Last edited by Pierre Schmitz (Pierre) - Friday, 27 February 2009, 19:22 GMT
Task Type General Gripe
Category Web Sites
Status Closed
Assigned To Pierre Schmitz (Pierre)
Architecture All
Severity High
Priority Normal
Reported Version None
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

For example, when I search for "scan", the main page http://wiki.archlinux.org/index.php/Scanner_setup_%26_configure doesn't come up except as linked from other pages, the first of which I see as result 33...

I do get:

http://wiki.archlinux.org/index.php/Scanning_tips

and

http://wiki.archlinux.org/index.php/USB_Scanner_Support

But not before search results 5 & 6 (!)

Surely entries specifically about "scanning" need to take precedence over all the other entries?

I see two possibilities here (granted, without knowing the system very well):
1. Make entries whose title contains the search term appear closer to the top.
2. Implement tags, or make it clearer to users that they can add tags, so the search gets better.
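
To make possibility 1 concrete, here is a minimal sketch of title-first ranking. It is not how MediaWiki's search works; the `rank_title_first` function, the `(title, body)` tuples and the sample bodies are all assumptions made up for illustration.

```python
def rank_title_first(query, pages):
    """Return matching pages, with title matches ahead of body-only matches.

    `pages` is a list of (title, body) tuples (a hypothetical structure).
    """
    q = query.lower()

    def sort_key(page):
        title, body = page
        title_hit = q in title.lower()
        body_hits = body.lower().count(q)
        # Title matches sort first; within each group, more body hits sort first.
        return (0 if title_hit else 1, -body_hits)

    matching = [p for p in pages if q in p[0].lower() or q in p[1].lower()]
    return sorted(matching, key=sort_key)


# Made-up sample data; only the titles echo pages named in this report.
pages = [
    ("Printer setup", "Some multifunction devices can also scan. scan scan"),
    ("Scanning tips", "Tips for SANE."),
    ("USB Scanner Support", "How to scan over USB."),
    ("Xorg", "Nothing relevant here."),
]

for title, _ in rank_title_first("scan", pages):
    print(title)
```

With this ordering, a page that merely mentions "scan" many times can no longer push the pages titled after scanning down the list.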

I'm aware that it might be tempting to suggest that the wiki should simply be edited (all those articles merged into one called Scanning, for example) and managed better, but that would just be treating the symptom: there will always be users who don't bother, or don't think to bother, before they add to the wiki, and there will always be admins/managers of the wiki (do we even have one?) who aren't up to date with their work..

However, if you've got another idea for implementation I'm all ears (Google wasn't that much better as a wiki search engine). I set severity to high since I think finding help is damn important...
Closed by  Pierre Schmitz (Pierre)
Friday, 27 February 2009, 19:22 GMT
Reason for closing:  Won't implement
Comment by Dan McGee (toofishes) - Saturday, 17 January 2009, 06:02 GMT
Not sure what we can do here. Sounds like a gripe with either Mediawiki's search or disorganization of the wiki, both of which are a bit out of our control.

Feel free to lead an initiative to get articles consolidated and renamed for ease of finding. In addition, cross-article links can help a lot sometimes.
Comment by kongokris 2 (nut543) - Sunday, 18 January 2009, 15:50 GMT
I asked on #mediawiki (no idea what merit the answers hold, though..)
<ekimmargni> hairball: There have been some recent upgrades to the searching, depending on what you use for that...
<Nikerabbit> hairball: 1) update 2) do 1) first
<ekimmargni> I think we use MWSearch
<Nikerabbit> if that doesn't help you could try lucene

If this bug entry can also serve as a "How to make wiki search better", perhaps the following can also help:

From the MediaWiki FAQ:

…is a search for a short keyword giving no hits?

By default, MediaWiki uses MyISAM's fulltext matching functionality to allow searching page content. The default settings for this mean that words of less than four characters won't be indexed, so results won't be returned for those queries.

To alter this behaviour, MySQL needs to be reconfigured to index shorter terms, and MediaWiki's search index table needs to be repaired, to rebuild the indices.

* For help on reconfiguring MySQL, see http://dev.mysql.com/doc/refman/4.1/en/fulltext-fine-tuning.html
* To repair the search index table, run the query REPAIR TABLE searchindex QUICK; against your database
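
A rough sketch of why short keywords give no hits, assuming a toy in-memory index rather than MySQL itself: words below a minimum length are simply never indexed. The `build_index` function and the sample pages are hypothetical; in MySQL the corresponding MyISAM setting is `ft_min_word_len`.

```python
import re
from collections import defaultdict


def build_index(docs, min_word_len=4):
    """Map each word to the set of page names containing it.

    Words shorter than `min_word_len` are skipped, mimicking the
    default described in the FAQ (toy model, not MySQL's real code).
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_word_len:
                index[word].add(name)
    return index


# Made-up sample pages for illustration.
docs = {
    "Pacman": "Use pacman to install the abs package",
    "ABS": "The abs tree",
}

default_index = build_index(docs)                  # three-letter words dropped
short_index = build_index(docs, min_word_len=3)    # three-letter words kept

print(sorted(default_index.get("abs", set())))  # []
print(sorted(short_index.get("abs", set())))    # ['ABS', 'Pacman']
```

Lowering the limit only helps once the index is rebuilt, which is what the REPAIR TABLE step above is for.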

Comment by kongokris 2 (nut543) - Monday, 26 January 2009, 20:11 GMT
OK.. I see no one else knows either.. just a search button that uses Google in the meantime, then.
Comment by kongokris 2 (nut543) - Wednesday, 28 January 2009, 15:32 GMT
One next to or under the wiki search button would make it much better, at least.
Comment by kongokris 2 (nut543) - Wednesday, 25 February 2009, 19:50 GMT
With the new MediaWiki update, can someone assign this to Pierre?

I just tried Google again for the example in the bug description above, and it's even worse than the internal search now..

There seems to be some light at the end of the tunnel, however. With the new MediaWiki update there's a new option after you've searched, which you can use to 'list pages that start with ... '. The problems, however, are:

* it's case-sensitive, which isn't exactly obvious
* you have to make a search first to see this possibility
* its visibility is low even when the option is on screen
And to top it off, it's "hard" to "see" the query result when you've done all that..

see for yourself
Comment by kongokris 2 (nut543) - Wednesday, 25 February 2009, 19:53 GMT
/it's case-sensitive, which isn't exactly obvious/it's case-sensitive, which isn't exactly optimal/
Comment by Pierre Schmitz (Pierre) - Friday, 27 February 2009, 09:21 GMT
I have no idea what you think I should do about this. MediaWiki just uses the fulltext search built into MySQL. This is more or less "simple" text matching. Google works quite differently (page ranking, linking authorities etc.)

I won't write another search backend for MediaWiki. This would take a lot of time and is anything but easy. Maybe you want to discuss this with the MediaWiki developers.
Comment by Dan McGee (toofishes) - Friday, 27 February 2009, 13:24 GMT
Pierre- I'm fine with you closing this, I just wanted to make you aware this "bug" has been out there for a while. I personally find the searching to work OK most of the time, so I have little stake in this.

I agree with your sentiment that MW development is best left to their team, and anyone that thinks the search is that bad should try to raise the issue with them.
Comment by kongokris 2 (nut543) - Friday, 27 February 2009, 15:59 GMT
  • Field changed: Percent Complete (100% → 0%)
argh 5 min late:

concrete things to do:

* Disable all of MediaWiki's case-sensitivity: it's not clear that the reason you're not getting a hit is that you forgot to capitalize your search term correctly. Even so, there's no way to know how an article writer decided to cApS his article.
* Introduce the (all pages starting with "scan") link (you know, the one you get *after* doing a search...) as a button under the search box, or, if that's not possible, at least include it as a link under the search buttons.
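
What I mean by a case-insensitive "all pages starting with" can be sketched like this. It is only an illustration, not MediaWiki code; the `pages_starting_with` function is made up, and "scanimage notes" is a hypothetical title added to show the lower-case case.

```python
def pages_starting_with(prefix, titles):
    """Return titles that begin with `prefix`, ignoring case."""
    p = prefix.casefold()
    return sorted(t for t in titles if t.casefold().startswith(p))


# Titles from this report, plus one hypothetical lower-case title.
titles = [
    "Scanner setup & configure",
    "Scanning tips",
    "USB Scanner Support",
    "scanimage notes",
]

print(pages_starting_with("scan", titles))
# ['Scanner setup & configure', 'Scanning tips', 'scanimage notes']
```

Typing "scan", "Scan" or "SCAN" gives the same list, so nobody has to guess how the article writer capitalized the title.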

http://www.mediawiki.org/wiki/Manual:FAQ
Comment by Aaron Griffin (phrakture) - Friday, 27 February 2009, 15:59 GMT
Is it possible to disable case-sensitive search without patching mediawiki?
Comment by Pierre Schmitz (Pierre) - Friday, 27 February 2009, 16:05 GMT
The search is case-insensitive by default. Or did you mean it should be case-sensitive?
Comment by kongokris 2 (nut543) - Friday, 27 February 2009, 18:58 GMT
@pierre, you're right. With the new MediaWiki update it is now case-insensitive, which is GOOOOD.

That leaves point no. 2, though, which would also help a lot.
Comment by Pierre Schmitz (Pierre) - Friday, 27 February 2009, 19:22 GMT
As I said: I won't program anything myself. Please report this upstream.