FS#7132 - Package meta info: 'category'
Details
I've been tossing this around in my head for a while and
just wanted to write it down.
The idea is an additional package member, we'll call it "category" for now, that simply indicates the categorization of a package. It is similar to "groups" but is not the same. It's more like a tagging scheme. Some examples:
pkgname=cups category=(printing)
pkgname=mplayer category=(media audio video)
(I can't think of any good ones)
What does this give us? It gives us additional info that we can use when searching (pacman -Ss --category audio foo), display information for helpful hints (community/democracy-player - media iptv torrent) and things like that. I do not foresee categories being installable like groups, but at the same time, using groups for this doesn't work *because* of the installability (pacman -Sg video seems daft). Opinions welcome.
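To make the proposal concrete, here is a minimal sketch of what a category lookup and the "helpful hint" line could look like. It is not pacman code: the dictionary layout and both function names are made up for illustration, and only the three package/category examples come from the description above.

```python
# Hypothetical in-memory index: package name -> list of categories.
# The entries mirror the examples given in the task description.
PACKAGES = {
    "cups": ["printing"],
    "mplayer": ["media", "audio", "video"],
    "democracy-player": ["media", "iptv", "torrent"],
}


def search_by_category(packages, category):
    """Return the sorted names of all packages tagged with `category`."""
    return sorted(name for name, cats in packages.items() if category in cats)


def format_hint(name, packages):
    """Render a 'name - cat1 cat2 ...' hint line for one package."""
    return f"{name} - {' '.join(packages[name])}"


if __name__ == "__main__":
    print(search_by_category(PACKAGES, "media"))
    print(format_hint("democracy-player", PACKAGES))
```

Because a package can list several categories, the same package shows up under each of them, which is what distinguishes this from a single rigid category.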
This is a very good case too - who would ever "pacman -Sg xorg-video-drivers"? Do you have 30 video cards in your machine?
8)
Personally I'd like to see pacman knowing the ABS categories, and be able to list base packages for example. This could either be yet another variable or you simply mark bash as categories=(base shells) for example.
Maybe implementing this in the webinterface only would be enough?
I think most users search for new software on the internet anyway,
so there's little to gain from duplicating this functionality in pacman (and increasing the complexity).
Besides, recategorization (in case we want to split/merge/rename/remove some categories)
would not be easy to do if the data is stored in the package instead of a DB
(as this would require modifying all relevant packages).
Opinions?
http://archlinux.org/pipermail/pacman-dev/2007-June/008555.html
While developing AUR2 I started thinking about this again. I really hate how categories only exist on the web interface, and it's not really ideal to have such rigid categories. I think it would be much better if the package itself could define which category it belongs to, since I often have no idea which category to select in the AUR package upload form. Also, having an array in the PKGBUILD would allow us to define multiple categories and in turn describe the package better.
If I would want to know which multimedia software was available I now have to search for say: media,music,video,player,burn,encode... and whatever words such package-description could contain. I am making a dialog interface for pacman and the category browsing like Xilon describes would make it a lot nicer.
If we can get this in pacman soon-ish, that means the SVN move will be near flawless.
http://github.com/Daenyth/pacman-tags/commits/tags
http://archlinux.org/pipermail/pacman-dev/2008-August/012838.html
http://archlinux.org/pipermail/pacman-dev/2007-June/008555.html
I would just call them tags then :)
I have the feeling we will need to carefully orchestrate the usage of these tags. (eg to not have people using printers and others printing), maybe have an official list of recognized tags? and maybe even have namcap complain about an unrecognized tag?
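A check against an official list could be as small as the sketch below. This is only an illustration of the idea, not an existing namcap rule: the tag list and the function name are assumptions.

```python
# Hypothetical list of officially recognized tags (an assumption for this sketch).
RECOGNIZED_TAGS = {"printing", "media", "audio", "video"}


def unrecognized_tags(tags, recognized=RECOGNIZED_TAGS):
    """Return the tags that are not on the recognized list, in input order."""
    return [tag for tag in tags if tag not in recognized]


if __name__ == "__main__":
    # 'printers' is flagged because only 'printing' is on the list.
    for tag in unrecognized_tags(["printers", "audio"]):
        print(f"warning: unrecognized tag '{tag}'")
```

A lint-style warning like this would catch the printers/printing split without forbidding new tags outright.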
much more appropriate and effective to write a good, complete description.
Adding tags isn't worth the added bloat. Just write a good description and
people will find the package. No extra code required.
I think this is silly and will produce the _opposite_ of what you think it will. People will struggle to cram words into their descriptions to make it searchable.
"An audio / video player that support streaming" is awfully generic, catches the buzzwords, and really tells me nothing about the app.
Additionally, parsing natural language for ANY of our tools is going to be ridiculous - the point of meta-info is to be machine readable. Proposing that description fits meta info such as this, is like proposing pkgver and pkgrel are useless because we could just use "pkgname=foobar-1.2-3"
> think it will. People will struggle to cram words into their
> descriptions to make it searchable.
> "An audio / video player that support streaming" is awfully generic,
> catches the buzzwords, and really tells me nothing about the app.
That awfully generic description isn't sufficient then, it should be
elaborated upon. Maybe this could be a case for an optional long
description field.
> Additionally, parsing natural language for ANY of our tools is going
> to be ridiculous - the point of meta-info is to be machine readable.
> Proposing that description fits meta info such as this, is like
> proposing pkgver and pkgrel are useless because we could just use
> "pkgname=foobar-1.2-3"
pkgver and pkgrel aren't exactly useless because they aid in a machine
operation: that is dependency and version tracking.
Meta info like tags and descriptions are for human functions: searching
and browsing. Doesn't package search already work via pkgname and
pkgdesc? It seems to have been pretty sufficient so far.
I'm not proposing that description is for machine use. It's for humans
to search or browse through to find packages that fit their criteria.
Having tags and allowing people to search/grep/browse through them isn't
really that different than having those terms in the description. If a
package's description doesn't contain any relevant terms, then it can't
be considered a proper description.
The problem is that adding tags adds a new infrastructure for
little gain. Then you've got to standardise tags, or just have
multiple similar tags: games.devel devel.games. The issues that exist
with descriptions will still exist, and solutions may be even more
contorted than ever. Tags would only be useful to reinforce a description
when it comes to alternate spellings: color vs colour.
You have to really think about the purpose of tagging. The machine
doesn't care if a package is an audio player or a text editor. All it
cares about is its name, its dependencies and its version.
Recommending more complete descriptions and maybe more powerful search
methods is the better alternative to tags or categories.
I do NOT agree in the least bit that better descriptions are going to do anything of the sort. If text was all that was required to categorize something, then why is it even blog posts are typically categorized with tags? Hell, why does anything use tags if this were the case?
> doesn't care if a package is an audio player or a text editor. All it
> cares about is its name, its dependencies and its version.
We already have umpteen pieces of meta info "the machine" doesn't care about (since when do we care about what the machine does and not what the user does with said machine?). If we really wanted to be purist about this, we'd stick with the original CRUX format.
packages in the first place. So no parsing or thesaurus or category system.
As far as I can tell, categories are really only meant to help in searching,
and that can be improved by writing better descriptions.
I'm not convinced implementing categories would significantly improve usability.
I don't think that tags offer any greater significant usefulness for searching
over the blog text itself, or a good text description of an object like an image.
Maybe it could be useful for poems with lots of abstract imagery, but we're
dealing with packages here. Descriptions shouldn't be vague or abstract.
Tags for browsing blogs on the other hand is a lot more relevant. People read
random blogs as entertainment sometimes. I don't think people install random
packages for entertainment. So, let's keep the tags for the bloggers.
> random blogs as entertainment sometimes. I don't think people install random
> packages for entertainment. So, let's keep the tags for the bloggers.
People do install random packages though. Ok, they're not random, but the users also don't know exactly what they're getting. This is the "I want to do X, Y and Z, I want to find all programs that do this" use case. I think this is the main use for tags. I already mentioned this on the aur-dev ML (there's a similar thread). A description, even a good one, won't give the same accuracy and refinement that tags would. Even though a description has a keyword, it doesn't mean that the keyword is associated with that package (for instance, in the rare case of a negative, e.g., "without foobar support"). Tags allow for easy "indexing" of data. A description would often have to be specifically crafted to include the correct keywords, so that the software could be found. Tags can contain arbitrary text, with no connection to any other tags. Words in a sentence can not.
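The negative case can be shown with a small sketch. Both packages, their descriptions and their tags are invented for illustration, and the word matching is deliberately naive.

```python
# Two made-up packages: one supports 'foobar', the other explicitly does not.
PKGS = [
    {"name": "pkg-a", "desc": "A media player with foobar support",
     "tags": ["media", "foobar"]},
    {"name": "pkg-b", "desc": "A media player without foobar support",
     "tags": ["media"]},
]


def search_desc(pkgs, word):
    """Naive keyword search: match `word` against the words of the description."""
    return [p["name"] for p in pkgs if word in p["desc"].lower().split()]


def search_tags(pkgs, tag):
    """Tag search: match only packages that explicitly carry `tag`."""
    return [p["name"] for p in pkgs if tag in p["tags"]]


if __name__ == "__main__":
    print(search_desc(PKGS, "foobar"))  # both packages contain the word
    print(search_tags(PKGS, "foobar"))  # only the package tagged with it
```

The description search returns both packages because the word appears in both sentences, while the tag search returns only the one that was actually tagged.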
This is somewhat an insignificant point, but filtering by tags is much more efficient than filtering by words in descriptions. It's another reason why blogs use tags, it's just more efficient to search.
I have a feeling the performance would be the same, or very similar,
at least when it comes to the pacman DB format.
pacman -Ss audio
pacman -Ss music
pacman -Ss mp3
firefox http://google.com/?q=linux+audio+players
I'm proposing an easier way to do this, via categorization. Yes, this does mean we would need a list of "suggested categories", but that's minor and not a necessity.
I'm proposing converting an N-step process into a 1-step process, making it easier on the end user
pacman -Ss music
pacman -Ss mp3
That could be helped by supporting OR searches.
If OR is the default, then AND searching with tags may become fugly.
You could provide a new flag:
pacman -Sso audio music mp3
Which looks a bit nicer than:
pacman -Ss 'audio|music|mp3'
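For comparison, here is a rough sketch of the two matching modes over descriptions. The package names and descriptions are invented, `-Sso` is only the flag suggested above and not an existing pacman option, and the matching here is plain case-insensitive substring search rather than whatever pacman does internally.

```python
import re

# Made-up descriptions, purely for illustration.
DESCRIPTIONS = {
    "pkg-one": "A console audio player",
    "pkg-two": "Music library organizer with mp3 tagging",
    "pkg-three": "An image viewer",
}


def search_any(descs, terms):
    """OR search: a package matches if at least one term occurs in its description."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return sorted(name for name, desc in descs.items() if pattern.search(desc))


def search_all(descs, terms):
    """AND search: a package matches only if every term occurs in its description."""
    return sorted(
        name for name, desc in descs.items()
        if all(t.lower() in desc.lower() for t in terms)
    )


if __name__ == "__main__":
    print(search_any(DESCRIPTIONS, ["audio", "music", "mp3"]))
    print(search_all(DESCRIPTIONS, ["audio", "music", "mp3"]))
```

With OR matching, the three separate searches collapse into one call. With AND matching, the same three terms find nothing here, because no single description contains all of them.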
Searching efficiency is a side effect of proper categorization, not the reason for it.
I posit that categorization of packages is a good idea, you think it is not. Let's just leave it at that
or categorization will offer much benefit.
This report has gotten completely off topic from the original request. Although Aaron said "opinions welcome", this back and forth is a bit out of control.
I have no objections to categories/tags/whatever, especially since this is a package manager deal, and it is really up to the distribution (e.g. Arch) to determine how to use them. This bug report should not be at all about policy, and rather whether categories make sense from a package manager perspective.
Long descriptions are definitely a bad idea because packagers WILL NOT maintain them. Think about the 'release a new version' process -- it's typically increment pkgver and go. They are NOT going to bother to check the source website to see if the package description should be updated, even if they're told to. I'm pretty sure there is a 3 or 4 year old mailing list thread on this topic that is still relevant, it may even be pre-Aaron. (WHOAH!) KISS is about easy packaging, and there's a reason we maintain the url in the PKGBUILD (although I'm willing to bet nobody ever bothers to keep that up to date either). Maintainers may not keep their tags up to date either, but they're much less likely to change -- an audio processing app is not suddenly going to become a network tool (unless in both cases it is also emacs).
The only thing I can comment on usability-wise is the very common statement in my field: "recognition is better than recall". If you know a package you want to use but you don't remember the name, you can browse categories/tags and you will recognize the name when you see it, even if you can't remember it well enough to search for it. If we had tagging available on each package, users would use it and find it useful. From the end user's perspective I don't think there is a drawback to tagging; it's easy to ignore if you don't want to use it. Bear in mind that even Google, king of search, saw fit to add tagging (labels) to Gmail. It would also cut down on those annoying "what X should I use?" threads, and Skottish has already attempted a sort of categorization of common apps on the wiki, so clearly there is a need here that needs to be filled. Does it need to be filled by official tools?
It is a useful feature. Is it a necessary feature? Does it violate KISS? Every piece of meta-data on a PKGBUILD makes it that much more work to maintain. And think about how much of a bitch it's been to get licensing on every PKGBUILD. Is Eric the only person who's going to bother updating tags too?
I hesitate to offer an actual opinion, but from the discussion here it appears that the pros and cons nullify each other, so perhaps it should just be tabled again for a while. I'm pretty sure if someone stepped up to implement it the patch would be accepted, but if nobody yet has the gumption to do it, it's probably not important enough to anyone.
Dusty said: Every piece of meta-data on a PKGBUILD makes it that much more work to maintain.
True, what portion of packages actually have a changelog?
And this comment almost swayed me against this. But, that should not be a factor in a decision made here. The question should be: is there a want/need for this in package management with pacman? Whether or not Arch makes use of such a feature is beside the point.
So I say someone should implement this if they really want to...
It's not included in the PKGBUILD but in a separate database. This database is structured like this:
core
  name - categories,..,..,..
  name - categories,..,..,..
extra
  name - categories,..,..,..
  name - categories,..,..,..
community
  name - categories,..,..,..
  name - categories,..,..,..
local (for local apps whose categories differ from the entries above)
  name - categories,..,..,..
  name - categories,..,..,..
The databases for core, extra and community are maintained by the community via a webinterface. Entries and additions are reviewed to prevent abuse before going into the final db. The number of categories an application can be assigned to is capped (no app should have 1000 categories), and whether a category exists at all is a community decision.
Pacman has optional support for it. If the database is installed, fine; if it's not, no problem either.
The local database takes precedence over the others: if a user builds a package on his own with a special feature (-nox, -x) that makes it fit into another category, he can change this by adding a local entry to the database.
At the moment we have 4488 packages in core, extra and community together. If the community wants this feature, it is possible to categorize the applications. Once categorized, there is just little work to do to maintain it. Most packages will not change categories for a long time (mplayer will always be in video, e.g.). We can track changes by bug reports and make them easily fixable via the entry/addition webinterface.
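A rough sketch of how such a database could be read, with the local section taking precedence, might look like this. The exact text format (a repository name on its own line, then "name - cat,cat" entries), the cap of 5 categories, the package "foo" and all function names are assumptions made for the sake of the example.

```python
MAX_CATEGORIES = 5  # hypothetical cap; the proposal does not name a number

SAMPLE_DB = """\
core
  bash - base,shells
extra
  foo - editors,x11
local
  foo - editors,console
"""


def parse_category_db(text):
    """Parse 'repo' header lines and 'name - cat,cat' entries into nested dicts."""
    db, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " - " not in line:  # a line without ' - ' starts a new repository
            section = line
            db[section] = {}
            continue
        if section is None:
            raise ValueError(f"entry before any repository header: {line}")
        name, _, cats = line.partition(" - ")
        categories = [c.strip() for c in cats.split(",") if c.strip()]
        if len(categories) > MAX_CATEGORIES:
            raise ValueError(f"{name.strip()}: too many categories")
        db[section][name.strip()] = categories
    return db


def categories_for(db, name, repos=("core", "extra", "community")):
    """Look up a package, letting a 'local' entry override the repo entries."""
    for repo in ("local",) + tuple(repos):
        if name in db.get(repo, {}):
            return db[repo][name]
    return []  # no entry, or no database installed: just no categories


if __name__ == "__main__":
    db = parse_category_db(SAMPLE_DB)
    print(categories_for(db, "foo"))   # the local entry wins over extra
    print(categories_for(db, "bash"))
```

Returning an empty list for unknown packages keeps the feature optional: with no database installed, every lookup simply yields no categories.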
Pros:
* no changes in PKGBUILDs needed
* optional feature, if you don't like it, don't use it
* most work is done by the community
Cons:
* (very very little) overhead in pacman for people who don't use this feature when pacman checks for database-presence
Implementation needs:
* small patches to pacman to support categories
* webinterface to submit categories to applications (and a place to host this)
* script which exports categories from webinterface-db and creates a database for pacman
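The export script from the last point could be little more than the following sketch. The shape of the submission records (repo, name, categories, approved flag) is an assumption about what the webinterface would store, and the output format is the "name - categories" layout described earlier.

```python
# Hypothetical rows as the webinterface might store them after review.
SUBMISSIONS = [
    {"repo": "extra", "name": "mplayer",
     "categories": ["media", "audio", "video"], "approved": True},
    {"repo": "core", "name": "bash",
     "categories": ["base", "shells"], "approved": True},
    {"repo": "extra", "name": "spam",
     "categories": ["everything"], "approved": False},
]


def export_category_db(entries, repos=("core", "extra", "community")):
    """Write approved entries as 'repo' headers followed by 'name - cat,cat' lines."""
    lines = []
    for repo in repos:
        approved = sorted(
            (e["name"], e["categories"])
            for e in entries
            if e["repo"] == repo and e["approved"]
        )
        if not approved:
            continue  # skip repositories without any reviewed entries
        lines.append(repo)
        lines.extend(f"{name} - {','.join(cats)}" for name, cats in approved)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(export_category_db(SUBMISSIONS), end="")
```

Only reviewed entries reach the exported file, so the abuse check stays entirely on the webinterface side.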
Maybe we could start with the webinterface and see if the community likes the idea by adding categories, and only once we have them implement it in pacman? If someone can provide a little webspace I could implement this in Rails. What do you think?
http://bbs.archlinux.org/viewtopic.php?id=44933
Details in duplicate
FS#29479
Is this going to be added or is it in the 'Not sure really'??
Probably add them to the web interface (the official repo search on the Arch website) first and see if they are used correctly. That can be used as an indicator of whether they are really going to be used and not ignored (like pacman -Qc).
But what about defining a fixed set of categories first??
This bug is about adding basic functionality to libalpm and makepkg that will be useful for graphical frontends.
The black/whitelist could be the basic and most generic thing.
Polishing it could be done over time, there is no hurry for that, and a black/whitelist added to pacman could even be edited by the upstream of pacman for non-Arch distros