FS#7132 - Package meta info: 'category'

Attached to Project: Pacman
Opened by Aaron Griffin (phrakture) - Friday, 11 May 2007, 23:26 GMT
Task Type Feature Request
Category Backend/Core
Status Assigned
Assigned To Aaron Griffin (phrakture), Dan McGee (toofishes)
Architecture All
Severity Low
Priority Normal
Reported Version 3.0.4
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 21
Private No

Details

I've been tossing this around in my head for a while and just wanted to write it down.

The idea is an additional package member, we'll call it "category" for now, that indicates simply the categorization of a package. It is similar to "groups" but is not the same. It's more like a tagging scheme.

Some examples:

pkgname=cups
category=(printing)

pkgname=mplayer
category=(media audio video)

(I can't think of any good ones)

What does this give us? It gives us additional info that we can use when searching (pacman -Ss --category audio foo), display information for helpful hints (community/democracy-player - media iptv torrent) and things like that. I do not foresee categories being installable like groups, but at the same time, using groups for this doesn't work *because* of the installability (pacman -Sg video seems daft).

Opinions welcome.
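
To make the search idea concrete, here is a minimal sketch in Python of what a category filter might look like. It is not pacman code: the Package record, the search() and format_hit() helpers, the descriptions and the repo assignments for cups and mplayer are all made up for illustration; only the category values and the community/democracy-player hint come from the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Minimal stand-in for a package record; not pacman's real data model."""
    repo: str
    name: str
    desc: str
    categories: list[str] = field(default_factory=list)


def search(packages, term=None, category=None):
    """Return packages that carry `category` and whose name or description
    contains `term`; either filter may be omitted."""
    results = []
    for pkg in packages:
        if category is not None and category not in pkg.categories:
            continue
        if term is not None:
            haystack = f"{pkg.name} {pkg.desc}".lower()
            if term.lower() not in haystack:
                continue
        results.append(pkg)
    return results


def format_hit(pkg):
    """Render a hit as 'repo/name - cat1 cat2', like the hint shown above."""
    return f"{pkg.repo}/{pkg.name} - {' '.join(pkg.categories)}"


# Illustrative data only; descriptions and repos are assumptions.
PACKAGES = [
    Package("extra", "cups", "The CUPS printing system", ["printing"]),
    Package("extra", "mplayer", "A movie player", ["media", "audio", "video"]),
    Package("community", "democracy-player", "Internet TV player",
            ["media", "iptv", "torrent"]),
]

for hit in search(PACKAGES, term="player", category="media"):
    print(format_hit(hit))
```

The category test is an exact membership check, which is the point of the proposal: it does not depend on what words happen to appear in the description.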

Comment by Dan McGee (toofishes) - Friday, 11 May 2007, 23:29 GMT
There is currently a 'group' for xorg-video-drivers, calling this a category would make much more sense.
Comment by Aaron Griffin (phrakture) - Friday, 11 May 2007, 23:31 GMT
That's what reminded me of this idea - your commit.

This is a very good case too - who would ever "pacman -Sg xorg-video-drivers"? Do you have 30 video cards in your machine?

8)
Comment by Dag (rixxon) - Tuesday, 19 June 2007, 20:01 GMT
Would these categories possibly be the same as in ABS? Your description seems different, e.g. bash would be categories=(shells) rather than categories=(base). (By the way, I think categories is a better variable name than category, if it would support multiple categories.)

Personally I'd like to see pacman knowing the ABS categories, and be able to list base packages for example. This could either be yet another variable or you simply mark bash as categories=(base shells) for example.
Comment by Dan McGee (toofishes) - Tuesday, 19 June 2007, 21:56 GMT
The ABS categories idea is far too rigid anyway- no package can be in more than one category, which is a bit odd. Take a step back from thinking of directories having a one to one correlation with categories, and I think that is a better way to approach this. With that said, nearly every directory in ABS can correspond to at least one of the categories on a package.
Comment by Dag (rixxon) - Tuesday, 19 June 2007, 22:49 GMT
Mainly I just wish there was a neat way to list base packages via pacman.
Comment by Roman Kyrylych (Romashka) - Wednesday, 20 June 2007, 08:24 GMT
Is it worth putting category info into package metadata?
Maybe implementing this in the web interface only would be enough?
I think most users search for new software on the internet anyway,
so there's little to gain from duplicating this functionality in pacman (and increasing the complexity).
Besides, recategorization (in case we want to split/merge/rename/remove some categories)
would not be easy when the data is stored in the package instead of the DB
(as this would require modifying all relevant packages).
Opinions?
Comment by Dan McGee (toofishes) - Wednesday, 20 June 2007, 13:43 GMT
Instead of reposting one comment/opinion from the thread, why don't we repost the thread itself?

http://archlinux.org/pipermail/pacman-dev/2007-June/008555.html
Comment by Dag (rixxon) - Wednesday, 20 June 2007, 13:45 GMT
I think no distribution should require web access to use all the features. I hate the fact I can only list package files on the web and not via pacman for one. Sorry but I disagree with you, Roman.
Comment by Xilon (Xilon) - Saturday, 05 January 2008, 17:47 GMT
So what's the status on this feature? It seems it has been forgotten. I would love to see this feature in pacman, it's very useful when searching, especially in GUI frontends. In fact I'd say the only purpose of a GUI would be to more easily _browse_ (not search) through categories. Doing so from the command line would be somewhat slower.

While developing AUR2 I started thinking about this again. I really hate how categories only exist on the web interface, and it's not really ideal to have such rigid categories. I think it would be much better if the package itself could define which category it belongs to, since I often have no idea which category to select in the AUR package upload form. Also having an array in the PKGBUILD would allow us to define multiple categories and in turn describing the package better.
Comment by Ben Dibbens (ibendiben) - Thursday, 10 January 2008, 14:25 GMT
It would be awesome if the pacman database contained package categories. -->%CATEGORY%
If I would want to know which multimedia software was available I now have to search for say: media,music,video,player,burn,encode... and whatever words such package-description could contain. I am making a dialog interface for pacman and the category browsing like Xilon describes would make it a lot nicer.
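
As a rough sketch of how a frontend might read such an entry: pacman's per-package desc files use %FIELD% headers followed by one value per line, with a blank line ending each field. The %CATEGORY% field below is hypothetical (it is only the name suggested in this comment), and parse_desc() and the sample text are illustrative, not libalpm code.

```python
def parse_desc(text):
    """Parse '%FIELD%' headers, each followed by one value per line.

    A blank line ends the current field. Returns {field: [values]}.
    """
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # blank line closes the field
        elif (current is None and len(line) > 2
              and line.startswith("%") and line.endswith("%")):
            current = line.strip("%")
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return fields


# Sample entry; %CATEGORY% is an assumed field, not part of the real format.
SAMPLE = """%NAME%
mplayer

%DESC%
A movie player

%CATEGORY%
media
audio
video
"""

entry = parse_desc(SAMPLE)
print(entry["NAME"][0], "->", " ".join(entry.get("CATEGORY", [])))
```

Using entry.get("CATEGORY", []) means packages without the field still parse, so a dialog frontend could treat categories as optional.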
Comment by Aaron Griffin (phrakture) - Wednesday, 09 April 2008, 22:11 GMT
This is increasingly important with our new repo layout that does NOT contain category info in the directory structure.

If we can get this in pacman soon-ish, that means the SVN move will be near flawless.
Comment by Gavin Bisesi (Daenyth) - Wednesday, 12 November 2008, 17:19 GMT
I know next to no C and have no experience with ALPM, but I'll try to learn and hack around.
http://github.com/Daenyth/pacman-tags/commits/tags
Comment by Allan McRae (Allan) - Thursday, 13 November 2008, 13:31 GMT
Didn't Ronald (pressh) start a patch for this on the mailing list a few months back?
Comment by Gavin Bisesi (Daenyth) - Thursday, 13 November 2008, 17:24 GMT
Possibly, I don't usually follow the pacman ML, I'll look at it.
Comment by Gavin Bisesi (Daenyth) - Wednesday, 19 November 2008, 13:03 GMT
Comment by Dieter Plaetinck (Dieter_be) - Sunday, 15 March 2009, 19:27 GMT
so, the categories behave like tags? one category can belong to many packages, one package can have many categories.
I would just call them tags then :)

I have the feeling we will need to carefully orchestrate the usage of these tags (e.g. to avoid some people using "printers" and others "printing"). Maybe have an official list of recognized tags? And maybe even have namcap complain about an unrecognized tag?
Comment by Gavin Bisesi (Daenyth) - Sunday, 15 March 2009, 20:01 GMT
I agree... I think adding an "official" list to namcap is a good idea, but apart from that, don't do any enforcing. We can make a basic list and as we find more that we need, add them to namcap
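
A warn-only check along these lines could be sketched as follows. This is plain Python, not namcap's actual rule API; RECOGNIZED_TAGS and check_tags() are hypothetical names, and the tag list is just a placeholder.

```python
# Hypothetical "official" list; a real one would be maintained separately.
RECOGNIZED_TAGS = {"audio", "video", "media", "printing", "shells"}


def check_tags(pkgname, tags, recognized=RECOGNIZED_TAGS):
    """Return a warning string for every tag not in `recognized`.

    Nothing is rejected: the caller only gets messages to display.
    """
    warnings = []
    for tag in tags:
        if tag not in recognized:
            warnings.append(f"{pkgname}: unrecognized tag '{tag}'")
    return warnings


for message in check_tags("cups", ["printers"]):
    print("W:", message)
```

Because the function only returns messages, growing the list later does not break any existing package; it just silences a warning.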
Comment by Loui Chang (louipc) - Sunday, 15 March 2009, 20:05 GMT
I don't think I really see the value of tags any more. I think it would be
much more appropriate and effective to write a good, complete description.

Adding tags isn't worth the added bloat. Just write a good description and
people will find the package. No extra code required.
Comment by Dieter Plaetinck (Dieter_be) - Sunday, 15 March 2009, 20:15 GMT
If searching for packages would be the only use case, I would agree with Loui. But this can also be used for other stuff, such as "show me all packages that contain a printer driver". If using only descriptions, you will also see for example a package with a description "backend for printer drivers". Otoh searching for packages is always a manual process, so maybe it doesn't matter that much..
Comment by Loui Chang (louipc) - Sunday, 15 March 2009, 20:22 GMT
What use case for this feature is there other than searching?
Comment by Gavin Bisesi (Daenyth) - Sunday, 15 March 2009, 20:41 GMT
I think the main advantage is browsing more than searching -- you know the type of thing you want, but you don't know any specific software.
Comment by Aaron Griffin (phrakture) - Monday, 16 March 2009, 18:02 GMT
Re: write a good description rather than use tags.

I think this is silly and will produce the _opposite_ of what you think it will. People will struggle to cram words into their descriptions to make it searchable.

"An audio / video player that support streaming" is awfully generic, catches the buzzwords, and really tells me nothing about the app.

Additionally, parsing natural language for ANY of our tools is going to be ridiculous - the point of meta-info is to be machine readable. Proposing that description fits meta info such as this, is like proposing pkgver and pkgrel are useless because we could just use "pkgname=foobar-1.2-3"
Comment by Loui Chang (louipc) - Monday, 16 March 2009, 21:10 GMT
> I think this is silly and will produce the _opposite_ of what you
> think it will. People will struggle to cram words into their
> descriptions to make it searchable.

> "An audio / video player that support streaming" is awfully generic,
> catches the buzzwords, and really tells me nothing about the app.

That awfully generic description isn't sufficient then, it should be
elaborated upon. Maybe this could be a case for an optional long
description field.

> Additionally, parsing natural language for ANY of our tools is going
> to be ridiculous - the point of meta-info is to be machine readable.
> Proposing that description fits meta info such as this, is like
> proposing pkgver and pkgrel are useless because we could just use
> "pkgname=foobar-1.2-3"

pkgver and pkgrel aren't exactly useless because they aid in a machine
operation: that is dependency and version tracking.

Meta info like tags and descriptions are for human functions: searching
and browsing. Doesn't package search already work via pkgname and
pkgdesc? It seems to have been pretty sufficient so far.

I'm not proposing that description is for machine use. It's for humans
to search or browse through to find packages that fit their criteria.
Having tags and allowing people to search/grep/browse through them isn't
really that different than having those terms in the description. If a
package's description doesn't contain any relevant terms, then it can't
be considered a proper description.

The problem is that adding tags adds a new infrastructure for
little gain. Then you've got to have to standardise tags, or just have
multiple similar tags: games.devel devel.games. The issues that exist
with descriptions will still exist, and solutions may be even more
contorted than ever. Tags would only be useful to reinforce a description
when it comes to alternate spellings: color vs colour.

You have to really think about the purpose of tagging. The machine
doesn't care if a package is an audio player or a text editor. All it
cares about is its name, its dependencies and its version.

Recommending more complete descriptions and maybe more powerful search
methods is the better alternative to tags or categories.
Comment by Gavin Bisesi (Daenyth) - Monday, 16 March 2009, 21:14 GMT
I agree with louipc in that pkgdesc is mostly sufficient, but in either case we'd need new infrastructure. On one hand, adding tags, and on the other, adding a "long" description option. I think tags suit the purpose better than a $longdesc(?) does
Comment by Gavin Bisesi (Daenyth) - Monday, 16 March 2009, 21:15 GMT
In addition, I think it's pointless duplication; it seems silly to me to have two separate and disconnected descriptions
Comment by Aaron Griffin (phrakture) - Monday, 16 March 2009, 22:00 GMT
Ok, so if we want any tool to categorize packages in any way, what do we do? If you can convince me of a way that doesn't take man-months to implement (natural language parser, thesaurus lookups, etc etc) that one could correctly categorize a package, then I'll concede that a better description is better.

I do NOT agree in the least bit that better descriptions are going to do anything of the sort. If text was all that was required to categorize something, then why is it even blog posts are typically categorized with tags? Hell, why does anything use tags if this were the case?
Comment by Aaron Griffin (phrakture) - Monday, 16 March 2009, 22:04 GMT
> You have to really think about the purpose of tagging. The machine
> doesn't care if a package is an audio player or a text editor. All it
> cares about is its name, its dependencies and its version.

We already have umpteen pieces of meta info "the machine" doesn't care about (since when do we care about what the machine does and not what the user does with said machine?). If we really wanted to be purist about this, we'd stick with the original CRUX format.
Comment by Loui Chang (louipc) - Monday, 16 March 2009, 23:00 GMT
Sorry if I was unclear. I mean to dispel the whole need to categorize
packages in the first place. So no parsing or thesaurus or category system.
As far as I can tell, categories are really only meant to help in searching,
and that can be improved by writing better descriptions.

I'm not convinced implementing categories would significantly improve usability.
I don't think that tags offer any greater significant usefulness for searching
over the blog text itself, or a good text description of an object like an image.
Maybe it could be useful for poems with lots of abstract imagery, but we're
dealing with packages here. Descriptions shouldn't be vague or abstract.

Tags for browsing blogs on the other hand is a lot more relevant. People read
random blogs as entertainment sometimes. I don't think people install random
packages for entertainment. So, lets keep the tags for the bloggers.
Comment by Xilon (Xilon) - Tuesday, 17 March 2009, 02:17 GMT
> Tags for browsing blogs on the other hand is a lot more relevant. People read
> random blogs as entertainment sometimes. I don't think people install random
> packages for entertainment. So, lets keep the tags for the bloggers.

People do install random packages though. OK, they're not random, but the users also don't know exactly what they're getting. This is the "I want to do X, Y and Z, I want to find all programs that do this" use case. I think this is the main use for tags. I already mentioned this on the aur-dev ML (there's a similar thread). A description, even a good one, won't give the same accuracy and refinement that tags would. Even though a description has a keyword, it doesn't mean that the keyword is associated with that package (for instance, in the rare case of a negative, e.g., "without foobar support"). Tags allow for easy "indexing" of data. A description would often have to be specifically crafted to include the correct keywords, so that the software could be found. Tags can contain arbitrary text, with no connection to any other tags. Words in a sentence cannot.

This is a somewhat insignificant point, but filtering by tags is much more efficient than filtering by words in descriptions. It's another reason why blogs use tags: it's just more efficient to search.
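
The "negative keyword" case can be shown with a small sketch. The package, its description and both helper functions are invented for illustration; the only thing taken from the comment is the phrase "without foobar support".

```python
def matches_description(desc, keyword):
    """Naive keyword search: substring match anywhere in the description."""
    return keyword.lower() in desc.lower()


def matches_tags(tags, keyword):
    """Tag search: the keyword must be one of the package's tags."""
    return keyword.lower() in {t.lower() for t in tags}


# Invented example package.
pkg = {
    "name": "example-player",
    "desc": "A lightweight audio player without foobar support",
    "tags": ["audio", "player"],
}

# The description mentions "foobar" only to say it is NOT supported,
# so a text search reports a false positive while the tag search does not.
print(matches_description(pkg["desc"], "foobar"))
print(matches_tags(pkg["tags"], "foobar"))
```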
Comment by Loui Chang (louipc) - Tuesday, 17 March 2009, 08:03 GMT
How are those more efficient tags implemented?
I have a feeling the performance would be the same, or very similar,
at least when it comes to the pacman DB format.
Comment by Xilon (Xilon) - Tuesday, 17 March 2009, 08:24 GMT
It's more about being concise. Like I said it's an insignificant point. It's not that much more efficient. Searching a couple strings (assumed to total less than 80 chars) as opposed to searching an ~80 character string (in the blog example the difference is much larger).
Comment by Aaron Griffin (phrakture) - Tuesday, 17 March 2009, 15:21 GMT
See, Xilon had it close to my original point. Categorization is a way to help end users. The current use case for, say, finding a new audio player is (perhaps) as follows:

pacman -Ss audio
pacman -Ss music
pacman -Ss mp3
firefox http://google.com/?q=linux+audio+players

I'm proposing an easier way to do this, via categorization. Yes, this does mean we would need a list of "suggested categories", but that's minor and not a necessity.

I'm proposing converting an N-step process into a 1-step process, making it easier on the end user
Comment by Loui Chang (louipc) - Tuesday, 17 March 2009, 18:46 GMT
pacman -Ss audio
pacman -Ss music
pacman -Ss mp3

That could be helped by supporting OR searches.
Comment by Loui Chang (louipc) - Tuesday, 17 March 2009, 18:49 GMT
Erk. Looks like it's already supported.
Comment by Aaron Griffin (phrakture) - Tuesday, 17 March 2009, 18:54 GMT
I don't see how that as an OR search is any less fugly
Comment by Loui Chang (louipc) - Tuesday, 17 March 2009, 19:14 GMT
If AND searches are default with tags, OR searching won't be any less fugly.
If OR is the default, then AND searching with tags may become fugly.

You could provide a new flag:
pacman -Sso audio music mp3

Which looks a bit nicer than:
pacman -Ss 'audio|music|mp3'
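
For reference, the difference between the two search modes being discussed can be sketched in a few lines of Python. This does not show how pacman implements -Ss, and the -Sso flag above is only a suggestion; search_any(), search_all() and the sample descriptions are hypothetical.

```python
import re


def search_any(descriptions, terms):
    """OR search: keep entries matching at least one term.

    Built as a single alternation, like the 'audio|music|mp3' pattern.
    """
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [d for d in descriptions if pattern.search(d)]


def search_all(descriptions, terms):
    """AND search: keep entries matching every term."""
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in terms]
    return [d for d in descriptions if all(p.search(d) for p in patterns)]


# Invented descriptions.
DESCS = [
    "An mp3 tagger",
    "A music player for audio files",
    "A text editor",
]

print(search_any(DESCS, ["audio", "music", "mp3"]))
print(search_all(DESCS, ["audio", "music"]))
```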
Comment by Aaron Griffin (phrakture) - Tuesday, 17 March 2009, 19:18 GMT
I think we're getting hung up on semantics here, and details of searching where that is NOT the intent of this report, and this is getting retarded.

Searching efficiency is a side effect of proper categorization, not the reason for it.

I posit that categorization of packages is a good idea, you think it is not. Let's just leave it at that
Comment by Loui Chang (louipc) - Tuesday, 17 March 2009, 19:21 GMT
Sorry. I was just trying to demonstrate that I don't think tags
or categorization will offer much benefit.
Comment by Dan McGee (toofishes) - Wednesday, 18 March 2009, 01:56 GMT
Can I get a blue bikeshed?

This report has gotten completely off topic from the original request. Although Aaron said "opinions welcome", this back and forth is a bit out of control.

I have no objections to categories/tags/whatever, especially since this is a package manager deal, and it is really up to the distribution (e.g. Arch) to determine how to use them. This bug report should not be at all about policy, and rather whether categories make sense from a package manager perspective.
Comment by Dusty Phillips (Dusty) - Wednesday, 18 March 2009, 12:52 GMT
I agree with Dan, but Aaron explicitly asked me to comment on this from a usability standpoint.

Long descriptions are definitely a bad idea because packagers WILL NOT maintain them. Think about the 'release a new version' process -- it's typically increment pkgver and go. They are NOT going to bother to check the source website to see if the package description should be updated, even if they're told to. I'm pretty sure there is a 3 or 4 year old mailing list thread on this topic that is still relevant, it may even be pre-Aaron. (WHOAH!) KISS is about easy packaging, and there's a reason we maintain the url in the PKGBUILD (although I'm willing to bet nobody ever bothers to keep that up to date either). Maintainers may not keep their tags up to date either, but they're much less likely to change -- an audio processing app is not suddenly going to become a network tool (unless in both cases it is also emacs).

The only thing I can comment on usability-wise is the very common statement in my field: "recognition is better than recall". If you know a package you want to use but you don't remember the name, you can browse categories/tags and you will recognize the name when you see it, even if you can't remember to search for it. If we had tagging available on each package, users would use it and would find it useful. From the end user's perspective I don't think there is a drawback to tagging; it's easy to ignore if you don't want to use it. Bear in mind that even Google, king of search, saw fit to add tagging (labels) to gmail. It would also cut down on those annoying "what X should I use?" threads, and Skottish has already attempted a sort of categorization of common apps on the wiki, so clearly there is a need here that needs to be filled. Does it need to be filled by official tools?

It is a useful feature. Is it a necessary feature? Does it violate KISS? Every piece of meta-data on a PKGBUILD makes it that much more work to maintain. And think about how much of a bitch it's been to get licensing on every PKGBUILD. Is Eric the only person who's going to bother updating tags too?

I hesitate to offer an actual opinion, but from the discussion here it appears that the pros and cons nullify each other, so perhaps it should just be tabled again for a while. I'm pretty sure if someone stepped up to implement it the patch would be accepted, but if nobody yet has the gumption to do it, it's probably not important enough to anyone.
Comment by Allan McRae (Allan) - Wednesday, 20 May 2009, 12:29 GMT
I haven't commented on this idea because of all the bikeshedding going on. So, just having read all the comments, here is my opinion:

Dusty said: Every piece of meta-data on a PKGBUILD makes it that much more work to maintain.
True, what portion of packages actually have a changelog?

And this comment almost swayed me against this. But, that should not be a factor in a decision made here. The question should be: is there a want/need for this in package management with pacman? Whether or not Arch makes use of such a feature is beside the point.

So I say someone should implement this if they really want to...
Comment by Rorschach (Rorschach) - Saturday, 20 February 2010, 13:43 GMT
This is an interesting feature, especially because this could be a place where we could mark whether software is free/unfree, something Arch doesn't have at the moment. The argument against implementing this in the PKGBUILD is a good point too, but what about this:

It's not included in the PKGBUILD but in a separate database. This database is structured like this:

core
name - categories,..,..,..
name - categories,..,..,..

extra
name - categories,..,..,..
name - categories,..,..,..

community
name - categories,..,..,..
name - categories,..,..,..

local (for local apps whose categories differ from the entries above)
name - categories,..,..,..
name - categories,..,..,..


The databases for core, extra and community are maintained by the community via a web interface. The entries and additions are reviewed to prevent abuse before going into the final db. The number of categories an application can be assigned to is capped (no app should have 1000 categories), and whether a category exists at all is a community decision.

Pacman has optional support for it. If the database is installed, fine; if it's not, no problem either.

The local database will take precedence over the others. If a user builds a package on his own with a special feature (-nox, -x) which makes it fit into another category, he can change this by adding a local entry to the database.

At the moment we have 4488 packages in core, extra and community together. If the community wants this feature, it is feasible to categorize the applications. Once categorized, there is just a little work to do to maintain it. Most packages will not change categories for a long time (mplayer will always be in video, e.g.). We can track changes by bug reports and make them easily fixable via the entry/addition web interface.



Pros:
* no changes in PKGBUILDs needed
* optional feature, if you don't like it, don't use it
* most work is done by the community

Cons:
* (very very little) overhead in pacman for people who don't use this feature when pacman checks for database-presence




Implementation needs:
* small patches to pacman to support categories
* webinterface to submit categories to applications (and a place to host this)
* script which exports categories from webinterface-db and creates a database for pacman


Maybe we could start with the web interface and see if the community likes the idea by adding categories, and only once we have them, implement it in pacman? If someone can provide a little webspace I could implement this in Rails. What do you think?




Comment by Xavier (shining) - Saturday, 20 February 2010, 14:31 GMT
That webinterface thing reminded me of pacnet :
http://bbs.archlinux.org/viewtopic.php?id=44933
Comment by Allan McRae (Allan) - Sunday, 22 April 2012, 12:11 GMT
Comment by Pablo Lezaeta (Jristz) - Wednesday, 14 November 2012, 18:02 GMT
A question: the current (and new) AUR 2.0 keeps the categories, and as far as I can see they are accurate in about 75% of cases.

Is this going to be added, or is it in the 'not really sure' state?

Perhaps add them to the web interface (the official repo search on the Arch website) first. Whether they are used correctly there could serve as an indicator of whether they will really be used and not ignored (like pacman -Qc).
Comment by Ashley Whetter (AWhetter) - Friday, 12 July 2013, 18:12 GMT
I think this is a bad idea. Partly because I don't think it's useful, but mostly because there are categories on the AUR already but they just don't work. They're too general and as a result too large to be useful. This might be because users can't create their own categories but even if that was the case I think some categories would end up being too specific and just as useless.
Comment by Allan McRae (Allan) - Friday, 12 July 2013, 21:42 GMT
And despite that, many distributions already use categories successfully... And they are a key part of many graphical package managers. I believe Chakra carries a patch to implement this.
Comment by Pablo Lezaeta (Jristz) - Friday, 12 July 2013, 22:21 GMT
In that case it is best to first establish a fixed set of categories and make sure that they are OK, neither too unspecific nor too specific.

So what about defining a fixed set of categories first?
Comment by Allan McRae (Allan) - Friday, 12 July 2013, 22:32 GMT
Who cares about that? It is for a distribution to decide what categories are to be used. And as a hint, Arch will not be using them.

This bug is about adding basic functionality to libalpm and makepkg that will be useful for graphical frontends.
Comment by Pablo Lezaeta (Jristz) - Monday, 05 January 2015, 03:21 GMT
Why not use a predefined black/whitelist and warn if a package has a category that is not on the list?
The black/whitelist could cover the basic and most generic things.

It could be polished over time, there is no hurry for that, and even a black/whitelist added to pacman could be edited by the upstream of pacman for non-Arch distros.
Comment by Allan McRae (Allan) - Monday, 05 January 2015, 03:40 GMT
We don't care what the categories are. That is a detail for distributions to figure out. This bug is for tracking the implementation only.
