Better way to manage extension priority for multiple types #20
I am debating between two different APIs here:
I'm putting this out here in case anyone has opinions on it. |
My $.02: Prioritize at the source level, and be explicit about which source trumps which. E.g. I'd suggest:
One thing to consider is what happens if one of these datasets (e.g. Apache) is updated to cause a conflict. The priority here would ideally reflect the willingness of organizations to bring their dataset in line with what other orgs are doing. |
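A minimal sketch of source-level prioritization in JavaScript. The precedence list (`iana` > `apache` > `nginx`, unknown sources last) is an illustrative assumption, since the suggested order did not survive here, and `pickType` is a hypothetical helper. Because the ranking is explicit, an Apache update that later claims an IANA extension does not change the result, which is the update-conflict case raised above.

```javascript
// Hypothetical explicit precedence: earlier in the list wins.
const SOURCE_PRIORITY = ["iana", "apache", "nginx"];

// Unknown or missing sources rank after every listed source.
function sourceRank(source) {
  const i = SOURCE_PRIORITY.indexOf(source);
  return i === -1 ? SOURCE_PRIORITY.length : i;
}

// Given every type that claims one extension, return the type whose source
// ranks best; on a tie, the earlier candidate is kept.
function pickType(candidates) {
  let best;
  for (const candidate of candidates) {
    if (best === undefined || sourceRank(candidate.source) < sourceRank(best.source)) {
      best = candidate;
    }
  }
  return best && best.type;
}

// An IANA type claims the extension first...
const before = [{ type: "text/example", source: "iana" }];
// ...then an Apache update starts claiming the same extension.
const after = [...before, { type: "application/x-example", source: "apache" }];

console.log(pickType(before)); // → text/example
console.log(pickType(after));  // → text/example (IANA still wins)
```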
@broofa yes, this is what I'm doing, but in order to provide this organized list for extension -> mime lookup (we already do this for mime -> data), we will have to actually provide the list in some way. Right now the modules on top of this are simply iterating over the db and building the list, but of course that means their list is going to be ordered alphabetically by mime type rather than by source preference (this is an issue specific to extension -> mime lookups). What I'm currently contemplating is what the API of |
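To make the ordering problem concrete, here is a sketch of the naive approach (iterate over the db and accumulate extensions), with made-up type names. Whichever type is written last wins, so the outcome is decided by the db's alphabetical key order rather than by source preference; first-write-wins has the same flaw in the other direction.

```javascript
// Naive extension -> type map: walk the db in key order and let the last
// type that lists an extension overwrite any earlier one. `source` is ignored.
function naiveExtensionMap(db) {
  const map = new Map();
  for (const [type, entry] of Object.entries(db)) {
    for (const ext of entry.extensions || []) {
      map.set(ext, type);
    }
  }
  return map;
}

// Keys sorted alphabetically, as in db.json. The IANA type sorts first,
// so the Apache type silently overwrites it.
const db = {
  "application/vnd.example": { source: "iana", extensions: ["exa"] },
  "application/x-example": { source: "apache", extensions: ["exa"] },
};

console.log(naiveExtensionMap(db).get("exa")); // → application/x-example
```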
@dougwilson you mean something like:
// require() style
import types from 'mime-db/types.json'
import extensions from 'mime-db/extensions.json'
// all at once
import { types, extensions } from 'mime-db'
// by default
import mime from 'mime-db'
mime.extensions
mime.types |
i don't mind breaking changes :D |
I'm confused about what the issue is here.
You mean, if the same extension appears in multiple type definitions? Is that allowed??? In node-mime I disallowed that. (i.e. node.types was not allowed to conflict with mime.types.) I would suggest you do the same here. That's part of the reason for having an explicit priority for the sources, so you can resolve such conflicts as part of building Assuming the above, then I don't believe shipping an If your concern is that secondary modules may be building their own extensions map in inconsistent ways, then codify how it should be done as a separate module ( One reason for this is that redundant data in datasets is presumably an anti-pattern. As long as the extension map can be built dynamically from what's currently in db.json, then I would think encouraging dependent module authors to do so would be a good thing. ... or am I missing something? |
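Disallowing conflicts, as node-mime did, starts with finding them. A minimal sketch over `db.json`-shaped entries (a `source` and an `extensions` array per type); `findExtensionConflicts` is a hypothetical helper, not part of mime-db.

```javascript
// Hypothetical helper: map each extension claimed by more than one MIME type
// to the list of types that claim it, in db key order.
function findExtensionConflicts(db) {
  const owners = new Map(); // extension -> types that list it
  for (const [type, entry] of Object.entries(db)) {
    for (const ext of entry.extensions || []) {
      if (!owners.has(ext)) owners.set(ext, []);
      owners.get(ext).push(type);
    }
  }
  // Keep only extensions with two or more claimants.
  return new Map([...owners].filter(([, types]) => types.length > 1));
}

const db = {
  "text/a": { source: "iana", extensions: ["aaa", "shared"] },
  "text/b": { source: "apache", extensions: ["shared"] },
  "text/c": { source: "nginx", extensions: ["ccc"] },
};

const conflicts = findExtensionConflicts(db);
console.log([...conflicts.keys()]);   // → [ 'shared' ]
console.log(conflicts.get("shared")); // → [ 'text/a', 'text/b' ]
```

A build script could fail on a non-empty result, which is the node-mime policy, or merely report it.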
Correct.
It sure is, even within the IANA database itself (which is what we're supposed to be mimicking here). For example, both http://www.iana.org/assignments/media-types/image/vnd.dvb.subtitle and http://www.iana.org/assignments/media-types/text/vnd.dvb.subtitle list
So, there is a fundamental disconnect between this module and some of the things that depend on it: this module provides a mime -> data mapping, but I know modules are trying to build an extension -> mime mapping out of this data, and it doesn't work that well, since there is no fundamental reason multiple mime types cannot be mapped to the same file extension. Just think legacy reasons: at one time,
Yes, this is true, but you cannot take this list and reverse it into an extension -> mime list; it's a pure mime -> extension list.
So, if you want, you can sort of do this by simply looking at the As far as a separate module, even doing it based on "source" is not good enough, because that's the source of the mime type, not the source of the mime -> extension mapping. The only way to do it correctly as a separate module is to duplicate this entire module and maintain two things doing the same basic task. The decision needs to be made directly when pulling down from IANA, Apache, etc.; waiting until after
So, TL;DR: what I'm saying is that it's impossible to build a correct extension -> mime mapping from the current |
Ah, I see. Thanks for clarifying.
I see Regardless, the more I think about this, the more I suspect priority is going to be a matter of preference. Someone running on nginx may want those mappings to take precedence over Apache. And do you give IANA types precedence over custom.json types? Hard to say. Fundamentally, creating Aside: Following up on my idea of a separate module, what about an API as follows for allowing clients to specify priority?
|
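The API proposed in that comment was not preserved. As a stand-in, a hypothetical sketch of a lookup factory where each client supplies its own source order (for example, an nginx user putting `nginx` first); `createExtensionLookup`, its default order, and its parameters are assumptions for illustration only.

```javascript
// Hypothetical factory: build an extension -> type lookup that honors a
// caller-supplied source order. Sources missing from `priority` rank last;
// ties keep the type seen first in db key order.
function createExtensionLookup(db, priority = ["iana", "apache", "nginx"]) {
  const rank = (source) => {
    const i = priority.indexOf(source);
    return i === -1 ? priority.length : i;
  };
  const best = new Map(); // extension -> { type, rank }
  for (const [type, entry] of Object.entries(db)) {
    const r = rank(entry.source);
    for (const ext of entry.extensions || []) {
      const current = best.get(ext);
      if (current === undefined || r < current.rank) best.set(ext, { type, rank: r });
    }
  }
  return (ext) => (best.has(ext) ? best.get(ext).type : undefined);
}

const db = {
  "application/x-foo": { source: "nginx", extensions: ["foo"] },
  "text/foo": { source: "apache", extensions: ["foo"] },
};

const defaultLookup = createExtensionLookup(db);
const nginxFirst = createExtensionLookup(db, ["nginx", "apache", "iana"]);
console.log(defaultLookup("foo")); // → text/foo
console.log(nginxFirst("foo"));    // → application/x-foo
```

The same db yields a different answer for each priority list, which is the "matter of preference" point made above.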
It still does not work, because

Scenario 1

{
"mime/type": {
"source": "iana", // (because it's iana registered)
"extensions": ["foo", "bar"]
}
}

So in the above,

Scenario 2

Also, this is becoming a problem, and it's even harder to resolve, and we need a solution:

{
"mime/type": {
"source": "iana", // (because it's iana registered)
"extensions": ["foo"]
},
"mime/type2": {
"source": "iana", // (because it's iana registered)
"extensions": ["foo"]
}
}

Well... so there are two MIME types that are IANA-registered, but for historical reasons one has the officially registered extension and the other has the traditional community extension from Apache or nginx. How can a library like |
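Scenario 2 can be checked mechanically: when the only signal is the type-level `source`, both IANA types rank equally for `foo`, so any resolver has to fall back to key order. A minimal sketch using the placeholder names from the scenario; `tiedClaimants` and the precedence list are hypothetical.

```javascript
const SOURCE_PRIORITY = ["iana", "apache", "nginx"]; // hypothetical order

function rank(source) {
  const i = SOURCE_PRIORITY.indexOf(source);
  return i === -1 ? SOURCE_PRIORITY.length : i;
}

// Return every type that shares the best type-level rank for `ext`.
// More than one result means type-level `source` cannot decide.
function tiedClaimants(db, ext) {
  const claimants = Object.entries(db).filter(([, e]) => (e.extensions || []).includes(ext));
  if (claimants.length === 0) return [];
  const bestRank = Math.min(...claimants.map(([, e]) => rank(e.source)));
  return claimants.filter(([, e]) => rank(e.source) === bestRank).map(([type]) => type);
}

const db = {
  "mime/type": { source: "iana", extensions: ["foo"] },
  "mime/type2": { source: "iana", extensions: ["foo"] },
};

console.log(tiedClaimants(db, "foo")); // → [ 'mime/type', 'mime/type2' ]
```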
For anyone that's following along, I'm looking for at least two votes for doing one of the following:

1. expand the current extensions to add a source

Entries in
{
"mime/type": {
"source": "iana", // this is the source for the _mime_
"extensions": [
{
"source": "iana", // this is the source for the extension
"name": "foo"
},
{
"source": "apache", // this is the source for the extension
"name": "bar"
}
]
}
}

2. add a second db to provide extension -> mime mappings

Entries in
{
"foo": {
"source": "iana", // this is the source for the _extension_
"type": "mime/type"
},
"bar": {
"source": "apache", // this is the source for the _extension_
"type": "mime/type"
}
}

Both of these will help libraries trying to build a proper extension -> mime mapping; neither of these solutions allows clients to specify priority, since they would lose information from |
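Under option 1, every extension carries its own `source`, so a resolver can rank individual extension claims instead of whole types, and its output has exactly the shape of option 2. A minimal sketch, assuming the option 1 entry format above and a hypothetical `iana` > `apache` > `nginx` precedence; `toExtensionDb` is an illustrative name. Here `mime/type2` claims `foo` only via Apache, which resolves Scenario 2 in favor of the IANA claim.

```javascript
const SOURCE_PRIORITY = ["iana", "apache", "nginx"]; // hypothetical order

function rank(source) {
  const i = SOURCE_PRIORITY.indexOf(source);
  return i === -1 ? SOURCE_PRIORITY.length : i;
}

// Convert option 1 entries ({ extensions: [{ source, name }] }) into the
// option 2 shape ({ [ext]: { source, type } }), keeping the best-ranked claim.
function toExtensionDb(typesDb) {
  const out = new Map();
  for (const [type, entry] of Object.entries(typesDb)) {
    for (const { source, name } of entry.extensions || []) {
      const current = out.get(name);
      if (current === undefined || rank(source) < rank(current.source)) {
        out.set(name, { source, type });
      }
    }
  }
  return Object.fromEntries(out);
}

const typesDb = {
  "mime/type": {
    source: "iana",
    extensions: [{ source: "iana", name: "foo" }, { source: "apache", name: "bar" }],
  },
  "mime/type2": {
    source: "iana",
    extensions: [{ source: "apache", name: "foo" }],
  },
};

// foo resolves to mime/type: its IANA claim outranks mime/type2's Apache claim.
console.log(toExtensionDb(typesDb));
```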
either one is fine for me. how do you handle "default" mime types, though? just the first extension? |
I'm not sure what this means. Do you mean what is the "default mime type for a given extension"? for num 2, it's just a straight map. for num 1, it's out of the scope of this library, like it is today in v1. |
oh fuck. what i want to know is, "what is the default extension for a mime type?" |
Gotcha. So that is here already: it is |
okay cool. that's all i'm worried about, so i would opt for num 1 unless num 2 handles that. |
Both provide type -> default extension; in fact, num 2 doesn't touch |
Is it appropriate for mime-db to make decisions about conflicts and ambiguity in the dataset? This is the question I'm wrestling with right now. If the answer is, "no", then wouldn't it make sense for the
[
{
"source": "iana",
"types": {
"text/foo": {
"extensions": [...],
"compressible": true
}, // etc... other types from IANA
}
},
{
"source": "apache",
"types": {
// etc... other types from Apache
}
},
// etc ... data from other sources
]

There are a couple of advantages to this:
If the answer is "yes", then... well... I'm not sure what to do. You can codify how sources should be prioritized ("IANA trumps Apache trumps nginx trumps custom"?) in the build script, but 1. that doesn't solve the problem of inconsistencies within a particular source, and 2. consumers of

You can make it configurable by downstream modules, but I'm having a hard time convincing myself that's what is needed. The three of us are probably the only people who really care about that debate at the moment. Everyone else probably feels more like, "just fix the damn problem and tell us how it should work." (If/when people have issues with what you decide, maybe they just hack a workaround into their project as needed?)

Or you can make a decision independently for each type/extension. But how/where are such decisions recorded? The fact that "mime/type2 > mime/type" has to be recorded somewhere, and that record essentially becomes part of the custom dataset being maintained in this project. The biggest downside to this is the added support overhead: you end up dealing with everyone's "this type/extension isn't what I expect!" issues... which is the main reason node-mime has an API for enhancing the mapping information per-project.

(Sorry, I know this doesn't narrow the problem down any, which probably isn't helpful... I'm just regurgitating some thoughts.) |
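If mime-db shipped the per-source array above unmerged (the "no" answer), each consumer would do the merge itself. A minimal sketch, assuming that array shape; `mergeSources` is hypothetical. Sources are visited in the caller's preferred order and the first claim on an extension wins.

```javascript
// Hypothetical helper: merge a per-source dataset ([{ source, types }]) into
// an extension -> type map, visiting sources in the caller's preferred order.
function mergeSources(datasets, preferredOrder) {
  const pos = (s) => {
    const i = preferredOrder.indexOf(s);
    return i === -1 ? preferredOrder.length : i; // unlisted sources go last
  };
  // Array.prototype.sort is stable, so equally ranked sources keep input order.
  const ordered = [...datasets].sort((a, b) => pos(a.source) - pos(b.source));
  const map = new Map();
  for (const { types } of ordered) {
    for (const [type, entry] of Object.entries(types)) {
      for (const ext of entry.extensions || []) {
        if (!map.has(ext)) map.set(ext, type); // first claim wins
      }
    }
  }
  return map;
}

const datasets = [
  { source: "iana", types: { "text/foo": { extensions: ["foo"], compressible: true } } },
  { source: "apache", types: { "application/x-foo": { extensions: ["foo", "fo"] } } },
];

console.log(mergeSources(datasets, ["iana", "apache"]).get("foo")); // → text/foo
console.log(mergeSources(datasets, ["apache", "iana"]).get("foo")); // → application/x-foo
```

This only orders sources: two types inside the same source that claim one extension still fall back to key order, which is the "inconsistencies within a particular source" limitation raised above.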
I appreciate the feedback. We definitely cannot maintain a manual resolution map, unless someone is going to volunteer to go through all 1.7k entries and create this map and be available once a week to resolve issues from doing pulls. The intent is that everything from remote sources requires no manual intervention. |
FWIW, https://github.com/broofa/mime-score is now a thing. It's my best attempt at the logic needed to resolve this issue. It prioritizes by (in decreasing priority) RFC "facet", source, type, and, lastly, string length. (The string length criterion is of debatable merit, but it very rarely comes into play.) Would it make sense to add this as a [Note: As I mentioned in the |
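For intuition, a hypothetical comparator in the spirit of that ordering: facet, then source, then top-level type, then string length, compared lexicographically so each criterion only breaks ties left by the one before it. The scores and the facet rules below are illustrative assumptions, not mime-score's actual values or API.

```javascript
// Illustrative scores only (higher is preferred); these are assumptions,
// not the values used by mime-score.
const SOURCE_SCORE = new Map([["iana", 3], ["apache", 2], ["nginx", 1]]);
const TYPE_SCORE = new Map([["application", 1]]); // hypothetical nudge

// Facet of the subtype: standards tree > vendor/personal tree > experimental.
function facetScore(mimeType) {
  const subtype = mimeType.split("/")[1] || "";
  if (/^x[-.]/.test(subtype)) return 0;
  if (/^(vnd|prs)\./.test(subtype)) return 1;
  return 2;
}

// Criteria in decreasing priority: facet, source, top-level type, length.
function scoreKey(mimeType, source) {
  return [
    facetScore(mimeType),
    SOURCE_SCORE.get(source) || 0,
    TYPE_SCORE.get(mimeType.split("/")[0]) || 0,
    -mimeType.length, // shorter strings win the final tie-break
  ];
}

// Negative when a should be preferred over b (for use with Array.prototype.sort).
function compareCandidates(a, b) {
  const ka = scoreKey(a.type, a.source);
  const kb = scoreKey(b.type, b.source);
  for (let i = 0; i < ka.length; i++) {
    if (ka[i] !== kb[i]) return kb[i] - ka[i];
  }
  return 0;
}

const candidates = [
  { type: "application/x-foo", source: "apache" },
  { type: "application/vnd.foo", source: "iana" },
  { type: "text/foo", source: "iana" },
];
candidates.sort(compareCandidates);
console.log(candidates[0].type); // → text/foo
```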
We need a better way to manage extension priority for multiple types (i.e. provide an extension -> mime mapping).
The reason we need this is that, as we source from more places, other libraries can no longer build this mapping simply by iterating over the types and accumulating the extensions in a map, because the types may not be in the optimal order.