ABOUT THE MODEL
November 29, 2007 draft
1. One article I read about the semantic web defined "resource" as "what you get when you click on a URL." My use of the Web leads me to assume that when you click on a URL you always get only one thing; in other words, that a URL is a one-to-one link. What we need for our data are one-to-many links. For example, you need a URL that stands for a work, and when you click on it, you need to see all the available expressions of that work (e.g. the translations into various languages, the annotated editions, the illustrated editions, etc.). When you click on an expression, you need to see all the available manifestations of that expression (e.g. the online version, the text version, etc.). My friend Sara has argued that many one-to-one links can have the appearance of a one-to-many link (did I get that right, Sara?). Do you agree? Should I stop worrying about this? Should I be careful to use a particular kind of representation for a one-to-many link in RDF?
"Serialisation of binary associations is simple, at least in English and similar languages, but automatic serialisation of associations with three or more role players becomes difficult if not impossible. That is another reason why binary associations are so widely used"--The Topic Maps Handbook by H. Holger Rath, on the web at www.empolis.com
"Well, I think we need to define that 'one' thing. You can well define it as one list of things, which usually is what happens, or it could be one relationship between many, and so on. The URL is there as an addressing mechanism for this to happen, and as you've pointed out it's one of those big issues in the RDF world. In the Topic Maps world we don't have this problem because we don't use the URL as *the* semantic of the relationships we build. For example, if RDF points to 'http://some.example/' as RDF:about, do they mean the RDF semantics are about *that* website, or does that website *represent* some other notion (like a topic), or is the website or the organization it belongs to somehow mixed up in the aboutness? These are not huge unsolvable problems, but enough to cause ambiguity and confusion, which of course is fatal in computer models.
A URL in itself can *mean* many things, and many-to-many, one-to-many or many-to-one are all fine examples of a URL. Let's try a few examples ;
one :
Monteverdi, composer - http://example.com/composer/monteverdi
one-to-many :
Monteverdi's 1610 vespers - http://example.com/work/123456789
one-to-one :
A particular recording of the 1610 vespers - http://example.com/record/2345
many-to-one :
recordings of "Dixit Dominus" - http://example.com/work/123456789/2/recordings
many-to-many :
1610 vesper psalms recorded - http://example.com/work/123456789/type:p/recordings
Some might argue that these URLs aren't unambiguous, which is true ; they carry some semantics only to exemplify their use. These URLs can equally all be expressed as http://example.com/[some huge number here]. All of them.
The point here is that a URL can represent anything you want. There are no restrictions on them, no one-to-one, many-to-many or otherwise. The only limitation on URLs is their resolvability at the hosting systems. If the server you're pointing to in http://example.com/work/123456789/2/recordings has no idea what that means, then that might be a problem. The funny part here is that in RDF there is no requirement for URLs to resolve to anything ; you can create fake URLs all the time, as much as you like. RDF states that resolvability is a good thing, though.
In the Topic Maps world we use what are known as Public Subject Indicators (PSI), which are URLs that are 1) publicly published with 2) a guarantee of maintained persistence, and 3) indicators of a subject (meaning; to get to the semantics of that URL, resolve the URL to read what it means)"--September 11, 2007 email from Alexander Johannesen.
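As a rough illustration of the point that a one-to-many link can be built from many one-to-one (binary) statements, here is a minimal Python sketch. It does not use any RDF library; the property names has_expression and has_manifestation and all of the URLs are assumptions made up for the example.

```python
# Hypothetical identifiers; none of these URLs belong to a real catalog.
WORK = "http://example.com/work/123456789"
EXPR_1 = "http://example.com/expression/1"

# Each triple is a single one-to-one (binary) statement: subject, property, object.
TRIPLES = [
    (WORK, "has_expression", EXPR_1),
    (WORK, "has_expression", "http://example.com/expression/2"),
    (WORK, "has_expression", "http://example.com/expression/3"),
    (EXPR_1, "has_manifestation", "http://example.com/manifestation/10"),
    (EXPR_1, "has_manifestation", "http://example.com/manifestation/11"),
]


def objects(triples, subject, predicate):
    """Collect every object linked from `subject` by `predicate`, in input order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# "Clicking" on the work gathers all of its expressions: one-to-many from many binary links.
expressions = objects(TRIPLES, WORK, "has_expression")
manifestations = objects(TRIPLES, EXPR_1, "has_manifestation")
```

Seen this way, the one-to-many view is something the retrieving system assembles at query time, not something a single URL has to encode.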
2. We need to be able to identify a work for human beings (rather than machines) using a combination of both the author and the title (when there is an author), but we also need to treat the author as an entity (in RDF terms, a class?) so that we can create a record for it that contains all of its variant names, biographical information and so forth. I think what I am saying is that we need to treat an author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. Is it possible to model this? Or is it possible that RDF (and other data modelling) works against effective use of bibliographic data because of an absolute requirement that something either be a class or a property, but never both?
OWL Web Ontology Language Guide (http://www.w3.org/TR/owl-guide/) p. 7 (Section 1.1 under OWL DL): "A class can not also be an individual or property."
"A topic could be class and instance at the same time"--The Topic Maps Handbook by H. Holger Rath, on the web at www.empolis.com
"Sure. I guess this part points to something that splits the RDF and Topic Maps worlds apart. With RDF, you have an ever-growing tree-structure with no cross-pollination without what's known as anonymous nodes. So, every thing expressed, every relationship, is created as a child-node of the thing it belongs to. Example ;
Alex
  is_a : person
  has_a : dog
    with_name : Oscar
    of_type : english_cocker-spaniel
  married_to : julie
Let's take just the one part of this out, and express it as a falling triplet ;
"Alex" : http://shelter.nu/me.html
"married_to" : http://some.ontology/human_relationships#marriage
"Julie" : http://shelter.nu/my_family.html#julie
For each thing that's attached to me I must make a triplet, and for every thing within a triplet more triplets to explain them. Now, the thing here is ; Where do you express these things? Where do you express that Julie is a person, and where do you define that URL?
With Topic Maps, all things must first be a topic, so the same tree would be (and remember the Public Subject Identifiers talked about earlier, PSI's) ;
topic : "Alex", PSI=http://shelter.nu/me.html
topic : "married_to", PSI=http://some.ontology/human_relationships#marriage
topic : "Julie", PSI=http://shelter.nu/my_family.html#julie
These things are expressed as things. Making the links between things is done *outside* their context, treating relationships as topics themselves ;
association of type "married_to" : between "Alex" and "Julie"
('association' is simply a topic of type 'association' :)
This association itself can have a URL (unlike RDF relationships), a PSI, a name, or more relationships and properties attached (basically, RDF's ability to reify statements is handled ambiguously). The reason I'm talking about these RDF and Topic Maps differences is to point out this problem of where to define a thing. Where do you define a thing in RDF? There is the notion in the RDF world that you have a separate document with RDF triplets to define these things which you link in when you resolve or infer over your triplets, but in RDF this is implicit while in Topic Maps it is explicit. This is one of those things that's confusing about RDF.
But back to your question ;
Just talk about things, there and then. If they resolve to the right thing, your inference engine will thank you and be able to work with it. The persistence of these things lies with the URL you choose for your thing. If it's an author, I can do ;
"Frank Herbert" : http://authors.psi.org/f_herbert(1920-)
And do ;
"Dune"
  is_a book : http://things.psi.org/book
  has_author : http://authors.psi.org/f_herbert(1920-)
It should be up to smart SemWeb systems to pick these URLs up and use them for persistence. Smart systems should also be able to link http://authors.psi.org/f_herbert(1920-) and http://authors.psi.org/f_herbert together as the same author, but this unambiguity lies at the heart of the semWeb problem.
RDF comes in many parts, and as basic RDF itself there is no such constraint (everything can be anything). You can build those constraints with RDFS. You can use more specific language within RDF to define things, or more complex statements using the three levels of OWL. (At this point we've included five levels of ontology redirection, and I'm sure I've lost you along the way ... :)"--September 11, 2007 email from Alexander Johannesen.
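To make the "entity in its own right and property of a work" idea concrete, here is a minimal Python sketch in which the same author URL appears both as the value of a work's has_author property and as the subject of its own statements. The work URL, the property names title, preferred_name and variant_name, and the inverted name form are all assumptions for illustration; only the author URL is taken from the email above.

```python
# The author URL comes from the email above; everything else is hypothetical.
HERBERT = "http://authors.psi.org/f_herbert(1920-)"
DUNE = "http://example.com/work/dune"

TRIPLES = [
    # The author used as a property value of the work ...
    (DUNE, "title", "Dune"),
    (DUNE, "has_author", HERBERT),
    # ... and the same URL used as the subject of its own record.
    (HERBERT, "preferred_name", "Herbert, Frank"),
    (HERBERT, "variant_name", "Frank Herbert"),
]


def values(subject, predicate):
    """Return all objects stated for `subject` under `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def display_label(work):
    """Build a human-readable author/title label, falling back to title alone."""
    title = values(work, "title")[0]
    authors = values(work, "has_author")
    if authors:
        return f"{values(authors[0], 'preferred_name')[0]}. {title}"
    return title


label = display_label(DUNE)
```

Nothing in this sketch requires the author to be either a class or a property; the author is simply a node that other statements can point to or start from.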
3. Would I pay a price for defining expression as a subclass of work, manifestation as a subclass of expression, item as a subclass of manifestation, etc.?
"Always ; the better you define anything, the more chance that definition will break in some other context. I know this sounds terribly gloomy, but it's just a fact of life that when I say "that is a zebra" (which most people will be able to identify just fine) some weird zoologist will come along and ask "what sort of zebra?" (yes, there's more than one, and the differences between them cause great problems within zoological taxonomies and are mostly unresolved)"--September 11, 2007 email from Alexander Johannesen.
The RDF primer in section 5.1 says that any instance of a subclass is also an instance of the class. I think it is correct to say that any item is also a manifestation, expression and work. Am I misinterpreting anything here?
"No. But I do fear that the taxonomic constraints of class-instance hierarchies are somewhat flawed the further away from the original definition one goes. Some talk about ontological distance (how many jumps from where you are now to the thing that originated the classification path) needing to be weakened to the proportion of each sub-class's weight (which, funnily, is a recursive argument and can't be solved, certainly not by humans). It basically means that you need to have full knowledge of what you're about to classify before you can classify the thing. Doesn't work, does it? :)
RDF and Topic Maps are at best "close enough" for the things we're trying to do, although my personal view is that all of this modeling business is followed as if it were the truth because it looks like the truth. I'm no longer sure we're doing what's right, but I digress.
Back to your question here ; a thing that's an instance of a thing is by inheritance an instance of all parent classes. For example, if we use this silly classification tree ;
thing
  living thing
    walking living thing
      human
What's meant is that an instance of human is at the same time a thing, a living thing *and* a walking living thing. Things are whatever is up the tree, and this lies at the root of taxonomical classification theory. For a thing to be anything, it must be what has passed before it. That's the tree of knowledge for instances of things, but I'm sure I'm preaching to the choir on this one. :)
In your problem, the thing that precedes it up the tree isn't really an instance of that thing, so here I wouldn't use instances of things, but properties. But then again, neither of these models is satisfactory in my eyes. This is where I suspect RDF:OWL comes to the rescue where you can make further qualifications to the instances to relieve them from parent class inheritance, but I'd have to read up on it (been a while)"--September 11, 2007 email from Alexander Johannesen.
I note that there has been at least one attempt to express FRBR in RDF and that they have not defined expression as a subclass of work. Instead, they have defined work as "disjoint with" expression, manifestation and item. Yet the FRBR primer (section 5.5), following OWL, says that when two classes are disjoint "no resource is an instance of both classes." If an item is also a manifestation, expression and work, the FRBR-RDF people are wrong to do this, aren't they?
"Ah, yes, I should have read a bit more of your email to find what I was trying to say with the previous section. :) I think it would be worthwhile to ask them why they chose to do so ; I suspect there are limitations between the taxonomical hierarchy and how OWL operates on atomic entities that share properties but not class instantiations. (Hm, that sounded Greek to me. Let me know if this comes across as such.)"--September 11, 2007 email from Alexander Johannesen.
Gordon Dunsire (email to RDA list) refers to FRBR having a "no manifestation without expression" rule.
DECISION: With some trepidation, I decided that since an item is always also a manifestation, an expression and a work, these should be treated as a chain of subclasses: item as a subclass of manifestation, manifestation as a subclass of expression, and expression as a subclass of work.
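The consequence of this decision, under the RDFS rule that an instance of a subclass is also an instance of each superclass, can be sketched in a few lines of Python. This is only an illustration of the inference, not an RDFS implementation; the class names mirror the FRBR entities and the dictionary layout is an assumption.

```python
# Each class maps to its direct superclass (a single-inheritance chain).
SUBCLASS_OF = {
    "Item": "Manifestation",
    "Manifestation": "Expression",
    "Expression": "Work",
}


def superclasses(cls):
    """Walk the subclass chain upward and list every ancestor class."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def inferred_types(declared_class):
    """A resource declared in one class is inferred to belong to all ancestors too."""
    return [declared_class] + superclasses(declared_class)


item_types = inferred_types("Item")
```

The cost noted in the reply above shows up here as well: anything asserted about every Work is then asserted about every Item, whether or not that is intended.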
4. Related to 3, is there any value in defining a "superwork" class? A superwork is really just a work from which a large number of related works have spun off (for example, dramatizations, films, etc.).
DECISION: Superwork was not defined as a formal class in the model.
5. The FRBR-RDF people seem to have followed OWL and defined two properties for every property, each an inverse of the other. Thus, they have defined the class responsibleentity, and the two properties creator and creatorof. The creator property is defined as "an entity in some way responsible for the creation of a work," the domain is work and the range is responsibleentity. The creatorof property is defined as "a work that was in some way created by an entity," the domain is responsibleentity and the range is work. Is that correct RDF logic, and would every statement of authorship have to be duplicated, e.g. 1. this person is the author of this work and 2. this work has this author? (This may actually be related to point 1 above.)
"Good questions. I don't know. We're entering the world of free modeling as opposed to the default taxonomical model, and as such people are free to make up what they want. I'm positive you can't say there's anything such as "correct" RDF logic. I'm sure that model makes sense to whoever did it, but if it feels unnatural to others reading it, I'd say that's an indication of someone getting it wrong. Mind you, getting models such as this *right* is close to impossible (but not absolutely impossible :).
As to the bi-directionality of the statements, I'm not sure how the property is defined (optional, mandatory?) but it shouldn't be necessary to duplicate the expressions, but then, it comes down to the use of the ontology. There's a slight difference between ;
"Herbert"
  has_written
    book_1
    book_2
and ;
"Book 1"
  written_by
    "Herbert"
"Book 2"
  written_by
    "Herbert"
It comes down to practicalities of where you model *from*. Sometimes, if you model from the book level, repeating author is a must (which is modus operandi in the cataloging world, as you know) but you have to make a judgment call on what other parts of the author file you should express, given your knowledge on what other things exist in the vicinity of the thing you're expressing. This is of course quite impossible, and leads to a lot of replication which inference engines need to sort out (and indeed brings a lot of trouble to the semWeb world). The trick with any of this is to have control over the URLs that are used as persistence identifiers.
I think we're seeing just how messy RDF can get without proper RDF tools. :)"--September 11, 2007 email from Alexander Johannesen.
DECISION: To keep the model to a reasonable size, inverse properties were not defined, for the most part; it is hoped that they can be viewed as being implicit in the model as is. Exceptions: whole-part relationships; words for music/music for words; broader/narrower term relationships between subjects.
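One way to read "implicit" here is that a system can derive the inverse statement whenever it needs one, so authorship only has to be recorded once. The Python sketch below illustrates that reading; the creator/creatorof pair is taken from the FRBR-RDF description in question 5, while the resource identifiers and the function name are hypothetical.

```python
# Declared once per property pair; the lookup is made symmetric inside the function.
INVERSE_OF = {"creator": "creatorof"}


def with_inverses(triples):
    """Return the input statements plus the inverse of each statement that has one."""
    inverse = dict(INVERSE_OF)
    inverse.update({v: k for k, v in INVERSE_OF.items()})
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverse:
            # Swap subject and object and use the inverse property name.
            result.add((obj, inverse[predicate], subject))
    return result


# Hypothetical identifiers: the authorship statement is recorded one way only.
stated = [("work/1", "creator", "person/1")]
derived = with_inverses(stated)
```

Whether a given RDF toolchain performs this derivation automatically depends on the reasoner in use, so this should be treated as a sketch of the intent rather than a guarantee.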
6. OWL contains the concept of "transitivity;" could it be an attempt to deal with inheritance?
"RDF (with its sub-standards included) has a long history, particularly from the DAML and OIL days (you may have heard of DAML+OIL), where the two different ontologies were merged quite successfully (and I suspect because each of them was quite beautifully designed) to form the basis for what RDF is supposed to be doing. Unfortunately, there were too many people and too much politics in the process, so the elegance of DAML+OIL (which, really, was adequate for most modeling and was cleanly separated) turned into the 5 level RDF thing we've got now. They probably thought basic RDF was good enough to get you started, that more complex stuff should be handled by OWL (what does "more complex" mean in ontological terms?) and that the schema should express exchange rules (again, what does this mean in ontological terms?) The truth is that when doing ontology work the thing that makes them work is fuzziness and ambiguity, but the strictness and split personality of RDF is becoming a bit of a problem. I've seen some work on trying to bring these things together in an RDF 2.0 effort, but don't know where they're up to"--September 11, 2007 email from Alexander Johannesen.
I read a book on data modelling awhile ago that stated that object-oriented models support inheritance but relational models do not. Does RDF support inheritance?
"Yeah, it's in the RDFS ;
<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Author">
    <rdfs:comment>The class of people who are authors</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://purl.org/dc#Creator"/>
  </rdfs:Class>
</rdf:RDF>
It's a sour mix of where in RDF you find what you can express, and how"--September 11, 2007 email from Alexander Johannesen.
(Do you agree with that characterization of object-oriented as opposed to relational models?)
"Sure, but I guess what they mean is that the relational model by default doesn't carry inheritance constraints, and that's very true, but it doesn't mean the relational model can't express them. In some ways you can define a lookup table as a form of inheritance mechanism, but it certainly isn't a constraint nor documented as such, and, if one goes down to the SQL language, isn't easy to express (but is certainly doable :)"--September 11, 2007 email from Alexander Johannesen.
I think inheritance is what we need to ensure that the cross reference from FBI to United States. Federal Bureau of Investigation will also apply when someone searches for the FBI Department of Counterterrorism. Do you agree?
"This is one of those biggies ; can we predict what context a user needs and/or wants with any given search? Sometimes they indeed want the whole FBI as part of the search, at other times they're just interested in the counter-terrorism part. Or, when we're classifying, what parent class is that special zebra part of?"--September 11, 2007 email from Alexander Johannesen.
If so, can you help me figure out how to represent data in RDF such that inheritance is part of the model?
"Dig into the RDFS which deals with implications within the taxonomical part of RDF. I can't provide many specifics per se as it's been ages since I abandoned RDF as a modeling tool. I prefer to use Topic Maps (much cleaner as an expression engine) and then export it to RDF when need be"--September 11, 2007 email from Alexander Johannesen.
DECISION: From my readings of RDF literature so far, I believe that by creating corporate subdivision as a subclass of corporate body and subject subdivision as a subclass of subject, the model supports hierarchy such that anything that is true of the main heading should be held to be true for the subdivision.
7. The aforementioned book on data modelling seemed to imply that models require each entity to have one and only one name. Does that mean that data modelling would preclude recognizing the fact that entities are often known by several variant names and that searchers should be successful no matter what variant name they search on? Or does the rule pertain only to the machine-readable identifier (not the human-readable identifier(s) linked to that machine-readable identifier)? Is there a "best-practices" way to model what we used to call a "see-reference" in RDF?
"No, it's just a technical constraint of the relational model. To make persistent identifiers work, one thing is that there's more than one per entity, but you have to resolve them using relational paradigms such as lookup tables. For example ;
herbert,_frank
f_herbert(1920-)
both are PI's to the same author, but in a relational model this is ;
table : "author", columns : id, name
--------------------
200 | Frank Herbert
table : "author_pi", columns : id, pi, author_id, primary
--------------------
1 | herbert,_frank   | 200 | false
2 | f_herbert(1920-) | 200 | true
It's the "author_pi" table (lookup table) that binds the PI's together with the "author_id" field which points to the "author" table. This binding is expressed with SQL. (Not sure you know about SQL and relational models, so I'm writing as if you don't. Just tell me if you do.)
The aforementioned constraint is that any id of any table must (well, should) be unique, not that PI's can't be unique. Basically, identifiers of a table should be unique, but the columns with their various properties need not"--September 11, 2007 email from Alexander Johannesen.
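The lookup-table arrangement described in the email can be tried out with Python's built-in sqlite3 module. This is a minimal sketch that assumes an in-memory database. The column called "primary" in the email is renamed is_primary here because PRIMARY is a reserved word in SQL, and the helper author_for is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE author_pi (
    id         INTEGER PRIMARY KEY,
    pi         TEXT NOT NULL UNIQUE,
    author_id  INTEGER NOT NULL REFERENCES author(id),
    is_primary INTEGER NOT NULL   -- 1 = true, 0 = false
);
INSERT INTO author VALUES (200, 'Frank Herbert');
INSERT INTO author_pi VALUES (1, 'herbert,_frank', 200, 0);
INSERT INTO author_pi VALUES (2, 'f_herbert(1920-)', 200, 1);
""")


def author_for(pi):
    """Resolve any persistent identifier (PI) to the author's name, or None if unknown."""
    row = conn.execute(
        "SELECT author.name FROM author_pi "
        "JOIN author ON author.id = author_pi.author_id "
        "WHERE author_pi.pi = ?",
        (pi,),
    ).fetchone()
    return row[0] if row else None
```

Both identifiers resolve to the same author row, which is the relational counterpart of a see-reference: many access points, one entity.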
8. Is it possible to define a class subject and a class person and still use the class subject for a work about a person? How do you model that relationship between two classes in RDF? (The same problem occurs with works about corporate bodies and works about other works.)
"Yes. An *instance* work can be a sub-class of any other class expressed through OWL, without that instance inheriting all those classes, but you need to express it in the ontology. It's a bit messy, but certainly doable.
There are two main ways of expressing relationships ; the class-instance hierarchy (which you're referring to above), and using OWL (or some parts of standard RDF) to just model the things free form. Unless someone tries to validate your RDF with RDFS statements, you can build up pretty much anything you like.
First, there are some modeling differences between relationships between classes and between instances of those classes, and different "rules" (some are rules, others guidelines, some just gut feelings). We can create classes ;
animal
  dog
We by default express the taxonomical relationship between these. Constraints between these two can be ;
animal
  dog
    RDFS : instances cannot be "fish"
    RDFS : instances can be "furry types"
furry types
  cat
Now, this gets complicated of course, but OWL ;
"Oscar"
  is_a : dog
  OWL : is_subclass_of "furry types"
... is perfectly legal ; the instance "Oscar" is now a subclass of "furry types" because Oscar is a particularly furry example of a dog (he really is :). I guess all of this again shows that understanding the whole RDF stack is necessary to do any serious modeling. I guess I would at this point be keen to model it a bit simpler. In reality, you can model all of these things without the constraints, and still get a decent model for inference models to work with. You can do ;
book_a
  is_a
    manifestation_x
    translation_y
    idea_z
and let inference engines work out the details. Both will work in the long run, although I understand the want and sometimes need for rigid ontologies"--September 11, 2007 email from Alexander Johannesen.
DECISION: I defined the class Subject as having an Equivalency relationship ("Equivalent to Union of") to Work, Expression, Person, and Corporate body. Also, all subject properties are defined as having domain of Resource.
I'm getting confused about when to label examples using class names and when to label them using properties. I got most of the way through coding the examples using only properties, assuming that that is enough since the properties are defined as belonging to particular classes in the model. However, when I came to subject headings, I realized that if you use only the properties of subject headings, you cannot specify that a particular subject heading is for a geographic heading, the name of a person, etc. In those cases, I started using class names in the examples, but I suspect I have run across an underlying problem in the model here.
9. There are times when you have an expression of an expression (as when a specific edition is the basis for a new edition, or when there are two Japanese translations, etc.) or a manifestation of a manifestation (as when a reproduction is made of a specific other manifestation). If I put in subexpressions and submanifestations as subclasses does that somehow require every resource to use all of these levels even when they are not necessary? Is there another way to deal with a hierarchy that needs to be expansible and contractible in RDF?
"I would just reuse expression and manifestation classes ; they are "sub" by inheritance. I usually point to the difference between ;
<topic>
  <topicName> some name </topicName>
</topic>
and
<topic>
  <name> some name </name>
</topic>
Here, classes and instances are described in XML, but the point is that we can reuse <name> at will, as we certainly will define what "name" means somewhere. In this case, "name" is a sub-element of "topic", so the topic's name. This is straight class-instance theory, and doesn't need a lot of explaining. The former example needs an explanation of what "topicName" means, and especially if it means anything different from a normal "name".
So, the same with "expression" and "subexpression" ; we can already see that the one is a sub of the other, so does "subexpression" express more than just "sub"? In my eyes, the less "things" you need to define and the fewer relationships you need to explain, the easier it will become to express and explain them. Maybe these are too simplistic examples?"--September 11, 2007 email from Alexander Johannesen.
DECISION: Rather than define subexpression and submanifestation class levels, I defined expression-to-expression relationship properties.
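A relationship property between expressions lets the hierarchy grow or shrink per resource: an expression with no such link simply has a chain of length one. The Python sketch below illustrates this with a hypothetical based_on relationship and invented identifiers; neither is a name taken from the model itself.

```python
# Hypothetical expression-to-expression links: each key is based on its value.
BASED_ON = {
    "expression/revised-ed-2": "expression/revised-ed-1",
    "expression/revised-ed-1": "expression/original",
}


def lineage(expression):
    """Follow based_on links back to the earliest expression, guarding against cycles."""
    chain = [expression]
    while chain[-1] in BASED_ON and BASED_ON[chain[-1]] not in chain:
        chain.append(BASED_ON[chain[-1]])
    return chain


full_chain = lineage("expression/revised-ed-2")
```

No extra class levels are needed: every node stays an expression, and the depth is carried entirely by the links.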
10. Is there any way to define types of data in an RDF model? For example, would it be possible to define a particular data element as being transcribed (copied from the resource), as opposed to composed by the cataloger? I have discovered the "built-in datatypes" which seem to provide a way to define certain data as "date" or "integer," for example, but I haven't quite figured out whether these datatypes can be placed right into the model or whether they have to be placed in the resource records created according to the model.
"I don't think there's much correlation between a type and a datatype. :) Datatypes are very closely linked to the exchange format they're expressed in (so you can say that an RDF field's value must be integers, dates, strings of certain kinds, and so forth), so I don't think you need to worry about that. A type is a straight "is_a" relationship in the taxonomical hierarchy, so it's enough to just add ;
some_book
  has_subject
    fish
    whales
  element_content_source
    transcribed
just like ;
fish
  element_content_source
    copied_from_resource
is the default in your model (but you wouldn't need to express this)"--September 11, 2007 email from Alexander Johannesen.
Addition to question 10: Could "literal value surrogate" be used to mean a particular element is transcribed, and "non-literal value surrogate" be used to mean that a particular element is selected from a list, including lists such as LCSH (for subject headings) and the LC/NACO authority file (for name and work identifiers)?
11. I notice that the Functional Requirements for Authority Data (http://www.ifla.org/VII/d4/FRANAR-ConceptualModel-2ndReview.pdf) (brought to us by the FRBR folks) has defined the following classes and properties (entities and attributes, in their terminology):
Class: Person
Properties: dates associated with the person, title of person, other designation, gender, place of birth, place of death, country, place of residence, affiliation, address, language of person, field of activity, profession/occupation, biography/history
Class: Name
Properties: type of name (personal, corporate, family, etc.), scope of usage (e.g. fictional genres by that person that use a particular pseudonym), dates of usage, language of name, script of name, transliteration scheme of name
Class: Controlled Access Point
Properties: type of controlled access point (personal, corporate, family, etc.), status (level of establishment, e.g., provisional), designated usage (preferred or non-preferred), whether undifferentiated, language of base access point, language of cataloging, script of base access point, script of cataloging, transliteration scheme of base access point, transliteration scheme of cataloging, source of controlled access point (publication cataloged), base access point (e.g. surname and forename), addition (e.g. birth and death date)
Is that the most elegant way to model the cataloging practice of choosing a preferred form of name for every person, with cross references from all variant names?
"It depends. It works, of course, but it does indeed sound dodgy to have a class instance Name in a relationship with a class instance Person, however that might just be a personal preference rather than a flaw per se.
In Topic Maps a name is always attached to a Topic; there is no separation of the entities, as this is in many ways closer to human thinking. There might be a Topic "the name Fred", but again, that is a Topic in its own right, and not a separate class instance of that name.
As to general modeling I agree that names shouldn't be classes just for the odd chance the name is complex enough to warrant its own instance, but looking at the definition it's pretty clear that "name" isn't a name at all but rather a complex node used for identification of things. Now that's all fine and well, but doesn't sit too well with the label "name" nor the concept of the Controlled Access Point which tries to do pretty much the same thing. I'd agree that there's something ambiguous about this setup. In fact, why not make names part of the Controlled Access Point? The interesting thing here is indeed the notion of "access", and one way to access any piece of information is through its label name. I'm sure there's something funky that could be done here"--October 8, 2007 email from Alexander Johannesen
Is it possible that FRAD chose this model because of the fact that current Anglo-American cataloging practice considers Mark Twain and Samuel Clemens (for example) to be two different authors, and considers a corporate body that has changed its name to be two different corporate bodies?
"Possibly, but it feels counter to the way we humans look at it. You could do (in Topic Maps terms) ;
Topic 123
  name "Samuel Clemens [Mark Twain]"
  name [pseudonym] "Mark Twain"
  name [original] "Samuel Clemens"
This does what they want. Of course, if they *truly* want the modeling flexibility ;
Topic 123
  name "Samuel Clemens"
Topic 234
  name "Mark Twain"
Association
  type "pseudonym"
  - person 123
  - pseudonym 234
The difference between this and the proposed RDF is of course that "Mark Twain" is a *topic* or *subject* in its own right, and *not* just a name. Names should never be class instances in themselves, so I think I agree with you"--October 8, 2007 email from Alexander Johannesen
What would be the trade-off in treating name as a property of person, with subproperties of preferred form of name, variant form of name, etc.?
"I think the trade-off is the lack of a node to link information to. This is probably why they've done it, so that they could treat "Mark Twain" as a node itself. Of course, the notion of "name" is vague (for example, his writing style as Mark Twain was very different from the Samuel Clemens writing, and they might as well be treated as two different people with some pseudonymic relationship between them).
I guess we're coming into implementation land now, but there's also the notion of how software is to use the proposed models. Systems probably deal easier with properties, and indeed going back to frames theory, everything is key-value properties in complex formations. The label property of a topic is just a property with certain semantic meaning.
Another possibility for using Name classes could be that it's easier to work with OWL in inference engines, although I can't verify this at this time"--October 8, 2007 email from Alexander Johannesen
DECISION: I have created just the single class Person; preferred form of name is treated as a property as is variant form of name. Bibliographic identities (Mark Twain and Samuel Clemens as two different authors) are not recognized in my rules or my RDF model; for one thing, this will support more standardization internationally and across the Internet.
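As a rough illustration of this decision, the sketch below stores a person as plain (subject, predicate, object) triples and treats preferred and variant forms of name as subproperties of a general name property. All of the "ex:" identifiers are placeholders I made up for the example, and the choice of which form is preferred is arbitrary; this is not the actual vocabulary of my model, and the subproperty lookup only imitates what rdfs:subPropertyOf would provide.

```python
# Hypothetical vocabulary: every "ex:" name below is a placeholder.
SUBPROPERTY_OF = {
    "ex:preferredFormOfName": "ex:name",
    "ex:variantFormOfName": "ex:name",
}

# One Person resource; both names are properties of the same subject.
triples = [
    ("ex:person1", "rdf:type", "ex:Person"),
    ("ex:person1", "ex:preferredFormOfName", "Mark Twain"),
    ("ex:person1", "ex:variantFormOfName", "Samuel Clemens"),
]


def values(subject, prop):
    """Return objects stated with prop or with one of its direct subproperties."""
    return [
        obj
        for subj, pred, obj in triples
        if subj == subject and (pred == prop or SUBPROPERTY_OF.get(pred) == prop)
    ]
```

Asking for "ex:name" returns both forms, while asking for "ex:preferredFormOfName" returns only the preferred one, so a search can match on any name while a display uses just one.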
What would be the most elegant way to allow the preferred form of name to vary based on the preferred language, script and transliteration scheme of a particular user? How would you model this situation (feel free to use topic map form, if you prefer)?
"Well, names and variant names (http://www.isotopicmaps.org/sam/sam-model/#sect-topic-name) are already part of the Topic Maps standard, so we don't have to create a model for it (did I mention how well Topic Maps is suited to the library world? :). Of course, we're here talking about a proper separation of a person in terms of pseudonyms, so I'll throw a few false ones into the mix. I've also introduced type here like this ; [type], where type is an instance of some other Topic (meaning, if you see [person] there is a Topic somewhere defined with this identifier) like this ;
Topic person
name "A person"
Topic alternate
name "Alternate spelling of a name"
Topic nick
name "A nick name"
Topic 123
[person]
name "Samuel Clemens"
name [alternate] "Sam Clemens"
name [nick] "Sam Clam"
Topic 234
[person]
name "Mark Twain"
name [alternate] "Mr Twain"
Association
[pseudonym]
- [person] 123
- [pseudonym] 234
In Topic Maps a few types of names are defined, such as sort name, display name, and we can use scoping too to further model what these names are (such as language, source, bias, etc.) I'd suggest a quick glance at http://www.infoloom.com/tmsample/pep4.htm (search for "topic names")"--October 8, 2007 email from Alexander Johannesen
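One way to picture scoping as an answer to the question about language, script and transliteration is the sketch below. It is only my guess at how a system might use scoped names: the scope labels ("lang:ru", "script:Cyrl" and so on), the sample name forms, and the function preferred_name are all assumptions for illustration, not part of the Topic Maps standard.

```python
# Each name carries a scope: a set of hypothetical context labels.
names = [
    {"value": "Mark Twain", "scope": frozenset({"lang:en", "script:Latn"})},
    {"value": "Марк Твен", "scope": frozenset({"lang:ru", "script:Cyrl"})},
    {"value": "Mark Tven", "scope": frozenset({"lang:ru", "script:Latn"})},
]


def preferred_name(candidates, user_context):
    """Pick the name whose scope shares the most labels with the user's context.

    When several names tie, the earliest one in the list is returned.
    """
    best = max(candidates, key=lambda n: len(n["scope"] & user_context))
    return best["value"]
```

The point of the design is that no single name is marked as "the" preferred form; the preference is computed from the user's context, and the first name in the list acts as a fallback when nothing matches.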
12. A related question. FRAD also treats key identifier (unique machine-readable identifier for an entity) as a class rather than a property. Why not treat key identifier as a property of person?
"Indeed, why not? I don't know of any standard that has tried to solve persistent identification as well as Topic Maps. Here's what we do ;
Topic 234
[person]
[psi] "http://psi.marktwainfoundantion.org/mark_twain"
name "Mark Twain"
name [alternate] "Mr Twain"
We use Public Subject Indicators [psi] as properties which, really, are URI's that you have published and are committed to keep alive, meaningful and persistent. If you are committed to keeping the PSI for "Martha Yee" alive and well, you publish your PSI and make sure that doesn't resolve to anything 40x (meaning, not found, unless that's intentional). These are basically global constants, and they (as the name suggest) indicate subjects. Institutions and organisations map what they want, but as long as they're PSI's they are committed to those URI's, making sure that if *others* out there use them they will be useful in the future as well.
Of course, this doesn't stop you from making up imaginary URI's or even existent but not really in your domain; it's crazy to think that all organizations of the world will commit to PSI's, little less know about Topic Maps or persistent identifiers, so some leeway must be given. But at least it's a mechanism that works well.
There are whole sections of the Topic Maps standard that deal with PSI's and rules for merging Topic Maps together with them. This way, as long as we talk about the same PSI's we can merge smaller and diverse Topic Maps together at any stage later. In fact, this is one of the absolutely most exciting parts of the Topic Maps standard in my eyes, and sorely overlooked by the world at large. You can read more about "locators" too, at http://www.isotopicmaps.org/sam/sam-model/#sect-locators
In fact, skim the standard itself; it's quite full of good advice and modeling ideas"--October 8, 2007 email from Alexander Johannesen
DECISION: All key identifiers are treated as properties of the entities (classes) they represent.
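The merging idea in the email above, where records from different sources collapse into one when they share an identifier, can be sketched as follows. This is a deliberately simplified illustration: it assumes exactly one identifier per topic, the "psi.example.org" URI is a made-up placeholder, and merge_by_psi is a hypothetical helper, not the merging procedure defined in the Topic Maps standard.

```python
def merge_by_psi(*topic_maps):
    """Merge lists of topics; topics that share a PSI collapse into one topic."""
    merged = {}
    for topic_map in topic_maps:
        for topic in topic_map:
            # Create the merged topic on first sight of this identifier.
            target = merged.setdefault(
                topic["psi"], {"psi": topic["psi"], "names": []}
            )
            # Keep every distinct name, in first-seen order.
            for name in topic["names"]:
                if name not in target["names"]:
                    target["names"].append(name)
    return list(merged.values())


# Two hypothetical sources describing the same subject under one identifier.
library_a = [{"psi": "http://psi.example.org/mark_twain", "names": ["Mark Twain"]}]
library_b = [
    {"psi": "http://psi.example.org/mark_twain", "names": ["Mr Twain", "Mark Twain"]}
]

combined = merge_by_psi(library_a, library_b)
```

Because the identifier is just a property of the entity, the merge needs no separate identifier class; two records are the same entity exactly when that property value matches.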
13. Sometimes a place is a jurisdiction and behaves like a corporate body (e.g. United States as the name of the government of the United States). Sometimes place is a physical location in which something is located (e.g. a book about the birds of the United States). In order to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, I have defined two different classes for place, Place as Jurisdictional Corporate Body and Place as Geographic Area. Will this cause problems in the model? Will there be times when it prevents us from making elegant generalizations in the model about place per se? There is a similar problem with events. Some events are corporate bodies (e.g. conferences that publish papers) and some are a kind of subject (e.g. an earthquake). I have defined two different classes for event, Conference or Other Event as Corporate Body Creator and Event as Subject.
14. The bound with relationship is actually between two items representing two different works, and the issued with relationship is between two manifestations representing two different works. Is this a work to work relationship? Will designating it a work to work relationship cause problems for indicating which specific items or manifestation/items of each work are physically located in the same place? This question may also apply to whole-part relationships when the part is physically contained within the whole and both are located in the same place. One thing to bear in mind is that the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that represent the particular manifestation or item that is bound with or issued with or part of the whole.
DECISION: For now, I have treated these as work-to-work relationships with specified properties; these three types of work-to-work relationships should be understood to apply to only some items or manifestations of the works involved.
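A minimal sketch of what such a work-to-work relationship with a specified property might look like is given below. The class name WorkRelationship, the field applies_to, and the identifiers such as "item:A-copy1" are all hypothetical; I am assuming that the "specified property" is a set of the item or manifestation identifiers for which the relationship actually holds.

```python
from dataclasses import dataclass, field


@dataclass
class WorkRelationship:
    """A work-to-work link that holds only for the listed instances."""

    kind: str          # e.g. "bound with", "issued with", "part of"
    source_work: str
    target_work: str
    # Identifiers of the items or manifestations the relationship applies to.
    applies_to: set = field(default_factory=set)

    def holds_for(self, instance_id):
        """Return True if the relationship applies to this item or manifestation."""
        return instance_id in self.applies_to


# Hypothetical example: one copy of each work is bound together.
bound = WorkRelationship(
    kind="bound with",
    source_work="work:A",
    target_work="work:B",
    applies_to={"item:A-copy1", "item:B-copy1"},
)
```

Here bound.holds_for("item:A-copy1") is true while bound.holds_for("item:A-copy2") is false, which captures the point that the relationship does not hold between all instances of the two works.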
15. Are there nesting rules in RDF that I need to be aware of? For example, to create a related work link, should the related work property be "outside" the work name property, etc.?