ABOUT THE MODEL

 

November 29, 2007 draft

1. One article I read about the semantic web defined "resource" as "what you get when you click on a URL." My use of the Web leads me to assume that when you click on a URL you always get only one thing; in other words, that a URL is a one-to-one link. What we need for our data are one-to-many links. For example, you need a URL that stands for a work, and when you click on it, you need to see all the available expressions of that work (e.g. the translations into various languages, the annotated editions, the illustrated editions, etc.). When you click on an expression, you need to see all the available manifestations of that expression (e.g. the online version, the text version, etc.). My friend Sara has argued that many one-to-one links can have the appearance of a one-to-many link (did I get that right, Sara?). Do you agree? Should I stop worrying about this? Should I be careful to use a particular kind of representation for a one-to-many link in RDF?

"Serialisation of binary associations is simple, at least in English and similar languages, but automatic serialisation of associations with three or more role players becomes difficult if not impossible. That is another reason why binary associations are so widely used"--The Topic Maps Handbook by H. Holger Rath, on the web at www.empolis.com

"Well, I think we need to define that 'one' thing. You can well define it as one list of things, which usually is what happens, or it could be one relationship between many, and so on. The URL is there as an addressing mechanism for this to happen, and as you've pointed out it's one of those big issues in the RDF world. In the Topic Maps world we don't have this problem because we don't use the URL as *the* semantic of the relationships we build. For example, if RDF points to 'http://some.example/' as RDF:about, do they mean the RDF semantics are about *that* website, or does that website *represent* some other notion (like a topic), or is the website or the organization it belongs to somehow mixed up in the aboutness? These are not huge unsolvable problems, but enough to cause ambiguity and confusion, which of course is fatal in computer models.

A URL in itself can *mean* many things, and many-to-many, one-to-many or many-to-one are all fine examples of a URL. Let's try a few examples ;

  one :

     Monteverdi, composer - http://example.com/composer/monteverdi

  one-to-many :

     Monteverdi's 1610 vespers - http://example.com/work/123456789

  one-to-one :

      A particular recording of the 1610 vespers - http://example.com/record/2345

  many-to-one :

      recordings of "Dixit Dominus" -

http://example.com/work/123456789/2/recordings

  many-to-many :

      1610 vesper psalms recorded : http://example.com/work/123456789/type:p/recordings

Some might argue that these URLs aren't unambiguous, which is true ; they provide some semantics to exemplify their use. These URLs can equally all be expressed as http://example.com/[some huge number here]. All of them.

The point here is that a URL can represent anything you want. There are no restrictions on them, no one-to-one, many-to-many or otherwise. The only limitation on URLs is their resolvability at the hosting systems. If the server you're pointing to in http://example.com/work/123456789/2/recordings has no idea what that means, then that might be a problem. The funny part here is that in RDF there is no requirement for URLs to resolve to anything ; you can create fake URLs all the time, as much as you like. RDF states that resolvability is a good thing, though.

In the Topic Maps world we use what's known as Public Subject Indicators (PSI), which are URLs that are 1) publicly published with 2) a guarantee of maintained persistence, and 3) indicators of a subject (meaning; to get to the semantics of that URL, resolve the URL to read what it means)"--September 11, 2007 email from Alexander Johannesen.
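The one-to-many question can be illustrated with a minimal sketch, assuming triples are held as plain Python tuples. Each triple is a single binary statement, and a one-to-many link is simply several triples that share a subject and a predicate. Only the work URL comes from the email above; the expression and manifestation URLs, the predicate names, and the helper objects() are hypothetical.

```python
# Only WORK is taken from the email; every other identifier is hypothetical.
WORK = "http://example.com/work/123456789"
LATIN = "http://example.com/expression/latin-original"
ANNOTATED = "http://example.com/expression/annotated-edition"

# Each triple is (subject, predicate, object): one binary statement.
triples = [
    (WORK, "has_expression", LATIN),
    (WORK, "has_expression", ANNOTATED),
    (LATIN, "has_manifestation", "http://example.com/manifestation/online"),
    (LATIN, "has_manifestation", "http://example.com/manifestation/print"),
]


def objects(subject, predicate):
    """Return every object linked from subject by predicate, in input order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# "Clicking" on the work yields all of its expressions, and so on down.
for expression in objects(WORK, "has_expression"):
    print(expression, "->", objects(expression, "has_manifestation"))
```

Nothing in the individual statements is one-to-many; the fan-out appears only when the statements are queried together, which seems to be the sense in which many one-to-one links can look like a one-to-many link.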

2. We need to be able to identify a work for human beings (rather than machines) using a combination of both the author and the title (when there is an author), but we also need to treat the author as an entity (in RDF terms, a class?) so that we can create a record for it that contains all of its variant names, biographical information and so forth. I think what I am saying is that we need to treat an author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. Is it possible to model this? Or is it possible that RDF (and other data modelling) works against effective use of bibliographic data because of an absolute requirement that something either be a class or a property, but never both?

OWL Web Ontology Language Guide (http://www.w3.org/TR/owl-guide/) p. 7 (Section 1.1 under OWL DL):  "A class can not also be an individual or property." "A topic could be class and instance at the same time"--The Topic Maps Handbook by H. Holger Rath, on the web at www.empolis.com

"Sure. I guess this part points to something that splits the RDF and Topic Maps worlds apart. With RDF, you have an ever-growing tree-structure with no cross-pollination without what's known as anonymous nodes. So, every thing expressed, every relationship, is created as a child-node of the thing it belongs to. Example ;

  Alex

     is_a : person

     has_a : dog

        with_name : Oscar

        of_type : english_cocker-spaniel

     married_to : julie

Let's take just the one part of this out, and express it as a falling triplet ;

   "Alex" : http://shelter.nu/me.html

      "married_to" : http://some.ontology/human_relationships#marriage

         "Julie" : http://shelter.nu/my_family.html#julie

For each thing that's attached to me I must make a triplet, and for every thing within a triplet more triplets to explain them. Now, the thing here is ; Where do you express these things? Where do you express that Julie is a person, and where do you define that URL?

With Topic Maps, all things must first be a topic, so the same tree would be (and remember the Public Subject Identifiers talked about earlier, PSI's) ;

  topic : "Alex", PSI=http://shelter.nu/me.html

  topic : "married_to", PSI=http://some.ontology/human_relationships#marriage

  topic : "Julie", PSI=http://shelter.nu/my_family.html#julie

These things are expressed as things. Making the links between things is done *outside* their context, treating relationships as topics themselves ;

  association of type "married_to" : between "Alex" and "Julie"

  ('association' is simply a topic of type 'association' :)

This association itself can have a URL (unlike RDF relationships), a PSI, a name, or more relationships and properties attached (basically, RDF's ability to reify statements is handled ambiguously). The reason I'm talking about these RDF and Topic Maps differences is to point out this problem of where to define a thing. Where do you define a thing in RDF? There is the notion in the RDF world that you have a separate document with RDF triplets to define these things which you link in when you resolve or infer over your triplets, but in RDF this is implicit while in Topic Maps it is explicit. This is one of those things that's confusing about RDF.

But back to your question ;

Just talk about things, there and then. If they resolve to the right thing, your inference engine will thank you and be able to work with it. The persistence of these things lies with the URL you choose for your thing. If it's an author, I can do ;

  "Frank Herbert" : http://authors.psi.org/f_herbert(1920-)

And do ;

   "Dune"

      is_a book : http://things.psi.org/book

      has_author : http://authors.psi.org/f_herbert(1920-)

It should be up to smart SemWeb systems to pick these URLs up and use them for persistence. Smart systems should also be able to link http://authors.psi.org/f_herbert(1920-) and http://authors.psi.org/f_herbert together as the same author, but this ambiguity lies at the heart of the semWeb problem.

RDF comes in many parts, and as basic RDF itself there is no such constraint (everything can be anything). You can build those constraints with RDFS. You can use more specific language within RDF to define things, or more complex statements using the three levels of OWL. (At this point we've included five levels of ontology redirection, and I'm sure I've lost you along the way ... :)"--September 11, 2007 email from Alexander Johannesen.
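A minimal sketch of the author-as-entity and author-as-property question, assuming plain tuples stand in for RDF triples. The same author URI appears as the subject of its own statements and as the value of a property of the work. The author and book-type URIs come from the email; the work URI, the predicate names, the "Herbert, Frank" heading form, and the helpers first() and work_label() are hypothetical.

```python
# Author and book-type URIs are from the email; everything else is hypothetical.
AUTHOR = "http://authors.psi.org/f_herbert(1920-)"
WORK = "http://example.org/work/dune"

triples = [
    # The author as an entity in its own right, carrying its own statements.
    (AUTHOR, "is_a", "person"),
    (AUTHOR, "preferred_name", "Herbert, Frank"),
    (AUTHOR, "variant_name", "Frank Herbert"),
    # The same URI used as the value of a property of the work.
    (WORK, "is_a", "http://things.psi.org/book"),
    (WORK, "title", "Dune"),
    (WORK, "has_author", AUTHOR),
]


def first(subject, predicate):
    """Return the first object for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def work_label(work):
    """Build a human-readable author/title label; title alone if no author."""
    title = first(work, "title")
    author = first(work, "has_author")
    if author is None:
        return title
    return f"{first(author, 'preferred_name')}. {title}"


print(work_label(WORK))
```

In this reading the author is never "a property"; has_author is the property, and the author entity is its value, so one URI can serve both functions.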

3. Would I pay a price for defining expression as a subclass of work, manifestation as a subclass of expression, item as a subclass of manifestation, etc.?

"Always ; the better you define anything, the more chance that definition will break in some other context. I know this sounds terribly gloomy, but it's just a fact of life that when I say "that is a zebra" (which most people will be able to identify just fine) some weird zoologist will come along and ask "what sort of zebra?" (yes, there's more than one, and the differences between them cause great problems within zoological taxonomies and are mostly unresolved)"--September 11, 2007 email from Alexander Johannesen.

The RDF primer in section 5.1 says that any instance of a subclass is also an instance of the class. I think it is correct to say that any item is also a manifestation, expression and work. Am I misinterpreting anything here?

"No. But I do fear that the taxonomic constraints of class-instance hierarchies are somewhat flawed the further away from the original definition one goes. Some talk about ontological distance (how many jumps from where you are now to the thing that originated the classification path) needing to be weakened to the proportion of each sub-class's weight (which, funnily, is a recursive argument and can't be solved, certainly not by humans). It basically means that you need to have full knowledge of what you're about to classify before you can classify the thing. Doesn't work, does it? :)

RDF and Topic Maps are at best "close enough" for the things we're trying to do, although my personal view is that all of this modeling business is followed as if it were the truth because it looks like the truth. I'm no longer sure we're doing what's right, but I digress.

Back to your question here ; a thing that's an instance of a thing is by inheritance an instance of all parent classes. For example, if we use this silly classification tree ;

   thing

     living thing

        walking living thing

           human

What's meant is that an instance of human is at the same time a thing, a living thing *and* a walking living thing. Things are whatever is up the tree, and this lies at the root of taxonomical classification theory. For a thing to be anything, it must be what has passed before it. That's the tree of knowledge for instances of things, but I'm sure I'm preaching to the choir on this one. :)

In your problem, the thing that precedes it up the tree isn't really an instance of that thing, so here I wouldn't use instances of things, but properties. But then again, neither of these models are satisfactory in my eyes. This is where I suspect RDF:OWL comes to the rescue where you can make further qualifications to the instances to relieve them from parent class inheritance, but I'd have to read up on it (been a while)"--September 11, 2007 email from Alexander Johannesen.
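The inheritance rule described above (an instance of a subclass is also an instance of every class up the tree) can be sketched as follows. The class names are the ones in the email's "silly classification tree"; the dictionary layout and the function classes_of() are assumptions, and the sketch assumes each class has at most one parent.

```python
# Direct subclass links from the classification tree in the email.
# Assumption: each class has at most one parent.
SUBCLASS_OF = {
    "human": "walking living thing",
    "walking living thing": "living thing",
    "living thing": "thing",
}


def classes_of(direct_class):
    """Return direct_class followed by every ancestor up the tree."""
    result = [direct_class]
    while result[-1] in SUBCLASS_OF:
        result.append(SUBCLASS_OF[result[-1]])
    return result


# An instance of "human" is also an instance of everything above it.
print(classes_of("human"))
```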

I note that there has been at least one attempt to express FRBR in RDF and that they have not defined expression as a subclass of work. Instead, they have defined work as "disjoint with" expression, manifestation and item. Yet the RDF primer (section 5.5), following OWL, says that when two classes are disjoint "no resource is an instance of both classes." If an item is also a manifestation, expression and work, the FRBR-RDF people are wrong to do this, aren't they?

"Ah, yes, I should have read a bit more of your email to find what I was trying to say with the previous section. :) I think it would be worthwhile to ask them why they chose to do so ; I suspect there are limitations between the taxonomical hierarchy and how OWL operates on atomic entities that share properties but not class instantiations. (Hm, that sounded Greek to me. Let me know if this comes across as such.)"--September 11, 2007 email from Alexander Johannesen.

Gordon Dunsire (email to RDA list) refers to FRBR having a "no manifestation without expression rule."

DECISION: With some trepidation, I decided that since an item is always also a manifestation, an expression and a work, these should all be treated as subclasses of each other.
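A minimal sketch of why the subclass chain chosen here and the disjointness declarations described for the FRBR-RDF effort cannot both hold. The class names and the three disjoint pairs follow the text above; the data layout and the functions classes_of() and violations() are hypothetical.

```python
# Subclass chain from the DECISION above.
SUBCLASS_OF = {
    "Item": "Manifestation",
    "Manifestation": "Expression",
    "Expression": "Work",
}

# Disjointness as described for the FRBR-RDF effort: work is
# "disjoint with" expression, manifestation and item.
DISJOINT = [("Work", "Expression"), ("Work", "Manifestation"), ("Work", "Item")]


def classes_of(direct_class):
    """Return direct_class followed by every ancestor class."""
    result = [direct_class]
    while result[-1] in SUBCLASS_OF:
        result.append(SUBCLASS_OF[result[-1]])
    return result


def violations(direct_class):
    """Return the disjoint pairs an instance of direct_class would fall into."""
    member_of = set(classes_of(direct_class))
    return [pair for pair in DISJOINT
            if pair[0] in member_of and pair[1] in member_of]


# Under the subclass chain, every item lands in all three disjoint pairs.
print(violations("Item"))
```

Only an instance whose direct class is Work escapes conflict, so a model has to choose one approach or the other.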

4. Related to 3, is there any value in defining a "superwork" class?  A superwork is really just a work from which a large number of related works have spun off (for example, dramatizations, films, etc.).

DECISION: Superwork was not defined as a formal class in the model.

5. The FRBR-RDF people seem to have followed OWL and defined two properties for every property, each an inverse of the other. Thus, they have defined the class responsibleentity, and the two properties creator and creatorof. The creator property is defined as "an entity in some way responsible for the creation of a work," the domain is work and the range is responsibleentity. The creatorof property is defined as "a work that was in some way created by an entity," the domain is responsibleentity and the range is work. Is that correct RDF logic, and would every statement of authorship have to be duplicated, e.g. 1. this person is the author of this work and 2. this work has this author? (This may actually be related to point 1 above.)

"Good questions. I don't know. We're entering the world of free modeling as opposed to the default taxonomical model, and as such people are free to make up what they want. I'm positive you can't say there's anything such as "correct" RDF logic. I'm sure that model makes sense to whoever did it, but if it feels unnatural to others reading it, I'd say that's an indication of someone getting it wrong. Mind you, getting models such as this *right* is close to impossible (but not absolutely impossible :).

As to the bi-directionality of the statements, I'm not sure how the property is defined (optional, mandatory?) but it shouldn't be necessary to duplicate the expressions, but then, it comes down to the use of the ontology. There's a slight difference between ;

   "Herbert"

      has_written

         book_1

         book_2

and ;

   "Book 1"

      written_by

         "Herbert"

   "Book 2"

      written_by

         "Herbert"

It comes down to practicalities of where you model *from*. Sometimes, if you model from the book level, repeating author is a must (which is modus operandi in the cataloging world, as you know) but you have to make a judgment call on what other parts of the author file you should express, given your knowledge on what other things exist in the vicinity of the thing you're expressing. This is of course quite impossible, and leads to a lot of replication which inference engines need to sort out (and indeed brings a lot of trouble to the semWeb world). The trick with any of this is to have control over the URLs that are used as persistent identifiers.

I think we're seeing just how messy RDF can get without proper RDF tools. :)"--September 11, 2007 email from Alexander Johannesen.

DECISION: To keep the model to a reasonable size, inverse properties were not defined, for the most part; it is hoped that they can be viewed as being implicit in the model as is. Exceptions: whole-part relationships; words for music/music for words; broader/narrower term relationships between subjects.
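A minimal sketch of how inverses can stay implicit: each statement is made once, and the reverse statement is derived from a table of inverse pairs instead of being entered a second time. The property names creator and creatorof come from the FRBR-RDF description above; the resource identifiers and the function with_inverses() are hypothetical.

```python
# Property pair as described for FRBR-RDF; stated in one direction only.
INVERSE_OF = {"creator": "creatorof"}


def with_inverses(triples):
    """Return the stated triples plus the inverse statement each one implies."""
    both_ways = dict(INVERSE_OF)
    both_ways.update({inverse: prop for prop, inverse in INVERSE_OF.items()})
    result = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            result.add((o, both_ways[p], s))
    return result


# Authorship is stated once, from the work's side (hypothetical identifiers).
stated = [("work:dune", "creator", "person:herbert")]
for triple in sorted(with_inverses(stated)):
    print(triple)
```

Whether a particular RDF tool performs this derivation is a separate question; the sketch only shows that the duplicate statement carries no new information.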

6. OWL contains the concept of "transitivity;" could it be an attempt to deal with inheritance?

"RDF (with its sub-standards included) has a long history, particularly from the DAML and OIL days (you may have heard of DAML+OIL), where the two different ontologies were merged quite successfully (and I suspect because each of them was quite beautifully designed) to form the basis for what RDF is supposed to be doing. Unfortunately, there were too many people and too much politics in the process, so the elegance of DAML+OIL (which, really, was adequate for most modeling and was cleanly separated) turned into the 5 level RDF thing we've got now. They probably thought basic RDF was good enough to get you started, that more complex stuff should be handled by OWL (what does "more complex" mean in ontological terms?) and that the schema should express exchange rules (again, what does this mean in ontological terms?) The truth is that when doing ontology work the thing that makes them work is fuzziness and ambiguity, but the strictness and split personality of RDF is becoming a bit of a problem. I've seen some work on trying to bring these things together in an RDF 2.0 effort, but don't know where they're up to"--September 11, 2007 email from Alexander Johannesen.

I read a book on data modelling awhile ago that stated that object-oriented models support inheritance but relational models do not. Does RDF support inheritance?

"Yeah, it's in the RDFS ;

<rdf:RDF xml:lang="en"

   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

 <rdfs:Class rdf:ID="Author">

  <rdfs:comment>The class of people who are authors</rdfs:comment>

  <rdfs:subClassOf rdf:resource="http://purl.org/dc#Creator"/>

 </rdfs:Class>

</rdf:RDF>

It's a sour mix of where in RDF you find what you can express, and how"--September 11, 2007 email from Alexander Johannesen.

(Do you agree with that characterization of object-oriented as opposed to relational models?)

"Sure, but I guess what they mean is that the relational model by default doesn't carry inheritance constraints, and that's very true, but it doesn't mean the relational model can't express them. In some ways you can define a lookup table as a form of inheritance mechanism, but it certainly isn't a constraint nor documented as such, and, if one goes down to the SQL language, isn't easy to express (but is certainly doable :)"--September 11, 2007 email from Alexander Johannesen.

I think inheritance is what we need to ensure that the cross reference from FBI to United States. Federal Bureau of Investigation will also apply when someone searches for the FBI Department of Counterterrorism. Do you agree?

"This is one of those biggies ; can we predict what context a user needs and / or wants with any given search? Sometimes they indeed want the whole FBI as part of the search, at other times they're just interested in the counter-terrorism part. Or, when we're classifying, what parent class is that special zebra part of?"--September 11, 2007 email from Alexander Johannesen.

If so, can you help me figure out how to represent data in RDF such that inheritance is part of the model?

"Dig into the RDFS which deals with implications within the taxonomical part of RDF. I can't provide many specifics per se as it's been ages since I abandoned RDF as a modeling tool. I prefer to use Topic Maps (much cleaner as an expression engine) and then export it to RDF when need be"--September 11, 2007 email from Alexander Johannesen.

DECISION: From my readings of RDF literature so far, I believe that by creating corporate subdivision as a subclass of corporate body and subject subdivision as a subclass of subject, the model supports hierarchy such that anything that is true of the main heading should be held to be true for the subdivision.
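A minimal sketch of the intended effect, assuming a subdivision record points to its parent body and inherits the parent's name forms. The class CorporateBody, its fields, and the ". " separator used to join heading parts are hypothetical; only the FBI headings come from the text above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CorporateBody:
    preferred_name: str
    variant_names: list = field(default_factory=list)
    parent: Optional["CorporateBody"] = None

    def own_names(self):
        """Preferred form first, then the variants (cross references)."""
        return [self.preferred_name] + self.variant_names

    def search_names(self):
        """Every name form a searcher might use, inheriting the parent's forms."""
        if self.parent is None:
            return self.own_names()
        return [f"{parent_name}. {name}"
                for parent_name in self.parent.search_names()
                for name in self.own_names()]


fbi = CorporateBody("United States. Federal Bureau of Investigation", ["FBI"])
unit = CorporateBody("Department of Counterterrorism", parent=fbi)

# The cross reference from "FBI" now also applies to the subdivision.
print(unit.search_names())
```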

7. The aforementioned book on data modelling seemed to imply that models require each entity to have one and only one name. Does that mean that data modelling would preclude recognizing the fact that entities are often known by several variant names and that searchers should be successful no matter what variant name they search on? Or does the rule pertain only to the machine-readable identifier (not the human-readable identifier(s) linked to that machine-readable identifier)? Is there a "best-practices" way to model what we used to call a "see-reference" in RDF?

"No, it's just a technical constraint of the relational model. To make persistent identifiers work, one thing is that there's more than one per entity, but you have to resolve them using relational paradigms such as lookup tables. For example ;

   herbert,_frank

   f_herbert(1920-)

both are PI's to the same author, but in a relational model this is ;

   table : "author" columns : id, name

   --------------------

   200  |  Frank Herbert

   table : "author_pi", columns : id, pi, author_id, primary

   --------------------

   1  |  herbert,_frank  |  200  |  false

   2  |  f_herbert(1920-)  |  200 |  true

It's the "author_pi" table (lookup table) that binds the PI's together with the "author_id" field which points to the "author" table. This binding is expressed with SQL. (Not sure you know about SQL and relational models, so I'm writing as if you don't. Just tell me if you do.)

The aforementioned constraint is that any id of any table must (well, should) be unique, not that PI's can't be unique. Basically, identifiers of a table should be unique, but the columns with their various properties need not"--September 11, 2007 email from Alexander Johannesen.
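The two tables above can be tried out with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names, column names and row values follow the email, with two assumptions: the "primary" column is renamed is_primary because PRIMARY is an SQL keyword, and a UNIQUE constraint is added on pi. The function author_for() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE author_pi (
        id INTEGER PRIMARY KEY,
        pi TEXT NOT NULL UNIQUE,  -- assumption: each PI names one author
        author_id INTEGER NOT NULL REFERENCES author(id),
        is_primary INTEGER NOT NULL  -- renamed from "primary", an SQL keyword
    );
    INSERT INTO author VALUES (200, 'Frank Herbert');
    INSERT INTO author_pi VALUES (1, 'herbert,_frank', 200, 0);
    INSERT INTO author_pi VALUES (2, 'f_herbert(1920-)', 200, 1);
""")


def author_for(pi):
    """Resolve any persistent identifier to the author's name, or None."""
    row = conn.execute(
        "SELECT author.name FROM author_pi "
        "JOIN author ON author.id = author_pi.author_id "
        "WHERE author_pi.pi = ?",
        (pi,),
    ).fetchone()
    return row[0] if row else None


# Both identifiers lead to the same author row.
print(author_for("herbert,_frank"), "|", author_for("f_herbert(1920-)"))
```

The uniqueness rule applies to the id columns; the author row itself can be reached through any number of identifiers, which is the same pattern as a see-reference.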

8. Is it possible to define a class subject and a class person and still use the class subject for a work about a person? How do you model that relationship between two classes in RDF? (The same problem occurs with works about corporate bodies and works about other works.)

"Yes. An *instance* work can be a sub-class of any other class expressed through OWL, without that instance inheriting all those classes, but you need to express it in the ontology. It's a bit messy, but certainly doable.

There are two main ways of expressing relationships ; the class-instance hierarchy (which you're referring to above), and using OWL (or some parts of standard RDF) to just model the things free form. Unless someone tries to validate your RDF with RDFS statements, you can build up pretty much anything you like.

First, there are some modeling differences between relationships between classes and between instances of those classes, and different "rules" (some are rules, others guidelines, some just gut feelings). We can create classes ;

   animal

      dog

We by default express the taxonomical relationship between these. Constraints between these two can be ;

   animal

         dog

            RDFS : instances cannot be "fish"

            RDFS : instances can be "furry types"

         furry types

            cat

Now, this gets complicated of course, but OWL ;

   "Oscar"

      is_a : dog

      OWL : is_subclass_of "furry types"

... is perfectly legal ; the instance "Oscar" is now a subclass of "furry types" because Oscar is a particularly furry example of a dog (he really is :). I guess all of this again points to understanding the whole RDF stack is necessary to do any serious modeling. I guess I would at this point be keen to model it a bit simpler. In reality, you can model all of these things without the constraints, and still get a decent model for inference models to work with. You can do ;

   book_a

      is_a

         manifestation_x

         translation_y

         idea_z

and let inference engines work out the details. Both will work in the long run, although I understand the want and sometimes need for rigid ontologies"--September 11, 2007 email from Alexander Johannesen.

DECISION: I defined the class Subject as having an Equivalency relationship ("Equivalent to Union of") to Work, Expression, Person, and Corporate body.  Also, all subject properties are defined as having domain of Resource.
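A minimal sketch of the union idea, assuming each resource carries a single class label. The four member classes are the ones named in the DECISION above; the resource identifiers, the property name has_subject, and the functions is_subject() and add_subject() are hypothetical.

```python
# Member classes of the Subject union, as listed in the DECISION above.
SUBJECT_UNION = {"Work", "Expression", "Person", "Corporate body"}

# Hypothetical resources with their declared classes.
resource_class = {
    "person:twain": "Person",
    "work:biography-of-twain": "Work",
}


def is_subject(resource):
    """True if the resource's declared class is a member of the Subject union."""
    return resource_class.get(resource) in SUBJECT_UNION


def add_subject(triples, work, subject):
    """Record 'work has_subject subject' if subject falls within the union."""
    if not is_subject(subject):
        raise ValueError(f"{subject} is not in the Subject union")
    triples.append((work, "has_subject", subject))


triples = []
# A work about a person: the Person instance is used directly as the subject.
add_subject(triples, "work:biography-of-twain", "person:twain")
print(triples)
```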

 I'm getting confused about when to label examples using class names and when to label them using properties.  I got most of the way through coding the examples using only properties, assuming that that is enough since the properties are defined as belonging to particular classes in the model.  However, when I came to subject headings, I realized that if you use only the properties of subject headings, you cannot specify that a particular subject heading is for a geographic heading, the name of a person, etc.  In those cases, I started using class names in the examples, but I suspect I have run across an underlying problem in the model here.

 

9. There are times when you have an expression of an expression (as when a specific edition is the basis for a new edition, or when there are two Japanese translations, etc.) or a manifestation of a manifestation (as when a reproduction is made of a specific other manifestation). If I put in subexpressions and submanifestations as subclasses does that somehow require every resource to use all of these levels even when they are not necessary? Is there another way to deal with a hierarchy that needs to be expansible and contractible in RDF?

"I would just reuse expression and manifestation classes ; they are "sub" by inheritance. I usually point to the difference between ;

  <topic>

     <topicName> some name </topicName>

  </topic>

and

  <topic>

     <name> some name </name>

  </topic>

Here, classes and instances are described in XML, but the point is that we can reuse <name> at will, as we certainly will define what "name" means somewhere. In this case, "name" is a sub-element of "topic", so the topic's name. This is straight class-instance theory, and doesn't need a lot of explaining. The former example needs an explanation of what "topicName" means, and especially if it means anything different from a normal "name".

So, the same with "expression" and "subexpression" ; we can already see that the one is a sub of the other, so does "subexpression" express more than just "sub"? In my eyes, the less "things" you need to define and the fewer relationships you need to explain, the easier it will become to express and explain them. Maybe these are too simplistic examples?"--September 11, 2007 email from Alexander Johannesen.

DECISION: Rather than define subexpression and submanifestation class levels, I defined expression-to-expression relationship properties.
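A minimal sketch of how a relationship property between expressions gives a hierarchy of any depth without extra class levels. The property name based_on, the expression identifiers, and the function derivation_chain() are hypothetical, and the sketch assumes each expression has at most one source expression.

```python
# Hypothetical expression-to-expression links: key is based on value.
BASED_ON = {
    "expression:japanese-translation": "expression:revised-edition",
    "expression:revised-edition": "expression:first-edition",
}


def derivation_chain(expression):
    """Follow based_on links back to the expression that has no source."""
    chain = [expression]
    while chain[-1] in BASED_ON:
        chain.append(BASED_ON[chain[-1]])
    return chain


# Three levels where they are needed, one level where they are not.
print(derivation_chain("expression:japanese-translation"))
print(derivation_chain("expression:first-edition"))
```

Every node stays an instance of the single class Expression; the depth comes from the links, so resources that need no extra levels simply have no based_on statement.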

10. Is there any way to define types of data in an RDF model? For example, would it be possible to define a particular data element as being transcribed (copied from the resource), as opposed to composed by the cataloger? I have discovered the "built-in datatypes" which seem to provide a way to define certain data as "date" or "integer," for example, but I haven't quite figured out whether these datatypes can be placed right into the model or whether they have to be placed in the resource records created according to the model.

"I don't think there's much correlation between a type and a datatype. :) Datatypes are very closely linked to the exchange format they're expressed in (so you can say that an RDF field's value must be integers, dates, strings of certain kinds, and so forth), so I don't think you need to worry about that. A type is a straight "is_a" relationship in the taxonomical hierarchy, so it's enough to just add ;

   some_book

       has_subject

          fish

          whales

             element_content_source

                transcribed

just like ;

          fish

             element_content_source

                copied_from_resource

is the default in your model (but you wouldn't need to express this)"--September 11, 2007 email from Alexander Johannesen.
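A minimal sketch of one way to record whether a value was transcribed or composed, assuming each statement carries its own content-source annotation with a default. The class Statement, its field names, the two source labels (taken from the wording of the question, not from the email's outline), and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    # Assumption: transcription is the default, so it need not be stated.
    content_source: str = "transcribed"


statements = [
    Statement("some_book", "title", "A book about whales"),
    Statement("some_book", "has_subject", "whales",
              content_source="composed_by_cataloger"),
]


def values_with_source(source):
    """Return the values of all statements having the given content source."""
    return [s.value for s in statements if s.content_source == source]


print(values_with_source("transcribed"))
print(values_with_source("composed_by_cataloger"))
```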

Addition to question 10: Could "literal value surrogate" be used to mean a particular element is transcribed, and "non-literal value surrogate" be used to mean that a particular element is selected from a list, including lists such as LCSH (for subject headings) and the LC/NACO authority file (for name and work identifiers)?

 

11. I notice that the Functional Requirements for Authority Data (http://www.ifla.org/VII/d4/FRANAR-ConceptualModel-2ndReview.pdf) (brought to us by the FRBR folks) has defined the following classes and properties (entities and attributes, in their terminology):

 

Class: Person

Properties: dates associated with the person, title of person, other designation, gender, place of birth, place of death, country, place of residence, affiliation, address, language of person, field of activity, profession/occupation, biography/history

 

Class: Name

Properties: type of name (personal, corporate, family, etc.), scope of usage (e.g. fictional genres by that person that use a particular pseudonym), dates of usage, language of name, script of name, transliteration scheme of name

 

Class: Controlled Access Point

Properties: type of controlled access point (personal, corporate, family, etc.), status (level of establishment, e.g., provisional), designated usage (preferred or non-preferred), whether undifferentiated, language of base access point, language of cataloging, script of base access point, script of cataloging, transliteration scheme of base access point, transliteration scheme of cataloging, source of controlled access point (publication cataloged), base access point (e.g. surname and forename), addition (e.g. birth and death date)

 

Is that the most elegant way to model the cataloging practice of choosing a preferred form of name for every person, with cross references from all variant names? 

 

"It depends. It works, of course, but it does indeed sound dodgy to have a class instance Name in a relationship with a class instance Person, however that might just be a personal preference rather than a flaw per se.

 

In Topic Maps a name is always attached to a Topic; there is no separation of the entities, as this is in many ways closer to human thinking. There might be a Topic "the name Fred", but again, that is a Topic in its own right, and not a separate class instance of that name.

 

As to general modeling I agree that names shouldn't be classes just for the odd chance the name is complex enough to warrant its own instance, but looking at the definition it's pretty clear that "name" isn't a name at all but rather a complex node used for identification of things. Now that's all fine and well, but doesn't sit too well with the label "name" nor the concept of the Controlled Access Point which tries to do pretty much the same thing. I'd agree that there's something ambiguous about this setup. In fact, why not make names part of the Controlled Access Point? The interesting thing here is indeed the notion of "access", and one way to access any piece of information is through its label  name. I'm sure there's something funky that could be done here"--October 8, 2007 email from Alexander Johannesen

 

Is it possible that FRAD chose this model because of the fact that current Anglo-American cataloging practice considers Mark Twain and Samuel Clemens (for example) to be two different authors, and considers a corporate body that has changed its name to be two different corporate bodies? 

 

"Possibly, but it feels counter to the way we humans look at it. You could do (in Topic Maps terms) ;

 

Topic 123

   name "Samuel Clemens [Mark Twain]"

   name [pseudonym] "Mark Twain"

   name [original] "Samuel Clemens"

 

This does what they want. Of course, if they *truly* want the modeling flexibility ;

 

Topic 123

   name "Samuel Clemens"

 

Topic 234

   name "Mark Twain"

 

Association

   type "pseudonym"

   - person 123

   - pseudonym 234

 

The difference between this and the proposed RDF is of course that "Mark Twain" is a *topic* or *subject* in its own right, and *not* just a name. Names should never be class instances in themselves, so I think I agree with you"--October 8, 2007 email from Alexander Johannesen
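The two Topic Maps layouts in the email can be sketched with plain Python data classes. The topic numbers, names, name types and role names follow the email's outline; the classes Topic and Association, the "default" key for an untyped name, and the function pseudonyms_of() are hypothetical and are not part of any Topic Maps API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: int
    names: dict = field(default_factory=dict)  # name type -> name string


@dataclass
class Association:
    assoc_type: str
    roles: dict  # role name -> topic id


# First layout: one topic carrying several typed names.
single = Topic(123, {
    "default": "Samuel Clemens [Mark Twain]",
    "pseudonym": "Mark Twain",
    "original": "Samuel Clemens",
})

# Second layout: two topics joined by a "pseudonym" association.
clemens = Topic(123, {"default": "Samuel Clemens"})
twain = Topic(234, {"default": "Mark Twain"})
topics = {t.topic_id: t for t in (clemens, twain)}
link = Association("pseudonym", {"person": 123, "pseudonym": 234})


def pseudonyms_of(person_id, associations):
    """Return the default names of topics linked as pseudonyms of person_id."""
    return [topics[a.roles["pseudonym"]].names["default"]
            for a in associations
            if a.assoc_type == "pseudonym" and a.roles.get("person") == person_id]


print(single.names["pseudonym"], "|", pseudonyms_of(123, [link]))
```

In the second layout "Mark Twain" is a node that other statements can attach to, which is the flexibility the email describes.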

 

What would be the trade-off in treating name as a property of person, with subproperties of preferred form of name, variant form of name, etc.? 

 

"I think the trade-off is the lack of a node to link information too.

This is probably why they've done it, so that they could treat "Mark

Twain" as a node itself. Of course, the notion of "name" is vague (for

example, his writing style as Mark Twain was very different from the

Samuel Clemens writing and they might as well be treated as two

different people with some pseudonymic relationship between them).

 

I guess we're coming into implementation land now, but there's also

the notion of how software is to use the proposed models. Systems

probably deal easier with properties, and indeed going back to frames

theory, everything is key-value properties in complex formations. The

label property of a topic is just a property with certain semantic

meaning.

 

Another possibility for using Name classes could be that it's easier

to work with OWL in inference engines, although I can't verify this at

this time"--October 8, 2007 email from Alexander Johannesen

 

DECISION: I have created just the single class Person; preferred form of name is treated as a property, as is variant form of name.  Bibliographic identities (Mark Twain and Samuel Clemens as two different authors) are not recognized in my rules or my RDF model; for one thing, this will support more standardization internationally and across the Internet.
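
A minimal sketch of this decision, in Python rather than RDF, may make the trade-off concrete. The class and function names are illustrative assumptions, as is the choice of which form of name is preferred. Because each name is a plain string property, there is no separate node for "Mark Twain" to which further information could be attached, which is the trade-off noted in the email above.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A single Person class; names are plain properties, not nodes."""
    preferred_name: str
    variant_names: list[str] = field(default_factory=list)

    def all_names(self) -> list[str]:
        """Every form of name under which this person can be found."""
        return [self.preferred_name, *self.variant_names]


def find_by_name(people: list[Person], name: str) -> list[Person]:
    """Return every person who has `name` as a preferred or variant form."""
    return [p for p in people if name in p.all_names()]


# One person, not two bibliographic identities.
# Which form is "preferred" here is an arbitrary choice for illustration.
clemens = Person(preferred_name="Samuel Clemens", variant_names=["Mark Twain"])
```

A search under either form of name leads to the same single Person.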

 

What would be the most elegant way to allow the preferred form of name to vary based on the preferred language, script and transliteration scheme of a particular user?  How would you model this situation (feel free to use topic map form, if you prefer)?

 

"Well, names and variant names

(http://www.isotopicmaps.org/sam/sam-model/#sect-topic-name) are

already part of the Topic Maps standard, so we don't have to create a

model for it (did I mention how well Topic Maps is suited to the

library world? :). Of course, we're here talking about a proper

separation of a person in terms of pseudonyms, so I'll throw a few

false ones into the mix. I've also introduced type here like this ;

[type], where type is an instance of some other Topic (meaning, if you

see [person] there is a Topic somewhere defined with this identifier)

like this ;

 

Topic person

   name "A person"

 

Topic alternate

   name "Alternate spelling of a name"

 

Topic nick

   name "A nick name"

 

Topic 123

   [person]

   name "Samuel Clemens"

   name [alternate] "Sam Clemens"

   name [nick] "Sam Clam"

 

Topic 234

   [person]

   name "Mark Twain"

   name [alternate] "Mr Twain"

 

Association

   [pseudonym]

   - [person] 123

   - [pseudonym] 234

 

In Topic Maps a few types of names are defined, such as sort name,

display name, and we can use scoping too to further model what these

names are (such as language, source, bias, etc.) I'd suggest a quick

glance at http://www.infoloom.com/tmsample/pep4.htm (search for "topic

names")"--October 8, 2007 email from Alexander Johannesen
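
The scoping idea in the email above could be sketched roughly as follows. This is an assumption-laden illustration in Python, not Topic Maps syntax: the `ScopedName` class, the scoring weights, and the sample names and scheme label are all hypothetical. Each name carries its language, script and (optionally) transliteration scheme, and the form displayed is whichever name best matches the user's stated preferences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedName:
    value: str
    language: str                          # e.g. "en", "ru"
    script: str                            # e.g. "Latn", "Cyrl"
    transliteration: Optional[str] = None  # scheme name, if transliterated


def preferred_name(names, language, script, transliteration=None):
    """Pick the name whose scope best matches the user's preferences.

    Language counts most, then script, then transliteration scheme.
    Ties go to the earliest name in the list.
    """
    def score(n):
        return (
            4 * (n.language == language)
            + 2 * (n.script == script)
            + 1 * (n.transliteration == transliteration)
        )
    return max(names, key=score).value


# Illustrative values only.
twain_names = [
    ScopedName("Mark Twain", "en", "Latn"),
    ScopedName("Марк Твен", "ru", "Cyrl"),
    ScopedName("Mark Tven", "ru", "Latn", transliteration="ISO 9"),
]
```

With these sample names, a user preferring Russian in Cyrillic script gets the Cyrillic form, while a user whose language is not represented at all falls back to the closest match on script.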

 

12. A related question. FRAD also treats key identifier (unique machine-readable identifier for an entity) as a class rather than a property.  Why not treat key identifier as a property of person? 

 

"Indeed, why not? I don't know of any standard that has tried to solve

persistent identification as well as Topic Maps. Here's what we do ;

 

Topic 234

   [person]

   [psi] "http://psi.marktwainfoundantion.org/mark_twain"

   name "Mark Twain"

   name [alternate] "Mr Twain"

 

We use Public Subject Indicators [psi] as properties which, really,

are URI's that you have published and are committed to keep alive,

meaningful and persistent. If you are committed to keeping the PSI for

"Martha Yee" alive and well, you publish your PSI and make sure that

doesn't resolve to anything 40x (meaning, not found, unless that's

intentional). These are basically global constants, and they (as the

name suggests) indicate subjects. Institutions and organisations map

what they want, but as long as they're PSI's they are committed to

those URI's, making sure that if *others* out there use them they will

be useful in the future as well.

 

Of course, this doesn't stop you from making up imaginary URI's or

even existent but not really in your domain; it's crazy to think that

all organizations of the world will commit to PSI's, much less know

about Topic Maps or persistent identifiers, so some leeway must be

given. But at least it's a mechanism that works well.

 

There are whole sections of the Topic Maps standard that deal with

PSI's and rules for merging Topic Maps together with them. This way,

as long as we talk about the same PSI's we can merge smaller and

diverse Topic Maps together at any stage later. In fact, this is one

of the absolutely most exciting parts of the Topic Maps standard in my

eyes, and sorely overlooked by the world at large. You can read more about

"locators" too, at

http://www.isotopicmaps.org/sam/sam-model/#sect-locators

 

In fact, skim the standard itself; it's quite full of good advice and

modeling ideas"--October 8, 2007 email from Alexander Johannesen
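
The merging behaviour described in the email above can be sketched as follows. This is a simplified illustration in Python, assuming that two topics are merged whenever they share at least one PSI; the actual Topic Maps merging rules cover more cases than this. The `Topic` class, the function name and the second URI are hypothetical; the first PSI is the one used in the email.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    psis: set[str] = field(default_factory=set)   # Public Subject Indicators (URIs)
    names: set[str] = field(default_factory=set)


def merge_topic_maps(*maps):
    """Merge lists of topics: topics sharing at least one PSI become one topic.

    Topics without any PSI are never merged with anything.
    """
    merged = []
    for topic_map in maps:
        for topic in topic_map:
            # Already-merged topics that share a PSI with the incoming topic.
            same = [m for m in merged if m.psis & topic.psis]
            merged = [m for m in merged if not (m.psis & topic.psis)]
            combined = Topic(set(topic.psis), set(topic.names))
            for m in same:
                combined.psis |= m.psis
                combined.names |= m.names
            merged.append(combined)
    return merged


TWAIN_PSI = "http://psi.marktwainfoundantion.org/mark_twain"

# Two independently built maps that happen to use the same PSI.
library_a = [Topic({TWAIN_PSI}, {"Mark Twain"})]
library_b = [
    Topic({TWAIN_PSI}, {"Mr Twain"}),
    Topic({"http://example.org/psi/someone_else"}, {"Someone Else"}),  # hypothetical
]
merged_map = merge_topic_maps(library_a, library_b)
```

After merging, the two Twain topics collapse into one topic carrying both names, and the unrelated topic is left untouched.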

 

DECISION: All key identifiers are treated as properties of the entities (classes) they represent.

 

13.  Sometimes a place is a jurisdiction and behaves like a corporate body (e.g. United States as the name of the government of the United States).  Sometimes place is a physical location in which something is located (e.g. a book about the birds of the United States).  In order to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, I have defined two different classes for place, Place as Jurisdictional Corporate Body and Place as Geographic Area.  Will this cause problems in the model?  Will there be times when it prevents us from making elegant generalizations in the model about place per se?  There is a similar problem with events.  Some events are corporate bodies (e.g. conferences that publish papers) and some are a kind of subject (e.g. an earthquake).  I have defined two different classes for event, Conference or Other Event as Corporate Body Creator and Event as Subject.

 

14. The "bound with" relationship is actually between two items representing two different works, and the "issued with" relationship is between two manifestations representing two different works.  Is this a work to work relationship?  Will designating it a work to work relationship cause problems for indicating which specific items or manifestation/items of each work are physically located in the same place?  This question may also apply to whole-part relationships when the part is physically contained within the whole and both are located in the same place.  One thing to bear in mind is that the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that represent the particular manifestation or item that is bound with or issued with or part of the whole.

 

DECISION: For now, I have treated these as work-to-work relationships with specified properties; these three types of work-to-work relationships should be understood to apply to only some items or manifestations of the works involved.
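
One possible reading of this decision, sketched in Python under stated assumptions: the relationship is recorded between two works, but carries a property listing the particular items to which it applies. All class, field and identifier names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    work: str       # title of the work this item is a copy of
    item_id: str    # hypothetical local identifier, e.g. a barcode


@dataclass(frozen=True)
class WorkRelationship:
    kind: str                     # "bound with", "issued with", or "part of"
    source_work: str
    target_work: str
    applies_to: frozenset[Item]   # the only items for which the relationship holds


def related_items(rel, item):
    """Items physically together with `item` under `rel`.

    Returns an empty set when the relationship does not apply to this item.
    """
    if item not in rel.applies_to:
        return frozenset()
    return rel.applies_to - {item}


# Two copies of Work A; only the first is bound with a copy of Work B.
a1 = Item("Work A", "item-a1")
a2 = Item("Work A", "item-a2")
b1 = Item("Work B", "item-b1")
bound = WorkRelationship("bound with", "Work A", "Work B", frozenset({a1, b1}))
```

The relationship is stated once at the work level, yet a query for a given item shows whether that particular copy is affected.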

 

15.  Are there nesting rules in RDF that I need to be aware of?  For example, to create a related work link, should the related work property be "outside" the work name property, etc.?