Personal knowledge base
A personal knowledge base (PKB) is an electronic tool used to express, capture, and later retrieve the personal knowledge
of an individual. It differs from a traditional database
in that it contains subjective material particular to the owner,
that others may neither agree with nor care about. Importantly, a PKB consists
primarily of knowledge, rather than information; in other words, it is not
a collection of documents or other sources an individual has encountered,
but rather an expression of the distilled knowledge the owner has extracted
from those sources.
Davies of the University of Mary Washington and has a tripartite definition:
The topic is related to personal information management (PIM), but is distinct from it because of the "information" vs. "knowledge" difference. PKBs are about recording and managing the knowledge one derives from documents, whereas PIM is more about managing and retrieving the documents themselves.
written language:
Da Vinci's notebooks are a famous example. More commonly, card files and personally
annotated libraries served this function in the pre-electronic age.
Vannevar Bush's description of the "Memex" in 1945. Bush surveyed the
post-World-War-II landscape and laid out what he viewed as the most
important forthcoming challenges to humankind in The Atlantic Monthly.
The Memex was a theoretical (never implemented) design for a
system to help tackle the information overload problem, already
formidable in 1945. In Bush's own words:
Bush envisioned collaborative aspects as well, and even a world-wide system
that scientists could freely consult. But an important emphasis throughout
the article was on expanding our own powers of recollection: "Man needs to
mechanize his record more fully," he says, if he is not to "become bogged
down...by overtaxing his limited memory." With the Memex, the user could
"add marginal notes and comments," and "build a trail of his interest"
through the larger information space. The user could share trails with friends,
identify related works, and create personal annotations. Bush's Memex would give each individual the ability to create, categorize, classify, and relate his own set of information
corresponding to his unique personal viewpoint. Much of that information
would in fact consist of bits and pieces from public documents, just
as the majority of the knowledge inside our own heads has been imbibed from
what we read and hear. But the Memex also allowed for the specialized
recording of information that each individual perceived and needed to
retain. The idea of "supplementing our memory" was not a one-size-fits-all
proposition, since no two people have the same interests, opinions, or
memories. Instead, it demanded a subjective expression of knowledge, unique
to each individual.
diagrams to represent abstract
knowledge; the use of spatial layout, color, and images is said to
strengthen understanding and promote creativity. Each of the three primary
schools (mind mapping, concept mapping, and cognitive mapping) prescribes
its own data model and procedures, and each boasts a number of software
applications designed specifically to create compatible diagrams.
was promoted by pop psychologist Tony Buzan in the 1960s, and
commands the allegiance of an impressive number of adherents worldwide. A
mind map is essentially nothing more than a visual outline, in which a main
idea or topic is written in the center of the diagram, and subtopics
radiate outwards in increasing levels of specificity. The primary value is
in the freeform, spatial layout (rather than a sequential, numbered
outline), the ability for a software application to hide or reveal select
levels of detail, and as mentioned above, graphical adornments. The basic
data model is a tree, rather than a graph, with all edges implicitly
labeled "supertopic/subtopic." Numerous tools are available for
constructing mind maps (examples: FreeMind, MindMapper, MindGenius,
VisiMap, MindManager, NovaMind, HeadCASE, ConceptDraw MINDMAP, and Visual Mind).
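The tree data model behind mind maps can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the tools listed; the `Topic` class and `to_outline` function are hypothetical names. Flattening the tree depth-first yields exactly the kind of indented outline that a mind map presents spatially.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """A mind-map node; every parent/child edge is implicitly 'supertopic/subtopic'."""
    label: str
    children: list[Topic] = field(default_factory=list)

    def add(self, label: str) -> Topic:
        """Attach a new subtopic and return it so it can be extended further."""
        child = Topic(label)
        self.children.append(child)
        return child


def to_outline(topic: Topic, depth: int = 0) -> list[str]:
    """Flatten the tree depth-first into indented outline lines."""
    lines = ["  " * depth + topic.label]
    for child in topic.children:
        lines.extend(to_outline(child, depth + 1))
    return lines


root = Topic("Vacation")          # the central "main topic"
travel = root.add("Travel")
travel.add("Flights")
root.add("Budget")
print("\n".join(to_outline(root)))
```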
were developed by Cornell Professor Joseph Novak, and based on David Ausubel's
assimilation theory of learning. An essential tenet is that
newly encountered knowledge must be related to one’s prior knowledge in
order to be properly understood. Concept maps help depict such connections
graphically. Like mind maps, they feature evocative words or phrases in
boxes connected by lines. There are two principal differences, however:
first, a concept map is properly a graph, not a tree, permitting arbitrary
links between nodes rather than only parent/child relationships; and
second, the links are labeled to identify the nature of the inter-concept
relationship, typically with a verb phrase. In this way, the links on a
diagram can be read as English sentences, with the upstream node as the
subject and the downstream node as the direct object of the sentence.
There are many applications available that could be used for drawing these
diagrams, not all of which directly acknowledge their support for concept
maps in particular (examples: Axon, SMART Ideas concept-mapping, Mind Pad, and MindFull).
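A minimal Python sketch of the idea that labeled links read as sentences, assuming a concept map is stored as a list of hypothetical (upstream concept, linking phrase, downstream concept) triples. The third proposition closes a cycle, something the mind-map tree model cannot express.

```python
# A concept map as a list of propositions: (upstream concept, linking phrase, downstream concept).
concept_map = [
    ("plants", "need", "water"),
    ("water", "is absorbed by", "roots"),
    ("roots", "anchor", "plants"),  # an arbitrary link that closes a cycle; a tree cannot express this
]


def read_as_sentences(propositions):
    """Read each labeled link as subject + verb phrase + direct object."""
    return [f"{subj.capitalize()} {verb} {obj}." for subj, verb, obj in propositions]


for sentence in read_as_sentences(concept_map):
    print(sentence)
```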
Note that a concept map is virtually identical to the notion of a
"semantic network," which has served as a cornerstone of
much artificial intelligence work since its inception. Semantic networks,
too, are directed graphs in which the nodes represent concepts and labeled
edges the relationships between them. Much psychology research has
strengthened the idea that the human mind internalizes knowledge in
something very like this sort of framework. If so, this likely explains the
ease with which the uninitiated have adopted concept mapping techniques:
since concept maps and semantic networks can be considered equivalent, a
concept map mirrors the structure people already use internally.
University of Strathclyde, uses the same data model as does concept
mapping, but with a new set of techniques. In cognitive maps, element names
have two parts, separated by an ellipsis that is read "as opposed to" in
order to further clarify the semantics of the node. ("Cold...hot" is
different from "cold...freezing," for example.) Links are of three types
(causal, temporal, and connotative), the first of which is the most common and
is read as "may lead to." Generally cognitive mapping is best suited to
domains involving arguments and decision making. Cognitive mapping is not
nearly as widespread as the other two paradigms; the premier design
application is Decision Explorer. Together, these and related methods have
brought into the mainstream the idea of breaking down knowledge into its
fundamental elements, and representing them graphically. Students and
workers from widely diverse backgrounds have experienced success in better
articulating and examining their own knowledge, and in discovering how it
relates to what else they know. Although architectural considerations
prevent any of these tools from functioning as bona fide PKBs, the value of
the ideas they have contributed to front-end interface design can hardly be
overestimated.
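The two distinctive features of cognitive maps, two-part element names and typed links, can be sketched as follows. The `Construct` and `Link` classes are hypothetical, and since only the causal reading ("may lead to") is given above, the other link kinds are rendered with a placeholder label.

```python
from dataclasses import dataclass

LINK_KINDS = {"causal", "temporal", "connotative"}


@dataclass(frozen=True)
class Construct:
    """A two-part element name; the ellipsis is read 'as opposed to'."""
    pole: str
    contrast: str

    def __str__(self) -> str:
        return f"{self.pole}...{self.contrast}"


@dataclass(frozen=True)
class Link:
    source: Construct
    target: Construct
    kind: str = "causal"

    def __post_init__(self):
        if self.kind not in LINK_KINDS:
            raise ValueError(f"unknown link kind: {self.kind}")

    def read(self) -> str:
        # Only the causal reading ("may lead to") is given in the text;
        # other kinds are shown with a bracketed placeholder label.
        phrase = "may lead to" if self.kind == "causal" else f"[{self.kind}]"
        return f"{self.source} {phrase} {self.target}"


price = Construct("raise prices", "hold prices")
sales = Construct("lower sales", "steady sales")
print(Link(price, sales).read())
```

Because each construct stores both poles, "cold...hot" and "cold...freezing" are distinct elements even though they share a first pole.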
reference Vannevar Bush's article as
the cornerstone of their heritage. Hence the development of hypertext
techniques, while seldom applied specifically towards PKB solutions, is
important. There have basically been three types of hypertext systems:
those that exploit features of non-linear text to create a dynamic, but
coherent "hyperdocument"; those
that prescribe ways of linking existing documents together for navigation
and expression of affinities; and those that use the hypertext model specifically
to model abstract knowledge. Though the first and especially
the second category have dominated research efforts (and public enthusiasm)
over the past several decades, it is this third class that is closest in
spirit to the original vision of hypertext by its founders.
In a similar vein to Bush, Doug Engelbart's focus was to
develop computer systems to "help people think better." He sought data
models that more closely paralleled the human thought process, and settled on using hypertext as a way
to represent and store abstract human knowledge. Although his
"Augment" system underwent many changes, the original
purpose closely aligned with that of PKBs.
More recently, Randall Trigg's TextNet and
NoteCards
systems further explored this idea.
TextNet revolved around "primitive pieces of text connected with typed
links to form a network similar in many ways to a semantic
network." Though text-centric, it was clear that Trigg's goal
was to model the associations between primitive ideas and hence to reflect
the mind's understanding. "By using...structure, meaning can be extracted
from the relationships between chunks (small pieces of text) rather than
from the words making them up." The subsequent
NoteCards
effort was similarly designed to "formulate, structure, compare, and
manage ideas." It was useful for "analyzing information, constructing
models, formulating arguments, designing artifacts, and generally
processing ideas."
Conklin and Begeman's gIBIS system was another early effort at true
knowledge representation, specifically for the field of design
deliberations and arguments. The project lived on
in its successor QuestMap and in the more modern Compendium, which has been
primarily used for capturing group knowledge expressed in face-to-face
meetings. In all these cases, systems use semantic hypertext in an attempt
to capture shared knowledge in its most basic form. Other examples of
knowledge-based hypertext tools include Mental Link,
Aquanet, and SPRINT, as well
as a few current commercial tools such as PersonalBrain and Tinderbox.
allow a user to create
snippets of text and then organize or categorize
them in some way. These tools can be used to form PKBs composed of such text snippets.
Most of these tools are based on a tree
hierarchy, in which the user can write pages
of notes and then organize them into sections and subsections (as, for
instance, with HogBay, OneNote,
CircusPonies, and AquaMinds). The higher
level sections or chapters often receive a colored tab, just as in a
physical three-ring notebook. Other designers eschew the tree model
for a more flexible category-based approach (e.g., Agenda, Personal Knowbase,
Zoot; see the discussion of data models below). The primary purpose of all these tools is to offer the
benefits of freeform note-taking with none of the deficiencies: users are
free to brainstorm and jot down anything from bullet points to polished
text, while still being able to search, rearrange, and restructure the
entire notebook easily.
An important subcategory of note-taking tools is outliners (e.g., OmniOutliner),
or applications specifically designed to organize ideas in a
hierarchy. These tools typically show a two-pane display with a tree-like
navigation widget in the left pane and a list of items in the right pane.
Topics and subtopics can be rearranged, and each outline is stored in its own
file. Among the first applications of this kind were TreePad and Dave Winer's ThinkTank. Modern outliners feature the ability to
add graphics and other formatting to an item, and even hyperlinks to
external websites or documents. The once abandoned (but now
resurrected) Ecco system was among the first to allow items
to have typed attributes, displayed in columns. This gives the effect of a
custom spreadsheet per topic, with the topic's items as rows and the
columns as attributes. It allows the user to gracefully introduce structure
to their information as it is identified.
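The column-per-attribute effect described for Ecco can be sketched in Python as below. The item data and attribute names are invented for illustration, and this is not how Ecco itself stored data; the point is that items lacking an attribute simply leave that column blank, so structure can be added gradually.

```python
# Hypothetical items under one outline topic; attribute names are illustrative only.
items = [
    {"item": "Call supplier", "due": "2024-05-01", "priority": 1},
    {"item": "Draft memo", "priority": 2},
    {"item": "Book room", "due": "2024-05-03"},
]


def as_table(rows):
    """Use the union of all attributes as columns; missing values stay blank."""
    columns = ["item"]
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]
    return columns, cells


columns, cells = as_table(items)
for line in [columns, *cells]:
    print(" | ".join(f"{value:<14}" for value in line))
```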
Of particular interest are applications optimized for subsuming portions of
an information space into a PKB, where they can be clustered and
arranged according to the user's own perceptions. The Virtual Notebook
System (VNS) was one of the first to emphasize this.
VNS was designed for sharing information among scientists at the Baylor
College of Medicine; a user's "personal notebook" could make references to
specific sections of a "community notebook," and even include arbitrary
segments of other documents through a cut-and-paste mechanism. More
recently, YellowPen, Cartagio, and
Hunter-Gatherer are tools that allow one to easily
grab snippets of Web pages and then organize them subjectively.
organize documents, rather than personal knowledge derived from those documents. Such systems do not encode
subjective knowledge per se, but they do create a personal knowledge base
of sorts by allowing users to organize and cross-reference their
information artifacts.
These efforts provide alternative indexing
mechanisms to the limited "directory path and file name" approach.
Presto replaces the directory hierarchy entirely with
attributes that users assign to files. These key-value pairs represent
user-perceived properties of the documents, and are used as a flexible
means for retrieval and organization. William Jones' Memory Extender was similar in spirit, but it dynamically varied the "weight" of a
file’s keywords according to the user’s context and perceived access
patterns. In Haystack, users - in conjunction with
automated software agents - build a graph-based network of associative
links through which documents can be retrieved.
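Attribute-based retrieval of the kind Presto uses can be sketched as a simple key/value filter. This is a minimal Python illustration under assumed data; the file names, attributes, and `find` function are hypothetical, and Presto's actual interface is not reproduced here. The same document can be reached through any combination of its attributes rather than through a single directory path.

```python
# Hypothetical files described only by user-assigned key/value attributes (no folders).
documents = {
    "q3-report.pdf": {"project": "apollo", "kind": "report", "status": "final"},
    "notes-0412.txt": {"project": "apollo", "kind": "notes"},
    "budget.xlsx": {"project": "zephyr", "kind": "spreadsheet", "status": "draft"},
}


def find(docs, **criteria):
    """Return the names of documents whose attributes match every key/value pair given."""
    return sorted(
        name for name, attrs in docs.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )


print(find(documents, project="apollo"))
print(find(documents, status="draft"))
```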
Metadata and multiple
categorization can also be applied to provide multiple retrieval paths
customized to the way the individual thinks and works with their
information sources. WebTop allowed the user to create
explicit links between documents, but then also merged these user-defined
relationships with other types of associations. These included the
hyperlinks contained in the documents, associations implied by structural
relationships, and content similarities discovered by text analysis. The
idea was that any way in which items can be considered "related" should be
made available to the user for help with retrieval.
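The WebTop idea of pooling every kind of relatedness can be sketched as merging several association sources into one index while remembering why each pair is related. The sketch covers three of the association kinds mentioned (user links, hyperlinks, and text similarity); the data, the `merge_relations` function, and the choice to treat all associations as symmetric are assumptions for illustration.

```python
from collections import defaultdict


def merge_relations(*sources):
    """Combine (reason, pairs) sources into doc -> {related doc -> [reasons]}."""
    related = defaultdict(lambda: defaultdict(list))
    for reason, pairs in sources:
        for a, b in pairs:
            # Assumption: every association is treated as symmetric for retrieval.
            related[a][b].append(reason)
            related[b][a].append(reason)
    return related


user_links = ("user link", [("proposal.doc", "budget.xls")])
hyperlinks = ("hyperlink", [("proposal.doc", "spec.html")])
similarity = ("text similarity", [("proposal.doc", "budget.xls")])

related = merge_relations(user_links, hyperlinks, similarity)
for doc, reasons in sorted(related["proposal.doc"].items()):
    print(doc, reasons)
```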
A subclass of these systems integrates the user's personal workspace with a
search facility, blurring the distinction between information retrieval and
information organization. SketchTrieve,
DLITE, and Garnet each
materialized elements from the retrieval domain (repositories, queries,
search results) into tangible, manipulable screen objects. These could be
introduced directly into a spatial layout that also included the
information sources themselves. These systems can be seen as combining a
spatial hypertext interface as in VIKI with
direct access to digital library search facilities. NaviQue was largely in the same vein, though it incorporated a powerful
similarity engine to proactively aid the user in organization.
CYCLADES let users organize Web pages into
folders, and then attempted to infer what each folder "means" to that user,
based on a statistical textual analysis of its contents. This helps users
locate other items similar to what's already in a folder, learn what other
users have found interesting and have grouped together, etc.
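CYCLADES's actual statistical analysis is not described here, so the following Python sketch substitutes a common stand-in: a bag-of-words term profile per folder, compared with candidate pages by cosine similarity. All names and data are hypothetical. The profile's most frequent terms hint at what the folder "means," and the similarity score ranks pages that might belong in it.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words term frequencies for a page's text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def folder_profile(pages):
    """Sum the term counts of every page in a folder into one profile."""
    profile = Counter()
    for page in pages:
        profile.update(term_counts(page))
    return profile


folder = ["protein folding simulation", "simulation of protein structures"]
profile = folder_profile(folder)
print("folder seems to be about:", [term for term, _ in profile.most_common(2)])

candidates = {"a": "new protein structure prediction", "b": "football match results"}
ranked = sorted(candidates, key=lambda k: cosine(profile, term_counts(candidates[k])), reverse=True)
print("suggested order:", ranked)
```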
All of these document management systems are principally concerned with
organizing objective information sources rather than the expression of
subjective knowledge. Yet their methods are useful to consider with respect
to PKB systems, because such a large part of our knowledge consists of
things we remember, assimilate, and repurpose from objective sources.
Search environments like SketchTrieve, as well as snippet gatherers like
YellowPen, address an important need in knowledge management: bridging the
divide between the subjective and objective realms, so that the former can
make reference to and bring structure to the latter.
can be classified as follows:
important of which is the underlying data model they support. This is what
prescribes and constrains the nature of the knowledge they can contain:
what types of knowledge elements are allowed, how they can be structured,
and how the user perceives them and can interact with them.
Three aspects of data models can be identified: the structural framework,
which prescribes rules about how knowledge elements can be structured and
interrelated; the knowledge elements themselves, or basic building blocks of
information that a user creates and works with; and schema, which involves
the level of formal semantics introduced into the data model.
prominent PKB systems.
model allow knowledge elements to be organized
into a containment hierarchy, in which each element has one and only one
"parent." This takes advantage of the mind's natural tendency to classify
objects into groups, and to further break up each classification into
subclassifications. It also mimics the way that a document can be broken up
into chapters, sections, and subsections. It tends to be natural for users
to understand.
All of the applications for creating Buzan mind maps
are based on a tree model, because a mind map is a tree.
Each mind map has a "root" element in
the center of the diagram (often called a "main topic") from which all
other elements emanate as descendants. Every knowledge element has one and
only one place in this structure.
Some tools, such as MindManager, extend this paradigm by introducing
"floating topics," which are not anchored to the hierarchy, and permitting
"crosslinks" to arbitrary topics, similar to those in concept maps.
Other examples of tree-based systems are most personalized search
interfaces, outliners (such as OmniOutliner and TreePad), and
most of the "notebook-based" note-taking systems (such as AquaMinds or OneNote). By allowing users to partition their notes into sections
and subsections, note-taking tools channel users into a tree hierarchy. In
recognition of this confining limitation, many of these tools also permit a
kind of "crosslink" between items (such as Micro Logic or MyBase), and/or
employ some form of transclusion (see below) to allow items to co-exist in
several places (such as Zoot or StickyBrain). The dominant paradigm in such
tools, however, remains the simple parent-child hierarchy.
interconnect them in arbitrary ways. The elements of a graph are
traditionally called "vertices," and connected by "arcs," though the
terminology used by graph-based systems varies widely (see Table 1) and the
hypertext community normally uses the terms "nodes" and "links." There are
no restrictions on how many arcs one vertex can have with others, no notion
of a "parent/child" relationship between vertices (unless the user chooses
to label an arc with those semantics), and normally no "root" vertex. In
many systems, arcs can optionally be labeled with a word or phrase
indicating the nature of the relationship, and adorned with arrowheads on
one or both ends to indicate navigability. (Neither of these adornments is
necessary with a tree, since all relationships are implicitly labeled
"parent/child" and are navigable from parent to child.) Note that a graph
is a more general form of a tree, and hence a
strictly more powerful form of expression.
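The claim that a tree is a special case of a graph can be checked mechanically. In this minimal Python sketch (the `is_tree` function and sample data are hypothetical), arcs are labeled triples; a set of arcs forms a tree only if there is exactly one root, every other vertex has exactly one parent, and every vertex is reachable from the root. Adding a single crosslink turns the tree into a general graph.

```python
from collections import defaultdict


def is_tree(nodes, arcs):
    """True if the labeled, directed arcs form one rooted tree over `nodes`."""
    parents, children = defaultdict(list), defaultdict(list)
    for src, _label, dst in arcs:          # labels do not affect the shape test
        parents[dst].append(src)
        children[src].append(dst)
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1 or any(len(parents[n]) > 1 for n in nodes):
        return False
    # With one root and at most one parent per vertex, it is a tree iff every vertex is reachable.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen == set(nodes)


nodes = {"animal", "dog", "cat"}
tree_arcs = [("animal", "subtopic", "dog"), ("animal", "subtopic", "cat")]
graph_arcs = tree_arcs + [("dog", "chases", "cat")]
print(is_tree(nodes, tree_arcs))   # True
print(is_tree(nodes, graph_arcs))  # False: "cat" now has two incoming arcs
```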
This model is the defining characteristic of hypertext systems, including many of those used for document management. It is also the underpinning
of all concept-mapping tools, whether they actually acknowledge the name
"concept maps" or advertise
themselves simply as tools to draw knowledge diagrams (such as Easy-Mapping-Tool or OmniOutliner).
As mentioned previously, graphs draw their power from
the fact that humans are thought to model knowledge as graphs (or
equivalently, semantic networks) internally. In fact, it could be argued
that all human knowledge can be ultimately reduced to a graph of some kind,
which argues strongly for its sufficiency as a structural framework.
An interesting aspect of graph-based systems is whether or not they require
a fully connected graph. A fully connected graph (in graph-theory terms,
simply a connected graph) is one in which every vertex can be reached from any
other by simply performing enough arc traversals. There are no "islands" of
vertices that are severed from each other. Most graph-based tools allow
non-fully-connected graphs: knowledge elements are added to the system,
and connected arbitrarily to each other, without constraint. But a few
tools, such as PersonalBrain and Compendium, actually require a single
network of information in which every
knowledge element must be indirectly connected to every other. If one
attempts to remove the last link that connects a body of nodes to the
original root, the severed elements are either “forgotten” or else moved to
a deleted objects heap where they can only be accessed by restoring a
connection to the rest of the graph.
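The connectivity rule can be sketched as a store that rechecks reachability from the root after every link removal. This illustrates the behavior described above; it is not PersonalBrain's or Compendium's implementation, the `ConnectedGraph` class is hypothetical, and it assumes that connectivity ignores arc direction. It models the "deleted objects heap" variant rather than outright forgetting.

```python
from collections import defaultdict


class ConnectedGraph:
    """Keeps every element reachable from a root; severed elements go to a trash heap."""

    def __init__(self, root):
        self.root = root
        self.nodes = {root}
        self.adj = defaultdict(set)   # undirected: reachability ignores arc direction
        self.trash = set()

    def add(self, node, linked_to):
        if linked_to not in self.nodes:
            raise ValueError("a new element must link to an element already in the network")
        self.nodes.add(node)
        self._link(node, linked_to)

    def _link(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def _reachable(self):
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.adj[node])
        return seen

    def unlink(self, a, b):
        """Remove a link; anything no longer reachable from the root moves to the trash."""
        self.adj[a].discard(b)
        self.adj[b].discard(a)
        reachable = self._reachable()
        severed = self.nodes - reachable
        self.nodes = reachable
        self.trash |= severed
        return severed

    def restore(self, trashed, linked_to):
        """Reconnect a trashed element; its still-linked neighbors come back with it."""
        if linked_to not in self.nodes:
            raise ValueError("can only restore by linking to a live element")
        self._link(trashed, linked_to)
        revived = self._reachable() & self.trash
        self.trash -= revived
        self.nodes |= revived
        return revived


g = ConnectedGraph("Home")
g.add("Projects", "Home")
g.add("PKB", "Projects")
g.add("Memex", "PKB")
severed = g.unlink("Home", "Projects")
print(sorted(severed))   # ['Memex', 'PKB', 'Projects']
g.restore("PKB", "Home")
print(sorted(g.nodes))   # ['Home', 'Memex', 'PKB', 'Projects']
```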
Some hypertext systems add precision to the basic linking mechanism by allowing nodes to
reference not only other nodes, but sections within nodes. This ability is especially useful if the nodes themselves
contain sizeable content, and also for PKB elements making reference to
fragments of objective sources.
advantages in their own right: simplicity, familiarity, ease of navigation,
and the ability to conceal details at any level of abstraction. Indeed, the
problem of “disorientation” in hypertext navigation largely disappears with the tree model; one is never
confused about “where one is” in the larger structure, because traversing
the parent hierarchy gives the context of the larger surroundings. For this
reason, several graph-based systems have incorporated special support for
trees as well, to combine the advantages of both approaches. For instance,
in concept mapping techniques, a generally hierarchical paradigm is
prescribed, after which users are encouraged to identify “crosslinks”
between distant concepts. Similarly, some systems using the mind mapping
paradigm permit arbitrary relationships between nodes.
One of the earliest systems to combine tree and graph primitives was
TEXTNET, which featured two types of nodes: “chunks”
(which contained content to be browsed and organized) and “table of
contents” nodes (or “tocs”.) Any node could freely link to any other,
permitting an unrestricted graph. But a group of tocs could be combined to
form a tree-like hierarchy that bottomed out in various chunk nodes. In
this way, any number of trees could be superimposed upon an arbitrary
graph, allowing it to be viewed and browsed as a tree, with all the
requisite advantages. Strictly speaking, a network of tocs formed a
directed acyclic graph (DAG) rather than a tree. This
means that a “chunk” could be represented in multiple places in the
tree, if two different traversal
paths ended up referring to the same chunk. Note that a
DAG is essentially the result of applying transclusion to the tree model.
NoteCards offered a similar
mechanism, using “FileBoxes” as the tree component that was overlaid upon
the semantic network of notecards.
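The way a network of tocs lets one chunk appear in several places can be sketched in Python as below. The toc and chunk names are invented, and the code assumes (as the DAG structure implies) that tocs never form a cycle; it illustrates the structure and is not a reconstruction of TEXTNET.

```python
from collections import defaultdict

# Hypothetical TEXTNET-style network: each toc lists its children; names not in `tocs` are chunks.
tocs = {
    "Thesis": ["Background", "Methods"],
    "Background": ["memex-chunk", "hypertext-chunk"],
    "Methods": ["hypertext-chunk", "survey-chunk"],  # the same chunk under a second toc
}


def placements(toc, path=()):
    """Yield (toc path, chunk) for every traversal path; assumes the tocs form a DAG (no cycles)."""
    path = path + (toc,)
    for child in tocs[toc]:
        if child in tocs:
            yield from placements(child, path)
        else:
            yield path, child


where = defaultdict(list)
for path, chunk in placements("Thesis"):
    where[chunk].append(" > ".join(path))
for chunk, paths in where.items():
    print(chunk, "appears under", paths)
```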
Brown University’s IGD project explored
various ways to combine and display unrestricted graphs with hierarchy, and
used a visual metaphor of spatial containment to convey both graph and tree
structure. Their notion of “link inheritance” simplifies the
way in which complex dual structures are displayed while still faithfully
depicting their overall trends. Commercially, both PersonalBrain
and Multicentrix provide explicit support for parent/child
relationships in addition to arbitrary connections between elements,
allowing tree and graph notions to coexist. Some note-taking tools, while
essentially tree-based, also permit crosslinks between notes (such as Micro Logic and MyBase).
instead spatial positioning as the sole organizational paradigm.
Capitalizing on humans’ tendency to implicitly organize through
clustering, making piles, and spatially arranging, some tools offer a 2D
workspace for placing and grouping items. This provides a less formal (and
perhaps less intimidating) way for a user to gradually introduce structure
into a set of items as it is discovered.
This approach originated in the spatial hypertext community and was demonstrated
in various projects, such as VIKI/VKB. With these programs, users
place information items on a canvas and can manipulate them to convey
organization imprecisely. Some projects could infer the structure from a user’s freeform layout:
a spatial parser examines which items have been clustered together,
colored or otherwise adorned similarly, etc., and makes judgments about how
to turn these observations into machine-processable assertions. Others (such as Pad) allowed users to view different objects
in varying levels of detail as they panned around the workspace.
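A very simple spatial parser can be sketched as grouping items that share a color and sit close together on the canvas. This is a hypothetical illustration, not VIKI's or VKB's actual parser; the `CanvasItem` type, the 50-unit `max_gap` threshold, and the sample layout are assumptions. Items chained by small gaps end up in one group even if the ends of the chain are farther apart than the threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class CanvasItem:
    name: str
    x: float
    y: float
    color: str


def spatial_clusters(items, max_gap=50.0):
    """Group items of the same color that are chained together by gaps of at most max_gap."""
    unvisited = list(items)
    clusters = []
    while unvisited:
        seed = unvisited.pop(0)
        group, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [
                other for other in unvisited
                if other.color == current.color
                and math.dist((current.x, current.y), (other.x, other.y)) <= max_gap
            ]
            for other in near:
                unvisited.remove(other)
            group.extend(near)
            frontier.extend(near)
        clusters.append(sorted(item.name for item in group))
    return clusters


canvas = [
    CanvasItem("memex", 10, 10, "yellow"),
    CanvasItem("nls", 40, 20, "yellow"),
    CanvasItem("notecards", 60, 40, "yellow"),
    CanvasItem("budget", 300, 300, "blue"),
    CanvasItem("invoice", 320, 310, "blue"),
    CanvasItem("todo", 30, 30, "blue"),
]
for cluster in spatial_clusters(canvas):
    print("group:", cluster)   # each group could become an assertion that its members belong together
```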
Certain note-taking tools (such as OneNote)
combine an overarching tree structure with spatial freedom on each “frame” or “page.”
Users can access a particular page of the notebook with basic search or
tree navigation facilities, and then lay out notes and images on the page
as desired. Many graph-based approaches (such as concept mapping tools) also
allow for arbitrary spatial positioning of elements. This allows both kinds
of relationships to be expressed: explicit links and less formal expression
through creative use of the screen.
terms of their relationships to other elements (as with a tree or graph),
items are simply grouped together in one or more categories, indicating
that they have something in common. This scheme is based on the branch of
pure mathematics called set theory, in which each of a body
of objects either has, or does not have, membership in each of some number
of sets. There is normally no restriction as to how many different
categories a given item can belong to, as is the case with mathematical
sets.
Users may think of categories as collections, in which the category somehow
encloses or “owns” the items within it. Indeed, some systems depict
categories in this fashion, such as the Vista interface where icons standing for documents are enclosed within ovals that
represent categories. This is merely a convention of display, however, and
it is important to note that fundamentally, categories are the same as
simple keywords.
The most popular application to embrace the category approach was the
original Agenda. All information retrieval in Agenda was performed in terms of
category membership. Users specified queries that were lists of categories
to include (or exclude), and only items that satisfied those criteria were
displayed. Agenda was
particularly sophisticated in that the categories themselves formed a tree
hierarchy, rather than a flat namespace. Assigning an item to a category
also implicitly assigned it to all ancestors in the hierarchy.
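Agenda-style retrieval can be sketched with ordinary sets. In this minimal Python illustration the category names, items, and functions are hypothetical, and an item's effective categories include every ancestor of the categories it was assigned to. The query requires every included category and rejects any excluded one; whether Agenda combined included categories this way (AND rather than OR) is an assumption.

```python
# Hypothetical category hierarchy: category -> parent (None at the top level).
parent = {"Work": None, "Clients": "Work", "Acme": "Clients", "Home": None}

# Each item is assigned directly to one or more categories.
items = {
    "Renew Acme contract": {"Acme"},
    "Fix leaking tap": {"Home"},
    "Quarterly review": {"Work"},
}


def with_ancestors(category):
    """A category together with every ancestor above it."""
    result = set()
    while category is not None:
        result.add(category)
        category = parent[category]
    return result


def memberships(categories):
    """Direct assignments plus the ancestors they imply."""
    return set().union(*(with_ancestors(c) for c in categories))


def query(include=(), exclude=()):
    """Items in every included category and in none of the excluded ones."""
    return sorted(
        name for name, cats in items.items()
        if set(include) <= memberships(cats) and not set(exclude) & memberships(cats)
    )


print(query(include=["Work"]))
print(query(include=["Work"], exclude=["Clients"]))
```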
Personal Knowbase
is a more modern commercial product based
solely on a keyword (category) paradigm, though it uses a simple flat
keyword structure rather than an inheritance hierarchy like Agenda.
Haystack and Chandler
are other information
management tools which use categorization in important ways. William Jones’
Memory Extender took an artificial intelligence twist on the
whole notion of keywords/categories, by allowing an item’s keywords to be
weighted, and adjusted over time by both the user and the system. This
allowed the strength of category membership to vary dynamically for each of
an item’s assignments, in an attempt to yield more precise retrieval.
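Weighted category membership can be sketched as a keyword index whose entries carry a strength instead of a yes/no flag. The class below is hypothetical and only illustrates the idea; in particular, the update rule (moving a weight part of the way toward 1 after each use) is an assumption, not Memory Extender's published formula.

```python
from collections import defaultdict


class WeightedIndex:
    """Each item-keyword assignment has a strength between 0 and 1 instead of a yes/no membership."""

    def __init__(self):
        self.weights = defaultdict(dict)  # item -> {keyword: weight}

    def assign(self, item, keyword, weight=0.5):
        """User-set strength for one of an item's keywords."""
        self.weights[item][keyword] = weight

    def reinforce(self, item, keyword, rate=0.2):
        """System adjustment: move a weight part of the way toward 1 after a successful retrieval."""
        current = self.weights[item].get(keyword, 0.0)
        self.weights[item][keyword] = current + rate * (1.0 - current)

    def search(self, *keywords):
        """Rank items by the summed strength of their matching keywords."""
        scores = {
            item: sum(kw.get(k, 0.0) for k in keywords)
            for item, kw in self.weights.items()
        }
        return sorted((item for item, s in scores.items() if s > 0), key=lambda item: -scores[item])


index = WeightedIndex()
index.assign("memex.txt", "hypertext")
index.assign("nls.txt", "hypertext")
index.reinforce("nls.txt", "hypertext")   # 0.5 -> 0.6
index.reinforce("nls.txt", "hypertext")   # 0.6 -> 0.68
print(index.search("hypertext"))
```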
as the principal means of organization and retrieval of personal documents.
In Fertig et al.’s own words:
Documents are thus always ordered and accessed chronologically.
Metadata-based queries on the collection produce “substreams,” or
chronologically ordered subsets of the original documents. The rationale
for time-based ordering is that “time is a natural guide to experience; it
is the attribute that comes closest to a universal skeleton-key for stored
experience.” Whether chronology is our
principal or even a common natural coding mechanism psychologically can be
debated. But since any PKB system can easily create such an index,
it seems worthwhile to follow Lifestreams’ lead and allow the user to sort
and retrieve based on time, as many systems have done. If nothing else, it relieves the user from
having to create names for knowledge elements, since the timestamp is
always an implicit identifying mark. PlanPlus, based on
the Franklin-Covey planner system, is also chronologically modeled, and a
number of products based on other data models (e.g. CircusPonies) offer chronological indexing in addition to their core
paradigm.
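The core of the Lifestreams model, one time-ordered stream plus "substreams" produced by metadata queries, can be sketched in a few lines of Python. The `Entry` type, tags, and sample entries are hypothetical. Because filtering preserves order, every substream stays chronological without any extra sorting.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Entry:
    created: datetime          # the timestamp doubles as an implicit identifier
    text: str
    tags: frozenset = frozenset()


# The main stream: every entry, always kept in chronological order.
stream = sorted(
    [
        Entry(datetime(2024, 3, 2, 9, 0), "Call with publisher", frozenset({"book"})),
        Entry(datetime(2024, 3, 1, 14, 30), "Draft chapter outline", frozenset({"book"})),
        Entry(datetime(2024, 3, 1, 16, 0), "Dentist appointment"),
    ],
    key=lambda e: e.created,
)


def substream(entries, predicate):
    """A metadata query yields a chronologically ordered subset of the stream."""
    return [e for e in entries if predicate(e)]


book = substream(stream, lambda e: "book" in e.tags)
for e in book:
    print(e.created.isoformat(timespec="minutes"), e.text)
```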
Aquanet went far beyond the traditional node-link
graph model. Knowledge expressed in Aquanet is centered around
“relations,” or n-ary links between objects in which the semantics of each
participant in the relation is specified by the relation type. Each type of
relation specifies a physical display (i.e., how it will be drawn on the
screen, and the spatial positioning of each of its participants), and a
number of “slots” into which participants can be plugged in. Each
participant in a relation can be either a base object, or another relation.
Users can thus define a schema of relation types, and then build a complex
semantic model out of relations and objects. Since relation types can be
specified to associate any number of nodes (instead of just two, as in the
graph model), this potentially allows more complex relationships to be
expressed.
It should be noted, however, that the same effect can be
achieved in the basic graph model by simply taking the n-ary relations and
“reifying” them (i.e., turning them into nodes in their own right.) For
instance, suppose we define a relation type “assassination,” with slot
types of “assassin,” “victim,” “location,” and “weapon.” We could then
create a relation based on this type where the participants are “John
Wilkes Booth,” “Abraham Lincoln,” “Ford’s Theatre,” and “derringer.” This
allows us to express a complex relationship between multiple objects in
Aquanet. But we can express the same knowledge with the basic graph model
by simply creating a node called “Lincoln’s assassination” and then
creating typed links between that node and the other four labeled
“assassin,” “victim,” etc. Aquanet’s biggest achievement in this area is
the ability to express the schema of relation types, so that the types of
objects an “assassination” relation can connect are consistent and
enforced.
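The reification described above can be sketched directly: one n-ary relation becomes a node plus one typed link per slot. This is a minimal Python illustration; the `reify` function is hypothetical, and its check that every slot is filled is only a small stand-in for the kind of schema enforcement Aquanet provided.

```python
# Hypothetical relation type with named slots, following the assassination example above.
ASSASSINATION_SLOTS = ("assassin", "victim", "location", "weapon")

lincoln = {
    "assassin": "John Wilkes Booth",
    "victim": "Abraham Lincoln",
    "location": "Ford's Theatre",
    "weapon": "derringer",
}


def reify(node_name, slots, participants):
    """Turn one n-ary relation into a node plus one typed binary link per slot."""
    missing = set(slots) - set(participants)
    if missing:  # a minimal stand-in for schema enforcement: every slot must be filled
        raise ValueError(f"unfilled slots: {sorted(missing)}")
    return [(node_name, slot, participants[slot]) for slot in slots]


for link in reify("Lincoln's assassination", ASSASSINATION_SLOTS, lincoln):
    print(link)
```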
of, and what kind of internal structure, if any, they possess:
types and introduce structure to aspects of the data model. It is a form of
metadata whereby more precise semantics can be applied to various elements
of the system. This facilitates more formal knowledge expression, ensures
consistency across items of the same kind, and makes it easier for automated
agents to process the information.
Both knowledge elements and links can contain various aspects of schema.
In a PKB, a "type system" allows users to specify that a knowledge element is a member of a specific class or category of items, to provide a built-in method of organization and retrieval. Generally speaking, systems can make knowledge elements untyped, rigidly
typed, or flexibly typed. In addition, they can incorporate some notion of
inheritance among elements and their types. Note the distinction between
types and categories here. A category-based scheme typically allows any
number of categories/keywords to be assigned to an item. There are two
differences between this and the notion of type. First, items are normally
restricted to being of a single type, and this usually indicates a more
intrinsic, permanent property of an item than simply its presence in a
category collection. (For example, one could imagine an item called “XYZ
Corporation” shifting into and out of categories like “competitors,”
“overseas distributors,” or “delinquent debtors” over time, but its core
type of “company” would probably be static for all time.) Second, types
often carry structural specifications with them: if an item is of a given
type, this means it will have values for certain attributes appropriate to
that type. Note that some systems that do not allow typing offer the
ability to approximate this function through categories (e.g.,
OneNote, MindManager).
Untyped elements are typical among informal knowledge capture tools, since
they are designed to stimulate brainstorming and help users discover their
nascent mental models. These tools normally want to avoid forcing the user
to commit to structure prematurely. Most mind mapping and many concept
mapping tools are in this category: a concept is simply a word or phrase,
with no other semantic information (e.g., Visual Mind).
Note-taking tools also usually take this approach, with all units of
information being of the same type “note.”
At the other extreme are tools which, like older relational database
technology, require all items to be declared as of a specific type when
they are created. Often this type dictates the internal structure of the
element. These tools are better suited to domains in which the structure
of knowledge to be captured is predictable, well-understood, and known in
advance. For PKB systems, they are probably overly restrictive. KMap and Compendium are examples of tools that allow (and
require) each item to be typed; in their case, the type controls the visual
appearance of the item, rather than any internal structure.
In between these two poles are systems that permit typed and untyped
elements to co-exist. AquaMinds NoteTaker is such a
product; it holds simple free-text pages of notes, without any structure,
but also lets the user define “templates” with predefined fields that can
be used to instantiate uniformly structured forms. TreePad has a similar
feature. Some other systems blur the distinction between typed and untyped,
allowing the graceful introduction of structure as it is discovered.
VKB, for example, supports an elegant, flexible
typing scheme, well suited to PKBs. Items in general consist of an
arbitrary number of attribute/value pairs. But when consistent patterns
emerge across a set of objects, the user can create a type for that group,
and with it a list of expected attributes and default values. This
structure can be selectively overridden by individual objects, however,
which means that even objects assigned to a particular type have flexible
customization available to them. Tinderbox offers an alternate way of
achieving this flexibility, as described below.
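The VKB-style scheme described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `FlexObject`, `FlexType`, and the attribute names are hypothetical, and lookup simply prefers an object's own value over its type's default, which is one plausible reading of "selectively overridden."

```python
from __future__ import annotations

from typing import Any


class FlexType:
    """A type introduced after a pattern emerges: expected attributes plus defaults."""

    def __init__(self, name: str, defaults: dict[str, Any]) -> None:
        self.name = name
        self.defaults = defaults


class FlexObject:
    """An item made of arbitrary attribute/value pairs, optionally typed later."""

    def __init__(self, **attrs: Any) -> None:
        self.attrs: dict[str, Any] = dict(attrs)
        self.type: FlexType | None = None

    def get(self, key: str) -> Any:
        # An object's own value overrides the type's default.
        if key in self.attrs:
            return self.attrs[key]
        if self.type is not None and key in self.type.defaults:
            return self.type.defaults[key]
        raise KeyError(key)


# Two untyped items that turn out to share a pattern (hypothetical data).
items = [FlexObject(title="Notes on productivity", status="read"),
         FlexObject(title="Field guide to birds")]

# The user creates a type for the group after the fact...
book = FlexType("book", {"status": "unread", "format": "paperback"})
for item in items:
    item.type = book

# ...and one object still customizes a typed attribute.
items[1].attrs["format"] = "e-book"
```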
Finally, the object-oriented notion of type inheritance is available in a
few solutions. The different card types in NoteCards are arranged into an
inheritance hierarchy, so that new types can be created as extensions of
old. Aquanet extends this to multiple inheritance among types; the “slots”
that an object contains are those of its type, plus those of all
supertypes. SPRINT and Tinderbox also use a frame-based approach, and allow
default values for attributes to be inherited from supertypes. This way, an
item need not define values for all its attributes explicitly: unless
overridden, an item’s slot will have the shared, default value for all
items of that type.
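The frame-based inheritance described above (slots gathered from all supertypes, with shared defaults that individual items may override) might look like this in a minimal Python sketch. `Frame`, `Instance`, and the slot names are hypothetical, and the rule that the first-listed supertype wins a conflict is an assumption made here, not a documented behavior of Aquanet, SPRINT, or Tinderbox.

```python
from __future__ import annotations

from typing import Any


class Frame:
    """A type that declares slots (with optional defaults) and may have supertypes."""

    def __init__(self, name: str, slots: dict[str, Any],
                 supertypes: tuple[Frame, ...] = ()) -> None:
        self.name = name
        self.slots = slots
        self.supertypes = supertypes

    def all_slots(self) -> dict[str, Any]:
        # Own slots plus those of every supertype (multiple inheritance).
        # Later supertypes are merged first, so earlier ones, and then the
        # frame itself, take precedence on conflicts.
        merged: dict[str, Any] = {}
        for sup in reversed(self.supertypes):
            merged.update(sup.all_slots())
        merged.update(self.slots)
        return merged


class Instance:
    def __init__(self, frame: Frame, **values: Any) -> None:
        unknown = set(values) - set(frame.all_slots())
        if unknown:
            raise ValueError(f"no such slots: {sorted(unknown)}")
        self.frame = frame
        self.values = values

    def __getitem__(self, slot: str) -> Any:
        # Unless overridden, a slot has the shared default from the type hierarchy.
        if slot in self.values:
            return self.values[slot]
        return self.frame.all_slots()[slot]


document = Frame("document", {"author": None, "language": "English"})
dated = Frame("dated", {"year": None})
article = Frame("article", {"venue": None, "peer_reviewed": True},
                supertypes=(document, dated))

a = Instance(article, author="A. Author", year=1945, peer_reviewed=False)
```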
In addition to the structure that is controlled by an item’s
type, other forms of metadata and schema can be applied to knowledge
elements.
Many systems also allow some form of information to be attached to the links that connect
them.
In most of the early hypertext systems, links were unnamed and untyped,
their function being merely to associate two items in an unspecified
manner. The mind mapping paradigm also does not name links, but for a
different reason: the implicit type of every link is one of
generalization/specialization, associating a topic with a subtopic. Hence
specifying types for the links would be redundant, and labeling them would
clutter the diagram.
Concept mapping prescribes the naming of links, such that the precise
nature of the relationship between two concepts is made clear. As mentioned
above, portions of a concept map are meant to be read as English sentences,
with the name of the link serving as a verb phrase connecting the two
concepts. Numerous systems thus allow a word or phrase to decorate the
links connecting elements, for instance Cmap and
Inspiration.
Named links can be distinguished from typed links, however. If the text
attached to a link is an arbitrary string of characters, unrelated to that
of any other link, it can be considered the link name. Some systems,
however, encourage the re-use of link names that the user has defined
previously. In PersonalBrain, for instance, before
specifying the nature of a link, the user must create an appropriate “link
type” (associated with a color to be used in presentation) in the
system-wide database, and then assign that type to the link in question.
This promotes consistency among the names chosen for links, so that the
same logical relationship types will hopefully have the same tags
throughout the knowledge base. This feature also facilitates searches based
on link type, among other things. Other systems, especially those suited
for specific domains such as decision modeling (gIBIS and Decision Explorer), predefine a set of link types that
can be assigned (but not altered) by the user.
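The difference between free-text link names and registered link types can be sketched in Python as below. This is a hypothetical illustration: `LinkRegistry`, `Link`, and the type names are invented, and the `user_extensible` flag stands in for the two policies described above, user-defined types (as in PersonalBrain) versus a predefined, unalterable set (as in gIBIS or Decision Explorer).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkType:
    name: str
    color: str  # presentation hint, as in the PersonalBrain description


class LinkRegistry:
    """System-wide set of link types shared by every link in the knowledge base."""

    def __init__(self, predefined: list[LinkType] | None = None,
                 user_extensible: bool = True) -> None:
        self.types = {t.name: t for t in (predefined or [])}
        self.user_extensible = user_extensible

    def define(self, name: str, color: str) -> LinkType:
        if not self.user_extensible:
            raise PermissionError("link types are fixed by the system")
        # Reusing an existing name returns the same type, promoting consistency.
        return self.types.setdefault(name, LinkType(name, color))

    def get(self, name: str) -> LinkType:
        return self.types[name]


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    type: LinkType


def links_of_type(links: list[Link], name: str) -> list[Link]:
    """Search by link type, which arbitrary link names do not reliably support."""
    return [link for link in links if link.type.name == name]


# User-defined types (PersonalBrain-like policy).
personal = LinkRegistry()
inspired = personal.define("inspired by", "orange")
links = [
    Link("My essay", "As We May Think", inspired),
    Link("My essay", "Reading notes", personal.define("summarizes", "gray")),
    Link("Project idea", "As We May Think", personal.define("inspired by", "red")),
]

# A predefined, unalterable vocabulary (gIBIS-like policy; names are illustrative).
fixed = LinkRegistry([LinkType("responds to", "blue"), LinkType("supports", "green")],
                     user_extensible=False)
```

Note that the second `define("inspired by", ...)` call returns the existing type rather than creating a near-duplicate, which is the consistency benefit the text attributes to reusable link types.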
Some more advanced systems allow links to bear attribute/value pairs
themselves, and even embedded structure, similar to those of the items they
connect. This is the case in Haystack, where links
(“ties”) and nodes (“needles”) are actually defined as subtypes of a common
type (“straw”).
KMap similarly defines a link as a subclass of node, which
allows links to represent n-ary relationships between nodes, and enables
recursive structure within a link itself. It is unclear how much value this
adds in knowledge modeling, or how often users take advantage of such
a feature. Neptune and Intermedia are two older systems that also support attributes for links,
albeit in a simpler manner.
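Treating a link as a kind of node, as described for KMap and Haystack, can be sketched with a small class hierarchy. This Python sketch is hypothetical (the `Node` and `Link` classes are not either system's actual API); it shows how subclassing lets a link carry attributes, join more than two nodes, and itself serve as an endpoint of another link.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    label: str
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Link(Node):
    # A Link is a Node, so it inherits `attributes`; `ends` may hold any
    # number of nodes (an n-ary relationship), including other links.
    ends: list[Node] = field(default_factory=list)


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
meeting = Link("met together", {"when": "last spring"}, ends=[alice, bob, carol])
diary = Node("Trip diary")
recorded = Link("recorded in", ends=[meeting, diary])  # a link about a link
```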
Another aspect of links that generated much fervor in the early hypertext
systems was that of link precision: rather than merely connecting one
element to another, systems like Intermedia defined anchors within
documents, so that a particular snippet within a larger element could be
linked to another snippet. The Dexter model
covers this issue in detail. For PKB purposes, this seems to be most
relevant as regards links to the objective space, as discussed previously.
If the PKB truly contains knowledge, expressed in appropriately
fine-grained parts, then link precision between elements in the knowledge
base is much less of a consideration.
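The anchor idea can be illustrated with a simplified Python sketch in which an anchor is an element identifier plus a character range. This representation is an assumption made for illustration; it is neither Intermedia's implementation nor the Dexter model's full definition of anchors, and `Anchor`, `AnchoredLink`, and `resolve` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    element_id: str
    start: int  # half-open character range [start, end) within the element
    end: int


@dataclass(frozen=True)
class AnchoredLink:
    source: Anchor
    target: Anchor


def resolve(anchor: Anchor, store: dict[str, str]) -> str:
    """Return the snippet of the element's text that the anchor designates."""
    return store[anchor.element_id][anchor.start:anchor.end]


store = {
    "e1": "Man needs to mechanize his record more fully",
    "e2": "overtaxing his limited memory",
}
# Links one word of e1 to one word of e2, rather than whole element to whole element.
link = AnchoredLink(Anchor("e1", 13, 22), Anchor("e2", 0, 10))
```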
Note that this discussion on links has only considered connections between
knowledge elements in the system, where the system has total control over
both ends of the connection. As described in the previous section, numerous
systems provide the ability to “link” from a knowledge element inside the
system to some external resource: a file or a URL, say. These external
links typically cannot be enhanced with any additional information, and
serve only as convenient retrieval paths, rather than as aspects of
knowledge representation.
The architecture of a PKB raises important practical considerations. While not constraining the nature of what knowledge can be
expressed, the architecture nevertheless affects more mundane matters such
as availability and workflow. But even more importantly, the system’s
architecture determines whether it can truly function as a lifelong,
integrated knowledge store – the “base” aspect of the personal knowledge
base defined above.
Many systems use a storage mechanism based on flat files in a filesystem. This is true of
virtually all of the mind mapping tools (MindManager), concept mapping tools (Cmap, Axon,
Inspiration), outliners (TreePad, OmniOutliner), and note-taking tools (OneNote,
HogBay, Zoot), and even a number of hypertext tools (NoteCards, HyperCard,
Tinderbox). Typically, the main “unit” of a user’s
knowledge design – whether that be a mind map, a concept map, an outline,
or a “notebook” – is stored in its own file somewhere in the filesystem.
The application can find and load such files via the familiar “File |
Open...” paradigm, at which point it typically maintains the entire
knowledge structure in memory.
The advantage of such a paradigm is familiarity and ease of use; the
disadvantage is a possibly negative influence on knowledge formulation.
Users must choose one of two basic strategies: either store all of their
knowledge in a single file; or else break up their knowledge and store it
across a number of different files, presumably according to subject matter
and/or time period. The first choice can result in scalability problems:
consider how much knowledge a user might collect over a decade, if they
stored things related to their personal life, hobbies, relationships,
reading materials, vacations, academic course notes, multiple work-related
projects, future planning, etc. It seems unrealistic to keep adding this
kind of volume to a single, ever-growing multi-gigabyte file. The other
option, however, is also constraining: each bit of knowledge can be stored
in only one of the files (or else redundantly, which leads to
synchronization problems), and the user is forced to choose this at
knowledge capture time.
In a database architecture, knowledge elements instead reside in a global space, which allows any idea to relate to any other: now
a user can relate a book he read on productivity not only to other books on
productivity, but also to “that hotel in Orlando that our family stayed in
last spring,” because that is where he remembers having read the book.
Though such a relationship may seem “out of bounds” in traditional
knowledge organization, it is exactly the kind of retrieval path that
humans often employ in retrieving memories. The database architecture enables a
PKB to truly form an integrated knowledge base, and contain the full range
of relationships.
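A minimal sketch of the global-space idea, using Python's built-in `sqlite3` module with an in-memory database: every element lives in one table, and any element may relate to any other through a second table, regardless of subject area. The schema and data are hypothetical and only illustrate the architecture; they are not the schema of any system named here.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE element (
        id     INTEGER PRIMARY KEY,
        label  TEXT NOT NULL,
        domain TEXT            -- subject area, informational only
    );
    CREATE TABLE relation (
        source INTEGER NOT NULL REFERENCES element(id),
        target INTEGER NOT NULL REFERENCES element(id),
        label  TEXT
    );
""")
conn.executemany("INSERT INTO element VALUES (?, ?, ?)", [
    (1, "Book on productivity", "reading"),
    (2, "Another productivity book", "reading"),
    (3, "Hotel in Orlando, family trip last spring", "vacations"),
])
# Nothing confines a relation to one domain, unlike knowledge split across files.
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    (1, 2, "same topic as"),
    (1, 3, "read while staying at"),
])


def related(element_id: int) -> list[tuple[str, str]]:
    """Return (label of related element, relation label) pairs."""
    return conn.execute(
        """SELECT e.label, r.label
           FROM relation AS r JOIN element AS e ON e.id = r.target
           WHERE r.source = ?
           ORDER BY e.id""",
        (element_id,),
    ).fetchall()
```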
Agenda and gIBIS were two
early tools that incorporated a database backend into their architecture. More
recently, the MyLifeBits project uses Microsoft SQL
Server as its storage layer, and Compendium interfaces
with the open source MySQL database. A few note-taking applications such
as StickyBrain also store information in an integrated
database rather than in user-named files. The only significant drawback to
this architectural choice (other than the modest footprint of the database
management system) is that data is more difficult to copy and share across
systems. This is one true advantage of files: it is a simple matter to
copy them across a network, or include them as an e-mail attachment, where
they can be read by the same application on a different machine. This
problem is solved by some of the following architectural choices.
Some systems adopt a client-server design to achieve architectural flexibility. As with all client-server architectures,
the benefits include load distribution, platform interoperability, data
sharing, and ubiquitous availability. Increased complexity and latency are
among the liabilities, which can indeed be considerable factors in PKB
design.
One of the earliest and best examples of a client-server knowledge base was
the Neptune hypertext system. Neptune was
tailored to the task of maintaining shared information within software
engineering teams, rather than to personal knowledge storage, but the
elegant implementation of its “Hypertext Abstract Machine” (HAM) was a
significant and relevant achievement. The HAM was a generic hypertext
storage layer that provided node and link storage and maintained version
history of all changes. Application layers and user interfaces were to be
built on top of the HAM. Architecturally, the HAM provided distributed
network access so that client applications could run from remote locations
and still access the central store. Another, more recent example is the
Scholarly Ontologies Project, whose ClaiMapper and ClaiMaker components form a similar distributed
solution in order to support collaboration.
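The layering described for Neptune, a generic node-and-link store that keeps version history, with applications built on top, can be suggested by a toy Python class. This is a hypothetical sketch, not the HAM's actual interface; it omits the distributed network access entirely and keeps everything in memory.

```python
from __future__ import annotations


class VersionedStore:
    """Toy storage layer: nodes, links, and the full history of node contents."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}  # node id -> contents, oldest first
        self._links: set[tuple[str, str]] = set()

    def put_node(self, node_id: str, content: str) -> int:
        """Store new content; earlier versions are kept. Returns the version number."""
        history = self._versions.setdefault(node_id, [])
        history.append(content)
        return len(history) - 1

    def get_node(self, node_id: str, version: int = -1) -> str:
        return self._versions[node_id][version]

    def history(self, node_id: str) -> list[str]:
        return list(self._versions[node_id])

    def link(self, source: str, target: str) -> None:
        if source not in self._versions or target not in self._versions:
            raise KeyError("both endpoints must exist")
        self._links.add((source, target))

    def links_from(self, source: str) -> list[str]:
        return sorted(t for s, t in self._links if s == source)


# An application layer would call the store rather than managing files itself.
ham = VersionedStore()
ham.put_node("spec", "Draft requirements")
ham.put_node("spec", "Revised requirements")
ham.put_node("design", "Module overview")
ham.link("spec", "design")
```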
These systems implemented a distributed architecture primarily in order to
share data among colleagues. For PKBs, the prime motive is rather user
mobility. This is a key consideration, since if a user is to store all of
their knowledge in a single integrated store, they will certainly need
access to it in a variety of settings. MyBase Networking Edition
is one example of how this might be achieved. A central server hosts the
user’s data, and allows network access from any client machine. Clients can
view the knowledge base from within the MyBase application, or through a
Web browser (with limited functionality).
The Haystack project outlines a three-tiered
architecture, which allows the persistent store, the Haystack data model
itself, and the clients that access it to reside on separate machines. The
interface to the middle tier is flexible enough that a number of different
persistent storage models can be used, including relational databases,
semistructured databases, and object-oriented databases. Presto’s
architecture exhibits similar features.
A web-based architecture is one in which the client system consists of nothing but a (possibly enhanced)
browser. This gives the same ubiquitous availability that client-server
approaches do, while minimizing (or eliminating) the setup and installation
required on each client machine.
KMap was one of the first knowledge systems to
integrate with the World Wide Web. It allowed concept maps to be shared,
edited, and remotely stored using the HTTP protocol. Concept maps were
still created using a standalone client application for the Macintosh, but
they could be uploaded to a central server, and then rendered in browsers
as “clickable GIFs.” Clicking on a concept within the map image in the
browser window would have the same navigation effect as clicking on it
locally inside the client application. Hunter-Gatherer, Cartagio, and NoteStar are more recent
browser-based systems that use proxies or browser plugins to achieve a
knowledge building workspace. The user’s knowledge expressions are stored
on a central server in nearly all cases, rather than locally on the
browser’s machine.
Storing all of one’s personal knowledge on a PDA would solve the availability
problem, of course, and even more completely than would a client-server or
web-based architecture. The safety of the information is an issue, since
if the device were to be lost or destroyed, the user could face irrevocable
data loss; this is easily remedied, however, by periodically synchronizing
the device’s contents with a host computer.
Most handheld applications are simple note-taking software, with far fewer
features than their desktop counterparts. BugMe! is an
immensely popular note-taking tool that simply lets users enter text or
scribble onto “notes” (screenfuls of space) and then organize them in
primitive ways. Screen shots can be captured and included as graphics, and
the tool features an array of drawing tools, clip art libraries, etc. The
value added by this and similar tools is purely the size and convenience of
the handheld device, not the ability to manage large amounts of
information.
Perhaps the most effective use of a handheld architecture would be as a
satellite data capture and retrieval utility. A user would normally employ
a fully functional desktop application for personal knowledge management,
but when “on the go,” they could capture knowledge into a compatible
handheld application and upload it to their PKB at a later convenient time.
To enable mobile knowledge retrieval, either select information would need
to be downloaded to the device before the user needed it, or else a
wireless client-server solution could deliver any part of the PKB on
demand. This is essentially the approach taken by software like
KeySuite, which supplements a feature-rich desktop
information management tool (e.g., Microsoft Outlook) by providing access to that
information on the mobile device. InfoSelect, a tree-based note-taking application from Micro Logic, also offers a mobile product.
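The satellite workflow described above (capture on the device, upload later, and prefetch selected knowledge for offline retrieval) can be sketched as a small queue. The Python below is a hypothetical illustration: a plain list stands in for the desktop PKB, and real synchronization would also have to handle conflicts and failures, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HandheldCapture:
    pending: list[str] = field(default_factory=list)  # captured while "on the go"
    cache: list[str] = field(default_factory=list)    # downloaded for offline retrieval

    def capture(self, note: str) -> None:
        self.pending.append(note)

    def sync(self, pkb: list[str]) -> int:
        """Upload pending notes to the main PKB, clear the queue, return the count."""
        uploaded = len(self.pending)
        pkb.extend(self.pending)
        self.pending.clear()
        return uploaded

    def prefetch(self, pkb: list[str], wanted: Callable[[str], bool]) -> None:
        """Copy selected knowledge to the device before it is needed."""
        self.cache = [note for note in pkb if wanted(note)]


pkb = ["Trip checklist", "Hotel in Orlando: family stayed here last spring"]
device = HandheldCapture()
device.prefetch(pkb, lambda note: "Orlando" in note)
device.capture("Idea from the airport lounge")
uploaded = device.sync(pkb)
```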
Definition
The term personal knowledge base itself was coined in 2011 by Stephen Davies of the University of Mary Washington,
and it has a tripartite definition:
- personal: a PKB is intended for private use, and its contents are custom-tailored to the individual. It contains trends, relationships, categories, and personal observations that its owner perceives but which no one else may agree with. It can be shared, just as one can explain one's own opinion to a hearer, but it is not jointly owned by anyone else any more than explaining one's opinion to a friend causes the friend to own one's mind.
- knowledge: a PKB contains knowledge, not merely information. Its purpose is not simply to aggregate all the information sources one has seen, but to preserve the knowledge that one has learned from those sources. When a user returns to a PKB to retrieve knowledge she has stored, she is not merely pointed back to the original documents, where she must relocate, reread, reparse, and relearn the relevant passages. Instead, she is returned to the distilled version of the particular truth she is seeking, so that the mental model she originally had in mind can be easily reformed.
- base: a PKB is a consolidated, integrated knowledge store. It is a reflection of its owner's memory, which, as Bush and many others have observed, can freely associate any two thoughts together, without restriction. Hence a PKB does not attempt to partition a user's field of knowledge into multiple segments that cannot reference one another. Rather, it can connect any two concepts without regard for artificial boundaries, and acts as a single, unified whole.
Contrast with other classes of systems
The following classes of systems cannot be classified as PKBs:
- collaborative efforts to build a universal objective space (as opposed to an individual's personal knowledge). The World Wide Web itself is in this category, as were its predecessors HyperTIES and Xanadu; so are Web categorization systems like the Open Directory Project and collaborative information collections like Wikipedia.
- search systems like Enfish and the Stuff I’ve Seen project that index and search one’s information sources on demand, but do not give the user the ability to craft and express personal knowledge.
- tools whose goal is to produce a design artifact rather than to maintain knowledge for its own sake. Systems like ART and Writing Environment use intermediate knowledge representations as a means to an end, abandoning them once a final artifact has been produced, and hence are not suitable as PKBs.
- systems that focus on capturing transient information, rather than archiving knowledge that has long-term value. Examples would be Web logs and e-diaries.
- tools whose information domain is mostly limited to time management tasks (calendars, action items, contacts, etc.) rather than “general knowledge.” Blandford and Green and Palen give excellent surveys; common commercial examples would be Microsoft Outlook, Lotus Notes, and Novell Evolution.
- similarly, tools developed for a specific domain, such as bibliographic research, rather than for “general knowledge” (examples: ndxCards and Citation).
Personal Information Management
A PKB is similar to personal information management (PIM), but is a distinct topic based on the "information" vs. "knowledge" difference. PKBs are about recording and managing the knowledge one derives from documents, whereas PIM is more about managing and retrieving the documents themselves.
Historical influences
Non-electronic personal knowledge bases have probably existed in some form since the dawn of written language:
Da Vinci's notebooks are a famous example. More commonly, card files and personal
annotated libraries have served this function in the pre-electronic age.
Bush's Memex
Undoubtedly the most famous early formulation of an electronic PKB was Vannevar Bush's description of the "Memex" in 1945. Bush surveyed the
post-World-War-II landscape and laid out what he viewed as the most
important forthcoming challenges to humankind in The Atlantic Monthly. The Memex was a theoretical (never implemented) design for a
system to help tackle the information overload problem, already
formidable in 1945. In Bush's own words:
Consider a future device for individual use, which is a sort
of mechanized private file and library. ... [A] device in which an individual stores
all his books, records, and communications, and which is mechanized so that
it may be consulted with exceeding speed and flexibility. It is an enlarged
intimate supplement to his memory.
Bush envisioned collaborative aspects as well, and even a world-wide system
that scientists could freely consult. But an important emphasis throughout
the article was on expanding our own powers of recollection: "Man needs to
mechanize his record more fully," he says, if he is not to "become bogged
down...by overtaxing his limited memory." With the Memex, the user could
"add marginal notes and comments," and "build a trail of his interest"
through the larger information space. The user could also share trails with friends,
identify related works, and create personal annotations. Bush's Memex would give each individual the ability to create, categorize, classify, and relate his own set of information
corresponding to his unique personal viewpoint. Much of that information
would in fact consist of bits and pieces from public documents, just
as the majority of the knowledge inside our own heads has been imbibed from
what we read and hear. But the Memex also allowed for the specialized
recording of information that each individual perceived and needed to
retain. The idea of "supplementing our memory" was not a one-size-fits-all
proposition, since no two people have the same interests, opinions, or
memories. Instead, it demanded a subjective expression of knowledge, unique
to each individual.
Graphical knowledge capture tools
Great emphasis is placed on the pictorial nature of diagrams to represent abstract
knowledge; the use of spatial layout, color, and images is said to
strengthen understanding and promote creativity. Each of the three primary
schools (mind mapping, concept mapping, and cognitive mapping) prescribes
its own data model and procedures, and each boasts a number of software
applications designed specifically to create compatible diagrams.
Mind mapping
Mind mapping was promoted by pop psychologist Tony Buzan in the 1960s, and
commands the allegiance of an impressive number of adherents worldwide. A
mind map is essentially nothing more than a visual outline, in which a main
idea or topic is written in the center of the diagram, and subtopics
radiate outwards in increasing levels of specificity. The primary value is
in the freeform, spatial layout (rather than a sequential, numbered
outline), the ability for a software application to hide or reveal select
levels of detail, and as mentioned above, graphical adornments. The basic
data model is a tree, rather than a graph, with all edges implicitly
labeled "supertopic/subtopic." Numerous tools are available for
constructing mind maps (examples: FreeMind, MindMapper, MindGenius, VisiMap, MindManager, NovaMind, HeadCASE, ConceptDraw MINDMAP, and Visual Mind).
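The tree data model can be made concrete with a short Python sketch. The `Topic` class and `outline` function are hypothetical; note that no edge labels are stored at all, since every parent/child edge implicitly means "supertopic/subtopic," and a `collapsed` flag stands in for an application's ability to hide levels of detail.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    text: str
    children: list[Topic] = field(default_factory=list)
    collapsed: bool = False  # hide this topic's subtopics when displaying

    def add(self, text: str) -> Topic:
        child = Topic(text)
        self.children.append(child)  # the edge is implicit: supertopic -> subtopic
        return child


def outline(topic: Topic, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, skipping subtopics of collapsed topics."""
    lines = ["  " * depth + topic.text]
    if not topic.collapsed:
        for child in topic.children:
            lines.extend(outline(child, depth + 1))
    return lines


root = Topic("Vacation planning")
lodging = root.add("Lodging")
lodging.add("Hotel near the beach")
root.add("Budget")
```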
Concept mapping
Concept maps were developed by Cornell Professor Joseph Novak, and are based on David Ausubel's
assimilation theory of learning. An essential tenet is that
newly encountered knowledge must be related to one’s prior knowledge in
order to be properly understood. Concept maps help depict such connections
graphically. Like mind maps, they feature evocative words or phrases in
boxes connected by lines. There are two principal differences, however:
first, a concept map is properly a graph, not a tree, permitting arbitrary
links between nodes rather than only parent/child relationships; and
second, the links are labeled to identify the nature of the inter-concept
relationship, typically with a verb phrase. In this way, the links on a
diagram can be read as English sentences, with the upstream node as the
subject and the downstream node as the direct object of the sentence.
There are many applications available that could be used for drawing these
diagrams, not all of which directly acknowledge their support for concept
maps in particular.
(Examples: Axon, SMART Ideas concept-mapping, Mind Pad, and MindFull)
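A concept map's two distinguishing features, arbitrary graph links and verb-phrase labels that read as sentences, can be shown in a few lines of Python. The `ConceptMap` class and the example concepts are hypothetical; each edge is stored as a (subject, verb phrase, object) triple.

```python
from __future__ import annotations


class ConceptMap:
    def __init__(self) -> None:
        # Directed, labeled edges: (upstream concept, verb phrase, downstream concept).
        self.edges: list[tuple[str, str, str]] = []

    def relate(self, subject: str, verb_phrase: str, obj: str) -> None:
        self.edges.append((subject, verb_phrase, obj))

    def sentences(self) -> list[str]:
        """Read each link as a sentence: subject, verb phrase, direct object."""
        return [f"{s.capitalize()} {v} {o}." for s, v, o in self.edges]

    def sources_of(self, concept: str) -> list[str]:
        """Concepts linking into `concept`; a tree would allow at most one."""
        return [s for s, _, o in self.edges if o == concept]


cm = ConceptMap()
cm.relate("plants", "need", "sunlight")
cm.relate("sunlight", "drives", "photosynthesis")
cm.relate("plants", "carry out", "photosynthesis")
cm.relate("photosynthesis", "produces", "oxygen")
```

Here "photosynthesis" has two incoming links, which a mind map's tree model could not express.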
Note that a concept map is virtually identical to the notion of a
"semantic network," which has served as a cornerstone for
much artificial intelligence work since its inception. Semantic networks,
too, are directed graphs in which the nodes represent concepts and labeled
edges the relationships between them. Much psychology research has
strengthened the idea that the human mind internalizes knowledge in
something very like this sort of framework. If so, that may explain the ease
with which newcomers have adopted concept mapping techniques, since concept
maps and semantic networks can be considered equivalent.
Cognitive mapping
Cognitive mapping, developed by Fran Ackermann and Colin Eden at the University of Strathclyde, uses the same data model as does concept
mapping, but with a new set of techniques. In cognitive maps, element names
have two parts, separated by an ellipsis that is read "as opposed to" in
order to further clarify the semantics of the node. ("Cold...hot" is
different from "cold...freezing," for example.) Links are of three types
(causal, temporal, and connotative), the first of which is the most common and
is read as "may lead to." Cognitive mapping is generally best suited to
domains involving arguments and decision making. It is not
nearly as widespread as the other two paradigms; the leading
application is Decision Explorer. Together, these and related methods have
brought into the mainstream the idea of breaking down knowledge into its
fundamental elements, and representing them graphically. Students and
workers from widely diverse backgrounds have experienced success in better
articulating and examining their own knowledge, and in discovering how it
relates to what else they know. Although architectural considerations
prevent any of these tools from functioning as bona fide PKBs, the ideas
they have contributed to front-end interface design can hardly be
overstated.
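The two-part element names and typed links of cognitive mapping can be sketched as follows. This is a minimal illustration, not Decision Explorer's data format: the names `Construct`, `LinkType`, and `read_link` are hypothetical, and only the causal reading "may lead to" comes from the text; the readings given for temporal and connotative links are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    CAUSAL = "may lead to"  # the most common link type
    # Readings for the other two types are illustrative assumptions.
    TEMPORAL = "is followed by"
    CONNOTATIVE = "is associated with"


@dataclass(frozen=True)
class Construct:
    """A two-part element name: a pole and its 'as opposed to' contrast."""

    pole: str
    contrast: str

    @classmethod
    def parse(cls, name: str) -> "Construct":
        pole, _, contrast = name.partition("...")
        return cls(pole.strip(), contrast.strip())

    def read(self) -> str:
        return f"{self.pole} (as opposed to {self.contrast})"


def read_link(source: Construct, kind: LinkType, target: Construct) -> str:
    return f"{source.read()} {kind.value} {target.read()}"


# Same first pole, different contrast: two distinct elements.
print(Construct.parse("Cold...hot") == Construct.parse("Cold...freezing"))  # False
print(read_link(Construct.parse("Raise prices...hold prices"),
                LinkType.CAUSAL,
                Construct.parse("lower sales...steady sales")))
```

Making the element frozen means equality covers both halves of the name, which is exactly what distinguishes "cold...hot" from "cold...freezing."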
Hypertext systems
Many in the hypertext community reference Vannevar Bush's article as
the cornerstone of their heritage. Hence the development of hypertext
techniques, while seldom applied specifically to PKB solutions, is
important. There have basically been three types of hypertext systems:
those that exploit features of non-linear text to create a dynamic but
coherent "hyperdocument"; those
that prescribe ways of linking existing documents together for navigation
and expression of affinities; and those that use the hypertext model specifically
to model abstract knowledge. Though the first and especially
the second category have dominated research efforts (and public enthusiasm)
over the past several decades, it is this third class that is closest in
spirit to the original vision of hypertext by its founders.
In a similar vein to Bush, Doug Engelbart's focus was to
develop computer systems to "help people think better." He sought data
models that more closely paralleled the human thought process, and settled on using hypertext as a way
to represent and store abstract human knowledge. Although his
"Augment" system underwent many changes, the original
purpose closely aligned with that of PKBs.
More recently, Randall Trigg's TextNet and NoteCards
systems further explored this idea.
TextNet revolved around "primitive pieces of text connected with typed
links to form a network similar in many ways to a semantic
network." Though text-centric, it was clear that Trigg's goal
was to model the associations between primitive ideas and hence to reflect
the mind's understanding. "By using...structure, meaning can be extracted
from the relationships between chunks (small pieces of text) rather than
from the words making them up." The subsequent NoteCards
effort was similarly designed to "formulate, structure, compare, and
manage ideas." It was useful for "analyzing information, constructing
models, formulating arguments, designing artifacts, and generally
processing ideas."
Conklin and Begeman's gIBIS system was another early foray into true
knowledge representation, specifically for the field of design
deliberations and arguments. The project lived on
in the later project QuestMap and the more modern
Compendium, which has been
primarily used for capturing group knowledge expressed in face-to-face
meetings. In all these cases, systems use semantic hypertext in an attempt
to capture shared knowledge in its most basic form. Other examples of
knowledge-based hypertext tools include Mental Link,
Aquanet, and SPRINT, as well
as a few current commercial tools such as PersonalBrain and Tinderbox.
Note-taking applications
Note-taking applications allow a user to create
snippets of text and then organize or categorize
them in some way. These tools can be used to form PKBs composed of such text snippets.
Most of these tools are based on a tree
hierarchy, in which the user can write pages
of notes and then organize them into sections and subsections (as, for
instance, with HogBay, OneNote,
CircusPonies, and AquaMinds). The higher-level
sections or chapters often receive a colored tab, just as they would in a
physical three-ring notebook. Other designers eschew the tree model
for a more flexible category-based approach (e.g., Agenda, Personal Knowbase,
and Zoot; see the section on data
models). The primary purpose of all these tools is to offer the
benefits of freeform note-taking with none of the deficiencies: users are
free to brainstorm and jot down anything from bullet points to polished
text, while still being able to search, rearrange, and restructure the
entire notebook easily.
An important subcategory of note-taking tools is outliners (e.g., OmniOutliner),
or applications specifically designed to organize ideas in a
hierarchy. These tools typically show a two-pane display with a tree-like
navigation widget in the left pane and a list of items in the right pane.
Topics and subtopics can be rearranged, and each outline is stored in its own
file. Among the first applications of this kind were TreePad and Dave Winer's ThinkTank. Modern outliners feature the ability to
add graphics and other formatting to an item, and even hyperlinks to
external websites or documents. The once-abandoned (but now
resurrected) Ecco system was among the first to allow items
to have typed attributes, displayed in columns. This gives the effect of a
custom spreadsheet per topic, with the topic's items as rows and the
attributes as columns. It lets users introduce structure into
their information gradually, as they identify it.
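The "custom spreadsheet per topic" idea can be sketched in a few lines. This is not Ecco's actual design; `Topic`, `Item`, `add_column`, and `set_value` are hypothetical names. The point it illustrates is that typed columns can be declared after items already exist, so structure arrives only when the user notices it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Item:
    text: str
    attrs: dict[str, Any] = field(default_factory=dict)


@dataclass
class Topic:
    name: str
    items: list[Item] = field(default_factory=list)
    columns: dict[str, type] = field(default_factory=dict)  # attribute name -> type

    def add_column(self, name: str, kind: type) -> None:
        # Structure can be introduced at any time, after items already exist.
        self.columns[name] = kind

    def set_value(self, item: Item, column: str, value: Any) -> None:
        kind = self.columns[column]  # KeyError if the column was never declared
        if not isinstance(value, kind):
            raise TypeError(f"{column!r} expects {kind.__name__}")
        item.attrs[column] = value

    def table(self) -> list[list[Any]]:
        # Items become rows and attributes become columns; unset cells are None.
        header = ["item", *self.columns]
        rows = [[it.text, *(it.attrs.get(c) for c in self.columns)] for it in self.items]
        return [header, *rows]


reading = Topic("Reading list")
memex, notecards = Item("As We May Think"), Item("NoteCards paper")
reading.items += [memex, notecards]
reading.add_column("year", int)  # added later, once the user sees the need
reading.set_value(memex, "year", 1945)
for row in reading.table():
    print(row)
```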
Of particular interest are applications optimized for subsuming portions of
the information space into a PKB, where they can be clustered and
arranged according to the user's own perceptions. The Virtual Notebook
System (VNS) was one of the first to emphasize this.
VNS was designed for sharing information among scientists at the Baylor
College of Medicine; a user's "personal notebook" could make references to
specific sections of a "community notebook," and even include arbitrary
segments of other documents through a cut-and-paste mechanism. More
recently, YellowPen, Cartagio, and
Hunter-Gatherer are tools that allow one to easily
grab snippets of Web pages and then organize them subjectively.
Document management systems
Another influence on PKBs comes from systems whose primary purpose is to help users organize documents, rather than the personal knowledge derived from those documents. Such systems do not encode
subjective knowledge per se, but they do create a personal knowledge base
of sorts by allowing users to organize and cross-reference their
information artifacts.
These efforts provide alternative indexing
mechanisms to the limited "directory path and file name" approach.
Presto replaces the directory hierarchy entirely with
attributes that users assign to files. These key-value pairs represent
user-perceived properties of the documents, and are used as a flexible
means for retrieval and organization. William Jones' Memory Extender was similar in spirit, but it dynamically varied the "weight" of a
file’s keywords according to the user’s context and perceived access
patterns. In Haystack, users, in conjunction with
automated software agents, build a graph-based network of associative
links through which documents can be retrieved.
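The contrast between a single directory path and Presto-style attributes can be shown with a small sketch. This is not Presto's interface; `Document` and `find_documents` are hypothetical. Each document carries user-assigned key-value pairs, and any combination of them acts as a retrieval path.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    # User-assigned key-value attributes take the place of a directory path.
    attributes: dict[str, str] = field(default_factory=dict)


def find_documents(docs: list[Document], **criteria: str) -> list[str]:
    """Names of documents whose attributes match every key=value criterion."""
    return [d.name for d in docs
            if all(d.attributes.get(k) == v for k, v in criteria.items())]


docs = [
    Document("draft.txt", {"project": "pkb", "status": "draft"}),
    Document("final.pdf", {"project": "pkb", "status": "final"}),
    Document("taxes.xls", {"project": "home", "status": "final"}),
]
print(find_documents(docs, project="pkb"))                  # ['draft.txt', 'final.pdf']
print(find_documents(docs, status="final"))                 # ['final.pdf', 'taxes.xls']
print(find_documents(docs, project="pkb", status="final"))  # ['final.pdf']
```

The same file, `final.pdf`, is reachable through several attribute combinations, whereas a directory tree gives it exactly one location.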
Metadata and multiple
categorization can also be applied to provide multiple retrieval paths
customized to the way the individual thinks and works with their
information sources. WebTop allowed the user to create
explicit links between documents, but then also merged these user-defined
relationships with other types of associations. These included the
hyperlinks contained in the documents, associations implied by structural
relationships, and content similarities discovered by text analysis. The
idea was that any way in which items can be considered "related" should be
made available to the user for help with retrieval.
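WebTop's idea of merging every available notion of "related" into one index might look roughly like this. It is a hypothetical sketch: `merge_relations` and the relation kinds are invented for illustration, and treating every association as symmetric is a simplifying assumption. Each neighbor is kept together with the kind of association that produced it, so the user can see why two items are related.

```python
from collections import defaultdict


def merge_relations(*sources: tuple[str, list[tuple[str, str]]]) -> dict[str, set[tuple[str, str]]]:
    """Union several kinds of relatedness into one index.

    Each source is (kind, pairs of related items); the result maps each item
    to a set of (related item, kind) so the user can see why items are related.
    """
    related: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for kind, pairs in sources:
        for a, b in pairs:
            # Simplification: every association is treated as symmetric.
            related[a].add((b, kind))
            related[b].add((a, kind))
    return related


index = merge_relations(
    ("user link", [("notes.txt", "paper.pdf")]),
    ("hyperlink", [("paper.pdf", "homepage.html")]),
    ("same folder", [("notes.txt", "todo.txt")]),
    ("similar text", [("paper.pdf", "review.pdf")]),
)
print(sorted(index["paper.pdf"]))
```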
A subclass of these systems integrates the user's personal workspace with a
search facility, blurring the distinction between information retrieval and
information organization. SketchTrieve,
DLITE, and Garnet each
materialized elements from the retrieval domain (repositories, queries,
search results) into tangible, manipulable screen objects. These could be
introduced directly into a spatial layout that also included the
information sources themselves. These systems can be seen as combining a
spatial hypertext interface as in VIKI with
direct access to digital library search facilities. NaviQue was largely in the same vein, though it incorporated a powerful
similarity engine to proactively aid the user in organization.
CYCLADES let users organize Web pages into
folders, and then attempted to infer what each folder "means" to that user,
based on a statistical textual analysis of its contents. This helped users
locate other items similar to what was already in a folder, and learn what other
users had found interesting and grouped together.
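One simple way to approximate what a folder "means" is to pool the term counts of its pages and rank other pages by cosine similarity to that profile. This is a swapped-in textbook technique (bag-of-words with cosine similarity), not CYCLADES' actual method, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    # Bag-of-words term counts; a real system would also weight and stem terms.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def folder_profile(pages: list[str]) -> Counter:
    # What the folder "means", approximated as the summed term counts of its pages.
    profile: Counter = Counter()
    for page in pages:
        profile.update(term_vector(page))
    return profile


def rank_candidates(folder: list[str], candidates: dict[str, str]) -> list[tuple[str, float]]:
    # Candidates most similar to the folder's profile come first.
    profile = folder_profile(folder)
    scores = {name: cosine(profile, term_vector(text)) for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_candidates(
    folder=["hypertext links and nodes", "spatial hypertext and links"],
    candidates={"survey": "a survey of hypertext nodes", "bread": "recipes for bread"},
)
print(ranked[0][0])  # survey
```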
All of these document management systems are principally concerned with
organizing objective information sources rather than the expression of
subjective knowledge. Yet their methods are useful to consider with respect
to PKB systems, because such a large part of our knowledge consists of
things we remember, assimilate, and repurpose from objective sources.
Search environments like SketchTrieve, as well as snippet gatherers like
YellowPen, address an important need in knowledge management: bridging the
divide between the subjective and objective realms, so that the former can
make reference to and bring structure to the latter.
Claims and Benefits
PKB systems make various claims about the advantages of using them. These can be classified as follows:
- Knowledge generation and formulation. Here the emphasis is on procedure, not persistence; it is simply the act of using the tool to express one’s knowledge that helps, rather than the ability to retrieve it later.
- Knowledge capture. PKBs do not merely allow one to express knowledge, but also to capture it before it elusively disappears. Often the emphasis is on a streamlined user interface, with few distractions and little encumbrance. The point is to lower the burden of jotting down one's thoughts so that neither task nor thought process is interrupted.
- Knowledge organization. A 2003 study on note-taking habits found that "better organization" was the most commonly desired improvement in people's own information recording practices.
- Knowledge management and retrieval. Perhaps the most critical aspect of a PKB is that the knowledge it stores is permanent and accessible, ready to be retrieved at any later time.
- Integrating heterogeneous sources. Recognizing that the knowledge people form comes from a variety of different places, many PKB systems emphasize that the information from diverse sources and of different types can be integrated into a single database and interface.
Data models
PKB systems can be compared along a number of different axes, the most important of which is the underlying data model they support. This is what
prescribes and constrains the nature of the knowledge they can contain:
what types of knowledge elements are allowed, how they can be structured,
and how the user perceives them and can interact with them.
Three aspects of data models can be identified: the structural framework, which prescribes rules
about how knowledge elements can be structured and interrelated; the knowledge
elements themselves, or basic building blocks of information that a user creates
and works with; and the
schema, which involves the level of formal semantics introduced into
the data model.
Structural frameworks
The following structural frameworks have been featured in one or more prominent PKB systems.
Tree
Systems that support a tree
model allow knowledge elements to be organized
into a containment hierarchy, in which each element has one and only one
"parent." This takes advantage of the mind's natural tendency to classify
objects into groups, and to further break up each classification into
subclassifications. It also mimics the way that a document can be broken up
into chapters, sections, and subsections. Users generally find it easy
to understand.
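A containment hierarchy with exactly one parent per element can be sketched as below. The `Node` class is hypothetical. Moving a node detaches it from its old parent, and walking up the parent chain recovers the node's full context, which is part of why trees feel natural to navigate.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison: two notes may share a title
class Node:
    title: str
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)

    def add(self, title: str) -> Node:
        child = Node(title, parent=self)
        self.children.append(child)
        return child

    def move_to(self, new_parent: Node) -> None:
        # Exactly one parent: moving a node detaches it from its old parent.
        # (A fuller version would refuse to move a node beneath its own descendant.)
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        new_parent.children.append(self)

    def path(self) -> list[str]:
        # Walking up the parent chain gives the node's full context.
        node: Optional[Node] = self
        titles: list[str] = []
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return titles[::-1]


root = Node("Notebook")
work, home = root.add("Work"), root.add("Home")
idea = work.add("PKB ideas")
print(idea.path())  # ['Notebook', 'Work', 'PKB ideas']
idea.move_to(home)
print(idea.path())  # ['Notebook', 'Home', 'PKB ideas']
```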
All of the applications for creating Buzan mind maps
are based on a tree model, because a mind map is a tree.
Each mind map has a "root" element in
the center of the diagram (often called a "main topic") from which all
other elements emanate as descendants. Every knowledge element has one and
only one place in this structure.
Some tools, such as MindManager,
extend this paradigm by introducing "floating topics," which are not
anchored to the hierarchy, and permitting "crosslinks" to arbitrary topics,
similar to those in concept maps.
Other examples of tree-based systems are most personalized search
interfaces, outliners such as OmniOutliner and TreePad, and
most of the "notebook-based" note-taking systems (such as AquaMinds or OneNote). By letting users partition their notes into sections
and subsections, note-taking tools channel them into a tree hierarchy.
Recognizing this limitation, many of these tools also permit a
kind of "crosslink" between items (such as Micro Logic or MyBase) and/or
employ some form of transclusion (see below) to allow items to co-exist in
several places (such as Zoot or StickyBrain). The dominant paradigm in such
tools, however, remains the simple parent-child hierarchy.
Graph
Graph-based systems allow users to create knowledge elements and then to interconnect them in arbitrary ways. The elements of a graph are
traditionally called "vertices," and connected by "arcs," though the
terminology used by graph-based systems varies widely (see Table 1) and the
hypertext community normally uses the terms "nodes" and "links." There are
no restrictions on how many arcs one vertex can have with others, no notion
of a "parent/child" relationship between vertices (unless the user chooses
to label an arc with those semantics), and normally no "root" vertex. In
many systems, arcs can optionally be labeled with a word or phrase
indicating the nature of the relationship, and adorned with arrowheads on
one or both ends to indicate navigability. (Neither of these adornments is
necessary with a tree, since all relationships are implicitly labeled
"parent/child" and are navigable from parent to child.) Note that a graph
is a more general form of a tree, and hence a
strictly more powerful form of expression.
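The graph model described above can be sketched with arcs that carry an optional label and may be navigable in one or both directions. The `Graph` and `Arc` names are hypothetical. Note that there is no root and no limit on arcs per vertex, and a one-way arc is not followed backwards.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arc:
    source: str
    target: str
    label: str = ""              # optional: the nature of the relationship
    bidirectional: bool = False  # arrowheads on both ends


@dataclass
class Graph:
    vertices: set[str] = field(default_factory=set)
    arcs: list[Arc] = field(default_factory=list)

    def connect(self, source: str, target: str, label: str = "", bidirectional: bool = False) -> None:
        # No limit on arcs per vertex, no parent/child notion, no root.
        self.vertices |= {source, target}
        self.arcs.append(Arc(source, target, label, bidirectional))

    def neighbors(self, vertex: str) -> list[tuple[str, str]]:
        """Vertices one navigable step away from `vertex`, with the arc label."""
        out = []
        for arc in self.arcs:
            if arc.source == vertex:
                out.append((arc.target, arc.label))
            elif arc.bidirectional and arc.target == vertex:
                out.append((arc.source, arc.label))
        return out


g = Graph()
g.connect("Memex", "hypertext", "anticipated")
g.connect("NoteCards", "hypertext", "is an example of")
g.connect("TextNet", "NoteCards", "is related to", bidirectional=True)
print(g.neighbors("NoteCards"))  # [('hypertext', 'is an example of'), ('TextNet', 'is related to')]
print(g.neighbors("hypertext"))  # [] (its incoming arcs are one-way)
```

A tree is the special case in which every arc is implicitly labeled "parent/child" and each vertex has at most one incoming arc.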
System | Vertex | Arc | Graph |
---|---|---|---|
Axon Idea Processor | object | link | diagram |
Banxia Decision Explorer | concept | link | view |
Compendium | node | link | view |
Haystack | needle | tie | bale |
Idea Graph | idea | connection | ideagraph |
Knowledge Manager | concept | relation | map |
MyLifeBits | resource | link/annotation | story |
NoteCards | note card | link | browser |
PersonalBrain | thought | link | brain |
RecallPlus | idea | association | diagram |
SMART Ideas | symbol | connector | level |
This model is the defining characteristic of hypertext systems, including many of those used for document management. It is also the underpinning
of all concept-mapping tools, whether they actually acknowledge the name
"concept maps" or advertise
themselves simply as tools to draw knowledge diagrams (such as Easy-Mapping-Tool or OmniOutliner). As mentioned previously, graphs draw their power from
the fact that humans are thought to model knowledge as graphs (or
equivalently, semantic networks) internally. In fact, it could be argued
that all human knowledge can ultimately be reduced to a graph of some kind,
which, if true, would make a strong case for its sufficiency as a structural framework.
An interesting aspect of graph-based systems is whether or not they require
a connected graph. A
connected graph is one in which every vertex can be reached from any
other by simply performing enough arc traversals. There are no "islands" of
vertices that are severed from each other. Most graph-based tools allow
disconnected graphs: knowledge elements are added to the system,
and connected arbitrarily to each other, without constraint. But a few
tools, such as PersonalBrain and Compendium, actually require a single network of information in which every
knowledge element must be indirectly connected to every other. If one
attempts to remove the last link that connects a body of nodes to the
original root, the severed elements are either “forgotten” or else moved to
a deleted objects heap where they can only be accessed by restoring a
connection to the rest of the graph.
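The "deleted objects heap" behavior can be modeled with a reachability check from the root. The following is a hypothetical sketch, not how PersonalBrain or Compendium are implemented: links are treated as undirected for connectivity, a breadth-first search finds everything reachable, and whatever is not reachable forms the heap until a new link reconnects it.

```python
from collections import deque


class ConnectedGraph:
    """Sketch of a graph in which every node must stay reachable from a root."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.edges: dict[str, set[str]] = {root: set()}

    def reachable(self) -> set[str]:
        # Breadth-first search from the root, ignoring link direction.
        seen, queue = {self.root}, deque([self.root])
        while queue:
            for nxt in self.edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def deleted_heap(self) -> set[str]:
        # Nodes cut off from the root: kept, but inaccessible until reconnected.
        return set(self.edges) - self.reachable()

    def link(self, a: str, b: str) -> None:
        live = self.reachable()
        if a not in live and b not in live:
            raise ValueError("a new link must touch the connected network")
        for node in (a, b):
            self.edges.setdefault(node, set())
        self.edges[a].add(b)
        self.edges[b].add(a)

    def unlink(self, a: str, b: str) -> set[str]:
        self.edges[a].discard(b)
        self.edges[b].discard(a)
        return self.deleted_heap()  # everything currently cut off from the root


g = ConnectedGraph("Home")
g.link("Home", "Projects")
g.link("Projects", "PKB")
g.link("PKB", "Memex")
print(sorted(g.unlink("Home", "Projects")))  # ['Memex', 'PKB', 'Projects']
g.link("Home", "PKB")  # restoring a connection brings them all back
print(g.deleted_heap())  # set()
```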
Some hypertext systems add precision to the basic linking mechanism by allowing nodes to
reference not only other nodes, but sections within nodes. This ability is especially useful if the nodes themselves
contain sizeable content, and also for PKB elements making reference to
fragments of objective sources.
Tree plus graph
Although graphs are a strict superset of trees, trees offer some important advantages in their own right: simplicity, familiarity, ease of navigation,
and the ability to conceal details at any level of abstraction. Indeed, the
problem of “disorientation” in hypertext navigation largely disappears with the tree model; one is never
confused about “where one is” in the larger structure, because traversing
the parent hierarchy gives the context of the larger surroundings. For this
reason, several graph-based systems have incorporated special support for
trees as well, to combine the advantages of both approaches. For instance,
in concept mapping techniques, a generally hierarchical paradigm is
prescribed, after which users are encouraged to identify “crosslinks”
between distant concepts. Similarly, some systems using the mind mapping
paradigm permit arbitrary relationships between nodes.
One of the earliest systems to combine tree and graph primitives was
TEXTNET, which featured two types of nodes: “chunks”
(which contained content to be browsed and organized) and “table of
contents” nodes (or “tocs”.) Any node could freely link to any other,
permitting an unrestricted graph. But a group of tocs could be combined to
form a tree-like hierarchy that bottomed out in various chunk nodes. In
this way, any number of trees could be superimposed upon an arbitrary
graph, allowing it to be viewed and browsed as a tree, with all the
requisite advantages. Strictly speaking, a network of tocs formed a
DAG (directed acyclic graph) rather than a tree. This
means that a “chunk” could be represented in multiple places in the
hierarchy, if two different traversal
paths ended up referring to the same chunk. Note that a
DAG is essentially the result of applying transclusion to the tree model.
NoteCards offered a similar
mechanism, using “FileBoxes” as the tree component that was overlaid upon
the semantic network of notecards.
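The effect of tocs forming a DAG can be seen by enumerating every path from a top-level toc down to a chunk. In this hypothetical sketch the hierarchy is a plain dictionary mapping each toc to the nodes it lists, with chunks as leaves. The same chunk shows up under two different tocs, which is transclusion in miniature.

```python
def paths_to(tocs: dict[str, list[str]], start: str, target: str) -> list[list[str]]:
    """Every toc path from `start` down to `target`.

    `tocs` maps each toc node to the nodes it lists; chunks are leaves.
    Because the tocs form a DAG (no cycles), the recursion terminates.
    """
    if start == target:
        return [[target]]
    found = []
    for child in tocs.get(start, []):
        for rest in paths_to(tocs, child, target):
            found.append([start, *rest])
    return found


tocs = {
    "Thesis": ["Background", "Design"],
    "Background": ["chunk: Memex"],
    "Design": ["chunk: Memex", "chunk: link types"],
}
for path in paths_to(tocs, "Thesis", "chunk: Memex"):
    print(" > ".join(path))
```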
Brown University’s IGD project explored
various ways to combine and display unrestricted graphs with hierarchy, and
used a visual metaphor of spatial containment to convey both graph and tree
structure. Their notion of “link inheritance” simplifies the
way in which complex dual structures are displayed while still faithfully
depicting their overall trends. Commercially, both PersonalBrain
and Multicentrix provide explicit support for parent/child
relationships in addition to arbitrary connections between elements,
allowing tree and graph notions to coexist. Some note-taking tools, while
essentially tree-based, also permit crosslinks between notes (such as Micro Logic and MyBase).
Spatial
Some designers have shunned links between elements altogether, favoring instead spatial positioning as the sole organizational paradigm.
Capitalizing on humans' tendency to implicitly organize through
clustering, making piles, and spatially arranging, some tools offer a 2D
workspace for placing and grouping items. This provides a less formal (and
perhaps less intimidating) way for a user to gradually introduce structure
into a set of items as it is discovered.
This approach originated in the spatial hypertext community and was demonstrated
in various projects, including VIKI/VKB.
With these programs, users
place information items on a canvas and can manipulate them to convey
organization imprecisely. Some projects could infer structure from a user’s freeform layout:
a spatial parser examines which items have been clustered together,
colored or otherwise adorned similarly, etc., and makes judgments about how
to turn these observations into machine-processable assertions. Others (such as Pad) allowed users to view different objects
in varying levels of detail as they panned around the workspace.
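A very small "spatial parser" might group cards that share a color and sit close together on the canvas. This is a hypothetical sketch using single-linkage grouping with an arbitrary distance threshold (`max_gap`); it is not the algorithm of VIKI/VKB or any other specific system. Card `c` lands in the same group as `a` even though they are 60 units apart, because `b` bridges them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    x: float
    y: float
    color: str


def infer_clusters(cards: list[Card], max_gap: float = 50.0) -> list[set[str]]:
    """Group cards of the same color that sit within `max_gap` of some group member."""
    clusters: list[list[Card]] = []
    for card in cards:
        # Every existing cluster this card is "near": same color and close enough.
        near = [cl for cl in clusters
                if any(o.color == card.color
                       and math.dist((o.x, o.y), (card.x, card.y)) <= max_gap
                       for o in cl)]
        # The card bridges those clusters, so they merge into one.
        merged = [card] + [c for cl in near for c in cl]
        clusters = [cl for cl in clusters if all(cl is not n for n in near)]
        clusters.append(merged)
    return [{c.name for c in cl} for cl in clusters]


cards = [
    Card("a", 0, 0, "yellow"), Card("b", 30, 0, "yellow"), Card("c", 60, 0, "yellow"),
    Card("d", 35, 5, "blue"), Card("e", 400, 400, "yellow"),
]
print(infer_clusters(cards))
```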
Certain note-taking tools (such as OneNote)
combine an overarching tree structure with spatial freedom on each “frame” or “page.”
Users can access a particular page of the notebook with basic search or
tree navigation facilities, and then lay out notes and images on the page
as desired. Many graph-based approaches (such as concept mapping tools) also
allow for arbitrary spatial positioning of elements. This allows both kinds
of relationships to be expressed: explicit links and less formal expression
through creative use of the screen.
Categories
In category-based structural frameworks, rather than being described in terms of their relationships to other elements (as with a tree or graph),
items are simply grouped together in one or more categories, indicating
that they have something in common. This scheme is based on the branch of
pure mathematics called set theory, in which each of a body
of objects either has, or does not have, membership in each of some number
of sets. There is normally no restriction as to how many different
categories a given item can belong to, as is the case with mathematical
sets.
Users may think of categories as collections, in which the category somehow
encloses or “owns” the items within it. Indeed, some systems depict
categories in this fashion, such as the Vista interface where icons standing for documents are enclosed within ovals that
represent categories. This is merely a convention of display, however, and
it is important to note that fundamentally, categories are the same as
simple keywords.
The most popular application to embrace the category approach was the
original Agenda. All information retrieval in Agenda was performed in terms of
category membership. Users specified queries that were lists of categories
to include (or exclude), and only items that satisfied those criteria were
displayed. Agenda was
particularly sophisticated in that the categories themselves formed a tree
hierarchy, rather than a flat namespace. Assigning an item to a category
also implicitly assigned it to all ancestors in the hierarchy.
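Agenda-style retrieval over a category hierarchy can be sketched as set operations. This is a hypothetical illustration, not Agenda's implementation: each item's assigned categories are first expanded to include all of their ancestors, and a query keeps items that have every included category and none of the excluded ones.

```python
from typing import Optional


def with_ancestors(category: str, parent: dict[str, str]) -> set[str]:
    """The category plus every ancestor above it in the category tree."""
    found: set[str] = set()
    current: Optional[str] = category
    while current is not None:
        found.add(current)
        current = parent.get(current)
    return found


def agenda_query(items: dict[str, set[str]], parent: dict[str, str],
                 include: set[str], exclude: Optional[set[str]] = None) -> list[str]:
    """Items that belong to every `include` category and to no `exclude` category."""
    excluded = exclude or set()
    hits = []
    for name, assigned in items.items():
        # Assigning a category implicitly assigns all of its ancestors.
        effective: set[str] = set()
        for category in assigned:
            effective |= with_ancestors(category, parent)
        if include <= effective and not (excluded & effective):
            hits.append(name)
    return hits


parent = {"Calls": "Work", "Meetings": "Work", "Urgent": "Priority"}
items = {"Call Ann": {"Calls", "Urgent"}, "Staff meeting": {"Meetings"}, "Buy milk": {"Errands"}}
print(agenda_query(items, parent, include={"Work"}))                        # ['Call Ann', 'Staff meeting']
print(agenda_query(items, parent, include={"Work"}, exclude={"Priority"}))  # ['Staff meeting']
```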
Personal Knowbase
is a more modern commercial product based
solely on a keyword (category) paradigm, though it uses a simple flat
keyword structure rather than an inheritance hierarchy like Agenda.
Haystack and Chandler are other information
management tools which use categorization in important ways. William Jones’
Memory Extender put an artificial intelligence twist on the
whole notion of keywords/categories, by allowing an item’s keywords to be
weighted, and adjusted over time by both the user and the system. This
allowed the strength of category membership to vary dynamically for each of
an item’s assignments, in an attempt to yield more precise retrieval.
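Weighted, dynamically adjusted category membership can be illustrated with a simple decay-and-boost rule. This is a hypothetical sketch, not the formulas Memory Extender actually used: the user may set weights directly, each access shrinks an item's existing weights by a `decay` factor and boosts the keywords of the current context, and retrieval ranks items by the summed weights of the query keywords.

```python
from collections import defaultdict


class WeightedKeywords:
    """Keyword membership with a strength that both user and system can change."""

    def __init__(self, decay: float = 0.9, boost: float = 1.0) -> None:
        # item -> keyword -> weight; decay/boost are illustrative parameters.
        self.weights: dict[str, dict[str, float]] = defaultdict(dict)
        self.decay, self.boost = decay, boost

    def assign(self, item: str, keyword: str, weight: float = 1.0) -> None:
        # The user sets (or overrides) a weight explicitly.
        self.weights[item][keyword] = weight

    def record_access(self, item: str, context: set[str]) -> None:
        # The system adjusts: all of the item's weights fade a little,
        # and keywords in the current context are reinforced.
        kws = self.weights[item]
        for kw in kws:
            kws[kw] *= self.decay
        for kw in context:
            kws[kw] = kws.get(kw, 0.0) + self.boost

    def retrieve(self, query: set[str]) -> list[tuple[str, float]]:
        # Score = summed weights of the query keywords; best matches first.
        scores = {item: sum(kws.get(q, 0.0) for q in query)
                  for item, kws in self.weights.items()}
        return sorted(((i, s) for i, s in scores.items() if s > 0),
                      key=lambda pair: pair[1], reverse=True)


wk = WeightedKeywords()
wk.assign("budget.xls", "finance")
wk.assign("budget.xls", "project-x")
wk.assign("notes.txt", "project-x", 0.5)
wk.record_access("notes.txt", {"project-x"})  # 0.5 * 0.9 + 1.0 = 1.45
print(wk.retrieve({"project-x"}))
```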
Chronological
Yale University’s Lifestreams project used timestamps as the principal means of organization and retrieval of personal documents.
In Fertig et al.’s own words:
A lifestream is a time-ordered stream of documents that functions as a
diary of your electronic life; every document you create is stored in your
lifestream, as are the documents other people send you. The tail of your
stream contains documents from the past, perhaps starting with your
electronic birth certificate. Moving away from the tail and toward the
present, your stream contains more recent documents such as papers in
progress or the latest electronic mail you’ve received...
Documents are thus always ordered and accessed chronologically.
Metadata-based queries on the collection produce “substreams,” or
chronologically ordered subsets of the original documents. The rationale
for time-based ordering is that “time is a natural guide to experience; it
is the attribute that comes closest to a universal skeleton-key for stored
experience.” Whether chronology is, psychologically, our principal or even a
common natural coding mechanism is debatable. But since any PKB system can
easily create such an index,
it seems worthwhile to follow Lifestreams’ lead and allow the user to sort
and retrieve based on time, as many systems have done. If nothing else, it relieves the user from
having to create names for knowledge elements, since the timestamp is
always an implicit identifying mark. PlanPlus, based on
the Franklin-Covey planner system, is also chronologically modeled, and a
number of products based on other data models (e.g. CircusPonies) offer chronological indexing in addition to their core
paradigm.
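A lifestream and its substreams can be sketched in a few lines of Python. The sketch assumes each document carries a creation timestamp and a set of tags. `Doc`, `lifestream`, and `substream` are hypothetical names and are not part of the Lifestreams system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Doc:
    title: str
    created: datetime                  # the timestamp doubles as an implicit identifier
    tags: frozenset[str] = frozenset()


def lifestream(docs: list[Doc]) -> list[Doc]:
    # The main stream: every document, from the tail (oldest) to the present.
    return sorted(docs, key=lambda d: d.created)


def substream(docs: list[Doc], predicate: Callable[[Doc], bool]) -> list[Doc]:
    # A metadata query yields a subset that stays in chronological order.
    return [d for d in lifestream(docs) if predicate(d)]
```

None of the documents needs a user-chosen name. The timestamp alone places each one in the stream.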
Aquanet's framework
Though advertised as a hypertext system, Marshall et al.’s Aquanet went far beyond the traditional node-link
graph model. Knowledge expressed in Aquanet is centered around
“relations,” or n-ary links between objects in which the semantics of each
participant in the relation is specified by the relation type. Each type of
relation specifies a physical display (i.e., how it will be drawn on the
screen, and the spatial positioning of each of its participants), and a
number of “slots” into which participants can be plugged. Each
participant in a relation can be either a base object, or another relation.
Users can thus define a schema of relation types, and then build a complex
semantic model out of relations and objects. Since relation types can be
specified to associate any number of nodes (instead of just two, as in the
graph model), this potentially allows more complex relationships to be
expressed.
It should be noted, however, that the same effect can be
achieved in the basic graph model by simply taking the n-ary relations and
“reifying” them (i.e., turning them into nodes in their own right.) For
instance, suppose we define a relation type “assassination,” with slot
types of “assassin,” “victim,” “location,” and “weapon.” We could then
create a relation based on this type where the participants are “John
Wilkes Booth,” “Abraham Lincoln,” “Ford’s Theatre,” and “derringer.” This
allows us to express a complex relationship between multiple objects in
Aquanet. But we can express the same knowledge with the basic graph model
by simply creating a node called “Lincoln’s assassination” and then
creating typed links between that node and the other four labeled
“assassin,” “victim,” etc. Aquanet’s biggest achievement in this area is
the ability to express the schema of relation types, so that the types of
objects an “assassination” relation can connect are consistent and
enforced.
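The assassination example can be expressed in Python. The sketch contrasts a schema-checked n-ary relation with its reified form: a node plus one typed link per slot. For simplicity, participants are plain strings here, whereas Aquanet also lets a participant be another relation. `RelationType`, `Relation`, and `reify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    name: str
    slots: tuple[str, ...]              # the schema: which slots a relation must fill


@dataclass
class Relation:
    rtype: RelationType
    participants: dict[str, str]        # slot name -> participant

    def __post_init__(self) -> None:
        # Schema enforcement: exactly the slots declared by the relation type.
        if set(self.participants) != set(self.rtype.slots):
            raise ValueError(f"slots must be exactly {self.rtype.slots}")


def reify(rel: Relation, node_name: str) -> list[tuple[str, str, str]]:
    # Turn the n-ary relation into a node plus typed binary links,
    # written as (source, link type, target) triples.
    return [(node_name, slot, rel.participants[slot]) for slot in rel.rtype.slots]
```

Both forms carry the same knowledge. The schema check in `__post_init__` is what the reified graph lacks unless link types are constrained separately.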
Knowledge elements
There are several options for specifying what knowledge elements consist of, and what kind of internal structure, if any, they possess:
- Word/phrase/concept. Most systems engineered for knowledge representation encourage structures to be composed of very simple elements, usually words or phrases. This is in the spirit of both mind mapping and concept mapping, where users are encouraged to use simple phrases to stand for mental concepts.
- Free text notes. Nearly all systems permit large amounts of free text to exist in the PKB, either as the contents of the elements themselves (NoteCards, Hypercard, TreePad) or attached to elements as separate, supplementary pages (Agenda, Zoot, HogBay).
- Links to an information space. Since a user’s knowledge base is to correspond to her mental perceptions, it seems profitable for the PKB to point to entities in the information space from which she formed those perceptions. Many systems do in fact allow their knowledge elements to point to the original sources in some way. There are three common techniques:
- The knowledge element actually represents an original source. This is the case for document management systems (WebTop, MyLifeBits, Haystack), integrated search facilities (NaviQue, CYCLADES), and VIKI/VKB. Tinderbox will also allow one of its notes to be a URL, and the user can control whether its contents should be captured once or “auto-fetched” so as to receive constant web updates.
- Many systems, in addition to storing a page of free text for each knowledge element, also permit any number of hyperlinks to be attached to a knowledge element (e.g., FreeMind, PersonalBrain, Inspiration). VNS, which allows users to point to a community notebook page from within their personal notebook, gives similar functionality.
- The knowledge element is a repurposed snippet from an original source. This is potentially the most powerful form, but is rare among fully featured PKB systems. Cartagio, Hunter-Gatherer, and YellowPen all allow Web page excerpts to be assimilated and organized, but that is primarily all they do: the excerpts cannot easily be combined with other subjective knowledge. DEVONthink and MyBase’s WebCollect plug-in add similar functionality to their more general-purpose, tree-based information managers. Both of these systems, when a snippet is captured, archive the entire Web page locally so it can be returned to later. The user interfaces of CircusPonies and StickyBrain have been heavily optimized towards grabbing information from other applications and bringing it into the PKB without disturbing the user’s workflow.
- Composites. Some programs allow a user to embed knowledge elements (and perhaps other information as well) inside a knowledge element to form an implicit hierarchy. Trees by themselves fall into this category, of course, since each node in the tree can be considered a “composite” of its content and children. But a few graph-based tools offer composite functionality as well. In Aquanet, “relations” form the fundamental means of connection, and the units that are plugged into a relation can be not only objects, but other relations as well. This lends a recursive quality to a user’s modeling. VIKI/VKB’s spatial environment offers “subspaces” which let a user partition their visual workspace into subregions, whose internal contents can be viewed at a glance from the parent. Boxer’s paradigm is similar. Tinderbox is a graph-based tool that supports hierarchical composite structures, and Compendium extends this even further by allowing transclusion of “views” as well as of nodes. Unlike the other tools, in Compendium the composite hierarchy does not form a DAG (directed acyclic graph), but rather an arbitrary graph: view A can appear on view B, and B can in turn appear on A. The user’s intuitive notion of “inside” must be adapted somewhat in this case.
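A program can check mechanically whether a composite structure is a DAG or contains cycles, as Compendium's mutually nested views do. The sketch assumes containment is stored as a mapping from each view to the views placed on it. `has_cycle` is a hypothetical helper that runs a standard depth-first search with three colors.

```python
def has_cycle(contains: dict[str, list[str]]) -> bool:
    """Return True if the 'is placed on' containment graph has a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color: dict[str, int] = {}

    def visit(v: str) -> bool:
        color[v] = GREY
        for child in contains.get(v, []):
            state = color.get(child, WHITE)
            if state == GREY:         # back edge: child is also an ancestor of v
                return True
            if state == WHITE and visit(child):
                return True
        color[v] = BLACK
        return False

    return any(color.get(v, WHITE) == WHITE and visit(v) for v in list(contains))
```

A shared child (one view placed on two parents) still forms a DAG. Only a view that ends up, directly or indirectly, "inside" itself creates a cycle.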
Schema
In the context of PKBs, "schema" means the ability for a user to specify types and introduce structure to aspects of the data model. It is a form of
metadata whereby more precise semantics can be applied to various elements
of the system. This facilitates more formal knowledge expression, ensures
consistency across items of the same kind, and better allows automated
agents to process the information.
Both knowledge elements and links can carry various aspects of schema.
Types, and related schema
In a PKB, a "type system" allows users to specify that a knowledge element is a member of a specific class or category of items, to provide a built-in method of organization and retrieval. Generally speaking, systems can make knowledge elements untyped, rigidly
typed, or flexibly typed. In addition, they can incorporate some notion of
inheritance among elements and their types. Note the distinction between
types and categories here. A category-based scheme typically allows any
number of categories/keywords to be assigned to an item. There are two
differences between this and the notion of type. First, items are normally
restricted to being of a single type, and this usually indicates a more
intrinsic, permanent property of an item than simply its presence in a
category collection. (For example, one could imagine an item called “XYZ
Corporation” shifting into and out of categories like “competitors,”
“overseas distributors,” or “delinquent debtors” over time, but its core
type of “company” would probably be static for all time.) Second, types
often carry structural specifications with them: if an item is of a given
type, this means it will have values for certain attributes appropriate to
that type. Note that some systems that do not allow typing offer the
ability to approximate this function through categories (e.g.,
OneNote, MindManager).
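The difference between a type and a category can be made concrete in Python. The sketch below uses the “XYZ Corporation” example. `ItemType`, `Item`, and the `headquarters` attribute are invented for illustration. The sketch assumes a type is a single, fixed property that dictates required attributes, while categories form a freely changing set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemType:
    name: str
    required: tuple[str, ...]         # attributes every item of this type must have


@dataclass
class Item:
    name: str
    itype: ItemType                   # exactly one type: intrinsic and stable
    attrs: dict[str, str]
    categories: set[str] = field(default_factory=set)   # any number, added and removed freely

    def __post_init__(self) -> None:
        # The type carries a structural specification; enforce it at creation.
        missing = [a for a in self.itype.required if a not in self.attrs]
        if missing:
            raise ValueError(f"{self.name} lacks {missing} required by type {self.itype.name}")
```

In this sketch, categories change without affecting the item's structure, whereas the type both persists and constrains which attributes the item holds.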
Untyped elements are typical among informal knowledge capture tools, since
they are designed to stimulate brainstorming and help users discover their
nascent mental models. These tools normally want to avoid forcing the user
to commit to structure prematurely. Most mind mapping and many concept
mapping tools are in this category: a concept is simply a word or phrase,
with no other semantic information (e.g., Visual Mind).
Note-taking tools also usually take this approach, with all units of
information being of the same type “note.”
At the other extreme are tools which, like older relational database
technology, require all items to be declared as of a specific type when
they are created. Often this type dictates the internal structure of the
element. These tools are better suited to domains in which the structure
of knowledge to be captured is predictable, well-understood, and known in
advance. For PKB systems, they are probably overly restrictive. KMap and Compendium are examples of tools that allow (and
require) each item to be typed; in their case, the type controls the visual
appearance of the item, rather than any internal structure.
In between these two poles are systems that permit typed and untyped
elements to co-exist. AquaMinds’ NoteTaker is such a
product; it holds simple free-text pages of notes, without any structure,
but also lets the user define “templates” with predefined fields that can
be used to instantiate uniformly structured forms. TreePad has a similar
feature. Some other systems blur the distinction between typed and untyped,
allowing the graceful introduction of structure as it is discovered.
VKB, for example, supports an elegant, flexible
typing scheme, well suited to PKBs. Items in general consist of an
arbitrary number of attribute/value pairs. But when consistent patterns
emerge across a set of objects, the user can create a type for that group,
and with it a list of expected attributes and default values. This
structure can be selectively overridden by individual objects, however,
which means that even objects assigned to a particular type have flexible
customization available to them. Tinderbox offers an alternate way of
achieving this flexibility, as described below.
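A VKB-style flexible typing scheme can be sketched in Python as follows. The sketch assumes items are plain attribute/value dictionaries. It also assumes a type is promoted from a non-empty group of existing objects by collecting the attributes they all share, with user-supplied defaults. `promote_to_type` and `lookup` are hypothetical helpers, not VKB's interface.

```python
def promote_to_type(objects: list[dict[str, object]],
                    defaults: dict[str, object]) -> dict[str, object]:
    """Build a type from the attributes every object in a (non-empty) group shares."""
    shared = set.intersection(*(set(o) for o in objects))
    # Expected attributes are the shared ones; default values come from the user.
    return {attr: defaults.get(attr) for attr in sorted(shared)}


def lookup(obj: dict[str, object], obj_type: dict[str, object], attr: str) -> object:
    # An object's own value always overrides the type's default.
    if attr in obj:
        return obj[attr]
    return obj_type.get(attr)
```

The type here only supplies expectations and defaults. Objects keep any extra attributes they already had and can override any default.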
Finally, the object-oriented notion of type inheritance is available in a
few solutions. The different card types in NoteCards are arranged into an
inheritance hierarchy, so that new types can be created as extensions of
old. Aquanet extends this to multiple inheritance among types; the “slots”
that an object contains are those of its type, plus those of all
supertypes. SPRINT and Tinderbox also use a frame-based approach, and allow
default values for attributes to be inherited from supertypes. This way, an
item need not define values for all its attributes explicitly: unless
overridden, an item’s slot will have the shared, default value for all
items of that type.
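The frame-based behavior described above can be sketched in Python. It combines two features: a type's slots are its own plus those of all supertypes (as in Aquanet's multiple inheritance), and slot defaults are inherited unless an item overrides them (as in SPRINT and Tinderbox). `FrameType` and `value` are hypothetical names. The depth-first, left-to-right lookup order is an assumption, since real systems may resolve conflicts differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)                  # identity equality: types are compared by object
class FrameType:
    name: str
    own_slots: dict[str, object]      # slot -> default value (None means no default)
    supertypes: list[FrameType] = field(default_factory=list)

    def lineage(self) -> list[FrameType]:
        # Depth-first, left-to-right; each type appears once (an assumed order).
        seen: list[FrameType] = []

        def walk(t: FrameType) -> None:
            if t not in seen:
                seen.append(t)
                for s in t.supertypes:
                    walk(s)

        walk(self)
        return seen

    def slots(self) -> set[str]:
        # A type's slots are its own plus those of all its supertypes.
        return {s for t in self.lineage() for s in t.own_slots}

    def default(self, slot: str) -> object:
        # The first type in the lineage that declares the slot supplies the default.
        for t in self.lineage():
            if slot in t.own_slots:
                return t.own_slots[slot]
        raise KeyError(slot)


def value(item: dict[str, object], itype: FrameType, slot: str) -> object:
    # Unless overridden on the item, a slot takes the shared inherited default.
    return item[slot] if slot in item else itype.default(slot)
```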
Other forms of schema
In addition to the structure that is controlled by an item’s
type, other forms of metadata and schema can be applied to knowledge
elements.
- Keywords. Many systems let users annotate items with user-defined keywords. Here the distinction between an item’s contents and the overall knowledge structure becomes blurred, since an item keyword could be considered either a property of the item, or an organizational mechanism that groups it into a category with like items. Systems using the category data model (e.g., Agenda) can employ keywords for the latter purpose. Some systems based on other data models also use keywords to achieve category-like functionality.
- Attribute/value pairs. Arbitrary attribute/value pairs can also be attached to elements in many systems, which gives a PKB the ability to define semantic structure that can be queried. Frame-based systems like SPRINT and Aquanet are examples, as well as NoteTaker, VKB, and Tinderbox. MindPad [AKS-Labs 2005] is notable for taking the basic concept mapping paradigm and introducing schema to it via its “model editor.” As mentioned earlier, adding user-defined attribute/value pairs to the items in an outliner yields spreadsheet-like functionality, as in Ecco and OmniOutliner. Note that some systems feature attribute/value pairs, but only in the form of system-defined attributes, not user-defined ones (e.g., MindManager, StickyBrain).
- Knowledge element appearance. Some tools modify a knowledge element’s visual appearance on the screen in order to convey meaning to the user. SMART Ideas and Visual Mind let the user freely choose each element’s icon from a variety of graphics, while KMap ties the icon directly to its underlying type. Other graphical aspects that can be modified include color (VIKI), the set of attributes shown in a particular context (VKB), and the spatial positioning of objects in a relation (Aquanet).
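The following Python sketch shows how keywords and user-defined attribute/value pairs make a collection queryable. It assumes items are plain dictionaries with an optional `keywords` set. The `query` function is a hypothetical helper. Sorting the matches by an attribute gives a rough equivalent of the spreadsheet-like view of an attributed outliner.

```python
from collections.abc import Set


def query(items: list[dict], keywords: Set[str] = frozenset(), **conditions: object) -> list[dict]:
    """Items that carry every given keyword and match every attribute=value condition."""
    return [
        it for it in items
        if keywords <= it.get("keywords", set())
        and all(it.get(attr) == val for attr, val in conditions.items())
    ]
```

For example, `query(notes, {"reading"}, rating=5)` returns only the notes filed under the keyword "reading" whose `rating` attribute equals 5.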
Schema for links
In addition to prescribing schema for knowledge elements, many systems allow some form of information to be attached to the links that connect
them.
In most of the early hypertext systems, links were unnamed and untyped,
their function being merely to associate two items in an unspecified
manner. The mind mapping paradigm also does not name links, but for a
different reason: the implicit type of every link is one of
generalization/specialization, associating a topic with a subtopic. Hence
specifying types for the links would be redundant, and labeling them would
clutter the diagram.
Concept mapping prescribes the naming of links, such that the precise
nature of the relationship between two concepts is made clear. As mentioned
above, portions of a concept map are meant to be read as English sentences,
with the name of the link serving as a verb phrase connecting the two
concepts. Numerous systems thus allow a word or phrase to decorate the
links connecting elements, for instance Cmap and
Inspiration.
Named links can be distinguished from typed links, however. If the text
attached to a link is an arbitrary string of characters, unrelated to that
of any other link, it can be considered the link name. Some systems,
however, encourage the re-use of link names that the user has defined
previously. In PersonalBrain, for instance, before
specifying the nature of a link, the user must create an appropriate “link
type” (associated with a color to be used in presentation) in the
system-wide database, and then assign that type to the link in question.
This promotes consistency among the names chosen for links, so that the
same logical relationship types will hopefully have the same tags
throughout the knowledge base. This feature also facilitates searches based
on link type, among other things. Other systems, especially those suited
for specific domains such as decision modeling (gIBIS and Decision Explorer), predefine a set of link types that
can be assigned (but not altered) by the user.
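In Python, the distinction between named and typed links can be sketched as follows. A named link carries any free-form string. A typed link must use a type already defined in a system-wide registry, as described for PersonalBrain. A closed registry models tools with a fixed set of predefined link types. `Link`, `LinkTypeRegistry`, and `by_type` are hypothetical names, and the link types in the usage note are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    label: str                        # a free-form name, or a registered type name


class LinkTypeRegistry:
    """System-wide link types; a link's type must be defined before it is used."""

    def __init__(self, predefined: dict[str, str] | None = None, closed: bool = False):
        self.types: dict[str, str] = dict(predefined or {})   # type name -> display color
        self.closed = closed          # True models a fixed, system-defined set of types

    def define(self, name: str, color: str) -> None:
        if self.closed:
            raise PermissionError("link types are predefined and cannot be altered")
        self.types[name] = color

    def link(self, source: str, target: str, type_name: str) -> Link:
        if type_name not in self.types:
            raise KeyError(f"undefined link type: {type_name}")
        return Link(source, target, type_name)


def by_type(links: list[Link], type_name: str) -> list[Link]:
    # Consistent, reused type names are what make searches by link type reliable.
    return [lk for lk in links if lk.label == type_name]
```

For example, after `reg.define("read at", "green")`, the call `reg.link("book", "hotel", "read at")` succeeds. A misspelled type raises an error instead of silently creating a near-duplicate name.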
Some more advanced systems allow links to bear attribute/value pairs
themselves, and even embedded structure, similar to those of the items they
connect. This is the case in Haystack, since links
(“ties”) and nodes (“needles”) are actually defined as subtypes of a common
type (“straw”).
KMap similarly defines a link as a subclass of node, which
allows links to represent n-ary relationships between nodes, and enables
recursive structure within a link itself. It is unclear how much value this
adds in knowledge modeling, or how often users take advantage of such
a feature. Neptune and Intermedia are two older systems that also support attributes for links,
albeit in a simpler manner.
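The following Python sketch shows the effect of giving links and nodes a common supertype. Because a link is itself a first-class object, it can carry attributes and can be the endpoint of another link. The class names echo Haystack's terms, but the code is a hypothetical illustration, not Haystack's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Straw:
    """Common supertype: anything can carry attributes and be linked to."""
    attrs: dict[str, object] = field(default_factory=dict)


@dataclass(eq=False)
class Needle(Straw):
    """A node holding content."""
    content: str = ""


@dataclass(eq=False)
class Tie(Straw):
    """A link; either endpoint may be a needle or another tie."""
    source: Straw | None = None
    target: Straw | None = None
```

For example, a `Tie` recording that a book was read at a hotel can itself be the source of a second `Tie`. This is the kind of recursive structure within links mentioned above.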
Another aspect of links that generated much fervor in the early hypertext
systems was that of link precision: rather than merely connecting one
element to another, systems like Intermedia defined anchors within
documents, so that a particular snippet within a larger element could be
linked to another snippet. The Dexter model
covers this issue in detail. For PKB purposes, this seems to be most
relevant as regards links to the objective space, as discussed previously.
If the PKB truly contains knowledge, expressed in appropriately
fine-grained parts, then link precision between elements in the knowledge
base is much less of a consideration.
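Link precision can be sketched in Python by making each end of a link an anchor, meaning a span inside a document rather than the whole element. The sketch assumes anchors are character offsets. The Dexter model treats anchors more generally. `Anchor`, `AnchoredLink`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    doc_id: str
    start: int                        # character offset, inclusive
    end: int                          # character offset, exclusive


@dataclass(frozen=True)
class AnchoredLink:
    source: Anchor
    target: Anchor


def resolve(anchor: Anchor, docs: dict[str, str]) -> str:
    # Return the exact snippet that the anchor designates.
    return docs[anchor.doc_id][anchor.start:anchor.end]
```

Offset-based anchors break when the underlying text is edited. This is one reason fine-grained knowledge elements, which need no internal anchors, can be simpler to maintain.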
Note that this discussion on links has only considered connections between
knowledge elements in the system, where the system has total control over
both ends of the connection. As described in the previous section, numerous
systems provide the ability to “link” from a knowledge element inside the
system to some external resource: a file or a URL, say. These external
links typically cannot be enhanced with any additional information, and
serve only as convenient retrieval paths, rather than as aspects of
knowledge representation.
Architecture
The idea of a PKB gives rise to some important architectural considerations. While not constraining the nature of what knowledge can be
expressed, the architecture nevertheless affects more mundane matters such
as availability and workflow. But even more importantly, the system’s
architecture determines whether it can truly function as a lifelong,
integrated knowledge store – the “base” aspect of the personal knowledge
base defined above.
File-based
Traditionally, most electronic PKB systems have employed a simple storage mechanism based on flat files in a filesystem. This is true of
virtually all of the mind mapping tools (MindManager), concept mapping
tools (Cmap, Axon, Inspiration), outliners (TreePad, OmniOutliner),
note-taking tools (OneNote, HogBay, Zoot), and even a number of hypertext
tools (NoteCards, Hypercard, Tinderbox). Typically, the main “unit” of a user’s
knowledge design – whether that be a mind map, a concept map, an outline,
or a “notebook” – is stored in its own file somewhere in the filesystem.
The application can find and load such files via the familiar “File |
Open...” paradigm, at which point it typically maintains the entire
knowledge structure in memory.
The advantage of such a paradigm is familiarity and ease of use; the
disadvantage is a possibly negative influence on knowledge formulation.
Users must choose one of two basic strategies: either store all of their
knowledge in a single file; or else break up their knowledge and store it
across a number of different files, presumably according to subject matter
and/or time period. The first choice can result in scalability problems:
consider how much knowledge a user might collect over a decade, if they
stored things related to their personal life, hobbies, relationships,
reading materials, vacations, academic course notes, multiple work-related
projects, future planning, etc. It seems unrealistic to keep adding this
kind of volume to a single, ever-growing multi-gigabyte file. The other
option, however, is also constraining: each bit of knowledge can be stored
in only one of the files (or else redundantly, which leads to
synchronization problems), and the user is forced to choose this at
knowledge capture time.
Database-based
If a PKB's data is stored in a database system, then knowledge elements reside in a global space, which allows any idea to relate to any other: now
a user can relate a book he read on productivity not only to other books on
productivity, but also to “that hotel in Orlando that our family stayed in
last spring,” because that is where he remembers having read the book.
Though such a relationship may seem “out of bounds” in traditional
knowledge organization, it is exactly the kind of retrieval path that
humans often employ in retrieving memories. The database architecture enables a
PKB to truly form an integrated knowledge base, and contain the full range
of relationships.
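Python's standard `sqlite3` module is enough to sketch such a global space. All elements share one table and all links another, so nothing prevents relating a productivity book to a hotel. The schema and the helper names `open_pkb`, `add`, `relate`, and `neighbors` are assumptions for illustration. They do not describe the storage layout of any system named here.

```python
import sqlite3

# One global space: every knowledge element lives in the same table, so a link
# can join any element to any other, regardless of subject or time period.
SCHEMA = """
CREATE TABLE element (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE link (
    src  INTEGER NOT NULL REFERENCES element(id),
    dst  INTEGER NOT NULL REFERENCES element(id),
    kind TEXT NOT NULL
);
"""


def open_pkb(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)      # pass a file path to persist the knowledge base
    conn.executescript(SCHEMA)
    return conn


def add(conn: sqlite3.Connection, label: str) -> int:
    return conn.execute("INSERT INTO element (label) VALUES (?)", (label,)).lastrowid


def relate(conn: sqlite3.Connection, src: int, dst: int, kind: str) -> None:
    conn.execute("INSERT INTO link (src, dst, kind) VALUES (?, ?, ?)", (src, dst, kind))


def neighbors(conn: sqlite3.Connection, element_id: int) -> list[tuple[str, str]]:
    # Everything an element points to, as (link kind, label) pairs.
    rows = conn.execute(
        "SELECT link.kind, element.label FROM link "
        "JOIN element ON element.id = link.dst "
        "WHERE link.src = ? ORDER BY element.label",
        (element_id,),
    )
    return rows.fetchall()
```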
Agenda and gIBIS were two
early tools that incorporated a database backend in their architecture. More
recently, the MyLifeBits project uses Microsoft SQL
Server as its storage layer, and Compendium interfaces
with the open source MySQL database. A few note-taking applications such
as StickyBrain also store information in an integrated
database rather than in user-named files. The only significant drawback to
this architectural choice (other than the modest footprint of the database
management system) is that data is more difficult to copy and share across
systems. This is one true advantage of files: it is a simple matter to
copy them across a network, or include them as an e-mail attachment, where
they can be read by the same application on a different machine. This
problem is solved by some of the following architectural choices.
Client-server
Decoupling the actual knowledge store from the PKB user interface can achieve architectural flexibility. As with all client-server architectures,
the benefits include load distribution, platform interoperability, data
sharing, and ubiquitous availability. Increased complexity and latency are
among the liabilities, which can indeed be considerable factors in PKB
design.
One of the earliest and best examples of a client-server knowledge base was
the Neptune hypertext system. Neptune was
tailored to the task of maintaining shared information within software
engineering teams, rather than to personal knowledge storage, but the
elegant implementation of its “Hypertext Abstract Machine” (HAM) was a
significant and relevant achievement. The HAM was a generic hypertext
storage layer that provided node and link storage and maintained version
history of all changes. Application layers and user interfaces were to be
built on top of the HAM. Architecturally, the HAM provided distributed
network access so that client applications could run from remote locations
and still access the central store. Another, more recent example is the
Scholarly Ontologies Project, whose ClaiMapper and ClaiMaker components form a
similar distributed solution in order to support collaboration.
These systems implemented a distributed architecture primarily in order to
share data among colleagues. For PKBs, the prime motive is rather user
mobility. This is a key consideration, since if a user is to store all of
their knowledge in a single integrated store, they will certainly need
access to it in a variety of settings. MyBase Networking Edition
is one example of how this might be achieved. A central server hosts the
user’s data, and allows network access from any client machine. Clients can
view the knowledge base from within the MyBase application, or through a
Web browser (with limited functionality).
The Haystack project outlines a three-tiered
architecture, which allows the persistent store, the Haystack data model
itself, and the clients that access it to reside on separate machines. The
interface to the middle tier is flexible enough that a number of different
persistent storage models can be used, including relational databases,
semistructured databases, and object-oriented databases. Presto’s
architecture exhibits similar features.
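The middle-tier idea can be sketched in Python with a `typing.Protocol`. The data model depends only on a small storage interface, so different persistent backends can be swapped in without changing clients. The triple-shaped interface and the names `Store`, `MemoryStore`, `SqliteStore`, and `DataModel` are hypothetical. They are not Haystack's or Presto's actual interfaces.

```python
import sqlite3
from typing import Protocol


class Store(Protocol):
    """Persistence tier: any backend that offers these two methods will do."""
    def put(self, subject: str, predicate: str, obj: str) -> None: ...
    def match(self, subject: str) -> list[tuple[str, str]]: ...


class MemoryStore:
    """Toy in-memory backend."""
    def __init__(self) -> None:
        self._rows: list[tuple[str, str, str]] = []

    def put(self, subject: str, predicate: str, obj: str) -> None:
        self._rows.append((subject, predicate, obj))

    def match(self, subject: str) -> list[tuple[str, str]]:
        return [(p, o) for s, p, o in self._rows if s == subject]


class SqliteStore:
    """Relational backend with the same interface."""
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS t (s TEXT, p TEXT, o TEXT)")

    def put(self, subject: str, predicate: str, obj: str) -> None:
        self._conn.execute("INSERT INTO t VALUES (?, ?, ?)", (subject, predicate, obj))

    def match(self, subject: str) -> list[tuple[str, str]]:
        cur = self._conn.execute("SELECT p, o FROM t WHERE s = ? ORDER BY rowid", (subject,))
        return cur.fetchall()


class DataModel:
    """Middle tier: clients talk to this and never to the store directly."""
    def __init__(self, store: Store) -> None:
        self.store = store

    def annotate(self, item: str, attr: str, value: str) -> None:
        self.store.put(item, attr, value)

    def describe(self, item: str) -> dict[str, str]:
        return dict(self.store.match(item))
```

The same `DataModel` code runs unchanged over either backend. In a real three-tier deployment, the calls between tiers would go over the network rather than through direct method calls.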
Web-based
A variation of the client-server approach is, of course, Web-based systems, in which the client system consists of nothing but a (possibly enhanced)
browser. This gives the same ubiquitous availability that client-server
approaches do, while minimizing (or eliminating) the setup and installation
required on each client machine.
KMap was one of the first knowledge systems to
integrate with the World Wide Web. It allowed concept maps to be shared,
edited, and remotely stored using the HTTP protocol. Concept maps were
still created using a standalone client application for the Macintosh, but
they could be uploaded to a central server, and then rendered in browsers
as “clickable GIFs.” Clicking on a concept within the map image in the
browser window would have the same navigation effect as clicking on it
locally inside the client application. Hunter-Gatherer, Cartagio, and NoteStar are more recent
browser-based systems that use proxies or browser plugins to achieve a
knowledge building workspace. The user’s knowledge expressions are stored
on a central server in nearly all cases, rather than locally on the
browser’s machine.
Handheld devices
Lastly, mobile devices are a possible PKB architecture. Storing all of one’s personal knowledge on a PDA would solve the availability
problem, of course, and even more completely than would a client-server or
web-based architecture. The safety of the information is an issue, since
if the device were to be lost or destroyed, the user could face irrevocable
data loss; this is easily remedied, however, by periodically synchronizing
the device’s contents with a host computer.
Most handheld applications are simple note-taking software, with far fewer
features than their desktop counterparts. BugMe! is an
immensely popular note-taking tool that simply lets users enter text or
scribble onto “notes” (screenfuls of space) and then organize them in
primitive ways. Screen shots can be captured and included as graphics, and
the tool features an array of drawing tools, clip art libraries, etc. The
value of this and similar tools lies purely in the size and convenience of
the handheld device, not in the ability to manage large amounts of
information.
Perhaps the most effective use of a handheld architecture would be as a
satellite data capture and retrieval utility. A user would normally employ
a fully functional desktop application for personal knowledge management,
but when “on the go,” they could capture knowledge into a compatible
handheld application and upload it to their PKB at a later convenient time.
To enable mobile knowledge retrieval, either select information would need
to be downloaded to the device before the user needed it, or else a
wireless client-server solution could deliver any part of the PKB on
demand. This is essentially the approach taken by software like
KeySuite, which supplements a feature-rich desktop
information management tool (e.g., Microsoft Outlook) by providing access to that
information on the mobile device. InfoSelect, a tree-based note-taking
application, also offers a mobile product.
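The satellite-capture workflow can be sketched in Python. The sketch assumes each note captured on the handheld receives a unique identifier, so uploading the same notes twice, for example after an interrupted sync, does not create duplicates in the desktop PKB. `HandheldOutbox` and `sync` are hypothetical names. The desktop PKB is reduced here to a dictionary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class HandheldOutbox:
    """Notes captured on the go, waiting to be uploaded to the main PKB."""
    pending: dict[str, str] = field(default_factory=dict)   # note id -> text

    def capture(self, text: str) -> str:
        note_id = str(uuid.uuid4())
        self.pending[note_id] = text
        return note_id


def sync(outbox: HandheldOutbox, pkb: dict[str, str]) -> int:
    """Copy pending notes into the desktop PKB and return how many were new."""
    new = 0
    for note_id, text in outbox.pending.items():
        if note_id not in pkb:        # keyed by id, so repeating a sync is harmless
            pkb[note_id] = text
            new += 1
    outbox.pending.clear()            # the host now holds the only copy it needs
    return new
```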