Data

From SIMILE Widgets

Your Exhibit Data

Publishing an Exhibit calls for an HTML page to display your data plus a data file or location. This section describes how you format and use data with your exhibits. It also describes Exhibit data models and expressions.

What's New in Exhibit 3.0?

There are some notable changes in the way Exhibit handles data in Exhibit 3.0 compared to previous releases.

Stricter JSON Validation

Exhibit 3.0 has stricter standards for JSON data validation. For existing Exhibit 2 users, we suggest you validate your JSON data using JSONLint before publishing exhibits with Exhibit 3.0.

If your data fails to load, you'll need to update it to conform to the JSON specification. You can try to use a one-off extension to upgrade your old JSON for you.

Changes to How Exhibit Works with Babel

The Babel data translation service is no longer integrated in Exhibit. You can still call Babel to translate your data from one format to another, but you need to supply the Babel URL to Exhibit.

You'll need to supply a URL to Babel by appending "babel=<url>" to your exhibit-api.js script tag.
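
As a rough illustration of appending the parameter, the sketch below builds the script URL as a string. The helper name withBabel and both URLs are hypothetical placeholders. Whether the Babel URL needs to be percent-encoded is not stated here, so the sketch leaves it unencoded as an assumption.

```javascript
// Hypothetical helper: append "babel=<url>" to the exhibit-api.js script URL.
// Assumption: the value is passed through unencoded.
function withBabel(scriptSrc, babelUrl) {
  // Use "?" for the first query parameter, "&" if one already exists.
  const separator = scriptSrc.includes("?") ? "&" : "?";
  return scriptSrc + separator + "babel=" + babelUrl;
}

// Placeholder locations; substitute your own copies of Exhibit and Babel.
const src = withBabel("exhibit-api.js", "https://babel.example.org/");
const scriptTag = '<script src="' + src + '"></' + 'script>';
console.log(scriptTag);
```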

For more information on Babel, see http://service.simile-widgets.org/babel/.

Note about Babel: The Babel service is not guaranteed to run as a public service indefinitely. If you rely on Babel translation services (RDF/XML, N3, Excel, an Exhibit page, KML, JPEG, TSV importers), consider running Babel yourself, downloading the transformed data if you don't need to actively transform the original, or maintaining the original data in a format that does not depend on Babel.

rel Usage Note

For Exhibits published using Exhibit 3.0, change <link rel="exhibit/data"/> to <link rel="exhibit-data"/>. The older form is deprecated and will stop working in a future release.

Usage with HTML5

The Exhibit attribute-based configuration has changed for HTML5. HTML5 does not support XML namespaces and offers custom data attributes in their place. A compatibility mode remains for Exhibits in XHTML files.

Moving from Exhibit 2.2.0 in XHTML to Exhibit 3.0 in HTML5 requires changing every attribute prefixed with ex: to use the prefix data-ex- instead. In addition, each capital letter within the attribute name should be replaced by a hyphen followed by the lowercase letter, e.g., ex:itemTypes becomes data-ex-item-types.

The HTML5 data attribute API treats capitalization differently during document processing and when attribute access occurs, necessitating the change to hyphenation.
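
The renaming rule can be sketched as a small function. This is only an illustration of the rule as described above, not part of Exhibit; the function name toHtml5AttributeName is hypothetical.

```javascript
// Hypothetical helper: convert an Exhibit 2 XHTML attribute name (ex:fooBar)
// to the Exhibit 3 HTML5 form (data-ex-foo-bar).
function toHtml5AttributeName(name) {
  // Leave attributes alone if they are not in the ex: namespace.
  if (!name.startsWith("ex:")) {
    return name;
  }
  const localName = name.slice("ex:".length);
  // Each capital letter becomes a hyphen followed by its lowercase form.
  const hyphenated = localName.replace(/[A-Z]/g, (letter) => "-" + letter.toLowerCase());
  return "data-ex-" + hyphenated;
}

console.log(toHtml5AttributeName("ex:itemTypes")); // data-ex-item-types
console.log(toHtml5AttributeName("ex:role"));      // data-ex-role
```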

Data Export for Scripted Mode: Toolbox UI Element

Exhibit 3.0 Scripted modifies the toolbox UI element behavior. Instead of disappearing and reappearing as the mouse hovers over a view, the toolbox (visible as the tiny scissors icon) is always visible by default. The former behavior can be restored with a new configuration setting.

HTTP Input and Output (Staged Mode)

The Backstage server for Exhibit 3.0 Staged mode publishes an HTTP+JSON interface to all its data and functionality. See the HTTP Interface documentation on GitHub for more information.

With Staged mode, you can issue an HTTP GET on the data link URL to export the entire dataset in HTML+RDFa format (not the original format), which facilitates search engine indexing.

See the Authoring documentation for Backstage for details on using the data input URL for Staged exhibits, as well as data export features to help make Exhibit data findable by search engines.

Importing Data Into Your Exhibit

Exhibit 3.0 Staged Mode

You add data to Exhibit Staged through HTTP. See the developer documentation about the HTTP Interface for details on data upload, creating a database either in-memory or on disk, and more.

Exhibit 3.0 Scripted Mode

There are two basic ways to add data to a Scripted exhibit:

  • Use Exhibit's built-in data importers for data in one of these formats:
    • Exhibit JSON
    • Google spreadsheet
    • Generic JSONP framework
    • Babel-based import formats

    You need to specify which importer to use, and for Babel, supply the URL of a Babel installation (your own or a centrally available Babel service).

    Babel-based importers include these input formats:

    • BibTeX
    • Excel spreadsheet
    • Exhibit JSON
    • Exhibit page
    • JPEG
    • N3
    • RDF/XML
    • Tab-separated values
  • Convert your data to JSON yourself, either with a Babel service or by hand, and then publish the exhibit so that it loads your JSON file.

Exhibit 3.0 enforces the full JSON standard, which previous versions of Exhibit did not. Use a JSON validator such as JSONLint to make sure your JSON is formatted properly.

The following sections offer more details on formatting and importing your data into Exhibit 3.0 Scripted mode.

Creating, Importing, and Managing Data

Exhibit's database natively understands data in its own format (a JSON format), but there are a number of ways to use data in other formats. If you have existing data in another format, or if you prefer another format, you can:

  1. Use the Babel service to convert your data into Exhibit's JSON format
  2. Use an importer to convert your files into Exhibit's JSON format on-the-fly

Manually Creating and Managing Exhibit Data

To create and manage data in files in Exhibit's JSON format, you just need a decent text editor (see the list of recommended tools).

Start by entering this code into your text editor:

{
  "items": [
  ]
}

Save it in the same directory where your web page (HTML file) is stored. Give it a .js or .json extension (this is optional, done just by convention).

Be careful: note that there are both braces { } and brackets [ ]. Loosely speaking, braces { } are used to wrap many properties of different names, while brackets [ ] are used to wrap several things in a list.

In the code above, your data records, or items in Exhibit's terminology, go in between the brackets. Here is the same code with three items:

{
  "items": [
    {
      "label": "John Doe",
      "type": "Person",
      "age": "36",
      "likes": "Mary Smith",
      "favorite-color": ["blue", "yellow"]
    },
    {
      "label": "Mary Smith",
      "type": "Person",
      "married-to": "Joe Anderson",
      "job": "Doctor",
      "worksAt": "Boston General Hospital",
      "hobby": ["painting", "karate"]
    },
    {
      "label": "Boston General Hospital",
      "type": "Place",
      "city": "Boston"
    }
  ]
}


Notes:

  • This example shows two types of items: Person and Place. You can use as many as you want, and name them however you want. It's your data--you're the boss. Exhibit doesn't require you use a global schema for your data.
  • Items of the same type, Person in this case, don't need to have the same properties filled in. John has an age but Mary doesn't, and Mary has a job while John doesn't. Fill in whatever information you have. You'll get some value out of Exhibit even with incomplete, messy data.
  • The code shown here is neatly formatted and aligned, but it doesn't have to be. You can lay out your file in whatever way helps you avoid mistakes. Your data is your business.

Data Formatting Notes

Keep these points in mind when formatting your data:

  • Watch out for { } vs. [ ]. Each item in the code above is wrapped in { } while a list of things like "blue", "yellow" is wrapped in [ ].
  • Watch out for commas. They are used to separate properties within { } and elements of a list within [ ]. Use commas only where needed. Do not put a comma after the last property in a pair of { }. Browsers can get very picky about misplaced commas.
  • Put quotation marks around all property names, e.g., "job", or "co-author".
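
A quick local check can catch these mistakes before you publish. The sketch below uses JavaScript's built-in JSON.parse, which rejects trailing commas and unquoted property names. It is a stand-in for a validator such as JSONLint rather than a description of how Exhibit itself loads data, and the function name checkExhibitJson is hypothetical.

```javascript
// Hypothetical helper: report whether a string is strict JSON and, if so,
// how many entries its "items" list contains.
function checkExhibitJson(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (error) {
    // JSON.parse throws a SyntaxError on stray commas, missing quotes, etc.
    return { ok: false, message: error.message };
  }
  const items = data !== null && Array.isArray(data.items) ? data.items : [];
  return { ok: true, itemCount: items.length };
}

const good = '{ "items": [ { "label": "John Doe", "type": "Person" } ] }';
const trailingComma = '{ "items": [ { "label": "John Doe", } ] }';
const unquotedName = '{ items: [] }';

console.log(checkExhibitJson(good).ok);          // true
console.log(checkExhibitJson(trailingComma).ok); // false
console.log(checkExhibitJson(unquotedName).ok);  // false
```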

Your exhibit can include one or more data files. Each data file can contain any number of items (or none at all). It can also contain information about types and properties. You can decide how to split your data among several files.

Converting Data Using Babel

You can use the Babel web service to convert data from various formats into Exhibit's JSON format.

Babel-Based Importers

Babel-based importers for pulling data into Exhibit include BibTeX, Excel spreadsheet, Exhibit JSON, Exhibit page, JPEG, N3, RDF/XML, and tab-separated values, or TSV. While these importers are fully implemented, users must supply a Babel installation URL in their Exhibit code in order to use Babel's import service.

To call Babel you need to supply the Babel service's URL by appending "babel=<url>" to your exhibit-api.js script tag.

For more information on Babel, see http://service.simile-widgets.org/babel/.

Note about using Babel: The Babel service is not guaranteed to run as a public service indefinitely. If you rely on Babel translation services (RDF/XML, N3, Excel, an Exhibit page, KML, JPEG, TSV importers), consider running Babel yourself, downloading the transformed data if you don't need to actively transform the original, or maintaining the original data in a format that does not depend on Babel.

Babel gives you the option of entering the URLs of your data files, uploading your data files from your computer, or simply pasting your data into a text box.

At this time, the two most popular formats we support are BibTeX and Tab-Separated Values (TSV). While BibTeX is a special treat for the academically inclined (more details here), TSV is useful for everyone.

If you have data in tab-separated format, you can use Babel to convert the data to JSON and then load it into Exhibit. Exhibit will not convert your data for you.

Converting Data at Load Time

Exhibit comes with a few importers that can either parse other formats themselves or convert the data through Babel at a Babel service URL you provide.

To import other formats, specify the following importer types in your Exhibit code:

  1. Excel files: use any of the following
    1. application/msexcel
    2. application/x-msexcel
    3. application/vnd.ms-excel
    4. application/x-excel
    5. application/xls
    6. application/x-xls
  2. RDF/XML files: application/rdf+xml
  3. N3 files: application/n3
  4. BibTeX: application/x-bibtex

If you can manually convert your data through Babel, you should be able to import it dynamically into your exhibit using this method.

Note that this method slows down your exhibit because your data needs to travel through Babel first. We recommend that you do this only while developing your exhibit. Once your exhibit is finished, convert your data manually through Babel, save the result, and link your exhibit to the converted data instead.
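
The importer types listed above can be collected in a small lookup, sketched below. The helper dataLinkTag and the file name are hypothetical, and the sketch assumes the importer type is given in the type attribute of the data link element, which this page does not spell out.

```javascript
// Importer types named in the list above, grouped by source format.
const BABEL_IMPORTER_TYPES = {
  excel: [
    "application/msexcel",
    "application/x-msexcel",
    "application/vnd.ms-excel",
    "application/x-excel",
    "application/xls",
    "application/x-xls"
  ],
  rdfxml: ["application/rdf+xml"],
  n3: ["application/n3"],
  bibtex: ["application/x-bibtex"]
};

// Hypothetical helper: build a data link tag for a Babel-converted file.
// Assumption: the importer type goes in the link's "type" attribute.
function dataLinkTag(href, importerType) {
  const known = Object.values(BABEL_IMPORTER_TYPES).flat();
  if (!known.includes(importerType)) {
    throw new RangeError("Unknown importer type: " + importerType);
  }
  return '<link rel="exhibit-data" type="' + importerType + '" href="' + href + '" />';
}

console.log(dataLinkTag("books.xls", "application/vnd.ms-excel"));
```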

Google Spreadsheet Importer

Refer to How_to_make_an_exhibit_from_data_fed_directly_from_a_Google_Spreadsheet.

JSON Maker (Excel Spreadsheet)

Jon Bogacki has written a macro-enabled Excel Spreadsheet to convert Excel spreadsheet data into Exhibit JSON format: JSON Maker for SIMILE Widgets. The spreadsheet provides a simple interface to set your data Types and Properties.

Understanding an Exhibit Database

Each exhibit created with Exhibit 3.0 Scripted mode has a database implemented in JavaScript that stores the exhibit's data and lets other parts of the exhibit query the data they need. This database is different from traditional (relational) databases that you might be familiar with, not only because it is implemented in JavaScript but because its data model is different.

1. Data Models

Different data models are different conceptual ways for describing and dealing with data. For example, if you were to write George Washington's biography, here are three different data models you might use:

  1. Write his biography as prose in a book, broken down into chapters but essentially organized in a sequential manner, intended to be read from start to end.
  2. Write his biography in several web pages, with links between them, so the reader can travel instantly from one event in Washington's life to another related event no matter how far apart in time those events occurred.
  3. Write his biography in a table with several columns, including the names of events, the times when they happened, locations where they happened, the people involved or affected, etc., so that the reader can sort, group, and filter the events, or re-visualize them on time lines and maps.

Different data models are suited for different purposes. Prose might be nice for reading a child to sleep, or for providing commentary and analysis in addition to facts. Tables are great for manipulating the data, and re-visualization helps to present information in different ways.

You don't need to understand data models too deeply. Just know that different data models exist and are designed for different purposes.

Exhibit has its own data model, which consists of items, types, properties, and property values.

2. Items

Each Exhibit database contains zero or more items. If it helps, you can think of items as records in traditional (relational) databases. An item represents something, anything -- a person (Peter Pan), an object (the book called "The DaVinci Code"), a concept (beauty), etc. It's up to you to decide what constitutes items in your own exhibit.

Identifiers

Each item has a unique identifier (or ID for short) that uniquely identifies the item within the exhibit. So two different items in an exhibit should have two different IDs – just as any two different people in the U.S. should have different social security numbers. If you accidentally assign the same identifier to two items, they will be considered the same by Exhibit.

An identifier is just a string -- a short piece of text. There is really no restriction on what text can make up an identifier, but we would recommend something meaningful to you: "DaVinci Code", "Peter Pan", and "Beauty" would make good identifiers.

Although items have identifiers, you don't usually deal with identifiers directly. But we want to mention identifiers first just because we need to talk about them in various places later on.

Labels

In addition to an identifier, each item also has a label that is used to textually label the item in many cases when Exhibit needs to show the item in the web page.

Labels don't have to be unique. For example, two items (people) with IDs "John Doe #1" and "John Doe #2" can both have the label "John Doe". In most cases, you can use the same text for both the label and the ID of an item. In fact, Exhibit automatically assigns an item's label as its ID if you don't explicitly provide its ID.

3. Types

Each item also has a type. For example, the type of the item identified as "Peter Pan" would be "Person", the type of "The DaVinci Code" would be "Book", the type of "Beauty" would be "Concept".

Once again, Exhibit doesn't place any restriction on what constitutes types in your exhibit. You make that decision for your own data. Remember our motto for Exhibit: Your data, your business!

If you don't explicitly assign the type to an item, Exhibit sets the item's type to "Item".
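
The defaults described so far (a missing ID falls back to the label, a missing type falls back to "Item", and two records with the same ID are treated as one item) can be modelled in a few lines. This is a simplified sketch, not Exhibit's actual database code. The function name loadItems is hypothetical, and merging the properties of records that share an ID is an assumption about what "considered the same" means.

```javascript
// Hypothetical model of how records become items keyed by ID.
function loadItems(records) {
  const database = new Map();
  for (const record of records) {
    // The label doubles as the ID when no ID is given.
    const id = record.id ?? record.label;
    if (id === undefined) {
      throw new TypeError("A record needs at least an id or a label");
    }
    // Start from the existing item with this ID, or from the defaults.
    const existing = database.get(id) ?? { id, label: record.label, type: "Item" };
    // Assumption: records sharing an ID are merged into a single item.
    database.set(id, { ...existing, ...record, id });
  }
  return database;
}

const database = loadItems([
  { label: "John Doe", age: "36" },
  { id: "John Doe", job: "Doctor" },
  { id: "John Doe #2", label: "John Doe", type: "Person" }
]);

console.log(database.size);                 // 2
console.log(database.get("John Doe").type); // Item
```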

Just like items, types also have IDs, which are just strings, e.g., "Person", "Book", and "Concept". Types also have labels -- more on labels later on.

4. Properties and Property Values

Now the fun part begins. Each item can have zero or more properties (otherwise known as attributes, fields). For example, the item "Peter Pan" would have

  • A "gender" property
  • A "member-of-gang" property

The item "The DaVinci Code" would have

  • An "author" property
  • A "number-of-copies-sold" property

The property value, or just value for short, of the "gender" property of "Peter Pan" is "male", and the value of the "member-of-gang" property of "Peter Pan" is "The Lost Boys". Similarly, the "author" property value of "The DaVinci Code" is "Dan Brown", and the "number-of-copies-sold" value is 6,347,343 or however many copies were sold.

Value Types

Note that while the "gender" property value mentioned, "male", is text, the "number-of-copies-sold" property value is a number. So, property values can be of different value types:

Value type   Example
text         "Hello World!"
number       "67.5"
date         "2006-12-08" (see ISO 8601 format; also see Working with dates)
boolean      "true" or "false"
url          "http://www.google.com/"
item         More about this soon

All property values of a property (e.g., "number-of-copies-sold") have the same value type ("number"). It is not possible to say that the "number-of-copies-sold" property value of "The DaVinci Code" is 6347343 while the "number-of-copies-sold" property value of "Lord of The Rings" is "so many I can't count" because the first value is a number and the second is text.

Don't forget to use quotes around property values.

Item Value Type

We noted above that the "author" property value of "The DaVinci Code" is "Dan Brown". It's OK to consider that property value to be of value type "text", but since Dan Brown is actually a person, there's more we can do.

We can create another item of type "Person", with ID "Dan Brown", and with label "Dan Brown (writer)". And then, we can declare that "author" property values are of value type "item". When we say the "author" property value of "The DaVinci Code" is "Dan Brown", we actually make a relationship between the item "The DaVinci Code" and the item "Dan Brown". The property value "Dan Brown" is no longer just text, but it identifies another item.
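
As a sketch of what such a declaration might look like, the code below builds a data file with a "properties" section next to "items". The "properties" and "valueType" key names are assumptions not spelled out on this page, and the danglingReferences helper is hypothetical. It only checks that every item-valued property points at an item that exists.

```javascript
// Assumed layout: property declarations live in a "properties" section,
// each with a "valueType" key.
const data = {
  properties: {
    "author": { valueType: "item" },
    "number-of-copies-sold": { valueType: "number" }
  },
  items: [
    {
      label: "The DaVinci Code",
      type: "Book",
      author: "Dan Brown",
      "number-of-copies-sold": "6347343"
    },
    { id: "Dan Brown", label: "Dan Brown (writer)", type: "Person" }
  ]
};

// Hypothetical check: list item-valued properties whose target ID is missing.
function danglingReferences(dataFile) {
  const ids = new Set(dataFile.items.map((item) => item.id ?? item.label));
  const missing = [];
  for (const item of dataFile.items) {
    for (const [name, spec] of Object.entries(dataFile.properties)) {
      if (spec.valueType !== "item" || !(name in item)) continue;
      // A property may hold a single value or a list of values.
      for (const target of [].concat(item[name])) {
        if (!ids.has(target)) {
          missing.push({ from: item.id ?? item.label, property: name, target });
        }
      }
    }
  }
  return missing;
}

const json = JSON.stringify(data, null, 2); // text you could save as a .json file
console.log(danglingReferences(data).length); // 0
```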

To see this principle demonstrated, look at the Getting Started with Exhibit example of MIT Nobel Prize Winners, where the property "co-winner" is changed from a value to an item.

5. Graph-Based Data Model

The relationship between "The DaVinci Code" and "Dan Brown" mentioned previously is shown as a red arrow in this graph representation of the data:

[Figure: Graph-based-model.jpg]

Relationships are properties that link items to items. Other properties link from items to text, numbers, dates, booleans, and URLs. So, the value type of a relationship property is "item".

Note that there are two different concepts of types here: types of items (e.g., "Book", "Person") and value types of properties (e.g., "number", "date", "item"). When we say that the value type of the "author" property is "item", we don't say anything about the types of the authors themselves. Books can be written by individual people, small groups, large organizations, or even a faceless, nameless mob.

Although we say that the item "Dan Brown" is an "author" property value of the item "The DaVinci Code", this does not imply that the item "Dan Brown" is somehow smaller or less important than the item "The DaVinci Code". We could also have structured the properties so that the item "The DaVinci Code" is a "has-written" property value of the item "Dan Brown" and reversed the red arrow. It doesn't matter to Exhibit which direction you pick for a relationship, so just pick the direction most natural to you.

Ready to learn more? Go on to learn how to use expressions in your Exhibit.

Exhibit Expressions

Data in an Exhibit database can be represented as a graph, as in this example:

[Figure: Graph-based-model.jpg]

Exhibit expressions are used mainly to move along paths through properties and items, such as the path depicted in the graph. That is, given some nodes in the graph (whether circles/items or arrows/properties), evaluating an Exhibit expression retrieves other nodes (items or properties) that are related.

Exhibit moves along such paths by means of expressions. An Exhibit expression consists of a single path. A path consists of a sequence of one or more property IDs, each preceded by a hop operator. The . hop operator traverses along an arrow (forward, or away from an originating circle/item) while the ! hop operator traverses against an arrow (backward, or toward a circle/item).

For example, given the "The DaVinci Code" item node (the blue circle on the left in the graph above), evaluating .author.label returns "Dan Brown (writer)". Given the 6347343 value node, evaluating !number-of-copies-sold.author returns the item node "Dan Brown" (that is, you'll get the whole item/object, not just its name). Evaluating !number-of-copies-sold.author.id returns the value node "Dan Brown" (the id value, not the item itself).

Here are some more examples. You should be able to imagine for yourself, based on the wording of the properties, how the data might appear in a graph like the one above:

  • evaluating .hasAuthor.teachesAt.locatedIn on some papers returns the locations of the schools where the authors of those papers teach.
  • evaluating .spouseOf!parentOf on some people returns their parents-in-law.
  • evaluating !shot!arrested on John F. Kennedy returns the police officers who arrested his assassin.

A path can also start with one of a few predefined variables, currently including

  • value (referring to the current item or value on which the expression is being evaluated) and
  • index (referring to the index of the current item/value in a sequence of items/values)

value is understood if there is no such variable at the beginning of a path. That is, you can also write .spouseOf!parentOf as value.spouseOf!parentOf.
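
To make the hop operators concrete, here is a minimal sketch of a path evaluator over a tiny in-memory graph. It is not Exhibit's own expression engine. The function names parsePath and evaluate are hypothetical, item nodes are represented simply by their ID strings, and only the value variable is handled (index is not).

```javascript
// Tiny graph: item ID -> property name -> list of values.
const graph = {
  "The DaVinci Code": {
    label: ["The DaVinci Code"],
    author: ["Dan Brown"],
    "number-of-copies-sold": ["6347343"]
  },
  "Dan Brown": { label: ["Dan Brown (writer)"] }
};

// Split a path such as ".author.label" into hops; a leading "value" is optional.
function parsePath(expression) {
  const body = expression.replace(/^value(?=[.!])/, "");
  const hopPattern = /([.!])([\w-]+)/y; // sticky: must match exactly at lastIndex
  const hops = [];
  let position = 0;
  while (position < body.length) {
    hopPattern.lastIndex = position;
    const match = hopPattern.exec(body);
    if (!match) {
      throw new SyntaxError("Cannot parse path: " + expression);
    }
    hops.push({ forward: match[1] === ".", property: match[2] });
    position = hopPattern.lastIndex;
  }
  return hops;
}

// Follow each hop from the current set of nodes to the next set.
function evaluate(items, expression, startNodes) {
  let current = new Set(startNodes);
  for (const hop of parsePath(expression)) {
    const next = new Set();
    if (hop.forward) {
      // "." hop: from an item to the values of one of its properties.
      for (const node of current) {
        const properties = items[node] || {};
        for (const value of properties[hop.property] || []) next.add(value);
      }
    } else {
      // "!" hop: from a value back to the items that hold it in that property.
      for (const [id, properties] of Object.entries(items)) {
        const values = properties[hop.property] || [];
        if (values.some((value) => current.has(value))) next.add(id);
      }
    }
    current = next;
  }
  return [...current];
}

console.log(evaluate(graph, ".author.label", ["The DaVinci Code"]));      // [ 'Dan Brown (writer)' ]
console.log(evaluate(graph, "!number-of-copies-sold.author", ["6347343"])); // [ 'Dan Brown' ]
```

Because this sketch stores items by ID, a forward hop to an item and a further .id hop would be indistinguishable here. In Exhibit the first yields the item itself and the second its identifier.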
