Docs

Cogitare is a repository for prompts, writing seeds, plots, twists, and ideas… more generally, it is a repository for curated bits of text, accompanied by tags, drawn from various original sources.

Dicere is the set of systems that make it all work. It has three parts:

  - the public API exposes the data to anything that may want to consume it, be it websites, bots, or other systems;
  - the admin interface allows select people to add and manage the data (by invite only, at this point);
  - the search interface, exposed through the public API, is powered by Algolia and lets anyone perform powerful searches over the entire collection.

Data

Items contain the actual data: the prompts, seeds, and so on. They have zero or more tags, belong to a dataset, and may have arbitrary metadata attached to them.

Tags are hierarchical: a tag may have a parent tag that acts as a “superset” category. Tags are the main classification mechanism. Some tags are assigned special meaning and may affect default behaviour.

Datasets are collections of items that share a common source, author, or theme. While datasets can be explored on their own, their main purpose is attribution. Additionally, datasets can be used for blocking: if a dataset is disabled, all items belonging to it disappear from public view.

All data has soft-deletion enabled: when data is “deleted” it is removed from public view but remains in the store. That enables worry-free and reversible deletion.

This documentation mentions some implementation concerns, such as soft-deletion, and additional relationships and data, such as users or histories, without describing them further. They are included for context: they are only used for internal book-keeping and administration, and are not available for public access.

Items

At its core, an item is a unique blob of content, to which some supporting metadata is attached, both structured and arbitrary. Content is currently a string of Markdown text. In the future, images may be added as first-class content; for now they can be embedded in the Markdown.

Conceptually, each item should be a short bit of text (something that could fit in a tweet or two, no more than fifty words), but that is only a guideline. Items should be standalone and provide a seed of inspiration for a writer, poet, artist, or anyone else, to spur their creativity onwards.

Items have the following fields and relations:

Name       Type
id         integer
text       string
created    timestamp
updated    timestamp
metadata   array of key-value pairs
dataset    Dataset
tags       array of Tags
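
As a rough illustration, a consumer might model an item along these lines in TypeScript; the names mirror the table above, but the exact shapes exposed by the API may differ:

```typescript
// Sketch of an Item as a consumer might model it, based on the field table
// above. Exact names and representations in the actual API may differ.
interface Item {
  id: number;
  text: string;                                  // Markdown content
  created: string;                               // timestamp (e.g. ISO 8601)
  updated: string;                               // timestamp
  metadata: Array<{ key: string; value: unknown }>;
  dataset: { id: number; name: string };         // see the Datasets section
  tags: Array<{ id: number; name: string }>;     // see the Tags section
}
```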

Tags

A tag is a short, case-insensitive string of text that provides either a simple classification keyword or a name:value pair (described below).

Tags can contain any characters, but generally they are of the lowercase-with-hyphens variety, or of the name-with:value form. The second form is meant to represent related variants, e.g. colour:blue. Because search looks at parts of terms as well as the whole, searching for either colour or blue would match this tag.

Tags also have an optional freeform description that can contain additional details and notes on meaning. Tag descriptions are not considered when indexing items for search.

Tags can have an optional parent tag, which is automatically included (recursively) when indexing items for search, so if jedi has a parent of star-wars, and an item is tagged with jedi (but not star-wars), searching for "star wars" would include the item.
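
As an illustration of that behaviour, here is a hypothetical TypeScript sketch of how a tag's ancestors could be folded into an item's searchable terms; the shapes and function here are illustrative only, not part of the API.

```typescript
// Hypothetical sketch: collect a tag's name together with all ancestor names,
// the way indexing is described above (jedi -> ["jedi", "star-wars"]).
type TagNode = { name: string; parent?: TagNode | null };

function searchableTagNames(tag: TagNode): string[] {
  const names: string[] = [];
  for (let current: TagNode | null | undefined = tag; current; current = current.parent) {
    names.push(current.name);
  }
  return names;
}

// Example: an item tagged only with `jedi` is still findable via "star wars".
const starWars: TagNode = { name: "star-wars" };
const jedi: TagNode = { name: "jedi", parent: starWars };
console.log(searchableTagNames(jedi)); // ["jedi", "star-wars"]
```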

Tags have the following fields and relations:

Name          Type
id            integer
name          string
description   string
created       timestamp
updated       timestamp
parent        Tag
children      array of Tags
items         array of Items
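
A matching, equally rough TypeScript sketch of a tag, with self-references expressing the parent/children hierarchy (again, the actual API shapes may differ):

```typescript
// Sketch of a Tag mirroring the table above. Nullability is a guess: the text
// only says the description and parent are optional.
interface Tag {
  id: number;
  name: string;
  description: string | null;                  // freeform notes, not indexed for search
  created: string;                             // timestamp
  updated: string;                             // timestamp
  parent: Tag | null;                          // optional “superset” category
  children: Tag[];
  items: Array<{ id: number; text: string }>;  // items carrying this tag
}
```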

Datasets

When items are added to Cogitare, there are three scenarios:

  1. A user submits items of their own, outside of any particular collection.
  2. Items are added to a curated collection over time.
  3. An existing body of items is imported in bulk.

In the first case, items go in a user's dataset, which is the default dataset for random submissions made by a user. In the second case, a separate dataset is added to over time; it represents the collection and may be contributed to by different users. In the third case, the dataset represents the attribution and source for the entire bulk of items.

Datasets are strongly linked to their items: if a dataset is marked as deleted, all items in the dataset are marked as deleted too. If a dataset is restored, all its items are also restored. However, items may also be individually deleted without affecting the whole dataset.

Dataset names are included with items for indexing, so they can be used as filters or for discovery when searching.

Datasets have the following fields and relations:

Name          Type
id            integer
name          string
description   string
created       timestamp
updated       timestamp
imported      timestamp
metadata      array of key-value pairs
items         array of Items
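
And a similar sketch for datasets. Whether fields such as imported can be absent is not specified above, so treat the nullability here as a guess:

```typescript
// Sketch of a Dataset based on the table above. The `imported` timestamp is
// assumed here to be relevant only for bulk imports and possibly absent.
interface Dataset {
  id: number;
  name: string;
  description: string;
  created: string;                                // timestamp
  updated: string;                                // timestamp
  imported: string | null;                        // timestamp of a bulk import, if any
  metadata: Array<{ key: string; value: unknown }>;
  items: Array<{ id: number; text: string }>;
}
```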

Metadata

Both Items and Datasets have a metadata field. Metadata is stored as key-value pairs: a string key and a value of any type. To expose this structure in GraphQL in a way that reduces burden on consumers and doesn't lose typing information, a special interface is presented.

Two generic fields provide universal access to values whose type is not known in advance.

Typed access is done through type-named fields that all operate on the same principle: if the value is of the field's type, it is provided; otherwise null is returned.
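
To make that concrete, here is a hypothetical sketch of how a consumer might see such an entry. The field names (key, raw, asString, and so on) are invented for illustration and are not the actual schema.

```typescript
// Hypothetical shape of a metadata entry as described above: generic access to
// the raw value, plus type-named fields that are null unless the type matches.
// These field names are illustrative only, not the actual GraphQL schema.
interface MetadataEntry {
  key: string;
  raw: string;                 // generic, type-agnostic access
  asString: string | null;     // null unless the value is a string
  asInt: number | null;        // null unless the value is an integer
  asBoolean: boolean | null;   // null unless the value is a boolean
}

// Example: reading a typed value with a fallback to the generic field,
// here for the license key mentioned in the License section below.
function licenseOf(metadata: MetadataEntry[]): string | null {
  const entry = metadata.find((m) => m.key === "license");
  return entry ? entry.asString ?? entry.raw : null;
}
```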

For bots

Bots may of course use the full interface and operate on the data as they see fit, but as a general guideline, the recommended access goes like this (a sketch in code follows the list):

  1. If the user provides no hints or search, go to 4.
  2. Perform a search with a large limit, but retrieving only text.
  3. Pick at random within that.
  4. If nothing is returned, query the random endpoint.
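
Here is that flow sketched in TypeScript. The endpoint URL, query names, and response shapes are assumptions made for illustration; consult the actual API schema for the real ones.

```typescript
// Sketch of the recommended flow above. Endpoint, query names, and response
// shapes are assumptions for illustration; they are not the actual schema.
const ENDPOINT = "https://example.org/graphql"; // hypothetical

async function gql<T>(query: string, variables: object): Promise<T> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  });
  return (await res.json()).data as T;
}

async function pickText(userQuery?: string): Promise<string> {
  if (userQuery) {
    // 2. Search with a large limit, retrieving only the text.
    const data = await gql<{ search: { text: string }[] }>(
      `query ($q: String!) { search(query: $q, limit: 100) { text } }`, // hypothetical query
      { q: userQuery },
    );
    // 3. Pick at random within the results.
    if (data.search.length > 0) {
      return data.search[Math.floor(Math.random() * data.search.length)].text;
    }
  }
  // 1. / 4. No hints, or nothing returned: fall back to the random endpoint.
  const fallback = await gql<{ random: { text: string } }>(
    `query { random { text } }`, // hypothetical query
    {},
  );
  return fallback.random.text;
}
```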

If the bot has a particular context or is for a particular event, you may want to filter the results further. In this case, you can return more fields from search results and perform filtering client-side. For search, you can try to include additional words in the query to filter server-side, too.

Often, it is better to return something even if nothing matches. Consider what set of tags or datasets you should query for a “safe” random pick in your context.

Access

All data presented in the API is public and read-only.

There is no authentication and there are no rate limits at the moment. This may change without notice if heavy use of the service reduces the quality of access for everyone.

If authentication is ever enabled, it would likely be a static token provided in an Authorization header, obtainable on request and granted per application. Rate-limited anonymous access would still be provided. This information is a non-binding indication only, given so that you can design your solution with this eventuality in mind.
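
Should that happen, attaching the token would be a small client-side change, along these lines (only the Authorization header itself is mentioned above; everything else in the snippet is an assumption):

```typescript
// Hypothetical sketch: attach a per-application token once authentication is
// enabled; without a token, fall back to rate-limited anonymous access.
function authHeaders(token?: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    ...(token ? { Authorization: token } : {}),
  };
}
```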

License

The data provided, while free to use, is not in the public domain. Each dataset may specify its own license in its metadata. By default, items are licensed for use under the Creative Commons Attribution 4.0 International license, but copyright remains with their authors.

Whenever possible, licenses are expressed in SPDX format. The special string "UNLICENSED" should be taken to mean that the item or dataset is not under any license.

Dicere and Cogitare are themselves open-sourced under the Apache 2.0 license. You can find them both on GitHub under the Storily organisation.