Songhay Publications and the Concept of the Index (2021)

The intent here is to build upon the 2020 version of this subject and move forward toward new definitions and techniques that will work well in my Studio. That 2020 exploration gives us:

  • the Search Index Entry
  • the Publications Index Entry

We will see from the sections below that the Search Index Entry is a Document with just one extract property tacked on—and the Publications Index Entry is a way to present a tree-like hierarchy of parent and child Segment data with any related child Document data.

Search Index Entry

In TypeScript, the SearchIndexEntry is just an extension of Document:

interface SearchIndexEntry extends Partial<Document> {
  extract: string;
}

The following table summarizes its usage in the Studio:

| repo | language | remarks |
|---|---|---|
| songhay-core | TypeScript | consumed in the Day Path and rasx() context Blogs by lunrjs (client-side search UX) |
| Songhay.Publications | C# | Songhay.Publications.Activities.GenerateSearchIndexFrom11tyEntries() |
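
To make the shape of the Search Index Entry concrete, here is a minimal sketch. The StudioDocument fields (clientId, title, path), the searchEntries helper and the sample data are all assumptions made for illustration; the stand-in is named StudioDocument only to avoid colliding with the DOM's global Document type. The naive substring match stands in for a real search library such as lunrjs and is not how the Studio actually searches.

```typescript
// Hypothetical stand-in for the Studio's Document (the real type has other fields).
interface StudioDocument {
  clientId: string;
  title: string;
  path: string;
}

// A Document with one extra property: the extract.
interface SearchIndexEntry extends Partial<StudioDocument> {
  extract: string;
}

// Naive, case-insensitive substring match over title and extract.
// Assumption: this only imitates what a search library would do.
function searchEntries(entries: SearchIndexEntry[], term: string): SearchIndexEntry[] {
  const needle = term.toLowerCase();
  return entries.filter(
    (entry) =>
      entry.extract.toLowerCase().includes(needle) ||
      (entry.title ?? "").toLowerCase().includes(needle)
  );
}

// Made-up sample data.
const sampleEntries: SearchIndexEntry[] = [
  { clientId: "index-2021", title: "The Concept of the Index", extract: "Unifying two index types." },
  { clientId: "fsharp-notes", title: "F# Notes", extract: "Moving Studio code toward Bolero." },
];

const hits = searchEntries(sampleEntries, "index");
console.log(hits.map((entry) => entry.clientId)); // [ 'index-2021' ]
```

Because the entry extends Partial<Document>, every Document field is optional and only extract is required, which is why the helper has to guard against a missing title.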

Publications Index Entry

In TypeScript, the IndexEntry is an extension of Segment:

interface IndexEntry extends Partial<Segment> {
  segments?: IndexEntry[];
  documents?: Partial<Document>[];
}

The following table summarizes its usage in the Studio:

| repo | language | remarks |
|---|---|---|
| songhay-core | TypeScript | consumed by Publication Index layouts |
| Songhay.Publications | C# | ISegmentExtensions.ToPublicationIndexEntryJObject() |

The information design goal behind this ‘tree-like hierarchy of parent and child Segment data’ is to represent something like a site map of a website. It would define how to break one large segment (the entire website) down into smaller child segments each with their child documents.
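
The site-map idea can be sketched as a recursive walk over the tree. The StudioSegment and StudioDocument fields, the toSiteMapLines helper and the sample site below are assumptions for illustration only, not the Studio's actual layouts.

```typescript
// Hypothetical stand-ins for the Studio's Segment and Document.
interface StudioSegment {
  segmentName: string;
}

interface StudioDocument {
  title: string;
  path: string;
}

interface IndexEntry extends Partial<StudioSegment> {
  segments?: IndexEntry[];
  documents?: Partial<StudioDocument>[];
}

// Walk the tree depth-first: a segment's own documents come first,
// then each child segment, indented by two more spaces per level.
function toSiteMapLines(entry: IndexEntry, depth = 0): string[] {
  const indent = "  ".repeat(depth);
  const lines = [`${indent}${entry.segmentName ?? "(unnamed segment)"}/`];
  for (const doc of entry.documents ?? []) {
    lines.push(`${indent}  ${doc.title ?? "(untitled)"}`);
  }
  for (const child of entry.segments ?? []) {
    lines.push(...toSiteMapLines(child, depth + 1));
  }
  return lines;
}

// Made-up sample: one large segment (the website) with two child segments.
const site: IndexEntry = {
  segmentName: "website",
  documents: [{ title: "Home" }],
  segments: [
    { segmentName: "blog", documents: [{ title: "First Post" }, { title: "Second Post" }] },
    { segmentName: "about", documents: [{ title: "Colophon" }] },
  ],
};

const siteMapLines = toSiteMapLines(site);
console.log(siteMapLines.join("\n"));
```

For this sample the walk yields seven lines, starting with "website/" and ending with "    Colophon".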

unifying these Index concepts

In TypeScript, we can unify these two Index types into a new (proposed) IndexEntry type:

interface IndexEntry extends Partial<Segment> {
  segments?: IndexEntry[];
  documents?: Partial<SearchIndexEntry>[];
}

For the sake of naming, I would rename SearchIndexEntry to IndexDocument:

interface IndexEntry extends Partial<Segment> {
  segments?: IndexEntry[];
  documents?: Partial<IndexDocument>[];
}
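
One consequence of the unified type is that a single tree can serve both purposes: it can be rendered as a hierarchy or flattened into a list of search documents. The sketch below shows the flattening. The StudioSegment and StudioDocument fields, the collectIndexDocuments helper and the sample data are assumptions for illustration; defaulting a missing extract to an empty string is likewise just one possible choice.

```typescript
// Hypothetical stand-ins for the Studio's Segment and Document.
interface StudioSegment {
  segmentName: string;
}

interface StudioDocument {
  title: string;
  path: string;
}

// The renamed SearchIndexEntry.
interface IndexDocument extends Partial<StudioDocument> {
  extract: string;
}

// The proposed unified type.
interface IndexEntry extends Partial<StudioSegment> {
  segments?: IndexEntry[];
  documents?: Partial<IndexDocument>[];
}

// Flatten the tree into one list of IndexDocument values.
// Assumption: a document without an extract gets an empty string.
function collectIndexDocuments(entry: IndexEntry): IndexDocument[] {
  const collected: IndexDocument[] = [];
  for (const doc of entry.documents ?? []) {
    collected.push({ ...doc, extract: doc.extract ?? "" });
  }
  for (const child of entry.segments ?? []) {
    collected.push(...collectIndexDocuments(child));
  }
  return collected;
}

// Made-up sample data.
const unifiedSite: IndexEntry = {
  segmentName: "website",
  documents: [{ title: "Home", extract: "Welcome to the Studio." }],
  segments: [
    {
      segmentName: "blog",
      documents: [
        { title: "First Post", extract: "Notes on the Index." },
        { title: "Draft" }, // no extract yet
      ],
    },
  ],
};

const flatDocuments = collectIndexDocuments(unifiedSite);
console.log(flatDocuments.map((doc) => `${doc.title}: ${doc.extract}`));
```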

translating these TypeScript definitions into F♯

The contemporary goal of this Studio is to maximize the use of F♯ via Blazor/Bolero, which means the role TypeScript plays should be significantly reduced.

In the world of F♯, our IndexEntry would look like this:

type IndexEntry() =
    inherit Segment()

    // None is the idiomatic default for an option-typed auto-property
    member val segments : IndexEntry[] option = None with get, set
    member val documents : (Document * Extract)[] option = None with get, set

…where Extract is type Extract = Extract of string.

We can see that a move like this (which includes inheriting from the classic C♯ Segment) makes IndexEntry a bridge 🌉 between the C♯ and F♯ worlds. Such a ‘bridge’ might save me from having to rewrite all the Studio’s C♯ as F♯, which feels like an unwise move in terms of consuming even more of my precious time 👴 at the very least.

obtaining the Document extract

This new Extract type introduces a very important subject: the fate of the C♯ Fragment. So far, according to Studio tradition, we have made mention of the classic C♯-based GenericWeb schema of this Studio: the Segment and the Document. But what about the Fragment?

At the turn of the century, I developed the Fragment because my Studio only had ‘formal’ recognition of the XHTML document (which is just an aspect of the massive investment in XML, founding this Studio) and the PDF document. What was missing from this Studio at the turn of the century (for well over a decade) were:

  • the Git repository over the file system
  • the cloud 🌩 file system (Azure Storage)
  • cross-platform PowerShell
  • the markdown document
  • the JSON document

“Power User” workflows around Git, Azure Storage, PowerShell, markdown and JSON effectively eliminate the need for the Fragment. For example, yielding an Extract would involve consulting a Document that would point to a markdown document. The markdown document would have front matter in the form of embedded JSON (or even YAML), storing the Extract string.
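
As a rough sketch of that last step, the function below pulls an extract out of JSON front matter. Everything in it is an assumption made for illustration: the readExtract name, the extract property name, the wrapper shape used for Extract, and the convention that the front matter sits between an opening `---json` line and a closing `---` line. The step where a Document points to the markdown file is left out.

```typescript
// A small wrapper so that an extract is not confused with any other string.
type Extract = { readonly kind: "Extract"; readonly value: string };

// Read the extract from JSON front matter at the top of a markdown string.
// Returns undefined when the front matter or the extract property is missing.
function readExtract(markdown: string): Extract | undefined {
  const lines = markdown.split(/\r?\n/);
  if ((lines[0] ?? "").trim() !== "---json") return undefined;

  // Find the closing delimiter after the opening line.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end < 0) return undefined;

  try {
    const frontMatter = JSON.parse(lines.slice(1, end).join("\n"));
    return typeof frontMatter.extract === "string"
      ? { kind: "Extract", value: frontMatter.extract }
      : undefined;
  } catch {
    return undefined; // malformed JSON (or a non-object value) yields no extract
  }
}

// Made-up sample document.
const markdownSource = [
  "---json",
  "{",
  '  "title": "The Concept of the Index",',
  '  "extract": "Unifying the Search Index Entry and the Publications Index Entry."',
  "}",
  "---",
  "",
  "# The Concept of the Index",
].join("\n");

const extractFromFrontMatter = readExtract(markdownSource);
console.log(extractFromFrontMatter?.value);
```

YAML front matter would need a YAML parser in place of JSON.parse; the rest of the flow would stay the same.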

This new front matter concept in markdown is the leading justification for deprecating Fragment. In the old world 👴 of GenericWeb, the Document represented a statically-generated HTML document to be displayed in the near future. In this new world of Songhay Publications, the Document should be a pointer to a real document (or binary file) existing in the present (and this present file can then be used to statically-generate HTML at future publication).

The original thinking 🧠🐣 behind GenericWeb was to build a poor-man’s version of Microsoft Index Server (released in August 1996) that could also personalize Web analytics reporting, based on breaking down a website into user-defined ‘segments.’ GenericWeb, by the way, was ‘sold’ to UCLA Medical Center Computing Services in the late 1990s when I was consultant-turned-employee.

does your “Studio” not recognize the NoSQL database?

Notice how the section above (“obtaining the Document extract”) makes no mention of the NoSQL database. No MongoDB? No LiteDB? The explanation for this ignorance can only be more ignorance, and it starts with this 💸 ignorant statement: this Studio does not need NoSQL technologies because there is no need to collect arbitrary shapes of data from “customers” all over the world. This Studio is currently focused on read-only-mostly solutions for publishing, which is different from read-write solutions for advertising-based “engagement” 💸

What is ignorant of this Studio (and also embarrassing for this Studio) is the lack of solid distributed cache support (like Azure Cache for Redis) and the lack of years of experience with CDN products. Working on these world-dominating things has a higher priority than developing a better understanding of NoSQL tech.

spending decades building ‘Power User workflows’ means being doomed to loneliness 😐

Since the turn of the century, there has been a social-media side to being a software developer. And well before GitHub there was the social engineering required for the many MVP programs sponsored by multi-national corporations.

The work being done here is rather sprawling, delayed (often by years) and was never popular in the first place. My use of the word sprawling is meant to take the glamour out of the word “interdisciplinary”: the crossing of C♯ and F♯ with a focus on building something like the Quark Publishing System or maybe something like AppleScript entwined around Adobe FrameMaker sounds archaic, misplaced and bizarre to most socially adept tech folks.

Even the promise of introducing more user-friendly GUI-based tools (before 2030 😆) will only make my work rise to the level of a modicum of fame around command-line wrappers like HandBrake (because that is all I plan to build: command-line wrappers around Songhay Activity assemblies). So, earlier, I got rid of the money 💸 and now am ridding myself of fame 👻

I am just a guy that started writing macros in Microsoft Word and somehow ended up here 👴 building digital tools for a small publishing house: this Studio. That sounds okay to me!