Songhay Publications and the Concept of the Index

What you are reading now is a Songhay Publication, this Blog. A Songhay Publication centers around the Document [GitHub]. Several Document instances can be grouped under one or more Segment [GitHub] instances. Further up the hierarchy, a Segment can group other Segment instances. Here is a diagram of the Segment, the Document (and the Fragment) to confuse almost everyone including me:

Songhay Publications Class Diagram

Some of the confusion comes from legacy Entity Framework concerns, specifically the use of ICollection<T> and other association properties. When I simplify my life with small-scale use of static JSON files, we can remove our concern for defining associations. Moreover, my small-scale use of Markdown eliminates the need (almost entirely) for Fragment. After all of this removing and eliminating, at least I can understand why I have the core interfaces, ISegment [GitHub] and IDocument [GitHub], of Songhay Publications:

Songhay Publications Interfaces

the Web Publication Index

Compiling a collection of ISegment groups into a file renders an Publication Index. Meanwhile, in the real world, there is the confusion between a “table of contents” and an Index. As of this writing in the Songhay studio, we avoid this confusion by defining an Index here and looking forward to defining a Keyword Index (which is a real-world index).

All of this fantastic thinking about the Index comes from my work on rendering a Web Publication Index in HTML. I have been working on the visual design of two kinds of Index layouts:

the Index
the Blog Index

Behind the visual design is the information design (too many years of my life was spent here). Both of these visual layouts are driven by a static JSON file this getting closer and closer to shapes of the models in Songhay.Publications. Today, I assume Songhay Index data will eventually look like this:

[
  {
    "segmentName": "Section 1",
    "segments": [
      {
        "segmentName": "Section 1.1",
        "segments": [
          {
            "segmentName": "Section 1.1.1"
          },
          {
            "segmentName": "Section 1.1.2"
          },
          {
            "segmentName": "Section 1.1.3"
          }
        ]
      },
      {
        "segmentName": "Section 1.2",
        "segments": [
          {
            "segmentName": "Section 1.2.1"
          },
          {
            "segmentName": "Section 1.2.2"
          },
          {
            "segmentName": "Section 1.2.3"
          }
        ]
      },
      {
        "segmentName": "Section 1.3",
        "segments": [
          {
            "segmentName": "Section 1.3.1"
          },
          {
            "segmentName": "Section 1.3.2"
          },
          {
            "segmentName": "Section 1.3.3"
          }
        ]
      }
    ]
  },
  {
    "segmentName": "Section 2",
    "segments": [
      {
        "segmentName": "Section 2.1",
        "segments": [
          {
            "segmentName": "Section 2.1.1"
          },
          {
            "segmentName": "Section 2.1.2"
          },
          {
            "segmentName": "Section 2.1.3"
          }
        ]
      },
      {
        "segmentName": "Section 2.2",
        "segments": [
          {
            "segmentName": "Section 2.2.1"
          },
          {
            "segmentName": "Section 2.2.2"
          },
          {
            "segmentName": "Section 2.2.3"
          }
        ]
      }
    ]
  }
]

The JSON above is truncated for prose reading which means that Document data has been left out. For example, Section 2.2 would have documents shaped like this:

{
  "segmentName": "Section 2.2",
  "segments": [
    {
      "segmentName": "Section 2.2.1",
      "documents": [
        {
            "title": "Article 2.2.1.1",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        },
        {
            "title": "Article 2.2.1.2",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        },
        {
            "title": "Article 2.2.1.3",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        }
      ]
    },
    {
      "segmentName": "Section 2.2.2",
      "documents": [
        {
            "title": "Article 2.2.2.1",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        },
        {
            "title": "Article 2.2.2.2",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        }
      ]
    },
    {
      "segmentName": "Section 2.2.3",
      "documents": [
        {
            "title": "Article 2.2.3.1",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        },
        {
            "title": "Article 2.2.3.2",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        },
        {
            "title": "Article 2.2.3.3",
            "fileName ": "..",
            "modificationDate ": "..",
            "path": "..",
            "sortOrdinal": "..",
            "tag": ".."
        }
      ]
    }
  ]
}

the Songhay Publications Index in Typescript

The JSON above can be used as a guide to define a single Index entry in Typescript:

interface IndexEntry extends Partial<Segment> {
  segments?: IndexEntry[];
  documents?: Partial<Document>[];
}

I made a little manual test to verify that this interface is working.

distinguishing the search index from the Publication Index

I have opened an issue in the Publications repo because I introduced the Index concept to Songhay Publications to refer to a search Index. My work on a lunr search index at the beginning of this year was and is very exciting but lost sight of the bigger picture: a search Index must be distinguished from the Publication Index being introduced here.

Why can’t this search index be related to the Publication Index? Let us take a look at the lunr search index in use as of this writing:

[
  {
    "extract": "...",
    "clientId": "..",
    "inceptDate": "...",
    "modificationDate": "...",
    "title": "..."
  },
  {
    "extract": "...",
    "clientId": "..",
    "inceptDate": "...",
    "modificationDate": "...",
    "title": "..."
  },
]

It is clear that what we are looking at is an array of Partial<Document> with this extract property tacked on. (The inceptDate property, by the way, maps to the createDate property on Document; there is an issue out there addressing this weirdness).

Songhay Publications has a method for producing an extract from Document. So it is quite clear that a search index is a very simplified form of our new Publications Index.

Again, in Typescript we can formally and precisely define the SearchIndexEntry:

interface SearchIndexEntry extends Partial<Document> {
  string extract;
}

By writing this, I see now that I can replace the LunrIndexEntry I have defined for this Blog Publication with the more generalized SearchIndexEntry.

https://github.com/BryanWilhite/