Interesting examples

We highlight some interesting examples of clusters of tokens in the UMAPs across both the small language model (SLM) and the Pythia family (referred to by parameter count, e.g. 14M means Pythia-14M). Where appropriate, we compare these clusters to SAE features from Bricken et al., using the classification introduced there.

Common streaks

There are some "streaks" of tokens that are common across many Pythia models. These include:

Sentence starts (The, These, etc.): in the SLM, 14M, 70M, 160M and 1.4B. These form a clear streak in the SLM but are distributed at the dorsal boundary of the UMAP in the larger models. Interestingly, in the UMAP of Bricken et al. a few of these sentence starts sit near each other, e.g. feature A/1/3080.
Tab characters (\t): SLM, 14M, 70M and 160M; in 1.4B the tabs seem more diffuse.
Multiplication (*) in mathematics: SLM, 14M, 70M, 160M and 1.4B, but above 70M it is more diffuse. Related to feature A/1/3762 in Bricken et al.
Double spaces between multiple-choice questions, i.e. (a) answer (b): 70M vs 160M. It looks like 70M doesn't "know" about this pattern but 160M does.
Variable names in mathematics: 70M, 160M and 1.4B. Related to feature A/1/3526.
2 as an exponent: 14M and 70M. Related to feature A/1/2401.
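One rough way to put a number on "streak" versus "diffuse" is to ask how often a token's nearest neighbours in the 2D embedding share that token. The sketch below is not how the plots in this post were made: the function name `same_token_neighbor_fraction`, the toy coordinates and the choice of `k` are all assumptions for illustration, and real UMAP coordinates would replace the toy data.

```python
import numpy as np


def same_token_neighbor_fraction(coords, tokens, target, k=4):
    """Mean fraction of each target point's k nearest neighbours that share its token.

    Hypothetical helper: `coords` is an (n, 2) array of embedding coordinates,
    `tokens` is the matching list of token strings.
    """
    coords = np.asarray(coords, dtype=float)
    is_target = np.array([t == target for t in tokens])
    idx = np.flatnonzero(is_target)
    if idx.size == 0:
        raise ValueError(f"token {target!r} not found")
    # Distances from every target point to every point in the embedding.
    diffs = coords[idx][:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # A point should not count as its own neighbour.
    dists[np.arange(idx.size), idx] = np.inf
    nearest = np.argsort(dists, axis=1)[:, :k]
    return float(is_target[nearest].mean())


# Toy data (assumed, not real UMAP output): a 10x10 grid of background points,
# one token scattered between grid cells, and one token packed into a tight line.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
scattered = [(i + 0.5, i + 0.5) for i in range(9)]
compact = [(50 + 0.01 * j, 50.0) for j in range(9)]

coords = grid + scattered + compact
tokens = ["other"] * len(grid) + ["*"] * len(scattered) + ["\t"] * len(compact)

compact_score = same_token_neighbor_fraction(coords, tokens, "\t", k=4)
scattered_score = same_token_neighbor_fraction(coords, tokens, "*", k=4)
```

On this toy data the tightly packed token scores 1.0 and the scattered one scores 0.0. On a real embedding the scores would fall somewhere in between, and comparing the same token across model sizes would give a crude measure of how diffuse its cluster has become.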

Pythia-70M

The "as" token in "as well as" (link).

Pythia-160M

In 160M we can see that the occurrences of =" following ref-type are clustered and separated from the main class=" cluster, in a way that isn't true in 70M.
In 160M (left, right) two kinds of commas have separated into "lobes". Feature A/1/1081 corresponds to commas separating lists of names, and nearby are some features that seem related.
Near these "list-like commas" you can find \n tokens that separate the items of newline-separated lists.
Two of the most noticeable patterns in the cc dataset are the apostrophe-t and apostrophe-s tokens (link). The former is feature A/1/169.
@ in email addresses (link). Related to feature A/1/1570.
160M has a structure at its posterior end made up almost entirely of newlines, which is not present in 70M (70M has a cluster of newlines further up its body, but without this much structure).
"Surprising newlines" (link). These happen in freelaw quite a lot, because of the manual line breaks.
Newlines as lists? (link).
"Newlines following squigglies" (link). Newlines following multiple ~ tokens or dashes.
"Newlines after True/False answers" (link) in dm_mathematics, but *only* when the answers are True or False.
"Newlines after general answers" (link) in dm_mathematics.
"Double newlines" (link). This is the central supporting axis of the structure.