Interesting examples
We highlight some interesting examples of token clusters in the UMAPs across both the small language model (SLM) and the Pythia family (referred to by parameter count, e.g. 14M means Pythia-14M). Where appropriate, we compare these clusters to SAE features from Bricken et al., using the classification introduced there.
Common streaks
There are some "streaks" of tokens that are common across many Pythia models. These include:
- Sentence starts (`The`, `These`, etc.) in the SLM, 14M, 70M, 160M, 1.4B. These form a clear streak in the SLM but are distributed at the dorsal boundary of the UMAP in the larger models. Interestingly, in the UMAP of Bricken et al. a few of these sentence starts sit near each other, e.g. feature A/1/3080.
- Tab characters (`\t`) in the SLM, 14M, 70M, 160M. In 1.4B the tabs seem more diffuse.
- Multiplication (`*`) in mathematics in the SLM, 14M, 70M, 160M, 1.4B, although above 70M it is more diffuse. Related to feature A/1/3762 in Bricken et al.
- Double spaces between multiple choice questions, i.e. `(a) answer (b)`, in 70M vs 160M. It looks like 70M doesn't "know" about this pattern but 160M does.
- Variable names in mathematics in 70M, 160M, 1.4B. Related to feature A/1/3526.
- `2` as an exponent in 14M and 70M. Related to feature A/1/2401.
Pythia-70M
- The `as` token in "as well as" (link).
Pythia-160M
- In 160M we can see that the occurrences of `="` following `ref-type` are clustered and separated from the main `class="` cluster, in a way that isn't true in 70M.
- In 160M (left, right) two kinds of commas have separated into "lobes". Feature A/1/1081 is commas separating lists of names, and nearby are some features that seem related.
- Near these "list-like commas" you can find `\n` tokens from newline-separated lists.
- Two of the most noticeable patterns in the `cc` dataset are apostrophe `t` and `s` (link). The former is feature A/1/169.
- `@` in email addresses (link). Related to feature A/1/1570.
- 160M has a structure at the posterior end made up almost entirely of newlines, which is not present in 70M (70M has a cluster of newlines further up its body, but without this much structure). Its parts include:
  - "Surprising newlines" (link). These happen in freelaw quite a lot, because of the manual linebreaks.
  - Newlines as lists? (link).
  - "Newlines following squigglies" (link). Newlines following multiple `~` tokens or dashes.
  - "Newlines after True/False answers" (link) in dm_mathematics, but *only* when the answers are true or false.
  - "Newlines after general answers" (link) in dm_mathematics.
  - "Double newlines" (link). This is the central supporting axis of the structure.
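To make the newline categories above more concrete, here is a minimal sketch that labels each newline in a piece of text by the context immediately before it. This is not how the clusters were found (they were read off the UMAPs); the function names `label_newline` and `label_all_newlines`, the regular expressions, and the label strings are all hypothetical. The sketch covers only the categories that can be guessed from surface text; "surprising" newlines and newlines after general answers would need more context than a simple rule can provide.

```python
import re


def label_newline(text: str, i: int) -> str:
    """Return a rough, rule-based label for the newline at text[i].

    Hypothetical heuristics for illustration only.
    """
    if text[i] != "\n":
        raise ValueError("text[i] is not a newline")
    before = text[:i]
    # Text on the current line, up to (not including) this newline.
    prev_line = before.rsplit("\n", 1)[-1]
    # Either half of a "\n\n" pair counts as a double newline.
    if before.endswith("\n") or text[i + 1 : i + 2] == "\n":
        return "double newline"
    # A run of two or more tildes or dashes right before the newline.
    if re.search(r"(~{2,}|-{2,})\s*$", prev_line):
        return "after squigglies"
    # The whole line is a True/False answer.
    if prev_line.strip() in {"True", "False"}:
        return "after True/False answer"
    # The line looks like a bulleted or numbered list item.
    if re.match(r"\s*([-*]|\d+[.)])\s+\S", prev_line):
        return "list item"
    return "other"


def label_all_newlines(text: str) -> list[tuple[int, str]]:
    """Label every newline in text, returning (index, label) pairs."""
    return [(i, label_newline(text, i)) for i, ch in enumerate(text) if ch == "\n"]
```

Labels like these could be used to colour newline points in a UMAP and check whether a given lobe of the structure lines up with one rule, though the rules are far cruder than the contexts the models appear to distinguish.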