# Converting Context Free Grammar to Chomsky Normal Form

Prerequisite – Simplifying Context Free Grammars

A context free grammar (CFG) is in Chomsky Normal Form (CNF) if every production rule satisfies one of the following conditions:

• A non-terminal generating a single terminal (e.g., X->x)
• A non-terminal generating exactly two non-terminals (e.g., X->YZ)
• The start symbol generating ε (e.g., S->ε)

Consider the following grammars:

```
G1 = {S->a, S->AZ, A->a, Z->z}
G2 = {S->a, S->aZ, Z->a}
```

The grammar G1 is in CNF because each of its production rules has one of the forms listed above. The grammar G2 is not in CNF: the production rule S->aZ has a terminal followed by a non-terminal on its RHS, which matches none of the allowed forms.
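The three conditions can be checked mechanically. The following is a minimal Python sketch, assuming a grammar is stored as a dict that maps each non-terminal to a set of bodies (tuples of symbols) and that any symbol that is not a dict key is a terminal. The name `is_cnf` and this representation are illustrative choices, not part of the definition.

```python
def is_cnf(grammar, start):
    """Return True if every production has one of the three CNF shapes."""
    nonterminals = set(grammar)
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 0:
                # Only the start symbol may generate the empty string.
                if head != start:
                    return False
            elif len(body) == 1:
                # A single symbol on the RHS must be a terminal.
                if body[0] in nonterminals:
                    return False
            elif len(body) == 2:
                # Two symbols on the RHS must both be non-terminals.
                if not all(sym in nonterminals for sym in body):
                    return False
            else:
                # Three or more symbols are never allowed.
                return False
    return True


G1 = {"S": {("a",), ("A", "Z")}, "A": {("a",)}, "Z": {("z",)}}
G2 = {"S": {("a",), ("a", "Z")}, "Z": {("a",)}}

print(is_cnf(G1, "S"))  # True
print(is_cnf(G2, "S"))  # False, because of S->aZ
```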

Note –

• For a given grammar, there can be more than one CNF.
• A CNF grammar generates the same language as the CFG it was derived from.
• CNF is used as a preprocessing step for many CFG algorithms, such as CYK (a membership algorithm) and bottom-up parsers.
• Deriving a string w of length n (n ≥ 1) takes exactly 2n-1 production steps in CNF.
• Any context free grammar that does not have ε in its language has an equivalent CNF. With the S->ε form listed above, grammars whose language contains ε are covered as well.
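The 2n-1 step count can be justified by counting the two kinds of rule applications separately. Each application of a rule X->YZ lengthens the sentential form by one symbol, so growing the single start symbol into n non-terminals takes n-1 such steps. Each of those n non-terminals is then rewritten to a terminal by one rule of the form X->x. For n ≥ 1:

```latex
\underbrace{(n-1)}_{\text{applications of } X \to YZ}
\;+\;
\underbrace{n}_{\text{applications of } X \to x}
\;=\; 2n - 1
```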

How to convert CFG to CNF?

Step 1. Eliminate start symbol from RHS.
If the start symbol S appears on the RHS of any production in the grammar, add a new production:
S0->S
where S0 is the new start symbol.
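Step 1 can be sketched in a few lines of Python. This is only an illustration, assuming a grammar is a dict from each non-terminal to a set of bodies (tuples of symbols); the function name `add_new_start` and the default name `S0` are assumptions made for the example.

```python
def add_new_start(grammar, start, new_start="S0"):
    """Add new_start -> start if start occurs in any production body."""
    on_rhs = any(start in body for bodies in grammar.values() for body in bodies)
    if not on_rhs:
        # Nothing to do: the start symbol never appears on a RHS.
        return grammar, start
    if new_start in grammar:
        raise ValueError(f"{new_start!r} is already a non-terminal")
    updated = {head: set(bodies) for head, bodies in grammar.items()}
    updated[new_start] = {(start,)}
    return updated, new_start


grammar = {
    "S": {("A", "S", "B")},
    "A": {("a", "A", "S"), ("a",), ()},  # () stands for ε
    "B": {("S", "b", "S"), ("A",), ("b", "b")},
}
new_grammar, new_start = add_new_start(grammar, "S")
print(new_start, new_grammar[new_start])  # S0 {('S',)}
```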

Step 2. Eliminate null, unit and useless productions.
If the CFG contains null, unit or useless production rules, eliminate them. The prerequisite article on simplifying context free grammars describes how to eliminate each of these types of production rules.
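As a rough illustration of the null-production part of this step, the sketch below first finds every nullable non-terminal and then, for each body, emits all variants with nullable symbols kept or dropped. It assumes the dict-of-sets grammar representation with `()` standing for ε; the function names are hypothetical. It handles all null productions in one pass rather than one at a time, and it does not cover unit or useless productions.

```python
from itertools import product


def nullable_symbols(grammar):
    """Return the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A head is nullable if some body consists only of nullable symbols.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_null_productions(grammar, start):
    """Return an equivalent grammar in which only start may produce ε."""
    nullable = nullable_symbols(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; others are always kept.
            options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:
                    new_bodies.add(new_body)
        result[head] = new_bodies
    if start in nullable:
        result[start].add(())
    return result


grammar = {
    "S0": {("S",)},
    "S": {("A", "S", "B")},
    "A": {("a", "A", "S"), ("a",), ()},
    "B": {("S", "b", "S"), ("A",), ("b", "b")},
}
result = remove_null_productions(grammar, "S0")
for head, bodies in result.items():
    print(head, "->", sorted(bodies))
```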

Step 3. Eliminate terminals from the RHS if they appear together with other terminals or non-terminals.
For example, the production rule X->xY can be decomposed as:
X->ZY
Z->x
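A possible Python sketch of Step 3 is shown below, assuming the dict-of-sets grammar representation in which any symbol that is not a dict key is a terminal. The `T_` naming scheme for the new non-terminals is an assumption made for this example. The sketch reuses one new non-terminal per terminal.

```python
def replace_terminals(grammar):
    """Replace terminals in bodies of length >= 2 with new non-terminals."""
    nonterminals = set(grammar)
    result = {head: set() for head in grammar}
    proxy = {}  # terminal -> non-terminal that generates only that terminal

    def proxy_for(terminal):
        if terminal not in proxy:
            name = "T_" + terminal  # hypothetical naming scheme
            if name in nonterminals:
                raise ValueError(f"{name!r} clashes with an existing non-terminal")
            proxy[terminal] = name
            result[name] = {(terminal,)}
        return proxy[terminal]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) >= 2:
                # Terminals are only a problem when they are not alone on the RHS.
                body = tuple(
                    sym if sym in nonterminals else proxy_for(sym) for sym in body
                )
            result[head].add(body)
    return result


grammar = {"X": {("x", "Y")}, "Y": {("y",)}}
converted = replace_terminals(grammar)
for head, bodies in converted.items():
    print(head, "->", sorted(bodies))
```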

Step 4. Eliminate any RHS with more than two non-terminals.
For example, the production rule X->XYZ can be decomposed as:
X->PZ
P->XY
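Step 4 can be sketched as a loop that repeatedly folds the two leftmost symbols of a long body into a new non-terminal, mirroring the X->PZ, P->XY decomposition. The code assumes the dict-of-sets grammar representation; the generated names `N1`, `N2`, ... are hypothetical.

```python
def binarize(grammar):
    """Split every body longer than two symbols into a chain of binary rules."""
    result = {head: set() for head in grammar}
    counter = 0

    def fresh_name():
        # Generate N1, N2, ... and skip any name that is already in use.
        nonlocal counter
        while True:
            counter += 1
            name = f"N{counter}"
            if name not in result:
                return name

    for head, bodies in grammar.items():
        for body in bodies:
            current = body
            while len(current) > 2:
                new_symbol = fresh_name()
                # The new non-terminal generates the two leftmost symbols.
                result[new_symbol] = {current[:2]}
                current = (new_symbol,) + current[2:]
            result[head].add(current)
    return result


grammar = {"X": {("X", "Y", "Z")}, "Y": {("y",)}, "Z": {("z",)}}
binary = binarize(grammar)
for head, bodies in binary.items():
    print(head, "->", sorted(bodies))
```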

Example – Let us take an example to convert a CFG to CNF. Consider the following grammar:

```
S → ASB
A → aAS|a|ε
B → SbS|A|bb
```

Step 1. As the start symbol S appears on the RHS, we create a new production rule S0->S. The grammar becomes:

```
S0 → S
S → ASB
A → aAS|a|ε
B → SbS|A|bb
```

Step 2. The grammar contains the null production A->ε. Removing it yields:

```
S0 → S
S → ASB|SB
A → aAS|aS|a
B → SbS|A|ε|bb
```

This creates the null production B->ε. Removing it yields:

```
S0 → S
S → AS|ASB|SB|S
A → aAS|aS|a
B → SbS|A|bb
```

The grammar now contains the unit production B->A. Replacing it with the productions of A yields:

```
S0 → S
S → AS|ASB|SB|S
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
```

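Similarly, replacing the unit production S0->S with the productions of S yields the following. Note that S0->S reappears, because S->S is one of the productions of S:

```
S0 → AS|ASB|SB|S
S → AS|ASB|SB|S
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
```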

Removing the remaining unit productions S->S and S0->S yields:

```
S0 → AS|ASB|SB
S → AS|ASB|SB
A → aAS|aS|a
B → SbS|bb|aAS|aS|a
```
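The unit-production removals above can also be done in a single pass. The sketch below, assuming the dict-of-sets grammar representation and a hypothetical function name, collects for each non-terminal every non-terminal reachable through unit rules and copies over their non-unit bodies.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without rules of the form X -> Y."""
    nonterminals = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in nonterminals

    result = {}
    for head in grammar:
        # Collect every non-terminal reachable from head through unit rules.
        reachable = {head}
        stack = [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # head inherits all non-unit bodies of the reachable non-terminals.
        result[head] = {
            body for nt in reachable for body in grammar[nt] if not is_unit(body)
        }
    return result


# The grammar as it stands after the null productions have been removed.
grammar = {
    "S0": {("S",)},
    "S": {("A", "S"), ("A", "S", "B"), ("S", "B"), ("S",)},
    "A": {("a", "A", "S"), ("a", "S"), ("a",)},
    "B": {("S", "b", "S"), ("A",), ("b", "b")},
}
unit_free = remove_unit_productions(grammar)
for head, bodies in unit_free.items():
    print(head, "->", sorted(bodies))
```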

Step 3. In the production rules A->aAS|aS and B->SbS|aAS|aS, the terminals a and b appear on the RHS together with non-terminals. Replacing them with the new non-terminals X and Y yields:

```
S0 → AS|ASB|SB
S → AS|ASB|SB
A → XAS|XS|a
B → SYS|bb|XAS|XS|a
X → a
Y → b
```

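Also, B->bb consists of two terminals, which CNF does not allow. Replacing each b with a new non-terminal V yields the following (Y could be reused instead, since Y->b already exists):

```
S0 → AS|ASB|SB
S → AS|ASB|SB
A → XAS|XS|a
B → SYS|VV|XAS|XS|a
X → a
Y → b
V → b
```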

Step 4. In the production rule S0->ASB, the RHS has more than two symbols. Introducing P->AS to shorten it yields:

```
S0 → AS|PB|SB
S → AS|ASB|SB
A → XAS|XS|a
B → SYS|VV|XAS|XS|a
X → a
Y → b
V → b
P → AS
```

Similarly, S->ASB has more than two symbols on the RHS. Introducing Q->AS yields:

```
S0 → AS|PB|SB
S → AS|QB|SB
A → XAS|XS|a
B → SYS|VV|XAS|XS|a
X → a
Y → b
V → b
P → AS
Q → AS
```

Similarly, A->XAS has more than two symbols on the RHS. Introducing R->XA yields:

```
S0 → AS|PB|SB
S → AS|QB|SB
A → RS|XS|a
B → SYS|VV|XAS|XS|a
X → a
Y → b
V → b
P → AS
Q → AS
R → XA
```

Similarly, B->SYS has more than two symbols on the RHS. Introducing T->SY yields:

```
S0 → AS|PB|SB
S → AS|QB|SB
A → RS|XS|a
B → TS|VV|XAS|XS|a
X → a
Y → b
V → b
P → AS
Q → AS
R → XA
T → SY
```

Similarly, B->XAS has more than two symbols on the RHS. Introducing U->XA yields:

```
S0 → AS|PB|SB
S → AS|QB|SB
A → RS|XS|a
B → TS|VV|US|XS|a
X → a
Y → b
V → b
P → AS
Q → AS
R → XA
T → SY
U → XA
```

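So this is the required CNF for the given grammar. Note that the walkthrough does not apply the useless-production part of Step 2: every production of S contains S itself, so S never derives a string of terminals, and a full simplification would discard these rules. The example is meant to show how each step works mechanically.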