# Document Analysis using Support Vector Machines

This post details the Vector Space Kernel Model for document analysis outlined in Shawe-Taylor and Cristianini.

## Create Encoded Matrix for each Document

- Select Dictionary of Terms
- Calculate Term Frequency
- Encode using Dictionary of Terms
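
The three steps above can be sketched for a single document as follows. This is a minimal illustration, not the method from the book: the `tokenize` and `encode` helpers, the four-term dictionary, and the sample sentence are all hypothetical, and the tokenizer is deliberately naive (lowercasing, whitespace splitting, and stripping a few punctuation characters).

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation (naive)."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]

def encode(text, dictionary):
    """Return the term-frequency vector of `text` over the ordered `dictionary`."""
    counts = Counter(tokenize(text))
    # Counter returns 0 for terms that never occur in the document.
    return [counts[term] for term in dictionary]

# Step 1: select a dictionary of terms (hypothetical example terms).
dictionary = ["kernel", "vector", "document", "machine"]

# Steps 2 and 3: count term frequencies and encode against the dictionary.
doc = "A kernel maps a document vector to a feature space; the kernel does the work."
vector = encode(doc, dictionary)
print(vector)  # [2, 1, 1, 0]
```

Fixing the order of the dictionary matters: position `i` of every encoded vector must refer to the same term, otherwise dot products between documents are meaningless.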

### Calculate Document-Term Matrix

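The document-term matrix collects the term frequencies for a whole collection of documents: each row corresponds to a document, each column to a term in the dictionary, and each entry is the number of times that term occurs in that document.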
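
As a rough sketch, the matrix can be built by stacking one term-frequency vector per document. The `document_term_matrix` helper, the sample documents, and the dictionary below are hypothetical, and the layout (documents as rows, terms as columns) is an assumption based on the usual meaning of "document-term matrix".

```python
from collections import Counter

def document_term_matrix(documents, dictionary):
    """Build a matrix with one row per document and one column per dictionary term."""
    matrix = []
    for text in documents:
        # Naive tokenization: lowercase and split on whitespace.
        counts = Counter(text.lower().split())
        matrix.append([counts[term] for term in dictionary])
    return matrix

# Hypothetical corpus and dictionary, for illustration only.
documents = [
    "kernel methods map each document to a vector",
    "a support vector machine separates document classes",
    "the kernel is a dot product",
]
dictionary = ["kernel", "vector", "document", "machine"]

D = document_term_matrix(documents, dictionary)
for row in D:
    print(row)
# [1, 1, 1, 0]
# [0, 1, 1, 1]
# [1, 0, 0, 0]
```

Reading down a column gives the frequency of one term across the collection; reading along a row gives the encoded vector of one document.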

## Kernel Methods for Support Vector Machines

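If **x** and **y** are vectors representing two documents, and **φ** maps each vector into a new feature space, then the kernel **K** is defined as:

```
K(x, y) = φ(x) · φ(y) = f(x · y)
```

where the kernel **K**, the dot product in the new feature space, is expressed as a function **f** of the dot product in the original feature space. This holds for dot-product kernels such as the polynomial kernel, and it means **K** can be evaluated without ever computing **φ** explicitly.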
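
One kernel for which this relationship can be checked by hand is the degree-2 polynomial kernel on two-dimensional vectors, where the feature-space dot product equals the square of the original dot product. The sketch below is an illustration chosen for this post, not an example from the book; the function names and the sample vectors are hypothetical.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D input."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly_kernel(x, y):
    """The same kernel computed only from the original dot product."""
    return dot(x, y) ** 2

x = [1.0, 2.0]
y = [3.0, 4.0]

explicit = dot(phi(x), phi(y))  # dot product in the new feature space
implicit = poly_kernel(x, y)    # function of the original dot product

print(implicit)                          # 121.0
print(math.isclose(explicit, implicit))  # True
```

The two values agree up to floating-point rounding, which is why the comparison uses `math.isclose` rather than `==`.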

## Document Analysis

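Given the *document-term matrix* (**D**) and its transpose, the *term-document matrix* (**D’**), define **K = DD’**, the document-by-document matrix whose entry (i, j) is the dot product of the term-frequency vectors of documents i and j. Then for documents **d_1** and **d_2**, define the *Vector Space*...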