Document Analysis using Support Vector Machines
This post details the Vector Space Kernel Model for document analysis outlined in Shawe-Taylor and Cristianini.
Create Encoded Matrix for each Document
- Select Dictionary of Terms
- Calculate Term Frequency
- Encode using Dictionary of Terms
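The three steps above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: terms are lowercase, whitespace-separated tokens, the dictionary is every distinct term in the collection, and the helper names `build_dictionary` and `encode` are hypothetical rather than taken from the book.

```python
from collections import Counter

def build_dictionary(documents):
    """Select the dictionary: a sorted list of the distinct lowercase terms."""
    terms = set()
    for doc in documents:
        terms.update(doc.lower().split())
    return sorted(terms)

def encode(document, dictionary):
    """Encode one document as term frequencies, one entry per dictionary term."""
    counts = Counter(document.lower().split())  # term frequency
    return [counts[term] for term in dictionary]

documents = ["the cat sat", "the dog sat on the cat"]
dictionary = build_dictionary(documents)   # ['cat', 'dog', 'on', 'sat', 'the']
encoded = encode(documents[1], dictionary) # [1, 1, 1, 1, 2]
```

A real pipeline would usually also strip punctuation and remove stop words before counting. Both are left out here to keep the sketch short.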
Calculate Document-Term Matrix
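In the document-term matrix, each row corresponds to a document and each column to a dictionary term. Entry (i, j) is the frequency of term j in document i, so a single column shows how one term is distributed across the collection of documents.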
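As a small illustration, the document-term matrix can be built by stacking one term-frequency row per document. The function name `document_term_matrix` and the whitespace tokenization are assumptions made for this sketch.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (terms, matrix) with one row per document and one column per term."""
    terms = sorted({t for doc in documents for t in doc.lower().split()})
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        matrix.append([counts[t] for t in terms])
    return terms, matrix

terms, D = document_term_matrix(["the cat sat", "the dog sat on the cat"])
# terms -> ['cat', 'dog', 'on', 'sat', 'the']
# D     -> [[1, 0, 0, 1, 1],
#           [1, 1, 1, 1, 2]]
```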
Kernel Methods for Support Vector Machines
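If x and y are vectors representing two documents, then a kernel K is defined as:
K(x, y) = φ(x) · φ(y)
where φ maps each vector into a new feature space. The kernel K is the dot product in that new feature space. For many kernels, such as the linear and polynomial kernels, K can be computed as a function of the dot product x · y in the original feature space, without evaluating φ explicitly.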
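To make the idea concrete, here is a sketch using the homogeneous degree-2 polynomial kernel, chosen only as a familiar example and not because the post uses it. Its explicit feature map φ(x) lists every product x_i * x_j, and the dot product of two such maps equals (x · y) squared.

```python
from itertools import product

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def phi(x):
    """Explicit feature map for the degree-2 kernel: all products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]

def poly2_kernel(x, y):
    """Same value as dot(phi(x), phi(y)), computed in the original space."""
    return dot(x, y) ** 2

x = [1, 0, 2]
y = [3, 1, 1]
explicit = dot(phi(x), phi(y))  # dot product in the 9-dimensional feature space
implicit = poly2_kernel(x, y)   # (x · y) ** 2 using only the original vectors
```

Both values are 25 here. The implicit form never builds the larger feature vectors, which is the practical appeal of kernel methods.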
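Given the document-term matrix (D) and its transpose, the term-document matrix (D’), define K = DD’, the document-by-document matrix whose entries are the inner products of the documents' term-frequency vectors (the term-by-term co-occurrence matrix is D’D). Then for documents d_1 and d_2, define the Vector Space...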
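The product K = DD’ can be computed directly from a document-term matrix. The sketch below uses plain Python lists and a hypothetical helper named `gram_matrix`. Entry K[i][j] is the inner product of the term-frequency rows for documents i and j.

```python
def gram_matrix(D):
    """Compute K = D D' for a matrix D stored as a list of row lists."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in D]
        for row_i in D
    ]

# Two documents over five terms (rows are documents, columns are terms).
D = [
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 2],
]
K = gram_matrix(D)
# K -> [[3, 4],
#       [4, 8]]
```

K is symmetric, and its diagonal holds the squared norm of each document vector.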