# Document Analysis using Support Vector Machines

This post details the vector space kernel model for document analysis outlined in Shawe-Taylor and Cristianini, *Kernel Methods for Pattern Analysis*.

## Create an Encoded Vector for Each Document

1. Select Dictionary of Terms
2. Calculate Term Frequency
3. Encode using Dictionary of Terms
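A minimal sketch of the three steps, assuming whitespace tokenization with lowercasing (no stemming or stop-word removal) and a dictionary that is simply every distinct term in the collection, sorted so each term has a fixed index. The helpers `tokenize`, `build_dictionary`, `term_frequencies`, and `encode` are hypothetical names, not taken from the source.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only.
    return text.lower().split()

def build_dictionary(documents: list[str]) -> list[str]:
    # Step 1: every distinct term, sorted so that term t_j always has index j.
    return sorted({term for doc in documents for term in tokenize(doc)})

def term_frequencies(document: str) -> Counter:
    # Step 2: how often each term occurs in this document.
    return Counter(tokenize(document))

def encode(document: str, dictionary: list[str]) -> list[int]:
    # Step 3: entry j is tf(t_j, d); Counter returns 0 for absent terms.
    tf = term_frequencies(document)
    return [tf[term] for term in dictionary]

docs = ["the cat sat", "the cat sat on the mat"]
vocab = build_dictionary(docs)
print(vocab)                   # ['cat', 'mat', 'on', 'sat', 'the']
print(encode(docs[1], vocab))  # [1, 1, 1, 1, 2]
```

Every document is encoded against the same dictionary, so all vectors have the same length and their positions line up term by term.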

### Calculate Document-Term Matrix

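The document-term matrix D stacks the encoded document vectors as rows: entry D_ij is the frequency of term t_j in document d_i. Each row describes one document, and each column shows how often one term occurs across the collection of documents.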
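A sketch of building D with plain Python lists, under the same assumption of lowercase whitespace tokenization; `document_term_matrix` is a hypothetical helper. Reading down a single column gives one term's frequency in every document.

```python
from collections import Counter

def document_term_matrix(documents: list[str]) -> tuple[list[str], list[list[int]]]:
    # Assumption: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # Dictionary of terms: every distinct token, sorted so column j is fixed.
    terms = sorted({t for tokens in tokenized for t in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # Row i is document d_i, column j is term t_j: D[i][j] = tf(t_j, d_i).
    D = [[c[t] for t in terms] for c in counts]
    return terms, D

docs = ["the cat sat", "the cat sat on the mat", "a dog sat"]
terms, D = document_term_matrix(docs)
for row in D:
    print(row)

# One column = one term's frequency in every document of the collection.
sat = terms.index("sat")
print([row[sat] for row in D])  # [1, 1, 1]
```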

## Kernel Methods for Support Vector Machines

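If x and y are vectors representing two documents, and φ maps each vector into a new feature space, then a kernel function K is defined as the dot product of their images:

``````
K(x, y) = φ(x) · φ(y)
``````

For some kernels, K can be computed as a function of the dot product x · y in the original feature space, so φ never has to be computed explicitly. The polynomial kernel K(x, y) = (x · y + c)^d is one example.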
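A quick numerical check of this idea, assuming the homogeneous degree-2 polynomial kernel on two-dimensional vectors, K(x, y) = (x · y)^2, whose explicit feature map is φ(x_1, x_2) = (x_1^2, √2 x_1 x_2, x_2^2). The example vectors are arbitrary. Both routes give the same value, but the second never builds φ.

```python
import math

def dot(x: tuple[float, ...], y: tuple[float, ...]) -> float:
    return sum(a * b for a, b in zip(x, y))

def phi(x: tuple[float, float]) -> tuple[float, float, float]:
    # Explicit feature map for the degree-2 homogeneous polynomial kernel on R^2.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x: tuple[float, float], y: tuple[float, float]) -> float:
    # Same value computed only from the original dot product: K(x, y) = (x . y)^2.
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
print(dot(phi(x), phi(y)))  # 121 up to floating-point rounding
print(poly_kernel(x, y))    # 121.0
```

The saving grows with dimension: for n-dimensional inputs and degree d, φ has on the order of n^d coordinates, while (x · y)^d costs one n-dimensional dot product.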

## Document Analysis

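Given the document-term matrix D and its transpose, the term-document matrix D', define the kernel matrix K = DD': entry K_ik is the dot product of the vectors for documents d_i and d_k. The product D'D is the term-by-term co-occurrence matrix, whose (i, j) entry is nonzero only when some document contains both t_i and t_j. Then for documents d_1 and d_2, define the Vector Space...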
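A small sketch with a hypothetical 3-document, 4-term matrix D, computing DD' and D'D with plain lists. It assumes the common definition of the vector space kernel as the dot product of two documents' term-frequency vectors, κ(d_1, d_2) = Σ_j tf(t_j, d_1) tf(t_j, d_2), so each entry of DD' is that kernel evaluated on one pair of documents. The helpers `transpose`, `matmul`, and `vector_space_kernel` are illustrative names.

```python
def transpose(M: list[list[int]]) -> list[list[int]]:
    return [list(col) for col in zip(*M)]

def matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    # (AB)[i][k] = sum_j A[i][j] * B[j][k]
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def vector_space_kernel(d1: list[int], d2: list[int]) -> int:
    # kappa(d1, d2) = sum_j tf(t_j, d1) * tf(t_j, d2): a plain dot product.
    return sum(a * b for a, b in zip(d1, d2))

# Hypothetical document-term matrix D: 3 documents (rows) x 4 terms (columns).
D = [
    [1, 1, 0, 0],
    [0, 1, 2, 0],
    [0, 0, 1, 1],
]
Dt = transpose(D)   # D': the term-document matrix
K = matmul(D, Dt)   # DD': 3 x 3, K[i][k] = vector_space_kernel(D[i], D[k])
C = matmul(Dt, D)   # D'D: 4 x 4, nonzero C[i][j] only if t_i, t_j share a document

print(K)  # [[2, 1, 0], [1, 5, 2], [0, 2, 2]]
print(C)  # [[1, 1, 0, 0], [1, 2, 2, 0], [0, 2, 5, 1], [0, 0, 1, 1]]
```

Documents 0 and 2 share no terms, so K[0][2] is 0; likewise terms 0 and 2 never appear in the same document, so C[0][2] is 0.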

# Notes on Uncertainty

These notes summarize assertions from Pearl (1988), *Probabilistic Reasoning in Intelligent Systems*.

Encoding knowledge into rules requires enumerating examples. Positive examples are difficult to satisfy and ambiguously defined, and the exceptions to a rule are too many to list. As a compromise, exceptions can be summarized: each proposition is assigned a measure of uncertainty, and these measures are aggregated. The uncertainty value is not a truth value; it is closer to a summary of counter-examples. Aggregating the values this way rests on a restrictive assumption of independence. Three schools appear: non-monotonic logic, which is non-numerical; numerical calculi of uncertainty, including Dempster-Shafer theory, fuzzy logic, and certainty factors; and probability theory, in particular Bayesian probability.

``````
A -> C
B -> C
(A ^ B) -> C
``````

What do these propositions say about the interaction of A and B, and what are their exceptions?
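To make the question concrete, here is a sketch of how a rule-based system using certainty factors might answer it. It assumes the MYCIN-style combination rule for two supporting certainty factors, CF = CF_1 + CF_2 (1 - CF_1); the rule strengths are made-up values, and `combine_positive_cf` is a hypothetical helper.

```python
def combine_positive_cf(cf1: float, cf2: float) -> float:
    """Combine two supporting certainty factors for the same conclusion C.

    MYCIN-style parallel combination for CFs in [0, 1]: the result depends only
    on the two numbers, so it implicitly treats the evidence as independent and
    cannot represent any interaction between A and B.
    """
    if not (0.0 <= cf1 <= 1.0 and 0.0 <= cf2 <= 1.0):
        raise ValueError("this sketch only handles certainty factors in [0, 1]")
    return cf1 + cf2 - cf1 * cf2

cf_a_implies_c = 0.5  # hypothetical strength of A -> C
cf_b_implies_c = 0.5  # hypothetical strength of B -> C

# Belief in C when both A and B are observed, derived purely from the two rules:
print(combine_positive_cf(cf_a_implies_c, cf_b_implies_c))  # 0.75
```

The combined value is the same whether A and B are redundant, reinforcing, or in conflict, which is exactly the independence assumption at work.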

Extensional systems use production rules, and intensional systems use declarative knowledge...
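In contrast to a production-rule approach, a sketch of the probabilistic view: attach weights to complete states of the world (a hypothetical joint distribution over binary A, B, C) and condition on whatever is observed. The counts are invented so that A and B each make C more likely on their own, yet observing both does not, showing that P(C | A, B) cannot be computed from P(C | A) and P(C | B) alone. `p_c_given` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical joint distribution over binary variables (A, B, C), given as
# unnormalized counts of worlds (a, b, c).
counts = {
    (1, 1, 1): 1, (1, 1, 0): 1,
    (1, 0, 1): 6, (1, 0, 0): 2,
    (0, 1, 1): 6, (0, 1, 0): 2,
    (0, 0, 1): 2, (0, 0, 0): 6,
}

def p_c_given(condition) -> Fraction:
    # P(C = 1 | condition) from the joint; normalization cancels, so raw counts suffice.
    matching = {w: n for w, n in counts.items() if condition(w)}
    with_c = sum(n for (a, b, c), n in matching.items() if c == 1)
    return Fraction(with_c, sum(matching.values()))

p_a = p_c_given(lambda w: w[0] == 1)                 # P(C | A)
p_b = p_c_given(lambda w: w[1] == 1)                 # P(C | B)
p_ab = p_c_given(lambda w: w[0] == 1 and w[1] == 1)  # P(C | A, B)
print(p_a, p_b, p_ab)  # 7/10 7/10 1/2
```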