This repository was archived by the owner on May 7, 2021. It is now read-only.

Commit 696a7af
1 parent 920dab0
committed: "Updated documentation."

1 file changed: README.md
Lines changed: 64 additions & 5 deletions
@@ -3,6 +3,37 @@ Java Naive Bayes Classifier
 
 Nothing special. It works and is well documented, so you should get it running without wasting too much time searching for other alternatives on the net.
 
+Overview
+------------------
+
+I like talking about *features* and *categories*. Objects have features and may belong to a category. The classifier will try matching objects to their categories by looking at the objects' features. It does so by consulting its memory, filled with knowledge gathered from training examples.
+
+Classifying a feature set selects the category with the highest product of 1) the probability of that category occurring and 2) the product of the probabilities of all the features occurring in that category:
+
+```classify(feature1, ..., featureN) = argmax(P(category) * PROD(P(feature|category)))```
+
+This is a so-called maximum a posteriori estimation. Wikipedia actually does a good job explaining it: http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Probabilistic_model
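The argmax rule above can be sketched in plain Java. The probability tables below are hypothetical toy values, not the classifier's real internal state:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapEstimation {
    public static void main(String[] args) {
        // Hypothetical toy probabilities, standing in for the classifier's memory.
        Map<String, Double> prior = new HashMap<>();
        prior.put("positive", 0.5);
        prior.put("negative", 0.5);

        Map<String, Map<String, Double>> featureProb = new HashMap<>();
        featureProb.put("positive", Map.of("sunny", 0.6, "day", 0.5));
        featureProb.put("negative", Map.of("sunny", 0.1, "day", 0.5));

        List<String> features = Arrays.asList("sunny", "day");

        // argmax over categories of P(category) * PROD(P(feature|category))
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : prior.keySet()) {
            double score = prior.get(category);
            for (String feature : features) {
                // Fall back to an assumed probability of 0.5 for unseen features.
                score *= featureProb.get(category).getOrDefault(feature, 0.5);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        System.out.println(best); // "positive": 0.5*0.6*0.5 > 0.5*0.1*0.5
    }
}
```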
+
+Learning from Examples
+------------------
+
+Add knowledge by telling the classifier that these features belong to a specific category:
+
+```java
+String[] positiveText = "I love sunny days".split("\\s");
+bayes.learn("positive", Arrays.asList(positiveText));
+```
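Note that each whitespace-separated token becomes one feature. A quick sketch of what the ```split("\\s")``` call above produces (the class name here is illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeExample {
    public static void main(String[] args) {
        // "\\s" splits on single whitespace characters; "\\s+" would also
        // collapse runs of whitespace. The README uses "\\s".
        String[] positiveText = "I love sunny days".split("\\s");
        List<String> features = Arrays.asList(positiveText);
        System.out.println(features); // [I, love, sunny, days]
    }
}
```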
+
+Classify unknown objects
+------------------
+
+Use the gathered knowledge to classify unknown objects by their features. The classifier will return the category that the object most likely belongs to.
+
+```java
+String[] unknownText1 = "today is a sunny day".split("\\s");
+System.out.println(bayes.classify(Arrays.asList(unknownText1)).getCategory());
+```
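Putting both steps together, here is a self-contained toy version of learn-then-classify using plain hash maps. This is not this repository's implementation; the smoothing constants are illustrative choices:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyBayes {
    // feature -> per-category occurrence counts
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    // category -> number of learned examples
    private final Map<String, Integer> categoryCounts = new HashMap<>();

    void learn(String category, List<String> features) {
        categoryCounts.merge(category, 1, Integer::sum);
        for (String f : features) {
            featureCounts.computeIfAbsent(f, k -> new HashMap<>())
                         .merge(category, 1, Integer::sum);
        }
    }

    String classify(List<String> features) {
        int total = categoryCounts.values().stream().mapToInt(Integer::intValue).sum();
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Integer> e : categoryCounts.entrySet()) {
            String category = e.getKey();
            double score = (double) e.getValue() / total; // P(category)
            for (String f : features) {
                int count = featureCounts
                        .getOrDefault(f, new HashMap<>())
                        .getOrDefault(category, 0);
                // Smooth unseen features so a single miss does not zero the product.
                score *= (count + 0.5) / (e.getValue() + 1.0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyBayes bayes = new ToyBayes();
        bayes.learn("positive", Arrays.asList("I love sunny days".split("\\s")));
        bayes.learn("negative", Arrays.asList("I hate rain".split("\\s")));
        System.out.println(bayes.classify(Arrays.asList("today is a sunny day".split("\\s"))));
    }
}
```

The shared token "sunny" is what tips the unknown sentence toward the positive category.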
+
 Example
 ------------------
 
@@ -37,16 +68,44 @@ System.out.println( // will output "negative"
     Arrays.asList(unknownText1));
 
 // Change the memory capacity. New learned classifications (using
-// learn method are stored in a queue with the size given here and
-// used to classify unknown sentences.
+// the learn method) are stored in a queue with the size given
+// here and used to classify unknown sentences.
 bayes.setMemoryCapacity(500);
 ```
 
 Forgetful learning
 ------------------
 
-This classifier is forgetful. This means, that the classifier will forget recent classifications it uses for future classifications after - defaulting to 200 - classifications learned.
-This will ensure, that the classifier can react to ongoing changes in the user's habbits.
+This classifier is forgetful: once more than a given number of classifications have been learned (defaulting to 1,000), it forgets the oldest classifications it uses for future classifications. This ensures that the classifier can react to ongoing changes in the user's habits.
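A minimal sketch of the queue semantics described above, assuming a plain bounded FIFO (the repository's internal data structure may differ). With a capacity of 3, learning a fourth example evicts the oldest one:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ForgetfulMemory {
    public static void main(String[] args) {
        // Mirrors bayes.setMemoryCapacity(3): learned classifications sit in
        // a bounded queue; once capacity is exceeded, the oldest is dropped.
        int memoryCapacity = 3;
        Deque<String> memory = new ArrayDeque<>();
        for (String example : new String[] {"a", "b", "c", "d"}) {
            memory.addLast(example);
            while (memory.size() > memoryCapacity) {
                memory.removeFirst(); // forget the oldest classification
            }
        }
        System.out.println(memory); // [b, c, d]
    }
}
```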
+
+
+Interface
+------------------
+
+The abstract ```Classifier<T, K>``` serves as a base for the concrete ```BayesClassifier<T, K>```. Here are its methods; please also refer to the Javadoc.
+
+* ```void reset()``` Resets the learned feature and category counts.
+* ```Set<T> getFeatures()``` Returns a ```Set``` of features the classifier knows about.
+* ```Set<K> getCategories()``` Returns a ```Set``` of categories the classifier knows about.
+* ```int getCategoriesTotal()``` Retrieves the total number of categories the classifier knows about.
+* ```int getMemoryCapacity()``` Retrieves the memory's capacity.
+* ```void setMemoryCapacity(int memoryCapacity)``` Sets the memory's capacity. If the new value is less than the old value, the memory will be truncated accordingly.
+* ```void incrementFeature(T feature, K category)``` Increments the count of the given feature in the given category. This is equal to telling the classifier that this feature has occurred in this category.
+* ```void incrementCategory(K category)``` Increments the count of the given category. This is equal to telling the classifier that this category has occurred once more.
+* ```void decrementFeature(T feature, K category)``` Decrements the count of the given feature in the given category. This is equal to telling the classifier that this feature has occurred once less in this category.
+* ```void decrementCategory(K category)``` Decrements the count of the given category. This is equal to telling the classifier that this category has occurred once less.
+* ```int featureCount(T feature, K category)``` Retrieves the number of occurrences of the given feature in the given category.
+* ```int categoryCount(K category)``` Retrieves the number of occurrences of the given category.
+* ```float featureProbability(T feature, K category)``` (*implements* ```IFeatureProbability<T, K>.featureProbability```) Returns the probability that the given feature occurs in the given category.
+* ```float featureWeighedAverage(T feature, K category)``` Retrieves the weighed average ```P(feature|category)``` with an overall weight of ```1.0``` and an assumed probability of ```0.5```. The probability defaults to the overall feature probability.
+* ```float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator)``` Retrieves the weighed average ```P(feature|category)``` with an overall weight of ```1.0```, an assumed probability of ```0.5```, and the given object to use for probability calculation.
+* ```float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator, float weight)``` Retrieves the weighed average ```P(feature|category)``` with the given weight, an assumed probability of ```0.5```, and the given object to use for probability calculation.
+* ```float featureWeighedAverage(T feature, K category, IFeatureProbability<T, K> calculator, float weight, float assumedProbability)``` Retrieves the weighed average ```P(feature|category)``` with the given weight, the given assumed probability, and the given object to use for probability calculation.
+* ```void learn(K category, Collection<T> features)``` Trains the classifier by telling it that the given features resulted in the given category.
+* ```void learn(Classification<T, K> classification)``` Trains the classifier by telling it that the features of the given classification resulted in its category.
+
+The ```BayesClassifier<T, K>``` class implements the following abstract method:
+
+* ```Classification<T, K> classify(Collection<T> features)``` Retrieves the most likely category for the given features; its behavior depends on the concrete classifier implementation.
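The ```featureWeighedAverage``` overloads appear to combine a raw probability with an assumed probability using the familiar weighted-average smoothing scheme; the sketch below is an assumption about the formula, not a copy of this repository's code:

```java
public class WeighedAverageSketch {
    // Assumed formula: blend the assumed probability (with the given weight)
    // into the basic probability, in proportion to how often the feature has
    // been seen overall. The actual implementation may differ in details.
    static float featureWeighedAverage(float basicProbability, int featureTotal,
                                       float weight, float assumedProbability) {
        return (weight * assumedProbability + featureTotal * basicProbability)
                / (weight + featureTotal);
    }

    public static void main(String[] args) {
        // An unseen feature (total 0) falls back to the assumed probability.
        System.out.println(featureWeighedAverage(0.0f, 0, 1.0f, 0.5f)); // 0.5
        // With more observations, the estimate moves toward the raw probability.
        System.out.println(featureWeighedAverage(0.9f, 9, 1.0f, 0.5f));
    }
}
```

This smoothing is what keeps a single never-before-seen word from zeroing out an entire category's score.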
 
 Possible Performance issues
 ------------------
@@ -58,7 +117,7 @@ Performance improvements, I am currently thinking of:
 The MIT License (MIT)
 ------------------
 
-Copyright (c) 2012 Philipp Nolte
+Copyright (c) 2012-2014 Philipp Nolte
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
