Object
  org.apache.spark.mllib.feature.IDF
public class IDF
:: Experimental ::
Inverse document frequency (IDF).
The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total
number of documents and d(t) is the number of documents that contain term t.
This implementation supports filtering out terms that appear in fewer than a minimum number
of documents (controlled by the variable minDocFreq). For terms that appear in fewer than
minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.
param: minDocFreq minimum number of documents in which a term must appear to avoid being filtered out
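
The formula and the minDocFreq filter described above can be illustrated with a small plain-Java sketch. It is not Spark's implementation and uses no Spark types: the class `IdfSketch`, its method names, and the dense `double[][]` input are hypothetical, and the natural logarithm is assumed for `log`.

```java
import java.util.Arrays;

/** Plain-Java sketch of the documented IDF formula; hypothetical, not Spark's code. */
final class IdfSketch {

    /**
     * Counts d(t) for each term: the number of documents whose
     * term-frequency vector has a positive entry at index t.
     */
    static long[] documentFrequencies(double[][] tfVectors, int numTerms) {
        long[] df = new long[numTerms];
        for (double[] doc : tfVectors) {
            for (int t = 0; t < numTerms; t++) {
                if (doc[t] > 0.0) {
                    df[t]++;
                }
            }
        }
        return df;
    }

    /**
     * Computes idf = log((m + 1) / (d(t) + 1)) per term.
     * Terms with d(t) below minDocFreq keep an IDF of 0.
     * Assumption: log is the natural logarithm.
     */
    static double[] idf(long[] docFreq, long numDocs, int minDocFreq) {
        double[] result = new double[docFreq.length]; // initialized to 0.0
        for (int t = 0; t < docFreq.length; t++) {
            if (docFreq[t] >= minDocFreq) {
                result[t] = Math.log((numDocs + 1.0) / (docFreq[t] + 1.0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three documents over a vocabulary of three terms.
        double[][] docs = {
            {1.0, 0.0, 2.0},
            {0.0, 0.0, 1.0},
            {3.0, 1.0, 1.0}
        };
        long[] df = documentFrequencies(docs, 3);
        System.out.println(Arrays.toString(idf(df, docs.length, 2)));
    }
}
```

In this example term 1 appears in only one document, so with minDocFreq = 2 it is filtered to an IDF of 0. Term 2 appears in every document, so its IDF is log(4 / 4) = 0 even without filtering.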
| Nested Class Summary | |
|---|---|
| static class | IDF.DocumentFrequencyAggregator: Document frequency aggregator. |
| Constructor Summary | |
|---|---|
| IDF() | |
| IDF(int minDocFreq) | |
| Method Summary | |
|---|---|
| IDFModel | fit(JavaRDD<Vector> dataset): Computes the inverse document frequency. |
| IDFModel | fit(RDD<Vector> dataset): Computes the inverse document frequency. |
| int | minDocFreq() |
| Methods inherited from class Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public IDF(int minDocFreq)
public IDF()
| Method Detail |
|---|
public int minDocFreq()
public IDFModel fit(RDD<Vector> dataset)
dataset - an RDD of term frequency vectors
public IDFModel fit(JavaRDD<Vector> dataset)
dataset - a JavaRDD of term frequency vectors