# Simple statistics in NLP

## 1. Frequency distributions of words

Most of the time, we can get a sense of a text's topic from its most frequent or least frequent words. To do so, we need the frequency distribution of its words or collocations.

We can do this with nltk as below:

from nltk import FreqDist
fdist = FreqDist(text)
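To make concrete what a frequency distribution holds, here is a minimal sketch using only the standard library. It relies on the fact that nltk's FreqDist is a subclass of collections.Counter, so the counting and the most_common call behave the same way on both. The sample sentence and the whitespace tokenization are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical sample text, split on whitespace for illustration only.
sample = "the cat sat on the mat and the dog sat on the log"
tokens = sample.split()

# Count how often each token occurs.
freq = Counter(tokens)

# The three most frequent tokens, most common first.
top = freq.most_common(3)

# Tokens that occur exactly once, which are often the most topic-specific.
hapaxes = sorted(w for w, n in freq.items() if n == 1)

print(top)
print(hapaxes)
```

On a real corpus the most frequent tokens are usually function words such as "the", so the rare words or a stopword-filtered list tend to say more about the topic.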


## 2. Selecting words by length

Sometimes the lengths of the words tell us something about a text, especially the distribution of word lengths. We can do this with nltk as below:

#select the distinct words longer than 7 characters
val = set(text)
uwords = [w for w in val if len(w) > 7]

#get the distribution of word lengths
ldist = FreqDist(len(w) for w in text)
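The two steps above can be sketched end to end on a small input. This version uses collections.Counter in place of FreqDist so that it runs without nltk installed. The token list is invented for the example, and the threshold of 7 simply mirrors the snippet above.

```python
from collections import Counter

# Hypothetical tokens; in practice these would come from a tokenized corpus.
tokens = "simple statistics reveal surprising regularities in ordinary language".split()

# Distinct words longer than 7 characters, sorted for stable output.
long_words = sorted(w for w in set(tokens) if len(w) > 7)

# Distribution of word lengths: maps a length to the number of tokens with that length.
length_dist = Counter(len(w) for w in tokens)

print(long_words)
print(sorted(length_dist.items()))
```

Note that the selection works on the set of distinct words, while the length distribution counts every token, repeats included.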


## 3. Collocations and bigrams

Collocations: sequences of words that occur together unusually often. Bigrams: pairs of adjacent words in a text; nltk provides a bigrams function to extract them.

We can do this with nltk as below:

#get the bigrams of a list of words (bigrams returns a generator, so wrap it in list)
from nltk import bigrams
pairs = list(bigrams(['word0','word1','word2','word3']))

#print the collocations of a text (text must be an nltk.Text object; the method prints and returns None)
text.collocations()
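To illustrate what "occur together unusually often" can mean, the sketch below builds bigrams by hand and ranks them with pointwise mutual information (PMI). PMI is only one common association measure and is an assumption of this example, not necessarily the scoring nltk applies internally. The toy token list, the pmi helper, and the minimum count of 2 are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already tokenized.
tokens = ("new york is big . new york is busy . "
          "the city is big . the city is busy").split()

# Bigrams are simply the pairs of adjacent tokens.
pairs = list(zip(tokens, tokens[1:]))

word_freq = Counter(tokens)
pair_freq = Counter(pairs)

def pmi(pair):
    """Log ratio of the observed pair probability to the probability
    expected if the two words occurred independently."""
    w1, w2 = pair
    p_pair = pair_freq[pair] / len(pairs)
    p_w1 = word_freq[w1] / len(tokens)
    p_w2 = word_freq[w2] / len(tokens)
    return math.log2(p_pair / (p_w1 * p_w2))

# Ignore pairs seen only once, then rank the rest by PMI, highest first.
candidates = [p for p, count in pair_freq.items() if count >= 2]
ranked = sorted(candidates, key=pmi, reverse=True)

print(ranked[:2])
```

Pairs such as ("new", "york") score highest because their words rarely appear apart, whereas pairs involving a frequent word like "is" score lower even when the pair itself is just as common.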