Data Science: Word Cloud Generation using R

Saturday, 22 October 2016

Word Cloud Generation using R

Go to https://dev.twitter.com/ and log in with your Twitter account.
Go to https://apps.twitter.com/
Click on "Create New App".
Choose a unique name for the application, because the name may already be taken by another user.

Go to the "Keys and Access Tokens" tab.
Generate the access token and access token secret.

# Install R and RStudio to run the R script below.

# R code to connect to Twitter and generate the word cloud

install.packages("twitteR")
install.packages("RCurl")
install.packages("wordcloud")
install.packages("SnowballC")
install.packages("tm")
install.packages("plyr")
install.packages("dplyr")  # loaded below with library(dplyr), so it must be installed too

library(twitteR)
library(RCurl)
library(wordcloud)
library(SnowballC)
library(tm)
library(plyr)
library(dplyr)

rm(list=ls())

consumer_key <- 'your consumer key'
consumer_secret <- 'your consumer secret key'
access_token <- 'your access token key'
access_secret <- 'your access secret key'
setup_twitter_oauth(consumer_key = consumer_key,consumer_secret = consumer_secret,
access_token = access_token,access_secret = access_secret)

# fetch up to 2000 recent English tweets matching the search terms
Donald_tweets <- searchTwitter("Donald+Trump", n = 2000, lang = "en", resultType = "recent")

donald_tweets_text <- sapply(Donald_tweets,function(x) x$getText())

donald_tweets_text_df <- as.data.frame(donald_tweets_text)

donald_corpus <- Corpus(VectorSource(donald_tweets_text))

inspect(donald_corpus[10])

#remove punctuation
donald_clean <- tm_map(donald_corpus, removePunctuation)

inspect(donald_clean[10])

# convert everything to lower case
donald_clean <- tm_map(donald_clean, content_transformer(tolower))

# stop words are common words such as of, the, a, as
donald_clean <- tm_map(donald_clean, removeWords, stopwords("english"))
donald_clean <- tm_map(donald_clean, removeNumbers)
donald_clean <- tm_map(donald_clean, stripWhitespace)

# remove the search terms themselves, since they appear in every tweet returned by the search
# (the text is already lower case at this point, so the words are given in lower case)
donald_clean <- tm_map(donald_clean, removeWords, c("donald", "trump"))

# reduce each word to its stem; try stemDocument("Viewing") to see the effect
donald_clean <- tm_map(donald_clean, stemDocument)

# wordcloud(donald_clean)  # basic word cloud
wordcloud(donald_clean, random.order = FALSE, max.words = 200,
          scale = c(3.5, 1), random.color = TRUE, colors = rainbow(10), min.freq = 3)
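
To see what the cleaning steps above do without a Twitter connection, here is a minimal sketch in base R. It is not the tm pipeline itself: the sample texts, the short stop-word list, and the helper name clean_text are all made up for illustration, and the steps are applied in a slightly different order than in the script.

```r
# Hypothetical sample texts standing in for tweets (not real data)
sample_texts <- c(
  "Breaking: the debate starts at 9 tonight!",
  "The debate was the most watched debate of 2016.",
  "Watching the debate with friends..."
)

# A tiny stop-word list for illustration only;
# tm's stopwords("english") is much longer
stop_words <- c("the", "at", "of", "with", "was", "most")

# clean_text is a hypothetical helper, not a tm function
clean_text <- function(x, stop_words) {
  x <- gsub("[[:punct:]]", "", x)               # remove punctuation
  x <- tolower(x)                               # convert to lower case
  x <- gsub("[[:digit:]]", "", x)               # remove numbers
  words <- unlist(strsplit(x, "[[:space:]]+"))  # split on runs of whitespace
  words <- words[nzchar(words)]                 # drop empty strings
  words[!words %in% stop_words]                 # drop stop words
}

words <- clean_text(sample_texts, stop_words)

# Count how often each remaining word occurs, most frequent first
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```

A frequency table like this is what decides the size of each word in the cloud. The names and counts could also be passed to wordcloud directly, for example wordcloud(words = names(word_freq), freq = as.vector(word_freq), min.freq = 1).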
