Text Analytics in R – Internet of Things (IoT)
A small corpus of ten articles related to the Internet of Things (IoT) was collected for the purpose of text analytics. Using R, each article was cleaned of unusual characters and converted to lower case; numbers, punctuation, stop words and extra white space were then removed, along with any additional terms that the default stop word list did not catch. This resulted in 4,366 terms.
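The cleaning steps described above can be sketched in base R. This is only an illustration: the post does not say which package or stop word list was used, so the two sample documents, the `stop_words` vector and the `clean_text` helper are all assumptions made up for this example.

```r
# Hypothetical two-document corpus; the real corpus had ten articles.
docs <- c(
  "The Internet of Things (IoT) connects 50 billion devices!",
  "Predictive analytics turns IoT data into business value."
)

# Small illustrative stop word list (an assumption, not the list used in the post).
stop_words <- c("the", "of", "into", "a", "an", "and", "to", "in")

clean_text <- function(x, stop_words) {
  x <- tolower(x)                      # convert to lower case
  x <- gsub("[^a-z ]", " ", x)         # replace digits, punctuation and unusual characters
  tokens <- strsplit(x, " +")          # split on runs of white space
  vapply(tokens, function(tok) {
    tok <- tok[tok != "" & !(tok %in% stop_words)]  # drop empty tokens and stop words
    paste(tok, collapse = " ")
  }, character(1))
}

cleaned <- clean_text(docs, stop_words)
print(cleaned)
```

Extra terms that slip past a default stop word list can be handled the same way, by appending them to `stop_words` before calling the helper.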
The top terms in R were data (235), IoT (187), analyt (162), busi (88) and product (87). For comparison purposes, the corpus was also run through Termine, which produced similar results: analytics, data, things, predictiveanalytic, time, IoT.
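Term counts like these can be produced by tokenizing the cleaned text and tabulating the tokens. The sketch below uses base R and a tiny made-up set of already-cleaned documents (the `cleaned` vector is hypothetical), so the counts it produces are not the ones reported here.

```r
# Hypothetical, already-cleaned documents.
cleaned <- c(
  "iot data analytics data",
  "data product iot",
  "analytics business data"
)

# Split every document into tokens and count how often each term occurs.
tokens <- unlist(strsplit(cleaned, " +"))
term_freq <- sort(table(tokens), decreasing = TRUE)

# Show the most frequent terms first.
print(head(term_freq, 3))
```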
Stemming the terms in R resulted in the following top 50 IoT terms from the corpus of documents. The goal of both stemming and lemmatization is to reduce inflectional and derivational forms of a word to a common base form. For example, analytic and analytics both stem to ‘analyt’, which we see below; related words such as analyst, analysts and analysis are reduced in the same way, although the exact stem can depend on the stemmer. Stemming and lemmatization are very powerful and easy to do.
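As a rough illustration of stemming in R, the sketch below uses the `wordStem` function from the SnowballC package. The post does not name the stemmer that was used, so the choice of SnowballC and the sample word list are assumptions.

```r
# Assumes the SnowballC package is installed: install.packages("SnowballC")
library(SnowballC)

# A few sample words (chosen for illustration).
words <- c("analytic", "analytics", "business", "products")

# Apply the English Snowball (Porter-style) stemmer to each word.
stems <- wordStem(words, language = "english")
names(stems) <- words

print(stems)
```

A stemmer like this works by stripping suffixes with fixed rules, which is why the stems (such as ‘busi’) are often not dictionary words.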
Snippet of the terms based on each article:
Viewing the different topics and terms:
Rank | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 |
1 | analyt | iot | technolog | data | product |
2 | predict | connect | learn | use | predict |
3 | data | cost | new | devic | data |
4 | iot | industri | provid | base | iot |
5 | busi | compani | use | custom | devic |
6 | time | organ | digit | inform | servic |
7 | process | build | smart | scientist | mainten |
8 | creat | reduc | transform | collect | analyt |
9 | also | better | machin | group | ibm |
10 | manag | mani | lead | one | model |
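A topic-by-term table of this shape can be produced with a topic model. The sketch below fits a latent Dirichlet allocation (LDA) model with the topicmodels package; the post does not say which method or settings were used, so LDA, the toy documents, `k = 2` and the seed are all assumptions.

```r
# Assumes the tm and topicmodels packages are installed.
library(tm)
library(topicmodels)

# Hypothetical, already-stemmed documents.
docs <- c(
  "iot data analyt predict busi data",
  "iot connect devic cost industri",
  "technolog learn machin smart digit",
  "data devic custom inform collect",
  "product predict mainten servic model",
  "analyt busi process manag time"
)

# Build a document-term matrix of raw term counts.
dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)))

# Fit two topics for this toy corpus (the table above shows five).
# The seed is fixed only so that repeated runs give the same result.
lda <- LDA(dtm, k = 2, control = list(seed = 42))

# Top five terms per topic, one column per topic.
top_terms <- terms(lda, 5)
print(top_terms)
```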
Word Cloud
A word cloud was created in R from the words that met a minimum frequency threshold. As you can see, all of these terms relate readily to the IoT.
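A minimal sketch of the thresholding step is shown below, using the five term counts reported earlier in this post. The threshold of 100 is a made-up value, and the use of the wordcloud package is an assumption; the plot is only drawn if that package is installed.

```r
# Term counts reported earlier in the post.
term_freq <- c(data = 235, iot = 187, analyt = 162, busi = 88, product = 87)

# Hypothetical minimum frequency; the post does not state the threshold used.
min_freq <- 100

# Keep only the terms that meet the threshold.
frequent <- term_freq[term_freq >= min_freq]
print(frequent)

# Draw the cloud if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(
    words = names(frequent),
    freq = frequent,
    min.freq = min_freq,
    random.order = FALSE  # place the most frequent words in the centre
  )
}
```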
It is interesting to see how the words are correlated. Below are the top 25 word correlations. Again, we can easily relate to pairs such as ‘busi’ and ‘oper’ for business operations and, say, ‘manag’ and ‘prod’ for managing products. Two of the articles in the corpus were written by IBM, so it is easy to understand why ‘IBM’ is correlated with the other IoT terms.
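One way to compute word correlations is to take the Pearson correlation between the columns of a document-term matrix and rank the term pairs. The sketch below does this in base R; the small matrix of counts is invented for illustration, and the post does not say how its correlations were calculated.

```r
# Hypothetical document-term matrix: rows are documents, columns are stemmed terms.
dtm <- matrix(
  c(4, 3, 0, 1,
    3, 3, 1, 0,
    0, 0, 5, 2,
    1, 1, 2, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("busi", "oper", "ibm", "manag"))
)

# Pearson correlation between every pair of term columns.
corr <- cor(dtm)

# Keep each pair once (upper triangle) and sort from most to least correlated.
pairs <- which(upper.tri(corr), arr.ind = TRUE)
result <- data.frame(
  term1 = colnames(corr)[pairs[, "row"]],
  term2 = colnames(corr)[pairs[, "col"]],
  correlation = corr[pairs],
  stringsAsFactors = FALSE
)
result <- result[order(-result$correlation), ]

print(head(result, 3))
```

Two terms score highly here when they tend to be frequent in the same documents and rare in the same documents.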
A simple dendrogram based on the ten articles
Clustering the documents, we end up with three main clusters.
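A dendrogram and a three-cluster split can be sketched with base R's hierarchical clustering functions. The document-term counts below are invented, and the Euclidean distance and Ward linkage are assumptions, since the post does not state which were used.

```r
# Hypothetical document-term matrix with six documents.
dtm <- matrix(
  c(5, 4, 0, 0, 0, 0,
    4, 5, 1, 0, 0, 0,
    0, 0, 5, 4, 0, 1,
    0, 1, 4, 5, 0, 0,
    0, 0, 0, 1, 5, 4,
    1, 0, 0, 0, 4, 5),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    paste0("doc", 1:6),
    c("analyt", "predict", "iot", "devic", "busi", "product")
  )
)

# Euclidean distance between documents, then agglomerative clustering.
d <- dist(dtm)
hc <- hclust(d, method = "ward.D2")

# Draw the dendrogram.
plot(hc, main = "Document dendrogram (toy data)")

# Cut the tree into three clusters and show each document's cluster.
clusters <- cutree(hc, k = 3)
print(clusters)
```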
A simple sentiment analysis was also conducted on all ten articles. Rather than bore everyone with the details, I’ve provided the results for one of the articles so that they can be compared against the article itself.
Sentiment Analysis based on the article “Creating New Value with Digital Transformation”
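A very simple form of sentiment analysis counts positive and negative words from a lexicon. The sketch below shows that idea in base R; the word lists, the sample sentences and the `score_sentiment` helper are all made up for illustration, and the post does not say which method or lexicon was used for its results.

```r
# Tiny hypothetical lexicons; real analyses use much larger word lists.
positive <- c("value", "better", "new", "smart", "opportunity")
negative <- c("cost", "risk", "failure", "problem")

# Score = number of positive words minus number of negative words.
score_sentiment <- function(text) {
  text <- tolower(gsub("[^A-Za-z ]", " ", text))  # strip punctuation and digits
  tokens <- unlist(strsplit(text, " +"))
  sum(tokens %in% positive) - sum(tokens %in% negative)
}

# Hypothetical sentences, not taken from the article.
sentences <- c(
  "Digital transformation creates new value and better service.",
  "Downtime is a costly problem and a risk."
)

scores <- vapply(sentences, score_sentiment, integer(1), USE.NAMES = FALSE)
print(scores)
```

A score above zero suggests a positive tone and a score below zero a negative one. Exact matching misses inflected forms such as "costly", which is one reason text is often stemmed first.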
The articles used in the corpus are as follows:
- Opportunities and Challenges: Predictive Analytics for IoT by Bala Deshpande, Founding Partner, SimaFore & Chair, PAW – Manufacturing
- Solution Brief: IBM Software Internet of Things by IBM
- How to Get Started with IoT and Generate Quick Returns on Your Investment by Maciej Kranz
- Predictive Analytics Are a Traveler’s Best Friend – This content is made possible by support from SAS
- Predictive Analytics as a service for IoT by ajit
- Transformational Analytics: Internet of Things analytics by Arnab Chakraborty, Michael Svilar and Prith Banerjee
- Are Predictive Analytics The Future Of IoT? by Tripp Braden
- How to Start a Successful IoT Journey, Smart Cities, and the Industrial Internet Revolution by Calum McClelland
- Innovation Today’s IoT Opportunity: B2B or B2C? by Maciej Kranz
- Creating New Value with Digital Transformation by The Business Times