admin

Posts by admin

Principal Components Analysis - Grouped by Sector

Principal Components Analysis

This post applies Principal Components Analysis to a stock portfolio data set, both as a dimension-reduction method and as a remedial measure for multicollinearity in Ordinary Least Squares regression. Beginning with the data, we will transform the variables into log values to explain the variation in the log-returns of the stocks and …

Automated Variable Selection

The Ames, Iowa housing data set has been used to develop various iterative regression models that estimate the mean sale price of a house from numerous variables. The variables range from correlated, continuous variables to categorical variables. In this installment, we continue building the model using raw categories and later, the …

Data Variables and Analytical Models

Before diving into a statistical analysis of any dataset, it is essential to spend the requisite time understanding the data, checking its quality, and taking a look ‘under the dash’. Below, we will examine the data variables and analytical models on housing prices as a first step in predicting …

Regression Models Using Numerous Variables

This post builds regression models for house sale price on the Ames, Iowa housing data set using numerous variables. These range from highly correlated, continuous variables to, at the other end of the continuum, categorical variables with low correlations. An assessment of each model will be …

Python Word Analytics

Text Analytics on News Article

For a little fun, this is a text analytics exercise based on a CBC News article, which is available at http://www.cbc.ca/news/technology/trump-climate-change-executive-order-1.4043650. Below you will see the Python code along with the various word analytics on the text (which was downloaded and put into a text file named “cbcnewstrump.txt”). Enjoy!

Variable Transformations

Variable Transformations: Continuous & Categorical

The Ames, Iowa housing data set has been used to develop various regression models that determine the sale price of a house based on numerous variables. The variables range from highly correlated, continuous variables to categorical variables with low correlations. In this assessment, variable transformations and comparisons of Y versus Log(Y) will …

Word Counter in Jupyter Notebooks

Simple utilities can make things so much easier at times. This Jupyter Notebook takes a document, in this case ‘Alice in Wonderland’ by Lewis Carroll, and provides the top 10 words. Naturally, words can be eliminated, but for ease of reference only the word ‘the’ has been removed. For websites, a deeper cleaner is needed …
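
The word-counting idea in this excerpt can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the notebook's actual code: the function name `top_words`, its parameters, the regular-expression tokenizer, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def top_words(text, n=10, exclude=("the",)):
    """Return the n most common words in text, skipping excluded words.

    Hypothetical helper for illustration; the tokenizer below is a
    simplifying assumption (lowercase letters with an optional
    internal apostrophe, e.g. "alice's").
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    excluded = set(exclude)
    counts = Counter(word for word in words if word not in excluded)
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample sentence, not a quotation from the book.
    sample = "The Rabbit ran, and the Rabbit hid; Alice followed the Rabbit."
    for word, count in top_words(sample, n=3):
        print(word, count)
```

To run it on a full document, one could read the file into a string first and pass it to `top_words`; extending `exclude` with more common words is the simplest way to eliminate additional terms.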

Python Pandas

Power of Python Pandas

Python Pandas makes it easy to extract and summarize large amounts of data. Below is an example of using airline data to find out how many passengers went to an airport, the accident rate based on reference codes, deaths, and the causes of accidents. With a few lines of code, …

Extract User Reviews using Python Pandas

This post works with TripAdvisor user review data about a particular hotel. The goal is to help the hotel understand the feedback the reviews provide and what it suggests they should focus on to improve customer experience. In part I, data will be extracted for each reviewer’s ratings of a hotel along with a summary. The …

Python Relational Database

In this example, two data “assets” will be created for a company to use in direct marketing campaigns. The raw data will be used to create a Python relational database, from which a “flat” file with selected customers and variables is produced. Examples of the summary output will be provided along with …
