Python Relational Database

January 30, 2017 admin

Python Relational Database

In this example, two data “assets” will be created to be used by a company for a direct marketing campaigns. The raw data will be used to create a python relational database to create a “flat” file with selected customers and variables. Examples of the summary output will be provided along with saving the created output for future usage and sharing.

Using Python and the pandas package along with sqlite database.

Data

Three tables in the DB pilot schema that are named item, mail, and customer. Export each table as a csv file with a header record, using a temporary view to do each one.

Use the Data to:

1. Import each of the csv files downloaded from the SSCC into a pandas DataFrame in a Python session. Print out the columns of the mail DataFrame and the first five (5) records in it.

2. Verify that there are no duplicate customer records in the customer data. Provide counts of the number of duplicate records in the customer data. Be sure to include a definition of “duplicate.”

3. Check the item and mail data to determine whether there are any records in them for customers who are not in the customer data. Provide counts of the records in the item and mail data that do not belong to any customer in the customer data.

4. Create a sqlite database, and write the customer, item, and mail data into it as tables. Do not include any item or mail data that don’t match a customer. Save the sqlite database for use in the future. Verify all data has been written to the sqlite DB.

5. Create and export a csv file for XYZ to target in a direct mail marketing campaign. This file should have the following data fields in it, and it should have a header record giving field names. It should only include customers who have been mailed at least seven (7) times in XYZ’s 16 mail campaigns.

Data fields to be included in the csv file:

account number
number of mail campaigns
2009 year-to-date transactions
2009 year-to-date sales
ZHOMEENT
ZMOBAV
ZCREDIT
ZTECH
ZHOMEENT01
ZMOBAV01
ZCREDIT01
ZTECH01

The first four “Z” variables above are from Experian create a version that is numeric and is coded “1” (numeric) if the original is coded “Y,” and “0” (numeric) for any other original code. Include code and results for verifying the new 0/1 variables correctly, taking missing values and nulls into account.

6. Create a cross-tabulation for each of the Z variables and their numeric versions, showing for each that the recoding worked as required while considering any missing values.

7. Pickle or Shelve the pandas DataFrames and save the sqlite DB.

Customers of a hotel speak-winter-2017 CC by SA Lynd Bacon & Associates, Ltd. DBA Lorma Buena Associates is licensed under CC BY 4.0.

[raw]

<br />