Introduction

About Prosper

From the Prosper website: Prosper is America’s first marketplace lending platform, with over $10 billion in funded loans.

Prosper allows people to invest in each other in a way that is financially and socially rewarding. On Prosper, borrowers list loan requests between $2,000 and $35,000 and individual investors invest as little as $25 in each loan listing they select. Prosper handles the servicing of the loan on behalf of the matched borrowers and investors.

Overview

From Udacity:

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.


Setup & General

Select Categories of Interest

The data frame has been reduced from 81 to 22 variables that are evaluated as parameters of interest in this analysis.

prosper <- prosper[c(3, 5, 6, 9, 13, 15:18, 20, 26, 35:36, 41:42, 46:50, 53:54)]
colnames(prosper)
 [1] "ListingCreationDate"        "Term"                       "LoanStatus"                 "BorrowerRate"              
 [5] "EstimatedReturn"            "ProsperRating..Alpha."      "ProsperScore"               "ListingCategory..numeric." 
 [9] "BorrowerState"              "EmploymentStatus"           "CreditScoreRangeLower"      "TotalInquiries"            
[13] "CurrentDelinquencies"       "RevolvingCreditBalance"     "BankcardUtilization"        "TradesOpenedLast6Months"   
[17] "DebtToIncomeRatio"          "IncomeRange"                "IncomeVerifiable"           "StatedMonthlyIncome"       
[21] "TotalProsperPaymentsBilled" "OnTimeProsperPayments"     
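Selecting columns by position is fragile if the column order of the source file ever changes. A minimal sketch of the same idea using column names, on a small made-up data frame (`toy`, `keep` and `toy_small` are illustrative names, not part of the original analysis):

```r
# Toy data frame standing in for the full Prosper data (values are made up)
toy <- data.frame(ListingKey = c("a", "b"),
                  Term = c(36, 60),
                  LoanStatus = c("Current", "Completed"),
                  BorrowerRate = c(0.15, 0.22),
                  stringsAsFactors = FALSE)

# Columns to keep, referenced by name instead of position
keep <- c("Term", "LoanStatus", "BorrowerRate")

# Fail early if a requested column is missing
stopifnot(all(keep %in% colnames(toy)))

toy_small <- toy[keep]
colnames(toy_small)
```

The same pattern would apply to the 22 names listed above, at the cost of a longer selection vector.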

Data Transformations

as.factor

prosper$ProsperScore <- as.factor(prosper$ProsperScore)
prosper$ListingCategory..numeric. <- as.factor(prosper$ListingCategory..numeric.)
prosper$Term <- as.factor(prosper$Term)
prosper$ProsperRating..Alpha. <- factor(prosper$ProsperRating..Alpha.,
                                        levels = c('AA', 'A', 'B', 'C', 'D',
                                                   'E', 'HR'), ordered = TRUE)

Calculated Yearly Income

prosper$CalcYearlyIncome <- prosper$StatedMonthlyIncome * 12

Reformat Income Range

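Create a finer-grained income range based on StatedMonthlyIncome * 12. Because cut() builds left-open intervals by default, a stated income of exactly $0 falls outside the first bin and becomes NA.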

prosper$IncomeRange_old <- prosper$IncomeRange
prosper$IncomeRange <- cut(prosper$CalcYearlyIncome, dig.lab = 10,
                           breaks = c(0, 10000, 20000, 30000, 40000, 50000,
                                      60000, 70000, 80000, 90000, 100000,
                                      200000, 21000036.))
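How `cut()` treats the lowest break matters for borrowers with no stated income. The sketch below uses made-up incomes and a shortened set of breaks (both are assumptions for illustration) to show the default behaviour next to `include.lowest = TRUE`:

```r
# Made-up yearly incomes, including an exact zero
income <- c(0, 5000, 10000, 10001, 250000)

# Shortened, illustrative breaks
breaks <- c(0, 10000, 20000, Inf)

# Default: intervals are (lo, hi], so 0 is not in any bin and becomes NA
default_bins <- cut(income, breaks = breaks, dig.lab = 10)

# include.lowest = TRUE closes the first interval on the left: [0, 10000]
closed_bins <- cut(income, breaks = breaks, dig.lab = 10,
                   include.lowest = TRUE)

is.na(default_bins)                   # only the zero income is NA
table(closed_bins, useNA = "ifany")   # every income now has a bin
```

Whether zero incomes belong in the lowest bin or should stay missing is an analysis choice; the sketch only makes the difference visible.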

Reformat Date categories as.Date

prosper$ListingCreationDate <- as.Date(prosper$ListingCreationDate,
                                       format = '%Y-%m-%d')
# year() is not base R; it is assumed here to come from the lubridate package
prosper$ListingCreationYear <- year(prosper$ListingCreationDate)

Remove 2005

There are very few loans in 2005, so that year is removed.

prosper <- subset(prosper, ListingCreationYear != 2005)

Create a new Delinquent category based upon LoanStatus

# Used for filling histograms
prosper$Delinquent <- ifelse(
  prosper$LoanStatus == 'Completed' |
  prosper$LoanStatus == 'Current' |
  prosper$LoanStatus == 'FinalPaymentInProgress',
  'False', 'True')
prosper$Delinquent <- factor(prosper$Delinquent)
# Used in calculating correlations: 0 = not delinquent, 1 = delinquent
# (as.numeric() has no `levels` argument, so compare against the label instead)
prosper$DelinquentNum <- as.numeric(prosper$Delinquent == 'True')
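The same flag can be built more compactly with `%in%`. This is a sketch on made-up status values; the three non-delinquent statuses come from the code above, while `status`, `good`, `delinquent` and `delinquent_num` are illustrative names:

```r
# Made-up loan statuses for illustration
status <- c("Completed", "Current", "Chargedoff", "Defaulted",
            "FinalPaymentInProgress", "Past Due (1-15 days)")

# Statuses treated as not delinquent in this analysis
good <- c("Completed", "Current", "FinalPaymentInProgress")

# %in% replaces the chain of == comparisons joined with |
delinquent <- factor(ifelse(status %in% good, "False", "True"),
                     levels = c("False", "True"))

# Explicit 0/1 indicator for use in correlations
delinquent_num <- as.integer(delinquent == "True")
delinquent_num
```

Because Pearson correlation is unchanged by a linear rescaling, a 0/1 coding and a 1/2 coding of the flag give the same coefficient.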

Replace DC with MD

DC is not a state, so its loans are reassigned to neighboring Maryland (MD).

prosper$BorrowerState[prosper$BorrowerState == 'DC'] <- 'MD'

as.numeric

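# Convert via as.character() so the score labels, not the internal factor
# codes, become the numeric values (as.numeric() has no `levels` argument)
prosper$ProsperScore..numeric. <- as.numeric(as.character(prosper$ProsperScore))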
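Converting a factor to numeric returns the internal level codes, which only match the labels when the levels happen to be 1, 2, 3, ... in order. A small made-up example of the difference (`score` is an illustrative name):

```r
# A factor whose labels are numbers but whose level order is not 1, 2, 3, ...
score <- factor(c("10", "2", "7"), levels = c("10", "2", "7"))

codes  <- as.numeric(score)                 # internal codes: 1 2 3
values <- as.numeric(as.character(score))   # the labels themselves: 10 2 7

codes
values
```

For ProsperScore the levels are 1 through 11 in order, so both approaches agree; going through `as.character()` simply avoids relying on that.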

Create ListingCategory from ListingCategory..numeric.

categories <- c('NA', 'Debt Consol', 'Home Improvement', 'Business', 'Personal',
               'Student', 'Auto', 'Other', 'Baby', 'Boat', 'Cosmetic',
               'Engagement', 'Green', 'Household', 'Large Purchases',
               'Medical/Dental', 'Motorcycle', 'RV', 'Taxes', 'Vacation',
               'Wedding')
prosper$ListingCategory <- prosper$ListingCategory..numeric.
levels(prosper$ListingCategory) <- categories
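Assigning `levels()` positionally works when every code from 0 to 20 occurs in the data. A sketch of an alternative that pairs each code with its label explicitly, using a shortened, illustrative label set and made-up codes (`category_labels`, `codes` and `category` are hypothetical names):

```r
# Shortened label set for codes 0..3 (illustrative only)
category_labels <- c("NA", "Debt Consol", "Home Improvement", "Business")

# Made-up codes; code 2 never occurs in this sample
codes <- c(0, 1, 3, 1)

# Pairing levels with labels keeps the mapping correct even when a code is absent
category <- factor(codes, levels = 0:3, labels = category_labels)

as.character(category)
table(category)
```

With positional assignment, an absent code would shift every later label by one; the explicit pairing avoids that.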

Create Ratio_PL_Pay_OT - Ratio of Prosper Loan Payments Paid On Time

# Used for linear model
prosper$Ratio_PL_Pay_OT <- prosper$OnTimeProsperPayments/
  prosper$TotalProsperPaymentsBilled
prosper$Ratio_PL_Pay_OT[is.na(prosper$Ratio_PL_Pay_OT)] <- 0
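The replacement step above covers two different cases: missing payment counts (NA) and zero billed payments (0/0, which R evaluates to NaN). A minimal sketch with made-up counts (`on_time`, `billed` and `ratio` are illustrative names):

```r
# Made-up payment counts: full history, partial history,
# zero payments billed, and no Prosper history at all
on_time <- c(10, 5, 0, NA)
billed  <- c(10, 8, 0, NA)

ratio <- on_time / billed   # 0/0 gives NaN, NA/NA gives NA

# is.na() is TRUE for both NA and NaN, so one replacement handles both
ratio[is.na(ratio)] <- 0
ratio
```

Note that a ratio of 0 then means either "never paid on time" or "no payment history", which the model cannot tell apart.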

Inquiry

Correlation Matrix

Generated to provide a high level view of possibly interrelated metrics.


Listing Creation Date

ListingCreationYear colored by LoanStatus & Delinquent - Univariate

For the purpose of this analysis, all categories of LoanStatus except Completed, Current & FinalPaymentInProgress are True in the Delinquent category.

Listing Creation Year with Delinquency


 2006  2007  2008  2009  2010  2011  2012  2013  2014 
 6213 11557 11263  2206  5530 11442 19556 35413 10734 

ListingCreationYear colored by Delinquent - Univariate

A view of the absolute number of loans and delinquencies. As in all further discussion, delinquency reflects the loan status as of 2014. Loan terms are 1, 3 or 5 years, so the final closed status can be determined for most loans only through 2011 (most loan terms are 3 years). Note the increasing number of loans after the 2007 downturn, which, according to Wikipedia, ended in June 2009 (by economic measures). This graph also shows a decrease in the proportion of delinquencies after 2008.


    Pearson's product-moment correlation

data:  ListingCreationYear and DelinquentNum
t = -126.88, df = 113910, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3569634 -0.3467873
sample estimates:
       cor 
-0.3518858 

ListingCreationDate - Univariate

[1] "2006-01-06"
[1] "2014-03-10"
[1] "2008-10-16"
[1] "2009-04-28"
[1] "2009-07-13"

2005 has been excluded from this data set because of the small number of loans; the analysis covers 2006/01/06 through 2014/03/10. Almost no lending occurred from 2008/10/16 through 2009/07/13 because the U.S. Securities and Exchange Commission required peer-to-peer companies to undergo the arduous process of registering their offerings as securities (source: Wikipedia).
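One way to locate a pause like this is to look for the largest gap between consecutive listing dates. The sketch below uses four made-up dates that bracket the pause described above; `dates`, `gaps` and `i` are illustrative names, and this is not necessarily how the dates printed above were obtained:

```r
# Made-up listing dates with one long pause
dates <- as.Date(c("2008-10-01", "2008-10-16", "2009-07-13", "2009-07-20"))

dates <- sort(dates)
gaps  <- diff(dates)        # days between consecutive listings

i <- which.max(gaps)        # position of the longest gap
gap_start <- dates[i]
gap_end   <- dates[i + 1]

gap_start
gap_end
as.numeric(gaps[i])         # length of the pause in days
```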


Income

IncomeRange (original factors) by EmploymentStatus - Bivariate

CalcYearlyIncome by IncomeRange (original factors) - Bivariate

The IncomeRange (original factors) by EmploymentStatus plot shows that an EmploymentStatus of Not employed primarily overlaps with the IncomeRange_old level Not employed. However, the CalcYearlyIncome by IncomeRange_old plot shows that the IncomeRange_old levels Not employed and Not displayed cover a wide range of incomes when cross-referenced against CalcYearlyIncome (\(StatedMonthlyIncome \times 12\)). Not displayed appears only in 2006 and 2007. From 2007 to 2009, Not employed does not necessarily mean $0 income. From 2010 to 2013 some Not employed borrowers still report a range of incomes but, as the boxplot shows, Not employed generally corresponds to $0 income. Because these categories are inconsistent, EmploymentStatus and IncomeRange_old will not be used for further analysis. IncomeRange has been rebinned to provide better resolution, as shown in the table below.

Income Range colored with Delinquency - Univariate


        (0,10000]     (10000,20000]     (20000,30000]     (30000,40000]     (40000,50000]     (50000,60000]     (60000,70000]     (70000,80000] 
             1448              3377             11189             15242             12576             18533             11052              7875 
    (80000,90000]    (90000,100000]   (100000,200000] (200000,21000036] 
             9070              5661             14282              2215 

Note the significant delinquencies immediately preceding and during the economic downturn and the reduction in the number of loans for 2009 and 2010.

Income Range colored with Income Verifiable - Univariate

Most incomes after 2007 are verifiable.

Calculated Yearly Income with Delinquency - Univariate

[1] 2006
        cor 
-0.08971717 
[1] 2007
        cor 
-0.03187157 
[1] 2008
        cor 
-0.03178269 
[1] 2009
      cor 
-0.047712 
[1] 2010
        cor 
-0.08174338 
[1] 2011
        cor 
-0.06460819 
[1] 2012
        cor 
-0.03531746 
[1] 2013
        cor 
-0.02890103 
[1] 2014
         cor 
-0.005390592 

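At the level of individual loans, income is essentially uncorrelated with delinquency: the yearly correlation coefficients all lie between -0.09 and 0. In general, the delinquency count follows the income count. The y-axis scale is held fixed across years to make the difference in the number of loans per year easier to see.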
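Per-year coefficients like those listed above can be computed by splitting the data by year and applying `cor()` to each piece. A sketch on made-up data (`toy` and `cor_by_year` are illustrative names; the original may have used a loop instead):

```r
# Made-up data: yearly income and a 0/1 delinquency flag for two years
toy <- data.frame(
  year       = c(2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007),
  income     = c(20000, 40000, 60000, 80000, 25000, 45000, 65000, 85000),
  delinquent = c(1, 1, 0, 0, 1, 0, 1, 0)
)

# One Pearson correlation per year
cor_by_year <- sapply(split(toy, toy$year),
                      function(d) cor(d$income, d$delinquent))
cor_by_year
```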


Proportion Delinquent

Proportion Delinquent by Income Range - Bivariate


    Pearson's product-moment correlation

data:  prop_del and as.numeric(IncomeRange)
t = -9.0583, df = 10, p-value = 3.904e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9845578 -0.8080484
sample estimates:
       cor 
-0.9441224 

The Proportion Delinquent by Income Range plot, supported by the calculated correlation of -0.944 across the 12 income bins, demonstrates a negative relationship between income range and the proportion of delinquencies. As IncomeRange increases, ProportionDelinquent decreases.
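The proportion delinquent per bin is the mean of a 0/1 flag within that bin, and the correlation is then taken over the bins rather than over individual loans. A sketch with three made-up bins (`bins`, `delinquent` and `prop_del` here are illustrative toy objects):

```r
# Made-up loans: income bin (ordered low to high) and 0/1 delinquency flag
bins <- factor(rep(c("low", "mid", "high"), each = 4),
               levels = c("low", "mid", "high"))
delinquent <- c(1, 1, 1, 0,   # low:  3 of 4 delinquent
                1, 1, 0, 0,   # mid:  2 of 4 delinquent
                1, 0, 0, 0)   # high: 1 of 4 delinquent

# Mean of a 0/1 flag within each bin is the proportion delinquent
prop_del <- as.vector(tapply(delinquent, bins, mean))
prop_del

# Correlate the proportions with the bin index (1, 2, 3)
bin_cor <- cor(prop_del, seq_along(prop_del))
bin_cor
```

Aggregating into bins averages away loan-to-loan variation, which is one reason a bin-level correlation can be far stronger than the loan-level correlations between income and delinquency.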

Proportion Delinquent by Year

Proportion Delinquent by Income Range colored by Year - Multivariate

The relationship between Proportion Delinquent and the grouping of the years could be attributable to the loan term length. The majority of loans have a term length of 3 years and some extend to 5 years. Explicit data for the full loan term only exists for loans originating from 2006 until 2009. A mostly complete set of data exists for loans originating through 2011. A conclusive comparison between loans created from 2012 onward and those created before 2012 is not feasible with the available data. Additionally, Prosper introduced new methods for assessing risk (ProsperScore), which subsequently helped to reduce the number of delinquent accounts.

Return to Final Plots & Summary

Mean Proportion Delinquent by Income Range - Bivariate

Note the difference between Mean Proportion Delinquent by Income Range - Bivariate and Proportion Delinquent by Income Range - Bivariate. The mean proportion delinquent demonstrates less variance across income ranges. The mean is of the ProportionDelinquent by IncomeRange for each ListingCreationYear.
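The two summaries differ because a pooled proportion weights every loan equally, while a mean of yearly proportions weights every year equally. A worked example with made-up counts for a single income range (`loans`, `delinquent`, `pooled` and `mean_of_years` are illustrative names):

```r
# Made-up counts for one income range in two years
loans      <- c("2008" = 1000, "2013" = 100)
delinquent <- c("2008" = 400,  "2013" = 5)

# Pooled proportion: every loan counts equally
pooled <- sum(delinquent) / sum(loans)

# Mean of yearly proportions: every year counts equally
mean_of_years <- mean(delinquent / loans)

c(pooled = pooled, mean_of_years = mean_of_years)
```

Here the pooled value is dominated by the year with more loans, while the mean of yearly proportions sits halfway between 0.40 and 0.05.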


Debt to Income Ratio

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   0.140   0.220   0.276   0.320  10.010    8554 

Debt to Income Ratio colored by Delinquent - Univariate

The DebtToIncomeRatio range has been limited to a maximum of 1 in the visualization. The actual range extends to 10, causing positive skewing.


Revolving Credit Balance

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0    3121    8549   17599   19521 1435667    7581 

RevolvingCreditBalance colored by Delinquent - Univariate

A \(\log_{10}\) scaling of the x-axis produces an approximately normal distribution of Revolving Credit Balance, which is otherwise positively skewed because of large-valued outliers.
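The effect of the log scale can be seen by comparing mean and median before and after the transformation. A sketch with made-up balances (`balance` and `log_balance` are illustrative names):

```r
# Made-up balances with one large outlier and one zero
balance <- c(0, 500, 3000, 9000, 20000, 1400000)

# Raw scale: the outlier pulls the mean far above the median
c(mean = mean(balance), median = median(balance))

# log10(0) is -Inf, so zero balances cannot be placed on a log-scaled axis
log_balance <- log10(balance[balance > 0])

# Log scale: mean and median are much closer together
c(mean = mean(log_balance), median = median(log_balance))
```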

RevolvingCreditBalance by IncomeRange - Bivariate


    Pearson's product-moment correlation

data:  as.numeric(IncomeRange) and RevolvingCreditBalance
t = 103.67, df = 104980, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2992509 0.3102254
sample estimates:
      cor 
0.3047483 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0     493    2416    8864    8105  248115     291 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0   11838   34205   72937   79185 1435667     145 

From the lowest to highest IncomeRange, we can see an increase in median RevolvingCreditBalance from $2,416 to $34,205. The summary shows, in the case of the RevolvingCreditBalance for lowest and highest IncomeRange, a positive skew because the mean is significantly greater than the median.


Loan Term


   12    36    60 
 1614 87755 24545 

Loan Term colored by Delinquency - Univariate

ListingCreationYear colored by Term - Univariate

Most loans are for 36 months; as such, most delinquencies occurred on 36-month loans.


ProsperScore (Risk)


    1     2     3     4     5     6     7     8     9    10    11 
  992  5766  7642 12595  9813 12278 10597 12053  6911  4750  1456 

ProsperScore (Risk) with Delinquency - Univariate


    Pearson's product-moment correlation

data:  ProsperScore..numeric. and DelinquentNum
t = -24.981, df = 84851, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.09212228 -0.07876361
sample estimates:
        cor 
-0.08544678 

There’s no noticeable correlation (r = -0.085) between the Prosper risk score (11 is the best score) and delinquency. The distribution changes from negatively skewed to approximately normal after 2012. The largest number of borrowers are assigned a risk score of 4 after 2012 and there is a significant reduction in delinquent accounts.


Delinquencies by State

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.09496 0.12252 0.15242 0.16358 0.17871 0.36634 

Delinquencies by State Heat Map - Bivariate

Proportion Delinquent by State Heat Map - Bivariate

Any interesting observations from the number of delinquencies depicted in the first choropleth map are surpassed by the proportion delinquent choropleth map, which gives a better understanding of the delinquency distribution by state. The proportion delinquent ranges between 9.5% and 36.6% across states, with a mean of 16.4%.

Return to Final Plots & Summary


Listing Category

              NA      Debt Consol Home Improvement         Business         Personal          Student             Auto            Other 
           16942            58308             7433             7189             2395              756             2572            10494 
            Baby             Boat         Cosmetic       Engagement            Green        Household  Large Purchases   Medical/Dental 
             199               85               91              217               59             1996              876             1522 
      Motorcycle               RV            Taxes         Vacation          Wedding 
             304               52              885              768              771 

ListingCategory colored by Delinquent - Univariate

The quantity of Debt Consolidation loans is so large compared to the other listing categories that ListingCategory is rendered uninteresting as a metric for analysing the Prosper dataset. NA, Other and Personal provide no insight other than that the borrower didn’t want to divulge the reason for the loan.


Borrower Rate

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1340  0.1840  0.1928  0.2500  0.4975 

Borrower Rate filled by Delinquent Status across Creation Year - Univariate

Borrower Rate count colored by delinquent status across listing creation year.

Estimated Return by Borrower Rate - Multivariate

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0400  0.1359  0.1875  0.1960  0.2574  0.3600 


    Pearson's product-moment correlation

data:  BorrowerRate and EstimatedReturn
t = 413.73, df = 84851, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8154276 0.8198876
sample estimates:
      cor 
0.8176699 

As stated in the Prosper Variable Definition spreadsheet, ProsperScore is a custom risk score built using historical Prosper data and is applicable to loans originating after 2009/07/31. Scores of 11 and 1 are the lowest and highest risk, respectively. The data show lower-risk borrowers have a lower BorrowerRate. Following 2010, there’s a significant reduction in the number of higher-risk loans and Estimated Return is greater than 0; note the estimated losses in 2009 and 2010.

Return to Final Plots & Summary


Estimated Return by Prosper Score (Risk) - Multivariate

This visualization demonstrates how Prosper has adjusted their process to receive the highest estimated returns from the highest risk borrowers.

[1] 2009
        cor 
-0.01232269 
[1] 2010
       cor 
-0.1603571 
[1] 2011
       cor 
-0.6098998 
[1] 2012
       cor 
-0.5255901 
[1] 2013
      cor 
-0.701346 
[1] 2014
       cor 
-0.7280618 

For the most part, with each year, ProsperScore becomes a better predictor of EstimatedReturn (i.e. the lowest risk borrowers (11) have the lowest EstimatedReturn).

Return to Final Plots & Summary


Stated Monthly Income by Estimated Return - Multivariate

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    3200    4667    5607    6817 1750003 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 -0.183   0.074   0.092   0.096   0.117   0.284   29061 

A density plot is useful for dealing with overplotting. There is evidence of a change in the method of determining EstimatedReturn. Initially there was a negative estimated return for the highest-risk borrowers, whereas in later years the highest-risk customers have the highest estimated return rate. The dotted blue lines represent the overall 1st quartile, median and 3rd quartile for the respective axis.

Return to Final Plots & Summary


Stated Monthly Income by ProsperScore (Risk) - Multivariate


    Pearson's product-moment correlation

data:  ProsperScore..numeric. and StatedMonthlyIncome
t = 24.484, df = 84851, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.07707163 0.09043415
sample estimates:
       cor 
0.08375665 

As demonstrated by the visualization and the correlation of 0.084, there is almost no linear relationship between income and Prosper score.


Density plot of DebtToIncomeRatio by StatedMonthlyIncome colored by Delinquent - Multivariate

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    3200    4667    5607    6817 1750003 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.140   0.220   0.276   0.320  10.010 


    Pearson's product-moment correlation

data:  StatedMonthlyIncome and DebtToIncomeRatio
t = -39.632, df = 77555, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1477838 -0.1339875
sample estimates:
       cor 
-0.1408925 

DebtToIncomeRatio has been limited to a maximum of 1 and StatedMonthlyIncome has been limited to $15,000. Except for 2014, the distribution of delinquent and non-delinquent accounts is very similar. The dotted blue lines represent the overall 1st quartile, median and 3rd quartile for the respective axis. There is some tendency for lower-income borrowers to have a higher debt-to-income ratio.


DebtToIncomeRatio by ListingCreationYear - Multivariate

Again Debt to Income Ratio is limited to 1. Except for the two lowest income ranges, across all years the median Debt to Income Ratio is below 0.25 and gets lower the greater the income range.


ProsperRating by ProsperScore - Multivariate

[1] 2009
       cor 
-0.8652869 
[1] 2010
       cor 
-0.9212265 
[1] 2011
       cor 
-0.8788855 
[1] 2012
       cor 
-0.8079254 
[1] 2013
       cor 
-0.7896682 
[1] 2014
       cor 
-0.7526703 

The Prosper variable definitions state that ProsperScore is a risk score. ProsperRating is an estimation of the borrower’s estimated loss rate and is determined by (1) credit score and (2) Prosper Score. Prosper Ratings, from lowest-risk to highest-risk, are labeled AA, A, B, C, D, E, and HR (“High Risk”). There is a strong negative correlation for each year. Prosper may use a slightly different method for combining ProsperScore & CreditScoreRangeLower.

Prosper Rating   Estimated Average Annual Loss Rate
AA               0.00-1.99%
A                2.00-3.99%
B                4.00-5.99%
C                6.00-8.99%
D                9.00-11.99%
E                12.00-14.99%
HR               15.00%+
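The bands in the table above can be expressed as a lookup from an estimated loss rate to a rating. This is only a sketch of the table itself, not Prosper's actual rating procedure; `lower`, `rating`, `loss` and `assigned` are illustrative names and the loss rates are made up:

```r
# Lower bound of each band in percent, taken from the table above
lower  <- c(0, 2, 4, 6, 9, 12, 15)
rating <- c("AA", "A", "B", "C", "D", "E", "HR")

# Made-up estimated loss rates in percent
loss <- c(1.5, 2.0, 8.99, 14.2, 30)

# findInterval() returns the index of the band each value falls into
assigned <- rating[findInterval(loss, lower)]
assigned
```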

Return to Final Plots & Summary


Linear Model to Predict Prosper Score

This is more a demonstration of the difficulties of using a linear model to predict an outcome when there is a large number of variables to consider.


Call:
lm(formula = ProsperScore..numeric. ~ TotalInquiries + CurrentDelinquencies + 
    TradesOpenedLast6Months + BankcardUtilization + DebtToIncomeRatio + 
    exp(Ratio_PL_Pay_OT), data = subset(prosper, !is.na(ProsperScore..numeric.) & 
    ListingCreationDate <= "2011-04-30" & ListingCreationDate >= 
    "2008-04-01"))

Residuals:
   Min     1Q Median     3Q    Max 
-6.822 -1.150  0.266  1.258  8.284 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              9.18459    0.04980 184.441  < 2e-16 ***
TotalInquiries          -0.08443    0.00414 -20.391  < 2e-16 ***
CurrentDelinquencies    -0.38781    0.01613 -24.044  < 2e-16 ***
TradesOpenedLast6Months -0.43137    0.01939 -22.251  < 2e-16 ***
BankcardUtilization     -1.41114    0.05384 -26.210  < 2e-16 ***
DebtToIncomeRatio       -0.73725    0.06224 -11.845  < 2e-16 ***
exp(Ratio_PL_Pay_OT)    -0.11566    0.02254  -5.132 2.92e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.711 on 9373 degrees of freedom
  (1135 observations deleted due to missingness)
Multiple R-squared:  0.2248,    Adjusted R-squared:  0.2243 
F-statistic: 453.1 on 6 and 9373 DF,  p-value: < 2.2e-16


Calls:
lmodel: lm(formula = ProsperScore..numeric. ~ TotalInquiries + CurrentDelinquencies + 
    TradesOpenedLast6Months + BankcardUtilization + DebtToIncomeRatio + 
    exp(Ratio_PL_Pay_OT), data = subset(prosper, !is.na(ProsperScore..numeric.) & 
    ListingCreationDate <= "2011-04-30" & ListingCreationDate >= 
    "2008-04-01"))

==========================================
  (Intercept)                   9.185***  
                               (0.050)    
  TotalInquiries               -0.084***  
                               (0.004)    
  CurrentDelinquencies         -0.388***  
                               (0.016)    
  TradesOpenedLast6Months      -0.431***  
                               (0.019)    
  BankcardUtilization          -1.411***  
                               (0.054)    
  DebtToIncomeRatio            -0.737***  
                               (0.062)    
  exp(Ratio_PL_Pay_OT)         -0.116***  
                               (0.023)    
------------------------------------------
  R-squared                     0.225     
  adj. R-squared                0.224     
  sigma                         1.711     
  F                           453.122     
  p                             0.000     
  Log-likelihood           -18341.688     
  Deviance                  27426.518     
  AIC                       36699.376     
  BIC                       36756.547     
  N                          9380         
==========================================

       fit    lwr      upr
1 6.570083 3.2164 9.923766
       fit      lwr      upr
1 7.764011 4.410197 11.11783
       fit      lwr      upr
1 1.237484 -2.14071 4.615678

The expected results are 6, 10 and 3, compared with the predictions of 6.57, 7.76 and 1.24. Each interval is also roughly 6.7 points wide on an 11-point scale. The predictive power of the model clearly requires refinement. Perhaps it could be improved by adding more variables or by transforming variables (e.g. \(\log_{10}(var)\), \(\sqrt[3]{var}\)). More likely, this is a case where machine learning methods would be more successful. The variables chosen and the ListingCreationDate range used came from Prosper’s description of the Prosper Score.
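The fit/lwr/upr rows above have the shape that `predict()` returns when an interval is requested. Assuming that is how they were produced, a minimal sketch on made-up data (`toy`, `fit` and `pred` are illustrative; the model here has a single predictor, unlike the one above):

```r
# Made-up data with a roughly linear relationship
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0))

fit <- lm(y ~ x, data = toy)

# interval = "prediction" returns the fitted value plus lower/upper bounds
# for a single new observation
pred <- predict(fit, newdata = data.frame(x = 3.5),
                interval = "prediction", level = 0.95)
pred
```

A prediction interval describes where a single new observation is expected to fall, so its width is a direct measure of how useful the model is for scoring individual borrowers.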


Final Plots & Summary

Plot 1

Code Plot 1

The relationship between Proportion Delinquent and the grouping of the years could be attributable to the loan term length. The majority of loans have a term length of 3 years and some extend to 5 years. Explicit data for the full loan term only exists for loans originating from 2006 until 2009. A mostly complete set of data exists for loans originating through 2011. A conclusive comparison between loans created from 2012 onward and those created before 2012 is not feasible with the available data. Additionally, Prosper introduced new methods for assessing risk (ProsperScore), which subsequently helped to reduce the number of delinquent accounts.

Plot 2

Code Plot 2

[1] 2009
      cor 
0.6435063 
[1] 2010
      cor 
0.6483637 
[1] 2011
     cor 
0.860636 
[1] 2012
      cor 
0.8054026 
[1] 2013
      cor 
0.9136454 
[1] 2014
      cor 
0.9727477 

As stated in the Prosper Variable Definition spreadsheet, ProsperScore is a custom risk score built using historical Prosper data and is applicable to loans originating after 2009/07/31. Scores of 11 and 1 are the lowest and highest risk, respectively. The data show lower-risk borrowers have a lower BorrowerRate. Following 2010, there’s a significant reduction in the number of higher-risk loans and Estimated Return is greater than 0; note the estimated losses in 2009 and 2010. Borrower Rate is an increasingly strong predictor of Estimated Return.

Plot 3

Code Plot 3

[1] 2009
       cor 
-0.8652869 
[1] 2010
       cor 
-0.9212265 
[1] 2011
       cor 
-0.8788855 
[1] 2012
       cor 
-0.8079254 
[1] 2013
       cor 
-0.7896682 
[1] 2014
       cor 
-0.7526703 

ProsperScore is a risk score from 1 (highest risk) to 11 (lowest risk). ProsperRating is an estimation of the borrower’s estimated loss rate and is determined by (1) credit score and (2) Prosper Score. Prosper Ratings, from lowest-risk to highest-risk, are labeled AA, A, B, C, D, E, and HR (“High Risk”). There is a strong negative correlation for each year. Prosper may use a slightly different method for combining ProsperScore & CreditScoreRangeLower.

Prosper Rating   Estimated Average Annual Loss Rate
AA               0.00-1.99%
A                2.00-3.99%
B                4.00-5.99%
C                6.00-8.99%
D                9.00-11.99%
E                12.00-14.99%
HR               15.00%+

Reflection

The initial direction of the investigation was an attempt to ascertain which metrics were most attributable to delinquency. It quickly became apparent that yearly changes made for more interesting observations. An important observation was that univariate visualizations offered very little categorical insight until facet-wrapped by listing creation year. The data set begins with the inception of Prosper and spans several tumultuous years for financial markets in general and for Prosper’s business model. Except for Plot 2, the final visualizations demonstrate significant year-over-year change. The most significant change seems to be the method by which Prosper evaluates risk, with the addition of Prosper Score in 2009. Additionally, a lack of business, likely attributable to the 2007 recession, is evident, as is the 2008-2009 cessation of lending caused by regulatory changes.

The attempt to model the Prosper Score with this linear model was unsuccessful. My approach was to introduce transformations to the independent variables in hopes of creating a stronger delineation between each factor of Prosper Score. Specifically, \(f(x) = e^x\) was used to mitigate the introduction of a 0 factor into the model. Perhaps it will be an instructive endeavour to apply machine learning to this data set following that class.

As stated previously, I struggled and was ultimately unsuccessful in accurately modeling the Prosper Score.

I enjoyed a modicum of success with the strong correlation shown between Prosper Rating and the combined metric of Prosper Score * Credit Score.


