Over the last few years, the idea of applying computer-assisted pattern recognition, or, as it is more commonly known, machine learning, to social lending has sort of stuck with me. Sometime in 2015, a colleague and I first looked at this problem space. I may write about the machine learning aspect in a future blog post, but it is not the focus of this piece. It was not until recently that I began to think of lending in the context of geography. Could visual patterns be teased out from the available data? There is an existing article on the topic, but the granularity of the analysis is at the state level. Similar to that geographical analysis of Prosper, there’s also a look at Lending Club at the ZIP3 level. I wanted to get to a smaller unit of political geography. Before I get into this, let’s give some context to what exactly social lending is all about.
The basic idea with social lending is that a person wants or needs a bit of money. Instead of going directly to a bank, the person makes a listing for a loan using an online platform like Lending Club or Prosper. Social lending, more commonly known as peer-to-peer lending, sells the idea that it offers opportunities for both borrowers and lenders to reach their own objectives outside of direct interaction with banks. Lenders, big and small, have a potential opportunity to put their money to work, while borrowers are able to access money through an alternative to traditional bank loans and credit cards. As with many transactional things in the era of the Internet, both lenders and borrowers fill out forms via web pages on the respective platforms. To give a sense of the size of the peer-to-peer lending industry, by early 2015, the largest platform, Lending Club, had facilitated over six billion dollars worth of loans1. Once a listing is active on one of the platforms, potential investors review the listing’s information and decide whether to commit some amount of money to the final loan.
A bit of the appeal of peer-to-peer lending, along with being an alternative source of money for borrowers who might have difficulties accessing credit through other channels, is how these loans are securitized into notes and presented to investors. Let’s take a quick detour into securitization.
The basic idea of securitization is to take many financial obligations, e.g. loans or debts, pool them together into an even larger thing, and then chop that larger thing into small pieces. The small pieces are then sold to investors, who expect an eventual return of their initial investment, with interest payments along the way. Securitization has been around for a long time. In the 1850s, there were offerings of farm mortgage bonds by the Racine & Mississippi railroad. These farm mortgage bonds had three components: 1) the note, which stated the financial obligation of the farmer to repay the stated mortgage amount; 2) the mortgage, which offered the farm as collateral; and 3) the bond of the railroad, which offered its reputation for repayment in addition to other assets.2 In the 1970s, the Department of Housing and Urban Development created the first modern residential mortgage-backed security when the Government National Mortgage Association (Ginnie Mae or GNMA) sold securities backed by bundled mortgage loans3. There is also a fascinating look back at a moment in securitization history in a Federal Reserve Bank of San Francisco Weekly Note from July 4, 1986.
The peer-to-peer lending industry, with a focus on everyday people who want to invest in these loans (as opposed to large banks, and private equity investors) is slightly different in how loans are securitized. Instead of bundling many, multi-thousand dollar loans into a pool, and then dividing the pool into notes, a single loan, for example, in the amount of $10,000, is divided into notes in denominations ranging from $25 up to thousands of dollars. An investor could buy a single $25 note, or she could buy a larger percentage of a given loan. As an aside, a widely held objective in investing is to maximize return on investment and reduce risk. A diversification of the risk is supposed to be achieved by buying a slice of many different loans4.
Let’s get back to the topic at hand: the geography of social lending. First, the data. I will be using data from Prosper. There’s a tremendous amount of work behind getting these data into a shape and structure that lends itself to looking at things geographically, as well as simply matching the historical data for the listing the borrower made with the data for the loan that followed. This process involved first having an investment account with Prosper, and then applying for an additional level of access for finer grained data. Without that finer grained access, the problem becomes one of record linkage: tying listing data to loan data based on the interest rate of the loan, the date of the loan’s origination, the amount of the loan, the state of the borrower, and a couple of other characteristics. It is fairly accurate, but if one is able to get true listing-to-loan matches, just use those.
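The record-linkage fallback described above can be sketched in a few lines. This is a minimal illustration, not Prosper's actual schema; the field names (rate, origination_date, amount, state) are assumptions standing in for whatever the real columns are called.

```python
def link_loans_to_listings(loans, listings):
    """Match each loan to a listing on shared attributes: rate, date, amount, state.

    Field names here are illustrative, not Prosper's real column names.
    """
    # Index listings by a composite key so each loan lookup is O(1).
    index = {}
    for lst in listings:
        key = (lst["rate"], lst["date"], lst["amount"], lst["state"])
        index.setdefault(key, []).append(lst)

    matches = {}
    for loan in loans:
        key = (loan["rate"], loan["origination_date"], loan["amount"], loan["state"])
        candidates = index.get(key, [])
        # Only accept unambiguous one-to-one matches; two loans for the same
        # amount, rate, date, and state cannot be told apart this way.
        if len(candidates) == 1:
            matches[loan["loan_id"]] = candidates[0]["listing_id"]
    return matches
```

The ambiguity check is where the "fairly accurate" caveat comes from: identical-looking loans simply go unmatched.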
Location. Location. Location.
Contrary to what was said in the Orchard Platform’s article on geography and Prosper, locations at a finer resolution than state are available. There are, however, a couple of caveats. The first is that the text in this field (borrower_city) is freeform and entered by the borrower. There is no standardization. You might get a chcgo, a chicgo, or the actual proper-noun spelling, Chicago, for the city’s name. It also appears that entering a city name might be optional, as there are some listings with an empty city. The other caveat for borrower_city is that it is available only in the historic data downloads, and not via Prosper’s API. Why is a finer grained location interesting? Because, if you were an investor, you might want to include a prospective borrower’s city in your judgement on whether or not to invest in a loan. “I won’t trust those Minneapolis borrowers.” In my mind, this is actually the reason this information is suppressed while a listing is active. There are laws and regulations in the US stating that lenders are not allowed to discriminate based on age, sex, or race. Fair lending laws have been on the books since the 1960s and 1970s5, and so lenders have been keen to avoid perceptions of discrimination based on these characteristics. Even so, both Prosper and Lending Club, in their early days, had pieces of information shared by the prospective borrowers. Things like a photo of the borrower along with a message from the borrower were posted in the listing. Photos could leave an impression of age and race, while the notes often included references to the person’s spouse with associated pronouns6. Both Prosper and Lending Club have the exact addresses of successful borrowers; there are know-your-customer rules and regulations, after all.
By not exposing this sliver of information at the time of an active listing, the lending platforms are potentially covering themselves from both actual discriminatory liability, as well as perceived public relations issues (that doesn’t mean that one of these platforms does not periodically have both — likely a paywall on that link, by the way).
At the start of the last paragraph, I mentioned the messiness of these freeform city names. How does one clean up these data into normalized, relatively accurate locations? Google. Google, through its cloud services business, offers relatively good name standardization and geocoding services. So, putting chcgo or chicgo into their system results in Chicago, IL, with a bunch of other information, like the county it is located in, as well as latitude and longitude for both a bounding box around the entity and a centroid.
The Google geocoding service, I should add, is not free after a point. Up to 2,500 uses, there is no charge; for each additional 1,000, it is $0.50. With a total of 477,546 loans with associated listing data, this seemed potentially expensive. Instead, I collapsed the borrower’s city and state down into a unique value, and fed that into the geocoding service. Getting a unique set of city and state combinations significantly reduced the number of things that I would need to geocode: from nearly 478,000 individual loans down to about 22,000 combinations. These standardized city/state/coordinates were then reattached to the original data. Not every user-entered city was able to be identified. Entries like chstnt+hl+cv, md and fpo were not identified. FPO and APO (also found in these data) are military installations, Fleet Post Office and Army Post Office, respectively. The loan/listing entries with locations that could not be identified via Google’s Geocoding Service were removed from these data, resulting in fewer than 10,000 listings, or 1.9% of the total, dropping off.
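The dedupe-then-reattach step looks roughly like the following sketch. The geocode callable stands in for a request to Google's Geocoding API, and the returned keys (city_std, county, lat, lng) are assumptions for illustration.

```python
def geocode_loans(loans, geocode):
    """Geocode each unique (city, state) pair once, then rejoin onto all loans.

    `geocode` is a stand-in for a call to Google's Geocoding API; it may
    return None for entries the service cannot identify.
    """
    # Collapse the hundreds of thousands of rows down to unique places,
    # so we pay for one geocoder call per combination, not per loan.
    unique_places = {(ln["city"], ln["state"]) for ln in loans}
    lookup = {place: geocode(*place) for place in unique_places}

    enriched = []
    for ln in loans:
        result = lookup[(ln["city"], ln["state"])]
        if result is not None:  # drop entries Google could not identify
            enriched.append({**ln, **result})
    return enriched
```

With the real data this is the difference between ~478,000 API calls and ~22,000.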
I should also give some temporal context to these data; the data range in dates from November 15, 2005 to January 31, 2018.
With a collection of finer grained locations (of unknown quality, I should add), what questions can be visualized with these data?
Orchard Platform’s article on the geography of peer-to-peer lending, as you recall, looks at state-level aggregations of data. The piece presents choropleth maps of loan originations by volume, loan originations per capita, loans 30 or more days past due, and finally a map of normalized unemployment rates.
The two maps, above, are originations by place at a city level. They are effectively showing nothing more than where people live. It’s a population map. It is what someone should expect. You will see more loans originating from the Los Angeles or New York City areas than from the Fargo, ND/Moorhead, MN area. There are just more people (much higher population densities) in the first two metropolitan areas than in the latter; each of those two higher-population metropolitan areas is also spatially larger. The New York metropolitan area, for example, is 13,318 square miles, while the Fargo/Moorhead area is only 2,821 square miles.
Even looking at just failed loans, which one of the above maps does, is still only identifying where populations live.
What if you wanted to look at loan originations and whether they appear to concentrate in US counties where a significant proportion of the population is African American?
First you would need data on race at the county level in the United States. The US Census Bureau’s American Community Survey is a great source for this type of information. In addition to data on race, you need this information tied back to counties, census tracts, or states. There’s a product made by the Institute for Social Research and Data Innovation called the National Historical Geographic Information System, or just NHGIS7. Along with the census and survey based data, NHGIS has ESRI shapefiles available that tie the data to place spatially. These are the two things needed.
The above map, with its blue Prosper loan locations and its red choropleth representing the percent of each county’s population that is African American, is interesting looking on the surface, but it is really only showing where a segment of the greater population lives.
I posed the question of race and lending to a colleague of mine; he thought on it for a short time, and then suggested looking at a choropleth of the number of loans in a county divided by the percent of minorities in that county.
First, define what is meant by minority. In the case of the following, I simply defined this as not white. The 2010 US Census found that White alone made up 72.4% of the US population8. Whether or not combining all non-white populations into a single number is the correct thing to do is another story.
In the map to the right, the scatter plot of borrower locations is gone; instead, what is shown is the loan count divided by the ratio of non-whites in a given county. It is another way to slice the data. However, it also seems to just be identifying more diverse populations: Los Angeles, Seattle, Chicago, Boston, Las Vegas, and Albuquerque, for example.
Another way to spin the question is to assume, for a moment, that the loans are evenly distributed throughout a county’s population. If a county was 80% white, 15% African American, and 5% Native American or Alaskan Native, we could assume that 80% of the loans were taken out by white individuals, 15% were taken out by African Americans, and 5% were taken out by Native American or Alaskan Natives. I highly doubt this is the case. It would be possible to get a closer idea by looking at county subdivisions and where the geocoded cities are located within those.
So, taking the idea that things are evenly distributed, you allocate a portion of the loans to non-whites (one could even look at the individual race groups in the American Community Survey). This proportioned loan count is then divided by the total number of non-whites in the county. This should have the effect of dampening counties with high loan counts but low non-white populations.
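The allocation arithmetic above is small enough to write down directly. A minimal sketch, using made-up county numbers; the even-distribution assumption is exactly the one I said I doubt:

```python
def nonwhite_loan_rate(loan_count, total_pop, white_pop):
    """Loans assumed taken out by non-whites, per non-white resident.

    Assumes loans are evenly distributed across a county's population,
    which is almost certainly not true in practice.
    """
    nonwhite_pop = total_pop - white_pop
    if nonwhite_pop == 0:
        return 0.0  # nothing to allocate against
    nonwhite_share = nonwhite_pop / total_pop
    # Allocate the county's loans proportionally to the non-white share...
    allocated_loans = loan_count * nonwhite_share
    # ...then normalize by the non-white population itself. This dampens
    # counties with many loans but few non-white residents.
    return allocated_loans / nonwhite_pop
```

For a county with 100 loans, 1,000 residents, and 800 white residents, 20% of the loans (20) are allocated to the 200 non-white residents, giving a rate of 0.1 loans per non-white resident.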
In the map to the left, there are still some larger, more diverse population centers picked up: Los Angeles, San Francisco and the Bay Area counties, Las Vegas, Atlanta, Chicago, and Houston, for example.
In addition to these larger population areas, places like Arapahoe County, Colorado, which is directly east of Denver, show up. Mahnomen County in Minnesota’s northwest also shows up. There’s also the curious ring around the Washington, D.C. area.
One final map. Let’s take the same map as the previous one, but narrow the focus to loans that ultimately were not repaid; that is, the number of failed loans, weighted by the ratio of non-whites in a given county, divided by the total county population.
I could keep slicing and dicing things and coming up with more choropleths, but I won’t. For a broader look at race and money, ProPublica has a fascinating look at bankruptcy and race — Data Analysis: Bankruptcy and Race in America. This report states that Memphis, Tennessee, and Shelby County, where Memphis is located, have had the highest bankruptcy rate per capita in the nation. It is curious to see that Shelby County, Tennessee; DeSoto and Tunica counties in Mississippi; and Crittenden and Saint Francis counties in Arkansas all show up in the above map. These are all counties that are part of the greater Memphis area.
That’s it for now.
Other ideas I have had with regard to Prosper data include looking at whether, given a borrower’s credit profile and state, the county in which they reside can be sussed out via pattern recognition (e.g. machine learning). I will write, at some point, about a simpler application of machine learning: attempting to predict loan failure or success.
Twelve or thirteen years ago, I had the thought, I need a desk. Most rational, and retail-centric individuals would have traveled to a furniture store, engaged in conversation with a salesperson, possibly been convinced of the merits of a particular desk, and subsequently completed the sale with the exchange of money for the promise of a desk being delivered at some later date by two, slightly hungover individuals in a large box truck.
I picked up a wood working magazine, instead. It was around this time, with the use of a friend’s wood shop and a couple hours of his time each Tuesday, that I had finished up a queen-sized, Mission-style oak bed frame. I was hankering for another project. A desk seemed reasonable.
I did not follow through on the reasonable idea of building from ready-made plans in a woodworking magazine. Instead, I used them as a guide for things like height and depth.
You might be wondering why I am bringing up a project that is over a decade past its completion. There are a couple of reasons. The first is that I recently disassembled the desk to move it to another room in the house; the second is that, coincidentally, I came across an archive that contained the bulk of my notes, all of the AutoCAD drawings, and a software script (crude, albeit effective) for figuring out some golden ratios with regard to the board widths that would constitute the desk’s main surface top.
The disassembly and reassembly of the desk was interesting to me because it allowed me to better inspect the joints and such, as well as replace the drawer slides on the center drawer. When we moved to a different house in 2012 and the desk was disassembled, the original drawer slides on the center drawer broke; the replacement never quite worked well, and it did not extend far enough to make the drawer fully useful.
The design and construction of the desk was a bit of a rolling effort. I would design and draft up plans for a side panel or a drawer front, and my friend and I would spend a Tuesday evening jointing, planing, and sawing the pieces of wood that would be necessary for that piece.
I spent a lot of time tinkering with AutoCAD. It was really quite enjoyable, and it allowed me to use some of the drafting skills I had learned while in high school. During high school, the thinking was that future career plans would involve some sort of mechanical or civil engineering, and drafting might be useful. My education and career track ultimately did not follow the physical engineering disciplines, but wandered down the path of computer science and the engineering of software; still, I feel that all the drafting and CAD I took in high school was well worth the time and effort.
In addition to picking up a legitimate copy of AutoCAD (I was a student at the time, so I took advantage of Autodesk’s educational discount program), I picked up a wide-body inkjet printer. This made the plans much more readable when working in the shop.
The desk was designed to be disassembled from time to time. The center drawer, with the correct slides, is removable; the desk top can be removed after removing the bolts that hold it to angle iron (see photos below) on the inside edge of the top of the drawer assemblies; the front (opposite where you sit) is removable by unscrewing four brass wood screws. All of the drawers in each side can be removed to lighten the weight; if you are curious, I used Blum full-extension slides.

A little bit more about the materials and supplies I used: the finish is four coats of shellac with several coats of marine-grade varnish over the shellac. Twelve years out from the finishing coats, there are no signs of sun damage to the finish. The wood, cherry and walnut, came from a friend and his family. He has appeared in many blog posts over the years, from showing up in photos of gardening, to snowshoeing into a Minnesota state park, to he and I traveling to arctic Canada, to me chronicling a cross-country road trip to his wedding. Alas, the supply of cherry, walnut, oak, and others dried up when his parents left Minnesota. Much of the other wood, like luan plywood and such, that was used in the desk came from local big-box lumber yards. All of the drawers are also lined with physical stock certificates. There are certificates for Marquette Cement, Massey-Ferguson Limited, Chemsol Incorporated, as well as dozens of others. All of these certificates were purchased off of eBay.
Even though the finish on the desk is holding up quite well, the top has had a small bit of damage. As the wood has continued to dry out, a lengthy crack has appeared in the top. It is, however, in a location that does not impact functionality. Aside from the crack, there was some shrinkage that was causing several of the drawers to no longer be aligned quite right. In order for those drawers to fully close, each had to be lifted up slightly. All of these drawer issues were resolved as I reassembled the desk in its new location.
Finally, if you are curious about the plans and possibly making your own fancy, overly complicated desk, the plans (most in PDF, but others in AutoCAD’s DWG format) are linked below. The plans are released under a BSD-3-Clause-like license.
The little bit of clunky software is also linked below; instructions on how to run the Perl script are at the very bottom.
File: desk-plans.zip (5MB)
File: table_layout.pl_.zip (4KB)
About: table_layout.pl is a simple script that can calculate various options for construction of a table top. It assumes that you want a wider center board with narrow, even-counted boards on each side of the center board.

Usage: ./table_layout.pl --width=FLOAT [--widecnt=INT] [--optimal]

Example: ./table_layout.pl --width=30.75 --widecnt=5 for a table with a width of 30-3/4″ and 5 of the wider center/edge pieces.

The third option, 'opt', will cause table_layout to try to order the solutions in what it thinks is optimal - this feature is as of yet unimplemented.
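For readers who do not want to run the Perl, here is a rough Python sketch of the underlying layout idea: split a top of a given width into one wide center board flanked by an even number of narrower boards, with center-to-side widths in the golden ratio. This is one plausible reading of what the script computes, not a port of it; the function name and parameters are my own.

```python
PHI = (1 + 5 ** 0.5) / 2  # the golden ratio, ~1.618

def board_widths(total_width, boards_per_side):
    """Return (center_width, side_width) for a table top.

    Solves: total_width = center + 2 * boards_per_side * side,
    with the constraint center = PHI * side (golden-ratio proportion).
    """
    side = total_width / (PHI + 2 * boards_per_side)
    return PHI * side, side
```

For a 30.75″ top with two boards per side, this yields a center board about 1.6 times as wide as each of the four side boards, with everything summing back to the full width.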
It has been a while since I last posted anything. July 2017, actually, seems to be the last time I wrote anything here. I could say that I was busy, which would be true. But I certainly had the bits of time – here and there – that would have allowed me to post something had I wanted to do that. The thoughts that I wanted to share just were not there.
The back half of last year had two large events that stand out in my mind. Both personal, but only one that I feel like sharing.
I graduated. It is not like I am at the top of my field or anything like that. I am simply happy that I stuck with my graduate program, and now I can say I have a master’s degree.
With this task now behind me, I have been mulling over many-things-computer lately. Marveling at how, in the early 1990s when I was becoming interested in computers, I never really thought much about what exactly I wanted to do or be when I grew up. I simply liked to tinker. It was not until I was nearly finished with high school that I thought I would likely pursue computer science. I really did not know what that was — likely computer programming, I had hoped. I liked computer programming. I had first been introduced to computer programming on Apple IIe computers when I was in elementary school, in an after-school program for kids who liked math. I was one of those kids who liked math. I give much of the credit to June Hendrickson for introducing me and a small cohort of kids in Hibbing to programming in the early 1990s. I do not know how or whether that after-school program influenced the others, but it certainly left an impression on me.
Miss Hendrickson introduced me to polynomial algebra and what was basically a gateway drug for me: Apple BASIC. It was simple and seemed sort of elegant. Start your program with 10 HOME and just work your way down the file. Need to jump back to the beginning? GOTO 10. Programming clicked, and it would be the thing I did when I got home from school, and it was often the thing I was doing before going to bed.
As it turned out, there was something more visual than just Apple BASIC or QBASIC. Microsoft, in 1993, released Visual Basic 3.0. An older friend, who was off at college, picked me up a copy from the campus bookstore. A graphical user interface with BASIC? It was great. I honestly didn’t realize that it was not great, and that there were more powerful programming tools available. Nonetheless, I spent countless hours making small utilities and other bits of software. The most memorable thing I made was a piece of software that effectively calculated Riemann sums to assist with calculations related to the curvature of a ballistic trajectory. Like every 12-year-old, I had a fascination with the mathematics of projectiles. Unbeknownst to me, I had stumbled upon one of the foundational concepts of calculus: calculating the area under a curve on a graph.
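The idea behind that old Visual Basic program fits in a few lines of modern code. A minimal sketch of a left Riemann sum — approximating the area under a curve by summing thin rectangles — not a reconstruction of what I actually wrote back then:

```python
def riemann_sum(f, a, b, n):
    """Left Riemann sum of f over [a, b] using n rectangles."""
    dx = (b - a) / n  # width of each rectangle
    # Height of each rectangle is f evaluated at the left edge.
    return sum(f(a + i * dx) * dx for i in range(n))
```

With f(x) = x², a = 0, b = 1, and a large n, the result converges to the exact area, 1/3.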
After Visual Basic, there was Borland C and Visual Studio. I eventually obtained a shell account with the local internet service provider. This was my first introduction to Unix and specifically SunOS. Linux was also, at this point, a few years old. Slackware was the thing to get. The same friend who had purchased Visual Basic 3.0 for me, also introduced me to Linux.
There has been more since. A lot more, and yet I still have those old habits. Things computer and things software are often the first thing I think about in the morning, and often the last thing I think about before falling asleep. Even though my occupation is that of software engineer, I still program recreationally, too. I still tinker. That screenshot of Visual Basic 3.0 (above) is from my modern-day, current MacBook Pro laptop. It’s Windows for Workgroups 3.11 running in a virtual machine.
Around nine years ago, I got an inkling to grow grapes. I do not know exactly what piqued my interest in the subject of growing grapes. I was not, and still am not, much of an oenophile. I will have a glass here and there, and will always be keen for a Malbec or Foch if offered.
We were living in the Duluth, MN, area at the time, and we had limited planting space in our modest quarter-acre yard: a space maybe four feet wide by thirty feet long. While it certainly would not house enough vines to produce enough grapes for a carboy’s worth of wine, it would at least be an experiment.
In late winter, I placed an order for a half-dozen Frontenac vines from a place in Iowa. Frontenac seemed like an interesting varietal. It had its origins in Minnesota and seemed to have hardiness that might work in the Duluth area (USDA Zone 3b). And that was about all the thought I put into which varietal to get. Nothing, really, about the type of wine or even consumable juice that would be produced. I really did not care whether it was white or red, foxy tasting, or any of a host of other characteristics one might want.
The vines arrived, and with the help of a friend, we got them planted. And we waited. Over the time we remained in the Duluth area (until May 2012, when we moved), the vines ebbed and flowed with the seasons, dying back to the ground after a particularly harsh winter. The vines did produce a few clusters of grapes, but nothing more.
…and we moved.
Here in St. Paul, MN, up until last year, we had planted only a few juice/jelly grapes – mostly the varietal Beta – and none of these vines has produced grapes yet. Last year, however, the inkling to plant a small vineyard came back.
In late spring of last year, we cleared a stand of buckthorn and mulberry trees on the north side of our property, behind our house; it’s roughly an area of 1,200 square feet. The area has a decent amount of sun exposure in the summer, from mid-morning to mid-afternoon. We tilled the soil – mostly to remove the grass and loosen things up. Tilling was also useful for loosening the many buckthorn stumps.
In late winter of the previous year, prior to clearing the stand of buckthorns and mulberries, we decided we would grow the varietal Maréchal Foch. I had recently tried a bottle from a local vineyard, and, as with Ogden Nash’s poem The Termite, and how the termite tasted the wood and found it good, I tasted the Foch from St. Croix Vineyards and found it good.
In January of last year, with a bit of searching, I located a nursery and vineyard that sells Foch – Aberfoyle Vineyard & Nursery, which happens to be in Minnesota. I placed an order for 25 potted Foch vines. More waiting – mostly because it was winter.
With the trees removed, the ground turned over, and most of the stumps gone, we put a relatively simple fence around the space; we have hounds, and hounds are curious critters, and I could see them wandering off with neat bunches of grapes in their mouths.
The potted vines arrived in April of last year, and with the help of my wife and her sister, we got all 25 vines planted, and we waited.
Winter passed, and all the vines survived. Once the vines had a bit of growth, we put wood stakes in place and loosely tied each vine up. As an aside, if we were doing this on a larger scale, I would certainly be using grow tubes to train the growth. On the topic of larger grow operations, the spacing we chose for the grapes would generally be considered too close. It would be too narrow for any machinery to be driven down, but with a row length of around 35 feet, I do not intend to drive anything down the rows. We will have to keep an eye out for issues with air circulation, too, with the rows being somewhat narrow.
With the vines staked, and with a considerable amount of growth having occurred since we staked the grapes, it was time to get the posts and wire trellis installed. Using a two-stroke, single-person posthole auger, I set to work on getting postholes made.
We used 12-gauge, stainless steel orchard wire, as well as one-way wire anchor vises – which, I have to say, are probably the damn coolest devices I’ve come across in a while. We decided upon a top cordon trellis system, so there is just one wire, 66″ above the ground. Generally, two vines from each trunk are brought up (trained) to the top wire, and then the growth simply runs down the wire.
Time will tell if we are able to produce enough grapes to make a bit of wine.