### 200+ Lyft rides: usage patterns and pricing

With pink mustaches everywhere, it’s hard to imagine that Lyft was once small, but I still remember when the company had just come out of beta. I remember standing on street corners waiting for the car icons to appear on my screen, since drivers were scarce then. Still, it was a dead-simple UI: all you had to do was press a big pink button. Within minutes, someone would pick you up, give you a fist bump, and take you wherever you needed to go. The app stored your credit card and took care of the payments.

I don’t actually remember my first ride, but my email receipt tells me a “Roger W” took me from Civic Center to the sublet in Haight-Ashbury where I was living at the time. I must have gone out that night, because it happened around 2:30am on a Saturday. I paid 13 dollars for the trip. It must have been damn good, because over the next 15 months, I spent almost $3,000 on more than 200 rides.

**How I Use Lyft**

Fist bumps and pink mustaches aside, I’ve always wondered about the company’s marketplace dynamics. What is the pricing structure that powers the Lyft economy? What are the key demand and supply ratios? What are the inputs to the model, and how does it take into account different cities, weekends, competition from taxis, and other factors?

It’s usually difficult to get this data as an outsider, but since Lyft sends out email receipts and I’ve taken so many rides, I figured I had enough data points to learn something meaningful about the pricing. At the very least, it’d be interesting just to understand my own usage patterns. So I hacked together a script to parse and analyze my receipts (available on my GitHub). Here’s what I found:

I’ve spent a total of $2,954 on 213 rides at the time of this writing. That averages out to $13.87 per ride and just over 3 rides per week, if I start counting from my first ride. In those 213 trips, I’ve logged a total of 641 miles, which is roughly the distance between San Francisco and Portland [1].
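For anyone who wants to check the arithmetic, here is a minimal sketch of how those averages fall out of the totals. The totals come from the numbers above; the two dates are my assumptions (the data starts in August 2012 and ends in December 2013, but the exact days aren’t stated here), so treat the rides-per-week figure as approximate.

```python
from datetime import date

# Totals quoted in the post.
total_spent = 2954   # dollars
total_rides = 213
total_miles = 641

# ASSUMED dates: only the months are known from the post,
# the exact days are placeholders for illustration.
first_ride = date(2012, 8, 25)
last_ride = date(2013, 12, 1)

weeks = (last_ride - first_ride).days / 7

avg_price = total_spent / total_rides      # dollars per ride
rides_per_week = total_rides / weeks       # rides per week since the first ride
avg_miles = total_miles / total_rides      # miles per ride

print(f"${avg_price:.2f} per ride")
print(f"{rides_per_week:.2f} rides per week")
print(f"{avg_miles:.2f} miles per ride")
```

The per-ride average doesn’t depend on the assumed dates; only the weekly rate does.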

There’s a decent amount of variance when I break the data down by month. If I don’t include the incomplete data of August 2012 and December 2013, I averaged 13.6 rides a month with a standard deviation of 6 rides. But most of this variation comes from one-off events. For example, July was my heaviest month with 26 rides, but a big chunk of that happened during 4th of July weekend when I took 10 rides over a span of 4 days.

Still, heavy-usage days are pretty rare. Of the days when I used Lyft, almost 80% involved just one ride. There have been only 6 occasions where I’ve taken three or more rides on a single day, and I’ve never taken more than four.

The breakdown by day of week surprised me a bit. I can probably explain the Saturday number from meeting up with people and going out, but what’s really interesting is that usage increases monotonically Monday through Friday. Some of the Friday usage is probably going out as well, but my schedule from Monday through Thursday looks pretty similar. Also, I don’t go out *that* much. Do I actually just get lazier with public transportation as the week goes on?

Even though a disproportionate number of rides occur on weekends, the most common hours of usage are still those when I’m commuting to and from the office. I’ve taken 44 rides between 10 and 11am alone! Anecdotally, I know that the majority of these rides are me rushing to get to work on time. Clearly the big takeaway here is that I should just wake up 10 minutes earlier to catch BART.

Looking at the pricing data, most of my rides fall into the $10-15 bucket, which includes all my rides to and from work. The $6 rides are pretty silly as well since the distances are so short and probably even walkable. The $30+ rides consist entirely of trips to SFO. I decided early on that I loved the experience of Lyfting to the airport too much to cut back here. Really, it’s just so fast and painless compared to taking BART that it’s well worth the $22 difference in cost [2].

**Pricing Model**

To me, figuring out pricing is like finding the holy grail, and this was a basic exploration into how the cost of a Lyft ride is determined. I should caveat what follows: there’s very little that we can determine conclusively with noisy and limited data, but I think the findings are interesting nonetheless.

First, I pulled the ride distances after geocoding the start and end addresses with the Google Maps API [3]. Then I plotted price against distance. It’s hard to say whether the relationship in the plot is nonlinear or whether I’m just missing the speed component that inevitably varies with distance. I also don’t have much data for rides between 5 and 10 miles. For rides shorter than 5 miles, the relationship seems very linear, but again, this may just be due to less variation in the other variables (e.g. speed) within city limits.

To estimate “dollars per mile”, I considered only rides with distances between 1 and 5 miles. By choosing a subset where speed varies less, I could more reasonably treat it as roughly constant and leave it out of the model. I also fixed the y-intercept at 6, since this seems to be the minimum price Lyft will recommend that you pay for a ride [4]. Running least squares linear regression on the data with these parameters returns a slope of 2.73.
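Here is a rough sketch of what a least squares fit with a fixed intercept looks like, in case you want to try it on your own receipts. This is not necessarily how my script does it. The function name and the sample rides below are made up for illustration, and the closed-form slope comes from minimizing the squared error of `price = intercept + slope * distance` with the intercept held constant.

```python
def fit_slope_fixed_intercept(distances, prices, intercept=6.0):
    """Least-squares slope for price = intercept + slope * distance.

    With the intercept fixed, minimizing sum((p - intercept - m*d)^2)
    over m gives m = sum(d * (p - intercept)) / sum(d^2).
    """
    numerator = sum(d * (p - intercept) for d, p in zip(distances, prices))
    denominator = sum(d * d for d in distances)
    if denominator == 0:
        raise ValueError("need at least one ride with nonzero distance")
    return numerator / denominator


# HYPOTHETICAL rides as (miles, dollars); these are not my actual receipts.
rides = [(0.8, 6.0), (1.5, 10.0), (2.0, 11.0), (3.2, 15.0), (4.5, 18.0), (7.9, 30.0)]

# Keep only rides between 1 and 5 miles, as described above.
subset = [(d, p) for d, p in rides if 1 <= d <= 5]

slope = fit_slope_fixed_intercept(
    [d for d, _ in subset],
    [p for _, p in subset],
)
print(f"estimated dollars per mile: {slope:.2f}")
```

Fixing the intercept means the only free parameter is the slope, so a few lines of arithmetic are enough and no stats library is needed.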

There are many possible models here. For example, UberX charges a base fare of $3.50, plus $2.75 per mile when traveling above 11 mph and $0.55 per minute when below 11 mph, with a minimum total of $8. It could also be something much simpler. One simple interpretation of the above result is that the cost of a ride should be $6 for distances up to a mile and then $2.73 per mile thereafter, for rides up to 5 miles. Try using this formula to predict your recommended donation next time!
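To make the comparison concrete, here is a small sketch of both pricing ideas as functions. The function names are mine, and the UberX version assumes you already know how many miles were driven above 11 mph and how many minutes were spent below it. For Lyft, I’ve written the “$6 up to a mile, then $2.73 per mile” reading as one function and the plain fitted line (intercept 6, slope 2.73) as another, since they give different numbers and the data can’t really tell them apart.

```python
def uberx_fare(miles_above_11mph, minutes_below_11mph):
    """UberX fare as quoted above: base + per-mile + per-minute, $8 minimum."""
    fare = 3.50 + 2.75 * miles_above_11mph + 0.55 * minutes_below_11mph
    return max(8.00, fare)


def lyft_piecewise(miles):
    """$6 for the first mile, then $2.73 for each additional mile.

    Only meant for rides up to about 5 miles.
    """
    if miles <= 1:
        return 6.00
    return 6.00 + 2.73 * (miles - 1)


def lyft_linear(miles):
    """The fitted line read literally: intercept 6, slope 2.73."""
    return 6.00 + 2.73 * miles


# Example: a 3-mile ride with 5 minutes stuck in slow traffic.
print(f"UberX:            ${uberx_fare(3, 5):.2f}")
print(f"Lyft (piecewise): ${lyft_piecewise(3):.2f}")
print(f"Lyft (linear):    ${lyft_linear(3):.2f}")
```

Both Lyft functions are guesses built on a single fitted slope, so treat the outputs as ballpark figures rather than what the app will actually recommend.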

It’s clear that attempting to reverse engineer a pricing model with only one variable and limited observations is hard. However, I think a linear model like this one can be a pretty good approximation for the vast majority of trips that are under 5 miles.

**Maps and stuff**

For fun, I plotted the straight-line origin-destination paths of my rides on a map of San Francisco. The last map is Seattle and shows the few rides I took over 4th of July weekend. Unsurprisingly, there are a bunch of points clustered around a few locations:

- A: my current house
- B: my office
- C: my old office
- D: my old house
- E: SFO

**Final Thoughts**

I had a lot of fun tinkering with this, although I have to say it was a pretty expensive dataset! If you’re interested in running the same analysis on your own data, please feel free to fork my code and share your results — I’d love to understand fellow Lyfter behavior.

It sounds like there are big changes coming for Lyft, with new plans to introduce “prime-time tips” and ditch the donation system in favor of required fares. I’m curious to see how people will respond and how the story plays out!

And this was my best Lyft experience — my driver was probably the only reason I didn’t miss my flight!

Thank you to my @lyft driver for picking me up from work, dropping me home, waiting for me to pack, and getting me to the airport on time!

— George Xing (@g_xing) July 4, 2013

**Footnotes**

[1] Lyft credits paid for $65 of the total amount. There were 6 receipts that were missing either the pickup address, dropoff address, or both. I assigned these rides distances of 0. There were 4 receipts from December 2012 that did not contain the time. I manually hardcoded these from memory. Finally, there were at least 2 instances where I gave slightly more or less than the recommended amount.

[2] Taking BART to SFO from my house costs about $8. As a side note, I recommend that everyone have certain things in life to care about and spend money on, and this is one of mine.

[3] The distances are based on aggregated turn-by-turn driving directions from Google. Documentation of the API wrapper I used is here.

[4] I’ve had 5 rides with distances ranging from 0.62 to 0.97 miles, all of which cost $6. One of my friends just forwarded me a couple of receipts from Seattle where she paid $5 for distances of 0.55 and 1.24 miles. This suggests that Lyft has different pricing models depending on the geography. Note that I have not excluded my Seattle rides from the regression, which may skew the results.