NYC Taxi Trips Analysis

The Dataset

  • This dataset contains 6 tables in CSV format, along with a geospatial map in TopoJSON and Shapefile formats
  • The 4 Taxi Trips tables contain a total of 28 million Green Taxi trips in New York City from 2017 to 2020. Each record represents one trip, with fields containing details about the pick-up/drop-off times and locations, distances, fares, passengers, and more
  • The 454 Calendar table contains a fiscal calendar (2017-2018) used by the Taxi & Limousine Commission, with fields containing the date and fiscal year, quarter, month, and week
  • The Taxi Zones table contains information about 265 zone locations in New York City, including the location id, borough, and service zone
  • The Taxi Zones Map files contain a map of New York City with divisions for the 265 locations that can be used to create custom map visuals in Power BI (TopoJSON) or Tableau (Shapefile)

The Objective

To use the historical data to answer the following questions:

  1. What's the average number of trips we can expect this week?
  2. What's the average fare per trip we expect to collect?
  3. What's the average distance traveled per trip?
  4. How do we expect trip volume to change, relative to last week?
  5. Which days of the week and times of the day will be busiest?
  6. What will likely be the most popular pick-up and drop-off locations?
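
As a rough illustration of how historical averages can answer these questions, here is a minimal pandas sketch. The column names (pickup_datetime, fare_amount, trip_distance, PULocationID, DOLocationID) and both function names are assumptions, not taken from the dataset description, and plain calendar weeks stand in for the fiscal weeks defined in the Calendar table.

```python
import pandas as pd


def weekly_summary(trips: pd.DataFrame) -> pd.DataFrame:
    """Trips, mean fare and mean distance per week, plus week-over-week change.

    Column names are assumed; calendar weeks (Monday to Sunday) are used
    here instead of the fiscal weeks from the Calendar table.
    """
    week = trips["pickup_datetime"].dt.to_period("W").rename("week")
    summary = trips.groupby(week).agg(
        trips=("fare_amount", "size"),           # questions 1 and 4
        avg_fare=("fare_amount", "mean"),        # question 2
        avg_distance=("trip_distance", "mean"),  # question 3
    )
    # Question 4: percentage change in trip volume versus the previous week.
    summary["trips_change_pct"] = summary["trips"].pct_change() * 100
    return summary


def busiest(trips: pd.DataFrame, top_n: int = 3) -> dict:
    """Most frequent pickup days, pickup hours and zones (questions 5 and 6)."""
    pickup = trips["pickup_datetime"]
    return {
        "days": pickup.dt.day_name().value_counts().head(top_n),
        "hours": pickup.dt.hour.value_counts().head(top_n),
        "pickup_zones": trips["PULocationID"].value_counts().head(top_n),
        "dropoff_zones": trips["DOLocationID"].value_counts().head(top_n),
    }
```

With a cleaned trips table loaded as a DataFrame, weekly_summary(trips)["trips"].mean() would give the average number of trips per week, and busiest(trips)["pickup_zones"] the most common pick-up location ids, which can then be looked up in the Taxi Zones table.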

The Data Cleaning and Transformation

To answer the questions above, I took the following steps to clean and transform the raw data:

  1. I kept only trips that were NOT sent via “store and forward”
  2. I also limited the data to street-hailed trips paid by card or cash, with a standard rate
  3. I removed any trips with dates before 2017 or after 2018, along with any trips with pickups or drop-offs in unknown zones
  4. It was assumed that any trips with no recorded passengers had 1 passenger
  5. If a pickup date/time was AFTER the drop-off date/time, I swapped the two
  6. I removed trips lasting longer than a day, and any trips which showed both a distance and fare amount of 0
  7. Where the fare, taxes, and surcharges on a record were ALL negative, I made them positive
  8. For trips that had a fare amount but a trip distance of 0, I calculated the distance with this equation: (fare amount - 2.5) / 2.5
  9. For trips that had a trip distance but a fare amount of 0, I calculated the fare amount with this equation: 2.5 + (trip distance x 2.5)
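
The nine steps above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions rather than the exact transformation used for this project. The column names, the coded values (flag "N", trip type 1, payment types 1 and 2, rate code 1), the unknown zone ids (264, 265) and the choice of charge columns are all assumptions and should be checked against the actual tables.

```python
import pandas as pd

BASE_FARE = 2.5  # flat component of the fare equation in steps 8 and 9
RATE = 2.5       # per-unit-distance component of the same equation

# Assumed names for the fare, tax and surcharge columns (step 7).
CHARGE_COLS = ["fare_amount", "mta_tax", "improvement_surcharge"]


def clean_trips(trips: pd.DataFrame, unknown_zones=(264, 265)) -> pd.DataFrame:
    """Apply cleaning steps 1-9 to a trips table with an assumed schema."""
    df = trips.copy()

    # Steps 1-2: not store-and-forward, street-hailed, card or cash, standard rate.
    keep = (
        (df["store_and_fwd_flag"] == "N")
        & (df["trip_type"] == 1)
        & df["payment_type"].isin([1, 2])
        & (df["RatecodeID"] == 1)
    )
    # Step 3: pickups in 2017-2018 only, and no unknown pickup/drop-off zones.
    keep &= df["pickup_datetime"].dt.year.between(2017, 2018)
    keep &= ~df["PULocationID"].isin(unknown_zones)
    keep &= ~df["DOLocationID"].isin(unknown_zones)
    df = df[keep].copy()

    # Step 4: treat missing or zero passenger counts as one passenger.
    df["passenger_count"] = df["passenger_count"].fillna(0).replace(0, 1)

    # Step 5: swap pickup and drop-off times where they are reversed.
    pickup, dropoff = df["pickup_datetime"], df["dropoff_datetime"]
    swapped = pickup > dropoff
    df["pickup_datetime"], df["dropoff_datetime"] = (
        pickup.mask(swapped, dropoff),
        dropoff.mask(swapped, pickup),
    )

    # Step 6: drop trips longer than a day, or with zero distance AND zero fare.
    duration = df["dropoff_datetime"] - df["pickup_datetime"]
    zero_trip = (df["trip_distance"] == 0) & (df["fare_amount"] == 0)
    df = df[(duration <= pd.Timedelta(days=1)) & ~zero_trip].copy()

    # Step 7: flip the sign only when every charge column is negative.
    all_negative = (df[CHARGE_COLS] < 0).all(axis=1)
    df.loc[all_negative, CHARGE_COLS] = df.loc[all_negative, CHARGE_COLS].abs()

    # Steps 8-9: fill a zero distance or a zero fare from the other value.
    no_distance = (df["trip_distance"] == 0) & (df["fare_amount"] != 0)
    no_fare = (df["fare_amount"] == 0) & (df["trip_distance"] != 0)
    df.loc[no_distance, "trip_distance"] = (df["fare_amount"] - BASE_FARE) / RATE
    df.loc[no_fare, "fare_amount"] = BASE_FARE + df["trip_distance"] * RATE
    return df
```

For example, a trip with a fare of 12.5 and a distance of 0 gets a distance of (12.5 - 2.5) / 2.5 = 4, and a trip with a distance of 2 and a fare of 0 gets a fare of 2.5 + (2 x 2.5) = 7.5. Note that the equation in step 8 yields a negative distance for fares below 2.5; the steps above do not say how such records were treated, so the sketch leaves them as computed.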
