Cabify and why you should not send sensitive data over public APIs

Sensitive data leakage is a serious issue. Even though some people don't really care much, in the wrong hands such data could be used for pretty much anything, including identity theft. Let's take a look at this issue today with one of my favorite transportation apps: Cabify.

Before we start, I want to make something clear: by public API I mean an API that is out in the wild, even if it requires proper authentication to be used. It doesn't mean that the API is open for random use by anyone. A private API, in this case, would be an internal API used only within a specific context. And, yes, I know "public" could also mean the API is meant to be used by third parties - but I honestly doubt this one isn't being used that way already, as it's dead simple to use.

It's common practice nowadays to build the frontend and backend separately. The frontend takes care of UI/UX and guides the user through whatever the application does. The backend does the hard work of running all those complex algorithms and returns whatever data is necessary based on the inputs. Most web projects take this into consideration, and that's perfect (IMHO).

However, without proper communication, the backend doesn't really know what kind of data the frontend needs. Does it need the information XYZ, even though it's sensitive? Well, for completeness, let's just add it; we can always remove it later, right?

Wrong.

Well, yes, sure, you can remove it, but adding it in the first place was wrong. Exposing sensitive user data through an API without an actual need for it is a very dangerous approach and could lead to very serious issues. Let's say you leak other users' ID numbers, dates of birth, emails, and other similar data: we could be talking identity theft here, people! And don't even get me started on secure means of transporting the data you do need: some people still don't care about HTTPS (and all the good things that come with it).

You might be asking yourself already: what does all of this have to do with Cabify?

Well, you see, Cabify[1] is a transportation app just like Uber. I've been using it in my hometown for months now and it has been, for the most part, exactly what I expected. What most people don't know is that they also have a website, where you can order a car if you need to. You can also contact support, check your previous journeys and do some other stuff there.

A while ago I decided to take a look at how they send this information to their website, and what kind of data I could retrieve from it. I'm a coder, so I like to take a deeper look at this stuff for some weird reason. By opening Cabify's website, logging in and clicking on Journeys, I was taken to a list of my previous trips. For that, the site made an API request, properly authenticated:

GET https://cabify.com/profile/journeys?start_at=&rider_id=&show_in_profile=true&per_page=15&page=1&order=asc

The result is presented on a screen just like this:

[Image: Journeys view]

The API result, however, shocked me a bit, as it seemed to include some data that isn't really shown on their website:

[{
    "type": "Journey",
    "rider": {
        "score": 4.0,
        "mobile": "+55[REDACTED]",
        "full_name": "Ricardo Gomes da Silva",
        "country": "BR"
    },
    "driver": {
        "type": "User",
        "avatar_url": "https://cabify.s3.amazonaws.com/production/avatars/[REDACTED]",
        "name": "[REDACTED]",
        "surname": "[REDACTED]",
        "birthday": "[REDACTED]",
        "email": "[REDACTED]@outlook.com",
        "score": 4.9,
        "national_id_number": "[REDACTED]",
        "emergency_contact": null,
        "authorization_service_date": "[REDACTED]",
        "active_region_id": "br_porto_alegre",
        "gender": "male",
        "general_register_number": "[REDACTED]",
        "driver_license": "[REDACTED]",
        "mobile": "+55[REDACTED]",
        "full_name": "[REDACTED]",
        "country": "BR",
    },
    "taxi": {
        "type": "Taxi",
        "region_id": "br_porto_alegre",
        "name": "Chevrolet [REDACTED]",
        "reg_plate": "[REDACTED]",
        "manufacture_year": 2014,
        "vehicle_registration_date": "[REDACTED]",
        "local_certification_date": "[REDACTED]"
    },
}, (...)]

Oh! :o

I've obviously redacted some data for privacy reasons (mine and the driver's). I've also removed pointless data (well, pointless for this experiment) - the original block of data was over 300 lines long per ride :)

The main block contains a bunch of IDs, geolocation data, points for tracing the trip, pricing and so on. It also includes loyalty program information, for some reason. None of that is relevant here. The rider block is OK, I guess, as it contains my own personal data: score (only 4 out of 5?! weird), mobile phone number, full name, country and some other general info. It still shouldn't be sent to the frontend, though, as it isn't being used for anything useful at all.

The driver block is what scares me the most. Let's take a proper look at it:

"driver": {
    "type": "User",
    "avatar_url": "https://cabify.s3.amazonaws.com/production/avatars/[REDACTED]",
    "name": "[REDACTED]",
    "surname": "[REDACTED]",
    "birthday": "[REDACTED]",
    "email": "[REDACTED]@outlook.com",
    "score": 4.9,
    "national_id_number": "[REDACTED]",
    "emergency_contact": null,
    "authorization_service_date": "[REDACTED]",
    "active_region_id": "br_porto_alegre",
    "gender": "male",
    "general_register_number": "[REDACTED]",
    "driver_license": "[REDACTED]",
    "mobile": "+55[REDACTED]",
    "full_name": "[REDACTED]",
    "country": "BR",
},

As you can see, the driver block includes the driver's avatar (usually a photo), full name, birthday (including year of birth), email address, mobile number, national ID number, emergency contact and driver's license. Now imagine what could be done with this information, huh?

You could, for example, call the emergency contact and do some social engineering with this information. Hell, you could even call the driver!

Fun fact, my drivers don't seem to have an emergency contact. Or maybe the API isn't sending this data? Or maybe Cabify doesn't even have such information?

Gotta love some lambda!

Anyway, there's also some data available about the car that was used for the ride:

"taxi": {
    "type": "Taxi",
    "region_id": "br_porto_alegre",
    "name": "Chevrolet [REDACTED]",
    "reg_plate": "[REDACTED]",
    "manufacture_year": 2014,
    "vehicle_registration_date": "[REDACTED]",
    "local_certification_date": "[REDACTED]"
},

Honestly, I'm not sure how much you could do with this data, but it certainly isn't needed for their website. It could be used for social engineering, though.

As you've already noticed, this is sensitive data. Information like this shouldn't be returned by an API, especially considering it's not the rider's (my) information, but the driver's. In the wrong hands, this data could be used for many things, up to and including identity theft - a very serious and complicated issue to solve.
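For what it's worth, this kind of leak is easy to catch automatically. Here's a minimal sketch, in Python, of a check that walks a response payload and reports every sensitive field it finds. The key names come from the response shown earlier; the `SENSITIVE_KEYS` list, the `find_sensitive_paths` helper and the sample payload are my own assumptions for illustration, not anything Cabify actually runs.

```python
# Hypothetical deny-list. The key names are taken from the driver block
# shown earlier; which ones count as "sensitive" is my own judgment call.
SENSITIVE_KEYS = {
    "birthday",
    "email",
    "mobile",
    "national_id_number",
    "general_register_number",
    "driver_license",
    "emergency_contact",
}


def find_sensitive_paths(payload, prefix=""):
    """Return the dotted path of every sensitive key in a nested payload."""
    found = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            path = f"{prefix}.{key}" if prefix else key
            if key in SENSITIVE_KEYS:
                found.append(path)
            # Keep walking: sensitive keys may be nested deeper.
            found.extend(find_sensitive_paths(value, path))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            found.extend(find_sensitive_paths(item, f"{prefix}[{index}]"))
    return found


# Made-up, trimmed-down payload in the same shape as the real response.
sample = {
    "type": "Journey",
    "driver": {"name": "[REDACTED]", "email": "[REDACTED]", "score": 4.9},
    "taxi": {"reg_plate": "[REDACTED]"},
}

print(find_sensitive_paths(sample))
```

Something like this could run as a test against every endpoint the website calls, failing the build whenever a path shows up that nobody signed off on.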

¯\_(ツ)_/¯

Personally, I believe Cabify should reconsider their API approach and return only the necessary data. It would save them bandwidth and trouble in the long run.
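To make "return only the necessary data" a bit more concrete, here's a rough sketch of an allow-list serializer in Python. I have no idea what Cabify's backend looks like, so everything here is an assumption: the `pick` and `serialize_journey` helpers are made up, and so is my guess at which fields the Journeys page really needs.

```python
# Assumption: these are the only fields the Journeys page needs to render.
# The field names come from the response shown earlier; the selection is a guess.
DRIVER_PUBLIC_FIELDS = ("name", "avatar_url", "score")
TAXI_PUBLIC_FIELDS = ("name", "reg_plate")


def pick(record, allowed):
    """Copy only the allowed keys; anything not listed is dropped."""
    return {key: record[key] for key in allowed if key in record}


def serialize_journey(journey):
    """Build the response for the frontend from an explicit allow-list."""
    return {
        "type": journey.get("type"),
        "driver": pick(journey.get("driver", {}), DRIVER_PUBLIC_FIELDS),
        "taxi": pick(journey.get("taxi", {}), TAXI_PUBLIC_FIELDS),
    }


# Made-up internal record, in the same shape as the real response.
internal_record = {
    "type": "Journey",
    "driver": {
        "name": "Jane",
        "avatar_url": "https://example.com/avatar.png",
        "score": 4.9,
        "national_id_number": "000",
        "driver_license": "111",
    },
    "taxi": {"name": "Chevrolet", "reg_plate": "ABC-1234", "manufacture_year": 2014},
}

print(serialize_journey(internal_record))
```

The point of an allow-list over a deny-list is that new fields added to the internal record stay private by default: someone has to deliberately add them to the tuple before they ever reach the browser.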

Unnecessary, sensitive data is always what it is: unnecessary and sensitive. Don't send it.

That would be all folks! :)


  1. I'm not affiliated with Cabify, nor am I getting anything out of this. Just to be clear.
