Categories
Development

How to create complicated mongodb queries?

I am a novice when it comes to mongo as I have traditionally only worked with Oracle database.
I have a mongo database that’s storing bitbucket data in columns like so:

_id | _class | collectorItemId| firstEverCommit | scmUrl | scmBranch | scmAuthor | scmCommitTimestamp

There are a few more columns in there that I’ve omitted for the sake of time. For the scmBranch column, the column is populated with one of two strings: “master” or “develop”.
Here is a sample of what the data looks like:
enter image description here

Here is the document view of one of the rows:

{
"_id" : ObjectId("5e39d6a0330c130006a042c6"),
"collectorItemId" : ObjectId("5e33a6b9887ef5000620a0c0"),
"firstEverCommit" : false,
"scmUrl" : "sampleRepo1",
"scmBranch" : "master",
"scmRevisionNumber" : "a2ad6842468eb55bffcbe7d700b6addd3eb11629",
"scmAuthor" : "son123",
"scmCommitTimestamp" : NumberLong(1580841662000)
}

I am now trying to formulate mongo queries that will get me the following data:

 1. For each scmUrl, If max(scmCommitTimestamp) where scmBranch =
    "develop" > max(scmCommitTimestamp) where scmBranch = "master" THEN
    count the number of rows (i.e commits) where scmBranch = "develop"
    AND scmCommitTimestamp > max(scmCommitTimestamp) where scmBranch =
    "master"

 2. For the results found in #1, find the oldest commit and newest
    commit

So far, the best mongo query I’ve been able to come up with is the following:

db.bitbucket.aggregate([{
    "$group": {
        "_id": {
            "scmUrl": "$scmUrl",
            "scmBranch": "$scmBranch"
        },
        "MostRecentCommit": {
            "$max": {"$toDate":"$scmCommitTimestamp"}
        }
    }
},{
    "$project": {
        "RepoName": {"$substr": ["$_id.scmUrl",39,-1]},
        "Branch": "$_id.scmBranch",
        "MostRecentCommit": "$MostRecentCommit"
    }
},{
   "$sort":{
       "RepoName":1,
       "Branch":1
       }

}
])

But this only gets me back the most recent commit for the develop branch and the master branch of each scmUrl (i.e repo), like so:
enter image description here

Ideally, I’d like to get back a table of results with the following columns:

scmUrl/RepoName | Number of commits on develop branch that are not on master branch| oldest commit in develop branch that's not in master branch | newest commit in develop branch that's not in master branch

How can I modify my mongo query to extract the data that I want?

Leave a Reply

Your email address will not be published. Required fields are marked *