Mongo Metrics: Calculating the Mean

Mongo Metrics is a new series in collaboration with Compose's Resident Data Scientist Lisa Smith that shows you how to extract insights and toy with data stored in Compose MongoDB. This series is a MongoDB flavor of our popular Metrics Maven series.

While most of us don't immediately think "Data Science" when we think of MongoDB, it turns out that MongoDB, through its aggregation pipeline, is a fantastic data store for assembling and analyzing complex data. In this new series, we'll build on our previous Metrics Maven articles on mean, median, and mode, as well as our previous article on the MongoDB aggregation pipeline, to achieve mean, median, and mode calculations using MongoDB. In this first article in the series, we'll cover the mean.

Getting to the Mean

We'll borrow the product catalog from a fictional pet supply company used in a previous article and put it in a collection called products.

We'll also create a transactions collection which contains the following data:

order_id | date                  | item_count | order_value
----------------------------------------------------------------
50000    | ISODate("2016-09-02") | 3          | 35.97
50001    | ISODate("2016-09-02") | 2          | 7.98
50002    | ISODate("2016-09-02") | 1          | 5.99
50003    | ISODate("2016-09-02") | 1          | 4.99
50004    | ISODate("2016-09-02") | 7          | 78.93
50005    | ISODate("2016-09-02") | 0          | null
50006    | ISODate("2016-09-02") | 1          | 5.99
50007    | ISODate("2016-09-02") | 2          | 19.98
50008    | ISODate("2016-09-02") | 1          | 5.99
50009    | ISODate("2016-09-02") | 2          | 12.98
50010    | ISODate("2016-09-02") | 1          | 20.99

We'll have to modify the data format slightly for MongoDB. One document in the products collection looks like the following:

{
  _id: ObjectId("589cd56b6ca2fef0f7737fbc"),
  product: "leash",
  category: "dog wear",
  productLine: "Bowser",
  price: 15.99,
  numberInStock: 48
}

and one document in the transactions collection looks like the following:

{
  _id: ObjectId("589cd56b6ca2eef0f7737b0a"),
  orderId: 50000,
  date: ISODate("2016-09-02"),
  itemCount: 3,
  orderValue: 35.97
}

Lean Mean

Calculating the Mean, also known outside the mathematical world as the "average", is our first stop on the Mongo Metrics tour. The mean is computed by summing a field in each document returned from a query, and dividing that by the number of documents returned. For example, to calculate the mean of the prices across all of the products in our database, we would add the prices together and divide by the number of documents.

To better understand this section, you'll want to have a foundational understanding of the $match and $group operators in the MongoDB aggregation pipeline. If you need some background, you can check out our previous article on MongoDB aggregations by example.

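MongoDB provides an $avg operator that delivers exactly what we want. The first thing we need to do is make sure that we're only running our average calculation on transactions that have an orderValue field. (Strictly speaking, $exists: true also matches documents whose orderValue is explicitly null; a stricter filter would be { orderValue: { $ne: null } }.) Our $match stage looks like this: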

{ $match: { orderValue: { $exists: true } } }

Next, we'll use the $group stage to compute the average across the orderValue fields. Setting _id to null puts every document into a single group.

{ $group: { _id: null, averageTransactionAmount: { $avg: "$orderValue" } } }

Combining the two gives us the full aggregation query:

db.transactions.aggregate([
  { $match: { orderValue: { $exists: true } } },
  { $group: { _id: null, averageTransactionAmount: { $avg: "$orderValue" } } }
]);

Which will give us the following output:

{ averageTransactionAmount: 18.16272727272 }
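For intuition, here is a rough sketch in plain JavaScript of what the two stages compute, without a database. The sample documents and the averageOrderValue helper are hypothetical, and the sketch assumes the goal is to skip any document whose orderValue is missing or null.

```javascript
// Hypothetical sample documents, not the article's transactions data.
const sampleOrders = [
  { orderId: 1, orderValue: 10 },
  { orderId: 2, orderValue: 20 },
  { orderId: 3, orderValue: null }, // no value recorded
  { orderId: 4 },                   // field missing entirely
  { orderId: 5, orderValue: 30 }
];

// Mimics the filter-then-average idea: keep documents with a numeric
// orderValue, then divide the sum by the number of documents kept.
function averageOrderValue(docs) {
  const values = docs
    .map((doc) => doc.orderValue)
    .filter((value) => typeof value === "number");
  if (values.length === 0) {
    return null; // nothing to average
  }
  const sum = values.reduce((total, value) => total + value, 0);
  return sum / values.length;
}

console.log(averageOrderValue(sampleOrders)); // 20
```

Only three of the five sample documents contribute to the result, so the sum (60) is divided by 3 rather than 5.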

Since this is a dollar amount, it would be ideal to have our results rounded to the hundredths place.

Rounding in MongoDB

MongoDB versions before 4.2 don't have a dedicated rounding operator (4.2 added $round), so we'll need to do a little bit of fancy footwork with our averageTransactionAmount. The steps we'll take to get down to the hundredths place are as follows:

1. Multiply the amount by 100 (1816.2727272)
2. Truncate the amount to its integer value (1816)
3. Divide the amount by 100 (18.16)

We're truncating here rather than doing true rounding, but that's OK for our purposes.
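The three steps above can be sketched in plain JavaScript to see how truncation differs from rounding. This is only an illustration: truncateToCents and roundToCents are hypothetical helper names, and floating-point arithmetic can introduce tiny errors for some inputs.

```javascript
// Multiply by 100, drop the fractional part, divide by 100.
function truncateToCents(amount) {
  return Math.trunc(amount * 100) / 100;
}

// Same idea, but rounding to the nearest cent instead of truncating.
function roundToCents(amount) {
  return Math.round(amount * 100) / 100;
}

console.log(truncateToCents(18.16272727272)); // 18.16
console.log(truncateToCents(18.169));         // 18.16
console.log(roundToCents(18.169));            // 18.17
```

The two approaches agree for our average, and only diverge when the digit after the hundredths place is 5 or higher.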

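Our math operators accept any expression that evaluates to a number, but each field in a $group stage must be an accumulator such as $avg, so the arithmetic can't wrap $avg there. Instead, we'll add a $project stage after the group stage that applies our calculation to the averaged value:

{
  $project: {
    _id: 0,
    averageTransactionAmount: {
      $divide: [ { $trunc: { $multiply: [ "$averageTransactionAmount", 100 ] } }, 100 ]
    }
  }
}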

Which produces the following results:

{ averageTransactionAmount: 18.16 }

While this may seem like an involved process, it's pretty straightforward if you follow the steps listed above. Our final aggregation query, with the rounding applied in a $project stage after the group, looks like this:

db.transactions.aggregate([
  { $match: { orderValue: { $exists: true } } },
  { $group: { _id: null, averageTransactionAmount: { $avg: "$orderValue" } } },
  { $project: { _id: 0, averageTransactionAmount: { $divide: [ { $trunc: { $multiply: [ "$averageTransactionAmount", 100 ] } }, 100 ] } } }
]);

Next Steps

Finding the mean is a great starting point for metrics, but it's not perfect. One of the major flaws of the mean is that a single value lying far outside the majority of the data can heavily skew the result.

Let's take a look at an extreme example:

100, 105, 110, 112, 120, 500

Looking at the data above, it's obvious that the biggest value (500) lies far outside the rest of the values. This value, called an outlier, affects our mean value since the mean takes into account every single data point. With our outlier, the mean value is the following:

(100 + 105 + 110 + 112 + 120 + 500) / 6 = 174.5

Without the outlier, the mean more closely represents what we're looking for with the average - an idea of where the majority of the values lie:

(100 + 105 + 110 + 112 + 120) / 5 = 109.4
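The same comparison can be reproduced with a few lines of plain JavaScript. The mean helper below is a hypothetical name used only for illustration.

```javascript
// Arithmetic mean: sum of the values divided by how many there are.
function mean(values) {
  const sum = values.reduce((total, value) => total + value, 0);
  return sum / values.length;
}

const withOutlier = [100, 105, 110, 112, 120, 500];
const withoutOutlier = withOutlier.filter((value) => value !== 500);

console.log(mean(withOutlier));    // 174.5
console.log(mean(withoutOutlier)); // 109.4
```

A single extreme value moves the mean by roughly 65 here, even though the other five values are unchanged.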

Because a single value that lies substantially above or below the majority of values can skew the result, the mean is unreliable as a sole metric. To get a better picture of the data, we need another answer.

In our next article in this series, we'll explore using the median to reduce the impact of our outliers.

If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.
