AWS best practices mention the metering aspect in their saas boost user guide https://github.com/awslabs/aws-saas-boost/blob/main/docs/user-guide.md#billing-and-metering
In 2019 at an AWS reinvent session I noticed a very interesting diagram:
Many great companies rise to the occasion and tackle the cloud-native deployment automation, monitoring, identity, and metrics but somehow the metering and billing aspect was neglected. The system of record built on top of a metering service is fundamentally different than one built on monitoring data.
https://www.amberflo.io/blog/why-we-started-amberflo
https://www.amberflo.io/blog/behind-the-scenes-of-amberflos-usage-based-metering-and-billing-system
In a microservices architecture, you can find a metering service in many companies that is responsible for this functionality. From our experience, it’s not easy to build such a service. Let’s go over a few of the challenges of building a scalable and reliable usage metering service:
1. Count once and only once: Make sure your customers are not overcharged (or undercharged)
- In a distributed environment, preventing duplicate records is not an easy task. You would like the usage metering service to ignore duplicate records, even if the client retries the same record. This makes the usage metering service idempotent
- Scaling to various workloads is tricky. Usage patterns can fluctuate and records must not be dropped
2. Performance with large data scan ranges
- Showing a user their usage over a longer period of time (for example 6 months+), in a UI that loads quickly (< 1–2 seconds) requires a backend architecture and infrastructure for scale, throughput, and low latency .
If you try to sum the total number of events, across a large timeframe, it can quickly reach into tens of millions of events for a single query. If you try to do this on a time-series database, your costs are going to be prohibitively high, and in standard data warehouses, the latency is going to be very high. For example, looking at aggregated data over the last 12 months.
Using elasticsearch (or any other logs-based system) for example cannot run aggregation on the last 6 months per customer in a performant latency.
- You will have to run an aggregation on a cadence (usually daily) that sums up the events. This data pipeline is error-prone and could be tricky. The price of an error directly impacts the customer and the revenue.
- Presenting the usage in near real-time, with auditable data (pointing to the exact transaction) which can be aggregated to multiple periods.
- Check in real-time for quotas. For example, an account should not run more than 1000 jobs in 1 day.
- Scaling could drive the cost very high when using naive solutions
3. Alerts and notifications
- Automating moving between pricing tiers
- Notify when a large customer is using the product less than the week before
- Send a Slack message to the account manager when a quota is exceeded
- Letting your customers define alerts based on their cost and usage
4. Out of the box dashboards
- The usage data is critical to run a cloud business. There are many personas in the organization that would like a view to the usage data:
- Track major customers
- Analyze feature usage over time
- Analyze churn
- Predict pricing plan impact
5. Developer artifacts
- In a microservice environment, there are multiple programming languages/ deployment models etc. Integration with a usage service will need multiple SDK’s with constant maintenance
- The usage service will be used from multiple services and by different teams. A well documented and robust API will simplify the team’s onboarding
6. Pricing plans models- Decouple engineering from product pricing decision
- As the engineering teams instrument and build this metering pipeline, the product team needs to innovate on pricing plans and makes data-based decisions.
- Democratization of the usage data enables product teams to choose the right pricing dimensions and test new pricing models based on real data
Using Amberflo.io Metering-as-a-Service you will get a robust, reliable, cost-effective fully managed metering service. https://www.amberflo.io/
https://amberflo.readme.io/docs/how-amberflo-works
When products adopt consumption-based pricing models, they encounter what may seem simple task of metering and tracking usage across their customers. It turns out, it is a heavy lift.