Anemometer is a tool for running SQL queries and pushing results as metrics and optional events to Datadog
An anemometer is a device used for measuring wind speed and direction.
This project was originally created to help us monitor some tables in Airflow, but was later updated so it could work generically with any database.
We currently support the following databases:
- Postgres
- Vertica
- BigQuery
- SQLite (mostly for local development and testing)
Support for any of the databases listed here can be added fairly easily!
- Read How to contribute
- Add the driver to
go.mod, and ensurego.sumgets updated - Add import for the new driver to
monitor.go - Update this README to add the new database
- Submit a Pull Request
The latest version of Anemometer can be found on the Releases tab.
statsd:
address: 127.0.0.1:8125
tags:
- environment:production
monitors:
# Gauge metric (default) - values that can go up or down
- name: airflow-dag-disabled
database:
type: postgres
uri: postgresql://username:password@localhost:5432/database?sslmode=disable
sleep_duration: 300
metric: airflow.dag.disabled
metric_type: gauge # optional - this is the default
sql: >
SELECT dag_id AS dag_id,
CASE WHEN is_paused AND NOT is_subdag THEN 1 ELSE 0 END AS metric
FROM dag
# Histogram metric - statistical distribution on each host
- name: airflow-task-queued-seconds
database:
type: postgres
uri: postgresql://username:password@localhost:5432/database?sslmode=disable
sleep_duration: 300
metric: airflow.task.queued_seconds
metric_type: histogram
sql: >
SELECT dag_id AS dag_id,
task_id AS task_id,
EXTRACT(EPOCH FROM (current_timestamp - queued_dttm)) AS metric
FROM task_instance WHERE state = 'queued'
# Count metric - track number of events
- name: failed-tasks-count
database:
type: postgres
uri: postgresql://username:password@localhost:5432/database?sslmode=disable
sleep_duration: 60
metric: airflow.task.failed
metric_type: count
sql: >
SELECT dag_id AS dag_id,
COUNT(*) AS metric
FROM task_instance WHERE state = 'failed'
AND end_date > NOW() - INTERVAL '1 hour'
GROUP BY dag_id
# Event monitor - emit the configured metric plus one event for each returned row
- name: postgres-long-running-queries
database:
type: postgres
uri: postgresql://username:password@localhost:5432/database?sslmode=disable
sleep_duration: 1800
metric: postgres.long_running_query
metric_type: gauge
event:
enabled: true
title_column: event_title
text_column: event_text
alert_type: warning
priority: normal
source_type_name: anemometer
aggregation_key_column: event_aggregation_key
tags:
- alert_type:long_running_query
tag_columns:
- database_name
- duration_bucket
sql: >
SELECT 1 AS metric,
datname AS database_name,
CASE
WHEN now() - query_start > interval '6 hours' THEN '6h_plus'
WHEN now() - query_start > interval '4 hours' THEN '4h_plus'
ELSE '2h_plus'
END AS duration_bucket,
'Long running Postgres query' AS event_title,
'Database: ' || datname || E'\n' ||
'User: ' || usename || E'\n' ||
'PID: ' || pid || E'\n' ||
'Runtime: ' || (now() - query_start)::text || E'\n\n' ||
query AS event_text,
'postgres-long-running-query:' || datname || ':' || pid AS event_aggregation_key
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '2 hours'This is where you tell Anemometer where to send StatsD metrics
address- The address:port on which StatsD is listening (usually127.0.0.1:8125)tags- Default tags to send with every metric and event, optional
This is where you tell Anemometer about the monitor(s) configuration
name- The name of this monitor, mainly used in loggingdatabase.type- The type of database connection to be used (postgres,vertica,bigquery, andsqlite3are currently supported)database.uri- The URI connection string used to connect to the database (usually followsprotocol://username:password@hostname:port/database)sleep_duration- How long to wait between pushes to StatsD (in seconds)metric- The name of the metric to be sent to StatsDmetric_type- The type of metric to send to Datadog (optional, defaults togauge)event- Optional Datadog event configuration. When enabled, one event is sent for each row returned by the SQL query.sql- The SQL query to execute when populating the metric's values/tags (see SQL Query Structure)
Anemometer makes the following assumptions about the results of your query:
- Exactly one column will be named
metric, and the value is convertable tofloat64(no strings) - An optional column named
timestampcan be included to explicitly provide a timestamp for the metrics (only supported forgaugeandcounttypes) - All other columns will be aggregated into tags and sent to StatsD
- The tags will take the form of
column_name:value - Event payload columns such as
event_title,event_text,event_aggregation_key, and configured title/text/aggregation/hostname columns are not used as metric tags by default.
Anemometer supports all four Datadog metric types. You can specify the metric
type using the metric_type configuration option:
- Use case: Values that can go up or down (e.g., CPU usage, memory usage, queue depth)
- Configuration:
metric_type: gauge(or omit for default) - StatsD format:
metric_name:value|g|#tags - Timestamp support: ✅ Supports optional
timestampcolumn
- Use case: Track how many times something happened (e.g., requests, errors, events)
- Configuration:
metric_type: count - StatsD format:
metric_name:value|c|#tags - Note: Values are converted to integers
- Timestamp support: ✅ Supports optional
timestampcolumn
- Use case: Track statistical distribution of values on each host (e.g., request latency, file sizes)
- Configuration:
metric_type: histogram - StatsD format:
metric_name:value|h|#tags - Timestamp support: ❌ Uses current time only
- Use case: Track statistical distribution of values across your infrastructure (e.g., request latency across all hosts)
- Configuration:
metric_type: distribution - StatsD format:
metric_name:value|d|#tags - Timestamp support: ❌ Uses current time only
Note: The metric_type field is optional and defaults to gauge for
backwards compatibility. Existing configurations will continue to work without
any changes.
Anemometer can also send Datadog events through DogStatsD. This is useful for alert conditions where the SQL query itself controls whether anything should be emitted. For example, a long-running-query monitor can return only sessions that have been active for more than two hours. If the query returns no rows, no events are sent. If the query returns three rows, three events are sent.
Events are sent in addition to the configured metric. Event-enabled monitors
still need a metric setting and a SQL metric column.
The monitor's sleep_duration controls how often Anemometer re-checks the query
and therefore how often a still-true condition can re-notify through a Datadog
event monitor.
enabled- Set totrueto send one event for each returned SQL rowtitle- Static event title. Used whentitle_columnis not configuredtitle_column- SQL result column containing the event titletext- Static event body. Used whentext_columnis not configuredtext_column- SQL result column containing the event bodyalert_type- Event type:info,warning,error, orsuccess(defaults toinfo)priority- Event priority:normalorlow(defaults tonormal)source_type_name- Event source type (defaults toanemometer)aggregation_key- Static key used by Datadog to group related eventsaggregation_key_column- SQL result column containing the aggregation keyhostname- Static hostname for the eventhostname_column- SQL result column containing the hostnametags- Static event-only tagstag_columns- SQL result columns to use as event-only tags
Static event.tags are sent only with events. Use event.tag_columns for
low-cardinality SQL result fields that should be available to Datadog event
monitor queries and notification templates. Put high-cardinality details such as
PID, exact runtime, client address, and query text in event_text instead of
tags.
Anemometer supports custom timestamps for gauge and count metrics by including an optional timestamp column in your SQL query results. This allows you to send metrics with specific timestamps rather than using the current time.
- RFC3339 strings:
"2023-12-25T10:30:00Z" - Unix timestamps:
1703505000(asint64,int32,int, orfloat64) - Database time objects: Direct
time.Timeorsql.NullTimevalues
SELECT 'production' AS environment,
COUNT(*) AS metric,
'2023-12-25T10:30:00Z' AS timestamp
FROM users
WHERE created_at BETWEEN '2023-12-25 00:00:00' AND '2023-12-25 23:59:59'📖 For more information, see Datadog's documentation on historical metric in gestion.
To monitor the number of records in your user's table you might do something like this:
SELECT 'production' AS environment,
'users' AS table_name,
COUNT(0) AS metric
FROM usersResulting in the following:
environment | table_name | metric
-------------+------------+--------
production | users | 99
Assuming we named our metric table.records with metric_type: gauge, this
would result in the following data being sent to StatsD:
table.records:99|g|#environment:production,table_name:users
To monitor the number of queries each user is running in your database you might do something like this:
SELECT 'production' AS environment,
usename AS user_name,
COUNT(0) AS metric
FROM pg_stat_activity
WHERE query != '<IDLE>'
GROUP BY usenameResulting in the following:
environment | user_name | metric
-------------+-----------+--------
production | cjonesy | 160
production | postgres | 6
Assuming we named our metric database.queries with metric_type: count, this
would result in the following data being sent to StatsD:
database.queries:160|c|#environment:production,user_name:cjonesy
database.queries:6|c|#environment:production,user_name:postgres
Notice that one metric is sent for each row in the query.
Anemometer (A SQL -> StatsD metrics generator)
Usage:
anemometer [command]
Available Commands:
help Help about any command
start Start the Anemometer agent
version Print the version number
Flags:
-h, --help help for anemometer
Use "anemometer [command] --help" for more information about a command.
anemometer start -c /path/to/your/config.ymlYou can run Anemometer using Docker with the image from GitHub Container Registry:
docker run -v /path/to/your/config.yml:/config.yml ghcr.io/simplifi/anemometer start -c /config.ymlThis mounts your local configuration file into the container and runs the start command.
If you want to test this out locally you can run the following to start Anemometer:
anemometer start -c /path/to/config.ymlYou can see the metrics that would be sent by watching the statsd port on localhost:
nc -u -l 8125make buildTo run all the standard tests:
make testThis project is using goreleaser. GitHub release creation is automated using Travis CI. New releases are automatically created when new tags are pushed to the repo.
$ TAG=0.1.0 make tagThis project has some clear Contribution Guidelines and expectations that you can read here (CONTRIBUTING).
The contribution guidelines outline the process that you'll need to follow to get a patch merged.
And you don't just have to write code. You can help out by writing documentation, tests, or even by giving feedback about this work.
Thank you for contributing!
