Data Exploration is one of Castrel's core features. It helps teams quickly understand data structures, discover queryable IT resources and monitoring entities, and build reusable query knowledge after connecting new data sources. Whether you're integrating a new logging system or mapping out the structure of existing metrics data, Castrel helps you complete these data governance tasks efficiently.
Data Exploration is an AI-driven data discovery and governance system that automatically identifies key entities in your observability data, establishes relationships between them, and generates reusable query templates. When you connect a new data source (such as Elasticsearch, Prometheus, or Loki), Castrel automatically scans the data structure, identifies services, instances, infrastructure, and other entities, and persists the findings as knowledge.
Unlike traditional manual data mapping, Data Exploration automates this discovery end to end and keeps the results as reusable, editable knowledge.
1. Start Exploration
You can start Data Exploration in the following ways:
2. View Exploration Report
After exploration completes, Castrel generates a detailed exploration report containing:
| Content | Description |
|---|---|
| Exploration Overview | Data source info, data type, confirmed data collections and timestamp fields |
| Entity Discovery | Identified services, service_entities, infra_entities and their relationships |
| Reusable Query Templates | 3-5 query templates for the data source, with usage and parameter descriptions |
| Field Dictionary | Key field paths, types, meanings, and common value examples |
3. Persist as Knowledge
Exploration results are automatically persisted as knowledge for use in subsequent incident investigation, alert triage, and other scenarios. You can also view and edit this knowledge in the Knowledge Base.
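As a rough illustration, a persisted knowledge entry would capture the confirmed data collection, timestamp field, entity mappings, and query template references. The JSON shape and field names below are illustrative assumptions based on the sample report later in this page, not Castrel's actual storage format:

```json
{
  "data_source": "elasticsearch-prod",
  "data_collection": "logs-*",
  "timestamp_field": "@timestamp",
  "entities": {
    "service": { "field": "service.name" },
    "service_entity": { "field": "kubernetes.pod.name", "maps_to_service_via": "service.name" },
    "infra_entity": { "field": "kubernetes.node.name" }
  },
  "query_templates": [
    { "name": "Query Error Logs by Service", "parameters": ["service_name", "start_time", "end_time"] }
  ]
}
```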
Data Exploration identifies three types of entities:
| Entity Type | Description | Common Field Examples |
|---|---|---|
| service | Stable identifier for a logical service or application, moderate cardinality, interpretable | service.name, service, app |
| service_entity | Service instance, higher cardinality, can be mapped back to service | k8s.pod.name, container.id, instance, process.pid |
| infra_entity | Infrastructure resources hosting services | host.name, node.name, ip, k8s.cluster.name |
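For example, a single log record from a Kubernetes environment can carry all three entity types at once. The record below is a hypothetical illustration built from the field names in the table above:

```json
{
  "@timestamp": "2024-05-06T10:15:23Z",
  "log.level": "error",
  "message": "payment request timed out",
  "service.name": "payment-service",
  "k8s.pod.name": "payment-service-6f9c7d8b4-x2v8q",
  "host.name": "node-03"
}
```

Here `service.name` would be identified as the service, `k8s.pod.name` as the service_entity that maps back to it, and `host.name` as the infra_entity hosting the instance.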
Castrel establishes relationships between entities through the following methods:
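One such method, suggested by the sample report later in this page, is co-occurrence analysis: checking whether candidate entity fields appear together in the same records. A minimal sketch against Elasticsearch, assuming the `service.name` and `kubernetes.pod.name` fields from that report, uses a nested terms aggregation so that each service bucket lists the instance values observed alongside it:

```json
{
  "query": {
    "range": { "@timestamp": { "gte": "now-1h", "lte": "now" } }
  },
  "size": 0,
  "aggs": {
    "services": {
      "terms": { "field": "service.name", "size": 10 },
      "aggs": {
        "pods": {
          "terms": { "field": "kubernetes.pod.name", "size": 10 }
        }
      }
    }
  }
}
```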
Castrel's Data Exploration follows these principles to ensure accuracy and verifiability of discoveries:
| Principle | Description |
|---|---|
| Global to local | First explore the overall structure of data collections, then dive into field details and entity identification |
| Validate before expanding | First verify with small time windows and return limits, then expand scope after confirmation |
| Prefer aggregation | Use aggregation statistics to discover entity distributions, avoiding full data pulls that cause performance issues |
| Evidence-based | All conclusions must be supported by query results; no guessing is allowed |
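For instance, the "Prefer aggregation" and "Validate before expanding" principles together suggest discovering the service distribution with a size-0 terms aggregation over a short, recent window rather than pulling raw documents. A minimal Elasticsearch sketch, assuming the `service.name` field from the sample report below:

```json
{
  "query": {
    "range": { "@timestamp": { "gte": "now-15m", "lte": "now" } }
  },
  "size": 0,
  "aggs": {
    "services": {
      "terms": { "field": "service.name", "size": 20 }
    }
  }
}
```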
Data Collection Discovery → Field Structure Analysis → Sample Retrieval → Aggregation Statistics → Relationship Verification → Template Persistence
Here's a typical data exploration report structure:
Data collection: logs-*
Timestamp field: @timestamp (type: date, format: ISO8601)

Service
- Field: service.name
- Example values: order-service, payment-service, inventory-service, and 12 other services

Service Entity
- Field: kubernetes.pod.name
- Maps back to service via: the service.name field
- Evidence: service.name and kubernetes.pod.name co-occur in the same log entry

Infra Entity
- Field: kubernetes.node.name
- Related entity field: kubernetes.pod.name

1. Query Error Logs by Service
```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "service.name": "${service_name}" } },
        { "term": { "log.level": "error" } },
        { "range": { "@timestamp": { "gte": "${start_time}", "lte": "${end_time}" } } }
      ]
    }
  },
  "size": 100
}
```
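Substituting hypothetical parameter values (the service name and time range below are examples, not taken from a real report), an invocation of this template might look like:

```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "service.name": "order-service" } },
        { "term": { "log.level": "error" } },
        { "range": { "@timestamp": { "gte": "2024-05-06T00:00:00Z", "lte": "2024-05-06T12:00:00Z" } } }
      ]
    }
  },
  "size": 100
}
```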
Template parameters: service_name (the target service name), start_time / end_time (the query time range).

| Tip | Description |
|---|---|
| Ensure the data source connection is healthy | Check the data source connection status before exploration, and make sure Castrel has sufficient permissions to read data structures and samples |
| Choose an appropriate time range | The default of the last three days is usually sufficient; if data volume is small, consider expanding the time range |
| Specify target application | If the data source contains data from multiple applications, specifying the target application improves exploration efficiency and accuracy |
| Review and supplement knowledge | Exploration results are persisted as knowledge; we recommend reviewing and adding business context information |