(Log Intelligence & Observability) - 5+ years of experience
- We are building a sophisticated service designed to transform raw log data into actionable answers.
- The system takes user input, fetches relevant logs from sources like Splunk, and processes them for delivery.
- As part of this flow, you will help architect intermediate caching layers and data processing pipelines to ensure fast, reliable access to distributed log data.
- While Splunk is our primary focus today, our vision is a log-agnostic platform capable of extracting and aggregating data from a variety of sources to provide a unified troubleshooting experience.
You will be responsible for the end-to-end flow of log retrieval and processing, ensuring that our users get the data they need with minimal latency.
- Pipeline Development: Build and optimize the “Input-to-Answer” workflow: take user input, execute Splunk queries, cache outputs, and deliver final results.
- API & Integration: Develop RESTful Flask APIs and integrate with the Splunk API across five distributed computing log layers.
- Cross-Platform Expansion: Architect the system to eventually support log extraction from other sources, including ELK (Elasticsearch/Logstash/Kibana), Graylog, and more.
- Caching & Optimization: Implement intermediate caching layers, potentially using alternative log-processing solutions to speed up data delivery.
- Security & Validation: Manage service-to-service authentication (JWT/Shared Secrets) and implement query validation and workload analysis.
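The "Input-to-Answer" flow above (user input, Splunk query, intermediate cache, final result) can be sketched roughly as follows. This is an illustrative outline only: `run_splunk_search` is a hypothetical stand-in for a real Splunk SDK call, and the query construction and TTL cache are simplified assumptions, not the service's actual design.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 300
_cache = {}  # intermediate caching layer: query hash -> (timestamp, events)

def run_splunk_search(spl):
    # Hypothetical placeholder for a splunklib-based search execution;
    # returns fake events here so the sketch is self-contained.
    return [{"_raw": f"event matching: {spl}"}]

def cache_key(spl):
    # Stable key for the query string.
    return hashlib.sha256(spl.encode()).hexdigest()

def answer(user_input):
    # Naive query construction from user input (illustrative only).
    spl = f'search index=main "{user_input}"'
    key = cache_key(spl)
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # serve cached output, skipping the Splunk round-trip
    events = run_splunk_search(spl)
    _cache[key] = (time.time(), events)  # cache output for later requests
    return events
```

In the real service this handler would sit behind a Flask endpoint and the dictionary cache would be replaced by a shared store, but the shape of the flow is the same.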
Required Qualifications (Must-Have)
- Python & Flask: Proficiency in Python 3.9+ and the Flask framework for microservices.
- Advanced Splunk Querying: Strong experience with Splunk Query Language (SPL) and the Splunk SDK for Python is essential. You should be comfortable writing complex searches to extract specific insights from high-volume data.
- Kubernetes Knowledge: Deep understanding of Kubernetes logs is required to effectively validate and interpret the content being analyzed.
- Troubleshooting: Proven ability to troubleshoot distributed systems using logs. You should understand how to trace an issue across multiple services and nodes.
- Testing: Strong commitment to quality, including hands-on experience writing integration and unit tests with Pytest.
- Containerization: Experience with Docker and Kubernetes/Helm for deployment.
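To illustrate the kind of Kubernetes log interpretation and Pytest-style testing listed above, here is a minimal sketch: a parser for CRI-format container log lines (`<timestamp> <stream> <flag> <message>`, as written by the kubelet) plus a unit test for it. The helper name and test are hypothetical examples, not part of the existing codebase.

```python
def parse_k8s_log_line(line):
    """Split a CRI-style container log line into its four fields:
    '<timestamp> <stream> <flag> <message>'."""
    timestamp, stream, flag, message = line.split(" ", 3)
    return {"timestamp": timestamp, "stream": stream, "flag": flag, "message": message}

def test_parse_k8s_log_line():
    # Pytest discovers test_* functions and runs bare asserts like these.
    line = "2024-05-01T12:00:00.000Z stderr F connection refused"
    parsed = parse_k8s_log_line(line)
    assert parsed["stream"] == "stderr"
    assert parsed["message"] == "connection refused"
```

Running `pytest` against a file containing this code would collect and execute the test automatically.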
Preferred Qualifications (Nice-to-Have)
- Distributed Systems: Hands-on experience with Spark or Flink. Since our log-fetching mechanisms are part of a distributed architecture, this experience is highly valuable for understanding the data lifecycle.
- Multi-Stack Experience: Familiarity with ELK, Graylog, ClickHouse, or OpenSearch, as we plan to integrate these into our extraction engine.
- Advanced Observability: Experience with platform engineering or advanced log management (index management, retention policies).
- Specialized Tooling: Familiarity with internal tools such as Whisper, Mosaic, or Rio.
Tech Stack: Python, Splunk, Kubernetes, Flask API, Integration Testing