Software Engineer – Backend (Python)

Remote

Immediate Joiner

(Log Intelligence & Observability) 5+ years of experience

  • We are building a service that transforms raw log data into actionable answers.
  • The system takes user input, fetches relevant logs from sources like Splunk, and processes them for delivery.
  • As part of this flow, you will help architect intermediate caching layers and data processing pipelines to ensure fast, reliable access to distributed log data.
  • While Splunk is our primary focus today, our vision is a log-agnostic platform capable of extracting and aggregating data from a variety of sources to provide a unified troubleshooting experience.
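To make the flow above concrete for candidates, here is a minimal sketch of a log-agnostic fetch step behind a cache-aside layer. It is illustrative only and not our production design: `LogSource`, `InMemorySource`, `AnswerPipeline`, and the 60-second TTL are hypothetical names and values, and the in-memory backend stands in for a real Splunk or ELK client.

```python
import time
from typing import Callable, Dict, List, Protocol, Tuple


class LogSource(Protocol):
    """Hypothetical interface that any log backend (Splunk, ELK, Graylog) could implement."""

    def fetch(self, query: str) -> List[str]:
        ...


class InMemorySource:
    """Stand-in backend for illustration; a real one would call the vendor API."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines
        self.calls = 0  # counts backend hits so cache behaviour is observable

    def fetch(self, query: str) -> List[str]:
        self.calls += 1
        return [line for line in self.lines if query in line]


class AnswerPipeline:
    """Cache-aside flow: check the cache, otherwise fetch from the source and store."""

    def __init__(self, source: LogSource, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.source = source
        self.ttl = ttl_seconds  # assumed freshness window, not a recommendation
        self.clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def answer(self, user_input: str) -> List[str]:
        query = user_input.strip()  # normalise so equivalent inputs share a cache key
        cached = self._cache.get(query)
        if cached is not None and self.clock() - cached[0] < self.ttl:
            return cached[1]
        results = self.source.fetch(query)
        self._cache[query] = (self.clock(), results)
        return results


source = InMemorySource(["ERROR pod crashed", "INFO pod started", "ERROR disk full"])
pipeline = AnswerPipeline(source)
first = pipeline.answer("ERROR")
second = pipeline.answer(" ERROR ")  # same key after stripping, so no second backend call
```

Because the pipeline depends only on the `fetch` method, swapping the backend would not touch the caching logic, which is the property a log-agnostic platform needs.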

You will be responsible for the end-to-end flow of log retrieval and processing, ensuring that our users get the data they need with minimal latency.

  • Pipeline Development: Build and optimize the “Input-to-Answer” workflow: taking user input, executing Splunk queries, caching outputs, and delivering final results.
  • API & Integration: Develop RESTful Flask APIs and integrate with the Splunk API across five distributed computing log layers.
  • Cross-Platform Expansion: Architect the system to eventually support log extraction from other sources, including ELK (Elasticsearch/Logstash/Kibana), Graylog, and more.
  • Caching & Optimization: Implement intermediate caching layers, potentially utilizing alternative log processing solutions to speed up data delivery.
  • Security & Validation: Manage service-to-service authentication (JWT/Shared Secrets) and implement query validation and workload analysis.

Required Qualifications

  • Python & Flask: Proficiency in Python 3.9+ and the Flask framework for microservices.
  • Advanced Splunk Querying: Strong experience with Splunk's Search Processing Language (SPL) and the Splunk SDK for Python is essential. You should be comfortable writing complex searches to extract specific insights from high-volume data.
  • Kubernetes Knowledge: Deep understanding of Kubernetes logs is required to effectively validate and interpret the content being analyzed.
  • Troubleshooting: Proven ability to troubleshoot distributed systems using logs. You should understand how to trace an issue across multiple services and nodes.
  • Testing: Strong commitment to quality, with experience writing integration tests and unit tests using pytest.
  • Containerization: Experience with Docker and Kubernetes/Helm for deployment.
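
As a rough illustration of the query validation mentioned under Security & Validation, the sketch below checks an SPL string before it would be sent to Splunk. The deny-list, the rule that a query must carry an inline `earliest=` time bound, and the `validate_query` name are assumptions made for this example, not our actual policy. Splitting on `|` is deliberately naive and would misread a pipe inside a quoted string.

```python
import re
from typing import List

# Assumed deny-list: SPL commands that modify data or have side effects.
BLOCKED_COMMANDS = {"delete", "outputlookup", "collect", "sendemail"}


def validate_query(spl: str) -> List[str]:
    """Return a list of problems; an empty list means the query passed."""
    if not spl.strip():
        return ["query is empty"]
    problems: List[str] = []
    # After the first stage, each pipeline stage starts with a command name.
    stages = [stage.strip() for stage in spl.split("|")]
    for stage in stages[1:]:
        command = stage.split(None, 1)[0].lower() if stage else ""
        if command in BLOCKED_COMMANDS:
            problems.append(f"command not allowed: {command}")
    # Require an explicit time bound so a search cannot scan all history.
    if not re.search(r"\bearliest\s*=", spl):
        problems.append("missing time bound (earliest=...)")
    return problems


# "index=k8s" is a hypothetical index name used only for the example.
ok = validate_query("search index=k8s earliest=-15m error | stats count by pod")
bad = validate_query("search index=k8s | delete")
```

Returning a list of problems rather than raising on the first one lets an API report every issue to the caller in a single response.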

Preferred Qualifications (Nice-to-Have)

  • Distributed Systems: Hands-on experience with Spark or Flink. Since our log-fetching mechanisms are part of a distributed architecture, this experience is highly valuable for understanding the data lifecycle.
  • Multi-Stack Experience: Familiarity with ELK, Graylog, ClickHouse, or OpenSearch, as we plan to integrate these into our extraction engine.
  • Advanced Observability: Experience with platform engineering or advanced log management (index management, retention policies).
  • Specialized Tooling: Familiarity with internal tools such as Whisper, Mosaic, or Rio.

Tech Stack: Python, Splunk, Kubernetes, Flask API, Integration Testing