algorithms and data structures

An Introduction to Algorithms and Data Structures

An algorithm is a series of instructions in a particular order for performing a specific task.
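As a small illustration of "instructions in a particular order", here is a sketch of binary search, chosen only as a familiar example. The function name and the sample list are assumptions made for illustration.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


index = binary_search([2, 5, 8, 13, 21], 13)
missing = binary_search([2, 5, 8, 13, 21], 4)
```

The order of the steps matters: comparing against the middle element first is what allows half of the remaining items to be discarded on every pass.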

amazon emr

Overview of Amazon EMR

Amazon EMR is a managed cluster platform that makes it easier to run big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze huge amounts of data.

apache pinot

Apache Pinot joins hands with Kafka and Presto to provide low-latency, high-throughput user-facing analytics

Apache Pinot is a real-time, distributed OLAP datastore that was built for low-latency, high-throughput analytics, making it perfect for user-facing analytical workloads. Pinot joins hands with Kafka and Presto to provide user-facing analytics.

apache spark

apache yarn


AWS Command Line Interface (AWS CLI)

AWS CLI is an open-source tool that allows us to interact with AWS services using command-line shell commands.

aws glue

coding problem solving

data engineering

data files and formats

data lake

data lake and lakehouse

data management

Data Product vs. Data as a Product

A data product is not the same as data as a product. A data product aids the accomplishment of the product's goal by using the data, whereas in data as a product, the data itself is seen as the actual product.

Data Catalog

A data catalog is an ordered inventory of an organization's data assets that makes it easy to find the most relevant data quickly.

data quality

data security and compliance

Data Governance

Data governance is the process of defining security guidelines and policies and making sure they are followed by having authority and control over how data assets are managed.

data streaming


design patterns and coding principles

Singleton Pattern

The singleton pattern ensures controlled access to a single instance of a class. While it offers significant benefits in terms of resource management and access control, developers must be mindful of its downsides, such as potential scalability issues and the introduction of global states. When used carefully, it can be...
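A minimal sketch of one way to write a singleton in Python, assuming a hypothetical `Settings` class. It uses a lock so that two threads cannot create separate instances; other approaches, such as a module-level object, are equally common.

```python
import threading


class Settings:
    """Hypothetical class used only to illustrate the pattern."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: the lock is taken only while
        # the instance may still be missing.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance.values = {}
                    cls._instance = instance
        return cls._instance


first = Settings()
second = Settings()
first.values["region"] = "us-east-1"
```

Both names refer to the same object, so the value written through `first` is visible through `second`. That shared state is the global-state downside mentioned above.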


Anti-patterns may seem quick and reasonable at first, but they typically have adverse effects later. They are design and code smells that harm our software and add technical debt. We should avoid them at all costs.



Navigating the NoSQL Landscape: MongoDB vs. Cassandra for nested or complex JSON data handling

The choice between MongoDB and Cassandra becomes crucial when dealing with nested or complex JSON objects. MongoDB and Cassandra offer different approaches due to their underlying data models and architectures.

kafka streams

Windowing in Kafka Streams

Windowing refers to the process of dividing a continuous stream of data into discrete segments, or windows, based on time. These windows then serve as the basis for applying computational operations, such as aggregations or transformations, to the data contained within them.
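Kafka Streams itself is a Java library, so the following is not its API. It is a plain-Python sketch of the idea only, assuming tumbling (fixed-size, non-overlapping) windows and hypothetical click events given as `(timestamp_ms, key)` pairs.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size_ms):
    """Count events per key in fixed-size, non-overlapping time windows.

    Returns a dict mapping (window_start_ms, key) to a count.
    """
    counts = defaultdict(int)
    for timestamp_ms, key in events:
        # Align the timestamp down to the start of its window.
        window_start_ms = timestamp_ms - (timestamp_ms % window_size_ms)
        counts[(window_start_ms, key)] += 1
    return dict(counts)


# Hypothetical events: (timestamp in milliseconds, page name)
events = [(1_000, "home"), (4_000, "home"), (6_000, "cart"), (11_000, "home")]
result = tumbling_window_counts(events, window_size_ms=5_000)
```

Each event is assigned to exactly one window by its timestamp, and the aggregation (here a count) is applied per window and key.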

knowledge graph



How to set SLA in Apache Airflow

Apache Airflow enables us to schedule tasks as code. In Airflow, an SLA determines the maximum completion time for a task or DAG. Note that SLAs are established based on the DAG execution date, not the task start time.
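The difference between measuring from the execution date and measuring from the task start time can be shown with a little date arithmetic. This is a simplified model that follows the description above, not Airflow's scheduler logic or API; the function name and the sample times are assumptions.

```python
from datetime import datetime, timedelta


def sla_missed(execution_date, sla, finished_at):
    """Return True if the task finished after execution_date + sla."""
    deadline = execution_date + sla
    return finished_at > deadline


execution_date = datetime(2024, 1, 1, 0, 0)
sla = timedelta(hours=1)

# The task starts late, at 00:50, and runs for only 20 minutes.
started_at = datetime(2024, 1, 1, 0, 50)
finished_at = started_at + timedelta(minutes=20)

missed = sla_missed(execution_date, sla, finished_at)
```

In this model the SLA is missed even though the task itself ran for well under an hour, because the deadline is counted from the execution date rather than from when the task started.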



Introduction to gRPC

gRPC is an open-source, high-performance RPC framework that can run in any environment. gRPC builds on the HTTP/2 protocol and the protobuf message-encoding format to provide high-performance, low-bandwidth communication between applications and services.


Rust’s Ownership and Borrowing Enforce Memory Safety

Rust's ownership and borrowing features protect us from memory-related problems. Rust is a great choice when performance matters, and it solves pain points that affect many other languages.



What does 'yanked' release mean?

'Released' and 'yanked' are terms used in software development to indicate the state of a software package or library. These terms specify whether a given package version is suitable for use or should be avoided.


Terraform Basics

Terraform is an open-source infrastructure-as-code tool that allows us to programmatically provision the physical resources required for an application to run.

web server