List: De2 | Curated by Musili Adebayo

Aug 19, 2024
24 stories
Arpita Mishra
Comprehensive Guide on Pandas for Data Engineering
1. Introduction to Pandas
Aug 11, 2024
In Python in Plain English, by Vishal Barvaliya
Different Types of Testing in Python: A Simple Guide
Access this blog for free…
Jul 27, 2024
In Top Python Libraries, by Meng Li
6 Essential Python Libraries to Supercharge Your Data Processing
Enhance Your Python Projects with These Powerful Libraries for Efficient Data Handling and Analysis
Jul 2, 2024
1
Netflix Technology Blog
ETL development life-cycle with Dataflow
by Rishika Idnani and Olek Gorajek
Aug 2, 2024
6
In Level Up Coding, by Liu Zuo Lin
8 Python Dictionary Things I Regret Not Knowing Earlier
These tips have made dealing with dictionaries in Python a lot more enjoyable and elegant, and I kinda wish I learnt them a little less…
Jul 5, 2024
12
James JIANG
Mastering Spark on K8s 🔥 and Why I Dumped 💔 Kubeflow Spark Operator (Formerly Google’s Spark…
🌟 FREE full access on: LovinData — Simplified Full Stack Data Engineering
Jun 22, 2024
1
Vishal Barvaliya
Essential Git Commands for Data Engineers
Access this blog for free…
Jun 3, 2024
Kevin Wong
My Personal Spark Optimization Note
The Fast Track to Mastering Spark: Prepare A Comprehensive Guide Outline to Optimizing Spark Performance
Apr 16, 2024
In Python in Plain English, by Gabriel Ejiro
Creating an ETL Data Pipeline Using Bash with Apache Airflow
An ETL (Extract, Transform, Load) data pipeline is a process used in data integration and warehousing for extracting data from multiple…
May 13, 2024
1
Adegbite Ayoade
Transforming MarketEase Data with dbt: A Comprehensive Guide
In today’s data-driven world, businesses need efficient and scalable solutions to transform raw data into actionable insights.
Jun 23, 2024
Swathi Thokala
YouTube Trend Analysis Pipeline: ETL with Airflow, Spark, S3 and Docker
In this article, we will walk through creating an automated ETL (Extract, Transform, Load) pipeline using Apache Airflow and PySpark. This…
Jun 18, 2024
2
Ashwin
Spark Partitioning Partition Understanding
Do you find yourself struggling with managing large datasets in your Spark projects? Are you looking to optimize your data processing…
Jan 8, 2024
Vishal Barvaliya
Data Warehousing interview questions for Data Engineers
Access this blog for free…
Jun 7, 2024
Hugo Lu
You’ve got Databricks Snowflake war all wrong; Tabular Acquired for $1bn
Databricks’ acquisition of Tabular shows the goal is far greater
Jun 7, 2024
8
In Awesome Azure, by Ashish Patel
Azure — Difference between Azure Blob Storage and Azure Data Lake Storage (ADLS)
Comparison: Azure Blob Storage vs Azure Data Lake Storage (ADLS) Gen2.
Mar 1, 2022
1
John Tringham
Explained: Slowly Changing Dimensions (SCD)
The what, why, and more
Jan 6, 2024
Ajayi Ayodeji
An Introspection into the ‘dimension’ in Dimensional Modelling.
Dimensional modeling is a technique used in data warehousing to organize and structure data for easy retrieval and analysis. It starts by…
May 25, 2024
Jitapichab
Apache Nifi: Integrate Kafka to consume and produce.
Apache NiFi, Kafka, Kafka consumer and Kafka producer with Python.
Jun 27, 2020
1
In Towards Data Engineering, by Yusuf Ganiyu
Robust Data Pipelines with Databricks, Spark, DBT, and Azure | Data Engineering Project
In the ever-evolving world of data engineering, the buzz around the Data Build Tool (DBT) is becoming impossible to ignore. This tool is…
Dec 18, 2023
2
In Towards AI, by Hamza Gharbi
End-to-End Data Engineering System on Real Data with Kafka, Spark, Airflow, Postgres, and Docker
Building a Practical Data Pipeline with Kafka, Spark, Airflow, Postgres, and Docker
Jan 19, 2024
16