Arpita Mishra · Comprehensive Guide on Pandas for Data Engineering · 1. Introduction to Pandas · Aug 11, 2024
In Python in Plain English, by Vishal Barvaliya · Different Types of Testing in Python: A Simple Guide · Access this blog for free… · Jul 27, 2024
In Top Python Libraries, by Meng Li · 6 Essential Python Libraries to Supercharge Your Data Processing · Enhance Your Python Projects with These Powerful Libraries for Efficient Data Handling and Analysis · Jul 2, 2024
Netflix Technology Blog · ETL development life-cycle with Dataflow · by Rishika Idnani and Olek Gorajek · Aug 2, 2024
In Level Up Coding, by Liu Zuo Lin · 8 Python Dictionary Things I Regret Not Knowing Earlier · These tips have made dealing with dictionaries in Python a lot more enjoyable and elegant, and I kinda wish I learnt them a little less… · Jul 5, 2024
James JIANG · Mastering Spark on K8s 🔥 and Why I Dumped 💔 Kubeflow Spark Operator (Formerly Google’s Spark… · 🌟 FREE full access on: LovinData — Simplified Full Stack Data Engineering · Jun 22, 2024
Vishal Barvaliya · Essential Git Commands for Data Engineers · Access this blog for free… · Jun 3, 2024
Kevin Wong · My Personal Spark Optimization Note · The Fast Track to Mastering Spark: Prepare A Comprehensive Guide Outline to Optimizing Spark Performance · Apr 16, 2024
In Python in Plain English, by Gabriel Ejiro · Creating an ETL Data Pipeline Using Bash with Apache Airflow · An ETL (Extract, Transform, Load) data pipeline is a process used in data integration and warehousing for extracting data from multiple… · May 13, 2024
Adegbite Ayoade · Transforming MarketEase Data with dbt: A Comprehensive Guide · In today’s data-driven world, businesses need efficient and scalable solutions to transform raw data into actionable insights. · Jun 23, 2024
Swathi Thokala · YouTube Trend Analysis Pipeline: ETL with Airflow, Spark, S3 and Docker · In this article, we will walk through creating an automated ETL (Extract, Transform, Load) pipeline using Apache Airflow and PySpark. This… · Jun 18, 2024
Ashwin · Spark Partitioning Partition Understanding · Do you find yourself struggling with managing large datasets in your Spark projects? Are you looking to optimize your data processing… · Jan 8, 2024
Vishal Barvaliya · Data Warehousing interview questions for Data Engineers · Access this blog for free… · Jun 7, 2024
Hugo Lu · You’ve got Databricks Snowflake war all wrong; Tabular Acquired for $1bn · Databricks’ acquisition of tabular show the goal is far greater · Jun 7, 2024
In Awesome Azure, by Ashish Patel · Azure — Difference between Azure Blob Storage and Azure Data Lake Storage (ADLS) · Comparison: Azure Blob Storage vs Azure Data Lake Storage (ADLS) Gen2. · Mar 1, 2022
John Tringham · Explained: Slowly Changing Dimensions (SCD) · The what, why, and more · Jan 6, 2024
Ajayi Ayodeji · An Introspection into the ‘dimension’ in Dimensional Modelling. · Dimensional modeling is a technique used in data warehousing to organize and structure data for easy retrieval and analysis. It starts by… · May 25, 2024
Jitapichab · Apache Nifi: Integrate Kafka to consume and produce. · Apache nifi, Kafka, Kafka consumer and Kafka producer with python. · Jun 27, 2020
In Towards Data Engineering, by Yusuf Ganiyu · Robust Data Pipelines with Databricks, Spark, DBT, and Azure | Data Engineering Project · In the ever-evolving world of data engineering, the buzz around the Data Build Tool (DBT) is becoming impossible to ignore. This tool is… · Dec 18, 2023
In Towards AI, by Hamza Gharbi · End-to-End Data Engineering System on Real Data with Kafka, Spark, Airflow, Postgres, and Docker · Building a Practical Data Pipeline with Kafka, Spark, Airflow, Postgres, and Docker · Jan 19, 2024