Course curriculum


    Module 1: Course Overview

    • Segment - 01 - Course Structure and Approach
    • Segment - 02 - Pre-requisites
    • Segment - 03 - Course Audience
    • Segment - 04 - About Instructor

    Module 2: Environment Setup

    • Segment - 05 - Google Cloud Account Setup
    • Segment - 06 - Creating a Dataproc Cluster
    • Segment - 07 - GCP Account Best Practices
    • Dataproc Cluster Installation

    Module 3: Holistic View, Architectures and Pipelines

    • Segment - 08 - Big Data Logical Architecture
    • Segment - 09 - Evolution of Big Data Technologies
    • Segment - 10 - Key Big Data Architectures
    • Segment - 11 - Typical Big Data Batch Pipeline
    • Segment - 12 - Typical Big Data Streaming Pipeline
    • Segment - 13 - Bonus 1 - Another Example of Big Data Streaming Pipeline
    • Segment - 14 - Bonus 2 - Another Example of Big Data Streaming Pipeline

    Module 4: Key Ingestion-Data Flow Frameworks

    • Segment - 15 - Factors to consider while comparing Ingestion frameworks
    • Segment - 16 - Kafka vs Flume
    • Segment - 17 - NiFi vs Kafka
    • Segment - 18 - Sqoop vs Flume
    • Segment - 19 - Sqoop vs Kafka Connect
    • Segment - 20 - Hands-on NiFi Installation
    • Segment - 21 - Hands-on Kafka Installation
    • Segment - 22 - Hands-on Kafka and NiFi Integration Background
    • Segment - 23 - Hands-on Kafka and NiFi Integration

    Module 5: Key Storage Frameworks

    • Segment - 24 - Factors to consider while comparing Storage frameworks
    • Segment - 25 - HDFS vs HBase
    • Segment - 26 - HBase vs Kudu
    • Segment - 27 - HDFS vs Kudu
    • Segment - 28 - HBase vs Cassandra

    Module 6: Data Formats

    • Segment - 29 - Text vs Binary
    • Segment - 30 - Interoperability
    • Segment - 31 - Row Oriented vs Column Oriented
    • Segment - 32 - Splittable Formats
    • Segment - 33 - Schema Evolution
    • Segment - 34 - Comparing Data Formats
    • Segment - 35 - Hands-on Sqoop Installation on Dataproc Cluster
    • Segment - 36 - Hands-on Big Data Batch Pipeline Using Avro Format

    Module 7: Key Data Processing Frameworks

    • Segment - 37 - Factors to consider while comparing Processing frameworks
    • Segment - 38 - MR vs Spark Logical Architecture Perspective
    • Segment - 39 - MR vs Spark Performance Perspective
    • Segment - 40 - Spark vs Tez
    • Segment - 41 - Spark vs Flink
    • Segment - 42 - Kafka Streams vs Spark Streaming
    • Segment - 43 - Spark 2.x Streaming vs Spark 1.x Streaming
    • Segment - 44 - Spark Core vs Spark SQL
    • Segment - 45 - Hands-on Kafka & Spark Streaming Integration

    Module 8: Key Data Analysis Frameworks

    • Segment - 46 - Factors to consider while comparing Analysis frameworks
    • Segment - 47 - Hive vs Impala
    • Segment - 48 - Hive vs Pig
    • Segment - 49 - Hive vs Spark SQL
    • Segment - 50 - Hive vs Hive LLAP vs Impala
    • Segment - 51 - Hive vs KSQL
    • Segment - 52 - KSQL vs ksqlDB
    • Segment - 53 - Hands-on KSQL
    • Segment - 54 - Hands-on Write to a Stream and Table using KSQL
    • Segment - 55 - Hands-on Streaming ETL Pipeline Background
    • Segment - 56 - Hands-on Build a Scalable ETL Pipeline with Kafka Connect - Part 1
    • Segment - 57 - Hands-on Build a Scalable ETL Pipeline with Kafka Connect - Part 2

    Module 9: Delta Lake

    • Segment - 58 - Delta Architecture
    • Segment - 59 - Why Delta Lake?
    • Segment - 60 - Challenges with Data Lake
    • Segment - 61 - Delta Lake Demo

    Module 10: Bonus

    • Segment - 62 - Solr vs Elasticsearch
    • Segment - 63 - Cloudera Search vs Solr
    • Segment - 64 - Oozie vs Airflow
    • Segment - 65 - KSQL vs KStreams

    Module 11: Epilogue

    • Segment - 66 - Conclusion