Data Engineering Essentials Hands-on - SQL, Python and Spark
Panter - 26.09.2021

MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz, 2 Ch
Genre: eLearning | Language: English + srt | Duration: 483 lectures (46m 11s) | Size: 17.1 GB

Learn key Data Engineering skills such as SQL, Python and Spark with tons of hands-on tasks and exercises

What you'll learn:
Set up a development environment to learn building Data Engineering applications on GCP
Database Essentials for Data Engineering using Postgres
Data Engineering Programming Essentials using Python
Data Engineering using Spark Dataframe APIs (PySpark)
Data Engineering using Spark SQL (PySpark and Spark SQL)
Relevance of Spark Metastore and integration of Dataframes and Spark SQL
Ability to build Data Engineering pipelines using Spark with Python as the programming language
Use of different file formats such as Parquet, JSON, CSV etc. in building Data Engineering pipelines
Set up a self-support single-node Hadoop and Spark cluster to get enough practice on HDFS and YARN

Requirements
Laptop with decent configuration (minimum 4 GB RAM and dual core)
Sign up for GCP with the available credit or AWS access
Set up a self-support lab on cloud platforms (you might have to pay the applicable cloud fee unless you have credit)
CS or IT degree or prior IT experience is highly desired

Description
As part of this course, you will learn all the Data Engineering essentials related to building data pipelines using SQL, Python and Spark.

About Data Engineering
Data Engineering is the processing of data according to downstream needs. As part of Data Engineering, we need to build different kinds of pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering. Conventionally, these roles are known as ETL Development, Data Warehouse Development, etc.

Course Details
As part of this course, you will learn Data Engineering essentials such as SQL and programming using Python and Spark. Here is the detailed agenda for the course.

Data Engineering Labs - Python and SQL
You will start by setting up self-support Data Engineering labs on either GCP or AWS Cloud9, so that you can learn the key Data Engineering skills with a lot of practice on the tasks and exercises provided by us. As you complete the sections on SQL and Python, you will also be guided through setting up a Hadoop and Spark lab.
Provision GCP Server or AWS Cloud9 Instance
Setup Docker to host Postgres Database
Setup Postgres Database to practice SQL
Setup Jupyter Lab
Once Jupyter Lab is set up, you can upload the Jupyter Notebooks and start practicing all the key skills related to Data Engineering.

Database Essentials - SQL using Postgres
Proficiency in SQL is important for building data engineering pipelines. SQL is used for understanding the data, performing ad-hoc analysis, and building the pipelines themselves.
Getting Started with Postgres
Basic Database Operations (CRUD or Insert, Update, Delete)
Writing Basic SQL Queries (Filtering, Joins, and Aggregations)
Creating Tables and Indexes
Partitioning Tables and Indexes
Predefined Functions (String Manipulation, Date Manipulation, and other functions)
Writing Advanced SQL Queries

Programming Essentials using Python
Python is a widely preferred programming language for developing data engineering applications. Across several sections on Python, you will learn the aspects of the language that matter most for building data engineering applications effectively.
Perform Database Operations
Getting Started with Python
Basic Programming Constructs
Predefined Functions
Overview of Collections - list and set
Overview of Collections - dict and tuple
Manipulating Collections using loops
Understanding Map Reduce Libraries
Overview of Pandas Libraries
Database Programming - CRUD Operations
Database Programming - Batch Operations

Setting up Single Node Data Engineering Cluster for Practice
The most common approach to building data engineering applications at scale is to use Spark integrated with HDFS and YARN. Before getting into data engineering using Spark and Hadoop, we need to set up an environment to practice in. In this section, we will focus primarily on setting up a single-node cluster to learn the key skills related to data engineering using distributed frameworks such as Spark and Hadoop.
Setup Single Node Hadoop Cluster
Setup Hive and Spark on Single Node Cluster

Master required Hadoop Skills to build Data Engineering Applications
In this section, you will focus primarily on HDFS commands so that you can copy files into HDFS. The data copied into HDFS will be used to build data engineering pipelines using Spark and Hadoop with Python as the programming language.
Overview of HDFS Commands

Data Engineering using Spark SQL
Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark SQL lets us combine the distributed computing capabilities of Spark with an easy-to-use, developer-friendly SQL-style syntax.
Getting Started with Spark SQL
Basic Transformations
Managing Tables - Basic DDL and DML
Managing Tables - DML and Partitioning
Overview of Spark SQL Functions
Windowing Functions

Data Engineering using Spark Data Frame APIs
Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale using the distributed computing capabilities of Spark.
Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.
Data Processing Overview
Processing Column Data
Basic Transformations - Filtering, Aggregations, and Sorting
Joining Data Sets
Windowing Functions - Aggregations, Ranking, and Analytic Functions
Spark Metastore Databases and Tables

Desired Audience for this Data Engineering Essentials course
People from different backgrounds can aim to become Data Engineers. We cover most of the Data Engineering essentials, both for aspirants who want to get into the IT field as Data Engineers and for professionals who want to move their careers from legacy technologies towards Data Engineering.
College students and entry-level professionals who want hands-on expertise in Data Engineering. This course will provide enough skills to face interviews for entry-level data engineers.
Experienced application developers who want to gain expertise in Data Engineering.
Conventional Data Warehouse Developers, ETL Developers, Database Developers, and PL/SQL Developers who want to gain enough skills to transition into successful Data Engineers.
Testers who want to improve their testing capabilities for Data Engineering applications.
Any other hands-on IT professional who wants to learn about Data Engineering with hands-on practice.

Prerequisites to practice Data Engineering Skills
Here are the prerequisites for someone who wants to be a Data Engineer.

Logistics
Computer with decent configuration (at least 4 GB RAM; 8 GB is highly desired)
Dual core is required and quad core is highly desired
Chrome Browser
High-Speed Internet

Desired Background
Engineering or Science Degree
Ability to use a computer
Knowledge of or working experience with databases and any programming language is highly desired

Training Approach for learning required Data Engineering Skills
Here are the details of the training approach for mastering the key Data Engineering skills and propelling your career towards Data Engineering.
It is self-paced, with reference material, code snippets, and videos provided as part of Udemy.
One can either use the environment provided by us or set up their own environment using Docker on AWS, GCP, or the platform of their choice.
We recommend completing 2 modules every week by spending 4 to 5 hours per week.
It is highly recommended to complete the exercises at the end of each module to ensure that you meet all of its key objectives.
Support will be provided through Udemy Q&A.

The course is designed so that you can self-evaluate as you go and confirm whether the skills have been acquired. Here is the approach we recommend for taking this course.
The course is hands-on with thousands of tasks; you should practice as you go through the course.
You should also spend time understanding the concepts. If you do not understand a concept, I would recommend moving on and coming back to the topic later.
Go through the consolidated exercises and see whether you are able to solve the problems.
Make sure to follow the order we have defined as part of the course.
After each section or module, make sure to solve the exercises. We have provided enough information to validate the output.
By the end of the course, you can conclude that you have mastered the essential skills related to SQL, Python, and Spark.

Who this course is for
Computer Science or IT Students or other graduates with passion to get into IT
Data Warehouse Developers who want to transition to Data Engineering roles
ETL Developers who want to transition to Data Engineering roles
Database or PL/SQL Developers who want to transition to Data Engineering roles
BI Developers who want to transition to Data Engineering roles
QA Engineers to learn about Data Engineering
Application Developers to gain Data Engineering Skills