Data Lake Training using Python

Code: P1874

3 days

List Tuition: $1,795.00 USD

Course Overview


Data lakes are emerging as an increasingly viable solution for extracting value from big data at the enterprise level, and represent the logical next step for early adopters and newcomers alike. The flexibility, agility, and security of having structured, unstructured, and historical data readily available in segregated logical zones bring a bevy of transformational capabilities to businesses.

What many potential users fail to understand, however, is what defines a usable data lake. Often, those new to big data, and even well-versed Hadoop veterans, will attempt to stand up a few clusters and piece them together with different scripts, tools, and third-party vendors. This method is neither cost-effective nor sustainable.

1. Introduction to Data Lakes

2. Introduction to Python and other languages

3. Lake Basics

4. Extract, Transform, and Load vs Extract, Load, and Transform

5. Transformers and Provisioners

6. Working with Basic Zones

a. Transient Zone

b. Raw Zone

c. Trusted Zone

d. Refined Zone

7. Source Connection Manager

a. Source Type, Credentials, Owner

8. Data Feed Configuration

a. Feed Name, Type (RDBMS/File/Streaming)

b. Mode: Incremental/Full/CDC

c. Expected Latency

d. Structure information, PII

9. Workflows of core components

a. Hadoop APIs for files

b. Sqoop for RDBMS

c. Kafka, Flume for streaming

10. Operational Stats

a. What, Who, When, Why

b. Failures and Notifications

c. SLA monitoring

11. Application Development Platform

a. Hadoop components: Spark, MapReduce, Pig, Hive

b. Abstract and build reusable workflows for common problems

12. Business Rules Integration

a. Rules provided by business

13. Workflow Scheduling / Management

a. Scheduling, dependency

b. Logging

14. Destination Connections

a. Destination Type, Credentials, Owner

b. Provisioning Metadata

c. Type (RDBMS/File/Streaming)

d. Filters if applicable

e. Mode: Full/Incremental

f. Frequency: daily / hourly / message

15. Scripts

16. Scaling Python

17. Debugging and Unit Testing Python
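
To give a feel for the outline topics on zones, data feed configuration, and core-component workflows, here is a minimal Python sketch of how a feed definition might be modeled. It is an illustration only, not course material: the names `Zone`, `FeedConfig`, `INGEST_TOOL`, and `next_zone` are hypothetical, and the assumption that data moves through the zones in the order listed (transient, raw, trusted, refined) is ours rather than something the outline states.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    """The four basic zones, in the order the outline lists them."""
    TRANSIENT = 1
    RAW = 2
    TRUSTED = 3
    REFINED = 4


class FeedType(Enum):
    RDBMS = "rdbms"
    FILE = "file"
    STREAMING = "streaming"


class LoadMode(Enum):
    INCREMENTAL = "incremental"
    FULL = "full"
    CDC = "cdc"


# Ingestion tool per feed type, taken from the "Workflows of core components" topic.
INGEST_TOOL = {
    FeedType.FILE: "Hadoop file APIs",
    FeedType.RDBMS: "Sqoop",
    FeedType.STREAMING: "Kafka or Flume",
}


@dataclass
class FeedConfig:
    """Hypothetical container for the fields named under Data Feed Configuration."""
    name: str
    feed_type: FeedType
    mode: LoadMode
    expected_latency_minutes: int
    columns: list = field(default_factory=list)     # structure information
    pii_columns: set = field(default_factory=set)   # columns flagged as PII

    def validate(self) -> None:
        if self.expected_latency_minutes <= 0:
            raise ValueError("expected latency must be positive")
        unknown = self.pii_columns - set(self.columns)
        if unknown:
            raise ValueError(f"PII columns not in structure: {sorted(unknown)}")


def next_zone(zone: Zone):
    """Return the zone after `zone`, or None once data is in the refined zone.

    Assumes data is promoted through the zones in declaration order.
    """
    members = list(Zone)
    idx = members.index(zone)
    return members[idx + 1] if idx + 1 < len(members) else None
```

For example, `FeedConfig("orders", FeedType.RDBMS, LoadMode.INCREMENTAL, 60, ["id", "email"], {"email"})` passes `validate()`, and looking up its `feed_type` in `INGEST_TOOL` points to Sqoop.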


Labs

Installing Python and writing basic scripts

Language features needed in all applications

Basic features of Python

Core of Python

Matrices

Basketball Dataset

Homework challenge

Demographic data

Movie ratings


We provide government and government contractor discounts; please request a quote.


Hotel and travel can be included in your quote.
For an immediate response, call 1-855-515-2170; otherwise, we will provide a quote within 4 business hours. Travel must be booked 14 days before training for the rate to apply.


Don’t see the date or location you need? Contact us and let us know; we add dates and locations daily.