Intermediate Python and Spark/Scala
Azure/AWS (S3, Redshift, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Data Warehouse)
Hadoop/Hive
This class is for you if you are one of the following:
Programmers, Developers, Technical Leads, Architects
Developers/Business Analysts aspiring to be a 'Machine Learning Engineer'
Data Scientists/Data Analysts who want to gain expertise in Predictive Analytics
'Python' professionals who want to design automatic predictive models
Upon completion of this course, you will be able to:
Use Databricks for Python/Spark/R/Spark-SQL development
Set up a job for a notebook and set up a Spark cluster
Set up a BI tool with Databricks
Integrate the command-line interface (CLI) with Databricks
Use the Databricks REST API
Use Databricks for data visualization
Use Databricks for ML/GraphX/predictive models
Basic Spark Components
Spark Architecture
Low-Level API – RDD & RDD Operations (Transformations and Actions)
Distributed Variables – Broadcast Variables & Accumulators
RDD – Partitions and Shuffling
Reading from CSV, JSON, Parquet Files, JDBC
Writing Data in CSV, JSON, Parquet Files, JDBC
Use of DataFrames
Use of DataSets
Spark SQL
SQL Joins with DataFrames
Broadcast Join
Aggregations
UDF
Catalyst Query Optimization (Theory)
Jobs, Stages and Tasks (Theory)
Partitions and Shuffling
Streaming Sources and Sinks
Structured Streaming APIs
Windowing and Aggregation
Checkpointing
Watermarking
Reliability and Fault Tolerance (Theory)
Basics of Spark ML
Linear and Logistic Regression ML Algorithms
Basic GraphFrames API
The next Data Science Foundations Online Course will run from 2019-03-05 to 2019-04-25. Classes are generally held on Tuesdays and Thursdays from 6:30-9:30 PM ET / 3:30-6:30 PM PT, with some exceptions for holidays. The deadline for registration is 2019-02-22. The course tuition is $3495 with early-bird discounts available.
The exact dates for the next session will be: 3/5, 3/7, 3/12, 3/14, 3/19, 3/21, 3/25, 3/28, 4/2, 4/4, 4/9, 4/11, 4/16, 4/18, 4/23, 4/25