Exeliq provides Apache Spark implementation services, and we recommend Spark to enterprises worldwide for its performance and versatility. Enterprises face a high volume and velocity of data from web and mobile apps, and to stay ahead of the curve, data processing and analysis must keep pace with Big Data applications. Spark gives your business that advantage, and it offers multiple analytics options such as machine learning, streaming analytics, and graph analytics.
Why Exeliq
Experienced Teachers
Learn from Exeliq’s experienced data science instructors, who are dedicated to teaching data analytics.
Jumpstart your Career
Learn the most sought-after tools and techniques in the industry to help jumpstart your data analyst career.
Designed for Working Professionals
This course is designed around the busy lives of working professionals, with a part-time schedule and recorded lectures.
Build a Series of Mini-Projects
Gain hands-on experience applying the tools employers value to real-world data sets. All powered by a 100-node cluster.
Live Lectures
Live lectures let you ask your instructor questions and interact with your classmates. Registered students can access demo lectures.
Small Class Sizes
Small class sizes ensure you have plenty of access to your instructor and can receive personalized feedback on your progress.
Prerequisites
- Intermediate Python and Spark/Scala
- Azure/AWS (S3, Redshift, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Data warehouse)
- Hadoop/Hive
Target Audience
This course is for:
- Programmers, Developers, Technical Leads, Architects
- Developers/Business Analysts aspiring to be a ‘Machine Learning Engineer’
- Data Scientists/Data Analysts who want to gain expertise in Predictive Analytics
- Python professionals who want to design automated predictive models
Syllabus
Spark Overview
1. Basic Spark Components
2. Spark Architecture
3. Low-Level API – RDDs & RDD Operations (Transformations and Actions)
4. Distributed Variables – Broadcast Variables & Accumulators
5. RDD – Partitions and Shuffling
Spark SQL and DataFrames
1. Reading from CSV, JSON, Parquet Files, JDBC
2. Writing Data in CSV, JSON, Parquet Files, JDBC
3. Use of DataFrames
4. Use of DataSets
5. Spark SQL
6. SQL Joins with DataFrames
7. Broadcast Join
8. Aggregations
9. UDF
10. Catalyst Query Optimization (Theory)
Spark Internals
1. Jobs, Stages and Tasks (Theory)
2. Partitions and Shuffling
Structured Streaming
1. Streaming Sources and Sinks
2. Structured Streaming APIs
3. Windowing and Aggregation
4. Checkpointing
5. Watermarking
6. Reliability and Fault Tolerance (Theory)
Machine Learning
1. Basics of Spark ML
2. Linear and Logistic Regression Algorithms
Graph Processing with GraphFrames
1. Basic GraphFrames API
Learning Outcomes
Upon completion of this course, you will be able to:
- Use Databricks for Python/Spark/R/Spark-SQL development
- Set up notebook jobs and Spark clusters
- Connect BI tools to Databricks
- Integrate the Databricks CLI
- Use the Databricks REST API
- Use Databricks for data visualization
- Use Databricks for ML/GraphX/predictive models
Testimonials
The data science skills I sharpened at The Data Incubator helped me analyze diversity in STEM education, model SaaS stock prices, and compare industry growth rates.
Christian Templeton, Google
From day one I was getting my hands dirty working with data using industry-relevant tools. Having completed the program I’m now better equipped to manage engineering and product teams.
Michelle Miller, Hack Reactor
Their heavy focus on applied learning meant that I was working on real data and solving real problems right from the start.
David Kwon, Yelp
A few of our 250+ Hiring and Training Partners
Frequently Asked Questions
Who is this course designed for?
This course is designed for anyone who would like to learn the essentials of data science. Students will gain hands-on experience working with some of the most in-demand big data technologies and leave ready to kickstart their data science careers.
When is the next session? How can I join?
You can find the dates of upcoming sessions and register via Exeliq’s Eventbrite page.
What is the approximate time commitment for this course?
Our Foundations course includes four hours of lectures each week. You’ll also want to allot extra time for reviewing lectures and working on the associated projects; we generally recommend setting aside at least four additional hours per week for this, but it’s up to you how much time you’d like to put in.
Does this class offer job search support?
As Foundations is intended to add a set of skills for students who are currently employed full time, we do not offer placement or job search support for this course.