Myles Baker is a Solutions Architect who helps large enterprises develop Apache Spark applications using Databricks. His work on image processing software at NASA introduced him to distributed computing, and since then he has helped clients across multiple industries, including Comcast, build data science models and applications at scale. He received a B.S. in Applied Mathematics from Baylor University and an M.S. in Computer Science from the College of William and Mary.
Building, Scaling, and Deploying Deep Learning Pipelines with Apache Spark
Deep Learning has shown tremendous success, yet harnessing its power still takes significant effort: existing Deep Learning frameworks require substantial code just to work with a single model, let alone to train or apply one in a distributed manner.
This talk introduces Deep Learning Pipelines, a new open-source package for Apache Spark.
This package simplifies Deep Learning in three major ways:
1. It has a simple API that integrates well with enterprise Machine Learning pipelines.
2. It automatically scales out common Deep Learning patterns, thanks to Spark.
3. It enables exposing Deep Learning models through familiar Spark APIs such as MLlib and Spark SQL.
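To make the first two points concrete, here is a sketch of what scaled-out prediction looks like with the package (based on the early `sparkdl` API; class names, parameters, and the image-loading helper may differ across versions, and a configured Spark cluster with the package installed is assumed):

```python
# Sketch: batch prediction with a pre-trained deep learning model over a
# Spark DataFrame of images, using the sparkdl (Deep Learning Pipelines)
# package. Assumes Spark 2.3+ with sparkdl and its dependencies installed.
from pyspark.ml.image import ImageSchema
from sparkdl import DeepImagePredictor

# Load a directory of images into a DataFrame with an "image" column.
images_df = ImageSchema.readImages("/data/images")

# Apply a pre-trained InceptionV3 model; Spark scores each partition of
# images in parallel across the cluster, with no distributed-systems code.
predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3",
                               decodePredictions=True,
                               topK=5)
predictions_df = predictor.transform(images_df)
```

Because `DeepImagePredictor` is a standard MLlib Transformer, it composes with the rest of an enterprise ML pipeline just like any other stage.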
In this talk, we will look at the complex problem of image classification using Deep Learning and Spark. Using Deep Learning Pipelines, we will show:
- how to build deep learning models in a few lines of code;
- how to scale common tasks like transfer learning and prediction; and
- how to publish models in Spark SQL.
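The three bullets above can be sketched in a few lines of code (a hypothetical example assuming the early `sparkdl` API, a labeled training DataFrame `train_df` with an "image" column, and a saved Keras model file; exact signatures may vary by version):

```python
# Sketch: transfer learning and SQL publishing with Deep Learning
# Pipelines (sparkdl). Assumes a running Spark session with sparkdl
# installed; `train_df` and the model path are placeholders.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from sparkdl import DeepImageFeaturizer, registerKerasImageUDF

# Transfer learning: reuse InceptionV3 as a fixed feature extractor and
# train only a lightweight classifier on top of the extracted features.
featurizer = DeepImageFeaturizer(inputCol="image",
                                 outputCol="features",
                                 modelName="InceptionV3")
classifier = LogisticRegression(maxIter=20, regParam=0.05,
                                featuresCol="features", labelCol="label")
model = Pipeline(stages=[featurizer, classifier]).fit(train_df)

# Publishing in Spark SQL: register a Keras image model as a UDF so that
# anyone who knows SQL can score images.
registerKerasImageUDF("classify_image", "/models/my_model.h5")
# spark.sql("SELECT image, classify_image(image) AS label FROM images")
```

Both feature extraction and model fitting run as ordinary Spark jobs, so the same code scales from a laptop to a cluster without modification.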