Sridhar Alla is the director of big data solutions and architecture at Comcast, where he has delivered several key solutions, including the Xfinity personalization platform, clickthru analytics, the correlation platform, and marketing and analytics marts at petabyte scale. Sridhar started his career in network appliances, working on NAS storage and caching technologies. Previously, he served as the CTO of security company eIQNetworks, where he merged the concepts of big data with cybersecurity, compliance, and encryption. He holds patents on very large-scale processing algorithms and caching.
Data Science As A Service
Almost all organizations now need data science, and once the algorithm has been chosen, the main challenge is to scale it up and make it operational. We at Comcast use several tools and technologies, such as Python, R, SAS, and H2O. In this talk we will show how many common use cases rely on the same common algorithms, such as logistic regression, random forests, decision trees, clustering, and NLP.
Spark has several machine learning algorithms built in and scales very well. Hence we at Comcast built a platform that provides Data Science as a Service (DSaaS) on top of Spark, with a REST API for submitting and controlling jobs. This shields most users from the rigor of writing (and repeating) code, so they can focus on the actual requirements instead.
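As a rough illustration of what submitting a job through a REST API could look like from the user's side, the sketch below builds the JSON body for a training request. The field names, the list of algorithm identifiers, and the example input path are all hypothetical and are not taken from the Comcast platform; no network call is made.

```python
import json

# Hypothetical algorithm identifiers, loosely based on the families named in the abstract.
SUPPORTED_ALGORITHMS = {"logistic_regression", "random_forest", "decision_tree", "kmeans"}


def build_job_request(algorithm, input_path, label_column, params=None):
    """Return the JSON body for a hypothetical 'submit training job' REST call."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    spec = {
        "algorithm": algorithm,    # which built-in learner to run
        "input": input_path,       # where the training data lives
        "label": label_column,     # column to predict
        "params": params or {},    # algorithm-specific settings
    }
    return json.dumps(spec, sort_keys=True)


# Example request; the path and column name are made up for illustration.
body = build_job_request(
    "random_forest",
    "hdfs:///data/example.parquet",
    "churned",
    {"numTrees": 100},
)
```

The point of such a layer is that the user supplies only a declarative description of the job, while the service decides how to translate it into Spark code.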
We will show how we solved some of the problems of establishing feature vectors, choosing algorithms, and then deploying models into production. We will discuss what feature engineering is all about, the various techniques available, and how to scale to 20,000-column datasets using random forests, SVD, and PCA. We will also demonstrate how to build a service around these techniques to save time and effort when building hundreds of models.
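To make the dimensionality-reduction idea concrete, here is a minimal pure-Python sketch of PCA's first step: estimating the leading principal direction by power iteration and projecting the rows onto it. It is a toy under stated assumptions, not the platform's implementation; the function names and the sample data are invented for illustration, and at 20,000 columns one would rely on a distributed implementation such as the ones that ship with Spark.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Estimate the top principal direction of `rows` by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary positive start vector
    for _ in range(iterations):
        # Apply X^T X to v without materializing the d x d matrix.
        scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # data has no variance along v
            break
        v = [c / norm for c in w]
    return means, v


def project(rows, means, v):
    """Reduce each row to a single coordinate along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


# Toy data: the second column is twice the first, the third is constant.
data = [[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 6.0, 5.0], [4.0, 8.0, 5.0]]
means, direction = first_principal_component(data)
reduced = project(data, means, direction)
```

On this toy data the three columns collapse to one coordinate with no loss of information, which is the effect PCA and SVD aim for on wide datasets with many correlated or constant columns.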