With Cloudera Enterprise, you can build and operate an enterprise-grade Hadoop/Spark platform. What kind of platform do you need to make use of big data, and how do you get the most out of it? From the perspectives of data engineering and data science, I will introduce machine learning with SQL-on-Hadoop, Spark, and Python.
An introduction to using arbitrary Python packages on PySpark with Cloudera Data Science Workbench.
In this session, we will introduce the benefits of an integrated data analysis platform, which is important for using data in the enterprise, and explain how Cloudera prevents analysis environments from becoming siloed.
Developing and testing workflows productively is hard. In this session, I talk about how to develop heavy, data-dependent workflows with Digdag.
When running machine learning pipelines for training and inference, the systems and machine learning infrastructure vary depending on requirements such as the purpose of the application, data volume, and latency. Meanwhile, many companies in industry have built machine learning infrastructure based on their own in-house knowledge. This knowledge has not yet been organized: it consists largely of engineering effort, and engineers have little motivation to publish it, so there are few papers on system design or on the problems characteristic of machine learning infrastructure and systems. In this presentation, we will introduce challenges for machine learning systems, especially for continuous prediction in production environments, and approaches to addressing them.
Adopting a machine learning system is an essential step for enterprise companies to progress to the next stage of their business. However, machine learning systems tend to be complex, because they depend on different languages, libraries, and frameworks, such as scikit-learn, TensorFlow, and XGBoost. As a result, building a machine learning system in production poses many challenges, including determining which architecture is best for which use case, how to deploy your predictive models, and how to move from development to a production environment. I explain how to put your machine learning model into production, discuss common issues and obstacles you may encounter, and share best practices and typical architecture patterns for deploying ML models, with example designs from the Hadoop and Spark ecosystem using Cloudera Data Science Workbench.