Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation
Guoyao Xu1; Cheng-Zhong Xu1,2
Conference Name: Symposium on Cloud Computing (SoCC)
Conference Date: Sep 24-27, 2017
Conference Place: Santa Clara, CA

Modern in-memory distributed computation frameworks such as Spark leverage memory to cache intermediate data across multi-stage tasks in pre-allocated worker processes, so as to speed up execution. They rely on a cluster resource manager such as YARN or Mesos to reserve specific amounts of CPU and memory for workers ahead of task scheduling. Because a worker runs for an entire application and executes multiple batches of DAG tasks across stages, its memory demand changes over time [3]. Resource managers like YARN sidestep the non-trivial problem of determining the right amount of memory for each worker by requiring users to make explicit reservations before execution. Since the underlying execution frameworks, workloads, and complex codebases are opaque to them, users tend to over- or under-estimate workers' demands, leading to over-provisioning or under-provisioning of memory. We observed that there exists a performance inflection point with respect to memory reservation per application stage: beyond it, performance fluctuates little even under over-provisioned memory [1]. The inflection point is the minimum memory required to achieve near-optimal performance. We call these capacities optimal demands; they are the cut lines dividing over-provisioning from under-provisioning. To relieve users of this burden, and to provide guarantees of both maximum cluster memory utilization and optimal application performance, we present Prometheus, a system for online estimation of the optimal memory demand of each worker for each future stage, without user involvement. Exploring optimal demands is essentially a search problem correlating memory reservation with performance. Most existing search methods [2] need multiple profiling runs or prior historical execution statistics, which makes them inapplicable to online configuration of newly submitted or non-recurring jobs.
The optimal demands of recurring applications also change over time with variations in input datasets, algorithmic parameters, or source code, so rebuilding a search model for every setting is too expensive to be feasible. Prometheus adopts a two-step approach to tackle the problem:

1) For newly submitted or non-recurring jobs, it profiles the job's runtime memory footprints in a single pilot run under over-provisioned memory and performs a histogram frequency analysis on them. This yields a highly accurate (over 80%) initial estimate of the optimal demand per stage for each worker. By analyzing the frequency of past memory usage per sampling interval, we efficiently estimate the probability of base demands and distinguish them from unnecessarily excessive usage. Allocating the base demand tends to achieve near-optimal performance, and thus approximates the optimal demand.

2) The histogram frequency analysis algorithm has an intrinsic self-decay property. For subsequent recurring submissions, Prometheus exploits this property to perform an efficient recursive search that refines the estimate stepwise and rapidly converges on the optimal demand within a few recurring executions. We demonstrate that this recursive search incurs up to 3-4 times lower search overhead and is 2-4 times more accurate than alternative solutions such as random search.

We validate the design by implementing Prometheus atop Spark and YARN. The experimental results show that it achieves an ultimate accuracy of more than 92%. By deploying Prometheus and reserving memory according to the estimated optimal demands, one can improve cluster memory utilization by about 40% while reducing individual application execution time by over 35% compared with state-of-the-art approaches.
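The initial estimation step (step 1) can be sketched as follows. This is a minimal illustration of histogram frequency analysis over sampled memory footprints, not the paper's exact algorithm; the bucket size and coverage threshold are assumed parameters, and the sample data is synthetic:

```python
import numpy as np

def estimate_base_demand(samples_mb, bucket_mb=128, coverage=0.90):
    """Estimate a worker's base memory demand from pilot-run footprint samples.

    Buckets the sampled footprints into a histogram, then returns the smallest
    capacity whose buckets cover `coverage` of all samples, discounting the
    rare high buckets as unnecessarily excessive usage.
    """
    samples = np.asarray(samples_mb, dtype=float)
    edges = np.arange(0, samples.max() + bucket_mb, bucket_mb)
    counts, edges = np.histogram(samples, bins=edges)
    cum = np.cumsum(counts / counts.sum())
    # first bucket at which cumulative frequency reaches the coverage target
    idx = int(np.searchsorted(cum, coverage))
    return edges[idx + 1]  # upper edge of that bucket = base demand

# Synthetic pilot run: a stage mostly using ~2 GB, with brief ~6 GB spikes.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(2048, 100, 950),
                          rng.normal(6144, 100, 50)])
print(f"estimated base demand: {estimate_base_demand(samples):.0f} MB")
```

The coverage threshold controls the trade-off between reclaiming transiently used memory and risking under-provisioning; in this sketch the brief spikes are treated as excessive usage and excluded from the base demand.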
Overall, the optimal-memory-demand knowledge provided by Prometheus enables cluster managers to avoid both over-provisioning and under-provisioning of memory, achieving optimal application performance and maximum resource efficiency.

Keywords: Memory Demands Estimation; Resource Efficiency; Performance; Spark; YARN
Indexed By: SCIE
WOS Research Area: Computer Science
WOS Subject: Computer Science, Information Systems; Computer Science, Theory & Methods
WOS ID: WOS:000414279000079
Cited Times [WOS]: 2
Document Type: Conference paper
Collection: Personal research not belonging to the institution
Affiliation: 1. Wayne State University
2. Shenzhen Institutes of Advanced Technology
Recommended Citation
GB/T 7714: Guoyao Xu, Cheng-Zhong Xu. Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation[C], 2017.
APA: Guoyao Xu, & Cheng-Zhong Xu. (2017). Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation. Proceedings of the 2017 Symposium on Cloud Computing (SoCC '17).

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.