Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation
Guoyao Xu1; Cheng-Zhong Xu1,2
2017
Conference Name: Symposium on Cloud Computing (SoCC)
Source Publication: Proceedings of the 2017 Symposium on Cloud Computing (SoCC '17)
Conference Date: SEP 24-27, 2017
Conference Place: Santa Clara, CA
Abstract

Modern in-memory distributed computation frameworks like Spark leverage memory resources to cache intermediate data across multi-stage tasks in pre-allocated worker processes, so as to speed up execution. They rely on a cluster resource manager like Yarn or Mesos to pre-reserve specific amounts of CPU and memory for workers ahead of task scheduling. Since a worker lives for an entire application and runs multiple batches of DAG tasks from multiple stages, its memory demand changes over time [3]. Resource managers like Yarn sidestep the non-trivial allocation problem of determining the right amount of memory to provision for workers by requiring users to make explicit reservations before execution. Because the underlying execution frameworks, workloads and complex codebases are invisible to them, users tend to over-estimate or under-estimate workers' demands, leading to over-provisioning or under-provisioning of memory resources.

We observed that there exists a performance inflection point with respect to memory reservation per stage of an application: beyond it, performance fluctuates little even under over-provisioned memory [1]. This inflection point is the minimum memory required to achieve the expected near-optimal performance. We call these capacities optimal demands; they are the cut lines that separate over-provisioning from under-provisioning. To relieve the burden on users, and to provide guarantees of both maximum cluster memory utilization and optimal application performance, we present Prometheus, a system for online estimation of each worker's optimal memory demand per future stage, without any user effort.

Exploring optimal demands is essentially a search problem that correlates memory reservation with performance. Most existing search methods [2] need multiple profiling runs or prior historical execution statistics, which are not applicable to online configuration of newly submitted or non-recurring jobs. The optimal demands of recurring applications also change over time with variations in input datasets, algorithmic parameters or source code, so it becomes too expensive and infeasible to rebuild a new search model for every setting.

Prometheus adopts a two-step approach to tackle the problem: 1) For newly submitted or non-recurring jobs, we profile the job's runtime memory footprints in a single pilot run under over-provisioned memory and apply histogram frequency analysis, which yields a highly accurate (over 80% accuracy) initial estimate of the optimal demand per stage for each worker. By analyzing the frequency of past memory usages per sampling interval, we efficiently estimate the probability of base demands and distinguish them from unnecessarily excessive usages; allocating the base demand tends to achieve near-optimal performance and thus approaches the optimal demand. 2) The histogram frequency analysis algorithm has an intrinsic self-decay property. For subsequent recurring submissions, Prometheus exploits this property to perform an efficient recursive search that refines the estimate stepwise and rapidly reaches the optimal demand within a few recurring executions. We demonstrate that this recursive search achieves up to 3-4 times lower search overhead and 2-4 times higher accuracy than alternative solutions such as random search. We validate the design by implementing Prometheus atop Spark and Yarn. The experimental results show that it achieves an ultimate accuracy of more than 92%.
By deploying Prometheus and reserving memory according to the optimal demands, one can improve cluster memory utilization by about 40%, while simultaneously reducing individual application execution time by over 35% compared to state-of-the-art approaches. Overall, the knowledge of optimal memory demands provided by Prometheus enables cluster managers to effectively avoid over-provisioning or under-provisioning of memory resources, and to achieve optimal application performance and maximum resource efficiency.
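To make the two-step approach concrete, the sketch below is a minimal illustration (not the authors' implementation) of the two ideas named in the abstract: histogram frequency analysis of a stage's sampled memory footprints from a single over-provisioned pilot run, and a decay-based refinement across recurring runs. The bin width, coverage threshold and decay factor are hypothetical parameters chosen for illustration only.

from collections import Counter
from typing import Iterable, List


def estimate_base_demand(samples_mb: Iterable[float],
                         bin_mb: int = 128,
                         coverage: float = 0.95) -> int:
    """Estimate a stage's base memory demand (MB) from pilot-run samples.

    Buckets the sampled usages into bins and returns the smallest capacity
    that covers `coverage` of all sampling intervals; the rare tail above it
    is treated as unnecessarily excessive usage. Bin width and coverage are
    hypothetical parameters, not values from the paper.
    """
    bins = Counter(int(s) // bin_mb * bin_mb for s in samples_mb)
    total = sum(bins.values())
    covered = 0
    for level in sorted(bins):
        covered += bins[level]
        if covered / total >= coverage:
            return level + bin_mb  # reserve the full bin containing the cut line
    return max(bins) + bin_mb if bins else 0


def refine_demand(prev_estimate_mb: int,
                  new_samples_mb: List[float],
                  decay: float = 0.5) -> int:
    """Refine the estimate for a recurring stage across submissions.

    Blends the previous estimate with the new run's histogram-based estimate,
    geometrically decaying the old value so the recursive search converges
    within a few recurring executions.
    """
    new_estimate = estimate_base_demand(new_samples_mb)
    return int(decay * prev_estimate_mb + (1.0 - decay) * new_estimate)


if __name__ == "__main__":
    # Pilot run under over-provisioned memory: mostly ~2 GB with a brief spike.
    pilot = [2048.0] * 97 + [3800.0] * 3
    demand = estimate_base_demand(pilot)
    print("initial per-stage demand estimate (MB):", demand)  # cuts off the spike

    # A later recurring run whose working set has grown slightly.
    recurring = [2300.0] * 96 + [4000.0] * 4
    print("refined demand estimate (MB):", refine_demand(demand, recurring))

In the real system the estimation is performed online, per worker and per future stage, inside the resource management path; this sketch only shows the offline arithmetic on already-collected footprint samples.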

Keyword: Memory Demands Estimation; Resource Efficiency; Performance; Spark; Yarn
DOI: 10.1145/3127479.3132689
Indexed By: SCI
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Information Systems; Computer Science, Theory & Methods
WOS ID: WOS:000414279000079
Citation Statistics
Times Cited [WOS]: 1
Document Type: Conference paper
Collection: Faculty of Science and Technology
Personal research not belonging to the institution
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: 1. Wayne State University
2. Shenzhen Institutes of Advanced Technology
Recommended Citation
GB/T 7714
Guoyao Xu, Cheng-Zhong Xu. Prometheus: online estimation of optimal memory demands for workers in in-memory distributed computation[C], 2017.