# Learning Data Placement with RLRP in Modern Distributed Storage Systems

RLRP: Highly Efficient Data Placement with Reinforcement Learning for Modern Distributed Storage Systems


## Abstract

Modern distributed storage systems with massive data and storage nodes place higher demands on data placement strategies. With the emergence of new storage devices, heterogeneous storage nodes have also become increasingly common. However, traditional strategies show great limitations in the face of these requirements; in particular, they do not adequately consider the distinct characteristics of heterogeneous storage nodes, which leads to suboptimal performance. In this paper, we present and evaluate RLRP, a data placement strategy based on deep reinforcement learning (RL). RLRP constructs placement and migration agents with the Deep Q-Network (DQN) model to achieve fair distribution and adaptive data migration. Besides, RLRP provides optimal performance for heterogeneous environments with an attentional Long Short-Term Memory (LSTM) model. Finally, RLRP adopts Stagewise Training and Model fine-tuning to accelerate the training of RL models with large-scale state and action spaces. RLRP is implemented on Park, and the evaluation results indicate that RLRP is a highly efficient data placement strategy for modern distributed storage systems: it reduces read latency by 10%~50% in heterogeneous environments compared with existing strategies. In addition, RLRP is applied to the real-world system Ceph, improving the read performance of Ceph by 30%~40%.


## Introduction


With the explosive growth of data, emerging distributed storage systems with massive amounts of data and larger clusters face great challenges in performance, scalability and reliability. The primary task is to distribute petabytes of data among tens or hundreds of thousands of storage devices, so an efficient, fair and adaptive data placement algorithm is necessary. Poor data placement often leads to imbalanced resource usage, making part of the system an unnecessary bottleneck and causing longer delays, wasted storage space, and diminished reliability.

Modern distributed storage systems have the following characteristics, which bring higher requirements and challenges to data placement algorithms. 1) Larger data volume and larger number of nodes. It is more difficult to distribute storage resources fairly while ensuring time and space efficiency. 2) System scalability. Expansion and data migration are very common in modern distributed storage systems. A scalable algorithm is necessary, one that can adjust the placement of data in response to changes in storage scale and redistribute data fairly with a small number of migrations. 3) Higher reliability requirements. In petabyte-scale storage systems, device failures occur almost every day, and frequent failures cause serious data loss[??]. Therefore, the algorithm must support disaster recovery mechanisms such as data replication and erasure codes. 4) Heterogeneous storage nodes. With the development of storage devices, more high-performance and low-latency devices, such as non-volatile memory (NVM), NVMe and Optane SSDs, are used in different nodes, and nodes differ in capacity, network, and I/O bandwidth. The algorithm should comprehensively consider the heterogeneity of nodes and distribute data reasonably to achieve the best performance of the system. These requirements pose great challenges to the data placement algorithm.


Traditional algorithms make trade-offs among these requirements and show great limitations in modern distributed storage systems. The most representative data distribution approaches include: 1) Global mapping strategies, which provide a global mapping (a table or directory) between data blocks and storage nodes for locating data items and replicas, and are adopted by many distributed file systems such as GFS[?] and HDFS[?]. However, as the amount of data increases, tables or directories grow linearly in the number of data blocks, resulting in higher storage space and data lookup costs. 2) Decentralized mapping, represented by hashing. Decentralized schemes eliminate the cost of maintaining global mappings by providing a compact function (hashing, pseudo-hashing, etc.) between data and servers. Hash-based schemes including Linear Hashing[?], Extendible Hashing[?], RUSH, Consistent Hashing[?], CRUSH[?] and Random Slicing are widely used in distributed systems. For example, Amazon's Dynamo optimizes consistent hashing with virtual nodes to achieve high scalability, and Ceph uses the pseudo-random CRUSH algorithm to provide high flexibility and reliability. But these schemes also have serious flaws in modern storage environments. Take CRUSH as an example: its replica selection strategy often results in unbalanced data placement and uncontrolled data migration. In addition, these algorithms do not consider the heterogeneity of devices, or only consider some differences such as node capacity, which makes it difficult to achieve the best performance of heterogeneous nodes in modern storage systems.




Many studies have shown that the data distribution problem with replicas in modern storage systems can be regarded as a combinatorial optimization problem. In recent years, Reinforcement Learning (RL) has been widely used to solve various system problems and provides a good way to solve combinatorial optimization problems. Therefore, we propose RLRP (RL-based Replica Placement), a new data and replica placement scheme using RL models for modern distributed storage systems. RLRP overcomes the limitations of current data distribution strategies by incorporating lessons learned from table-based and pseudo-random hashing strategies, addresses the challenges of training RL models, and provides optimal performance for heterogeneous systems. The main contributions of our paper are summarized as follows:

• We propose RL-based data placement and data migration algorithms (a Placement Agent and a Migration Agent built on DQN), which improve the fairness and adaptivity of the storage system.
• We propose Stagewise Training and Model fine-tuning to accelerate the training of RL models with large-scale state and action spaces.
• We design an attentional LSTM placement model for heterogeneous environments.
• We implement our framework on Park and apply it to the real-world system Ceph.

The rest of this paper is organized as follows. Section 2 describes the background and motivation behind our work. We introduce the design and implementation details of RLRP in Sections 3 and 4. Section 5 presents our comprehensive experiments and evaluation result analysis. Related work is discussed in Section 6. Finally, Section 7 concludes the paper.

## Background and Motivation

• Data Placement Algorithm


In distributed storage systems, the task of the data placement algorithm is to distribute data to various devices, which can be regarded as a typical instance of the "balls into bins" model. In the following, we use a data object (or just object) to refer to each logical piece of data, the "ball" in the model. In the storage system, an object is the unit of data allocation and access; it can be a traditional file fragment, a data block group, or an object in object-based storage. The storage system consists of a large number of back-end storage nodes (called Data Nodes, DNs), each with one or more CPUs, local memory, and locally attached devices for storing data; a DN is the "bin" in the model. The task of the data placement strategy is therefore to take instructions from front-end clients and distribute objects to Data Nodes. Clients of the system can read, create, and update objects. Based on the characteristics of modern distributed storage, data placement schemes can be compared using the following criteria:

1) Fairness: A scheme is called fair if the fraction of objects stored at a data node is equal (or at least close) to its share of the total capacity of the system. Fairness can be measured by the standard deviation of the relative weights (number of objects divided by weight) of the data nodes.

2) Adaptivity: A scheme is called adaptive if it can achieve fair redistribution with a near-minimum amount of migrated data whenever the number of objects, the number of data nodes, or the capacities of the system change. Adaptivity can be measured by the ratio of the amount of data migrated by the scheme to the theoretically optimal amount of migration when the system scale changes.

3) Redundancy: A scheme can be called redundant if it adopts multiple replicas or erasure codes when storing data.

4) High performance: A scheme is high-performance if it takes data node heterogeneity, including node load, network bandwidth, etc., into account to achieve the best performance of the system.

5) Time and Space Efficiency. We call a scheme time and space efficient if it can compute the position of an object with a low time and space complexity.



Reinforcement Learning[?] is an area of machine learning concerned with how artificial agents ought to take actions in a certain environment with the goal of maximizing some notion of cumulative reward. RL problems can be formulated as a Markov decision process (MDP), which describes the interaction between agent and environment; the State (S), Action (A) and Reward (R) constitute its basic elements. Specifically, in RL the environment is the world that the agent lives in and interacts with. At every step of interaction, the agent sees a (possibly partial) observation of the state of the world and then decides on an action to take. The environment changes when the agent acts on it, but may also change on its own. The agent also perceives a reward signal from the environment, a number that tells it how good or bad the current world state is. The goal of the agent is to maximize its cumulative reward, and RL methods are ways for the agent to learn behaviors that achieve this goal.

In general, RL can be divided into two categories: value-based and policy-based methods. A value-based method outputs the value or benefit (generally referred to as the Q-value) of all actions and chooses the action with the highest Q-value, and is obviously limited to discrete action spaces. A policy-based method is applicable to continuous action spaces; its output is a policy rather than a value, and an action can be output immediately according to the policy. A value-based method is sufficient for RLRP because its action space is discrete.


Q-Learning and Deep Q-Network (DQN) are the most classic value-based RL methods. Each action in a certain state has a value (Q-value), defined as Q(s, a). Q-learning records Q-values in a Q-table, usually a two-dimensional table of states and actions. The policy is to choose, given the state, the action with the maximum Q-value in the Q-table. The Q-value is updated at each step according to the Bellman equation:
$Q(s_t, a_t) \gets Q(s_t, a_t) + \alpha[r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)]$ where $\alpha$ is the learning rate, $0 < \alpha \leq 1$; $r$ is the reward received when moving from state $s_t$ to state $s_{t+1}$; and $\gamma$ is a discount factor, which favors short-term reward when close to zero and strives for long-term high reward when approaching one.
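As a concrete illustration, the following is a minimal sketch of tabular Q-learning with this update rule; the environment, hyperparameter values and function names are illustrative, not part of RLRP.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration
Q = defaultdict(float)                    # Q-table: (state, action) -> Q-value

def choose_action(state, actions):
    """Epsilon-greedy policy over the Q-table."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next, actions):
    """Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```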


Deep Q-Network (DQN) is adopted instead of Q-learning because Q-learning can hardly handle the large state spaces that arise in data placement problems in large-scale distributed systems. DQN uses a neural network rather than a Q-table to evaluate the Q-value, which can be described as follows:

$Q(s, a, w) \approx Q(s, a)$

where $w$ in $Q(s, a, w)$ represents the weights of the neural network in DQN. This is a function approximation technique\cite{dqn}. However, when a nonlinear function approximator such as a neural network is used to represent the Q-value, reinforcement learning may be unstable or even divergent. This stems from the correlations present in the sequence of observations, the fact that small updates to the Q-value may significantly change the policy of the agent and the data distribution, and the correlations between the Q-value and the target values. Experience replay, a common optimization technique for DQN, uses a random sample of prior transitions instead of the most recent one to proceed; it removes correlations in the observation sequence and smooths changes in the data distribution. Iterative updates adjust Q towards target values that are only periodically updated, further reducing correlations with the target.
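Below is a minimal sketch of an experience replay buffer and the target computation, assuming the Q-network and target network are callables mapping a state batch to per-action Q-values (all names and sizes here are illustrative):

```python
import random
import numpy as np
from collections import deque

replay = deque(maxlen=100_000)           # experience replay buffer

def store(s, a, r, s_next):
    replay.append((s, a, r, s_next))     # transitions are sampled later, not used at once

def sample_batch(batch_size=32):
    batch = random.sample(replay, batch_size)    # random sampling breaks correlations
    s, a, r, s_next = map(np.array, zip(*batch))
    return s, a, r, s_next

def td_targets(target_q_values, r, gamma=0.9):
    """y = r + gamma * max_a' Q_target(s', a'), with Q_target updated only periodically."""
    return r + gamma * target_q_values.max(axis=1)
```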


• Challenges

To the best of our knowledge, this is the first work that uses reinforcement learning to optimize the data distribution of storage systems, which faces daunting challenges: 1) How to model the problem and define the elements of reinforcement learning? 2) How to deal with migration? 3) How to speed up training when the number of nodes and the amount of data are large? 4) How to balance performance and fairness in a heterogeneous environment? RLRP solves these problems well and provides a good case of reinforcement learning applied to system problems.

The problem, due to its generality, is studied in many other disciplines (e.g., game theory, control theory). In machine learning, the environment is typically formulated as a Markov Decision Process (MDP), since many reinforcement learning algorithms in this context utilize dynamic programming techniques. The main difference between reinforcement learning and classical dynamic programming is that RL does not assume knowledge of an exact mathematical model of the MDP. An advantage of RL algorithms is that they can target large MDPs where exact methods become impracticable.

## Design

In this section, a new RL-based data placement scheme for distributed storage systems is described. The whole RLRP framework and its RL models are introduced first. Then, model training, heterogeneous environment handling and the implementation are introduced in detail.

• RLRP System

The overall RLRP architecture is shown in Fig. \ref{RLRP}. The RLRP framework defines the mapping of objects to DNs (see Section ?) and provides interfaces for upper-layer operations on objects, including object creation, deletion, and migration. Based on the RL models, the basic elements of Environment and Agent are defined as follows.


Instead of directly distributing objects to DNs, we first map objects to Virtual Nodes (VNs), the same concept as virtual nodes in Dynamo, placement groups (PGs) in Ceph, and partitions in OpenStack Swift. Each data object is mapped to a virtual node through a hash function, which applies the identification (e.g., object ID or name) of the data object in a modulo operation over the total number of virtual nodes, thereby defining which virtual node the object belongs to. Since the hash function outputs uniformly distributed hash values, the number of objects in every virtual node is balanced. The virtual node is the basic unit of object management, data migration, and fault recovery; it eases the difficulty of managing a huge number of objects and thus reduces the burden of data distribution. The number of virtual nodes is set before the system starts and does not change frequently, since changing it would cause huge data movements as a side effect. By default, assuming the number of data nodes is $N_d$ and the number of replicas is $R$, the recommended number of virtual nodes is calculated as $V = 100 \times N_d / R$, and $N_v$ equals $V$ rounded to the nearest power of 2. For example, if the number of replicas is 3 and the number of data nodes is 100, 200, or 300, then $V$ equals 3333.3, 6666.7, or 10000, and the number of virtual nodes equals 4096, 8192, or 8192, respectively.
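A minimal sketch of this sizing rule (the function name is illustrative); the assertions reproduce the examples above:

```python
def num_virtual_nodes(n_data_nodes: int, replicas: int) -> int:
    """V = 100 * N_d / R, rounded to the nearest power of two."""
    v = 100 * n_data_nodes / replicas
    lo = 2 ** (int(v).bit_length() - 1)   # nearest powers of two surrounding v
    hi = 2 * lo
    return lo if v - lo <= hi - v else hi

assert num_virtual_nodes(100, 3) == 4096   # V = 3333.3
assert num_virtual_nodes(200, 3) == 8192   # V = 6666.7
assert num_virtual_nodes(300, 3) == 8192   # V = 10000
```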


A Replica Placement Mapping Table (RPMT) generated by RLRP is responsible for defining the mapping of virtual node replicas to data nodes. A virtual node is replicated multiple times on different data nodes and the number of replicas (the replication factor) is determined by the upper-layer application. In the example shown in Fig. \ref{RLRP}, the replication factor is 2, and VN1 selects DN1 and DN2 as the data nodes where the replicas are stored. RPMT can determine the location of the master replica, which is the node that is accessed by read operations and is first written and then copied to other replicas during the write process. When each virtual node is created, the RLRP will consider the states of the data nodes, select the data nodes by RL Agent, and add the mapping to the RPMT. The read operation for each object first finds the virtual node through the hash function, and then finds the data node by looking up the RPMT. Due to the grouping by virtual nodes, the table (RPMT) will not grow linearly in the number of objects like global mapping strategies.
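As a sketch, the read path described above might look as follows; the hash choice and names are illustrative, and the RPMT is shown as a simple VN-to-DN-list dictionary rather than the matrix form defined later:

```python
import hashlib

NUM_VNS = 4096
rpmt = {vn: [] for vn in range(NUM_VNS)}    # VN -> [primary DN, other replica DNs]

def object_to_vn(object_id: str) -> int:
    """Hash the object identifier, then take it modulo the number of VNs."""
    digest = hashlib.md5(object_id.encode()).hexdigest()
    return int(digest, 16) % NUM_VNS

def locate(object_id: str) -> list:
    """Read path: object -> hash -> virtual node -> RPMT lookup -> data nodes."""
    return rpmt[object_to_vn(object_id)]     # first entry is the primary replica
```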

Agent is the main body of the RL model, divided into a Placement Agent and a Migration Agent, which make decisions for placement and migration respectively. The basic framework of the RL Agent is shown in Fig. ??. The two agents use the same model framework, a standard DQN model, but define different elements. The DQN uses an MLP as its neural network by default and an RNN+Attention model in heterogeneous environments. We first introduce the basic Placement Agent, which only considers DN capacity differences; then the basic Migration Agent; then how to accelerate and optimize training; and finally the placement and migration models for heterogeneous environments, which consider the various factors that affect performance. Note that the framework of the Agent in heterogeneous environments is the same; only the neural network model differs.


Common Interface is the interface between RLRP and the Environment, consisting of the Metrics Collector and the Action Controller. The Metrics Collector obtains the states and attributes of DNs, as well as system performance and latency, from the Environment. The attributes of DNs serve as the features of training samples in RL and can be selected differently for different environments; especially under heterogeneous conditions, selecting the key features is very important. These metrics are sent to the RL Agent as the state ($s_t$) and reward ($r_t$) information. The Action Controller applies the replica placement or data migration actions suggested by the RL Agent by updating the Replica Placement Mapping Table.


Memory Pool stores training-related parameters, including the DQN experience replay buffer, model parameters, and so on.


RL Agent can be regarded as the brain of the RLRP system: it receives reward and state from the Environment and updates its policy to guide how to distribute replicas for higher reward. As shown in Fig. ??, agents consist of the Placement Agent and the Migration Agent, which make decisions for placement and migration respectively. Data migration means that when the storage scale changes (data nodes increase or decrease), data or replicas are adjusted to achieve fair redistribution. The two agents use the same model framework, a standard DQN model, but define different elements. To speed up RL training, the Agent generates experience in parallel (stored in the Memory Pool) and performs experience replay once the experience buffer reaches the batch size, for training and parameter updating. In addition, the neural network models of the DQN differ between heterogeneous and non-heterogeneous environments: an MLP is used by default, while an LSTM (an RNN model) with an Attention mechanism is used in heterogeneous environments. In the non-heterogeneous environment, the basic Placement Agent and Migration Agent only consider the capacity differences of DNs, while in the heterogeneous environment various factors affecting performance are considered.

Next, we first introduce the basic Placement Agent, the Migration Agent and the training optimization of the model in non-heterogeneous environments. The model for heterogeneous environments and the implementation follow.


• Placement Agent

At each step of interacting with the environment, the Placement Agent determines, according to the current state of the DNs, the action of placing replicas for each VN, and then the reward is calculated and fed back to the agent. Let $n$ be the number of data nodes in the system. When calculating the replica locations of the $i$-th VN (VN$_i$), the state-action-reward configuration of our problem can be described as follows.

\textbf{State:} We describe the state at time step $t$ as $S_t$, the load state of the data nodes when considering VN$_t$. $S_t$ is defined as the list of relative weights of all data nodes: $S_t = \{w_0, w_1, \ldots, w_n\}$, where $w_k$ equals the total number of VNs on DN$_k$ at time step $t$ divided by the capacity of DN$_k$, indicating its relative load.


\textbf{Action:} The set of actions is $A = \{DN_0, DN_1, \ldots, DN_n\}$, the indices of all data nodes. The action at time step $t$ is the $k$-tuple $A_t = (DN_{i_1}, DN_{i_2}, \ldots, DN_{i_k})$, which gives the data nodes where the $k$ replicas of VN$_t$ are stored. The first one holds the master replica; the specific replica selection algorithm is described later. By default, the selected data nodes differ from each other to improve reliability, but if $n$ is less than $k$, duplicate replicas may be placed on the same node.


\textbf{Reward:} We use $STD$ to denote the standard deviation of $S_t$, which directly reflects the balance of the nodes at that moment. $R_t$ is the negative of this standard deviation: $R_t = -STD$. The larger the standard deviation of $S_t$, the more imbalanced the nodes and the smaller $R_t$; conversely, a larger $R_t$ encourages the agent to distribute virtual nodes in the direction of smaller standard deviation.
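In code, the state and reward definitions reduce to a few lines (a sketch; names are illustrative):

```python
import numpy as np

def relative_weights(vn_counts, capacities):
    """State S_t: per-node relative weight = number of VNs / capacity."""
    return np.asarray(vn_counts, dtype=float) / np.asarray(capacities)

def reward(state):
    """R_t = -STD: more balanced nodes give a smaller std and a larger reward."""
    return -np.std(state)
```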

As shown in Fig. \ref{RL_NN}, the input of the Placement Agent is the load state of each data node, and the output is the $k$ data nodes storing the replicas, which are added to the mapping table. The RPMT is a matrix of $RPMT_{dv}$ cells, where each cell $\in \{0, 1, 2\}$. The matrix has size $D \times V$, in which a row represents a data node $d \in D$ and a column represents a virtual node $v \in V$. A virtual node is replicated $k$ times, and a replica of virtual node $v$ resides on data node $d$ if the $RPMT_{dv}$ cell value is 1 or 2 (otherwise it is 0), where 1 denotes the primary replica of the virtual node and 2 denotes another replica. The Placement Agent uses a 2x128-node MLP by default, that is, two hidden layers with 128 nodes each. The specific model configuration can be set according to the actual environment.

The Replica Placement Algorithm specifically describes the Placement Agent process. A list a_list stores all actions (indices of data nodes for replica placement), and the algorithm stops when the length of a_list reaches the replication factor. By default, the number of data nodes is greater than the number of replicas, so a_list cannot contain duplicate elements. The algorithm therefore preferentially selects the action with the largest Q-value; if the action coincides with a previously selected one, the action with the second-largest Q-value is selected as a substitute, and so on. Specifically, action selection is achieved by calling the E function, which applies the $\epsilon$-greedy method: with probability $\epsilon$ the action is selected randomly, otherwise actions are considered in descending order of Q-value. If the number of data nodes is smaller than the number of replicas, action selection ignores overlap; this rarely occurs in practice and is not shown in the algorithm. After each action is selected, the U function is called to perform the replica placement with a_list: the state of the data nodes is updated and returned as the new state, and the mapping table is updated. The standard deviation calculated from the new state is returned as the reward. A tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the replay memory $D$, to be used for training the agent and updating the neural network parameters.
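A minimal sketch of this selection loop, assuming one Q-value per data node (the function and parameter names are illustrative, and distinctness is only enforced in the common $n > k$ case):

```python
import random
import numpy as np

def select_replicas(q_values, k, epsilon=0.1):
    """Pick k data nodes for one VN; a_list[0] holds the primary replica."""
    n = len(q_values)
    ranked = list(np.argsort(q_values)[::-1])   # actions by descending Q-value
    a_list = []
    while len(a_list) < k:
        if random.random() < epsilon:
            choice = random.randrange(n)        # explore
        else:                                   # exploit: best non-conflicting action
            choice = next((a for a in ranked if a not in a_list), ranked[0])
        if choice not in a_list or n <= k:      # skip duplicates when n > k
            a_list.append(choice)
    return a_list
```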

Node changes can be divided into two situations: adding and removing nodes. Removing nodes still uses the Placement Agent model. Specifically, assume node DN$_m$ is removed; the VN replicas on DN$_m$ are remapped to other nodes via the Placement Agent under two limitations. Limitation 1: DN$_m$ cannot be selected as an action. Limitation 2: when $n > k$, replica conflicts are not allowed, i.e., a data node that already holds a replica of the VN cannot be selected. Removing nodes requires retraining the Placement Agent for subsequent placement decisions.

• Migration Agent

The Migration Agent is used in scenarios of adding nodes or manually triggered data migration, to improve the adaptivity of data placement. When a data node is added, it is necessary to consider migrating replicas to the new node for each virtual node. The state-action-reward configuration of the Migration Agent is the same as that of the Placement Agent except for the action, which is described as follows:


\textbf{Action:} The action set is the constant set $\{0, 1, \ldots, k\}$. Different values of the action $a$ indicate migration commands for the different replicas of each VN. If $a$ equals 0, the VN migrates no replica to the new node. If $a$ equals any value $i$ from 1 to $k$, the $i$-th replica of the VN, found via the RPMT, is migrated to the new node. Take a three-replica system as an example: the action space is $A = \{0, 1, 2, 3\}$, and the replicas of VN$_i$ reside on (DN$_k$, DN$_j$, DN$_l$). When the action equals 0, VN$_i$ performs no replica migration. The replica on DN$_k$ is migrated when the action is 1; likewise, when the action is 2 or 3, the replica on DN$_j$ or DN$_l$ is migrated to the new node.
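In code, this action semantics is a one-line table update (a sketch with illustrative names; `rpmt` maps each VN to its ordered replica list):

```python
def apply_migration(rpmt, vn, action, new_dn):
    """Action 0: keep all replicas; action i in 1..k: move the i-th replica."""
    if action == 0:
        return                       # no replica of this VN is migrated
    rpmt[vn][action - 1] = new_dn    # migrate the i-th replica to the new node
```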

Through the above action definition, it is ensured that some replicas of virtual nodes are migrated to a new node after it joins the system, and the standard deviation is still used as the reward to ensure balance across data nodes after migration, which meets the adaptivity requirement. Migration Agent training is similar to Placement Agent training. In addition, after nodes are added, the Placement Agent also needs to be retrained.

• Training Acceleration

As shown in Fig. 1 (a), a finite state machine (FSM) is constructed to control the training process, which consists of model training and model testing. The training FSM can be divided into six states. The Initialization state initializes all training parameters and the parameters of the agent model, after which the FSM enters the Training state. Unlike the traditional fixed number of training epochs, the training FSM sets a lower limit Emin and an upper limit Emax on the number of epochs. When the number of training epochs reaches Emin, the FSM enters the Check state, which checks the quality of training: the standard deviation R of the data node states after training is calculated, and a result is called qualified only if R is less than or equal to 1. If the training is qualified, the FSM moves to the Testing state; otherwise it goes back to the Training state. The Testing state uses stop as a stop counter, with N a constant. After each test epoch, stop is incremented if the test result is qualified; otherwise the FSM returns to the Check state. The training process completes only when the results are qualified for N consecutive test epochs. To prevent training from failing to converge, if the training epochs exceed Emax, the FSM enters the Time_out state and an error is reported. The user-specified parameter Re indicates whether to restart the training (return to the Initialization state and initialize new parameters) or to end directly. The following describes the training process of each epoch in detail.
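A compressed sketch of this control flow, assuming `train_epoch()` and `test_epoch()` each run one epoch and return the resulting standard deviation R (all names and limits are illustrative):

```python
def run_training_fsm(train_epoch, test_epoch, e_min=50, e_max=500, n_stop=5):
    epoch = 0
    while epoch < e_max:
        r = train_epoch()                 # Training state: one epoch, returns R
        epoch += 1
        if epoch < e_min or r > 1.0:      # Check state: qualified only if R <= 1
            continue                      # keep training
        stop = 0                          # Testing state
        while stop < n_stop and test_epoch() <= 1.0:
            stop += 1
        if stop == n_stop:
            return True                   # N consecutive qualified tests: done
    return False                          # Time_out state: restart or end per Re
```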

Algorithm \ref{alg:Training} lists the pseudo code of the training algorithm, which is the classic DQN training algorithm \cite{dqn}. First, the tuples in the replay memory $D$ are replayed periodically to generate mini-batches $B$. For each mini-batch transition $(s, a, r, s')$ ($s'$ is the next state of $s$), the target value is calculated as $y = r + \gamma \max_{a'} Q(s', a'; \theta)$, where $\gamma$ is a discount factor and $\theta$ is the weight vector of the $Q$ network. Unlike the target value of traditional DQN, there is no terminal-state case, because in our environment there is no terminal state. Next, the parameters $\theta$ of the $Q$ network are updated with the mini-batch stochastic gradient descent (SGD) method \cite{playing} to minimize the training objective:

$\min L(\theta) = \mathbb{E}[(y - Q(s, a; \theta))^2]$, which minimizes the expectation of the squared difference between the target value and the output of the $Q$ network.
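A minimal sketch of one such training step, assuming `q_net` is a tf.keras model mapping a state batch to per-action Q-values and `optimizer` is a tf.keras optimizer (both illustrative):

```python
import tensorflow as tf

def train_step(q_net, optimizer, s, a, r, s_next, gamma=0.9):
    # Target y = r + gamma * max_a' Q(s', a'); no terminal case in this setting.
    y = r + gamma * tf.reduce_max(q_net(s_next), axis=1)
    with tf.GradientTape() as tape:
        q_sa = tf.gather(q_net(s), a, batch_dims=1)   # Q(s, a) of the taken actions
        loss = tf.reduce_mean(tf.square(y - q_sa))    # E[(y - Q(s, a; theta))^2]
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return loss
```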

However, in the actual training process, as the number of objects, virtual nodes and data nodes increases, the training time needed to meet the requirement above (R <= 1) also grows considerably. To speed up training, two optimizations are adopted: Stagewise Training optimizes training when virtual nodes increase, while Model fine-tuning (a form of transfer learning) is used when data nodes increase. In addition, the state space becomes very large during training. In our environment, the standard deviation measures the quality of a state, and from this perspective many states are effectively the same. For example, consider the two states (100, 200, 300) and (0, 100, 200): their standard deviations are both 81.6, so they can be considered equivalent, meaning the action selected in the two states should be the same. Therefore, the relative state set, obtained by subtracting the smallest element from all elements of the original state set, is used in the training process, eliminating a lot of redundant states. Of course, the real load state must still be maintained in the system.
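This reduction is a one-liner; the assertions check the document's example that the shift leaves the standard deviation (and hence the reward) unchanged:

```python
import numpy as np

def relative_state(state):
    """Subtract the minimum element; std (and thus the reward) is unchanged."""
    s = np.asarray(state, dtype=float)
    return s - s.min()

assert np.allclose(relative_state([100, 200, 300]), [0, 100, 200])
assert np.isclose(np.std([100, 200, 300]), np.std([0, 100, 200]))  # both 81.6
```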

Stagewise Training. In modern storage systems, the number of objects is very large, and even after hash grouping the number of virtual nodes remains large; the number of steps in one training epoch equals the number of virtual nodes. In practice, training on a small sample gives unsatisfactory results, while training on a large sample takes very long. To this end, as shown in Fig. 1, a stagewise training method is adopted. First, we use a larger number of samples ($n$ in the figure) as the training data set and randomly divide it into many small samples of size $m$. Assuming $n = k \cdot m + b$, the large sample can be divided into $k+1$ small samples in total; by default $k$ is set to 10, and $m$ and $b$ are computed from the large sample size $n$. Each sample maintains a training FSM (TFSM). As shown in Fig. 3, a small sample of size $m$ is trained starting from the Initialization state of the TFSM, and the trained model becomes the base model; the state changes from S0 to S1. Then, based on state S1, the base model is directly tested in the Testing state of the TFSM on the second small sample. If the test result is qualified, training proceeds to the next sample stage; otherwise the base model is retrained in the Training state of the TFSM with the second small sample. And so on, until the final sample test is completed.
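A sketch of this staged loop, assuming `train(model, chunk)` returns a trained model and `test(model, chunk)` returns whether the result is qualified (both illustrative):

```python
def stagewise_train(samples, train, test, k=10):
    m, b = divmod(len(samples), k)                  # n = k*m + b
    chunks = [samples[i * m:(i + 1) * m] for i in range(k)]
    if b:
        chunks.append(samples[k * m:])              # k+1 small samples in total
    model = train(None, chunks[0])                  # first chunk trains the base model
    for chunk in chunks[1:]:
        if not test(model, chunk):                  # test first; retrain only on failure
            model = train(model, chunk)
    return model
```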



Model fine-tuning. The increase of data nodes makes training harder in two respects: 1) more data nodes lead to longer training time; 2) as data nodes increase, the dimensions of the state, the action space and the neural network grow, making the original model unusable. As mentioned earlier, each change in the number of nodes triggers retraining of the Placement Agent, affecting system recovery and performance. Model fine-tuning updates the neural network structure based on the old model to reduce the training time when the number of nodes increases. Specifically, when the dimension of the neural network increases, the parameters of the old model are used as the initial parameters of the lower dimensions of the new model, and the new dimensions are zeroed or randomly initialized. Take the neural network ($K$ hidden layers) in Fig. \ref{RL_NN} as an example: only the dimensions of $W_1$, $W_n$ and $B_n$ are related to the number of data nodes $n$, so after nodes are added, all parameters except these are initialized with the parameters of the old model. Suppose the number of data nodes increases to $n'$. This changes the shapes of three parameter arrays: $W_1$ (from $[n, h_1]$ to $[n', h_1]$), $W_n$ (from $[h_k, n]$ to $[h_k, n']$) and $B_n$ (from $[n]$ to $[n']$). In this case, the new variables are initialized as:

$$W_1' = \begin{bmatrix} W_1 \\ 0 \end{bmatrix}, \quad W_n' = [W_n, R()], \quad B_n' = [B_n, R()]$$

where $R()$ denotes random initialization. The first $n$ dimensions of $W_1$, $W_n$ and $B_n$ are initialized with the parameters of the old model. The new dimensions of $W_n'$ and $B_n'$ are randomized, which breaks symmetry among the new dimensions, while the new dimensions of $W_1'$ are initialized to zero, which ensures that the newly added variables do not affect the output of the fully connected layer; zeroing only part of the weights does not hinder subsequent training. In this way, the training of the new model is greatly accelerated, sometimes by a factor of hundreds or thousands.
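A numpy sketch of this initialization, assuming old-model arrays with the shapes given above (the scale of the random values is an illustrative choice):

```python
import numpy as np

def expand_for_new_nodes(W1, Wn, Bn, n_new):
    """Grow W1 [n, h1], Wn [h_k, n], Bn [n] to the new node count n'."""
    d = n_new - W1.shape[0]
    # New input rows are zeroed so the added state dimensions do not change
    # the first layer's output until training adjusts them.
    W1_new = np.vstack([W1, np.zeros((d, W1.shape[1]))])
    # New output columns and biases are randomized to break symmetry.
    Wn_new = np.hstack([Wn, 0.01 * np.random.randn(Wn.shape[0], d)])
    Bn_new = np.concatenate([Bn, 0.01 * np.random.randn(d)])
    return W1_new, Wn_new, Bn_new
```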


• Heterogeneous Environment

In a heterogeneous storage environment, storage nodes differ in capacity, performance, network, and so on. Therefore, a distribution algorithm needs to take these factors into account to ensure fairness of distribution and high performance of the system. Four metrics are used to measure node states: capacity, CPU utilization, disk I/O access rate, and network utilization. Capacity is still expressed by the relative weight (Weight for short), which reflects the load-space status of a node. CPU utilization (CPU) is the ratio of the time the CPU spends executing processes to the total CPU running time; the lower the CPU utilization, the smaller the CPU load and the stronger the remaining CPU capacity. Disk I/O access rate (IO), which reflects disk read/write capability, is an important indicator for replica placement. Network utilization (Net) is the percentage of the bandwidth used by the data node for running the system relative to the total bandwidth. All metrics are taken from the storage system by the Metrics Collector. For example, to measure network utilization, it collects rxkB/s and txkB/s, the amounts of network data (kilobytes) received and transmitted by a storage node, and divides these values by the overall network bandwidth. We use the four-tuple $\tau_i = (Net, IO, CPU, Weight)_i$ to represent the state of data node DN$_i$. Therefore, in a heterogeneous environment, the state can be defined as:

\textbf{State:} At time step $t$, $S_t$ can be defined by the list of the four-tuples of all data nodes: $S_t = \{\tau_0^t, \tau_1^t, \ldots, \tau_n^t\}$, where $\tau_i^t = (Net, IO, CPU, Weight)_i^t$.

\textbf{Placement Model.} We use a sequence-to-sequence model with LSTM\cite{lstm} and a content-based Attention mechanism\cite{attention} instead of the MLP in the Placement Agent to predict replica placements. As shown in Fig. \ref{attention}, the overall model architecture is an encoder-decoder design based on stacked LSTM cells.

The inputs to this model are the previously defined states, stored as tunable embedding vectors. The LSTM sequence model can handle a varying number of data nodes and obtains a hidden state for each data node during computation. The decoder is an attentional LSTM with the same number of decoding steps as the input sequence. At each step, the decoder outputs the Q-value of an action (a DN), just like the MLP in the Placement Agent. The attention mechanism computes alignment scores between the previous decoder hidden state and each of the encoder's hidden states and stores them in the alignment vector; the encoder hidden states, weighted by their respective alignment scores, are then summed to form the context vector.
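A minimal numpy sketch of this content-based attention step, assuming `dec_h` is the previous decoder hidden state (shape [h]) and `enc_hs` the encoder hidden states (shape [T, h]); dot-product scoring is one common choice:

```python
import numpy as np

def attention_context(dec_h, enc_hs):
    scores = enc_hs @ dec_h                  # alignment score per encoder step
    weights = np.exp(scores - scores.max())  # softmax over the encoder steps
    weights /= weights.sum()
    return weights @ enc_hs                  # context vector, shape [h]
```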

Of course, in actual applications, there may be more factors affecting system performance, and the object access patterns are also different. Our heterogeneous environment is relatively simple, and more detailed and complex cases will be considered in future work.

## Implementation


RLRP is implemented on the Park \cite{park} platform, an open platform for learning-augmented computer systems. All the code of RLRP is written in C++ and Python, and the reinforcement learning models are implemented with TensorFlow. In addition, RLRP is successfully packaged into a real-world distributed storage system, Ceph (v12.2.13) \cite{ceph}. As shown in Fig. \ref{Ceph}, the Metrics Collector and Action Controller interact with Ceph through the Ceph Monitor. The Metrics Collector fetches system metrics from the Ceph OSDs every 30 seconds using the Linux SAR (System Activity Report) \cite{sar} utility. The Action Controller invokes the Ceph Monitor to implement the placement/migration actions made by the RL Agent and updates the OSDmap of the Ceph cluster. RLRP is implemented as plug-ins and retains the original architecture and other processes of Ceph.

## Evaluation


1) How does RLRP perform based on the evaluation criteria of data placement schemes in modern distributed storage systems, and what are its advantages compared with other schemes?

2) What are the effects of the optimization measures of RLRP on the training with a large number of objects and nodes?

3) What are the advantages of RLRP in heterogeneous storage systems?

4) How does RLRP perform on real-world systems with real workloads?

The experiments are carried out in a simulated system environment and a real system environment, respectively. The simulated environment is built on \textbf{DaDiSi} \cite{random,dadisi}, an API for creating and testing data distribution policies in a (simulated) storage environment. DaDiSi has a client-server architecture, and the client distributes real-world workload data to each server (Data Node). The experiments in the real system environment evaluate the performance of RLRP with the rados_bench \cite{rabench} workload in Ceph. All experiments are conducted on a local 9-node cluster. One node is configured as the client node in the simulated environment and as the Ceph Admin and Ceph Monitor in the real environment; the remaining eight nodes are configured as server nodes (or OSDs in Ceph). Three of them are equipped with an Intel Skylake Xeon processor (2.40 GHz), 54 GB of memory, and an Intel DC NVMe SSD (P4510, 2 TB); the others have Intel Xeon E5-2690 (2.60 GHz) 8-core processors, 16 GB of memory and Samsung SATA SSDs (PM883, 3.84 TB). The operating system on all nodes is CentOS 7 with a 64-bit Linux 3.10.0 kernel.

Based on DaDiSi, we construct simulated homogeneous and heterogeneous environments. In the simulated homogeneous environment, all data nodes are created on one server, and the data nodes have the same configuration except for capacity. In the simulated heterogeneous environment, each server is configured as one data node.


We compare RLRP with five other state-of-the-art schemes: 1) Consistent Hashing; 2) Crush; 3) Random Slicing; 4) Kinesis; 5) DMORP. The implementations of Consistent Hashing, Crush and Random Slicing are based on the APIs provided by DaDiSi, while Kinesis and DMORP are implemented from the source code of their papers.

In addition, RLRP-pa denotes RLRP with the Placement Agent, and RLRP-ma denotes RLRP with the Migration Agent. RLRP-epa and RLRP-ema denote the Placement Agent and Migration Agent, respectively, using the placement model for heterogeneous environments.

In the heterogeneous setting, DaDiSi assigns a different number of 1 TB hard disks to each data node to simulate capacity. The experiments begin with 100 identical data nodes (10 disks per node, 10 TB). In the second group, 100 new data nodes are added, each with a capacity varying from 10 TB to 15 TB; 100 nodes with 10-20 TB capacities are added in the third group, and so on. There are five groups of experiments, ranging from 100 to 500 nodes. The object size is 1 MB, and $10^4 \sim 10^8$ objects are allocated to data nodes in each group. The default number of replicas is 3 and can be set to different values. We use a tuple ($N_{data}$, $N_{obj}$, $N_k$) to represent the numbers of data nodes, objects, and replicas in a set of experiments, and $x$ to represent the variable under test.

The experimental results are as follows:

• Fairness

Distributional fairness is measured by the standard deviation and the overprovisioning percentage P. For instance, an overprovisioning of 10% means that the maximum number of objects on a node is 10% higher than the average.
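As a sketch, both metrics can be computed as follows (assuming P is defined as how far the most loaded node exceeds the average; names are illustrative):

```python
import numpy as np

def fairness_metrics(object_counts, weights):
    rel = np.asarray(object_counts, dtype=float) / np.asarray(weights)
    std = np.std(rel)                             # standard deviation of relative weights
    p = (rel.max() / rel.mean() - 1.0) * 100.0    # P = 10% -> max is 10% above average
    return std, p
```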

Fig. ?? and Fig. ?? show the standard deviation and P under $(x, 10^6, 3)$. RLRP-pa yields the fairest data distribution: its standard deviation is more than 50% lower than that of the other schemes and does not grow with the number of data nodes, while P increases slightly but stays below 3%. The pseudo-hash based schemes Crush, Random Slicing, and Kinesis also perform well, with P between 1% and 4%; the hash functions of different segments differ considerably, which causes the P of Kinesis to fluctuate greatly. Consistent Hashing is mediocre, while DMORP is the worst, as shown on the right, far worse than the other schemes. The figure also shows the P values under different numbers of objects and replicas. RLRP-pa is very stable with P around 2% and performs best in all cases with small fluctuation. Crush, Random Slicing, and Kinesis perform moderately on small data samples, with P values of 25% to 30%; as the amount of data or the number of replicas increases, their P values decrease to close to RLRP-pa. The P value of Consistent Hashing is between 5% and 20%, performing well under small data volumes. DMORP remains the worst, with P values above 50% in all cases.

• Time and Space Efficiency

Time and space efficiency is measured by the memory used in each configuration as well as the latency of each request. Fig. \ref{mtest} shows the average allocated memory over the different tests.


The memory consumption of RLRP consists of the agent model (all parameters) and the mapping table. The model grows with the number of nodes (the state space), but it is not very large: about 2.4 MB for 100 nodes and about 12 MB for 500 nodes. In addition, thanks to the virtual-node design, the mapping table is small, about 539 KB for $10^6$ objects. Overall, RLRP uses more memory than decentralized schemes such as Crush, Random Slicing and Kinesis, but its total usage is still small. Crush and Kinesis consume very little memory, roughly 4 MB, unaffected by the number of nodes. Random Slicing needs to keep a small table with information about previous insert and remove operations as nodes increase, occupying 4$\thicksim$70 MB of memory. The memory consumption of Consistent Hashing depends on the number of data nodes and is about 40~250 MB. DMORP needs to maintain additional information for its genetic algorithm; its memory footprint is far larger than the other algorithms and grows with the number of data nodes, at about 1~10 GB.

For lookup latency, Consistent Hashing and Random Slicing perform best, requiring only about 5 µs per request. RLRP needs about 10 µs via table lookup. DMORP and CRUSH compute locations and take 20~25 µs. Kinesis requires segmentation, with the number of segments growing with the number of nodes and a different computation per segment, taking 50~160 µs.

• Adaptivity


• Criteria Evaluation

In the homogeneous environment, the schemes are compared on five criteria: fairness, adaptivity, redundancy, memory usage, and lookup speed.

We compare the training time and training effect (R, model error) of Stagewise Training against ordinary training with small and large samples.

Training with small samples yields high error, while training with large samples takes much longer. In contrast, stagewise training on the large sample achieves low error while its training time is almost the same as training on a small sample.

Fig. ?? shows the training time of Model fine-tuning versus normal training with 10~200 data nodes. The advantage of fine-tuning is clear: with 20 data nodes, unoptimized training takes 12247 s to reach a good training effect, while the fine-tuned model needs only 200 s, a 98% reduction; the improvement becomes more pronounced as the number of data nodes grows.

In the heterogeneous environment, we focus more on performance.

https://github.com/intel-cloud/cosbench

• Ceph

• Real-world data

Applications of RL in storage systems

## Discussion

• 2021.7.6
• The generic data distribution problem has seen little recent research; the problem is not novel enough, so we should focus on specific, previously unaddressed problems
• Submit to Transactions on Storage (a class-A journal); finish a draft by the end of July
1. The problem is not clear: the background needs to elaborate on the problems of other distribution schemes

• Similar to "Random Slicing: Efficient and Scalable Data Placement for Large-scale Storage Systems"

2. The motivation for using RL is weak
• Model the distribution problem: dynamic distribution can be viewed as a combinatorial optimization problem, where RL performs well
3. The migration scheme is too simple; many factors need to be considered
4. Add supplementary formulas to the scheme
5. The description of the optimizations is unclear; explain them in detail with figures
6. The test scale is small (currently up to 80 OSDs); scale to at least 100 or 1000 (training is extremely slow)
7. The treatment of heterogeneity is simple, but many dynamic factors need not be considered; focus on static elements of heterogeneity and leave dynamic elements to load balancing
8. Explain how heterogeneous test results are compared; add test results of other schemes
9. Too few tests; add comparison targets: consistent hashing, table-based, crush, etc.
10. Add comparison tests on real systems with real data
11. Test supplements
• Add tests of other distribution algorithms
• consistent hash; crush; random slicing; table-based
• Model training with 1000 OSDs
• Relax requirements to speed up training
• Package the model into Ceph and run comparison tests with real data
