
Post on 28-Jul-2020

The Square and the Circle Combined: High-Performance Distributed Machine Learning with Spark on Angel

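Origins

Tencent's Product Requirements

• Big Data (n), Small Model (d): d << n

• Sparse Big Data (n), Big Model (d): d ≈ n

In search of an industrial-grade distributed machine learning platform that scales to billions of dimensions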

The Bottleneck of Machine Learning on Spark

[Diagram, shown twice: a Driver holding the Model, surrounded by Executors]

2015: One Issue, "A Prototype of Parameter Server" (https://issues.apache.org/jira/browse/SPARK-6932)

2016: Glint & Yahoo

Philosophy

Spark: Worker, Worker, Worker, Worker (immutable)

Angel: PS, PS, PS (mutable)

Spark on Angel: the square and the circle, combined

Core Abstractions

Mapper/Reducer, RDD, PSModel

RDD vs PSModel

RDD: RDD-1, RDD-2, RDD-3, RDD-4, RDD-5

PSModel: epoch-1, epoch-2, epoch-3, epoch-4, epoch-5, ...

Core Abstraction of RDD

• Partitions: Partition-1, Partition-2, Partition-3, Partition-4, ..., Partition-n
• Compute Func
• Dependencies
• Partitioners
• Preferred locations
• Storage: Node Memory (MemoryBlock-n), Node Disk (DiskBlock-n)
• RDD to RDD (Transformation or Action)

Core Abstraction of PSModel

• PSModel: pull the model M, push the update ΔM
• PSClient
• PSServer: Shard
• PSPartitioner: Partition1, Partition2, Partition3, ...
• MatrixContext
• Sync
• Clock
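The pull/push cycle of a PSModel can be illustrated with a toy, single-process sketch. This is not the Angel API: `ToyPSModel`, `ToyPSServer`, and the range partitioning used here are hypothetical names and choices made for illustration only. It assumes a dense vector M split into contiguous shards, with sparse updates ΔM routed to the shard that owns each index.

```python
from typing import Dict, List


class ToyPSServer:
    """Holds one shard: a contiguous slice [start, end) of the model vector."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end
        self.values = [0.0] * (end - start)


class ToyPSModel:
    """A dense vector M, range-partitioned across several toy servers."""

    def __init__(self, dim: int, num_servers: int) -> None:
        # Hypothetical partitioner: near-equal contiguous ranges.
        bounds = [dim * i // num_servers for i in range(num_servers + 1)]
        self.servers = [
            ToyPSServer(bounds[i], bounds[i + 1]) for i in range(num_servers)
        ]

    def pull(self) -> List[float]:
        """Assemble the full model M from all shards."""
        model: List[float] = []
        for server in self.servers:
            model.extend(server.values)
        return model

    def push(self, delta: Dict[int, float]) -> None:
        """Add each entry of a sparse update ΔM to the shard that owns it."""
        for index, value in delta.items():
            for server in self.servers:
                if server.start <= index < server.end:
                    server.values[index - server.start] += value
                    break
            else:
                raise IndexError(f"index {index} is outside the model")


model = ToyPSModel(dim=6, num_servers=3)
model.push({0: 1.0, 5: -2.0})  # one client pushes an update
model.push({0: 0.5})           # another update accumulates in place
print(model.pull())
```

Unlike an RDD, the model here is mutated in place by every push, which is the mutable side of the design the slides contrast with immutable RDDs.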

Architecture of Spark on Angel

[Diagram, Spark side: SparkDriver with AngelContext; a Spark RDD spread over Executors, each running several TASKs and holding a PSModel and a PSAgent]

[Diagram, Angel side: a Parameter Server made of PSServers, each holding a Shard]

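[Diagram: Workers run Tasks, each with a DataBlock and a PsClient; through PSAgents they pull the Model M from the Parameter Server and push ΔM to it; the Parameter Server consists of PSServers, each holding a Shard; further labels: psFunc, Model Partitioner, syncProtocol]

• Rich machine learning and mathematical computation libraries
• Friendly user programming interfaces
• An industrial-grade parameter server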

Angel vs Glint

PSPartitioner: Partition1, Partition2, Partition3, ...

• Richer model partitioning
• More flexible asynchronous modes
• More powerful psFunc

Positioning of Angel

https://github.com/tencent/angel

Developing with Spark on Angel

API Design of Angel

1. Start PS
2. Load Model
3. runTask
4. parse & preProcess
5. train
6. learn
7. push & pull PSModel
8. Save Model (HDFS)

Components in the diagram: AngelClient, MLRunner, TrainTask, MLLearner, MLModel, PSModel, DataBlock of LabeledData, HDFS, PSServer (Model), RDD

API Design of Spark on Angel

[Diagram: three layers, Angel, Spark on Angel, Spark]

• Spark: RDD1, RDD2, RDD3, ...
• Spark on Angel: SparkPSContext, PSModel, { RDD_PS_Functions }, PSVector, PSMatrix, BreezePSVector, CachedPSVector
• Angel: AngelClient, PSClient, PSServer (Shard)

Basic Usage of Spark on Angel

Transparent Replacement of Vectors

<<trait>> NumericOps[T]

def round(t: T): T
def dot(t: T): T
def max(t: T): T

<<class>> BreezeVector and <<class>> BreezePSVector mix in the same trait and expose the same three methods, so one can transparently replace the other. BreezePSVector reaches the Angel PS through the PSAgent.
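The idea of two vector types sharing one set of operations can be sketched outside Scala as well. The Python below is only an analogy under stated assumptions: `NumericOps` here is a hypothetical abstract base class with a reduced method set, `LocalVector` stands in for a local Breeze vector, and `PSBackedVector` stands in for a vector whose values live on a parameter server (modeled as a plain dictionary). None of these are the real Breeze or Angel classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class NumericOps(ABC):
    """Operations shared by local and PS-backed vectors (hypothetical subset)."""

    @abstractmethod
    def to_list(self) -> List[float]:
        """Return the current values as a plain list."""

    def dot(self, other: "NumericOps") -> float:
        return sum(a * b for a, b in zip(self.to_list(), other.to_list()))

    def max(self) -> float:
        return max(self.to_list())


class LocalVector(NumericOps):
    """Values are held in local memory."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def to_list(self) -> List[float]:
        return list(self.values)


class PSBackedVector(NumericOps):
    """Holds only a key; the values live in a store that stands in for the PS."""

    def __init__(self, store: Dict[str, List[float]], key: str) -> None:
        self.store = store
        self.key = key

    def to_list(self) -> List[float]:
        return list(self.store[self.key])  # plays the role of a pull


def squared_norm(v: NumericOps) -> float:
    """Algorithm code written once against the shared operations."""
    return v.dot(v)


store = {"w": [3.0, 4.0]}
print(squared_norm(LocalVector([3.0, 4.0])))
print(squared_norm(PSBackedVector(store, "w")))
```

Because `squared_norm` depends only on the shared operations, swapping the local vector for the PS-backed one requires no change to the algorithm code, which is the point of the transparent replacement.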

[Diagram: an Executor runs a Task that uses three BreezePSVectors through a PSClient]

Algorithms on Angel

Spark on Angel: Available

LR on Angel

[Diagram: HDFS, HDFS, HDFS feed Worker, Worker, Worker, which exchange data with PS, PS, PS, PS; the steps are numbered 0, 1, 2]

• Pull parameters from PS
• Push update value to PS

[Spark on Angel] LR

[spark_on_angel_quick_start.md]

• Angel: wPS, gradientPS {BreezeOps}
• Spark: sampleRDD, mapPartitions
• DenseVector, Array
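The pull, compute, push loop behind logistic regression on a parameter server can be sketched in a few lines. This is a single-process illustration, not Spark or Angel code: the lists `ps_w` and `ps_grad` stand in for the weight and gradient vectors kept on the PS, each inner loop iteration stands in for one `mapPartitions` task, and all function names, the learning rate, and the toy data are assumptions.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def partition_gradient(w: List[float], part: List[Sample]) -> List[float]:
    """Sum of logistic-loss gradients over one data partition."""
    grad = [0.0] * len(w)
    for x, y in part:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def train_lr(partitions: List[List[Sample]], dim: int,
             epochs: int = 200, lr: float = 0.5) -> List[float]:
    ps_w = [0.0] * dim  # stands in for the weight vector on the PS
    n = sum(len(p) for p in partitions)
    for _ in range(epochs):
        ps_grad = [0.0] * dim       # stands in for the gradient vector on the PS
        for part in partitions:     # each iteration plays one worker task
            w = list(ps_w)          # pull parameters from PS
            g = partition_gradient(w, part)
            for j in range(dim):    # push update value to PS
                ps_grad[j] += g[j]
        for j in range(dim):        # apply the averaged gradient step
            ps_w[j] -= lr * ps_grad[j] / n
    return ps_w


# Toy data: feature 0 is a constant bias term, feature 1 decides the label.
partitions = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]
print(train_lr(partitions, dim=2))
```

In a real deployment the tasks run in parallel on executors and only the pulled weights and pushed gradients cross the network; the sequential loop here just keeps the sketch runnable.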

Optimization Methods

[Spark on Angel] LR with Optimizer

• Angel: wPS, statePS
• Spark: sampleRDD, mapPartitions
• DenseVector
• breeze.optimize: SGD, OWLQN, LBFGS
• DiffFunction(BreezePSVector) : (Double, BreezePSVector)

[spark_on_angel_optimizer.md]
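A DiffFunction maps a weight vector to a pair of loss and gradient, and the optimizer only ever sees that pair. The sketch below mirrors this contract in plain Python under stated assumptions: `make_logistic_diff` and `minimize` are hypothetical helpers, the L2 penalty is added only to keep the toy problem well-posed, and fixed-step gradient descent is used here as a simple stand-in for SGD, OWLQN, or LBFGS.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
DiffFunction = Callable[[Vector], Tuple[float, Vector]]
Sample = Tuple[Vector, int]  # (features, label in {0, 1})


def make_logistic_diff(samples: List[Sample], l2: float = 0.1) -> DiffFunction:
    """Build a function w -> (loss, gradient) for L2-regularized logistic loss."""

    def calculate(w: Vector) -> Tuple[float, Vector]:
        loss = 0.5 * l2 * sum(wi * wi for wi in w)
        grad = [l2 * wi for wi in w]
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            # log(1 + e^z) - y*z equals the negative log-likelihood of one sample.
            loss += math.log1p(math.exp(z)) - y * z
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        return loss, grad

    return calculate


def minimize(f: DiffFunction, w0: Vector,
             step: float = 0.1, iters: int = 500) -> Vector:
    """Fixed-step gradient descent; it only calls f, never the data."""
    w = list(w0)
    for _ in range(iters):
        _, grad = f(w)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


samples = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
f = make_logistic_diff(samples)
w_opt = minimize(f, [0.0, 0.0])
print(f([0.0, 0.0])[0], f(w_opt)[0])
```

Keeping the optimizer ignorant of where the vector lives is what lets the same optimization code run whether the weights are local or held on the PS.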

GBDT: Tree Model + Boosting

[Diagram: tree 1 and tree 2, with split nodes Age<30, Wage<10K, IsMale? and Y/N branches; instances A, B, C, D, E]

Predictions (tree 1 + tree 2):

• 5 + 0.5 = 5.5
• 10 + 1.5 = 11.5
• 1 + 1.5 = 2.5
• 1 + 0.5 = 1.5
• 1 + 1.5 = 2.5
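The additive prediction of a boosted ensemble, where each tree contributes one leaf value and the values are summed, can be worked through in code. The transcript does not preserve how the splits are arranged or which features each instance has, so everything below is an assumption: the tree layout and the feature values for A to E are one hypothetical arrangement chosen so that the sums match the five predictions on the slide.

```python
from typing import Callable, Dict, List

Instance = Dict[str, float]
Tree = Callable[[Instance], float]


def tree_1(x: Instance) -> float:
    """Hypothetical layout: split on age first, then on wage."""
    if x["age"] < 30:
        return 5.0 if x["wage"] < 10_000 else 10.0
    return 1.0


def tree_2(x: Instance) -> float:
    """Hypothetical layout: a single split on gender."""
    return 0.5 if x["is_male"] else 1.5


def predict(trees: List[Tree], x: Instance) -> float:
    """Boosting prediction: the sum of the leaf values from every tree."""
    return sum(tree(x) for tree in trees)


# Hypothetical feature values; only the resulting sums come from the slide.
instances = {
    "A": {"age": 25, "wage": 8_000, "is_male": 1},
    "B": {"age": 25, "wage": 20_000, "is_male": 0},
    "C": {"age": 40, "wage": 15_000, "is_male": 0},
    "D": {"age": 50, "wage": 15_000, "is_male": 1},
    "E": {"age": 60, "wage": 5_000, "is_male": 0},
}

for name, features in instances.items():
    print(name, predict([tree_1, tree_2], features))
```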

GBDT on Angel: Model Storage

• PS1, PS2, PS3: each stores feature ID, feature value, leaf prediction
• grad histogram, hess histogram

GBDT on Angel (1): Building the Forest

PS1 PS2 PS3

Worker1 Worker2 Worker3

GBDT on Angel (2): Splitting Tree Nodes

find split feature & value

[gbdt_on_angel.md]
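Finding a split from gradient and hessian histograms can be sketched as follows. This is a minimal illustration, not Angel's implementation: it assumes features are already bucketed into integer bins, that each worker builds local histograms which are then summed (the role the PS plays), and that the split is scored with the second-order gain commonly used in gradient boosting, G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ). All function names and the value of λ are hypothetical.

```python
from typing import List, Tuple

Histogram = List[List[float]]  # [feature][bin] -> summed value


def build_histograms(rows: List[List[int]], grads: List[float], hess: List[float],
                     num_features: int, num_bins: int) -> Tuple[Histogram, Histogram]:
    """One worker: accumulate gradient and hessian sums per (feature, bin)."""
    g_hist = [[0.0] * num_bins for _ in range(num_features)]
    h_hist = [[0.0] * num_bins for _ in range(num_features)]
    for bins, g, h in zip(rows, grads, hess):
        for f, b in enumerate(bins):
            g_hist[f][b] += g
            h_hist[f][b] += h
    return g_hist, h_hist


def merge(hists: List[Histogram]) -> Histogram:
    """Element-wise sum of the histograms pushed by all workers."""
    merged = [[0.0] * len(row) for row in hists[0]]
    for hist in hists:
        for f, row in enumerate(hist):
            for b, v in enumerate(row):
                merged[f][b] += v
    return merged


def best_split(g_hist: Histogram, h_hist: Histogram,
               lam: float = 1.0) -> Tuple[float, int, int]:
    """Return (gain, feature, bin); rows with bin index <= bin go left."""
    best = (0.0, -1, -1)
    for f in range(len(g_hist)):
        g_total, h_total = sum(g_hist[f]), sum(h_hist[f])
        parent = g_total ** 2 / (h_total + lam)
        g_left = h_left = 0.0
        for b in range(len(g_hist[f]) - 1):
            g_left += g_hist[f][b]
            h_left += h_hist[f][b]
            g_right, h_right = g_total - g_left, h_total - h_left
            gain = (g_left ** 2 / (h_left + lam)
                    + g_right ** 2 / (h_right + lam) - parent)
            if gain > best[0]:
                best = (gain, f, b)
    return best


# Two workers, two features, two bins per feature (toy data).
g1, h1 = build_histograms([[0, 0], [0, 1]], [-1.0, -1.0], [1.0, 1.0], 2, 2)
g2, h2 = build_histograms([[1, 0], [1, 1]], [1.0, 1.0], [1.0, 1.0], 2, 2)
print(best_split(merge([g1, g2]), merge([h1, h2])))
```

Only the fixed-size histograms travel between workers and servers, so the communication cost depends on the number of features and bins rather than on the number of training rows.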

[Spark on Angel] GBDT

• Spark: Instance RDD, Gradient RDD, Prediction RDD (joined by zip); map
• Angel (PS): InstanceLayout, Grad Histogram, SplitFeature, SplitValue, LeafWeight

[spark_on_angel_gbdt.md]

(Spark on Angel) vs Spark: LR

Angel vs XGBoost: GBDT

Angel vs Spark: LDA

Angel vs Spark: GD-LR

Angel vs Spark: ADMM-LR

Features of Spark on Angel

OpenSource & Perspective

Angel Is Open Source

• [GBDT] The purposes of using parameter server in GBDT #7

(PR 60)

Academic Innovation

• Papers at top international conferences (CCF Class A)

Release Outlook (What Is Next)

V1.3 V1.5 V2.0

Q & A

Weibo: @明风

If you like it, remember to give us a Star! andymhuang@tenent.com

Machine learning systems & algorithm engineers

We are Hiring
