CAIN 2023
Prevalence of Code Smells in
Reinforcement Learning Projects
Nicolás Cardozo, Ivana Dusparic, Christian Cabrera
Systems and Computing Engineering - Universidad de los Andes, Bogotá - Colombia
Trinity College Dublin - Ireland
University of Cambridge - UK
n.cardozo@uniandes.edu.co, ivana.dusparic@tcd.ie, chc79@cam.ac.uk
@ncardoz
Reinforcement learning programming
2
RL libraries
How about the quality?
3
Design (class diagram example):
ISIS1226
- lecturer : String
- section : int
- lectures : Date[]
- students : Student[]
- assignments : String[]
+ ISIS1226(String: lecturer, int: section, Date: lectures, int: size, int: numAssignments)
+ markAssignments(String: assignment, marks: float[]) : void
- checkAssignment(String: assignment) : boolean
Assignment
- id: int
- mark: double
Student
- name: String
- marks: double[]
Lecturer
- name: String
Design → Code → Automatically test → Ship/Deploy/Maintain
Design → Code → Test? → Ship/Deploy/Maintain
Process
4
GitHub API → Backend (Node.js + MongoDB)
Filter: Q-learning Python projects
20 most popular repositories
4 reference project implementations
Static analysis
Metrics: LM, LC, LPL, LMC, LSC, LTCE, MNC, LLF
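For illustration, a minimal sketch of the collection step is shown below: it queries the GitHub search API for the most-starred Python Q-learning repositories. The study itself used a Node.js/MongoDB backend, so the function below (its name, token handling, and use of the requests library) is an assumption, not the authors' implementation.

import requests

def top_qlearning_repos(n=20, token=None):
    # Query the GitHub search API for Python repositories matching "q-learning",
    # sorted by stars, and return the full names of the n most popular ones.
    # (Illustrative sketch only; not the pipeline used in the paper.)
    headers = {'Authorization': 'token {}'.format(token)} if token else {}
    resp = requests.get(
        'https://api.github.com/search/repositories',
        params={'q': 'q-learning language:python', 'sort': 'stars',
                'order': 'desc', 'per_page': n},
        headers=headers)
    resp.raise_for_status()
    return [item['full_name'] for item in resp.json()['items']]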
Metrics
5
Code smell                                   Metric                      Threshold
Long Method (LM)                             Function LOC                38
Long Class (LC)                              Class LOC                   29
Long Parameter List (LPL)                    Number of parameters        5
Long Method Chain (LMC)                      Length of message chain     5
Long Scope Chaining (LSC)                    Depth of closure            3
Long Ternary Conditional Expression (LTCE)   Number of characters        54
Multiply-Nested Container (MNC)              Depth of nested container   3
Long Lambda Function (LLF)                   Number of characters        48
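As a rough sketch of how such threshold-based detection can work, the snippet below flags Long Method candidates by counting function LOC with Python's ast module. The 38-line threshold comes from the table above; the helper name and the exact LOC definition are assumptions.

import ast

LONG_METHOD_THRESHOLD = 38  # function LOC threshold from the table above

def long_methods(source):
    # Yield (function name, LOC) for every function whose source span
    # exceeds the Long Method threshold. Requires Python 3.8+ for end_lineno.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            loc = node.end_lineno - node.lineno + 1
            if loc > LONG_METHOD_THRESHOLD:
                yield node.name, loc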
Metrics
6
def update(self, msg_env):
'''
Update the state of the agent
:param msg_env: dict. A message generated by the order matching
'''
# check if should update, if it is not a trade
# if not isinstance(msg_env, type(None)):
if not msg_env:
if not self.should_update():
return None
# recover basic infos
inputs = self.env.sense(self)
state = self.env.agent_states[self]
s_cmm = self.env.s_main_intrument
# Update state (position ,volume and if has an order in bid or ask)
self.state = self.get_intern_state(inputs, state)
# Select action according to the agent's policy
s_action = None
s_action, l_msg = self.take_action(self.state, msg_env)
s_action2 = s_action
# Execute action and get reward
reward = 0.
self.env.update_order_book(l_msg)
l_prices_to_print = []
if len(l_msg) == 0:
reward += self.env.act(self, None)
self.b_new_reward = False
for msg in l_msg:
if msg['agent_id'] == self.i_id:
# check if should hedge the position
self.should_change_stoptime(msg)
# form log message
s_action = msg['action']
s_action2 = s_action
s_side_msg = msg['order_side'].split()[0]
s_indic = msg['agressor_indicator']
s_cmm = msg['instrumento_symbol']
d_aux = {'A': msg['order_status'],
# log just the last 4 digits of the order
'I': msg['order_id'] % 10**4,
'Q': msg['order_qty'],
'C': msg['instrumento_symbol'],
'S': s_side_msg,
'P': '{:0.2f}'.format(msg['order_price'])}
l_prices_to_print.append(d_aux)
#
l_prices_to_print.append('{:0.2f}'.format(msg['order_price']))
if s_indic == 'Agressor' and s_action == 'SELL':
s_action2 = 'HIT' # hit the bid
elif s_indic == 'Agressor' and s_action == 'BUY':
s_action2 = 'TAKE' # take the offer
try:
# the agent's positions and orders list are update here
# TODO: The reward really should be collect at this point?
reward += self.env.act(self, msg)
self.b_new_reward = False
except:
print 'BasicAgent.update(): Message with error at reward:'
pprint.pprint(msg)
raise
# check if should cancel any order due to excess
l_msg1 = self.could_include_new(s_action)
self.env.update_order_book(l_msg1)
for msg in l_msg1:
if msg['agent_id'] == self.i_id:
s_indic = msg['agressor_indicator']
d_aux = {'A': msg['order_status'],
'I': msg['order_id'],
'C': msg['instrumento_symbol'],
'S': msg['order_side'].split()[0],
'P': '{:0.2f}'.format(msg['order_price'])}
l_prices_to_print.append(d_aux)
try:
# the agent's positions and orders list are update here
# there is no meaning in colecting reward here
self.env.act(self, msg)
except:
print 'BasicAgent.update(): Message with error at reward:'
pprint.pprint(msg)
raise
# === DEBUG ====
# if len(l_msg1) > 0:
# print '\n====CANCEL ORDER DUE TO EXCESS======\n'
# pprint.pprint(l_msg1)
# ==============
# NOTE: I am not sure about that, but at least makes sense... I guess
# I should have to apply the reward to the action that has generated
# the trade (when my order was hit, I was in the book before)
if s_action2 == s_action:
if s_action == 'BUY':
s_action = 'BEST_BID'
elif s_action == 'SELL':
s_action = 'BEST_OFFER'
if s_action in ['correction_by_trade', 'crossed_prices']:
if s_side_msg == 'Buy':
s_action = 'BEST_BID'
elif s_side_msg == 'Sell':
s_action = 'BEST_OFFER'
# Learn policy based on state, action, reward
if s_cmm == self.env.s_main_intrument:
if self.policy_update(self.state, s_action, reward):
self.k_steps += 1
self.n_steps += 1
# print 'new step: {}\n'.format(self.n_steps)
# calculate the next time that the agent will react
if not isinstance(msg_env, type(dict)):
self.next_time = self.env.order_matching.last_date
f_delta_time = self.f_min_time
# add additional miliseconds to the next_time to act
if self.f_min_time > 0.004:
if np.random.rand() > 0.4:
i_mult = 1
if np.random.rand() < 0.5:
i_mult = -1
f_add = min(1., self.f_min_time*100)
f_add *= np.random.rand()
f_delta_time += (int(np.ceil(f_add))*i_mult)/1000.
self.next_time += f_delta_time
self.last_delta_time = int(f_delta_time * 1000)
# print agent inputs
self._pnl_information_update()
self.log_step(state, inputs, s_action2, l_prices_to_print, reward)
Long method
Long class
class QLearningAgent(BasicAgent):
    '''
    A representation of an agent that learns using Q-learning with linear
    parametrization and e-greedy exploration described at p.60 ~ p.61 from
    Busoniu et al., 2010. The approximator used is the implementation of tile
    coding, described at Sutton and Barto, 2016 (draft).
    '''
    actions_to_open = [None, 'BEST_BID', 'BEST_OFFER', 'BEST_BOTH']
    actions_to_close_when_short = [None, 'BEST_BID']
    actions_to_close_when_long = [None, 'BEST_OFFER']
    actions_to_stop_when_short = [None, 'BEST_BID', 'BUY']
    actions_to_stop_when_long = [None, 'BEST_OFFER', 'SELL']
    FROZEN_POLICY = False

    def __init__(self, env, i_id, d_normalizers, d_ofi_scale, f_min_time=3600.,
                 f_gamma=0.5, f_alpha=0.5, i_numOfTilings=16, s_decay_fun=None,
                 f_ttoupdate=5., d_initial_pos={}, s_hedging_on='DI1F19',
                 b_hedging=True, b_keep_pos=True):
        '''Initialize a QLearningAgent. Save all parameters as attributes.'''

    def reset_additional_variables(self, testing):
        '''Reset the state and the agent's memory about its positions.'''

    def additional_actions_when_exec(self, s_instr, s_side, msg):
        '''Execute additional actions when a trade is executed.'''

    def need_to_hedge(self):
        '''Return if the agent needs to hedge its position.'''

    def get_valid_actions_old(self):
        '''Return a list of valid actions based on the current position.'''

    def get_valid_actions(self):
        '''Return a list of valid actions based on the current position.'''

    def get_intern_state(self, inputs, state):
        '''Return a dictionary representing the intern state of the agent.'''

    def bound_values(self, f_value, s_feature_name, s_cmm=None):
        '''Return the value bounded by the maximum and minimum values predicted.'''

    def get_epsilon_k(self):
        '''Get $epsilon_k$ according to the exploration schedule.'''

    def choose_an_action(self, d_state, valid_actions):
        '''Return an action from the allowed actions (epsilon-greedy policy).'''

    def apply_policy(self, state, action, reward):
        '''Learn the policy based on state, action, reward.'''

    def set_qtable(self, s_fname, b_freezy_policy=True):
        '''Set up the q-table for testing simulation and freeze the policy.'''

    def stop_on_main(self, l_msg, l_spread):
        '''Stop on the main instrument.'''

    def msgs_due_hedge(self):
        '''Return messages given that the agent needs to hedge its positions.'''

    def cancel_all_hedging_orders(self):
        '''Cancel all hedging orders that might be in the books.'''

    def _select_spread(self, t_state, s_code=None):
        '''Select the spread to use in a new order.'''

    def should_print_logs(self, s_question):
        '''Return if the log should be printed based on s_question (ALL or 5MIN).'''

    def set_to_print_always(self):
        ''' '''

# (Only the class skeleton is reproduced here; in the analyzed project every
# method above has a sizeable body, which is what triggers the Long Class smell.)
Metrics
7
state = [
(player.x_change == 20 and player.y_change == 0 and ((list(map(add, player.position[-1], [20, 0])) in player.position) or
player.position[-1][0] + 20 >= (game.game_width - 20))) or (player.x_change == -20 and player.y_change == 0 and ((list(map(add, player.position[-1], [-20, 0])) in player.position) or
player.position[-1][0] - 20 < 20)) or (player.x_change == 0 and player.y_change == -20 and ((list(map(add, player.position[-1], [0, -20])) in player.position) or
player.position[-1][-1] - 20 < 20)) or (player.x_change == 0 and player.y_change == 20 and ((list(map(add, player.position[-1], [0, 20])) in player.position) or
player.position[-1][-1] + 20 >= (game.game_height-20))), # danger straight
(player.x_change == 0 and player.y_change == -20 and ((list(map(add,player.position[-1],[20, 0])) in player.position) or
player.position[ -1][0] + 20 > (game.game_width-20))) or (player.x_change == 0 and player.y_change == 20 and ((list(map(add,player.position[-1],
[-20,0])) in player.position) or player.position[-1][0] - 20 < 20)) or (player.x_change == -20 and player.y_change == 0 and ((list(map(
add,player.position[-1],[0,-20])) in player.position) or player.position[-1][-1] - 20 < 20)) or (player.x_change == 20 and player.y_change == 0 and (
(list(map(add,player.position[-1],[0,20])) in player.position) or player.position[-1][
-1] + 20 >= (game.game_height-20))), # danger right
(player.x_change == 0 and player.y_change == 20 and ((list(map(add,player.position[-1],[20,0])) in player.position) or
player.position[-1][0] + 20 > (game.game_width-20))) or (player.x_change == 0 and player.y_change == -20 and ((list(map(
add, player.position[-1],[-20,0])) in player.position) or player.position[-1][0] - 20 < 20)) or (player.x_change == 20 and player.y_change == 0 and (
(list(map(add,player.position[-1],[0,-20])) in player.position) or player.position[-1][-1] - 20 < 20)) or (
player.x_change == -20 and player.y_change == 0 and ((list(map(add,player.position[-1],[0,20])) in player.position) or
player.position[-1][-1] + 20 >= (game.game_height-20))), #danger left
player.x_change == -20, # move left
player.x_change == 20, # move right
player.y_change == -20, # move up
player.y_change == 20, # move down
food.x_food < player.x, # food left
food.x_food > player.x, # food right
food.y_food < player.y, # food up
food.y_food > player.y # food down
]
Multiply-nested container
State space: 11
depth: 5
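One way to reduce this kind of multiply-nested expression, sketched below under the names assumed from the snippet above (player, game, position, x_change, y_change), is to factor the repeated collision test into a small helper so each danger flag becomes a single call. This is an illustrative refactoring sketch, not code from the analyzed project.

from operator import add

def hits_body_or_wall(player, game, step):
    # True if moving the snake's head by `step` lands on its own body
    # or outside the playable area (a 20-pixel border on each side).
    next_pos = list(map(add, player.position[-1], step))
    x, y = next_pos
    return (next_pos in player.position
            or x < 20 or x >= game.game_width - 20
            or y < 20 or y >= game.game_height - 20)

# e.g. the "danger straight" flag becomes:
# danger_straight = hits_body_or_wall(player, game, [player.x_change, player.y_change])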
Results
8
Conclusion
9
RL projects contain many code smells: 3.15 on average per file,
and up to 8 per file (or 1 code smell every 27 lines)
Of the top 4 most common code smells, 3 are shared across the 2 data sets
(Multiply-Nested Container, Long Method, Long Parameter List)
State representations are inherently complex
Functionality is presented as a code block
RL algorithms are riddled with learning parameters
Code smells point to a violation of design principles
(coupling, cohesion, single responsibility)
Future perspectives
10
Specific metrics and code smells for RL
We need specific metrics, thresholds, and tools
to capture the complexity of RL algorithms
The complexity of RL can be managed by creating dedicated data structures
or by expressing relations between entities more ergonomically
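As a sketch of the "dedicated data structures" idea, the 11 boolean features of the snake state from slide 7 could be carried in a small named structure instead of a positional list; the class and field names below are assumptions for illustration, not part of the analyzed projects.

from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class SnakeState:
    # Named fields replace the 11-element positional list used as the RL state.
    danger_straight: bool
    danger_right: bool
    danger_left: bool
    moving_left: bool
    moving_right: bool
    moving_up: bool
    moving_down: bool
    food_left: bool
    food_right: bool
    food_up: bool
    food_down: bool

    def as_vector(self):
        # Flat 0/1 vector for code that still expects the list representation.
        return [int(v) for v in astuple(self)]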
More Related Content

Similar to [CAIN'23] Prevalence of Code Smells in Reinforcement Learning Projects

E_Commerce
E_CommerceE_Commerce
E_CommerceSilpiNandi1
 
E_Commerce Data model
E_Commerce Data modelE_Commerce Data model
E_Commerce Data modelSilpiNandi1
 
Naive application of Machine Learning to Software Development
Naive application of Machine Learning to Software DevelopmentNaive application of Machine Learning to Software Development
Naive application of Machine Learning to Software DevelopmentAndriy Khavryuchenko
 
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)Abhishek Thakur
 
NPTEL QUIZ.docx
NPTEL QUIZ.docxNPTEL QUIZ.docx
NPTEL QUIZ.docxGEETHAR59
 
Test and refactoring
Test and refactoringTest and refactoring
Test and refactoringKenneth Ceyer
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMontreal Python
 
Predicting Future Sale
Predicting Future SalePredicting Future Sale
Predicting Future SaleDebmalya Pramanik
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnArnaud Joly
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Hello- I hope you are doing well- I am doing my project- which is Rans (1).pdf
Hello- I hope you are doing well- I am doing my project- which is Rans (1).pdfHello- I hope you are doing well- I am doing my project- which is Rans (1).pdf
Hello- I hope you are doing well- I am doing my project- which is Rans (1).pdfIan0J2Bondo
 
SOLID Ruby SOLID Rails
SOLID Ruby SOLID RailsSOLID Ruby SOLID Rails
SOLID Ruby SOLID RailsMichael Mahlberg
 
科ç‰čæž—Î»ć­ž
科ç‰čæž—Î»ć­žç§‘ç‰čæž—Î»ć­ž
科ç‰čæž—Î»ć­žćœ„ćœŹ æŽȘ
 
Machine learning and_nlp
Machine learning and_nlpMachine learning and_nlp
Machine learning and_nlpankit_ppt
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to peopleFabian Abel
 
Authorship attribution pydata london
Authorship attribution   pydata londonAuthorship attribution   pydata london
Authorship attribution pydata londonkperi
 
vertopal.com_DataEncodingForDataClustering-5 (1).pdf
vertopal.com_DataEncodingForDataClustering-5 (1).pdfvertopal.com_DataEncodingForDataClustering-5 (1).pdf
vertopal.com_DataEncodingForDataClustering-5 (1).pdfzraibianour
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn AnalysisVasudev pendyala
 

Similar to [CAIN'23] Prevalence of Code Smells in Reinforcement Learning Projects (20)

E_Commerce
E_CommerceE_Commerce
E_Commerce
 
E_Commerce Data model
E_Commerce Data modelE_Commerce Data model
E_Commerce Data model
 
Naive application of Machine Learning to Software Development
Naive application of Machine Learning to Software DevelopmentNaive application of Machine Learning to Software Development
Naive application of Machine Learning to Software Development
 
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
Approaching (almost) Any Machine Learning Problem (kaggledays dubai)
 
Xgboost
XgboostXgboost
Xgboost
 
NPTEL QUIZ.docx
NPTEL QUIZ.docxNPTEL QUIZ.docx
NPTEL QUIZ.docx
 
Test and refactoring
Test and refactoringTest and refactoring
Test and refactoring
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook game
 
Predicting Future Sale
Predicting Future SalePredicting Future Sale
Predicting Future Sale
 
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Hello- I hope you are doing well- I am doing my project- which is Rans (1).pdf
Hello- I hope you are doing well- I am doing my project- which is Rans (1).pdfHello- I hope you are doing well- I am doing my project- which is Rans (1).pdf
Hello- I hope you are doing well- I am doing my project- which is Rans (1).pdf
 
SOLID Ruby SOLID Rails
SOLID Ruby SOLID RailsSOLID Ruby SOLID Rails
SOLID Ruby SOLID Rails
 
科ç‰čæž—Î»ć­ž
科ç‰čæž—Î»ć­žç§‘ç‰čæž—Î»ć­ž
科ç‰čæž—Î»ć­ž
 
Machine learning and_nlp
Machine learning and_nlpMachine learning and_nlp
Machine learning and_nlp
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to people
 
CPP Homework Help
CPP Homework HelpCPP Homework Help
CPP Homework Help
 
Authorship attribution pydata london
Authorship attribution   pydata londonAuthorship attribution   pydata london
Authorship attribution pydata london
 
vertopal.com_DataEncodingForDataClustering-5 (1).pdf
vertopal.com_DataEncodingForDataClustering-5 (1).pdfvertopal.com_DataEncodingForDataClustering-5 (1).pdf
vertopal.com_DataEncodingForDataClustering-5 (1).pdf
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
 

More from Universidad de los Andes

An expressive and modular layer activation mechanism for Context-Oriented Pro...
An expressive and modular layer activation mechanism for Context-Oriented Pro...An expressive and modular layer activation mechanism for Context-Oriented Pro...
An expressive and modular layer activation mechanism for Context-Oriented Pro...Universidad de los Andes
 
[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs
[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs
[FTfJP23] Points-to Analysis for Context-oriented Javascript ProgramsUniversidad de los Andes
 
[JIST] Programming language implementations for context-oriented self-adaptiv...
[JIST] Programming language implementations for context-oriented self-adaptiv...[JIST] Programming language implementations for context-oriented self-adaptiv...
[JIST] Programming language implementations for context-oriented self-adaptiv...Universidad de los Andes
 
[CIbSE2023] Cross-language clone detection for Mobile Apps
[CIbSE2023] Cross-language clone detection for Mobile Apps[CIbSE2023] Cross-language clone detection for Mobile Apps
[CIbSE2023] Cross-language clone detection for Mobile AppsUniversidad de los Andes
 
[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...
[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...
[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...Universidad de los Andes
 
[CCC'21] Evaluation of Work Stealing Algorithms
[CCC'21] Evaluation of Work Stealing Algorithms[CCC'21] Evaluation of Work Stealing Algorithms
[CCC'21] Evaluation of Work Stealing AlgorithmsUniversidad de los Andes
 
Generating Adaptations from the System Execution using Reinforcement Learning...
Generating Adaptations from the System Execution using Reinforcement Learning...Generating Adaptations from the System Execution using Reinforcement Learning...
Generating Adaptations from the System Execution using Reinforcement Learning...Universidad de los Andes
 
Language Abstractions and Techniques for Developing Collective Adaptive Syste...
Language Abstractions and Techniques for Developing Collective Adaptive Syste...Language Abstractions and Techniques for Developing Collective Adaptive Syste...
Language Abstractions and Techniques for Developing Collective Adaptive Syste...Universidad de los Andes
 
Does Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary study
Does Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary studyDoes Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary study
Does Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary studyUniversidad de los Andes
 
Learning run-time composition of interacting adaptations
Learning run-time composition of interacting adaptationsLearning run-time composition of interacting adaptations
Learning run-time composition of interacting adaptationsUniversidad de los Andes
 
CQL: declarative language for context activation
CQL: declarative language for context activationCQL: declarative language for context activation
CQL: declarative language for context activationUniversidad de los Andes
 
Generating software adaptations using machine learning
Generating software adaptations using machine learningGenerating software adaptations using machine learning
Generating software adaptations using machine learningUniversidad de los Andes
 
[Bachelor_project] AsignaciĂłn de exĂĄmenes finales
[Bachelor_project] AsignaciĂłn de exĂĄmenes finales[Bachelor_project] AsignaciĂłn de exĂĄmenes finales
[Bachelor_project] AsignaciĂłn de exĂĄmenes finalesUniversidad de los Andes
 
Programming language techniques for adaptive software
Programming language techniques for adaptive softwareProgramming language techniques for adaptive software
Programming language techniques for adaptive softwareUniversidad de los Andes
 
Peace COrP: Learning to solve conflicts between contexts
Peace COrP: Learning to solve conflicts between contextsPeace COrP: Learning to solve conflicts between contexts
Peace COrP: Learning to solve conflicts between contextsUniversidad de los Andes
 

More from Universidad de los Andes (18)

An expressive and modular layer activation mechanism for Context-Oriented Pro...
An expressive and modular layer activation mechanism for Context-Oriented Pro...An expressive and modular layer activation mechanism for Context-Oriented Pro...
An expressive and modular layer activation mechanism for Context-Oriented Pro...
 
[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs
[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs
[FTfJP23] Points-to Analysis for Context-oriented Javascript Programs
 
[JIST] Programming language implementations for context-oriented self-adaptiv...
[JIST] Programming language implementations for context-oriented self-adaptiv...[JIST] Programming language implementations for context-oriented self-adaptiv...
[JIST] Programming language implementations for context-oriented self-adaptiv...
 
[CIbSE2023] Cross-language clone detection for Mobile Apps
[CIbSE2023] Cross-language clone detection for Mobile Apps[CIbSE2023] Cross-language clone detection for Mobile Apps
[CIbSE2023] Cross-language clone detection for Mobile Apps
 
Keeping Up! with LaTeX
Keeping Up! with LaTeXKeeping Up! with LaTeX
Keeping Up! with LaTeX
 
[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...
[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...
[JPDC,JCC@LMN22] Ad hoc systems Management and specification with distributed...
 
[CCC'21] Evaluation of Work Stealing Algorithms
[CCC'21] Evaluation of Work Stealing Algorithms[CCC'21] Evaluation of Work Stealing Algorithms
[CCC'21] Evaluation of Work Stealing Algorithms
 
Generating Adaptations from the System Execution using Reinforcement Learning...
Generating Adaptations from the System Execution using Reinforcement Learning...Generating Adaptations from the System Execution using Reinforcement Learning...
Generating Adaptations from the System Execution using Reinforcement Learning...
 
Language Abstractions and Techniques for Developing Collective Adaptive Syste...
Language Abstractions and Techniques for Developing Collective Adaptive Syste...Language Abstractions and Techniques for Developing Collective Adaptive Syste...
Language Abstractions and Techniques for Developing Collective Adaptive Syste...
 
Does Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary study
Does Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary studyDoes Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary study
Does Neuron Coverage Matter for Deep Reinforcement Learning? A preliminary study
 
Learning run-time composition of interacting adaptations
Learning run-time composition of interacting adaptationsLearning run-time composition of interacting adaptations
Learning run-time composition of interacting adaptations
 
Distributed context Petri nets
Distributed context Petri netsDistributed context Petri nets
Distributed context Petri nets
 
CQL: declarative language for context activation
CQL: declarative language for context activationCQL: declarative language for context activation
CQL: declarative language for context activation
 
Generating software adaptations using machine learning
Generating software adaptations using machine learningGenerating software adaptations using machine learning
Generating software adaptations using machine learning
 
[Bachelor_project] AsignaciĂłn de exĂĄmenes finales
[Bachelor_project] AsignaciĂłn de exĂĄmenes finales[Bachelor_project] AsignaciĂłn de exĂĄmenes finales
[Bachelor_project] AsignaciĂłn de exĂĄmenes finales
 
Programming language techniques for adaptive software
Programming language techniques for adaptive softwareProgramming language techniques for adaptive software
Programming language techniques for adaptive software
 
Peace COrP: Learning to solve conflicts between contexts
Peace COrP: Learning to solve conflicts between contextsPeace COrP: Learning to solve conflicts between contexts
Peace COrP: Learning to solve conflicts between contexts
 
Emergent Software Services
Emergent Software ServicesEmergent Software Services
Emergent Software Services
 

Recently uploaded

Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
What are the advantages and disadvantages of membrane structures.pptx
[CAIN'23] Prevalence of Code Smells in Reinforcement Learning Projects

  • 1. CAIN 2023 Prevalence of Code Smells in Reinforcement Learning Projects Nicolás Cardozo, Ivana Dusparic, Christian Cabrera Systems and Computing Engineering - Universidad de los Andes, Bogotá - Colombia Trinity College Dublin - Ireland University of Cambridge - UK n.cardozo@uniandes.edu.co, ivana.dusparic@tcd.ie, chc79@cam.ac.uk @ncardoz
  • 11. How about the quality? 3 Code + ISIS1226(String: lecturer, int: section, Data: lectures, int: size, int: numAssignments) + markAssignments(String: assignment, marks: float[]) : void - checkAssignment(String: assignment) : boolean - section : int - lecturer : String - lectures : Date[] - students : Student[] - assignments : String[] ISIS1226 - id: int - mark: double Assignment - name: String - marks: double[] Student - name: String Lecturer Automatically test Ship/Deploy/Maintain Design Code Test? Ship/Deploy/Maintain Design
  • 16. Process 4: GitHub API → Python projects → filter Q-learning projects → static analysis computing the metrics (LM, LC, LPL, LMC, LSC, LTCE, MNC, LLF) → results stored in MongoDB through a Node.js backend. Two data sets: the 20 most popular repositories and 4 reference project implementations.
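As a rough illustration of the mining step (not the authors' exact pipeline), the "20 most popular repositories" data set could be collected through the public GitHub search API; the query string and the absence of further manual filtering here are assumptions.

    import requests

    # Sketch only: fetch the 20 most-starred Python repositories matching "q-learning".
    # The query string and the lack of further filtering are assumptions, not the
    # paper's exact selection procedure.
    def top_qlearning_repos(n=20):
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": "q-learning language:python", "sort": "stars",
                    "order": "desc", "per_page": n},
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        return [(item["full_name"], item["stargazers_count"])
                for item in resp.json()["items"]]

    if __name__ == "__main__":
        for name, stars in top_qlearning_repos():
            print(f"{stars:>6}  {name}")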
  • 17. Metrics 5
     Code smell                                  | Metric                    | Threshold
     Long Method (LM)                            | Function LOC              | 38
     Long Class (LC)                             | Class LOC                 | 29
     Long Parameter List (LPL)                   | Number of parameters      | 5
     Long Method Chain (LMC)                     | Length of message chain   | 5
     Long Scope Chaining (LSC)                   | Depth of closure          | 3
     Long Ternary Conditional Expression (LTCE)  | Number of characters      | 54
     Multiply-Nested Container (MNC)             | Depth of nested container | 3
     Long Lambda Function (LLF)                  | Number of characters      | 48
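To make the thresholds concrete, here is a minimal detector sketch for two of the smells using Python's ast module. It is not the static-analysis tool used in the study, and it deliberately ignores details such as positional-only arguments, decorators, and docstring lines.

    import ast

    # Thresholds taken from the slide: Long Method at 38 function LOC,
    # Long Parameter List at 5 parameters.
    LM_THRESHOLD = 38
    LPL_THRESHOLD = 5

    def detect_smells(source, filename="<string>"):
        """Flag Long Method and Long Parameter List occurrences in one Python file (illustrative only)."""
        findings = []
        tree = ast.parse(source, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                loc = (node.end_lineno or node.lineno) - node.lineno + 1  # requires Python 3.8+
                n_params = len(node.args.args) + len(node.args.kwonlyargs)
                if loc > LM_THRESHOLD:
                    findings.append((filename, node.name, "Long Method (LM)", loc))
                if n_params > LPL_THRESHOLD:
                    findings.append((filename, node.name, "Long Parameter List (LPL)", n_params))
        return findings

    if __name__ == "__main__":
        sample = "def act(agent, state, reward, epsilon, alpha, gamma):\n    return state\n"
        print(detect_smells(sample))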
  • 18. Metrics 6 Long Method example: BasicAgent.update() from a Q-learning trading agent. The single method checks whether to update, senses the environment, builds the intern state, takes an action, updates the order book, collects rewards (with ad hoc exception handling), cancels excess orders, updates the policy, schedules the agent's next reaction time, and logs the results. (Full listing shown as a code screenshot on the slide.)
  • 19. Metrics 6 Long Method and Long Class examples: the same update() method, annotated as a Long Method, together with QLearningAgent, annotated as a Long Class. The class combines e-greedy Q-learning with tile coding, risk management and hedging, spread selection, order translation, and logging in a single unit. (Full listing shown as a code screenshot on the slide.)
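One common remedy for the Long Method smell above is Extract Method: keep a short orchestrator and move each concern into its own helper. The sketch below is hypothetical (helper names such as _sense, _act, and _learn are not taken from the project) and only illustrates the shape of the refactoring.

    class TradingAgentSketch:
        """Hypothetical skeleton; names and structure are illustrative, not the project's code."""

        def update(self, msg_env):
            # The long method becomes a short orchestrator; each concern moves to a helper.
            state = self._sense()                      # read inputs, build the intern state
            action, fills = self._act(state, msg_env)  # choose an action and submit orders
            reward = self._collect_reward(fills)       # accumulate reward from executions
            self._learn(state, action, reward)         # Q-learning policy update
            self._log(state, action, reward)           # PnL bookkeeping and logging

        # Each stub below would hold one slice of the original long method.
        def _sense(self):
            return {}

        def _act(self, state, msg_env):
            return None, []

        def _collect_reward(self, fills):
            return 0.0

        def _learn(self, state, action, reward):
            pass

        def _log(self, state, action, reward):
            pass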
  • 20. Metrics 7 Multiply-Nested Container example (state representation of a snake-game agent): state space of 11 boolean features, container depth 5.
     state = [
         # danger straight
         (player.x_change == 20 and player.y_change == 0 and
          ((list(map(add, player.position[-1], [20, 0])) in player.position) or
           player.position[-1][0] + 20 >= (game.game_width - 20))) or
         (player.x_change == -20 and player.y_change == 0 and
          ((list(map(add, player.position[-1], [-20, 0])) in player.position) or
           player.position[-1][0] - 20 < 20)) or
         (player.x_change == 0 and player.y_change == -20 and
          ((list(map(add, player.position[-1], [0, -20])) in player.position) or
           player.position[-1][-1] - 20 < 20)) or
         (player.x_change == 0 and player.y_change == 20 and
          ((list(map(add, player.position[-1], [0, 20])) in player.position) or
           player.position[-1][-1] + 20 >= (game.game_height - 20))),

         # danger right
         (player.x_change == 0 and player.y_change == -20 and
          ((list(map(add, player.position[-1], [20, 0])) in player.position) or
           player.position[-1][0] + 20 > (game.game_width - 20))) or
         (player.x_change == 0 and player.y_change == 20 and
          ((list(map(add, player.position[-1], [-20, 0])) in player.position) or
           player.position[-1][0] - 20 < 20)) or
         (player.x_change == -20 and player.y_change == 0 and
          ((list(map(add, player.position[-1], [0, -20])) in player.position) or
           player.position[-1][-1] - 20 < 20)) or
         (player.x_change == 20 and player.y_change == 0 and
          ((list(map(add, player.position[-1], [0, 20])) in player.position) or
           player.position[-1][-1] + 20 >= (game.game_height - 20))),

         # danger left
         (player.x_change == 0 and player.y_change == 20 and
          ((list(map(add, player.position[-1], [20, 0])) in player.position) or
           player.position[-1][0] + 20 > (game.game_width - 20))) or
         (player.x_change == 0 and player.y_change == -20 and
          ((list(map(add, player.position[-1], [-20, 0])) in player.position) or
           player.position[-1][0] - 20 < 20)) or
         (player.x_change == 20 and player.y_change == 0 and
          ((list(map(add, player.position[-1], [0, -20])) in player.position) or
           player.position[-1][-1] - 20 < 20)) or
         (player.x_change == -20 and player.y_change == 0 and
          ((list(map(add, player.position[-1], [0, 20])) in player.position) or
           player.position[-1][-1] + 20 >= (game.game_height - 20))),

         player.x_change == -20,      # move left
         player.x_change == 20,       # move right
         player.y_change == -20,      # move up
         player.y_change == 20,       # move down
         food.x_food < player.x,      # food left
         food.x_food > player.x,      # food right
         food.y_food < player.y,      # food up
         food.y_food > player.y       # food down
     ]
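The nesting above comes from recomputing the same membership and wall checks inline for every heading. A hedged refactoring sketch (player, game, and food mirror the example's objects; the danger() helper and its simplified wall bounds are assumptions) reduces the container depth by naming that check:

    from operator import add

    def danger(player, game, step):
        """True if moving the head by step=(dx, dy) hits the snake's body or a wall (simplified bounds)."""
        head = player.position[-1]
        nxt = list(map(add, head, step))
        hits_body = nxt in player.position
        hits_wall = not (20 <= nxt[0] < game.game_width - 20 and
                         20 <= nxt[1] < game.game_height - 20)
        return hits_body or hits_wall

    def build_state(player, game, food):
        ahead = (player.x_change, player.y_change)
        right = (-player.y_change, player.x_change)   # heading rotated 90 degrees clockwise
        left = (player.y_change, -player.x_change)    # heading rotated 90 degrees counter-clockwise
        return [
            danger(player, game, ahead),
            danger(player, game, right),
            danger(player, game, left),
            player.x_change == -20, player.x_change == 20,
            player.y_change == -20, player.y_change == 20,
            food.x_food < player.x, food.x_food > player.x,
            food.y_food < player.y, food.y_food > player.y,
        ]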
  • 29. Conclusion 9 RL projects contain many code smells: 3.15 per file on average, and up to 8 per file (1 code smell every 27 lines). Of the top 4 most common code smells, 3 are shared across the 2 data sets (Multiply-Nested Container, Long Method, Long Parameter List). State representations are inherently complex. Functionality is presented as a single code block. RL algorithms are riddled with learning parameters. Code smells point to violations of design principles (coupling, cohesion, single responsibility).
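As one illustration of the "riddled with learning parameters" point, the Long Parameter List smell in agent constructors (learning rate, discount factor, exploration schedule, tilings, ...) can be tamed with a parameter object. The field names below are hypothetical and are not taken from any of the analyzed projects.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class QLearningConfig:
        # Hypothetical parameter object grouping hyperparameters that would otherwise
        # be passed as a long constructor parameter list (Long Parameter List smell).
        gamma: float = 0.5             # discount factor
        alpha: float = 0.5             # learning rate
        epsilon: float = 1.0           # initial exploration rate
        num_tilings: int = 16          # tile-coding resolution
        min_reaction_time: float = 3600.0

    class AgentSketch:
        def __init__(self, env, agent_id: int, config: Optional[QLearningConfig] = None):
            self.env = env
            self.agent_id = agent_id
            self.config = config or QLearningConfig()

    agent = AgentSketch(env=None, agent_id=1, config=QLearningConfig(alpha=0.1, epsilon=0.3))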
  • 32. Future perspectives 10 We need RL-specific metrics, code smells, thresholds, and tools to capture the complexity of RL algorithms. The complexity of RL can be managed by creating dedicated data structures and by expressing relations between entities more ergonomically.
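A possible shape for such dedicated data structures (a sketch, not a proposal from the paper): a typed state record that replaces an 11-element positional boolean list while still exposing the flat vector a learner needs. The field names loosely mirror the snake example and are illustrative only.

    from dataclasses import dataclass, astuple

    @dataclass(frozen=True)
    class SnakeState:
        # Dedicated state structure instead of a positional list of anonymous booleans.
        danger_straight: bool
        danger_right: bool
        danger_left: bool
        moving_left: bool
        moving_right: bool
        moving_up: bool
        moving_down: bool
        food_left: bool
        food_right: bool
        food_up: bool
        food_down: bool

        def as_vector(self):
            # Flat numeric vector for the function approximator or Q-table key.
            return [int(v) for v in astuple(self)]

    s = SnakeState(False, False, True, True, False, False, False, True, False, False, True)
    assert len(s.as_vector()) == 11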