Trace doesn't work with python 3.13 #43

Open
Ryan-Rong-24 opened this issue Feb 25, 2025 · 3 comments

@Ryan-Rong-24

Trace seems to hit an error in the bundle decorator when running on Python 3.13. Downgrading to 3.12 makes the error go away.

Traceback (most recent call last):
File "/Users/ryanrong/Documents/GitHub/cs224n_llm_agent/pong_LLM_agent.py", line 269, in
rewards = optimize_policy(
env_name="ALE/Pong-v5",
...<4 lines>...
model="gpt-4o-mini"
)
File "/Users/ryanrong/Documents/GitHub/cs224n_llm_agent/pong_LLM_agent.py", line 169, in optimize_policy
@trace.bundle(trainable=True)
~~~~^^^^^^^^^^^^^^^^
File "/Users/ryanrong/.pyenv/versions/3.13.1/lib/python3.13/site-packages/opto/trace/bundle.py", line 57, in decorator
fun_module = FunModule(
fun=fun,
...<7 lines>...
_ldict=prev_f_locals, # Get the locals of the calling function
)
File "/Users/ryanrong/.pyenv/versions/3.13.1/lib/python3.13/site-packages/opto/trace/bundle.py", line 114, in init
assert _ldict is None or isinstance(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ldict, dict
^^^^^^^^^^^^
), "_ldict must be a dictionary. or None"
^
AssertionError: _ldict must be a dictionary. or None

@chinganc
Collaborator

chinganc commented Feb 26, 2025

@Ryan-Rong-24 Can you paste an example code snippet so I can reproduce the error? Thanks.

Can you also let me know what you get as _ldict when that error happens?
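
If it's easier, a quick (hypothetical) debugging aid is to temporarily add a print just above the assertion in opto/trace/bundle.py (around line 114 in your traceback):

# Temporary debug print, placed right above the existing assertion in FunModule.__init__
print(type(_ldict), repr(_ldict))
assert _ldict is None or isinstance(_ldict, dict), "_ldict must be a dictionary. or None"

That would show exactly what object the decorator received.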

@chinganc
Collaborator

chinganc commented Feb 26, 2025

It seems the main cause is that, starting from Python 3.13 (see PEP 667), f_locals is no longer a dict.

The f_locals attribute on a frame object is an instance of a “frame-locals proxy”. The proxy object exposes a write-through view of the underlying locals dictionary for the frame. This ensures that the variables exposed by f_locals are always up to date with the live local variables in the frame itself.

Our code previously assumed it to be a dict (f_locals is used to resolve the nonlocal namespace for bundle).
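
For illustration, a small standalone check (assuming CPython; this is not Trace code) shows the difference:

import inspect

def show_caller_locals_type():
    frame = inspect.currentframe().f_back  # the calling function's frame
    print(type(frame.f_locals))
    print(isinstance(frame.f_locals, dict))

def caller():
    x = 1  # a local variable so the frame has something to expose
    show_caller_locals_type()

caller()
# Python 3.12: <class 'dict'>, True
# Python 3.13: a frame-locals proxy type, False -- which is what trips the assertion in bundle.py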

Currently the quickest solution is to use Python <3.13. We will look into this more for a patch.
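
One possible direction for a patch (just a sketch, not the final fix; _as_plain_dict is a hypothetical helper) is to snapshot the caller's locals into a plain dict, or to relax the check to accept any mapping:

from collections.abc import Mapping

def _as_plain_dict(f_locals):
    # Works on 3.12 (already a dict) and on 3.13 (frame-locals proxy, PEP 667),
    # where dict(...) takes a snapshot of the current frame locals.
    return f_locals if isinstance(f_locals, dict) else dict(f_locals)

# The assertion in FunModule.__init__ could then read something like:
#     assert _ldict is None or isinstance(_ldict, Mapping), "_ldict must be a mapping or None"

A plain-dict snapshot loses the proxy's write-through behavior, which is presumably fine if bundle only reads the nonlocal namespace.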

@Ryan-Rong-24
Author

This is our code for using Trace to play Pong. I'm not sure I can show what _ldict is without diving into the Trace code itself... but hopefully this helps! Meanwhile I'll just develop on Python 3.12.

def optimize_policy(
    env_name="ALE/Pong-v5",
    horizon=2000,
    memory_size=5,
    n_optimization_steps=10,
    verbose=False,
    model="gpt-4o-mini"
):
    @trace.bundle(trainable=True)
    def policy(obs):
        '''
        A policy that moves the paddle towards the ball to deflect the ball.
        If the paddle is below the ball, move up; otherwise, move down.
        Make prediction on the ball's moving direction and velocity to adjust the paddle action.

        Args:
            obs (dict): A dictionary with keys "ball_pos" and "paddle_pos" and values the corresponding [x, y, w, h], coordinates, width and height of the ball and agent paddle in the game screen of (210, 160).
        Output:
            action (int): The action to take among 0 (NOOP), 1 (FIRE), 2 (DOWN), 3 (UP).
        '''
        ball_pos = obs["ball_pos"]
        paddle_pos = obs["paddle_pos"]

        action = 0 # NOOP
        if ball_pos and paddle_pos:
            ball_y = ball_pos[1]
            paddle_y = paddle_pos[1]
            
            if paddle_y + 10 < ball_y:  # Paddle is below the ball, move up
                action = 3  
            elif paddle_y > ball_y + 10:  # Paddle is above the ball, move down
                action = 2
        return action
    
    # Get the config file path from environment variable
    config_path = os.getenv("OAI_CONFIG_LIST")
    config_list = config_list_from_json(config_path)
    config_list = [config for config in config_list if config["model"] == model]
    optimizer = OptoPrime(policy.parameters(), config_list=config_list, memory_size=memory_size)
    
    env = PongTracedEnv(env_name=env_name)
    try:
        rewards = []
        logger.info("Optimization Starts")
        for i in range(n_optimization_steps):
            env.init()
            traj, error = rollout(env, horizon, policy)

            if error is None:
                feedback = f"Episode ends after {traj['steps']} steps with total score: {sum(traj['rewards']):.1f}"
                if sum(traj['rewards']) > 0:
                    feedback += "\nGood job! You're scoring points against the opponent."
                elif sum(traj['rewards']) <= 0:
                    feedback += "\nTry to improve paddle positioning to prevent opponent scoring."
                target = traj['observations'][-1]
                
                rewards.append(sum(traj['rewards']))
            else:
                feedback = error.exception_node.create_feedback()
                target = error.exception_node
            
            logger.info(f"Iteration: {i}, Feedback: {feedback}, target: {target}, Parameter: {policy.parameters()}")

            instruction = "In Pong, you control the right paddle and compete against the computer on the left. "
            instruction += "The goal is to keep deflecting the ball away from your goal and into your opponent's goal to maximize your score and win the game. "
            instruction += "You score one point when the opponent misses the ball or hits it out of bounds. "
            instruction += "The policy should move the right paddle up or down or NOOP to hit the ball. "
            
            optimizer.objective = instruction + optimizer.default_objective
            
            optimizer.zero_feedback()
            optimizer.backward(target, feedback, visualize=True)
            logger.info(optimizer.problem_instance(optimizer.summarize()))
            
            stdout_buffer = io.StringIO()
            with contextlib.redirect_stdout(stdout_buffer):
                optimizer.step(verbose=verbose)
                llm_output = stdout_buffer.getvalue()
                if llm_output:
                    logger.info(f"LLM response:\n {llm_output}")
            
            logger.info(f"Iteration: {i}, Feedback: {feedback}, Parameter: {policy.parameters()}")
    finally:
        if env is not None:
            env.close()
    
    logger.info(f"Final Average Reward: {sum(rewards) / len(rewards)}")
    return rewards

if __name__ == "__main__":
    # Set up logging
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
    logger.addHandler(console_handler)

    # Set up file logging
    log_dir = Path("logs")
    log_dir.mkdir(exist_ok=True)
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    log_file = log_dir / f"pong_ai_{timestamp}.log"
    
    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
    logger.addHandler(file_handler)
    
    logger.info("Starting Pong AI training...")
    rewards = optimize_policy(
        env_name="ALE/Pong-v5",
        horizon=800,
        n_optimization_steps=5,
        memory_size=5,
        verbose='output',
        model="gpt-4o-mini"
    )
    logger.info("Training completed.")
