What about Large Language Models?

Laboratory of Statistics and Mathematics 2025/2026

Giuseppe Alfonzetti

Data Science and LLMs

Do not blindly trust LLMs!

I know you might be tempted…

but…

mobile_app_data
# A tibble: 142 × 5
   ad_cost generated_revenue ad_type country game_type
     <dbl>             <dbl> <chr>   <chr>   <chr>    
 1    55.4              97.2 banner  France  Farming  
 2    51.5              96.0 banner  Spain   Farming  
 3    46.2              94.5 banner  Spain   Warzone  
 4    42.8              91.4 banner  Italy   Warzone  
 5    40.8              88.3 banner  Germany Racing   
 6    38.7              84.9 banner  France  Racing   
 7    35.6              79.9 banner  France  Racing   
 8    33.1              77.6 banner  Italy   Warzone  
 9    29.0              74.5 banner  Germany Warzone  
10    26.2              71.4 banner  Italy   Srategy  
# ℹ 132 more rows

but…

Note

Hi ChatGPT/Claude/Gemini! I need to analyze the attached dataset, which collects information about the results of some ad campaigns run within different mobile apps. Please answer the following questions:

  • Which ad type generated more revenue?
  • Is there any preference across countries in terms of money spent and game type?
  • Which game type cost more in ad investment?
  • Use a t-test to answer the previous questions where needed.

Furthermore, show me the R code to reproduce the analysis.

Keep the driver’s seat!

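library(ggplot2)

# Scatter plot of ad spend against generated revenue, one point per campaign
mobile_app_data |> 
  ggplot(aes(x = ad_cost, y = generated_revenue)) +
  geom_point(size = 5) +
  theme_minimal()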
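Keeping the driver's seat also means checking the categorical columns yourself before handing the data to an LLM. Below is a minimal sketch that assumes the dplyr package. `toy_data` is a hypothetical stand-in that copies only the ten rows printed earlier, not the full 142-row dataset. Counting the levels of `game_type` makes a rare label such as `Srategy`, which looks like a misspelling of Strategy, stand out as its own category.

```r
library(dplyr)

# Hypothetical toy copy of the ten rows printed above (not the full dataset)
toy_data <- tibble(
  ad_cost = c(55.4, 51.5, 46.2, 42.8, 40.8, 38.7, 35.6, 33.1, 29.0, 26.2),
  generated_revenue = c(97.2, 96.0, 94.5, 91.4, 88.3, 84.9, 79.9, 77.6, 74.5, 71.4),
  ad_type = "banner",
  country = c("France", "Spain", "Spain", "Italy", "Germany",
              "France", "France", "Italy", "Germany", "Italy"),
  game_type = c("Farming", "Farming", "Warzone", "Warzone", "Racing",
                "Racing", "Racing", "Warzone", "Warzone", "Srategy")
)

# Count each game_type level, most frequent first:
# misspelled labels show up as separate, rare categories
level_counts <- toy_data |> count(game_type, sort = TRUE)
print(level_counts)
```

Running the same `count()` on the real `mobile_app_data` (and on `country` and `ad_type`) is a cheap way to spot inconsistent labels that would silently split groups in any t-test an LLM proposes.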