DM-Proba-PANINI-GUOKAI

Définition des variables: T:le nombre total de vignettes achetees pour avoir toute la collection. M: = 300(places distinctes pour les vignettes autocollantes).

Q1).Ecrire un simulateur en R générant T en fonction de M.

set.seed(53)
PANINI<-function(T=1,M=300){
  
  #====Definition des variables===========
  album<-array(0,c(1,M)) # album des vignettes en fonction de M
  count=0                # compteur de fois de réussite
  T=0                    # compteur de consommations
  vignette=0             # indice de vignette actuelle achetée
  #========================================
  while(count<M){
    T=T+1
    vignette=ceiling(runif(1,0,M))
    if(album[vignette]==0){
      album[vignette]=1
      count=count+1
    }
    
  }
  return(T)
}
PANINI()

## [1] 2146

Q2) Etudier expérimentalement la valeur moyenne de T en fonction de M. On prendra des échantillons de l’ordre de 100 ou 1000 et une dizaine de valeurs M entre 10 et 200. Une représentation graphique sera appréciée.Que remarquez vous ?

set.seed(53)
# library(ggplot2)
library(MASS)
library(calibrate)
SAMPLE<-function(times=100){
  test<-ceiling(runif(13,9,200))
  TArray<-c()
  display<-c()
  for(j in 1:13){
    t=0
    somme=0
     while(t<times){
      somme=somme+PANINI(1,test[j])
      t=t+1
       }
  TArray[j]=ceiling(somme/times)
      }
  
 
plot(test,TArray,xlab="M",ylab="T",ylim=c(0,max(TArray)),xlim=c(0,max(test)),main = paste(paste("Average T of 13 samples for each ",times),"times"),col="black",pch=21,panel.first = grid(10, 10),bg="yellow",lty = "dashed",frame.plot=TRUE)

textxy(test,TArray,paste(paste("(",paste(test,paste(",",TArray))),")"),cex = 0.6,offset = -0.8)
abline(lm(TArray ~test), lwd = 2, col="green")
points(test,100*test/TArray,pch=21,bg="lightblue",col="black")
abline(lm(100*test/TArray ~test), lwd = 2, col="brown")
text(100,sum(100*test/TArray)/13+40,paste("Average ratio:",round(sum(test/TArray)/13,digits=4)),cex = 0.8,col="red")
legend("topleft", c("Developping trend of T", "Developping trend of P(Xi=i)"), col="black" , pt.bg = c("green", "brown"), pch = 21,bg="lightgoldenrodyellow",trace = TRUE,xjust=1)
# textxy(test,round(100*test/TArray,digits=0),TArray/test,cex = 0.8,offset = 0)

}
# par(mfrow = c(1,2),mar=c(1,1,2,2),mai=c(1,0.8,0.8,0.4),cex=0.8)
SAMPLE(50)

##   xchar= 4.978 ; (yextra,ychar)= 0 66.24 
##   rect2(-7.08,1008, w=81.46, h=198.7, ...)
##   points2( -2.102 -2.102 , 941.5 875.3 , pch= 21 21 , ...)

SAMPLE(100)

##   xchar= 5.597 ; (yextra,ychar)= 0 78.61 
##   rect2(-7.96,1196, w=91.59, h=235.8, ...)
##   points2( -2.363 -2.363 , 1117 1039 , pch= 21 21 , ...)

Evidemment, on remarque que la probabilité de tirer une vignette différente chaque fois que les précédentes est toujours inférieur que 1.

=================================================================================

Q3) Quelles sont les hypothèses statistiques sur la suite des variables Xn que l’énoncé suggère ?

Réponse: en remarquant les résultat obtenu des graphes, on en déduit que Xn est distribué complètement aléatoirement malgré la suit précédente entre 1~M.

set.seed(53)
M=300
album<-array(0,c(1,M))
reco<-c()
count=0
T=0
while(count < M){
  
  T=T+1
  reco[T]=ceiling(runif(1,0,M))
  
  if(album[reco[T]] == 0){
    album[reco[T]]=1
    count=count+1
  } 
}

plot(reco,type="p",col="brown",main="Distrubition of bought vignette", xlab="Times of buying the vignettes",ylab="Vignettes")

par(mfrow = c(1,1),mar=c(1,1,1,1),bg="yellow")

======================================================================================= Q4) Lorsque la collection comporte déjà i-1 vignettes, on note Yi le nombre de vignettes à acheter pour avoir une vignette qui n’est pas dans la collection. Donner, en la justifiant, la loi de Yi (vous pourrez calculer la probabilité P(Yi = k) et vous en déduirez l’espérance et la variance de Yi). Les variables Yi sont-elles indépendantes ? En déduire E(T) et V ar(T) ainsi que des équivalents quant M est grand.

Analyse: En obtenant (i-1) vignettes distinctes, pour avoir la prochaine vignettes distincte, on acheterai k fois d’une vignette. C-est-à-dire qu’on a acheté (k-1) vignettes des vignettes doublées dans (i-1) vignettes qu’on a déjà donc cette probablilité est ((i-1)/M)^(k-1) et la dernière fois on en a acheté une nouvelle distincte dont la probabilité est (M-(i-1))/M; donc au total, P(Yi=k)=((i-1)/M)^(k-1)*(M-(i-1))/M.

Réponse: On remarque que la valeur de Yi se peut varier de 1 à infini car chaque consommation de pochette est une tirage aléatoire de vignette dans M vignettes dont la probabilité est 1/M. Donc l’éxpérance de Yi est: E(Yi)= ∑(j=1,n)(Yi* P(Yi=j)) =∑(j=1,n)(j* ((i-1)/M)^(k-1) * (M-(i-1))/M) =M/(M-i+1). La variance de Yi est: Var(Yi)=(E(Yi))^2 - E((Yi)^2) =∑(j=1,n)(j(((i-1)/M)^(k-1)) (M-(i-1))/M) + ∑(j=1,n)(j^2 * (((i-1)/M)^(k-1)) * (M - (i-1))/M) =(i-1)(3i-M)/(M-i+1)^2 Donc E(T)=∑(i=1,M)(M/(M-(i-1))) =M∑(i=1,M)(1/(M-(i-1))) =M∑(k=0,M-1)(1/(M-k) =M(1/M + 1/(M-1) + 1/(M-2) + …+ 1/2+1) Etant donné la valeur M=300 E(T)=300*(1/300+1/299+….+1/2+1) =1885 Comme l’évaluation nous a montré, Yi dépend de la valeur de i obtenu précédant mais pas la valeur de Y(i-1), donc Yi est indépendante en fonction de i.

============================================================================================ Synthèse ============================================================================================

Q5) Vérifier graphiquement que les résultats obtenus par simulation sont cohérents avec vos calculs.

set.seed(53)
M=300
album<-array(0,c(1,M)) # album des vignettes en fonction de M
reco<-c() #toutes les entrées générées pour avoir toutes les vignettes
Y<-c() #variable de nombre de fois de consommation des pochettes pour avoir la prochaine vignette distinc

#Initialization 
#=========================================================================
count=0 # compteur de fois de réussite
T=0 # compteur de consommations
k=0 # nombre de fois de consommation des pochettes pour avoir la prochaine vignette distinct
#=========================================================================

while(count<M){
  T=T+1
  k=k+1
  reco[T]=ceiling(runif(1,0,M))
  if(album[reco[T]]==0){
    album[reco[T]]=1
    count=count+1
    Y[count]=k
    k=0 #une fois réussir d'avoir la prochaine vignette distincte, remettre à zéro
  }
}
cord<-c(1:300)
plot(cord,Y,type="p",pch=21,col="green",main="The seqnence of Yi when M=300")
lines(cord,Y+20,type="l",col="brown",bg="gray")
legend("topleft", c("Yi","Developping Trend" ), col="lightgray" , pt.bg = c("green", "brown"), pch = 21,bg="lightgray",trace = TRUE,xjust=1)

##   xchar= 8.409 ; (yextra,ychar)= 0 23.86 
##   rect2(-10.96,364, w=98.32, h=71.57, ...)
##   points2( -2.551 -2.551 , 340.1 316.2 , pch= 21 21 , ...)

text(120,sum(Y)/300+40,"Vignette to buy for getting the next distinct one",cex = 0.8,col="brown")
text(150,200,paste("AVE Yi: ",ceiling(sum(Y/cord))),cex = 1.5,col="red")

# par(bg="gray")

Q6) Au fait, combien votre petit cousin devra-t-il dépenser en moyenne sachant que chaque paquet de 2 euros contient 10 vignettes et que l’album comporte 300 places ?

set.seed(53)
library(MASS)
library(calibrate)
M=300
album<-array(0,c(1,M))
Xn<-c() #array of Xn 
count=0
i=0 # indice of the sequence Xn
k=0 # counter for X[i]
while(count < M){ 
  i=i+1
  X<-c()
  X=ceiling(runif(10,0,M))
  
  for(j in 1:10){
    if(album[X[j]]==0 && count< M){ # if hitting, marking the hit page and adding the counter k 
    album[X[j]]=1
    count=count+1
    k=k+1
  }
  }
  Xn[i]=k # recording the hitting number of the current sequence X[count]
  k=0 # clearing the counter for the preparation of the X[count+1]
  }
  maxxn=max(Xn)
  minxn=min(Xn)
  pp=0
  qq=0
  coMax<-c()
  coMin<-c()
  for(ss in 1:i){  
    if(Xn[ss] == maxxn){
      pp=pp+1
      coMax[pp]=ss
    }else if(Xn[ss] == minxn){
      qq=qq+1
      coMin[qq]=ss
    }
  }

plot(Xn,col="red",type="p",xlab="Times of buying the pochettes",main="Buying behaviour for having the album fully collected",bg="gray")
axis(2,at=Xn)
lines(c(1:i),Xn,type="l",col="brown")
text(120,6.5,paste("Hitting probability on average:",round(sum(Xn)/(10*i),digits=4)),cex = 0.9,col="brown")
text(120,9.5,paste(paste("Mon petit cousin devra dépenser en moyen:", 2*i),"euros"),cex = 0.9,col="red"
text(120,8.5,paste(paste(paste(pp, "times hitting "),maxxn), "in 10"),cex = 0.9,col="blue")
text(120,7.5,paste(paste(paste(qq, "times hitting least:"),minxn),"in 10"),cex = 0.9,col="blue")
textxy(which.max(Xn),maxxn,paste(paste("(",paste(which.max(Xn),paste(",",max(Xn)))),")"),cex = 0.6,offset = -0.4)
textxy(which.min(Xn),minxn,paste(paste("(",paste(which.min(Xn),paste(",",min(Xn)))),")"),cex = 0.6,offset = -0.4)

# par(mfrow = c(1,1),bg="gray")
print(I)

## function (x) 
## {
##     structure(x, class = unique(c("AsIs", oldClass(x))))
## }
## <bytecode: 0x000000000767f6c8>
## <environment: namespace:base>

========================================================================================

Q7)

Nous avons bien fini le corps de générateur pour chaque cas: “pi” correspond le premier cas et “pi2” le suivant, et nous avons essayé de générer une suite de chiffres aléatoirement et ça bien marche, mais il reste quelques bogues dedans que nous n’avons pas toutes résolu en raison du manque du temps(vu qu’on a déjà dépensé trop de temps pour toutes les questions et l’apprentissage de R. Par contre, la partie ci-dessous est généralement raisonnable pour votre questions.

set.seed(53)
library(MASS)
library(calibrate)
M=300
#=================Returning a certain proba according to the para "type"======
P<-c()
initP<-function(type = "pi"){
  somme=0
  
  if(type=="pi"){
    for(i in 1:M){
      for(cc in i:M){
      somme=somme+1/cc
    }
    Pp[i]=1/(somme*i)
    }
    
  }else if(type=="pi2"){
    for(i in 1:M){
      for(dd in i:M){
      somme=somme+1/(dd*dd)
    }
    P[i]=1/(somme*i*i)
    }
     
  }else{
    P<-array(i/M,c(1,M)) #by defaut pi
  }
}

#==============generate a rand of "type"===============================
Pi<-function(type="pi"){
  rand=1
  get = FALSE
  boundary_lower=0 # the lower boundary of the current interval
  boundary_upper=0 # the upper boundary of the current interval
  j=1 # assuring the obtained random ranging from 1 to M
  #==========Generating the a random within 1 to M============================
  while(!get){ #if not obtain, generate one
    pf=runif(1) #acquiring a random
    boundary_upper=P[j] # asigning the current upper boundary
    if((pf>boundary_lower) && (pf<boundary_upper)){ # testing if the proba is within the boundaries
      rand=j
      get=TRUE
    }
    boundary_lower=boundary_upper # assigning the next lower boundary
    j=j+1
    if(j>300){
      j=1
      boundary_lower=0
    }
  }
return(rand)
}

#==============Generating a specified number of rand according to the passing "type"=============== 
geneX<-function(type="pi",num=1){
  uni<-c()
  if(type=="Pi"||type=="pi"){
    for(i in 1:num){
      uni[i]=Pi("pi")
    }
  }else if(type=="Pi2" || type == "pi2" ){
    for(j in 1:num){
      uni[j]=Pi("pi2")
    }
  }
  return(uni)
}

PANINI_Extended<-function(L=1,type="pi"){
  #==========Initialize the prob series===========
  initP(type)
  album<-array(0,c(1,M))
  X<-c()
  X=geneX(type,L)
  Xn<-c() #array of Xn 
  count=0
  i=0 # indice of the sequence Xn
  k=0 # counter for X[i]

  #==========initialisation album=================
  for(l in 1:L){
    album[X[l]]=1
  }
  #===========================================
  while(count < M){ 
  
    i=i+1
    X=geneX(type,L)
  
    for(j in 1:L){
      if((album[X[j]]==0) && (count< M)){ # if hitting, marking the hit page and adding the counter k 
      album[X[j]]=1
      count=count+1
      k=k+1
    }
  }
  Xn[i]=k/L # recording the hitting number of the current sequence X[count]
  k=0 # clearing the counter for the preparation of the X[count+1]
  }
  return(Xn)
}

sXn<-c()
sn1<-c()
sn2<-c()
# sn1=PANINI_Extended(1,"pi")
# sn2=PANINI_Extended(1,"pi2")
# for(indice in 1:2){ 
#   if(indice==1){
#     sXn=sn1
#   }else if(indice==2){
#     sXn=sn2
#   }
#   maxxn=max(sXn)
#   minxn=min(sXn)
#   pp=0
#   qq=0
#   coMax<-c()
#   coMin<-c()
#   tt="pi"
#   for(ss in 1:i){  
#     if(Xn[ss] == maxxn){
#       tt="pi"
#       pp=pp+1
#       coMax[pp]=ss
#     }else if(Xn[ss] == minxn){
#       tt="pi2"
#       qq=qq+1
#       coMin[qq]=ss
#     }
#   }
# #plot(sXn)
#  plot(sXn,col="red",,type="p",xlab="Times of buying the pochettes",main=paste("Panini(Extended Version): ",tt))
#  lines(c(1:300),sXn,type="l",col="brown")
#  text(120,6.5,paste("Hitting probability on average:",round(sum(sXn)/(10*i),digits=4)),cex = 0.9,col="brown")
#  text(120,9.5,paste(paste("Mon petit cousin devra dépenser en moyen:", 2*i),"euros"),cex = 0.9,col="red")
#  text(120,8.5,paste(paste(paste(pp, "times hitting "),maxxn), "in 10"),cex = 0.9,col="blue")
#  text(120,7.5,paste(paste(paste(qq, "times hitting least:"),minxn),"in 10"),cex = 0.9,col="blue")
#  textxy(which.max(sXn),maxxn,paste(paste("(",paste(which.max(sXn),paste(",",maxxn))),")"),cex = 0.6,offset = -0.4)
#  textxy(which.min(Xn),minxn,paste(paste("(",paste(which.min(Xn),paste(",",minxn))),")"),cex = 0.6,offset = -0.4)
# }
# 
# par(mfrow = c(1,2),bg="gray")

======================================================================================== Q8)