Jean-Marc, Robin and Lucas have a nice technique for aggregating data and depicting the result, for example with treemaps. Lucas was visiting us this week, and some of the reviewers of one of their articles had asked for large non-synthetic traces. G5K has a natural hierarchy, so we decided to try to extract interesting information from it. Fortunately, Elodie, Pierre, and Bruno, who are in the office next door and administer G5K and Ciment, could help and quickly point us to the right places. :)

Useful links

Obtain useful information via REST

sudo gem install restfully
restfully -u alegrand -p 'myg5kpassword' --uri https://api.grid5000.fr/stable/grid5000

Here is a first attempt to obtain the machine load. The nice thing about interactive Ruby with object inspection is that you can simply press Tab to list the available methods and explore the object:

``` ruby
# I could browse like this:
root.sites[:rennes].clusters[:parapide].nodes[:'parapide-1'] # lets you iterate per site/cluster/node, but mainly provides static information

# To access ganglia information, you have to look at the metrics field:
pp root.sites[:rennes].metrics[:cpu_idle].timeseries[:'paradent-5']

# It is even possible to filter a bit
pp root.sites[:rennes].metrics[:cpu_idle].timeseries.load(:query => {:resolution => 15, :from => Time.now.to_i-3600*1})[:'paradent-5']

# So for a particular node, the last measured value is:
root.sites[:rennes].metrics[:cpu_idle].timeseries[:'paradent-5'].properties['values'][0]

# So let's iterate over all machines
file = File.open("/tmp/rest.txt", 'w')
root.sites.each do |site|
  site.metrics[:cpu_idle].timeseries.each do |node|
    file.write node.properties['hostname'] + ", " + node.properties['values'][0].to_s + "\n"
  end
end
file.close
```

Unfortunately, this gives rather poor information, since only running
nodes that are not deployed may report a value:

``` bash
tail /tmp/rest.txt
```

```
pastel-62.toulouse.grid5000.fr, 
pastel-83.toulouse.grid5000.fr, 
pastel-63.toulouse.grid5000.fr, 
pastel-55.toulouse.grid5000.fr, 
pastel-56.toulouse.grid5000.fr, 99.9
pastel-140.toulouse.grid5000.fr, 
pastel-76.toulouse.grid5000.fr, 100.0
pastel-57.toulouse.grid5000.fr, 100.0
pastel-21.toulouse.grid5000.fr, 99.875
pastel-77.toulouse.grid5000.fr, 
```

``` bash
# number of nodes that reported a value, then total number of nodes
grep ', [0-9]' /tmp/rest.txt | wc -l
cat /tmp/rest.txt | wc -l
```

```
622
1426
```
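The same count can be done in Ruby without going through the shell. This is only a minimal sketch: it assumes lines of the form `hostname, value` as written to `/tmp/rest.txt`, and the helper name `count_reporting` and the inlined sample lines are illustrative, not part of restfully.

``` ruby
# Count how many lines of a "hostname, value" dump actually carry a value.
# `count_reporting` is a hypothetical helper name.
def count_reporting(lines)
  with_value = lines.count do |line|
    _host, value = line.chomp.split(",", 2)
    # A node that returned nothing leaves the field after the comma empty.
    !value.nil? && !value.strip.empty?
  end
  { with_value: with_value, total: lines.size }
end

sample = [
  "pastel-55.toulouse.grid5000.fr, ",
  "pastel-56.toulouse.grid5000.fr, 99.9",
  "pastel-140.toulouse.grid5000.fr, ",
  "pastel-76.toulouse.grid5000.fr, 100.0",
]
p count_reporting(sample)
```

On the real file this would be called as `count_reporting(File.readlines("/tmp/rest.txt"))`.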

So instead, let's try to capture the state of the machines:

``` ruby
file = File.open("/tmp/state_rest.txt", 'w')
root.sites.each do |site|
  site.status.each do |node|
    file.write site.properties['uid'] + "/" + node.properties['node_uid'] + ", " + node.properties['system_state'] + "\n"
  end
end
file.close
```

``` bash
tail /tmp/state_rest.txt
```

```
toulouse/pastel-6, unknown
toulouse/pastel-93, free
toulouse/pastel-37, unknown
toulouse/pastel-65, unknown
toulouse/pastel-7, unknown
toulouse/pastel-94, unknown
toulouse/pastel-38, free
toulouse/pastel-114, free
toulouse/pastel-66, free
toulouse/pastel-8, besteffort
```

``` bash
sed 's/.*, //' /tmp/state_rest.txt | sort | uniq

for i in `sed 's/.*, //' /tmp/state_rest.txt | sort | uniq` ; do echo "$i :" `grep $i /tmp/state_rest.txt | wc -l` ; done
```

```
besteffort : 150
busy : 232
free : 582
unknown : 205
```
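For comparison, here is a minimal Ruby sketch of the same per-state tally. It assumes the `site/node, state` line format of `/tmp/state_rest.txt`; the helper name `tally_states` and the sample lines are illustrative only.

``` ruby
# Tally node states from "site/node, state" lines.
# `tally_states` is a hypothetical helper name.
def tally_states(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    # Everything after the last ", " is the state, as in the sed expression.
    state = line.chomp.split(", ").last
    counts[state] += 1
  end
end

sample = [
  "toulouse/pastel-6, unknown",
  "toulouse/pastel-93, free",
  "toulouse/pastel-38, free",
  "toulouse/pastel-8, besteffort",
]
tally_states(sample).sort.each { |state, n| puts "#{state} : #{n}" }
```

Unlike `grep $i`, this compares the state field exactly, so a state name that happened to appear inside a host name would not be counted twice.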

So we could set up an observation over a month, but that would take a long time. Instead, we decided to try to get all this information from the OAR database.

Useful information from OAR mysql

Dumping the database

I wanted to first get a local version to browse it more comfortably.

ssh access.grenoble.grid5000.fr "mysqldump --lock-tables=false --quick -uoarreader -pread -h mysql.grenoble.grid5000.fr oar2" > oar2.sql
cat oar2.sql | mysql -u root -p$PASSWORD -h localhost oar2-grenoble

Obviously, when we work on getting this information for all sites, we should dump to CSV remotely to save space.

Looking at the tables, here is what I found that may be of interest for us:

- resources: resource_id, network_address, cpu, cpuset
- jobs: job_id, start_time, stop_time
- assigned_resources: moldable_job_id, resource_id
- resource_logs: resource_id, date_start, attribute, value
- job_types: job_id, type

OK, so let's write a tiny script to extract the right information:

echo $FIELDS > /tmp/$TABLE.csv
echo "SELECT $FIELDS FROM $TABLE INTO OUTFILE \"/tmp/foo.csv\" FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY \"\\n\"" | mysql -u root -p$PASSWORD -h localhost $DATABASE
cat /tmp/foo.csv >> /tmp/$TABLE.csv

And let's call it for the different tables I am interested in (see in the org source at the bottom of the page how I reuse the previous block to do the job)…
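As an illustration of calling the extraction once per table, here is a minimal Ruby sketch that only builds the SQL statements. The column names follow the table list above (with `moldable_job_id` spelled as in the R code); the helper name `export_query` and the per-table output paths are hypothetical, and nothing here talks to MySQL.

``` ruby
# Columns of interest per table, as listed earlier in this entry.
TABLES = {
  "resources"          => %w[resource_id network_address cpu cpuset],
  "jobs"               => %w[job_id start_time stop_time],
  "assigned_resources" => %w[moldable_job_id resource_id],
  "resource_logs"      => %w[resource_id date_start attribute value],
  "job_types"          => %w[job_id type],
}.freeze

# Build the same SELECT ... INTO OUTFILE statement as the shell snippet.
# `export_query` and the output path passed to it are hypothetical.
def export_query(table, fields, outfile)
  "SELECT #{fields.join(',')} FROM #{table} " \
    "INTO OUTFILE \"#{outfile}\" " \
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " \
    "LINES TERMINATED BY \"\\n\""
end

TABLES.each do |table, fields|
  puts export_query(table, fields, "/tmp/#{table}_body.csv")
end
```

Each printed statement could then be piped to `mysql` exactly as in the shell version.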

Having fun with R

OK, now I can read this information and exploit it in R.

jobs <- read.csv("/tmp/jobs.csv")
jobs <- jobs[jobs$start_time>3000,]             #cleanups
jobs <- jobs[jobs$stop_time>3000,]              #cleanups
jobs <- jobs[jobs$stop_time>jobs$start_time,]   #cleanups
assigned_resources <- read.csv("/tmp/assigned_resources.csv")
job_types <- read.csv("/tmp/job_types.csv")
resources <- read.csv("/tmp/resources.csv")
resource_logs <- read.csv("/tmp/resource_logs.csv")
names(resource_logs)=c("resource_id", "start_time", "attribute", "state")
resource_logs <- resource_logs[resource_logs$attribute=="state",]
resource_logs <- resource_logs[!(names(resource_logs) %in% c("attribute"))]
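For readers more at home in the Ruby used earlier, the three clean-up filters on `jobs` amount to the following. This is a sketch on in-memory rows with invented values; reading the `3000` threshold as "drop jobs whose timestamps were never set" is an assumption, not something the code above states.

``` ruby
# Keep only jobs with plausible timestamps, mirroring the three R filters.
# `clean_jobs` is a hypothetical helper; rows are plain hashes.
def clean_jobs(jobs, min_epoch = 3000)
  jobs.select do |job|
    job[:start_time] > min_epoch &&
      job[:stop_time] > min_epoch &&
      job[:stop_time] > job[:start_time]
  end
end

jobs = [
  { job_id: 1, start_time: 0,             stop_time: 0 },             # timestamps never set (assumed)
  { job_id: 2, start_time: 1_373_000_000, stop_time: 1_373_003_600 }, # kept
  { job_id: 3, start_time: 1_373_000_000, stop_time: 1_372_999_000 }, # stops before it starts
]
p clean_jobs(jobs).map { |job| job[:job_id] }
```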

Let's pick a one-week time interval at random.

start <- sample(jobs$start_time, 1)
end <- start+7*24*3600
job_resources <- merge(jobs[jobs$stop_time<=end & jobs$start_time>=start,],assigned_resources,by.x="job_id",by.y="moldable_job_id")
job_resources <- merge(job_resources,job_types,by.x="job_id",by.y="job_id")
job_resources <- merge(job_resources,resources,by.x="resource_id",by.y="resource_id")

# Mmmh, I need to get resource_state into a similar format so that dataframes can be merged
resource_states <- resource_logs[resource_logs$start_time <=end & resource_logs$start_time >= start, ]
resource_states <- resource_states[with(resource_states, order(resource_id,start_time)),]
block <- function(proc) {
  end_v <- c(tail(proc$start_time,length(proc$start_time)-1),end)
  cbind(proc,stop_time=end_v)
}
compute_durations <- function(df) {
  d <- data.frame()
  for(rank in unique(df$resource_id)) {
    d=rbind(d,block(df[df$resource_id==rank,]))
  }
  d
}
resource_states <- compute_durations(resource_states)
resource_states <- resource_states[resource_states$state != "Alive",]
resource_states <- merge(resource_states,resources,by.x="resource_id",by.y="resource_id")
names(resource_states)[names(resource_states)%in% c("state")] <- "type"
library(plyr)  # provides rbind.fill
df <- rbind.fill(resource_states,job_resources)
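The `block` / `compute_durations` step is the least obvious part: a log entry only records when a state begins, so its end has to be taken from the next entry for the same resource, and the last entry is closed at the end of the window. Here is a minimal Ruby sketch of that idea on invented sample rows; the helper name `add_stop_times` is hypothetical.

``` ruby
# Turn "state began at t" log entries into [start_time, stop_time] intervals.
def add_stop_times(logs, window_end)
  logs.group_by { |row| row[:resource_id] }.flat_map do |_id, rows|
    sorted = rows.sort_by { |row| row[:start_time] }
    sorted.each_with_index.map do |row, i|
      # A state lasts until the next entry of the same resource,
      # or until the end of the observation window for the last one.
      following = sorted[i + 1]
      row.merge(stop_time: following ? following[:start_time] : window_end)
    end
  end
end

logs = [
  { resource_id: 1, start_time: 100, state: "Alive" },
  { resource_id: 2, start_time: 150, state: "Absent" },
  { resource_id: 1, start_time: 300, state: "Dead" },
]
add_stop_times(logs, 1000).each { |row| p row }
```

The R code then drops the `Alive` intervals, so that only the non-nominal states are stacked with the jobs in the final data frame.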

And voilà, I can plot the Gantt chart now.

library(ggplot2)
ggplot(df)+
    theme_bw()+geom_rect(aes(xmin=start_time,xmax=stop_time, ymin=resource_id, ymax=resource_id+1,fill=factor(type)))
# + scale_y_continuous(limits=c(min(as.numeric(df_native$ResourceId)),max(as.numeric(df_native$ResourceId))+1))

So now Lucas can convert this into a simple Paje trace and use triva to see whether he can turn it into interesting visualizations.

Entered on [2013-07-10 mer. 17:13]