[Xen-devel] Memory Trace Project
Hi guys,

I'd welcome your input on a pet project of mine. Please feel free to share any ideas, suggestions and recommendations.

I've collected a large number of memory traces, almost 10 GB of data. These traces were gathered from a set of servers, desktops and laptops in a university CS department. Each trace file contains a list of hashes representing the contents of the machine's memory, as well as some metadata about the running processes and the OS type. The traces have been grouped by type and date. Traces were recorded approximately every 30 minutes. If a machine was turned off or away from an internet connection for a long period, no traces were acquired.

Each trace file is split into two parts. The top part is ASCII text containing the system metadata (operating system type and a list of running processes). It is followed by binary data containing the list of hashes generated for each page in the system, stored as consecutive 32-bit values.

There is a simple tool called "traceReader" for extracting the hashes from a trace file. It takes the file to be parsed as its argument and outputs the hash list as a series of integer values. If you would like to compare two traces to estimate the amount of sharing between them, you could run:

    ./traceReader trace-x.dat > trace-all
    ./traceReader trace-y.dat >> trace-all
    sort trace-all | uniq -c

This tells you the number of times each hash occurs across the two traces.
My idea is to take the trace for every interval (every 30 minutes) for each system and compute the frequency of each memory hash. I then plan to collect the highest frequencies (the most frequently occurring hashes) over the whole hour and divide the memory into k different patterns based on these frequency counts. For instance, if hashes 14F430C8, 1550068, 15AD480A, 161384B6, 16985213, 17CA274B, 18E5F038 and 1A3329 have the highest frequencies, I might divide the memory into 8 patterns (k = 8). I plan to use the Approximate Nearest Neighbor (ANN) library (http://www.cs.umd.edu/~mount/ANN/) for this division. ANN needs a set of data points, a set of query points and the dimension. I guess in my case the data points would be the k highest-frequency hashes for the hour, the query points would be all the remaining hashes, and the dimension would be 1, so that each remaining hash is assigned to its nearest high-frequency hash.

This gives me the memory patterns for every hour. I then plan to build patterns over 3, 6, 12 and finally 24 hours. Armed with these statistics, I plan to compare the patterns by time of day, look for overlaps between them, define what I call memory "heat zones" for each part of the day, and finally write up a report.

The objective of this project is to establish a relationship between memory page access and the time of day, i.e. to show that specific intervals have characteristic memory "heat zones". I understand that these heat zones may change and may not be consistent across systems and users. This study only intends to establish the relationship; it does not perform any qualitative or quantitative analysis of the heat zones per system or user, which could be an extension of this work.

Please feel free to comment and suggest any new insights.
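The hourly grouping step described above can be sketched in Python. This is a minimal sketch under stated assumptions: each hour is the union of two 30-minute hash lists, hashes are treated as points on a line by their numeric value (the dimension of 1 proposed above), and the k patterns are formed by assigning every hash to its nearest top-k hash. In place of the ANN library it uses an exact nearest-neighbour lookup by binary search over the sorted top-k hashes, which in one dimension is simple and exact rather than approximate. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List, Sequence


def hour_counts(half_hour_traces: Sequence[Sequence[int]]) -> Counter:
    """Frequency of each hash over one hour (assumed: two 30-minute traces)."""
    counts: Counter = Counter()
    for trace in half_hour_traces:
        counts.update(trace)
    return counts


def window_counts(hourly: Sequence[Counter], hours: int) -> List[Counter]:
    """Merge consecutive hourly counts into windows of `hours` (3, 6, 12 or 24)."""
    windows = []
    for start in range(0, len(hourly), hours):
        total: Counter = Counter()
        for counts in hourly[start:start + hours]:
            total.update(counts)
        windows.append(total)
    return windows


def top_k_hashes(counts: Counter, k: int) -> List[int]:
    """The k most frequent hashes; equal counts keep first-seen order."""
    return [h for h, _ in counts.most_common(k)]


def nearest_center(value: int, centers: List[int]) -> int:
    """Exact 1-D nearest neighbour in a sorted, non-empty list (ties go to the smaller)."""
    i = bisect_left(centers, value)
    if i == 0:
        return centers[0]
    if i == len(centers):
        return centers[-1]
    below, above = centers[i - 1], centers[i]
    return below if value - below <= above - value else above


def assign_patterns(counts: Counter, k: int) -> Dict[int, List[int]]:
    """Group every hash seen in the window under its nearest top-k hash."""
    centers = sorted(top_k_hashes(counts, k))
    patterns: Dict[int, List[int]] = {c: [] for c in centers}
    for h in counts:  # each center lands in its own pattern (distance 0)
        patterns[nearest_center(h, centers)].append(h)
    return patterns
```

Hex hashes such as those listed above can be converted with int("14F430C8", 16). Longer windows reuse the same grouping: for example, assign_patterns applied to each entry of window_counts(hourly, 3) gives the 3-hour patterns. An ANN-style search structure would only start to pay off if the hashes were embedded in more than one dimension.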
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel