
A short script for testing writing many files to a folder

The challenge: We want to see at what point the number of files in a folder starts to degrade the performance of adding new files to that same folder. Two examples of where we may need this are: getting an overview of the performance of the file system's node structure, or testing the Windows 8dot3 short-name compatibility feature.

The solution: We create a script that writes a large number of files to the folder in question and logs the elapsed time at specific milestones. The records logged during execution tell us how long it takes to write files until each milestone is reached, and from the differences between milestones we can infer how efficient the file system is at writing files as the folder grows. If, for example, the first 10,000 files take 50 seconds and the next 10,000 take 80 seconds, the write rate is clearly dropping as the folder fills up.

Example of output
A graph representing the number of files created over time. The X axis conveys the number of seconds elapsed, and the Y axis the number of files created. What does your function look like?

The implementation: I've chosen to put the creation of new files in a for loop which runs N times, based on user input. Each iteration opens a new file with an incremental file name, writes the payload to the file, and finally closes the file and increments the loop counter.

Wrapped around this core functionality, we need to define the folder into which the files will be created and what data is to be read and written. We read the defined data into a variable up front (we don't want to add overhead by re-reading the data-to-write on every iteration) and create the test folder if it does not already exist. In addition we need a function that writes the timestamp and the iteration number to a log file.

To open up for multi-process testing I've also added a loop that spawns new processes and passes on the number of files each should create, and to cover more scenarios, e.g. renaming and deleting files, more actions have been added.

The actions, the test folder path, the input file, and the number of files and processes are things the user will most likely change frequently, so instead of hard coding them they are provided as command line arguments. As always when dealing with command line arguments: provide good defaults, since the user will often not set all of the available parameters.

From description to code this will look something like this:

import time
import os
import string
import random
from multiprocessing import Process
import multiprocessing
import optparse
import os.path

def main(files_each=100, processes=10, actions="a", log_interval=100, temp_path="temp_files", infile="infile.txt"):
  path = temp_path
  check_and_create_folder_path(path)
  for i in range(processes):
    p = Process(target=spawnTask, args=(path, files_each, actions, log_interval, infile))
    p.start()

def print_time_delta(start_time, comment, outfile=False):
  if not outfile:
    print(comment," | ",time.time() - start_time, " seconds")
  else:
    with open(outfile, 'a+') as out:
      out.write("{0} | {1} \n".format(time.time() - start_time, comment))

def spawnTask(path,files_each, actions,log_interval, infile):
  start_time = time.time()
  content = read_file_data(infile)

  print_time_delta(start_time,"creating files for process: "+str(os.getpid()))
  created_files = createfiles(files_each, content,path,start_time, log_interval)
  if(actions == 'a' or actions == 'cr'):
    print_time_delta(start_time,"renaming files for process: " +str(os.getpid()))
    renamed_files = rename_files(created_files,path,start_time, log_interval)
  if(actions == 'a'):
    print_time_delta(start_time,"deleting files for process: "+str(os.getpid()))
    delete_files(renamed_files,path,start_time, log_interval)

  print_time_delta(start_time,"operations have ended. Terminating process:"+str(os.getpid()))

def createfiles(number_of_files, content, path, start_time, log_interval):
  # Create the requested number of files, logging the elapsed time every log_interval files
  own_pid = str(os.getpid())
  created_files = []
  for i in range(number_of_files):
    if i % log_interval == 0:
      print_time_delta(start_time, str(i)+" | "+own_pid+" | "+"create", "prod_log.txt")
    filename = "wordfile_test_"+own_pid+"_"+str(i)+".docx"
    created_files.append(filename)
    with open(os.path.join(path, filename), "wb") as print_file:
      print_file.write(content)

  print_time_delta(start_time, str(number_of_files)+" | "+own_pid+" | "+"create", "prod_log.txt")

  return created_files

def rename_files(filenames, path, start_time, log_interval):
  # Rename each created file to a random 30-character name, logging every log_interval files
  new_filenames = []
  own_pid = str(os.getpid())
  i = 0
  for file in filenames:
    if i % log_interval == 0:
      print_time_delta(start_time, str(i)+" | "+own_pid+" | "+"rename", "prod_log.txt")
    lst = [random.choice(string.ascii_letters + string.digits) for n in range(30)]
    text = "".join(lst)
    os.rename(os.path.join(path, file), os.path.join(path, text+".docx"))
    new_filenames.append(text+".docx")
    i += 1

  print_time_delta(start_time, str(len(new_filenames))+" | "+own_pid+" | "+"rename", "prod_log.txt")

  return new_filenames

def delete_files(filenames, path, start_time, log_interval):
  # Delete the files again, logging every log_interval files
  num_files = len(filenames)
  own_pid = str(os.getpid())
  i = 0
  for file in filenames:
    if i % log_interval == 0:
      print_time_delta(start_time, str(i)+" | "+own_pid+" | "+"delete", "prod_log.txt")
    os.remove(os.path.join(path, file))
    i += 1

  print_time_delta(start_time, str(num_files)+" | "+own_pid+" | "+"delete", "prod_log.txt")

def check_and_create_folder_path(path):
  if not os.path.exists(path):
    os.makedirs(path)

def read_file_data(infile):
  with open(infile,"rb") as content_file:
    content = content_file.read()
  return content

if __name__ == "__main__":
  multiprocessing.freeze_support()
  parser = optparse.OptionParser()
  parser.add_option('-f', '--files', default=100, help="The number of files each process should create. Default is 100")
  parser.add_option('-p', '--processes', default=10, help="The number of processes the program should create. Default is 10")
  parser.add_option('-a', '--action', default='a', help="The action which the program should perform. The default is a.\n Options include a (all), c (create), cr (create and rename)")
  parser.add_option('-l', '--log_interval', default=100, help="The interval between when a process is logging files created. Default is 100")
  parser.add_option('-t', '--temp_path', default="temp_files", help="Path where the file processes will be done")
  parser.add_option('-i', '--infile', default="infile.txt", help="The file which will be used in the test")

  options, args = parser.parse_args()
  main(int(options.files), int(options.processes), options.action, int(options.log_interval), options.temp_path, options.infile)
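
To run the script, save it and call it from the command line with the parameters you need. A hypothetical invocation (the file name write_files_test.py is just an example) where four processes each create 1000 files, logging every 100 files, could look like this:

python write_files_test.py -f 1000 -p 4 -a a -l 100 -t temp_files -i infile.txt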

 

 

Sample output from the script

The output from running this script is a pipe-separated (‘|’) log with the elapsed seconds, the number of files, the process ID (since the program can spawn and run several similar processes simultaneously, we need a way to tell them apart) and the action. It will look like the sample above, and from these numbers you can compute statistics on performance at different folder sizes.
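
To turn the log into such statistics, a small post-processing step is enough. Below is a minimal sketch, assuming the prod_log.txt format produced above (seconds | files | process ID | action); computing the write rate between consecutive milestones is just one way to summarize the numbers:

import collections

# Collect the logged milestones per process and action
milestones = collections.defaultdict(list)
with open("prod_log.txt") as log:
  for line in log:
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4:
      continue
    seconds, files = float(parts[0]), int(parts[1])
    milestones[(parts[2], parts[3])].append((seconds, files))

# Print the rate (files per second) between consecutive milestones
for (pid, action), points in sorted(milestones.items()):
  points.sort()
  for (t0, f0), (t1, f1) in zip(points, points[1:]):
    if t1 > t0:
      print(pid, action, "up to", f1, "files:", round((f1 - f0) / (t1 - t0), 1), "files/second")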

The idea of performing this analysis, and valuable feedback along the way, came from great colleagues at Steria AS. Any issues or problems with the code or text are solely my own. Whatever you choose to do or try out with this information is solely your own responsibility.

The folder image is by Erik Yeoh and is released under a Creative Commons Attribution-NonCommercial-ShareAlike License. The image can be found on Flickr.

Two Good Tools for Peeking Inside Windows

Over the last couple of weeks I have explored the inner mechanics of Microsoft Windows and the processes that run in this context. In this work two tools have proved especially useful: Xperf logs analyzed with WPA, and Sysinternals' Process Monitor.

Xperf/WPA

During execution Windows continuously monitors its internal processes through Event Tracing for Windows (ETW). We can harness this built-in instrumentation by extracting the data with the program Xperf. From specific points in the execution (which we decide when we initially start the logging) we can sample what is happening within a program and in the system as a whole. Two good features I have utilized from the Xperf/WPA combo are:

  •  Process images: we can see which ‘.dll’ images are loaded by each process, and when they are loaded.
  •  Resource usage: we can view, over time, what system resources such as memory, CPU, and IO are used by each process or by the system as a whole.

Both these tools are included in the Windows Performance Toolkit, which in turn is part of the Windows Assessment and Deployment Kit. During the Assessment and Deployment Kit installation you can choose to install only the Performance Toolkit.

To record a session you need to call Xperf twice from the command shell: first to start the logging, with specific flags pointing out which kernel events should be sampled, and then to stop the logging and write the results to an .etl file.

A typical start command could be:

xperf -on latency -stackwalk profile

In this example xperf is called with the latency group in kernel mode, which covers the following flags: PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PROFILE. The stackwalk option provides a call stack for the flags or group provided. For a complete list of kernel flags you can use the “xperf -providers k” command.

Once you have started Xperf and performed the action you want to record, you may stop xperf with this command:

xperf -d output.etl

The -d option explicitly tells xperf to write the logged session to the file output.etl (or create this file if it does not exist). The command also implicitly stops the logging session.
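
If you repeat such measurements often, the start and stop calls can be scripted around the workload you want to record. Below is a minimal sketch in Python, assuming xperf is on the PATH and the shell runs with administrator rights; the ping command is only a stand-in for your actual workload:

import subprocess

# Start kernel logging with the latency group and stack walking on the profile flag
subprocess.check_call(["xperf", "-on", "latency", "-stackwalk", "profile"])
try:
  # Stand-in workload: replace with whatever you want to record
  subprocess.call(["ping", "-n", "4", "localhost"])
finally:
  # Stop the session and write the trace to output.etl for analysis in WPA
  subprocess.check_call(["xperf", "-d", "output.etl"])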

For a full overview of the commands accepted by Xperf, please refer to the Xperf options documentation on MSDN.

To analyze an .etl file, and the data that has been collected in the logging session, Microsoft has made available a good tool: Windows Performance Analyzer.

Windows Performance Analyzer is a part of the Windows Performance Toolkit.

This neat tool provides small views on the left showing the general KPIs for the system resources, and all of the main resources have expandable menus with more detailed views. Double clicking, or right clicking and choosing to open a section in the main window, opens a more detailed overview in the right-hand view of the application, where the user can dig further into the details. In the screenshot you can see the images loaded by the relatively simple command line application Ping.

Process Monitor

The Sysinternals toolkit contains many useful tools for various Windows-related tasks, among them the ability to see the activities of a process over time. The latter is squarely in the domain of Process Monitor. With this convenient tool you can get an overview of what operations a process is performing, including registry queries, use of file resources and loading of images.

With Process Monitor you can survey the system as a whole, and also filter for a specific process. The program traces many of the calls a program makes to the system, and you can use this trace to see in what sequence a program executes and which kinds of system resources it relies on. The Xperf and WPA combination gives a good overview of the images loaded by a process; with Process Monitor you can expand this knowledge with registry queries and network calls, and you can also see when different profiling actions are called.

Process Monitor from the Sysinternals Suite is a good tool to scrutinize what is happening with one or more processes.

Process Monitor is used both for recording a trace and for analyzing it afterwards. Traces can be saved to file. They can also be conveniently filtered, either on the specific types of actions performed by a process (registry, file system, network resources, process and thread activity, and profiling) using the symbols to the right of the menu, or through the filter dialog shown in the overlying window in the image. A good rule of thumb is to exclude all actions not associated with the process you want to survey.

Be advised that Process Monitor records a huge number of actions. It is a good idea to turn off recording when you do not intend to record, which can be done by toggling the magnifying glass in the menu.

An advantage of the programs in the Sysinternals toolkit, Xperf and WPA is that they do not need to be installed to work. All these tools can be put on a USB stick, and with some training you suddenly become a one-man army ready to examine Windows inside out.

The image used to illustrate this blog post is by Julian E…; it was found through Flickr and is shared under a Creative Commons by-nc-nd license.

Three Useful Terminal Commands

See my terminal-tools tag for more articles on nifty terminal tools.

The *nix terminal is a great tool. In everyday life a lot of things can be done easily and quickly with this power tool. There is an abundance of tools and ways of solving problems in the terminal, but here are three practical examples.

Find a text string within current folder

grep your_magic_string -H -R *

The command ‘grep’ is a search tool for plain text files matching a regular expression. In the example above we want to find the string ‘your_magic_string’ within the current folder, so we make grep search recursively by passing the option ‘-R’ together with the wildcard (‘*’). This command will return all text files within the current folder where the string can be found, and the -H option prints the filename of each match.

Search and replace text within a file

sed -i 's/wrong/right/g' /home/ola/list-with-stuff/somethingwrong.txt

Sed is a stream editor that parses text and implements a small programming language for modifying it. The command above reads the file at the given location (‘/home/ola/list-with-stuff/somethingwrong.txt’) and searches through it, replacing the text pattern ‘wrong’ with ‘right’. The -i option overwrites the original file with the alterations. What used to be a negative text is now super positive.

Download a file from the Internet

curl -o megasecret.jpg http://www.lovholm.net/wp-content/uploads/2013/06/mega_secret.jpg

Curl is a nice tool for accessing resources on the Internet. If you want to see the source of a web page, just enter the URL after curl. For binaries and files you want to keep, you can add the -o option followed by an output filename, as above. This writes the output to a file instead of displaying the text representation in your terminal window.

Bonus: Star Wars

All work and no play, makes ___________ (enter your name) a dull ___________ (enter your gender here). Bring popcorn, kick back in your chair and type the command: telnet towel.blinkenlights.nl. Enjoy!

Mac OSX

The Mac terminal icon

What many Mac OS X users don't know is that OS X is built on Unix. In the Utilities folder under Applications you will find a terminal. Through this application you can do lots of things quickly and easily. A Google search reveals many good tutorials.

Note: If some of the commands above prompt an error message saying that the command is unknown, you need to install the program. Use your local package manager, such as yum or apt-get, to install it. On Mac, MacPorts is a nifty package manager.

Image: The terminal can also be used for cool stuff such as surfing the waves of the internet. The picture shows the web browser Lynx. The terminal was after all the main mode of communication with the computer before the dawn of the Graphical User Interface, and is still widely used.

The Lynx Web browser displaying lovholm.net