data_hacking

Examples of using IPython, Pandas, and scikit-learn to get the most out of your security data.

Data Hacking Project

A collection of projects and exercises related to data hacking.


Projects

  • Hacking Clustering: A project that uses clustering algorithms to group similar binary files.
    • Includes two notebooks: one for PE (Portable Executable) files, and another for Mach-O (Mach Object) files.
  • SWF Classification: A project that classifies SWF (Shockwave Flash) files using various classification techniques.
    • Can be viewed in the IPython notebook viewer or on GitHub.
  • Java Class File Classification: A project that classifies Java class files using various classification techniques.
    • Can be viewed in the IPython notebook viewer or on GitHub.

  • PE File Similarity Graph using Workbench: A notebook that creates a graph of similar PE (Portable Executable) files.
  • Windows Executable Clustering by Image Similarity: A notebook that clusters Windows executables based on their image similarity.
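The notebooks above each use their own feature extraction and algorithms. As a rough illustration of the shared idea (grouping similar binaries and drawing a similarity graph), the sketch below assumes each file has already been reduced to a set of features, here hypothetical imported API names. It scores every pair with Jaccard similarity, keeps pairs at or above a chosen threshold as graph edges, and reports the connected components as clusters. The function names, feature sets, and threshold are illustrative only and are not taken from the notebooks.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(features: dict, threshold: float) -> list:
    """Return (name_a, name_b, score) for every pair scoring >= threshold."""
    edges = []
    for (name_a, feats_a), (name_b, feats_b) in combinations(sorted(features.items()), 2):
        score = jaccard(feats_a, feats_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


def connected_clusters(names, edges) -> list:
    """Treat each connected component of the similarity graph as one cluster."""
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, _score in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical feature sets: imported API names per file.
    features = {
        "a.exe": {"CreateFileA", "WriteFile", "ReadFile"},
        "b.exe": {"CreateFileA", "WriteFile", "CloseHandle"},
        "c.exe": {"socket", "connect", "send"},
    }
    edges = similarity_edges(features, threshold=0.5)
    print(edges)
    print(connected_clusters(features, edges))
```

Plain Python keeps the sketch dependency-free; the same edge list could be loaded into a networkx graph for drawing.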

Setup

Required packages:

  • System (Homebrew or apt-get): graphviz, freetype, and zmq
  • Python: ipython, pygraphviz, pandas, matplotlib, networkx, pyzmq, jinja2, scipy, patsy, statsmodels, and pefile
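A quick way to confirm that the Python packages above are importable is a small check like the following. It is a Python 3 sketch built on `importlib.util.find_spec`, and the mapping from package name to import name is an assumption: most names match, but pyzmq imports as `zmq` and ipython as `IPython`. System libraries such as graphviz are not checked.

```python
import importlib.util

# Package name (as listed above) -> module name used in `import`.
# Assumed mapping; adjust if a notebook imports something else.
REQUIRED_MODULES = {
    "ipython": "IPython",
    "pygraphviz": "pygraphviz",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "networkx": "networkx",
    "pyzmq": "zmq",
    "jinja2": "jinja2",
    "scipy": "scipy",
    "patsy": "patsy",
    "statsmodels": "statsmodels",
    "pefile": "pefile",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return package names whose top-level module cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```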

Some exercises import modules from the data_hacking package in this repository. To install it, run the following from the repository root:

%> sudo python setup.py install

To uninstall, run:

%> sudo pip uninstall data_hacking

Install IPython

Install IPython, including its notebook component, using your usual Python package manager (for example, pip).


Running Notebooks

Most notebooks use relative paths to resources such as data files and images, so the notebook server must be started from the project directory. The following alias starts it with the notebook directory set to the current directory:

alias ipython='ipython notebook  --FileNotebookManager.notebook_dir=`pwd`'

Then, run:

$ cd data_hacking/fun_with_syslog
$ ipython    # expands to the alias defined above
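Because relative paths are resolved against the directory the server was started in, a first notebook cell along these lines can report missing files before later cells fail. `missing_resources` is a hypothetical helper, not part of the data_hacking package, and the file names passed to it are placeholders for whatever a given notebook actually loads.

```python
from pathlib import Path


def missing_resources(relative_paths, base=None):
    """Return the entries of relative_paths that do not exist under base.

    base defaults to the current working directory, which is what
    relative paths in Python code are resolved against.
    """
    root = Path.cwd() if base is None else Path(base)
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    # Hypothetical file names; replace them with the notebook's real inputs.
    missing = missing_resources(["data/example.csv", "images/example.png"])
    if missing:
        print("Start the notebook from the project directory; missing:", missing)
```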
