Examples of using IPython, pandas, and scikit-learn to get the most out of your security data.
Data Hacking Project
A collection of projects and exercises related to data hacking.
Projects
- Hacking Clustering: A project that uses clustering algorithms to group similar binary files.
- Includes two notebooks: one for PE (Portable Executable) files, and another for Mach-O (Mach Object) files.
- SWF Classification: A project that classifies SWF (Shockwave Flash) files using various classification techniques.
- Available in the notebook viewer and on GitHub.
- Java Class File Classification: A project that classifies Java class files using various classification techniques.
- Available in the notebook viewer and on GitHub.
- PE File Similarity Graph using Workbench: A notebook that creates a graph of similar PE (Portable Executable) files.
- Windows Executable Clustering by Image Similarity: A notebook that clusters Windows executables based on their image similarity.
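The clustering and similarity-graph projects above share one core idea: turn each binary into a feature vector, connect samples that are similar enough, and read groups off the result. The following is a stdlib-only sketch of that idea, not the notebooks' actual method. It assumes Python 3, and the features (normalized byte-value histograms), the cosine-similarity measure, the 0.9 threshold, and all function names are hypothetical choices made for brevity.

```python
import math
from itertools import combinations


def byte_histogram(data):
    """Normalized 256-bin histogram of byte values (a hypothetical, very simple feature)."""
    counts = [0] * 256
    for value in data:  # iterating over bytes yields ints 0..255
        counts[value] += 1
    total = len(data) or 1  # avoid dividing by zero for empty input
    return [c / total for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(samples, threshold=0.9):
    """Build the similarity graph: one (name_a, name_b, score) edge per pair at or above threshold."""
    features = {name: byte_histogram(data) for name, data in samples.items()}
    edges = []
    for a, b in combinations(sorted(samples), 2):
        score = cosine_similarity(features[a], features[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


def connected_components(nodes, edges):
    """Group nodes into clusters with a small union-find over the edge list."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b, _score in edges:
        parent[find(a)] = find(b)

    clusters = {}
    for node in nodes:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


if __name__ == "__main__":
    # Toy byte strings standing in for real executables (hypothetical data).
    samples = {
        "sample_a": b"\x00" * 100 + b"\x01" * 10,
        "sample_b": b"\x00" * 90 + b"\x01" * 12,
        "sample_c": b"\xff" * 100,
    }
    edges = similarity_edges(samples)
    print(edges)
    print(connected_components(samples, edges))
```

Linking every above-threshold pair and taking connected components amounts to single-linkage clustering cut at a fixed similarity; with real features, scikit-learn's clustering estimators or networkx's graph algorithms are natural replacements for the hand-written pieces.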
Setup
Required packages:
- Brew/apt-get: graphviz, freetype, and zmq
- Python: ipython, pygraphviz, pandas, matplotlib, networkx, pyzmq, jinja2, scipy, patsy, statsmodels, and pefile
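Before opening a notebook, it can help to confirm that the Python packages listed above are importable. This is a minimal sketch that assumes Python 3 and that each package is imported under the module name shown (for example, pyzmq as zmq and ipython as IPython). It only checks that each module can be found on the current path; it does not check versions or the Brew/apt-get system libraries.

```python
import importlib.util

# Python packages from the setup list, mapped to the module name each is imported as.
REQUIRED_MODULES = {
    "ipython": "IPython",
    "pygraphviz": "pygraphviz",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "networkx": "networkx",
    "pyzmq": "zmq",
    "jinja2": "jinja2",
    "scipy": "scipy",
    "patsy": "patsy",
    "statsmodels": "statsmodels",
    "pefile": "pefile",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return, sorted, the package names whose top-level module cannot be found."""
    return sorted(
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None  # None means "not installed"
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```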
Some exercises use packages from the data_hacking repository. To install them, run the following from the repository's top-level directory:
%> sudo python setup.py install
To uninstall, run:
%> sudo pip uninstall data_hacking
Install IPython
Install IPython using its standard installation method (for example, with pip).
Running Notebooks
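Most notebooks use relative paths to resources such as data files and images, so each notebook must be started from its project directory. Change into that directory and run ipython through this alias, which starts the notebook server with the current directory as its notebook directory:
alias ipython='ipython notebook --FileNotebookManager.notebook_dir=`pwd`'
The single quotes delay the pwd substitution until the alias is used, so the notebook directory is always the directory you run it from.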
For example:
$ cd data_hacking/fun_with_syslog
$ ipython    # runs the alias defined above
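Because relative paths are resolved against the current working directory of the Python process, a quick check from the directory you plan to launch from can catch a missing data file before a notebook fails partway through. This is a hypothetical helper, not part of data_hacking; the script name check_resources.py and the example file names in the usage comment are made up.

```python
import os
import sys


def missing_resources(relative_paths, base_dir=None):
    """Return the entries of relative_paths that do not exist under base_dir.

    base_dir defaults to the current working directory, which is what a
    relative path such as open("data/example.csv") is resolved against.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(base_dir, p))]


if __name__ == "__main__":
    # Hypothetical usage: python check_resources.py data/example.csv images/example.png
    missing = missing_resources(sys.argv[1:])
    if missing:
        print("Not found relative to", os.getcwd() + ":", ", ".join(missing))
    else:
        print("All listed resources are present.")
```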