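- It finds email addresses, URLs, and credit card numbers that other tools miss, because it can process compressed data (such as ZIP, PDF, and GZIP files) as well as incomplete or partially corrupted data. It can carve JPEGs, Office documents, and other kinds of files out of fragments of compressed data. It detects and carves encrypted RAR files.
- It builds word lists from all the words found in the data, even those in compressed files that sit in unallocated space. These word lists can be useful for password cracking.
- It is multi-threaded; running bulk_extractor on a computer with twice as many cores typically lets it complete a run in half the time.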
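- It creates histograms showing the most common email addresses, URLs, domains, search terms, and other kinds of information on the drive.

bulk_extractor operates on disk images, files, or a directory of files and extracts useful information without parsing the file system or file system structures. The input is split into pages, and each page is processed by one or more scanners. The results are stored in feature files that can easily be inspected, parsed, or processed with other automated tools. bulk_extractor also creates histograms of the features it finds. This is useful because features that occur more often, such as email addresses and internet search terms, tend to be the important ones.

In addition to the capabilities described above, bulk_extractor also includes: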
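- A graphical user interface, Bulk Extractor Viewer, for browsing features stored in feature files and for launching bulk_extractor scans
- A small number of Python programs for performing additional analysis on feature files

Source: http://digitalcorpora.org/downloads/bulk_extractor/BEUsersManual.pdf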
- Author: Simson L. Garfinkel
- License: GPLv2
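The feature files and histograms described above lend themselves to simple scripting. The following is a minimal sketch, not one of the Python programs shipped with bulk_extractor. It assumes a feature file is plain text with `#` comment lines and tab-separated records of offset, feature, and context, and it counts how often each feature occurs. The function name `feature_histogram` and the sample records are hypothetical.

```python
from collections import Counter


def feature_histogram(lines):
    """Count occurrences of each feature in feature-file lines.

    Assumes each record is 'offset<TAB>feature<TAB>context' and that
    lines starting with '#' are comments (an assumption about the format).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed records
        counts[fields[1].lower()] += 1  # fold case so duplicates merge
    return counts


# Hypothetical sample records, made up for illustration only.
sample = [
    "# hypothetical feature file header",
    "1024\talice@example.com\tmail from alice@example.com to",
    "2048\tbob@example.com\tcc: bob@example.com",
    "4096\tAlice@example.com\tReply-To: Alice@example.com",
]

histogram = feature_histogram(sample)
for feature, n in histogram.most_common():
    print(f"n={n}\t{feature}")
```

With a real run, the same function could be fed the lines of a file from the output directory, for example `feature_histogram(open("bulk-out/email.txt", encoding="utf-8", errors="replace"))`; the file name here is an assumption based on the histogram names the tool reports.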
0x01 Tools included in the bulk-extractor package
bulk_extractor - Extract information without parsing the file system

    :~# bulk_extractor
    bulk_extractor version 1.3 $Rev: 10606 $
    Usage: bulk_extractor [options] imagefile
      runs bulk extractor and outputs to stdout a summary of what was found where

    Required parameters:
       imagefile     - the file to extract
     or  -R filedir  - recurse through a directory of files
                      SUPPORT FOR E01 FILES COMPILED IN
                      SUPPORT FOR AFF FILES COMPILED IN
       -o outdir    - specifies output directory. Must not exist.
                      bulk_extractor creates this directory.
    Options:
       -b banner.txt- Add banner.txt contents to the top of every output file.
       -r alert_list.txt  - a file containing the alert list of features to alert
                           (can be a feature file or a list of globs)
                           (can be repeated.)
       -w stop_list.txt   - a file containing the stop list of features (white list
                           (can be a feature file or a list of globs)s
                           (can be repeated.)
       -F <rfile>   - Read a list of regular expressions from <rfile> to find
       -f <regex>   - find occurrences of <regex>; may be repeated.
                      results go into find.txt
       -q nn        - Quiet Rate; only print every nn status reports. Default 0; -1 for no status at all

    Tuning parameters:
       -C NN        - specifies the size of the context window (default 16)
       -G NN        - specify the page size (default 16777216)
       -g NN        - specify margin (default 4194304)
       -W n1:n2     - Specifies minimum and maximum word size
                      (default is -w6:14)
       -B NN        - Specify the blocksize for bulk data analysis (default 512)
       -j NN        - Number of analysis threads to run (default 2)
       -M nn        - sets max recursion depth (default 5)

    Path Processing Mode:
       -p <path>/f  - print the value of <path> with a given format.
                      formats: r = raw; h = hex.
                      Specify -p - for interactive mode.
                      Specify -p -http for HTTP mode.

    Parallelizing:
       -Y <o1>      - Start processing at o1 (o1 may be 1, 1K, 1M or 1G)
       -Y <o1>-<o2> - Process o1-o2
       -A <off>     - Add <off> to all reported feature offsets

    Debugging:
       -h           - print this message
       -H           - print detailed info on the scanners
       -V           - print version number
       -z nn        - start on page nn
       -dN          - debug mode (see source code
       -Z           - zap (erase) output directory

    Control of Scanners:
       -P <dir>     - Specifies a plugin directory
       -E scanner   - turn off all scanners except scanner
       -m <max>     - maximum number of minutes to wait for memory starvation
                      default is 60
       -s name=value - sets a bulk extractor option name to be value

       -e bulk - enable scanner bulk
       -e wordlist - enable scanner wordlist

       -x accts - disable scanner accts
       -x aes - disable scanner aes
       -x base16 - disable scanner base16
       -x base64 - disable scanner base64
       -x elf - disable scanner elf
       -x email - disable scanner email
       -x exif - disable scanner exif
       -x gps - disable scanner gps
       -x gzip - disable scanner gzip
       -x hiber - disable scanner hiber
       -x json - disable scanner json
       -x kml - disable scanner kml
       -x net - disable scanner net
       -x pdf - disable scanner pdf
       -x vcard - disable scanner vcard
       -x windirs - disable scanner windirs
       -x winpe - disable scanner winpe
       -x winprefetch - disable scanner winprefetch
       -x zip - disable scanner zip
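The page size (`-G`) and margin (`-g`) parameters relate to how the input is split into pages before scanning. The sketch below illustrates the general idea only and is not bulk_extractor's implementation. It assumes the margin is a run of extra bytes read past the end of each page so that a feature straddling a page boundary is still found, and that a hit is reported only when it starts inside the page proper. The names `split_into_pages` and `find_all` are hypothetical, and the tiny page size is chosen just to keep the example readable.

```python
def split_into_pages(data, page_size, margin):
    """Yield (offset, page, extra) where `extra` is the margin after the page."""
    for offset in range(0, len(data), page_size):
        page = data[offset:offset + page_size]
        extra = data[offset + page_size:offset + page_size + margin]
        yield offset, page, extra


def find_all(data, needle, page_size, margin):
    """Return absolute offsets of `needle`, scanning page by page."""
    hits = []
    for offset, page, extra in split_into_pages(data, page_size, margin):
        window = page + extra  # scanners see the page plus its margin
        pos = window.find(needle)
        # Only count hits that start inside the page itself, so a match
        # lying wholly in the margin is left to the next page.
        while pos != -1 and pos < len(page):
            hits.append(offset + pos)
            pos = window.find(needle, pos + 1)
    return hits


# A 16-byte address that starts at byte 12 and crosses the 16-byte page boundary.
needle = b"user@example.com"
data = b"x" * 12 + needle + b"y" * 4

with_margin = find_all(data, needle, page_size=16, margin=16)
without_margin = find_all(data, needle, page_size=16, margin=0)
print("with margin:", with_margin)
print("without margin:", without_margin)
```

Under these assumptions the address is found only when a margin is present, which suggests why the margin should be at least as large as the longest feature one expects to cross a page boundary.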
0x02 bulk_extractor usage example
Extract files to the output directory (-o bulk-out) after analyzing the image file (xp-laptop-2005-07-04-1430.img):

```bash
:~# bulk_extractor -o bulk-out xp-laptop-2005-07-04-1430.img
bulk_extractor version: 1.3
Hostname: kali
Input file: xp-laptop-2005-07-04-1430.img
Output directory: bulk-out
Disk Size: 536715264
Threads: 1
Phase 1.
13:02:46 Offset 0MB (0.00%) Done in n/a at 13:02:45
13:03:39 Offset 67MB (12.50%) Done in 0:06:14 at 13:09:53
13:04:43 Offset 134MB (25.01%) Done in 0:05:50 at 13:10:33
13:04:55 Offset 201MB (37.51%) Done in 0:03:36 at 13:08:31
13:06:01 Offset 268MB (50.01%) Done in 0:03:15 at 13:09:16
13:06:48 Offset 335MB (62.52%) Done in 0:02:25 at 13:09:13
13:07:04 Offset 402MB (75.02%) Done in 0:01:25 at 13:08:29
13:07:20 Offset 469MB (87.53%) Done in 0:00:39 at 13:07:59
All Data is Read; waiting for threads to finish...
Time elapsed waiting for 1 thread to finish:
     (please wait for another 60 min .)
Time elapsed waiting for 1 thread to finish:
    6 sec (please wait for another 59 min 54 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
    12 sec (please wait for another 59 min 48 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
    18 sec (please wait for another 59 min 42 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
    24 sec (please wait for another 59 min 36 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
    30 sec (please wait for another 59 min 30 sec.)
Thread 0: Processing 520093696
All Threads Finished!
Producer time spent waiting: 335.984 sec.
Average consumer time spent waiting: 0.143353 sec.
bulk_extractor is probably CPU bound.
Run on a computer with more cores
to get better performance.
Phase 2. Shutting down scanners
Phase 3. Creating Histograms
   ccn histogram...   ccn_track2 histogram...   domain histogram...
   email histogram...   ether histogram...   find histogram...
   ip histogram...   tcp histogram...   telephone histogram...
   url histogram...   url microsoft-live...   url services...
   url facebook-address...   url facebook-id...   url searches...
Elapsed time: 378.5 sec.
Overall performance: 1.418 MBytes/sec.
Total email features found: 899
```
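The figures in this run can be cross-checked with a little arithmetic. The sketch below recomputes the overall performance from the reported disk size and elapsed time, and reproduces the progress lines from the default page size listed in the help output. Two points are inferences from the numbers rather than documented behavior: that "MB" here means 10^6 bytes (truncated to a whole number), and that a status line is printed every four pages.

```python
DISK_SIZE = 536715264    # "Disk Size" reported by the run, in bytes
ELAPSED_SEC = 378.5      # "Elapsed time" reported by the run
PAGE_SIZE = 16777216     # default -G page size from the help output
PAGES_PER_REPORT = 4     # inferred from the offsets, not documented

# Overall throughput in units of 10^6 bytes per second.
throughput = DISK_SIZE / ELAPSED_SEC / 1_000_000
print(f"Overall performance: {throughput:.3f} MBytes/sec.")


def progress(offset):
    """Format an offset in the same style as the Phase 1 status lines."""
    percent = offset / DISK_SIZE * 100
    return f"Offset {offset // 1_000_000}MB ({percent:.2f}%)"


reports = [progress(n * PAGES_PER_REPORT * PAGE_SIZE) for n in range(8)]
for line in reports:
    print(line)
```

The recomputed values match the log above, from `Offset 0MB (0.00%)` through `Offset 469MB (87.53%)`, which is consistent with the two inferences but does not prove them.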