
Gathering Data for Fun and Profit

Apr 9, 2012 07:46 PM

Oh Data, You so Awesome!

We are going to use Node.JS to gather us some data. Given Node's plethora of well-abstracted networking abilities and its deeply evented nature, it makes quick work of plugging into various data sources and gathering and making good use of that data.

Data sources we will be using are: Twitter, IRC and RSS.

Install yourself some Node.JS

If you are on most modern systems, there is a package manager in place. Use that to install Node, with whichever of these matches your system:

  apt-get install nodejs nodejs-dev
  pkg_add -vi node
  brew install node
  yum install nodejs

If your system is crazy use the following steps to install node:

  1. curl -O http://nodejs.org/dist/v0.6.13/node-v0.6.13.tar.gz
  2. tar -zxvf node-v0.6.13.tar.gz
  3. cd node-v0.6.13
  4. ./configure --prefix=/usr/local
  5. make && make install 

Identify the data you wish to gather

I have chosen to gather data pertaining to the "That's what she said" joke. I have a bot named "mcchunkie" ( link takes you to mcchunkie's brain - so you can see words it thinks are "funny" ) that has done some crowdsourcing of "twss" data. It uses a naive Bayesian classifier to identify words in a sentence that are "twss" worthy.
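For the curious, here is a rough sketch of how a naive Bayesian word classifier can work. This is not mcchunkie's actual code: the `NaiveBayes` class, its method names, the two labels and the training sentences are all made up for illustration, and add-one smoothing is just one common choice.

```javascript
// Hypothetical sketch of a tiny naive Bayes text classifier.
// None of these names come from mcchunkie; they are illustrative only.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

class NaiveBayes {
  constructor() {
    this.wordCounts = { twss: {}, normal: {} }; // per-label word frequencies
    this.totalWords = { twss: 0, normal: 0 };   // total tokens seen per label
    this.docCounts = { twss: 0, normal: 0 };    // training sentences per label
    this.vocab = new Set();                     // every distinct word seen
  }

  train(text, label) {
    this.docCounts[label] += 1;
    for (const word of tokenize(text)) {
      this.wordCounts[label][word] = (this.wordCounts[label][word] || 0) + 1;
      this.totalWords[label] += 1;
      this.vocab.add(word);
    }
  }

  // Log-probability score of a label for the text, with add-one smoothing.
  score(text, label) {
    const totalDocs = this.docCounts.twss + this.docCounts.normal;
    let logProb = Math.log(this.docCounts[label] / totalDocs);
    for (const word of tokenize(text)) {
      const count = this.wordCounts[label][word] || 0;
      logProb += Math.log((count + 1) / (this.totalWords[label] + this.vocab.size));
    }
    return logProb;
  }

  classify(text) {
    return this.score(text, 'twss') > this.score(text, 'normal') ? 'twss' : 'normal';
  }
}

// Made-up training data, just enough to exercise the sketch.
const nb = new NaiveBayes();
nb.train('that is really hard to fit in', 'twss');
nb.train('it is so big', 'twss');
nb.train('the meeting is at noon', 'normal');
nb.train('please review the patch', 'normal');

console.log(nb.classify('it is hard to fit'));
```

With only four training sentences this is a toy, which is exactly why more data matters: the word counts are the whole model.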

But it needs MOAR DATA! 

So we will keep any string that has /twss/i in it ( and various meta information provided by the sources ).
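The filter itself is tiny. A minimal sketch, assuming the incoming text is a plain string (the `mentionsTwss` helper and the sample strings are hypothetical, not part of the project):

```javascript
// Hypothetical helper: keep only strings that mention "twss" in any letter case.
const TWSS_PATTERN = /twss/i;

function mentionsTwss(text) {
  // Guard against non-string input so a missing field does not throw.
  return typeof text === 'string' && TWSS_PATTERN.test(text);
}

const samples = ['TWSS!', 'lol twss', 'nothing to see here'];
const kept = samples.filter(mentionsTwss);

console.log(kept);
```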

Start writing some code!

I am only going to post snippets of the code with a brief description in this section. The end of the article contains a link to a GitHub project with all the running code.

Gather Twitter data:

twitter.stream( 'statuses/filter', { track: data_source.twitter }, function( str ) {
    str.on( 'tweet', function( tw ) {
        gather_data( 'tweet', tw );
    });
});

Gather RSS Data:

rss.on( 'article', function( article ) {
    if ( article.content.match( /twss/i ) ) {
        gather_data( 'rss', article );
    }
});

rss.start();

Gather IRC Data:

irc_client.addListener( 'message', function( from, to, msg ) {
    if ( msg.match( /twss/i ) ) {
        gather_data( 'irc', { from: from, to: to, msg: msg } );
    }
});

As you can see, there isn't much in the way of code. Each block establishes an event listener through the library being used.

When a new event is triggered ( a new line sent to IRC, or a new article published ), the listener calls the function that we passed to it.
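If the listener pattern is new to you, it can be tried without any network at all using Node's built-in `events` module. In this sketch the `source` emitter is a hypothetical stand-in for a library client, and the two `emit` calls fake what the library would do when data arrives:

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a library client such as an IRC or RSS object.
const source = new EventEmitter();
const seen = [];

// Register a listener: the callback runs every time 'message' is emitted.
source.on('message', function (from, to, msg) {
  if (/twss/i.test(msg)) {
    seen.push({ from: from, to: to, msg: msg });
  }
});

// Simulate the library receiving two lines from the network.
source.emit('message', 'alice', '#chan', 'twss');
source.emit('message', 'bob', '#chan', 'good morning');

console.log(seen);
```

Only the first message passes the filter, so `seen` ends up holding a single entry.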

The function then hands the data to the "gather_data" function, which simply logs the data to STDOUT.
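A minimal sketch of what such a `gather_data` could look like, assuming one JSON record per line. The record layout and the `format_record` helper are assumptions for illustration; the real version lives in the linked project.

```javascript
// Hypothetical sketch of gather_data; the record layout is an assumption.
function format_record(source, data, now) {
  return JSON.stringify({ source: source, time: now.toISOString(), data: data });
}

function gather_data(source, data) {
  // One JSON object per line on STDOUT, so the output can be piped or redirected.
  console.log(format_record(source, data, new Date()));
}

gather_data('irc', { from: 'alice', to: '#chan', msg: 'twss' });
```

Writing one record per line keeps things simple: redirect the output to a file and every line can later be parsed on its own.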

Conclusion

Data is awesome. 

Link to the code: https://github.com/qbit/whtgather

Image by Behrig
