Bypassing Cloudflare DDoS in Scrapy

While scraping the web, I came across a website that had implemented Cloudflare DDoS (Distributed Denial of Service) protection. A DDoS is an attack in which multiple sources target a single host, commonly to bring it down (Wikipedia). Cloudflare, apart from being a CDN, also provides security features to websites, one of which is DDoS protection. Cloudflare sits between a client and the actual server as a proxy and blocks bots from hitting the actual website.
Read more →

Linked list implementation in Python

    class Node(object):
        """
        A Node is a single element in a linked list data structure.
        It consists of two parts:
        1 - Data (can be of any type)
        2 - A reference to the next Node
        If the next_node pointer happens to be null (None), it indicates
        that this is the last element (Node) of the linked list.

        :param object:
        """

        def __init__(self, data, next_node):
            self.data = data
            self.next_node = next_node


    class LinkedList(object):
        """
        A LinkedList class acts as a container of all nodes.
Read more →
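
The excerpt above stops before the LinkedList body, so the following is only a minimal sketch of how such a container could work, not the post's actual code. The `head` attribute, the `prepend` method, the iterator, and the `None` default for `next_node` are all assumptions made for illustration.

```python
class Node(object):
    """A single element: some data plus a reference to the next node."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


class LinkedList(object):
    """Hypothetical container that only tracks the first node."""

    def __init__(self):
        self.head = None  # an empty list has no head node

    def prepend(self, data):
        # The new node points at the old head and becomes the new head.
        self.head = Node(data, self.head)

    def __iter__(self):
        # Walk the chain until next_node is None (the last element).
        current = self.head
        while current is not None:
            yield current.data
            current = current.next_node


linked = LinkedList()
for value in (3, 2, 1):
    linked.prepend(value)
print(list(linked))  # [1, 2, 3]
```

Prepending is used here because it needs no traversal; appending at the tail would require walking the list or keeping a second reference to the last node.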

Process CSV files with multiprocessing in Pandas

Pandas lets you read a large CSV file in chunks using an iterator. This way you don’t have to load the full CSV file into memory before you start processing. My objective was to extract, transform and load (ETL) CSV files of around 15GB. Here is a code snippet that can be used to delegate jobs to multiple cores to speed up a linear process.

    import pandas as pd
    from multiprocessing import Pool


    def process(df):
        """
        Worker function to process the dataframe.
Read more →
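
Since the excerpt cuts off inside the worker function, here is a rough sketch of how chunked reading and a worker pool might fit together. It is not the post's code: the body of `process`, the `run_etl` helper, and its default `chunksize` and `workers` values are assumptions chosen for illustration.

```python
import pandas as pd
from multiprocessing import Pool


def process(df):
    """Worker function: reduce one chunk to a small result."""
    # Placeholder transform (assumption): sum every numeric column.
    return df.sum(numeric_only=True)


def run_etl(csv_path, chunksize=100000, workers=4):
    """Hypothetical driver: read the CSV in chunks and fan them out."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    reader = pd.read_csv(csv_path, chunksize=chunksize)
    with Pool(processes=workers) as pool:
        # imap hands each chunk to a worker and yields results in order.
        partials = list(pool.imap(process, reader))
    # Combine the per-chunk sums into one total per column.
    return pd.concat(partials, axis=1).sum(axis=1)
```

When running this as a script, call `run_etl` from inside an `if __name__ == "__main__":` block so that worker processes can import the module safely. Note that the pool may read chunks ahead of the workers, so peak memory can exceed a single chunk.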

Python argparse and subparsers

Often we have to create command-line utilities, and it doesn’t make sense to put each command in a separate file, especially when you have a bunch of related functionality. This is a quick tutorial on handling subcommands in Python, showing how to write multiple functions in one module and call them from the terminal. I will create a simple indices.py file which will contain two commands
Read more →
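
The excerpt does not say which two commands indices.py contains, so the sketch below uses hypothetical `create` and `delete` subcommands to show the general shape of an argparse subparser setup. The function names, the `name` argument, and the returned strings are illustrative assumptions.

```python
import argparse


def create(args):
    # Hypothetical command body.
    return "creating index {}".format(args.name)


def delete(args):
    # Hypothetical command body.
    return "deleting index {}".format(args.name)


def build_parser():
    parser = argparse.ArgumentParser(prog="indices.py")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # fail with a usage error if no subcommand is given

    create_parser = subparsers.add_parser("create", help="create an index")
    create_parser.add_argument("name")
    # set_defaults attaches the handler so main() can dispatch without if/else.
    create_parser.set_defaults(func=create)

    delete_parser = subparsers.add_parser("delete", help="delete an index")
    delete_parser.add_argument("name")
    delete_parser.set_defaults(func=delete)
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], as a real CLI would.
    args = build_parser().parse_args(argv)
    return args.func(args)


print(main(["create", "books"]))  # creating index books
```

From a terminal, the equivalent call would be `python indices.py create books`, with the module calling `main()` without arguments.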

Multiple database instances with single class

During the development of an API for a client, I felt the need for one class that could return instances of different databases. I used the singleton pattern to achieve this. Below is the class, which you can use if needed.

    <?php
    class Connection
    {
        /**
         * Cached MongoDB connection
         * @var MongoClient|null
         */
        private static $_mongo_connection = null;

        /**
         * Cached MySQL connection
         * @var PDO|null
         */
        private static $_mysql_connection = null;

        /**
         * A private constructor to achieve singleton
         * @param string $type type of database
         */
        private function __construct($type)
        {
            switch ($type) {
                case 'Mongo':
                    self::$_mongo_connection = new MongoClient('mongodb://localhost');
                    break;
                case 'MySQL':
                    $dsn = 'mysql:dbname=mydatabase;host=localhost';
                    $user = 'root';
                    $password = 'mypassword';
                    self::$_mysql_connection = new PDO($dsn, $user, $password);
                    break;
                default:
                    return;
            }
        }

        /**
         * Returns instance of MongoDB
         * @return MongoClient
         */
        public static function getMongoInstance()
        {
            if (is_null(self::$_mongo_connection)) {
                new Connection('Mongo');
            }
            return self::$_mongo_connection;
        }

        /**
         * Returns instance of MySQL
         * @return PDO
         */
        public static function getMySQLInstance()
        {
            if (is_null(self::$_mysql_connection)) {
                new Connection('MySQL');
            }
            return self::$_mysql_connection;
        }

        // No destructor: the temporary Connection objects created in the
        // getters are destroyed immediately, so a __destruct() that nulls
        // the static connections would discard them before they are returned.
    }
Read more →