pythonplay · 10 years
Data Mining With Python & Pandas - 2 of N - Why Indians Should Be Happy That Germany Won Against France
It's the 13th minute and Hummels just scored! The excitement!
In one of the earlier posts, I mentioned the data.gov.in website, which is a fantastic place to explore some interesting data.
I have used this particular data source to decide whom to support in today's FIFA World Cup 2014 quarter-final.
Let's pick up where we left off last time.
df.head()
The data is loaded and ready to go.
Germany vs. France, let the game begin!
Before we start meddling with the data we have in our hands, let's look at this snippet from Wikipedia:
According to the Department of Commerce, the fifteen largest trading partners of India represent 62.1% of Indian imports, and 58.1% of Indian exports as of December 2010. These figures do not include services or foreign direct investment, but only trade in goods.
Well, that's one goal to Germany!
But what does the data really say? Well, let's find out...
# some assignments to speed things up later on
country = 'Country exporting to India'
value = 'Value (INR) - 2012-13'
# create a filter
fromGermans = df[country] == 'GERMANY'
# slice the DataFrame
germany = df[fromGermans]
# sort_values() replaces the old DataFrame.sort(), which newer pandas versions no longer have
germany.sort_values(by=value, ascending=False).head(10)
Interesting code, isn't it? I'll explain it line by line in a moment, but first, let's take a look at what the above lines of code produce:
Those are the top 10 goods India imports from Germany, in descending order of how much India spent on each, i.e. costliest on top.
For some reason, however, data.gov.in hasn't updated the import quantity for most of the goods in the top 10. Weird!
Okay, let's get back to the code. Pandas does some really clever data indexing: once you've loaded data into your DataFrame, it can be selected, sliced, drilled down, etc. in any manner you want (and in some really clever ways that you will find out about exclusively on pythonplay.com. I couldn't resist a marketing pitch; the effects of late-night blogging after watching the World Cup quarter-finals, I suppose!)
Also, in another quarter-final, Federer won and moved on to the next round at Wimbledon.
What I'm doing here is called boolean indexing:
# create a filter
fromGermans = df[country] == 'GERMANY'
# slice the DataFrame
germany = df[fromGermans]
I create a filter (a criterion) for slicing the DataFrame. Notice that it's a vector operation: the comparison is applied to every row of the column and yields a Series of True/False values, but Pandas lets you write it as if you were comparing a single scalar value.
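To see boolean indexing in isolation, here is a minimal sketch on a tiny made-up DataFrame. The column names mirror the ones used above, but the rows and values are invented for illustration, and it uses sort_values(), the current pandas name for sorting by a column.

```python
import pandas as pd

# Column names mirror the post; the rows and values are invented.
country = 'Country exporting to India'
value = 'Value (INR) - 2012-13'
df = pd.DataFrame({
    country: ['GERMANY', 'FRANCE', 'GERMANY', 'FRANCE', 'GERMANY'],
    'Commodity': ['machinery', 'aircraft', 'chemicals', 'wine', 'vehicles'],
    value: [500, 300, 200, 50, 400],
})

# The comparison runs element-wise and returns a boolean Series (the "filter").
from_germans = df[country] == 'GERMANY'
print(from_germans.tolist())  # [True, False, True, False, True]

# Indexing with that Series keeps only the rows where it is True.
germany = df[from_germans]
print(germany.sort_values(by=value, ascending=False)['Commodity'].tolist())
# ['machinery', 'vehicles', 'chemicals']

# Conditions combine with & and |; the parentheses are required because
# & binds more tightly than ==.
big_german = df[(df[country] == 'GERMANY') & (df[value] > 300)]
print(len(big_german))  # 2
```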
Hold on, France is attacking...
fromFrench = df[country] == 'FRANCE'
france = df[fromFrench]
france.sort_values(by=value, ascending=False).head(10)
Ah, but how much more do we spend on German goods than on French goods?
germany[value].sum() - france[value].sum()
Turns out that number is 560723316290! I can't even comprehend this number at a glance. That's roughly 56,000 crores.
India imports goods from Germany worth about INR 56K crores more than what it imports from France!
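For the record, the crore conversion is just a division by 10^7 (1 crore = 10,000,000). A quick check in Python, using the difference quoted above:

```python
difference_inr = 560723316290  # difference quoted above, in rupees
CRORE = 10 ** 7                # 1 crore = 100 lakh = 10,000,000

in_crores = difference_inr / CRORE
print(in_crores)         # 56072.331629
print(round(in_crores))  # 56072
```

So the gap works out to roughly 56,000 crores.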
So what if the Indian economy is influenced by all this? We just want a good game of football, don't we?
Germany 1 - 0 France.
pythonplay · 10 years
Data Mining With Python & Pandas - 1 of N - A Sunday Afternoon Data Hack!
This is the first post of a series titled 'India In Numbers', which I explore more politically, economically and socially on my personal blog, suhas.co.
Here I'll be talking about the science of it, and the beauty of the library called Pandas.
I started by exploring data.gov.in, which publicly provides some really interesting data, and I found a dataset that excited me: Country-wise commodity imports of India.
Pandas works in memory, so once I had downloaded the CSV from the above link, I had to load it before analyzing the data. This is how you do it:
import pandas as pd

df = pd.read_csv("data.csv")
And that's it. The data structure I've named df is what's called a DataFrame; take a look at its documentation.
All operations in Pandas are a breeze once our data has been 'loaded' into the DataFrame.
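If you want to try this without the data.gov.in download, here is a minimal sketch that feeds read_csv() a small made-up CSV from a string. The rows are invented; only the column names echo the real file.

```python
import io

import pandas as pd

# Stand-in for the downloaded data.csv; the rows are invented for illustration.
csv_text = """Country exporting to India,Commodity,Value (INR) - 2012-13
GERMANY,machinery,500
FRANCE,aircraft,300
GERMANY,chemicals,200
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts any file-like object
print(df.shape)          # (3, 3): rows, columns
print(list(df.columns))  # column names come from the CSV header row
print(df.head(2))        # head() shows 5 rows by default; here we ask for 2
```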
To really get a feel for the data, we usually need to take a sneak peek at the actual data itself, and sneak peeks are easy to do with a DataFrame. Here's how:
df.head()
and sure enough, we get our first few rows (five by default):
Now that we've loaded the data and taken a sneak peek at it, let's analyze it and dig out some knowledge. Look out for that in the article titled Data Mining With Python & Pandas - 2 of N.
- by Suhas SG
pythonplay · 10 years
Highest (Greatest) Common Element Among Two Lists In Python
>>> from collections import Counter
>>> xs = [1, 3, 5, 7, 9, 15, 45]
>>> ys = [7, 8, 9, 15, 1, 1, 2]
>>> max(list((Counter(xs) & Counter(ys)).elements()))
15
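The Counter & above computes a multiset intersection. Since max() only needs to know which values are shared, a plain set intersection does the same job, and with max()'s default argument it also copes with lists that share nothing (where max() of an empty sequence would raise ValueError). A sketch; greatest_common is a hypothetical helper name:

```python
from collections import Counter

def greatest_common(xs, ys, default=None):
    """Largest value that appears in both lists, or `default` if there is none."""
    common = set(xs) & set(ys)           # plain set intersection is enough for max()
    return max(common, default=default)  # max(..., default=...) needs Python 3.4+

xs = [1, 3, 5, 7, 9, 15, 45]
ys = [7, 8, 9, 15, 1, 1, 2]
print(greatest_common(xs, ys))  # 15

# Counter's & keeps each shared element with the *smaller* of its two counts,
# which only matters if you care about duplicates, not about the maximum.
print((Counter([1, 1, 2]) & Counter([1, 1, 1]))[1])  # 2
```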
pythonplay · 10 years
Code Golf, Recursive Lambdas and Saturday Morning Fun with Fibonacci Numbers
This is probably the naive way of implementing Fibonacci Numbers:
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

print([fib(_) for _ in range(10)])
But that is six lines of code (including the blank line). No, I'm not happy with six; I want a one-liner.
Okay, a slightly condensed form of the above can be something like this:
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)

print([fib(_) for _ in range(10)])
Still, that's four lines of code. Not good enough. Is there a way to do this in just one line?
Turns out, there is a way! Recursive Lambdas! :)
print([(lambda x: lambda y: x(x, y))(lambda z, u: u if u < 2 else z(z, u-1) + z(z, u-2))(_) for _ in range(10)])
Looks a little crazy, doesn't it? :)
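The one-liner is easier to follow once its parts have names. The sketch below is the same self-application trick with illustrative names (step and make_fib are my own, not part of the original): a lambda can't refer to itself by name, so it receives itself as an extra argument and passes itself along on every recursive call.

```python
# Same self-application trick as the one-liner, with illustrative names.

# step receives "itself" as its first argument so it can recurse
# without ever being bound to a name of its own.
step = lambda self, n: n if n < 2 else self(self, n - 1) + self(self, n - 2)

# make_fib takes such a function and returns a one-argument function
# that kicks off the recursion by handing the function to itself.
make_fib = lambda f: lambda n: f(f, n)

fib = make_fib(step)
print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```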
pythonplay · 10 years
Merging Two Dicts On Common Keys With Values As A List
What do you do when you have two dicts like this:
x = {'one': 1, 'three': 3, 'two': 2}
y = {'one': 1.0, 'two': 2.0, 'three': 3.0}
and you need to combine them in such a way that the resultant dict will be something like this:
{'one': [1, 1.0], 'three': [3, 3.0], 'two': [2, 2.0]}
Here's a quick way of doing it:
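# Python 3: items() returns a view, so convert to lists before concatenating;
# keys() views support & directly
result = dict(list(x.items()) + list(y.items()) + [(k, [x[k], y[k]]) for k in x.keys() & y.keys()])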
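If you have more than two dicts, or only want the common keys in the result, a dict comprehension over the key intersection generalises the idea. This is a sketch with a hypothetical helper name, merge_common; note that, unlike the one-liner above, it drops keys that are not present in every dict.

```python
def merge_common(*dicts):
    """Map every key present in *all* dicts to the list of its values, in argument order."""
    if not dicts:
        return {}
    # set.intersection accepts any iterables; iterating a dict yields its keys.
    common = set(dicts[0]).intersection(*dicts[1:])
    return {k: [d[k] for d in dicts] for k in common}

x = {'one': 1, 'three': 3, 'two': 2}
y = {'one': 1.0, 'two': 2.0, 'three': 3.0}
print(merge_common(x, y) == {'one': [1, 1.0], 'three': [3, 3.0], 'two': [2, 2.0]})  # True
```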
pythonplay · 11 years
Grouping Consecutive Occurrences of Tuples in a List
So I have a data set like this:
data = [('a',1),('a',2),('a',3),('b',1),('b',2),('a',4)]
Let's say I want to group them by consecutive occurrences of the first element of each tuple. So the output I'm expecting is something like this:
a 1,2,3
b 1,2
a 4
Take a deep breath and don't be surprised: I'm going to do this in just two lines (plus an import).
>>> import itertools
>>> for k, v in itertools.groupby(data, key=lambda x: x[0]):
...     print(k, [_[1] for _ in v])
...
a [1, 2, 3]
b [1, 2]
a [4]
Isn't Python fun?
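One thing worth knowing: groupby() only groups consecutive items with the same key, which is exactly why 'a' shows up twice above. The sketch below (helper names are hypothetical) wraps both behaviours: the consecutive runs, and grouping all items per key by sorting first.

```python
from itertools import groupby
from operator import itemgetter

data = [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('a', 4)]

def consecutive_runs(pairs):
    """(key, [values]) for each run of equal keys, in order of appearance."""
    return [(k, [v for _, v in run]) for k, run in groupby(pairs, key=itemgetter(0))]

def all_groups(pairs):
    """Group by key regardless of position: sorting makes equal keys adjacent.
    sorted() is stable, so values keep their original relative order."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(k, [v for _, v in run]) for k, run in groupby(ordered, key=itemgetter(0))]

for k, vs in consecutive_runs(data):
    print(k, ",".join(map(str, vs)))
# a 1,2,3
# b 1,2
# a 4
```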
pythonplay · 11 years
How to find out CDF of data in python? (The simple, non-probabilistic version)
In [1]:
book_prices = [23.5,47.5,55.0,21.0,1.5,2.6,33.5,45.5,99.5,20.5,21.5,100.0,88.5,40.5, 30.0,18.99,23.5,22.25,45.5,90.0,85.5,90.0,15.0]
In [2]:
i = 0
cumulative_prices = []
for p in sorted(book_prices, reverse=True):
    if i == 0:
        cumulative_prices.append(p)
    else:
        cumulative_prices.append(p + cumulative_prices[i-1])
    i += 1
cumulative_prices
Out[2]:
[100.0, 199.5, 289.5, 379.5, 468.0, 553.5, 608.5, 656.0, 701.5, 747.0, 787.5, 821.0, 851.0, 874.5, 898.0, 920.25, 941.75, 962.75, 983.25, 1002.24, 1017.24, 1019.84, 1021.34]
In [3]:
sum(book_prices)
Out[3]:
1021.34
In [4]:
cumulative_percentages = [ (c*100.0)/ sum(book_prices) for c in cumulative_prices ]
In [5]:
cumulative_percentages
Out[5]:
[9.7910588050991834, 19.53316231617287, 28.345115240762134, 37.157068165351397, 45.822155207864178, 54.193510486223978, 59.57859282902853, 64.229345761450645, 68.684277517770767, 73.139209274090902, 77.104588090156071, 80.3845927898643, 83.321910431394045, 85.622809250592354, 87.923708069790663, 90.102218653925235, 92.207296297021557, 94.26341864609239, 96.270585701137719, 98.129907768226047, 99.598566588990934, 99.853134117923503, 100.0]
In [6]:
import matplotlib.pyplot as plt
In [7]:
plt.plot(cumulative_percentages)
Out[7]:
[<matplotlib.lines.Line2D at 0x79ab978>]
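In Python 3, itertools.accumulate() produces the running totals directly, so the index bookkeeping in cell [2] can go away. A minimal sketch (the function name is my own); sorting ascending instead of descending gives the more conventional empirical-CDF ordering.

```python
from itertools import accumulate

def cumulative_percentages(values, descending=True):
    """Running totals of the sorted values, as percentages of the grand total."""
    ordered = sorted(values, reverse=descending)
    total = sum(ordered)
    return [100.0 * running / total for running in accumulate(ordered)]

book_prices = [23.5, 47.5, 55.0, 21.0, 1.5, 2.6, 33.5, 45.5, 99.5, 20.5, 21.5, 100.0,
               88.5, 40.5, 30.0, 18.99, 23.5, 22.25, 45.5, 90.0, 85.5, 90.0, 15.0]

pcts = cumulative_percentages(book_prices)
print(round(pcts[0], 2), round(pcts[-1], 2))  # 9.79 100.0
```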
pythonplay · 11 years
Top 10 Highlights From PyCon India 2013
1. A lot of people use IPython Notebooks.
2. A lot of people use Python for Data Analysis, and Scientific Computing, and especially Machine Learning.
3. Python is up and coming, it's used increasingly in production, and people love its simplicity and elegance.
4. Web frameworks like Django and Flask are popular, the former an all-in-one solution and the latter a lightweight, get-your-app-running-in-five-minutes type.
5. Some people are targeting Python early in education and are campaigning for a notion of fun and easy coding for kids. It's already part of the CBSE school syllabus, and some parents are worried about this.
6. Python is diverse in its capabilities, from telephony to robotics, and even predicting black swan events!
7. People are hiring if you can converse in Python (which shouldn't take an awful lot of time, thanks to its simplicity).
8. The T-shirt sizes were properly thought out; mine fit me!
9. Students in India are an enthusiastic lot, and so are the corporates. Enthusiasm was everywhere.
10. Python, although gaining popularity by the day, is still not being used much in highly scalable, time-critical systems.
pythonplay · 11 years
An Elegant Way To Find Out Median Of Three Numbers In Python
There are multiple ways of doing this.
The first is the comparison approach:
>>> def median3(a,b,c):
...     if a < b:
...         if c < a:
...             return a
...         elif b < c:
...             return b
...         else:
...             return c
...     else:
...         if a < c:
...             return a
...         elif c < b:
...             return b
...         else:
...             return c
...
>>> median3(1,5,2)
2
>>> median3(3,5,2)
3
>>> median3(3,5,7)
5
>>> median3(7,5,2)
5
>>> median3(7,5,1)
5
>>> median3(2,5,1)
2
This takes at least two comparisons and at most three.
Another, more elegant way is to sort the three numbers and return the middle one.
Like this:
>>> def median3(a,b,c):
...     return sorted([a,b,c])[1]
...
>>> median3(1,5,2)
2
>>> median3(3,5,2)
3
>>> median3(3,5,7)
5
>>> median3(7,5,6)
6
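For comparison, here is a sketch of two more ways to get the same answer (function names are illustrative): the standard library's statistics.median, and a min/max formula without any if statements that clamps c into the range spanned by a and b, using no sorting at all.

```python
import statistics

def median3_sorted(a, b, c):
    # The post's approach: sort and take the middle element.
    return sorted([a, b, c])[1]

def median3_stats(a, b, c):
    # statistics.median returns the middle value for an odd number of items.
    return statistics.median([a, b, c])

def median3_minmax(a, b, c):
    # No sorting: clamp c into the interval [min(a, b), max(a, b)].
    return max(min(a, b), min(max(a, b), c))

print(median3_sorted(7, 5, 6), median3_stats(7, 5, 6), median3_minmax(7, 5, 6))  # 6 6 6
```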
pythonplay · 11 years
Divide and Conquer With Python - Merge Sort
In Python, the messy handling of memory with pointers and such is non-existent, which makes for a much more fluent translation of pseudo-code into working Python code.
Sometimes it's so simple and elegant that the Python code looks simpler than the pseudo-code.
That said, let's look at how to approach merge sort in Python.
Merge sort is a divide-and-conquer approach to sorting that runs in O(n log n) time. All you have to do is divide the unsorted input array into a LEFT sub-array and a RIGHT sub-array and keep recursing on each half until you reach the base case of a sub-array of length 1 (or 0), which is trivially sorted; then merge the sorted halves back together (conquer).
Here's how simple it is:
def mergesort(nums):
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2  # integer division, so mid can be used as a slice index
    sorted_left_array = mergesort(nums[:mid])
    sorted_right_array = mergesort(nums[mid:])
    return merge(sorted_left_array, sorted_right_array)
Quite simple.
Now all we have to do is implement the merge method that takes two sorted arrays and merges them together into one single sorted array.
Here's the idea of the merge sub-routine:
1. Traverse sorted_left_array and sorted_right_array with indices, say, i and j.
2. Compare sorted_left_array[i] and sorted_right_array[j] and append whichever is smaller to the result; increment i if sorted_left_array[i] was appended, or j otherwise. (This will become obvious once you see the implementation.)
3. Once one of the arrays is exhausted, append the remaining elements of the other.
This is how you can do it:
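def merge(xs, ys):
    ms = []
    i = 0
    j = 0
    # take the smaller head element while both lists still have items
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            ms.append(xs[i])
            i = i+1
        else:
            ms.append(ys[j])
            j = j+1
    # ys is exhausted: copy the rest of xs
    while i < len(xs) and j == len(ys):
        ms.append(xs[i])
        i = i+1
    # xs is exhausted: copy the rest of ys
    while i == len(xs) and j < len(ys):
        ms.append(ys[j])
        j = j+1
    return ms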
How many lines of code would it take in other languages? :)
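As it happens, the standard library ships a merge routine: heapq.merge() lazily merges already-sorted inputs. Here is a sketch of the same mergesort with the hand-written merge swapped for it; it's a substitution for illustration, not the post's implementation.

```python
import heapq

def mergesort(nums):
    """Merge sort where the merge step is delegated to heapq.merge."""
    if len(nums) <= 1:
        return list(nums)
    mid = len(nums) // 2
    left = mergesort(nums[:mid])
    right = mergesort(nums[mid:])
    # heapq.merge returns a lazy iterator over the merged, sorted values.
    return list(heapq.merge(left, right))

print(mergesort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```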
pythonplay · 11 years
Functional Programming - lambdas, some fun, and then Currying!
Functions can be assigned.
>>> def square(x):
...     return x*x
...
>>> f = square
>>> f(4)
16
They can also be passed as arguments.
>>> def fsum(f, foo):
...     return sum(map(f, foo))
...
>>> fsum(square, [3,4])
25
Lambdas are convenient anonymous functions: they are defined without being bound to any identifier. Nameless functions! Lambdas are handy when you need to pass a function to some higher-order function. In the above example, you could do this instead:
>>> fsum(lambda x: x*x, [3,4]) 25
Now that we know that, let's try currying in python.
As always, let's start with a list.
Today, this will be my list:
foo = range(1,10)
I need to do the following additive operations on this list:
Sum of squares of each element in the list (1^2) + (2^2) + (3^2) + (4^2) ...
Sum of all the elements in the list when each element is multiplied by, say 3 (1*3) + (2*3) + (3*3) + (4*3) ...
Sum of all the elements in the list when each element is added with, say 5 (1+5) + (2+5) + (3+5) + (4+5) ...
At first thought, you'd probably define three functions for them, no?
>>> foo = range(1,10)
>>> def square_adder(foo):
...     return sum(map(lambda x: x**2, foo))
...
>>> def mul_adder(foo):
...     return sum(map(lambda x: x*3, foo))
...
>>> def sum_adder(foo):
...     return sum(map(lambda x: x+5, foo))
...
>>> square_adder(foo)
285
>>> mul_adder(foo)
135
>>> sum_adder(foo)
90
Well, here's how it can be done using currying:
>>> def fsum(f):
...     return lambda x,y: sum(map(f,range(x,y)))
...
>>> fsum(lambda x: x**2)(1,10)
285
>>> fsum(lambda x: x*3)(1,10)
135
>>> fsum(lambda x: x+5)(1,10)
90
and this is possible too:
>>> square_adder = fsum(lambda x: x**2)
>>> square_adder(1,10)
285
>>> mul_adder = fsum(lambda x: x*3)
>>> mul_adder(1,10)
135
>>> sum_adder = fsum(lambda x: x+5)
>>> sum_adder(1,10)
90
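Strictly speaking, fsum above is a function that returns a function (a closure). The standard library's functools.partial gives a similar effect, partial application, without writing the inner lambda. A sketch using an uncurried, three-argument fsum (that signature is my own choice for illustration):

```python
from functools import partial

def fsum(f, lo, hi):
    """Sum f(k) for k in range(lo, hi): the uncurried, three-argument form."""
    return sum(map(f, range(lo, hi)))

# partial() fixes the first argument and returns a new callable
# that still expects the remaining two (lo and hi).
square_adder = partial(fsum, lambda x: x**2)
mul_adder = partial(fsum, lambda x: x*3)
sum_adder = partial(fsum, lambda x: x+5)

print(square_adder(1, 10), mul_adder(1, 10), sum_adder(1, 10))  # 285 135 90
```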
pythonplay · 11 years
Functional Programming - map()
From the docs:
map(function, sequence) calls function(item) for each of the sequence's items and returns a list of the return values. For example, to compute some cubes:
>>> def cube(x):
...     return x*x*x
...
>>> list(map(cube, range(1, 11)))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
(That quote is from the Python 2 tutorial; in Python 3, map() returns a lazy iterator, which is why the call above is wrapped in list(). Python 3 also stops at the shortest sequence instead of padding with None.)
More than one sequence may be passed; the function must then have as many arguments as there are sequences and is called with the corresponding item from each sequence (or None if some sequence is shorter than another). For example:
>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> list(map(add, seq, seq))
[0, 2, 4, 6, 8, 10, 12, 14]
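A sketch with inputs of unequal length shows the Python 3 behaviour, and one way to get padding back. It pads with 0 rather than None because add(None, 40) would raise a TypeError; the fill value is an assumption you'd pick to suit your own function.

```python
from itertools import starmap, zip_longest

def add(x, y):
    return x + y

short, longer = [1, 2, 3], [10, 20, 30, 40]

# Python 3: map() stops when the shortest iterable is exhausted.
print(list(map(add, short, longer)))  # [11, 22, 33]

# To pad the shorter input instead, zip_longest builds the padded tuples
# and starmap unpacks each one into add()'s two arguments.
print(list(starmap(add, zip_longest(short, longer, fillvalue=0))))  # [11, 22, 33, 40]
```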
If I wanted the sum of squares of 1 to 10, then this would do it:
>>> def square(x):
...     return x*x
...
>>> sum(map(square, range(1,11)))
385
pythonplay · 11 years
Functional Programming In Python - Filtering Lists Using filter()
Consider this list,
foo = range(1,100)
Now I have to find only the multiples of 5 in foo.
Without FP, you (or I) would probably do something like this:
>>> bar = []
>>> for x in foo:
...     if x%5 == 0:
...         bar.append(x)
...
>>> print(bar)
[5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
But with FP, the solution is quite elegant. Here's how it can be done:
>>> def f(x): return x%5 == 0
...
>>> list(filter(f, foo))
[5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
filter(function, iterable) applies the function to each item and keeps only the items for which it returns a true value. (In Python 3, filter() returns a lazy iterator, hence the list() call.)
From the docs:
filter(function, sequence) returns a sequence consisting of those items from the sequence for which function(item) is true.
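Two more ways to write the same filter, plus a handy special case, as a quick sketch: a lambda instead of the named f, the equivalent list comprehension, and filter(None, ...), which keeps only the truthy items.

```python
foo = range(1, 100)

# filter() with a lambda instead of a named helper; list() materialises
# the lazy iterator that filter() returns in Python 3.
multiples_of_5 = list(filter(lambda x: x % 5 == 0, foo))

# The equivalent list comprehension, often considered more idiomatic.
same_thing = [x for x in foo if x % 5 == 0]

# Passing None as the function keeps only the truthy items.
truthy = list(filter(None, [0, 1, '', 'a', None, [], [2]]))
print(truthy)  # [1, 'a', [2]]
```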
pythonplay · 11 years
Introduction To List Comprehensions And Reading Lines From File Into A List
List comprehensions are an exciting feature of Python: you can build an entire list in a single statement. Say I want a list of the squares of the numbers from 1 to 9, or a list of numbers x raised to some power y, or only the even squares. Then I do this:
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> [x**2 for x in range(1,10)]
[1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x**y for x in range(1,5) for y in range(1,3)]
[1, 1, 2, 4, 3, 9, 4, 16]
>>> [x**2 for x in range(1,10) if x%2 == 0]
[4, 16, 36, 64]
>>> [x**y for x in range(1,5) for y in range(1,4) if not x==y]
[1, 1, 2, 8, 3, 9, 4, 16, 64]
>>>
And here's how you can read lines from a file into a list using list comprehension:
import sys

# sys.argv[1] is already a string (the file name), so it can be opened directly
with open(sys.argv[1], "r") as f:
    lines = [line.strip() for line in f]
for line in lines:
    print(line)
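A slightly more defensive version of the same idea, as a sketch (read_lines and skip_blank are hypothetical names): a with block guarantees the file gets closed, and a second comprehension with an if clause drops blank lines.

```python
def read_lines(path, skip_blank=True):
    """Return the lines of a text file with surrounding whitespace removed."""
    with open(path) as f:  # the with block closes the file even on errors
        stripped = [line.strip() for line in f]
    # A comprehension with an if clause filters out the empty lines.
    return [line for line in stripped if line] if skip_blank else stripped
```

Usage is just read_lines("file.txt"), or read_lines("file.txt", skip_blank=False) to keep empty lines as "".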
pythonplay · 11 years
Executing Linux Shell (Bash) Commands Within Python Code
Ever wondered how to run a Linux command like grep from within Python code? It's fairly simple. Say I have file1.txt with the following lines:
Hello1
World1
Hello2
World2
Now let's say I have to copy all the lines of file1.txt that start with Hello into file2.txt.
Here's the grep command that'll do it:
grep "^Hello" file1.txt > file2.txt
Okay, now how do I execute this inside Python code? Using subprocess:
import subprocess

p = subprocess.Popen('grep "^Hello" file1.txt > file2.txt', stdout=subprocess.PIPE, shell=True)
p.communicate()
If you don't want to redirect to file2.txt, but would rather parse the output of the command within Python, you can do this:
import subprocess

# text=True (Python 3.7+) decodes the output to str instead of bytes
p = subprocess.Popen('grep "^Hello" file1.txt', stdout=subprocess.PIPE, shell=True, text=True)
output, errormsg = p.communicate()
# do something with output
print(output)
Also do note that as the docs say:
Warning: Passing shell=True can be a security hazard if combined with untrusted input.
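Given that warning, one way to avoid shell=True altogether is to pass the command as a list and do the redirection from Python by handing grep an open file as its stdout. This is a sketch assuming Python 3.5+ (for subprocess.run) and a grep on the PATH; grep_to_file is a hypothetical helper name.

```python
import subprocess

def grep_to_file(pattern, src, dest):
    """Copy the lines of src matching pattern into dest, without shell=True.
    Returns True if any line matched."""
    with open(dest, "w") as out:
        # The arguments are a list, so no shell parses them; the "> dest"
        # redirection is replaced by passing the open file as stdout.
        result = subprocess.run(["grep", pattern, src], stdout=out)
    # grep exits with 0 if lines matched, 1 if none matched, >1 on error.
    if result.returncode > 1:
        raise RuntimeError("grep failed with exit code %d" % result.returncode)
    return result.returncode == 0
```

Usage: grep_to_file("^Hello", "file1.txt", "file2.txt"). Because the pattern is passed as a single list element, a pattern containing spaces or shell metacharacters can't inject extra commands.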
pythonplay · 11 years
Reading And Parsing Files Using Command Line Arguments In Python
Say file.txt contains:
Line1
Line2
Line3
and I need to parse it line by line.
For this, I can do something like this in Python:
# readfile.py
import sys

def read_file():
    # sys.argv[1] is the first command-line argument (the file name)
    with open(sys.argv[1], "r") as f:
        for line in f:
            # do something with each line here;
            # .strip() removes the trailing newline (and other surrounding whitespace)
            print(line.strip())

if __name__ == "__main__":
    read_file()
We can run this like so:
$ python readfile.py file.txt
Line1
Line2
Line3
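Reading sys.argv directly is fine for a quick script; the standard library's argparse adds a usage message and a clear error when the file name is missing. A minimal sketch (main and its argv parameter are my own naming; passing argv explicitly makes it easy to call from tests). In a real script you'd call main() under an if __name__ == "__main__": guard, as above.

```python
import argparse

def main(argv=None):
    """Print every line of the file named on the command line, stripped."""
    parser = argparse.ArgumentParser(description="Print each line of a file.")
    parser.add_argument("path", help="the file to read")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    with open(args.path) as f:
        for line in f:
            print(line.strip())
```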
pythonplay · 11 years
What Do The Underscores Mean In __init__()?
From the Python style guide (PEP 8):
_single_leading_underscore: weak "internal use" indicator. E.g.
from M import *
does NOT import objects whose name starts with an underscore.
single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.
Tkinter.Toplevel(master, class_='ClassName')
__double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo).
__double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.
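To see the single- and double-underscore rules in action, here is a small sketch using the FooBar/__boo names from the quote:

```python
class FooBar:
    def __init__(self):
        self._internal = "weak internal-use hint"  # single leading underscore
        self.__boo = "mangled"                     # stored as _FooBar__boo

obj = FooBar()
print(obj._internal)          # still accessible: the underscore is only a convention
print(obj._FooBar__boo)       # the mangled name works from outside the class
print(hasattr(obj, "__boo"))  # False: there is no attribute by the unmangled name
```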