How to print keys and values in sorted order, with n/a values at the end, in pandas

I want to sort the data by key in ascending order, and keys that are not present in a row should be printed last.

Please suggest a solution, and point out any modifications that are required.

Input.txt:

3=1388|4=1388|5=M|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|362=0.46
3=1388|4=1388|5=M|8=157.73|9=100|1021=0|854=p|394=157.73|474=157.749977558|1584=89029|444=20160713|459=93001362639104|461=26142|55=93001362849000|22=89120|361=0.71|981=0|16=1468416601.372|18=1468416601.372|362=0.45

Program code:

import pandas as pd
import numpy as np 
from operator import itemgetter   
df = pd.read_csv("C:\",index_col=None, names=['text'])
s = df.text.str.split('|')
ds =[dict(w.split('=',1 ) for w in x) for x in s]
p = pd.DataFrame.from_records(ds)
p1 = p.replace(np.nan,'n/a', regex=True)
st = p1.stack(level=0,dropna=False)
dfs = [g for i, g in st.groupby(level=0)]
for i, g in enumerate(dfs):
    print('\nindex[%d]' % i)
    for (_, k), v in g.items():
        print(k, '\t', v)

OUTPUT (I got):

index[0]
1021      1500      
1584      88929     
16        1468416600.6006
18        1468416600.6006
22        89020     
3         1388      
361       0.73      
362       0.46     
388       157.75    
394       157.75    
4         1388      
400       157.75    
444       20160713  
459       93000546718000
461       7         
474       157.75    
5         M       
55        93000552181000
8         157.75    
854       n         
9         88929     
981       0         

index[1]
1021      0         
1584      89029     
16        1468416601.372
18        1468416601.372
22        89120     
3         1388      
361       0.71      
362       0.45     
388       n/a       
394       157.73    
4         1388      
400       n/a       
444       20160713  
459       93001362639104
461       26142     
474       157.749977558
5         M       
55        93001362849000
8         157.73    
854       p         
9         100       
981       0         

EXPECTED OUTPUT

index[0]
3         1388
4         1388
5         M
8         157.75
9         88929
16        1468416600.6006
18        1468416600.6006
22        89020
55        93000552181000
361       0.73
388       157.75
394       157.75
400       157.75
444       20160713
459       93000546718000
461       7
474       157.75
854       n
981       0
1021      1500
1584      88929

index[1]
3         1388 
4         1388 
5         M 
8         157.73 
9         100      
16        1468416601.372
18        1468416601.372
22        89120 
55        93001362849000
361       0.71      
362       0.45  
394       157.73 
444       20160713  
459       93001362639104
461       26142     
474       157.749977558 
854       p   
981       0    
1021      0         
1584      89029          
388       n/a       
400       n/a       

Answers


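You can use stack to create a Series, then split each element on = with str.split(expand=True) to get a DataFrame. After that, cast the first column to int, make it the index with set_index, and sort with sort_index. The cast matters because the keys are read as strings, and strings sort character by character: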

import io
import pandas as pd

temp=u"""3=1388|4=1388|5=M|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|', index_col=None, header=None)
# one key=value string per element, split into a key column (0) and a value column (1)
df1 = df.stack().str.split('=', expand=True)
# cast the keys to int so that they sort numerically, not as text
df1[0] = df1[0].astype(int)
df1 = df1.set_index(0).sort_index()
print(df1)
                    1
0                    
3                1388
4                1388
5                   M
8              157.75
9               88929
16    1468416600.6006
18    1468416600.6006
22              89020
55     93000552181000
361              0.73
388            157.75
394            157.75
400            157.75
444          20160713
459    93000546718000
461                 7
474            157.75
854                 n
981                 0
1021             1500
1584            88929
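
The reason for the cast to int can be seen without pandas at all. The short sketch below uses a handful of keys from the question (the variable names are only for illustration) and compares the default string ordering, which is what the output in the question shows, with a numeric ordering:

```python
keys = ["1021", "3", "16", "854", "9"]

# Strings are compared character by character, so "1021" sorts before "3".
as_text = sorted(keys)

# Converting each key to int for the comparison gives the numeric order.
as_numbers = sorted(keys, key=int)

print(as_text)
print(as_numbers)
```

The same idea applies to a DataFrame index or column: as long as the keys are strings, sort_index and sort_values order them as text.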

Another solution with sort_values:

df1 = df.stack().str.split('=', expand=True)
df1.columns = ['a', 'b']
df1['a'] = df1['a'].astype(int)
df1 = df1.reset_index(drop=True).sort_values('a')
print(df1)
       a                b
0      3             1388
1      4             1388
2      5                M
3      8           157.75
4      9            88929
19    16  1468416600.6006
20    18  1468416600.6006
15    22            89020
14    55   93000552181000
17   361             0.73
7    388           157.75
8    394           157.75
16   400           157.75
11   444         20160713
12   459   93000546718000
13   461                7
9    474           157.75
6    854                n
18   981                0
5   1021             1500
10  1584            88929
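
Both snippets work on a single input line, so they do not show the n/a part of the question. A minimal sketch of one way to handle several lines is below. It assumes every field has the form key=value with an integer key; the two sample lines are a shortened subset of the input from the question, and format_rows is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

# Shortened sample input; the second line has no key 388.
lines = [
    "3=1388|4=1388|5=M|388=157.75|1021=1500|22=89020",
    "3=1388|4=1388|5=M|1021=0|22=89120",
]

def format_rows(lines):
    """For each line, return (key, value) pairs: the keys present in that line
    in ascending numeric order, then the keys it lacks with the value 'n/a'."""
    records = [
        dict(field.split("=", 1) for field in line.split("|") if field)
        for line in lines
    ]
    # One column per key seen in any line; absent keys become NaN.
    frame = pd.DataFrame.from_records(records)
    # Order the columns numerically rather than as text.
    frame = frame[sorted(frame.columns, key=int)]
    result = []
    for _, row in frame.iterrows():
        present = [(k, v) for k, v in row.items() if pd.notna(v)]
        missing = [(k, "n/a") for k, v in row.items() if pd.isna(v)]
        result.append(present + missing)
    return result

for i, pairs in enumerate(format_rows(lines)):
    print("\nindex[%d]" % i)
    for key, value in pairs:
        print("%-9s %s" % (key, value))
```

For the second sample line this prints 3, 4, 5, 22 and 1021 first and 388 with n/a last. Building the n/a entries only at print time keeps the missing values as NaN in the DataFrame, so they stay easy to tell apart from real data.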
