Python/pandas data mining (fourteen) - groupby, aggregation, group-level operations

import pandas as pd
import numpy as np

df = pd.DataFrame({'key1': list('aabba'),
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df


grouped = df['data1'].groupby(df['key1'])
grouped.mean()

The grouping key above is a Series; in fact, the grouping key can be any array of appropriate length.

states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])
df['data1'].groupby([states, years]).mean()


df.groupby('key1').mean()


Notice that there is no key2 column in the result: df['key2'] is not numeric data, so it is excluded. By default, all numeric columns are aggregated, although the aggregation can also be restricted to a subset of them.
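As a hedged aside for newer pandas releases, where non-numeric columns are no longer dropped silently, you can request numeric-only aggregation or select the numeric columns explicitly; a minimal sketch:

df.groupby('key1').mean(numeric_only=True)     # aggregate only the numeric columns
df.groupby('key1')[['data1', 'data2']].mean()  # or select the columns yourself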

Iterate over the groups

for name, group in df.groupby('key1'):
    print(name)
    print(group)


You can see that name is the value of key1 for each group, and group is the corresponding subset of the DataFrame.

The same applies when grouping by multiple keys:

for (k1, k2), group in df.groupby(['key1', 'key2']):
    print('===k1,k2:')
    print(k1, k2)
    print('===k3:')
    print(group)


You can operate on the pieces produced by groupby, for example converting them to a dictionary:

piece = dict(list(df.groupby('key1')))
piece
{'a':       data1     data2 key1 key2
 0  -0.233405 -0.756316    a  one
 1  -0.232103 -0.095894    a  two
 4   1.056224  0.736629    a  one,
 'b':       data1     data2 key1 key2
 2   0.200875  0.598282    b  one
 3  -1.437782  0.107547    b  two}
piece['a']


groupby groups on axis=0 by default; by setting the axis argument you can group on any other axis.

grouped = df.groupby(df.dtypes, axis=1)
dict(list(grouped))
{dtype('float64'):       data1     data2
 0 -0.233405 -0.756316
 1 -0.232103 -0.095894
 2  0.200875  0.598282
 3 -1.437782  0.107547
 4  1.056224  0.736629,
 dtype('O'):   key1 key2
 0    a  one
 1    a  two
 2    b  one
 3    b  two
 4    a  one}

Select a column or a subset of columns


For large datasets, you often only need to aggregate a few of the columns.

df.groupby(['key1', 'key2'])[['data2']].mean()
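As a related sketch: indexing the grouped object with a single column name returns a grouped Series, while a list of names returns a grouped DataFrame.

df.groupby(['key1', 'key2'])['data2'].mean()    # Series indexed by (key1, key2)
df.groupby(['key1', 'key2'])[['data2']].mean()  # one-column DataFrame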


Group by dictionary or Series

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=list('abcde'),
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.loc['Wes', ['b', 'c']] = np.nan  # set a few NaN values
people


Suppose we know a grouping relationship for the columns:

mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}
by_column = people.groupby(mapping, axis=1)
by_column.sum()


If axis=1 is not specified, the mapping is applied along the rows instead of the columns, so only the columns a through e appear in the result.

The same works with a Series:

map_series = pd.Series(mapping)
map_series
a       red
b       red
c      blue
d      blue
e       red
f    orange
dtype: object
people.groupby(map_series, axis=1).count()


Group by function

Compared with a dict or Series, a Python function is a more flexible way to define a grouping mapping. Any function passed as a grouping key is called once per index value, and its return value is used as the group name. Suppose you want to group by the length of each person's name: simply pass len.

people.groupby(len).sum()
          a         b         c         d         e
3 -1.308709 -2.353354  1.585584  2.908360 -1.267162
5 -0.688506 -0.187575 -0.048742  1.491272 -0.636704
6  0.110028 -0.932493  1.343791 -1.928363 -0.364745

Mixing functions with arrays, lists, dictionaries, and Series is not a problem, because everything is eventually converted to an array.

key_list = ['one', 'one', 'one', 'two', 'two']
people.groupby([len, key_list]).sum()

Group by index level

The most convenient aspect of a hierarchical index is that it lets you aggregate by index level. To do this, pass the level number or name via the level keyword:

columns = pd.MultiIndex.from_arrays([['US', 'US', 'US', 'JP', 'JP'],
                                     [1, 3, 5, 1, 3]],
                                    names=['cty', 'tenor'])
hier_df = pd.DataFrame(np.random.randn(4, 5), columns=columns)
hier_df


hier_df.groupby(level='cty', axis=1).count()
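The level can also be given by number instead of name; a minimal sketch (level 0 corresponds to 'cty' here):

hier_df.groupby(level=0, axis=1).count()  # same grouping, addressed by level number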


Data aggregation

Call a custom aggregate function

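A minimal sketch of passing a custom aggregation function to agg; the peak_to_peak helper is illustrative:

def peak_to_peak(arr):
    # spread between the largest and smallest value in each group
    return arr.max() - arr.min()

df.groupby('key1')[['data1', 'data2']].agg(peak_to_peak)

Custom functions passed this way are generally slower than optimized built-ins such as mean or sum.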

Column-oriented multi-function application

Aggregating a Series or the columns of a DataFrame really means using aggregate (agg) or calling methods such as mean or std. Next we want to apply different aggregation functions to different columns, or apply several functions at once. (The tips dataset used below, with its tip_pct column, is constructed later in this post in the apply section.)

grouped = tips.groupby(['sex', 'smoker'])
grouped_pct = grouped['tip_pct']
grouped_pct.agg('mean')  # for the statistics described in Table 9-1, you can pass the function name as a string
# if you pass a list of functions, the columns of the resulting DataFrame are named after those functions

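A minimal sketch of passing a list of functions, reusing the illustrative peak_to_peak helper from above; each resulting column is named after its function:

grouped_pct.agg(['mean', 'std', peak_to_peak])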

The automatically generated column names are not very descriptive. If you pass a list of (name, function) tuples, the first element of each tuple is used as the column name in the resulting DataFrame.

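A minimal sketch of the (name, function) form; the labels 'foo' and 'bar' are arbitrary:

grouped_pct.agg([('foo', 'mean'), ('bar', np.std)])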

With a DataFrame you can apply the same list of functions to all columns, or apply different functions to different columns.

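A minimal sketch of applying one list of functions to several columns, which produces a result with hierarchical columns:

functions = ['count', 'mean', 'max']
result = grouped[['tip_pct', 'total_bill']].agg(functions)
result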

To apply different functions to different columns, pass agg a dictionary that maps column names to functions, as sketched below.


The resulting DataFrame has hierarchical columns only when multiple functions are applied to at least one column.
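A minimal sketch of the dictionary form; passing a list of functions for one column is what yields the hierarchical columns:

grouped.agg({'tip': 'max', 'size': 'sum'})
grouped.agg({'tip_pct': ['min', 'max', 'mean', 'std'], 'size': 'sum'})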

Group-level operations and transformations

Aggregation is only one kind of grouped operation; it is a special case of data transformation. transform and apply are more flexible.

transform applies a function to each group and places the results in the appropriate locations. If each group produces a scalar value, that scalar is broadcast across the group.

transform is also a restricted method: the passed function must either produce a scalar value that can be broadcast (e.g. np.mean) or an array of results of the same size as the group.

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=list('abcde'),
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people


key = ['one', 'two', 'one', 'two', 'one']
people.groupby(key).mean()


people.groupby(key).transform(np.mean)


You can see that the group means from the previous result have been broadcast back to every row of the original data.

def demean(arr):
    return arr - arr.mean()

demeaned = people.groupby(key).transform(demean)
demeaned

demeaned.groupby(key).mean()  # the demeaned group means are now (approximately) zero

The most general groupby method is apply.

tips = pd.read_csv('C:\\Users\\ecaoyng\\Desktop\\work space\\Python\\py_for_analysis_code\\pydata-book-master\\ch08\\tips.csv')
tips[:5]


Generate a new column:

tips['tip_pct'] = tips['tip'] / tips['total_bill']
tips[:6]


Define a function that selects the rows with the largest values in a given column (top 5 tip_pct by default):

def top(df, n=5, column='tip_pct'):
    return df.sort_values(by=column)[-n:]

top(tips, n=6)


Group by smoker and apply the function:

tips.groupby('smoker').apply(top)


Multi-parameter version

tips.groupby(['smoker', 'day']).apply(top, n=1, column='total_bill')


Quantile and bucket analysis

Combining cut and qcut with groupby makes it easy to perform bucket or quantile analysis on a dataset.

frame = pd.DataFrame({'data1': np.random.randn(1000),
                      'data2': np.random.randn(1000)})
frame[:5]


factor = pd.cut(frame.data1, 4)
factor[:10]
0     (0.281, 2.00374]
1     (0.281, 2.00374]
2    (-3.172, -1.442]
3     (-1.442, 0.281]
4     (0.281, 2.00374]
5     (0.281, 2.00374]
6     (-1.442, 0.281]
7     (-1.442, 0.281]
8     (-1.442, 0.281]
9     (-1.442, 0.281]
Name: data1, dtype: category
Categories (4, object): [(-3.172, -1.442] < (-1.442, 0.281] < (0.281, 2.00374] < (2.00374, 3.727]]

def get_stats(group):
    return {'min': group.min(), 'max': group.max(),
            'count': group.count(), 'mean': group.mean()}

grouped = frame.data2.groupby(factor)
grouped.apply(get_stats).unstack()


These were buckets of equal length. To get buckets containing an equal number of data points, use qcut.

Equal-length buckets: equal-width intervals.

Equal-size buckets: equal number of data points.

grouping = pd.qcut(frame.data1, 10, labels=False)  # labels=False returns just the quantile numbers
grouped = frame.data2.groupby(grouping)
grouped.apply(get_stats).unstack()

