Michael E. Byczek
Software Engineer


The Python Programming Language

My expertise with Python has spanned the entire range of features and capabilities. Representative Projects:

Real Estate

Developed, and continue to enhance, a data-driven analytics platform with web-based review tools that scales to index every residential home in Illinois, so that homeowners can instantly evaluate all comparable properties when seeking reduced tax assessments.

Finance

Financial analysis using the Quant Platform and DX Analytics library to model derivatives instruments and portfolios. This included the Fundamental Theorem of Asset Pricing, valuation of options and derivatives with European and American exercise, Monte Carlo simulations, stochastic processes, and risk management.
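
As an illustration of the Monte Carlo approach mentioned above, here is a minimal sketch that values a European call option with plain NumPy. It does not use the DX Analytics API; the function name european_call_mc, the geometric Brownian motion model, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def european_call_mc(s0, strike, rate, sigma, maturity, n_paths=200_000, seed=0):
    """Estimate a European call price by simulating terminal stock prices.

    Assumes geometric Brownian motion under the risk-neutral measure
    (an illustrative modeling choice, not taken from the original text).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # Terminal price of each simulated path at maturity.
    s_t = s0 * np.exp((rate - 0.5 * sigma**2) * maturity
                      + sigma * np.sqrt(maturity) * z)
    # European exercise: the payoff depends only on the price at maturity.
    payoff = np.maximum(s_t - strike, 0.0)
    # Discount the average payoff back to today.
    return np.exp(-rate * maturity) * payoff.mean()

# Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
price = european_call_mc(100.0, 100.0, 0.05, 0.2, 1.0)
print(round(price, 2))
```

American exercise needs more than this, since the holder may exercise before maturity; a common approach is least-squares regression on simulated paths.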

Other financial tools include mean-variance portfolio theory (MPT) to model diversification and to optimize portfolios for minimal risk or maximum return.
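
The minimal-risk case of mean-variance optimization has a closed-form answer, sketched below with NumPy. The covariance matrix is made up for illustration, and the sketch assumes short positions are allowed and no target return is imposed.

```python
import numpy as np

# Hypothetical annualized covariance matrix for three assets.
cov = np.array([[0.040, 0.006, 0.000],
                [0.006, 0.090, 0.012],
                [0.000, 0.012, 0.160]])

# Global minimum-variance weights: w = inv(cov) @ 1 / (1' @ inv(cov) @ 1).
# Solving the linear system avoids forming the inverse explicitly.
raw = np.linalg.solve(cov, np.ones(3))
weights = raw / raw.sum()

# Variance of the resulting portfolio.
variance = weights @ cov @ weights

print(weights)
print(variance)
```

Diversification shows up in the result: the portfolio variance is no larger than that of the least risky single asset.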

Physics

Contribution to independent analysis of physics experiments at CERN, the European Organization for Nuclear Research, using the ROOT scientific software framework (C++ with Python extension modules). Analysis of particle collisions at close to the speed of light (one billion collisions per second) to examine subatomic particles.

Built a mathematics platform with Python CGI scripts (local & cloud) to solve calculus and physics equations.

Social Media

Insight into how consumers engage with products, services, and brand names through real-time analysis of hashtag posts on Twitter with Python APIs.

Account management of Twitter and Tumblr posts through APIs.

Intellectual Property

Evaluation of trademarks for particular industries through brand recognition, such as restaurants, sporting events, fitness clubs, hotels, and retail companies.

Copyright infringement in the entertainment business, such as music, videos, and live performances.

Design of a machine learning platform to classify ten million United States patents into 200,000 internationally recognized technical subject-matter areas, to expedite research into similar applications and identify trends in emerging technologies.

Useful Python packages

Scientific Computation

SciPy

An ecosystem of mathematical, scientific, and engineering packages that includes the SciPy library itself along with NumPy, SymPy, Matplotlib, IPython, and pandas.

NumPy

A fundamental scientific computing package that offers an N-dimensional array object, broadcasting functions, integration with C/C++ and Fortran code, linear algebra, Fourier transforms, a multi-dimensional container for generic data, arbitrary data types, and database integration.
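
A short sketch of three of these features (broadcasting, linear algebra, and the Fourier transform). All array values are made up for illustration.

```python
import numpy as np

# A 3x2 array of sample values (rows = observations, columns = features).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Broadcasting: the column means (shape (2,)) are applied to every row.
centered = data - data.mean(axis=0)

# Linear algebra: solve the 2x2 system A @ x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: a constant signal has all its energy in the first bin.
spectrum = np.fft.fft(np.ones(4))

print(centered)
print(x)
print(spectrum)
```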

pandas

Data structures (DataFrame and Series) and data analysis tools similar to what is offered by default in the R language. Features include reading and writing data in various formats (e.g., CSV, Excel, and SQL databases), data alignment, handling of missing data, pivoting of data sets, size mutability, slicing, subsetting large data sets, split-apply-combine operations, merging and joining data sets, and time-series functionality.
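
A minimal sketch of three of the listed features: missing-data handling, split-apply-combine, and merging. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Small illustrative data set with one missing amount.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "amount": [100.0, np.nan, 250.0, 50.0],
})

# Handling of missing data: replace NaN with 0.0.
sales["amount"] = sales["amount"].fillna(0.0)

# Split-apply-combine: split by region, sum each group, combine into a Series.
totals = sales.groupby("region")["amount"].sum()

# Merge/join: attach a second table on the shared "region" column.
managers = pd.DataFrame({"region": ["north", "south"],
                         "manager": ["Ann", "Bob"]})
merged = totals.reset_index().merge(managers, on="region")

print(merged)
```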

IPython

Provides an interactive shell, data visualization, GUI tools, and parallel computing. Used for advanced statistics and quantum mechanics. Also acts as a kernel for Jupyter.

Math and Statistics

SymPy

Symbolic mathematics and a full-featured computer algebra system. Statistics capabilities include probability, probability density, expected value/variance, and random variable types. The package is used for solving equations, calculus, matrices, and discrete math.
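
A brief sketch of the capabilities listed above; the specific equation, functions, and matrix are chosen only for illustration.

```python
import sympy as sp
from sympy.stats import Die, E

x = sp.symbols("x")

# Solving equations: roots of x**2 - 5*x + 6 = 0.
roots = sp.solve(sp.Eq(x**2 - 5*x + 6, 0), x)

# Calculus: a symbolic derivative and an improper integral.
derivative = sp.diff(x**2 * sp.sin(x), x)
integral = sp.integrate(x * sp.exp(-x), (x, 0, sp.oo))

# Matrices: exact determinant of a 2x2 matrix.
det = sp.Matrix([[1, 2], [3, 4]]).det()

# Statistics: expected value of a fair six-sided die.
expected = E(Die("D", 6))

print(roots, derivative, integral, det, expected)
```

Results stay exact (for example, the expected value is the rational 7/2 rather than a float), which is the main difference from numeric packages such as NumPy.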

Statsmodels

Explore data, estimate statistical models and perform statistical tests. This includes descriptive statistics, statistical tests, plotting, and result statistics. Features: linear regression, time series, nonparametric estimators, and unit tests for correctness of results.

Machine Learning

scikit-learn

Machine learning capabilities built on NumPy, SciPy, and matplotlib. Used for data mining and data analysis: classification (identifying which category an object belongs to), regression (predicting a continuous-valued attribute associated with an object), clustering (grouping similar items into sets), dimensionality reduction (reducing the number of random variables), model selection (comparing, validating, and choosing parameters/models), and preprocessing (feature extraction and normalization).
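
A minimal classification sketch that also touches preprocessing and model validation. It uses the iris data set bundled with scikit-learn; the choice of logistic regression and the 25% hold-out split are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and class labels (three iris species).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to validate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Preprocessing (normalization) followed by a classifier, as one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(accuracy)
```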

SHOGUN

Designed for unified large-scale learning for classification, regression, and explorative data analysis. A primary feature is the unified interface from multiple languages, such as Python, R, Java, and C++. Other benefits include clustering, metric, structured output, and online learning algorithms.

PyBrain

Offers flexible and easy-to-use algorithms and the ability to test/compare these methods. The software is designed for both entry level students and state-of-the-art research. Algorithms include neural networks, reinforcement learning, unsupervised learning, black box optimization, and evolution.

PyMC

A statistical package for Markov chain Monte Carlo sampling that implements the Metropolis-Hastings algorithm. Includes methods for summarizing output, plotting, goodness-of-fit, and convergence diagnostics. Intended to provide efficient Bayesian analysis.

Plotting and Visualization

matplotlib

2D plotting for publication-quality figures in hardcopy format. Used in Python scripts, shell, Mathematica, MATLAB, web application servers, and graphical user interfaces. Generates plots, histograms, power spectra, bar charts, error charts, and scatter plots.
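
A short sketch that draws a line plot and a histogram side by side and saves them to a file. The output file name example_plot.png and the plotted data are hypothetical; the Agg backend is selected so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; renders to files only
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)

# One figure with two side-by-side axes.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot of sin(x) with a legend and axis label.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.legend()

# Histogram of 500 normally distributed samples.
ax_hist.hist(rng.normal(size=500), bins=20)
ax_hist.set_xlabel("value")

fig.tight_layout()
fig.savefig("example_plot.png", dpi=150)
```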

Bokeh

Interactive visualization library for web browsers. Used to build elegant graphics with interactivity over very large or streaming data applications.

ggplot

Plotting system based on ggplot2, which is available for R. Used to make professional-quality plots with minimal code. Not intended for highly customized data visualizations. Multiple layers can be combined, such as points, lines, and trendlines.

Plotly

Used for dashboards, scatter plots, charts (line, bubble, bar, pie), time series, treemaps, and tables. Statistical features include error bars, box plots, histograms, 2D density plots, and distplots. 3D plots include wireframe, point clustering, parametric, scatter, surface, ribbon, and filled line.

prettyplotlib

Used to enhance matplotlib plots through color perception and information design.

Seaborn

Visualization library based on matplotlib for drawing attractive statistical graphics. Also supports NumPy and pandas data structures along with statistical routines from SciPy and statsmodels. Benefits include the ability to reveal patterns in data, compare subsets, discover structure in matrices, and represent the uncertainty of time series estimates.
