Michael E. Byczek, Software Engineer

Data Science

Data tells a story. Data science is about telling that story through effective communication and business acumen.

Data science is interdisciplinary. It sits at the intersection of computer science (developing the algorithms that store, process, and visualize data), domain expertise (formulating the right questions), and statistics (modeling datasets).

Subject matter expertise is critical to ask the right questions and transform business challenges into data solutions.

Specialize in understanding the data being used for analytics, through techniques such as feature engineering, to obtain the highest-quality data set and the most accurate results.

Optimize business intelligence for innovative strategies to maintain a competitive advantage, increase revenue, and retain customers.

An intriguing challenge is how best to present this data so that each employee and decision-maker at the company can get the most insight from the analytical research.

The goal is to assist clients in optimizing data science, the process of obtaining knowledge and insight from data, through a comprehensive approach that involves predictive (forecasting), descriptive (data mining and business intelligence), and prescriptive (optimization) techniques.

Descriptive: what happened and why it happened, answered through techniques such as data mining. This is based on analyzing past events to understand the reasons for particular observations or trends.

Predictive: use current and historical data to make predictions about the future, such as the probability that a certain event will occur or the price at which an item might sell.

Prescriptive: how best to use this insight to capitalize on the predictions.

For example, the sequence of events taken by an online shopper forms the decision-making process. A collection of transactions, such as monthly sales, is the voice of a company's customers. The ability to understand why these decisions were made provides the opportunity to make future predictions.

The most important aspect of data science is selecting how best to format the data being used for analytics through feature engineering. A business might retain the full billing address for each of its customers. However, it is better to split the billing address into street, city, and zip code. That makes it possible to answer questions that pertain to all customers in a particular city, neighborhood, or larger region and to identify shopping patterns.
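The address example can be sketched in a few lines of Python. This is only an illustration: it assumes every address follows a "street, city, state zip" layout, and the function name split_billing_address and the sample records are made up for the example. Real addresses are far less regular and usually need a dedicated parsing or validation step.

```python
from collections import Counter


def split_billing_address(address):
    """Split one address string into separate fields.

    Assumes the hypothetical layout 'street, city, state zip'.
    """
    street, city, state_zip = [part.strip() for part in address.split(",")]
    state, zip_code = state_zip.split()
    return {"street": street, "city": city, "state": state, "zip_code": zip_code}


# Made-up sample records, for illustration only.
customers = [
    "123 Main St, Chicago, IL 60601",
    "456 Oak Ave, Chicago, IL 60614",
    "789 Pine Rd, Evanston, IL 60201",
]

parsed = [split_billing_address(address) for address in customers]

# With the city as its own field, per-city questions become simple counts.
customers_per_city = Counter(row["city"] for row in parsed)
print(customers_per_city)
```

Once the fields are separate, grouping by city or zip code is a single pass over the records instead of a text search through full address strings.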

Another example is a calendar. A date alone does not differentiate between shopping on Black Friday, Cyber Monday, or any other day of the week, month, or year. It would be better to tag certain days or ranges of days for special attention to understand shopper habits.
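One possible way to tag such days is sketched below. It assumes the usual United States definitions: Thanksgiving is the fourth Thursday of November, Black Friday is the following day, and Cyber Monday is the Monday after that. The function names and the tag labels are hypothetical, chosen only for this example.

```python
from datetime import date, timedelta


def thanksgiving(year):
    """Return the fourth Thursday of November for the given year."""
    nov_first = date(year, 11, 1)
    # weekday() counts Monday as 0, so Thursday is 3.
    days_until_thursday = (3 - nov_first.weekday()) % 7
    # First Thursday plus three more weeks gives the fourth Thursday.
    return nov_first + timedelta(days=days_until_thursday + 21)


def tag_shopping_day(day):
    """Label a date as a special shopping day or as a regular day."""
    holiday = thanksgiving(day.year)
    if day == holiday + timedelta(days=1):
        return "black_friday"
    if day == holiday + timedelta(days=4):
        return "cyber_monday"
    return "regular"


print(tag_shopping_day(date(2023, 11, 24)))
print(tag_shopping_day(date(2023, 11, 27)))
```

The resulting label can be stored as an extra column next to each transaction date, so that sales on tagged days can be compared directly against ordinary days.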

The goal of data mining is to extract information from data and structure the results for further analysis. Factors that impact big data analysis include the volume of the data, the variety of the data, the speed at which it is generated and processed, its inconsistency, and its quality.

Machine Learning

Machine learning is used to understand data without explicitly programming a computer with instructions to obtain a result. A pattern in existing data is used to make predictions about future data sets.

Algorithms are used to find those patterns in existing data and construct mathematical models from those patterns. The most important step is to collect and prepare the best possible data values to improve the quality of predictions.

A large portion of the data is used for the training phase (finding patterns), and the remaining segment is used to evaluate the model before it is applied to actual business operations.
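The split between training and evaluation data can be sketched as follows. The 80/20 ratio, the fixed random seed, and the function name train_test_split are assumptions made for this example; the right proportion depends on how much data is available and on the problem being modeled.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle the records and divide them into training and evaluation sets.

    The 0.8 fraction and the seed are illustrative defaults, not fixed rules.
    """
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]


# Stand-in data: 100 numbered records instead of real business data.
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before the cut keeps the evaluation set from being, for example, only the most recent records. For time-ordered data a chronological split is often preferred instead.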