My focus has been data-driven research using Python for the design and implementation of entire data analytics platforms. This has included user interfaces, database back-ends, web crawlers, file parsers, and automated reports. For example, I use the pandas package to load data directly from databases (MySQL, PostgreSQL, and MongoDB), CSV files, and Microsoft Excel spreadsheets, then manipulate, process, clean, and crunch the structured data. The primary goal has been to gain insight from data. I adhere to the interdisciplinary nature of data science through the intersection of computer science, domain expertise, and mathematics. A representative sample of my projects includes real-time analysis of how consumers engage with products, services, and brand names through social media posts on Twitter and Tumblr using Python APIs; real estate analysis of every residential and commercial property in the Illinois counties of Cook, DuPage, and Will; ten million patents registered in the U.S.; evaluation of trademarks and brand recognition in particular industries, such as restaurants, sporting events, fitness clubs, hotels, and retail companies; and copyright infringement in the entertainment business, such as music, videos, and live performances.
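As a minimal sketch of this loading step (all connection strings, file names, and table names below are placeholders rather than values from an actual project):

```python
# Loading data into pandas from the kinds of sources described above.
import pandas as pd
from sqlalchemy import create_engine
from pymongo import MongoClient

# Relational sources (MySQL shown; PostgreSQL works the same way
# with a postgresql:// URL).
engine = create_engine("mysql+pymysql://user:password@localhost/sales")
db_df = pd.read_sql("SELECT * FROM transactions", engine)

# Document store (MongoDB) via pymongo.
mongo_df = pd.DataFrame(list(MongoClient()["sales"]["transactions"].find()))

# Flat files and spreadsheets.
csv_df = pd.read_csv("transactions.csv", parse_dates=["order_date"])
xls_df = pd.read_excel("transactions.xlsx", sheet_name="Q1")
```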
Data Visualization Techniques
Data visualization was performed with standalone and cloud-based access to Microsoft Power BI through integration with Microsoft Excel spreadsheets, including PivotTables, PowerPivot, and PivotCharts. Advanced experience with Excel has also covered slicers and relationships; formulas, equations, and functions; macros; management of worksheets and workbooks; table filtering and sorting; and charts and graphs. Skills include using Visual Basic for Applications (VBA) and macros to automate spreadsheet tasks. I use both pivot tables and standard tools, such as VLOOKUP and HLOOKUP, to perform analysis. Data science projects have included converting a large Excel-based analytics tool into a full data platform using Python and a database back-end.
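As a hypothetical illustration of that kind of Excel-to-Python conversion, the pandas pivot_table function can replicate the core of an Excel PivotTable; the data below is invented:

```python
# Hypothetical sales data; a minimal pandas equivalent of an Excel PivotTable.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "revenue": [120.0, 75.5, 98.0, 143.2],
})

# Rows = region, columns = product, values = summed revenue,
# with grand totals (margins) as Excel would show them.
pivot = pd.pivot_table(sales, index="region", columns="product",
                       values="revenue", aggfunc="sum", margins=True)
print(pivot)
```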
The R Statistical Language
I have used RStudio for statistical analysis, including data sets with over three million entries, with tools such as min/max, mean, mode, percentiles, aggregates, standard deviation, count, sum, and frequency. This has involved the stats, base, and utils packages. A recent example analyzed several million consumer transactions across multiple business locations to gain insight into inventory and supply chain activity, performance at particular shops, calendar-based patterns, and ways to improve lower-performing factors.
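For illustration, and to keep code samples consistent with the Python used elsewhere in this document, these kinds of aggregations might look as follows on a small, hypothetical transactions set:

```python
# Hypothetical transactions; the same summary statistics described above
# (min/max, mean, mode, percentiles, counts, sums, standard deviation).
import pandas as pd

tx = pd.DataFrame({
    "store":  ["A", "A", "B", "B", "B"],
    "amount": [19.99, 5.00, 42.50, 19.99, 7.25],
})

print(tx["amount"].min(), tx["amount"].max())   # min / max
print(tx["amount"].mean(), tx["amount"].std())  # mean / standard deviation
print(tx["amount"].mode().iloc[0])              # mode
print(tx["amount"].quantile(0.95))              # 95th percentile
print(tx.groupby("store")["amount"].agg(["count", "sum", "mean"]))  # per-store
```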
Data Models
My data modeling work has covered all three stages: conceptual, logical, and physical data models. Conceptual aspects have involved scope and the facts to be represented, such as entity classes and their relationships. Logical model design has covered the descriptive components that designate tables and columns in the physical database. The primary goal has been to devise a standard, formal framework that lets projects expand as more data and analytical capabilities are added to the system. Projects have involved the design of data warehouses for analytics, collecting data from disparate sources, and providing historical access to several years' worth of information. Design has also covered data lake architecture to leverage raw data for exploration and discovery through quick ingestion and rapid processing for analytics.

My experience with data architecture has covered every stage from start to finish: identifying data, locating the sources, extracting elements, cleaning the data, inserting the results into a database, and documenting every aspect of these steps to fully explain and describe the data. I have used formal techniques to model data in a standardized, consistent way for managing data retention and data warehousing, and for documenting how users can access and analyze the contents. I have designed and documented the best methods to collect data values, define relationships, and write the code to perform functions; at the code level, this includes linked lists, arrays, dictionaries, tuples, and classes. This work has extended across data collection, from identifying sources to acquiring, wrangling, and archiving datasets for specialized tasks.
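As a hypothetical sketch of moving from a logical model to a physical one, a small star schema (one fact table, two dimensions) for a sales warehouse could be created in SQLite as follows; all table and column names are invented:

```python
# Physical realization of a small star schema in SQLite.
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dim_store (
    store_id   INTEGER PRIMARY KEY,
    city       TEXT,
    region     TEXT
);
CREATE TABLE IF NOT EXISTS dim_date (
    date_id    INTEGER PRIMARY KEY,
    day        TEXT,      -- ISO date string
    is_holiday INTEGER    -- calendar tagging for analytics
);
CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
""")
conn.commit()
```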
Data Engineering
I have extensive experience throughout the entire data-driven research lifecycle, combining the roles of data engineer (providing readiness of data) and data scientist (deriving insight from data). Within the data engineering role, I have gathered and collected data, administered the database architecture for archival, processed the data, and provided a means to access the back-end for analytics. This has involved laying the data out in the most meaningful format for future analytical research through the development of a data pipeline process. My focus as a data engineer has been to create software solutions for big data. As an entrepreneur in law, real estate, and engineering, I have highly specialized skills in using data science to stay competitive in a revenue-generating business model and data engineering to build the data tools that achieve these results.
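A minimal sketch of such a pipeline, assuming a CSV source and a SQLite archive (file and table names are placeholders):

```python
# Extract raw records, clean them, and load them into a database
# for later analytics.
import pandas as pd
from sqlalchemy import create_engine

def extract(path: str) -> pd.DataFrame:
    """Pull raw records from a source file."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and reshape into the agreed analytical layout."""
    df = df.dropna(subset=["amount"])          # drop unusable rows
    df["amount"] = df["amount"].astype(float)  # normalize types
    return df

def load(df: pd.DataFrame, table: str) -> None:
    """Archive the cleaned data for downstream analysis."""
    engine = create_engine("sqlite:///warehouse.db")
    df.to_sql(table, engine, if_exists="append", index=False)

load(transform(extract("raw_transactions.csv")), "transactions")
```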
Data Science Techniques
Data science is interdisciplinary: the intersection of computer science, which develops the algorithms to store, process, and visualize data; domain expertise, which formulates the right questions; and statistics, which models the datasets.
Subject matter expertise is critical to asking the right questions and transforming business challenges into data solutions.
I specialize in understanding the data being used for analytics, such as through feature engineering, to obtain the highest-quality data set for the most accurate results.
I optimize business intelligence for innovative strategies that maintain a competitive advantage, increase revenue, and retain customers.
An intriguing challenge is how best to present this data so that each employee and decision-maker at the company can maximize the insight obtained from analytical research.
The goal is to assist clients in obtaining knowledge and insight from data with a comprehensive approach that involves predictive (forecasting), descriptive (data mining and business intelligence), and prescriptive (optimization) techniques.
Descriptive: what happened and why it happened, such as through data mining. This is based on analyzing past events to understand the reasons for particular observations or trends.
Predictive: utilize current and historical data to make predictions about the future through the probability that a certain event might occur, such as at what price an item might sell.
Prescriptive: how best this insight can be used to capitalize on the predictions.
For example, the sequence of events taken by an online shopper forms the decision-making process. A collection of transactions, such as monthly sales, is the voice of a company's customers. The ability to understand why these decisions were made provides the opportunity to make future predictions.
The most important aspect of data science is selecting how best to format the data being used for analytics through feature engineering. A business might retain complete billing addresses for its customers. However, it is better to split each billing address into street, city, and zip code. That makes it possible to answer questions that pertain to all customers in a particular city, neighborhood, or larger region and to identify shopping patterns.
Another example is a calendar. A date alone does not differentiate shopping on Black Friday or Cyber Monday from any other day of the week, month, or year. It is better to tag certain days or ranges of days with special attention to understand shopper habits.
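A hypothetical sketch of both feature-engineering examples, splitting an address into components and tagging notable shopping dates (all data and column names are invented):

```python
# Turning raw columns into analyzable features with pandas.
import pandas as pd

orders = pd.DataFrame({
    "billing_address": ["12 Oak St, Chicago, 60601",
                        "9 Elm Ave, Naperville, 60540"],
    "order_date": pd.to_datetime(["2023-11-24", "2023-12-05"]),
})

# One address column becomes three analyzable features.
orders[["street", "city", "zip"]] = (
    orders["billing_address"].str.split(", ", expand=True))

# A raw date says nothing about shopping events; a tagged flag does.
holidays = {pd.Timestamp("2023-11-24"): "Black Friday",
            pd.Timestamp("2023-11-27"): "Cyber Monday"}
orders["shopping_event"] = (
    orders["order_date"].map(holidays).fillna("regular day"))
```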
The goal of data mining is to extract information from data and structure the results for further analysis. Factors that impact big data analysis include the volume of data, the variety of the data, the speed at which it is generated and processed, inconsistency, and quality.
Machine Learning
Machine learning is used to understand data without explicitly programming a computer with instructions to obtain a result. A pattern in existing data is used to make predictions about future data sets.
Algorithms are used to find those patterns in existing data and construct mathematical models from those patterns. The most important step is to collect and prepare the best possible data values to improve the quality of predictions.
A large portion of the data is used for the training phase (finding patterns), and the remaining segment is used to evaluate the model before it is applied to actual business operations.
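A minimal sketch of that split, using scikit-learn on synthetic data (the model choice and 80/20 ratio are illustrative assumptions, not a prescription):

```python
# Train on most of the data, hold out the rest for honest evaluation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# 80% for finding patterns, 20% held back to evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```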