Occasionally while out at social engagements I am asked the ominous question “What do you do?” While I have no shame admitting I am a Systems Engineering faculty member, inevitably a sizable fraction of people follow up with something along the lines of “Oh… so, like, computers?”
My response is somewhat complicated by the fact that I do indeed use computers in my research and my academic unit is the Systems and Software Engineering Division, but I am definitely not a computer systems engineer. The conversation that follows usually turns into a much longer discussion of what exactly Systems Engineering is. Depending on the captive audience, I have explained Systems Engineering in various ways ranging from management of large projects or technical organizations to the business and economics of engineering.
However, with the advent of machine learning, we can now avoid any awkward explanations and rely on cold, hard data. As part of a recent foray into sklearn, a Python library for machine learning algorithms, I decided to answer the question for myself: What is Systems Engineering as a field of study?
My data sources included course catalog descriptions from 10 leading institutions with distinguished Industrial and Systems Engineering graduate programs: University of California, Berkeley; Columbia University; Cornell University; Georgia Tech; University of Michigan; Northwestern University; Purdue University; Virginia Tech; University of Wisconsin; and my own Stevens Institute of Technology. It should be noted that each university uses its own name or names for these programs, which together cover nearly all permutations of the terms systems engineering, industrial engineering, engineering management, and operations research. I did my best to collect all relevant courses but inevitably missed some. From the data I collected, I omitted any non-specific courses like senior design, co-ops, independent study, special topics, seminars, or thesis credits. I also spell-checked the descriptions (lots of typos!) and removed meta-information (e.g. prerequisites, requirements). I ended up with about 430 KB of raw data on 830 total courses (50-100 per university).
I used the k-means clustering algorithm to group the courses into 12 clusters based on their titles and descriptions. In no particular order, the clusters are listed below by their top four centroid words:
- Quality, Control, Manufacturing, Reliability
- Experiments, Factorial, Regression, Designs
- Optimization, Programming, Problems, Linear
- Simulation, Analysis, Modeling, Event
- Financial, Risk, Markets, Finance
- Information, Supply, Chain, Management
- Human, Design, Factors, Systems
- Production, Planning, Control, Inventory
- Systems, Design, Engineering, Analysis
- Data, Business, Students, Course
- Processes, Markov, Stochastic, Queuing
- Probability, Statistics, Regression, Statistical
To be clear, the terms above were automatically generated as outputs from the k-means clustering process. When I first saw the results, I was blown away by how descriptive and coherent they were. Some caveats: I used several tuning parameters to arrive at this list, and the initial conditions have a non-zero effect, so the results are not perfectly stable. For example, information is grouped here with supply chain management and data with business, while under other initial conditions information and data form a cluster of their own. The fundamental features, however, are reasonably stable from run to run, which should limit quibbling over details.
My other main output was a post-processed visualization of the high-dimensional clustering. The results below show the 830 courses labeled by cluster and projected onto a two-dimensional space. The proximity of any two courses is roughly a measure of their similarity, although the projection is not perfect: like a cartographic projection, it distorts some distances. I colored the background based on the dominant cluster at each point and weighted the alpha channel to show how strongly that cluster dominates.
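One way such a background shading could be computed is sketched below. This is an illustration under stated assumptions, not the method used for the chart: the Gaussian kernel, the `bandwidth` parameter, and the `dominant_cluster` helper are all hypothetical. Each already-projected course votes for its cluster with a weight that decays with distance, the cluster with the largest total sets the color, and its share of the total sets the alpha.

```python
import math


def dominant_cluster(x, y, points, labels, bandwidth=1.0):
    """Return (cluster, alpha) for the background at location (x, y).

    points: list of (px, py) course coordinates in the 2-D projection.
    labels: cluster label for each course, in the same order as points.
    """
    weights = {}
    for (px, py), label in zip(points, labels):
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        # Assumed Gaussian kernel: nearby courses count more than distant ones.
        vote = math.exp(-dist_sq / (2 * bandwidth ** 2))
        weights[label] = weights.get(label, 0.0) + vote

    total = sum(weights.values())
    if total == 0.0:
        # Too far from every course for any vote to register.
        return None, 0.0

    best = max(weights, key=weights.get)
    # Alpha is the winning cluster's share of all votes: close to 1 where one
    # cluster dominates, lower where several clusters overlap.
    return best, weights[best] / total


# Toy projection: two courses from cluster 0 and one from cluster 1.
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)]
labels = [0, 0, 1]
cluster, alpha = dominant_cluster(0.1, 0.0, points, labels)
print(cluster, round(alpha, 3))
```

Evaluating this function over a grid of (x, y) locations gives one color and one alpha per cell, which can then be drawn behind the scatter of courses.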
A few interesting takeaways:
- The human factors cluster (light orange region in the east), which contains courses related to ergonomics, cognitive science, and other softer sciences, is nearly diametrically opposed to the optimization cluster (light green region in the west), which contains courses related to linear and integer programming.
- The six most clearly defined clusters are: optimization (light green), production planning and inventory control (dark orange), human factors (light orange), modeling and simulation (dark green), stochastic processes (yellow), and risk and financial markets (light red).
- The systems engineering and design courses (light purple) are the most numerous, but they are spread throughout the eastern half of the projection and do not dominate any compact region.
- The two courses I teach at Stevens – SYS 611 Simulation and Modeling and SYS 601 Probability and Statistics for Systems Engineers – are near the center of the projection. This doesn’t quite mean I am at the center of the field, but at least I am on the map!
While it is not perfect, this chart goes a long way towards describing what systems engineering is and reminds me of other topographical representations. Furthermore, it is entirely based on objective data and should only improve in accuracy with additional data! The underlying analysis methods are inherently statistical and pretty robust to the mediocre-quality data that catalog descriptions usually provide. Now I have a pretty good answer for the next time I encounter an unsuspecting stranger!