
Data science and big data: It’s not just a curiosity anymore

by David Cavanaugh / September 9, 2019

Data science is data transformed into information and then transformed into insight

Say you have a big pile of data; is a good summary enough? For key decision makers with real money on the line, the obvious answer is no. Modern database technology is a marvel at organizing data efficiently and delivering information through queries and reports within functional operational silos. On the other hand, modern databases like Enterprise Resource Planning (ERP) systems alone cannot provide the top-level insights that guide company-level decisions, the kind that lead to significant operational efficiency improvements, increased market penetration, and avoidance of the large costs that flow from bad data and poor analysis (6).
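The data-to-information-to-insight progression can be sketched in a few lines of Python (product names and figures below are invented for illustration): a per-product summary is the information a standard report produces, while relating discount levels to margins is the kind of cross-cutting pattern that actually informs a pricing decision.

```python
from collections import defaultdict

# Raw data: invented order records as (product, units, discount %, margin %)
orders = [
    ("sensor-a", 120, 0, 38), ("sensor-a", 300, 10, 31),
    ("sensor-b", 80, 0, 42), ("sensor-b", 500, 15, 27),
    ("sensor-a", 250, 5, 35), ("sensor-b", 400, 10, 30),
]

# Information: a per-product summary, the kind a standard report produces
units = defaultdict(int)
for product, qty, _, _ in orders:
    units[product] += qty
print(dict(units))  # total units shipped per product

# Insight: a cross-cutting relationship a siloed report will not surface --
# margin erodes as discounts deepen, across all products
for product, qty, disc, margin in sorted(orders, key=lambda o: o[2]):
    print(f"discount {disc:2d}% -> margin {margin}%")
```

The summary answers "how much did we ship?"; the second view answers "what is discounting doing to us?", which is closer to what decision makers need.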

Big data in various incarnations has been at home in advanced manufacturing for a number of years. Some of the earliest examples of leveraging big data in a corporate setting were the supply chain and costing analysis tools manufacturers developed to work with data accumulating in early ERPs and in manufacturing process control and data collection systems. These tools have evolved to leverage both advanced database/storage architectures and machine learning and advanced analytics platforms to arrive at deeper insights that drive purchasing and sourcing decisions and advanced manufacturing solutions. The natural next step is to add data gathered in the manufacturing process itself from IoT-enabled machines: true Industry 4.0. At each stage in this evolution, the ability to convert data from raw numbers to recommendations and, in some cases, automated decision making is the difference between a jumble of raw data and actionable insights.

It’s the interconnections within the data

Unfortunately, within too many companies, relevant data is separated into silos with no easy way to connect it in meaningful ways (5). Often the correlations within the data defy the second transformation, to insight, because of the limitations of the first-order logic inherent in traditional computing languages, including the database languages of modern relational databases such as SQL. These are the interconnections or correlations that subject matter experts can surface with tools from statistics, pattern recognition, and machine learning, provided they can acquire sufficiently broad, cleansed, and normalized data. Fundamentally, data insight is a journey, so what can be done to bring software automation to bear? In other words, can the software at some level “think” like people?
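As a minimal sketch of connecting silos (the silo contents and lot numbers are invented), joining two datasets on a shared key and computing a correlation surfaces exactly the kind of second-order link that per-silo reports miss:

```python
import math

# Two "silos": quality data keyed by lot, and supplier lead times keyed by lot
defect_rates = {"lot1": 0.8, "lot2": 2.1, "lot3": 1.5, "lot4": 3.0, "lot5": 0.5}
lead_times   = {"lot1": 5,   "lot2": 12,  "lot3": 9,   "lot4": 15,  "lot5": 4}

# Join on the shared key -- the step siloed systems make hard
common = sorted(set(defect_rates) & set(lead_times))
x = [defect_rates[k] for k in common]
y = [lead_times[k] for k in common]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(x, y)
print(f"defect rate vs. lead time: r = {r:.2f}")
```

Neither silo on its own suggests that longer lead times track with higher defect rates; only the joined view does.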

Sometimes the data is fuzzy

Sadly, we have to face the brutal truth that our data has non-trivial errors, noise, and ambiguity. With enough data and the right tools, we can work around these problems (2). Analytical systems based on standard first-order logic do not deal well with these problems, so approaches such as statistical methods and fuzzy logic must be used to extract information from the data and insight from the information.
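A small illustration of the statistical side (the sensor readings below are invented): a plain mean is dragged far off by one spurious reading, while a median-based filter tolerates it.

```python
import statistics

# Invented sensor readings with one obviously spurious value
readings = [20.1, 19.8, 20.3, 20.0, 97.4, 20.2, 19.9]

mean = statistics.mean(readings)      # pulled far off by the outlier
median = statistics.median(readings)  # robust to it

# A simple robust filter: drop points far from the median, measured in
# median-absolute-deviation (MAD) units
mad = statistics.median(abs(r - median) for r in readings)
cleaned = [r for r in readings if abs(r - median) <= 5 * mad]

print(f"mean={mean:.1f}  median={median:.1f}  cleaned={cleaned}")
```

The mean lands around 31, nowhere near any actual reading; the median and the MAD-filtered data reflect what the process was really doing.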

Machine learning

This is a relatively recent buzzword for a number of existing techniques that have been brought together in new ways (4). One of the major streams is neural networks, in which a mathematical model of a simulated neuron is put to work in complex arrangements (fields) of interconnected “neurons.” Neural nets have been successfully deployed to solve hard computational problems in natural language processing, image recognition, image classification, and general analysis. Less recognized are multivariate statistical methods like cluster detection algorithms and hybrid approaches like multi-tree (or forest) classification methods. To an extent, more traditional pattern recognition methods such as support vector machines, Principal Components Analysis, and Analysis of Patterns (1) can perform nearly as well as neural nets and can serve as important adjunct methods.
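As a sketch of the cluster-detection idea (the cycle times and the deterministic initialization are invented for illustration), a bare-bones k-means in one dimension separates two latent process regimes without being told where the boundary lies:

```python
def kmeans_1d(points, k=2, iters=20):
    """Tiny 1-D k-means with deterministic initialization at min and max (k=2)."""
    centroids = [min(points), max(points)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids as cluster means
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids, clusters

# Invented machine cycle times drawn from two distinct process regimes
times = [4.1, 3.9, 4.3, 4.0, 9.8, 10.2, 9.9, 10.1, 4.2]
centroids, clusters = kmeans_1d(times)
print(centroids)  # roughly [4.1, 10.0]
```

Production cluster detection would use a vetted library implementation and handle edge cases (empty clusters, choice of k), but the core loop is this simple.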

Big data takes new hardware and software architectures

When your “data lake” gets really big, you will need one or more specialists to navigate the complex issues associated with the hardware, system software, and application software (i.e., the stack). Orchestrating a collection of very “beefy” servers and many software processes running at the same time is no simple task. Two major software technologies address system-level control: virtual machines, and containers orchestrated by platforms such as Kubernetes. On the analysis side, there are numerous platforms and applications for reformatting and normalizing data, handling software and data logistics, and performing the analysis itself.
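At a vastly smaller scale than a production stack, the coordination pattern can be sketched in a few lines (the records and the normalization rule are invented): data is sharded into chunks, each chunk is normalized by a parallel worker, and the results are recombined, which is the same split/process/merge shape a cluster scheduler manages across servers.

```python
from concurrent.futures import ThreadPoolExecutor

def normalize_chunk(chunk):
    """Toy normalization step: strip whitespace and lowercase each field."""
    return [field.strip().lower() for field in chunk]

# Invented raw records, pre-split into chunks as a scheduler might shard them
chunks = [["  Widget-A ", "WIDGET-B"], ["widget-C  ", " Widget-D"]]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(normalize_chunk, chunks))

# Merge the per-worker results back into one dataset
normalized = [r for chunk in results for r in chunk]
print(normalized)  # ['widget-a', 'widget-b', 'widget-c', 'widget-d']
```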

What does this mean for companies?

Big data (5) and data science are major trends making deep inroads into companies, academia, and government; they can no longer be treated as a curiosity. Done correctly, and at a sensible tempo, data science can really pay off for institutions and companies of any size. It’s time to cure the “data bloat” and tap the unrealized potential that lies within our data (5, 6).

Benchmark is part of the big data movement, both as a user applying big data capabilities to supply chain and process management and as a partner working with our connected industry and connected medical customers to collect reliable, relevant data. We put the insights we generate to work in supply chain optimization and share them with our customers through dashboards and portals to provide a holistic picture of their products.

Interested in talking with David or other members of the Benchmark supply chain solutions, operational excellence, or connected device design engineering teams about how the right data and analysis can drive supply chain and process improvements? Contact us today.


  1. Analysis of Patterns (ANOPA), a New Pattern Recognition Mathematical Procedure for Problems in Engineering and Science. ResearchGate. DOI: 10.13140/RG.2.1.1172.1041.
  2. 6 myths about big data. TechRepublic.
  3. Demand for data scientists is booming and will only increase. TechTarget.
  4. Statistics and Machine Learning at Scale. SAS.
  5. Big Data Computing. Edited By Rajendra Akerkar; Chapter 3, Big Data: Challenges and Opportunities. Roberto V. Zicari.
  6. Big Data Statistics & Facts for 2017. Waterford Technologies.


About the Author

David Cavanaugh

David Cavanaugh is the Director of Component Engineering at Benchmark. With more than 38 years at Benchmark, David has broad expertise spanning component engineering, failure analysis, conflict minerals, and alternate parts for cost savings. He is a Ph.D. candidate in Biotechnology Science and Engineering at the University of Alabama in Huntsville and holds a B.S. in Chemistry and Computer Science from Oakland University.