With Tecton, transformations are an optional component of the system. Similar to Feast, you can bypass the Transform component to ingest data directly from external pipelines. You typically do this when you have existing data pipelines and you want to make the values available in a Feature Store.
However, if you don't have an existing stream / batch data pipeline infrastructure that your data scientists / data engineers can easily contribute to, a Feature Store's Transform component is an easy way for them to be fully self-sufficient. Tecton makes it easy to express feature transformations using Spark's native DataFrame API, Python, SQL or Tecton's DSL.
Besides self-sufficiency, there are a few other advantages you get from having a feature store manage your feature transformations:
- Feature Versioning: If you change a feature transformation, the Feature Store will know to increment the version of that feature and ensure that you don't accidentally mix features that were computed using two different implementations.
- End-to-end lineage tracking and reproducibility: If a feature store manages your transformation, it can tie exact feature definitions all the way through to a training data set and a model that's used in production. So, if years later you want to reproduce a model from a certain point in the past, a Feature Store that supports transformations would be able to recreate that model as long as the raw data still exists.
- Trust: It's more likely that a data scientist will trust and then reuse another user's feature if they can peek under the hood and see how the feature is actually calculated.
- On-Demand Features: Some transformations cannot be executed by existing data processing pipelines because they have to be computed in real time when the prediction is made, which happens in the operational environment.
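To make the versioning point concrete, here is a minimal sketch of how a registry could bump a feature's version whenever its transformation changes. It is a toy illustration, not Tecton's actual implementation: `FeatureRegistry`, `register`, and the feature name `user_txn_count_7d` are hypothetical names, and it assumes the transformation is expressed as a SQL string.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureRegistry:
    """Hypothetical toy registry: bumps a version when a definition changes."""

    # Maps feature name -> (version, sha256 of the transformation definition)
    _entries: dict = field(default_factory=dict)

    def register(self, name: str, definition: str) -> int:
        """Register a transformation and return the feature's version."""
        digest = hashlib.sha256(definition.encode("utf-8")).hexdigest()
        current = self._entries.get(name)
        if current is None:
            version = 1               # first time we see this feature
        elif current[1] == digest:
            version = current[0]      # unchanged definition, same version
        else:
            version = current[0] + 1  # definition changed, bump the version
        self._entries[name] = (version, digest)
        return version


registry = FeatureRegistry()
original = "SELECT user_id, COUNT(*) AS cnt FROM txns GROUP BY user_id"
changed = "SELECT user_id, COUNT(DISTINCT txn_id) AS cnt FROM txns GROUP BY user_id"

v1 = registry.register("user_txn_count_7d", original)
v1_again = registry.register("user_txn_count_7d", original)
v2 = registry.register("user_txn_count_7d", changed)
```

Because the version is derived from the definition itself, feature values can be stored alongside the version that produced them, which is what lets a system refuse to mix values computed by two different implementations.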
In reality, you will frequently see multi-stage data processing workflows: a lot of data cleaning and preprocessing happens in an organization's standard, ML-independent data processing infrastructure. Afterwards, a Feature Store picks up that preprocessed data and transforms it into feature values.
(Tecton CTO here) You’re absolutely right that ML projects can’t be solved with technology alone. Besides the right tooling, they also require process, organizational setup, buy-in from multiple stakeholders, etc. By itself, no technology will turn a company into an “ML-first” company. Both technology and organizational problems need to be solved.
A while back, we published a blog post that discusses how we approached these organizational challenges at Uber: https://eng.uber.com/scaling-michelangelo/. With Michelangelo, we found that the right tooling can both solve technical challenges and help with some organizational challenges. For example: If a standardized and centralized platform is the path of least resistance to get ML into production and solve your business problem, you get the organizational benefits of that centralization (governance/visibility/collaboration) along the way.
We're Dispatcher - Uber for Longhaul Trucking. Our mission is to build an automated system that disrupts the $170bn US freight industry by connecting large international businesses directly with long-haul truck drivers on their smartphones. 10 years from now, we will be the leading global logistics company. What started with smartphones and intelligent algorithms will eventually lead to an automated fleet of self-driving trucks.
Initially started by two Stanford alums in 2014, Dispatcher has since grown to 7 employees. We are located in the heart of SOMA in San Francisco. We went through Stanford's StartX accelerator and subsequently raised $1.6 million from two leading seed funds in Silicon Valley as well as Stanford University.
The Stack
Backend: Microservices-oriented architecture built on Node.js, RabbitMQ, MongoDB, Firebase and R Stats
Web-Frontend: AngularJS, Pusher
Mobile App: Cordova-based hybrid mobile app
Deployment: AWS, CircleCI, SaltStack
What gives you peace at night: Splunk, Sentry, PagerDuty
For internationals:
We are very open to working together with talented engineers who don't reside in the US yet. We sponsor H1Bs and Green Cards.
Please email me at kevin[ a t ]dispatchertrucking.com
We used Branch Metrics for the deep links in our app because they were the only solution we found that did deep linking past install. They require an SDK to do the post-install routing, but give you a lot of cool insights into your installs and conversions. http://branchmetrics.io