How Robots Turn Language into Motion: The AI Stack Behind Physical AI

Show notes

How do robots go from human instruction to real movement?

Telling a robot to “pick up a box” sounds simple. But behind that command is a complex chain of decisions: understanding language, interpreting the environment, choosing the right action and turning it into physical movement.

In this episode, Clemens (Principal Engineer) and Robert (Robotics Engineer & Researcher) explain how RobCo approaches this challenge with ALFIE - combining classical robotics, AI models, sensors, safety systems and real-world industrial requirements.

You'll gain insights into:

  • the three-layer hierarchy (System 2 / System 1 / System 0) that turns language into motor currents
  • why physical grounding is the hardest unsolved problem in robotics today
  • how 100-200 demonstrations are enough to fine-tune Alfie on a new use case
  • why methods that brought man to the moon are now central to physical AI

More about RobCo:
Website: https://www.rob.co
LinkedIn: https://www.linkedin.com/company/robco-therobotcompany/
Instagram: https://www.instagram.com/robco_therobotcompany/

Chapter markers
00:00 Controlling robots with language
00:32 Meet Clemens and Robert
02:22 System 2, 1, 0: How robots think
04:35 The driving analogy explained
06:28 What's the hardest part of the chain?
07:15 Translating language into robot action
08:43 What really happens when you say "pick up the glass"
11:04 Why neural nets find their own language
15:21 Introducing Alfie
21:09 Pre-training + fine-tuning a robot
24:49 How commands become motor currents
28:31 Top 3 questions from Hannover Messe
35:04 The funniest moment at the trade fair
38:02 What makes Alfie different
40:28 World models: The next big unlock?

Show transcript

00:00:00: Today on Rob Talk we have an interesting question that we're going to answer, and it's about robots and how we actually control robots via language.

00:00:07: Because one of the really interesting things is:

00:00:09: We can tell a robot what to do, but the other decisions that have to be made in between are difficult for the robot's system.

00:00:17: So if you tell a robot to pick up a box, there are a lot of different things that have to happen in between. The two absolute specialists from RobCo are here, Clemens and Robert, whom I'll introduce in a second.

00:00:35: And they're going to tell us how this actually works.

00:00:39: Rob Talk, the autonomous robotics podcast. Physical AI: no theory, just reality.

00:00:47: So let me briefly introduce them. Obviously Clemens, we've had him on the show several times.

00:00:52: Clemens is a principal engineer at RobCo.

00:00:54: He oversees the development of all these amazing robots and the physical AI

00:01:00: that's happening. As for Robert,

00:01:02: just to give you an introduction:

00:01:04: Robert is a robotics engineer and researcher with deep roots in both academia and industry.

00:01:10: He holds a PhD in control theory from Örebro University, supervised by Professor Achim Lilienthal, and completed two postdoctoral positions, most recently at Danica Kragic's robot lab in Stockholm.

00:01:28: Which is pretty amazing!

00:01:29: And then you spent nearly six years at Bosch, also developing very practical stuff, from eBikes up to plugging into Bosch production facilities, which are obviously quite intense from an industry standpoint.

00:01:43: And today, obviously, Robert, you're working alongside Clemens and developing the new product Alfie, which is an industry robot that, well, I think is going to be industry-changing.

00:01:56: You take responsibility for everything from hardware design and control systems to machine learning model selection and data collection.

00:02:03: So yeah, you're one of the rare engineers who bridges the full stack, from high-level autonomy concepts down to the motor currents that actually move the robot.

00:02:13: And that's exactly what we're going to talk about today.

00:02:16: So welcome, both of you!

00:02:18: Nice to have you on Rob Talk today.

00:02:20: Thanks, exciting to be here.

00:02:21: Thanks

00:02:22: for having us

00:02:23: Exactly. So,

00:02:23: did I miss anything, Robert,

00:02:25: that we have to say for the introduction?

00:02:27: Sounds good.

00:02:28: Okay

00:02:28: very good.

00:02:30: Yeah, let's talk about how this all happens.

00:02:32: How does the physical movement really work from the standpoint of language input? Because I think that's going to be the future.

00:02:40: So what we all wish for is obviously giving your robot a command by voice, like you're doing right now with all these amazing LLMs.

00:02:47: And then, well, how can we do it?

00:02:49: And how can our robot follow instructions?

00:02:54: So I guess one way to think about it, and how it's often framed in this field, is as a hierarchical system, with terminology borrowed from Kahneman's seminal book.

00:03:06: We often talk about System Two, System One and System Zero.

00:03:10: Those you can basically see as controllers that take in sensory input from the robot's sensors monitoring the environment, and they make decisions to output commands at different frequencies.

00:03:25: Typically, System Two is the slow thinker, which has the capability to reason about vision and language. It then gives commands to the underlying System One, which outputs the kind of commands that a robot can understand.

00:03:44: Things like, for example, target positions and orientations of its hands (its end effectors) or joint angles, at a higher frequency, typically somewhere between fifty and five hundred hertz, which then go to System Zero,

00:04:03: which is the low-level, real-time control layer that in the end makes the robot move.
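Illustrative sketch (not from the episode): the hierarchy described above can be pictured as three nested loops running at different rates. The rates below (1 Hz, 50 Hz, 1000 Hz), the class name `Rates` and the function `run_hierarchy` are assumptions chosen for illustration only; the episode only states that System One typically runs somewhere between fifty and five hundred hertz, and the planner and policy outputs here are placeholders rather than real models.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Update rates in hertz; all three values are illustrative assumptions."""
    system2_hz: int = 1      # slow planner (vision and language reasoning)
    system1_hz: int = 50     # policy emitting end-effector or joint targets
    system0_hz: int = 1000   # low-level real-time control loop


def run_hierarchy(duration_s: int, rates: Rates = Rates()) -> dict:
    """Count how often each layer updates while the fastest loop ticks."""
    counts = {"system2": 0, "system1": 0, "system0": 0}
    s1_every = rates.system0_hz // rates.system1_hz   # System 0 ticks per System 1 update
    s2_every = rates.system0_hz // rates.system2_hz   # System 0 ticks per System 2 update
    subgoal, target, command = None, None, None
    for tick in range(duration_s * rates.system0_hz):
        if tick % s2_every == 0:
            subgoal = f"subgoal-{counts['system2']}"   # stand-in for a planner output
            counts["system2"] += 1
        if tick % s1_every == 0:
            target = (subgoal, counts["system1"])      # stand-in for a pose or joint target
            counts["system1"] += 1
        command = target                                # System 0 tracks the latest target
        counts["system0"] += 1
    return counts


print(run_hierarchy(2))
```

The point of the sketch is only the structure: each slower layer hands its latest output down, and the faster layer keeps working with that output until a new one arrives.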

00:04:10: So that's interesting.

00:04:11: So we're basically using the systems that Daniel Kahneman described in his famous book Thinking, Fast and Slow, as mentioned. And you use those systems to

00:04:20: help the robot understand its environment,

00:04:22: and then actually move in that environment, right?

00:04:25: I think it's a bit of a loose analogy.

00:04:27: Right now there is really a zoo of model architectures, and they interplay in different ways with each other. Maybe we can talk more about this.

00:04:37: But roughly, I think it's fair to say that there's a sort of hierarchy of feedback loops going on, some of which, especially at the higher levels, are learned from data.

00:04:50: The lower levels are typically based on classical control.

00:04:54: Okay, so more classical robotics.

00:04:58: Maybe one way to make that analogy: if you learn how to drive a car, then the first thing is that you get into the car and the teacher tells you, okay, now put it in first gear.

00:05:11: And what do you actually do?

00:05:12: Symbolic processing, right? You have to somehow come up with very complicated means to turn these symbols, the words that you hear, into actions.

00:05:23: So this is the whole System

00:05:25: Two thing.

00:05:26: And so what you do is slow and clumsy and takes a lot of energy. It's probably taking up a huge part of your brain as well, a big model. Over time, over the course of your driving lessons,

00:05:43: the task is to make that automatic.

00:05:46: And so then we're in System One land, and that becomes like second nature, muscle memory as you call it.

00:05:53: And what about System Zero, is there anything comparable in a human,

00:05:59: to get that analogy right?

00:06:00: Or would that be subconscious already?

00:06:03: System One

00:06:04: Absolutely subconscious.

00:06:05: So that's almost happening in the muscles themselves.

00:06:09: It's like another analogy: when you think about a gym and somebody tells you to move, you don't really have to think about which muscle.

00:06:20: It just happens.

00:06:22: And so that's the subconscious part that we don't need to think about.

00:06:26: What is the hardest part?

00:06:28: So when you're talking about System Two, System One and System Zero, what are the hardest parts in the chain?

00:06:34: The language understanding, the decision-making or the physical control?

00:06:38: I think all of them are hard!

00:06:43: But System Zero is the one that, in robotics at least, has been researched for decades,

00:06:48: whereas the other ones have been worked around with algorithms until recently.

00:06:54: And now we're entering this stage where these can be learned. There's still a lot of research happening, where we haven't quite figured out everything end-to-end, but we already see things starting to work really well.

00:07:09: I think one key challenge here is what you could call physical grounding.

00:07:15: So there has been a lot of progress in large models really understanding vision and language to some extent quite well.

00:07:23: But the problem remains how to then translate this into an action representation that the robot understands.

00:07:31: So I would argue that is one of the key points which still needs work.

00:07:42: Is that also the biggest challenge of system two learning?

00:07:45: So that the system translates language into the robot understanding what to do in a first step, and then translates that into how the processor moves the arm or the robot?

00:08:00: I would say, if you want to use this hierarchical analogy, maybe it's rather placed in System One.

00:08:06: So think of System Two as a high-level planner which produces subgoals.

00:08:12: If I tell the robot, go and pick up a water glass, what the System Two high-level planner would do is implicitly break this down into subgoals.

00:08:25: I don't know:

00:08:25: grasp that glass, move it over there, put it down there. The output is a representation which, depending on how it's implemented, is not really human-understandable. It might be a row of numbers in a high-dimensional space, which is then passed on to the lower-level system,

00:08:50: and there that translation to robot action has to happen.

00:08:55: Okay.

00:08:55: Then maybe, to explain it to our audience, take us through the moment that somebody tells their robot to pick up this glass, for instance. What happens in System Two, and is there a clear handoff to System One?

00:09:09: Or is that like, it's still...

00:09:11: I don't know,

00:09:11: maybe you can just take us through the process.

00:09:14: I would say it heavily depends on the specific model

00:09:16: you're looking at. There are models which have a pretty clear distinction, where the handover might even be semantic goals or subgoals, like keypoints or objects, things like that. More typical is that the handover happens via some abstract row of numbers, typically in a high-dimensional space.

00:09:41: Other models don't really have this strict, explicit separation between System One and System Two and rather combine them into one big model.

00:09:51: It's fair to say it is still an open question what the best or most promising architecture will be.

00:09:58: Pretty much every month now there are new innovations, new models popping up in research or from startups like ours. We have to see where it goes.

00:10:12: Yeah, so what we're interested in is always taking the one that works best at a particular point in time.

00:10:18: So we're looking at metrics and how well they perform.

00:10:22: Then we plug in, swap out, try different things.

00:10:26: That's an applied research task, but it's also one we're in a position to work on, right?

00:10:37: We have the actual use cases at hand.

00:10:41: And so we just try to solve the problem by whatever means.

00:10:46: And I think, as Robert said, the main difference is: do you turn a high-level command into language first, or do you turn it into the internal language of the machine, meaning big vectors of numbers

00:11:00: that nobody can really look at and understand?

00:11:04: That's very interesting.

00:11:05: I think right now, if you're looking at how large language models work, we just think that we enter language and it kind of understands what we're saying.

00:11:15: But for a robot, obviously, the dimensional space makes it even more difficult to understand.

00:11:20: Like, we never said where the glass is.

00:11:22: So there is a lot of complexity in this, which is easy for us humans.

00:11:26: You compared it with a toddler on one podcast at the beginning.

00:11:32: For a baby, it's very hard to understand

00:11:34: the physical realm. Then it gets super easy because it's learned, and the brain has such an amazing capacity. For a robot, it is much harder in the beginning.

00:11:43: You just said System Two needs a lot of processing power for this, and then it gets easier and easier because

00:11:47: it's learned and tends to get better. I guess with everybody using AI right now, I think people underestimate how difficult the physical realm is to understand

00:12:01: compared to just a piece of text, which was conquered three years ago by the LLM producers.

00:12:09: And you guys are now on the path of conquering the physical realm.

00:12:13: So it's very interesting how it is done and how complex it is!

00:12:19: But I think we're at this point where we have a chance to solve the problem. Because if you think about robotics ten or fifteen years ago, you had, you could say, the mathematical functions, but they all talked their own language, and it was really hard to connect them.

00:12:36: Yeah?

00:12:37: For example, how do you connect a language input, for which you had language experts dealing with it and finding some representation, to robot movements, which have a completely different representation?

00:12:50: So the more you connect these in the neural-net sense, the more you can train these things end to end, and so internally they find their own language. They start out communicating random numbers.

00:13:05: And by giving them a constraint to learn towards that goal,

00:13:10: they suddenly form structures on their own. That's, I think, the most fascinating thing that has happened in the general area of computer science in the past ten to fifteen years, and in a small community of believers in the thirty years before.

00:13:29: It's fair to say. But I think, as you alluded to before, language is inherently discrete, right?

00:13:37: So for each language you pick, there's a finite number of words which you can use to form expressions.

00:13:45: But in the physical world, the space of motions, positions and velocities is inherently continuous. You can imagine it as an infinite-dimensional kind of problem.

00:13:59: So, yes!

00:14:00: It remains challenging in that sense...

00:14:03: Yeah, just as you're mentioning: if I tell the robot to pick up a glass, for instance, it doesn't know which speed is required, slow speed, safe speed or super-fast speed,

00:14:13: right? So there's a lot we humans don't think too much about.

00:14:17: But yeah, this all has to be decided by the machine.

00:14:21: Is that something that System Two would then decide?

00:14:25: Yes, it would decide.

00:14:28: The current paradigm is mainly based on imitation learning.

00:14:32: So basically it needs to be fed with expert demonstrations,

00:14:36: and one big open question right now is how to get them at the scale

00:14:41: that's needed. What this system will implicitly learn is to mimic these demonstrations, which also encode things like speed, for example.
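Illustrative sketch (not from the episode): in its simplest form, imitation learning is supervised regression from observed states to the expert's actions, often called behavior cloning. The example below fits a one-dimensional linear policy by least squares. The function name `fit_linear_policy`, the demonstration data and the linear model are all assumptions for illustration; real systems use large neural networks and image inputs.

```python
def fit_linear_policy(demos):
    """Least-squares fit of action = gain * state + bias to (state, action) pairs."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    gain = cov / var
    bias = mean_a - gain * mean_s
    return gain, bias


# Hypothetical expert data: commanded speed (m/s) as a function of distance to the glass (m).
# The expert slows down when close, so the speed profile is encoded in the demonstrations.
demos = [(d / 10, 0.5 * (d / 10) + 0.02) for d in range(1, 11)]
gain, bias = fit_linear_policy(demos)
print(round(gain, 3), round(bias, 3))
```

Nothing in the code mentions speed explicitly; the policy picks up the expert's slow-down-when-close behavior purely from the data, which is the sense in which demonstrations "encode" speed.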

00:14:52: Tell me a bit about Alfie from RobCo.

00:14:54: How does that fit in and what is the idea for this robot?

00:14:59: And how does that... Well, what's your plan for using the different systems?

00:15:05: So Alfie is the physical realization of what we've had in mind for quite a while: making an AI-first robot, starting with very useful robots in the industry space.

00:15:21: So it is a half humanoid.

00:15:28: We're focusing on the manipulation problem. We are putting eyes where human eyes would be, simply because this is how you collect data.

00:15:41: But it has all the properties of an industry robot, meaning it's maintainable and sturdy.

00:15:50: It's made for use twenty-four seven.

00:15:55: And it has all the properties of the RobCo industry robots that we have.

00:16:00: So it brings together the best of both worlds.

00:16:03: Something interesting that you said about Alfie is that it doesn't follow a certain script and is basically calculating probabilities in real time.

00:16:12: So it's not a classic robot. It's more like, well, a very dynamic robot.

00:16:18: And how does that actually work?

00:16:21: So first of all, it can follow scripts as well, because we have this five-level ladder of autonomy.

00:16:30: So level three in that would still combine things with classic, you know, Cartesian programming, if you will.

00:16:40: You can move somewhere and then do the last five centimeters.

00:16:50: But then the other extreme is the topic that we're talking about today, in which everything is end-to-end learned. It's all picked up from demonstrations and from being pre-trained with large models which have seen a lot of things in the world, and hopefully you only need very few demonstrations in order to do something really well.

00:17:15: So that means the basic function, the basic movement, is not a policy.

00:17:21: It's more like it does that itself, and then... But the policy at the end, how to tightly grab the glass or maybe not tip it over,

00:17:33: would that be defined in a policy?

00:17:35: Or how does the policy work, and who teaches the policy?

00:17:38: So demonstration

00:17:43: An image for that, for what a policy is:

00:17:46: Imagine you approach a traffic light in a car.

00:17:49: The traffic light is yellow now. You can do two things: speed up or slow down, right?

00:17:56: So these are the two actions that you can take, and everything we've learned about the world flows in within a millisecond.

00:18:05: So you do either, like, without really thinking about it a lot.

00:18:13: And what comes in here is all your experience with: how fast am I going?

00:18:17: How far away am I from the traffic light?

00:18:20: And also things like, okay,

00:18:21: what's the cost of a ticket if I get caught?

00:18:25: These are things... these are constraints.

00:18:28: Also, of course, you know, will I be dead if I do it? These are like the cost functions that have come through lots of experience and flow into that single decision.

00:18:38: So really, the input is sensory information and state information, where I am and all the things that I see in my surroundings, and the output is a probability for certain actions.
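Illustrative sketch (not from the episode): the traffic-light picture above, with state in and a probability per action out, can be written as a tiny stochastic policy. The scores are hand-written stand-ins for what a learned network would compute, and the function name `action_probabilities` and the three-second threshold are arbitrary assumptions; only the softmax step, which turns scores into probabilities, is standard.

```python
import math


def action_probabilities(speed_mps: float, distance_m: float) -> dict:
    """Map a state to probabilities over two actions with a softmax.

    The scores below are hand-written stand-ins for what a learned policy
    would compute; the 3-second threshold is an arbitrary assumption.
    """
    time_to_light_s = distance_m / max(speed_mps, 0.1)
    scores = {
        "speed_up": 3.0 - time_to_light_s,    # attractive when the light is close
        "slow_down": time_to_light_s - 3.0,   # attractive when the light is far
    }
    peak = max(scores.values())               # subtract the max for numerical stability
    exps = {a: math.exp(s - peak) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


probs = action_probabilities(speed_mps=14.0, distance_m=70.0)
print(probs)
```

With the light five seconds away, almost all of the probability mass lands on slowing down; moving the car closer shifts it towards speeding up.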

00:18:53: And well maybe you can answer this one?

00:18:56: Maybe just as a small add-on to what Clemens said: I think the idea of Alfie for now is that we can combine the best of both worlds. We can choose parts of classical choreographed automation, if you will, and only have subproblems solved by a learned policy.

00:19:13: I guess we envision that over time, the proportion of what's learned in accomplishing a task will increase.

00:19:22: But we don't necessarily have to solve everything end-to-end with a neural policy right now. Depending on the use case, and we look at a huge variety of different use cases, from sorting to industrial assembly to logistics,

00:19:41: it would right now be very challenging to say we come up with one AI model which can solve it all.

00:19:47: So what we will do is go step by step and try to introduce these techniques gradually, and Alfie will be the platform for us

00:19:56: to do so.

00:19:58: But it sounds like a really huge advantage that you don't have to train end-to-end, because I think that takes a lot of training and is probably very limiting in the amount of time you need to train, but you then have specific policies to do something at the end of a movement or action

00:20:13: very precisely?

00:20:15: It's

00:20:15: very use-case dependent. We have looked at use cases like laundry sorting, for example. That is where you just hit a wall with classical automation.

00:20:25: How would you detect the pose, the position and orientation, of a dirty sock?

00:20:32: So there, right now, the only avenue is basically end-to-end learning.

00:20:37: However, right now these models are not perfect especially when it comes to accuracy and things like that.

00:20:43: so for precise positioning manoeuvres maybe a classical approach still has advantages.

00:20:51: And who would then teach the action of folding the laundry?

00:20:58: For instance, if you have a sock, would it be learned from a human's guidance? You showed some amazing tools like grippers and stuff.

00:21:09: Is that the way we teach it? Once?

00:21:12: Or several times?

00:21:13: Right now we follow the paradigm

00:21:15: where we start with a large model which has been pre-trained on internet-scale data (images, videos, text), and we collect a fairly small amount of demonstrations on the robot on a per-use-case basis. It could be something like a hundred or two hundred demonstrations, two to three hours of data.

00:21:37: And we do what's called fine-tuning in the field.

00:21:39: So we adapt this big model, which is reasonably good at many tasks, to be really good at this one task.

00:21:49: So that's how we do it now.

00:21:52: And we also envision that the necessary amount of fine-tuning, of task- or use-case-specific

00:21:58: demonstration data, will decrease over time.

00:22:03: And we are also working on ways to really streamline this process of gathering this use case data.

00:22:08: that's an active work stream.

00:22:12: Very interesting. I mean, I think it is everybody's dream that the robot does the laundry and some more housework, but obviously you will solve industry problems first.

00:22:22: What happens if a policy suggests, or maybe even prescribes, something that is physically impossible?

00:22:29: Or maybe even dangerous?

00:22:30: How does Alfie react, or how does it work?

00:22:35: So everything in the industrial scenario needs to be embedded in a safety envelope.

00:22:46: This is also where classical robotics comes in. We have safety systems in place which you can't override, so they give guarantees. You can think of them as watchdogs or guardrails that make sure the robot doesn't break out.
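Illustrative sketch (not from the episode): one simple way to picture a guardrail is a deterministic filter that sits between the learned policy and the robot and clamps every command to fixed bounds. The function name `guard_joint_command`, the joint limits and the step size are made-up assumptions; certified industrial safety systems are considerably more involved than this.

```python
def guard_joint_command(target_rad, current_rad, limits_rad, max_step_rad):
    """Clamp a policy's joint targets to joint limits and a per-cycle step size."""
    safe = []
    for target, current, (low, high) in zip(target_rad, current_rad, limits_rad):
        step = max(-max_step_rad, min(max_step_rad, target - current))  # limit speed
        safe.append(max(low, min(high, current + step)))                # limit range
    return safe


# Hypothetical two-joint arm; limits and step size are made-up numbers.
limits = [(-1.5, 1.5), (0.0, 2.0)]
print(guard_joint_command([3.0, 0.45], [1.45, 0.4], limits, max_step_rad=0.1))
```

The first joint asks for an out-of-range jump and is held at its limit, while the second, harmless command passes through unchanged. Because the filter does not depend on what the policy has learned, its behavior can be reasoned about independently of the model.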

00:23:07: Okay, yeah, guardrails are a really cool system.

00:23:09: I mean, I learned that working with LLMs, and to be honest, it's a concept which obviously makes sense. I think it can solve all these issues people basically have with a robot or an AI going off the rails. That's where you have the guardrail.

00:23:25: So I find they are good concepts.

00:23:27: But it is a good question.

00:23:31: And it's one of the still big open problems, I would argue, in physical AI, which inherently is about a data-driven way of generating robot motion.

00:23:42: So as opposed to a classical system, a priori there are really no guarantees about what this policy will produce.

00:23:50: Guardrails are of course a sanity check to make sure nothing really bad happens, but there's a lot of research ongoing now in really making these policies also aware of what are, for example, safe or unsafe actions, by trying to limit the kinetic energy a certain action would produce.

00:24:13: And that is something we are looking into as well, because safety is critical when you actually want to deploy this system.
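Illustrative sketch (not from the episode): limiting the kinetic energy of an action can be pictured with the textbook formula E = 0.5 * m * v^2. The function below scales a commanded speed down so the energy stays within a budget. The function name `limit_speed_by_energy`, the effective mass and the energy budget are hypothetical, and this is a hand-written check, not the learned safety awareness the research mentioned above is aiming for.

```python
import math


def limit_speed_by_energy(mass_kg: float, speed_mps: float, max_energy_j: float) -> float:
    """Scale a commanded speed down so that 0.5 * m * v**2 stays within a budget."""
    energy_j = 0.5 * mass_kg * speed_mps ** 2
    if energy_j <= max_energy_j:
        return speed_mps
    return math.copysign(math.sqrt(2.0 * max_energy_j / mass_kg), speed_mps)


# Hypothetical numbers: 4 kg of effective moving mass and a 0.5 J budget.
print(limit_speed_by_energy(mass_kg=4.0, speed_mps=1.0, max_energy_j=0.5))
```

Because energy grows with the square of the speed, halving the speed cuts the kinetic energy to a quarter, which is why speed limits are such an effective safety lever.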

00:24:24: So when you have these policies... We talked a little bit about how the movements are created.

00:24:30: But I really want to understand: when you have a policy, and you have maybe even the free movement,

00:24:35: how does that actually work?

00:24:38: How is it turned into a movement?

00:24:41: Right, so that would allude to the System Zero layer

00:24:46: we talked about before. We can keep the analogy

00:24:51: Clemens mentioned. If you, I don't know, want to move your hand in front of your face,

00:24:56: what needs to happen?

00:24:57: You need to activate your muscles such that, for one, the gravitational force is overcome.

00:25:03: You need to produce enough force to actually hold your arm and counteract the masses, but you also need to counteract inertia: you have to add additional force to accelerate and decelerate the movement in time.

00:25:19: How humans do it... I'm not a neuroscientist,

00:25:21: so I don't want to lean too far out of the window.

00:25:23: But I think it's fair to say that there is a sort of internal representation of the physics of your body, like how long

00:25:30: the links of your arms are and how heavy your body parts are. It's similar in System Zero.

00:25:39: In robotics, there is a model of the robot which captures kinematics and dynamics, that is, masses, inertia

00:25:46: and so on.

00:25:46: That's used basically to produce motor currents, because the motors would in this sense be the analogy for the muscles, and they reactively adjust based on the sensors on the robot,

00:26:04: in this case some joint angle sensors and joint velocity sensors.
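Illustrative sketch (not from the episode): for a single joint, the idea of a physics model that overcomes gravity and inertia and reacts to joint sensors can be written as textbook computed-torque control, followed by an ideal torque-to-current conversion. The function name `motor_current` and every parameter value (mass, link length, gains, torque constant) are assumptions for illustration; this is not Alfie's controller, and a real arm has coupled multi-joint dynamics, friction and motor electronics that are omitted here.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def motor_current(q, qd, q_des, qd_des, qdd_des, mass=2.0, length=0.4,
                  kp=100.0, kd=20.0, torque_constant=0.5):
    """Current (A) for one joint moving a point mass on a massless link.

    q is the joint angle in radians, measured from horizontal. All parameter
    values are illustrative assumptions, not data of any real robot.
    """
    inertia = mass * length ** 2                          # point-mass inertia about the joint
    gravity_torque = mass * G * length * math.cos(q)      # torque needed just to hold the arm
    # Feedback on the joint angle and joint velocity sensors corrects tracking errors.
    qdd_cmd = qdd_des + kp * (q_des - q) + kd * (qd_des - qd)
    torque = inertia * qdd_cmd + gravity_torque           # inertia term plus gravity term
    return torque / torque_constant                       # ideal motor: torque = Kt * current


# Holding the arm still and horizontal needs only gravity compensation.
print(motor_current(q=0.0, qd=0.0, q_des=0.0, qd_des=0.0, qdd_des=0.0))
```

The two terms mirror the explanation above: the gravity term holds the arm against its own weight, and the inertia term adds the extra torque needed to accelerate and decelerate. A loop like this is cheap enough to run many hundreds or thousands of times per second.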

00:26:09: Very interesting. For me, the way I see System Zero,

00:26:14: it must be like an API where you don't have to program it directly any more per se, but you can send commands in and then they'll be executed

00:26:22: from there.

00:26:22: exterior.

00:26:23: Pretty much exactly that, yeah, I would argue.

00:26:26: And in modern systems it's largely based on classical Newtonian physics and math.

00:26:34: So this layer is typically deterministic, well understood, and has clear guarantees, although in humanoid robotics, the layers are learned even at System Zero.

00:26:45: So if you see fancy kung fu videos of humanoids, these have some learned aspects even at a really low level, because you cannot move arbitrarily.

00:26:57: You have to stabilize the whole system as well.

00:26:59: Okay.

00:27:00: But I mean, one has to say that this has been in development for many decades, and of course it comes from a time when computers were not very powerful, and it needs to run a feedback loop thousands of times a second.

00:27:14: So these are really very fine-tuned and use very limited representations.

00:27:20: You can think about the digital twin of our robot as being just a bunch of numbers that represent the extent of the robot and certain physical properties.

00:27:29: So it is a real model, a real digital twin, but really trimmed down to the basics.

00:27:37: That's exactly what I was thinking:

00:27:40: how does this translate into movement?

00:27:43: I don't think about my movement, but looking at a classical robot fifteen years ago, you had to program every movement one hundred percent into this perfect position,

00:27:53: perfect movement, velocity and everything.

00:27:55: I think that is changing so rapidly right now. And that makes the robot more like a companion that you work with, not just a machine that you programmed once

00:28:05: and that has to do only that task.

00:28:07: I think that's very, very interesting.

00:28:09: I mean, it makes it way more flexible and faster to learn.

00:28:12: I guess that's what Alfie is about, right?

00:28:14: To help in the industry.

00:28:17: Another question: you were at Hannover Messe, one of the biggest trade fairs for industry, or for industry goods overall.

00:28:27: There were some questions asked, and we took the top three audience questions, and I'm going to ask them to you right now.

00:28:33: You can answer them for our audience, because then they will know what the most asked questions at Hannover Messe were.

00:28:40: Can Alfie move around freely?

00:28:43: Yeah, it will be moving around.

00:28:47: When we started Alfie,

00:28:49: we thought that we would focus on the manipulation problem itself, being at an assembly line and so on.

00:28:57: But since RobCo is in talks with customers all of the time, not doing some blue-sky development, we have this running pipeline where requests come in for roughly what our customers want, and mobility is very high up on the list.

00:29:15: Yeah, Very cool!

00:29:17: Tell us a bit about how Alfie was received at Hannover Messe.

00:29:20: I mean, I heard, and saw some pictures, that it was a big hit.

00:29:24: It was. So

00:29:25: we presented Alfie to the real world.

00:29:28: We had a lot of press coverage. We had a demo running with our prototype where, you know,

00:29:35: you could press a button and it would pick up an analog camera, a Polaroid camera, and take a picture of you. Cool.

00:29:45: And hand it off to you. And so we could demonstrate as well that you can disturb the movements a little bit, and it would still work and self-correct.

00:29:57: So overall we did several hundred demos, and it worked pretty well!

00:30:03: That was my second question: how does Alfie really perceive its surroundings?

00:30:06: Because, I mean, you just mentioned interrupting: Alfie is taking a Polaroid picture and then handing it over, but what if I move into that space and something happens?

00:30:15: That's the second biggest question: how does Alfie perceive its surroundings?

00:30:21: Yeah, so Alfie has several sensors.

00:30:26: An important part is visual input, of course.

00:30:29: So Alfie has a camera in its head which can perceive regular 2D images, like you and me, but also depth data.

00:30:40: With this camera it really generates a representation of the environment.

00:30:45: Additionally, the current version has two cameras on its wrists, which are very helpful for seeing close-up views when manipulating. And it has, like classical robots, what are called proprioceptive sensors, which basically measure the configuration of its own body,

00:31:04: so joint angles and joint velocities.

00:31:07: In the future we envision Alfie also extending its sensor suite with things like force and tactile sensors, maybe contact microphones.

00:31:17: Interesting I think in the future?

00:31:19: Yeah!

00:31:19: The point is that obviously we want to keep the robot maintainable.

00:31:24: So for example these wrist cameras:

00:31:27: they pick up a lot of signal you would otherwise only get from tactile sensors.

00:31:32: So they are kind of not comparable with humans.

00:31:36: But they give us a lot of information for which we would otherwise have to use artificial skins of some sort,

00:31:42: which...

00:31:44: The robots are not self-healing,

00:31:45: so that per se is a problem, and cameras are much easier to replace.

00:31:50: So we use them as far as it gets.

00:31:56: You always mention the self-healing part, and I think humans take it as so natural.

00:32:03: But obviously, also in industry use, if you have a robot,

00:32:08: parts will break.

00:32:09: Parts will need to be replaced.

00:32:10: So that's a very interesting topic.

00:32:12: yeah.

00:32:12: And I think that's also why

00:32:13: we're having such a hard time using that human analogy.

00:32:19: Because once you put this image out that something looks similar to a human, you put certain attributes onto it, and the self-healing part is, I think, one of the most important ones.

00:32:32: So we have had tons of discussions on whether it should look different.

00:32:38: And I think it still can, right?

00:32:41: You could put a head down there and then have several arms operating at a conveyor belt, for example.

00:32:46: So I don't want to rule that out.

00:32:48: We've got the modular robot system, but for now this form factor is just the easiest one.

00:32:55: Let me ask about the form factor.

00:32:56: That was number three.

00:32:57: The top question was: does Alfie have a head?

00:33:01: Where are the sensors located?

00:33:03: So it has a head right now, mostly as a position for the cameras.

00:33:10: A future human-robot interaction mode is, well, not the super highest priority.

00:33:17: It's something, you know,

00:33:18: people have been experimenting with human-robot interaction modes for decades as well.

00:33:23: We just do what is necessary to build an industrial autonomous robot

00:33:30: Maybe just to add on to that:

00:33:33: the human form factor lends itself well to data collection if you minimize the embodiment gap between your machine, the robot, and the humans who, as we discussed before, right now still have to provide some use-case-specific data.

00:33:50: It just makes the process much easier.

00:33:53: But that's not to say that the human form factor is per se optimal.

00:34:00: In the future, when the amount of data that needs to be collected for

00:34:08: specific use cases goes down,

00:34:11: we will also see robot morphologies evolve, so to speak,

00:34:15: to different form factors.

00:34:18: I think the interesting part there is that, obviously, a human can walk quite well, but if you look at how well a dog or a spider can walk, then of course there are huge advantages for certain terrains. Even wheels can make sense.

00:34:33: I mean,

00:34:34: humans evolved to run, swim, climb.

00:34:38: But if you just want to move parts in a factory, yes, maybe we're just going to roll, right? Much more energy efficient. Absolutely, and easier.

00:34:47: Yeah, much

00:34:48: safer, especially with heavy objects.

00:34:51: Something with wheels can be very efficient. Yes.

00:34:55: So what was your funniest moment from Hannover Messe?

00:34:59: Oh, can we say this as well?

00:35:04: I have a very clear candidate.

00:35:08: No, the funniest moment is always when there are problems.

00:35:12: Out of, you know,

00:35:13: let's say, three hundred demos, two hundred ninety went very well.

00:35:18: Obviously, for a demo you only go so far in making it really robust.

00:35:26: You can't guard for everything.

00:35:28: So what we did is, we put Alfie in a box so that there's one element which is out of the question, you know:

00:35:38: large changes in lighting, for example.

00:35:40: Everybody in robotics knows lighting can cause issues.

00:35:45: Nobody really realizes that the human eye works on a logarithmic scale.

00:35:50: Here in the studio, what is it in lux?

00:35:54: Oh, I don't know how many lux.

00:35:56: It's pretty bright, though.

00:35:58: Maybe a thousand.

00:35:59: If you go outside, even on a cloudy day, it's one hundred thousand.

00:36:01: Yeah, right.

00:36:03: So having the sun shining directly on you, everybody knows this from cameras, right? You suddenly see white.
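
The lighting point can be made concrete with a small sketch. It uses the rough lux figures mentioned in the conversation (about 1,000 in the studio, about 100,000 outdoors) and a deliberately simplified, hypothetical linear 8-bit sensor; the function names and the mid-gray calibration are assumptions for illustration, not a description of Alfie's cameras.

```python
import math

def stops_between(lux_low, lux_high):
    """Brightness ratio expressed in photographic stops (doublings of light)."""
    return math.log2(lux_high / lux_low)

def linear_sensor_reading(lux, exposure_lux=1_000.0, max_value=255):
    """Toy linear sensor (assumption): exposure_lux maps to roughly mid-gray,
    and anything brighter than the sensor range clips at max_value."""
    raw = 0.5 * max_value * lux / exposure_lux
    return min(max_value, int(raw))

studio, outdoors = 1_000.0, 100_000.0  # rough figures quoted in the episode

print(f"ratio: {outdoors / studio:.0f}x")
print(f"stops: {stops_between(studio, outdoors):.2f}")
print("studio pixel:", linear_sensor_reading(studio))
print("outdoor pixel:", linear_sensor_reading(outdoors))
```

On a logarithmic scale the hundredfold jump is only a handful of doublings, which is why it feels moderate to the eye, while a sensor exposed for the studio is pushed far past its maximum value and clips to white.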

00:36:09: And so this happened to us: after two days of cloudy sky, there was a time window

00:36:16: when, through a window in the roof, you know, there was bright sunlight that caused some trouble, let's say for an hour.

00:36:27: Which happened to be in our very prominent business.

00:36:30: Some prominent guests came over and we were...

00:36:32: People had to

00:36:33: scratch them.

00:36:36: Taping umbrellas?

00:36:40: We realized the box was not completely shielded off. It was just two planes or sheets of fabric, and so there was some light coming through, and that was it in the end, together with a door.

00:36:57: But again, this is something you cannot account for when you're developing it.

00:37:01: But once you demo it and see that you can fix it, then that's the cool part, right?

00:37:05: I mean, one thing, of course... this learning is not something we didn't know before, of course.

00:37:12: If you go to a customer site, obviously you guard for that. For a demo, again,

00:37:19: there are only so many things that you want to shield for.

00:37:24: But just operating the robot, having young guys directly out of university be responsible, I think, is a huge experience.

00:37:37: What was your most fun part?

00:37:49: Very cool.

00:37:51: Yeah, closing question: if you had to sum up in one sentence what differentiates Alfie from other concepts in the market,

00:37:59: what would you say that would be?

00:38:02: So I think, again, we do

00:38:06: this combination of industrial technology and sturdiness together with the learning aspect from humanoid robotics. Really finding where we can take the best of both worlds, this is what we're really striving to achieve.

00:38:26: Very cool!

00:38:28: Yeah, Robert?

00:38:30: I would say Alfie is a pragmatic marriage between classic, well-proven technology and modern data-driven approaches, with which we go into first deployments now... We will learn a lot.

00:38:55: This approach will bring us far in that respect.

00:38:58: Yeah, and is it the future you also see for Alfie?

00:39:03: Or what's your future?

00:39:06: So I think development will be extremely fast, especially on the model side.

00:39:14: We had the example of Will Smith eating spaghetti. On the hardware side, everything has not changed as much.

00:39:32: We still build hardware like any other system.

00:39:37: So there it's more about:

00:39:38: do you have the experience to really bring a product from concept drawings to series production? With RobCo we've had our share of experience, and bringing that, you know, to the next level,

00:39:56: that's where I see us making rapid progress.

00:40:01: Yeah, so I also think that most progress will come on the model side.

00:40:06: So I can use this opportunity to sprinkle in one of my favorite topics in the field, which is called world models.

00:40:12: The models we talked about before, learned by imitation, are largely purely reactive.

00:40:22: So they are pre-trained, consume sensor input and react to it.

00:40:29: But arguably what we really want is models that understand the effect their actions have on their surroundings... and are able to reason or plan over this.
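
The distinction between reacting and planning can be sketched in a few lines. Everything here is a toy assumption: a one-dimensional point mass, three discrete accelerations, a hand-written `model` standing in for a learned world model, and a deliberately naive `reactive_policy` that is far simpler than a real imitation-learned policy. The planner tries every short action sequence inside the model and executes only the first action of the cheapest one.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical discrete accelerations
TARGET = 5            # hypothetical goal position

def model(state, action):
    """Predict the next (position, velocity) for a given acceleration."""
    pos, vel = state
    vel += action
    return (pos + vel, vel)

def reactive_policy(state):
    """Map the current observation straight to an action, with no lookahead."""
    pos, _ = state
    return (TARGET > pos) - (TARGET < pos)  # push toward the target

def planning_policy(state, horizon=4):
    """Roll every action sequence through the model; return the first
    action of the sequence with the lowest predicted cost."""
    def cost(sequence):
        total, s = 0, state
        for a in sequence:
            s = model(s, a)
            total += (s[0] - TARGET) ** 2
        return total
    return min(product(ACTIONS, repeat=horizon), key=cost)[0]

def run(policy, steps=10):
    """Apply a policy step by step; the toy 'real world' equals the model."""
    state = (0, 0)
    for _ in range(steps):
        state = model(state, policy(state))
    return state

print("reactive:", run(reactive_policy))
print("planning:", run(planning_policy))
```

Because the planner can predict momentum, it starts braking before it reaches the target, whereas the naive reactive rule only changes direction after it has already overshot.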

00:40:42: In the field of world models, right now we see, I would say, a shifting trend towards these kinds of models.

00:40:51: There's a lot happening there.

00:40:54: And I think they will be a big unlock in the future to achieve real dexterity and also kind of solve that data problem that still plagues robotics to some extent.

00:41:04: Then for me, as a control engineer, that's particularly nice because it ties back to methods from optimal control, which were developed during the space race in the fifties.

00:41:15: There, I don't know, satellites and rockets were modeled in how they react under the influence of control actions, and that's kind of what brought man to the moon, right?
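
As one textbook instance of the optimal control idea mentioned here, the sketch below computes a finite-horizon linear-quadratic regulator for a scalar system. The system `x[k+1] = a*x[k] + b*u[k]`, the cost weights `q` and `r`, and all function names are illustrative assumptions; this is not the specific method used in the space programs or at RobCo.

```python
def lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k]
    with cost sum(q*x^2 + r*u^2). Returns gains in forward time order."""
    p = q  # terminal cost weight
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)   # optimal feedback gain
        p = q + a * a * p - a * b * p * k   # cost-to-go one step earlier
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time
    return gains

def simulate(x0, a, b, gains):
    """Apply u = -k*x at each step using the same model used for design."""
    x = x0
    trajectory = [x]
    for k in gains:
        x = a * x + b * (-k * x)
        trajectory.append(x)
    return trajectory

gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, horizon=10)
trajectory = simulate(x0=10.0, a=1.0, b=1.0, gains=gains)
print([round(x, 3) for x in trajectory])
```

The controller is derived entirely from a model of how the system reacts to control actions, which is the same principle that planning over a learned world model applies at a much larger scale.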

00:41:27: I think it's beautiful.

00:41:29: The circle closes now when we see this push in robotics using similar ideas, but on a much bigger scale,

00:41:35: in this sense.

00:41:37: Yes!

00:41:38: Absolutely beautiful, and very nice to close the circle there.

00:41:42: But yeah, it is an amazing future.

00:41:44: We think world models will be a pure game changer, as they were for classic LLMs as well.

00:41:50: They understand basically the text world, right?

00:41:53: And it's a much easier world than the physical world.

00:41:55: But the world model, then... Yeah, what I really appreciate about RobCo, and maybe you can say something about that, is how modular you always think.

00:42:05: You think of the robots as modular and also think of the models as modular.

00:42:09: I think that is one of the huge advantages: you can create a really big robot system, but you can also create just a small one-arm robot that does something really niche and does it well.

00:42:23: I think this is the huge advantage of using RobCo: you're never stuck with anything, because you can always take the advances developed at RobCo every day or week and plug them into that robot.

00:42:37: Great.

00:42:39: Thank you so much for being here. And obviously, to the audience: if you want to learn more, you can always visit RobCo's website. You can also connect with Clemens and Robert on LinkedIn, and we'll put their contact details in the show notes.

00:42:52: And obviously, if you liked this episode, please like and subscribe, leave a comment, and send it to somebody else who might be interested in how, yeah,

00:42:59: the world of robotics is evolving right now.

00:43:01: I can only say thank you very much for listening to these two experts.

00:43:06: Well, I hope you'll be here again next time on Rob Talk, when we're talking about autonomy, physical AI and the future of robotics.

00:43:28: Thank you.